Mutual Funds Analysis and Prediction

Hey guys! My name is R M Srinivas. In this project I analyze and predict the 1-, 3-, and 5-year returns of 1064 mutual funds in India. I scraped the data from the most visited website for mutual fund investments. I tested the following regression models: a linear model, SGD Regressor, Random Forest Regressor, Decision Tree Regressor, Ridge, MLP Regressor, and Lasso (linear model). I then selected the best-performing model, performed hyperparameter tuning, and deployed an interactive application that can generate the visualization and email it to the user's email address.

Here is a GIF of the application 📹

[Image: animation of the application]

ETL

Extraction (https://github.com/srinivasRM/Mutual-funds-Analysis-and-prediction/blob/main/scraper%20and%20extraction.py): In this file I used Beautiful Soup to extract data from the most visited site for studying, analysing, and investing in mutual funds. I extracted 20 columns for 1064 mutual funds, and tried to extract them in a way that would not require much data cleaning afterwards. I then saved the result as raw_data.xlsx.
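As a rough illustration of the extraction step, here is a minimal sketch of parsing a fund table with Beautiful Soup into a pandas DataFrame. The HTML snippet, the table id `funds`, and the column names are made up for the example (the real site layout and the 20 scraped columns are not reproduced here), so treat it as an assumption-laden sketch rather than the actual scraper.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for one listing page of the scraped site.
SAMPLE_HTML = """
<table id="funds">
  <tr><th>fund_name</th><th>aum_cr</th><th>returns_1y</th></tr>
  <tr><td>Fund A</td><td>1,250.5</td><td>12.4</td></tr>
  <tr><td>Fund B</td><td>310</td><td>8.1</td></tr>
</table>
"""


def parse_fund_table(html: str) -> pd.DataFrame:
    """Turn the rows of the (hypothetical) funds table into a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find("table", id="funds").find_all("tr")
    # The first row holds the column names, the rest hold one fund each.
    header = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    records = [
        [td.get_text(strip=True) for td in row.find_all("td")]
        for row in rows[1:]
    ]
    return pd.DataFrame(records, columns=header)


raw = parse_fund_table(SAMPLE_HTML)
# Saving would then look like this (requires an Excel writer such as openpyxl):
# raw.to_excel("raw_data.xlsx", index=False)
```

All values come back as strings at this stage, which is why the AUM column still needs converting in the transform step.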

Transform (https://github.com/srinivasRM/Mutual-funds-Analysis-and-prediction/blob/main/Tranformation.py): As the raw data from the step above did not need many cleaning steps, I cleaned it by removing the few columns that had more than 30% missing values (np.nan) and converting the column with each fund's AUM in cr to float. I saved the file as cleaned_data.xlsx.
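The two cleaning rules described above could look roughly like this in pandas. The column names and the comma-formatted AUM strings are assumptions made for the example; the actual raw_data.xlsx may format these values differently.

```python
import numpy as np
import pandas as pd

# Small made-up frame standing in for raw_data.xlsx.
raw = pd.DataFrame({
    "fund_name": ["Fund A", "Fund B", "Fund C", "Fund D"],
    "aum_cr": ["1,250.5", "310", "2,000", "95.75"],      # AUM in cr, as text
    "mostly_missing": [np.nan, np.nan, 1.0, np.nan],      # 75% missing
    "returns_5y": [11.2, np.nan, 9.8, 7.1],               # 25% missing
})


def clean(df: pd.DataFrame, max_missing: float = 0.30) -> pd.DataFrame:
    """Drop columns above the missing-value threshold and make AUM numeric."""
    # isna().mean() gives the fraction of missing values per column.
    keep = df.columns[df.isna().mean() <= max_missing]
    out = df[keep].copy()
    # Assumed format: thousands separated by commas, e.g. "1,250.5".
    out["aum_cr"] = out["aum_cr"].str.replace(",", "", regex=False).astype(float)
    return out


cleaned = clean(raw)
# cleaned.to_excel("cleaned_data.xlsx", index=False)  # requires openpyxl
```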

Load (https://github.com/srinivasRM/Mutual-funds-Analysis-and-prediction/blob/main/data_storage.py): In this file I uploaded the data to a Heroku server for storage, using PostgreSQL to send and save the data. Every time this file runs, it takes the updated data, drops the existing table if it exists, and then adds the updated table to the server.
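The drop-and-reload behaviour described above can be sketched with pandas' `to_sql` and `if_exists="replace"`. To keep the example runnable without a server, it uses an in-memory SQLite database as a stand-in for the Heroku PostgreSQL instance; the table name `mutual_funds` and the sample rows are hypothetical.

```python
import sqlite3

import pandas as pd


def load_table(df: pd.DataFrame, conn, table: str = "mutual_funds") -> None:
    """Replace the stored table with the latest data on every run."""
    # if_exists="replace" drops the existing table (if any) before writing.
    df.to_sql(table, conn, if_exists="replace", index=False)


# SQLite stands in for PostgreSQL here so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")

first_run = pd.DataFrame(
    {"fund_name": ["Fund A", "Fund B"], "aum_cr": [1250.5, 310.0]}
)
second_run = pd.DataFrame(
    {"fund_name": ["Fund A", "Fund B", "Fund C"], "aum_cr": [1300.0, 305.0, 2000.0]}
)

load_table(first_run, conn)
load_table(second_run, conn)  # overwrites the rows from the first run

row_count = conn.execute("SELECT COUNT(*) FROM mutual_funds").fetchone()[0]
```

Against a real PostgreSQL database, the same `to_sql` call would typically be given a SQLAlchemy engine instead of the SQLite connection.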

EDA (Exploratory Data Analysis)

Go through the following links for the individual ipynb files.

5-year returns model testing ->

https://github.com/srinivasRM/Mutual-funds-Analysis-and-prediction/blob/main/Model%20testing%20for%205%20year%20analysis.ipynb

3-year returns model testing ->

https://github.com/srinivasRM/Mutual-funds-Analysis-and-prediction/blob/main/Model%20testing%20for%203%20year%20analysis.ipynb

1-year returns model testing ->

https://github.com/srinivasRM/Mutual-funds-Analysis-and-prediction/blob/main/Model%20testing%20for%201%20year%20analysis.ipynb

Let's first understand the relation of our target variable (returns over a period of 1, 3, and 5 years) with the remaining variables, starting with some basic definitions. AUM, or Assets Under Management, is the total funds that a mutual fund scheme holds.

What does NAV mean? The performance of a particular scheme of a mutual fund is denoted by its Net Asset Value (NAV). In simple words, NAV is the market value of the securities held by the scheme. Mutual funds invest the money collected from investors in securities markets.

Risk of the fund: Mutual fund schemes are not guaranteed or assured-return products. Investment in mutual fund units involves investment risks such as trading volumes, settlement risk, liquidity risk, and default risk, including the possible loss of principal. All of this is considered, and the fund is rated accordingly.

Minimum investment: The minimum amount required to invest in a mutual fund.

Type of the fund: Funds are classified by how their investments are diversified: equity funds, debt funds, hybrid funds, solution-based funds, etc.

Outlier analysis and treatment.

Here is some basic information about the columns, obtained with the describe function. We can see that there are outliers in a few columns, so let's investigate those columns and treat them.

[Image: describe output for the 5-year data]
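For readers who want to reproduce this kind of check, here is a small sketch that pairs `describe` with the 1.5 × IQR rule that box plots conventionally use to flag outliers. The column names and values are invented for the example, and the notebooks themselves may have relied on visual inspection only.

```python
import pandas as pd


def iqr_outlier_mask(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR], the usual box plot rule."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Made-up sample; one fund has a far larger AUM than the rest.
funds = pd.DataFrame({
    "aum_cr": [100.0, 120.0, 90.0, 110.0, 105.0, 95.0, 5000.0],
    "returns_5y": [8.0, 9.5, 7.2, 10.1, 8.8, 9.0, 8.5],
})

summary = funds.describe()  # count, mean, std, min, quartiles, max per column
outlier_counts = {
    col: int(iqr_outlier_mask(funds[col]).sum()) for col in funds.columns
}
```

A large gap between the 75% row and the max row of `summary` is usually the first hint that a column such as AUM has a long upper tail.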

Here are the box plot and distribution plot of the AUM column before outlier treatment.

[Image: box plot of AUM, 5-year data]

Here are the box plot and distribution plot of the debt percentage column before outlier treatment.

[Image: box plot and distribution plot of debt percentage, 5-year data]

Here are the box plot and distribution plot of the 5-year returns column before outlier treatment.

[Image: box plot of 5-year returns]

Here are the box plot and distribution plot of the equity percentage column before outlier treatment.

[Image: box plot of equity percentage]

Here are the box plot and distribution plot of the 3-year returns before outlier treatment.

[Image: box plot of 3-year returns]

[Image: AUM after treatment of outliers]

Treatment of outliers

I tried replacing the values greater than 0.85 with the mean and with the median, and also normalizing each column, then compared the results, which I have documented in the bottom section of the table.
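One possible reading of this treatment is sketched below. It assumes that "0.85" refers to the 0.85 quantile of each column, that the replacement value is computed from the remaining (non-outlier) values, and that normalization means min-max scaling; none of these details are confirmed by the text, and the function names are hypothetical.

```python
import pandas as pd


def replace_upper_tail(series: pd.Series, q: float = 0.85,
                       strategy: str = "median") -> pd.Series:
    """Replace values above the q-th quantile with the mean or median."""
    cutoff = series.quantile(q)
    inliers = series[series <= cutoff]
    fill = inliers.median() if strategy == "median" else inliers.mean()
    # where() keeps values at or below the cutoff and fills everything else
    # (NaN values would also be filled, since the comparison is False for them).
    return series.where(series <= cutoff, fill)


def min_max(series: pd.Series) -> pd.Series:
    """Scale a column to the [0, 1] range."""
    return (series - series.min()) / (series.max() - series.min())


aum = pd.Series([100.0, 120.0, 90.0, 110.0, 105.0, 95.0, 5000.0])
treated = replace_upper_tail(aum, strategy="median")
normalized = min_max(treated)
```

Running the same column through `strategy="mean"` and `strategy="median"` and scoring the models on each version is one way to produce the kind of comparison the results table describes.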

Here are the box plot and distribution plot of the AUM column after outlier treatment.

[Image: AUM after treatment]

Here are the box plot and distribution plot of the debt percentage column after outlier treatment.

[Image: debt percentage after treatment]

Here are the box plot and distribution plot of the 5-year returns column after outlier treatment.

[Image: 5-year returns after treatment]

Here are the box plot and distribution plot of the equity percentage column after outlier treatment.

[Image: equity percentage after treatment]

Here are the box plot and distribution plot of the 3-year returns after outlier treatment.

[Image: 3-year returns after treatment]

Here is the correlation matrix of the data after outlier treatment.

[Image: correlation matrix, 5-year data]

Here is a table showing the testing scores of the various models on the 5-year returns target variable.

[Image: 5-year outlier analysis results table]

Here is a table showing the testing scores of the various models on the 3-year returns target variable.

[Image: 3-year outlier analysis results]

Here is a table showing the testing scores of the various models on the 1-year returns target variable.

[Image: 1-year outlier analysis and report]

In all of the above images for the 1-, 3-, and 5-year returns model testing, the best model according to the scores obtained is the Random Forest Regressor, so I performed hyperparameter tuning on it individually for each target to get the best results. I then pickled the models so they can be run in the deployment phase of the project.
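The compare, tune, and pickle workflow described above could be sketched with scikit-learn as follows. This is not the notebook code: the data is synthetic, only a subset of the tested models is included for brevity, the score is assumed to be R², and `GridSearchCV` with this small parameter grid is an assumed choice of tuning method.

```python
import pickle

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the cleaned fund features and one returns target.
X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Step 1: fit each candidate and record its R^2 score on the held-out split.
candidates = {
    "linear": LinearRegression(),
    "ridge": Ridge(),
    "lasso": Lasso(),
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}
scores = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in candidates.items()
}

# Step 2: tune the random forest with a small, illustrative grid.
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 5]},
    cv=3,
    scoring="r2",
)
search.fit(X_train, y_train)

# Step 3: pickle the tuned model so the app can load it later.
payload = pickle.dumps(search.best_estimator_)
restored = pickle.loads(payload)
```

On this synthetic, purely linear data the linear models will naturally score best, so the ranking here says nothing about the fund data. In practice the model would be written to disk with `pickle.dump`, once per target (1, 3, and 5 years).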

Here are the final graphs for each target after hyperparameter optimization, along with the feature importance graphs.

Graphs for 1-year predictions

[Image: 1-year model after hyperparameter tuning]

[Image: feature importance, 1 year]

Graphs for 3-year predictions

[Image: 3-year predictions]

[Image: feature importance, 3 year]

Graphs for 5-year predictions

[Image: 5-year predictions]

[Image: feature importance, 5-year returns]

Using Streamlit, I created the following application.

[Image: animation of the application]

The above application has a sidebar for moving through the 5 different pages. The Definition page has the basic information about the various fund-related terms. It is followed by a series of 3 pages that predict the returns based on the inputs provided. In the back end, when each page is opened, the respective model saved in pickle format is loaded, and the user inputs are normalized and converted to get the prediction. The last page has all the visualizations and analysis with descriptions. I also created a requirements.txt for future deployment of the project to AWS or Heroku.
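The back-end step of a prediction page (load the pickled model, normalize the inputs, predict) might look something like the sketch below. It leaves out the Streamlit widgets so it can run on its own, and it assumes the scaler was pickled together with the model; the feature names, the `MinMaxScaler`, and the helper `predict_return` are all hypothetical.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import MinMaxScaler

# Stand-in for the training phase: fit a scaler and a model, then pickle both.
rng = np.random.default_rng(0)
X_train = rng.uniform([0, 0, 0], [5000, 100, 100], size=(200, 3))
y_train = 0.1 * X_train[:, 1] + rng.normal(0, 1, 200)
scaler = MinMaxScaler().fit(X_train)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(
    scaler.transform(X_train), y_train
)
bundle = pickle.dumps({"scaler": scaler, "model": model})


def predict_return(bundle_bytes: bytes, aum_cr: float,
                   equity_pct: float, debt_pct: float) -> float:
    """Load the saved objects, normalize the user inputs, and predict."""
    saved = pickle.loads(bundle_bytes)
    features = np.array([[aum_cr, equity_pct, debt_pct]])
    # Apply the same scaling that was fitted on the training data.
    scaled = saved["scaler"].transform(features)
    return float(saved["model"].predict(scaled)[0])


prediction = predict_return(bundle, aum_cr=1200.0, equity_pct=65.0, debt_pct=30.0)
```

In a Streamlit page, the three inputs would typically come from widgets such as `st.number_input`, and the result would be shown with `st.write`. Reusing the scaler fitted at training time matters here, since normalizing a single user input on its own would not reproduce the training scale.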

Let me know if you have any suggestions. You can contact me at this email: rmsrinivas199627@gmail.com
