# Mutual Fund Investment Analysis

## Python (Flask) App, mainly Backend APIs for Individual Mutual Fund Investment Analysis

### Problem Statement

Over the years, investors have tried various methods to keep track of all their investments. The main problems they face are:

1. Multi-vendor: Mutual funds are bought through multiple vendors, such as Groww and Paytm Money, so there is no unified interface to track all investments.
2. Historic data is not used accurately: Over time the historic data loses importance and is overridden, kept only to fit a time series graph.
3. Periodic tracking of data: The net change in the invested amount has to be tracked every day.
4. Personalized prediction: Current mutual fund predictions are not personalized; they are based only on overall NAV changes.

### Data collection, Storage and Analysis Blueprint

Data from different apps will be collected in the following way:

1. The per-day return data is manually entered in a Google Sheet.
2. The data from the Google Sheet is fetched in Python and stored in a MySQL DB.
3. Data fetching happens every day at 11:00 a.m. (APScheduler).
4. A success or failure mail for the daily update is triggered, depending on whether the data was stored in the respective database.
5. APIs expose the data, grouped by various factors, for use in the UI.
6. Once the stored data reaches a significant volume, it is split into train and test data for future analysis.
7. The predicted data is again exposed over APIs, grouped by various factors.
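The blueprint does not say how the train/test split in step 6 is done. Because the records form a daily time series, one common option is a chronological split, where every test date is later than every train date. The sketch below assumes that approach; the function name `chronological_split`, the `train_fraction` default and the record layout (dictionaries with a `Date` key in `MM/DD/YYYY` format, as returned from the sheet) are illustrative assumptions, not part of the app.

```python
from datetime import datetime


def chronological_split(records, train_fraction=0.8,
                        date_key="Date", date_format="%m/%d/%Y"):
    """Split records into (train, test) by date, oldest dates first.

    Hypothetical helper: all records of one date stay on the same side.
    """
    def parse(record):
        return datetime.strptime(record[date_key], date_format)

    # Distinct dates in chronological order
    dates = sorted({parse(record) for record in records})
    # Number of distinct dates used for training (at least one)
    cutoff_index = max(1, int(len(dates) * train_fraction))
    train_dates = set(dates[:cutoff_index])

    train = [record for record in records if parse(record) in train_dates]
    test = [record for record in records if parse(record) not in train_dates]
    return train, test
```

With five distinct dates and `train_fraction=0.8`, the four oldest dates go to the train set and the most recent date goes to the test set.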

## Connecting to Google Sheet

Our primary data source is a Google Sheet, where the daily changes for all the mutual funds are recorded.

To connect to the Google Sheet, we perform the following steps:

1. Go to https://console.cloud.google.com/ and create a new project.
2. In the created project, enable the Google Drive API.
3. Create credentials to access the Google Drive API.
4. Enable the Google Sheets API.
5. Share the Google Sheet with the dev ID generated in the credentials.
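For the last step, the address to share the sheet with can be read from the downloaded credentials file. The sketch below assumes the credentials are a service-account JSON key (the notebook loads them with `ServiceAccountCredentials`), in which the account address is stored under the `client_email` field. The helper name `service_account_email` is hypothetical.

```python
import json


def service_account_email(credentials_path):
    """Return the service-account address stored in a JSON key file."""
    with open(credentials_path, encoding="utf-8") as credentials_file:
        credentials = json.load(credentials_file)
    # Service-account key files keep the account address under "client_email"
    return credentials["client_email"]
```

The returned address is the one to enter in the sheet's Share dialog.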

### All imports at one place

In [43]:
import gspread
from oauth2client.service_account import ServiceAccountCredentials
import pandas as pd
from datetime import datetime,timedelta
import smtplib
from email.message import EmailMessage
import json

In [44]:
#Defining the scope of the OAuth Authentication
scope = [
    "https://spreadsheets.google.com/feeds",
    "https://www.googleapis.com/auth/spreadsheets",
    "https://www.googleapis.com/auth/drive.file",
    "https://www.googleapis.com/auth/drive",
]

#Getting the credentials
creds = ServiceAccountCredentials.from_json_keyfile_name("D:/Codebase/Mutual_Fund_Analysis/Backend/mutual-fund-analysis/config/google_credentials.json", scope)
#Connecting to the Google Spreadsheet Client
client = gspread.authorize(creds)

#Getting the spreadsheet
sheet = client.open("Daily_MF_Returns").sheet1


### Getting all the data from the sheet

Explore the sheet data

In [45]:
list_data = sheet.get_all_records()

#Creating the dataframe
data = pd.DataFrame(list_data)

In [46]:
#Rows and Column Count for the data
data.shape

(80, 6)

In [47]:
#Column Names of the data
data.columns

Index(['Date', 'Policy Name', 'App', 'Investment', 'Return', 'Net Change'], dtype='object')

In [48]:
#explore the data
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 80 entries, 0 to 79
Data columns (total 6 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Date         80 non-null     object
 1   Policy Name  80 non-null     object
 2   App          80 non-null     object
 3   Investment   80 non-null     int64 
 4   Return       80 non-null     object
 5   Net Change   80 non-null     object
dtypes: int64(1), object(5)
memory usage: 3.9+ KB
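The `data.info()` output above lists `Return` and `Net Change` as `object` columns, which suggests that at least some of their cells are not numeric (for example, cells that are still blank in the sheet; this is an assumption about the data, not something the output confirms). Before any arithmetic, such values would need converting. A minimal sketch with a hypothetical helper `to_float`:

```python
def to_float(value):
    """Convert a sheet cell to float; return None for blank or non-numeric cells."""
    if isinstance(value, (int, float)):
        return float(value)
    # Drop thousands separators and surrounding whitespace
    text = str(value).replace(",", "").strip()
    if not text:
        return None
    try:
        return float(text)
    except ValueError:
        return None
```

The helper could be applied with `data['Return'].apply(to_float)`; pandas' own `pd.to_numeric(data['Return'], errors='coerce')` is an alternative that turns unparseable values into `NaN`.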


In [49]:
data.head(5)

|   | Date | Policy Name | App | Investment | Return | Net Change |
|---|------|-------------|-----|------------|--------|------------|
| 0 | 11/19/2020 | IDFC Low Duration | Paytm Money | 1600 | 1664.92 | 64.92 |
| 1 | 11/19/2020 | BOI AXA | Paytm Money | 200 | 204.82 | 4.82 |
| 2 | 11/19/2020 | SBI Short Term | Paytm Money | 1000 | 1004.51 | 4.51 |
| 3 | 11/19/2020 | Edelweiss Banking and PSU | Paytm Money | 500 | 500.28 | 0.28 |
| 4 | 11/19/2020 | HDFC Gold Direct | Paytm Money | 500 | 482.76 | -17.24 |


### Adding Formatted Date column to the dataframe

The new column contains the date in DD-MMM-YYYY format, which avoids confusing the day with the month.

In [50]:
# Converting all the values to the DD-MMM-YYYY format; values that do not parse are printed and kept unchanged

def validate(date_text):
    try:
        return datetime.strptime(date_text, '%m/%d/%Y').strftime("%d-%b-%Y")
    except ValueError:
        print(date_text)
        return date_text
    
data['date_formatted'] = data['Date'].apply(validate)

## Update the Sheet for next week (recurring)

### Get the last date, and all the records for last date
Every Saturday, we add the rows for the following week (Tuesday - Saturday).
To do this, the following has to be calculated:

1. Index of the current pointer in the sheet
2. The last date
3. All the records for the last date (fetch Date, Policy Name, App and Investment)

In [51]:
current_pointer = len(list_data)

#Getting the last date
last_date = data.values[current_pointer-1][0]

#Getting all the rows corresponding to the last date
data_filtered = data.loc[data['Date'] == last_date]

data_filtered = data_filtered.drop(['Return','Net Change', 'date_formatted'],axis=1)

# List of lists, one inner list per row
list_to_insert = data_filtered.values.tolist()

print(list_to_insert)

[['11/28/2020', 'IDFC Low Duration', 'Paytm Money', 1600], ['11/28/2020', 'BOI AXA', 'Paytm Money', 200], ['11/28/2020', 'SBI Short Term', 'Paytm Money', 1000], ['11/28/2020', 'Edelweiss Banking and PSU', 'Paytm Money', 500], ['11/28/2020', 'HDFC Gold Direct', 'Paytm Money', 500], ['11/28/2020', 'Nippon India Liquid Fund', 'Paytm Money', 800], ['11/28/2020', 'ICICI Prudential Regular Gold', 'Groww', 500], ['11/28/2020', 'Axis Midcap Direct', 'Groww', 500], ['11/28/2020', 'Axis Bluechip', 'Groww', 500], ['11/28/2020', 'SBI Magnum', 'Groww', 500]]


### Create the final list of data to be inserted

1. Start 3 days after the last date (the last date is a Saturday, so the first new date is a Tuesday)
2. Create rows for 5 days
3. Total # new rows = # rows for the last date * 5
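The date arithmetic behind these steps can be sketched on its own, assuming the last recorded date is a Saturday and the first new date is the following Tuesday (an offset of 3 days, as used in the insertion code of this notebook). The helper names `next_week_dates` and `rows_for_dates` are hypothetical; the second one builds a fresh list per row so that the five days never share the same list object.

```python
from datetime import datetime, timedelta


def next_week_dates(last_date, first_offset=3, days=5, date_format="%m/%d/%Y"):
    """Return `days` consecutive dates, starting `first_offset` days after last_date."""
    start = datetime.strptime(last_date, date_format)
    return [
        (start + timedelta(days=first_offset + i)).strftime(date_format)
        for i in range(days)
    ]


def rows_for_dates(template_rows, dates):
    """Repeat every template row once per date, replacing its first column."""
    # A new list is built for each row, so later edits cannot affect other days
    return [[date] + list(row[1:]) for date in dates for row in template_rows]
```

For a last date of `11/28/2020`, `next_week_dates` yields the five dates from `12/01/2020` to `12/05/2020`.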

In [52]:
final_insert_list = []

for i in range(0,5):
    for each_row in list_to_insert:
        # Copy each row so the five days do not share (and overwrite) the same list object
        final_insert_list.append(list(each_row))

# Yield successive n-sized 
# chunks from l. 
def divide_chunks(l, n):       
    # looping till length l 
    for i in range(0, len(l), n):  
        yield l[i:i + n]
        
days_chunk = list(divide_chunks(final_insert_list, len(list_to_insert)))

## Update Records and Send Mail

1. Update next week's data in the sheet
2. Send a success or failure mail accordingly
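Composing the mail separately from sending it makes the two outcomes easy to check without an SMTP connection. The sketch below is one way to do that with the standard library; the function name `build_status_mail`, the shortened body texts and the example addresses are assumptions for illustration. `textwrap.dedent` removes the indentation that a triple-quoted string would otherwise carry into the mail body.

```python
from email.message import EmailMessage
from textwrap import dedent

SUCCESS_BODY = dedent("""\
    Hi,

    The weekly scheduled insert of data is successful, starting from {start_date}.
    Please update the returns accordingly.
    """)

FAILURE_BODY = dedent("""\
    Hi,

    There was an issue in updating your data.
    Please update the data and returns manually, starting from {start_date}.
    """)


def build_status_mail(success, start_date, sender, to):
    """Build (but do not send) the weekly success or failure mail."""
    template = SUCCESS_BODY if success else FAILURE_BODY
    msg = EmailMessage()
    msg["Subject"] = "Mutual Funds - Weekly Insertion of Base Data"
    msg["From"] = sender
    msg["To"] = to
    msg.set_content(template.format(start_date=start_date))
    return msg
```

The returned message can then be passed to `smtplib.SMTP.send_message`, as the `email_alert` function in this notebook does.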

In [53]:
def email_alert(subject, body, to):
    msg = EmailMessage()
    msg.set_content(body)
    msg['subject'] = subject
    msg['to'] = to

    # Read the sender's Gmail credentials; the with block closes the file afterwards
    with open("D:/Codebase/Mutual_Fund_Analysis/Backend/mutual-fund-analysis/config/gmail_credentials.json") as gmail_credentials_file:
        gmail_credentials = json.load(gmail_credentials_file)

    sender = gmail_credentials['email']
    password = gmail_credentials['password']

    msg['from'] = sender

    # Connect to Gmail's SMTP server, upgrade the connection to TLS, then log in and send
    server = smtplib.SMTP("smtp.gmail.com", 587)
    server.starttls()
    server.login(sender, password)

    server.send_message(msg)
    server.quit()
    print("Email sent successfully")

success_insert = True
# Next free row in the sheet: data rows + 1 header row + 1
total_rows = data.shape[0] + 2

# Offset in days from the last date (Saturday) to the first new date (Tuesday)
days_delta = 3

for each_chunk in days_chunk:
    if not success_insert:
        break
    date_to_add = (datetime.strptime(last_date, '%m/%d/%Y') + timedelta(days=days_delta)).strftime('%m/%d/%Y')
    for value in each_chunk:
        value[0] = date_to_add
        try:
            sheet.insert_row(value, total_rows, 'RAW')
            print("Updated Row - " + str(total_rows))
            total_rows += 1
        except Exception as error:
            # Stop at the first failed insert so that the failure mail is sent
            print("Error in Row - " + str(total_rows) + ": " + str(error))
            success_insert = False
            break
    days_delta += 1

start_date = (datetime.strptime(last_date, '%m/%d/%Y') + timedelta(days=3)).strftime('%d-%b-%Y')

if(success_insert):
    body = '''
    Hi Apratim,
    
    The weekly scheduled insert of data is successful. Data is inserted for the next 5 days, starting from Tuesday - {var}.
    Please update the returns accordingly.
    
    Best Regards,
    Dev Team
    Mutual Fund Analysis App'''.format(var=start_date)
else:
    body = '''
    Hi Apratim,
    
    There was some issue in updating your data. Rest assured our team is working on it.
    Meanwhile please update the data and returns manually, starting from - {var}.
    Sorry for the inconvenience caused.
    
    Best Regards,
    Dev Team
    Mutual Fund Analysis App'''.format(var=start_date)
    
subject = "Mutual Funds - Weekly Insertion of Base Data"
to = "apratimnath7@gmail.com"

email_alert(subject, body, to)

Updated Row - 82
Updated Row - 83
Updated Row - 84
Updated Row - 85
Updated Row - 86
Updated Row - 87
Updated Row - 88
Updated Row - 89
Updated Row - 90
Updated Row - 91
Updated Row - 92
Updated Row - 93
Updated Row - 94
Updated Row - 95
Updated Row - 96
Updated Row - 97
Updated Row - 98
Updated Row - 99
Updated Row - 100
Updated Row - 101
Updated Row - 102
Updated Row - 103
Updated Row - 104
Updated Row - 105
Updated Row - 106
Updated Row - 107
Updated Row - 108
Updated Row - 109
Updated Row - 110
Updated Row - 111
Updated Row - 112
Updated Row - 113
Updated Row - 114
Updated Row - 115
Updated Row - 116
Updated Row - 117
Updated Row - 118
Updated Row - 119
Updated Row - 120
Updated Row - 121
Updated Row - 122
Updated Row - 123
Updated Row - 124
Updated Row - 125
Updated Row - 126
Updated Row - 127
Updated Row - 128
Updated Row - 129
Updated Row - 130
Updated Row - 131
Email sent successfully
