## Evaluating Allocation Funds

### Objective: Grow the investment portfolio by finding mutual funds with the best returns and lowest fees while reducing risk.

It's important to consider both returns and fees when selecting mutual funds, because buying and selling shares incurs fees. It is good practice to understand where that money goes and how much we are willing to pay when we sell our shares.

The scope of this report covers mutual funds that:
 - Are actively managed
 - Have a 4- or 5-star Morningstar Rating
 - Belong to categories such as:
     - Allocation Funds (Balanced Funds)

We are going to start with **Allocation Funds** (Balanced Funds).

- Allocation funds, also known as balanced funds, invest in a mix of asset classes, typically including stocks, bonds, and cash.

- Fund Purpose: To provide a diversified portfolio in a single investment.

- Risk/Return Profile: Varies based on the allocation strategy; can range from conservative (more bonds) to aggressive (more stocks).

### Data Dictionary:

Here is a description of each column in the dataset:

1. **Symbol**: The unique ticker symbol used to identify the mutual fund.

2. **Description**: A brief description of the mutual fund, including its primary investment strategy or objective.

3. **Product Type**: The category of the financial product, such as mutual fund, ETF, etc.

4. **OneSource List**: Indicates whether the fund is part of the Schwab OneSource list, which typically includes funds that are available without transaction fees.

5. **Transaction Fee**: The fee charged when buying or selling shares of the mutual fund.

6. **Select List**: Indicates if the fund is part of the Schwab Select List, a list of funds selected by Schwab based on specific criteria such as performance and fees.

7. **Schwab Social Funds List**: Indicates if the fund is part of Schwab's socially responsible investment (SRI) list, which includes funds that meet certain environmental, social, and governance (ESG) criteria.

8. **Net Expense Ratio**: The annual percentage fee that the fund charges its shareholders, net of any fee waivers or reimbursements.

9. **Gross Expense Ratio**: The annual percentage fee that the fund charges its shareholders, before any fee waivers or reimbursements.

10. **Inception Date**: The date on which the mutual fund was established.

11. **Monthly NAV Return (1 year)**: The fund's monthly return based on Net Asset Value (NAV) over the past year.

12. **Monthly NAV Return (5 year)**: The fund's monthly return based on NAV over the past five years.

13. **Monthly NAV Return (10 year)**: The fund's monthly return based on NAV over the past ten years.

14. **Monthly NAV Return (Since Inception)**: The fund's monthly return based on NAV since its inception.

15. **Monthly Market Return (1 year)**: The fund's monthly return based on market price over the past year.

16. **Monthly Market Return (5 year)**: The fund's monthly return based on market price over the past five years.

17. **Monthly Market Return (10 year)**: The fund's monthly return based on market price over the past ten years.

18. **Monthly Market Return (Since Inception)**: The fund's monthly return based on market price since its inception.

19. **Monthly Performance as of**: The date as of which the monthly performance metrics are reported.

20. **Quarterly NAV Return (1 year)**: The fund's quarterly return based on NAV over the past year.

21. **Quarterly NAV Return (5 year)**: The fund's quarterly return based on NAV over the past five years.

22. **Quarterly NAV Return (10 year)**: The fund's quarterly return based on NAV over the past ten years.

23. **Quarterly NAV Return (Since Inception)**: The fund's quarterly return based on NAV since its inception.

24. **Quarterly Market Return (1 year)**: The fund's quarterly return based on market price over the past year.

25. **Quarterly Market Return (5 year)**: The fund's quarterly return based on market price over the past five years.

26. **Quarterly Market Return (10 year)**: The fund's quarterly return based on market price over the past ten years.

27. **Quarterly Market Return (Since Inception)**: The fund's quarterly return based on market price since its inception.

28. **Quarterly Performance as of**: The date as of which the quarterly performance metrics are reported.

29. **Morningstar Category**: The category assigned to the fund by Morningstar based on its investment strategy and holdings.

30. **Morningstar Overall**: The overall rating assigned to the fund by Morningstar, based on its risk-adjusted performance.

31. **Morningstar 3 Year**: The three-year rating assigned to the fund by Morningstar, based on its risk-adjusted performance over the past three years.

32. **Morningstar 5 Year**: The five-year rating assigned to the fund by Morningstar, based on its risk-adjusted performance over the past five years.

33. **Morningstar 10 Year**: The ten-year rating assigned to the fund by Morningstar, based on its risk-adjusted performance over the past ten years.

34. **Ranking as of**: The date as of which the Morningstar rankings are reported.

This dictionary gives a clear definition of each column in the dataset, which helps when analyzing and interpreting the data.
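
To make the two expense-ratio columns concrete, here is a minimal sketch of how they relate and how an expense ratio reduces growth over time. The function names, the 7% return, and the $10,000 starting balance are illustrative assumptions, not values from the dataset, and the fee is simplified to a flat reduction of each year's return.

```python
def fee_waiver(gross_expense_ratio: float, net_expense_ratio: float) -> float:
    """Portion of the gross expense ratio that is currently waived or reimbursed."""
    return gross_expense_ratio - net_expense_ratio


def ending_balance(initial: float, annual_return: float,
                   expense_ratio: float, years: int) -> float:
    """Grow `initial` for `years`, subtracting the expense ratio from each year's return.

    Simplification (assumption): the fee is a flat reduction of the annual return.
    """
    return initial * (1 + annual_return - expense_ratio) ** years


# Hypothetical fund: 1.2% gross and 0.9% net expense ratio
waiver = fee_waiver(0.012, 0.009)

# Hypothetical comparison: $10,000 held for 10 years at 7% before fees
low_fee = ending_balance(10_000, 0.07, 0.001, 10)
high_fee = ending_balance(10_000, 0.07, 0.010, 10)

print(f"Waiver: {waiver:.4f}")
print(f"Balance with a 0.1% expense ratio: {low_fee:,.2f}")
print(f"Balance with a 1.0% expense ratio: {high_fee:,.2f}")
```

Even a difference of less than one percentage point compounds into a visible gap, which is why both expense-ratio columns are kept for the analysis.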

In [9]:
import pandas as pd
import numpy as np
from sklearn.impute import KNNImputer

In [10]:
df = pd.read_csv("allocation_funds.csv")
df.head()

Unnamed: 0,Symbol,Description,Product Type,OneSource List,Transaction Fee,Select List,Schwab Social Funds List,Net Expense Ratio,Gross Expense Ratio,Inception Date,...,Quarterly Market Return (5 year),Quarterly Market Return (10 year),Quarterly Market Return (Since Inception),Quarterly Performance as of,Morningstar Category,Morningstar Overall,Morningstar 3 Year,Morningstar 5 Year,Morningstar 10 Year,Ranking as of
0,SWYDX,Schwab Target 2025 Index Fund,Mutual Fund,Mutual Fund OneSource,0.0,Mutual Fund OneSource Select List,No,0.0,0.0,8/25/2016,...,--,--,--,3/31/2024,Target-Date 2025,4 stars out of 197 funds,4 stars out of 197 funds,4 stars out of 172 funds,-- stars out of 110 funds,4/30/2024
1,SWYEX,Schwab Target 2030 Index Fund,Mutual Fund,Mutual Fund OneSource,0.0,Mutual Fund OneSource Select List,No,0.0,0.0,8/25/2016,...,--,--,--,3/31/2024,Target-Date 2030,4 stars out of 200 funds,4 stars out of 200 funds,4 stars out of 171 funds,-- stars out of 108 funds,4/30/2024
2,SWYFX,Schwab Target 2035 Index Fund,Mutual Fund,Mutual Fund OneSource,0.0,Mutual Fund OneSource Select List,No,0.0,0.0,8/25/2016,...,--,--,--,3/31/2024,Target-Date 2035,4 stars out of 191 funds,4 stars out of 191 funds,4 stars out of 170 funds,-- stars out of 108 funds,4/30/2024
3,SWYOX,Schwab Target 2065 Index Fund,Mutual Fund,Mutual Fund OneSource,0.0,Mutual Fund OneSource Select List,No,0.0,0.0,2/25/2021,...,--,--,--,3/31/2024,Target-Date 2065+,4 stars out of 138 funds,4 stars out of 138 funds,-- stars out of 9 funds,--,4/30/2024
4,SWYAX,Schwab Target 2010 Index Fund,Mutual Fund,Mutual Fund OneSource,0.0,Mutual Fund OneSource Select List,No,0.0,0.0,8/25/2016,...,--,--,--,3/31/2024,Target-Date 2000-2010,3 stars out of 107 funds,3 stars out of 107 funds,3 stars out of 100 funds,-- stars out of 51 funds,4/30/2024


# Step 1: Data Cleaning

- What are the data types?
- How large is the dataset, and how much data is missing?
- What values does each column take?
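
The three questions above can be answered in one pass. The sketch below is one possible approach, not the method used in this notebook: `summarize` is a hypothetical helper, and `sample` is a made-up stand-in for the fund data in which `'--'` marks a missing value.

```python
import pandas as pd

# Hypothetical stand-in for the fund data; '--' marks a missing value
sample = pd.DataFrame({
    "Symbol": ["AAAAX", "BBBBX", "CCCCX"],
    "Net Expense Ratio": [0.0, 0.01, 0.02],
    "Monthly NAV Return (5 year)": ["0.05", "--", "0.07"],
})


def summarize(frame: pd.DataFrame, missing_token: str = "--") -> pd.DataFrame:
    """Return the dtype, missing-value count, and distinct-value count per column."""
    # Treat both real NaN values and the placeholder token as missing
    is_missing = frame.isna() | frame.eq(missing_token)
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "n_missing": is_missing.sum(),
        "n_unique": frame.mask(is_missing).nunique(),
    })


summary = summarize(sample)
print(summary)
```

Counting the `'--'` token alongside `NaN` matters here because pandas reads such columns as text, so `isna()` alone would report no missing data.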

In [11]:
df.dtypes

Symbol                                        object
Description                                   object
Product Type                                  object
OneSource List                                object
Transaction Fee                              float64
Select List                                   object
Schwab Social Funds List                      object
Net Expense Ratio                            float64
Gross Expense Ratio                          float64
Inception Date                                object
Monthly NAV Return (1 year)                  float64
Monthly NAV Return (5 year)                   object
Monthly NAV Return (10 year)                  object
Monthly NAV Return (Since Inception)         float64
Monthly Market Return (1 year)                object
Monthly Market Return (5 year)                object
Monthly Market Return (10 year)               object
Monthly Market Return (Since Inception)       object
Monthly Performance as of                     

In [12]:
df.shape

(286, 34)

In [13]:
#Adding underscores to column titles
df.columns = df.columns.str.replace(' ', '_')

Keep these: 

Symbol (String)                                       object
Description (String)                                  object
Product_Type (String)                                object
OneSource_List (String)                              object
Transaction_Fee (String)                             object
Select_List (String)                                 object
Schwab_Social_Funds_List (Loop)                     object

Net_Expense_Ratio Float                           object
Gross_Expense_Ratio float                         object
Inception_Date  DATE                             object

Delete these columns:

Monthly_NAV_Return_(1_year)                  object
Monthly_NAV_Return_(5_year)                  object
Monthly_NAV_Return_(10_year)                 object
Monthly_NAV_Return_(Since_Inception)         object
Monthly_Market_Return_(1_year)               object
Monthly_Market_Return_(5_year)               object
Monthly_Market_Return_(10_year)              object
Monthly_Market_Return_(Since_Inception)      object

Keep columns:

Monthly_Performance_as_of                    object
Quarterly_NAV_Return_(1_year)                object
Quarterly_NAV_Return_(5_year)                object
Quarterly_NAV_Return_(10_year)               object
Quarterly_NAV_Return_(Since_Inception)       object

Delete these:

Quarterly_Market_Return_(1_year)             object
Quarterly_Market_Return_(5_year)             object
Quarterly_Market_Return_(10_year)            object
Quarterly_Market_Return_(Since_Inception)    object

Keep these columns:

Quarterly_Performance_as_of                  object
Morningstar_Category                         object
Morningstar_Overall                          object
Morningstar_3_Year                           object
Morningstar_5_Year                           object
Morningstar_10_Year                          object
Ranking_as_of                                object

Convert these columns to float:

Net_Expense_Ratio
Gross_Expense_Ratio
Quarterly_NAV_Return_(1_year)
Quarterly_NAV_Return_(5_year)
Quarterly_NAV_Return_(10_year)
Quarterly_NAV_Return_(Since_Inception)



In [14]:
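# There are 34 columns so far. Drop the market-return columns and
# Monthly_NAV_Return_(Since_Inception); the 1-, 5-, and 10-year monthly NAV
# returns are kept because they are imputed in a later step.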

columns_to_drop = [                 
    'Monthly_NAV_Return_(Since_Inception)',         
    'Monthly_Market_Return_(1_year)',               
    'Monthly_Market_Return_(5_year)',               
    'Monthly_Market_Return_(10_year)',              
    'Monthly_Market_Return_(Since_Inception)',     
    'Quarterly_Market_Return_(1_year)',             
    'Quarterly_Market_Return_(5_year)',             
    'Quarterly_Market_Return_(10_year)',            
    'Quarterly_Market_Return_(Since_Inception)'
]

df = df.drop(columns=columns_to_drop)

In [15]:
df.shape

(286, 25)

## Let's handle missing values by using the k-nearest neighbors algorithm to predict the values marked '--'.

In [16]:
# Replace '--' placeholders with NaN in the specified columns
columns_with_dashes = [
    'Monthly_NAV_Return_(1_year)',                  
    'Monthly_NAV_Return_(5_year)',                 
    'Monthly_NAV_Return_(10_year)',
    'Quarterly_NAV_Return_(10_year)',
    'Quarterly_NAV_Return_(5_year)',
    'Gross_Expense_Ratio',
]

df[columns_with_dashes] = df[columns_with_dashes].replace('--', np.nan)

# Initialize KNN Imputer with a specified number of neighbors
knn_imputer = KNNImputer(n_neighbors=5)

# Impute the values in the specified columns
df[columns_with_dashes] = knn_imputer.fit_transform(df[columns_with_dashes])

# Check the results
print(df[columns_with_dashes].head())


   Monthly_NAV_Return_(1_year)  Monthly_NAV_Return_(5_year)  \
0                         0.08                        0.050   
1                         0.10                        0.060   
2                         0.12                        0.070   
3                         0.15                        0.086   
4                         0.06                        0.040   

   Monthly_NAV_Return_(10_year)  Quarterly_NAV_Return_(10_year)  \
0                         0.050                           0.050   
1                         0.062                           0.068   
2                         0.064                           0.072   
3                         0.070                           0.072   
4                         0.042                           0.050   

   Quarterly_NAV_Return_(5_year)  Gross_Expense_Ratio  
0                          0.060                  0.0  
1                          0.080                  0.0  
2                          0.080                  
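
To see what the imputer does to a single gap, here is a small sketch on made-up numbers. The array `X` and the choice of `n_neighbors=2` are assumptions for illustration only; the notebook itself uses `n_neighbors=5` on the real columns.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Hypothetical returns for four funds over two periods; one value is missing
X = np.array([
    [0.05, 0.06],
    [0.06, 0.07],
    [0.10, 0.12],
    [0.05, np.nan],
])

# The gap is filled with the mean of the two nearest rows, where distance is
# measured only on the columns that are present in both rows being compared
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)

print(X_filled)
```

The last row is closest to the first two rows in the first column, so its missing value becomes the average of their second-column values. Because the distance is Euclidean, columns on very different scales may need rescaling before imputation.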

Next, let's look at the distribution of values in each column now that the missing values have been imputed.

In [17]:
# Count how often each Net_Expense_Ratio value occurs
Net_Expense_Ratio = df['Net_Expense_Ratio'].value_counts().sort_index()
print("These are the values for Net_Expense_Ratio: \n", Net_Expense_Ratio)

These are the values for Net_Expense_Ratio: 
 0.00     32
0.01    237
0.02     14
0.03      3
Name: Net_Expense_Ratio, dtype: int64


In [18]:
# Count how often each Gross_Expense_Ratio value occurs
Gross_Expense_Ratio = df['Gross_Expense_Ratio'].value_counts().sort_index()
print("These are the values for Gross_Expense_Ratio: \n", Gross_Expense_Ratio)

These are the values for Gross_Expense_Ratio: 
 0.00     32
0.01    227
0.02     23
0.03      3
0.08      1
Name: Gross_Expense_Ratio, dtype: int64


In [19]:
# Count how often each Quarterly_NAV_Return_(1_year) value occurs
Quarterly_NAV_Return_one_year = df['Quarterly_NAV_Return_(1_year)'].value_counts().sort_index()
print("These are the values for Quarterly_NAV_Return_(1_year): \n", Quarterly_NAV_Return_one_year)

These are the values for Quarterly_NAV_Return_(1_year): 
 0.03     1
0.04     1
0.05     2
0.06     3
0.07     9
0.08     6
0.09    15
0.10    32
0.11    20
0.12    17
0.13    15
0.14    16
0.15    14
0.16    19
0.17    17
0.18    12
0.19    14
0.20    12
0.21    14
0.22    16
0.23    17
0.25     4
0.28     2
0.29     2
0.30     3
0.31     1
0.46     1
0.64     1
Name: Quarterly_NAV_Return_(1_year), dtype: int64


In [20]:
# Check 'Quarterly_NAV_Return_(10_year)'; this is a column worth watching
Quarterly_NAV_Return_10_year = df['Quarterly_NAV_Return_(10_year)'].value_counts().sort_index()
Quarterly_NAV_Return_10_year
# This column had 42 '--' entries that needed to be replaced with numbers

0.030    11
0.036     2
0.036     1
0.040    31
0.042     1
0.046     2
0.050    50
0.052     2
0.060    35
0.062     3
0.068     1
0.070    52
0.072     3
0.074     1
0.076     1
0.078     1
0.078     1
0.080    45
0.082     2
0.082     2
0.086     1
0.088     3
0.090    28
0.092     4
0.100     2
0.110     1
Name: Quarterly_NAV_Return_(10_year), dtype: int64

In [21]:
Quarterly_NAV_Return_Since_Inception = df['Quarterly_NAV_Return_(Since_Inception)'].value_counts().sort_index()
Quarterly_NAV_Return_Since_Inception
#perfect, these are all numbers and we don't have any --

0.02     5
0.03     4
0.04    13
0.05    39
0.06    50
0.07    61
0.08    57
0.09    31
0.10    19
0.11     3
0.12     1
0.16     2
0.17     1
Name: Quarterly_NAV_Return_(Since_Inception), dtype: int64

In [22]:
#Converting these columns to float type
columns_to_convert = [
    'Net_Expense_Ratio',                         
    'Gross_Expense_Ratio',                          
    'Quarterly_NAV_Return_(1_year)',                
    'Quarterly_NAV_Return_(5_year)',                
    'Quarterly_NAV_Return_(10_year)',               
    'Quarterly_NAV_Return_(Since_Inception)'
]

# Converting the specified columns to float
df[columns_to_convert] = df[columns_to_convert].astype(float)

In [23]:
# Convert specified columns to datetime
df['Inception_Date'] = pd.to_datetime(df['Inception_Date'])
df['Monthly_Performance_as_of'] = pd.to_datetime(df['Monthly_Performance_as_of'])
df['Quarterly_Performance_as_of'] = pd.to_datetime(df['Quarterly_Performance_as_of'])
df['Ranking_as_of'] = pd.to_datetime(df['Ranking_as_of'])

# Display the DataFrame to check the changes
print("\nConverted Data Types:")
print(df.dtypes)


Converted Data Types:
Symbol                                            object
Description                                       object
Product_Type                                      object
OneSource_List                                    object
Transaction_Fee                                  float64
Select_List                                       object
Schwab_Social_Funds_List                          object
Net_Expense_Ratio                                float64
Gross_Expense_Ratio                              float64
Inception_Date                            datetime64[ns]
Monthly_NAV_Return_(1_year)                      float64
Monthly_NAV_Return_(5_year)                      float64
Monthly_NAV_Return_(10_year)                     float64
Monthly_Performance_as_of                 datetime64[ns]
Quarterly_NAV_Return_(1_year)                    float64
Quarterly_NAV_Return_(5_year)                    float64
Quarterly_NAV_Return_(10_year)                   float64
Quarterl
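
The date conversion above lets pandas infer the layout. As a hedged alternative, the sketch below passes an explicit format and coerces unparseable entries. The sample values are hypothetical, and whether the date columns contain any `'--'` entries is an assumption, not something confirmed by the output above.

```python
import pandas as pd

# Hypothetical sample in the month/day/year layout seen in the raw export
raw_dates = pd.Series(["8/25/2016", "2/25/2021", "--"])

# An explicit format avoids day/month ambiguity; errors="coerce" turns
# unparseable entries such as '--' into NaT instead of raising an error
parsed = pd.to_datetime(raw_dates, format="%m/%d/%Y", errors="coerce")

print(parsed)
```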

In [24]:
df.head()

Unnamed: 0,Symbol,Description,Product_Type,OneSource_List,Transaction_Fee,Select_List,Schwab_Social_Funds_List,Net_Expense_Ratio,Gross_Expense_Ratio,Inception_Date,...,Quarterly_NAV_Return_(5_year),Quarterly_NAV_Return_(10_year),Quarterly_NAV_Return_(Since_Inception),Quarterly_Performance_as_of,Morningstar_Category,Morningstar_Overall,Morningstar_3_Year,Morningstar_5_Year,Morningstar_10_Year,Ranking_as_of
0,SWYDX,Schwab Target 2025 Index Fund,Mutual Fund,Mutual Fund OneSource,0.0,Mutual Fund OneSource Select List,No,0.0,0.0,2016-08-25,...,0.06,0.05,0.07,2024-03-31,Target-Date 2025,4 stars out of 197 funds,4 stars out of 197 funds,4 stars out of 172 funds,-- stars out of 110 funds,2024-04-30
1,SWYEX,Schwab Target 2030 Index Fund,Mutual Fund,Mutual Fund OneSource,0.0,Mutual Fund OneSource Select List,No,0.0,0.0,2016-08-25,...,0.08,0.068,0.08,2024-03-31,Target-Date 2030,4 stars out of 200 funds,4 stars out of 200 funds,4 stars out of 171 funds,-- stars out of 108 funds,2024-04-30
2,SWYFX,Schwab Target 2035 Index Fund,Mutual Fund,Mutual Fund OneSource,0.0,Mutual Fund OneSource Select List,No,0.0,0.0,2016-08-25,...,0.08,0.072,0.08,2024-03-31,Target-Date 2035,4 stars out of 191 funds,4 stars out of 191 funds,4 stars out of 170 funds,-- stars out of 108 funds,2024-04-30
3,SWYOX,Schwab Target 2065 Index Fund,Mutual Fund,Mutual Fund OneSource,0.0,Mutual Fund OneSource Select List,No,0.0,0.0,2021-02-25,...,0.098,0.072,0.07,2024-03-31,Target-Date 2065+,4 stars out of 138 funds,4 stars out of 138 funds,-- stars out of 9 funds,--,2024-04-30
4,SWYAX,Schwab Target 2010 Index Fund,Mutual Fund,Mutual Fund OneSource,0.0,Mutual Fund OneSource Select List,No,0.0,0.0,2016-08-25,...,0.05,0.05,0.05,2024-03-31,Target-Date 2000-2010,3 stars out of 107 funds,3 stars out of 107 funds,3 stars out of 100 funds,-- stars out of 51 funds,2024-04-30
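
The Morningstar columns shown above are still text such as `4 stars out of 197 funds`, so they cannot be filtered numerically yet. The sketch below shows one way to extract the star count and peer-group size. The column names `stars` and `n_funds` are hypothetical, and the regular expression assumes every rated entry follows the pattern visible in the sample rows.

```python
import pandas as pd

# Sample strings in the format of the Morningstar columns
ratings = pd.Series([
    "4 stars out of 197 funds",
    "-- stars out of 110 funds",
    "--",
])

# Pull out the star count and the peer-group size; unrated entries become NaN
pattern = r"^(?P<stars>\d) stars? out of (?P<n_funds>\d+) funds$"
parsed = ratings.str.extract(pattern).apply(pd.to_numeric)

# Keep only funds rated 4 or 5 stars, matching the scope of this report
top_rated = parsed[parsed["stars"] >= 4]

print(top_rated)
```

Entries without a star rating fail the pattern and are dropped by the filter, which seems the safer default when the goal is to keep only 4- and 5-star funds.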


In [25]:
# Save the imputed data
#df.to_csv('imputed_knn_dataset.csv', index=False)

We are going to start our analysis with **Risk-Adjusted Return Metrics**.

**Sharpe Ratio:** This measures the fund’s excess return per unit of risk. It’s calculated by subtracting the risk-free rate from the fund’s return and then dividing by the standard deviation of the return. Higher Sharpe ratios indicate better risk-adjusted performance.

**Sortino Ratio:** Similar to the Sharpe ratio, but it only considers downside risk (negative returns). This can provide a more accurate measure of a fund’s performance relative to its risk of loss.

**Treynor Ratio:** This measures returns earned in excess of that which could have been earned on a risk-free investment per unit of market risk. It’s particularly useful for funds that are part of a diversified portfolio.
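
The three definitions above can be sketched in a few lines. The dataset in this report holds trailing returns rather than a per-period return series, so the inputs below are hypothetical, as are the function names and the risk-free rate. The downside deviation in the Sortino ratio is measured against the risk-free rate here, which is one common convention among several.

```python
import numpy as np


def sharpe_ratio(returns, risk_free_rate=0.0):
    """Mean excess return divided by the standard deviation of returns."""
    excess = np.asarray(returns, dtype=float) - risk_free_rate
    return excess.mean() / excess.std(ddof=1)


def sortino_ratio(returns, risk_free_rate=0.0):
    """Mean excess return divided by downside deviation (shortfalls only)."""
    excess = np.asarray(returns, dtype=float) - risk_free_rate
    downside = np.minimum(excess, 0.0)  # gains count as zero shortfall
    downside_deviation = np.sqrt(np.mean(downside ** 2))
    return excess.mean() / downside_deviation


def treynor_ratio(returns, market_returns, risk_free_rate=0.0):
    """Mean excess return divided by beta, the fund's sensitivity to the market."""
    fund = np.asarray(returns, dtype=float)
    market = np.asarray(market_returns, dtype=float)
    beta = np.cov(fund, market, ddof=1)[0, 1] / np.var(market, ddof=1)
    return (fund.mean() - risk_free_rate) / beta


# Hypothetical per-period returns, not taken from the dataset
fund_returns = [0.02, -0.01, 0.03, 0.01, -0.02]
market_returns = [0.01, -0.02, 0.04, 0.02, -0.03]
rf = 0.001  # hypothetical per-period risk-free rate

sharpe = sharpe_ratio(fund_returns, rf)
sortino = sortino_ratio(fund_returns, rf)
treynor = treynor_ratio(fund_returns, market_returns, rf)

print(f"Sharpe:  {sharpe:.4f}")
print(f"Sortino: {sortino:.4f}")
print(f"Treynor: {treynor:.4f}")
```

All three ratios share the same numerator and differ only in the risk measure used as the denominator: total volatility, downside volatility, or market beta.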