**Part 2: Create a stock-picking portfolio based on fundamental stock characteristics**
    
The Excel sheet contains 13 time series of fundamental stock characteristics (“factors”), such as Momentum, Valuation, Profitability, Risk. It also contains 2 time series of monthly returns, one in EUR and one in the local currency of each stock.

The objective is to build a predictive model that creates a portfolio at the end of each month (i.e. a vector of weights) based on the factor values known at that point in time. The model should maximize the time series hit rate of local currency returns, i.e. the proportion of months where the portfolio’s local currency return is higher than the mean local currency return of the (non-delisted) stocks. Transaction costs can be ignored. Your model may be evaluated on a different list of stocks but with the same factors and dates, so make sure your code is flexible enough to handle a different number of stocks.
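
The hit rate defined above can be written down in a few lines. The following is only a minimal sketch: the function name `hit_rate`, the layout (months as rows, stocks as columns) and the toy numbers are assumptions made for illustration, and the weights are assumed to be already aligned with the month whose return they earn.

```python
import pandas as pd


def hit_rate(weights: pd.DataFrame, returns: pd.DataFrame) -> float:
    """Share of months in which the portfolio beats the mean stock return.

    Assumption: both frames have months as rows and stocks as columns,
    each row of weights sums to 1, and NaN returns mark delisted stocks.
    """
    # Portfolio return per month: weighted sum over stocks (NaN is skipped).
    portfolio_return = (weights * returns).sum(axis=1)
    # Benchmark: mean return of all stocks that have a return in that month.
    benchmark_return = returns.mean(axis=1)
    # Fraction of months in which the portfolio is strictly ahead.
    return float((portfolio_return > benchmark_return).mean())


# Toy data (made up): three months, three stocks, stock B delisted in month 3.
toy_returns = pd.DataFrame(
    {
        "A": [0.02, -0.01, 0.03],
        "B": [0.00, 0.01, float("nan")],
        "C": [-0.01, 0.02, 0.01],
    },
    index=pd.to_datetime(["2004-01-31", "2004-02-29", "2004-03-31"]),
)
# A portfolio that holds only stock A in every month.
toy_weights = pd.DataFrame(
    {"A": [1.0, 1.0, 1.0], "B": [0.0, 0.0, 0.0], "C": [0.0, 0.0, 0.0]},
    index=toy_returns.index,
)
toy_hit_rate = hit_rate(toy_weights, toy_returns)
print(toy_hit_rate)
```

In this toy example the portfolio beats the mean in the first and third month but not in the second.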

Your report should contain a clear description of your model and document its performance by simulating the implied portfolio and its monthly return series. You may apply any transformations to the factors (normalizing, filling missing data, etc.) that you think make sense. Discuss the model you built in a few sentences. Does it work well? What are its weaknesses? What would you change if you had more time? Do you think it will work well on a different set of stocks? Use figures or tables to illustrate your findings.
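
One possible way to normalize a factor and fill missing data is sketched below. It standardizes the factor across stocks within each month and then fills gaps with 0, the cross-sectional mean after standardizing. The function name `standardize_cross_section`, the layout (dates as rows, stocks as columns) and the toy values are assumptions for illustration, not part of the task.

```python
import pandas as pd


def standardize_cross_section(factor: pd.DataFrame) -> pd.DataFrame:
    """Z-score one factor across stocks within each month, then fill gaps.

    Assumption: `factor` has dates as rows and stocks as columns.
    """
    # Mean and standard deviation across stocks per month (NaN is skipped).
    row_mean = factor.mean(axis=1)
    row_std = factor.std(axis=1)
    # Avoid division by zero in months where all stocks share one value.
    row_std = row_std.replace(0.0, float("nan"))
    z_scores = factor.sub(row_mean, axis=0).div(row_std, axis=0)
    # After standardizing, 0 is the cross-sectional mean, so it is a neutral fill.
    return z_scores.fillna(0.0)


# Toy factor (made up): two months, three stocks, one missing value.
toy_factor = pd.DataFrame(
    {"A": [1.0, 2.0], "B": [2.0, float("nan")], "C": [3.0, 4.0]},
    index=pd.to_datetime(["2004-01-31", "2004-02-29"]),
)
toy_z = standardize_cross_section(toy_factor)
print(toy_z)
```

Because each row uses only values from the same month, this transformation does not look ahead in time.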

Remember that the goal is to show that you know how to work with Python, so the focus is more on the quality of the code than on the model.

In [None]:
import pandas as pd

In [2]:
# Please select the tab of the provided xlsx file that the portfolio should be based on.
# The data is loaded from this tab.
tab = 'F15'  # chosen because the goal is to maximize local currency returns

# Please select number of stocks your portfolio should contain. Default is 100
number_of_stocks = 100

# Please select a point in time that you want the portfolio to be calculated for. Choose format 'YYYY-MM-DD'. For an
# overall portfolio considering the entire timeframe leave it as None

date_of_portfolio = '2004-12-31'

In [None]:
def load_returns(sheet_name = 'datenaufgabe.xlsx', tab = 'Stocks'):
    """Load one tab of an xlsx file as a dataframe.
    
    Arguments:
        sheet_name {string} -- File name or path of the xlsx file. 'datenaufgabe.xlsx' by default
        tab {string} -- Name of the tab to load. 'Stocks' by default
    
    Returns:
        dataframe -- Contents of the selected tab
    """
    
    # Pass the tab by keyword: in pd.read_excel, `sheet_name` refers to the tab, not the file.
    return pd.read_excel(sheet_name, sheet_name=tab)
    
data = load_returns(tab = tab)

In [None]:
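def load_list_of_dates(raw_return_df, date_of_portfolio = None):
    """Return all dates up to and including date_of_portfolio (all dates if None).
    
    The date column is expected to carry the name of the loaded tab (global `tab`).
    """
    dates = raw_return_df[tab]
    if date_of_portfolio is not None:
        dates = dates[dates <= date_of_portfolio]
    
    return list(dates)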

In [None]:
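def calculate_mean_and_std(raw_return_df, date_of_portfolio = None):
    """Return the mean and standard deviation of the returns of each stock.
    
    Arguments:
        raw_return_df {dataframe} -- Stocks (columns) over time (rows); the date column is named after the global `tab`
        date_of_portfolio {string or None} -- If given ('YYYY-MM-DD'), only rows up to and including this date are used
    
    Returns:
        dataframe -- avg_return (mean) and std (standard deviation) per stock
    """
    
    # Use the date column as index so it is not treated as a stock.
    returns = raw_return_df.set_index(tab)
    if date_of_portfolio is not None:
        # Only use data known at the portfolio date (no look-ahead).
        returns = returns[returns.index <= date_of_portfolio]
    
    iso_metrics_mean_std_df = pd.DataFrame({
                        'avg_return': returns.mean(axis=0),
                        'std': returns.std(axis=0)
                        })
    
    return iso_metrics_mean_std_df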


In [None]:
def get_best_performers_mean_std(raw_stocks_dataframe, number_of_stocks = 100, date_of_portfolio = None):
    """Return the number_of_stocks stocks with the highest average return, with their mean and std."""
    
    mean_variance_stocks_dataframe = calculate_mean_and_std(raw_stocks_dataframe, date_of_portfolio)
    
    return mean_variance_stocks_dataframe.nlargest(number_of_stocks, 'avg_return')


In [None]:
highest_avg_return_df = get_best_performers_mean_std(data, number_of_stocks = number_of_stocks, date_of_portfolio = date_of_portfolio)

In [None]:
# Stocks with the highest average return over the selected time period
highest_avg_return_df

In [None]:
# Print the best stocks for each month. Next step: count how often each stock appears (a dictionary is recommended for this).
# Idea: pick stocks and weight them higher the more often they were above the mean (looking back 3 months, top 100 performance, local currency return compared to the local currency mean).
# At the end, add some tables and figures of the portfolio, and maybe compare it to a better and a worse performing portfolio.
x = data.set_index(tab).T

for column in x:
    print(x[column].nlargest(number_of_stocks).index)
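
The counting idea from the comments in this cell could look roughly as follows. This is a minimal sketch and not the notebook's final model: the function name `weights_from_top_counts`, its parameters and the toy data are hypothetical, and the frame is assumed to have months as rows and stocks as columns.

```python
from collections import Counter

import pandas as pd


def weights_from_top_counts(returns: pd.DataFrame, top_n: int, lookback: int) -> pd.Series:
    """Weight stocks by how often they ranked in the monthly top `top_n`.

    Assumption: `returns` has months as rows and stocks as columns. Only the
    last `lookback` rows are used, i.e. data known at the portfolio date.
    """
    counts = Counter()
    for _, month in returns.tail(lookback).iterrows():
        # nlargest ignores NaN, so stocks without a return are never counted.
        counts.update(month.nlargest(top_n).index)
    count_series = pd.Series(counts, dtype=float)
    # Normalize so that the weights sum to 1.
    return count_series / count_series.sum()


# Toy data (made up): four months, three stocks.
toy_returns = pd.DataFrame(
    {
        "A": [0.05, 0.03, 0.01, 0.04],
        "B": [0.01, 0.02, 0.03, 0.03],
        "C": [-0.02, 0.01, 0.02, 0.02],
    },
    index=pd.to_datetime(["2004-09-30", "2004-10-31", "2004-11-30", "2004-12-31"]),
)
toy_weights = weights_from_top_counts(toy_returns, top_n=2, lookback=3)
print(toy_weights)
```

Stocks that never reach the top `top_n` within the lookback window get no entry in the result, which amounts to a weight of zero.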

In [None]:
highest_avg_return_df.plot()