## PortfolioCrossSection_AG_profit_combi.ipynb

Code for the Chicago Booth course on Quantitative Portfolio Management by Ralph S.J. Koijen and Federico Mainardi.

### Preliminaries

This code builds cross-sectional portfolio strategies and produces relevant analytics.
- As always, the data can be found in the dropbox folder: https://www.dropbox.com/scl/fo/hrjspow2cpstfnoeqb23v/h?rlkey=j4fohf1s4e6fdy49p7bs71b7l&dl=0.
- Please download the file `MasterData_small.parquet`. 

In [None]:
import qpm_download
import qpm
import pandas as pd
import numpy as np
import wrds
import statsmodels.api as sm

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

We now choose whether to import data directly from WRDS (`import_data = True`) or to load data from Dropbox (`import_data = False`). If you load data from Dropbox, make sure to define the data directory (`_DATA_DIR`). Also define a directory in which to store the strategy returns (`_STRATEGY_DIR`). In future versions, we'll use a much larger data set, named `MasterData.parquet` (it is already in Dropbox), but the core strategies will be constructed using `MasterData_small.parquet`.

In [None]:
import_data = False             # <-- Edit this line
_DATA_DIR = '../Data'           # <-- Edit this line
_STRATEGY_DIR = '../Strategy'   # <-- Edit this line

Next, we specify the strategy settings.
- First, we select the strategy name. The signal-construction cell below handles `Size`, `Value`, `AssetGrowth`, `Quality`, and `Momentum`. We'll add more later.
- We can sort every month (`Monthly`) or only in June (`June`). As accounting data mostly comes out quarterly or annually, sorting once a year is often sufficient. As most companies have their fiscal year end in December, and we wait six months to make sure the data are available to investors, we sort in June.
- We can set `_REMOVE_MICRO_CAPS` to either `False` or `True`. If `True`, then we drop the smallest stocks. 
- The next two lines select the sample. 
- The final line selects the number of portfolios. We typically set this to 5 or 10.
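
To make the difference between the two sort frequencies concrete, here is a minimal sketch on toy data of one way a June sort can be expressed: keep the signal observed in June and carry it forward until the following June. The toy frame and the `signal_june` column are illustrative assumptions; how `qpm.create_portfolios` handles `_SORT_FREQUENCY = 'June'` internally is not shown in this notebook.

```python
import numpy as np
import pandas as pd

# Toy panel (assumption): one stock, 24 months, a signal that changes every month.
toy = pd.DataFrame({
    'permno': 1,
    'ldate': pd.date_range('2020-01-01', periods=24, freq='MS'),
    'signal': np.arange(24, dtype=float),
})

# Keep the signal only in June ...
is_june = toy['ldate'].dt.month == 6
toy['signal_june'] = toy['signal'].where(is_june)

# ... and carry it forward within each stock until the next June.
toy['signal_june'] = toy.groupby('permno')['signal_june'].ffill()

print(toy[['ldate', 'signal', 'signal_june']])
```

With a monthly sort, the portfolio assignment would instead use the `signal` column directly, so it can change every month.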

In [None]:
_STRATEGY_NAME = 'Momentum'      
_SORT_FREQUENCY = 'Monthly'        # Either "Monthly" or "June"

_REMOVE_MICRO_CAPS = False      # Either "True" or "False"
_SAMPLE_START = '2001-01-01'
_SAMPLE_END = '2023-07-31'
_NUM_PORT = 5

### Step 1. Construct Signal

Let's first see which variables are available in the data. Running the next block requires that you have downloaded `MasterData.parquet`; you can skip it if you haven't done so:

In [None]:
qpm.list_variables(data_dir = _DATA_DIR, file_name = 'MasterData.parquet')

We will work initially with a smaller data set `MasterData_small.parquet`, which is less demanding in terms of your computer's memory.

In [None]:
qpm.list_variables(data_dir = _DATA_DIR, file_name = 'MasterData_small.parquet')

#### Load Data

Strategies differ in the signals that they use, and the signals use different input data. Thus, we first construct the list of fundamentals that we need to load for the strategy specified in `_STRATEGY_NAME`. If you change the strategy, the list of variables returned here changes accordingly.

In [None]:
signal_variables = qpm.return_signal(_STRATEGY_NAME)
signal_variables

Given the list of fundamentals needed to construct the strategy in `_STRATEGY_NAME`, which is in `signal_variables`, we proceed by loading the relevant data.

In [None]:
if import_data:
    # Pull the data directly from WRDS
    df_full = qpm_download.cross_section_compact(_SAMPLE_START, _SAMPLE_END, _STRATEGY_NAME, signal_variables)
else:
    # Load the local copy downloaded from Dropbox
    df_full = qpm.load_data(data_dir = _DATA_DIR, file_name = 'MasterData_small.parquet', variable_list = signal_variables)


#### Plot Key Variables

We first list the columns in our data.

In [None]:
df_full.columns

Next, we plot a variable of interest, in this case each stock's market capitalization (`me`).

In [None]:
qpm.plot_variables(df_full, variable_list = ['me'],  id_type = 'ticker', id_list = ['AAPL', 'AMZN', 'TSLA'],
                     start_date = '1999-01-01', end_date = '2023-07-31')

#### Construct Signal

In [None]:
if _STRATEGY_NAME == 'Size':

	df_full['signal'] = - df_full['me']
	
elif _STRATEGY_NAME == 'Value':

	df_full['signal'] = df_full['be'] / df_full['me']	
		
elif _STRATEGY_NAME == 'AssetGrowth':

	df_full['signal'] = -df_full['at'] / qpm.create_lag(df_full, var_name = 'at', lag = 12)	

elif _STRATEGY_NAME == 'Quality':

	# Signal 1 : Rank low beta
	df_full['beta_inv'] = -df_full['beta']
	df_full['signal_1'] = qpm.rank(df_full, var_name = 'beta_inv')

	# Signal 2 : Rank profitability
	df_full['profitability'] = (df_full['revt'] - df_full['cogs']) / df_full['at']
	df_full['signal_2'] = qpm.rank(df_full, var_name = 'profitability')

	# Final Signal
	df_full['signal'] = (df_full['signal_1'] + df_full['signal_2']) / 2

elif _STRATEGY_NAME == 'Momentum':

	# Sort data
	df_full.sort_values(['permno', 'ldate'], ascending = [True, True], inplace = True)      

	# Check that the row 12 observations back is exactly 12 months earlier (no gaps).
	# If so, the signal starts at 0; otherwise it is set to NaN.
	df_full['ldate12'] = df_full[['ldate','permno']].groupby('permno')['ldate'].shift(12)
	df_full['signal'] = (df_full['ldate'] == df_full['ldate12'] + pd.DateOffset(months = 12)).map(lambda x : 0 if x else np.nan)

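	# In month t, add the log returns from t-1, t-2, ..., t-11. Together with the one-month
	# signal lag applied in Sample Selection, the portfolio formed in month t uses
	# returns from t-2, ..., t-12. Hence, we skip the most recent month.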
	for i in range(1, 11 + 1):
		# Create a variable for each lag of the returns
		df_full['daret%d' %(i)] = df_full[['daret', 'permno']].groupby('permno')['daret'].shift(i)
		
		# Add the log return at this lag to the running sum; a missing return contributes zero
		df_full['signal'] += df_full['daret%d' %(i)].notnull() * np.log(1 + df_full['daret%d' %(i)]).fillna(0.0)
		
		# Drop the variable that we created in the previous step
		df_full.drop(columns = ['daret%d' %(i)], inplace = True)    
        
else:
	
	raise ValueError('Please provide a valid _STRATEGY_NAME.')
    
df_sum = df_full.sort_values(['ldate','ticker'])	
print(df_sum[['ldate','ticker','me','signal','daret']].loc[df_sum['ticker'].isin(['AAPL', 'AMZN', 'TSLA'])].tail(3))    
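
As a compact illustration of the momentum logic above, the following sketch computes the sum of log returns over lags 1 to 11 on a toy panel. The toy data and the `logret` column are assumptions for illustration. Unlike the cell above, which treats a missing return as zero, this simplified version leaves the signal as NaN whenever any of the 11 lagged returns is unavailable.

```python
import numpy as np
import pandas as pd

# Toy panel (assumption): one stock, 14 consecutive months, a constant 1% monthly return.
toy = pd.DataFrame({
    'permno': [1] * 14,
    'ldate': pd.date_range('2020-01-01', periods=14, freq='MS'),
    'daret': [0.01] * 14,
})
toy = toy.sort_values(['permno', 'ldate'])

# Log returns add up across months, so a cumulative log return is a simple sum.
toy['logret'] = np.log1p(toy['daret'])

# Sum the log returns at lags 1, ..., 11 within each stock.
toy['signal'] = 0.0
for lag in range(1, 12):
    toy['signal'] += toy.groupby('permno')['logret'].shift(lag)

print(toy[['ldate', 'daret', 'signal']])
```

The first 11 rows have no complete history, so their signal is NaN; from the 12th row onward the signal equals 11 times the monthly log return.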

#### Sample Selection

We lag the signal by one month and select the relevant sample (i.e., the sample period and whether we include micro caps or not).

In [None]:
# Lag signal by one period so that the signal value is known at the time of portfolio creation
df_full['signal'] = qpm.create_lag(df_full, var_name = 'signal', lag = 1)

# Select the relevant sample
df = qpm.select_sample(df_full, sample_start = _SAMPLE_START, sample_end = _SAMPLE_END, remove_micro_caps = _REMOVE_MICRO_CAPS)
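
The lag matters because a portfolio formed at the start of month t may only use information available at the end of month t-1. Below is a minimal sketch of a within-stock lag on toy data. It is not the implementation of `qpm.create_lag`, whose internals are not shown here; the `signal_lag1` column is a hypothetical name, and the row-based shift assumes the panel has no missing months.

```python
import pandas as pd

# Toy panel (assumption): two stocks, three consecutive months each.
toy = pd.DataFrame({
    'permno': [1, 1, 1, 2, 2, 2],
    'ldate': pd.to_datetime(['2020-01-01', '2020-02-01', '2020-03-01'] * 2),
    'signal': [0.1, 0.2, 0.3, 1.0, 2.0, 3.0],
})
toy = toy.sort_values(['permno', 'ldate'])

# Shift within each stock so that values never leak from one stock to the next.
toy['signal_lag1'] = toy.groupby('permno')['signal'].shift(1)

print(toy)
```

Grouping by `permno` before shifting ensures the first month of stock 2 is NaN rather than the last value of stock 1.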

### Step 2. Portfolio Construction

Next, we sort the stocks into portfolios:
- retP_rank_longonly: Rank-based long-only portfolio
- retP_rank_longshort: Rank-based long-short portfolio
- retP_vw_P1, ..., retP_vw_P5: The returns on the 5 portfolios (with `_NUM_PORT = 5`) sorted by the strategy's signal and weighted by market capitalization
- retF_vw: The return on the factor, which is retP_vw_P5-retP_vw_P1

In [None]:
df, df_rets = qpm.create_portfolios(df, sort_frequency = _SORT_FREQUENCY, num_port = _NUM_PORT)
print(df_rets.tail())
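
To show what a signal sort with value weighting involves, here is a minimal sketch on simulated data for a single month. It is not the code behind `qpm.create_portfolios`; the `ret`, `port`, and `me_x_ret` columns, the random inputs, and the use of `pd.qcut` for breakpoints are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Simulated cross-section (assumption): 100 stocks in one month.
rng = np.random.default_rng(0)
n = 100
toy = pd.DataFrame({
    'ldate': pd.Timestamp('2020-01-01'),
    'signal': rng.normal(size=n),
    'me': rng.uniform(1.0, 10.0, size=n),     # market cap used as the weight
    'ret': rng.normal(0.01, 0.05, size=n),    # return over the holding month
})

# Assign each stock to one of 5 signal-sorted portfolios per month (1 = lowest signal).
toy['port'] = toy.groupby('ldate')['signal'].transform(
    lambda s: pd.qcut(s, 5, labels=False) + 1
)

# Value-weighted return: sum(me * ret) / sum(me) within each portfolio-month.
toy['me_x_ret'] = toy['me'] * toy['ret']
sums = toy.groupby(['ldate', 'port'])[['me_x_ret', 'me']].sum()
vw = (sums['me_x_ret'] / sums['me']).unstack('port')
vw.columns = ['retP_vw_P%d' % p for p in vw.columns]

# Factor return: highest-signal portfolio minus lowest-signal portfolio.
vw['retF_vw'] = vw['retP_vw_P5'] - vw['retP_vw_P1']

print(vw)
```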

### Step 3. Portfolio Analytics

We first plot the average returns on the portfolios. Then, we plot the cumulative returns on various strategies. For the long-only strategy, we use the market as a simple benchmark. For the long-short strategies, we use the risk-free rate as a benchmark. Later, we will use regression analysis to properly correct for the factors.

In [None]:
qpm.analyze_strategy(df_rets, analysis_type = 'Performance')

In [None]:
qpm.analyze_strategy(df_rets, analysis_type = 'Summary')

In [None]:
qpm.analyze_strategy(df_rets, analysis_type = 'Factor Regression')
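
For intuition on the factor regression, the sketch below estimates an alpha and a market beta by ordinary least squares on simulated excess returns, using plain NumPy. It is only an illustration: the single market factor, the simulated series, and every variable name are assumptions, and the factors actually used by `qpm.analyze_strategy` are not shown in this notebook.

```python
import numpy as np

# Simulated monthly excess returns (assumption): 240 months.
rng = np.random.default_rng(0)
T = 240
mkt_excess = rng.normal(0.005, 0.04, size=T)
true_alpha, true_beta = 0.002, 0.8
strat_excess = true_alpha + true_beta * mkt_excess + rng.normal(0.0, 0.01, size=T)

# Regress strategy excess returns on a constant and the market excess return.
X = np.column_stack([np.ones(T), mkt_excess])
coef, *_ = np.linalg.lstsq(X, strat_excess, rcond=None)
alpha_hat, beta_hat = coef

# Residuals are the part of the strategy return not explained by the factor.
resid = strat_excess - X @ coef

print('alpha: %.4f, beta: %.3f' % (alpha_hat, beta_hat))
```

The intercept is the average return left over after accounting for factor exposure, which is what "correcting for the factors" refers to.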

In [None]:
if _REMOVE_MICRO_CAPS:
    save_dir = '%s/StrategyReturns_%s_%s_noMicroCaps.csv' %(_STRATEGY_DIR, _STRATEGY_NAME, _SORT_FREQUENCY)
else:
    save_dir = '%s/StrategyReturns_%s_%s_withMicroCaps.csv' %(_STRATEGY_DIR, _STRATEGY_NAME, _SORT_FREQUENCY)

df_rets.to_csv(save_dir)
print('Saved Strategy Returns to %s' %(save_dir))