# ADAPT Pro - Topic 2 - Financial Analysis with Python

**Before We Get Started**

- materials also on github: https://github.com/TheMarqueeGroup/ADAPTPro-Topic2/
- run the getting started file:
    - launch jupyter: `go/jupyter`
    - `jupyter/notebooks/lob/core/ADAPT/GettingStarted.ipynb`
- demo code is located in: `jupyter/notebooks/lob/core/ADAPT_Pro/FinAnalysis/`


## Linear Regressions with Python

In [2]:
import pandas as pd #data manipulation
import matplotlib.pyplot as plt #viz package
import numpy as np #more advanced math/stats formulas (that pandas doesn't have)
import statsmodels.api as sm

### Linear Regression - Student Hours
- linear regression that predicts a student's exam score based on the # of hours studied

In [20]:
hours = [0, 5, 10, 2, 6, 5, 15, 4]
scores = [25 + 5 * x for x in hours]
df = pd.DataFrame({'Hours':hours,'Scores':scores})
df

In [21]:
df.plot(x='Hours',y='Scores', kind='scatter')

**Steps to Running OLS Model (Linear Regression)**
- model = sm.OLS(y, x)
    - OLS = Ordinary Least Squares
    - minimizes the error terms (y predicted vs. y actual)
        - squares the errors, sums them up, and finds the coefficients that make that sum as small as possible
- fit the model with model.fit() - this estimates the coefficients of the line of best fit
- start looking at the results, r2, p-values, coefficient, etc.
- predict new results
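To make the "minimize the sum of squared errors" idea concrete, here is a minimal sketch in plain Python using the same hours/scores data as the cell below. The helper name `sse` is made up for illustration, and the closed-form slope shown applies only to the special case of one regressor and no intercept; it is not a description of how statsmodels computes the fit internally.

```python
hours = [0, 5, 10, 2, 6, 5, 15, 4]
scores = [25 + 5 * x for x in hours]

def sse(slope):
    """Sum of squared errors for the no-intercept line y = slope * x (hypothetical helper)."""
    return sum((y - slope * x) ** 2 for x, y in zip(hours, scores))

# For one regressor and no intercept, the slope that minimizes the SSE is
# sum(x * y) / sum(x * x)
best_slope = sum(x * y for x, y in zip(hours, scores)) / sum(x * x for x in hours)
print(round(best_slope, 4))  # 7.7262

# Nudging the slope in either direction makes the SSE larger
print(sse(best_slope) < sse(best_slope - 0.1))
print(sse(best_slope) < sse(best_slope + 0.1))
```

The slope matches the coefficient reported by `results.summary()` for the no-intercept model.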

**Warning** - by default, `sm.OLS` in statsmodels does not add a y-intercept, so the fitted line is forced through the origin:
- if x = 0 --> y = 0
- if you want the model to have a y-intercept, add an extra X column that is all 1s (a constant)

In [22]:
# model = sm.OLS(scores, hours) #lists
model = sm.OLS(df['Scores'], df['Hours']) #pandas df col as inputs
        #y, x they can be numpy arrays, lists, pandas columns, etc.
results = model.fit()

In [24]:
results.summary()
# results.summary2()
    # y = 7.7262 * x

In [25]:
# df['Predict'] = 7.7262 * df['Hours'] #not ideal, don't copy paste the coeff
df['Predict'] = results.predict(df['Hours'])
df

In [26]:
results.predict(9.5)
results.predict([9.5, 5.5, 10.25])

In [28]:
#Look at results
plt.scatter(df['Hours'], df['Scores']) #actual - what really happened
plt.plot(df['Hours'], df['Predict'], 'red') #linear regression model
plt.show()

Line equation: y = mx + b
- m = slope
- b = intercept
- to "force" the model to estimate an intercept, add a dummy X (all 1s) whose coefficient plays the role of b

y = coeff1 * x1 + coeff0 * x0
- x0 = [1, 1, 1, 1, 1, 1, 1, 1] (constant column, so coeff0 = b)
- x1 = student hours column (so coeff1 = m)
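The same idea can be written out with NumPy to show why the dummy column works: stacking x0 and x1 side by side turns the intercept into just another coefficient. This is a sketch using `np.linalg.lstsq` as a stand-in least-squares solver, not the statsmodels code path, and the names `X`, `coeff0`, and `coeff1` are chosen here to mirror the equation above.

```python
import numpy as np

hours = np.array([0, 5, 10, 2, 6, 5, 15, 4], dtype=float)
scores = 25 + 5 * hours

x0 = np.ones_like(hours)          # dummy regressor: all 1s
X = np.column_stack([x0, hours])  # shape (8, 2): columns are x0, x1

# Least-squares solution of X @ coeffs ~= scores
coeffs, *_ = np.linalg.lstsq(X, scores, rcond=None)
coeff0, coeff1 = coeffs           # coeff0 is the intercept b, coeff1 is the slope m

# y = coeff1 * x1 + coeff0 * x0, computed for every row at once
predicted = X @ coeffs
print(coeff0, coeff1)
```

Since x0 is always 1, the term coeff0 * x0 is a fixed offset added to every prediction, which is exactly what an intercept is.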

In [30]:
df['Constant'] = 1
model2 = sm.OLS(df['Scores'], df[['Constant','Hours']])
results2 = model2.fit()
results2.summary()

In [33]:
df['Predict2'] = results2.predict(df[['Constant','Hours']])
df

In [35]:
results2.predict([1, 5])
results2.predict([[1, 5],[1,6],[1,7]])

### Linear Regression - Stock Beta Calc

CAPM
- rStock = rf + beta * MRP
- MRP = returns of index - rf

simplify ---> rf = 0
- returns of Apple = beta * returns of S&P 500
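With rf = 0, beta is simply the slope from regressing stock returns on index returns. The sketch below illustrates this on SYNTHETIC data (random numbers, not real AAPL or S&P 500 returns); `true_beta`, the seed, and the volatility figures are all made-up assumptions. It compares the slope through the origin with the covariance/variance form that corresponds to a regression with an intercept.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed so the sketch is repeatable
market = rng.normal(0.0005, 0.01, 500)   # SYNTHETIC daily index returns
true_beta = 1.2                          # hypothetical beta used to build the data
stock = true_beta * market + rng.normal(0, 0.005, 500)  # SYNTHETIC stock returns

# Regression through the origin (no intercept): slope = sum(x * y) / sum(x^2)
beta_no_intercept = np.sum(stock * market) / np.sum(market ** 2)

# Regression with an intercept: slope = cov(stock, market) / var(market)
beta_with_intercept = np.cov(stock, market, ddof=1)[0, 1] / np.var(market, ddof=1)

print(beta_no_intercept, beta_with_intercept)
```

On daily data the two estimates are usually close because average daily returns are near zero, but they are not identical, which is one reason to fit the model both with and without a constant.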

In [5]:
sp500 = pd.read_csv("ADAPT2021/StockData/SP500.csv",parse_dates=['Date'],index_col=['Date'])
aapl = pd.read_csv("ADAPT2021/StockData/AAPL.csv",parse_dates=['Date'],index_col=['Date'])

In [6]:
aapl['Returns'] = aapl['Close'].pct_change() #daily returns
sp500['Returns'] = sp500['Close'].pct_change()

In [7]:
mergedData = sp500.merge(aapl, how='inner',
                         left_index=True, right_index=True,
                         suffixes=("_SP","_AAPL"))  
mergedData.dropna(inplace=True)
    #same as: mergedData = mergedData.dropna()
mergedData

### Running the CAPM Model

In [8]:
model = sm.OLS(mergedData['Returns_AAPL'], mergedData['Returns_SP']) #no y-intercept

mergedData['Const'] = 1
model2 = sm.OLS(mergedData['Returns_AAPL'], mergedData[['Const','Returns_SP']]) #with y-intercept