### LSE Data Analytics Online Career Accelerator 
# Course 301: Data Analytics with Python

## Practical activity: Finding the return for S&P500 stocks

**This is the solution to the activity.**

As you have learned so far, the capital asset pricing model (CAPM) describes the relationship between systematic risk and expected return for assets, primarily stocks. Given an investment's characteristics, which are often values to plug into an equation or model, you can use the CAPM to find that investment's expected return.
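For reference, the standard form of the CAPM equation is shown below. Here $E(R_i)$ is the expected return of asset $i$, $R_f$ is the risk-free rate, $\beta_i$ is the asset's beta and $E(R_m)$ is the expected market return. The bracketed term is the market's excess return (risk premium), and beta can be read as the covariance of the asset's returns with the market's, divided by the variance of the market's returns.

$$
E(R_i) = R_f + \beta_i \left( E(R_m) - R_f \right), \qquad \beta_i = \frac{\operatorname{Cov}(R_i, R_m)}{\operatorname{Var}(R_m)}
$$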

Yuki has started a position as a data analyst at a trusted investment bank. She's been tasked with finding the expected returns of two of the top S&P 500 listed companies: Microsoft and Tesla. The bank's clientele want to know which of the two companies has the stronger expected stock return based on historical data from 2020 and 2021.

In this activity you will use CAPM to help Yuki find the expected returns for Microsoft (MSFT) and Tesla Inc. (TSLA), based on data from 2020 and 2021. You will retrieve the historical data from Yahoo! Finance and find the variables for the CAPM equation, including:

- the current average excess annual return of US stocks on the S&P 500 (SPY)
- the return on 10-year US Treasury bonds
- the beta value for each stock.

(Hint: You will need to find the first two values online.)

After calculating the CAPM expected return for each stock, state which stock has the better expected return and which has the lower level of volatility.
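One common way to put a number on "volatility" is the annualized standard deviation of daily returns; beta, estimated later in this notebook, captures only the systematic (market-related) part of that risk. The sketch below is a minimal illustration under stated assumptions: roughly 252 trading days per year, made-up prices, and a hypothetical helper name `annual_volatility`.

```python
import numpy as np
import pandas as pd

# Assumption: ~252 trading days per year (a common convention, not from the activity)
TRADING_DAYS_PER_YEAR = 252


def annual_volatility(close: pd.Series) -> float:
    """Annualized standard deviation of simple daily returns (hypothetical helper)."""
    daily_returns = close.pct_change(1).dropna()
    return float(daily_returns.std() * np.sqrt(TRADING_DAYS_PER_YEAR))


# Made-up closing prices, for illustration only
steady = pd.Series([100.0, 101.0, 100.5, 101.5, 101.0])
swingy = pd.Series([100.0, 110.0, 95.0, 108.0, 97.0])
print(annual_volatility(steady) < annual_volatility(swingy))  # True
```

With the notebook's data you could call, for example, `annual_volatility(df_tesla['Close'])` and `annual_volatility(df_msoft['Close'])` and compare the two.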

## 1. Prepare your workstation

In [None]:
# import the necessary packages
import datetime

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.api as sm  # optional: an alternative OLS implementation (not used below)

from sklearn.linear_model import LinearRegression
# note: the 'yahoo' source in pandas_datareader may fail with the current Yahoo! Finance site
from pandas_datareader import data as web

## 2. Set the start and end date

In [None]:
# this range covers all trading days in 2020 and 2021
start = datetime.datetime(2020, 1, 1)
end = datetime.datetime(2022, 1, 1)

## 3.1 Pull data for SPY and Tesla from Yahoo! Finance and save as DataFrames

In [None]:
df_spy = web.DataReader('SPY','yahoo',start,end)

df_spy.head()

In [None]:
df_tesla = web.DataReader('TSLA','yahoo',start,end)

df_tesla.head()

## 3.2 Check relationships and cumulative returns

In [None]:
# plot the closing prices for Tesla and SPY (an ETF that tracks the S&P 500)
df_tesla['Close'].plot(label = 'Tesla', figsize=(10,8))
df_spy['Close'].plot(label = 'SPY')
plt.legend()

In [None]:
df_tesla['Cumu'] = df_tesla['Close']/df_tesla['Close'].iloc[0] 
df_spy['Cumu'] = df_spy['Close']/df_spy['Close'].iloc[0]

df_tesla['Cumu'].plot(label = 'Tesla', figsize=(10,8))
df_spy['Cumu'].plot(label = 'SPY')
plt.legend()
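The `Cumu` column is the growth of 1 dollar invested on the first trading day. The small sketch below, using made-up prices, checks that dividing every close by the first close gives the same series as compounding the daily returns.

```python
import pandas as pd

# Made-up closing prices, for illustration only
close = pd.Series([100.0, 105.0, 102.0, 110.0])

# Same calculation as the 'Cumu' column: each close divided by the first close
cumu = close / close.iloc[0]

# Equivalent: compound the daily returns (day one has no return, so treat it as 0)
cumu_from_returns = (1 + close.pct_change(1).fillna(0)).cumprod()

print(cumu.round(2).tolist())  # [1.0, 1.05, 1.02, 1.1]
print(((cumu - cumu_from_returns).abs() < 1e-12).all())  # True
```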

In [None]:
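df_tesla['daily_ret'] = df_tesla['Close'].pct_change(1)
df_spy['daily_ret'] = df_spy['Close'].pct_change(1)
# put the market (SPY) on the x-axis, matching the regression below
plt.scatter(df_spy['daily_ret'], df_tesla['daily_ret'])
plt.xlabel('SPY daily return')
plt.ylabel('Tesla daily return')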

## 3.3 Combine prices and calculate daily returns

In [None]:
daily_prices = pd.concat([df_tesla['Close'], df_spy['Close']], axis=1)
daily_prices.columns = ['Tesla', 'SPY']

print(daily_prices.head())

In [None]:
daily_returns = daily_prices.pct_change(1)
clean_daily_returns = daily_returns.dropna(axis=0) 
print(clean_daily_returns.head())
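`pct_change(1)` computes the simple return $r_t = p_t / p_{t-1} - 1$, so the first row is always `NaN` (there is no previous price), which is why `dropna(axis=0)` is needed before fitting. A sketch on a made-up two-column price table:

```python
import pandas as pd

# Made-up prices, for illustration only
prices = pd.DataFrame({"Stock": [10.0, 11.0, 9.9], "Index": [100.0, 101.0, 100.0]})

returns = prices.pct_change(1)        # r_t = p_t / p_{t-1} - 1
print(returns.iloc[0].isna().all())   # True: day one has no previous price

clean = returns.dropna(axis=0)        # drop the all-NaN first row
print(clean.round(4))
```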

## 3.4 Linear regression analysis

In [None]:
X = clean_daily_returns['SPY'].values.reshape(-1, 1)
y = clean_daily_returns['Tesla'].values.reshape(-1, 1)

In [None]:
lr = LinearRegression()
lr.fit(X, y)

In [None]:
lr.coef_

In [None]:
lr.intercept_
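The slope in `lr.coef_` is the stock's beta. For an OLS fit with one regressor and an intercept, that slope equals the covariance of the stock and market returns divided by the variance of the market returns. The sketch below checks this on synthetic data; the "true" beta of 1.5 and the noise levels are arbitrary assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Synthetic daily returns: a market series and a stock with a built-in beta of 1.5
market = rng.normal(0.0005, 0.01, size=500)
stock = 0.0002 + 1.5 * market + rng.normal(0, 0.005, size=500)

# Cov(stock, market) / Var(market), using the same ddof for both
beta_cov = np.cov(stock, market, ddof=1)[0, 1] / np.var(market, ddof=1)

# Regression slope from scikit-learn (X must be 2-D)
lr = LinearRegression().fit(market.reshape(-1, 1), stock)
print(np.isclose(beta_cov, lr.coef_[0]))  # True
```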

## 3.5 Drawing the line of best fit

In [None]:
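# Output of the two cells above (original run): slope (beta) ≈ 1.34497477, intercept ≈ 0.00489288
# Linear equation: y = a*x + b, where a is the slope (beta) and b is the intercept
y_pred = lr.coef_ * X + lr.intercept_
y_pred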
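Computing `lr.coef_ * X + lr.intercept_` works here because there is a single feature: NumPy broadcasts the `(1, 1)` coefficient array over the `(n, 1)` column. `LinearRegression.predict` returns the same values and also handles several features. A quick check on made-up data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up market (X) and stock (y) returns, both as 2-D columns like the notebook's
X = np.array([[-0.02], [-0.01], [0.0], [0.01], [0.02]])
y = np.array([[-0.03], [-0.01], [0.001], [0.012], [0.031]])

lr = LinearRegression().fit(X, y)

# Manual line: coef_ has shape (1, 1) and broadcasts over X
y_manual = lr.coef_ * X + lr.intercept_
# Built-in equivalent
y_builtin = lr.predict(X)

print(y_manual.shape, np.allclose(y_manual, y_builtin))  # (5, 1) True
```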

In [None]:
plt.scatter(X,y)
plt.plot(X, y_pred, color='red')

## 4. Pull data for Microsoft from Yahoo! Finance and save as DataFrame

In [None]:
df_msoft = web.DataReader('MSFT','yahoo',start,end)
df_msoft.head()

In [None]:
df_msoft['Close'].plot(label = 'Microsoft', figsize=(10,8))
df_spy['Close'].plot(label = 'SPY')
plt.legend()

In [None]:
df_msoft['Cumu'] = df_msoft['Close']/df_msoft['Close'].iloc[0] 
df_spy['Cumu'] = df_spy['Close']/df_spy['Close'].iloc[0]

df_msoft['Cumu'].plot(label = 'Microsoft', figsize=(10,8))
df_spy['Cumu'].plot(label = 'SPY')
plt.legend()

In [None]:
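df_msoft['daily_ret'] = df_msoft['Close'].pct_change(1)
df_spy['daily_ret'] = df_spy['Close'].pct_change(1)
# put the market (SPY) on the x-axis, matching the regression below
plt.scatter(df_spy['daily_ret'], df_msoft['daily_ret'])
plt.xlabel('SPY daily return')
plt.ylabel('Microsoft daily return')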

In [None]:
daily_prices = pd.concat([df_msoft['Close'], df_spy['Close']], axis=1)
daily_prices.columns = ['Microsoft', 'SPY']

print(daily_prices.head())

In [None]:
daily_returns = daily_prices.pct_change(1)
print(daily_returns.head())

In [None]:
daily_returns = daily_prices.pct_change(1)
clean_daily_returns = daily_returns.dropna(axis=0) 
print(clean_daily_returns.head())

In [None]:
X = clean_daily_returns['SPY'].values.reshape(-1, 1)
y = clean_daily_returns['Microsoft'].values.reshape(-1, 1)

In [None]:
lr = LinearRegression()
lr.fit(X, y)

In [None]:
lr.coef_

In [None]:
lr.intercept_
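The Microsoft cells repeat the Tesla steps line for line. One way to avoid that duplication is to wrap the concat, `pct_change`, `dropna` and fit steps in a function. `estimate_beta` below is a hypothetical helper, not part of the original activity; it is checked on synthetic prices whose returns are exactly twice the market's, so the expected beta is 2.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def estimate_beta(stock_close: pd.Series, market_close: pd.Series) -> tuple[float, float]:
    """Return (beta, alpha) from regressing daily stock returns on market returns.

    Hypothetical helper wrapping the steps used for Tesla and Microsoft above.
    """
    prices = pd.concat([stock_close, market_close], axis=1, keys=["stock", "market"])
    returns = prices.pct_change(1).dropna(axis=0)
    X = returns[["market"]].to_numpy()  # 2-D feature matrix, shape (n, 1)
    y = returns["stock"].to_numpy()     # 1-D target
    lr = LinearRegression().fit(X, y)
    return float(lr.coef_[0]), float(lr.intercept_)


# Synthetic check: stock returns are exactly 2x the market returns
market_ret = np.array([0.01, -0.02, 0.015, 0.005, -0.01])
market = pd.Series(100 * np.cumprod(1 + np.r_[0.0, market_ret]))
stock = pd.Series(50 * np.cumprod(1 + np.r_[0.0, 2 * market_ret]))
beta, alpha = estimate_beta(stock, market)
print(round(beta, 6))  # 2.0
```

With the notebook's DataFrames the calls would be `estimate_beta(df_tesla['Close'], df_spy['Close'])` and `estimate_beta(df_msoft['Close'], df_spy['Close'])`.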

In [None]:
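# Output of the two cells above (original run): slope (beta) ≈ 1.14398066, intercept ≈ 0.00069205
# Linear equation: y = a*x + b, where a is the slope (beta) and b is the intercept
y_pred = lr.coef_ * X + lr.intercept_
y_pred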

In [None]:
plt.scatter(X,y)
plt.plot(X, y_pred, color='red')
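The final CAPM step can be sketched as below, plugging in the betas recorded in the comments above. The risk-free rate and expected market return are PLACEHOLDERS, not real market data; replace them with the 10-year US Treasury yield and S&P 500 return you look up, as the activity asks. If the figure you find is the market's excess return (risk premium), pass `risk_free + premium` as `market_return`.

```python
def capm_expected_return(risk_free: float, beta: float, market_return: float) -> float:
    """CAPM: E(R_i) = R_f + beta_i * (E(R_m) - R_f), all as annual decimal rates."""
    return risk_free + beta * (market_return - risk_free)


# Betas from the regression output recorded in the comments above
betas = {"TSLA": 1.34497477, "MSFT": 1.14398066}

# PLACEHOLDERS only: replace with the values you look up online
RISK_FREE = 0.02
MARKET_RETURN = 0.10

expected = {t: capm_expected_return(RISK_FREE, b, MARKET_RETURN) for t, b in betas.items()}
for ticker, er in expected.items():
    print(f"{ticker}: beta = {betas[ticker]:.2f}, CAPM expected annual return = {er:.2%}")
```

Because the CAPM is linear in beta, whenever the expected market return exceeds the risk-free rate, the stock with the higher beta (Tesla, per the regression output above) gets the higher expected return and also carries more systematic risk.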