# **TSLA Stock Data Pre-Processing**

In [1]:
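```python
# Technical-indicator library; yfinance, statsmodels, seaborn, plotly and
# openpyxl (used by pd.read_excel for .xlsx files) must also be installed
!pip install ta
```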






In [3]:
```python
import warnings
warnings.filterwarnings("ignore")

import ta
import numpy as np
import pandas as pd
import yfinance as yf
import seaborn as sns
import statsmodels as smt
import statsmodels.api as sm
from datetime import datetime
import matplotlib.pyplot as plt
import plotly.graph_objects as go
```

### Downloading TSLA Stock Data with the yfinance Package

Yahoo Finance is a widely used platform for financial information, including stock market data, company profiles, and historical prices. The open-source Python package yfinance offers a simple way to retrieve this data programmatically. It is not an official Yahoo product: it wraps Yahoo Finance's public web endpoints and is not affiliated with or endorsed by Yahoo, so its behavior can change when those endpoints change.

yfinance can fetch data for stocks, exchange-traded funds (ETFs), indices, and other instruments. It provides functions to download historical prices, current quotes, company information, and financial statements, and it returns results as pandas DataFrames, which makes the data easy to integrate into existing pandas workflows.

Users specify one or more ticker symbols and a date range to retrieve historical prices at daily or other intervals, along with dividends and stock splits. Passing several symbols to a single `yf.download` call retrieves them together, which is convenient when comparing multiple stocks or assets.

Beyond historical data, yfinance exposes current quote information such as the latest price, volume, and market capitalization. These quotes may be delayed, so they are better suited to monitoring and prototyping than to latency-sensitive trading.

In short, yfinance is a convenient way to bring Yahoo Finance data into Python for financial analysis, research, and development.
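The column layout returned by `yf.download` has changed across yfinance releases: a multi-ticker download (and, in recent versions, even a single-ticker one) can come back with a two-level `(field, ticker)` column index. The sketch below is a small defensive helper for pulling one price field out of either layout. `select_field` is a hypothetical name, the two layouts it handles are an assumption based on common yfinance output, and a synthetic frame stands in for a live download.

```python
import pandas as pd


def select_field(prices: pd.DataFrame, field: str = "Adj Close") -> pd.DataFrame:
    """Return one price field as a DataFrame with one column per ticker.

    Handles two layouts (an assumption based on common yfinance output):
    flat columns such as 'Open' and 'Adj Close' for a single ticker, or a
    two-level (field, ticker) column index for several tickers.
    """
    if isinstance(prices.columns, pd.MultiIndex):
        # Level 0 holds the field name, level 1 the ticker symbol
        return prices.xs(field, axis=1, level=0)
    # Flat layout: keep a one-column DataFrame so the return type is consistent
    return prices[[field]]


# Synthetic stand-in for yf.download(["GM", "TSLA"], ...); no network access needed
dates = pd.date_range("2023-04-24", periods=3, freq="B")
columns = pd.MultiIndex.from_product([["Adj Close", "Volume"], ["GM", "TSLA"]])
raw = pd.DataFrame(
    [[1.0, 10.0, 100, 1000], [1.1, 10.5, 110, 1100], [1.2, 10.2, 120, 1200]],
    index=dates,
    columns=columns,
)

adj_close = select_field(raw)
print(adj_close)
```

Returning a DataFrame in both cases keeps downstream code (for example `.pct_change()` across tickers) identical whether one or several symbols were requested.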

In [4]:
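```python
start_date = datetime(2017, 1, 1)
end_date = datetime(2023, 4, 28)  # yfinance treats `end` as exclusive: last day fetched is 2023-04-27

# auto_adjust=False keeps the 'Adj Close' column (newer yfinance versions default to auto_adjust=True)
TSLA = yf.download('TSLA', start=start_date, end=end_date, auto_adjust=False)
TSLA.reset_index(inplace=True)
```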



In [5]:
```python
TSLA
```

|  | Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|---|
| 0 | 2017-01-03 | 14.324000 | 14.688667 | 14.064000 | 14.466000 | 14.466000 | 88849500 |
| 1 | 2017-01-04 | 14.316667 | 15.200000 | 14.287333 | 15.132667 | 15.132667 | 168202500 |
| 2 | 2017-01-05 | 15.094667 | 15.165333 | 14.796667 | 15.116667 | 15.116667 | 88675500 |
| 3 | 2017-01-06 | 15.128667 | 15.354000 | 15.030000 | 15.267333 | 15.267333 | 82918500 |
| 4 | 2017-01-09 | 15.264667 | 15.461333 | 15.200000 | 15.418667 | 15.418667 | 59692500 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1585 | 2023-04-21 | 164.800003 | 166.000000 | 161.320007 | 165.080002 | 165.080002 | 123539000 |
| 1586 | 2023-04-24 | 164.649994 | 165.649994 | 158.610001 | 162.550003 | 162.550003 | 140006600 |
| 1587 | 2023-04-25 | 159.820007 | 163.470001 | 158.750000 | 160.669998 | 160.669998 | 121999300 |
| 1588 | 2023-04-26 | 160.289993 | 160.669998 | 153.139999 | 153.750000 | 153.750000 | 153364100 |


### Importing the F-F Research Data Factors Dataset

The F-F Research Data Factors daily dataset, usually called the Fama-French factors, provides daily risk factors for asset pricing models. It is named after Eugene Fama and Kenneth French, who developed the factors to study and explain stock returns, and it is distributed through Kenneth French's data library.

The file contains the three factors of the Fama-French three-factor model together with the risk-free rate:

* **Mkt-RF**, the market risk premium: the excess return of the overall market over the risk-free rate.
* **SMB** (Small Minus Big), the size premium: the return of small-cap stocks minus that of large-cap stocks.
* **HML** (High Minus Low), the value premium: the return of high book-to-market (value) stocks minus that of low book-to-market (growth) stocks.
* **RF**, the risk-free rate, derived from the one-month U.S. Treasury bill rate. RF is not a factor itself; it is subtracted from an asset's return to obtain the excess return that the factors are meant to explain.

Researchers, academics, and financial professionals use these series to study how the factors relate to stock returns. Including them in an asset pricing regression separates the part of a stock's return that reflects broad market, size, and value exposures from the part that is specific to the stock, which helps when evaluating the risk and performance of investment strategies.

The data are distributed as CSV (and plain-text) files, so they load directly into pandas for exploration and modeling. All values are in percent: a Mkt-RF entry of 0.83 means a 0.83% excess market return on that day.
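In the standard three-factor specification, a stock's daily excess return is regressed on the three factors, as written below; the regression near the end of this notebook adds the ADS index as a fourth regressor. Because the file reports every column in percent, all terms must be put on the same scale (for example, by dividing the factors and RF by 100) before they are combined with decimal stock returns.

$$
R_{t} - RF_{t} = \alpha + \beta_{M}\,\text{Mkt-RF}_{t} + \beta_{S}\,\text{SMB}_{t} + \beta_{H}\,\text{HML}_{t} + \varepsilon_{t}
$$

Here $R_t$ is the stock's simple return on day $t$, $\alpha$ is the intercept (the return not explained by the factors), and $\varepsilon_t$ is the residual.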

In [6]:
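```python
# skiprows=4 skips the description lines at the top of the file
df_ff_features = pd.read_csv("F-F_Research_Data_Factors_daily.CSV", skiprows=4)
# The date column has no header in the file, so pandas names it 'Unnamed: 0'
df_ff_features.rename(columns={'Unnamed: 0': 'Date'}, inplace=True)
```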

In [7]:
```python
df_ff_features.tail()
```

|  | Date | Mkt-RF | SMB | HML | RF |
|---|---|---|---|---|---|
| 25476 | 20230425 | -1.76 | -0.99 | 0.1 | 0.018 |
| 25477 | 20230426 | -0.41 | 0.15 | -0.75 | 0.018 |
| 25478 | 20230427 | 1.85 | -0.56 | 0.0 | 0.018 |
| 25479 | 20230428 | 0.77 | 0.14 | 0.17 | 0.018 |
| 25480 | Copyright 2023 Kenneth R. French |  |  |  |  |


In [8]:
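```python
df_ff_features.reset_index(drop=True, inplace=True)
# Date is still text here (the copyright footer keeps the column non-numeric),
# so YYYYMMDD strings compare correctly as strings; the filter also drops the footer
df_ff_features = df_ff_features[(df_ff_features.Date > '20170101') & (df_ff_features.Date <= '20230427')].copy()
df_ff_features['Date'] = pd.to_datetime(df_ff_features['Date'], format='%Y%m%d')
df_ff_features
```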

|  | Date | Mkt-RF | SMB | HML | RF |
|---|---|---|---|---|---|
| 23889 | 2017-01-03 | 0.83 | -0.13 | 0.05 | 0.002 |
| 23890 | 2017-01-04 | 0.79 | 0.95 | -0.16 | 0.002 |
| 23891 | 2017-01-05 | -0.21 | -0.88 | -0.79 | 0.002 |
| 23892 | 2017-01-06 | 0.29 | -0.66 | -0.31 | 0.002 |
| 23893 | 2017-01-09 | -0.37 | -0.29 | -1.04 | 0.002 |
| ... | ... | ... | ... | ... | ... |
| 25474 | 2023-04-21 | 0.07 | 0.26 | -0.92 | 0.018 |
| 25475 | 2023-04-24 | 0.00 | -0.40 | 0.47 | 0.018 |
| 25476 | 2023-04-25 | -1.76 | -0.99 | 0.10 | 0.018 |
| 25477 | 2023-04-26 | -0.41 | 0.15 | -0.75 | 0.018 |
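The loading and filtering steps above can also be folded into a single function that parses the `YYYYMMDD` dates explicitly, drops the copyright footer by coercing unparseable dates to `NaT`, and converts the percentage columns to decimals. This is a sketch that assumes the file keeps the layout shown above (four description lines, an unnamed date column, a footer row); `load_ff_daily` is a hypothetical helper, and an in-memory sample replaces the real file.

```python
import io

import pandas as pd


def load_ff_daily(source, start: str, end: str) -> pd.DataFrame:
    """Load a Fama-French daily factor CSV and return the factors as decimals.

    Assumes the layout seen above: four description lines, a header row whose
    first (date) column is unnamed, YYYYMMDD dates, and a copyright footer.
    """
    df = pd.read_csv(source, skiprows=4)
    df = df.rename(columns={df.columns[0]: "Date"})
    # Footer text such as the copyright line cannot be parsed and becomes NaT
    df["Date"] = pd.to_datetime(
        df["Date"].astype(str).str.strip(), format="%Y%m%d", errors="coerce"
    )
    df = df.dropna(subset=["Date"])
    in_range = (df["Date"] >= pd.Timestamp(start)) & (df["Date"] <= pd.Timestamp(end))
    df = df.loc[in_range].copy()
    factor_cols = ["Mkt-RF", "SMB", "HML", "RF"]
    # The file quotes every column in percent; convert to decimal returns
    df[factor_cols] = df[factor_cols].astype(float) / 100
    return df.reset_index(drop=True)


# In-memory sample mimicking the file layout: two rows copied from the tables
# above plus one made-up 2016 row that the date filter should drop
sample = io.StringIO(
    "description line 1\n"
    "description line 2\n"
    "description line 3\n"
    "description line 4\n"
    ",Mkt-RF,SMB,HML,RF\n"
    "20161230,-0.50,0.10,0.20,0.002\n"
    "20170103,0.83,-0.13,0.05,0.002\n"
    "20170104,0.79,0.95,-0.16,0.002\n"
    "Copyright 2023 Kenneth R. French,,,,\n"
)
ff = load_ff_daily(sample, "2017-01-01", "2023-04-27")
print(ff)
```

Converting to decimals at load time means every later calculation (returns, excess returns, regressions) works in one unit.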


In [9]:
```python
TSLA_Final = TSLA.merge(df_ff_features, how='left', on='Date')
```

In [10]:
```python
TSLA_Final
```

|  | Date | Open | High | Low | Close | Adj Close | Volume | Mkt-RF | SMB | HML | RF |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017-01-03 | 14.324000 | 14.688667 | 14.064000 | 14.466000 | 14.466000 | 88849500 | 0.83 | -0.13 | 0.05 | 0.002 |
| 1 | 2017-01-04 | 14.316667 | 15.200000 | 14.287333 | 15.132667 | 15.132667 | 168202500 | 0.79 | 0.95 | -0.16 | 0.002 |
| 2 | 2017-01-05 | 15.094667 | 15.165333 | 14.796667 | 15.116667 | 15.116667 | 88675500 | -0.21 | -0.88 | -0.79 | 0.002 |
| 3 | 2017-01-06 | 15.128667 | 15.354000 | 15.030000 | 15.267333 | 15.267333 | 82918500 | 0.29 | -0.66 | -0.31 | 0.002 |
| 4 | 2017-01-09 | 15.264667 | 15.461333 | 15.200000 | 15.418667 | 15.418667 | 59692500 | -0.37 | -0.29 | -1.04 | 0.002 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1585 | 2023-04-21 | 164.800003 | 166.000000 | 161.320007 | 165.080002 | 165.080002 | 123539000 | 0.07 | 0.26 | -0.92 | 0.018 |
| 1586 | 2023-04-24 | 164.649994 | 165.649994 | 158.610001 | 162.550003 | 162.550003 | 140006600 | 0.00 | -0.40 | 0.47 | 0.018 |
| 1587 | 2023-04-25 | 159.820007 | 163.470001 | 158.750000 | 160.669998 | 160.669998 | 121999300 | -1.76 | -0.99 | 0.10 | 0.018 |
| 1588 | 2023-04-26 | 160.289993 | 160.669998 | 153.139999 | 153.750000 | 153.750000 | 153364100 | -0.41 | 0.15 | -0.75 | 0.018 |


In [11]:
```python
# Yesterday's Close Price
TSLA_Final['Yest_Close'] = TSLA_Final['Adj Close'].shift(1)
```

In [12]:
```python
# Tesla simple daily return (equivalent to TSLA_Final['Adj Close'].pct_change())
TSLA_Final['stock_return'] = (TSLA_Final['Adj Close'] - TSLA_Final['Yest_Close']) / TSLA_Final['Yest_Close']
```

In [13]:
```python
TSLA_Final
```

|  | Date | Open | High | Low | Close | Adj Close | Volume | Mkt-RF | SMB | HML | RF | Yest_Close | stock_return |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017-01-03 | 14.324000 | 14.688667 | 14.064000 | 14.466000 | 14.466000 | 88849500 | 0.83 | -0.13 | 0.05 | 0.002 |  |  |
| 1 | 2017-01-04 | 14.316667 | 15.200000 | 14.287333 | 15.132667 | 15.132667 | 168202500 | 0.79 | 0.95 | -0.16 | 0.002 | 14.466000 | 0.046085 |
| 2 | 2017-01-05 | 15.094667 | 15.165333 | 14.796667 | 15.116667 | 15.116667 | 88675500 | -0.21 | -0.88 | -0.79 | 0.002 | 15.132667 | -0.001057 |
| 3 | 2017-01-06 | 15.128667 | 15.354000 | 15.030000 | 15.267333 | 15.267333 | 82918500 | 0.29 | -0.66 | -0.31 | 0.002 | 15.116667 | 0.009967 |
| 4 | 2017-01-09 | 15.264667 | 15.461333 | 15.200000 | 15.418667 | 15.418667 | 59692500 | -0.37 | -0.29 | -1.04 | 0.002 | 15.267333 | 0.009912 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1585 | 2023-04-21 | 164.800003 | 166.000000 | 161.320007 | 165.080002 | 165.080002 | 123539000 | 0.07 | 0.26 | -0.92 | 0.018 | 162.990005 | 0.012823 |
| 1586 | 2023-04-24 | 164.649994 | 165.649994 | 158.610001 | 162.550003 | 162.550003 | 140006600 | 0.00 | -0.40 | 0.47 | 0.018 | 165.080002 | -0.015326 |
| 1587 | 2023-04-25 | 159.820007 | 163.470001 | 158.750000 | 160.669998 | 160.669998 | 121999300 | -1.76 | -0.99 | 0.10 | 0.018 | 162.550003 | -0.011566 |
| 1588 | 2023-04-26 | 160.289993 | 160.669998 | 153.139999 | 153.750000 | 153.750000 | 153364100 | -0.41 | 0.15 | -0.75 | 0.018 | 160.669998 | -0.043070 |
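The hand-written return formula is the simple (arithmetic) daily return, which pandas also provides as `pct_change()`. The sketch below checks that equivalence on the first four adjusted closes from the table above, then forms an excess return with the risk-free rate converted from percent to a decimal first, so that both terms are in the same units.

```python
import pandas as pd

# Adjusted closes for 2017-01-03 to 2017-01-06, copied from the table above
adj_close = pd.Series([14.466000, 15.132667, 15.116667, 15.267333])

# Manual formula, as in the notebook: (P_t - P_{t-1}) / P_{t-1}
yest_close = adj_close.shift(1)
manual = (adj_close - yest_close) / yest_close

# Built-in equivalent; the first value is NaN because there is no previous close
builtin = adj_close.pct_change()

# RF is quoted in percent in the Fama-French file (0.002 means 0.002%),
# so divide by 100 before subtracting it from a decimal return
rf_percent = pd.Series([0.002, 0.002, 0.002, 0.002])
excess_return = builtin - rf_percent / 100

print(pd.DataFrame({"manual": manual, "pct_change": builtin, "excess": excess_return}).round(6))
```

The second row reproduces the 0.046085 return shown in the table for 2017-01-04.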


In [14]:
```python
# Drop the first row (no previous close) and any day without Fama-French factors
TSLA_Final = TSLA_Final.dropna(subset=['Yest_Close', 'Mkt-RF', 'SMB', 'HML', 'RF'])
```

### Adding Technical Indicators: Moving Averages, MACD and RSI

In [15]:
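```python
# Technical indicators computed on the closing price.
# With fillna=True, ta fills the warm-up period instead of leaving NaN, so the
# earliest rows (fewer observations than the window) are not comparable to later ones.
TSLA_Final["SMA_20"] = ta.trend.sma_indicator(TSLA_Final["Close"], window=20, fillna=True)
TSLA_Final["SMA_50"] = ta.trend.sma_indicator(TSLA_Final["Close"], window=50, fillna=True)

TSLA_Final["EMA_12"] = ta.trend.ema_indicator(TSLA_Final["Close"], window=12, fillna=True)
TSLA_Final["EMA_26"] = ta.trend.ema_indicator(TSLA_Final["Close"], window=26, fillna=True)
# MACD line: 12-day EMA minus 26-day EMA (ta's default windows)
TSLA_Final["MACD"] = ta.trend.macd(TSLA_Final["Close"], fillna=True)

# 14-day RSI (ta's default window)
TSLA_Final["RSI"] = ta.momentum.rsi(TSLA_Final["Close"], fillna=True)
```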

In [17]:
```python
TSLA_Final.head()
```

|  | Date | Open | High | Low | Close | Adj Close | Volume | Mkt-RF | SMB | HML | RF | Yest_Close | stock_return | SMA_20 | SMA_50 | EMA_12 | EMA_26 | MACD | RSI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2017-01-04 | 14.316667 | 15.2 | 14.287333 | 15.132667 | 15.132667 | 168202500 | 0.79 | 0.95 | -0.16 | 0.002 | 14.466 | 0.046085 | 15.132667 | 15.132667 | 15.132667 | 15.132667 | 0.0 | 100.0 |
| 2 | 2017-01-05 | 15.094667 | 15.165333 | 14.796667 | 15.116667 | 15.116667 | 88675500 | -0.21 | -0.88 | -0.79 | 0.002 | 15.132667 | -0.001057 | 15.124667 | 15.124667 | 15.130205 | 15.131481 | -0.001276 | 0.0 |
| 3 | 2017-01-06 | 15.128667 | 15.354 | 15.03 | 15.267333 | 15.267333 | 82918500 | 0.29 | -0.66 | -0.31 | 0.002 | 15.116667 | 0.009967 | 15.172222 | 15.172222 | 15.151302 | 15.141545 | 0.009757 | 91.024248 |
| 4 | 2017-01-09 | 15.264667 | 15.461333 | 15.2 | 15.418667 | 15.418667 | 59692500 | -0.37 | -0.29 | -1.04 | 0.002 | 15.267333 | 0.009912 | 15.233833 | 15.233833 | 15.192435 | 15.162072 | 0.030363 | 95.477309 |
| 5 | 2017-01-10 | 15.466667 | 15.466667 | 15.126 | 15.324667 | 15.324667 | 54900000 | 0.16 | 0.88 | 0.44 | 0.002 | 15.418667 | -0.006096 | 15.252 | 15.252 | 15.212778 | 15.174116 | 0.038662 | 71.686831 |
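To make the `ta` calls less of a black box, the sketch below computes the same indicators with plain pandas: SMA as a rolling mean, EMA in its recursive `adjust=False` form, MACD as the 12-day EMA minus the 26-day EMA, and a 14-day RSI with Wilder-style smoothing (`alpha = 1/14`). It is an approximation written for illustration, not the library's exact implementation, and it leaves the warm-up period as `NaN`, whereas `fillna=True` in the cells above fills that period. `add_indicators` is a hypothetical helper and the price series is synthetic.

```python
import numpy as np
import pandas as pd


def add_indicators(close: pd.Series) -> pd.DataFrame:
    """Plain-pandas approximations of the ta indicators used above (warm-up left as NaN)."""
    out = pd.DataFrame({"Close": close})
    # Simple moving averages: mean of the last `window` closes
    out["SMA_20"] = close.rolling(window=20).mean()
    out["SMA_50"] = close.rolling(window=50).mean()
    # Exponential moving averages in recursive form:
    # EMA_t = alpha * x_t + (1 - alpha) * EMA_{t-1}, with alpha = 2 / (span + 1)
    out["EMA_12"] = close.ewm(span=12, adjust=False, min_periods=12).mean()
    out["EMA_26"] = close.ewm(span=26, adjust=False, min_periods=26).mean()
    out["MACD"] = out["EMA_12"] - out["EMA_26"]
    # 14-day RSI with Wilder-style smoothing (alpha = 1 / 14) of gains and losses
    delta = close.diff()
    gain = delta.clip(lower=0)
    loss = (-delta).clip(lower=0)
    avg_gain = gain.ewm(alpha=1 / 14, adjust=False, min_periods=14).mean()
    avg_loss = loss.ewm(alpha=1 / 14, adjust=False, min_periods=14).mean()
    rs = avg_gain / avg_loss  # inf when there are no losses, which gives RSI = 100
    out["RSI"] = 100 - 100 / (1 + rs)
    return out


# Synthetic steady uptrend: every daily change is a gain of 1.0
close = pd.Series(np.linspace(100.0, 160.0, 61))
ind = add_indicators(close)
print(ind.tail(3).round(3))
```

On a steady uptrend the fast EMA stays above the slow one, so MACD is positive, and with no down days RSI sits at its maximum of 100, which matches the intuition behind both indicators.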


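### Importing ADS Index

The Aruoba-Diebold-Scotti (ADS) Business Conditions Index, published by the Federal Reserve Bank of Philadelphia, tracks real business conditions at high frequency; its average value is zero, and positive values indicate better-than-average conditions. The series is daily and includes weekends, so a left merge on `Date` attaches a value to every trading day it covers.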

In [19]:
```python
# Reading .xlsx files requires the openpyxl package
ads_df = pd.read_excel('ADS_Index_Most_Current_Vintage.xlsx')
```

In [20]:
```python
ads_df.tail()
```

|  | Unnamed: 0 | ADS_Index |
|---|---|---|
| 23114 | 2023-06-13 | -0.134883 |
| 23115 | 2023-06-14 | -0.132031 |
| 23116 | 2023-06-15 | -0.129347 |
| 23117 | 2023-06-16 | -0.126831 |
| 23118 | 2023-06-17 | -0.124481 |


In [21]:
```python
ads_df.rename(columns={'Unnamed: 0': 'Date'}, inplace=True)
ads_df['Date'] = pd.to_datetime(ads_df['Date'])
```

In [22]:
```python
TSLA_Final = TSLA_Final.merge(ads_df, how='left', on='Date')
```

In [23]:
```python
TSLA_Final.to_csv('TSLA_Final.csv', index=False)
```

In [24]:
```python
TSLA_Final
```

|  | Date | Open | High | Low | Close | Adj Close | Volume | Mkt-RF | SMB | HML | RF | Yest_Close | stock_return | SMA_20 | SMA_50 | EMA_12 | EMA_26 | MACD | RSI | ADS_Index |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017-01-04 | 14.316667 | 15.200000 | 14.287333 | 15.132667 | 15.132667 | 168202500 | 0.79 | 0.95 | -0.16 | 0.002 | 14.466000 | 0.046085 | 15.132667 | 15.132667 | 15.132667 | 15.132667 | 0.000000 | 100.000000 | 0.147667 |
| 1 | 2017-01-05 | 15.094667 | 15.165333 | 14.796667 | 15.116667 | 15.116667 | 88675500 | -0.21 | -0.88 | -0.79 | 0.002 | 15.132667 | -0.001057 | 15.124667 | 15.124667 | 15.130205 | 15.131481 | -0.001276 | 0.000000 | 0.127719 |
| 2 | 2017-01-06 | 15.128667 | 15.354000 | 15.030000 | 15.267333 | 15.267333 | 82918500 | 0.29 | -0.66 | -0.31 | 0.002 | 15.116667 | 0.009967 | 15.172222 | 15.172222 | 15.151302 | 15.141545 | 0.009757 | 91.024248 | 0.108194 |
| 3 | 2017-01-09 | 15.264667 | 15.461333 | 15.200000 | 15.418667 | 15.418667 | 59692500 | -0.37 | -0.29 | -1.04 | 0.002 | 15.267333 | 0.009912 | 15.233833 | 15.233833 | 15.192435 | 15.162072 | 0.030363 | 95.477309 | 0.051808 |
| 4 | 2017-01-10 | 15.466667 | 15.466667 | 15.126000 | 15.324667 | 15.324667 | 54900000 | 0.16 | 0.88 | 0.44 | 0.002 | 15.418667 | -0.006096 | 15.252000 | 15.252000 | 15.212778 | 15.174116 | 0.038662 | 71.686831 | 0.033339 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1584 | 2023-04-21 | 164.800003 | 166.000000 | 161.320007 | 165.080002 | 165.080002 | 123539000 | 0.07 | 0.26 | -0.92 | 0.018 | 162.990005 | 0.012823 | 186.435500 | 190.798200 | 179.692929 | 183.898405 | -4.205476 | 35.309424 | 0.156027 |
| 1585 | 2023-04-24 | 164.649994 | 165.649994 | 158.610001 | 162.550003 | 162.550003 | 140006600 | 0.00 | -0.40 | 0.47 | 0.018 | 165.080002 | -0.015326 | 185.042500 | 189.902800 | 177.055556 | 182.317042 | -5.261486 | 33.974557 | 0.142498 |
| 1586 | 2023-04-25 | 159.820007 | 163.470001 | 158.750000 | 160.669998 | 160.669998 | 121999300 | -1.76 | -0.99 | 0.10 | 0.018 | 162.550003 | -0.011566 | 183.485500 | 189.178400 | 174.534701 | 180.713557 | -6.178856 | 32.976902 | 0.136458 |
| 1587 | 2023-04-26 | 160.289993 | 160.669998 | 153.139999 | 153.750000 | 153.750000 | 153364100 | -0.41 | 0.15 | -0.75 | 0.018 | 160.669998 | -0.043070 | 181.713499 | 188.360600 | 171.337055 | 178.716257 | -7.379202 | 29.538571 | 0.129873 |
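Each left merge in this notebook produces `NaN` wherever the right-hand table has no row for a given date, and those rows are later removed with `notna()` filters. pandas can report how many rows failed to match through the `indicator=True` option of `merge`. The sketch below shows this on two tiny synthetic tables; the calendar-day series only mimics the shape of the ADS data and its values are made up.

```python
import pandas as pd

# Synthetic trading days (weekdays only) and a calendar-day series shaped like ADS
trading = pd.DataFrame({"Date": pd.to_datetime(["2023-06-15", "2023-06-16", "2023-06-19"])})
calendar = pd.DataFrame({
    "Date": pd.date_range("2023-06-13", "2023-06-17", freq="D"),
    "ADS_Index": [0.1, 0.2, 0.3, 0.4, 0.5],
})

merged = trading.merge(calendar, how="left", on="Date", indicator=True)
# '_merge' is 'both' for matched rows and 'left_only' for rows with no match
print(merged["_merge"].value_counts())
unmatched = merged.loc[merged["_merge"] == "left_only", "Date"]
print(unmatched)
```

Checking the `left_only` count after each merge makes it visible how many rows a later `notna()` filter is about to discard.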


### Adding Competitor Stock Data for Analysis

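* NIO Inc. (NIO)
* Li Auto Inc. (LI)
* Rivian Automotive Inc. (RIVN)
* General Motors Co. (GM)
* Toyota Motor Corp. (TM)
* Ford Motor Co. (F)
* Ferrari N.V. (RACE)

LI and RIVN are commented out in the code below; their U.S. listings date from July 2020 and November 2021, respectively. NIO's listing in September 2018 already determines where the combined sample starts: once rows with a missing competitor value are dropped, the data begin on 2018-09-13.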
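The per-ticker cells below repeat the same steps for every competitor: download, lag the adjusted close by one day, and left-merge on `Date`. The sketch below folds the lag-and-merge part into a loop. `add_lagged_closes` is a hypothetical helper that assumes each input frame has `Date` and `Adj Close` columns, as the `yf.download(...).reset_index()` output used in this notebook does; the tickers `OLD` and `NEW` and their prices are synthetic.

```python
import pandas as pd


def add_lagged_closes(base: pd.DataFrame, others: dict) -> pd.DataFrame:
    """Left-merge each ticker's previous-day adjusted close onto `base` by Date.

    `others` maps a ticker symbol to a DataFrame with 'Date' and 'Adj Close'
    columns (hypothetical helper; mirrors the per-ticker cells below).
    """
    out = base.copy()
    for ticker, prices in others.items():
        col = f"{ticker} Yest_Close"
        lagged = prices[["Date"]].copy()
        # shift(1): the value on each row is the prior trading day's adjusted close
        lagged[col] = prices["Adj Close"].shift(1)
        out = out.merge(lagged, how="left", on="Date")
    return out


# Synthetic data: "NEW" starts trading one day later, like a recent listing
dates = pd.to_datetime(["2018-09-11", "2018-09-12", "2018-09-13", "2018-09-14"])
base = pd.DataFrame({"Date": dates[1:]})
old = pd.DataFrame({"Date": dates, "Adj Close": [30.0, 31.0, 32.0, 33.0]})
new = pd.DataFrame({"Date": dates[1:], "Adj Close": [1.0, 2.0, 3.0]})

features = add_lagged_closes(base, {"OLD": old, "NEW": new})
# As in the notebook, keep only rows where every lagged close is available
features = features.dropna(subset=["OLD Yest_Close", "NEW Yest_Close"])
print(features)
```

The later-listed ticker pushes the first usable row forward by one day, which is the same effect NIO has on the real sample.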

In [25]:
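```python
# auto_adjust=False keeps the 'Adj Close' column (newer yfinance versions default to auto_adjust=True)
NIO = yf.download('NIO', start=start_date, end=end_date, auto_adjust=False)
NIO.reset_index(inplace=True)

# LI = yf.download('LI', start=start_date, end=end_date, auto_adjust=False)
# LI.reset_index(inplace=True)

# RIVN = yf.download('RIVN', start=start_date, end=end_date, auto_adjust=False)
# RIVN.reset_index(inplace=True)

GM = yf.download('GM', start=start_date, end=end_date, auto_adjust=False)
GM.reset_index(inplace=True)

TM = yf.download('TM', start=start_date, end=end_date, auto_adjust=False)
TM.reset_index(inplace=True)

F = yf.download('F', start=start_date, end=end_date, auto_adjust=False)
F.reset_index(inplace=True)

RACE = yf.download('RACE', start=start_date, end=end_date, auto_adjust=False)
RACE.reset_index(inplace=True)
```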



In [27]:
```python
# Previous trading day's adjusted close for each competitor
NIO['NIO Yest_Close'] = NIO['Adj Close'].shift(periods=1)
# LI['LI Yest_Close'] = LI['Adj Close'].shift(periods=1)
# RIVN['RIVN Yest_Close'] = RIVN['Adj Close'].shift(periods=1)
GM['GM Yest_Close'] = GM['Adj Close'].shift(periods=1)
TM['TM Yest_Close'] = TM['Adj Close'].shift(periods=1)
F['F Yest_Close'] = F['Adj Close'].shift(periods=1)
RACE['RACE Yest_Close'] = RACE['Adj Close'].shift(periods=1)
```

In [28]:
```python
TSLA_Final = TSLA_Final.merge(NIO[['Date', 'NIO Yest_Close']], how='left', on='Date')
# TSLA_Final = TSLA_Final.merge(LI[['Date', 'LI Yest_Close']], how='left', on='Date')
# TSLA_Final = TSLA_Final.merge(RIVN[['Date', 'RIVN Yest_Close']], how='left', on='Date')
TSLA_Final = TSLA_Final.merge(GM[['Date', 'GM Yest_Close']], how='left', on='Date')
TSLA_Final = TSLA_Final.merge(TM[['Date', 'TM Yest_Close']], how='left', on='Date')
TSLA_Final = TSLA_Final.merge(F[['Date', 'F Yest_Close']], how='left', on='Date')
TSLA_Final = TSLA_Final.merge(RACE[['Date', 'RACE Yest_Close']], how='left', on='Date')
```

In [29]:
```python
TSLA_Final
```

|  | Date | Open | High | Low | Close | Adj Close | Volume | Mkt-RF | SMB | HML | ... | EMA_12 | EMA_26 | MACD | RSI | ADS_Index | NIO Yest_Close | GM Yest_Close | TM Yest_Close | F Yest_Close | RACE Yest_Close |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017-01-04 | 14.316667 | 15.200000 | 14.287333 | 15.132667 | 15.132667 | 168202500 | 0.79 | 0.95 | -0.16 | ... | 15.132667 | 15.132667 | 0.000000 | 100.000000 | 0.147667 |  | 30.451347 | 118.550003 | 9.082819 | 55.978443 |
| 1 | 2017-01-05 | 15.094667 | 15.165333 | 14.796667 | 15.116667 | 15.116667 | 88675500 | -0.21 | -0.88 | -0.79 | ... | 15.130205 | 15.131481 | -0.001276 | 0.000000 | 0.127719 |  | 32.132019 | 121.190002 | 9.501247 | 56.424828 |
| 2 | 2017-01-06 | 15.128667 | 15.354000 | 15.030000 | 15.267333 | 15.267333 | 82918500 | 0.29 | -0.66 | -0.31 | ... | 15.151302 | 15.141545 | 0.009757 | 91.024248 | 0.108194 |  | 31.525587 | 120.440002 | 9.212676 | 56.377338 |
| 3 | 2017-01-09 | 15.264667 | 15.461333 | 15.200000 | 15.418667 | 15.418667 | 59692500 | -0.37 | -0.29 | -1.04 | ... | 15.192435 | 15.162072 | 0.030363 | 95.477309 | 0.051808 |  | 31.179066 | 120.129997 | 9.205462 | 55.978443 |
| 4 | 2017-01-10 | 15.466667 | 15.466667 | 15.126000 | 15.324667 | 15.324667 | 54900000 | 0.16 | 0.88 | 0.44 | ... | 15.212778 | 15.174116 | 0.038662 | 71.686831 | 0.033339 |  | 31.196383 | 119.739998 | 9.111676 | 55.351612 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1584 | 2023-04-21 | 164.800003 | 166.000000 | 161.320007 | 165.080002 | 165.080002 | 123539000 | 0.07 | 0.26 | -0.92 | ... | 179.692929 | 183.898405 | -4.205476 | 35.309424 | 0.156027 | 8.28 | 33.446861 | 133.800003 | 11.597337 | 274.025513 |
| 1585 | 2023-04-24 | 164.649994 | 165.649994 | 158.610001 | 162.550003 | 162.550003 | 140006600 | 0.00 | -0.40 | 0.47 | ... | 177.055556 | 182.317042 | -5.261486 | 33.974557 | 0.142498 | 8.33 | 33.456833 | 134.669998 | 11.538715 | 276.040985 |
| 1586 | 2023-04-25 | 159.820007 | 163.470001 | 158.750000 | 160.669998 | 160.669998 | 121999300 | -1.76 | -0.99 | 0.10 | ... | 174.534701 | 180.713557 | -6.178856 | 32.976902 | 0.136458 | 8.29 | 34.194778 | 134.839996 | 11.880674 | 280.190002 |
| 1587 | 2023-04-26 | 160.289993 | 160.669998 | 153.139999 | 153.750000 | 153.750000 | 153364100 | -0.41 | 0.15 | -0.75 | ... | 171.337055 | 178.716257 | -7.379202 | 29.538571 | 0.129873 | 7.90 | 32.818611 | 133.509995 | 11.643260 | 277.209991 |


In [30]:
```python
# Keep only rows where every competitor's previous close is available
# (add 'LI Yest_Close' and 'RIVN Yest_Close' here if those tickers are re-enabled)
competitor_cols = ['NIO Yest_Close', 'GM Yest_Close', 'TM Yest_Close', 'F Yest_Close', 'RACE Yest_Close']
TSLA_Final = TSLA_Final.dropna(subset=competitor_cols)
```

In [31]:
```python
TSLA_Final
```

|  | Date | Open | High | Low | Close | Adj Close | Volume | Mkt-RF | SMB | HML | ... | EMA_12 | EMA_26 | MACD | RSI | ADS_Index | NIO Yest_Close | GM Yest_Close | TM Yest_Close | F Yest_Close | RACE Yest_Close |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 426 | 2018-09-13 | 19.201332 | 19.666668 | 19.011999 | 19.297333 | 19.297333 | 95104500 | 0.45 | -0.44 | -0.40 | ... | 19.456098 | 20.246712 | -0.790613 | 43.193862 | -0.158008 | 6.60 | 31.649935 | 119.910004 | 7.492328 | 128.253830 |
| 427 | 2018-09-14 | 19.250668 | 19.822001 | 19.101334 | 19.680000 | 19.680000 | 101484000 | 0.12 | 0.27 | 0.25 | ... | 19.490545 | 20.204733 | -0.714188 | 46.042424 | -0.174923 | 11.60 | 31.826496 | 121.959999 | 7.516393 | 128.891281 |
| 428 | 2018-09-17 | 19.336000 | 20.058001 | 19.208668 | 19.656000 | 19.656000 | 103314000 | -0.71 | -0.45 | 0.84 | ... | 19.516000 | 20.164086 | -0.648087 | 45.887007 | -0.222019 | 9.90 | 32.179604 | 122.279999 | 7.580566 | 127.626030 |
| 429 | 2018-09-18 | 19.779333 | 20.176001 | 18.366667 | 18.997334 | 18.997334 | 248212500 | 0.57 | 0.05 | -0.48 | ... | 19.436205 | 20.077660 | -0.641455 | 41.724400 | -0.236714 | 8.50 | 32.542011 | 122.480003 | 7.660785 | 128.263474 |
| 430 | 2018-09-19 | 18.700666 | 20.000000 | 18.700001 | 19.934668 | 19.934668 | 124423500 | 0.06 | -0.66 | 1.28 | ... | 19.512891 | 20.067068 | -0.554177 | 48.837241 | -0.250940 | 7.68 | 32.616344 | 125.029999 | 7.684850 | 133.594925 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1584 | 2023-04-21 | 164.800003 | 166.000000 | 161.320007 | 165.080002 | 165.080002 | 123539000 | 0.07 | 0.26 | -0.92 | ... | 179.692929 | 183.898405 | -4.205476 | 35.309424 | 0.156027 | 8.28 | 33.446861 | 133.800003 | 11.597337 | 274.025513 |
| 1585 | 2023-04-24 | 164.649994 | 165.649994 | 158.610001 | 162.550003 | 162.550003 | 140006600 | 0.00 | -0.40 | 0.47 | ... | 177.055556 | 182.317042 | -5.261486 | 33.974557 | 0.142498 | 8.33 | 33.456833 | 134.669998 | 11.538715 | 276.040985 |
| 1586 | 2023-04-25 | 159.820007 | 163.470001 | 158.750000 | 160.669998 | 160.669998 | 121999300 | -1.76 | -0.99 | 0.10 | ... | 174.534701 | 180.713557 | -6.178856 | 32.976902 | 0.136458 | 8.29 | 34.194778 | 134.839996 | 11.880674 | 280.190002 |
| 1587 | 2023-04-26 | 160.289993 | 160.669998 | 153.139999 | 153.750000 | 153.750000 | 153364100 | -0.41 | 0.15 | -0.75 | ... | 171.337055 | 178.716257 | -7.379202 | 29.538571 | 0.129873 | 7.90 | 32.818611 | 133.509995 | 11.643260 | 277.209991 |


In [32]:
```python
TSLA_Final.to_csv('TSLA_Final_Dataset.csv', index=False, sep=',')
```

In [33]:
```python
# Second-to-last column, which is 'F Yest_Close' at this point
y = TSLA_Final.iloc[:, -2]
print(y)
```

```text
426      7.492328
427      7.516393
428      7.580566
429      7.660785
430      7.684850
          ...    
1584    11.597337
1585    11.538715
1586    11.880674
1587    11.643260
1588    11.415736
Name: F Yest_Close, Length: 1163, dtype: float64
```


### Correlation Matrix of the Final DataFrame to Check Multicollinearity Between Variables

In [36]:
```python
import plotly.express as px

x = TSLA_Final.iloc[1:, 1:]
y = TSLA_Final.iloc[:, -2]

# Pairwise Pearson correlations of the numeric columns (Date is excluded)
corrmat = TSLA_Final.corr(numeric_only=True)
top_corr_features = corrmat.index

# Static alternative with seaborn:
# plt.figure(figsize=(16, 12))
# hm = sns.heatmap(TSLA_Final[top_corr_features].corr(), annot=True, cmap="YlOrRd")

# Interactive heatmap with the coefficient printed in each cell
fig = px.imshow(corrmat, text_auto=True)
fig.show()
```
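A correlation matrix only shows pairwise relationships. The variance inflation factor (VIF) also catches a regressor that is explained by a combination of the others: $\mathrm{VIF}_j = 1/(1-R_j^2)$, where $R_j^2$ comes from regressing column $j$ on all the other columns. The sketch below computes it with NumPy through the equivalent identity that the VIFs are the diagonal entries of the inverse correlation matrix; statsmodels also offers `variance_inflation_factor` in `statsmodels.stats.outliers_influence`. `vif_from_corr` is a hypothetical helper, the data are synthetic, and the common rule of thumb that values above roughly 5 to 10 deserve attention is a convention rather than a hard threshold.

```python
import numpy as np
import pandas as pd


def vif_from_corr(X: pd.DataFrame) -> pd.Series:
    """Variance inflation factor of each column: diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X.to_numpy(dtype=float), rowvar=False)
    return pd.Series(np.diag(np.linalg.inv(corr)), index=X.columns, name="VIF")


# Synthetic regressors: c is almost a + b, d is unrelated to the others
rng = np.random.default_rng(0)
n = 500
a = rng.normal(size=n)
b = rng.normal(size=n)
c = a + b + rng.normal(scale=0.05, size=n)
d = rng.normal(size=n)
X_demo = pd.DataFrame({"a": a, "b": b, "c": c, "d": d})

vif = vif_from_corr(X_demo)
print(vif.round(1))
```

On the notebook's data, the same function could be applied to the factor-model regressors, for example `vif_from_corr(TSLA_Final[['Mkt-RF', 'SMB', 'HML', 'ADS_Index']])`.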

In [37]:
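```python
# Fama-French factors are in percent, so divide by 100 to get decimals.
# ADS_Index is an index rather than a percentage: dividing it by 100 only
# rescales its coefficient.
X = TSLA_Final[['Mkt-RF', 'SMB', 'HML', 'ADS_Index']] / 100
# RF is also in percent; convert it before subtracting it from the decimal return
y = TSLA_Final['stock_return'] - TSLA_Final['RF'] / 100
X = sm.add_constant(X)

ff_model = sm.OLS(y, X).fit()
print(ff_model.summary())
```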

```text
                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.315
Model:                            OLS   Adj. R-squared:                  0.312
Method:                 Least Squares   F-statistic:                     133.0
Date:                Mon, 28 Aug 2023   Prob (F-statistic):           1.67e-93
Time:                        16:59:03   Log-Likelihood:                 2228.2
No. Observations:                1163   AIC:                            -4446.
Df Residuals:                    1158   BIC:                            -4421.
Df Model:                           4                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.0029      0.001     -2.732      0.0
```

In [38]:
```python
X
```

|  | const | Mkt-RF | SMB | HML | ADS_Index |
|---|---|---|---|---|---|
| 426 | 1.0 | 0.0045 | -0.0044 | -0.0040 | -0.001580 |
| 427 | 1.0 | 0.0012 | 0.0027 | 0.0025 | -0.001749 |
| 428 | 1.0 | -0.0071 | -0.0045 | 0.0084 | -0.002220 |
| 429 | 1.0 | 0.0057 | 0.0005 | -0.0048 | -0.002367 |
| 430 | 1.0 | 0.0006 | -0.0066 | 0.0128 | -0.002509 |
| ... | ... | ... | ... | ... | ... |
| 1584 | 1.0 | 0.0007 | 0.0026 | -0.0092 | 0.001560 |
| 1585 | 1.0 | 0.0000 | -0.0040 | 0.0047 | 0.001425 |
| 1586 | 1.0 | -0.0176 | -0.0099 | 0.0010 | 0.001365 |
| 1587 | 1.0 | -0.0041 | 0.0015 | -0.0075 | 0.001299 |


In [39]:
```python
y
```

```text
426    -0.011717
427     0.011830
428    -0.009220
429    -0.041510
430     0.041340
          ...   
1584   -0.005177
1585   -0.033326
1586   -0.029566
1587   -0.061070
1588    0.023886
Length: 1163, dtype: float64
```

In [40]:
```python
ff_model.params
```

```text
const       -0.002871
Mkt-RF       1.393967
SMB          0.749854
HML         -0.716656
ADS_Index    0.014625
dtype: float64
```
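As a cross-check on `sm.OLS`, the same least-squares coefficients can be computed directly with NumPy by adding a column of ones for the intercept and solving $\min_\beta \lVert y - X\beta \rVert^2$. The sketch below uses synthetic factor data with known coefficients (the "true" values are arbitrary choices for the demonstration, not estimates for TSLA) and keeps every series, including the risk-free rate, in decimal units.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000
# Synthetic daily series in decimal units (0.01 = 1%)
mkt_rf = rng.normal(0.0005, 0.012, n)
smb = rng.normal(0.0, 0.006, n)
hml = rng.normal(0.0, 0.008, n)
rf = np.full(n, 0.00005)

# Arbitrary "true" coefficients for the demo: alpha, Mkt-RF, SMB, HML
true_beta = np.array([0.001, 1.4, 0.7, -0.7])
X_sim = np.column_stack([np.ones(n), mkt_rf, smb, hml])  # column of ones = intercept
stock_return = rf + X_sim @ true_beta + rng.normal(0.0, 0.002, n)

# Excess return on the left-hand side, with RF already in decimals
y_sim = stock_return - rf
beta_hat, *_ = np.linalg.lstsq(X_sim, y_sim, rcond=None)
print(dict(zip(["const", "Mkt-RF", "SMB", "HML"], beta_hat.round(4))))
```

Fitting `sm.OLS(y_sim, X_sim).fit()` on the same arrays should give `params` that agree with `beta_hat` to floating-point precision, since both solve the same least-squares problem.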