# TSLA Stock Prices: Analysis & Predictions

## Table of Contents
- [Importing required libraries](#Importing-required-libraries)
- [Loading the Data from Yahoo Finance](#Loading-the-Data-from-Yahoo-Finance)
- [Data Cleaning and Preparation](#Data-Cleaning-and-Preparation)
- [Data Exploration & Visualization](#Data-Exploration-&-Visualization)
- [Machine Learning](#Machine-Learning)

## Importing required libraries

In [None]:
import pandas as pd
import numpy as np
import yfinance as yf
from yahoofinancials import YahooFinancials
import plotly.graph_objects as go
import plotly.express as px
import chart_studio
import chart_studio.plotly as py

In [1]:
USER = 'XXX'
API_KEY = 'XXXX'
chart_studio.tools.set_credentials_file(username=USER, api_key=API_KEY)

## Loading the Data from Yahoo Finance

In [2]:
df = yf.download('TSLA', progress=False)

## Data Cleaning and Preparation

### Check the dimensions 

In [3]:
df.shape

(2954, 6)

### Check columns names

In [4]:
df.columns

Index(['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume'], dtype='object')

* Open: The price of the stock when the market opens for the trading day
* Close: The price of the stock when the market closes for the trading day
* Adj Close: The closing price adjusted for splits and dividend distributions
* High: The highest price the stock reached during that day
* Low: The lowest price at which the stock traded during that day
* Volume: The total number of shares traded on that day

In [5]:
df.head(5)

Date,Open,High,Low,Close,Adj Close,Volume
2010-06-29,3.8,5.0,3.508,4.778,4.778,93831500
2010-06-30,5.158,6.084,4.66,4.766,4.766,85935500
2010-07-01,5.0,5.184,4.054,4.392,4.392,41094000
2010-07-02,4.6,4.62,3.742,3.84,3.84,25699000
2010-07-06,4.0,4.0,3.166,3.222,3.222,34334500


In [6]:
df.tail(5)

Date,Open,High,Low,Close,Adj Close,Volume
2022-03-16,809.0,842.0,802.26001,840.22998,840.22998,28009600
2022-03-17,830.98999,875.0,825.719971,871.599976,871.599976,22194300
2022-03-18,874.48999,907.849976,867.390015,905.390015,905.390015,33408500
2022-03-21,914.97998,942.849976,907.090027,921.159973,921.159973,27327200
2022-03-22,930.0,997.799927,921.75,993.97998,993.97998,35114038


### Check for Nulls

In [7]:
df.isnull().values.any()

False

### Keep only 2020-2022 data to focus on recent values

In [8]:
df = df.loc[df.index > '2020-01-01']

In [9]:
df.shape

(560, 6)

In [10]:
Min_date = df.index.min()
Max_date = df.index.max()
print ("First date is", Min_date)
print ("Last date is", Max_date)
print ("The period of time is", Max_date - Min_date)

First date is 2020-01-02 00:00:00
Last date is 2022-03-22 00:00:00
The period of time is 810 days 00:00:00


## Data Exploration & Visualization

### Plot Historic Values

### Candlestick Chart

In [11]:
fig = go.Figure(data=[go.Candlestick(x=df.index,
                open=df['Open'],
                high=df['High'],
                low=df['Low'],
                close=df['Close'])])

fig.update_layout(xaxis_rangeslider_visible=True, title="TSLA Candlestick Chart 2020-2022")
fig.update_yaxes(title_text="Stock_Price")
fig.update_xaxes(title_text="Date")
#fig.show()

py.iplot(fig, filename='tsla_candle')

## Summary of the data

In [12]:
df.describe()

Statistic,Open,High,Low,Close,Adj Close,Volume
count,560.0,560.0,560.0,560.0,560.0,560.0
mean,571.976986,585.388638,557.616068,572.275191,572.275191,49027190.0
std,304.882229,311.428528,297.432757,304.687301,304.687301,36985310.0
min,74.940002,80.972,70.101997,72.244003,72.244003,9800600.0
25%,298.02449,304.976997,288.407005,297.473488,297.473488,23575620.0
50%,630.945007,650.580017,619.25,640.365021,640.365021,34593300.0
75%,785.89502,801.549988,775.430008,791.50499,791.50499,66608300.0
max,1234.410034,1243.48999,1217.0,1229.910034,1229.910034,304694000.0


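## How much has Tesla stock grown over these two years?

In [13]:
# Last opening price as a percentage of the first opening price
# (the percentage growth is this value minus 100)
(df['Open'].iloc[-1]*100)/df['Open'].iloc[0]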

1095.4063407366616
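
The value above is the last opening price expressed as a percentage of the first one, not the percentage increase. A minimal sketch of the difference, using the first and last `Open` values printed earlier in this notebook (the variable names are illustrative only):

```python
# First and last opening prices of the 2020-2022 window, copied from the tables above
first_open = 84.900002
last_open = 930.0

# Ratio of last to first price, expressed as a percentage (what the cell above computes)
ratio_pct = last_open * 100 / first_open

# Percentage growth: how much the price increased relative to the starting price
growth_pct = (last_open / first_open - 1) * 100

print(f"Last open is {ratio_pct:.1f}% of the first open")
print(f"Growth over the period is {growth_pct:.1f}%")
```

The two figures always differ by exactly 100 percentage points.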

### What is the all-time high stock price?

In [14]:
print ("The all time High value was:", df['High'].max(), 'USD [$]')

The all time High value was: 1243.489990234375 USD [$]


In [15]:
df[df['High'] == df['High'].max()]

Date,Open,High,Low,Close,Adj Close,Volume
2021-11-04,1234.410034,1243.48999,1217.0,1229.910034,1229.910034,25397400


### What is the lowest stock price in the 2020-2022 window?

In [16]:
print ("The lowest Low value in the 2020-2022 window was:", df['Low'].min(), 'USD [$]')

The lowest Low value in the 2020-2022 window was: 70.10199737548828 USD [$]


In [17]:
df[df['Low'] == df['Low'].min()]

Date,Open,High,Low,Close,Adj Close,Volume
2020-03-18,77.800003,80.972,70.101997,72.244003,72.244003,118931000


## Plot Only Historic Adjusted Close values

Adjusted close is the closing price after adjustments for all applicable splits and dividend distributions.

In [18]:
fig = px.line(df, x=df.index, y=df['Adj Close'], title = 'Adjusted Close values')
#fig.show()

py.iplot(fig, filename='tsla_adj_close')

## Volume of Stock traded

Volume measures the number of shares traded in a stock or contracts traded in futures or options. Volume can indicate market strength, as rising markets on increasing volume are typically viewed as strong and healthy. When prices fall on increasing volume, the trend is gathering strength to the downside.

In [19]:
fig = px.line(df, x=df.index , y=df['Volume'], title = 'Volume of Stock traded')
#fig.show()

py.iplot(fig, filename='tsla_volume')

In [20]:
df[df['Volume'] == df['Volume'].max()]

Date,Open,High,Low,Close,Adj Close,Volume
2020-02-04,176.591995,193.798004,166.776001,177.412003,177.412003,304694000


## Market capitalization

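Market capitalization (or market cap) is the total dollar value of all the shares of a company's stock. To calculate a company's market capitalization, multiply its stock's current price by the total number of outstanding shares. Note that the `MarketCap` column computed in this notebook multiplies the opening price by the daily trading volume, so it measures the dollar value traded each day rather than true market capitalization.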
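
To make the distinction concrete, here is a small sketch that contrasts the two quantities. The price and volume are the 2020-01-02 values from this dataset; the `shares_outstanding` figure is a hypothetical placeholder, since the downloaded price data does not include it.

```python
def market_cap(price, shares_outstanding):
    """Market capitalization: price times the total number of outstanding shares."""
    return price * shares_outstanding


def dollar_volume(price, shares_traded):
    """Dollar value traded in a day: price times the number of shares traded that day."""
    return price * shares_traded


open_price = 84.900002                # Open on 2020-01-02 (from the table in this notebook)
volume = 47_660_500                   # Volume on 2020-01-02 (from the table in this notebook)
shares_outstanding = 1_000_000_000    # hypothetical figure, for illustration only

traded_value = dollar_volume(open_price, volume)
cap = market_cap(open_price, shares_outstanding)

print(f"Dollar volume: {traded_value:,.0f} USD")
print(f"Market cap (hypothetical share count): {cap:,.0f} USD")
```

The dollar volume reproduces the first `MarketCap` value shown in this notebook, which confirms what that column actually measures.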

In [21]:
df_add = df.copy()
df_add['MarketCap'] = df['Open'] * df['Volume']
df_add.head()

Date,Open,High,Low,Close,Adj Close,Volume,MarketCap
2020-01-02,84.900002,86.139999,84.342003,86.052002,86.052002,47660500,4046377000.0
2020-01-03,88.099998,90.800003,87.384003,88.601997,88.601997,88892500,7831429000.0
2020-01-06,88.094002,90.311996,88.0,90.307999,90.307999,50665000,4463283000.0
2020-01-07,92.279999,94.325996,90.671997,93.811996,93.811996,89410500,8250801000.0
2020-01-08,94.739998,99.697998,93.646004,98.428001,98.428001,155721500,14753050000.0


In [22]:
fig = go.Figure()

market_cap = go.Scatter(x=df_add.index , y=df_add['MarketCap'], name = 'Market Capitalization')

fig.add_trace(market_cap)

# Add title
fig.update_layout(title="Market Capitalization")  
fig.update_yaxes(title_text="MarketCap")
fig.update_xaxes(title_text="Date")
##fig.show()

py.iplot(fig, filename='tsla_mrketcp')

In [23]:
df_add[df_add['MarketCap'] == df_add['MarketCap'].max()]

Date,Open,High,Low,Close,Adj Close,Volume,MarketCap
2020-12-18,668.900024,695.0,628.539978,695.0,695.0,222126200,148580200000.0


## Moving Averages

In statistics, a moving average is a calculation to analyze data points by creating a series of averages of different subsets of the full data set. It is also called a moving mean or rolling mean and is a type of finite impulse response filter. Variations include: simple, cumulative, or weighted forms.
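
As a small illustration of the simple form, the sketch below computes a 3-period moving average on toy prices (made-up values, not TSLA data) with `rolling(...).mean()` and checks it against a manual average of each window.

```python
import pandas as pd

# Toy price series, for illustration only
prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])
window = 3

# Simple moving average: mean of the current value and the (window - 1) preceding ones.
# The first (window - 1) entries are NaN because the window is not yet full.
sma = prices.rolling(window).mean()

# Manual version: average each full window explicitly
manual = [
    prices.iloc[i - window + 1 : i + 1].sum() / window
    for i in range(window - 1, len(prices))
]

print(sma.tolist())
print(manual)
```

The leading NaN values are the reason the 50-day and 200-day averages only start after 50 and 200 trading days, respectively.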

In [24]:
df_add['MA50'] = df['Open'].rolling(50).mean()
df_add['MA200'] = df['Open'].rolling(200).mean()

In [25]:
fig = go.Figure()

adj_close = go.Scatter(x=df_add.index , y = df_add["Adj Close"], name = 'Adjusted Close')
ma_50 = go.Scatter(x=df_add.index , y = df_add["MA50"], name = '50-day Moving Average')
ma_200 = go.Scatter(x=df_add.index , y = df_add["MA200"], name = '200-day Moving Average')

# Add plots to the figure
fig.add_trace(adj_close)
fig.add_trace(ma_50)
fig.add_trace(ma_200)

# Add title
fig.update_layout(title="Moving Averages")
fig.update_xaxes(title_text="Date")
fig.update_yaxes(title_text="Stock_Price")
#fig.show()

py.iplot(fig, filename='tsla_mov')

## Returns per day

In [26]:
df_add['Returns'] = (df['Close']/df['Close'].shift(1)) -1
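
The expression above is the simple daily return, `close_t / close_(t-1) - 1`. A minimal sketch on the first three `Close` values from this dataset shows that it matches pandas' built-in `pct_change()`:

```python
import pandas as pd

# First three Close values of the 2020-2022 window, copied from the tables above
close = pd.Series([86.052002, 88.601997, 90.307999])

# Manual simple return: today's close divided by yesterday's close, minus one
manual_returns = close / close.shift(1) - 1

# Built-in equivalent
builtin_returns = close.pct_change()

print(manual_returns.round(6).tolist())
print(builtin_returns.round(6).tolist())
```

The first entry is NaN in both cases because there is no previous day to compare with.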

In [27]:
fig = px.histogram(df_add, x="Returns", marginal="box", hover_data=df.columns)
fig.update_yaxes(title_text="Frequency")
#fig.show()

py.iplot(fig, filename='tsla_voltile')

## Exponential Moving Average (EMA)

The exponential moving average (EMA) is a technical chart indicator that tracks the price of an investment (like a stock or commodity) over time. The EMA is a type of weighted moving average (WMA) that gives more weighting or importance to recent price data.
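
A minimal sketch of the recursion, assuming the common smoothing factor `alpha = 2 / (length + 1)` and a series seeded with the simple average of the first `length` values. The seeding is an assumption inferred from this notebook's output: the first non-missing `EMA_10` value (96.535599) equals the mean of the first ten adjusted closes. `ema_sma_seeded` is a hypothetical helper written for illustration, not a pandas_ta function.

```python
import numpy as np


def ema_sma_seeded(values, length):
    """EMA seeded with the simple mean of the first `length` values (illustrative helper)."""
    values = np.asarray(values, dtype=float)
    out = np.full(values.shape, np.nan)
    if len(values) < length:
        return out  # not enough data for a single full window
    alpha = 2.0 / (length + 1)                 # weight given to the newest observation
    out[length - 1] = values[:length].mean()   # seed: simple average of the first window
    for i in range(length, len(values)):
        # New EMA = alpha * newest price + (1 - alpha) * previous EMA
        out[i] = alpha * values[i] + (1 - alpha) * out[i - 1]
    return out


# First ten Adj Close values of the 2020-2022 window, copied from this notebook
adj_close = [86.052002, 88.601997, 90.307999, 93.811996, 98.428001,
             96.267998, 95.629997, 104.972, 107.584, 103.699997]

ema_10 = ema_sma_seeded(adj_close, 10)
print(ema_10)
```

Because recent prices get the weight `alpha` directly, the EMA reacts to price changes faster than a simple moving average of the same length.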

### Exponential moving average calculated over a 10-day period

In [28]:
import pandas_ta

df_add.ta.ema(close='Adj Close', length=10, append=True)

Date
2020-01-02           NaN
2020-01-03           NaN
2020-01-06           NaN
2020-01-07           NaN
2020-01-08           NaN
                 ...    
2022-03-16    821.691818
2022-03-17    830.766029
2022-03-18    844.334026
2022-03-21    858.302380
2022-03-22    882.971035
Name: EMA_10, Length: 560, dtype: float64

In [29]:
df_add.head(15)

Date,Open,High,Low,Close,Adj Close,Volume,MarketCap,MA50,MA200,Returns,EMA_10
2020-01-02,84.900002,86.139999,84.342003,86.052002,86.052002,47660500,4046377000.0,,,,
2020-01-03,88.099998,90.800003,87.384003,88.601997,88.601997,88892500,7831429000.0,,,0.029633,
2020-01-06,88.094002,90.311996,88.0,90.307999,90.307999,50665000,4463283000.0,,,0.019255,
2020-01-07,92.279999,94.325996,90.671997,93.811996,93.811996,89410500,8250801000.0,,,0.038801,
2020-01-08,94.739998,99.697998,93.646004,98.428001,98.428001,155721500,14753050000.0,,,0.049205,
2020-01-09,99.419998,99.760002,94.573997,96.267998,96.267998,142202000,14137720000.0,,,-0.021945,
2020-01-10,96.358002,96.987999,94.739998,95.629997,95.629997,64797500,6243758000.0,,,-0.006627,
2020-01-13,98.699997,105.125999,98.400002,104.972,104.972,132588000,13086440000.0,,,0.097689,
2020-01-14,108.851997,109.482002,104.980003,107.584,107.584,144981000,15781470000.0,,,0.024883,
2020-01-15,105.952003,107.568001,103.358002,103.699997,103.699997,86844000,9201296000.0,,,-0.036102,96.535599


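### Drop rows with missing values

In [30]:
# Note: this call acts on df, which has no missing values (see the null check above),
# so no rows are removed. The NaN warm-up rows of the indicators remain in df_add,
# as the following cells show.
df.dropna(inplace=True)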



SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
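
This warning typically appears when an in-place operation is applied to a DataFrame that was itself produced by slicing another one. One possible way to avoid it, sketched here on toy data (the frame and column names are made up for illustration), is to take an explicit `.copy()` when filtering and to drop missing values with a non-in-place `dropna` restricted to the indicator columns:

```python
import pandas as pd

# Toy daily prices spanning the turn of the year (illustrative values only)
idx = pd.date_range("2019-12-30", periods=6, freq="D")
prices = pd.DataFrame({"Close": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=idx)

# An explicit copy makes `recent` an independent DataFrame, so later
# modifications do not trigger SettingWithCopyWarning
recent = prices.loc[prices.index > "2020-01-01"].copy()

# Add an indicator whose first value is NaN (window not yet full)
recent["MA2"] = recent["Close"].rolling(2).mean()

# Drop only the rows where the chosen indicator is missing, returning a new frame
cleaned = recent.dropna(subset=["MA2"])

print(cleaned)
```

Restricting `dropna` with `subset` matters when several indicators with different warm-up lengths share a frame: dropping on all columns would remove every row in which the longest-window indicator is still missing.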



In [31]:
df_add.head()

Date,Open,High,Low,Close,Adj Close,Volume,MarketCap,MA50,MA200,Returns,EMA_10
2020-01-02,84.900002,86.139999,84.342003,86.052002,86.052002,47660500,4046377000.0,,,,
2020-01-03,88.099998,90.800003,87.384003,88.601997,88.601997,88892500,7831429000.0,,,0.029633,
2020-01-06,88.094002,90.311996,88.0,90.307999,90.307999,50665000,4463283000.0,,,0.019255,
2020-01-07,92.279999,94.325996,90.671997,93.811996,93.811996,89410500,8250801000.0,,,0.038801,
2020-01-08,94.739998,99.697998,93.646004,98.428001,98.428001,155721500,14753050000.0,,,0.049205,


In [32]:
df_add.tail()

Date,Open,High,Low,Close,Adj Close,Volume,MarketCap,MA50,MA200,Returns,EMA_10
2022-03-16,809.0,842.0,802.26001,840.22998,840.22998,28009600,22659770000.0,918.899401,848.279251,0.047812,821.691818
2022-03-17,830.98999,875.0,825.719971,871.599976,871.599976,22194300,18443240000.0,911.728199,849.333551,0.037335,830.766029
2022-03-18,874.48999,907.849976,867.390015,905.390015,905.390015,33408500,29215400000.0,906.284999,850.697001,0.038768,844.334026
2022-03-21,914.97998,942.849976,907.090027,921.159973,921.159973,27327200,25003840000.0,903.044598,852.373351,0.017418,858.30238
2022-03-22,930.0,997.799927,921.75,993.97998,993.97998,35114038,32656060000.0,900.037198,854.0642,0.079053,882.971035


In [33]:
fig = go.Figure()
adj_close = go.Scatter(x=df_add.index , y = df_add["Adj Close"], name = 'Adjusted Close')
ema_10 = go.Scatter(x=df_add.index , y = df_add["EMA_10"], name = 'Exponential Moving Average (EMA)')
# Add plots to the figure
fig.add_trace(adj_close)
fig.add_trace(ema_10)
# Add title
fig.update_layout(title="Adjusted Close & Exponential Moving Average (EMA)")
fig.update_xaxes(title_text="Date")
fig.update_yaxes(title_text="Stock_Price")
#fig.show()
py.iplot(fig, filename='tsla_em')

## Machine Learning

In [34]:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

### Define Output and Inputs

In [35]:
y = df['Adj Close']
X = df.drop(['Adj Close', 'Close'], axis = 1)

### Split dataset

In [36]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
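
By default `train_test_split` shuffles the rows, so the test set contains days scattered between training days. For time-ordered data, one alternative worth considering is a chronological split with `shuffle=False`, which keeps the last 20% of the rows as the test set. The sketch below uses toy arrays rather than the TSLA data and is not the split used in this notebook:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data in time order: 10 observations with 2 features each
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.arange(10)

# shuffle=False preserves the original order: the first 80% of rows go to
# training and the final 20% to testing
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.2, shuffle=False)

print("train targets:", y_tr.tolist())
print("test targets:", y_te.tolist())
```

With this split the model is evaluated only on observations that come after everything it was trained on.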

In [37]:
X_train.shape, X_test.shape

((448, 4), (112, 4))

In [38]:
y_train.shape, y_test.shape

((448,), (112,))

### Instantiate and Fit the Linear Regression Model

In [39]:
model = LinearRegression().fit(X_train, y_train)

In [40]:
y_test = y_test.sort_index()
X_test = X_test.sort_index()
y_pred = model.predict(X_test)

### Score

In [41]:
model.score(X_test, y_test)

0.998853624961664

In [42]:
model.score(X_train, y_train)

0.999136520282238

In [43]:
r2_score(y_test, y_pred)

0.998853624961664
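
For a regressor, `model.score` returns the coefficient of determination R², which is why it matches `r2_score` above. A minimal sketch of the underlying formula, `R² = 1 - SS_res / SS_tot`, checked against scikit-learn on toy values (the numbers and the `r2_manual` helper are illustrative, not taken from this notebook):

```python
import numpy as np
from sklearn.metrics import r2_score


def r2_manual(y_true, y_pred):
    """Coefficient of determination computed from its definition (illustrative helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares around the mean
    return 1.0 - ss_res / ss_tot


# Toy targets and predictions
y_true_toy = [3.0, -0.5, 2.0, 7.0]
y_pred_toy = [2.5, 0.0, 2.0, 8.0]

manual = r2_manual(y_true_toy, y_pred_toy)
library = r2_score(y_true_toy, y_pred_toy)
print(manual, library)
```

An R² close to 1 means the predictions explain almost all of the variance of the target around its mean.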

### Plot Results 

In [44]:
fig = go.Figure()
real_values = go.Scatter(x = X_test.index, y = y_test, name="Real Values")
pred_values = go.Scatter(x = X_test.index, y = y_pred, name="Predicted Values")
# Add plots to the figure
fig.add_trace(real_values)
fig.add_trace(pred_values) 
# Add title
fig.update_layout(title="Tesla Stock Prices")                                           
#fig.show()
py.iplot(fig, filename='tsla_regression')