# Tesla Stock Price Prediction

Author: Xiangyi Zhu

## Introduction

The stock market is very volatile, but with a large amount of data we may still be able to find some trends. In this project, I import Tesla stock price data from 2010 to 2017. I aim to use linear regression and K-Nearest Neighbor (KNN) to predict the stock price and see whether these methods work in a volatile market. I also aim to see whether the predictions can support correct trading decisions.

## Section1: Read and Organize the Dataset

In [None]:
import pandas as pd

In [None]:
df=pd.read_csv("Tesla.csv").dropna()

In [None]:
df.head()

Unnamed: 0,Date,Open,High,Low,Close,Volume,Adj Close
0,6/29/2010,19.0,25.0,17.540001,23.889999,18766300,23.889999
1,6/30/2010,25.790001,30.42,23.299999,23.83,17187100,23.83
2,7/1/2010,25.0,25.92,20.27,21.959999,8218800,21.959999
3,7/2/2010,23.0,23.1,18.709999,19.200001,5139800,19.200001
4,7/6/2010,20.0,20.0,15.83,16.110001,6866900,16.110001


In [None]:
df["Date"]=pd.to_datetime(df["Date"])

Create new columns in the dataframe, which will be used in the following sections.

In [None]:
# this column contains the difference between the high price and the open price
df["Diff_high"]=df["High"]-df["Open"]

In [None]:
# this column contains the difference between the open price and the low price
df["Diff_low"]=df["Open"]-df["Low"]

In [None]:
# Worth is 1 if the stock is worth trading (close is at least as high as open), otherwise 0
df["Trade"]=df["Close"]-df["Open"]

# vectorized comparison instead of a row-by-row loop
df["Worth"]=(df["Trade"]>=0).astype(int)

In [None]:
df.head()

Unnamed: 0,Date,Open,High,Low,Close,Volume,Adj Close,Diff_high,Diff_low,Trade,Worth
0,2010-06-29,19.0,25.0,17.540001,23.889999,18766300,23.889999,6.0,1.459999,4.889999,1
1,2010-06-30,25.790001,30.42,23.299999,23.83,17187100,23.83,4.629999,2.490002,-1.960001,0
2,2010-07-01,25.0,25.92,20.27,21.959999,8218800,21.959999,0.92,4.73,-3.040001,0
3,2010-07-02,23.0,23.1,18.709999,19.200001,5139800,19.200001,0.1,4.290001,-3.799999,0
4,2010-07-06,20.0,20.0,15.83,16.110001,6866900,16.110001,0.0,4.17,-3.889999,0


To predict the stock price, we first need to analyze the original data. The close price, the last price at which a security traded during the regular trading day, is the standard benchmark investors use to track its performance over time. So, we first use a chart to see how the close price changes over time.

In [None]:
import altair as alt

In [None]:
type(df["Close"])

pandas.core.series.Series

In [None]:
interval = alt.selection_interval()

big = alt.Chart(df).mark_line().encode(
    x="Date",
    y="Close",
    tooltip=["Date","Close"]
).add_selection(
    interval
)

small = alt.Chart(df).mark_line().encode(
    x="Date",
    y="Close"
).transform_filter(
    interval
)

big|small

## Section2: Linear Regression

I plan to use the data from 2013 and 2014 as the train data for a linear regression, then predict the close price in 2015 to check the accuracy.

In [None]:
from sklearn.linear_model import LinearRegression

In [None]:
reg = LinearRegression()

In [None]:
# select the rows that contain the stock price in 2013 and 2014;
# .copy() makes df_train an independent dataframe so that columns can be added to it later
df_train=df[df["Date"].dt.year.isin([2013,2014])].copy()

I plan to use the open price, the highest price, and the lowest price of each day to predict that day's close price. So, the train data contains the columns "Open", "High", and "Low".

In [None]:
# create the train data
X_train=df_train[["Open","High","Low"]]

# the true value of close price in 2013 and 2014
y_train=df_train["Close"]

In [None]:
# fit the model on the train data
reg.fit(X_train, y_train)

LinearRegression()
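One way to see what a fitted linear regression has learned is to inspect its coefficients. The sketch below does not use the notebook's data: it builds a small synthetic table (the names `toy` and `toy_reg`, and the rule Close = 0.5*High + 0.5*Low, are assumptions made only for illustration) and checks that `coef_` and `intercept_` recover that rule. The same two attributes could be read from `reg` after it has been fitted.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic prices (hypothetical, NOT the Tesla data)
rng = np.random.default_rng(0)
n = 50
open_price = rng.uniform(100, 200, n)
toy = pd.DataFrame({
    "Open": open_price,
    "High": open_price + rng.uniform(0, 10, n),
    "Low": open_price - rng.uniform(0, 10, n),
})
# Assumed rule for the illustration: close is the midpoint of high and low
toy["Close"] = 0.5 * toy["High"] + 0.5 * toy["Low"]

features = ["Open", "High", "Low"]
toy_reg = LinearRegression().fit(toy[features], toy["Close"])

# One coefficient per feature, in the same order as the columns
coefs = pd.Series(toy_reg.coef_, index=features)
print(coefs)
print("intercept:", toy_reg.intercept_)
```

Coefficients close to 0 for "Open" and close to 0.5 for "High" and "Low" indicate that the model found the rule used to build the toy data.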

In [None]:
df_train["Pred"]=reg.predict(X_train)

Then, I plan to compare the predicted close value for 2015 with the true close value for 2015.

In [None]:
# select the rows that contain the stock price in 2015;
# .copy() makes df_test an independent dataframe so that columns can be added to it later
df_test=df[df["Date"].dt.year==2015].copy()

In [None]:
X_test=df_test[["Open","High","Low"]]
y_test=df_test["Close"]

In [None]:
df_test["Pred"]=reg.predict(X_test)

In [None]:
# visualize the close price for 2015
c_true=alt.Chart(df_test).mark_line().encode(
    x="Date",
    y=alt.Y("Close", scale=alt.Scale(domain = [150,300])),
    tooltip=["Date","Close"]
)
c_true.properties(title = "Close Price for 2015")

In [None]:
# visualize the predicted close price for 2015
c_pred=alt.Chart(df_test).mark_line().encode(
    x="Date",
    y=alt.Y("Pred", scale=alt.Scale(domain = [150,300])),
    tooltip=["Date","Pred"],
    color=alt.value("#FFAA00"),
)
c_pred.properties(title = "Predicted Close Price for 2015")

In [None]:
# absolute gap between the close price and the open price on each day of 2015
df_test["difference"]=abs(df_test["Close"]-df_test["Open"])

In the end, I put the two graphs together to make a more direct comparison between the true price and the predicted price.

In [None]:
c_together=c_true+c_pred

c_together=c_together.add_selection(
    interval
)

difference = alt.Chart(df_test).mark_line().encode(
    x="Date",
    y="difference"
).transform_filter(
    interval
)

c_together|difference

We can see from the chart that the true close price and the predicted close price are very close, since the two lines overlap each other a lot. Also, most values of the "difference" column, the absolute gap between the close price and the open price, are below 14, which is small compared with the true close price, which lies within the 150-300 range shown on the chart. Next, we use the mean squared error to evaluate the performance.
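Since the "difference" column is computed from the close and open prices, it may also help to look at the prediction error itself. A minimal sketch, using only the first five 2015 rows that `df_test.head()` prints in this notebook (the names `sample` and `abs_error` are hypothetical):

```python
import pandas as pd

# Close and Pred values copied from the first five 2015 rows of df_test
sample = pd.DataFrame({
    "Close": [219.309998, 210.089996, 211.279999, 210.949997, 210.619995],
    "Pred": [215.748410, 210.402004, 208.881669, 211.647192, 211.323862],
})

# Absolute prediction error: how far the predicted close is from the true close
sample["abs_error"] = (sample["Close"] - sample["Pred"]).abs()
print(sample)
```

On the full test set, the same expression could be applied to `df_test["Close"]` and `df_test["Pred"]` and plotted in place of "difference".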

In [None]:
from sklearn.metrics import mean_squared_error

In [None]:
mean_squared_error(df_test["Close"],df_test["Pred"])

3.5567349637513797

In [None]:
mean_squared_error(df_train["Close"],df_train["Pred"])

3.2347263243425592

The mean squared error again indicates good performance: the MSE for the train data (about 3.23) is slightly less than the MSE for the test data (about 3.56), and the two values are very close, so the model does not appear to overfit.
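The mean squared error is the average of the squared gaps between the true and predicted values. The sketch below computes it by hand on the first five 2015 rows printed by `df_test.head()`, checks it against `mean_squared_error`, and takes the square root (RMSE) so the error is expressed in price units. The comparison with a naive guess, "close equals open", is an added assumption for context and not part of the original analysis. Five rows are far too few to be representative.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Values copied from the first five 2015 rows of df_test
open_price = np.array([222.869995, 214.550003, 210.059998, 213.350006, 212.809998])
close = np.array([219.309998, 210.089996, 211.279999, 210.949997, 210.619995])
pred = np.array([215.748410, 210.402004, 208.881669, 211.647192, 211.323862])

# MSE by hand: mean of the squared differences
mse_manual = np.mean((close - pred) ** 2)
mse_sklearn = mean_squared_error(close, pred)

# RMSE is in the same unit as the price
rmse = np.sqrt(mse_manual)

# Naive baseline (assumption for comparison): predict close = open
mse_naive = mean_squared_error(close, open_price)

print("model MSE:", mse_manual, "sklearn MSE:", mse_sklearn)
print("model RMSE:", rmse)
print("naive MSE:", mse_naive)
```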

## Section3: K-Nearest Neighbor (KNN) Classification

In this section, I aim to use the difference between the open price and the high price, and the difference between the open price and the low price, to classify each trading day into one of two categories: worth trading, which means the close price is at least as high as the open price, and not worth trading, which means the close price is lower than the open price.

Similarly, I will use the data from 2013 and 2014 to predict the labels for 2015.

In [None]:
from sklearn.neighbors import KNeighborsClassifier

In [None]:
# create the train data
X_train2=df_train[["Diff_high","Diff_low"]]

# the true label (worth trading or not) in 2013 and 2014
y_train2=df_train["Worth"]

In [None]:
classifier = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)

classifier.fit(X_train2, y_train2)

KNeighborsClassifier()
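To illustrate how the classifier decides, here is a small sketch on made-up points (the values and the names `toy_train`, `toy_knn`, and `toy_query` are hypothetical, not taken from the Tesla data). With `n_neighbors=3` and the default Euclidean distance, each query point receives the majority label of its three nearest train points.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Made-up train points: a large Diff_high with a small Diff_low is labeled 1
toy_train = pd.DataFrame({
    "Diff_high": [6.0, 5.0, 7.0, 0.5, 0.2, 1.0],
    "Diff_low": [0.5, 1.0, 0.2, 5.0, 6.0, 4.0],
    "Worth": [1, 1, 1, 0, 0, 0],
})
features = ["Diff_high", "Diff_low"]

toy_knn = KNeighborsClassifier(n_neighbors=3)
toy_knn.fit(toy_train[features], toy_train["Worth"])

# Two made-up query days
toy_query = pd.DataFrame({"Diff_high": [5.5, 0.4], "Diff_low": [0.8, 5.5]})

# kneighbors returns the distances to, and row positions of, the nearest train points
distances, neighbor_idx = toy_knn.kneighbors(toy_query)
toy_pred = toy_knn.predict(toy_query)

print(neighbor_idx)
print(toy_pred)
```

The first query sits among the points labeled 1 and the second among the points labeled 0, so the predictions follow the neighbors' labels.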

In [None]:
# the test data contains the stock price in 2015
X_test2=df_test[["Diff_high","Diff_low"]]

y_test2=df_test["Worth"]

In [None]:
df_test.head()

Unnamed: 0,Date,Open,High,Low,Close,Volume,Adj Close,Diff_high,Diff_low,Trade,Worth,Pred,difference
1136,2015-01-02,222.869995,223.25,213.259995,219.309998,4764400,219.309998,0.380005,9.61,-3.559997,0,215.74841,3.559997
1137,2015-01-05,214.550003,216.5,207.160004,210.089996,5368500,210.089996,1.949997,7.389999,-4.460007,0,210.402004,4.460007
1138,2015-01-06,210.059998,214.199997,204.210007,211.279999,6261900,211.279999,4.139999,5.849991,1.220001,1,208.881669,1.220001
1139,2015-01-07,213.350006,214.779999,209.779999,210.949997,2968400,210.949997,1.429993,3.570007,-2.400009,0,211.647192,2.400009
1140,2015-01-08,212.809998,213.800003,210.009995,210.619995,3442500,210.619995,0.990005,2.800003,-2.190003,0,211.323862,2.190003


In [None]:
df_test["Pred2"] = classifier.predict(X_test2)

To make it easier to see whether the stock is worth trading, I highlight the cells whose value is 1, which means worth trading, in the "Worth" and "Pred2" columns.

In [None]:
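# restrict the highlight to the two label columns, where the column maximum is 1
df_test.style.highlight_max(subset = ["Worth","Pred2"], color = 'lightgreen', axis = 0)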

Unnamed: 0,Date,Open,High,Low,Close,Volume,Adj Close,Diff_high,Diff_low,Trade,Worth,Pred,difference,Pred2
1136,2015-01-02 00:00:00,222.869995,223.25,213.259995,219.309998,4764400,219.309998,0.380005,9.61,-3.559997,0,215.74841,3.559997,0
1137,2015-01-05 00:00:00,214.550003,216.5,207.160004,210.089996,5368500,210.089996,1.949997,7.389999,-4.460007,0,210.402004,4.460007,0
1138,2015-01-06 00:00:00,210.059998,214.199997,204.210007,211.279999,6261900,211.279999,4.139999,5.849991,1.220001,1,208.881669,1.220001,0
1139,2015-01-07 00:00:00,213.350006,214.779999,209.779999,210.949997,2968400,210.949997,1.429993,3.570007,-2.400009,0,211.647192,2.400009,0
1140,2015-01-08 00:00:00,212.809998,213.800003,210.009995,210.619995,3442500,210.619995,0.990005,2.800003,-2.190003,0,211.323862,2.190003,0
1141,2015-01-09 00:00:00,208.919998,209.979996,204.960007,206.660004,4668300,206.660004,1.059998,3.959991,-2.259994,0,206.623287,2.259994,0
1142,2015-01-12 00:00:00,203.050003,204.470001,199.25,202.210007,5950300,202.210007,1.419998,3.800003,-0.839996,0,201.176774,0.839996,0
1143,2015-01-13 00:00:00,203.320007,207.610001,200.910004,204.25,4477300,204.25,4.289994,2.410003,0.929993,1,204.859632,0.929993,0
1144,2015-01-14 00:00:00,185.830002,195.199997,185.0,192.690002,11513900,192.690002,9.369995,0.830002,6.86,1,192.767266,6.86,1
1145,2015-01-15 00:00:00,194.490005,195.75,190.0,191.869995,5216500,191.869995,1.259995,4.490005,-2.62001,0,191.974819,2.62001,0


In [None]:
from sklearn.metrics import confusion_matrix,accuracy_score

In [None]:
ac = accuracy_score(y_test2,classifier.predict(X_test2))
ac

0.8373015873015873

We can see from above that the accuracy is about 0.837, which means the classifier labels roughly 84% of the trading days in 2015 correctly.
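Accuracy alone does not show which kind of mistake the classifier makes. `confusion_matrix` is imported above but not used, so the sketch below applies it to the ten 2015 rows displayed in the highlighted table (the names `worth`, `pred2`, and `baseline` are hypothetical). Rows of the matrix are the true labels and columns are the predicted labels. The majority-class baseline is an added point of comparison, not part of the original analysis.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Worth and Pred2 values copied from the ten 2015 rows shown in the table
worth = np.array([0, 0, 1, 0, 0, 0, 0, 1, 1, 0])
pred2 = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 0])

# Layout: [[true 0 / pred 0, true 0 / pred 1],
#          [true 1 / pred 0, true 1 / pred 1]]
cm = confusion_matrix(worth, pred2)
acc = accuracy_score(worth, pred2)

# Accuracy of always predicting the more frequent label
baseline = max(worth.mean(), 1 - worth.mean())

print(cm)
print("accuracy:", acc, "majority-class baseline:", baseline)
```

On these ten rows, two of the three days that were worth trading are predicted as 0, which the accuracy figure by itself would not reveal.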

## Summary

In this project, I use linear regression to predict the close price for Tesla and find that the prediction is very close to the true price. I also use K-Nearest Neighbor to classify each trading day as worth trading, where the close price is at least as high as the open price, or not worth trading, where the close price is lower than the open price. The K-Nearest Neighbor classifier also has high accuracy. Therefore, although the stock market is volatile, we can still use some methods available in Python to figure out the trend and the relationships between different prices.

## References

https://www.kaggle.com/datasets/rpaguirre/tesla-stock-price
Tesla Stock Price Dataset

https://www.analyticsvidhya.com/blog/2021/01/a-quick-introduction-to-k-nearest-neighbor-knn-classification-using-python/
