#         TIME SERIES FORECASTING USING LSTM

Time series forecasting occurs when you make scientific predictions based on historical time stamped data. It involves building models through historical analysis and using them to make observations and drive future strategic decision-making. An important distinction in forecasting is that at the time of the work, the future outcome is completely unavailable and can only be estimated through careful analysis and evidence-based priors.

                          NETFLIX STOCK PRICE PREDICTION :
                          
          The Dataset contains data for 5 years ie. from 5th Feb 2018 to 5th Feb 2022

The art of forecasting stock prices has been a difficult task for many of the researchers and analysts. In fact, 
investors are highly interested in the research area of stock price prediction. For a good and successful investment,
many investors are keen on knowing the future situation of the stock market. Good and effective prediction systems for the 
stock market help traders, investors, and analyst by providing supportive information like the future direction of the stock 
market.


# STEPS INVOLVED:

1.Import Dependencies/Librries that is required

2.preprocessing the data ie,feature extraction

3.select the training and testing features

4.select suitable model

5.prediction

In [2]:
!pip install pandas seaborn matplotlib tensorflow keras sklearn plotly

In [3]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.io as pio
import plotly.graph_objects as go
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense,LSTM,Dropout

In [42]:
data=pd.read_csv('C:/Users/Aarathy Sha/Downloads/NFLX.csv')
data

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume
0,2018-02-05,262.000000,267.899994,250.029999,254.259995,254.259995,11896100
1,2018-02-06,247.699997,266.700012,245.000000,265.720001,265.720001,12595800
2,2018-02-07,266.579987,272.450012,264.329987,264.559998,264.559998,8981500
3,2018-02-08,267.079987,267.619995,250.000000,250.100006,250.100006,9306700
4,2018-02-09,253.850006,255.800003,236.110001,249.470001,249.470001,16906900
...,...,...,...,...,...,...,...
1004,2022-01-31,401.970001,427.700012,398.200012,427.140015,427.140015,20047500
1005,2022-02-01,432.959991,458.480011,425.540009,457.130005,457.130005,22542300
1006,2022-02-02,448.250000,451.980011,426.480011,429.480011,429.480011,14346000
1007,2022-02-03,421.440002,429.260010,404.279999,405.600006,405.600006,9905200


In [43]:
import plotly.express as px
fig = px.line(data, x='Date', y="Close")
fig.update_traces(marker=dict(size=12,
                              line=dict(width=2,
                                        color='DarkSlateGrey')))

fig.show()

In [44]:
from matplotlib import pyplot as plt
import seaborn as sns


In [45]:
data.value_counts()

Date        Open        High        Low         Close       Adj Close   Volume  
2018-02-05  262.000000  267.899994  250.029999  254.259995  254.259995  11896100    1
2020-10-14  562.609985  572.489990  541.000000  541.450012  541.450012  9499000     1
2020-09-25  474.390015  484.869995  468.029999  482.880005  482.880005  3769400     1
2020-09-28  489.109985  492.000000  477.880005  490.649994  490.649994  4773500     1
2020-09-29  489.500000  496.290009  486.529999  493.480011  493.480011  3541500     1
                                                                                   ..
2019-06-14  341.630005  343.399994  336.160004  339.730011  339.730011  5019000     1
2019-06-17  342.690002  351.769989  342.059998  350.619995  350.619995  5358200     1
2019-06-18  355.570007  361.500000  353.750000  357.119995  357.119995  5428500     1
2019-06-19  361.720001  364.739990  356.119995  363.519989  363.519989  5667200     1
2022-02-04  407.309998  412.769989  396.640015  410.170013 

In [46]:
new_data=data[['Close']]
new_data=np.array(new_data)


In [47]:
data.isnull().sum()

Date         0
Open         0
High         0
Low          0
Close        0
Adj Close    0
Volume       0
dtype: int64

In [51]:
#create a new data frame with only the 'closing'column
new_data=data['Close']

#converting new ata to np array
new_data=np.array(new_data)
new_data_reshape=new_data.reshape(-1,1)

#get the no:of rows to train 
train_len=int(np.ceil(len(new_data)* 0.8))

#scale the data
scaler=MinMaxScaler(feature_range=(0,1))
scale_data=scaler.fit_transform(new_data_reshape)


#create training dataset
train_data=scale_data[0:int(train_len),:]

#split data into x_train and y_train
x_train=[]
y_train=[]

for i in range(60, len(train_data)):
    x_train.append(train_data[i-60:i,0])
    y_train.append(train_data[i,0])

In [52]:
print(x_train)

[array([0.04451626, 0.06954849, 0.06701469, 0.03542955, 0.03405342,
       0.05257641, 0.05327534, 0.0701601 , 0.10133021, 0.09750767,
       0.09757319, 0.10301218, 0.09667768, 0.11369343, 0.13167034,
       0.12391599, 0.12559796, 0.12343551, 0.14672022, 0.1771914 ,
       0.19951508, 0.19064677, 0.18156003, 0.2131015 , 0.19095254,
       0.17911361, 0.19149862, 0.19049385, 0.18472731, 0.17387127,
       0.18265218, 0.18042421, 0.15906164, 0.14647998, 0.18887749,
       0.1459339 , 0.11334393, 0.13426968, 0.10137394, 0.10875693,
       0.12026823, 0.13125532, 0.12007165, 0.12243068, 0.14021101,
       0.15244317, 0.16463161, 0.16987394, 0.16142066, 0.22319301,
       0.21982915, 0.21585376, 0.20508505, 0.18525152, 0.15976057,
       0.15700838, 0.17496343, 0.17011425, 0.17164323, 0.17347804]), array([0.06954849, 0.06701469, 0.03542955, 0.03405342, 0.05257641,
       0.05327534, 0.0701601 , 0.10133021, 0.09750767, 0.09757319,
       0.10301218, 0.09667768, 0.11369343, 0.13167034, 0.12

In [53]:
print(y_train)

[0.17360909661393864, 0.16996133223364263, 0.18830954230997266, 0.20178677968013004, 0.2031629073403567, 0.21061135325098623, 0.20908237397009044, 0.20222360063491573, 0.20674514453645698, 0.20150280816170107, 0.20600248491297135, 0.1995150752463799, 0.19724337299694217, 0.21393154942398507, 0.2134946607555186, 0.2421091647764957, 0.2520914893870261, 0.2564601139542174, 0.25305259116043277, 0.26137481659230777, 0.25713724420919526, 0.27533253713548766, 0.27943905515020884, 0.2881544393186327, 0.2917586070100605, 0.27854347837668547, 0.2767305275773607, 0.2786527333084864, 0.2838513419356371, 0.31901878280740115, 0.3472837881257538, 0.3453397851423501, 0.34188853460096014, 0.37373584482909394, 0.3994670413455388, 0.39658373165669414, 0.3870819601171792, 0.32895744301538254, 0.3615255479010432, 0.3418667373486821, 0.35285382376654373, 0.3441383740687515, 0.3588824819830223, 0.3421506411534302, 0.3593412356174476, 0.3808785219690166, 0.4042943518334746, 0.3969987575435141, 0.4035953566125

In [54]:
#convert x_train,y_train to numpy arrays
x_train,y_train=np.array(x_train),np.array(y_train)

In [55]:
#reshape the data in to the shape accepted by lstm
x_train=np.reshape(x_train, (x_train.shape[0],x_train.shape[1],1))

In [56]:
from keras.layers import Dropout

In [57]:
model= Sequential()
model.add(LSTM(units = 50, return_sequences = True, input_shape = (x_train.shape[1], 1)))
model.add(Dropout(0.2))
model.add(LSTM(units = 50, return_sequences = True))
model.add(Dropout(0.2))
model.add(LSTM(units = 50, return_sequences = True))
model.add(Dropout(0.2))
model.add(LSTM(units = 50))
model.add(Dropout(0.2))
model.add(Dense(units = 1))
model.compile(optimizer = 'adam', loss = 'mean_squared_error')
model.fit(x_train, y_train, epochs = 100, batch_size = 32)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78

Epoch 100/100


<keras.src.callbacks.History at 0x22af288ef20>

In [58]:
#prepare test data
test_data=scale_data[train_len-60:,:]

#prepare input sequence an target vaues for testing
x_test,y_test=[], []
for i in range(60,len(test_data)):
    x_test.append(test_data[i-60:i,0])
    y_test.append(test_data[i,0])

In [59]:
#convert to np array
x_test,y_test=np.array(x_test),np.array(y_test)

In [60]:
#reshape the data in to the shape accepted by lstm
x_test=np.reshape(x_test, (x_test.shape[0],x_test.shape[1],1))

In [61]:
#use the model to make predictions on test data
predictions=model.predict(x_test)



In [62]:
#transforms the actual vaues back to their orginal scale
predictions=scaler.inverse_transform(predictions)
y_test=np.asarray(y_test).reshape(-1,1)
y_test=scaler.inverse_transform(y_test)

In [63]:
#calculating rmse
rmse=np.sqrt(np.mean(((predictions - y_test)**2)))

In [64]:
print(f'rmse:{rmse}')

rmse:30.413055877328215


In [65]:
#splitting the data an preictions to vaidations data
train=data[:train_len]
valid=data[train_len:]
valid['predictions']=predictions



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



In [66]:
#creating the plot
fig=go.Figure()
fig.add_trace(go.Scatter(x=train.index,y=train['Close'],name='train'))
fig.add_trace(go.Scatter(x=valid.index,y=valid['Close'],name='Val'))
fig.add_trace(go.Scatter(x=valid.index,y=valid['predictions'],name='predictions'))

#set layout
fig.update_layout(
    title='LSTM MODEL -NFLX- TRAINED MODEL',
    xaxis=dict(title='Date'),
    yaxis=dict(title='close-price'),
)
fig.show()

In [75]:
#get the last 100 ays preiction
last_100=new_data[-100:].reshape(-1,1)

#scale the  last_100
last_100_scale=scaler.fit_transform(last_100)

In [76]:
#create an empty list
X_test=[]
X_test.append(last_100_scale)

#convert to numpy array
X_test=np.array(X_test)

#reshape
X_test=np.reshape(X_test, (X_test.shape[0],X_test.shape[1],1))

#prediction scaled
pred_price=model.predict(X_test)

#undo scaing
pred_prices=scaler.inverse_transform(np.array(pred_price).reshape(-1,1))

print(pred_prices)

[[435.71237]]


In [77]:
#Create traces 
trace1= go.Scatter( 
    x=train.index,
    y=train['Close'], 
    mode='lines', 
    name='Train'
)    

trace2 = go.Scatter(
    x=valid.index,
    y=valid['Close'], 
    mode='lines', 
    name='val'
)

trace3= go.Scatter(
    x=valid.index,
    y=valid['predictions'], 
    mode='lines', 
    name='Predictions'
)

#Create a list of dates for the x-axis

dates=pd.date_range(start=pd.Timestamp.today(), 
                    end=len(pred_prices),freq='M')



#Add a scatter plot for the predicted prices

trace4= go.Scatter(
    x=dates,
    y=pred_prices.flatten(),
    mode="lines",
    name= 'Predicted Prices'
)

data = [trace1, trace2, trace3, trace4]

#edit the layout

layout=dict(
    title='LSTM model PETROL Price Prediction',
    xaxis=dict(title='Date'),
    yaxis=dict(title='predicted prices')

)
go. Figure(fig).show()