## Stock Price Prediction

### Many people now trade in the stock market from home. If you have stock market experience, combining it with your machine learning skills for stock price prediction can work in your favor.


### Let’s see how to predict stock prices using machine learning and the Python programming language.

### I will start by importing the Python libraries needed for this task:


In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

### Reading the data

In [2]:
df = pd.read_csv("price.csv") 

In [3]:
df.head(10)

|   | Date | Open | High | Low | Close | Volume |
|---|------|------|------|-----|-------|--------|
| 0 | 1/3/2012 | 325.25 | 332.83 | 324.97 | 663.59 | 7380500 |
| 1 | 1/4/2012 | 331.27 | 333.87 | 329.08 | 666.45 | 5749400 |
| 2 | 1/5/2012 | 329.83 | 330.75 | 326.89 | 657.21 | 6590300 |
| 3 | 1/6/2012 | 328.34 | 328.77 | 323.68 | 648.24 | 5405900 |
| 4 | 1/9/2012 | 322.04 | 322.29 | 309.46 | 620.76 | 11688800 |
| 5 | 1/10/2012 | 313.7 | 315.72 | 307.3 | 621.43 | 8824000 |
| 6 | 1/11/2012 | 310.59 | 313.52 | 309.4 | 624.25 | 4817800 |
| 7 | 1/12/2012 | 314.43 | 315.26 | 312.08 | 627.92 | 3764400 |
| 8 | 1/13/2012 | 311.96 | 312.3 | 309.37 | 623.28 | 4631800 |
| 9 | 1/17/2012 | 314.81 | 314.81 | 311.67 | 626.86 | 3832800 |


#### Summary Of The Data


In [4]:
df.info() # Summary of data

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1258 entries, 0 to 1257
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Date    1258 non-null   object 
 1   Open    1258 non-null   float64
 2   High    1258 non-null   float64
 3   Low     1258 non-null   float64
 4   Close   1258 non-null   object 
 5   Volume  1258 non-null   object 
dtypes: float64(3), object(3)
memory usage: 59.1+ KB
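
The `object` dtype reported for `Close` and `Volume` suggests those columns were read as text rather than numbers, most likely because some values contain thousands separators such as `1,234.50`. As a minimal sketch, assuming that is the cause, `pd.read_csv` can parse such values while loading through its `thousands` parameter. The CSV text below is made up for illustration and is not taken from `price.csv`.

```python
import io
import pandas as pd

# Hypothetical sample; the real price.csv may be laid out differently.
sample_csv = io.StringIO(
    'Date,Close,Volume\n'
    '1/3/2012,"1,663.59","7,380,500"\n'
    '1/4/2012,"1,666.45","5,749,400"\n'
)

# thousands="," tells the parser to treat commas inside numbers as separators.
sample = pd.read_csv(sample_csv, thousands=",")
print(sample.dtypes)
print(sample)
```

If this works on the real file, the manual string cleanup in the next cell would no longer be needed for these two columns.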


In [5]:
# Convert the "Date" column to datetime
df["Date"] = pd.to_datetime(df["Date"])

# Clean and convert the "Close" and "Volume" columns to numeric (strip thousands separators first)
df["Close"] = df["Close"].str.replace(",", "").astype(float)
df["Volume"] = df["Volume"].str.replace(",", "").astype(float)

# Extract useful features from the "Date" column
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month
df["Day"] = df["Date"].dt.day
df["DayOfWeek"] = df["Date"].dt.dayofweek  # Monday=0, Sunday=6
df["ElapsedDays"] = (df["Date"] - df["Date"].min()).dt.days

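# Display the updated dataframe
# Note: df.info() prints its summary directly and returns None,
# so wrapping it in print() also prints "None" after the summary
print(df.info())
print(df.head())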


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1258 entries, 0 to 1257
Data columns (total 11 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   Date         1258 non-null   datetime64[ns]
 1   Open         1258 non-null   float64       
 2   High         1258 non-null   float64       
 3   Low          1258 non-null   float64       
 4   Close        1258 non-null   float64       
 5   Volume       1258 non-null   float64       
 6   Year         1258 non-null   int32         
 7   Month        1258 non-null   int32         
 8   Day          1258 non-null   int32         
 9   DayOfWeek    1258 non-null   int32         
 10  ElapsedDays  1258 non-null   int64         
dtypes: datetime64[ns](1), float64(5), int32(4), int64(1)
memory usage: 88.6 KB
None
        Date    Open    High     Low   Close      Volume  Year  Month  Day  \
0 2012-01-03  325.25  332.83  324.97  663.59   7380500.0  2012      1    3   
1 2012-01-04  3
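
To make the cleaning and feature-extraction steps easier to follow, here is a small self-contained sketch that applies the same operations to three made-up rows. The frame `raw` and its values are hypothetical; only the column names and the transformations mirror the cell above.

```python
import pandas as pd

# Hypothetical rows in the same shape as the raw data (values are made up).
raw = pd.DataFrame({
    "Date": ["1/3/2012", "1/4/2012", "1/9/2012"],
    "Close": ["663.59", "1,004.28", "620.76"],
})

# Strip thousands separators, then cast the text to float.
raw["Close"] = raw["Close"].str.replace(",", "").astype(float)

# Parse the dates (month/day/year) and derive calendar features.
raw["Date"] = pd.to_datetime(raw["Date"])
raw["DayOfWeek"] = raw["Date"].dt.dayofweek  # Monday=0, Sunday=6
raw["ElapsedDays"] = (raw["Date"] - raw["Date"].min()).dt.days

print(raw)
```

`ElapsedDays` counts calendar days since the earliest date, so weekends and holidays show up as jumps larger than one.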

#### Separating The DataFrame Into Independent (Features) And Dependent (Target) Variables


In [6]:
x = df[["Open", "High", "Low", "Volume", "ElapsedDays"]]  # Feature columns (independent variables)
y = df["Close"]  # Target variable

#### Split Data Into Training And Test Sets

In [7]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
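
With `test_size=0.2` and 1258 rows, this split leaves 1006 rows for training and 252 for testing. `train_test_split` also shuffles rows by default, so training and test rows are interleaved in time. The sketch below uses a placeholder frame with the same row count to check the sizes, and shows `shuffle=False` as one possible way to hold out the most recent rows instead. Whether a chronological split suits this task better is a judgment call, not something the notebook itself does.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Placeholder data with the same row count as the dataset (values are arbitrary).
n_rows = 1258
x_demo = pd.DataFrame({"ElapsedDays": np.arange(n_rows)})
y_demo = pd.Series(np.arange(n_rows, dtype=float))

# Same call as in the notebook: a shuffled 80/20 split.
x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=42
)
print(len(x_tr), len(x_te))

# Chronological alternative: no shuffling, so the test set is the last 20% of rows.
x_tr_c, x_te_c, y_tr_c, y_te_c = train_test_split(
    x_demo, y_demo, test_size=0.2, shuffle=False
)
print(x_te_c.index.min(), x_te_c.index.max())
```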

#### Create A Linear Regression Model And Use The fit() Method With The Training Dataset

In [8]:
# Train model
model = LinearRegression()
model.fit(x_train, y_train)
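
As a rough illustration of what `fit()` learns, the sketch below trains `LinearRegression` on synthetic, noise-free data and checks that `predict()` is the same as a weighted sum of the features plus the intercept. The data and the names `X_demo`, `y_demo`, and `demo_model` are made up for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data generated from y = 3*a - 2*b + 5, with no noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))
y_demo = 3 * X_demo[:, 0] - 2 * X_demo[:, 1] + 5

demo_model = LinearRegression().fit(X_demo, y_demo)

# predict() is equivalent to the features times coef_, plus intercept_.
manual = X_demo @ demo_model.coef_ + demo_model.intercept_
print(np.allclose(manual, demo_model.predict(X_demo)))
print(demo_model.coef_, demo_model.intercept_)
```

The same relationship holds for the stock model: each prediction is the five feature values multiplied by their coefficients, plus the intercept.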

#### Printing R2 Value, Coefficients And Intercept

In [9]:
# Optional: r2_score(y_true, y_pred) returns the same value as model.score(x, y_true)
from sklearn.metrics import r2_score

In [10]:
print("R2 Value", model.score(x_test, y_test))
print("\ncoefficient:\n", model.coef_)
print("\nintercept:", model.intercept_)

R2 Value 0.4931204838221701

coefficient:
 [-9.14023279e-01 -4.91895270e+00  8.25935245e+00  2.08499568e-05
 -6.01348015e-01]

intercept: -41.5130323619145
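
The R2 value returned by `model.score` is the coefficient of determination, defined as one minus the ratio of the residual sum of squares to the total sum of squares. A value near 0.49 means the model accounts for roughly half of the variance of `Close` in the test set. The sketch below computes it by hand on made-up numbers and compares the result with `r2_score`.

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up actual and predicted values, for illustration only.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 6.0, 9.5])

ss_res = np.sum((y_true - y_hat) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, r2_score(y_true, y_hat))
```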


#### Printing MSE

In [11]:
y_pred = model.predict(x_test)

In [12]:
print("MSE:", mean_squared_error(y_test, y_pred))

MSE: 14493.358904735853
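
MSE is expressed in squared price units, which makes it hard to read directly. Taking the square root gives the root mean squared error (RMSE) in the same units as `Close`. Using the MSE printed above, that works out to an error of about 120 per prediction in a root-mean-square sense.

```python
import math

mse = 14493.358904735853  # value printed by the cell above
rmse = math.sqrt(mse)     # back in the same units as the Close price

print(f"RMSE: {rmse:.2f}")
```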


#### Using The Trained Model To Predict On Test Data And Then Compare With The Actual Test Data


In [13]:
pred = pd.DataFrame({"Actual": y_test, "Predict":y_pred})
df1 = pred.head(10)
df1

|      | Actual | Predict |
|------|--------|---------|
| 561  | 558.46 | 780.702708 |
| 101  | 592.71 | 660.304275 |
| 51   | 623.33 | 776.957197 |
| 63   | 640.86 | 735.352318 |
| 1073 | 736.1  | 781.362129 |
| 424  | 893.74 | 698.069801 |
| 793  | 569.78 | 593.614754 |
| 1045 | 718.81 | 690.625544 |
| 1143 | 741.19 | 750.668991 |
| 1037 | 697.35 | 687.304749 |
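
To see how far off individual predictions are, one option is to add error columns to the comparison. The sketch below re-enters the ten rows shown above by hand into a frame named `sample` (a name chosen for this example) and computes the signed and absolute error for each row.

```python
import pandas as pd

# The ten rows from the comparison above, copied by hand.
sample = pd.DataFrame(
    {
        "Actual": [558.46, 592.71, 623.33, 640.86, 736.1,
                   893.74, 569.78, 718.81, 741.19, 697.35],
        "Predict": [780.702708, 660.304275, 776.957197, 735.352318, 781.362129,
                    698.069801, 593.614754, 690.625544, 750.668991, 687.304749],
    },
    index=[561, 101, 51, 63, 1073, 424, 793, 1045, 1143, 1037],
)

# Positive error means the model predicted too high.
sample["Error"] = sample["Predict"] - sample["Actual"]
sample["AbsError"] = sample["Error"].abs()

print(sample)
print("Mean absolute error on these rows:", sample["AbsError"].mean())
```

In the notebook itself, the same two columns could be added to `pred` directly, which would cover the whole test set rather than ten rows.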
