# **Project Name**    -

# **YES Bank Stock Closing Price Prediction**




  # Ritika Sharma  

# **Project Summary -**

Financial forecasting plays a crucial role in stock market analysis, investment planning, and risk assessment. This project focuses on predicting closing prices using historical data and machine learning techniques. The dataset includes key financial indicators such as Open, High, Low, Close prices and Date. The primary objective is to develop an accurate regression model that generalizes well to unseen data while minimizing prediction errors.

Data Cleaning & Preprocessing

Before training the models, extensive data cleaning was performed to enhance model performance. This included:

Capping outliers at the Interquartile Range (IQR) bounds (winsorization) to prevent extreme values from skewing predictions.

Handling skewness using log transformation, bringing the price distributions closer to normal.

Feature scaling using StandardScaler to maintain consistency across different models.

The cleaned dataset was then split into training (80%) and testing (20%) sets for model evaluation.

Models Implemented & Performance Analysis

The following regression models were considered to identify the best-performing algorithm:

Linear Regression.

Decision Tree Regressor.




# **GitHub Link -**

# **Problem Statement**


Since 2018, YES Bank has been in the news because of the fraud case involving Rana Kapoor. It is therefore interesting to see how this affected the company's stock price, and whether time series models or other predictive models can do justice to such situations. Stock price prediction is a complex and highly dynamic task because of the many market factors involved. One such major factor is the 2018 fraud case, which may have caused significant fluctuations in the stock price. The goal is to analyze historical stock price data and develop a model that can provide reliable future price estimates.

 # **Objective**

The main objective is to predict the stock's monthly closing price.

### Import Libraries

In [None]:
! pip install mplfinance

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
import datetime as dt
import warnings
warnings.filterwarnings('ignore')
from scipy.stats import norm
import mplfinance as mpf

### Dataset Loading

In [None]:
# store the data in variable "df".
df = pd.read_csv("/content/sample_data/data_YesBank_StockPrices.csv")

### Dataset First View

In [None]:
 # Displays all rows
pd.set_option('display.max_rows', None) # if want to show specific number of rows, provide the number instead of 'None'

In [None]:
# First look at the data
df

### Dataset Rows & Columns count

In [None]:
# Number of rows and columns, followed by a preview of the first rows
print(df.shape)
df.head()

In [None]:
# Convert 'Date' to datetime using the '%b-%y' format, assuming years are in the 2000s
df['Date'] = pd.to_datetime(df['Date'], format='%b-%y', errors='coerce')
# Show column names, non-null counts and dtypes
df.info()

In [None]:
# set date column as the index of the dataset
df.set_index("Date",inplace = True)

In [None]:
# give all the columns name present in the data
df.columns

### Dataset Information

In [None]:
# Summary statistics of the data via describe()
df.describe()

#### Duplicate Values

In [None]:
# Drop duplicate rows in place
df.drop_duplicates(inplace=True)

In [None]:
df.info()

#### Missing Values/Null Values

## Handle Missing Data (if any)

If any missing values are found in Open, High, Low, or Close, fill them using one of:

* Forward-fill (ffill())

* Backward-fill (bfill())

* Interpolation (interpolate())
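
As a rough illustration of how the three filling strategies differ, here is a minimal sketch on a toy series. The dates and prices are made up for the example and are not taken from the YES Bank dataset.

```python
import numpy as np
import pandas as pd

# Toy monthly closing prices with two gaps (hypothetical values)
idx = pd.date_range("2020-01-01", periods=5, freq="MS")
close = pd.Series([10.0, np.nan, 14.0, np.nan, 18.0], index=idx, name="Close")

filled_ffill = close.ffill()          # carry the last known value forward
filled_bfill = close.bfill()          # pull the next known value backward
filled_interp = close.interpolate()   # linear interpolation between neighbours

print(pd.DataFrame({
    "ffill": filled_ffill,
    "bfill": filled_bfill,
    "interpolate": filled_interp,
}))
```

For price data, forward-fill only uses past information, whereas backward-fill and interpolation look at later values, which may matter when the series is later used for forecasting.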

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
# Visualizing the missing values

# There are no missing values in the dataset

### What did you know about your dataset?

The dataset is a time series dataset. A time series dataset is a collection of data points ordered in time, where each data entry corresponds to a specific timestamp. Time series data is used to analyze trends, patterns, and future predictions.

Key Characteristics of Time Series Data

Time Dependency → The order of data matters, as past values influence future ones.

Regular Intervals → Data is often recorded at consistent intervals (daily, hourly, monthly, etc.).

Trend & Seasonality → Time series data often exhibits trends (long-term growth/decline) and seasonal patterns (recurring fluctuations).

Autocorrelation → Future values can be correlated with past values.
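
To make the autocorrelation point concrete, the sketch below compares the lag-1 autocorrelation of a trending series with that of pure noise. Both series are synthetic and only serve as an illustration; `Series.autocorr` is the pandas method used.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic series: an upward trend plus noise, and pure noise (hypothetical data)
trend = pd.Series(np.arange(60, dtype=float) + rng.normal(0, 1, 60))
noise = pd.Series(rng.normal(0, 1, 60))

# Correlation of each series with itself shifted by one step
lag1_trend = trend.autocorr(lag=1)
lag1_noise = noise.autocorr(lag=1)

print(f"Lag-1 autocorrelation (trend): {lag1_trend:.3f}")
print(f"Lag-1 autocorrelation (noise): {lag1_noise:.3f}")
```

A value close to 1 for the trending series reflects the time dependency described above, while the noise series should stay much closer to 0.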



## ***2. Understanding The Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description

"Variables Description" explains each column (feature) in the dataset, which helps in understanding the data.

Date  - The date of the record (DateTime)

Open  - Opening price of the stock in that month

High  - Highest price of the stock in that month

Low   - Lowest price of the stock in that month

Close - Closing price of the stock in that month

### Check Unique Values for each variable.

In [None]:
for col in df.columns :
  print(col,df[col].nunique())
  print()

## 3. ***Data Wrangling***

##### 1. Feature engineering:

create  new features like month and year

##### 2. compute  moving averages:

##### 3. compute volatility:

##### 4. compute percentage change:


In [None]:
# Extract month and year from the Date index as new features
df["Month"] = df.index.month
df["Year"] = df.index.year

In [None]:
# Sample five rows to check the new columns
df.sample(5)

In [None]:
# Compute a moving average over a five-month window
# It smooths out price fluctuations and helps identify trends:
# a price above the moving average suggests an uptrend, otherwise a downtrend
df["Moving_Average"] = df["Close"].rolling(window=5).mean()

In [None]:
round(df["Close"].mean(),2)

In [None]:
# Fill NaN values (the first rows of the rolling mean) using backward fill
df = df.bfill()
df

##### Volatility: Measures how much the price fluctuates over time.

* High volatility means rapid price movements (riskier but good for traders).

* Low volatility means stable prices (safer for long-term investors).



In [None]:
# Use the monthly High-Low range as a simple volatility measure
df['Volatility'] = df['High'] - df['Low']
df.sample(10)

##### Percentage change: Measures the percentage increase or decrease in price from the previous period.

* A positive percentage change → Price is increasing.

* A negative percentage change → Price is decreasing.

* Consistently high % change → Strong trend or breakout.
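
As a small worked example of the percentage-change formula, (current - previous) / previous, the sketch below uses three made-up prices and checks that `pct_change()` matches the manual calculation.

```python
import pandas as pd

# Hypothetical closing prices for three consecutive months
close = pd.Series([100.0, 110.0, 99.0], name="Close")

# pandas computes (current - previous) / previous for each row
ret = close.pct_change()

# The same calculation written out by hand
manual = (close - close.shift(1)) / close.shift(1)

print(pd.DataFrame({"Close": close, "Return": ret, "Manual": manual}))
```

The first row has no previous value, so its return is NaN; the second row is a 10% gain and the third a 10% loss.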


In [None]:
# compute percentage change in the stock over the periods.
df['Return'] = df['Close'].pct_change()

In [None]:
df = df.bfill()  # backward-fill the NaN in the first Return row
df.head()

### What all manipulations have you done and insights you found?

The Moving Average (MA) column suggests an overall rise, peak, and decline in the stock's price over the selected period.

1. In the early period there was an increase in the stock's price, indicating a bullish trend with more buyers and high popularity of the stock.

2. In the middle of the period the stock reaches its highest prices, marking the market peak.

3. Toward the end of the period, around the time of the fraud case, the stock's price starts decreasing again, indicating a bearish trend.

## ***4. Data Vizualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
sns.lineplot(x=df.index,y=df['Close'])
plt.title('Closing Price Over Time')
plt.xlabel('Date')
plt.ylabel('Closing Price')
plt.show()

##### 2. What is/are the insight(s) found from the chart?

Initially, there is an increasing trend in the stock's closing price. There is then a saturation around 17-19, followed by a sudden decrease in 2019 and a continuous decrease in the price after that.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The positive impact is that we can investigate the reason for the downfall and work on it to return to a bullish trend.

#### Chart - 2

In [None]:
# Chart - 2 visualization code
# Passing ci=None (errorbar=None in seaborn 0.12 and later) would remove the confidence intervals from the bars
sns.barplot(x=df['Year'], y=df['Moving_Average'])
plt.title('Moving Average Over Time')
plt.xlabel('Year')
plt.xticks(rotation = 45)
plt.ylabel('Moving Average')
plt.show()

##### 1. Why did you pick the specific chart?

To compare values year by year, the data needs to be grouped by year, so a bar plot is used. There is high fluctuation during 2016, 2017, 2018 and 2019.

##### 2. What is/are the insight(s) found from the chart?

The black line over each bar represents the error bar (confidence interval).

Longer black lines → More variability (higher standard deviation) in the moving average values for that year.

Shorter black lines → Less variability (more stable values).

No black lines → Indicates no variation (or error bars are disabled).



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

From 2016 onward, the public appears to lose confidence in the bank's stock. This persists until the end of the period and is most pronounced in 2019.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
sns.barplot(x=df['Year'],y=df['Volatility'])
plt.title('Volatility Over Time')
plt.xlabel('Year')
plt.xticks(rotation= 45)
plt.ylabel('Volatility')
plt.show()

##### 1. Why did you pick the specific chart?

To compare the volatility of the stock price over time.

##### 2. What is/are the insight(s) found from the chart?

The year 2018 has the highest volatility in the stock price.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Something went wrong with the business in 2018. It needs to be identified and resolved to bring the business back to its normal trend.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
import mplfinance as mpf

mpf.plot(df, type='candle', volume=False, style='charles', title='Stock Price Candlestick Chart')



##### 1. Why did you pick the specific chart?

To visualize market trends and bullish/bearish patterns in the stock's performance over the years.

##### 2. What is/are the insight(s) found from the chart?

The strongest bearish behaviour appears in 2018, around July.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Focusing on 2018, we can investigate exactly what happened to the company at that time.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
monthly_avg = df.groupby("Month")["Close"].mean()
sns.barplot(x=monthly_avg.index, y=monthly_avg.values, palette="coolwarm")
plt.xlabel("Month")
plt.ylabel("Avg Closing Price")
plt.title("Average Monthly Closing Price")
plt.show()


##### 1. Why did you pick the specific chart?

To compare stock performance across different months.

##### 2. What is/are the insight(s) found from the chart?

The average closing price is highest in the first quarter of the year.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Here, we can decide on actions for the other three quarters to make the stock price more consistent.

#### Chart - 6

In [None]:
# Chart - 6 visualization code
plt.scatter(df['Volatility'], df['Return'], color='purple', alpha=0.6)
plt.xlabel("Volatility")
plt.ylabel("Return")
plt.title("Stock Return vs. Volatility")
plt.show()


##### 1. Why did you pick the specific chart?

Shows risk-return tradeoff; higher volatility often means higher returns

##### 2. What is/are the insight(s) found from the chart?

Here the correlation between the two columns is negative: when volatility is low, returns are small, and as volatility increases, returns start to decrease.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

We could work on the stagnancy in the volatility of the stock price.

#### Chart - 7

In [None]:
# Chart - 7 visualization code
plt.hist(df['Return'], bins=30, color='blue', alpha=0.7)
plt.xlabel("Return")
plt.ylabel("Frequency")
plt.title("Distribution of Stock Returns")
plt.show()

##### 1. Why did you pick the specific chart?

To show how stock returns are distributed, identifying skewness & risk.

##### 2. What is/are the insight(s) found from the chart?

Stock returns appear to be approximately normally distributed.

#### Chart - 8

In [None]:
# Chart - 8 visualization code
plt.figure(figsize=(12,6))
plt.plot(df.index, df['Close'], label='Closing Price', color='blue')
plt.plot(df.index,df['Moving_Average'], label='Moving Average', color='orange')
plt.plot(df.index, df['Volatility'], label='Volatility', color='red', linestyle='dashed')
plt.legend()
plt.title('Stock Price, Moving Average & Volatility Over Time')
plt.xlabel('Date')
plt.ylabel('Values')
plt.xticks(rotation=45)
plt.show()


##### 1. Why did you pick the specific chart?

To find relationships between different stock indicators.

##### 2. What is/are the insight(s) found from the chart?

#### Chart - 9

In [None]:
# Chart - 9 visualization code
sns.boxplot(x=df["Month"], y=df["Volatility"], palette="pastel")
plt.xlabel("Month")
plt.ylabel("Volatility")
plt.title("Monthly Volatility Distribution")
plt.show()


##### 1. Why did you pick the specific chart?

To show how volatility varies by month and to detect outliers.

##### 2. What is/are the insight(s) found from the chart?

The highest volatility occurs in the ninth month of the year.



#### Chart - 10

In [None]:
# Chart - 10 visualization code
fig, ax1 = plt.subplots(figsize=(10,5))

ax1.set_xlabel("Date")
ax1.set_ylabel("Moving Average", color='g')
ax1.plot(df.index, df['Moving_Average'], color='g', label="Moving Average")
ax1.tick_params(axis='y', labelcolor='g')

ax2 = ax1.twinx()
ax2.set_ylabel("Volatility", color='r')
ax2.plot(df.index, df['Volatility'], color='r', linestyle="dashed", label="Volatility")
ax2.tick_params(axis='y', labelcolor='r')

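fig.suptitle("Moving Average & Volatility Trend")
# Combine legend entries from both axes (plt.legend() alone only covers the last axis)
lines1, labels1 = ax1.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax2.legend(lines1 + lines2, labels1 + labels2, loc="upper left")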
plt.xticks(rotation=45)
plt.show()


##### 1. Why did you pick the specific chart?

To compare trends of volatility and stock trend movements.

##### 2. What is/are the insight(s) found from the chart?

In the final year of the data, volatility is higher than the stock price trend, indicating weakness in the stock.


#### Chart - 11

In [None]:
# Chart - 11 visualization code
plt.figure(figsize=(6,5))
sns.boxplot(y=df['Volatility'])
plt.title('Box Plot of Stock Volatility')
plt.ylabel('Volatility')
plt.show()


##### 1. Why did you pick the specific chart?

 To detect outliers in volatility, returns, or prices

##### 2. What is/are the insight(s) found from the chart?


#### Chart - 12

In [None]:
# Chart - 12 visualization code
plt.figure(figsize=(10,5))
plt.plot(df.index, df['Moving_Average'], label='Moving Average', color='orange')
plt.plot(df.index, df['Close'], label='Closing Price', color='blue', alpha=0.6)
plt.legend()
plt.title('Stock Prices & Moving Average Trend')
plt.xlabel('Date')
plt.ylabel('Price')
plt.xticks(rotation=45)
plt.show()


##### 1. Why did you pick the specific chart?

To observe the trend of stock prices & moving averages

##### 2. What is/are the insight(s) found from the chart?

Bullish/bearish phases & price momentum.

#### Chart - 13

In [None]:
# Chart - 13 visualization code
yearly_avg = df.groupby('Year')['Close'].mean()
plt.figure(figsize=(8,5))
sns.barplot(x=yearly_avg.index, y=yearly_avg.values)
plt.title('Average Closing Price per Year')
plt.xlabel('Year')
plt.ylabel('Average Closing Price')
plt.xticks(rotation=45)
plt.show()

##### 1. Why did you pick the specific chart?

To compare average stock prices across different years.

##### 2. What is/are the insight(s) found from the chart?

Highlights overall price movement over years.
Data is left skewed

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
plt.figure(figsize=(8,5))
sns.heatmap(df[['Close', 'Volatility', 'Return', 'Moving_Average']].corr(), annot=True, cmap='coolwarm', linewidths=0.5)
plt.title('Stock Data Correlation Matrix')
plt.show()

##### 1. Why did you pick the specific chart?

To find relationships between different stock indicators.

##### 2. What is/are the insight(s) found from the chart?

Identifies how volatility, returns, and prices are related

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
sns.pairplot(df[['Close', 'Volatility', 'Return', 'Moving_Average']])
plt.show()

##### 1. Why did you pick the specific chart?

To visually explore correlations between key stock metrics.

##### 2. What is/are the insight(s) found from the chart?

 Helps identify relationships between returns, prices, and volatility

# ***Histograms with KDE to detect skewness.***

In [None]:
# Define the columns to analyze
columns = ['Open', 'High', 'Low', 'Close']

# Create subplots (2 rows, 2 columns)
fig, axes = plt.subplots(2, 2, figsize=(12, 8))

# Flatten the axes array for easy iteration
axes = axes.flatten()

# Loop through each column and plot
for i, col in enumerate(columns):
    sns.histplot(df[col], bins=30, kde=True, ax=axes[i], color='green', edgecolor='black')

    # Calculate mean and median
    mean_val = df[col].mean()
    median_val = df[col].median()

    # Plot mean and median as dashed lines
    axes[i].axvline(mean_val, color='red', linestyle='dashed', linewidth=2, label=f'Mean: {mean_val:.2f}')
    axes[i].axvline(median_val, color='green', linestyle='dashed', linewidth=2, label=f'Median: {median_val:.2f}')

    # Add title and legend
    axes[i].set_title(f'Distribution of {col}')
    axes[i].legend()

    # calculate Skewness for each column
    skewness = df[col].skew()
    axes[i].text(0.05, 0.95, f'Skewness: {skewness:.2f}', transform=axes[i].transAxes, fontsize=10, verticalalignment='top', horizontalalignment="left")

# Adjust layout and show the plots
plt.tight_layout()
plt.show()


# Hypothesis and Interpretation of above Histograms
Hypothesis:

The stock price variables (Open, High, Low, Close) might be right-skewed, indicating that most stock prices are concentrated in the lower range, with a few extreme values on the higher end.

There could be a significant difference between the mean and median, which suggests potential skewness or outliers in the data.

If skewness is high, the dataset might require transformations (e.g., log transformation) for better statistical modeling.

Interpretation:
The skewness values for Open (1.27), High (1.23), Low (1.30), and Close (1.26) indicate high positive skewness.

If skewness is between -0.5 to 0.5 → Distribution is nearly symmetric (not a concern).

If skewness is between 0.5 to 1.0 → Mild skewness (needs some attention).

If skewness is above 1.0 → Highly skewed → This needs careful analysis!
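
The rule of thumb above can be written as a small helper. This is only a sketch: the function name `describe_skew` is hypothetical, and applying the same thresholds to negative skewness through the absolute value is an assumption, since the thresholds above are stated for positive values only. The four skewness values are the ones reported in this section.

```python
def describe_skew(value: float) -> str:
    """Classify a skewness value using the thresholds listed above."""
    magnitude = abs(value)  # assumption: treat negative skew symmetrically
    if magnitude <= 0.5:
        return "nearly symmetric"
    if magnitude <= 1.0:
        return "mild skew"
    return "high skew"

# Skewness values reported for the four price columns
skew_values = {"Open": 1.27, "High": 1.23, "Low": 1.30, "Close": 1.26}
labels = {name: describe_skew(v) for name, v in skew_values.items()}

for name, label in labels.items():
    print(f"{name}: {skew_values[name]:.2f} -> {label}")
```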

This confirms that the stock prices have a longer right tail, meaning that while most prices are low, some extreme values are present at the higher end.

Mean vs. Median (Central Tendency):

The red dashed line (Mean) is positioned to the right of the green dashed line (Median) in all distributions.

This further confirms positive skewness, as the mean is being pulled toward the higher values by a small number of high-price months.

The density curves (KDE) show a peak at lower values, reinforcing that most stock prices fall within a lower range.

There are fewer instances of higher stock prices, contributing to the skewness.

Potential Impact on Modeling:

Since the data is skewed, some machine learning models (like linear regression) may not perform well due to non-normality.

Conclusion:

The stock price variables exhibit high positive skewness (all values above 1.0), meaning stock prices are not normally distributed.

The difference between mean and median suggests the presence of outliers or extreme values.

Transformations might be necessary before using statistical models that assume normality.

Possible Solution:

Applying log transformation to normalize the distributions.

# Transform Data (Reduce Skewness and Outliers)

* Compresses large values (reduces impact of high outliers).

* Makes right-skewed distributions more normal.

* Retains trends without losing extreme values.
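
The effect of a log transformation on a right-skewed variable can be sketched with synthetic data. The log-normal sample below is an assumption chosen purely for illustration and is not the YES Bank data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Synthetic right-skewed "prices" drawn from a log-normal distribution (hypothetical)
prices = pd.Series(rng.lognormal(mean=4.0, sigma=0.8, size=500))

skew_before = prices.skew()
log_prices = np.log(prices)   # compresses large values far more than small ones
skew_after = log_prices.skew()

print(f"Skewness before log: {skew_before:.2f}")
print(f"Skewness after log:  {skew_after:.2f}")

# The ordering of observations is unchanged, so relative highs and lows are retained
same_order = log_prices.rank().equals(prices.rank())
print(f"Ordering preserved: {same_order}")
```

Because the logarithm is monotonic, extreme values are kept but pulled closer to the rest of the data, which matches the three points listed above.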

# **Detecting Outliers**

In [None]:
# Selecting the columns where we want to detect outliers
columns_to_check = ['Open', 'High', 'Low', 'Close']

# Creating a function to detect outliers
def detect_outliers_iqr(df, column):
    Q1 = df[column].quantile(0.25)
    Q3 = df[column].quantile(0.75)
    IQR = Q3 - Q1

    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR

    outliers = df[(df[column] < lower_bound) | (df[column] > upper_bound)]

    return outliers, lower_bound, upper_bound

# Checking outliers for all columns
for col in columns_to_check:
    outliers, lb, ub = detect_outliers_iqr(df, col)
    print(f"Outliers in {col}: {len(outliers)}")
    print(f"Lower Bound: {lb}, Upper Bound: {ub}\n")

# Use winsorization (capping) instead of dropping the outliers so that no important information is lost
for col in columns_to_check:
    Q1 = df[col].quantile(0.25)
    Q3 = df[col].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR

    df[col] = np.where(df[col] > upper_bound, upper_bound, df[col])
    df[col] = np.where(df[col] < lower_bound, lower_bound, df[col])



Hypothesis:

H₀ (Null Hypothesis): There is no significant difference in the distribution of Open, High, Low, and Close prices. Stock price movements are stable with minimal extreme variations.

H₁ (Alternative Hypothesis): There are significant variations in stock prices, and extreme price movements (outliers) indicate market volatility.

Interpretation:
The median price for each category (Open, High, Low, Close) is positioned closer to the lower quartile (Q1), suggesting a right-skewed distribution. This means prices tend to have occasional sharp increases rather than sharp drops.

The IQR (Interquartile Range) for all categories is relatively broad, indicating fluctuations in stock prices.

The presence of outliers in all four price categories suggests that there were unusual price movements in certain months.

High and Close prices have more outliers, indicating that stock prices experienced sudden spikes.

This could be due to external market events, earnings reports, or investor sentiment shifts.

Conclusion:

Since outliers exist in all categories, stock prices are subject to high fluctuations in certain months.

The data shows that sudden price spikes (outliers in High & Close prices) are more common than sudden drops, which aligns with the trend of stocks reacting strongly to positive news.
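
The IQR-based capping used in the outlier cell above can also be expressed with `Series.clip`. The sketch below is an equivalent formulation on a toy series; the helper name `cap_iqr` and the prices are hypothetical.

```python
import pandas as pd

def cap_iqr(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Cap values outside [Q1 - k*IQR, Q3 + k*IQR] at the nearest bound."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)

# Toy prices with one extreme value (hypothetical)
prices = pd.Series([10.0, 12.0, 11.0, 13.0, 12.0, 95.0])
capped = cap_iqr(prices)

print(pd.DataFrame({"original": prices, "capped": capped}))
```

Here Q1 = 11.25 and Q3 = 12.75, so the upper bound is 15.0 and only the extreme value is pulled down to it; no rows are dropped.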

# **Data Cleaning & Preprocessing**

# Log Transformation to handle skewness

In [None]:
col= ['Open', 'High', 'Low', 'Close']
for i in col:
    df[i] = np.log(df[i])
df.head()

In [None]:
# Create subplots
fig, axes = plt.subplots(2,2, figsize=(12,8))
axes = axes.flatten()  # Flatten the axes array for easy iteration

for i, name in enumerate(col):  # Use a separate loop variable so the list `col` is not overwritten
    sns.histplot(df[name], kde=True, color='green', stat='density', bins=30, ax=axes[i], edgecolor='black')
    axes[i].set_title(f'Distribution of {name} (Log Transformed)')
    axes[i].set_xlabel(name)
    axes[i].set_ylabel('Density')

    # Calculate skewness
    skewness = df[name].skew()

    # Plot mean and median lines and save the line objects
    mean_line = axes[i].axvline(df[name].mean(), color='red', linestyle='dashed', linewidth=1)
    median_line = axes[i].axvline(df[name].median(), color='green', linestyle='dashed', linewidth=1)

    # Create legend with the line objects
    axes[i].legend([mean_line, median_line], ['Mean', 'Median'])

    # Annotate skewness
    axes[i].text(0.95, 0.95, f'Skewness: {skewness:.2f}',
                 transform=axes[i].transAxes,
                 horizontalalignment='right',
                 verticalalignment='top',
                 fontsize=12)

# Adjust layout
plt.tight_layout()
plt.show()

Log Transformation Impact

The histograms depict the distributions of Open, High, Low, and Close prices after applying a log transformation.
Log transformation is useful for stabilizing variance and making the data more normally distributed.

Skewness Analysis:
Skewness values are close to 0 for all four variables, indicating that the distributions are nearly symmetrical.
Log transformation successfully reduced any original skewness, making the data more normally distributed.

Density and Shape
The distributions appear fairly uniform with no significant outliers or extreme asymmetry.

Kernel Density Estimation (KDE) overlay shows a peak around the middle values, reinforcing the symmetry.

# **Scaling**

In [None]:
from sklearn.preprocessing import StandardScaler

# Define columns to standardize
cols_to_standardize = ['Open', 'High', 'Low', 'Close']

# Initialize StandardScaler
scaler = StandardScaler()

# Apply standardization (Z-score normalization)
df[cols_to_standardize] = scaler.fit_transform(df[cols_to_standardize])

# Display the first few rows to verify transformation
print(df.head())

In [None]:
x = df.drop(columns=['Close'])  # Features (independent variables)
y = df['Close']  # Target variable

In [None]:
x.head()

In [None]:
from sklearn.model_selection import train_test_split as tts

In [None]:
x_train, x_test, y_train, y_test = tts(x, y, test_size=0.2, random_state=42)
# Print shapes to verify correctness
print(f"xtrain shape: {x_train.shape}, ytrain shape: {y_train.shape}")
print(f"xtest shape: {x_test.shape}, ytest shape: {y_test.shape}")
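
Since the dataset is a time series, one possible alternative to a random split is a chronological one, with the scaler fitted on the training portion only so that no statistics from the test period enter the features. The sketch below illustrates this on made-up data; it is not the split used in this notebook, and the column values are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical monthly data with a steadily rising price
idx = pd.date_range("2015-01-01", periods=20, freq="MS")
data = pd.DataFrame(
    {"Open": np.linspace(10, 29, 20), "Close": np.linspace(11, 30, 20)},
    index=idx,
)

# Chronological 80/20 split: the earliest 80% of rows for training
split_at = int(len(data) * 0.8)
train, test = data.iloc[:split_at], data.iloc[split_at:]

# Fit the scaler on the training features only, then reuse it on the test features
scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(train[["Open"]])
x_test_scaled = scaler.transform(test[["Open"]])

print(f"train: {train.index.min().date()} to {train.index.max().date()}")
print(f"test:  {test.index.min().date()} to {test.index.max().date()}")
```

With this layout every test month lies after every training month, which is closer to how a forecasting model would be used in practice.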

# **MODEL DEVELOPMENT**

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error


# 1. LINEAR REGRESSION

In [None]:
lr = LinearRegression()
lr.fit(x_train, y_train)
y_pred_lr = lr.predict(x_test)
r2_lr = r2_score(y_test, y_pred_lr)
rmse_lr = np.sqrt(mean_squared_error(y_test, y_pred_lr))
mae_lr = mean_absolute_error(y_test, y_pred_lr)

# Predictions on train and test sets
y_train_pred = lr.predict(x_train)
y_test_pred = lr.predict(x_test)

# Calculate R2 and RMSE for training set
r2_train = r2_score(y_train, y_train_pred)
rmse_train = np.sqrt(mean_squared_error(y_train, y_train_pred))

# Calculate R2 and RMSE for test set
r2_test = r2_score(y_test, y_test_pred)
rmse_test = np.sqrt(mean_squared_error(y_test, y_test_pred))

# Print results
print(f"Linear Regression - Train R2 Score: {r2_train:.4f}, Train RMSE: {rmse_train:.4f}")
print(f"Linear Regression - Test R2 Score: {r2_test:.4f}, Test RMSE: {rmse_test:.4f}")

print(f'Linear Regression - R2 Score: {r2_lr:.4f}')
print(f'Linear Regression - RMSE: {rmse_lr:.4f}')
print(f'Linear Regression - MAE: {mae_lr:.4f}')
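
For reference, the metrics reported above can be computed by hand. The sketch below uses four made-up target values and predictions to show the definitions of RMSE, MAE and R2; the numbers are not results from this notebook.

```python
import numpy as np

# Hypothetical true values and predictions
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 9.0])

residuals = y_true - y_hat

rmse = np.sqrt(np.mean(residuals ** 2))          # root mean squared error
mae = np.mean(np.abs(residuals))                 # mean absolute error
ss_res = np.sum(residuals ** 2)                  # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2 = 1 - ss_res / ss_tot                         # coefficient of determination

print(f"RMSE: {rmse:.4f}, MAE: {mae:.4f}, R2: {r2:.4f}")
```

Comparing these metrics on the training and test sets, as the cell above does, is what indicates whether the model generalizes or overfits.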

# **Conclusion**

# The Model is good.
