# **Project Name**    -



##### **Project Type**    - Regression
##### **Contribution**    - Individual
##### **Team Member 1 -** Devanshu Mankar



# **Project Summary -**

This project aims to predict the closing stock price of Yes Bank using historical stock data and machine learning. The dataset includes daily information such as date, opening price, closing price, high, and low values.
The process begins with data cleaning, which involves handling any missing or inconsistent values. We then move to feature engineering, where we create new columns to enrich the dataset. Afterward, we split the data into training and testing sets and choose a regression model to predict the stock’s closing price. Possible models include linear regression, decision trees, or more advanced algorithms like LSTM, which are suitable for time-series data.

Once the model is trained, we evaluate its accuracy using performance metrics like Mean Absolute Error (MAE) or Root Mean Squared Error (RMSE). Finally, the project provides insights into stock price patterns and the model’s prediction capabilities, potentially helping investors or analysts understand and forecast future stock performance.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


The goal of this project is to predict Yes Bank’s daily closing stock price using historical data. By analyzing patterns and trends in past stock prices, we aim to develop a regression model that accurately forecasts future closing prices, providing valuable insights for investors and analysts.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
     [ Note: Deployment Ready Code means that the whole .ipynb notebook should run in one go without a single error being logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart answers the following questions.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 15 logical & meaningful charts having important insights.


[ Hints: Do the visualization in a structured way, following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





6. You may add more ML algorithms for model creation. Make sure that, for each algorithm, the following questions are answered.


*   Explain the ML model used and its performance using an evaluation metric score chart.


*   Cross-Validation & Hyperparameter Tuning

*   Have you seen any improvement? Note down the improvement with an updated evaluation metric score chart.

*   Explain what each evaluation metric indicates for the business, and the business impact of the ML model used.




















# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Load Dataset
df=pd.read_csv('/content/drive/MyDrive/data_YesBank_StockPrices.csv')

### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
print('Rows :', df.shape[0])
print('Columns :',df.shape[1])

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
# Visualizing the missing values
sns.heatmap(df.isnull(), cbar=False)

### What did you know about your dataset?

The dataset has 165 rows and 5 columns.

It has no null values and no duplicate rows.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description

Date - Date of the record

Open - Opening price

Close - Closing price

Low - Lowest price of the day

High - Highest price of the day





### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for i in df.columns:
  # Print each column name followed by its distinct values
  print(i,':',df[i].unique())

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
df.dtypes

In [None]:
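# Split each Date string on '-' into the month abbreviation and the second field.
# NOTE: if the Date strings are in 'Mon-YY' form, the second field is a two-digit year
# rather than a day of the month; check df['Date'].head() before interpreting 'Day'.
Month=[]
Day=[]
for i in df['Date']:
  parts = i.split('-')
  Month.append(parts[0])
  Day.append(parts[1])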

In [None]:
df['Month']=Month
df['Day']=Day

In [None]:
df.drop(columns=['Date'],inplace=True)
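If the Date strings follow a 'Mon-YY' pattern such as 'Jul-05' (an assumption worth confirming with `df['Date'].head()` before the column is dropped), pandas can parse them in one step instead of splitting strings by hand. A minimal sketch on hypothetical sample values:

```python
import pandas as pd

# Hypothetical Date strings in the assumed 'Mon-YY' format
dates = pd.Series(['Jul-05', 'Aug-05', 'Jan-06'])

# %b = abbreviated month name, %y = two-digit year
parsed = pd.to_datetime(dates, format='%b-%y')

month = parsed.dt.month  # 1 = Jan ... 12 = Dec
year = parsed.dt.year    # four-digit year

print(month.tolist())
print(year.tolist())
```

Parsing into real dates also keeps the rows sortable in true chronological order, which sorting on separate Month and Day columns does not guarantee.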

In [None]:
df.isnull().sum()

In [None]:
df['Percentage_Change'] = (df['Close'] - df['Open']) / df['Open'] * 100

In [None]:
df['Daily_Range']= df['High'] -df['Low']

In [None]:
df['Price_Change'] = df['Close'] - df['Open']

In [None]:
# Convert month abbreviations to month numbers (1 = Jan ... 12 = Dec)
df['Month']=df['Month'].map({'Jan':1,'Feb':2,'Mar':3,'Apr':4,'May':5,'Jun':6,'Jul':7,'Aug':8,'Sep':9,'Oct':10,'Nov':11,'Dec':12})

In [None]:
df.sort_values(by=['Month','Day'],inplace=True)

In [None]:
df

### What all manipulations have you done and insights you found?


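Extracted Month and Day from Date, then dropped the Date column.

Created the new columns Percentage_Change, Daily_Range and Price_Change from the existing columns.

Mapped the month abbreviations to the numbers 1-12 and sorted the rows by Month and Day.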

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
sns.scatterplot(x='High',y='Low',data=df)
plt.show()

##### 1. Why did you pick the specific chart?

It examines the relationship between the daily high and low prices, which can indicate price stability or volatility.


##### 2. What is/are the insight(s) found from the chart?

The points lie close to a straight line, so High and Low move together; the spread of points around that line reflects how volatile the price is.


##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes. Days with a high High also tend to have a high Low, so the two move together, and the gap between them gives investors an idea of how volatile the price is.


#### Chart - 2

In [None]:
# Chart - 2 visualization code
sns.scatterplot(x='Percentage_Change',y='Price_Change',data=df)
plt.show()

##### 1. Why did you pick the specific chart?

A scatter plot is well suited to showing the relationship between two numeric variables.

##### 2. What is/are the insight(s) found from the chart?

Price_Change mostly ranges from -50 to 50, and Percentage_Change from -20% to 20%.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Price changes mostly stay within this range, which gives investors a rough bound on typical moves.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
sns.scatterplot(x='Percentage_Change',y='Daily_Range',data=df)
plt.show()

##### 1. Why did you pick the specific chart?

A scatter plot is well suited to showing the relationship between two numeric variables.

##### 2. What is/are the insight(s) found from the chart?

The Daily_Range is mostly between 0 and 25, while Percentage_Change is mostly between -20% and 20%.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes, it has a positive impact. Because the price mostly fluctuates by 0 to 25 within a day, investors can estimate how far a trade may move against them and keep losses small.


#### Chart - 4

In [None]:
# Mean Percentage_Change for each month (1 = Jan ... 12 = Dec)
perch = df.groupby('Month')['Percentage_Change'].mean()
# Plot month numbers against the matching monthly means so every bar lines up with its month
sns.barplot(x=perch.index, y=perch.values)
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart is well suited to comparing a numeric value across categories, here the twelve months.

##### 2. What is/are the insight(s) found from the chart?

In April, the average percentage change is about +8%.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

No, this insight does not point to negative growth.

#### Chart - 5

In [None]:
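# Chart - 5 visualization code
# Closing price of the last row in each month (rows were sorted by Month and Day above)
close = df.groupby('Month')['Close'].last()

In [None]:
# Keep the monthly values in a DataFrame indexed by month number
df1 = close.to_frame(name='Close')

In [None]:
# Plot month numbers against the matching monthly closing prices
sns.lineplot(x=df1.index, y=df1['Close'], color='r', marker='o')
plt.show()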

##### 1. Why did you pick the specific chart?

A line chart is well suited to showing a trend across an ordered axis such as month.

##### 2. What is/are the insight(s) found from the chart?

From the 5th to the 6th month (May to June), the closing price rises sharply.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

The price trend is upward, so investors who want to buy the stock may consider doing so.


#### Chart - 6

In [None]:
# Highest High reached in each month (1 = Jan ... 12 = Dec)
high = df.groupby('Month')['High'].max()
# Plot month numbers against the matching monthly highs
sns.barplot(x=high.index, y=high.values)
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart is well suited to comparing a numeric value across categories.

##### 2. What is/are the insight(s) found from the chart?

The monthly high exceeds 300 in every month.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes, it should help investors. If someone is buying while the price is below about 50 and the month's high has not yet formed, knowing that the high has gone above 350 suggests there may be room for profit.

#### Chart - 7

In [None]:
# Chart - 7 visualization code
# Lowest Low reached in each month (1 = Jan ... 12 = Dec)
low = df.groupby('Month')['Low'].min()
# Plot month numbers against the matching monthly lows
sns.barplot(x=low.index, y=low.values)
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart is well suited to comparing a numeric value across categories.

##### 2. What is/are the insight(s) found from the chart?

Averaged across all months, the monthly low is about 11.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes, it should help investors. If an investor knows the price has already touched its usual low, that can serve as a signal to buy the stock.

#### Chart - 8

In [None]:
# Chart - 8 visualization code
# Total Price_Change for each month, rounded to whole numbers
price = df.groupby('Month')['Price_Change'].sum().round()
# Plot month numbers against the matching monthly totals
sns.barplot(x=price.index, y=price.values)
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart is well suited to comparing a numeric value across categories.

##### 2. What is/are the insight(s) found from the chart?

In March the total price change is negative (the month is down overall), while July has the highest total price change.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

It can signal negative growth: in 7 of the 12 months the total price change is negative, which may discourage investors from buying.
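Charts 4 to 8 each aggregate one column by month. A single `groupby` with named aggregations can compute all of those monthly statistics in one table indexed by month number. This sketch uses a small hypothetical frame with the notebook's column names:

```python
import pandas as pd

# Hypothetical rows using the notebook's column names
df_demo = pd.DataFrame({
    'Month': [1, 1, 2, 2],
    'Close': [10.0, 12.0, 11.0, 9.0],
    'High': [13.0, 14.0, 12.0, 11.0],
    'Low': [9.0, 10.0, 8.0, 7.0],
    'Price_Change': [1.0, 2.0, -1.0, -2.0],
    'Percentage_Change': [10.0, 20.0, -8.0, -18.0],
})

# One row per month; each named aggregation mirrors one of Charts 4-8
monthly = df_demo.groupby('Month').agg(
    mean_pct_change=('Percentage_Change', 'mean'),  # Chart 4
    last_close=('Close', 'last'),                   # Chart 5
    max_high=('High', 'max'),                       # Chart 6
    min_low=('Low', 'min'),                         # Chart 7
    total_price_change=('Price_Change', 'sum'),     # Chart 8
)
print(monthly)
```

Any column of `monthly` can then be plotted with, for example, `sns.barplot(x=monthly.index, y=monthly['max_high'])`. Because both axes come from the same table, every bar stays aligned with its month.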

#### Chart - 9

In [None]:
# Chart - 9 visualization code
sns.scatterplot(x='Open',y='Close',data=df)
plt.show()

##### 1. Why did you pick the specific chart?

A scatter plot is well suited to showing the relationship between two numeric variables.

##### 2. What is/are the insight(s) found from the chart?

There is a linear relationship between Open and Close.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes. Because Close tracks Open closely, the opening price is a useful input for predicting the closing price.

#### Chart - 10

In [None]:
# Chart - 10 visualization code
sns.boxplot(df['Percentage_Change'])
plt.show()

##### 1. Why did you pick the specific chart?

Box plots show the distribution of a variable and make outliers easy to spot.

##### 2. What is/are the insight(s) found from the chart?

There are some outliers. The maximum percentage change is above 60%, and the minimum is below -45%.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes. Knowing that moves of this size can occur, even if rarely, helps investors account for the risk of large gains or losses.

#### Chart - 11

In [None]:
# Chart - 11 visualization code
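# Label each row's volatility from its Daily_Range:
# above 25 -> 'high', above 20 up to 25 -> 'mid', 20 or below -> 'low'
volatility=[]
for i in df['Daily_Range']:
  if i>25:
    volatility.append('high')
  elif i>20:
    volatility.append('mid')
  else:
    volatility.append('low')
df['volatility']=volatility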
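The same three-way bucketing can be written with `pd.cut`, which makes the bin edges explicit. With `right=True` (the default), the bins are (-inf, 20], (20, 25] and (25, inf). These match the intended thresholds: above 25 is high, above 20 up to 25 is mid, and anything else is low. A sketch on hypothetical Daily_Range values, including the edge cases 20 and 25:

```python
import numpy as np
import pandas as pd

# Hypothetical Daily_Range values, including values exactly on the bin edges
daily_range = pd.Series([5.0, 20.0, 22.5, 25.0, 30.0])

# Right-closed bins: (-inf, 20] -> low, (20, 25] -> mid, (25, inf) -> high
volatility = pd.cut(daily_range,
                    bins=[-np.inf, 20, 25, np.inf],
                    labels=['low', 'mid', 'high'])
print(volatility.tolist())
```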

In [None]:
plt.pie(x=df['volatility'].value_counts().values,labels=df['volatility'].value_counts().index,autopct='%0.1f%%')
plt.show()

##### 1. Why did you pick the specific chart?

A pie chart is well suited to showing each category's share of the whole.

##### 2. What is/are the insight(s) found from the chart?

Most days have low volatility.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes, it should help investors: since the price is mostly not volatile, their potential losses on a typical day are also smaller.

#### Chart - 12

In [None]:
# Chart - 12 visualization code
sns.barplot(x='volatility',y='Price_Change',data=df)
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart is well suited to comparing a numeric value across categories.

##### 2. What is/are the insight(s) found from the chart?

In high-volatility periods the average price change is large, while in low-volatility periods it is small.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes, it should help investors. Investing during high volatility can bring either a large profit or a large loss, so investors can match their entries to how much risk they are willing to take.

#### Chart - 13

In [None]:
# Chart - 13 visualization code
sns.scatterplot(x='Open',y='Low',data=df)
plt.show()

##### 1. Why did you pick the specific chart?

A scatter plot is well suited to showing the relationship between two numeric variables.

##### 2. What is/are the insight(s) found from the chart?

There is a linear relationship between Open and Low.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
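from sklearn.preprocessing import OrdinalEncoder
# Encode volatility as numbers; without an explicit categories list, OrdinalEncoder sorts
# the labels alphabetically, so high=0, low=1, mid=2
df['volatility']=OrdinalEncoder().fit_transform(df[['volatility']]).ravel()
# numeric_only=True skips any non-numeric columns so the correlation does not fail on text data
sns.heatmap(df.corr(numeric_only=True),annot=True)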

##### 1. Why did you pick the specific chart?

A correlation heatmap shows the pairwise correlation of every numeric feature at once. This makes it easy to see how features relate to each other and to the closing price.

##### 2. What is/are the insight(s) found from the chart?

There is a strong correlation among Open, Close, High and Low.
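By default, OrdinalEncoder assigns codes in alphabetical order, which gives high=0, low=1, mid=2. To make the codes follow the natural order low < mid < high, you can list the labels explicitly in the `categories` parameter. A sketch on hypothetical labels:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical volatility labels
labels = pd.DataFrame({'volatility': ['low', 'high', 'mid', 'low']})

# Default: categories sorted alphabetically -> high=0, low=1, mid=2
default_codes = OrdinalEncoder().fit_transform(labels).ravel()

# Explicit order -> low=0, mid=1, high=2
ordered = OrdinalEncoder(categories=[['low', 'mid', 'high']])
ordered_codes = ordered.fit_transform(labels).ravel()

print(default_codes)
print(ordered_codes)
```

Switching to the explicit order changes the numeric value of each label. Any hand-written input row passed to the trained model, such as the prediction example at the end of the notebook, would then need its volatility value updated to match.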

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
sns.pairplot(df)
plt.show()

##### 1. Why did you pick the specific chart?

It helps visualize the relationships between multiple features in a single grid of plots.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## ***5. Hypothesis Testing***

### Based on your chart experiments, define three hypothetical statements from the dataset. In the next three questions, perform hypothesis testing to obtain final conclusion about the statements through your code and statistical testing.

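The three statements tested below are:

1. The Open price and the Close price are significantly related.
2. Volatility differs between days with positive and negative Price_Change.
3. Daily_Range has a significant impact on Percentage_Change.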

### Hypothetical Statement - 1

#### 1. State Your research hypothesis as a null hypothesis and alternate hypothesis.

Hypothesis 1: Is there a significant relationship between the Open price and the Close price?


Null Hypothesis (H₀): There is no significant relationship between the Open and Close prices.


Alternative Hypothesis (H₁): There is a significant relationship between the Open and Close prices.

#### 2. Perform an appropriate statistical test.

In [None]:
import scipy.stats as stats
correlation_stat, p_value_open_close = stats.pearsonr(df['Open'], df['Close'])
print(f'Hypothesis 1 - P-value for correlation between Open and Close: {p_value_open_close}')

##### Which statistical test have you done to obtain P-Value?

Pearson’s correlation test is used to check the strength and direction of the linear relationship between Open and Close.

##### Why did you choose the specific statistical test?

I used Pearson's correlation test in the first hypothesis (between the Open and Close prices) because it is a standard statistical method to assess the strength and direction of the linear relationship between two continuous variables.
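To turn the p-value into a decision, compare it with a significance level chosen in advance. A level of 0.05 is a common convention and is assumed here. Below is a minimal helper (the name `decide` is hypothetical), shown on hypothetical data where y is a linear function of x plus a little noise:

```python
import numpy as np
import scipy.stats as stats

def decide(p_value, alpha=0.05):
    """Return the hypothesis-test decision for a given p-value and significance level."""
    if p_value < alpha:
        return 'Reject H0'
    return 'Fail to reject H0'

# Hypothetical data: y depends linearly on x, plus a little noise
rng = np.random.default_rng(0)
x = np.arange(20, dtype=float)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=20)

r, p = stats.pearsonr(x, y)
print(f'r = {r:.3f}, p = {p:.2e}, decision: {decide(p)}')
```

In the notebook, `decide(p_value_open_close)` would apply the same rule to the Open/Close test.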

### Hypothetical Statement - 2

#### 1. State Your research hypothesis as a null hypothesis and alternate hypothesis.

Is volatility, measured by Daily_Range, different on days with positive versus negative (or zero) Price_Change?


Null Hypothesis (H₀): The mean Daily_Range is the same for days with positive and negative Price_Change.


Alternative Hypothesis (H₁): The mean Daily_Range is different for days with positive and negative Price_Change.

#### 2. Perform an appropriate statistical test.

In [None]:
# Perform Statistical Test to obtain P-Value
# Compare Daily_Range, the continuous measure the volatility labels were built from;
# the encoded volatility column holds category codes, whose mean is not meaningful
positive_price_change = df[df['Price_Change'] > 0]['Daily_Range']
negative_price_change = df[df['Price_Change'] <= 0]['Daily_Range']
t_stat_volatility, p_value_volatility = stats.ttest_ind(positive_price_change, negative_price_change)
print(f'Hypothesis 2 - P-value for difference in Daily_Range based on Price_Change sign: {p_value_volatility}')

##### Which statistical test have you done to obtain P-Value?

An independent two-sample t-test compares the mean Daily_Range on days with positive Price_Change against days with negative or zero Price_Change.

##### Why did you choose the specific statistical test?

The t-test checks whether the means of two independent groups differ significantly. That is exactly the question here: does average volatility differ between up days and down days?
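`stats.ttest_ind` assumes equal variances by default. When the two groups may have different spreads, Welch's t-test (`equal_var=False`) is a common, more cautious alternative. A sketch on hypothetical groups:

```python
import numpy as np
import scipy.stats as stats

# Hypothetical daily ranges for up days and down days
up_days = np.array([3.0, 4.0, 5.0, 4.5, 3.5, 4.2])
down_days = np.array([8.0, 12.0, 6.0, 15.0, 9.0, 11.0])

# Student's t-test (equal variances assumed) vs Welch's t-test (no such assumption)
t_student, p_student = stats.ttest_ind(up_days, down_days)
t_welch, p_welch = stats.ttest_ind(up_days, down_days, equal_var=False)

print(f'Student p = {p_student:.4f}, Welch p = {p_welch:.4f}')
```

In the notebook, the Welch version would be `stats.ttest_ind(positive_price_change, negative_price_change, equal_var=False)`.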

### Hypothetical Statement - 3

#### 1. State Your research hypothesis as a null hypothesis and alternate hypothesis.

Does the Daily_Range impact the Percentage_Change significantly?

Null Hypothesis (H₀): There is no significant impact of the Daily Range on the Percentage Change.

Alternative Hypothesis (H₁): There is a significant impact of the Daily Range on the Percentage Change.

#### 2. Perform an appropriate statistical test.

In [None]:
# Perform Statistical Test to obtain P-Value
slope, intercept, r_value, p_value_daily_range, std_err = stats.linregress(df['Daily_Range'], df['Percentage_Change'])
print(f'Hypothesis 3 - P-value for impact of Daily_Range on Percentage_Change: {p_value_daily_range}')

##### Which statistical test have you done to obtain P-Value?

Linear regression is used to assess the relationship between Daily_Range and Percentage_Change. The p-value tells us whether the slope differs significantly from zero, i.e. whether Daily_Range significantly impacts Percentage_Change.
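With a single predictor, the p-value from `stats.linregress` tests whether the slope is zero. This is the same test as Pearson's correlation test on the same two columns, and `r_value**2` gives the share of variance in the response explained by the predictor. A quick check on hypothetical data:

```python
import numpy as np
import scipy.stats as stats

# Hypothetical predictor and response with a moderate linear link
rng = np.random.default_rng(42)
x = rng.uniform(0, 25, size=50)
y = 0.3 * x + rng.normal(0, 3, size=50)

reg = stats.linregress(x, y)
r, p_corr = stats.pearsonr(x, y)

# Same r and same two-sided p-value from both routes
print(reg.rvalue, r)
print(reg.pvalue, p_corr)
print('R^2 =', reg.rvalue ** 2)
```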

In [None]:
df.head()

## ***6. Feature Engineering & Data Pre-processing***

### 1. Handling Missing Values

In [None]:
df.isnull().sum()

#### What all missing value imputation techniques have you used and why did you use those techniques?

No imputation was needed, because the dataset has no missing values.

### 2. Handling Outliers

In [None]:
df['Day']=df['Day'].astype(int)

In [None]:
# Handling Outliers & Outlier treatments
# For every column, cap values outside the 1.5 x IQR fences at the fence values (rows are kept)
for i in df.columns:
  per25=df[i].quantile(0.25)
  per75=df[i].quantile(0.75)
  iqr=per75-per25
  upperlimit=per75+(1.5*iqr)
  lowerlimit=per25-(1.5*iqr)
  df[i]=np.where(df[i]>upperlimit,upperlimit,df[i])
  df[i]=np.where(df[i]<lowerlimit,lowerlimit,df[i])


In [None]:
for i in df.columns:
  sns.boxplot(df[i])
  plt.show()

##### What all outlier treatment techniques have you used and why did you use those techniques?

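I capped outliers with the IQR rule: any value above Q3 + 1.5 × IQR or below Q1 - 1.5 × IQR is replaced with that limit. This is the same 1.5 × IQR rule that box plots use to flag outliers, and the box plots are used to check the result. Capping keeps every row, which matters because the dataset is small.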
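The capping loop can be wrapped in a small reusable function using `Series.clip`. It replaces values below the lower fence and above the upper fence with the fences themselves. A sketch (the function name `cap_iqr` is hypothetical):

```python
import pandas as pd

def cap_iqr(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip a numeric Series to [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = s.quantile(0.25)
    q3 = s.quantile(0.75)
    iqr = q3 - q1
    return s.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)

# Q1 = 2, Q3 = 4, IQR = 2 -> fences at -1 and 7, so 100 is capped to 7
print(cap_iqr(pd.Series([1, 2, 3, 4, 100])).tolist())
```

Applied column by column, e.g. `df[col] = cap_iqr(df[col])`, it gives the same result as the two `np.where` calls. The loop in the notebook also caps the target column Close. That is a modelling choice worth revisiting, since it changes the values the model is evaluated against.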

### 3. Categorical Encoding

In [None]:
# Encode your categorical columns
df

#### What all categorical encoding techniques have you used & why did you use those techniques?

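No further encoding is needed: Month was already mapped to the numbers 1-12 during data wrangling, and volatility was ordinal-encoded before the correlation heatmap.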

### 4. Textual Data Preprocessing
(It's mandatory for textual dataset i.e., NLP, Sentiment Analysis, Text Clustering etc.)

#### 1. Expand Contraction

In [None]:
# Expand Contraction

#### 2. Lower Casing

In [None]:
# Lower Casing

#### 3. Removing Punctuations

In [None]:
# Remove Punctuations

#### 4. Removing URLs & Removing words and digits contain digits.

In [None]:
# Remove URLs & Remove words and digits contain digits

#### 5. Removing Stopwords & Removing White spaces

In [None]:
# Remove Stopwords

In [None]:
# Remove White spaces

#### 6. Rephrase Text

In [None]:
# Rephrase Text

#### 7. Tokenization

In [None]:
# Tokenization

#### 8. Text Normalization

In [None]:
# Normalizing Text (i.e., Stemming, Lemmatization etc.)

##### Which text normalization technique have you used and why?

Not applicable: this dataset has no text columns.

#### 9. Part of speech tagging

In [None]:
# POS Taging

#### 10. Text Vectorization

In [None]:
# Vectorizing Text

##### Which text vectorization technique have you used and why?

Not applicable: this dataset has no text columns.

### 4. Feature Manipulation & Selection

#### 1. Feature Manipulation

In [None]:
# Manipulate Features to minimize feature correlation and create new features
df

#### 2. Feature Selection

In [None]:
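# Select your features wisely to avoid overfitting
# Inputs: price columns plus the engineered volatility, Month and Day columns
x=df.loc[:,['Open','High','Low','volatility','Month','Day']]
# Target: Close as a 1-D Series (a one-column DataFrame triggers shape warnings in some estimators)
y=df['Close']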


In [None]:
y

##### What all feature selection methods have you used  and why?

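Features were selected manually. Percentage_Change and Price_Change are left out because they are computed from Close, the target, so including them would leak the answer into the inputs. Daily_Range is left out because it is simply High minus Low, both of which are already included.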

##### Which all features you found important and why?

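Open, High, Low, Month, Day and volatility are the features used. Open, High and Low are the most important, since the correlation heatmap shows they are strongly related to Close.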

### 5. Data Transformation

#### Do you think that your data needs to be transformed? If yes, which transformation have you used. Explain Why?

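Yes, for some of the models. StandardScaler is applied inside the pipelines used to tune Ridge/Lasso and SVR, because both are sensitive to feature scale. The Random Forest model does not need scaling, because tree splits do not depend on feature scale.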
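When StandardScaler sits inside a Pipeline, as in the Ridge/Lasso and SVR tuning cells, cross-validation re-fits it on the training folds only. No statistics from the validation fold leak into training. A minimal sketch on synthetic data from `make_regression`, which stands in for the real features:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data standing in for the real features
X_demo, y_demo = make_regression(n_samples=100, n_features=6, noise=5.0, random_state=0)

pipe = Pipeline([
    ('scaler', StandardScaler()),  # fit on each training fold only
    ('model', Ridge(alpha=1.0)),
])

# One R^2 score per fold
scores = cross_val_score(pipe, X_demo, y_demo, cv=5, scoring='r2')
print(scores.mean())
```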

### 7. Dimensionality Reduction

##### Do you think that dimensionality reduction is needed? Explain Why?

No, dimensionality reduction is not needed, because the dataset already has only a few features.

### 8. Data Splitting

In [None]:
# Split your data to train and test. Choose Splitting ratio wisely.
from sklearn.model_selection import train_test_split
xtrain,xtest,ytrain,ytest=train_test_split(x,y,test_size=0.2,random_state=42)

##### What data splitting ratio have you used and why?

I used an 80-20 split. With only 165 rows, this keeps most of the data for training while holding out 33 rows for testing.
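A random 80-20 split treats rows as independent. Stock prices, however, form a time series, so shuffling lets the model train on periods that come after the ones it is tested on. A chronological alternative is sketched below on a hypothetical frame, assuming the rows are in time order. In the notebook, `df.sort_index()` would restore the original CSV row order after the Month/Day sort, because `sort_values` keeps the original index labels. This also assumes the CSV itself is chronological.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit, train_test_split

# Hypothetical frame whose row order is chronological (oldest first)
toy = pd.DataFrame({'Open': np.arange(10.0), 'Close': np.arange(10.0) + 0.5})
X_toy, y_toy = toy[['Open']], toy['Close']

# shuffle=False keeps the order, so the test set is the most recent 20% of rows
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.2, shuffle=False)
print(X_te.index.tolist())

# TimeSeriesSplit: each fold trains on earlier rows and validates on the rows right after
tscv = TimeSeriesSplit(n_splits=3)
for train_idx, val_idx in tscv.split(X_toy):
    print(train_idx.max(), '<', val_idx.min())
```

A `TimeSeriesSplit` object can also be passed as `cv=` to `GridSearchCV` in place of `cv=5`.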

### 9. Handling Imbalanced Dataset

##### Do you think the dataset is imbalanced? Explain Why.

No. This is a regression problem with a continuous target (Close), so class imbalance does not apply.

## ***7. ML Model Implementation***

### ML Model - 1

In [None]:
# ML Model - 1 Implementation
from sklearn.linear_model import LinearRegression
model1=LinearRegression()
# Fit on the training split
model1.fit(xtrain,ytrain)
# Predict the held-out test split with the fitted model
ypred=model1.predict(xtest)


#### 1. Explain the ML Model used and its performance using an Evaluation Metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart
from sklearn.metrics import r2_score,mean_squared_error
print('r2_score',r2_score(ytest,ypred))
print('mse',mean_squared_error(ytest,ypred))
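The project summary names MAE and RMSE as evaluation metrics, but the cells here print only R² and MSE. A small helper (the name `regression_report` is hypothetical) can report all four at once. RMSE is taken as the square root of MSE, so it is in the same units as the closing price:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def regression_report(y_true, y_pred):
    """Return R^2, MSE, RMSE and MAE for a set of predictions."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        'r2': r2_score(y_true, y_pred),
        'mse': mse,
        'rmse': float(np.sqrt(mse)),  # same units as the target
        'mae': mean_absolute_error(y_true, y_pred),
    }

# Toy check: one of three predictions is off by 1
print(regression_report([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
```

In the notebook, `regression_report(ytest, ypred)` could be called after each model's prediction cell.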

#### 2. Cross-Validation & Hyperparameter Tuning

In [None]:
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import Ridge, Lasso
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
import numpy as np

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('model', Ridge())
])
param_grid = [
    {'model': [Ridge()], 'model__alpha': np.logspace(-4, 4, 10)},
    {'model': [Lasso()], 'model__alpha': np.logspace(-4, 4, 10)}
]

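from sklearn.metrics import mean_squared_error
# 5-fold cross-validated search over Ridge/Lasso alphas, fit on the training split only
grid_search = GridSearchCV(pipeline, param_grid, cv=5, scoring='neg_mean_squared_error', n_jobs=-1)
grid_search.fit(xtrain, ytrain)
print("Best Parameters:", grid_search.best_params_)
print("Best Cross-Validation MSE:", -grid_search.best_score_)
# Score the tuned model on the held-out test split
print("Test MSE:", mean_squared_error(ytest, grid_search.best_estimator_.predict(xtest)))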


##### Which hyperparameter optimization technique have you used and why?

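GridSearchCV: it tries every combination in the grid (Ridge and Lasso, each with 10 alpha values spaced logarithmically from 10⁻⁴ to 10⁴) and picks the one with the lowest mean squared error under 5-fold cross-validation.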

##### Have you seen any improvement? Note down the improvement with an updated Evaluation metric Score Chart.

Answer Here.

### ML Model - 2

#### 1. Explain the ML Model used and its performance using an Evaluation Metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart
from sklearn.ensemble import RandomForestRegressor
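model2=RandomForestRegressor(max_depth= 10,min_samples_leaf= 1, min_samples_split= 2,n_estimators= 200, random_state=42)
# Fit on the training split
model2.fit(xtrain,ytrain)
# Predict the held-out test split with the fitted Random Forest
ypred=model2.predict(xtest)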

In [None]:
from sklearn.metrics import r2_score,mean_squared_error
print('r2_score',r2_score(ytest,ypred))
print('mse',mean_squared_error(ytest,ypred))

#### 2. Cross-Validation & Hyperparameter Tuning

In [None]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rf = RandomForestRegressor(random_state=42)

param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20, 30],
    'max_features': [1.0, 'sqrt', 'log2'],  # 1.0 = all features; 'auto' is no longer accepted in recent scikit-learn
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

grid_search = GridSearchCV(estimator=rf, param_grid=param_grid, cv=5,
                           scoring='neg_mean_squared_error', n_jobs=-1)

# Tune on the training split only so the test split stays unseen
grid_search.fit(xtrain, ytrain)

print("Best Parameters:", grid_search.best_params_)


##### Which hyperparameter optimization technique have you used and why?

GridSearchCV: it tries every combination in the predefined hyperparameter grid, scores each with 5-fold cross-validation, and keeps the combination with the lowest mean squared error.


##### Have you seen any improvement? Note down the improvement with an updated Evaluation metric Score Chart.

yes

### ML Model - 3

In [None]:
# ML Model - 3 Implementation
from sklearn.svm import SVR
model3=SVR()
# Fit the Algorithm
model3.fit(xtrain,ytrain)
# Predict on the model
ypred=model3.predict(xtest)

#### 1. Explain the ML Model used and its performance using an Evaluation Metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart
from sklearn.metrics import r2_score,mean_squared_error
print('r2_score',r2_score(ytest,ypred))
print('mse',mean_squared_error(ytest,ypred))

#### 2. Cross-Validation & Hyperparameter Tuning

In [None]:
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('svr', SVR())
])

param_grid = {
    'svr__C': [0.1, 1, 10, 100],
    'svr__epsilon': [0.01, 0.1, 0.2, 0.5],
    'svr__kernel': ['linear', 'rbf', 'poly']
}

grid_search = GridSearchCV(pipeline, param_grid, cv=5, scoring='neg_mean_squared_error', n_jobs=-1)

grid_search.fit(xtrain, ytrain)

print("Best Parameters:", grid_search.best_params_)
print("Best Cross-Validation Score:", -grid_search.best_score_)

ypred = grid_search.best_estimator_.predict(xtest)
test_mse = mean_squared_error(ytest, ypred)
print("Test MSE:", test_mse)

##### Which hyperparameter optimization technique have you used and why?

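GridSearchCV, with 5-fold cross-validation over the C, epsilon and kernel values listed in the grid, scored by mean squared error.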

##### Have you seen any improvement? Note down the improvement with an updated Evaluation metric Score Chart.

yes

### 1. Which Evaluation metrics did you consider for a positive business impact and why?

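R² score, reported alongside MSE. R² shows what share of the variation in the closing price the model explains, which is easy to communicate to stakeholders. MSE shows how large the prediction errors are, in squared price units.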

### 2. Which ML model did you choose from the above created models as your final prediction model and why?

I will choose Random Forest
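One way to support the final model choice is to score every candidate on the same cross-validation folds and compare the average error. A sketch on synthetic data follows. The candidates mirror the three model types used above, while the data and settings are illustrative only. For the real price data, a `TimeSeriesSplit` could be passed as `cv` instead of shuffled `KFold`.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic data standing in for the real features and target
X_demo, y_demo = make_regression(n_samples=120, n_features=6, noise=10.0, random_state=1)

candidates = {
    'linear': LinearRegression(),
    'random_forest': RandomForestRegressor(n_estimators=100, random_state=42),
    'svr': make_pipeline(StandardScaler(), SVR()),
}

# Identical folds for every model so the comparison is fair
folds = KFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X_demo, y_demo, cv=folds, scoring='neg_mean_squared_error')
    results[name] = -scores.mean()  # mean MSE; lower is better

for name, mse in sorted(results.items(), key=lambda kv: kv[1]):
    print(f'{name}: mean CV MSE = {mse:.1f}')
```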

## ***8.*** ***Future Work (Optional)***

### 1. Save the best performing ml model in a pickle file or joblib file format for deployment process.


In [None]:
# Save the File
pd.to_pickle(model2,'model.pkl')
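`joblib.dump` and `joblib.load` are a common alternative to `pd.to_pickle` for fitted scikit-learn models. A round-trip check, which loads the file back and confirms that predictions are unchanged, is a cheap sanity test before deployment. A sketch on a small synthetic model, writing to a temporary directory:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Small synthetic model standing in for model2
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 100, size=(50, 6))
y_demo = X_demo[:, 0] + rng.normal(0, 1, size=50)
model = RandomForestRegressor(n_estimators=20, random_state=0).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'model.joblib')
    joblib.dump(model, path)      # save the fitted estimator
    restored = joblib.load(path)  # load it back
    same = np.allclose(model.predict(X_demo), restored.predict(X_demo))

print('Predictions unchanged after reload:', same)
```

Load the model with the same scikit-learn version used to save it, since pickled estimators are not guaranteed to be compatible across versions.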

In [None]:
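# Predict for one example row, passing column names so they match the training features
sample = pd.DataFrame([[13.68,17.16,13.58,1.0,1.0,6.0]], columns=x.columns)
model2.predict(sample)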

### ***Congrats! Your model is successfully created and ready for deployment on a live server for a real user interaction !!!***

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your Machine Learning Capstone Project !!!***