# **Project Name**    -



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Team Member 1 -** Afsal M

# **Project Summary -**

**Objective:**

To develop a predictive model that forecasts the closing price of Yes Bank stock using historical price data.

**Data Collection:**

Historical Stock Prices: monthly open, high, low and close prices of Yes Bank (185 monthly records).

**Methodology:**

1) Data Preprocessing:



*    Clean and preprocess the data, handling missing values and outliers.
*   Normalize the data for better model performance.



2) Exploratory Data Analysis (EDA):


*   Identify patterns and seasonal trends.

3) Model Selection:



*   Experiment with various algorithms such as:

  *   Linear Regression
  *   Decision Trees
  *   LSTM (Long Short-Term Memory) networks for time series prediction
*   Evaluate model performance using metrics like Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE).


4) Training and Testing:


*   Split the dataset into training and testing sets (e.g., 80/20 split).
*   Train selected models and validate their performance.

5) Prediction:

*   Generate predictions for future closing prices and compare them against actual prices.


**Results:**


*   Present findings on model accuracy and performance.
*   Visualize predictions against historical data to illustrate model effectiveness.



# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


The financial market is inherently volatile, making it challenging for investors to make informed decisions. This project aims to address the problem of predicting the closing price of Yes Bank stock, a prominent player in the Indian banking sector.

**Key Aspects of the Problem:**

1.   Volatility and Uncertainty:
  *   Stock prices are influenced by a multitude of factors, including market trends, economic indicators, and investor sentiment. This unpredictability necessitates the development of robust predictive models.

2.   Data Complexity:

  *   The interplay of historical stock prices, trading volumes, and external market conditions creates a complex dataset that requires effective analysis and modeling techniques.

3.   Investment Decision-Making:
  *   Accurate predictions can assist investors in making better trading decisions, optimizing their investment strategies, and managing risks effectively.

4.   Model Accuracy:
  *   The challenge lies in creating a model that not only forecasts future prices but does so with high accuracy, providing reliable insights for traders and analysts.


**Goal:**

To develop a predictive model that utilizes historical stock data and relevant financial indicators to forecast the closing price of Yes Bank stock for a specified future period. The model aims to enhance the understanding of price movements and assist in making informed investment decisions.

By addressing these challenges, the project seeks to contribute to the broader field of financial analytics and support investors in navigating the complexities of stock market investments.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception handling, production-grade code and deployment-ready code will be a plus. Students who provide them will be awarded additional credits, which give an advantage during Star Student selection.

     [Note: Deployment-ready code means that the whole .ipynb notebook can be executed in one go without a single error being logged.]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that every chart follows the format below.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
*   Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 15 logical & meaningful charts having important insights.


[Hint: Do the visualization in a structured way, following the "UBM" rule:

U - Univariate Analysis

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis]





6. You may add more ML algorithms for model creation. Make sure that each algorithm answers the following:

*   Explain the ML model used and its performance using an evaluation metric score chart.

*   Cross-Validation & Hyperparameter Tuning

*   Have you seen any improvement? Note down the improvement with an updated evaluation metric score chart.

*   Explain each evaluation metric's indication towards business and the business impact of the ML model used.




















# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
%pip install pandas mplfinance yfinance

In [None]:
# Core data handling and numerics
import math
import numpy as np
import pandas as pd

# Plotting
import matplotlib.pyplot as plt
import seaborn as sns
import mplfinance as mpf
%matplotlib inline

# Regression models
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import (AdaBoostRegressor, GradientBoostingRegressor,
                              RandomForestRegressor, VotingRegressor)
from xgboost import XGBRegressor

# Preprocessing, model selection and evaluation
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import (r2_score, mean_squared_error,
                             mean_absolute_error, mean_absolute_percentage_error)

In [None]:
from google.colab import drive
drive.mount('/content/drive')

### Dataset Loading

In [None]:
# Load Dataset
# Read the Yes Bank stock price CSV from Google Drive into the DataFrame `df`
df = pd.read_csv('/content/drive/MyDrive/Data science/data_YesBank_StockPrices.csv', encoding = 'unicode_escape')

### Dataset First View

In [None]:
df.head()

In [None]:
df.tail()

In [None]:
df.sample(5)

### Dataset Rows & Columns count

In [None]:
df.shape

### Dataset Information

In [None]:
df.info()

#### Duplicate Values

In [None]:
# Count fully duplicated rows (0 means there are no duplicates)
print("Duplicate rows:", df.duplicated().sum())

#### Missing Values/Null Values

In [None]:
df.isnull().sum()

In [None]:
# Visualizing the missing values
# There are no missing values, so there is nothing to visualize.

### What did you know about your dataset?

The dataset contains the monthly stock prices of Yes Bank. It has 5 columns:
*   Date
*   Open
*   High
*   Low
*   Close

The shape of the dataset shows that it has 185 rows and 5 columns. There are no duplicate rows and no missing values, so no missing-value treatment is needed.

The `Date` column is stored as an object (string), and the other four columns are floats.







## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description

1.   Date
  *   Type: Object (string; converted to datetime during data wrangling)
  *   Description: The month and year (format `%b-%y`) of each stock price record.

2.   Open
  *   Type: Float
  *   Description: The price of Yes Bank stock at the start of the month.

3.   High
  *   Type: Float
  *   Description: The highest price reached by Yes Bank stock during the month.

4.   Low
  *   Type: Float
  *   Description: The lowest price reached by Yes Bank stock during the month.

5.   Close
  *   Type: Float
  *   Description: The price of Yes Bank stock at the end of the month.



### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
df.nunique()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.

# Convert the 'Date' to datetime
df['Date'] = pd.to_datetime(df['Date'], format='%b-%y')

# Check the updated data types
print("\nUpdated Data Types:")
print(df.dtypes)

# Display the DataFrame
print("\nDataFrame:")
print(df)

### What all manipulations have you done and insights you found?

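There are no null values, so no imputation was needed. The only manipulation was converting the `Date` column from month-year strings (format `%b-%y`) to `datetime64`, so that the data can be plotted and ordered in time.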

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code

# Line plot of the monthly closing price over time
plt.figure(figsize=(12, 6))
plt.plot(df['Date'], df['Close'], label='Closing Price', color='blue', linewidth=2)

plt.title('Stock Price Movement of Yes Bank Over Time', fontsize=16)
plt.xlabel('Date', fontsize=12)
plt.ylabel('Price', fontsize=12)
plt.xticks(rotation=45)
plt.legend()
plt.grid()
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

The purpose of this chart is to show how the stock price moves over time. Plotting the monthly closing prices of Yes Bank makes it easy to identify trends.

##### 2. What is/are the insight(s) found from the chart?

This chart shows the month-by-month movement of the closing price, from which the overall trend can be identified.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes. The chart shows how the stock has performed in the past, which is the starting point for judging its trend and for building a forecasting model.

#### Chart - 2

In [None]:
# Chart - 2 visualization code
import plotly.graph_objects as go

# Candlestick chart: one candle per month showing Open, High, Low and Close
fig = go.Figure(data=[go.Candlestick(x=df['Date'],
                                     open=df['Open'],
                                     high=df['High'],
                                     low=df['Low'],
                                     close=df['Close'])])

fig.update_layout(title='Yes Bank Candlestick Chart',
                  xaxis_title='Date',
                  yaxis_title='Price',
                  xaxis_rangeslider_visible=False)

fig.show()

##### 1. Why did you pick the specific chart?

The candlestick chart is very useful for analysing stock prices because each candle shows the open, high, low and close for its period, so I selected this chart.
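Each candle packs four prices into one shape. The sketch below, on two hypothetical monthly rows (illustrative values, not Yes Bank data), computes the quantities a candle visualizes. The column names `Bullish`, `Body`, `UpperWick`, `LowerWick` and `Range` are hypothetical helpers, not part of the dataset.

```python
import pandas as pd

# Hypothetical monthly OHLC rows (illustrative values only)
ohlc = pd.DataFrame({'Open': [20.0, 25.0], 'High': [27.0, 26.0],
                     'Low': [18.0, 19.0], 'Close': [25.0, 21.0]})

# Direction: closed above its open (typically drawn green) or below (typically red)
ohlc['Bullish'] = ohlc['Close'] > ohlc['Open']
# Body: the box between open and close
ohlc['Body'] = (ohlc['Close'] - ohlc['Open']).abs()
# Wicks: the thin lines from the body up to the high and down to the low
ohlc['UpperWick'] = ohlc['High'] - ohlc[['Open', 'Close']].max(axis=1)
ohlc['LowerWick'] = ohlc[['Open', 'Close']].min(axis=1) - ohlc['Low']
# Range: the full length of the candle (high minus low)
ohlc['Range'] = ohlc['High'] - ohlc['Low']

print(ohlc)
```

A long body with short wicks marks a month that moved decisively in one direction, while long wicks mark months where the price swung widely but closed near where it opened.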

##### 2. What is/are the insight(s) found from the chart?

The chart shows the historical trend of the stock and, for each month, its trading range and whether it closed above or below its opening price.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 3

In [None]:
# Chart - 3 visualization code

# Plot the closing prices and moving averages against the date
plt.figure(figsize=(12, 6))
plt.plot(df['Date'], df['Close'], label='Closing Price', color='blue', linewidth=2)
# The data is monthly, so the rolling windows below span 10 and 50 months
plt.plot(df['Date'], df['Close'].rolling(window=10).mean(), label='10-Month MA', color='orange', linestyle='--', linewidth=2)
plt.plot(df['Date'], df['Close'].rolling(window=50).mean(), label='50-Month MA', color='red', linestyle='--', linewidth=2)

plt.title('Stock Price Movement of Yes Bank with Moving Averages', fontsize=16)
plt.xlabel('Date', fontsize=12)
plt.ylabel('Price', fontsize=12)
plt.xticks(rotation=45)
plt.legend()
plt.grid()

# Show the plot
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

Adding moving averages to a chart helps smooth out price data, allowing for better trend identification and reducing noise from daily fluctuations.
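On this monthly dataset, `rolling(window=10)` and `rolling(window=50)` average the last 10 and 50 months. The small sketch below, on hypothetical prices (not the Yes Bank data), shows what `Series.rolling(...).mean()` returns, including the leading NaN values before a full window is available:

```python
import pandas as pd

# Hypothetical monthly closing prices (illustrative values only)
close = pd.Series([10.0, 12.0, 14.0, 13.0, 15.0, 18.0])

# 3-month simple moving average: mean of the current close and the previous two.
# The first two entries are NaN because a full 3-month window is not yet available.
ma3 = close.rolling(window=3).mean()
print(ma3.round(2).tolist())
```

The same effect applies to the chart above: the 50-month average has no value for the first 49 months of the dataset.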

##### 2. What is/are the insight(s) found from the chart?

The moving averages smooth out month-to-month noise. When the closing price stays above the 50-month average the long-term trend is upward, and when it stays below, the trend is downward. Crossovers between the 10-month and 50-month averages mark potential trend reversals.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

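Yes. Knowing the direction of the trend helps investors decide when to enter or exit a position, and a closing price that stays below its long-term average is an early warning of negative growth.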

#### Chart - 4

In [None]:
# Chart - 4 visualization code

plt.figure(figsize=(12, 6))
plt.scatter(df['Date'], df['Close'], alpha=0.5, color='blue')

plt.title('Scatter Plot of Closing Price vs. Date', fontsize=16)
plt.xlabel('Date', fontsize=12)
plt.ylabel('Closing Price', fontsize=12)

plt.grid()

# Show the plot
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

Scatter plots display the relationship between two variables; here the variables are the date and the closing price.

Plotting each month as a separate point makes it easy to spot outliers, i.e. data points that deviate significantly from the overall trend. Outliers can indicate unusual trading behavior or market events that warrant further investigation.

##### 2. What is/are the insight(s) found from the chart?

The dataset has no trading-volume column, so this chart cannot show a price-volume relationship. It shows the closing price of each month against its date; points that sit far from their neighbours mark months with abrupt price changes.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 5

In [None]:
# Chart - 5 visualization code

plt.figure(figsize=(10, 6))
plt.boxplot(df['Close'], vert=False)

# Adding titles and labels
plt.title('Box Plot of Closing Prices for Yes Bank', fontsize=16)
plt.xlabel('Price', fontsize=12)

# Show the plot
plt.grid()
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

Box plots provide a clear visual representation of the median, quartiles, and range of stock prices. This helps quickly convey essential statistical information about the data. They easily identify outliers, which can be crucial in stock analysis. Outliers might indicate unusual market activity, sudden price changes, or events that require further investigation.
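By default, matplotlib's box plot draws its whiskers with the 1.5 × IQR rule: points below Q1 − 1.5·IQR or above Q3 + 1.5·IQR are plotted as outliers. A minimal sketch of that rule on hypothetical prices (illustrative values, not the dataset); `iqr_bounds` is a hypothetical helper name:

```python
import numpy as np

def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) limits beyond which a box plot marks outliers."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical closing prices (illustrative only)
prices = np.array([40, 55, 60, 70, 80, 95, 110, 150, 350])

low, high = iqr_bounds(prices)
outliers = prices[(prices < low) | (prices > high)]
print(low, high, outliers)
```

Here Q1 = 60 and Q3 = 110, so the limits are −15 and 185 and only 350 is flagged. Applied to `df['Close']`, the same function gives the numeric cut-off behind the outlier points in the chart.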

##### 2. What is/are the insight(s) found from the chart?

From this plot, the closing price has an outlier at around 350, and most closing prices lie between about 40 and 150.


##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

This helps us understand the normal price range of the stock; prices far outside that range flag unusual periods that deserve closer analysis.

#### Chart - 6

In [None]:
# Chart - 6 visualization code

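# One violin per calendar year: the distribution of that year's monthly closes.
# (Using x='Date' would draw one violin per single month, i.e. one point each.)
plt.figure(figsize=(14, 6))
sns.violinplot(x=df['Date'].dt.year, y=df['Close'])
plt.title('Violin Plot of Monthly Closing Prices by Year')
plt.xlabel('Year')
plt.show()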

##### 1. Why did you pick the specific chart?

To visualize the distribution of stock prices together with a density estimate.

##### 2. What is/are the insight(s) found from the chart?

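The chart shows how the distribution of closing prices changes from year to year: narrow violins indicate years in which the price stayed in a tight range, while tall or wide ones indicate volatile years.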

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 7

In [None]:
# Chart - 7 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 8

In [None]:
# Chart - 8 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 9

In [None]:
# Chart - 9 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 10

In [None]:
# Chart - 10 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code

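# Correlate only the numeric price columns; the datetime 'Date' column cannot be correlated
correlation_matrix = df[['Open', 'High', 'Low', 'Close']].corr()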

# Create a heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, fmt='.2f', cmap='coolwarm', square=True, cbar_kws={'shrink': .8})

# Adding titles and labels
plt.title('Correlation Heatmap of Stock Prices', fontsize=16)
plt.xticks(rotation=45)
plt.yticks(rotation=0)

# Show the plot
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

Heatmaps are a visually effective way to analyze correlations. Here the heatmap shows how strongly the Open, High, Low and Close prices are correlated with one another, which indicates which features are likely to be useful predictors of the closing price and whether the features are collinear.

##### 2. What is/are the insight(s) found from the chart?

The four price columns are strongly positively correlated with one another, which is expected because they are all prices of the same stock in the same month.

Strong correlation with Close makes Open, High and Low good predictors for a regression model. However, the strong correlation among the features themselves (multicollinearity) means that individual linear-regression coefficients should be interpreted with care.

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code

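# Scatter plots for every pair of price columns, with histograms on the diagonal
pair_grid = sns.pairplot(df[['Open', 'High', 'Low', 'Close']])
# plt.title would only title the last subplot, so set a figure-level title instead
pair_grid.figure.suptitle('Pair Plot of Stock Prices', y=1.02)
plt.show()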

##### 1. Why did you pick the specific chart?

It helps identify correlations between pairs of features. Strong linear relationships can indicate potential predictors for modeling.
You can spot non-linear patterns that might not be evident from correlation coefficients alone.

##### 2. What is/are the insight(s) found from the chart?

The relationships between all pairs of price columns are close to linear, and the columns are strongly related to one another.

## ***5. Hypothesis Testing***

### Based on your chart experiments, define three hypothetical statements from the dataset. In the next three questions, perform hypothesis testing to obtain final conclusion about the statements through your code and statistical testing.

Answer Here.
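As one possible way to frame a test (an assumption, not a result from this project): H0 says there is no linear correlation between the monthly Open and Close prices, and H1 says there is. The sketch below runs a Pearson correlation test with `scipy.stats.pearsonr` on hypothetical data; in the notebook the inputs would be `df['Open']` and `df['Close']`.

```python
import numpy as np
from scipy import stats

# Hypothetical monthly prices (illustrative only, not the Yes Bank data)
open_prices = np.arange(10.0, 30.0)                       # 20 months
close_prices = open_prices + np.tile([1.0, -1.0], 10)     # small alternating moves

# Pearson correlation test: H0 says the true correlation is zero
r, p_value = stats.pearsonr(open_prices, close_prices)

alpha = 0.05  # significance level
if p_value < alpha:
    print(f"r = {r:.3f}, p = {p_value:.2e}: reject H0 (significant linear correlation)")
else:
    print(f"r = {r:.3f}, p = {p_value:.2e}: fail to reject H0")
```

One caution: price levels that trend over time are almost always "significantly" correlated with each other, so a test on monthly returns (for example `df['Close'].pct_change()`) is usually more informative.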

### Hypothetical Statement - 1

#### 1. State Your research hypothesis as a null hypothesis and alternate hypothesis.

Answer Here.

#### 2. Perform an appropriate statistical test.

In [None]:
# Perform Statistical Test to obtain P-Value

##### Which statistical test have you done to obtain P-Value?

Answer Here.

##### Why did you choose the specific statistical test?

Answer Here.

### Hypothetical Statement - 2

#### 1. State Your research hypothesis as a null hypothesis and alternate hypothesis.

Answer Here.

#### 2. Perform an appropriate statistical test.

In [None]:
# Perform Statistical Test to obtain P-Value

##### Which statistical test have you done to obtain P-Value?

Answer Here.

##### Why did you choose the specific statistical test?

Answer Here.

### Hypothetical Statement - 3

#### 1. State Your research hypothesis as a null hypothesis and alternate hypothesis.

Answer Here.

#### 2. Perform an appropriate statistical test.

In [None]:
# Perform Statistical Test to obtain P-Value

##### Which statistical test have you done to obtain P-Value?

Answer Here.

##### Why did you choose the specific statistical test?

Answer Here.

## ***6. Feature Engineering & Data Pre-processing***

### 1. Handling Missing Values

In [None]:
# Handling Missing Values & Missing Value Imputation

#### What all missing value imputation techniques have you used and why did you use those techniques?

There were no missing values in the dataset, so no imputation technique was needed.

### 2. Handling Outliers

In [None]:
# Handling Outliers & Outlier treatments

##### What all outlier treatment techniques have you used and why did you use those techniques?

Answer Here.

### 3. Categorical Encoding

In [None]:
# Encode your categorical columns

#### What all categorical encoding techniques have you used & why did you use those techniques?

Answer Here.

### 4. Textual Data Preprocessing
(It's mandatory for textual dataset i.e., NLP, Sentiment Analysis, Text Clustering etc.)

#### 1. Expand Contraction

In [None]:
# Expand Contraction

#### 2. Lower Casing

In [None]:
# Lower Casing

#### 3. Removing Punctuations

In [None]:
# Remove Punctuations

#### 4. Removing URLs & Removing words and digits contain digits.

In [None]:
# Remove URLs & Remove words and digits contain digits

#### 5. Removing Stopwords & Removing White spaces

In [None]:
# Remove Stopwords

In [None]:
# Remove White spaces

#### 6. Rephrase Text

In [None]:
# Rephrase Text

#### 7. Tokenization

In [None]:
# Tokenization

#### 8. Text Normalization

In [None]:
# Normalizing Text (i.e., Stemming, Lemmatization etc.)

##### Which text normalization technique have you used and why?

Answer Here.

#### 9. Part of speech tagging

In [None]:
# POS Tagging

#### 10. Text Vectorization

In [None]:
# Vectorizing Text

##### Which text vectorization technique have you used and why?

Answer Here.

### 4. Feature Manipulation & Selection

#### 1. Feature Manipulation

In [None]:
# Manipulate Features to minimize feature correlation and create new features

features = df[['Open', 'High', 'Low']]
target = df['Close']

#### 2. Feature Selection

In [None]:
# Select your features wisely to avoid overfitting

print(features.shape)
print(target.shape)

##### What all feature selection methods have you used  and why?

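The closing price (`Close`) was selected as the target, and the other three price columns (`Open`, `High` and `Low`) were used as features. Note that a month's High and Low are only known once that month has ended, so this setup estimates the close from the same month's prices rather than forecasting a future close.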

##### Which all features you found important and why?

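In stock price prediction the closing price is the most important value, which is why I used it as the target. All three remaining price columns were kept as features because each of them is strongly correlated with `Close`, as the correlation heatmap and pair plot show.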
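If the goal is to forecast a month's close before that month ends, the features have to come from earlier months. A minimal sketch of building lagged features with `shift(1)`, on hypothetical OHLC rows; the `prev_` column names and the variables `features_lag` / `target_lag` are illustrative, not part of the original notebook:

```python
import pandas as pd

# Hypothetical monthly OHLC rows, already sorted by date (illustrative values only)
df_demo = pd.DataFrame({
    'Open':  [10.0, 12.0, 13.0, 15.0],
    'High':  [13.0, 14.0, 16.0, 18.0],
    'Low':   [9.0, 11.0, 12.0, 14.0],
    'Close': [12.0, 13.0, 15.0, 17.0],
})

# Shift the price columns down one row so each row's features describe the
# previous month, while the target remains the current month's close.
lagged = df_demo[['Open', 'High', 'Low', 'Close']].shift(1).add_prefix('prev_')
data = pd.concat([lagged, df_demo['Close']], axis=1).dropna()  # first month has no history

features_lag = data[['prev_Open', 'prev_High', 'prev_Low', 'prev_Close']]
target_lag = data['Close']
print(data)
```

Expect error metrics from this setup to be worse than with same-month features, because the model no longer sees the month's own high and low.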

### 5. Data Transformation

#### Do you think that your data needs to be transformed? If yes, which transformation have you used? Explain why.

In [None]:
# Transform Your data

### 6. Data Scaling

In [None]:
# Scaling your data

##### Which method have you used to scale your data and why?

### 7. Dimensionality Reduction

##### Do you think that dimensionality reduction is needed? Explain Why?

Answer Here.

In [None]:
# Dimensionality Reduction (If needed)

##### Which dimensionality reduction technique have you used and why? (If dimensionality reduction done on dataset.)

Answer Here.

### 8. Data Splitting

In [None]:
# Split your data to train and test. Choose Splitting ratio wisely.

X_train, X_test, Y_train, Y_test = train_test_split(features, target, test_size=0.2, random_state=42)

print("Training set shape:", X_train.shape, Y_train.shape)
print("Testing set shape:", X_test.shape, Y_test.shape)

##### What data splitting ratio have you used and why?

I used a common 80/20 split: 80% of the data for training and 20% for testing.
With 80% of the rows available for training, the model has enough data to learn the underlying patterns, while the remaining 20% gives an estimate of how well it performs on data it has not seen.

### 9. Handling Imbalanced Dataset

##### Do you think the dataset is imbalanced? Explain Why.

Answer Here.

In [None]:
# Handling Imbalanced Dataset (If needed)

##### What technique did you use to handle the imbalanced dataset and why? (If needed to be balanced)

Answer Here.

## ***7. ML Model Implementation***

### ML Model - 1

In [None]:
# ML Model - 1 Implementation

# Lists that collect each model's evaluation metrics, in the order the models are scored
mean_absolut_error = []
mean_sq_error = []
root_mean_sq_error = []
training_score = []
r2_list = []
adj_r2_list = []


def score_metrix(model, X_train, X_test, Y_train, Y_test):
  '''
  Train the model, print its MAE, MSE, RMSE, R2 and adjusted R2 on the test set,
  store those metrics in the lists above and plot predicted vs. actual values.
  '''
  # Train the model
  model.fit(X_train, Y_train)

  # Training score (R2 on the training data)
  training = model.score(X_train, Y_train)
  print("Training score  =", training)

  # Only GridSearchCV-style objects have best_params_ / best_score_
  if hasattr(model, 'best_params_'):
    print(f"The best parameters found out to be: {model.best_params_} \nwhere model best score is: {model.best_score_} \n")

  # Predict the test set and evaluate the model.
  # The target is not transformed, so every model is scored on the raw closing prices.
  # (The original `model == LinearRegression()` check was always False, because two
  # estimator objects never compare equal, so this is the branch that always ran.)
  Y_pred = model.predict(X_test)

  MAE = mean_absolute_error(Y_test, Y_pred)
  print("MAE :", MAE)

  MSE = mean_squared_error(Y_test, Y_pred)
  print("MSE :", MSE)

  RMSE = np.sqrt(MSE)
  print("RMSE :", RMSE)

  r2 = r2_score(Y_test, Y_pred)
  print("R2 :", r2)

  # Adjusted R2 penalises R2 for the number of features p, given n test rows
  n, p = X_test.shape
  adj_r2 = 1 - (1 - r2) * ((n - 1) / (n - p - 1))
  print("Adjusted R2 : ", adj_r2, '\n')

  # Store the metrics so that all models can be compared later
  mean_absolut_error.append(MAE)
  mean_sq_error.append(MSE)
  root_mean_sq_error.append(RMSE)
  training_score.append(training)
  r2_list.append(r2)
  adj_r2_list.append(adj_r2)

  print('*' * 80)
  # Print the coefficients and intercept for models that have them (e.g. linear models)
  if hasattr(model, 'coef_'):
    print("coefficient \n", model.coef_)
    print('\n')
    print("Intercept  = ", model.intercept_)
  print('\n')
  print('*' * 20, 'plotting actual vs. predicted values (up to 80 test observations)', '*' * 20)

  # Line plot of the first (up to) 80 predicted and actual values
  plt.figure(figsize=(15, 7))
  plt.plot(Y_pred[:80])
  plt.plot(np.array(Y_test)[:80])
  plt.legend(["Predicted", "Actual"])
  plt.show()

# Fit the Algorithm



#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart
score_metrix(LinearRegression(),X_train,X_test,Y_train,Y_test)

#### 2. Cross-Validation & Hyperparameter Tuning

**Linear Regression with hyperparameter tuning**

In [None]:
# ML Model - 1 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Reuse the existing train/test split. Re-splitting X_train here would shrink the
# training data and overwrite X_test, so later models would be scored on different rows.

# Define parameter grid
param_grid = {
    'fit_intercept': [True, False],
}

# Initialize GridSearchCV with 3-fold cross-validation
grid_search = GridSearchCV(LinearRegression(), param_grid, cv=3)

# Fit the Algorithm
grid_search.fit(X_train, Y_train)

# Best parameters and cross-validation score (R2 by default)
print(f'Best Parameters: {grid_search.best_params_}')
print(f'Best Cross-Validation Score: {grid_search.best_score_}')

# Predict on the model and evaluate on the held-out test data
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)
mse = mean_squared_error(Y_test, y_pred)
print(f'Mean Squared Error on Test Data: {mse}')




##### Which hyperparameter optimization technique have you used and why?

I used GridSearchCV because it evaluates every combination of the specified hyperparameters with cross-validation, ensuring an exhaustive search of the grid.
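The exhaustive nature of the search can be checked by counting the candidates with `ParameterGrid`. Because this is time-series data, `TimeSeriesSplit` is a natural alternative to plain k-fold, since each validation fold then comes after its training fold. A minimal sketch on synthetic data; the `Ridge` model, the `demo_grid` values and the data are hypothetical choices for illustration, not the notebook's setup:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, ParameterGrid, TimeSeriesSplit

# Hypothetical grid: 4 alpha values x 2 intercept settings = 8 candidates
demo_grid = {'alpha': [0.01, 0.1, 1.0, 10.0], 'fit_intercept': [True, False]}
print(len(ParameterGrid(demo_grid)))  # every candidate is fitted and scored

# Synthetic, noise-free linear data standing in for ordered monthly features
rng = np.random.default_rng(0)
X_syn = rng.normal(size=(60, 3))
y_syn = X_syn @ np.array([0.5, 0.3, 0.2]) + 1.0

# Each TimeSeriesSplit fold trains on earlier rows and validates on later rows
search = GridSearchCV(Ridge(), demo_grid, cv=TimeSeriesSplit(n_splits=3),
                      scoring='neg_mean_squared_error')
search.fit(X_syn, y_syn)
print(search.best_params_)
```

For larger grids, `RandomizedSearchCV` samples a fixed number of candidates instead of trying them all.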

##### Have you seen any improvement? Note down the improvement with updated Evaluation metric Score Chart.

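When GridSearchCV was used, the Mean Squared Error on the test data changed from 86.64379126513735 to 26.236850970698992. However, the second figure was computed on a test set re-split from the original training data rather than on the original test set, so the two numbers are not measured on the same rows and do not represent a like-for-like improvement.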

### ML Model - 2

#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
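# Visualizing evaluation Metric Score chart

# Hyperparameter grid for the random forest.
# There are only 3 features (Open, High, Low), so max_features must be between 1 and 3,
# and "no depth limit" is written as None (the string 'none' is rejected).
param_grid = {'n_estimators': [50, 100, 150],
              'max_depth': [10, 15, 20, 25, None],
              'min_samples_split': [10, 50, 100],
              'max_features': [1, 2, 3]}

# Using GridSearchCV
random_forest_grid_search = GridSearchCV(RandomForestRegressor(), param_grid=param_grid, n_jobs=-1, verbose=2)
score_metrix(random_forest_grid_search, X_train, X_test, Y_train, Y_test)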

#### 2. Cross-Validation & Hyperparameter Tuning

In [None]:
# ML Model - 2 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

# Fit the Algorithm

# Predict on the model

##### Which hyperparameter optimization technique have you used and why?

I have used GridSearch CV in this model for hyperparameter tuning.

##### Have you seen any improvement? Note down the improvement with updated Evaluation metric Score Chart.

Random forest did not improve on linear regression: its test MSE was 392.76, compared with 86.64 for linear regression.

#### 3. Explain each evaluation metric's indication towards business and the business impact of the ML model used.

Answer Here.

### ML Model - 3

In [None]:
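# ML Model - 3 Implementation

# K-nearest neighbours regressor with default settings (k = 5).
# It is fitted and evaluated by score_metrix in the next cell; calling score_metrix
# here as well would append this model's metrics to the comparison lists twice.
knn = KNeighborsRegressor()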

#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart
score_metrix(knn,X_train,X_test,Y_train,Y_test)

#### 2. Cross-Validation & Hyperparameter Tuning

In [None]:
# ML Model - 3 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

# Fit the Algorithm

# Predict on the model






##### Which hyperparameter optimization technique have you used and why?

Answer Here.

##### Have you seen any improvement? Note down the improvement with updated Evaluation metric Score Chart.

Answer Here.

### 1. Which Evaluation metrics did you consider for a positive business impact and why?

Selecting evaluation metrics for a linear regression model should be aligned with your business objectives and the specific context of your analysis. Metrics like MAE, MSE, RMSE, and R² provide insights into model performance, while business-specific considerations like the cost of errors and profitability analysis can help quantify the direct impact on business outcomes.

MSE highlights larger errors by squaring the differences, which can be beneficial if your business is sensitive to outliers. For instance, in financial forecasting, a large error could result in significant losses. MSE helps in understanding the variance of prediction errors.
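The metrics above can be computed by hand on a tiny example to see what each one measures. A minimal sketch with hypothetical actual and predicted closing prices (not project results), using the same scikit-learn functions as the notebook and the adjusted-R² formula from `score_metrix`:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual and predicted closing prices for 4 test months
y_true = np.array([100.0, 110.0, 120.0, 130.0])
y_pred = np.array([102.0, 108.0, 123.0, 128.0])

mae = mean_absolute_error(y_true, y_pred)   # average absolute miss, in price units
mse = mean_squared_error(y_true, y_pred)    # squares each miss, so large misses dominate
rmse = np.sqrt(mse)                         # back in price units
r2 = r2_score(y_true, y_pred)               # share of the price variance explained

# Adjusted R2 for n test rows and p features (p = 1 here, for illustration)
n, p = len(y_true), 1
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(mae, mse, round(rmse, 4), round(r2, 4), round(adj_r2, 4))
```

The errors are 2, 2, 3 and 2, so MAE = 2.25, MSE = 5.25 and RMSE ≈ 2.29; R² = 1 − 21/500 = 0.958 and adjusted R² = 0.937. The single 3-point miss contributes 9 of the 21 squared-error units, which is the "large errors weigh more" behaviour of MSE described above.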

### 2. Which ML model did you choose from the above created models as your final prediction model and why?

I chose Linear Regression as the final prediction model for two reasons:
1.   It achieved the lowest test error of the three models (MAE 5.81, RMSE 9.31, R2 0.990).
2.   Its coefficients directly show the relationship between each feature and the target variable. This makes the findings easy to communicate to stakeholders and makes it clear how the inputs affect the predictions.
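A minimal sketch of reading linear-regression coefficients, on synthetic data where the "true" relationship is known in advance. The weights 0.2/0.4/0.4 and intercept 5.0 are hypothetical values chosen for illustration, not estimates from the Yes Bank data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic features standing in for Open, High and Low (hypothetical data)
rng = np.random.default_rng(42)
X_syn = rng.uniform(10, 100, size=(50, 3))

# Assumed relationship, chosen only for illustration
true_coef = np.array([0.2, 0.4, 0.4])
y_syn = X_syn @ true_coef + 5.0

model = LinearRegression().fit(X_syn, y_syn)
for name, coef in zip(['Open', 'High', 'Low'], model.coef_):
    # Change in predicted Close per 1-unit rise in this feature, others held fixed
    print(f"{name}: {coef:.3f}")
print(f"Intercept: {model.intercept_:.3f}")
```

The synthetic features here are independent, so the fit recovers the assumed weights exactly. Real Open, High and Low prices are strongly correlated, so their individual coefficients can shift noticeably between samples even when the predictions stay accurate. Comparing coefficients across cross-validation folds is one way to check their stability before interpreting them.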


### 3. Explain the model which you have used and the feature importance using any model explainability tool?

Answer Here.

# **Conclusion**

I compared three ML models (Linear Regression, Random Forest Regression and KNeighborsRegressor), and Linear Regression gave the best results.

The test-set results of the three models are:

| Model | Training score | MAE | MSE | RMSE | R2 | Adjusted R2 |
|---|---|---|---|---|---|---|
| Linear Regression | 0.9961188216222026 | 5.812554509942111 | 86.64379126513735 | 9.30826467528386 | 0.9904142726548665 | 0.989542842896218 |
| Random Forest Regression (GridSearchCV) | not recorded (best CV score: 0.9447064969836791) | 11.693349756703894 | 392.7610482283477 | 19.818199924018018 | 0.9659907688448195 | 0.9567155239843157 |
| KNeighborsRegressor | 0.99203003784781 | 10.523333333333335 | 238.84380453333335 | 15.45457228568081 | 0.9793184833501252 | 0.9736780697183411 |

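Based on these results, Linear Regression is the most suitable model: it has the highest R2 and the lowest errors. One caveat: in the run that produced these figures, the Linear Regression tuning step re-split the training data and overwrote `X_train` and `X_test`, so the Random Forest and KNN figures were computed on a different, smaller test set than the Linear Regression figures and are not strictly comparable.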

### ***Hurrah! You have successfully completed your Machine Learning Capstone Project !!!***