## **Model Objective:** <br>
> **To implement a machine learning model that can accurately predict the adjusted closing price of Walmart stock.**

## **Importing Necessary Libraries** 

In [None]:
#Importing libraries
import warnings
warnings.filterwarnings('ignore')

from datetime import datetime

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style='darkgrid')

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))


## **Retrieving Data** <br>
> **Dataset [Here](https://www.kaggle.com/datasets/amandam1/walmart-stock-20122016)**

In [None]:
#Importing dataset
data = pd.read_csv('/kaggle/input/walmart-stock-20122016/WMT.csv')
data.head()

## **Data Preparation**

In [None]:
#first peek into the dataset
data.describe()

In [None]:
data.isna().any()

In [None]:
data.shape

**Observation:**
* The dataset is well maintained: `isna()` reports no missing values, and `Date` is the only non-numeric column. The summary statistics from `describe()` show no obvious outliers.
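
The outlier claim above can be checked numerically rather than by eye. Below is a minimal sketch using the common 1.5 × IQR rule; the `sample` frame and its values are made up for illustration, and the helper `count_iqr_outliers` is a hypothetical name, not part of the original notebook. In the notebook itself the same function could be applied to the numeric columns of `data`.

```python
import pandas as pd

# Hypothetical stand-in for two columns of `data`; the values are made up.
sample = pd.DataFrame({
    'Open': [60.0, 61.5, 59.8, 62.1, 60.7, 61.0],
    'Volume': [8_000_000, 7_500_000, 9_100_000, 8_300_000, 7_900_000, 40_000_000],
})

def count_iqr_outliers(df, k=1.5):
    """Count values per column lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df.quantile(0.25)
    q3 = df.quantile(0.75)
    iqr = q3 - q1
    # Comparing a DataFrame with a Series aligns the Series on the column names.
    outside = (df < q1 - k * iqr) | (df > q3 + k * iqr)
    return outside.sum()

outlier_counts = count_iqr_outliers(sample)
print(outlier_counts)
```

A non-zero count is only a flag for closer inspection; for stock data, a volume spike can be a genuine trading day rather than a data error.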

In [None]:
#feature engineering: split the date into numeric parts
data['Date'] = pd.to_datetime(data['Date'])
data['year'] = data['Date'].dt.year
data['month'] = data['Date'].dt.month
data['day'] = data['Date'].dt.day

#dt.weekday is 0 for Monday, so +1 gives Monday=1 ... Sunday=7
data['weekday'] = data['Date'].dt.weekday + 1
data = data.drop(['Date'], axis=1)
data.head()

**Observation:**
* The dataset is now ready for model building and testing: the date has been split into numeric features, and no further feature engineering is required.

## **Data Exploration** 

In [None]:
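#data visualization: mean adjusted close per year
fig = plt.figure(figsize=(10,5))
#ci=True is read as a 1% interval; ci=95 draws the usual 95% interval
#(newer seaborn versions spell this errorbar=('ci', 95))
sns.barplot(x=data['year'], y=data['Adj Close'], ci=95)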

**Observation:**
* The mean adjusted closing price rises noticeably from year to year. One possible explanation, which this dataset cannot confirm, is the store's growing popularity among the urban middle class.

In [None]:
#ci=95 draws a 95% confidence interval (ci=True would be read as 1%)
sns.barplot(x=data['weekday'], y=data['Adj Close'], ci=95)

**Observation:**
* The mean of the target variable shows no visible trend across the days of the week. This suggests that the 'weekday' variable carries little information about the target.
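
The bar plot can be backed by a small table of per-weekday statistics. The sketch below uses a made-up two-week series (`toy` is a hypothetical frame, not the Walmart data) and derives the weekday with the same `dt.weekday + 1` convention as above, so Monday is 1.

```python
import pandas as pd

# Ten consecutive business days starting on Monday 2016-01-04; prices are made up.
dates = pd.date_range('2016-01-04', periods=10, freq='B')
toy = pd.DataFrame({
    'Date': dates,
    'Adj Close': [60.0, 61.0, 60.5, 61.5, 60.0, 61.0, 60.0, 61.5, 60.5, 61.0],
})
toy['weekday'] = toy['Date'].dt.weekday + 1  # Monday=1 ... Friday=5

# Mean and spread of the target for each weekday.
by_weekday = toy.groupby('weekday')['Adj Close'].agg(['mean', 'std'])
print(by_weekday)
```

If the differences between the weekday means are small compared with the `std` column, the plot's "no trend" reading is reasonable.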

In [None]:
fig = plt.figure(figsize=(8,8))
sns.heatmap(data.corr().round(2), annot=True)

**Observation:**
* The heatmap shows that the target variable has near-zero linear correlation with the 'month', 'day' and 'weekday' variables (the ones in black). So let's drop them.
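
The same reading can be taken from the correlation matrix directly instead of from the colours. This is a minimal sketch on a made-up frame (`toy`), and the cut-off of 0.1 is an arbitrary assumption chosen for illustration, not a value from the original analysis.

```python
import numpy as np
import pandas as pd

n = 200
close = np.linspace(50, 80, n)  # made-up, steadily rising price
toy = pd.DataFrame({
    'Close': close,
    'Adj Close': close * 0.9,            # made-up adjustment factor
    'weekday': np.tile([1, 2, 3, 4, 5], n // 5),
})

# Absolute Pearson correlation of every feature with the target, strongest first.
target_corr = (
    toy.corr()['Adj Close']
    .drop('Adj Close')
    .abs()
    .sort_values(ascending=False)
)
print(target_corr.round(2))

# Features below the (assumed) threshold are candidates for dropping.
weak = target_corr[target_corr < 0.1].index.tolist()
print(weak)
```

Pearson correlation only measures linear association, so a low value is a reason to consider dropping a feature, not proof that it is independent of the target.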

In [None]:
data = data.drop(['month', 'day', 'weekday'], axis=1)

In [None]:
fig = plt.figure(figsize=(8,8))
sns.heatmap(data.corr().round(2), annot=True)

**Observation:**
* The heatmap looks better now.
* The target variable appears to have a very low correlation with the 'Volume' variable. Let's explore this further.

In [None]:
sns.scatterplot(x=data['Adj Close'], y=data['Volume'])

**Observation:**
* As suspected, there is no significant relationship between the number of shares traded and the adjusted closing price of the stock.

In [None]:
fig = plt.figure(figsize=(25,5))
#ci=95 draws a 95% confidence interval (ci=True would be read as 1%)
sns.barplot(x=data['year'], y=data['Volume'], ci=95)

**Observation:**
* The 'Volume' variable does show some trend over time, but it is not strong enough to justify including it as a feature.

In [None]:
#some more graphs
sns.pairplot(data=data)

## **Data Modelling** 

In [None]:
#feature selection
X = data[['Open', 'High', 'Low', 'Close', 'year']]
y = data['Adj Close']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

**Model Building**

In [None]:
#model building
lr = LinearRegression()
lr.fit(X_train, y_train)
y_preds = lr.predict(X_test)

**Model Validation and Evaluation**

In [None]:
#model performance
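#built-in round() works whether the metric is returned as a NumPy or a plain Python float
print("Mean absolute error:", round(mean_absolute_error(y_test, y_preds), 2))
print("Mean squared error:", round(mean_squared_error(y_test, y_preds), 2))
#R^2 is the share of variance explained, not a classification accuracy
print("Coefficient of determination (R^2):", round(r2_score(y_test, y_preds) * 100, 2), '%')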
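
To make the three metrics concrete, here is a small worked example that computes them by hand with NumPy. The four values in `y_true` and `y_hat` are made up; the formulas are the standard definitions of MAE, MSE and R².

```python
import numpy as np

# Made-up actual and predicted prices, for illustration only.
y_true = np.array([70.0, 72.0, 75.0, 71.0])
y_hat = np.array([69.5, 72.5, 74.0, 71.0])

residuals = y_true - y_hat

mae = np.mean(np.abs(residuals))   # mean absolute error
mse = np.mean(residuals ** 2)      # mean squared error

ss_res = np.sum(residuals ** 2)                  # unexplained variation
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
r2 = 1 - ss_res / ss_tot                         # coefficient of determination

print(mae, mse, round(r2, 4))
```

R² compares the model with a baseline that always predicts the mean of the test targets, so a value near 1 means the model explains most of the variation in the test set.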

**Model Diagnostics and Comparison**

In [None]:
#output table creation (kept in memory; nothing is written to disk)
Output = pd.DataFrame({
    'year' : X_test['year'],
    'Adj Close' : y_preds
})

In [None]:
#output and actual value comparison
fig = plt.figure(figsize=(10,10))
plt.scatter(x=X_test.index, y=y_test, label='Actual')
plt.scatter(x=X_test.index, y=Output['Adj Close'], label='Predicted')
plt.legend()

In [None]:
#line of regression
fig = plt.figure(figsize=(10,10))
sns.regplot(x=y_test, y=y_preds)

In [None]:
print("Coefficient of Linear Regression model:", lr.coef_)
print("Intercept of Linear Regression model:", lr.intercept_)
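
The coefficients and intercept printed above fully describe the fitted model: each prediction is the dot product of a feature row with `coef_`, plus `intercept_`. The sketch below checks this on made-up data; `X_toy`, `y_toy` and `model` are hypothetical names that stand in for the notebook's `X_train`, `y_train` and `lr`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up two-feature data with an exactly linear target: y = 2*x0 - 1*x1 + 3.
X_toy = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]])
y_toy = 2.0 * X_toy[:, 0] - 1.0 * X_toy[:, 1] + 3.0

model = LinearRegression().fit(X_toy, y_toy)

# Rebuild the predictions by hand from the fitted parameters.
manual = X_toy @ model.coef_ + model.intercept_

print(model.coef_, model.intercept_)
print(np.allclose(manual, model.predict(X_toy)))
```

In the notebook, the same check would read `X_test.values @ lr.coef_ + lr.intercept_` and should match `y_preds`.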

## **Conclusion:** 
> **Hence, a multiple linear regression model is developed that predicts the adjusted closing price of Walmart stock with a very small error and a coefficient of determination (R²) above 95% on the test set.**