# Linear Regression Project - Ecommerce clothing store
**Akshaykumar Mashalkar**

This is a project for the [Machine Learning Bootcamp](https://www.udemy.com/course/python-for-data-science-and-machine-learning-bootcamp) course on Udemy. All data used here is generated (fake).

An Ecommerce company based in New York City that sells clothing online is at a fork in the road. Although it is an online clothing store, it also offers in-store style and clothing advice sessions. Customers come into the store, meet with a personal stylist, and then go home and order the clothes they want through either the mobile app or the website.


The company is trying to decide whether to focus its efforts on the mobile app experience or on the website. We'll explore the data and apply linear regression to give the company insights that support this decision.

## Imports

Let's make the necessary imports.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

## Read the data


In [None]:
cust_data = pd.read_csv('../input/new-york-ecommerce-customers/Ecommerce Customers')

The CSV has customer info, such as Email, Address, and their color Avatar. It also has numerical columns:

* **Avg. Session Length**: Average length of in-store style advice sessions.
* **Time on App**: Average time spent on the app, in minutes.
* **Time on Website**: Average time spent on the website, in minutes.
* **Length of Membership**: How many years the customer has been a member.

Let's explore the data a little bit.

In [None]:
cust_data.head()

In [None]:
cust_data.describe()

In [None]:
cust_data.info()

In [None]:
cust_data.isnull().sum()
# The dataset has no null values.

In [None]:
plt.rcParams["patch.force_edgecolor"] = True
sns.set_style('whitegrid')

In [None]:
sns.jointplot(x='Time on Website',y='Yearly Amount Spent',data=cust_data,kind='hex')

In [None]:
sns.jointplot(x='Time on App',y='Yearly Amount Spent',data=cust_data,kind='hex')

Comparing the two plots above, customers spend far less time on the mobile app than on the website for the same amount of money spent. We can check how each feature relates to spending by plotting a heatmap of the correlations between the numeric columns.

In [None]:
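# numeric_only=True (pandas >= 1.5) skips text columns such as Email and Address,
# which would otherwise raise an error in pandas 2.x.
sns.heatmap(cust_data.corr(numeric_only=True), cmap='GnBu', annot=True)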

Between the app and the website, the correlation with the yearly amount spent is much higher for the app, at 0.5, than for the website, at -0.0026. Another finding is that Length of Membership has the highest correlation of all, at 0.81.
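
To read the same numbers without scanning the heatmap, the correlations with the target can be pulled out and sorted. The helper below is a minimal sketch: `rank_correlations` is a hypothetical name, and the `toy` frame holds made-up values that only stand in for the real customer data.

```python
import pandas as pd

def rank_correlations(df: pd.DataFrame, target: str) -> pd.Series:
    """Pearson correlation of each numeric column with `target`, strongest first."""
    corr = df.corr(numeric_only=True)[target].drop(target)
    # Order by absolute strength so strong negative relationships are not buried.
    return corr.reindex(corr.abs().sort_values(ascending=False).index)

# Toy frame with made-up values (NOT the real customer data).
toy = pd.DataFrame({
    "Time on App": [10.0, 11.0, 12.0, 13.0, 14.0],
    "Time on Website": [37.0, 35.0, 38.0, 36.0, 37.0],
    "Email": ["a@x.com", "b@x.com", "c@x.com", "d@x.com", "e@x.com"],
    "Yearly Amount Spent": [400.0, 450.0, 500.0, 550.0, 600.0],
})

ranked = rank_correlations(toy, "Yearly Amount Spent")
print(ranked)
```

In the notebook, the equivalent call would be `rank_correlations(cust_data, 'Yearly Amount Spent')`.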

In [None]:
sns.lmplot(x='Length of Membership',y='Yearly Amount Spent',data=cust_data)
plt.xlim(0,cust_data['Length of Membership'].max()+1)

In [None]:
cust_data.columns

## Splitting the data

We split the dataset into train and test data.

In [None]:
X = cust_data[[ 'Avg. Session Length', 'Time on App', 'Time on Website', 'Length of Membership']]
y = cust_data['Yearly Amount Spent']

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.3,random_state=101)

## Training the model 

In [None]:
from sklearn.linear_model import LinearRegression
linear_model = LinearRegression()
linear_model.fit(X_train,y_train)
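
`LinearRegression` fits an ordinary least squares model: it picks the intercept and coefficients that minimise the sum of squared differences between the predicted and actual targets. As a rough illustration of that idea, and not a description of scikit-learn's internals, the same kind of fit can be sketched with NumPy. The name `fit_ols` and the demo arrays are assumptions made up for this example.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit; returns (intercept, coefficients)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first fitted weight is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    weights, *_ = np.linalg.lstsq(design, y, rcond=None)
    return weights[0], weights[1:]

# Made-up data that follows y = 5 + 2*x1 + 3*x2 exactly.
X_demo = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]])
y_demo = 5.0 + 2.0 * X_demo[:, 0] + 3.0 * X_demo[:, 1]

intercept, coefs = fit_ols(X_demo, y_demo)
print(intercept, coefs)
```

Because the demo data has no noise, the fit should recover the intercept and both slopes up to floating-point error.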

## Model predictions

Let's compare the model's predictions with the true y values. If the model fits well, the points should fall close to a straight diagonal line.

In [None]:
predictions = linear_model.predict(X_test)

In [None]:
plt.scatter(y_test,predictions,edgecolors='r')
plt.xlabel('Y test')
plt.ylabel('Predicted Y values')
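
A scatter plot gives a visual check. A few standard error metrics put numbers on it. The sketch below computes them by hand with NumPy so the formulas are visible. `regression_metrics` is a hypothetical helper, and the demo values are made up rather than taken from the model above.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R^2 for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)
    # R^2 compares the model's squared error with that of always predicting the mean.
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "MAE": np.mean(np.abs(residuals)),
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "R2": 1.0 - ss_res / ss_tot,
    }

# Made-up values, only to show the call.
demo = regression_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
print(demo)
```

In the notebook this would be called as `regression_metrics(y_test, predictions)`. The functions `mean_absolute_error`, `mean_squared_error` and `r2_score` in `sklearn.metrics` compute the same quantities.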

## Conclusion

Let's examine the coefficients.

In [None]:
cust_coef_df = pd.DataFrame(linear_model.coef_,index=X.columns,columns=['Coefficient'])

In [None]:
cust_coef_df

Interpreting the coefficients:

* Holding all other features fixed, a 1 unit increase in Avg. Session Length is associated with an increase of **25.98** total dollars spent.
* Holding all other features fixed, a 1 unit increase in Time on App is associated with an increase of **38.59** total dollars spent.
* Holding all other features fixed, a 1 unit increase in Time on Website is associated with an increase of **0.19** total dollars spent.
* Holding all other features fixed, a 1 unit increase in Length of Membership is associated with an increase of **61.27** total dollars spent.
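
The "holding all other features fixed" reading follows from the model being linear: when one feature changes, the intercept and the other terms cancel, so the change in the prediction is the coefficient times the change in that feature. The sketch below uses the coefficients quoted above. `predicted_change` is a hypothetical helper written for this illustration.

```python
# Coefficients as quoted in the interpretation above (dollars per unit of each feature).
coefficients = {
    "Avg. Session Length": 25.98,
    "Time on App": 38.59,
    "Time on Website": 0.19,
    "Length of Membership": 61.27,
}

def predicted_change(deltas):
    """Change in predicted yearly spend for the given feature changes.

    The intercept and any unchanged features cancel out, so they are not needed.
    """
    return sum(coefficients[name] * delta for name, delta in deltas.items())

app_gain = predicted_change({"Time on App": 1.0})
web_gain = predicted_change({"Time on Website": 1.0})
print(f"+1 minute on App:     {app_gain:.2f} dollars")
print(f"+1 minute on Website: {web_gain:.2f} dollars")
```

Time on App and Time on Website are both measured in minutes, so their coefficients can be compared directly. Length of Membership is measured in years, so its coefficient is not on the same scale.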

> Answer: The data show that each additional minute spent on the app is associated with far more dollars spent than an additional minute on the website. The company therefore has a choice: it can either focus entirely on the app, or improve the website, which is performing poorly. In addition, the company can introduce measures to lengthen existing customers' memberships.

#### **Thanks for reading!** 

#### **Cheers!**

Akshaykumar Mashalkar