### Importing Libraries and Loading Data

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Data Information and Summary Statistics

In [None]:
customers = pd.read_csv('Ecommerce Customers')

In [None]:
customers.head()

In [None]:
customers.info()

In [None]:
customers.describe()

### Visualizing Relationships Between Features and Target


In [None]:
sns.jointplot(x='Time on Website', y='Yearly Amount Spent', data=customers, alpha=0.5)

In [None]:
sns.jointplot(x='Time on App', y='Yearly Amount Spent', data=customers, alpha=0.5)

In [None]:
sns.pairplot(customers, 
             kind='scatter', 
             plot_kws={'alpha':0.4}, 
             diag_kws={'alpha':0.55, 'bins':40})

### Length of Membership vs Yearly Amount Spent

In [None]:
sns.lmplot(x='Length of Membership', 
           y='Yearly Amount Spent', 
           data=customers,
           scatter_kws={'alpha':0.3})

### Splitting the Data
X holds the predictors and y is the target. We want a model that takes the values in X and predicts y using linear regression. We will use the scikit-learn library to build it.

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
customers.info()

In [None]:
X = customers[['Avg. Session Length', 'Time on App', 'Time on Website', 'Length of Membership']]
y = customers['Yearly Amount Spent']

In [None]:
# A cell only displays its last expression automatically, so print both
print(X.head())
print(y.head())

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3, random_state=42)


### Training the Model with Multivariable Regression Using Scikit-Learn

In this section, we create the model and fit it to the training data. The fitted coefficients indicate which input has the largest effect on the output (yearly expenditure). As the plots suggested, the largest coefficient belongs to the 'Length of Membership' predictor, followed by 'Time on App' and 'Avg. Session Length'. 'Time on Website' does not appear to be an important factor in the amount a customer spends per year.

In [None]:
from sklearn.linear_model import LinearRegression
import statsmodels.api as sm


In [None]:
lm = LinearRegression()


In [None]:
lm.fit(X_train, y_train)


In [None]:
# the coefficients
lm.coef_

In [None]:
# R squared on the held-out test set (scoring on all of X would mix in the training rows)
lm.score(X_test, y_test)

In [None]:
# The coefficients in a dataframe
cdf = pd.DataFrame(lm.coef_,X.columns,columns=['Coef'])
print(cdf)
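
Raw coefficients are expressed per unit of each feature, so comparing their sizes is only fair when the features are on similar scales. The sketch below illustrates this with synthetic stand-in data (the names `demo_X`, `demo_y`, `small_scale`, and `large_scale` are made up for this example and are not part of the customers dataset): standardizing the features first gives coefficients per standard deviation, which can change the ranking.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical, NOT the customers dataset):
# two features on very different scales with a known linear relationship.
rng = np.random.default_rng(0)
demo_X = pd.DataFrame({
    'small_scale': rng.normal(0, 1, 500),    # standard deviation about 1
    'large_scale': rng.normal(0, 100, 500),  # standard deviation about 100
})
demo_y = (5 * demo_X['small_scale']
          + 0.2 * demo_X['large_scale']
          + rng.normal(0, 1, 500))

# Raw coefficients: change in y per one unit of each feature.
raw = LinearRegression().fit(demo_X, demo_y)

# Standardized coefficients: change in y per one standard deviation of each feature.
scaled_X = StandardScaler().fit_transform(demo_X)
standardized = LinearRegression().fit(scaled_X, demo_y)

comparison = pd.DataFrame(
    {'Raw coef': raw.coef_, 'Standardized coef': standardized.coef_},
    index=demo_X.columns,
)
print(comparison)
```

In this synthetic case the feature with the smaller raw coefficient has the larger standardized one. The same check could be applied to `X_train` to see whether the ranking reported for the customers data holds after scaling.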

### Training the model with multivariable regression using OLS

Fitting the same model with statsmodels OLS gives us more detail about the model, such as standard errors and p-values for each coefficient.

In [None]:
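# Add an intercept column under a new name so the feature frame X is not overwritten
X_train_const = sm.add_constant(X_train)
model = sm.OLS(y_train, X_train_const)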
model_fit = model.fit()
print(model_fit.summary())

### Predicting Test Data

Now that the model is trained, we can use it to make predictions on the test set and evaluate them. The scatter plot below shows the actual y values against the model's predictions. The model appears to predict accurately.

In [None]:
predictions = lm.predict(X_test)

In [None]:
sns.scatterplot(x=y_test, y=predictions)
plt.ylabel('Predictions')
plt.title('Yearly Amount Spent vs. Model Predictions')

### Evaluation of the model

In [None]:
from sklearn.metrics import mean_squared_error, mean_absolute_error
import math

In [None]:
print('Mean Absolute Error:',mean_absolute_error(y_test, predictions))
print('Mean Squared Error:',mean_squared_error(y_test, predictions))
print('Root Mean Squared Error:',math.sqrt(mean_squared_error(y_test, predictions)))
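
To make the three metrics concrete, here is a small worked example that computes them by hand with NumPy. The `actual` and `predicted` arrays are hypothetical values chosen for easy arithmetic; they are not taken from the customers data.

```python
import numpy as np

# Hypothetical actual and predicted values, for illustration only.
actual = np.array([500.0, 450.0, 520.0, 480.0])
predicted = np.array([510.0, 440.0, 500.0, 480.0])

errors = actual - predicted        # [-10, 10, 20, 0]

mae = np.mean(np.abs(errors))      # (10 + 10 + 20 + 0) / 4 = 10
mse = np.mean(errors ** 2)         # (100 + 100 + 400 + 0) / 4 = 150
rmse = np.sqrt(mse)                # square root of 150, about 12.25

print('MAE: ', mae)
print('MSE: ', mse)
print('RMSE:', rmse)
```

MAE and RMSE are in the same units as the target (here, the yearly amount spent), while MSE is in squared units. RMSE penalizes the single large error of 20 more heavily than MAE does.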

### Residuals

The plot below shows the distribution of the residuals (actual minus predicted values). For a well-behaved linear model, they should be approximately normally distributed.

In [None]:
residuals = y_test-predictions
# distplot is deprecated in recent seaborn releases; histplot with kde=True replaces it
sns.histplot(residuals, bins=30, kde=True)

In [None]:
import scipy.stats as stats

# Q-Q plot of the residuals against a normal distribution
stats.probplot(residuals, dist="norm", plot=plt)
plt.show()
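
A numeric check can complement the histogram and Q-Q plot. The sketch below runs the Shapiro-Wilk test from SciPy on two synthetic samples (`normal_resid` and `skewed_resid` are made-up stand-ins, not the model's residuals) to show how the p-value behaves for bell-shaped versus skewed data.

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for residuals (hypothetical, for illustration only).
rng = np.random.default_rng(0)
normal_resid = rng.normal(0, 10, 200)            # roughly bell-shaped
skewed_resid = rng.exponential(10, 200) - 10     # right-skewed, mean near 0

results = {}
for name, sample in [('normal', normal_resid), ('skewed', skewed_resid)]:
    # Null hypothesis: the sample was drawn from a normal distribution.
    statistic, p_value = stats.shapiro(sample)
    results[name] = (statistic, p_value)
    print(f'{name}: W = {statistic:.3f}, p-value = {p_value:.4g}')
```

A small p-value (for example, below 0.05) is evidence against normality. Calling `stats.shapiro(residuals)` on the model's residuals would apply the same check to the real data.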

### Conclusion

This analysis indicates that length of membership is the most influential factor in predicting the yearly amount spent by customers. Time spent on the app or website might intuitively seem important, but the model indicates that time on the mobile app has a much stronger effect than time on the desktop website. In fact, time spent on the desktop site shows little to no correlation with customer spending.

There are two potential interpretations of these findings. One is that the desktop website may require improvements to better engage customers and convert visits into sales. Alternatively, it may reflect a broader trend where customers are more influenced by mobile app experiences than desktop websites. This suggests that marketing and development efforts could be more effectively focused on enhancing the mobile app experience to drive revenue growth.

While the data provides valuable insights, interpreting these results benefits from domain expertise in online marketing strategies. Nonetheless, the model quantifies the relative importance of the predictors, offering a solid foundation for strategic decisions.