#### __Problem Statement__
The company is trying to decide whether to focus its efforts on its mobile app experience or on its website.

In [None]:
import pandas as pd
import numpy as np
import plotly.express as px
from plotly.offline import iplot

from sklearn import linear_model 
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split, KFold, cross_val_score


import warnings

pd.set_option('future.no_silent_downcasting', True)
pd.options.mode.copy_on_write = "warn"

### __About Dataset__
* `Avatar`: the avatar color chosen by the customer.
* `Avg. Session Length`: the average session duration (in minutes) across the mobile app and website.
* `Time on App`: the total time (in minutes) a customer spends using the mobile app.
* `Time on Website`: the total time (in minutes) a customer spends on the website.
* `Length of Membership`: how long the customer has been a member (in months).
* `Yearly Amount Spent`: the total amount of money a customer spends on the company's products per year.

In [None]:
df = pd.read_csv('./Ecommerce Customers.csv')

In [None]:
df.info()

In [None]:
df.sample(5, random_state=5)

__In this sample, users spend more time on the website than on the mobile app.__

__Hypothesis: the more time users spend on the website, the more money they spend per year. The analysis below tests this hypothesis.__

### __Data Cleaning & Wrangling__

In [None]:
# Describe Categorical Data
df.select_dtypes(include='object').describe()

In [None]:
# Describe Numerical Data
np.round(df.describe().T, 2)

In [None]:
# Replace spaces with underscores and drop literal dots, e.g. 'Avg. Session Length' -> 'Avg_Session_Length'
df.columns = df.columns.str.replace(' ', '_', regex=False).str.replace('.', '', regex=False)
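
To see what the cleaning step produces, here is a small sketch that applies the same two replacements to the column names from the About Dataset section. Passing `regex=False` makes it explicit that `'.'` is a literal dot rather than the regex wildcard. The `clean_columns` helper is a hypothetical name used only for illustration.

```python
import pandas as pd


def clean_columns(columns: pd.Index) -> pd.Index:
    """Hypothetical helper: replace spaces with underscores and drop literal dots."""
    return columns.str.replace(' ', '_', regex=False).str.replace('.', '', regex=False)


# Raw column names as listed in the About Dataset section
raw = pd.Index([
    'Avatar', 'Avg. Session Length', 'Time on App',
    'Time on Website', 'Length of Membership', 'Yearly Amount Spent',
])
cleaned = clean_columns(raw)
print(list(cleaned))
```

The printed names are the ones the `rename` cell below then shortens further (for example `Time_on_App` to `App_Usage`).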

In [None]:
df.rename(columns={
    'Time_on_App':'App_Usage',
    'Time_on_Website' : 'Website_Usage',
    'Length_of_Membership' : 'Membership_Length',
    'Yearly_Amount_Spent' : 'Yearly_Spent'}, inplace=True)

In [None]:
df.head()

### __Correlation Heatmap & Charts__

In [None]:
corr = df.corr(numeric_only=True)

fig = px.imshow(
    corr,
    template='plotly_dark',
    text_auto='0.2f',
    aspect=1,
    color_continuous_scale='orrd',
    title= '<b>Correlation Between Data'
)

fig.update_traces(
    textfont = {
        'size' : 16,
        'family' : 'consolas'
    }
)

fig.update_layout(
    title = {
        'font' : {
            'size' : 28,
            'family' : 'poppins'
        }
    }
)
iplot(fig)

In [None]:
fig = px.scatter_matrix(
    df,
    dimensions= df.select_dtypes(include='number').columns,
    height=950,
    width=900,
    color='Yearly_Spent',
    opacity= .70,
    title= '<b>Relationships Between Numerical Data',
    template= 'plotly_dark'
)

fig.update_layout(
    title= {
        'font' : {
            'size' : 28,
            'family' : 'poppins',
            'color' : 'tomato'
        }
    }
)

iplot(fig)

__The charts show little to no correlation between `Yearly_Spent` and `Website_Usage`, so the hypothesis above is not supported.__
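
The heatmap shows a point estimate only. One way to check whether a near-zero correlation is distinguishable from chance is a permutation test: shuffle one column, recompute Pearson's r, and count how often the shuffled |r| is at least as large as the observed one. The sketch below is a minimal version under stated assumptions: the helper names are hypothetical and the arrays are synthetic, independent stand-ins; with the real data you would pass `df['Website_Usage']` and `df['Yearly_Spent']`.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))


def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for H0: no association between x and y."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = abs(pearson_r(x, y))
    # Count shuffles of y whose |r| is at least as extreme as the observed |r|
    hits = sum(abs(pearson_r(x, rng.permutation(y))) >= observed for _ in range(n_permutations))
    # The +1 terms count the observed arrangement itself, so p is never exactly 0
    return (hits + 1) / (n_permutations + 1)


# Synthetic, independent stand-ins (assumption), not the notebook's data
rng = np.random.default_rng(42)
website = rng.normal(size=200)
spent = rng.normal(size=200)

r = pearson_r(website, spent)
p = permutation_p_value(website, spent)
print(f'r = {r:.3f}, permutation p-value = {p:.3f}')
```

A large p-value means shuffled data produce correlations at least this strong fairly often, which is consistent with "no correlation".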

### __Multiple Linear Regression Equation__
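
The cells below fit an ordinary least squares model on the four numeric features. Written in the standard multiple linear regression form, using the column names from the renaming step:

$$
\widehat{\mathrm{Yearly\_Spent}} = \theta_0 + \theta_1\,\mathrm{Avg\_Session\_Length} + \theta_2\,\mathrm{App\_Usage} + \theta_3\,\mathrm{Website\_Usage} + \theta_4\,\mathrm{Membership\_Length}
$$

Least squares picks the coefficients that minimize the squared prediction error over the $n$ training rows:

$$
\hat{\theta} = \arg\min_{\theta} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2
$$

After fitting, scikit-learn stores $\theta_0$ in `model.intercept_` and $\theta_1, \dots, \theta_4$ in `model.coef_`.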

In [None]:
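# Select the four numeric features by name rather than by column position
features = ['Avg_Session_Length', 'App_Usage', 'Website_Usage', 'Membership_Length']
X = df[features]
y = df[['Yearly_Spent']]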

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.20, random_state=38)

In [None]:
X_train.head()

### __Build the Model__

In [None]:
model = linear_model.LinearRegression()

In [None]:
kf = KFold(n_splits=25, shuffle=True, random_state=99)
# cross_val_score uses the estimator's default scorer, which is R² for LinearRegression
scores = cross_val_score(model, X, y, cv=kf)
print(f'Cross-Validation R²: {np.mean(scores) * 100:.2f}%')
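
The cross-validation cell averages R² over 25 held-out folds. To make the procedure explicit, here is a minimal NumPy sketch of shuffled K-fold cross-validation for OLS with an intercept, assuming `np.linalg.lstsq` as the solver. The function name and the synthetic data are hypothetical, and because the fold assignment differs from scikit-learn's `KFold`, the numbers will not match the notebook's exactly.

```python
import numpy as np


def kfold_r2(X, y, n_splits=5, seed=0):
    """Mean out-of-fold R^2 of an OLS fit with intercept over shuffled K folds."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    rng = np.random.default_rng(seed)
    # Shuffle row indices once, then cut them into n_splits roughly equal folds
    folds = np.array_split(rng.permutation(len(y)), n_splits)
    scores = []
    for k, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        # A leading column of ones makes theta[0] the intercept
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        theta, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        residuals = y[test_idx] - A_test @ theta
        ss_res = np.sum(residuals ** 2)
        ss_tot = np.sum((y[test_idx] - y[test_idx].mean()) ** 2)
        scores.append(1.0 - ss_res / ss_tot)
    return float(np.mean(scores))


# Synthetic data (assumption): 4 features with a known linear signal plus noise
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(300, 4))
y_demo = 5.0 + X_demo @ np.array([3.0, -2.0, 0.5, 1.0]) + rng.normal(scale=0.5, size=300)

score = kfold_r2(X_demo, y_demo, n_splits=10)
print(f'Mean CV R^2: {score * 100:.2f}%')
```

Each fold's model is fitted without ever seeing that fold's rows, which is why the mean out-of-fold R² is a fairer estimate than the training score.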

In [None]:
model.fit(X_train, y_train)

In [None]:
train_score = model.score(X_train, y_train)
print(f'Train Score: {train_score * 100:.2f}%')

### __Predict the Data__

In [None]:
predicts = model.predict(X_test).reshape(-1)

In [None]:
test_score = r2_score(y_test, predicts)
print(f'Test Score: {test_score * 100:.2f}%')
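
Both the train score (`model.score` on a regressor) and the test score (`r2_score`) are R², the share of the target's variance that the model explains, so the percentages are not accuracy rates. A small sketch of the formula with made-up numbers (the `r_squared` name is hypothetical):

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)          # variation the model leaves unexplained
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    return float(1 - ss_res / ss_tot)


y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]
print(f'{r_squared(y_true, y_pred):.3f}')  # 1 - 0.5 / 20, prints 0.975
```

A model that always predicts the mean scores exactly 0, and a perfect model scores 1.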

In [None]:
mae = mean_absolute_error(y_test, predicts)
print(f'Mean Absolute Error: {mae:.2f}')
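
Mean absolute error is not a ratio. It is expressed in the same currency units as `Yearly_Spent`, so it reads as the typical amount by which a prediction misses. A minimal sketch with made-up values (the function name is hypothetical, chosen so it does not shadow scikit-learn's `mean_absolute_error`):

```python
import numpy as np


def mean_absolute_error_np(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.mean(np.abs(y_true - y_pred)))


actual = [500.0, 420.0, 610.0]
predicted = [510.0, 400.0, 615.0]
mae = mean_absolute_error_np(actual, predicted)
print(f'MAE: {mae:.2f}')  # (10 + 20 + 5) / 3, prints MAE: 11.67
```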

In [None]:
predicted_df = pd.DataFrame({
    'Actual_Yearly_Spent':y_test.values.flatten(),
    'Predicted_Yearly_Spent':predicts})
predicted_df.head()

In [None]:
fig = px.scatter(
    data_frame= predicted_df,
    x= 'Predicted_Yearly_Spent',
    y= 'Actual_Yearly_Spent',
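    # Color each point by its residual (predicted minus actual)
    color= predicted_df['Predicted_Yearly_Spent'] - predicted_df['Actual_Yearly_Spent'],
    opacity= .8,
    title= '<b>Predicted vs. Actual',
    template= 'plotly_dark',
    trendline= 'ols'  # the OLS trendline requires the statsmodels package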
)

fig.update_layout(
    title = {
        'font' : {
            'size' : 28,
            'family' : 'poppins',
            'color' : 'tomato'
        }
    }
)
iplot(fig)

### __Conclusion__
* Cross-Validation Score (mean R²): 98.25%
* Train Score (R²): 98.46%
* Test Score (R²): 98.27%

In [None]:
theta = model.coef_.flatten()
print(f'Independent Feature\t Coefficient'.expandtabs(25))
print('='*37)
for i in range(X_train.shape[1]):
    print(f'{X_train.columns[i]}\t{theta[i]:.2f}'.expandtabs(25))
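
To connect the printed coefficients to the regression equation, here is a sketch showing that a fitted `LinearRegression` prediction is just `intercept_ + X @ coef_`, and that raising one feature by one unit shifts the prediction by exactly that feature's coefficient. It uses synthetic data (an assumption) with a one-dimensional target, so `coef_` here has shape `(4,)` rather than `(1, 4)`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption) with 4 features, like X_train
rng = np.random.default_rng(7)
X_demo = rng.normal(size=(50, 4))
y_demo = 2.0 + X_demo @ np.array([1.5, -0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=50)

reg = LinearRegression().fit(X_demo, y_demo)

# A prediction is the intercept plus the dot product of features and coefficients
manual = reg.intercept_ + X_demo @ reg.coef_
print(np.allclose(manual, reg.predict(X_demo)))  # True

# Raising feature 1 by one unit, with the others fixed, shifts the prediction by coef_[1]
x0 = X_demo[:1].copy()
x1 = x0.copy()
x1[0, 1] += 1.0
print(reg.predict(x1)[0] - reg.predict(x0)[0], reg.coef_[1])
```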

### **Insights Based on the Output**

1. **High Model Fit:**
   - The model explains about 98% of the variance in yearly spending: the cross-validation, training, and test R² scores are all above 98%. Because the train and test scores are close, there is no sign of overfitting, and the model should reliably predict annual spend for customers similar to those in this dataset.

2. **Key Influencers on Customer Spending:** each coefficient is the change in predicted yearly spend for a one-unit increase in that feature, holding the other features fixed.
   - **Average Session Length (Coefficient: 25.45):** Each additional minute of average session length is associated with about 25.45 more in yearly spend. Customers with longer sessions tend to spend more, which suggests that improving user engagement could boost sales.
   - **App Usage (Coefficient: 38.79):** App usage is a crucial factor: each additional minute on the app is associated with about 38.79 more in yearly spend. This highlights the importance of enhancing the app experience to encourage higher usage and, in turn, higher spending.
   - **Website Usage (Coefficient: 0.22):** Website usage has almost no association with spending once the other features are accounted for. The website remains necessary, but it does not drive spending the way app usage or session length does.
   - **Length of Membership (Coefficient: 61.49):** Length of membership has the largest coefficient of the four. Longer-tenured customers spend considerably more, which suggests that loyalty and retention strategies pay off; encouraging long-term membership could significantly boost revenue.
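
Raw coefficients are per unit of each feature, so their sizes depend on the units (minutes for the usage features, months for membership, per the data description) and on how spread out each feature is. A common complement when ranking features is to multiply each coefficient by its feature's standard deviation, which gives the change in predicted spend per one-standard-deviation increase. The sketch below assumes that approach; the helper name and the toy values are hypothetical, and in the notebook you would pass `model.coef_.flatten()` and `X_train`.

```python
import numpy as np
import pandas as pd


def standardized_effects(coef, X: pd.DataFrame) -> pd.Series:
    """Change in prediction per one-standard-deviation increase in each feature."""
    coef = np.asarray(coef, dtype=float).ravel()
    return pd.Series(coef * X.std(ddof=0).to_numpy(), index=X.columns).sort_values(ascending=False)


# Toy features (assumption): same shape, but feature_b has 10x the spread of feature_a
X_demo = pd.DataFrame({
    'feature_a': [1.0, 2.0, 3.0, 4.0],
    'feature_b': [10.0, 20.0, 30.0, 40.0],
})
effects = standardized_effects([5.0, 1.0], X_demo)
print(effects)
```

Here `feature_a` has the larger raw coefficient (5 vs. 1), but `feature_b` moves the prediction more per standard deviation, which is why rankings based on raw coefficients should be read with the units in mind.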

### **Actionable Recommendations:**

1. **Enhance User Engagement:**
   - Invest in features and content that encourage users to spend more time on the platform, especially focusing on the app. Interactive features, personalized content, and seamless user experience can help increase session length and app usage.

2. **Focus on Mobile App Development:**
   - Given the high impact of app usage on spending, prioritize improvements and innovations in the mobile app. This could include new functionalities, better user interface design, and regular updates to keep the app engaging and user-friendly.

3. **Strengthen Loyalty Programs:**
   - Develop and enhance loyalty programs that reward long-term membership. Offer exclusive benefits, discounts, and rewards to encourage customers to stay with the platform longer, since membership length has the largest coefficient in the model.

4. **Monitor and Adapt Website Strategy:**
   - While website usage has a lower impact, it should not be neglected. Ensure the website is user-friendly and complements the app experience, and use it to steer users toward the app where possible, since app usage is the stronger driver of spending.

By understanding these insights and implementing the recommendations, stakeholders can effectively leverage the data to drive customer spending and improve overall business performance.