# IMPORTING LIBRARIES


In [44]:
# Importing essential libraries for data analysis, machine learning and visualization
import pandas as pd
import sklearn
import seaborn as sb 
import matplotlib.pyplot as plt
import numpy

# Loading the dataset 

In [None]:

# loading the dataset from its directory
dataset = pd.read_csv('dataset')
dataset

# Exploratory Data Analysis (EDA)

In [None]:
# Display the first few rows of the DataFrame to quickly inspect its data and structure
dataset.head()


In [None]:
# Provide a concise summary of the DataFrame, including its structure, data types, and memory usage
dataset.info()


In [None]:
# Generate descriptive statistics for the DataFrame's numeric columns, including count, mean, std, min, max, and quartiles
numerical_summary = dataset.describe().transpose()

palette = sb.color_palette("viridis", as_cmap=True)

numerical_summary.style.background_gradient(cmap=palette)


# DATA CLEANING

In [None]:
# checking for null values
dataset.isnull().sum()

In [None]:
# checking for duplicated rows
dataset.duplicated().sum()
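
If either check reports a non-zero count, the affected rows need handling before modelling. The following is a minimal sketch of one possible cleanup, assuming that dropping rows is acceptable. The `toy` DataFrame and its values are made up for illustration and are not taken from the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real dataset (values are made up)
toy = pd.DataFrame({
    "Time on App": [12.0, 11.5, np.nan, 12.0],
    "Yearly Amount Spent": [500.0, 480.0, 510.0, 500.0],
})

# Drop exact duplicate rows, keeping the first occurrence
deduped = toy.drop_duplicates()

# Drop rows that still contain missing values
cleaned = deduped.dropna().reset_index(drop=True)

print(len(toy), len(deduped), len(cleaned))
```

Dropping rows is only one option. Filling missing values (for example with the column median) may be preferable when few rows are available.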

# Statistical analysis

In [None]:
# central tendency of yearly spending; mode() can return several values
spent = dataset['Yearly Amount Spent']
print('Mean:', spent.mean())
print('Median:', spent.median())
print('Mode(s):', spent.mode().tolist())


In [None]:
print('Annual revenue:',dataset['Yearly Amount Spent'].sum())
print('Total time on App: ',dataset['Time on App'].sum())

# Visualization


In [None]:
numeric_dataset = dataset.select_dtypes(include='number')

corr = numeric_dataset.corr()

mask_ut = numpy.triu(numpy.ones(corr.shape)).astype(numpy.bool_)

plt.figure(figsize=(10, 6))
sb.heatmap(corr, annot=True, fmt=".2f", cmap="icefire", mask=mask_ut)
plt.title("Correlation Matrix")
plt.tight_layout()
plt.show()

# Correlations

In [None]:
# exploratory data analysis
# Do we make more money if a customer spends more time on the website?

sb.jointplot(x="Time on Website",y="Yearly Amount Spent", data=dataset, alpha=0.5)
plt.show()


From the above diagram, we can see that there is no clear correlation between customers' time on the website and the money they spend.

In [None]:
# Do we make more money if a customer spends more time on the app?
sb.jointplot(x="Time on App",y="Yearly Amount Spent", data=dataset, alpha=0.5)
plt.show()

From the above diagram, we can see that there is a positive correlation between customers' time on the app and the money they spend.
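
The visual reading of the joint plots can be cross-checked numerically. Below is a minimal sketch, assuming Pearson correlation computed with pandas `DataFrame.corr`. The `toy` table and its values are hypothetical stand-ins, so the printed numbers say nothing about the real dataset.

```python
import pandas as pd

# Hypothetical toy values, not taken from the real dataset
toy = pd.DataFrame({
    "Time on App": [10.0, 11.0, 12.0, 13.0, 14.0],
    "Time on Website": [37.0, 36.0, 38.0, 36.5, 37.5],
    "Yearly Amount Spent": [420.0, 460.0, 500.0, 545.0, 580.0],
})

target = "Yearly Amount Spent"

# Pearson correlation of every other numeric column with the target,
# sorted from strongest positive to weakest
corr_with_target = toy.corr()[target].drop(target).sort_values(ascending=False)
print(corr_with_target)
```

A coefficient near 0 matches a shapeless cloud of points, while a value near 1 or -1 matches a clear upward or downward trend.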

In [None]:
# let's compare the other variables to see what brings in more money by looking for correlations
sb.pairplot(dataset,kind='scatter',plot_kws={'alpha':0.5})
plt.show()

We have seen that there is a correlation between the time on the app and the amount of money spent.
The pair plot also suggests that the longer the membership, the more money is spent.

In [None]:
# checking the relationship between the length of membership and the money spent
sb.lmplot(data=dataset,x="Length of Membership", y="Yearly Amount Spent",scatter_kws={'alpha':0.5})
plt.show()

Next, we build a linear regression model for the e-commerce business to predict yearly customer spending, which helps estimate the money we can earn over a period of time if we invest in getting more customers.
This helps in making investment decisions.

# BUILDING THE MODEL (LINEAR REGRESSION)

In [13]:
from sklearn.model_selection import train_test_split 
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error
from sklearn.linear_model import Lasso
from sklearn import metrics

In [14]:
# features (x) and target (y)
x = dataset[['Avg. Session Length','Time on App','Time on Website','Length of Membership']]
y = dataset['Yearly Amount Spent']

In [15]:
# splitting the data into training and test sets
x_train,x_test, y_train, y_test = train_test_split(x, y , test_size=0.3, random_state=42)
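
To make the effect of `test_size` and `random_state` concrete, here is a minimal sketch on a hypothetical ten-row table (the names `features` and `target` and all values are made up). It shows that 30% of the rows are held out and that repeating the call with the same seed reproduces the same split.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy features and target (values are made up)
features = pd.DataFrame({"Time on App": [float(i) for i in range(10)]})
target = pd.Series([float(i) * 40 for i in range(10)], name="Yearly Amount Spent")

# test_size=0.3 holds out 30% of the rows; random_state fixes the shuffle
a_train, a_test, b_train, b_test = train_test_split(
    features, target, test_size=0.3, random_state=42
)

# Repeating the call with the same random_state gives the same split
a_train2, a_test2, _, _ = train_test_split(
    features, target, test_size=0.3, random_state=42
)

print(len(a_train), len(a_test))
```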

# Training the model


In [16]:
regression = LinearRegression()

In [None]:
# fitting the model to the training data
regression.fit(x_train,y_train)

In [None]:
coefficient = pd.DataFrame(regression.coef_,x.columns,columns=['coef'])
print(coefficient)
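
Each coefficient is the change in the predicted value for a one-unit increase in that feature, with the other features held fixed. The sketch below illustrates this on synthetic data with known weights (the weights 3.0 and -2.0 and the intercept 5.0 are made up for the example). It checks that a prediction is the intercept plus the weighted sum of the features.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data with known weights (hypothetical, for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0  # no noise, so the fit is exact

model = LinearRegression().fit(X, y)

# A prediction is the intercept plus the dot product of features and coefficients
manual = model.intercept_ + X @ model.coef_

print(model.coef_, model.intercept_)
```

On real data the recovered coefficients are estimates, and their sizes are only comparable across features measured in similar units.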

In [None]:
# predictions
predictions = regression.predict(x_test)
predictions

In [None]:
# is the money we predicted similar to the actual amounts in the test set?
sb.scatterplot(x=predictions,y=y_test)
plt.xlabel("predictions")
plt.title('evaluation of prediction')
plt.show()

## Error evaluation of model


In [None]:
# using metrics to measure the error
print("mean squared error:", mean_squared_error(y_test, predictions))
print("mean absolute error:", mean_absolute_error(y_test, predictions))
# a mean absolute error of about 8 means our predictions are off by roughly 8 dollars on average
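
The two metrics measure different things, so it helps to see how each is computed. This is a minimal sketch on hypothetical actual and predicted values (made up, not model output). It also adds RMSE and R², which the notebook does not report, as optional extras.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual and predicted values (made up for illustration)
y_true = np.array([500.0, 450.0, 520.0, 480.0])
y_pred = np.array([508.0, 444.0, 510.0, 488.0])

errors = y_true - y_pred
mae = np.mean(np.abs(errors))   # average size of the error, in dollars
mse = np.mean(errors ** 2)      # average squared error, in dollars squared
rmse = np.sqrt(mse)             # back in dollars; large errors weigh more
r2 = r2_score(y_true, y_pred)   # share of the target's variance explained

print(mae, mse, rmse, r2)
```

Because MSE is in squared units, only MAE (or RMSE) can be read directly as "dollars of error per prediction".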

## INSIGHTS AND TRENDS

The following insights can be gained from the analysis of the data:
1. High Average Spending: The mean "Yearly Amount Spent" is $499.31, indicating that customers are spending a significant amount on average. This suggests a potentially high-value customer base.
2. Consistent Session Length: The average session length of 33.05 minutes is relatively stable, with a small standard deviation. This suggests that customers are engaging with the platform for a reasonable duration on average.
3. Time on App vs. Website: The average time spent on the app (12.05 minutes) is significantly less than the time spent on the website (37.06 minutes). This could indicate that the website might be more feature-rich or the app might have a more streamlined, task-oriented design.
4. Membership Length: The average membership length of 3.53 years is fairly long, suggesting strong customer loyalty. The relatively low standard deviation indicates that there's a good deal of consistency in how long customers stay members.


# Trends and Observations:

1. Symmetric Distribution: The 50th percentile (median) for "Yearly Amount Spent" ($498.89) is very close to the mean ($499.31). This indicates a fairly symmetrical distribution of spending, with few extreme outliers on either side of the average.
2. Time Spent on App: The maximum time spent on the app (15.13 minutes) is only about 3 minutes above the app average (12.05 minutes) and remains well below the maximum time spent on the website (40.01 minutes). App usage therefore falls within a fairly narrow range, though users near the maximum may be engaging more deeply with the app, perhaps for specific tasks or features.
3. Membership Duration: The maximum membership length of 6.92 years is considerably higher than the average of 3.53 years, indicating that some customers have been loyal members for a very long time.
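
The mean-versus-median comparison in the first observation can be complemented with a skewness estimate. Below is a minimal sketch, assuming pandas `Series.skew` as the measure. Both series are hypothetical toy samples, chosen only to contrast a symmetric case with a right-skewed one.

```python
import pandas as pd

# Hypothetical toy samples (made up): one symmetric, one right-skewed
symmetric = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
right_skewed = pd.Series([1.0, 1.0, 1.0, 2.0, 10.0])

for name, s in [("symmetric", symmetric), ("right_skewed", right_skewed)]:
    # mean close to median and skewness near 0 suggest a symmetric distribution
    print(name, "mean:", s.mean(), "median:", s.median(), "skew:", s.skew())
```

A skewness near 0 together with a small gap between mean and median supports reading the distribution as roughly symmetric.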


# Highlights
1. Focus on App Engagement: The relatively lower average time spent on the app could be an area for improvement. Consider exploring ways to enhance app features, provide more personalized content, or offer incentives for increased app usage.
2. Customer Retention Strategies: The long average membership length suggests that existing customer retention is already good. However, consider strategies to further incentivize long-term engagement and loyalty, such as tiered rewards programs or exclusive offers for high-value customers.
3. Value-Oriented Marketing: The high average spending could be a sign that your customer base is willing to invest in your products or services. You might be able to target them with premium offers or more personalized marketing messages.




