# **Introduction**

====================================================================================================================================

This program is designed to analyze data and solve the defined problem statement. It involves data exploration, data cleaning, and data analysis. During data exploration, the columns, data types, and content of the data are examined. In data cleaning, relevant columns are selected, data formats are adjusted to the appropriate types, duplicates are removed, and missing values are handled. The data is then analyzed using data visualization, descriptive statistics, and inferential statistics. Conclusions and recommendations are drawn from the analysis results. After the analysis, the data visualizations and statistical insights are also incorporated into a Tableau dashboard. This program uses a Kaggle dataset. The problem and the exploration questions are based on my own intuition from examining the raw data. The topic is "Stagnant Revenue".


Dataset:
https://www.kaggle.com/datasets/kartikeybartwal/ecommerce-product-recommendation-collaborative/data

Tableau:
https://public.tableau.com/views/DataVisualization_17249250434480/Dashboard1?:language=en-GB&publish=yes&:sid=&:redirect=auth&:display_count=n&:origin=viz_share_link

====================================================================================================================================

## Problem Statement

**Specific:** The store's revenue has stagnated and needs to be increased.

**Measurable:** The store's GMV from sales increases by 30%.

**Achievable:** GMV revenue can be increased by raising the average spending per customer on the website.

**Relevant:** Increasing the average customer spending can boost the GMV and directly impact the store's revenue growth.

**Timebound:** The target is to increase GMV by 30% within a 1-month timeframe.

**Problem Statement:** Increase the store's revenue by raising GMV by 30% over the next month. One way to achieve this is by increasing the average customer spending.

## Problem Breakdown

Problem breakdown to be analyzed:

1. What is the proportion of user-selected product categories in relation to total revenue?
2. Which location generates the most revenue?
3. What is the gender proportion in relation to total revenue?
4. What is the proportion of user age categories based on total revenue?
5. How many users subscribe to the newsletter and how many do not?
6. Is there a correlation between user income and the average order value spent by users?
7. Is there a correlation between total revenue and the average order value spent by users?
8. Is there a correlation between user interests and product categories?
9. Is there a correlation between age categories and product categories?
10. Is there a difference in shopping frequency between users who subscribe to the newsletter and those who do not?
11. Is there a difference in the last login duration between users who subscribe to the newsletter and those who do not?
12. Is there a difference in the average spending between users who subscribe to the newsletter and those who do not?

# **Data Loading**

In [2]:
# for data manipulation
import pandas as pd

# for statistic analysis
from scipy import stats

# for data visualization
import plotly.express as px

# read csv
data = pd.read_csv('user_personalized_features.csv')

In [3]:
# show rows and columns count in dataset
data.shape

(1000, 15)

From the output above, the dataset contains 1,000 rows and 15 columns.

In [4]:
# show dataset summary information
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 15 columns):
 #   Column                       Non-Null Count  Dtype 
---  ------                       --------------  ----- 
 0   Unnamed: 0                   1000 non-null   int64 
 1   User_ID                      1000 non-null   object
 2   Age                          1000 non-null   int64 
 3   Gender                       1000 non-null   object
 4   Location                     1000 non-null   object
 5   Income                       1000 non-null   int64 
 6   Interests                    1000 non-null   object
 7   Last_Login_Days_Ago          1000 non-null   int64 
 8   Purchase_Frequency           1000 non-null   int64 
 9   Average_Order_Value          1000 non-null   int64 
 10  Total_Spending               1000 non-null   int64 
 11  Product_Category_Preference  1000 non-null   object
 12  Time_Spent_on_Site_Minutes   1000 non-null   int64 
 13  Pages_Viewed                 1000 

In [5]:
# check duplicated data
data.duplicated().sum()

0

In [6]:
# check missing value
data.isna().sum()

Unnamed: 0                     0
User_ID                        0
Age                            0
Gender                         0
Location                       0
Income                         0
Interests                      0
Last_Login_Days_Ago            0
Purchase_Frequency             0
Average_Order_Value            0
Total_Spending                 0
Product_Category_Preference    0
Time_Spent_on_Site_Minutes     0
Pages_Viewed                   0
Newsletter_Subscription        0
dtype: int64

**Insights:**

1. The 'Unnamed: 0' column does not contain important information.
2. The data types for the Income, Average_Order_Value, and Total_Spending columns, which contain price/nominal data, are in integer format while the User_ID column is of object type.
3. The dataset contains 1,000 rows of data.
4. There are 15 columns, and all column names are unique.
5. No missing values were found based on the Non-Null Count column.

**Recommendations:**

1. The 'Unnamed: 0' column will be removed (solving insight 1).
2. Change the data types for the Income, Average_Order_Value, and Total_Spending columns to float, and change the User_ID column type to integer (solving insight 2).
3. Check for any duplicate data during the data cleaning process (solving insight 3).

In [4]:
# show dataset columns name
data.columns

Index(['Unnamed: 0', 'User_ID', 'Age', 'Gender', 'Location', 'Income',
       'Interests', 'Last_Login_Days_Ago', 'Purchase_Frequency',
       'Average_Order_Value', 'Total_Spending', 'Product_Category_Preference',
       'Time_Spent_on_Site_Minutes', 'Pages_Viewed',
       'Newsletter_Subscription'],
      dtype='object')

There are a total of 15 columns containing data to be analyzed. The 'Unnamed: 0' column is likely an index column that was converted when the file was saved as CSV. The explanations for the other columns are as follows:


| Column Name                 |                                         Description                                                         |
|:---------------------------:|:-----------------------------------------------------------------------------------------------------------:|
| User_ID                     | Unique identifier for each user, facilitating tracking and analysis.                                        |
| Age                         | Age of the user, providing insights into demographic preferences.                                           |
| Gender                      | Gender of the user, enabling gender-specific recommendations and targeting.                                 |
| Location                    | User's location categorized into suburban, rural, or urban areas, influencing preferences & shopping habits.|
| Income                      | User's income level indicating purchasing power and affordability.                                          |
| Interests                   | User's interests such as sports, fashion, technology, etc., guiding content and product recommendations.    |
| Last_Login_Days_Ago         | Number of days since the user's last login, reflecting engagement frequency.                                |
| Purchase_Frequency          | Frequency of purchases made by the user, indicating shopping habits and loyalty.                            |
| Average_Order_Value         | Average value of orders placed by the user, crucial for pricing and promotion strategies.                   |
| Total_Spending              | Total amount spent by the user, indicating lifetime value and purchasing behavior.                          |
| Product_Category_Preference | User's preference for specific product categories, informing personalized recommendations.                  |
| Time_Spent_on_Site_Minutes  | Duration of time spent by the user on the ecommerce platform, indicating engagement level.                  |
| Pages_Viewed                | Number of pages viewed by the user during their visit, reflecting browsing activity and interest.           |
| Newsletter_Subscription     | Indicates whether the user is subscribed to newsletters, enabling targeted marketing campaigns.             |

In [5]:
# show 5 top rows from dataset
data.head()

|   | Unnamed: 0 | User_ID | Age | Gender | Location | Income | Interests | Last_Login_Days_Ago | Purchase_Frequency | Average_Order_Value | Total_Spending | Product_Category_Preference | Time_Spent_on_Site_Minutes | Pages_Viewed | Newsletter_Subscription |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | #1 | 56 | Male | Suburban | 38037 | Sports | 5 | 7 | 18 | 2546 | Books | 584 | 38 | True |
| 1 | 1 | #2 | 46 | Female | Rural | 103986 | Technology | 15 | 7 | 118 | 320 | Electronics | 432 | 40 | False |
| 2 | 2 | #3 | 32 | Female | Suburban | 101942 | Sports | 28 | 1 | 146 | 3766 | Apparel | 306 | 1 | True |
| 3 | 3 | #4 | 60 | Female | Suburban | 71612 | Fashion | 18 | 3 | 163 | 4377 | Apparel | 527 | 29 | False |
| 4 | 4 | #5 | 25 | Male | Suburban | 49725 | Travel | 2 | 5 | 141 | 4502 | Health & Beauty | 53 | 10 | True |


From the output above, the following information was obtained:

1. Demographic information about customers from the Age, Gender, Location, and Income columns.
2. Information regarding customer preferences and product choices from the Interests and Product_Category_Preference columns.
3. Customer income information from the Income column.
4. Information on customer buying behavior from the Purchase_Frequency, Average_Order_Value, and Total_Spending columns.
5. Engagement metrics from the Last_Login_Days_Ago, Time_Spent_on_Site_Minutes, and Pages_Viewed columns.
6. Information regarding the offering channel from the Newsletter_Subscription column.
7. The User_ID column contains the character '#' in each row, which will need to be removed to facilitate the analysis process if necessary.
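The User_ID cleanup mentioned in the last point is not carried out later in this program. A minimal sketch of how it could be done is shown below, assuming every value has the form `#<number>` as in the rows above. The `sample` frame is a hypothetical stand-in for the real data.

```python
import pandas as pd

# Hypothetical stand-in frame mimicking the User_ID format shown above
sample = pd.DataFrame({'User_ID': ['#1', '#2', '#1000']})

# Strip the leading '#' and convert the remaining digits to integers
sample['User_ID'] = sample['User_ID'].str.lstrip('#').astype(int)

print(sample['User_ID'].tolist())
```

If some IDs did not follow this pattern, `astype(int)` would raise an error, which would be a useful signal that the column needs a closer look.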

In [6]:
# show 5 bottom rows from dataset
data.tail()

|   | Unnamed: 0 | User_ID | Age | Gender | Location | Income | Interests | Last_Login_Days_Ago | Purchase_Frequency | Average_Order_Value | Total_Spending | Product_Category_Preference | Time_Spent_on_Site_Minutes | Pages_Viewed | Newsletter_Subscription |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 995 | 995 | #996 | 22 | Male | Urban | 104162 | Technology | 18 | 7 | 83 | 607 | Apparel | 542 | 21 | False |
| 996 | 996 | #997 | 40 | Male | Urban | 99003 | Travel | 16 | 3 | 180 | 431 | Apparel | 359 | 4 | True |
| 997 | 997 | #998 | 27 | Female | Urban | 72395 | Technology | 8 | 8 | 130 | 650 | Books | 331 | 34 | False |
| 998 | 998 | #999 | 61 | Male | Rural | 59758 | Travel | 1 | 3 | 152 | 1041 | Electronics | 427 | 28 | True |
| 999 | 999 | #1000 | 19 | Male | Rural | 74312 | Sports | 12 | 7 | 175 | 827 | Health & Beauty | 497 | 38 | True |


From the bottom rows of the data, it can be observed that the rows are well populated. However, a check will be conducted during the data cleaning process to see if there are any duplicate or missing values in the table.

# **Data Cleaning**

In [7]:
# check if any data duplicate from dataset
data.duplicated().sum()

0

From the output above, there's no duplicate data in this dataset.

In [8]:
# check if any missing value from dataset
data.isnull().sum()

Unnamed: 0                     0
User_ID                        0
Age                            0
Gender                         0
Location                       0
Income                         0
Interests                      0
Last_Login_Days_Ago            0
Purchase_Frequency             0
Average_Order_Value            0
Total_Spending                 0
Product_Category_Preference    0
Time_Spent_on_Site_Minutes     0
Pages_Viewed                   0
Newsletter_Subscription        0
dtype: int64

From the output above, there's no missing values in this dataset.

The 'Unnamed: 0' column was dropped because it does not contain important information.

In [9]:
# delete 'Unnamed: 0' column
c_data = data.drop(['Unnamed: 0'], axis=1)

To facilitate the analysis of age data, user ages will be grouped into 5 age categories: ages 18-24, 25-34, 35-44, 45-54, and 55-64. After that, a new column named Age_Category will be created to contain the age categories of the users.

In [10]:
# Map each user's age to one of five age categories
Age_Category = []

for age in c_data['Age']:
    if 18 <= age <= 24:
        Age_Category.append('18-24 Years')
    elif 25 <= age <= 34:
        Age_Category.append('25-34 Years')
    elif 35 <= age <= 44:
        Age_Category.append('35-44 Years')
    elif 45 <= age <= 54:
        Age_Category.append('45-54 Years')
    elif 55 <= age <= 64:
        Age_Category.append('55-64 Years')
    else:
        Age_Category.append('Out of range')

# Store the categories in a new column
c_data['Age_Category'] = Age_Category
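
As an alternative to the explicit loop, the same grouping can be sketched with `pd.cut`. This is only an illustration on a hypothetical `ages` series. The bin edges and the short labels are assumptions chosen to reproduce the five ranges described above.

```python
import pandas as pd

# Hypothetical ages covering the boundaries of every category
ages = pd.Series([18, 24, 25, 44, 45, 64, 70], name='Age')

# Bins are right-inclusive by default: (17, 24], (24, 34], ..., (54, 64]
bins = [17, 24, 34, 44, 54, 64]
labels = ['18-24', '25-34', '35-44', '45-54', '55-64']

age_category = pd.cut(ages, bins=bins, labels=labels)

# Ages outside 18-64 become NaN, so give them an explicit label
age_category = age_category.cat.add_categories('Out of range').fillna('Out of range')

print(age_category.tolist())
```

The result is a categorical column whose categories keep their age order, which can help when sorting charts by age group.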

The data types of the 'Income', 'Average_Order_Value', and 'Total_Spending' columns are then changed to float because they represent monetary values.

In [11]:
# changing 'Income' column data type
c_data['Income'] = c_data['Income'].astype(float)

In [12]:
# changing 'Average_Order_Value' column data type
c_data['Average_Order_Value'] = c_data['Average_Order_Value'].astype(float)

In [13]:
# changing 'Total_Spending' column data type
c_data['Total_Spending'] = c_data['Total_Spending'].astype(float)
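
The three conversions above could also be written as a single call by passing a mapping to `astype`. The sketch below uses a small hypothetical `sample` frame built from the first two rows shown earlier. The `money_cols` list is an illustrative name.

```python
import pandas as pd

# Hypothetical two-row sample using values from the first rows of the dataset
sample = pd.DataFrame({
    'Income': [38037, 103986],
    'Average_Order_Value': [18, 118],
    'Total_Spending': [2546, 320],
})

# Convert all monetary columns to float in one step
money_cols = ['Income', 'Average_Order_Value', 'Total_Spending']
sample = sample.astype({col: float for col in money_cols})

print(sample.dtypes)
```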

## Export DataFrame to CSV File for Use in Tableau

In [14]:
# export data to csv file
c_data.to_csv('cleaned_data.csv', index=False)

# **Visualization and Statistic Analysis**

## What is the proportion of user-selected product categories in relation to total revenue?

To see the proportion of product categories against total revenue, we first group the data by product category and calculate the revenue generated by each category, and then display it using a bar chart. The construction is as follows:

In [15]:
# Summing revenue from each product category
cat_bar = c_data.groupby('Product_Category_Preference')['Total_Spending'].sum().reset_index()
cat_bar = cat_bar.sort_values('Total_Spending', ascending=False)

# setup the bar chart visualization
fig = px.bar(cat_bar, x='Product_Category_Preference', y='Total_Spending', 
             color='Product_Category_Preference',
             color_discrete_sequence=['green', 'blue', 'purple', 'brown', 'orange'],
             title='Total Revenue per Product Category')

# Changing the x and y axis titles and some layout
fig.update_layout(
    xaxis_title='Product Category',
    yaxis_title='Total Revenue (in $)',
    title_x=0.5)

# Displaying the chart
fig.show()


From the output obtained, the Apparel category generates the highest revenue, while the Books category generates the lowest revenue. Additionally, the Home & Kitchen and Electronics product categories generate nearly equivalent revenue, with a difference of $14,000.
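
The bar chart shows absolute totals. To express each category as a share of total revenue, the grouped sums can be divided by their overall sum. The sketch below uses a hypothetical `toy` frame with made-up numbers, so the percentages are illustrative only.

```python
import pandas as pd

# Hypothetical data, not taken from the real dataset
toy = pd.DataFrame({
    'Product_Category_Preference': ['Apparel', 'Books', 'Apparel', 'Electronics'],
    'Total_Spending': [300.0, 100.0, 200.0, 400.0],
})

# Revenue per category
revenue = toy.groupby('Product_Category_Preference')['Total_Spending'].sum()

# Share of total revenue per category, in percent
share_pct = (revenue / revenue.sum() * 100).round(1).sort_values(ascending=False)

print(share_pct)
```

The same pattern would apply to the location, gender, and age category breakdowns by changing the grouping column.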

## Which location generates the most revenue?

To see the proportion of user locations against total revenue, we first group the user locations and calculate the revenue generated by each location, and then display it using a pie chart. The construction is as follows:

In [16]:
# Summing revenue from each location 
loc_pie = c_data.groupby('Location')['Total_Spending'].sum().reset_index()

# setup the pie chart visualization 
fig = px.pie(loc_pie, values='Total_Spending', names='Location', title='Proportion of User Locations Generating the Most Revenue')

# Changing the layout title 
fig.update_layout(title_x=0.5)

# Displaying the chart
fig.show()


Users from the Suburban and Urban locations generate the highest revenue, each accounting for 35.2% of the total. In contrast, users located in Rural areas account for 29.6%.

## What is the gender proportion in relation to total revenue?

To see the proportion of user gender against total revenue, we first group the data by gender and calculate the revenue generated by each gender, and then display it using a horizontal bar chart. The construction is as follows:

In [17]:
# Summing revenue from each gender 
gen_barh = c_data.groupby('Gender')['Total_Spending'].sum().reset_index()
gen_barh = gen_barh.sort_values('Total_Spending', ascending=False)

# setup the horizontal bar chart visualization
fig = px.bar(gen_barh, x='Total_Spending', y='Gender',
             color='Gender',
             color_discrete_sequence=['blue', 'magenta'],
             title='Proportion of User Gender from Total Spending',
             orientation='h')

# Changing the x-axis title and some layout
fig.update_layout(
    xaxis_title='Total Revenue (in $)')
fig.update_layout(title_x=0.5)

# Displaying the chart
fig.show()

Male users generate more revenue than female users, but the gap is small, at 5.23%.

## What is the proportion of user age categories based on total revenue?

To see the proportion of user age categories against total revenue, we first group the age categories and calculate the revenue generated by each age category, and then display it using a pie chart. The construction is as follows:

In [18]:
# Summing the revenue from each age category
age_pie = c_data.groupby('Age_Category')['Total_Spending'].sum().reset_index()

# setup the pie chart visualization
fig = px.pie(age_pie, values='Total_Spending', names='Age_Category', title='Proportion of User Age Categories from Total Revenue')

# Changing the title layout 
fig.update_layout(title_x=0.5)

# Displaying the chart
fig.show()

**Insights:**
- Users aged 18-24 generate the lowest revenue, possibly because many of them are still in education.
- Users aged 25-34 and 55-64 generate nearly the same revenue, which may reflect a less financially stable phase or reduced activity in the workforce.
- Users aged 35-44 and 45-54 contribute the most to total revenue, which may indicate that they are generally more financially stable.
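
Total revenue per age category depends on both the number of users in the category and how much each user spends. A quick way to separate the two effects is to look at the user count and mean spending side by side. The sketch below uses a hypothetical `toy` frame with made-up numbers, and the output column names are illustrative.

```python
import pandas as pd

# Hypothetical data: two groups with the same mean spending but different sizes
toy = pd.DataFrame({
    'Age_Category': ['18-24', '18-24', '35-44', '35-44', '35-44'],
    'Total_Spending': [1000.0, 3000.0, 2000.0, 2000.0, 2000.0],
})

# User count, total revenue, and mean revenue per user for each category
summary = toy.groupby('Age_Category')['Total_Spending'].agg(
    users='count', total='sum', mean_per_user='mean')

print(summary)
```

In this toy example the 35-44 group has the larger total only because it has more users, which is the kind of pattern worth ruling out before reading totals as a sign of spending power.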

## How many users subscribe to the newsletter and how many do not?

In [19]:
# Counting the number of users for each newsletter subscription status
nl_bar = c_data.groupby('Newsletter_Subscription')['User_ID'].count().reset_index()
nl_bar = nl_bar.sort_values('User_ID', ascending=False)

# setup a bar plot visualization
fig = px.bar(nl_bar, x='User_ID', y='Newsletter_Subscription',
             color='Newsletter_Subscription',
             color_discrete_sequence=['green', 'red'],
             title='Proportion of User Newsletter Subscription Status')

# Changing the x and y axis titles
fig.update_layout(
    xaxis_title='Number of Users',
    yaxis_title='Newsletter Subscription Status')

# Displaying the chart
fig.show()


The number of users who subscribe to the newsletter is 2.83% higher than the number of users who do not subscribe.

The two groups are nearly equal in size. The slight majority of subscribers may suggest that some users find value in the content or incentives offered through the subscription, but the counts alone do not show this.
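
A percentage comparison between two group sizes can be computed in more than one way, and the results differ. The sketch below shows two common definitions on hypothetical flags. It is not known which definition produced the figure quoted above, so the numbers here are illustrative only.

```python
import pandas as pd

# Hypothetical subscription flags: 6 subscribers and 4 non-subscribers
flags = pd.Series([True] * 6 + [False] * 4, name='Newsletter_Subscription')

n_sub = int(flags.sum())        # number of subscribers
n_nonsub = int((~flags).sum())  # number of non-subscribers

# Relative difference: how much larger the subscriber count is than the non-subscriber count
relative_diff_pct = (n_sub - n_nonsub) / n_nonsub * 100

# Percentage-point gap between the two shares of all users
point_gap = (n_sub - n_nonsub) / len(flags) * 100

print(f"relative difference: {relative_diff_pct:.1f}%")
print(f"percentage-point gap: {point_gap:.1f}")
```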

## Is there a correlation between user income and the average order value spent by users?

The correlation test between user income (continuous numeric) and average order value (continuous numeric) is conducted using the Spearman correlation test. The formulation of the hypotheses is as follows:

H0: There is no correlation between user income and average order value.

H1: There is a correlation between user income and average order value.

The calculations would proceed as follows:

In [20]:
# Using the Spearman correlation test formula
corr_rho, pval_s = stats.spearmanr(c_data['Income'], c_data['Average_Order_Value'])

# Displaying the results of the Spearman correlation test
print(f"rho-correlation: {corr_rho:.2f}")

rho-correlation: -0.01


The Spearman correlation coefficient is -0.01. This value is very close to zero, indicating essentially no monotonic relationship between user income and average order value.
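
The cell above prints only the coefficient, although `spearmanr` also returns a p-value (stored in `pval_s`). A minimal sketch of how the hypotheses could be decided with that p-value is shown below. It runs on synthetic independent samples rather than the real columns, and the significance level `alpha = 0.05` is an assumption.

```python
import numpy as np
from scipy import stats

# Synthetic, independent samples standing in for Income and Average_Order_Value
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = rng.normal(size=200)

# spearmanr returns both the coefficient and the p-value
rho, p_value = stats.spearmanr(x, y)

alpha = 0.05  # assumed significance level
decision = 'reject H0' if p_value < alpha else 'fail to reject H0'

print(f"rho = {rho:.2f}, p-value = {p_value:.3f} -> {decision}")
```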

## Is there a correlation between total revenue and the average order value spent by users?

The correlation test between total revenue (continuous numeric) and average order value (continuous numeric) is conducted using the Spearman correlation test. The formulation of the hypotheses is as follows:

H0: There is no correlation between total revenue and average order value.

H1: There is a correlation between total revenue and average order value.

The calculations would proceed as follows:

In [21]:
# Using the Spearman correlation test formula
corr_rho, pval_s = stats.spearmanr(c_data['Total_Spending'], c_data['Average_Order_Value'])

# Displaying the results of the Spearman correlation test
print(f"rho-correlation: {corr_rho:.2f}")

rho-correlation: 0.10


The Spearman correlation coefficient is 0.10. This positive value is close to zero, indicating a very weak positive correlation between total revenue and average order value.

## Is there a correlation between user interests and product categories?

The correlation test between interest (categorical-nominal) and product category (categorical-nominal) is conducted using the chi-squared correlation test. The formulation of the hypotheses is as follows:

H0: There is no correlation between interest and product category.

H1: There is a correlation between interest and product category.

The calculations would proceed as follows:

In [22]:
# Using the chi-square correlation test formula
contingency_tableip = pd.crosstab(c_data['Interests'], c_data['Product_Category_Preference'])
res = stats.chi2_contingency(contingency_tableip)

# Displaying the results of the chi-square correlation test
print("P-value:", res.pvalue)

P-value: 0.4567048696797793


The chi-squared test gives a p-value of 0.457. This is greater than the significance level of 0.05, so we fail to reject H0. There is no statistically significant association between user interests and product categories.
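
Besides the p-value, `chi2_contingency` also returns the expected frequencies, which can be used to check a common rule of thumb (expected counts of at least 5), and the test statistic can be turned into an effect size such as Cramér's V. Neither check is part of the analysis above. The sketch below is an illustration on a hypothetical, perfectly balanced `toy` frame.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical data in which every interest/category combination occurs 20 times
toy = pd.DataFrame({
    'Interests': ['Sports', 'Sports', 'Travel', 'Travel', 'Fashion', 'Fashion'] * 20,
    'Product_Category_Preference': ['Books', 'Apparel', 'Books', 'Apparel', 'Apparel', 'Books'] * 20,
})

table = pd.crosstab(toy['Interests'], toy['Product_Category_Preference'])
chi2, p_value, dof, expected = stats.chi2_contingency(table)

# Smallest expected cell count, for the rule-of-thumb check
min_expected = expected.min()

# Cramér's V: 0 means no association, 1 means perfect association
n = table.to_numpy().sum()
cramers_v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))

print(f"p-value = {p_value:.3f}, min expected = {min_expected:.1f}, Cramér's V = {cramers_v:.2f}")
```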

## Is there a correlation between age categories and product categories?

The correlation test between age category (categorical-nominal) and product category (categorical-nominal) is conducted using the chi-squared correlation test. The formulation of the hypotheses is as follows:

H0: There is no correlation between age category and product category.

H1: There is a correlation between age category and product category.

The calculations would proceed as follows:

In [23]:
# Using the chi-square correlation test formula
contingency_table = pd.crosstab(c_data['Age_Category'], c_data['Product_Category_Preference'])
res = stats.chi2_contingency(contingency_table)

# Displaying the results of the chi-square correlation test
print("P-value:", res.pvalue)

P-value: 0.92533447739625


The chi-squared test gives a p-value of 0.925. This is greater than the significance level of 0.05, so we fail to reject H0. There is no statistically significant association between age categories and product categories.

## Is there a difference in shopping frequency between users who subscribe to the newsletter and those who do not?

To test whether there is a difference in shopping frequency between users who subscribe to the newsletter and those who do not, the calculation will use an independent two-sample t-test, since the two groups consist of different users. The formulation of the hypotheses is as follows:

H0: The shopping frequency of users who subscribe to the newsletter is equal to the shopping frequency of users who do not subscribe.

H1: The shopping frequency of users who subscribe to the newsletter is not equal to the shopping frequency of users who do not subscribe.

The calculations would proceed as follows:

In [24]:
# Splitting shopping frequency data into newsletter subscribers and non-subscribers, stored in separate variables
subFreq = c_data['Purchase_Frequency'][c_data['Newsletter_Subscription'] == True]
nonsubFreq = c_data['Purchase_Frequency'][c_data['Newsletter_Subscription'] == False]

# Using the two-sample test formula
t_stat, p_val = stats.ttest_ind(subFreq, nonsubFreq)

# Displaying the results of the t-test and p-value
print('T-Statistic:', t_stat)
print('P-value:', p_val)

T-Statistic: 0.06868370337595951
P-value: 0.9452551487961752


From the hypothesis test conducted, we fail to reject H0 because the p-value (0.945) is greater than 0.05. The t-statistic of 0.07 is close to zero, meaning the difference between the two group means is very small relative to the variability in the data. Therefore, there is no statistically significant difference in shopping frequency between users who subscribe to the newsletter and those who do not.
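
By default, `stats.ttest_ind` assumes that both groups have equal variances. When that assumption is in doubt, Welch's t-test (`equal_var=False`) is a common alternative. The sketch below runs it on synthetic samples that stand in for the two groups, so the numbers it prints are not results for this dataset.

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for the subscriber and non-subscriber samples
rng = np.random.default_rng(42)
sub = rng.integers(0, 10, size=500)
nonsub = rng.integers(0, 10, size=500)

# Welch's t-test does not assume equal variances
t_stat, p_val = stats.ttest_ind(sub, nonsub, equal_var=False)

# The sign of the t-statistic follows the sign of the difference in means
mean_diff = sub.mean() - nonsub.mean()

print(f"mean difference = {mean_diff:.3f}, t = {t_stat:.3f}, p-value = {p_val:.3f}")
```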

## Is there a difference in the last login duration between users who subscribe to the newsletter and those who do not?

To test whether there is a difference in the last login duration between users who subscribe to the newsletter and those who do not, the calculation will use an independent two-sample t-test, since the two groups consist of different users. The formulation of the hypotheses is as follows:

H0: The last login duration of users who subscribe to the newsletter is equal to the last login duration of users who do not subscribe.

H1: The last login duration of users who subscribe to the newsletter is not equal to the last login duration of users who do not subscribe.

The calculations would proceed as follows:

In [25]:
# Splitting last login duration data into newsletter subscribers and non-subscribers, stored in separate variables
subLalda = c_data['Last_Login_Days_Ago'][c_data['Newsletter_Subscription'] == True]
nonsubLalda = c_data['Last_Login_Days_Ago'][c_data['Newsletter_Subscription'] == False]

# Using the two-sample test formula
t_stat, p_val = stats.ttest_ind(subLalda, nonsubLalda)

# Displaying the results of the t-test and p-value
print('T-Statistic:', t_stat)
print('P-value:', p_val)

T-Statistic: -1.257614910727311
P-value: 0.20882524967296492


From the hypothesis test conducted, we fail to reject H0 because the p-value (0.209) is greater than 0.05. The negative t-statistic of -1.26 means that subscribers have a slightly lower mean number of days since their last login than non-subscribers, but the difference is not statistically significant. Therefore, there is no statistically significant difference in the last login duration between users who subscribe to the newsletter and those who do not.

## Is there a difference in the average spending between users who subscribe to the newsletter and those who do not?

To test whether there is a difference in the average spending between users who subscribe to the newsletter and those who do not, the calculation will use an independent two-sample t-test, since the two groups consist of different users. The formulation of the hypotheses is as follows:

H0: The average spending of users who subscribe to the newsletter is equal to the average spending of users who do not subscribe.

H1: The average spending of users who subscribe to the newsletter is not equal to the average spending of users who do not subscribe.

The calculations would proceed as follows:

In [26]:
# Splitting average spending data into newsletter subscribers and non-subscribers, stored in separate variables
subAOV = c_data['Average_Order_Value'][c_data['Newsletter_Subscription'] == True]
nonsubAOV = c_data['Average_Order_Value'][c_data['Newsletter_Subscription'] == False]

# Using the two-sample test formula
t_stat, p_val = stats.ttest_ind(subAOV, nonsubAOV)

# Displaying the results of the t-test and p-value
print('T-Statistic:', t_stat)
print('P-Value:', p_val)

T-Statistic: 0.09012604663931488
P-Value: 0.9282051352246777


From the hypothesis test conducted, we fail to reject H0 because the p-value (0.928) is greater than 0.05. The t-statistic of 0.09 is close to zero, meaning the difference between the two group means is very small relative to the variability in the data. Therefore, there is no statistically significant difference in the average spending between users who subscribe to the newsletter and those who do not.
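
The three newsletter comparisons follow the same pattern, so they could be wrapped in one helper that loops over the metric columns. The sketch below is one possible way to do that. The function name `compare_groups` and the small `toy` frame are hypothetical, and the helper assumes the grouping column is boolean.

```python
import pandas as pd
from scipy import stats

def compare_groups(df, group_col, metric_cols):
    """Run an independent two-sample t-test per metric, split by a boolean column."""
    rows = []
    for col in metric_cols:
        in_group = df.loc[df[group_col], col]    # rows where the flag is True
        out_group = df.loc[~df[group_col], col]  # rows where the flag is False
        t_stat, p_val = stats.ttest_ind(in_group, out_group)
        rows.append({'metric': col, 't_stat': t_stat, 'p_value': p_val})
    return pd.DataFrame(rows)

# Hypothetical six-row frame, for illustration only
toy = pd.DataFrame({
    'Newsletter_Subscription': [True, False, True, False, True, False],
    'Purchase_Frequency': [7, 7, 1, 3, 5, 4],
    'Last_Login_Days_Ago': [5, 15, 28, 18, 2, 9],
    'Average_Order_Value': [18, 118, 146, 163, 141, 120],
})

metrics = ['Purchase_Frequency', 'Last_Login_Days_Ago', 'Average_Order_Value']
results = compare_groups(toy, 'Newsletter_Subscription', metrics)

print(results)
```

Collecting the results in one table makes it easier to compare the p-values of the three tests at a glance.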

# **Decision Making**

## Conclusion

From the analysis, the following insights can be derived:

1. The Apparel product category generates the highest revenue, while the Books category generates the lowest. The Home & Kitchen and Electronics categories generate nearly equal revenue.
2. Users from Suburban and Urban locations contribute the highest revenue, while those from Rural locations contribute the least.
3. Male users generate slightly more total revenue than female users, with a small gap of 5.23%.
4. The highest revenue comes from age categories that can be classified as financially stable.
5. The numbers of users who subscribe to the newsletter and those who do not are nearly equal, with subscribers 2.83% more numerous.
6. There is no significant correlation between user average order value and income.
7. There is a weak correlation between user average order value and total revenue.
8. There is no significant correlation between product categories and interests or age categories.
9. There is no statistically significant difference in shopping frequency between users who subscribe to the newsletter and those who do not.
10. There is no statistically significant difference in last login duration between users who subscribe to the newsletter and those who do not.
11. There is no statistically significant difference in average spending between users who subscribe to the newsletter and those who do not.

## Recommendation

To increase the average order value or user spending, promotional strategies could include:

1. Bundling promotions from the Apparel category with other products.
2. Offering an additional 15% discount on other product categories with the purchase of Books to boost Books sales.
3. Providing free shipping for users located in Urban areas.
4. Offering a 10% shipping discount for users in Suburban locations.
5. Providing a 30% shipping discount for users in Rural areas.
6. Customizing product recommendations when users log into their accounts based on their interests.
7. Maintaining the existing product catalog, and if possible, adding more unisex products, especially in the Apparel category.
8. Implementing seasonal discount programs, such as an additional 10% discount on Books for back-to-school promotions in September.
9. Creating a membership program with a tiered system that offers benefits at each level.
10. Since the newsletter subscription status does not affect shopping frequency, last login duration, or average order value, we can still execute email marketing campaigns, optimize the website, and announce discounts through those channels.