### App Design Experiment

#### Problem Statement

A hotel chain, "Smith Hotels", with properties including full-service hotels and resorts, is considering whether to roll out a newly built app to its customers. \
In this course project, you are tasked with analyzing data from a live test to determine the effectiveness of the new mobile app. \
Help the organization gain a deeper understanding of the impact that the mobile app will have on customers' purchase behaviour.

#### Experiment Design

- Here's an overview of the experiment that was conducted. A before-after experimental design was followed.
- To determine the app's efficacy, the organization ran a live test with a beta version of the app.
- A subset of customers who had been enrolled in the hotel chain's loyalty program for at least one year was selected at random and assigned to one of two groups, ___Control and Treatment___, and the experiment was conducted as follows:

#### Control Group

- Number of customers: 2000 (n)
- Before: The spending behaviour of all the customers in the control group was recorded for one year.
- Treatment: No changes were introduced to the control group.
- After: The control group was tracked for another year, and their spending behaviour was recorded.

#### Treatment Group

- Number of customers: 2000 (n)
- Before: At the beginning of the experiment, the spending behaviour of all the customers in the treatment group was recorded for one year.
- Treatment: A mail was sent to all the customers in the treatment group with a link to download the beta version of the app.
- After: All the customers in the treatment group started using the app, and their spending behaviour was recorded for another year.

#### Objectives

To make an appropriate decision about launching the app, the hotel chain's management needs answers to these questions:

- Will the app lead to increased spending on the part of customers?
- How much of an increase in spending do you expect?
- Do you expect the app's effect on spending to vary by customers' characteristics?

#### Data Overview

- Adopt:
  - For people who adopt the app (Adopt = 1), there is a post-download period (Post = 1) and a pre-download period (Post = 0).
  - People who do not adopt the app have Adopt = 0 in both periods (Post = 0 and Post = 1).
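
A before-after design with a control group lends itself to a difference-in-differences comparison: the change in spending among adopters minus the change among non-adopters. The notebook does not state that this is the method it uses, so the following is only a minimal sketch under that assumption. The helper `did_estimate` and the `toy` frame are hypothetical; only the column names `Adopt`, `Post` and `Spending` come from the dataset.

```python
import pandas as pd


def did_estimate(data, outcome="Spending", group="Adopt", period="Post"):
    """Difference-in-differences computed from the four group-by-period means."""
    means = data.groupby([group, period])[outcome].mean()
    treated_change = means.loc[(1, 1)] - means.loc[(1, 0)]
    control_change = means.loc[(0, 1)] - means.loc[(0, 0)]
    return treated_change - control_change


# Tiny made-up frame, only to show the expected layout; not the course data.
toy = pd.DataFrame({
    "Adopt":    [0, 0, 0, 0, 1, 1, 1, 1],
    "Post":     [0, 0, 1, 1, 0, 0, 1, 1],
    "Spending": [100, 120, 110, 130, 100, 140, 150, 170],
})

# Control changes by 10 (110 -> 120), treated by 40 (120 -> 160).
print(did_estimate(toy))  # 30.0
```

Once the real data is loaded, the same call on `df` would give a point estimate of the app's effect, provided both groups would otherwise have followed the same spending trend.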

<hr style="border:2px solid red;border-radius:50%" />

## Importing necessary Libraries.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pingouin as pg
from IPython.display import HTML, display

In [2]:
sns.set(style="whitegrid")

<hr style="border:1px dashed yellow;border-radius:50%">

## Importing the Dataset

In [3]:
df = pd.read_csv('data_app.csv')

FileNotFoundError: [Errno 2] No such file or directory: 'data_app.csv'

<hr style="border:1px dashed yellow;border-radius:50%">

### Exploring the dataset.

In [None]:
df.shape

In [None]:
df.columns

In [None]:
df.head()

In [None]:
df.tail()

In [None]:
df.describe().T

In [None]:
df.info()

In [None]:
def unique_values_description(data):
    unique_values = pd.DataFrame({
        'Column': data.columns,
        'Unique_Values': [data[col].unique() for col in data.columns],
        'Num_Unique': [len(data[col].unique()) for col in data.columns]
    }).T

    # Display the result in scrollable HTML
    html_output = unique_values.to_html(classes='table table-striped', index=False)
    html_with_scroll = f"<div style='overflow-x: auto; max-height: 300px;'>{html_output}</div>"

    return HTML(html_with_scroll)

In [None]:
unique_values_description(df[['Adopt', 'Gender', 'Post', 'Nationality', 'Loyalty']])

#### Observations

1. No column has null values. (The shape shows 4000 rows, and the non-null count for every column is also 4000.)
2. Every column is of integer type except Gender.
3. Gender, Adopt, Post, Nationality and Loyalty can be treated as categorical variables.
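
The first two observations can also be checked in one table instead of reading them off `df.info()`. The sketch below is one possible way to do that; `summarize_columns` is a hypothetical helper and `toy` is a made-up stand-in for `df`.

```python
import pandas as pd


def summarize_columns(data):
    """Return dtype, missing-value count and distinct-value count per column."""
    return pd.DataFrame({
        "dtype": data.dtypes.astype(str),
        "n_missing": data.isna().sum(),
        "n_unique": data.nunique(),
    })


# Made-up rows that mimic a few of the notebook's columns; not the real data.
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "Adopt": [0, 1, 1, 0],
    "Spending": [900, 1200, 1100, 950],
})
summary = summarize_columns(toy)
print(summary)
```

Columns with a small `n_unique` relative to the row count are the natural candidates for categorical treatment.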

<hr style="border:1px dashed yellow;border-radius:50%">

### Refactoring For Categorical Data.

- Adopt, Post, Nationality and Loyalty are stored as integers; converting them to string labels makes them easier to read when exploring categorical data.
- Gender can be mapped to an integer so that it can be included in the correlation matrix.

#### Data Mappings:

1. Post: Whether an observation is before or after the app’s adoption
    - 1 = After the app’s adoption
    - 0 = Before the app’s adoption

2. Adopt: Whether a customer is an adopter of the app
    - 1 = App’s adopter
    - 0 = Not an adopter of the app

3. Nationality: Whether the customer is a US resident
    - 1 = A US resident
    - 0 = Not a US resident

4. Loyalty: Customer loyalty membership level
    - 1 = Basic membership
    - 2 = Silver membership
    - 3 = Gold membership
    - 4 = Platinum membership
  
5. Gender: Gender of the customer.
   - 0 = Male
   - 1 = Female

In [None]:
post_mapping = {
    1: "Post",
    0: "Pre"
}

adopt_mapping = {
    1: "Adopter",
    0: "Not Adopter"
}

nationality_mapping = {
    1: "USA",
    0: "Non USA"
}

loyalty_mapping = {
    1: "Basic",
    2: "Silver",
    3: "Gold",
    4: "Platinum"
}

gender_mapping = {
    "Male": 0,
    "Female": 1
}

In [None]:
# Map the integer codes to readable labels, and Gender to an integer code.
df['post_cat'] = df['Post'].map(post_mapping)
df['adopt_cat'] = df['Adopt'].map(adopt_mapping)
df['nationality_cat'] = df['Nationality'].map(nationality_mapping)
df['loyalty_cat'] = df['Loyalty'].map(loyalty_mapping)
df['gender_int'] = df['Gender'].map(gender_mapping)

In [None]:
df[['post_cat','adopt_cat', 'nationality_cat', 'loyalty_cat', 'gender_int']]
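
The loyalty levels have a natural order (Basic, Silver, Gold, Platinum), which plain string labels lose: sorting falls back to alphabetical order. One option, sketched below on a made-up frame, is an ordered categorical. The names `loyalty_levels` and `toy` are illustrative only, and the notebook itself keeps plain strings.

```python
import pandas as pd

# Membership tiers from lowest to highest, matching codes 1..4.
loyalty_levels = ["Basic", "Silver", "Gold", "Platinum"]

# Made-up codes standing in for df['Loyalty'].
toy = pd.DataFrame({"Loyalty": [3, 1, 4, 2, 1]})

# Map codes to labels, then declare the labels as an ordered categorical.
toy["loyalty_cat"] = pd.Categorical(
    toy["Loyalty"].map(dict(enumerate(loyalty_levels, start=1))),
    categories=loyalty_levels,
    ordered=True,
)

# Sorting now follows tier order rather than alphabetical order.
ordered_labels = toy.sort_values("loyalty_cat")["loyalty_cat"].tolist()
print(ordered_labels)
```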


<hr style="border:1px dashed yellow;border-radius:50%">

### Exploring Numerical Data.

In [None]:
def describe_numerical_data(data):
    """
    Helper function for describing numerical data.
    """
    numerical_data = data.select_dtypes(include=[np.number])
    description = numerical_data.describe()
    return description

In [None]:
def display_description_as_html(description):
    html_output = description.to_html()
    html_with_scroll = f"<div style='overflow-x: auto;'>{html_output}</div>"
    display(HTML(html_with_scroll))

In [None]:
numerical_data_description = describe_numerical_data(df[['Age', 'Tenure', 'NumBookings', 'Spending']]).T

In [None]:
display_description_as_html(numerical_data_description)

<hr style="border:1px dashed yellow;border-radius:50%">

### Exploring Categorical Data

In [None]:
def describe_categorical_data(data):
    categorical_data = data.select_dtypes(include=['object'])
    description = categorical_data.describe()
    return description

In [None]:
# Get the description of categorical data
categorical_description = describe_categorical_data(df)

# Display the description in HTML
display_description_as_html(categorical_description.T)

<hr style="border:1px dashed yellow;border-radius:50%">

### Pre and Post download Filters

In [None]:
pre_download_df = df[df['Post'] == 0]
post_download_df = df[df['Post'] == 1]

In [None]:
# Sanity check: each filtered frame contains only its own period.
assert (pre_download_df['Post'] == 0).all()
assert (post_download_df['Post'] == 1).all()

<hr style="border:2px solid red;border-radius:50%" />

## Analysing the Gender Variable.

In [None]:
display_description_as_html(df[['Gender']].describe().T)

In [None]:
gender_counts = df['Gender'].value_counts()
# Name the counts column explicitly so the result does not depend on the pandas version.
gender_counts_table = gender_counts.rename('Count').to_frame()
display_description_as_html(gender_counts_table)

In [None]:
plt.figure(figsize=(10, 10))
plt.suptitle("Exploring Gender in Experiment.", fontsize="18")

# Gender wise count plot.
plt.subplot(2, 2, 1)
count_plot = sns.countplot(df,x="Gender")
count_plot.set_title('Overall Customer Frequency by Gender')
count_plot.set_ylabel("Frequency")

# Gender wise count plot for Pre and Post download stages.
plt.subplot(2, 2, 2)
count_plot = sns.countplot(df, x="Gender", hue="post_cat")
count_plot.set_title('Post-Pre Customer Frequency by Gender')
count_plot.set_ylabel("Frequency")

plt.tight_layout()
plt.show()

In [None]:
gender_frequency_df = df[['Gender']].value_counts().reset_index()

In [None]:
gender_frequency_df

<hr style="border:2px solid red;border-radius:50%" />

### Analysing Nationality.

In [None]:
display_description_as_html(df[['nationality_cat']].describe().T)

In [None]:
nationality_counts = df['nationality_cat'].value_counts()
# Name the counts column explicitly so the result does not depend on the pandas version.
nationality_counts_table = nationality_counts.rename('Count').to_frame()
display(HTML(nationality_counts_table.to_html()))

In [None]:
plt.figure(figsize=(10, 10))
plt.suptitle("Distribution of Nationality", fontsize="18")

# Nationality-wise count plot.
plt.subplot(2, 2, 1)
count_plot = sns.countplot(df,x="nationality_cat")
count_plot.set_title('Overall Customer Frequency by Nationality')
count_plot.set_ylabel("Frequency")

# Nationality-wise count plot for Pre and Post download stages.
plt.subplot(2, 2, 2)
count_plot = sns.countplot(df, x="nationality_cat", hue="post_cat")
count_plot.set_title('Post-Pre Customer Frequency by Nationality')
count_plot.set_ylabel("Frequency")

plt.tight_layout()
plt.show()

<hr style="border:2px solid red;border-radius:50%" />

### Analysing Loyalty.

In [None]:
display_description_as_html(df[['loyalty_cat']].describe().T)

In [None]:
loyalty_counts = df['loyalty_cat'].value_counts()
# Build a two-column table with explicit names: 'loyalty_cat' and 'count'.
loyalty_counts_table = loyalty_counts.rename_axis('loyalty_cat').reset_index(name='count')
display(HTML(loyalty_counts_table.to_html()))

In [None]:
ax = sns.barplot(loyalty_counts_table, x="loyalty_cat", y="count")
ax.set_title("Loyalty Distribution")
ax.set_xlabel("Loyalty Category")
ax.set_ylabel("Frequency")
plt.show()

In [None]:
count_plot = sns.countplot(df, x="loyalty_cat", hue="post_cat")
count_plot.set_title('Post-Pre Customer Frequency by Loyalty')
count_plot.set_ylabel("Frequency")
count_plot.set_xlabel('Loyalty Category')
plt.show()

<hr style="border:2px solid red;border-radius:50%" />

### Analysing Age.

In [None]:
df['Age'].describe()

In [None]:
sns.boxplot(df, x="Age")
plt.show()

In [None]:
ax = sns.histplot(df, x="Age", kde=True)
ax.set_title("Age Distribution")
ax.set_ylabel("Frequency")
plt.show()

In [None]:
ax = sns.histplot(df, x="Age", multiple="stack", hue="post_cat", kde=True)
ax.set_title("Age Post-Pre Distribution")
ax.set_ylabel("Frequency")
plt.show()

<hr style="border:2px solid red;border-radius:50%" />

### Analysing Tenure.

In [None]:
ax = sns.histplot(df, x="Tenure", kde=True)
ax.set_title("Tenure Distribution")
ax.set_ylabel("Frequency")
plt.show()

In [None]:
ax = sns.histplot(df, x="Tenure", multiple="stack", hue="post_cat", kde=True)
ax.set_title("Tenure Post-Pre Distribution")
ax.set_ylabel("Frequency")
plt.show()

In [None]:
ax = sns.boxplot(df, x="Tenure")
ax.set_title("Tenure Boxplot")
plt.show()

<hr style="border:2px solid red;border-radius:50%" />

## Purchase Behaviour of Customer.

### Spending Habits of Customer.

In [None]:
ax = sns.boxplot(df, y="Spending", x="Gender")
ax.set_title("Spending by Gender")
plt.show()

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(10, 5))
fig.suptitle("Gender Spendings in Pre-Post download.")

box_plot1 = sns.boxplot(data=pre_download_df, x='Gender', y='Spending', ax=ax[0])
box_plot1.set_title('Spendings by genders (Pre download)', fontsize=10)


box_plot2 = sns.boxplot(data=post_download_df, x='Gender', y='Spending', ax=ax[1])
box_plot2.set_title('Spendings by genders (Post download)', fontsize=10)
box_plot2.set_ylabel("")

plt.show()

In [None]:
ax = sns.histplot(df, x="Spending", hue="loyalty_cat", multiple="stack", kde=True)
ax.set_title("Spendings by Loyalty Membership")
ax.set_ylabel("Frequency")
plt.show()

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(10, 5))
fig.suptitle("Spendings in Pre-Post download.(Loyalty membership)")

box_plot1 = sns.histplot(pre_download_df, x="Spending", hue="loyalty_cat", multiple="stack", kde=True, ax=ax[0])
box_plot1.set_title('Spendings by Loyalty Membership (Pre download)', fontsize=10)
box_plot1.set_ylabel("Frequency")


box_plot2 = sns.histplot(post_download_df, x="Spending", hue="loyalty_cat",multiple="stack", kde=True, ax=ax[1])
box_plot2.set_title('Spendings by Loyalty Membership (Post download)', fontsize=10)
box_plot2.set_ylabel("")

plt.show()

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(10, 5))
fig.suptitle("Spendings in Pre-Post download.(By Age)")

# 2D histograms of Spending against Age for each period.
hist_plot1 = sns.histplot(pre_download_df, x="Spending", y="Age", ax=ax[0])
hist_plot1.set_title('Spendings by Age (Pre download)', fontsize=10)
hist_plot1.set_ylabel("Age")


hist_plot2 = sns.histplot(post_download_df, x="Spending", y="Age", color="#e67e35", ax=ax[1])
hist_plot2.set_title('Spendings by Age (Post download)', fontsize=10)
hist_plot2.set_ylabel("")

plt.show()

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(10, 5))
fig.suptitle("Spendings in Pre-Post download.(By Nationality)")

# Pre download: individual observations plus the group means.
strip_plot1 = sns.stripplot(pre_download_df, y="Spending", x="nationality_cat", hue="nationality_cat", alpha=0.15, ax=ax[0])
point_plot1 = sns.pointplot(x="nationality_cat", y="Spending", hue='nationality_cat', data=pre_download_df, dodge=True, join=False, ax=ax[0])
strip_plot1.set_title('Spendings by Nationality (Pre download)', fontsize=10)
strip_plot1.set_xlabel("Nationality")
strip_plot1.set_ylabel("Spending")

# Post download: individual observations.
strip_plot2 = sns.stripplot(post_download_df, y="Spending", x="nationality_cat", hue="nationality_cat", alpha=0.25, ax=ax[1])
strip_plot2.set_title('Spendings by Nationality (Post download)', fontsize=10)
strip_plot2.set_xlabel("Nationality")
strip_plot2.set_ylabel("")

plt.show()

<hr style="border:2px solid red;border-radius:50%" />

## Analysing Number of bookings.

In [None]:
jp = sns.jointplot(df, x="Spending", y="NumBookings", hue="post_cat", alpha=0.5)
jp.fig.suptitle("Spendings Vs Number of Bookings")
jp.fig.subplots_adjust(top=0.95)
plt.show()

In [None]:
ax = sns.histplot(df, x="NumBookings", multiple="stack", hue="post_cat", kde=True)
ax.set_title("Number of Bookings (Pre and Post adoption)")
ax.set_ylabel("Frequency")
plt.show()

In [None]:
ax = sns.pointplot(df, y="NumBookings", x="Gender", hue="post_cat")
ax.set_title("Number of Bookings Gender wise")
ax.set_xlabel("Gender")
plt.show()

In [None]:
ax = sns.pointplot(df, y="NumBookings", x="nationality_cat", hue="post_cat")
ax.set_title("Number of Bookings Nationality wise")
ax.set_xlabel("Nationality")
plt.show()

<hr style="border:2px solid red;border-radius:50%" />

## Analysing Adoption.

In [None]:
correlation_filt_df = df[['Adopt', 'Post', 'gender_int', 'Spending', 'NumBookings', 'Tenure', 'Loyalty', 'Age']]

In [None]:
corr_df = correlation_filt_df.corr().round(2)

In [None]:
sns.heatmap(corr_df, cmap=sns.color_palette("Blues", as_cmap=True), annot=True, fmt=".2f", linewidth=0.5)

plt.show()

In [None]:
ax = sns.barplot(df, x="adopt_cat", y="Spending", hue="post_cat")
ax.set_title("Spendings by Adopters and Non Adopters")
ax.set_xlabel("Adopt Status")
plt.show()

<hr style="border:2px solid red;border-radius:50%" />

## Hypothesis Testing.

Determine whether there is a statistically significant difference between the average spending of men and women (at a 5% significance level).
- Determine whether there is a difference in means.
- Construct a 95% confidence interval for the difference in means.

__NOTE: The above test is conducted for the entire dataset.__

- We are comparing a numerical variable (Spending) across the two levels of a categorical variable (Gender).
- We will apply a two-sample t-test.
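
As a rough illustration of the test and interval described above, here is a minimal sketch of a Welch (unequal-variance) two-sample t-test with a 95% confidence interval for the difference in means. The choice of the Welch variant is an assumption, since the notebook only says "t-test". `welch_ttest_ci` is a hypothetical helper, and `demo` holds synthetic numbers, not the hotel data.

```python
import numpy as np
import pandas as pd
from scipy import stats


def welch_ttest_ci(a, b, alpha=0.05):
    """Welch two-sample t-test plus a (1 - alpha) CI for mean(a) - mean(b)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diff = a.mean() - b.mean()
    # Squared standard error of each group mean (sample variance / n).
    va = a.var(ddof=1) / a.size
    vb = b.var(ddof=1) / b.size
    se = np.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    dof = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    t_stat = diff / se
    p_value = 2 * stats.t.sf(abs(t_stat), dof)
    half_width = stats.t.ppf(1 - alpha / 2, dof) * se
    return {"diff": diff, "t": t_stat, "dof": dof, "p": p_value,
            "ci": (diff - half_width, diff + half_width)}


# Synthetic stand-in for the notebook's df (values are made up for illustration).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Gender": ["Male"] * 200 + ["Female"] * 200,
    "Spending": np.concatenate([rng.normal(1000, 150, 200),
                                rng.normal(1030, 150, 200)]),
})
male = demo.loc[demo["Gender"] == "Male", "Spending"]
female = demo.loc[demo["Gender"] == "Female", "Spending"]

result = welch_ttest_ci(male, female)
print(result)
```

The null hypothesis of equal means is rejected at the 5% level exactly when the 95% interval excludes zero, so the two bullet points answer the same question from two angles.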

In [None]:
df