# Import Libraries

In [7]:
import pandas as pd
import numpy as np
import seaborn as sns

import warnings
# ignore all warnings
warnings.filterwarnings('ignore')

# TABLE OF CONTENTS

### Introduction
- 1.1 [Project Background](#Project-Background)
- 1.2 [Analysis Objectives](#Analysis-Objectives)
- 1.3 [Data Sources](#Data-Sources)

### Data Preparation
- 2.1 [Loading the Data](#Loading-the-Data)
- 2.2 [Inspecting Data Structure](#Inspecting-Data-Structure)

### Data Cleaning and Preprocessing
- 3.1 [Handling Missing Data](#Handling-Missing-Data)
- 3.2 [Data Formatting and Standardization](#Data-Formatting-and-Standardization)
- 3.3 [Removing Duplicates](#Removing-Duplicates)
- 3.4 [Outlier Management](#Outlier-Management)

### Preliminary Data Exploration
- 4.1 [Descriptive Statistics](#Descriptive-Statistics)
- 4.2 [Initial Pattern and Correlation Analysis](#Initial-Pattern-and-Correlation-Analysis)

### Visual Analysis
- 5.1 [Univariate Analysis](#Univariate-Analysis)
- 5.2 [Bivariate and Multivariate Analysis](#Bivariate-and-Multivariate-Analysis)
- 5.3 [Objective-Driven Visualizations](#Objective-Driven-Visualizations)

### Key Findings
- 6.1 [Key Findings and Patterns](#Key-Findings-and-Patterns)
- 6.2 [Interpretation of Analysis Results](#Interpretation-of-Analysis-Results)

### Recommendations and Next Steps
- 7.1 [Recommendations Based on Findings](#Recommendations-Based-on-Findings)
- 7.2 [Next Steps for Further Analysis](#Next-Steps-for-Further-Analysis)

### Conclusion
- 8.1 [Summary of Findings](#Summary-of-Findings)
- 8.2 [Impact and Implications of Analysis](#Impact-and-Implications-of-Analysis)

### Appendices
- 9.1 [Complete Analysis Code](#Complete-Analysis-Code)
- 9.2 [Additional Data Sources](#Additional-Data-Sources)


# Introduction and Dataset Description

This dataset consists of survey questions answered by over 100 respondents regarding their purchasing behavior at Starbucks. Income is stated in Malaysian Ringgit (RM).

**Content**

The dataset contains demographic information about customers, including:

- Gender
- Age Range
- Employment Status
- Income Range

This data also includes information about customer purchasing behavior, including:

- Features and amenities at Starbucks that contribute to their choices.

# Objectives

1) Determining the Profile of Loyal Customers: (Descriptive Analysis)

- Identify the demographic characteristics of customers (such as age, income, and employment status) who frequently purchase at Starbucks.
- Analyze which demographic groups are most likely to be loyal customers.

**Age:** Which age group purchases the most at Starbucks?\
**Income:** Which income group is most often among the customers?\
**Employment Status:** Are full-time employees, students, or freelancers more likely to be loyal customers?

2) Identifying Key Factors that Attract Customers: (Correlation Analysis)

- Analyze the features and amenities at Starbucks that most attract customers, such as location, atmosphere, Wi-Fi availability, menu options, and others.
- Measure the extent to which these factors influence the decision to purchase or return.

# 1) Load and Inspect the Data

In [8]:
df = pd.read_csv('Data/Raw/Starbucks satisfactory survey.csv')
df.head()

Unnamed: 0,Timestamp,1. Your Gender,2. Your Age,3. Are you currently....?,4. What is your annual income?,5. How often do you visit Starbucks?,6. How do you usually enjoy Starbucks?,7. How much time do you normally spend during your visit?,8. The nearest Starbucks's outlet to you is...?,9. Do you have Starbucks membership card?,...,"11. On average, how much would you spend at Starbucks per visit?","12. How would you rate the quality of Starbucks compared to other brands (Coffee Bean, Old Town White Coffee..) to be:",13. How would you rate the price range at Starbucks?,14. How important are sales and promotions in your purchase decision?,"15. How would you rate the ambiance at Starbucks? (lighting, music, etc...)",16. You rate the WiFi quality at Starbucks as..,"17. How would you rate the service at Starbucks? (Promptness, friendliness, etc..)",18. How likely you will choose Starbucks for doing business meetings or hangout with friends?,19. How do you come to hear of promotions at Starbucks? Check all that apply.,20. Will you continue buying at Starbucks?
0,2019/10/01 12:38:43 PM GMT+8,Female,From 20 to 29,Student,"Less than RM25,000",Rarely,Dine in,Between 30 minutes to 1 hour,within 1km,Yes,...,Less than RM20,4,3,5,5,4,4,3,Starbucks Website/Apps;Social Media;Emails;Dea...,Yes
1,2019/10/01 12:38:54 PM GMT+8,Female,From 20 to 29,Student,"Less than RM25,000",Rarely,Take away,Below 30 minutes,1km - 3km,Yes,...,Less than RM20,4,3,4,4,4,5,2,Social Media;In Store displays,Yes
2,2019/10/01 12:38:56 PM GMT+8,Male,From 20 to 29,Employed,"Less than RM25,000",Monthly,Dine in,Between 30 minutes to 1 hour,more than 3km,Yes,...,Less than RM20,4,3,4,4,4,4,3,In Store displays;Billboards,Yes
3,2019/10/01 12:39:08 PM GMT+8,Female,From 20 to 29,Student,"Less than RM25,000",Rarely,Take away,Below 30 minutes,more than 3km,No,...,Less than RM20,2,1,4,3,3,3,3,Through friends and word of mouth,No
4,2019/10/01 12:39:20 PM GMT+8,Male,From 20 to 29,Student,"Less than RM25,000",Monthly,Take away,Between 30 minutes to 1 hour,1km - 3km,No,...,Around RM20 - RM40,3,3,4,2,2,3,3,Starbucks Website/Apps;Social Media,Yes


In [9]:
len(df)

122
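The column names are the full survey questions, which makes later selections verbose. A minimal sketch of how they could be shortened is shown below. The short names (`gender`, `age`, `loyal_answer`) and the toy frame are assumptions for illustration and are not part of the original notebook.

```python
import pandas as pd

# Toy frame using three of the survey's column names (the values are made up).
toy = pd.DataFrame({
    "1. Your Gender": ["Female", "Male"],
    "2. Your Age": ["From 20 to 29", "Below 20"],
    "20. Will you continue buying at Starbucks?": ["Yes", "No"],
})

# Hypothetical short names; any column not listed keeps its original name.
short_names = {
    "1. Your Gender": "gender",
    "2. Your Age": "age",
    "20. Will you continue buying at Starbucks?": "loyal_answer",
}

# rename returns a new DataFrame and leaves the original untouched.
renamed = toy.rename(columns=short_names)
print(renamed.columns.tolist())
```

The rest of this notebook keeps the original question text as column names, so this step is optional.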

# 2) Data Cleaning and Preprocessing

## Find the missing values

In [10]:
df.isna().sum()

Timestamp                                                                                                                 0
1. Your Gender                                                                                                            0
2. Your Age                                                                                                               0
3. Are you currently....?                                                                                                 0
4. What is your annual income?                                                                                            0
5. How often do you visit Starbucks?                                                                                      0
6. How do you usually enjoy Starbucks?                                                                                    1
7. How much time do you normally  spend during your visit?                                                                0
8. The n

## Rows with missing values

In [11]:
# Step 1: Build a boolean mask for rows that have a missing value in
# column positions 6 and 19 (questions 6 and 19), selected with iloc
missing_mask = df.iloc[:, [6, 19]].isnull().any(axis=1)

# Step 2: Use the mask to view the rows with missing values
missing_rows = df[missing_mask]
missing_rows

Unnamed: 0,Timestamp,1. Your Gender,2. Your Age,3. Are you currently....?,4. What is your annual income?,5. How often do you visit Starbucks?,6. How do you usually enjoy Starbucks?,7. How much time do you normally spend during your visit?,8. The nearest Starbucks's outlet to you is...?,9. Do you have Starbucks membership card?,...,"11. On average, how much would you spend at Starbucks per visit?","12. How would you rate the quality of Starbucks compared to other brands (Coffee Bean, Old Town White Coffee..) to be:",13. How would you rate the price range at Starbucks?,14. How important are sales and promotions in your purchase decision?,"15. How would you rate the ambiance at Starbucks? (lighting, music, etc...)",16. You rate the WiFi quality at Starbucks as..,"17. How would you rate the service at Starbucks? (Promptness, friendliness, etc..)",18. How likely you will choose Starbucks for doing business meetings or hangout with friends?,19. How do you come to hear of promotions at Starbucks? Check all that apply.,20. Will you continue buying at Starbucks?
81,2019/10/03 9:11:28 AM GMT+8,Male,From 20 to 29,Employed,"Less than RM25,000",Never,,Below 30 minutes,more than 3km,No,...,Zero,1,1,1,3,3,3,3,,No


## Replacing missing values (NaN) in categorical variables with the mode

In [12]:

# Calculate the mode of each column that contains NaN values
mode_6 = df['6. How do you usually enjoy Starbucks?'].mode()[0]  # Mode of question 6
mode_19 = df['19. How do you come to hear of promotions at Starbucks? Check all that apply.'].mode()[0]  # Mode of question 19

# Replace NaN with the mode and save the result in a new DataFrame
cleaned_df = df.copy()   # Create a copy of the original DataFrame
# Assign the result back instead of calling fillna(inplace=True) on a column selection
cleaned_df['6. How do you usually enjoy Starbucks?'] = cleaned_df['6. How do you usually enjoy Starbucks?'].fillna(mode_6)
cleaned_df['19. How do you come to hear of promotions at Starbucks? Check all that apply.'] = cleaned_df['19. How do you come to hear of promotions at Starbucks? Check all that apply.'].fillna(mode_19)

# Check the DataFrame after replacement
cleaned_df.head(5)

Unnamed: 0,Timestamp,1. Your Gender,2. Your Age,3. Are you currently....?,4. What is your annual income?,5. How often do you visit Starbucks?,6. How do you usually enjoy Starbucks?,7. How much time do you normally spend during your visit?,8. The nearest Starbucks's outlet to you is...?,9. Do you have Starbucks membership card?,...,"11. On average, how much would you spend at Starbucks per visit?","12. How would you rate the quality of Starbucks compared to other brands (Coffee Bean, Old Town White Coffee..) to be:",13. How would you rate the price range at Starbucks?,14. How important are sales and promotions in your purchase decision?,"15. How would you rate the ambiance at Starbucks? (lighting, music, etc...)",16. You rate the WiFi quality at Starbucks as..,"17. How would you rate the service at Starbucks? (Promptness, friendliness, etc..)",18. How likely you will choose Starbucks for doing business meetings or hangout with friends?,19. How do you come to hear of promotions at Starbucks? Check all that apply.,20. Will you continue buying at Starbucks?
0,2019/10/01 12:38:43 PM GMT+8,Female,From 20 to 29,Student,"Less than RM25,000",Rarely,Dine in,Between 30 minutes to 1 hour,within 1km,Yes,...,Less than RM20,4,3,5,5,4,4,3,Starbucks Website/Apps;Social Media;Emails;Dea...,Yes
1,2019/10/01 12:38:54 PM GMT+8,Female,From 20 to 29,Student,"Less than RM25,000",Rarely,Take away,Below 30 minutes,1km - 3km,Yes,...,Less than RM20,4,3,4,4,4,5,2,Social Media;In Store displays,Yes
2,2019/10/01 12:38:56 PM GMT+8,Male,From 20 to 29,Employed,"Less than RM25,000",Monthly,Dine in,Between 30 minutes to 1 hour,more than 3km,Yes,...,Less than RM20,4,3,4,4,4,4,3,In Store displays;Billboards,Yes
3,2019/10/01 12:39:08 PM GMT+8,Female,From 20 to 29,Student,"Less than RM25,000",Rarely,Take away,Below 30 minutes,more than 3km,No,...,Less than RM20,2,1,4,3,3,3,3,Through friends and word of mouth,No
4,2019/10/01 12:39:20 PM GMT+8,Male,From 20 to 29,Student,"Less than RM25,000",Monthly,Take away,Between 30 minutes to 1 hour,1km - 3km,No,...,Around RM20 - RM40,3,3,4,2,2,3,3,Starbucks Website/Apps;Social Media,Yes
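The mode-based replacement above can be tried in isolation on a small example. This is a minimal sketch on made-up data; the column name `enjoy` and its values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Toy categorical column with one missing answer (values are made up).
toy = pd.DataFrame({"enjoy": ["Dine in", "Take away", "Take away", np.nan]})

# mode() returns a Series because ties are possible; [0] takes the first mode.
# NaN is ignored when the mode is computed.
mode_value = toy["enjoy"].mode()[0]

# Work on a copy and assign the filled column back.
filled = toy.copy()
filled["enjoy"] = filled["enjoy"].fillna(mode_value)

print(mode_value)
print(filled["enjoy"].isna().sum())
```

Filling with the mode keeps every row but slightly inflates the most common category, which matters little here because only one row is affected.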


## Rechecking missing values in cleaned_df

In [13]:
# Check for remaining NaN values in the two imputed columns
# (questions 6 and 19 sit at column positions 6 and 19, as in the earlier iloc selection)
missing_6 = cleaned_df.iloc[:, 6].isna().sum()  # Count NaNs in question 6
missing_19 = cleaned_df.iloc[:, 19].isna().sum()  # Count NaNs in question 19

print(f"Missing values in 6th column after replacement: {missing_6}")
print(f"Missing values in 19th column after replacement: {missing_19}")


Missing values in 6th column after replacement: 0
Missing values in 19th column after replacement: 0


## Outlier Management

# Preliminary Data Exploration


### Find the unique values in each column

In [14]:
# Load your data
data = pd.read_csv('Data/Cleaned/cleaned_data.csv')  # Replace with your actual file path

# Loop through each column and print unique values
for column in data.columns:
    unique_values = data[column].unique()
    print(f"Unique values in '{column}': {unique_values}")

Unique values in 'Timestamp': ['2019/10/01 12:38:43 PM GMT+8' '2019/10/01 12:38:54 PM GMT+8'
 '2019/10/01 12:38:56 PM GMT+8' '2019/10/01 12:39:08 PM GMT+8'
 '2019/10/01 12:39:20 PM GMT+8' '2019/10/01 12:39:39 PM GMT+8'
 '2019/10/01 12:39:42 PM GMT+8' '2019/10/01 12:40:58 PM GMT+8'
 '2019/10/01 12:42:27 PM GMT+8' '2019/10/01 12:43:36 PM GMT+8'
 '2019/10/01 12:47:00 PM GMT+8' '2019/10/01 12:48:26 PM GMT+8'
 '2019/10/01 12:49:25 PM GMT+8' '2019/10/01 12:53:09 PM GMT+8'
 '2019/10/01 12:53:16 PM GMT+8' '2019/10/01 12:57:31 PM GMT+8'
 '2019/10/01 12:59:11 PM GMT+8' '2019/10/01 1:08:15 PM GMT+8'
 '2019/10/01 1:09:12 PM GMT+8' '2019/10/01 1:13:03 PM GMT+8'
 '2019/10/01 1:13:45 PM GMT+8' '2019/10/01 1:14:43 PM GMT+8'
 '2019/10/01 1:21:50 PM GMT+8' '2019/10/01 1:24:04 PM GMT+8'
 '2019/10/01 1:24:21 PM GMT+8' '2019/10/01 1:25:56 PM GMT+8'
 '2019/10/01 1:29:11 PM GMT+8' '2019/10/01 1:33:54 PM GMT+8'
 '2019/10/01 1:34:30 PM GMT+8' '2019/10/01 1:37:27 PM GMT+8'
 '2019/10/01 1:39:16 PM GMT+8' '2019/1
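Printing every unique value is noisy for columns such as `Timestamp`, where almost every row is different. A possible variant, sketched below on made-up data, lists the values only for low-cardinality columns and reports a count for the rest. The helper name `summarize_unique` and the `max_unique` threshold are assumptions, not part of the notebook.

```python
import pandas as pd

def summarize_unique(frame: pd.DataFrame, max_unique: int = 10) -> dict:
    """Return unique values for low-cardinality columns and a count for the rest."""
    summary = {}
    for column in frame.columns:
        n_unique = frame[column].nunique(dropna=False)
        if n_unique <= max_unique:
            summary[column] = frame[column].unique().tolist()
        else:
            summary[column] = f"{n_unique} unique values"
    return summary

# Toy frame (made up): 12 distinct timestamps and a two-level category.
toy = pd.DataFrame({
    "Timestamp": [f"2019/10/01 12:{m:02d}:00 PM GMT+8" for m in range(12)],
    "gender": ["Female", "Male"] * 6,
})
result = summarize_unique(toy)
for column, values in result.items():
    print(f"{column}: {values}")
```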

# 1) Determining the Profile of Loyal Customers: (Descriptive Analysis)

## Descriptive Statistics
Demographic Frequency/Percentage (Age/Income/Employment/Gender) - Univariate

## A) Identify the demographic characteristics of customers (such as age, income, and employment status) who frequently purchase at Starbucks.

### AGE

In [15]:
# Calculate frequency for '2. Your Age'
freq_age = cleaned_df['2. Your Age'].value_counts()
print("Frequency:\n", freq_age)

# Calculate percentage for '2. Your Age'
perc_age = cleaned_df['2. Your Age'].value_counts(normalize=True) * 100
print("\nPercentage:\n", perc_age)

# Calculate the mode of the '2. Your Age' column
mode_age = cleaned_df['2. Your Age'].mode()[0]
print("\nMode:", mode_age)


Frequency:
 2. Your Age
From 20 to 29    85
From 30 to 39    17
Below 20         13
40 and above      7
Name: count, dtype: int64

Percentage:
 2. Your Age
From 20 to 29    69.672131
From 30 to 39    13.934426
Below 20         10.655738
40 and above      5.737705
Name: proportion, dtype: float64

Mode: From 20 to 29


### INCOME

In [16]:
# Calculate frequency for '4. What is your annual income?'
freq_income = cleaned_df['4. What is your annual income?'].value_counts()
print("Frequency:\n", freq_income)

# Calculate percentage for '4. What is your annual income?'
perc_income = cleaned_df['4. What is your annual income?'].value_counts(normalize=True) * 100
print("\nPercentage:\n", perc_income)

# Calculate the mode of the '4. What is your annual income?' column
mode_income = cleaned_df['4. What is your annual income?'].mode()[0]
print("\nMode:", mode_income)


Frequency:
 4. What is your annual income?
Less than RM25,000       71
RM25,000 - RM50,000      25
RM50,000 - RM100,000     17
More than RM150,000       6
RM100,000 - RM150,000     3
Name: count, dtype: int64

Percentage:
 4. What is your annual income?
Less than RM25,000       58.196721
RM25,000 - RM50,000      20.491803
RM50,000 - RM100,000     13.934426
More than RM150,000       4.918033
RM100,000 - RM150,000     2.459016
Name: proportion, dtype: float64

Mode: Less than RM25,000


### EMPLOYMENT STATUS

In [17]:
# Calculate frequency for '3. Are you currently....?'
freq_employment = cleaned_df['3. Are you currently....?'].value_counts()
print("Frequency:\n", freq_employment)

# Calculate percentage for '3. Are you currently....?'
perc_employment = cleaned_df['3. Are you currently....?'].value_counts(normalize=True) * 100
print("\nPercentage:\n", perc_employment)

# Calculate the mode of the '3. Are you currently....?' column
mode_employment = cleaned_df['3. Are you currently....?'].mode()[0]
print("\nMode:", mode_employment)


Frequency:
 3. Are you currently....?
Employed         61
Student          42
Self-employed    17
Housewife         2
Name: count, dtype: int64

Percentage:
 3. Are you currently....?
Employed         50.000000
Student          34.426230
Self-employed    13.934426
Housewife         1.639344
Name: proportion, dtype: float64

Mode: Employed


### GENDER

In [18]:
# Calculate frequency for gender
freq_gender = cleaned_df['1. Your Gender'].value_counts()
print("Frequency:\n", freq_gender)

# Calculate percentage for gender
perc_gender = cleaned_df['1. Your Gender'].value_counts(normalize=True) * 100
print("\nPercentage:\n", perc_gender)

# Calculate the mode of the gender column
mode_gender = cleaned_df['1. Your Gender'].mode()[0]
print("\nMode:", mode_gender)


Frequency:
 1. Your Gender
Female    65
Male      57
Name: count, dtype: int64

Percentage:
 1. Your Gender
Female    53.278689
Male      46.721311
Name: proportion, dtype: float64

Mode: Female
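The four cells above repeat the same three steps for each demographic column. One way to avoid the repetition, sketched here on made-up data, is a small helper that returns counts and percentages in one table. The name `describe_categorical` is an assumption for illustration and does not appear in the notebook.

```python
import pandas as pd

def describe_categorical(series: pd.Series) -> pd.DataFrame:
    """Return count and percentage per category, most frequent first."""
    counts = series.value_counts()
    table = counts.to_frame(name="count")
    table["percent"] = (counts / counts.sum() * 100).round(2)
    return table

# Toy answers (made up); in the notebook this would be a column of cleaned_df.
toy = pd.Series(["Female", "Male", "Female", "Female"], name="gender")
table = describe_categorical(toy)
print(table)
print("Mode:", toy.mode()[0])
```

In the notebook, the helper could be called once per column, for example on `cleaned_df['2. Your Age']`.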


## B1) Analyze which demographic groups are most likely to be loyal customers. (Using Chi-Square)
Correlation cannot be used here because the analysis compares one qualitative variable (treated as nominal) with another qualitative (nominal) variable.
1) AGE
2) INCOME
3) EMPLOYMENT

In [30]:
# Work on the cleaned data (cleaned_df)

# Convert the answers to binary values (Yes = 1, No = 0)
cleaned_df['Loyal_Customer'] = cleaned_df['20. Will you continue buying at Starbucks?'].map({'Yes': 1, 'No': 0})
cleaned_df.head(2)

Unnamed: 0,Timestamp,1. Your Gender,2. Your Age,3. Are you currently....?,4. What is your annual income?,5. How often do you visit Starbucks?,6. How do you usually enjoy Starbucks?,7. How much time do you normally spend during your visit?,8. The nearest Starbucks's outlet to you is...?,9. Do you have Starbucks membership card?,...,"12. How would you rate the quality of Starbucks compared to other brands (Coffee Bean, Old Town White Coffee..) to be:",13. How would you rate the price range at Starbucks?,14. How important are sales and promotions in your purchase decision?,"15. How would you rate the ambiance at Starbucks? (lighting, music, etc...)",16. You rate the WiFi quality at Starbucks as..,"17. How would you rate the service at Starbucks? (Promptness, friendliness, etc..)",18. How likely you will choose Starbucks for doing business meetings or hangout with friends?,19. How do you come to hear of promotions at Starbucks? Check all that apply.,20. Will you continue buying at Starbucks?,Loyal_Customer
0,2019/10/01 12:38:43 PM GMT+8,Female,From 20 to 29,Student,"Less than RM25,000",Rarely,Dine in,Between 30 minutes to 1 hour,within 1km,Yes,...,4,3,5,5,4,4,3,Starbucks Website/Apps;Social Media;Emails;Dea...,Yes,1
1,2019/10/01 12:38:54 PM GMT+8,Female,From 20 to 29,Student,"Less than RM25,000",Rarely,Take away,Below 30 minutes,1km - 3km,Yes,...,4,3,4,4,4,5,2,Social Media;In Store displays,Yes,1


## AGE VS LOYAL CUSTOMER

In [23]:
import pandas as pd
from scipy.stats import chi2_contingency

# Create a contingency table for Age vs Loyal Customer
contingency_table_age = pd.crosstab(cleaned_df['2. Your Age'], cleaned_df['Loyal_Customer'])

# Perform Chi-Square test
chi2, p, dof, expected = chi2_contingency(contingency_table_age)

# Print the result
print("Chi-Square Test result for Age vs Loyal Customer:")
print("Chi2 Value:", chi2)
print("P-value:", p)
print("Degrees of Freedom:", dof)
print("Expected Frequencies Table:\n", expected)


Chi-Square Test result for Age vs Loyal Customer:
Chi2 Value: 1.9133579454695133
P-value: 0.5905828390741805
Degrees of Freedom: 3
Expected Frequencies Table:
 [[ 1.60655738  5.39344262]
 [ 2.98360656 10.01639344]
 [19.50819672 65.49180328]
 [ 3.90163934 13.09836066]]
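For reference, the quantities in this output can be written out explicitly. The block below is the standard Pearson chi-square definition for an $r \times c$ contingency table with observed counts $O_{ij}$ and grand total $N$; the symbols are introduced here for illustration and do not appear in the notebook.

$$
E_{ij} = \frac{\left(\sum_{k=1}^{c} O_{ik}\right)\left(\sum_{k=1}^{r} O_{kj}\right)}{N},
\qquad
\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{\left(O_{ij} - E_{ij}\right)^2}{E_{ij}},
\qquad
\text{dof} = (r-1)(c-1)
$$

For the Age table, $r = 4$ and $c = 2$, so $\text{dof} = 3 \times 1 = 3$, which matches the output. The two expected columns sum to 28 and 94, so the "From 20 to 29" row (85 respondents) has an expected count of $85 \times 28 / 122 \approx 19.51$ in the first column, in agreement with the third row of the expected table.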


## INCOME VS LOYAL CUSTOMER

In [36]:
import pandas as pd
from scipy.stats import chi2_contingency

# Create a contingency table for Income vs Loyal Customer
contingency_table_income = pd.crosstab(cleaned_df['4. What is your annual income?'], cleaned_df['Loyal_Customer'])

# Perform Chi-Square test
chi2, p, dof, expected = chi2_contingency(contingency_table_income)

# Print the result
print("Chi-Square Test result for Income vs Loyal Customer:")
print("Chi2 Value:", chi2)
print("P-value:", p)
print("Degrees of Freedom:", dof)
print("Expected Frequencies Table:\n", expected)


Chi-Square Test result for Income vs Loyal Customer:
Chi2 Value: 2.106181519656109
P-value: 0.7162368186691492
Degrees of Freedom: 4
Expected Frequencies Table:
 [[16.29508197 54.70491803]
 [ 1.37704918  4.62295082]
 [ 0.68852459  2.31147541]
 [ 5.73770492 19.26229508]
 [ 3.90163934 13.09836066]]
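Several expected counts in the tables above are small (for example 0.69 and 1.38 in the Income table). A common rule of thumb treats the chi-square approximation as less reliable when many expected counts fall below 5, so it can be worth checking how many cells are affected. The sketch below does this on hypothetical counts; the helper name `small_expected_share` and the threshold of 5 are assumptions for illustration.

```python
import pandas as pd
from scipy.stats import chi2_contingency

def small_expected_share(table: pd.DataFrame, threshold: float = 5.0) -> float:
    """Return the fraction of cells whose expected count is below threshold."""
    _, _, _, expected = chi2_contingency(table)
    return float((expected < threshold).mean())

# Hypothetical observed counts (rows: groups, columns: not loyal / loyal).
toy = pd.DataFrame([[16, 55], [1, 5], [6, 19]], columns=[0, 1])
share = small_expected_share(toy)
print(f"Share of cells with expected count < 5: {share:.2f}")
```

If the share is high, merging sparse categories before testing is one option to consider.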


## EMPLOYMENT VS LOYAL CUSTOMER

In [37]:
import pandas as pd
from scipy.stats import chi2_contingency

# Create a contingency table for Employment vs Loyal Customer
contingency_table_employment = pd.crosstab(cleaned_df['3. Are you currently....?'], cleaned_df['Loyal_Customer'])

# Perform Chi-Square test
chi2, p, dof, expected = chi2_contingency(contingency_table_employment)

# Print the result
print("Chi-Square Test result for Employment vs Loyal Customer:")
print("Chi2 Value:", chi2)
print("P-value:", p)
print("Degrees of Freedom:", dof)
print("Expected Frequencies Table:\n", expected)


Chi-Square Test result for Employment vs Loyal Customer:
Chi2 Value: 4.729781274211812
P-value: 0.19268745913696161
Degrees of Freedom: 3
Expected Frequencies Table:
 [[14.         47.        ]
 [ 0.45901639  1.54098361]
 [ 3.90163934 13.09836066]
 [ 9.63934426 32.36065574]]
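The three tests above share the same steps: build a crosstab, run `chi2_contingency`, print the result. A compact alternative, sketched on made-up data, loops over the predictor columns and collects the results in one table. The function name `chi_square_summary` and the toy columns `age` and `job` are assumptions for illustration.

```python
import pandas as pd
from scipy.stats import chi2_contingency

def chi_square_summary(frame: pd.DataFrame, predictors: list, target: str) -> pd.DataFrame:
    """Run a chi-square test of independence for each predictor against target."""
    rows = []
    for column in predictors:
        table = pd.crosstab(frame[column], frame[target])
        chi2, p, dof, _ = chi2_contingency(table)
        rows.append({"variable": column, "chi2": chi2, "p_value": p, "dof": dof})
    return pd.DataFrame(rows)

# Toy data (made up); in the notebook, frame would be cleaned_df.
toy = pd.DataFrame({
    "age": ["20s", "20s", "30s", "30s", "20s", "30s", "20s", "30s"],
    "job": ["Student", "Employed", "Employed", "Student"] * 2,
    "Loyal_Customer": [1, 1, 0, 0, 1, 0, 1, 1],
})
summary = chi_square_summary(toy, ["age", "job"], "Loyal_Customer")
print(summary)
```

Collecting the results in a DataFrame makes it easier to compare p-values side by side than reading separate printouts.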


# Findings (DEMOGRAPHICS VS LOYAL CUSTOMER)

Based on the Chi-Square tests run for Age, Income, and Employment against Loyal Customer, the following conclusions can be drawn:

1. Age vs Loyal Customer\
Chi2 Value: 1.91\
P-value: 0.59\
Degrees of Freedom: 3\
Expected Frequencies Table: Shows the expected count for each category.\
Conclusion: The p-value (0.59) is greater than 0.05, which indicates no statistically significant association between age and customer loyalty. In other words, this sample gives no evidence that age affects whether someone is a loyal Starbucks customer.

2. Income vs Loyal Customer\
Chi2 Value: 2.11\
P-value: 0.72\
Degrees of Freedom: 4\
Expected Frequencies Table: Shows the expected count for each category.\
Conclusion: The p-value (0.72) is greater than 0.05, which indicates no statistically significant association between income and customer loyalty. Income therefore also shows no significant effect on Starbucks customer loyalty.

3. Employment vs Loyal Customer\
Chi2 Value: 4.73\
P-value: 0.19\
Degrees of Freedom: 3\
Expected Frequencies Table: Shows the expected count for each category.\
Conclusion: The p-value (0.19) is greater than 0.05, which indicates no statistically significant association between employment status and customer loyalty. Employment status therefore also shows no significant effect on Starbucks customer loyalty.

Overall Summary:\
All three p-values are greater than 0.05, so there is no statistically significant association between age, income, or employment status and Starbucks customer loyalty. We therefore fail to reject the null hypothesis (no association) and find no evidence that these factors influence customer loyalty in the sample studied.

## B2) Analyze which factors are most likely to be associated with loyal customers. (Using Chi-Squared Test)
Correlation cannot be used here because the analysis compares a qualitative (ordinal) variable with a qualitative (nominal) variable.
1) WIFI QUALITY
2) SALES
3) SERVICE RATING


## WIFI QUALITY VS LOYAL CUSTOMER

In [32]:
# Create a contingency table for WiFi Quality vs Loyal Customer
contingency_table_wifi_quality = pd.crosstab(cleaned_df['16. You rate the WiFi quality at Starbucks as..'], cleaned_df['Loyal_Customer'])

# Perform Chi-Square test
chi2, p, dof, expected = chi2_contingency(contingency_table_wifi_quality)

# Print the result
print("Chi-Square Test result for WiFi Quality vs Loyal Customer:")
print("Chi2 Value:", chi2)
print("P-value:", p)
print("Degrees of Freedom:", dof)
print("Expected Frequencies Table:\n", expected)


Chi-Square Test result for WiFi Quality vs Loyal Customer:
Chi2 Value: 4.214536005537444
P-value: 0.37774945584472164
Degrees of Freedom: 4
Expected Frequencies Table:
 [[ 1.60655738  5.39344262]
 [ 2.98360656 10.01639344]
 [12.39344262 41.60655738]
 [ 8.72131148 29.27868852]
 [ 2.29508197  7.70491803]]


## SALES VS LOYAL CUSTOMER

In [34]:
# Create a contingency table for Sales and Promotions vs Loyal Customer
contingency_table_sales = pd.crosstab(cleaned_df['14. How important are sales and promotions in your purchase decision?'], cleaned_df['Loyal_Customer'])

# Perform Chi-Square test
chi2, p, dof, expected = chi2_contingency(contingency_table_sales)

# Print the result
print("Chi-Square Test 14. How important are sales and promotions in your purchase decision?  vs Loyal Customer:")
print("Chi2 Value:", chi2)
print("P-value:", p)
print("Degrees of Freedom:", dof)
print("Expected Frequencies Table:\n", expected)


Chi-Square Test 14. How important are sales and promotions in your purchase decision?  vs Loyal Customer:
Chi2 Value: 3.008873267535881
P-value: 0.5563415797926121
Degrees of Freedom: 4
Expected Frequencies Table:
 [[ 1.37704918  4.62295082]
 [ 1.60655738  5.39344262]
 [ 6.8852459  23.1147541 ]
 [ 9.63934426 32.36065574]
 [ 8.49180328 28.50819672]]


## SERVICE RATING VS LOYAL CUSTOMER

In [35]:
# Create a contingency table for Service Rating vs Loyal Customer
contingency_table_service = pd.crosstab(cleaned_df['17. How would you rate the service at Starbucks? (Promptness, friendliness, etc..)'], cleaned_df['Loyal_Customer'])

# Perform Chi-Square test
chi2, p, dof, expected = chi2_contingency(contingency_table_service)

# Print the result
print("Chi-Square Test 17. How would you rate the service at Starbucks? (Promptness, friendliness, etc..)  vs Loyal Customer:")
print("Chi2 Value:", chi2)
print("P-value:", p)
print("Degrees of Freedom:", dof)
print("Expected Frequencies Table:\n", expected)


Chi-Square Test 17. How would you rate the service at Starbucks? (Promptness, friendliness, etc..)  vs Loyal Customer:
Chi2 Value: 10.09967280589494
P-value: 0.03878176286493808
Degrees of Freedom: 4
Expected Frequencies Table:
 [[ 0.2295082   0.7704918 ]
 [ 0.91803279  3.08196721]
 [ 9.86885246 33.13114754]
 [11.70491803 39.29508197]
 [ 5.27868852 17.72131148]]
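A chi-square test indicates whether loyalty is distributed differently across service ratings, but it does not say in which direction. One way to look at the direction is a row-normalized crosstab, which gives the share of loyal respondents per rating. The sketch below uses made-up ratings; the column name `service` is an assumption for illustration.

```python
import pandas as pd

# Toy ratings (made up): service rating from 1 to 5 and a 0/1 loyalty flag.
toy = pd.DataFrame({
    "service": [3, 3, 4, 4, 4, 5, 5, 2],
    "Loyal_Customer": [0, 1, 1, 1, 0, 1, 1, 0],
})

# normalize="index" turns each row into proportions that sum to 1,
# so column 1 holds the share of loyal respondents for each rating.
loyal_share = pd.crosstab(toy["service"], toy["Loyal_Customer"], normalize="index")
print(loyal_share[1])
```

Applied to `cleaned_df`, the same call would show whether the loyal share actually rises with the service rating.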


# Findings (OTHER FACTORS VS LOYAL CUSTOMER)

Based on the Chi-Square tests run for WiFi Quality, Sales and Promotions, and Service against Loyal Customer, the following conclusions can be drawn:

1. WiFi Quality vs Loyal Customer\
Chi2 Value: 4.21\
P-value: 0.38\
Degrees of Freedom: 4\
Expected Frequencies Table: Shows the expected count for each category.\
Conclusion: The p-value (0.38) is greater than 0.05, which indicates no statistically significant association between WiFi quality and customer loyalty. There is therefore no evidence that WiFi quality at Starbucks affects customer loyalty in the sample studied.

2. Sales and Promotions vs Loyal Customer\
Chi2 Value: 3.01\
P-value: 0.56\
Degrees of Freedom: 4\
Expected Frequencies Table: Shows the expected count for each category.\
Conclusion: The p-value (0.56) is greater than 0.05, which indicates no statistically significant association between sales and promotions and customer loyalty. Promotions and sales therefore do not show a large effect on Starbucks customer loyalty in this study.

3. Service vs Loyal Customer\
Chi2 Value: 10.10\
P-value: 0.0387\
Degrees of Freedom: 4\
Expected Frequencies Table: Shows the expected count for each category.\
Conclusion: The p-value (0.0387) is less than 0.05, which indicates a statistically significant association between service quality (promptness, friendliness, etc.) and customer loyalty. Service at Starbucks is therefore associated with customer loyalty in this study, with customers who are more satisfied with the service tending to be loyal customers.

Overall Summary:\
WiFi Quality and Sales and Promotions have no significant association with customer loyalty (P-value > 0.05).
Service at Starbucks has a significant association with customer loyalty (P-value < 0.05). Improving service quality, such as promptness and friendliness, may therefore help strengthen customer loyalty.
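The p-value shows whether an association is detectable, not how strong it is. Since one objective is to measure the extent of each factor's influence, an effect size such as Cramér's V can complement the tests. The sketch below is a minimal implementation on hypothetical counts; the function name `cramers_v` is an assumption, and the measure itself is not used in the original notebook. With the values reported for Service (chi2 of about 10.10, n = 122, a 5 x 2 table), the same formula gives V of roughly 0.29.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(table: pd.DataFrame) -> float:
    """Cramér's V = sqrt(chi2 / (n * min(r - 1, c - 1))), ranging from 0 to 1."""
    # correction=False so that 2x2 tables are not adjusted by Yates' correction.
    chi2, _, _, _ = chi2_contingency(table, correction=False)
    n = table.to_numpy().sum()
    r, c = table.shape
    return float(np.sqrt(chi2 / (n * min(r - 1, c - 1))))

# Hypothetical 2x2 counts with a perfect association, so V should be 1.
perfect = pd.DataFrame([[10, 0], [0, 10]])
# Hypothetical counts with identical row distributions, so V should be 0.
independent = pd.DataFrame([[5, 5], [10, 10]])
print(cramers_v(perfect), cramers_v(independent))
```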

# GRAPHS

# OVERALL CONCLUSION

Overall Conclusion of the Data Analysis Project:\
Based on the data analysis carried out, the following are the main conclusions about Starbucks customer loyalty and the factors that influence it, covering both the demographic analysis and the analysis of other factors related to Loyal Customer.

1. Customer Demographic Analysis\
Mode (Age): Most respondents are in the 20 to 29 age group.\
Mode (Annual Income): The most common annual income is less than RM25,000.\
Mode (Employment): The most common employment status is employed.\
Mode (Gender): Most respondents are female.


Conclusion: The typical respondent is young (20-29 years old), has a low income (less than RM25,000), is employed, and is female. These modes were computed over all respondents, so they describe who answered the survey rather than who is loyal; the Chi-Square tests above found no significant association between age, income, or employment status and loyalty. The concentration in this group may be related to lifestyle and social needs that make café visits more likely.

2. Chi-Square Analysis (Association with Loyal Customer)\
WiFi Quality vs Loyal Customer: No significant association was found between WiFi quality and customer loyalty (P-value > 0.05), so there is no evidence that WiFi quality affects a customer's decision to stay loyal.\
Sales and Promotions vs Loyal Customer: No significant association was found between sales and promotions and customer loyalty (P-value > 0.05), suggesting that promotions or discounts do not have a large impact on customer loyalty.\
Service vs Loyal Customer: Service (promptness, friendliness, etc.) was found to have a significant association with customer loyalty (P-value < 0.05). Customers who are satisfied with the service at Starbucks are more likely to be loyal customers.

Conclusion:

Of the factors tested, service is the only one significantly associated with customer loyalty at Starbucks. Customers who experience good service are more likely to keep buying and remain loyal.\
WiFi quality and sales/promotions do not have a large impact on customer loyalty in this case, suggesting that other aspects such as customer experience matter more than incentives or physical factors such as WiFi.\

3. Overall Conclusion for This Project\
Demographics: respondents are mostly young, employed, and low-income, but none of the demographic variables tested showed a significant association with customer loyalty.\
Service is the main factor associated with customer loyalty. Starbucks should therefore focus on improving service quality (promptness, friendliness) to retain loyal customers.\
Although WiFi and promotions did not show a significant association with customer loyalty, they can still be important parts of the customer experience; service and the relationship with customers are the stronger factors.\
Recommendations:\
Starbucks should focus its efforts on improving service quality, particularly for young and female customers, who make up the largest share of respondents.\
Although promotions and WiFi did not show a strong association, pricing and the overall experience can be considered as ways to improve the customer experience.

# Report writing?