# Cuisine Analysis

In [None]:
pip install --upgrade kagglehub[pandas-datasets,hf-datasets]

In [1]:
import kagglehub

In [2]:
# Download latest version
path = kagglehub.dataset_download("surajjha101/cuisine-rating")

print("Path to dataset files:", path)

Downloading from https://www.kaggle.com/api/v1/datasets/download/surajjha101/cuisine-rating?dataset_version_number=2...


100%|██████████| 2.67k/2.67k [00:00<00:00, 1.37MB/s]

Extracting files...
Path to dataset files: C:\Users\Kelvin Rizky\.cache\kagglehub\datasets\surajjha101\cuisine-rating\versions\2





## Load Dataset

In [1]:
import pandas as pd

df = pd.read_csv('Cuisine_rating.csv')

df

| | User ID | Area code | Location | Gender | YOB | Marital Status | Activity | Budget | Cuisines | Alcohol | Smoker | Food Rating | Service Rating | Overall Rating | Often A S |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 153 | Upper East Side,NY | Female | 2006 | Single | Professional | 3 | Japanese | Never | Never | 5 | 4 | 4.5 | No |
| 1 | 2 | 123 | St. George,NY | Female | 1991 | Married | Student | 3 | Indian | Never | Socially | 1 | 1 | 1.0 | No |
| 2 | 3 | 122 | Upper West Side,NY | Male | 1977 | Single | Student | 5 | Seafood | Often | Often | 5 | 5 | 5.0 | Yes |
| 3 | 4 | 153 | Upper East Side,NY | Female | 1956 | Married | Professional | 5 | Japanese | Never | Socially | 3 | 1 | 2.0 | No |
| 4 | 5 | 129 | Central Park,NY | Male | 1997 | Single | Student | 4 | Filipino | Socially | Never | 2 | 4 | 3.0 | No |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 195 | 196 | 175 | St. George,NY | Female | 1982 | Single | Professional | 4 | French | Never | Socially | 1 | 2 | 1.5 | No |
| 196 | 197 | 170 | Upper West Side,NY | Female | 2000 | Married | Student | 4 | Chinese | Never | Often | 1 | 2 | 1.5 | No |
| 197 | 198 | 160 | St. George,NY | Female | 2006 | Single | Professional | 5 | Japanese | Never | Often | 5 | 2 | 3.5 | No |
| 198 | 199 | 130 | St. George,NY | Male | 2002 | Married | Student | 3 | Filipino | Never | Socially | 3 | 2 | 2.5 | No |
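
The download step returns a cache directory, whereas `pd.read_csv('Cuisine_rating.csv')` expects the file in the working directory. A minimal sketch that connects the two, assuming the CSV sits at the top level of the downloaded folder (the helper name `load_cuisine_csv` is hypothetical, not part of kagglehub or pandas):

```python
from pathlib import Path

import pandas as pd


def load_cuisine_csv(dataset_dir, filename="Cuisine_rating.csv"):
    """Read the ratings CSV from a dataset directory.

    Assumes the file is stored directly inside `dataset_dir`.
    """
    csv_path = Path(dataset_dir) / filename
    if not csv_path.is_file():
        raise FileNotFoundError(f"{csv_path} not found")
    return pd.read_csv(csv_path)
```

With the `path` returned by `kagglehub.dataset_download(...)`, the call would be `df = load_cuisine_csv(path)`.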


## Initial Data Exploration

In [2]:
# Display information about the dataset, such as type of data, columns, entries, etc
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 15 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   User ID         200 non-null    int64  
 1   Area code       200 non-null    int64  
 2   Location        200 non-null    object 
 3   Gender          200 non-null    object 
 4   YOB             200 non-null    int64  
 5   Marital Status  200 non-null    object 
 6   Activity        200 non-null    object 
 7   Budget          200 non-null    int64  
 8   Cuisines        200 non-null    object 
 9   Alcohol         200 non-null    object 
 10  Smoker          200 non-null    object 
 11  Food Rating     200 non-null    int64  
 12  Service Rating  200 non-null    int64  
 13  Overall Rating  200 non-null    float64
 14  Often A S       200 non-null    object 
dtypes: float64(1), int64(6), object(8)
memory usage: 23.6+ KB


The dataset has 200 rows and 15 columns: 6 integer columns, 1 float column, and 8 object (string) columns.

In [3]:
# Check for missing value
df.isna().sum()

User ID           0
Area code         0
Location          0
Gender            0
YOB               0
Marital Status    0
Activity          0
Budget            0
Cuisines          0
Alcohol           0
Smoker            0
Food Rating       0
Service Rating    0
Overall Rating    0
Often A S         0
dtype: int64

There are no missing values in the dataset.

In [4]:
# Display basic statistics of dataset
df.describe()

| | User ID | Area code | YOB | Budget | Food Rating | Service Rating | Overall Rating |
|---|---|---|---|---|---|---|---|
| count | 200.0 | 200.0 | 200.0 | 200.0 | 200.0 | 200.0 | 200.0 |
| mean | 100.5 | 141.06 | 1984.83 | 3.815 | 3.22 | 3.23 | 3.225 |
| std | 57.879185 | 26.130257 | 16.809339 | 1.056578 | 1.411226 | 1.526022 | 1.079445 |
| min | 1.0 | 101.0 | 1955.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 25% | 50.75 | 123.0 | 1971.0 | 3.0 | 2.0 | 2.0 | 2.5 |
| 50% | 100.5 | 135.0 | 1987.0 | 4.0 | 3.0 | 3.0 | 3.0 |
| 75% | 150.25 | 158.0 | 2000.0 | 5.0 | 5.0 | 5.0 | 4.0 |
| max | 200.0 | 199.0 | 2009.0 | 5.0 | 5.0 | 5.0 | 5.0 |


In [5]:
def Find_Outliers(df):
    """Return the rows that hold an IQR outlier in at least one numeric column."""
    try:
        num_cols = df.select_dtypes(include=['number']).columns  # Select the numeric columns
        outlier_indices = set()

        for col in num_cols:
            q1 = df[col].quantile(0.25)
            q3 = df[col].quantile(0.75)
            IQR = q3 - q1
            lowerbound = q1 - (IQR * 1.5)
            upperbound = q3 + (IQR * 1.5)

            # Collect the index of every row that is an outlier in this column
            # (this must run inside the loop so that every column is checked)
            outlier_indices.update(df[(df[col] < lowerbound) | (df[col] > upperbound)].index)

        outliers = df.loc[sorted(outlier_indices)]
        return outliers

    except Exception as e:
        print(f"Error: {e}")
        return None
    
outliers = Find_Outliers(df)
print(f'Number of outliers: {len(outliers)}')
print('Max outlier value:\n', outliers.max(numeric_only=True))
print('Min outlier value:\n', outliers.min(numeric_only=True))

Number of outliers: 0
Max outlier value:
 User ID          NaN
Area code        NaN
YOB              NaN
Budget           NaN
Food Rating      NaN
Service Rating   NaN
Overall Rating   NaN
dtype: float64
Min outlier value:
 User ID          NaN
Area code        NaN
YOB              NaN
Budget           NaN
Food Rating      NaN
Service Rating   NaN
Overall Rating   NaN
dtype: float64


There are no outliers in the dataset under the 1.5 × IQR rule.
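
As a cross-check, the 1.5 × IQR fences can be worked out by hand from the quartiles that `df.describe()` reported earlier. A small sketch using those printed values (the dictionary is copied from that output, not recomputed from the file):

```python
# (min, Q1, Q3, max) per numeric column, copied from the df.describe() output
summary = {
    "User ID": (1.0, 50.75, 150.25, 200.0),
    "Area code": (101.0, 123.0, 158.0, 199.0),
    "YOB": (1955.0, 1971.0, 2000.0, 2009.0),
    "Budget": (1.0, 3.0, 5.0, 5.0),
    "Food Rating": (1.0, 2.0, 5.0, 5.0),
    "Service Rating": (1.0, 2.0, 5.0, 5.0),
    "Overall Rating": (1.0, 2.5, 4.0, 5.0),
}

has_outliers = {}
for col, (col_min, q1, q3, col_max) in summary.items():
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # A column has outliers only if its min or max falls outside the fences
    has_outliers[col] = col_min < lower or col_max > upper
    print(f"{col}: fences [{lower}, {upper}], range [{col_min}, {col_max}]")

print(any(has_outliers.values()))  # False: every column stays inside its fences
```

Every minimum and maximum lies inside its fences, which agrees with the count of 0 outliers reported above.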

## Data Manipulation

### Grouping By

#### Customer Segmentation

In [6]:
# Getting information about customers
customers_segmentation = df.groupby(['Gender', 'Activity'])['User ID'].count()
customers_segmentation

Gender  Activity    
Female  Professional    40
        Student         42
Male    Professional    40
        Student         78
Name: User ID, dtype: int64

Based on the counts above, male customers (118) outnumber female customers (82), and students (120) outnumber professionals (80). Male students form the largest single segment (78). Because professionals are the smaller group, restaurants could offer discounts or promotions aimed at working customers to attract more of them.
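
The totals per gender and per activity are easier to read once the grouped counts are pivoted into a two-way table. A sketch built from the four counts printed above; on the full DataFrame, `pd.crosstab(df['Gender'], df['Activity'], margins=True)` should give the same table with totals:

```python
import pandas as pd

# Counts copied from the groupby output above
counts = pd.Series(
    [40, 42, 40, 78],
    index=pd.MultiIndex.from_tuples(
        [("Female", "Professional"), ("Female", "Student"),
         ("Male", "Professional"), ("Male", "Student")],
        names=["Gender", "Activity"],
    ),
    name="User ID",
)

table = counts.unstack("Activity")   # rows: Gender, columns: Activity
gender_totals = table.sum(axis=1)    # Female 82, Male 118
activity_totals = table.sum(axis=0)  # Professional 80, Student 120

print(table)
print(gender_totals)
print(activity_totals)
```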

#### Cuisine Interest Analysis

In [8]:
# Getting information about cuisine interest among customers
cuisine_interest = df.groupby('Cuisines')["User ID"].count().sort_values(ascending=False)
cuisine_interest

Cuisines
Japanese    36
Filipino    34
French      34
Indian      32
Chinese     24
Seafood     22
Italian     18
Name: User ID, dtype: int64

Japanese cuisine is the most popular (36 customers), narrowly ahead of Filipino and French (34 each), while Italian is the least popular (18). Restaurants can therefore consider adding Japanese cuisine to their menu.
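
Raw counts hide how close the top cuisines are, so shares can help. A sketch that rebuilds the `Cuisines` column from the counts above and applies `value_counts(normalize=True)`; on the real DataFrame the equivalent call would be `df['Cuisines'].value_counts(normalize=True)`:

```python
import pandas as pd

# Counts copied from the output above
counts = {"Japanese": 36, "Filipino": 34, "French": 34, "Indian": 32,
          "Chinese": 24, "Seafood": 22, "Italian": 18}

# Rebuild a column with one entry per respondent
cuisines = pd.Series(
    [name for name, n in counts.items() for _ in range(n)], name="Cuisines"
)

# Fraction of all respondents per cuisine, largest first
shares = cuisines.value_counts(normalize=True)
print(shares.round(3))
```

Japanese comes out at 18% of respondents against 17% each for Filipino and French, so the lead is real but small.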

#### Food and Service Rating Analysis

In [9]:
# Getting information about food and service rating analysis
food_and_service_rating = df.groupby('Cuisines')[['Food Rating', 'Service Rating', 'Overall Rating']].mean().sort_values(by='Overall Rating', ascending=False)
food_and_service_rating

| Cuisines | Food Rating | Service Rating | Overall Rating |
|---|---|---|---|
| Japanese | 3.861111 | 3.333333 | 3.597222 |
| Chinese | 3.458333 | 3.083333 | 3.270833 |
| Italian | 3.222222 | 3.166667 | 3.194444 |
| Indian | 2.78125 | 3.5625 | 3.171875 |
| Seafood | 3.227273 | 3.045455 | 3.136364 |
| French | 3.294118 | 2.911765 | 3.102941 |
| Filipino | 2.705882 | 3.382353 | 3.044118 |


Japanese cuisine has the highest average overall rating (3.60), ahead of Chinese (3.27) and Italian (3.19), which supports adding Japanese cuisine to the menu, in line with the cuisine interest result. Filipino (3.04) and French (3.10) have the lowest overall ratings: Filipino has the lowest food rating (2.71) and French the lowest service rating (2.91). These cuisines therefore need an evaluation of food quality, menu selection, and service so that their food and service ratings can improve.
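
In the rows displayed when the dataset was loaded, `Overall Rating` equals the mean of `Food Rating` and `Service Rating`, which would explain why the overall average of each cuisine sits between its other two averages. A sketch that checks this on those nine displayed rows only; whether it holds for all 200 rows is an assumption to verify on the full DataFrame:

```python
import pandas as pd

# (Food Rating, Service Rating, Overall Rating) copied from the displayed rows
rows = [(5, 4, 4.5), (1, 1, 1.0), (5, 5, 5.0), (3, 1, 2.0), (2, 4, 3.0),
        (1, 2, 1.5), (1, 2, 1.5), (5, 2, 3.5), (3, 2, 2.5)]
sample = pd.DataFrame(rows, columns=["Food Rating", "Service Rating", "Overall Rating"])

# Mean of the two component ratings, compared row by row with the overall rating
expected = (sample["Food Rating"] + sample["Service Rating"]) / 2
matches = (expected == sample["Overall Rating"]).all()
print(matches)
```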

#### Customer Budget Analysis

In [10]:
# Getting information about customer's budget
customer_budget = df.groupby('Cuisines')['Budget'].mean().sort_values(ascending=False)
customer_budget

Cuisines
Japanese    4.111111
Chinese     4.083333
French      3.941176
Italian     3.888889
Filipino    3.705882
Seafood     3.545455
Indian      3.406250
Name: Budget, dtype: float64

The data shows that customers who chose Japanese (4.11) and Chinese (4.08) cuisine have the highest average budget level. Given these higher budgets, restaurants can create exclusive, premium menus so that customers feel they get value for what they pay, for example live sushi preparation or hotpot service.

For cuisines with the lowest average budget, namely Indian (3.41), Seafood (3.55), and Filipino (3.71), restaurants can offer value promotions, such as buy 1 get 1, to increase sales among customers with a lower budget.

#### Smoker and Alcohol Ratio

In [11]:
# Getting information about smoker ratio
smoker_ratio=df.groupby(["Smoker","Cuisines"])["User ID"].count()
smoker_ratio

Smoker    Cuisines
Never     Chinese      6
          Filipino    16
          French      10
          Indian       7
          Italian      6
          Japanese     5
          Seafood      9
Often     Chinese     10
          Filipino     4
          French      14
          Indian      10
          Italian      5
          Japanese    20
          Seafood      7
Socially  Chinese      8
          Filipino    14
          French      10
          Indian      15
          Italian      7
          Japanese    11
          Seafood      6
Name: User ID, dtype: int64

In [13]:
# Getting information about alcohol ratio
# Note: the column name used here, "Alcohol ", ends with a trailing space
alcohol_ratio=df.groupby(["Alcohol ","Cuisines"])["User ID"].count()
alcohol_ratio

Alcohol   Cuisines
Never     Chinese     12
          Filipino     8
          French      19
          Indian      15
          Italian      6
          Japanese    20
          Seafood      8
Often     Chinese      8
          Filipino    14
          French      10
          Indian       9
          Italian      4
          Japanese     8
          Seafood      8
Socially  Chinese      4
          Filipino    12
          French       5
          Indian       8
          Italian      8
          Japanese     8
          Seafood      6
Name: User ID, dtype: int64

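Based on the data above, Japanese cuisine has the largest share of frequent smokers (20 of 36, about 56%) and Filipino cuisine the largest share of frequent drinkers (14 of 34, about 41%). Chinese cuisine is the only one that ranks in the top three on both measures (10 of 24 smoke often and 8 of 24 drink alcohol often). Restaurants can therefore consider creating a dedicated dining room for smokers and increasing the supply of alcoholic drinks for Chinese cuisine menus.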
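
Because the cuisines have different numbers of respondents, raw counts are hard to compare; within-cuisine shares are more telling. A sketch using the counts printed above (on the full DataFrame, `pd.crosstab(df['Cuisines'], df['Smoker'], normalize='index')` should yield the same share table directly):

```python
import pandas as pd

cuisines = ["Chinese", "Filipino", "French", "Indian", "Italian", "Japanese", "Seafood"]

# Counts copied from the two groupby outputs above
smoker = pd.DataFrame(
    {"Never": [6, 16, 10, 7, 6, 5, 9],
     "Often": [10, 4, 14, 10, 5, 20, 7],
     "Socially": [8, 14, 10, 15, 7, 11, 6]},
    index=cuisines,
)
alcohol = pd.DataFrame(
    {"Never": [12, 8, 19, 15, 6, 20, 8],
     "Often": [8, 14, 10, 9, 4, 8, 8],
     "Socially": [4, 12, 5, 8, 8, 8, 6]},
    index=cuisines,
)

# Divide each row by its total to get the share of each habit per cuisine
smoker_share = smoker.div(smoker.sum(axis=1), axis=0)
alcohol_share = alcohol.div(alcohol.sum(axis=1), axis=0)

summary = pd.DataFrame(
    {"smokes often": smoker_share["Often"], "drinks often": alcohol_share["Often"]}
).round(3)
print(summary.sort_values("smokes often", ascending=False))
```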

### Correlation

#### Budget and Overall Rating 

In [15]:
# Getting information about correlation between Budget and Overall Rating
budget_rating = df[['Budget', 'Overall Rating']].corr()
budget_rating

| | Budget | Overall Rating |
|---|---|---|
| Budget | 1.0 | -0.058049 |
| Overall Rating | -0.058049 | 1.0 |


There is virtually no linear correlation between Budget and Overall Rating: the coefficient (-0.058) is close to zero. High-budget customers therefore do not consistently give higher ratings, and vice versa. Other factors, such as food quality, service, restaurant atmosphere, or customer expectations, may have more influence on the rating.
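
To gauge whether a coefficient this small could differ from zero by more than chance, the usual t statistic for a Pearson correlation, t = r·sqrt(n − 2)/sqrt(1 − r²), can be computed from the numbers above. A sketch, assuming the 200 respondents are independent observations; the critical value of roughly 1.97 (two-sided, 5% level, 198 degrees of freedom) is quoted from standard t tables rather than derived here:

```python
import math

r = -0.058049  # correlation reported above
n = 200        # number of rows in the dataset

# t statistic for testing whether a Pearson correlation differs from zero
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
print(round(t_stat, 2))

# |t| well below ~1.97 means the correlation is not significant at the 5% level
print(abs(t_stat) < 1.97)
```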

## Conclusion

By leveraging these insights, the restaurant can optimize its menu, improve service quality, offer targeted promotions, and enhance the dining experience to attract and retain more customers.

### Customer Demographics & Promotions

Male customers (118) outnumber female customers (82).

Students (120) outnumber professionals (80).

Recommendation: Introduce discounts or promotions targeted at working professionals, the smaller segment, to attract more of them.

### Popular Cuisine Preferences

Japanese cuisine is the most popular (36 customers), narrowly ahead of Filipino and French (34 each).

Recommendation: Add Japanese cuisine to the menu to align with customer preferences.

### Cuisine Ratings & Quality Improvement

Japanese cuisine has the highest average overall rating (3.60), ahead of Chinese, Italian, and the other cuisines.

Filipino and French cuisines have the lowest overall ratings, indicating possible issues with food quality or service.

Recommendation: Evaluate and improve food quality, menu offerings, and service standards to enhance customer satisfaction.

### Budget & Premium Menu Opportunities

Customers who chose Japanese and Chinese cuisines report the highest average budgets.

Recommendation: Introduce premium experiences such as live sushi preparation or hotpot service to match the higher budgets and enhance the dining experience.

### Smoker & Alcohol Preferences

Customers who prefer Chinese cuisine rank in the top three for both frequent smoking and frequent drinking, while Japanese customers smoke most often and Filipino customers drink most often.

Recommendation: Consider creating a designated smoking area and expanding alcoholic beverage options to cater to these preferences.