In [79]:
import pandas as pd

df = pd.read_excel(r'/Users/Mikayla/Desktop/RamenSideProject/The-Ramen-Rater-The-Big-List-1-3400-Current-As-Of-Jan-25-2020.xlsx')
df.head()

Unnamed: 0,Review #,Brand,Variety,Style,Country,Stars
0,3400,EDO Pack,Kumamoto Flavour Noodles,Cup,Hong Kong,1.0
1,3399,Pan Mee,Goreng Dried Chili Shrimp Flavour,Pack,Malaysia,5.0
2,3398,Paldo,King Lid Ramen Noodle Soup,Pack,South Korea,5.0
3,3397,Nissin Miojo,Cremoso Carno Com Chili,Pack,Brazil,2.0
4,3396,Samyang Foods,Cham Ramen Big Bowl,Bowl,South Korea,2.25


# Data Overview
I downloaded the dataset from https://www.theramenrater.com/. It contains 3400 reviews by various ramen enthusiasts. Each review records the brand, the variety of ramen, the serving style, the country, and a rating out of 5 stars.

# Frequencies
First, I thought it would be helpful to look at the frequencies for each of the main categories (Brand, Variety, Style, and Country). I wanted to know what kind of overlap existed for reviews of certain ramen varieties. I also wanted to know the most commonly reviewed style and the country doing the most reviewing.

## Styles
Packs of ramen are the most commonly reviewed style, followed by bowls, cups, and trays. When looking at data filtered by style, I will focus primarily on these four common types.

In [80]:
df['Style'].value_counts()

Pack          1948
Bowl           655
Cup            593
Tray           151
Box             48
Restaurant       3
Can              1
Bar              1
Name: Style, dtype: int64
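
Restricting later analysis to the four common styles could look roughly like the sketch below. It runs on a small made-up table rather than the real spreadsheet, and the names `toy`, `common_styles`, and `toy_common` are placeholders introduced only for this illustration.

```python
import pandas as pd

# Toy stand-in for the review table (made-up rows, not real reviews).
toy = pd.DataFrame({
    "Style": ["Pack", "Pack", "Bowl", "Cup", "Tray", "Box", "Bar"],
    "Stars": [5.0, 3.5, 4.0, 2.0, 3.0, 4.5, 5.0],
})

# The four most commonly reviewed styles named in the text above.
common_styles = ["Pack", "Bowl", "Cup", "Tray"]

# isin() builds a boolean mask; rows with any other style are dropped.
toy_common = toy[toy["Style"].isin(common_styles)]

print(toy_common["Style"].value_counts())
```

The same mask applied to `df` would drop the rare styles (Box, Restaurant, Can, Bar) before any per-style comparison.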

## Brands
I wanted to look at the most commonly reviewed brands. The results met my expectations, as I am familiar with the 3 most reviewed brands.

In [81]:
df['Brand'].value_counts()

Nissin                     460
Maruchan                   120
Nongshim                   113
Myojo                       96
Samyang Foods               93
Paldo                       76
Mama                        71
Indomie                     56
Sapporo Ichiban             54
Ottogi                      48
Acecook                     41
Maggi                       38
KOKA                        36
Vifon                       35
Lucky Me!                   34
Vina Acecook                34
MAMA                        30
MyKuali                     30
Ve Wong                     29
Master Kong                 29
Mamee                       29
Vedan                       28
A-Sha Dry Noodle            26
Wei Lih                     26
JML                         26
Wai Wai                     25
Yum Yum                     23
Wu-Mu                       23
Sau Tao                     20
Samyang                     19
                          ... 
Seven-Eleven                 1
Fukumen 

## Countries
Japan has the most reviews on this website, as might be expected given ramen's popularity in Japan. The counts also revealed some inconsistency in country names: a few reviews are split into separate groups because United States and USA are counted as different values, as are United Kingdom and UK.

In [82]:
df['Country'].value_counts()

Japan             606
United States     419
South Korea       383
Taiwan            351
China             217
Thailand          208
Malaysia          190
Hong Kong         159
Indonesia         152
Singapore         136
Vietnam           112
UK                 69
Canada             56
Philippines        51
India              41
Mexico             32
Germany            28
Australia          25
Brazil             20
Netherlands        16
Myanmar            14
Nepal              14
Bangladesh         12
Pakistan            9
Hungary             9
Poland              6
Colombia            6
Cambodia            5
Russia              5
Sarawak             5
Fiji                4
Holland             4
France              4
Italy               4
Peru                3
Sweden              3
Ukraine             3
Finland             3
Dubai               3
Spain               2
Ghana               2
Estonia             2
Nigeria             2
Portugal            1
New Zealand         1
United Kin
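
One way to merge the split country labels is to map the short forms onto the long ones before counting. The sketch below uses a made-up column; the `country_aliases` mapping is hypothetical, and which spelling to keep is an arbitrary choice. The output above also lists both Holland and Netherlands, which may be another pair worth merging.

```python
import pandas as pd

# Toy column with the duplicate spellings mentioned in the text (made-up rows).
toy = pd.DataFrame(
    {"Country": ["United States", "USA", "UK", "United Kingdom", "Japan"]}
)

# Hypothetical alias table: short form -> spelling to keep.
country_aliases = {"USA": "United States", "UK": "United Kingdom"}

# Strip stray whitespace, then swap any alias for its canonical name.
toy["Country"] = toy["Country"].str.strip().replace(country_aliases)

counts = toy["Country"].value_counts()
print(counts)
```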

## Varieties
By themselves, the counts by variety are not very useful. They did give me an idea of some slight overlap in varieties, which will be a useful guide later for finding whether the same variety by the same brand has been reviewed more than once.

In [83]:
df['Variety'].value_counts()

Beef                                                    7
Miso Ramen                                              7
Yakisoba                                                7
Chicken                                                 7
Vegetable                                               6
Artificial Chicken                                      6
Instant Noodles Chicken Flavour                         4
Curry Flavour Instant Noodles                           4
Curry Udon                                              4
Tempura Soba                                            4
Artificial Beef Flavor                                  4
Spicy Beef                                              4
Instant Noodles Beef Flavour                            4
Hot & Spicy Flavour Noodle Soup                         4
Tonkotsu Ramen                                          4
Chicken Flavor                                          4
Chili Chicken Flavour Noodle Soup                       4
Artificial Spi
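
Checking whether the same variety from the same brand appears more than once could be sketched as follows. The table is made up for illustration, and `pair_counts` and `repeated` are names introduced here, not part of the original analysis.

```python
import pandas as pd

# Toy reviews (made-up rows): one brand/variety pair appears twice.
toy = pd.DataFrame({
    "Brand": ["Nissin", "Nissin", "Maruchan", "Paldo"],
    "Variety": ["Chicken", "Chicken", "Chicken", "Beef"],
    "Stars": [3.5, 4.0, 3.0, 5.0],
})

# Count the reviews for every (Brand, Variety) combination.
pair_counts = toy.groupby(["Brand", "Variety"]).size()

# Keep only the combinations that were reviewed more than once.
repeated = pair_counts[pair_counts > 1]
print(repeated)
```

Grouping on both columns matters here: a variety name such as "Chicken" is shared across brands, so counting on `Variety` alone overstates the overlap.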

# Cleaning the data
There are some inconsistent datatypes within the star rating column, which would complicate aggregate functions later in the analysis. After looking at the frequencies of the different ratings, I will remove any records whose star rating is a non-numeric string (for example 'NR' or 'Unrated').

In [84]:
df['Stars'].value_counts()

5                      593
3.75                   499
3.5                    486
4                      395
3.25                   214
4.5                    207
4.25                   200
3                      167
2.75                   109
2.5                     89
2                       89
4.75                    88
1.5                     48
1                       37
0                       37
2.25                    28
1.75                    28
0.5                     19
0.25                    16
1.25                    13
NR                       3
Unrated                  3
2.8                      2
3.1                      2
1.1                      2
2.9                      2
4.25/5                   2
4.125                    2
0.75                     1
5/2.5                    1
2.125                    1
2017-05-05 00:00:00      1
3.4                      1
2.3                      1
3.2                      1
1.8                      1
0.1                      1
3

In [85]:
# to_numeric with errors='coerce' turns non-numeric values into NaN. Only records with a numeric star value are then kept in the df.
df = df[pd.to_numeric(df['Stars'], errors='coerce').notnull()]

In [86]:
# To check that the cleaning was successful, I look at the counts for the remaining unique star rating values.
df['Stars'].value_counts()

5.000    593
3.750    499
3.500    486
4.000    395
3.250    214
4.500    207
4.250    200
3.000    167
2.750    109
2.000     89
2.500     89
4.750     88
1.500     48
0.000     37
1.000     37
2.250     28
1.750     28
0.500     19
0.250     16
1.250     13
1.100      2
4.125      2
2.800      2
2.900      2
3.100      2
0.750      1
3.400      1
3.125      1
2.100      1
3.650      1
0.900      1
3.700      1
2.850      1
3.600      1
0.100      1
1.800      1
3.200      1
2.300      1
2.125      1
Name: Stars, dtype: int64
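
The filter above keeps the right rows but leaves the column itself as it was. A possible variation, sketched on made-up values, stores the converted numbers back into the column so that later aggregations need no further `to_numeric` calls. The names `toy`, `stars`, and `clean` are placeholders for this sketch.

```python
import pandas as pd

# Toy ratings mixing numbers, numeric strings, and non-numeric strings.
toy = pd.DataFrame({"Stars": [5, "3.75", "Unrated", "NR", "4.25/5", 2.5]})

# Convert once; anything that cannot be parsed becomes NaN.
stars = pd.to_numeric(toy["Stars"], errors="coerce")

# Keep the parsable rows and store the numeric values as a float column.
clean = toy[stars.notna()].copy()
clean["Stars"] = stars[stars.notna()]

print(clean["Stars"].dtype)
print(clean["Stars"].mean())
```

Note that a value like "4.25/5" is dropped by this approach, just as in the cell above, even though a rating could arguably be recovered from it.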

# Deeper Dive into Brands
I am looking to answer two main questions in this section. First, which styles are rated most often within each brand? I am interested in how the most popular style might skew a brand's average rating, either inflating it or dragging it down. Second, what is the average rating for each brand? Here I will only look at brands with a minimum number of ratings, to avoid a brand coming out on top because of a single 5-star review.

In [93]:
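# The average rating for all reviews.
# Note: a cell only displays its last expression, so the mean is not shown below; wrap it in print() to see it.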
df['Stars'].mean()
df.head()


Unnamed: 0,Review #,Brand,Variety,Style,Country,Stars
0,3400,EDO Pack,Kumamoto Flavour Noodles,Cup,Hong Kong,1.0
1,3399,Pan Mee,Goreng Dried Chili Shrimp Flavour,Pack,Malaysia,5.0
2,3398,Paldo,King Lid Ramen Noodle Soup,Pack,South Korea,5.0
3,3397,Nissin Miojo,Cremoso Carno Com Chili,Pack,Brazil,2.0
4,3396,Samyang Foods,Cham Ramen Big Bowl,Bowl,South Korea,2.25


In [112]:
#The average rating for each brand
grouped = pd.to_numeric(df['Stars']).groupby(df['Brand'])
grouped.mean()



Brand
1 To 3 Noodles         4.000000
7 Select               3.625000
7 Select/Nissin        3.500000
7-Eleven / Nissin      4.250000
A-One                  2.750000
A-Sha                  4.525000
A-Sha Dry Noodle       4.076923
A1                     3.083333
ABC                    4.208333
Acecook                3.207317
Adabi                  3.812500
Ah Lai                 4.750000
Aji-no-men             3.333333
Ajinatori              3.000000
Ajinomoto              2.000000
Alhami                 3.892857
Amianda                3.950000
Amino                  3.500000
Annie Chun's           3.479167
Aroi                   4.500000
Asia Gold              3.437500
Asian Thai Foods       3.517857
Atomy                  4.500000
Authentically Asian    1.000000
Azami                  1.950000
Baijia                 1.250000
Baixiang Noodles       3.700000
Baltix                 3.500000
Bamee                  3.300000
Banzai                 3.500000
                         ...   
Wa
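
The minimum-ratings idea from the section introduction could be sketched like this. The brands and ratings are made up, and the threshold `min_reviews` is a hypothetical value chosen only for the example.

```python
import pandas as pd

# Toy reviews (made-up rows): brand "B" has a single 5-star review.
toy = pd.DataFrame({
    "Brand": ["A", "A", "A", "B", "C", "C"],
    "Stars": [4.0, 3.0, 5.0, 5.0, 2.0, 3.0],
})

min_reviews = 2  # hypothetical cutoff; tune to the real data

# Compute the mean rating and the review count for each brand in one pass.
brand_stats = toy.groupby("Brand")["Stars"].agg(["mean", "count"])

# Drop brands below the cutoff, then rank the rest by average rating.
reliable = brand_stats[brand_stats["count"] >= min_reviews]
reliable = reliable.sort_values("mean", ascending=False)
print(reliable)
```

Keeping the count next to the mean makes it easy to see how much evidence stands behind each average.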

In [117]:
# The number of reviews for each brand and style
grouped = df.groupby(['Brand','Style'])['Style'].count()
print(grouped)

Brand              Style
1 To 3 Noodles     Pack      1
7 Select           Bowl      2
7 Select/Nissin    Cup       1
7-Eleven / Nissin  Cup       1
A-One              Cup       4
A-Sha              Box       3
                   Pack      7
A-Sha Dry Noodle   Pack     23
                   Tray      3
A1                 Pack      3
ABC                Cup       6
                   Pack      6
Acecook            Bowl     13
                   Cup      20
                   Pack      2
                   Tray      6
Adabi              Pack      4
Ah Lai             Pack      2
Aji-no-men         Pack      3
Ajinatori          Pack      2
Ajinomoto          Cup       1
Alhami             Pack      7
Amianda            Pack     10
Amino              Pack      3
Annie Chun's       Bowl      4
                   Pack      5
                   Tray      3
Aroi               Pack      2
Asia Gold          Pack      4
Asian Thai Foods   Pack     14
                            ..
Western Family
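
Raw counts are hard to compare between brands of different sizes. One option, sketched below on made-up rows, is to turn them into per-brand shares with `pd.crosstab` and `normalize="index"`; the name `style_share` is introduced only for this illustration.

```python
import pandas as pd

# Toy reviews (made-up rows) for two placeholder brands.
toy = pd.DataFrame({
    "Brand": ["A", "A", "A", "A", "B", "B"],
    "Style": ["Pack", "Pack", "Pack", "Cup", "Bowl", "Cup"],
})

# Rows are brands, columns are styles; normalize="index" makes each row sum to 1.
style_share = pd.crosstab(toy["Brand"], toy["Style"], normalize="index")
print(style_share)
```

Each row then reads as the fraction of a brand's reviews that fall in each style, which is closer to the question of which style dominates a brand.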

# Deeper Dive- Style
Do certain styles carry a higher rating than others?

In [119]:
groupedstyle = pd.to_numeric(df['Stars']).groupby(df['Style'])
groupedstyle.mean()

Style
Bar           5.000000
Bowl          3.708397
Box           4.390625
Can           3.500000
Cup           3.475759
Pack          3.760677
Restaurant    3.583333
Tray          3.527318
Name: Stars, dtype: float64

# Deeper Dive- Country

In [120]:
groupedcountry = pd.to_numeric(df['Stars']).groupby(df['Country'])
groupedcountry.mean()

Country
Australia         3.260000
Bangladesh        3.479167
Brazil            3.675000
Cambodia          4.200000
Canada            2.330357
China             3.463364
Colombia          3.291667
Dubai             3.583333
Estonia           3.500000
Fiji              3.875000
Finland           3.583333
France            4.187500
Germany           3.580357
Ghana             3.500000
Holland           3.562500
Hong Kong         3.794025
Hungary           3.611111
India             3.371951
Indonesia         4.111842
Italy             2.750000
Japan             3.916045
Malaysia          4.166138
Mexico            3.609375
Myanmar           3.946429
Nepal             3.517857
Netherlands       2.500000
New Zealand       3.000000
Nigeria           2.375000
Pakistan          2.916667
Peru              3.333333
Philippines       3.382353
Phlippines        3.500000
Poland            3.083333
Portugal          2.000000
Russia            3.450000
Sarawak           4.000000
Singapore         4.