In this project we analyze the **Ramen Ratings** dataset, available on Kaggle, which contains thousands of reviews of ramen brands and styles from around the world.

**Main goals:**
- Investigate which countries produce the best-rated ramen;
- Identify the brands with the most consistent quality;
- Explore patterns between style (cup, bowl, pack) and rating;
- Create visualizations that tell the story of the data.

In [82]:
import pandas as pd

df = pd.read_csv('/home/marcus-vinicius/Desktop/Python/Kaggle/ramen-ratings.csv')

Getting to know the data:

In [83]:
df.head()

|   | Review # | Brand | Variety | Style | Country | Stars | Top Ten |
|---|---|---|---|---|---|---|---|
| 0 | 2580 | New Touch | T's Restaurant Tantanmen | Cup | Japan | 3.75 |  |
| 1 | 2579 | Just Way | Noodles Spicy Hot Sesame Spicy Hot Sesame Guan... | Pack | Taiwan | 1.0 |  |
| 2 | 2578 | Nissin | Cup Noodles Chicken Vegetable | Cup | USA | 2.25 |  |
| 3 | 2577 | Wei Lih | GGE Ramen Snack Tomato Flavor | Pack | Taiwan | 2.75 |  |
| 4 | 2576 | Ching's Secret | Singapore Curry | Pack | India | 3.75 |  |


In [84]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2580 entries, 0 to 2579
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   Review #  2580 non-null   int64 
 1   Brand     2580 non-null   object
 2   Variety   2580 non-null   object
 3   Style     2578 non-null   object
 4   Country   2580 non-null   object
 5   Stars     2580 non-null   object
 6   Top Ten   41 non-null     object
dtypes: int64(1), object(6)
memory usage: 141.2+ KB


In [85]:
# The 'Stars' column holds numeric ratings but is stored as object; convert it to float.
# errors='coerce' turns any value that cannot be parsed as a number into NaN.
df['Stars'] = pd.to_numeric(df['Stars'], errors='coerce')
# Make column names uniform: lowercase, spaces replaced by underscores
df.columns = df.columns.str.lower().str.replace(' ', '_')

categorical_columns = list(df.dtypes[df.dtypes == 'object'].index)
# Make string values uniform: lowercase, spaces and hyphens replaced by underscores
for c in categorical_columns:
    df[c] = df[c].str.lower().str.replace(' ', '_').str.replace('-', '_')
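A minimal sketch of the same cleaning steps on a tiny hand-made frame. The values, including the 'Unrated' string and the hyphenated 'Sample-Brand', are illustrative assumptions rather than rows from the dataset. It also shows one way to list which raw Stars values were turned into NaN by errors='coerce', which helps explain the drop from 2580 to 2577 non-null ratings in the info() output that follows.

```python
import pandas as pd

# Illustrative raw data (not taken from the dataset); 'Unrated' stands in for
# any non-numeric rating and 'New Touch ' has a deliberate trailing space
raw = pd.DataFrame({
    'Brand': ['New Touch ', "Ching's Secret", 'Sample-Brand'],
    'Stars': ['3.75', 'Unrated', '2.75'],
})

# Keep the original strings so unparseable ratings can be inspected afterwards
raw_stars = raw['Stars'].copy()
raw['Stars'] = pd.to_numeric(raw['Stars'], errors='coerce')

# Values that were present in the raw data but could not be parsed as numbers
unparsed = raw_stars[raw['Stars'].isna() & raw_stars.notna()]
print(unparsed.tolist())  # ['Unrated']

# Column names: lowercase, spaces -> underscores (same as the notebook)
raw.columns = raw.columns.str.lower().str.replace(' ', '_')

# Text values: same normalization as the notebook plus .str.strip(),
# so trailing whitespace does not turn into a trailing underscore
text_columns = ['brand']  # listed explicitly for this toy frame
for c in text_columns:
    raw[c] = (
        raw[c].str.strip().str.lower()
        .str.replace(' ', '_')
        .str.replace('-', '_')
    )

print(raw)
```

The extra .str.strip() is a suggested variation, not part of the notebook's code; it would presumably also avoid trailing underscores such as the one in t's_restaurant_tantanmen_ below.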

In [86]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2580 entries, 0 to 2579
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   review_#  2580 non-null   int64  
 1   brand     2580 non-null   object 
 2   variety   2580 non-null   object 
 3   style     2578 non-null   object 
 4   country   2580 non-null   object 
 5   stars     2577 non-null   float64
 6   top_ten   41 non-null     object 
dtypes: float64(1), int64(1), object(5)
memory usage: 141.2+ KB


In [87]:
df.head()

|   | review_# | brand | variety | style | country | stars | top_ten |
|---|---|---|---|---|---|---|---|
| 0 | 2580 | new_touch | t's_restaurant_tantanmen_ | cup | japan | 3.75 |  |
| 1 | 2579 | just_way | noodles_spicy_hot_sesame_spicy_hot_sesame_guan... | pack | taiwan | 1.0 |  |
| 2 | 2578 | nissin | cup_noodles_chicken_vegetable | cup | usa | 2.25 |  |
| 3 | 2577 | wei_lih | gge_ramen_snack_tomato_flavor | pack | taiwan | 2.75 |  |
| 4 | 2576 | ching's_secret | singapore_curry | pack | india | 3.75 |  |


In [88]:
# Look for NaN/empty values in each column
df.isnull().sum()

review_#       0
brand          0
variety        0
style          2
country        0
stars          3
top_ten     2539
dtype: int64

In [89]:
df[df['style'].isnull()]

|   | review_# | brand | variety | style | country | stars | top_ten |
|---|---|---|---|---|---|---|---|
| 2152 | 428 | kamfen | e_menm_chicken |  | china | 3.75 |  |
| 2442 | 138 | unif | 100_furong_shrimp |  | taiwan | 3.0 |  |
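Two rows without a style and three without a rating are few enough that either labeling or dropping them is cheap. A minimal sketch of both options on a small made-up frame (the brand names mirror the rows above, but the values are illustrative); which option fits depends on the analysis that follows.

```python
import numpy as np
import pandas as pd

# Made-up frame with the same kinds of gaps found above
toy = pd.DataFrame({
    'brand': ['kamfen', 'unif', 'nissin'],
    'style': [np.nan, np.nan, 'cup'],
    'stars': [3.75, 3.0, np.nan],
})

# Share of missing values per column, in percent
print(toy.isnull().mean().mul(100).round(1))

# Option A: keep every row and mark the unknown style explicitly
filled = toy.assign(style=toy['style'].fillna('unknown'))

# Option B: drop rows without a rating before any rating-based analysis
rated = toy.dropna(subset=['stars'])

print(filled)
print(rated)
```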


In [90]:
# Inconsistent/incomplete values: show only the rows where top_ten is filled in
df[df['top_ten'].notna()]

|   | review_# | brand | variety | style | country | stars | top_ten |
|---|---|---|---|---|---|---|---|
| 616 | 1964 | mama | instant_noodles_coconut_milk_flavour | pack | myanmar | 5.0 | 2016_#10 |
| 633 | 1947 | prima_taste | singapore_laksa_wholegrain_la_mian | pack | singapore | 5.0 | 2016_#1 |
| 655 | 1925 | prima | juzz's_mee_creamy_chicken_flavour | pack | singapore | 5.0 | 2016_#8 |
| 673 | 1907 | prima_taste | singapore_curry_wholegrain_la_mian | pack | singapore | 5.0 | 2016_#5 |
| 752 | 1828 | tseng_noodles | scallion_with_sichuan_pepper__flavor | pack | taiwan | 5.0 | 2016_#9 |
| 891 | 1689 | wugudaochang | tomato_beef_brisket_flavor_purple_potato_noodle | pack | china | 5.0 | 2016_#7 |
| 942 | 1638 | a_sha_dry_noodle | veggie_noodle_tomato_noodle_with_vine_ripened_... | pack | taiwan | 5.0 | 2015_#10 |
| 963 | 1617 | mykuali | penang_hokkien_prawn_noodle_(new_improved_taste) | pack | malaysia | 5.0 | 2015_#7 |
| 995 | 1585 | carjen | nyonya_curry_laksa | pack | malaysia | 5.0 | 2015_#4 |
| 1059 | 1521 | maruchan | gotsumori_sauce_yakisoba | tray | japan | 5.0 | 2015_#9 |
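Each top_ten value packs two facts into one string, a year and a rank (e.g. 2016_#10). A small sketch of splitting them with Series.str.extract and named regex groups. The pattern assumes every filled value follows the `<year>_#<rank>` form shown above; anything that does not match comes back as NaN, which makes irregular entries easy to spot.

```python
import pandas as pd

# A few values in the "<year>_#<rank>" format seen above, plus one missing entry
top_ten = pd.Series(['2016_#10', '2015_#7', None, '2015_#4'])

# Named groups become the column names of the resulting DataFrame;
# missing or non-matching values yield NaN in both columns
parts = top_ten.str.extract(r'^(?P<year>\d{4})_#(?P<rank>\d+)$')

# float rather than int so the NaN rows can stay in the same column
parts = parts.astype(float)
print(parts)
```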


In [91]:
df['style'].value_counts()

style
pack    1531
bowl     481
cup      450
tray     108
box        6
can        1
bar        1
Name: count, dtype: int64
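The third goal compares ratings across styles, but box, can and bar have only 6, 1 and 1 reviews, so their averages would rest on almost no data. One option, sketched below on illustrative numbers, is to fold styles under a minimum count into an 'other' group before aggregating. MIN_COUNT is a hypothetical threshold, not a value taken from the notebook.

```python
import pandas as pd

# Illustrative ratings, not taken from the dataset
toy = pd.DataFrame({
    'style': ['pack', 'pack', 'cup', 'cup', 'bowl', 'bowl', 'can'],
    'stars': [4.0, 3.0, 2.5, 3.5, 3.5, 4.5, 5.0],
})

MIN_COUNT = 2  # hypothetical threshold; pick one that suits the real counts

# Styles reviewed fewer than MIN_COUNT times are relabeled as 'other'
counts = toy['style'].value_counts()
rare = counts[counts < MIN_COUNT].index
toy['style_grouped'] = toy['style'].where(~toy['style'].isin(rare), 'other')

# Count, mean and median rating per (grouped) style, best mean first
summary = (
    toy.groupby('style_grouped')['stars']
    .agg(['count', 'mean', 'median'])
    .sort_values('mean', ascending=False)
)
print(summary)
```

Keeping the count column next to the mean makes it obvious when a group still rests on very few reviews, as 'other' does in this toy example.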

In [92]:
df['brand'].value_counts()

brand
nissin         381
nongshim        98
mama            98
maruchan        76
paldo           66
              ... 
katoz            1
osaka_ramen      1
pamana           1
sokensha         1
westbrae         1
Name: count, Length: 351, dtype: int64
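The brand counts have a long tail (351 brands, and the end of the list shows brands with a single review), which matters for the second goal: one 5-star review says nothing about consistency. A minimal sketch, assuming consistency is measured as a low standard deviation of stars among brands with at least MIN_REVIEWS reviews. Both the metric and the threshold are assumptions, and the brand names and ratings are made up.

```python
import pandas as pd

# Hypothetical brands and ratings, for illustration only
toy = pd.DataFrame({
    'brand': ['brand_a'] * 4 + ['brand_b'] * 3 + ['brand_c'],
    'stars': [4.0, 3.5, 4.0, 3.5, 5.0, 2.0, 3.5, 5.0],
})

MIN_REVIEWS = 3  # hypothetical cut-off for brands with too few reviews

# Sample standard deviation (ddof=1) is NaN for single-review brands
stats = toy.groupby('brand')['stars'].agg(['count', 'mean', 'std'])

# Lowest spread first; ties broken by the higher mean
consistent = (
    stats[stats['count'] >= MIN_REVIEWS]
    .sort_values(['std', 'mean'], ascending=[True, False])
)
print(consistent)
```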

In [93]:
df['country'].value_counts()

country
japan            352
usa              323
south_korea      309
taiwan           224
thailand         191
china            169
malaysia         156
hong_kong        137
indonesia        126
singapore        109
vietnam          108
uk                69
philippines       47
canada            41
india             31
germany           27
mexico            25
australia         22
netherlands       15
myanmar           14
nepal             14
pakistan           9
hungary            9
bangladesh         7
colombia           6
brazil             5
cambodia           5
fiji               4
holland            4
poland             4
finland            3
sarawak            3
sweden             3
dubai              3
ghana              2
estonia            2
nigeria            1
united_states      1
Name: count, dtype: int64
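The list above appears to name the same country twice in at least two cases: usa / united_states and holland / netherlands. Before grouping by country for the first goal, these could be merged with a simple mapping. A sketch, assuming each pair really refers to the same country. sarawak (a Malaysian state) and dubai (an emirate of the UAE) are regions rather than countries; whether to fold them in is a separate judgment call, so the hypothetical COUNTRY_ALIASES dictionary leaves them out.

```python
import pandas as pd

# Hypothetical alias table: each key is assumed to be the same country as its value
COUNTRY_ALIASES = {
    'united_states': 'usa',
    'holland': 'netherlands',
}

# A few values in the normalized form used by the notebook
countries = pd.Series(['usa', 'united_states', 'holland', 'netherlands', 'japan'])

# Series.replace with a dict swaps exact matches and leaves other values alone
harmonized = countries.replace(COUNTRY_ALIASES)
print(harmonized.value_counts())
```

On the notebook's frame the same call would be `df['country'] = df['country'].replace(COUNTRY_ALIASES)`.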