# Cereals

The dataset is available here: https://www.kaggle.com/datasets/crawford/80-cereals

## Questions to Ask

1. Which cereals have the fewest calories?
2. Which cereals have the most protein?
3. Which cereals have the most protein and, at the same time, the least sugar?
4. Which have the highest rating?
5. What are the top 10 cereals?
6. Which cereals have the most protein and, at the same time, the least sugar per cup?
7. Which cereals are made by Kellogg's? Which cereals are not made by Kellogg's?
8. Which cereals can I pour a hot drink over?
9. Which contain no vitamins?
10. Which are the biggest calorie bombs?

## Data Cleanup

In [1]:
import pandas as pd

df = pd.read_csv('data/cereals.csv', index_col=0)

In [2]:
# convert weight from ounces to grams (1 oz = 28.3495231 g)
df['weight'] = df['weight'] * 28.3495231

In [3]:
# normalize the per-serving values to 1 cup
per_serving_columns = ['weight', 'vitamins', 'potass', 'sugars', 'carbo',
                       'fiber', 'sodium', 'fat', 'protein', 'calories']
for column in per_serving_columns:
    df[column] = (1 / df['cups']) * df[column]

# every row now describes exactly one cup
df['cups'] = 1
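
The table below shows a potassium value of -1.333333 for Almond Delight, which suggests that the raw data marks missing values with -1 and that the normalization scaled that marker along with the real values. The following is a minimal sketch of one way to avoid this. It assumes that -1 really is the missing-value marker (the notebook does not state this) and uses a small made-up frame named `raw` rather than the real file.

```python
import numpy as np
import pandas as pd

# Made-up rows that mimic the dataset's layout; the values are illustrative only.
raw = pd.DataFrame({
    'name': ['Cereal A', 'Cereal B'],
    'potass': [280, -1],   # -1 is ASSUMED to mean "missing"
    'sugars': [6, 8],
    'cups': [0.5, 0.75],
})

nutrient_columns = ['potass', 'sugars']

# Turn the assumed -1 marker into NaN so it is not scaled like a real value.
clean = raw.copy()
clean[nutrient_columns] = clean[nutrient_columns].replace(-1, np.nan)

# Normalize every nutrient column to one cup in a single vectorized step.
clean[nutrient_columns] = clean[nutrient_columns].div(clean['cups'], axis=0)
clean['cups'] = 1

print(clean)
```

With NaN in place of the marker, methods such as `min()` and `max()` skip the missing entries by default, so a later filter like `df['sugars'] >= 0` would no longer be needed.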

In [4]:
df

Unnamed: 0,name,mfr,type,calories,protein,fat,sodium,fiber,carbo,sugars,potass,vitamins,shelf,weight,cups,rating
0,100% Bran,N,C,212.121212,12.121212,3.030303,393.939394,30.303030,15.151515,18.181818,848.484848,75.757576,3,85.907646,1,68.402973
1,100% Natural Bran,Q,C,120.000000,3.000000,5.000000,15.000000,2.000000,8.000000,8.000000,135.000000,0.000000,3,28.349523,1,33.983679
2,All-Bran,K,C,212.121212,12.121212,3.030303,787.878788,27.272727,21.212121,15.151515,969.696970,75.757576,3,85.907646,1,59.425505
3,All-Bran with Extra Fiber,K,C,100.000000,8.000000,0.000000,280.000000,28.000000,16.000000,0.000000,660.000000,50.000000,3,56.699046,1,93.704912
4,Almond Delight,R,C,146.666667,2.666667,2.666667,266.666667,1.333333,18.666667,10.666667,-1.333333,33.333333,3,37.799364,1,34.384843
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
72,Triples,G,C,146.666667,2.666667,1.333333,333.333333,0.000000,28.000000,4.000000,80.000000,33.333333,3,37.799364,1,39.106174
73,Trix,G,C,110.000000,1.000000,1.000000,140.000000,0.000000,13.000000,12.000000,25.000000,25.000000,2,28.349523,1,27.753301
74,Wheat Chex,R,C,149.253731,4.477612,1.492537,343.283582,4.477612,25.373134,4.477612,171.641791,37.313433,1,42.312721,1,49.787445
75,Wheaties,G,C,100.000000,3.000000,1.000000,200.000000,3.000000,17.000000,3.000000,110.000000,25.000000,1,28.349523,1,51.592193


## Solutions

### 1. Which cereals have the fewest calories?

In [5]:
filter_min_calories = df['calories'] == df['calories'].min()
df.loc[filter_min_calories, ['name', 'calories']]

Unnamed: 0,name,calories
54,Puffed Rice,50.0
55,Puffed Wheat,50.0
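
The boolean mask above keeps every cereal tied for the minimum. A possible alternative, sketched here on a made-up frame named `toy`, is `DataFrame.nsmallest` with `keep='all'`, which also returns all tied rows.

```python
import pandas as pd

# Made-up data with a tie for the lowest calorie count.
toy = pd.DataFrame({
    'name': ['Cereal A', 'Cereal B', 'Cereal C'],
    'calories': [50.0, 110.0, 50.0],
})

# n=1 asks for the single smallest value; keep='all' retains every row tied for it.
lowest = toy.nsmallest(n=1, columns='calories', keep='all')[['name', 'calories']]
print(lowest)
```

Without `keep='all'`, `nsmallest(n=1, ...)` would return only one of the tied rows, which is why the mask-based approach is a reasonable default for this question.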


### 2. Which cereals have the most protein?

In [6]:
filter_max_protein = df['protein'] == df['protein'].max()
df.loc[filter_max_protein, ['name', 'protein']]

Unnamed: 0,name,protein
0,100% Bran,12.121212
2,All-Bran,12.121212


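### 3. Which cereals have the most protein and, at the same time, the least sugar?

In [7]:
# keep only rows with a non-negative sugar value (negative values are presumably placeholders for missing data)
filter_sugar = (df['sugars'] >= 0)
df_filtered = df.loc[filter_sugar, :]

# no cereal has both the maximum protein and the minimum sugar,
# so `|` lists the cereals that meet at least one of the two conditions
filter_protein_and_sugar = (df_filtered['protein'] == df_filtered['protein'].max()) | (df_filtered['sugars'] == df_filtered['sugars'].min())
df_filtered.loc[filter_protein_and_sugar, ['name', 'protein', 'sugars']]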

Unnamed: 0,name,protein,sugars
0,100% Bran,12.121212,18.181818
2,All-Bran,12.121212,15.151515
3,All-Bran with Extra Fiber,8.0,0.0
20,Cream of Wheat (Quick),3.0,0.0
54,Puffed Rice,1.0,0.0
55,Puffed Wheat,2.0,0.0
63,Shredded Wheat,2.0,0.0
64,Shredded Wheat 'n'Bran,4.477612,0.0
65,Shredded Wheat spoon size,4.477612,0.0
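
Because the two conditions are combined with OR, the result is a list of candidates, not a single answer. One possible way to narrow it down, sketched on a made-up frame named `toy`, is to apply the conditions one after the other: first keep the cereals with the least sugar, then pick the one with the most protein among them. This two-step order is only one interpretation of the question, not the notebook's method.

```python
import pandas as pd

# Made-up data; the values are illustrative only.
toy = pd.DataFrame({
    'name': ['Cereal A', 'Cereal B', 'Cereal C', 'Cereal D'],
    'protein': [12.0, 8.0, 3.0, 4.0],
    'sugars': [18.0, 0.0, 0.0, 5.0],
})

# Step 1: keep only the cereals with the lowest sugar content.
least_sugar = toy[toy['sugars'] == toy['sugars'].min()]

# Step 2: among those, keep the ones with the most protein.
best = least_sugar[least_sugar['protein'] == least_sugar['protein'].max()]
print(best)
```

Judging from the rows shown above, the same two steps applied to the notebook's data would single out All-Bran with Extra Fiber (8.0 g protein, 0.0 g sugar).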


### 4. Which have the highest rating?

In [8]:
filter_best_rating = df['rating'] == df['rating'].max()
df.loc[filter_best_rating, ['name', 'mfr', 'rating']]

Unnamed: 0,name,mfr,rating
3,All-Bran with Extra Fiber,K,93.704912


### 5. What are the top 10 cereals?

In [9]:
# sort by rating in descending order and take the first 10 rows
df.sort_values(by='rating', ascending=False).head(10).loc[:, ['name', 'rating']]

Unnamed: 0,name,rating
3,All-Bran with Extra Fiber,93.704912
64,Shredded Wheat 'n'Bran,74.472949
65,Shredded Wheat spoon size,72.801787
0,100% Bran,68.402973
63,Shredded Wheat,68.235885
20,Cream of Wheat (Quick),64.533816
55,Puffed Wheat,63.005645
54,Puffed Rice,60.756112
50,Nutri-grain Wheat,59.642837
2,All-Bran,59.425505


In [10]:
# the same result using the nlargest method
df.nlargest(n=10, columns=['rating']).loc[:, ['name', 'rating']]

Unnamed: 0,name,rating
3,All-Bran with Extra Fiber,93.704912
64,Shredded Wheat 'n'Bran,74.472949
65,Shredded Wheat spoon size,72.801787
0,100% Bran,68.402973
63,Shredded Wheat,68.235885
20,Cream of Wheat (Quick),64.533816
55,Puffed Wheat,63.005645
54,Puffed Rice,60.756112
50,Nutri-grain Wheat,59.642837
2,All-Bran,59.425505


### 6. Which cereals have the most protein and, at the same time, the least sugar per cup?

In [11]:
# the data was already normalized to 1 cup in Data Cleanup (cups == 1),
# so this division leaves the sugar values unchanged
df['sugars'] / df['cups']
# filter_sugar = (df['sugars'] >= 0)
# df_filtered = df.loc[ filter_sugar, : ]

# filter_protein_and_sugar = (df_filtered['protein'] == df_filtered['protein'].max()) | (df_filtered['sugars'] == df_filtered['sugars'].min())
# df_filtered.loc[ filter_protein_and_sugar, : ]



0     18.181818
1      8.000000
2     15.151515
3      0.000000
4     10.666667
        ...    
72     4.000000
73    12.000000
74     4.477612
75     3.000000
76    10.666667
Length: 77, dtype: float64
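
The cell above only computes sugar per cup and stops there. The following is a minimal sketch of how the per-cup comparison could be finished, starting from per-serving values in a made-up frame named `raw`. The column names `protein_per_cup` and `sugars_per_cup` and the sort order (most protein first, ties broken by least sugar) are assumptions made for illustration.

```python
import pandas as pd

# Made-up per-serving values; `cups` is the serving size in cups.
raw = pd.DataFrame({
    'name': ['Cereal A', 'Cereal B', 'Cereal C'],
    'protein': [4, 3, 2],
    'sugars': [6, 3, 3],
    'cups': [0.5, 1.0, 0.5],
})

# Convert per-serving amounts to per-cup amounts.
per_cup = raw[['name']].copy()
per_cup['protein_per_cup'] = raw['protein'] / raw['cups']
per_cup['sugars_per_cup'] = raw['sugars'] / raw['cups']

# Rank: most protein per cup first, least sugar per cup as the tie-breaker.
ranked = per_cup.sort_values(
    by=['protein_per_cup', 'sugars_per_cup'],
    ascending=[False, True],
)
print(ranked)
```

In the notebook itself the per-cup values already live in `df['protein']` and `df['sugars']` after the cleanup step, so the same `sort_values` call could be applied to those columns directly.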

### 7. Which cereals are made by Kellogg's?

In [12]:
filter_kelloggs = df['mfr'] == 'K'
df.loc[filter_kelloggs, ['name', 'mfr']]

Unnamed: 0,name,mfr
2,All-Bran,K
3,All-Bran with Extra Fiber,K
6,Apple Jacks,K
16,Corn Flakes,K
17,Corn Pops,K
19,Cracklin' Oat Bran,K
21,Crispix,K
24,Froot Loops,K
25,Frosted Flakes,K
26,Frosted Mini-Wheats,K


Which cereals are not made by Kellogg's?

In [13]:
# negate the Kellogg's mask with ~ instead of listing every other manufacturer
df.loc[~filter_kelloggs, ['name', 'mfr']]

Unnamed: 0,name,mfr
0,100% Bran,N
1,100% Natural Bran,Q
4,Almond Delight,R
5,Apple Cinnamon Cheerios,G
7,Basic 4,G
8,Bran Chex,R
9,Bran Flakes,P
10,Cap'n'Crunch,Q
11,Cheerios,G
12,Cinnamon Toast Crunch,G


### 8. Which cereals can I pour a hot drink over?

In [14]:
# type 'H' marks the cereals that are served hot
filter_hot = df['type'] == 'H'
df.loc[filter_hot, ['name', 'type']]

Unnamed: 0,name,type
20,Cream of Wheat (Quick),H
43,Maypo,H
57,Quaker Oatmeal,H


### 9. Which contain no vitamins?

In [15]:
filter_no_vitamins = df['vitamins'] == 0
df.loc[filter_no_vitamins, ['name', 'vitamins']]

Unnamed: 0,name,vitamins
1,100% Natural Bran,0.0
20,Cream of Wheat (Quick),0.0
54,Puffed Rice,0.0
55,Puffed Wheat,0.0
57,Quaker Oatmeal,0.0
63,Shredded Wheat,0.0
64,Shredded Wheat 'n'Bran,0.0
65,Shredded Wheat spoon size,0.0


### 10. Which are the biggest calorie bombs?

In [16]:
filter_max_calories = df['calories'] == df['calories'].max()
df.loc[filter_max_calories, ['name', 'calories', 'protein']]

Unnamed: 0,name,calories,protein
33,Grape-Nuts,440.0,12.0
