# US - Baby Names

### Introduction:

We are going to use a subset of [US Baby Names](https://www.kaggle.com/kaggle/us-baby-names) from Kaggle.  
The file contains names from 2004 through 2014.


### Step 1. Import the necessary libraries

In [1]:
import pandas as pd
import numpy as np

### Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/06_Stats/US_Baby_Names/US_Baby_Names_right.csv). 

### Step 3. Assign it to a variable called baby_names.

In [49]:
url = "https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/06_Stats/US_Baby_Names/US_Baby_Names_right.csv"
baby_names = pd.read_csv(url)
baby_names

,Unnamed: 0,Id,Name,Year,Gender,State,Count
0,11349,11350,Emma,2004,F,AK,62
1,11350,11351,Madison,2004,F,AK,48
2,11351,11352,Hannah,2004,F,AK,46
3,11352,11353,Grace,2004,F,AK,44
4,11353,11354,Emily,2004,F,AK,41
...,...,...,...,...,...,...,...
1016390,5647421,5647422,Seth,2014,M,WY,5
1016391,5647422,5647423,Spencer,2014,M,WY,5
1016392,5647423,5647424,Tyce,2014,M,WY,5
1016393,5647424,5647425,Victor,2014,M,WY,5


### Step 5. Delete the columns 'Unnamed: 0' and 'Id'

In [27]:
# delete the leftover index column 'Unnamed: 0'
del baby_names['Unnamed: 0']

# delete the 'Id' column
del baby_names['Id']

baby_names.head()

,Name,Year,Gender,State,Count
0,Emma,2004,F,AK,62
1,Madison,2004,F,AK,48
2,Hannah,2004,F,AK,46
3,Grace,2004,F,AK,44
4,Emily,2004,F,AK,41


### Step 6. Are there more male or female names in the dataset?
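One way to approach the Step 6 question (more male or female names) is to count rows per `Gender` value. The sketch below is illustrative only: `sample` is a small hypothetical frame standing in for `baby_names`, with the same column names. Counting rows measures name records (one per name, year and state), while summing `Count` measures babies, so both readings are shown.

```python
import pandas as pd

# Hypothetical stand-in for baby_names (same column names, made-up rows)
sample = pd.DataFrame({
    "Name": ["Emma", "Madison", "Jacob", "Ethan", "Emma"],
    "Gender": ["F", "F", "M", "M", "F"],
    "State": ["AK", "AK", "AK", "AL", "AL"],
    "Count": [62, 48, 50, 40, 55],
})

# Number of rows (name records) per gender
records_per_gender = sample["Gender"].value_counts()

# Number of babies per gender, if the question is read as total births
births_per_gender = sample.groupby("Gender")["Count"].sum()

print(records_per_gender)
print(births_per_gender)
```

On the real data, the same two lines applied to `baby_names` should answer the question under either reading.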


### Step 7. Group the dataset by name and assign to names

In [50]:
# Year should not be summed, so drop it before grouping
del baby_names["Year"]

# group by name and sum the remaining columns
names = baby_names.groupby("Name").sum()

# print the size of the grouped dataset
print(names.shape)

# sort from the largest Count to the smallest and show the top 5
names.sort_values("Count", ascending=False).head()

(17632, 5)


Name,Unnamed: 0,Id,Gender,State,Count
Jacob,1665680788,1665681356,MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM...,AKAKAKAKAKAKAKAKAKAKAKALALALALALALALALALALALAR...,242874
Emma,1629481684,1629482250,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF...,AKAKAKAKAKAKAKAKAKAKAKALALALALALALALALALALALAR...,214852
Michael,1687520717,1687521295,MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMFF...,AKAKAKAKAKAKAKAKAKAKAKALALALALALALALALALALALAR...,214405
Ethan,1660807908,1660808475,MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM...,AKAKAKAKAKAKAKAKAKAKAKALALALALALALALALALALALAR...,209277
Isabella,1630131220,1630131786,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF...,AKAKAKAKAKAKAKAKAKAKAKALALALALALALALALALALALAR...,204798
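The `Gender` and `State` cells above are long repeated strings because `sum()` concatenates text columns within each group, and the summed `Unnamed: 0` and `Id` values carry no meaning. A minimal sketch of one alternative, assuming only the `Count` total is wanted: select `Count` before summing. The `sample` frame and the `name_totals` variable are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for baby_names (made-up rows)
sample = pd.DataFrame({
    "Name": ["Emma", "Jacob", "Emma", "Jacob", "Zyren"],
    "Year": [2004, 2004, 2005, 2005, 2005],
    "Gender": ["F", "M", "F", "M", "M"],
    "State": ["AK", "AK", "AL", "AL", "TX"],
    "Count": [62, 50, 55, 40, 6],
})

# Select Count first so Year, Gender and State are never summed
name_totals = sample.groupby("Name")[["Count"]].sum()

# Largest totals first
name_totals = name_totals.sort_values("Count", ascending=False)
print(name_totals)
```

With this form there is no need to delete `Year` beforehand, since it is never part of the aggregation.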


In [51]:
names

Name,Unnamed: 0,Id,Gender,State,Count
Aaban,7733801,7733803,MM,NYNY,12
Aadan,7158061,7158065,MMMM,CACACATX,23
Aadarsh,1728030,1728031,M,IL,5
Aaden,555052029,555052225,MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM...,ALALALALALALALARARARAZAZAZAZAZAZCACACACACACACA...,3426
Aadhav,709606,709607,M,CA,6
...,...,...,...,...,...
Zyra,17538998,17539005,FFFFFFF,CACACAFLTXTXTX,42
Zyrah,5487073,5487075,FF,CATX,11
Zyren,5074229,5074230,M,TX,6
Zyria,29787029,29787039,FFFFFFFFFF,GAGAGALALALATXTXTXTX,59


### Step 8. How many different names exist in the dataset?

In [53]:
len(names)

17632

### Step 9. What is the name with most occurrences?

In [57]:
names.info()

<class 'pandas.core.frame.DataFrame'>
Index: 17632 entries, Aaban to Zyriah
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Unnamed: 0  17632 non-null  int64 
 1   Id          17632 non-null  int64 
 2   Gender      17632 non-null  object
 3   State       17632 non-null  object
 4   Count       17632 non-null  int64 
dtypes: int64(3), object(2)
memory usage: 1.3+ MB
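`names.info()` describes the frame but does not say which name occurs most. A short sketch of two ways to get that, using a hypothetical `sample_names` frame shaped like `names` (index is the name, one `Count` column):

```python
import pandas as pd

# Hypothetical per-name totals, shaped like `names`
sample_names = pd.DataFrame(
    {"Count": [117, 90, 6, 6]},
    index=pd.Index(["Emma", "Jacob", "Zyren", "Aadhav"], name="Name"),
)

# Index label of the row with the largest Count
top_name = sample_names["Count"].idxmax()

# The same row together with its total
top_row = sample_names.nlargest(1, "Count")

print(top_name)
print(top_row)
```

On the real data this should agree with the sorted output in Step 7, which lists Jacob first with a `Count` of 242874.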


### Step 10. How many different names have the least occurrences?

In [59]:
# count the names whose total equals the smallest total
len(names[names.Count == names.Count.min()])


### Step 11. What is the median name occurrence?

In [None]:
names.Count.median()

### Step 12. What is the standard deviation of name occurrences?
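A minimal sketch for this step, assuming the quantity of interest is the spread of the per-name `Count` totals. The `sample_counts` values are made up for illustration. Note that pandas `std()` defaults to the sample standard deviation (`ddof=1`); pass `ddof=0` for the population version.

```python
import pandas as pd

# Hypothetical per-name totals (made-up values)
sample_counts = pd.Series([2, 4, 4, 4, 5, 5, 7, 9], name="Count")

# Sample standard deviation (pandas default, ddof=1)
sample_std = sample_counts.std()

# Population standard deviation (ddof=0)
population_std = sample_counts.std(ddof=0)

print(sample_std)
print(population_std)
```

Applied to the exercise data, the equivalent call would be `names.Count.std()`.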

### Step 13. Get a summary with the mean, min, max, std and quartiles.
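`describe()` returns all of these statistics in one call. The sketch below uses a handful of totals copied from the `names` output above (Aadarsh, Aadhav, Aaban, Aadan, Zyra, Zyria, Aaden) rather than the full dataset, so the numbers it prints are not the exercise answer.

```python
import pandas as pd

# A few Count totals taken from the displayed rows of `names`
sample_counts = pd.Series([5, 6, 12, 23, 42, 59, 3426], name="Count")

# count, mean, std, min, 25%, 50%, 75% and max in one Series
summary = sample_counts.describe()
print(summary)
```

On the full frame, `names.describe()` would produce the same statistics for every numeric column.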