# US - Baby Names

### Introduction:

We are going to use a subset of [US Baby Names](https://www.kaggle.com/kaggle/us-baby-names) from Kaggle.  
The file contains names from 2004 through 2014.


### Step 1. Import the necessary libraries

In [1]:
# Step 1: Import the necessary libraries
import pandas as pd


### Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/thieu1995/csv-files/main/data/pandas/US_Baby_Names_right.csv).

### Step 3. Assign it to a variable called baby_names.

In [4]:
# Step 2 & Step 3: Import the dataset from the URL and assign it to baby_names
url = 'https://raw.githubusercontent.com/thieu1995/csv-files/main/data/pandas/US_Baby_Names_right.csv'
baby_names = pd.read_csv(url, sep=',')



### Step 4. See the first 10 entries

In [3]:
# Step 4: See the first 10 entries
baby_names.head(10)


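|  | Unnamed: 0 | Id | Name | Year | Gender | State | Count |
|---|---|---|---|---|---|---|---|
| 0 | 11349 | 11350 | Emma | 2004 | F | AK | 62 |
| 1 | 11350 | 11351 | Madison | 2004 | F | AK | 48 |
| 2 | 11351 | 11352 | Hannah | 2004 | F | AK | 46 |
| 3 | 11352 | 11353 | Grace | 2004 | F | AK | 44 |
| 4 | 11353 | 11354 | Emily | 2004 | F | AK | 41 |
| 5 | 11354 | 11355 | Abigail | 2004 | F | AK | 37 |
| 6 | 11355 | 11356 | Olivia | 2004 | F | AK | 33 |
| 7 | 11356 | 11357 | Isabella | 2004 | F | AK | 30 |
| 8 | 11357 | 11358 | Alyssa | 2004 | F | AK | 29 |
| 9 | 11358 | 11359 | Sophia | 2004 | F | AK | 28 |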


### Step 5. Delete the columns 'Unnamed: 0' and 'Id'

In [5]:
# Step 5: Delete the columns 'Unnamed: 0' and 'Id'
baby_names.drop(columns=['Unnamed: 0', 'Id'], inplace=True, errors='ignore')
baby_names.head(10)


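|  | Name | Year | Gender | State | Count |
|---|---|---|---|---|---|
| 0 | Emma | 2004 | F | AK | 62 |
| 1 | Madison | 2004 | F | AK | 48 |
| 2 | Hannah | 2004 | F | AK | 46 |
| 3 | Grace | 2004 | F | AK | 44 |
| 4 | Emily | 2004 | F | AK | 41 |
| 5 | Abigail | 2004 | F | AK | 37 |
| 6 | Olivia | 2004 | F | AK | 33 |
| 7 | Isabella | 2004 | F | AK | 30 |
| 8 | Alyssa | 2004 | F | AK | 29 |
| 9 | Sophia | 2004 | F | AK | 28 |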
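
A column named 'Unnamed: 0' usually appears when a CSV was saved with its index and then read back without declaring it. The sketch below shows one way to avoid the column at load time with `index_col=0`. It uses a small hypothetical CSV string (`csv_text`) rather than the real file, and it assumes the first column of the real file is such a saved index.

```python
import io

import pandas as pd

# Hypothetical two-row CSV mimicking the layout of the real file:
# the first column has no header because it is a saved index.
csv_text = """,Id,Name,Year,Gender,State,Count
11349,11350,Emma,2004,F,AK,62
11350,11351,Madison,2004,F,AK,48
"""

# Default read: the header-less first column becomes 'Unnamed: 0'.
raw = pd.read_csv(io.StringIO(csv_text))
print(raw.columns.tolist())

# index_col=0 uses that column as the index, so only 'Id' is left to drop.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
trimmed = indexed.drop(columns=['Id'])
print(trimmed.columns.tolist())
```

Whether to keep the original row labels as the index or discard them (as Step 5 does) depends on whether they are needed later.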


### Step 6. Are there more male or female names in the dataset?

In [6]:
# Step 6: Are there more male or female names in the dataset?
baby_names['Gender'].value_counts()


| Gender | count |
|---|---|
| F | 558846 |
| M | 457549 |
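
Note that `value_counts()` counts rows (one per name, year, state and gender combination), not babies or distinct names. The sketch below contrasts the three readings of the question on a small hypothetical DataFrame (`toy`); the figures are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical data with the same columns as baby_names after Step 5.
toy = pd.DataFrame({
    'Name': ['Emma', 'Emma', 'Jacob', 'Riley', 'Riley'],
    'Gender': ['F', 'F', 'M', 'F', 'M'],
    'State': ['AK', 'AL', 'AK', 'AK', 'AK'],
    'Count': [62, 40, 150, 10, 12],
})

# 1. Rows per gender: what Step 6 computes.
rows_per_gender = toy['Gender'].value_counts()

# 2. Babies per gender: sum the Count column within each gender.
babies_per_gender = toy.groupby('Gender')['Count'].sum()

# 3. Distinct names per gender.
names_per_gender = toy.groupby('Gender')['Name'].nunique()

print(rows_per_gender, babies_per_gender, names_per_gender, sep='\n\n')
```

In this toy data there are more female rows, but more male babies, so the three measures need not agree.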


### Step 7. Group the dataset by name and assign to names

In [7]:
# Step 7: Group the dataset by name and assign to names
names = baby_names.groupby('Name').agg({'Count': 'sum'})
names.head()


| Name | Count |
|---|---|
| Aaban | 12 |
| Aadan | 23 |
| Aadarsh | 5 |
| Aaden | 3426 |
| Aadhav | 6 |


### Step 8. How many different names exist in the dataset?

In [8]:
# Step 8: How many different names exist in the dataset?
num_unique_names = names.shape[0]
print("Number of different names:", num_unique_names)


Number of different names: 17632


### Step 9. What is the name with most occurrences?

In [9]:
# Step 9: What is the name with most occurrences?
most_common_name = names['Count'].idxmax()
most_common_count = names['Count'].max()
print(f"The name with most occurrences is {most_common_name} with {most_common_count} occurrences.")


The name with most occurrences is Jacob with 242874 occurrences.
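
`idxmax()` returns only the first label that reaches the maximum, so a tie would go unnoticed. The sketch below, on a hypothetical Series (`toy_counts`) with a deliberate tie, shows one way to list the top names and to check for ties.

```python
import pandas as pd

# Hypothetical aggregated counts, indexed by name, with a tie at the top.
toy_counts = pd.Series(
    [300, 120, 300, 5],
    index=pd.Index(['Jacob', 'Emma', 'Ethan', 'Aaban'], name='Name'),
    name='Count',
)

# idxmax() reports a single label even when several share the maximum.
first_max = toy_counts.idxmax()

# A boolean mask returns every name that reaches the maximum.
tied = toy_counts[toy_counts == toy_counts.max()].index.tolist()

# nlargest(n) gives the n highest counts, useful for a quick top list.
top_two = toy_counts.nlargest(2)

print(first_max, tied, top_two, sep='\n')
```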


### Step 10. How many different names have the least occurrences?

In [10]:
# Step 10: How many different names have the least occurrences?
min_count = names['Count'].min()
least_names = names[names['Count'] == min_count]
num_least_names = len(least_names)
print(f"Least occurrences count: {min_count}, number of names with that count: {num_least_names}")



Least occurrences count: 5, number of names with that count: 2578
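
As a cross-check, the same number can be obtained in more than one way. This is a minimal sketch on a hypothetical Series (`toy_counts`); the values are invented for illustration.

```python
import pandas as pd

# Hypothetical aggregated counts, indexed by name.
toy_counts = pd.Series(
    [5, 5, 7, 5, 120],
    index=pd.Index(['Aadarsh', 'Abby', 'Cleo', 'Zed', 'Emma'], name='Name'),
    name='Count',
)

min_count = toy_counts.min()

# Option 1: sum a boolean mask (True counts as 1).
n_by_mask = int((toy_counts == min_count).sum())

# Option 2: value_counts() maps each distinct count to how many names have it.
n_by_value_counts = int(toy_counts.value_counts().loc[min_count])

print(min_count, n_by_mask, n_by_value_counts)
```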


### Step 11. What is the median name occurrence?

In [11]:
# Step 11: What is the median name occurrence?
median_occurrence = names['Count'].median()
print("Median name occurrence:", median_occurrence)


Median name occurrence: 49.0


### Step 12. What is the standard deviation of name occurrences?

In [12]:
# Step 12: What is the standard deviation of name occurrences?
std_occurrence = names['Count'].std()
print("Standard Deviation of name occurrences:", std_occurrence)


Standard Deviation of name occurrences: 11006.069467891111


### Step 13. Get a summary with the mean, min, max, std and quartiles.

In [13]:
# Step 13: Get a summary with the mean, min, max, std and quartiles.
names['Count'].describe()


|  | Count |
|---|---|
| count | 17632.0 |
| mean | 2008.932169 |
| std | 11006.069468 |
| min | 5.0 |
| 25% | 11.0 |
| 50% | 49.0 |
| 75% | 337.0 |
| max | 242874.0 |
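
The summary repeats the values computed one at a time in Steps 10 to 12: the `50%` row is the median, and `std` is the sample standard deviation (pandas defaults to `ddof=1`). The sketch below checks this on a small hypothetical Series (`toy_counts`), not on the real data.

```python
import pandas as pd

# Hypothetical name totals.
toy_counts = pd.Series([5, 5, 11, 49, 337], name='Count')

summary = toy_counts.describe()

# Individual statistics, computed the same way as in the earlier steps.
median_value = toy_counts.median()
std_value = toy_counts.std()   # sample standard deviation (ddof=1)
min_value = toy_counts.min()

print(summary)
print(median_value, std_value, min_value)
```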
