In this challenge, you will analyze demographic data using Pandas. The dataset was extracted from the 1994 Census database.


You must use Pandas to answer the following questions:

1. How many people of each race are represented in this dataset? This should be a Pandas series with race names as the index labels. (race column)
2. What is the average age of men?
3. What is the percentage of people who have a Bachelor's degree?
4. What percentage of people with advanced education (Bachelors, Masters, or Doctorate) make more than 50K?
5. What percentage of people without advanced education make more than 50K?
6. What is the minimum number of hours a person works per week?
7. What percentage of the people who work the minimum number of hours per week have a salary of more than 50K?
8. What country has the highest percentage of people that earn >50K and what is that percentage?
9. Identify the most popular occupation for those who earn >50K in India.

In [1]:
import numpy as np
import pandas as pd

# Load the census extract (relative path, so the CSV must be in the working directory).
df = pd.read_csv('adult.data.csv')
df.head()

Unnamed: 0,age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,salary
0,39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,Male,2174,0,40,United-States,<=50K
1,50,Self-emp-not-inc,83311,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,13,United-States,<=50K
2,38,Private,215646,HS-grad,9,Divorced,Handlers-cleaners,Not-in-family,White,Male,0,0,40,United-States,<=50K
3,53,Private,234721,11th,7,Married-civ-spouse,Handlers-cleaners,Husband,Black,Male,0,0,40,United-States,<=50K
4,28,Private,338409,Bachelors,13,Married-civ-spouse,Prof-specialty,Wife,Black,Female,0,0,40,Cuba,<=50K


In [3]:
# 1. How many people of each race are represented in this dataset?
# This should be a Pandas series with race names as the index labels. (race column)

race_counts = df['race'].value_counts()
race_counts

White                 27816
Black                  3124
Asian-Pac-Islander     1039
Amer-Indian-Eskimo      311
Other                   271
Name: race, dtype: int64
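
As a small illustration of why value_counts() fits this question, the sketch below runs it on a made-up five-value Series (the `toy` data is hypothetical, not the census file). The result is a Series whose index holds the category labels, so a single count can be looked up by name.

```python
import pandas as pd

# Hypothetical toy data standing in for df['race'].
toy = pd.Series(["White", "Black", "White", "Other", "White"], name="race")

# value_counts() returns a Series sorted by count, indexed by the labels.
counts = toy.value_counts()

print(type(counts).__name__)  # Series
print(counts["White"])        # 3
```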

In [6]:
# 2. What is the average age of men?
avg_age_men = df[df['sex'] == 'Male']['age'].mean()
avg_age_men

39.43354749885268
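
The filter-then-select pattern above can also be written with a single .loc call, and a groupby gives the mean for every group at once. This is only a sketch on a hypothetical four-row frame (`toy`, `avg_male`, and `by_sex` are illustrative names, not part of the notebook).

```python
import pandas as pd

# Hypothetical toy data with the same column names as the census frame.
toy = pd.DataFrame({
    "sex": ["Male", "Female", "Male", "Female"],
    "age": [30, 40, 50, 20],
})

# Select rows and the column in one step with .loc.
avg_male = toy.loc[toy["sex"] == "Male", "age"].mean()

# Mean age for every value of 'sex' in one call.
by_sex = toy.groupby("sex")["age"].mean()

print(avg_male)  # 40.0
print(by_sex)
```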

In [11]:
# 3. What is the percentage of people who have a Bachelor's degree?
bachelor_percentage = (df[df['education']=='Bachelors'].shape[0]/df.shape[0])*100
bachelor_percentage

16.44605509658794
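
The same kind of share can be computed as the mean of a boolean mask, because True counts as 1 and False as 0. The sketch below compares both approaches on a hypothetical five-row frame (`toy` is made up for illustration).

```python
import pandas as pd

# Hypothetical toy data standing in for the census frame.
toy = pd.DataFrame({
    "education": ["Bachelors", "HS-grad", "Masters", "Bachelors", "11th"],
})

# Row-count approach, as in the cell above.
by_count = (toy[toy["education"] == "Bachelors"].shape[0] / toy.shape[0]) * 100

# Boolean-mask approach: the mean of True/False is the fraction of True values.
by_mean = (toy["education"] == "Bachelors").mean() * 100

print(by_count, by_mean)
```

Both expressions describe 2 matching rows out of 5; the mask version avoids building an intermediate filtered frame.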

In [15]:
# 4. What percentage of people with advanced education (Bachelors, Masters, or Doctorate) make more than 50K?
advanced_education = df[df['education'].isin(["Bachelors", "Masters", "Doctorate"])]
percentage_advanced_education_high_income = (advanced_education[advanced_education['salary'] == '>50K'].shape[0] / advanced_education.shape[0]) * 100
percentage_advanced_education_high_income

46.535843011613935

In [17]:
# 5. What percentage of people without advanced education make more than 50K?
no_advanced_education = df[~df['education'].isin(["Bachelors", "Masters", "Doctorate"])]
percentage_no_advanced_education_high_income = (no_advanced_education[no_advanced_education['salary'] == '>50K'].shape[0] / no_advanced_education.shape[0]) * 100
percentage_no_advanced_education_high_income

17.3713601914639
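
Questions 4, 5, and 7 all repeat the same "share of >50K earners in a subset" calculation. One possible way to factor that out is a small helper; `pct_high_income` is a hypothetical name, and the six-row `toy` frame is made up for illustration.

```python
import pandas as pd


def pct_high_income(frame: pd.DataFrame) -> float:
    """Percentage of rows in `frame` whose salary is '>50K' (hypothetical helper)."""
    return (frame["salary"] == ">50K").mean() * 100


# Hypothetical toy data with the same column names as the census frame.
toy = pd.DataFrame({
    "education": ["Bachelors", "Masters", "HS-grad", "11th", "Doctorate", "HS-grad"],
    "salary": [">50K", "<=50K", "<=50K", "<=50K", ">50K", ">50K"],
})

# One mask, reused for both groups via negation.
advanced = toy["education"].isin(["Bachelors", "Masters", "Doctorate"])

pct_advanced = pct_high_income(toy[advanced])       # 2 of 3 rows
pct_not_advanced = pct_high_income(toy[~advanced])  # 1 of 3 rows
print(pct_advanced, pct_not_advanced)
```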

In [19]:
# 6. What is the minimum number of hours a person works per week?
min_hours = df['hours-per-week'].min()
min_hours


1

In [20]:
# 7. What percentage of the people who work the minimum number of hours per week have a salary of more than 50K?

min_hours_people = df[df['hours-per-week'] == min_hours]
percentage_min_hours_high_income = (min_hours_people[min_hours_people['salary'] == '>50K'].shape[0] / min_hours_people.shape[0]) * 100
percentage_min_hours_high_income

10.0

In [30]:
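# 8. What country has the highest percentage of people that earn >50K and what is that percentage?
# Share of each salary bracket per country; brackets missing for a country are filled with 0.
country_salary_percentage = df.groupby('native-country')['salary'].value_counts(normalize=True).unstack().fillna(0)
# Compute the >50K percentage once and reuse it for both the label and the value.
high_income_percentage = country_salary_percentage['>50K'] * 100
highest_earning_country = high_income_percentage.idxmax()
highest_percentage = high_income_percentage.max()
highest_earning_country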

'Iran'
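
A per-country share can also be obtained by grouping a boolean mask and taking its mean, which skips the unstack/fillna step. This is a sketch under assumptions: the countries "A", "B", and "C" and the `toy` frame are made up, and `share`, `top_country`, and `top_pct` are illustrative names.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the census frame.
toy = pd.DataFrame({
    "native-country": ["A", "A", "B", "B", "B", "C"],
    "salary": [">50K", "<=50K", ">50K", ">50K", "<=50K", "<=50K"],
})

# Mean of the boolean mask within each country = fraction earning >50K.
share = (toy["salary"] == ">50K").groupby(toy["native-country"]).mean() * 100

top_country = share.idxmax()  # label of the largest share
top_pct = share.max()         # the share itself

print(top_country, top_pct)
```

Country "C" has no >50K rows, so its share comes out as 0 without any explicit fill.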

In [27]:
# 9. Identify the most popular occupation for those who earn >50K in India.
india_high_income = df[(df['native-country'] == 'India') & (df['salary'] == '>50K')]
most_popular_occupation_india = india_high_income['occupation'].mode()[0]
most_popular_occupation_india

'Prof-specialty'
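
Two edge cases are worth keeping in mind with mode()[0]: on a tie, mode() returns every tied value in sorted order, so [0] picks the first one alphabetically; and on an empty selection, mode() is empty, so indexing it fails. The sketch below shows both on a hypothetical five-row frame (`toy` and the filter value are made up for illustration).

```python
import pandas as pd

# Hypothetical toy data: 'Sales' and 'Prof-specialty' are tied with two rows each.
toy = pd.DataFrame({
    "occupation": ["Sales", "Prof-specialty", "Sales", "Prof-specialty", "Tech-support"],
})

# mode() returns all tied values, sorted.
modes = toy["occupation"].mode()
print(modes.tolist())  # ['Prof-specialty', 'Sales']

# A filter that matches nothing yields an empty mode; guard before indexing.
no_rows = toy[toy["occupation"] == "Farming-fishing"]
empty_modes = no_rows["occupation"].mode()
top = empty_modes[0] if not empty_modes.empty else None
print(top)  # None
```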

In [29]:
# Display the results
print("1. Race Counts:\n", race_counts)
print("\n2. Average age of men:", avg_age_men)
print("\n3. Percentage of people with a Bachelor's degree:", bachelor_percentage)
print("\n4. Percentage of people with advanced education making >50K:", percentage_advanced_education_high_income)
print("\n5. Percentage of people without advanced education making >50K:", percentage_no_advanced_education_high_income)
print("\n6. Minimum hours per week:", min_hours)
print("\n7. Percentage of people working minimum hours with >50K salary:", percentage_min_hours_high_income)
print("\n8. Country with highest percentage of people earning >50K:", highest_earning_country, "with", highest_percentage, "%")
print("\n9. Most popular occupation for those earning >50K in India:", most_popular_occupation_india)

1. Race Counts:
 White                 27816
Black                  3124
Asian-Pac-Islander     1039
Amer-Indian-Eskimo      311
Other                   271
Name: race, dtype: int64

2. Average age of men: 39.43354749885268

3. Percentage of people with a Bachelor's degree: 16.44605509658794

4. Percentage of people with advanced education making >50K: 46.535843011613935

5. Percentage of people without advanced education making >50K: 17.3713601914639

6. Minimum hours per week: 1

7. Percentage of people working minimum hours with >50K salary: 10.0

8. Country with highest percentage of people earning >50K: Iran with 41.86046511627907 %

9. Most popular occupation for those earning >50K in India: Prof-specialty
