# Demographic Data Analyzer

![separator](files/sep.jpeg)

We analyze demographic data using Pandas:

In [200]:
import numpy as np
import pandas as pd

The dataset contains demographic data extracted from the 1994 Census database. Here is a sample of the data:

In [219]:
df = pd.read_csv('files/adult.data.csv', na_values = ['?'])

df.head()

Unnamed: 0,age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,salary
0,39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,Male,2174,0,40,United-States,<=50K
1,50,Self-emp-not-inc,83311,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,13,United-States,<=50K
2,38,Private,215646,HS-grad,9,Divorced,Handlers-cleaners,Not-in-family,White,Male,0,0,40,United-States,<=50K
3,53,Private,234721,11th,7,Married-civ-spouse,Handlers-cleaners,Husband,Black,Male,0,0,40,United-States,<=50K
4,28,Private,338409,Bachelors,13,Married-civ-spouse,Prof-specialty,Wife,Black,Female,0,0,40,Cuba,<=50K
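
The `Unnamed: 0` column in the sample suggests that the first column of the file is a saved row index. The sketch below shows what `na_values` does and how the optional `index_col=0` argument would absorb that column. It runs on a small made-up CSV string (`sample_csv` and its rows are hypothetical, not taken from the real file).

```python
import io
import pandas as pd

# Hypothetical three-row sample with the same leading unnamed column (not real data).
sample_csv = io.StringIO(
    ",age,workclass,native-country,salary\n"
    "0,39,State-gov,United-States,<=50K\n"
    "1,50,?,United-States,<=50K\n"
    "2,28,Private,?,>50K\n"
)

# na_values=['?'] turns the '?' placeholder into NaN.
# index_col=0 uses the unnamed first column as the row index instead of as data.
sample = pd.read_csv(sample_csv, na_values=['?'], index_col=0)

# Count missing values per column.
print(sample.isna().sum())
```

Whether to pass `index_col=0` for the real file is a judgment call; the cells in this notebook work either way because they never use the `Unnamed: 0` column.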


We use Pandas to answer the following questions:
* How many people of each race are represented in this dataset?

In [212]:
race_count = df['race'].value_counts()

print(race_count)

White                 27816
Black                  3124
Asian-Pac-Islander     1039
Amer-Indian-Eskimo      311
Other                   271
Name: race, dtype: int64


* What is the average age of men?

In [213]:
mask = df['sex'] == 'Male'
average_age_men = df[mask]['age'].mean().round(1)

print(average_age_men)

39.4
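
The filter-then-select pattern used here can also be written as a single `.loc` call, which selects rows and column in one step. This is a minimal sketch on a made-up frame (`toy` and its values are hypothetical), showing that both forms give the same mean.

```python
import pandas as pd

# Hypothetical toy data, not taken from the census file.
toy = pd.DataFrame({
    'sex': ['Male', 'Female', 'Male', 'Female'],
    'age': [30, 41, 45, 28],
})

mask = toy['sex'] == 'Male'

# Chained indexing: filter rows first, then pick the column.
chained = toy[mask]['age'].mean()

# Single-step indexing: rows and column selected together.
single_step = toy.loc[mask, 'age'].mean()

print(round(single_step, 1))
```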


* What is the percentage of people who have a Bachelor's degree?

In [221]:
mask = (df['education'] == 'Bachelors')
percentage_bachelors = df[mask].size / df.size

print("{:.2%}".format(percentage_bachelors))

16.45%
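
The ratio of `.size` values works because `.size` is rows times columns and the column count cancels out. An equivalent and arguably more direct formulation is the mean of the boolean mask, since `True` counts as 1 and `False` as 0. The sketch below compares the two on a made-up frame (`toy` is hypothetical).

```python
import pandas as pd

# Hypothetical toy data: 2 of 5 rows are 'Bachelors'.
toy = pd.DataFrame({
    'education': ['Bachelors', 'HS-grad', 'Masters', 'Bachelors', '11th'],
})

mask = toy['education'] == 'Bachelors'

# Ratio of cell counts, as in the cell above.
share_by_size = toy[mask].size / toy.size

# Mean of the boolean mask: the fraction of rows where the mask is True.
share_by_mean = mask.mean()

print("{:.2%}".format(share_by_mean))
```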


* What percentage of people with advanced education (`Bachelors`, `Masters`, or `Doctorate`) make more than 50K?

In [247]:
mask = ((df['education'] == 'Bachelors') | (df['education'] == 'Masters') | 
        (df['education'] == 'Doctorate'))
higher_education = df[mask]
higher_education_rich = higher_education[(higher_education['salary'] == '>50K')].size / higher_education.size 

print("{:.2%}".format(higher_education_rich))

46.54%


* What percentage of people without advanced education make more than 50K?

In [251]:
mask = ~((df['education'] == 'Bachelors') | (df['education'] == 'Masters') | 
        (df['education'] == 'Doctorate'))
lower_education = df[mask]
lower_education_rich = lower_education[(lower_education['salary'] == '>50K')].size / lower_education.size 

print("{:.2%}".format(lower_education_rich))

17.37%
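
The two education questions use the same three-way comparison, once plain and once negated. As a sketch of a more compact alternative, `Series.isin` builds the mask from a list and `~` inverts it, so both shares come from a single definition. The frame below is made up (`toy` and its rows are hypothetical).

```python
import pandas as pd

# Hypothetical toy data, not taken from the census file.
toy = pd.DataFrame({
    'education': ['Bachelors', 'Masters', 'Doctorate', 'HS-grad', '11th', 'HS-grad'],
    'salary':    ['>50K', '<=50K', '>50K', '<=50K', '<=50K', '>50K'],
})

# True for rows with advanced education.
advanced = toy['education'].isin(['Bachelors', 'Masters', 'Doctorate'])

# True for rows earning more than 50K.
rich = toy['salary'] == '>50K'

# Fraction of high earners within each group.
higher_share = rich[advanced].mean()
lower_share = rich[~advanced].mean()

print("{:.2%}".format(higher_share))
print("{:.2%}".format(lower_share))
```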


* What is the minimum number of hours a person works per week?

In [252]:
min_work_hours = df['hours-per-week'].min()

print(min_work_hours)

1


* What percentage of the people who work the minimum number of hours per week have a salary of more than 50K?

In [264]:
mask = (df['hours-per-week'] == min_work_hours)
num_min_workers = df[mask]
rich_percentage = num_min_workers[num_min_workers['salary'] == '>50K'].size / num_min_workers.size

print("{:.2%}".format(rich_percentage))

10.00%


* What country has the highest percentage of people that earn >50K?

In [317]:
mask = df['salary'] == '>50K'
rich_country = df['native-country'][mask].value_counts()

highest_earning_country = round(rich_country / df['native-country'].value_counts() * 100,2).idxmax()
print(highest_earning_country)

Iran


* What is that percentage?

In [335]:
print("{}%".format(round(rich_country / df['native-country'].value_counts() * 100,2).max()))

41.86%
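
The country and its percentage can also be derived from one intermediate Series: group a boolean "earns >50K" column by country, take the mean, then read off `idxmax()` for the label and `max()` for the value. This is a minimal sketch on a made-up frame (the country labels `A`, `B`, `C` and all rows are hypothetical).

```python
import pandas as pd

# Hypothetical toy data: A has 1 of 4 high earners, B has 1 of 2, C has 0 of 1.
toy = pd.DataFrame({
    'native-country': ['A', 'A', 'A', 'A', 'B', 'B', 'C'],
    'salary': ['>50K', '<=50K', '<=50K', '<=50K', '>50K', '<=50K', '<=50K'],
})

# 1.0 for high earners, 0.0 otherwise, so the group mean is the share of high earners.
is_rich = (toy['salary'] == '>50K').astype(float)
rich_share = is_rich.groupby(toy['native-country']).mean() * 100

# idxmax() returns the index label (the country); max() returns the value.
top_country = rich_share.idxmax()
top_percentage = round(rich_share.max(), 2)

print(top_country)
print("{}%".format(top_percentage))
```

One difference from dividing two `value_counts()` results: a country with no high earners gets 0 here, whereas the division yields `NaN` for it. Neither affects which country ranks highest.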


* Identify the most popular occupation for those who earn >50K in India. 

In [343]:
mask = ((df['salary'] == '>50K') & (df['native-country'] == 'India'))
top_IN_occupation = df[mask]['occupation'].value_counts().idxmax()

print(top_IN_occupation)

Prof-specialty
