# Demographic Data Analyzer

This project uses pandas and NumPy to analyze demographic data extracted from the 1994 Census database and answer the following questions:
1. How many people of each race are represented in this dataset? This should be a Pandas series with race names as the index labels. (race column)
2. What is the average age of men?
3. What is the percentage of people who have a Bachelor's degree?
4. What percentage of people with advanced education (Bachelors, Masters, or Doctorate) make more than 50K?
5. What percentage of people without advanced education make more than 50K?
6. What is the minimum number of hours a person works per week?
7. What percentage of the people who work the minimum number of hours per week have a salary of more than 50K?
8. What country has the highest percentage of people that earn >50K and what is that percentage?
9. Identify the most popular occupation for those who earn >50K in India.


This project was completed as part of freeCodeCamp's Data Analysis with Python course.


## Exploring the Data

Let's start by importing the necessary libraries, reading in the data, and taking a first look at the dataset.

In [4]:
import pandas as pd

In [5]:
# Read data from file
df = pd.read_csv("adult.data.csv")
df.head()

Unnamed: 0,age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,salary
0,39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,Male,2174,0,40,United-States,<=50K
1,50,Self-emp-not-inc,83311,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,13,United-States,<=50K
2,38,Private,215646,HS-grad,9,Divorced,Handlers-cleaners,Not-in-family,White,Male,0,0,40,United-States,<=50K
3,53,Private,234721,11th,7,Married-civ-spouse,Handlers-cleaners,Husband,Black,Male,0,0,40,United-States,<=50K
4,28,Private,338409,Bachelors,13,Married-civ-spouse,Prof-specialty,Wife,Black,Female,0,0,40,Cuba,<=50K
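
The preview above shows an `Unnamed: 0` column. If that column comes from a row index that was saved into the CSV (an assumption; it may simply be how the preview was rendered), passing `index_col=0` to `pd.read_csv` restores it as the index instead of loading it as data. The sketch below uses a hypothetical two-row excerpt rather than the real `adult.data.csv`:

```python
import io

import pandas as pd

# Hypothetical two-row excerpt standing in for adult.data.csv.
# The empty first header field is what a saved row index looks like.
csv_text = """,age,education,salary
0,39,Bachelors,<=50K
1,50,Bachelors,<=50K
"""

# Default read: the unnamed first column is loaded as "Unnamed: 0".
df_default = pd.read_csv(io.StringIO(csv_text))

# With index_col=0 the first column becomes the row index instead.
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(df_default.columns))
print(list(df_indexed.columns))
```

None of the answers in this notebook depend on that column, so this is purely a tidiness choice.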


In [6]:
# How many of each race are represented in this dataset? This should be a Pandas series with race names as the index labels.
race_count = df['race'].value_counts()
race_count

White                 27816
Black                  3124
Asian-Pac-Islander     1039
Amer-Indian-Eskimo      311
Other                   271
Name: race, dtype: int64
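
As a small aside, `value_counts` can also return shares instead of raw counts via `normalize=True`. This is a minimal sketch on a toy frame (the values are illustrative, not taken from the census file):

```python
import pandas as pd

# Toy stand-in for the census frame; values are illustrative only.
toy = pd.DataFrame({"race": ["White", "White", "Black", "Other"]})

# Raw counts, indexed by race name (same call as in the cell above).
toy_race_count = toy["race"].value_counts()

# Relative frequencies, scaled to percentages.
toy_race_share = toy["race"].value_counts(normalize=True) * 100

print(toy_race_count)
print(toy_race_share)
```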

In [10]:
# What is the average age of men?
average_age_men = round(df[df["sex"] == "Male"]["age"].mean(),1)
average_age_men

39.4
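
The same filter-then-aggregate pattern can be written with `.loc`, which selects rows and the column in one step, or with `groupby` when the average for every group is wanted at once. A minimal sketch on toy data (ages are made up for illustration):

```python
import pandas as pd

# Toy data; the ages are invented for illustration.
toy = pd.DataFrame({
    "sex": ["Male", "Female", "Male"],
    "age": [30, 40, 41],
})

# Row mask and column selection in a single .loc call.
toy_avg_age_men = round(toy.loc[toy["sex"] == "Male", "age"].mean(), 1)

# Mean age for every value of "sex" in one pass.
toy_avg_age_by_sex = toy.groupby("sex")["age"].mean().round(1)

print(toy_avg_age_men)
print(toy_avg_age_by_sex)
```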

In [13]:
# What is the percentage of people who have a Bachelor's degree?
bachelors = df[df["education"] == "Bachelors"]
percentage_bachelors = round(len(bachelors) / len(df) * 100,1)
percentage_bachelors

16.4
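
A percentage like this can also be computed without building a filtered frame: the mean of a boolean mask is the fraction of `True` values. The sketch below compares both approaches on a toy frame (illustrative values only):

```python
import pandas as pd

# Toy data; values are illustrative only.
toy = pd.DataFrame(
    {"education": ["Bachelors", "HS-grad", "Masters", "Bachelors"]}
)

is_bachelors = toy["education"] == "Bachelors"

# Approach used in the cell above: count matching rows, divide by total.
pct_via_len = round(len(toy[is_bachelors]) / len(toy) * 100, 1)

# Equivalent shortcut: True counts as 1 and False as 0 when averaging.
pct_via_mean = round(is_bachelors.mean() * 100, 1)

print(pct_via_len, pct_via_mean)
```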

In [18]:
# What percentage of people with advanced education (`Bachelors`, `Masters`, or `Doctorate`) make more than 50K?
# What percentage of people without advanced education make more than 50K?

# with and without `Bachelors`, `Masters`, or `Doctorate`
higher_education = df[df["education"].isin(["Bachelors","Masters","Doctorate"])]
lower_education = df[~df["education"].isin(["Bachelors","Masters","Doctorate"])]
lower_education

Unnamed: 0,age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,salary
2,38,Private,215646,HS-grad,9,Divorced,Handlers-cleaners,Not-in-family,White,Male,0,0,40,United-States,<=50K
3,53,Private,234721,11th,7,Married-civ-spouse,Handlers-cleaners,Husband,Black,Male,0,0,40,United-States,<=50K
6,49,Private,160187,9th,5,Married-spouse-absent,Other-service,Not-in-family,Black,Female,0,0,16,Jamaica,<=50K
7,52,Self-emp-not-inc,209642,HS-grad,9,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,45,United-States,>50K
10,37,Private,280464,Some-college,10,Married-civ-spouse,Exec-managerial,Husband,Black,Male,0,0,80,United-States,>50K
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
32556,27,Private,257302,Assoc-acdm,12,Married-civ-spouse,Tech-support,Wife,White,Female,0,0,38,United-States,<=50K
32557,40,Private,154374,HS-grad,9,Married-civ-spouse,Machine-op-inspct,Husband,White,Male,0,0,40,United-States,>50K
32558,58,Private,151910,HS-grad,9,Widowed,Adm-clerical,Unmarried,White,Female,0,0,40,United-States,<=50K
32559,22,Private,201490,HS-grad,9,Never-married,Adm-clerical,Own-child,White,Male,0,0,20,United-States,<=50K


In [19]:
# percentage with salary >50K for higher education
high_income = higher_education[higher_education["salary"] == ">50K"]
higher_education_rich = round(len(high_income) / len(higher_education) * 100, 1)
higher_education_rich

46.5

In [20]:
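# percentage with salary >50K for lower education
# (these are the high earners within the lower-education group)
lower_education_high_income = lower_education[lower_education["salary"] == ">50K"]
lower_education_rich = round(len(lower_education_high_income) / len(lower_education) * 100, 1)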
lower_education_rich

17.4
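
Both education percentages can also be obtained in one pass by grouping a boolean "earns >50K" mask by an "advanced education" label. This is a sketch on toy data, not the notebook's own method; the labels `"advanced"` and `"other"` are names introduced here for illustration:

```python
import pandas as pd

# Toy data; values are illustrative only.
toy = pd.DataFrame({
    "education": ["Bachelors", "Masters", "HS-grad", "11th", "Doctorate", "HS-grad"],
    "salary": [">50K", "<=50K", "<=50K", "<=50K", ">50K", ">50K"],
})

# Label each row by education group (labels are hypothetical names).
is_advanced = toy["education"].isin(["Bachelors", "Masters", "Doctorate"])
group = is_advanced.map({True: "advanced", False: "other"})

# Mean of the boolean mask per group = share earning >50K in that group.
earns_more = toy["salary"] == ">50K"
pct_rich_by_group = (earns_more.groupby(group).mean() * 100).round(1)

print(pct_rich_by_group)
```

In this toy frame two of the three advanced-education rows and one of the three other rows earn more than 50K.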

In [22]:
# What is the minimum number of hours a person works per week (hours-per-week feature)?
min_work_hours = df["hours-per-week"].min()
min_work_hours

1

In [25]:
# What percentage of the people who work the minimum number of hours per week have a salary of >50K?
num_min_workers = df[df["hours-per-week"] == min_work_hours]
rich_min_workers = num_min_workers[num_min_workers["salary"] == ">50K"]
rich_percentage = round(len(rich_min_workers) / len(num_min_workers) * 100, 1)
rich_percentage

10.0

In [26]:
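# What country has the highest percentage of people that earn >50K?
# Total number of people per country
country_count = df["native-country"].value_counts()
# Number of people earning >50K per country ("?" marks an unknown country)
country_rich = df[df["salary"] == ">50K"]["native-country"].value_counts()
country_rich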

United-States         7171
?                      146
Philippines             61
Germany                 44
India                   40
Canada                  39
Mexico                  33
England                 30
Italy                   25
Cuba                    25
Japan                   24
Taiwan                  20
China                   20
Iran                    18
South                   16
Puerto-Rico             12
France                  12
Poland                  12
Jamaica                 10
El-Salvador              9
Greece                   8
Cambodia                 7
Hong                     6
Yugoslavia               6
Vietnam                  5
Ireland                  5
Portugal                 4
Ecuador                  4
Haiti                    4
Scotland                 3
Hungary                  3
Guatemala                3
Thailand                 3
Laos                     2
Columbia                 2
Nicaragua                2
Peru                     2
...

In [28]:
highest_earning_country = (country_rich/country_count *100).idxmax()
highest_earning_country

'Iran'

In [29]:
highest_earning_country_percentage = round((country_rich / country_count *100).max(),1)
highest_earning_country_percentage

41.9
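
Dividing `country_rich` by `country_count` works because pandas aligns the two Series on their country labels. A country with no one earning >50K is missing from `country_rich`, so its ratio becomes `NaN`, which `idxmax` and `max` skip by default. The sketch below shows this on toy data (illustrative values) next to a `groupby` variant that yields `0.0` for such countries instead:

```python
import pandas as pd

# Toy data; values are illustrative only.
toy = pd.DataFrame({
    "native-country": ["Iran", "Iran", "Peru", "Cuba", "Cuba", "Cuba"],
    "salary": [">50K", "<=50K", "<=50K", ">50K", ">50K", "<=50K"],
})

# Same approach as the cells above: two value_counts, aligned by label.
toy_count = toy["native-country"].value_counts()
toy_rich = toy[toy["salary"] == ">50K"]["native-country"].value_counts()
ratio = toy_rich / toy_count * 100  # Peru has no >50K rows, so it is NaN

# Alternative: mean of a boolean mask per country (Peru becomes 0.0).
share = (toy["salary"] == ">50K").groupby(toy["native-country"]).mean() * 100

print(ratio.idxmax(), round(ratio.max(), 1))
print(share.idxmax(), round(share.max(), 1))
```

Both variants pick the same top country here; they differ only in how countries without high earners are represented.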

In [30]:
# Identify the most popular occupation for those who earn >50K in India.
rich_indians = df.loc[(df["native-country"]=="India") & (df["salary"]==">50K")]
rich_indians

Unnamed: 0,age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,salary
11,30,State-gov,141297,Bachelors,13,Married-civ-spouse,Prof-specialty,Husband,Asian-Pac-Islander,Male,0,0,40,India,>50K
968,48,Private,164966,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,Asian-Pac-Islander,Male,0,0,40,India,>50K
1327,52,Private,168381,HS-grad,9,Widowed,Other-service,Unmarried,Asian-Pac-Islander,Female,0,0,40,India,>50K
7258,42,State-gov,102343,Prof-school,15,Married-civ-spouse,Prof-specialty,Husband,Asian-Pac-Islander,Male,0,0,72,India,>50K
7285,54,State-gov,93449,Masters,14,Married-civ-spouse,Prof-specialty,Husband,Asian-Pac-Islander,Male,0,0,40,India,>50K
8124,36,Private,172104,Prof-school,15,Never-married,Prof-specialty,Not-in-family,Other,Male,0,0,40,India,>50K
9939,43,Federal-gov,325706,Prof-school,15,Married-civ-spouse,Prof-specialty,Husband,Asian-Pac-Islander,Male,0,0,50,India,>50K
10590,35,Private,98283,Prof-school,15,Never-married,Prof-specialty,Not-in-family,Asian-Pac-Islander,Male,0,0,40,India,>50K
10661,59,Private,122283,Prof-school,15,Married-civ-spouse,Prof-specialty,Husband,Asian-Pac-Islander,Male,99999,0,40,India,>50K
10736,30,Private,243190,Prof-school,15,Married-civ-spouse,Prof-specialty,Husband,Asian-Pac-Islander,Male,0,0,20,India,>50K


In [32]:
occupation_count = rich_indians["occupation"].value_counts()
occupation_count

Prof-specialty      25
Exec-managerial      8
Other-service        2
Tech-support         2
Adm-clerical         1
Sales                1
Transport-moving     1
Name: occupation, dtype: int64

In [33]:
top_IN_occupation = occupation_count.idxmax()
top_IN_occupation

'Prof-specialty'
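
The filter and the count can be chained into a single expression. Note that `idxmax` returns only the first label when several occupations tie, whereas `Series.mode` lists every most-frequent value. A minimal sketch on toy data (illustrative values only):

```python
import pandas as pd

# Toy data; values are illustrative only.
toy = pd.DataFrame({
    "native-country": ["India", "India", "India", "Cuba"],
    "salary": [">50K", ">50K", ">50K", ">50K"],
    "occupation": ["Prof-specialty", "Prof-specialty", "Sales", "Sales"],
})

# Rows for people from India who earn >50K.
mask = (toy["native-country"] == "India") & (toy["salary"] == ">50K")

# Most frequent occupation among those rows.
toy_top_occupation = toy.loc[mask, "occupation"].value_counts().idxmax()

# All most-frequent occupations, in case of ties.
toy_top_occupations = toy.loc[mask, "occupation"].mode().tolist()

print(toy_top_occupation)
print(toy_top_occupations)
```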

### END OF ASSIGNMENT