# Demographic Data Analyzer

### Questions:
1. How many people of each race are represented in this dataset? The answer should be a pandas Series with race names as the index labels (from the `race` column).
2. What is the average age of men?
3. What percentage of people have a Bachelor's degree?
4. What percentage of people with advanced education (Bachelors, Masters, or Doctorate) make more than 50K?
5. What percentage of people without advanced education make more than 50K?
6. What is the minimum number of hours a person works per week?
7. What percentage of the people who work the minimum number of hours per week have a salary of more than 50K?
8. Which country has the highest percentage of people who earn >50K, and what is that percentage?
9. Identify the most popular occupation for those who earn >50K in India.

In [1]:
```python
import numpy as np
import pandas as pd
```

In [2]:
```python
adata = pd.read_csv("adult_data.csv")
```

In [3]:
```python
adata.head()
```

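|   | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | salary |
|---|-----|-----------|--------|-----------|---------------|----------------|------------|--------------|------|-----|--------------|--------------|----------------|----------------|--------|
| 0 | 39 | State-gov | 77516 | Bachelors | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | <=50K |
| 1 | 50 | Self-emp-not-inc | 83311 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States | <=50K |
| 2 | 38 | Private | 215646 | HS-grad | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 3 | 53 | Private | 234721 | 11th | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | <=50K |
| 4 | 28 | Private | 338409 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba | <=50K |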


In [4]:
```python
adata.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32561 entries, 0 to 32560
Data columns (total 15 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   age             32561 non-null  int64 
 1   workclass       32561 non-null  object
 2   fnlwgt          32561 non-null  int64 
 3   education       32561 non-null  object
 4   education-num   32561 non-null  int64 
 5   marital-status  32561 non-null  object
 6   occupation      32561 non-null  object
 7   relationship    32561 non-null  object
 8   race            32561 non-null  object
 9   sex             32561 non-null  object
 10  capital-gain    32561 non-null  int64 
 11  capital-loss    32561 non-null  int64 
 12  hours-per-week  32561 non-null  int64 
 13  native-country  32561 non-null  object
 14  salary          32561 non-null  object
dtypes: int64(6), object(9)
memory usage: 3.7+ MB
```
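
`info()` reports 32561 non-null values in every column, but that only rules out NaN. In the public UCI Adult data, unknown categorical values are commonly encoded with a `?` placeholder, which pandas reads as an ordinary string. Assuming this CSV follows the same convention (not verified here), counting placeholder strings is a useful companion check. The frame below is a hypothetical toy sample, not rows from `adult_data.csv`:

```python
import pandas as pd

# Hypothetical toy rows; "?" stands in for an unknown category
toy = pd.DataFrame({
    "workclass": ["State-gov", "?", "Private"],
    "occupation": ["Adm-clerical", "?", "?"],
    "age": [39, 50, 38],
})

# info() counts "?" as non-null, so count the placeholder explicitly per column
placeholder_counts = (toy == "?").sum()
print(placeholder_counts)
```

If the counts are non-zero, `toy.replace("?", pd.NA)` turns the placeholders into real missing values that `info()` and `isna()` will report.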


### ~ Q1 ~ 

In [240]:
```python
adata["race"].value_counts()
```

```
White                 27816
Black                  3124
Asian-Pac-Islander     1039
Amer-Indian-Eskimo      311
Other                   271
Name: race, dtype: int64
```
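
Q1 asks specifically for a Series keyed by race. `value_counts()` already returns one, sorted from most to least frequent, with the category labels as the index; `normalize=True` gives proportions on the same index. A small check on a hypothetical sample (not the real data):

```python
import pandas as pd

# Hypothetical sample of a race column (not the real data)
race = pd.Series(["White", "Black", "White", "Other", "White", "Black"], name="race")

counts = race.value_counts()                  # most frequent first
shares = race.value_counts(normalize=True)    # same index, proportions instead of counts
print(counts)
print(counts.index.tolist())                  # race names are the index labels
```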

### ~ Q2 ~

In [241]:
```python
# Mean age of the rows where sex is "Male", rounded to one decimal place
round(adata.loc[adata["sex"] == "Male", "age"].mean(), 1)
```

```
39.4
```
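
The same mean can be computed for every value of `sex` at once with `groupby`, which avoids writing one filter per group. A sketch on a hypothetical four-row frame:

```python
import pandas as pd

# Hypothetical four-row frame (not the real data)
toy = pd.DataFrame({
    "sex": ["Male", "Female", "Male", "Female"],
    "age": [39, 28, 50, 37],
})

# One mean per sex; the result is a Series indexed by the sex labels
mean_age_by_sex = toy.groupby("sex")["age"].mean().round(1)
print(mean_age_by_sex)
```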

### ~ Q3 ~

In [242]:
```python
# The mean of a boolean mask is the fraction of True values
round(adata["education"].eq("Bachelors").mean() * 100, 1)
```

```
16.4
```
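
A single-category percentage reduces to the mean of a boolean mask, since `True` counts as 1 and `False` as 0. The sketch below checks that an explicit count ratio and the mask mean agree on a hypothetical five-value sample:

```python
import pandas as pd

# Hypothetical sample of an education column (not the real data)
education = pd.Series(["Bachelors", "HS-grad", "Bachelors", "Masters", "11th"])

# Explicit ratio: matching rows / all non-null rows
ratio = education[education == "Bachelors"].count() / education.count() * 100
# Mean of the boolean mask: True counts as 1, False as 0
via_mean = education.eq("Bachelors").mean() * 100
print(round(ratio, 1), round(via_mean, 1))
```

The two agree only when the column has no missing values: `eq` treats NaN as `False` and keeps it in the denominator, while `count()` leaves it out. The `info()` output shows no nulls in `education`, so either form works here.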

### ~ Q4 ~ 

In [243]:
```python
# Rows whose education is one of the three advanced levels
advanced = adata["education"].isin(["Bachelors", "Masters", "Doctorate"])
# Share of those rows earning >50K, as a percentage
round((adata.loc[advanced, "salary"] == ">50K").mean() * 100, 1)
```

```
46.5
```

### ~ Q5 ~

In [244]:
```python
advanced = adata["education"].isin(["Bachelors", "Masters", "Doctorate"])
# ~ negates the mask, selecting everyone without advanced education
round((adata.loc[~advanced, "salary"] == ">50K").mean() * 100, 1)
```

```
17.4
```
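
Q4 and Q5 split the same population by one flag (advanced education or not), so both percentages can come from a single `groupby` on that flag. A sketch on a hypothetical six-row frame; with the real `adata`, the `True` entry corresponds to Q4 and the `False` entry to Q5:

```python
import pandas as pd

# Hypothetical six-row frame (not the real data)
toy = pd.DataFrame({
    "education": ["Bachelors", "HS-grad", "Masters", "11th", "Doctorate", "HS-grad"],
    "salary":    [">50K", "<=50K", ">50K", "<=50K", "<=50K", ">50K"],
})

advanced = toy["education"].isin(["Bachelors", "Masters", "Doctorate"])
rich = toy["salary"].eq(">50K")
# Group the >50K indicator by the advanced-education flag; mean = share of True
pct_by_group = (rich.groupby(advanced).mean() * 100).round(1)
print(pct_by_group)
```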

### ~ Q6 ~

In [245]:
```python
adata["hours-per-week"].min()
```

```
1
```

### ~ Q7 ~

In [246]:
```python
# Use the minimum from Q6 instead of hard-coding 1
min_hours = adata["hours-per-week"].min()
min_workers = adata.loc[adata["hours-per-week"] == min_hours, "salary"]
round((min_workers == ">50K").mean() * 100, 1)
```

```
10.0
```
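
Because the minimum-hours group is small, it can help to report how many people a percentage like this is based on. A sketch that returns the group size and the >50K share for each hours value, on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy rows (not the real data)
toy = pd.DataFrame({
    "hours-per-week": [1, 40, 1, 60, 1],
    "salary": [">50K", "<=50K", "<=50K", ">50K", "<=50K"],
})

rich = toy["salary"].eq(">50K")
# Per hours value: number of people ("size") and share earning >50K ("mean")
summary = rich.groupby(toy["hours-per-week"]).agg(["size", "mean"])
min_hours = toy["hours-per-week"].min()
print(summary.loc[min_hours])
```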

### ~ Q8 ~

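In [250]:
```python
# >50K earners per country, and all people per country
rich_per_country = adata.loc[adata["salary"] == ">50K", "native-country"].value_counts()
people_per_country = adata["native-country"].value_counts()
# Division aligns on the country index; countries with no >50K earners become NaN
rich_pct = (rich_per_country / people_per_country * 100).round(1)
# idxmax() and max() skip NaN by default
print(rich_pct.idxmax(), rich_pct.max())
```

```
Iran 41.9
```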
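
Dividing the two `value_counts()` results aligns on the country index, so a country with no >50K earners gets `NaN` rather than 0, and `idxmax()` skips it. An alternative, sketched on hypothetical toy data, is a row-normalised `pd.crosstab`, which yields 0.0 for such countries:

```python
import pandas as pd

# Hypothetical toy rows (not the real data)
toy = pd.DataFrame({
    "native-country": ["Iran", "Iran", "Cuba", "Cuba", "Cuba", "India"],
    "salary": [">50K", "<=50K", "<=50K", "<=50K", ">50K", "<=50K"],
})

# normalize="index" divides each row by its total, so each country's shares sum to 1
shares = pd.crosstab(toy["native-country"], toy["salary"], normalize="index")
# Countries with no >50K earners get 0.0 here instead of NaN
rich_pct = (shares[">50K"] * 100).round(1)
print(rich_pct.idxmax(), rich_pct.max())
```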


### ~ Q9 ~

In [248]:
```python
adata.loc[(adata["native-country"] == "India") & (adata["salary"] == ">50K"), "occupation"].value_counts()
```

```
Prof-specialty      25
Exec-managerial      8
Other-service        2
Tech-support         2
Transport-moving     1
Sales                1
Adm-clerical         1
Name: occupation, dtype: int64
```
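
Q9 asks for a single occupation; from the counts above it is `Prof-specialty` (25 people). To return just that label, `idxmax()` on the counts gives the index of the largest value (the first one in case of a tie). A sketch on a hypothetical toy frame:

```python
import pandas as pd

# Hypothetical toy rows (not the real data)
toy = pd.DataFrame({
    "native-country": ["India", "India", "India", "Cuba"],
    "occupation": ["Prof-specialty", "Sales", "Prof-specialty", "Sales"],
    "salary": [">50K", ">50K", ">50K", ">50K"],
})

mask = (toy["native-country"] == "India") & (toy["salary"] == ">50K")
counts = toy.loc[mask, "occupation"].value_counts()
# Label of the largest count
top_occupation = counts.idxmax()
print(top_occupation)
```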

##### _Nicolás Beltrán_