<a href="https://colab.research.google.com/github/JakubPyt/Demographic_Data_Analyzer/blob/main/Demographic_Data_Analyzer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Demographic Data Analyzer

This project was completed as part of the Data Analysis with Python course on the freeCodeCamp website:

https://www.freecodecamp.org/learn/data-analysis-with-python/data-analysis-with-python-projects/demographic-data-analyzer 

In this challenge, I analyzed demographic data using pandas. The dataset was extracted from the 1994 Census database, and the task was to answer a series of questions by extracting specific demographic information from it.

In [None]:
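import pandas as pd

# Load the 1994 Census extract from Google Drive (Colab path); the file is semicolon-separated
data = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/fCC Data Analysis with Python/Data/adult.data.csv", sep=";")
# Preview the first five rows
data.head()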

| Unnamed: 0 | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | State-gov | 77516 | Bachelors | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | <=50K |
| 1 | 50 | Self-emp-not-inc | 83311 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States | <=50K |
| 2 | 38 | Private | 215646 | HS-grad | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 3 | 53 | Private | 234721 | 11th | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | <=50K |
| 4 | 28 | Private | 338409 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba | <=50K |
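The `Unnamed: 0` column in the preview above appears because the first header field of the CSV is empty, which usually means the file was saved together with its row index. The sketch below uses a small made-up, in-memory sample (not the real census file) to show how `index_col=0` in `pd.read_csv` would turn that leading column back into the index. Whether the author's file really has this layout is an assumption based on the `0..4` values in that column.

```python
import io

import pandas as pd

# Hypothetical two-row sample in the same semicolon-separated layout,
# with a leading column whose header is empty (a saved row index).
SAMPLE_CSV = """;age;sex;education;salary
0;39;Male;Bachelors;<=50K
1;50;Male;Bachelors;>50K
"""

# Default read: pandas names the empty-header column "Unnamed: 0".
raw = pd.read_csv(io.StringIO(SAMPLE_CSV), sep=";")

# index_col=0 uses that first column as the row index instead.
clean = pd.read_csv(io.StringIO(SAMPLE_CSV), sep=";", index_col=0)

print(raw.columns.tolist())    # ['Unnamed: 0', 'age', 'sex', 'education', 'salary']
print(clean.columns.tolist())  # ['age', 'sex', 'education', 'salary']
```

Either form works for the analysis in this notebook, since none of the tasks use the `Unnamed: 0` column.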


In [None]:
# ==============
# === TASK 1 ===
# ==============
# How many of each race are represented in this dataset? 
race_count = data.race.value_counts()

# ==============
# === TASK 2 ===
# ==============
# What is the average age of men?
average_age_men = round(data[data.sex == 'Male'].age.mean(),2)

# ==============
# === TASK 3 ===
# ==============
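# What is the percentage of people who have a Bachelor's degree?
# value_counts(normalize=True) returns the share of each education level; take 'Bachelors' and convert to %
percentage_bachelors = round(data.education.value_counts(normalize=True).Bachelors * 100, 2)
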
# ==============
# === TASK 4 ===
# ==============
# What percentage of people with advanced education (`Bachelors`, `Masters`, or `Doctorate`) make more than 50K?

# First, build a boolean mask that is True for rows with `Bachelors`, `Masters`, or `Doctorate`
higher_education = data['education'].isin(['Bachelors', 'Masters', 'Doctorate'])

# Share of those rows with salary >50K, as a percentage
higher_education_rich = round(
    data[higher_education].salary.value_counts(normalize=True)['>50K'] * 100, 2)

# ==============
# === TASK 5 ===
# ==============
# What percentage of people without advanced education make more than 50K?

# Rows without advanced education are the complement (~) of the mask above
lower_education = ~higher_education

# Share of those rows with salary >50K, as a percentage
lower_education_rich = round(
    data[lower_education].salary.value_counts(normalize=True)['>50K'] * 100, 2)

# ==============
# === TASK 6 ===
# ==============
# What is the minimum number of hours a person works per week (hours-per-week feature)?
min_work_hours = data['hours-per-week'].min()

# ==============
# === TASK 7 ===
# ==============
# What percentage of the people who work the minimum number of hours per week have a salary of >50K?

# First, select the rows of people who work the minimum number of hours
num_min_workers = data[data['hours-per-week'] == min_work_hours]

# value_counts(normalize=True) returns the fraction of each salary class in those rows; take '>50K' and convert to %
rich_percentage = round(num_min_workers.salary.value_counts(normalize=True)['>50K'] * 100, 2)

# ==============
# === TASK 8 ===
# ==============
# What country has the highest percentage of people that earn >50K?
# Share of each salary class within every country; [:, '>50K'] keeps only the '>50K' shares
highest_earning_country = data.groupby('native-country').salary.value_counts(
    normalize=True)[:, '>50K'].sort_values(ascending=False).index[0]

# ==============
# === TASK 9 ===
# ==============
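# What is the percentage of rich people in the country above?
# .iloc[0] takes the first (largest) value by position; plain [0] on a string-indexed Series is deprecated
highest_earning_country_percentage = round(data.groupby('native-country').salary.value_counts(
    normalize=True)[:, '>50K'].sort_values(ascending=False).iloc[0] * 100, 2)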

# ===============
# === TASK 10 ===
# ===============
# Identify the most popular occupation for those who earn >50K in India.
mask = (data['salary'] == '>50K') & (data['native-country'] == 'India')

# value_counts() sorts in descending order, so the first index label is the most common occupation
top_IN_occupation = data[mask].occupation.value_counts().index[0]

# =====================
# === print section ===
# =====================
print("Analysis of demographic data".center(70, "."))
print("1. Number of each race:\n", race_count) 
print("2. Average age of men:", average_age_men)
print(f"3. Percentage with Bachelors degrees: {percentage_bachelors}%")
print(f"4. Percentage with higher education that earn >50K: {higher_education_rich}%")
print(f"5. Percentage without higher education that earn >50K: {lower_education_rich}%")
print(f"6. Min work time: {min_work_hours} hours/week")
print(f"7. Percentage of rich among those who work fewest hours: {rich_percentage}%")
print("8. Country with highest percentage of rich:", highest_earning_country)
print(f"9. Highest percentage of rich people in country: {highest_earning_country_percentage}%")
print("10. Top occupations in India:", top_IN_occupation)

.....................Analysis of demographic data.....................
1. Number of each race:
 White                 27816
Black                  3124
Asian-Pac-Islander     1039
Amer-Indian-Eskimo      311
Other                   271
Name: race, dtype: int64
2. Average age of men: 39.43
3. Percentage with Bachelors degrees: 16.45%
4. Percentage with higher education that earn >50K: 46.54%
5. Percentage without higher education that earn >50K: 17.37%
6. Min work time: 1 hours/week
7. Percentage of rich among those who work fewest hours: 10.0%
8. Country with highest percentage of rich: Iran
9. Highest percentage of rich people in country: 41.86%
10. Top occupations in India: Prof-specialty
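Tasks 4, 5 and 7 share one pattern: build a boolean mask, select the matching rows, and measure the share of `>50K` salaries. The sketch below replays that pattern on a tiny invented DataFrame (not the census data). It builds the advanced-education mask with `Series.isin`, which is equivalent to OR-ing three `==` comparisons, and compares `value_counts(normalize=True)['>50K']` with an assumed alternative: the mean of a boolean Series, which is the fraction of `True` values.

```python
import pandas as pd

# Tiny invented sample; the real notebook uses the 1994 Census extract.
df = pd.DataFrame({
    "education": ["Bachelors", "HS-grad", "Masters", "11th", "Doctorate", "HS-grad"],
    "hours-per-week": [40, 1, 50, 1, 60, 40],
    "salary": [">50K", "<=50K", ">50K", "<=50K", "<=50K", "<=50K"],
})

# True where education is one of the three advanced degrees
# (same result as OR-ing three == comparisons).
higher_education = df["education"].isin(["Bachelors", "Masters", "Doctorate"])

# Approach used in the notebook: relative frequencies from value_counts().
share_vc = df[higher_education]["salary"].value_counts(normalize=True)[">50K"]

# Alternative: the mean of a boolean Series is the fraction of True values.
share_mean = (df.loc[higher_education, "salary"] == ">50K").mean()

# The alternative also handles a group with no ">50K" rows (result 0.0),
# where value_counts(...)[">50K"] would raise KeyError.
min_hours = df["hours-per-week"].min()
min_workers = df[df["hours-per-week"] == min_hours]
rich_share_min = (min_workers["salary"] == ">50K").mean()

print(round(share_vc * 100, 2), round(share_mean * 100, 2), rich_share_min)
```

Both approaches give the same number whenever the group contains at least one `>50K` row; the boolean-mean form is simply more forgiving when it does not.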
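Tasks 8 and 9 compute the same per-country Series twice. A possible alternative, sketched below on invented rows (not the census data), groups a boolean `>50K` indicator by country so that `mean()` gives each country's share directly; `idxmax()` and `max()` then answer both questions from one Series. The variable names here are illustrative and differ from the notebook's. The same `idxmax()` idea applied to `value_counts()` covers the Task 10 lookup.

```python
import pandas as pd

# Invented sample rows; values are illustrative only.
df = pd.DataFrame({
    "native-country": ["Iran", "Iran", "India", "India", "India", "Cuba"],
    "occupation": ["Prof-specialty", "Sales", "Prof-specialty",
                   "Prof-specialty", "Exec-managerial", "Sales"],
    "salary": [">50K", "<=50K", ">50K", ">50K", ">50K", "<=50K"],
})

# Boolean indicator grouped by country: mean() is the share of ">50K" earners.
rich_share = (df["salary"] == ">50K").groupby(df["native-country"]).mean()

# Country with the highest share, and that share as a percentage.
top_country = rich_share.idxmax()
top_share_pct = round(rich_share.max() * 100, 2)

# Task 10 pattern: most frequent occupation among India's ">50K" earners.
mask = (df["salary"] == ">50K") & (df["native-country"] == "India")
top_occupation = df.loc[mask, "occupation"].value_counts().idxmax()

print(top_country, top_share_pct, top_occupation)
```

Unlike the `[:, '>50K']` selection, this keeps countries with no `>50K` earners (share `0.0`), which does not change which country has the maximum.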
