<a href="https://colab.research.google.com/github/blackcrowX/Data_Analytics_Projects/blob/main/Python/Demographic_Data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<div align="center">
<h1>Analysis - Demographic Data</h1>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/f/fa/FreeCodeCamp_logo.svg/2560px-FreeCodeCamp_logo.svg.png"/>
</div>

## Table of Contents
* Introduction
* Setup
  * Import Libraries
  * Import Dataset
* Data Analysis
  * Race Count
  * Average Age Men
  * Percentage Bachelors
  * Higher & Lower Education Rich
  * Minimum Hours Worked Salary
  * Rich Percentage
  * Highest Earning Country Percentage
  * Top India Occupation

<h1 align="center">Introduction</h1>

In this challenge, you analyze demographic data using pandas. The dataset was extracted from the 1994 Census database and published by <a href="https://www.freecodecamp.org/learn/data-analysis-with-python/data-analysis-with-python-projects/demographic-data-analyzer">freeCodeCamp</a>. Here is a sample of the data:

|    |   age | workclass        |   fnlwgt | education   |   education-num | marital-status     | occupation        | relationship   | race   | sex    |   capital-gain |   capital-loss |   hours-per-week | native-country   | salary   |
|---:|------:|:-----------------|---------:|:------------|----------------:|:-------------------|:------------------|:---------------|:-------|:-------|---------------:|---------------:|-----------------:|:-----------------|:---------|
|  0 |    39 | State-gov        |    77516 | Bachelors   |              13 | Never-married      | Adm-clerical      | Not-in-family  | White  | Male   |           2174 |              0 |               40 | United-States    | <=50K    |
|  1 |    50 | Self-emp-not-inc |    83311 | Bachelors   |              13 | Married-civ-spouse | Exec-managerial   | Husband        | White  | Male   |              0 |              0 |               13 | United-States    | <=50K    |
|  2 |    38 | Private          |   215646 | HS-grad     |               9 | Divorced           | Handlers-cleaners | Not-in-family  | White  | Male   |              0 |              0 |               40 | United-States    | <=50K    |
|  3 |    53 | Private          |   234721 | 11th        |               7 | Married-civ-spouse | Handlers-cleaners | Husband        | Black  | Male   |              0 |              0 |               40 | United-States    | <=50K    |
|  4 |    28 | Private          |   338409 | Bachelors   |              13 | Married-civ-spouse | Prof-specialty    | Wife           | Black  | Female |              0 |              0 |               40 | Cuba             | <=50K    |

<h1 align="center">Setup</h1>

## Step 1: Import Libraries

Import and configure libraries required for data analysis.

In [None]:
import pandas as pd

## Step 2: Import Dataset
Use Pandas to import the data from `adult_data.csv`.

In [None]:
url = "https://raw.githubusercontent.com/blackcrowX/Data_Analytics_Projects/main/Datasets/adult_data.csv"
df = pd.read_csv(url)
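
Before running any analysis, it can help to confirm that the file was parsed with the expected shape and column names. The following is a minimal sketch, assuming a small inline CSV stands in for the real file: the three rows are copied from the sample table above and reduced to four columns, and `sample_csv` and `sample_df` are illustrative names that do not appear elsewhere in this notebook.

```python
import io
import pandas as pd

# Inline stand-in for adult_data.csv: three rows from the sample table, four columns only.
sample_csv = io.StringIO(
    "age,education,sex,salary\n"
    "39,Bachelors,Male,<=50K\n"
    "50,Bachelors,Male,<=50K\n"
    "38,HS-grad,Male,<=50K\n"
)
sample_df = pd.read_csv(sample_csv)

print(sample_df.shape)               # (rows, columns)
print(list(sample_df.columns))       # column names parsed from the header row
print(sample_df.isna().sum().sum())  # total number of missing cells
```

The same three checks can be run on `df` once the real file has loaded.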

<h1 align="center">Data Analysis</h1>

## Step 3: Race Count

How many people of each race are represented in this dataset? The result should be a pandas Series with race names as the index labels, computed from the `race` column.

In [None]:
race_count = df['race'].value_counts()
print(race_count)

White                 27816
Black                  3124
Asian-Pac-Islander     1039
Amer-Indian-Eskimo      311
Other                   271
Name: race, dtype: int64
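
If shares are more useful than absolute counts, `value_counts` also accepts `normalize=True`. The sketch below uses a hypothetical toy column (`toy_race`), not the real dataset, so the numbers are only illustrative.

```python
import pandas as pd

# Hypothetical toy column, not the real dataset.
toy_race = pd.Series(["White", "White", "Black", "White", "Other"], name="race")

counts = toy_race.value_counts()                # absolute counts, most frequent first
shares = toy_race.value_counts(normalize=True)  # fractions that sum to 1

print(counts)
print((shares * 100).round(1))
```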


## Step 4: Average Age Men

What is the average age of men?

In [None]:
# Select rows and the column in a single .loc call instead of chained indexing
average_age_men = df.loc[df['sex'] == 'Male', 'age'].mean().round(decimals=1)
print(average_age_men)

39.4
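
An alternative that returns the mean age for every value of `sex` at once is `groupby`. This is a minimal sketch using only the five rows of the sample table from the introduction, so the result differs from the full-dataset value above; `sample` and `mean_age_by_sex` are illustrative names.

```python
import pandas as pd

# The five rows of the sample table (age and sex columns only).
sample = pd.DataFrame({
    "age": [39, 50, 38, 53, 28],
    "sex": ["Male", "Male", "Male", "Male", "Female"],
})

# One mean per group, indexed by the group label.
mean_age_by_sex = sample.groupby("sex")["age"].mean().round(1)

print(mean_age_by_sex["Male"])
```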


## Step 5: Percentage Bachelors

What is the percentage of people who have a Bachelor's degree?

In [None]:
bachelors_count = df.loc[df['education'] == 'Bachelors']['education'].count()
total_count = df['education'].count()
percentage_bachelors = (bachelors_count / total_count * 100).round(decimals=1)
print(percentage_bachelors)

16.4
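
A percentage like this can also be computed as the mean of a boolean mask, since `True` counts as 1 and `False` as 0. The sketch below uses the `education` values of the five sample rows from the introduction, so it does not reproduce the full-dataset figure; `education`, `is_bachelors`, and `pct` are illustrative names.

```python
import pandas as pd

# Education values from the five rows of the sample table.
education = pd.Series(["Bachelors", "Bachelors", "HS-grad", "11th", "Bachelors"])

# True where the row has a Bachelor's degree.
is_bachelors = education == "Bachelors"

# The mean of a boolean Series is the fraction of True values.
pct = round(is_bachelors.mean() * 100, 1)

print(pct)
```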


## Step 6: Higher & Lower Education Rich

What percentage of people with advanced education (`Bachelors`, `Masters`, or `Doctorate`) make more than 50K?

What percentage of people without advanced education make more than 50K?

In [None]:
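# Count salary brackets within each education level (two-level index: education, salary).
# The column is named explicitly because the default name differs between pandas versions.
education_salary_df = df.groupby('education')['salary'].value_counts().to_frame('counts')
# Keep only the >50K rows for every education level
high_salary_df = education_salary_df.loc[(slice(None), '>50K'), :]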
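
The grouped counts above carry a two-level index (education, then salary bracket), and the `>50K` rows are picked out of the second level. The sketch below shows the same selection with `Series.xs` on hypothetical toy rows (`toy` is not the real dataset). It also shows one thing to watch for: an education level with no `>50K` rows does not appear in the result at all.

```python
import pandas as pd

# Hypothetical toy rows, not taken from the real dataset.
toy = pd.DataFrame({
    "education": ["Bachelors", "Bachelors", "Bachelors", "HS-grad", "HS-grad"],
    "salary":    [">50K", ">50K", "<=50K", "<=50K", "<=50K"],
})

# Same shape as the grouped counts above: a Series with a two-level index.
counts = toy.groupby("education")["salary"].value_counts()

# Cross-section on the second index level (position 1, the salary bracket).
rich_by_education = counts.xs(">50K", level=1)

print(rich_by_education)
```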

In [None]:
higher_education = education_salary_df.loc[['Bachelors', 'Masters', 'Doctorate']].sum()
lower_education = education_salary_df.sum() - higher_education

In [None]:
high_education_rich_count = high_salary_df.loc[['Bachelors', 'Masters', 'Doctorate']].sum()
lower_education_rich_count = high_salary_df.sum() - high_education_rich_count

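# .iloc[0] extracts the single value; calling float() directly on a one-element Series is deprecated in recent pandas
higher_education_rich = float((high_education_rich_count / higher_education * 100).round(decimals=1).iloc[0])
lower_education_rich = float((lower_education_rich_count / lower_education * 100).round(decimals=1).iloc[0])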
print(higher_education_rich)
print(lower_education_rich)

46.5
17.4
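
The same two percentages can be reached with boolean masks instead of grouped counts. This is a sketch on hypothetical toy rows (`toy`), not the notebook's own method, so the values differ from the output above; `advanced`, `rich`, `higher_pct`, and `lower_pct` are illustrative names.

```python
import pandas as pd

# Hypothetical toy rows, not the real dataset.
toy = pd.DataFrame({
    "education": ["Bachelors", "Masters", "Doctorate", "HS-grad",
                  "HS-grad", "11th", "Bachelors", "Some-college"],
    "salary":    [">50K", ">50K", "<=50K", "<=50K",
                  ">50K", "<=50K", "<=50K", "<=50K"],
})

advanced = toy["education"].isin(["Bachelors", "Masters", "Doctorate"])
rich = toy["salary"] == ">50K"

# Share of >50K earners inside each education group.
higher_pct = round(rich[advanced].mean() * 100, 1)
lower_pct = round(rich[~advanced].mean() * 100, 1)

print(higher_pct, lower_pct)
```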


## Step 7: Minimum Hours Worked Salary

What is the minimum number of hours a person works per week (`hours-per-week` column), and how are salaries distributed among the people who work that minimum?

In [None]:
min_work_hours = df['hours-per-week'].min()

# Salary counts per hours-per-week value; the column is named explicitly
# because the default name differs between pandas versions.
hours_worked_salary_df = df.groupby('hours-per-week')['salary'].value_counts().to_frame('counts')
# Keep only the rows for the minimum number of hours
min_hours_worked_salary_df = hours_worked_salary_df.loc[min_work_hours, :]
print(min_hours_worked_salary_df)

        counts
salary        
<=50K       18
>50K         2


## Step 8: Rich Percentage

What percentage of the people who work the minimum number of hours per week have a salary of more than 50K?

In [None]:
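num_min_workers = min_hours_worked_salary_df.sum()
# .iloc[0] extracts the single value; calling float() directly on a one-element Series is deprecated in recent pandas
rich_percentage = float((min_hours_worked_salary_df.loc['>50K'] / num_min_workers * 100).round(decimals=1).iloc[0])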
print(rich_percentage)

10.0
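
Steps 7 and 8 can also be combined by filtering the rows first and then taking the mean of a boolean mask. The sketch below runs on hypothetical toy rows (`toy`), so its result is unrelated to the figure above; `min_hours`, `min_workers`, and `rich_pct` are illustrative names.

```python
import pandas as pd

# Hypothetical toy rows, not the real dataset.
toy = pd.DataFrame({
    "hours-per-week": [1, 1, 1, 1, 40, 40, 60],
    "salary": ["<=50K", "<=50K", "<=50K", ">50K", ">50K", "<=50K", ">50K"],
})

min_hours = toy["hours-per-week"].min()

# Rows for people who work exactly the minimum number of hours.
min_workers = toy[toy["hours-per-week"] == min_hours]

# Fraction of those rows in the >50K bracket, as a percentage.
rich_pct = round((min_workers["salary"] == ">50K").mean() * 100, 1)

print(rich_pct)
```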


## Step 9: Highest Earning Country Percentage

Which country has the highest percentage of people who earn >50K, and what is that percentage?

In [None]:
# Total number of people per country
country_counts_df = df.groupby('native-country')['salary'].count().to_frame('counts').reset_index()
# Number of people earning >50K per country; the column is named explicitly
# because the default name differs between pandas versions.
country_rich_counts_df = df.groupby('native-country')['salary'].value_counts().to_frame('rich-counts')
country_rich_counts_df = country_rich_counts_df.loc[(slice(None), '>50K'), :]
country_rich_counts_df = country_rich_counts_df.reset_index()[['native-country', 'rich-counts']]
# Inner merge: countries without any >50K earners drop out, which cannot change the maximum
country_counts_df = country_counts_df.merge(country_rich_counts_df, on='native-country')
country_counts_df['rich-percent'] = (country_counts_df['rich-counts'] / country_counts_df['counts'] * 100)
country_counts_df['rich-percent'] = country_counts_df['rich-percent'].round(decimals=1)
top_country = country_counts_df.sort_values('rich-percent', ascending=False).head(1)

highest_earning_country = top_country.iloc[0]['native-country']
highest_earning_country_percentage = top_country.iloc[0]['rich-percent']
print(highest_earning_country_percentage)

41.9
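
A more compact route to the same kind of result is to group a boolean mask by country and take its mean, which avoids the merge. This is a sketch on hypothetical toy rows with placeholder country labels `A`, `B`, and `C`, not the notebook's own method; `rich_share`, `rich_pct`, `top_country_label`, and `top_pct` are illustrative names.

```python
import pandas as pd

# Hypothetical toy rows with placeholder country labels, not the real dataset.
toy = pd.DataFrame({
    "native-country": ["A", "A", "A", "A", "B", "B", "C"],
    "salary": [">50K", "<=50K", "<=50K", "<=50K", ">50K", ">50K", "<=50K"],
})

# Fraction of >50K earners per country; countries with none get 0 instead of disappearing.
rich_share = (toy["salary"] == ">50K").groupby(toy["native-country"]).mean()
rich_pct = (rich_share * 100).round(1)

top_country_label = rich_pct.idxmax()  # index label of the largest value
top_pct = rich_pct.max()

print(top_country_label, top_pct)
```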


## Step 10: Top India Occupation

Identify the most popular occupation for those who earn >50K in India.

In [None]:
# Filter to high earners from India; both conditions are evaluated on the same frame
india_df = df.loc[(df['native-country'] == 'India') & (df['salary'] == '>50K')]

# value_counts() sorts in descending order, so the first index label is the most common occupation
top_IN_occupation = india_df['occupation'].value_counts().index[0]
print(top_IN_occupation)

Prof-specialty
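
"Most popular" is ambiguous when two occupations are tied, because taking the first row after sorting reports only one of them. The sketch below shows how `Series.mode` exposes such ties, using a hypothetical list of occupations (`toy_occupations`) rather than the real India subset.

```python
import pandas as pd

# Hypothetical occupations of high earners, not the real India subset.
toy_occupations = pd.Series([
    "Prof-specialty", "Exec-managerial", "Prof-specialty",
    "Exec-managerial", "Sales",
])

# mode() returns every value that shares the highest count, in sorted order.
most_common = toy_occupations.mode()

print(list(most_common))
```

If `mode()` returns a single value, it matches the first label of `value_counts()`.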
