# Demographic Data Analysis

This notebook analyzes demographic data and reports several summary metrics: the number of people of each race, the average age of men, the share of people with a Bachelor's degree, and how the share of people earning more than 50K varies with education, working hours, and native country.

## 1. Importing Libraries

We start by importing the necessary libraries.


In [1]:
import pandas as pd

## 2. Loading the Data

Next, we load the data from the CSV file into a pandas DataFrame.


In [2]:
# Read data from file
df = pd.read_csv("adult.data.csv")
df.head()


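|   | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | salary |
|---|-----|-----------|--------|-----------|---------------|----------------|------------|--------------|------|-----|--------------|--------------|----------------|----------------|--------|
| 0 | 39 | State-gov | 77516 | Bachelors | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | <=50K |
| 1 | 50 | Self-emp-not-inc | 83311 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States | <=50K |
| 2 | 38 | Private | 215646 | HS-grad | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 3 | 53 | Private | 234721 | 11th | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | <=50K |
| 4 | 28 | Private | 338409 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba | <=50K |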


## 3. Race Distribution

We will calculate how many individuals of each race are represented in the dataset.


In [3]:
# Calculate the number of each race
races = df['race'].unique()
counts = [df[df['race'] == c]['race'].count() for c in races]
race_count = pd.Series(data=counts, index=races)

# Display the race count
race_count


White                 27816
Black                  3124
Asian-Pac-Islander     1039
Amer-Indian-Eskimo      311
Other                   271
dtype: int64
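
The same tally can usually be obtained in one step with `Series.value_counts()`. The sketch below runs on a small made-up DataFrame (`toy` and its values are illustrative assumptions, not rows from `adult.data.csv`) so that it is self-contained. Note that `value_counts()` sorts by count in descending order, whereas the loop above keeps the order in which each value first appears.

```python
import pandas as pd

# Toy stand-in for the 'race' column; the values are illustrative only.
toy = pd.DataFrame({"race": ["White", "Black", "White", "Other", "White", "Black"]})

# value_counts() tallies each distinct value and sorts by count, descending.
toy_race_count = toy["race"].value_counts()

print(toy_race_count)  # White 3, Black 2, Other 1
```

On the real data this would presumably be `df['race'].value_counts()`.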

## 4. Average Age of Men

We calculate the average age of men in the dataset.


In [4]:
# Calculate the average age of men
average_age_men = round(df[df['sex'] == "Male"]['age'].mean(), 1)

# Display the average age of men
average_age_men


39.4

## 5. Percentage of People with a Bachelor's Degree

We calculate the percentage of people who have a Bachelor's degree.


In [5]:
# Calculate the percentage of people with a Bachelor's degree
percentage_bachelors = round((df['education'] == "Bachelors").sum() / len(df) * 100, 1)

# Display the percentage of people with a Bachelor's degree
percentage_bachelors


16.4
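
A percentage like this can also be read as the mean of a boolean mask, because pandas treats `True` as 1 and `False` as 0 when averaging. The following is a minimal sketch on a made-up frame (`toy`, `is_bachelors`, and `toy_percentage` are hypothetical names, and the rows are not from the dataset).

```python
import pandas as pd

# Toy stand-in for the 'education' column; the values are illustrative only.
toy = pd.DataFrame({"education": ["Bachelors", "HS-grad", "Masters", "Bachelors", "11th"]})

# Boolean mask: True where the row has a Bachelor's degree.
is_bachelors = toy["education"] == "Bachelors"

# The mean of a boolean Series is the fraction of True values.
toy_percentage = round(is_bachelors.mean() * 100, 1)

print(toy_percentage)  # 2 of 5 rows match, so 40.0
```

The mean is taken over every row in the mask, so missing values in the column would count as non-matches rather than being excluded.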

## 6. Income Analysis Based on Education Level

We analyze the percentage of people with advanced education (`Bachelors`, `Masters`, or `Doctorate`) who earn more than 50K, compared to those without advanced education.


In [6]:
# Flag people with advanced education (Bachelors, Masters, or Doctorate)
advanced = df['education'].isin(['Bachelors', 'Masters', 'Doctorate'])

# Flag people who earn >50K
high_income = df['salary'] == '>50K'

# Percentage earning >50K with and without advanced education
higher_education_rich = round(high_income[advanced].mean() * 100, 1)
lower_education_rich = round(high_income[~advanced].mean() * 100, 1)

# Display the results
higher_education_rich, lower_education_rich


(46.5, 17.4)
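
One way to see both groups side by side is a row-normalized cross-tabulation. The sketch below is an illustration on made-up rows, not the notebook's method; `toy`, `group`, and the labels `"advanced"` and `"other"` are assumptions introduced here.

```python
import pandas as pd

# Toy stand-in for the 'education' and 'salary' columns; values are illustrative only.
toy = pd.DataFrame({
    "education": ["Bachelors", "Masters", "HS-grad", "HS-grad", "Doctorate", "11th"],
    "salary": [">50K", "<=50K", "<=50K", ">50K", ">50K", "<=50K"],
})

# Label each row by education group (the labels are hypothetical names).
is_advanced = toy["education"].isin(["Bachelors", "Masters", "Doctorate"])
group = is_advanced.map({True: "advanced", False: "other"}).rename("education_group")

# normalize="index" turns each row of counts into fractions that sum to 1.
table = pd.crosstab(group, toy["salary"], normalize="index") * 100

print(table.round(1))
```

Each row of `table` sums to 100, so the `>50K` column gives the share of high earners within each education group.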

## 7. Minimum Working Hours

We find the minimum number of hours worked per week, then compute the percentage of people working that minimum who earn >50K.


In [7]:
# Calculate the minimum number of hours a person works per week
min_work_hours = df['hours-per-week'].min()

# Calculate the percentage of minimum-hour workers who earn >50K
min_workers_salary = df[df['hours-per-week'] == min_work_hours]['salary']
rich_percentage = round((min_workers_salary[min_workers_salary == ">50K"].count() / min_workers_salary.count()) * 100, 1)

# Display the results
min_work_hours, rich_percentage


(1, 10.0)

## 8. Country with Highest Percentage of High Earners

We determine which native country has the highest percentage of people earning >50K, and what that percentage is.


In [8]:
# Find the native country with the highest percentage of people earning >50K
countries = df['native-country'].unique()
max_percentage = 0
selected = ""
for country in countries:
    country_data = df[df['native-country'] == country]['salary']
    all_workers = country_data.count()
    highest_workers = country_data[country_data == ">50K"].count()
    percentage = highest_workers / all_workers * 100
    # Keep the country with the largest share seen so far
    if percentage > max_percentage:
        selected = country
        max_percentage = percentage

highest_earning_country_percentage = round(max_percentage, 1)
highest_earning_country = selected

# Display the results
highest_earning_country, highest_earning_country_percentage


('Iran', 41.9)
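
The per-country loop can also be expressed as a grouped mean of a boolean mask. This is a minimal sketch on made-up rows (the country labels `"A"`, `"B"`, `"C"` and the names `toy`, `share`, `top_country`, and `top_share` are hypothetical), intended only to show the shape of the computation.

```python
import pandas as pd

# Toy stand-in for the 'native-country' and 'salary' columns; values are illustrative only.
toy = pd.DataFrame({
    "native-country": ["A", "A", "B", "B", "B", "C"],
    "salary": [">50K", "<=50K", ">50K", ">50K", "<=50K", "<=50K"],
})

# Share of rows earning >50K within each country, as a percentage.
share = (toy["salary"] == ">50K").groupby(toy["native-country"]).mean() * 100

# idxmax() returns the label of the largest value; max() returns the value itself.
top_country = share.idxmax()
top_share = round(share.max(), 1)

print(top_country, top_share)  # B 66.7
```

Two caveats apply to either approach. A country with very few rows can top the ranking on the strength of a handful of people, so it may be worth checking the group sizes alongside the percentages. Ties may also resolve differently: `groupby` sorts the group keys, while the loop visits countries in order of first appearance.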

## 9. Most Popular Occupation for High Earners in India

We identify the most popular occupation for those who earn >50K in India.


In [9]:
# Identify the most popular occupation for those who earn >50K in India
indian_occupations = df[(df['native-country'] == "India") & (df['salary'] == ">50K")]['occupation']
occupations = indian_occupations.value_counts()

# Display the top occupation
top_IN_occupation = occupations.idxmax()
top_IN_occupation


'Prof-specialty'

## 10. Summary of Results

The final cell collects all the calculated metrics in a single dictionary.


In [10]:
# Summary of results
{
    'race_count': race_count,
    'average_age_men': average_age_men,
    'percentage_bachelors': percentage_bachelors,
    'higher_education_rich': higher_education_rich,
    'lower_education_rich': lower_education_rich,
    'min_work_hours': min_work_hours,
    'rich_percentage': rich_percentage,
    'highest_earning_country': highest_earning_country,
    'highest_earning_country_percentage': highest_earning_country_percentage,
    'top_IN_occupation': top_IN_occupation
}


{'race_count': White                 27816
 Black                  3124
 Asian-Pac-Islander     1039
 Amer-Indian-Eskimo      311
 Other                   271
 dtype: int64,
 'average_age_men': 39.4,
 'percentage_bachelors': 16.4,
 'higher_education_rich': 46.5,
 'lower_education_rich': 17.4,
 'min_work_hours': 1,
 'rich_percentage': 10.0,
 'highest_earning_country': 'Iran',
 'highest_earning_country_percentage': 41.9,
 'top_IN_occupation': 'Prof-specialty'}