In [4]:
import pandas as pd

def calculate_demographic_data(print_data=True):
    # === Data Input ===
    df = pd.read_csv('adult.data.csv')  # Load the dataset into a DataFrame

    # === Race Count ===
    race_count = df['race'].value_counts()  # Count occurrences of each race (integer counts need no rounding)

    # === Average Age of Men ===
    average_age_men = round(df[df['sex'] == 'Male']['age'].mean(), 1)  # Calculate average age of men

    # === Percentage of People with Bachelor's Degree ===
    percentage_bachelors = round((df['education'] == 'Bachelors').mean()*100, 1)  # Calculate percentage with Bachelor's degree

    # === Higher vs Lower Education and Salary >50K ===
    has_higher_education = df['education'].isin(['Bachelors', 'Masters', 'Doctorate'])  # True for Bachelors, Masters, or Doctorate
    higher_education = df[has_higher_education]  # Rows with higher education
    lower_education = df[~has_higher_education]  # All remaining rows
    
    higher_education_rich = round((higher_education['salary'] == '>50K').mean()*100, 1)  # Percentage with higher education earning >50K
    lower_education_rich = round((lower_education['salary'] == '>50K').mean()*100, 1)  # Percentage with lower education earning >50K

    # === Minimum Work Hours ===
    min_work_hours = df['hours-per-week'].min()  # Find minimum work hours

    # === Percentage of People Earning >50K with Minimum Work Hours ===
    min_workers = df[df['hours-per-week'] == min_work_hours]  # Rows for people working the minimum hours
    rich_percentage = round((min_workers['salary'] == '>50K').mean()*100, 1)  # Percentage of those rows with salary >50K

    # === Country with Highest Percentage of People Earning >50K ===
    country_counts = df['native-country'].value_counts()  # Count of people by country
    country_with_rich_earnings = df[df['salary'] == '>50K']['native-country'].value_counts()  # People earning >50K by country
    earning_percentage = round((country_with_rich_earnings / country_counts)*100, 1)  # Calculate percentage of rich people in each country
    highest_earning_country = earning_percentage.idxmax()  # Country with highest percentage of rich
    highest_earning_country_percentage = earning_percentage.max()  # Highest percentage of rich people in that country

    # === Most Popular Occupation for Rich People in India ===
    rich_earnings_IN_dataframe = df[(df['salary'] == '>50K') & (df['native-country'] == 'India')]  # Filter for rich people in India
    occupation_counts_IN = rich_earnings_IN_dataframe['occupation'].value_counts()  # Count occupations of rich people in India
    top_IN_occupation = occupation_counts_IN.idxmax()  # Identify the most popular occupation

    # === Data Output ===
    if print_data:
        print("Number of each race:\n", race_count) 
        print("Average age of men:", average_age_men)
        print(f"Percentage with Bachelors degrees: {percentage_bachelors}%")
        print(f"Percentage with higher education that earn >50K: {higher_education_rich}%")
        print(f"Percentage without higher education that earn >50K: {lower_education_rich}%")
        print(f"Min work time: {min_work_hours} hours/week")
        print(f"Percentage of rich among those who work fewest hours: {rich_percentage}%")
        print("Country with highest percentage of rich:", highest_earning_country)
        print(f"Highest percentage of rich people in country: {highest_earning_country_percentage}%")
        print("Top occupations in India:", top_IN_occupation)

    return {
        'race_count': race_count,
        'average_age_men': average_age_men,
        'percentage_bachelors': percentage_bachelors,
        'higher_education_rich': higher_education_rich,
        'lower_education_rich': lower_education_rich,
        'min_work_hours': min_work_hours,
        'rich_percentage': rich_percentage,
        'highest_earning_country': highest_earning_country,
        'highest_earning_country_percentage': highest_earning_country_percentage,
        'top_IN_occupation': top_IN_occupation
    }
calculate_demographic_data();

Number of each race:
 race
White                 27816
Black                  3124
Asian-Pac-Islander     1039
Amer-Indian-Eskimo      311
Other                   271
Name: count, dtype: int64
Average age of men: 39.4
Percentage with Bachelors degrees: 16.4%
Percentage with higher education that earn >50K: 46.5%
Percentage without higher education that earn >50K: 17.4%
Min work time: 1 hours/week
Percentage of rich among those who work fewest hours: 10.0%
Country with highest percentage of rich: Iran
Highest percentage of rich people in country: 41.9%
Top occupations in India: Prof-specialty


## Overview
This code defines `calculate_demographic_data`, which loads `adult.data.csv` and computes summary statistics on race, age, education, salary, work hours, native country, and occupation. The results show how the share of people earning more than 50K varies with education level, hours worked per week, and country.

## Data Source
- **Dataset**: `adult.data.csv`
- **Relevant Columns**:
  - `age`: Age of the individual.
  - `sex`: Gender of the individual.
  - `race`: Race of the individual.
  - `education`: Education level of the individual.
  - `salary`: Salary classification (e.g., `>50K` or `<=50K`).
  - `hours-per-week`: Number of hours worked per week.
  - `native-country`: Native country of the individual.
  - `occupation`: Occupation of the individual.
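
Before running the analysis, it can help to confirm that the file actually contains the columns that `calculate_demographic_data` reads. The following is a minimal sketch, not part of the original code: `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names, and the toy DataFrame stands in for the real file.

```python
import pandas as pd

# Columns read by calculate_demographic_data (assumed from the function body).
REQUIRED_COLUMNS = [
    "age", "sex", "race", "education", "occupation",
    "salary", "hours-per-week", "native-country",
]

def missing_columns(df, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from df (hypothetical helper)."""
    return [name for name in required if name not in df.columns]

# Toy frame standing in for pd.read_csv('adult.data.csv'); it lacks most columns on purpose.
toy = pd.DataFrame({
    "age": [39, 50],
    "sex": ["Male", "Female"],
    "race": ["White", "Black"],
})
print(missing_columns(toy))
```

An empty list means every expected column is present.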

## Objective
- To analyze the demographic data, focusing on:
  - Race distribution.
  - Average age of men.
  - Percentage of individuals with a Bachelor's degree.
  - Salary statistics related to education level.
  - Work hours statistics and how they relate to salary.
  - The country with the highest percentage of individuals earning more than 50K.
  - The most popular occupation for individuals earning more than 50K in India.

## Methodology
1. **Data Loading**: Load the dataset from a CSV file using `pandas`.
2. **Race Count**: Count how many individuals from each race are represented in the dataset.
3. **Average Age of Men**: Calculate the mean age of rows where `sex` is `Male`, rounded to one decimal place.
4. **Percentage with Bachelors**: Calculate the percentage of people with a Bachelor's degree.
5. **Salary Analysis by Education**:
   - Among people with higher education (`Bachelors`, `Masters`, or `Doctorate`), calculate the percentage earning more than 50K.
   - Among everyone else (any other `education` value), calculate the percentage earning more than 50K.
6. **Work Hours Analysis**:
   - Identify the minimum number of hours worked per week.
   - Among individuals who work that minimum number of hours, calculate the percentage who earn more than 50K.
7. **Country Analysis**:
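   - For each country, divide the number of individuals earning more than 50K by the total number of individuals from that country, then report the country with the highest percentage and the percentage itself.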
8. **Occupation in India**:
   - Find the most popular occupation for those who earn more than 50K in India.
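
Steps 4 through 7 all rely on the same idiom: the mean of a boolean Series is the fraction of `True` values. The sketch below illustrates this on made-up rows (not taken from `adult.data.csv`). For the per-country step it uses `groupby(...).mean()` as an alternative to dividing two `value_counts()` results, which is what the function above does; the variable names are illustrative only.

```python
import pandas as pd

# Toy rows for illustration; these are not real records from the dataset.
toy = pd.DataFrame({
    "education": ["Bachelors", "HS-grad", "Masters", "HS-grad", "Doctorate", "Some-college"],
    "salary": [">50K", "<=50K", ">50K", "<=50K", "<=50K", ">50K"],
    "native-country": ["India", "India", "Iran", "Iran", "Iran", "India"],
})

is_rich = toy["salary"] == ">50K"
has_higher_ed = toy["education"].isin(["Bachelors", "Masters", "Doctorate"])

# The mean of a boolean Series is the fraction of True values.
higher_ed_rich = round(is_rich[has_higher_ed].mean() * 100, 1)
lower_ed_rich = round(is_rich[~has_higher_ed].mean() * 100, 1)

# Alternative to dividing two value_counts() results: average the flag per country.
rich_share_by_country = (is_rich.groupby(toy["native-country"]).mean() * 100).round(1)
top_country = rich_share_by_country.idxmax()

print(higher_ed_rich, lower_ed_rich, top_country)
```

One difference worth noting: with `groupby`, a country with no high earners gets 0.0, whereas dividing two `value_counts()` results yields `NaN` for that country. `idxmax` skips `NaN` by default, so both approaches should pick the same top country.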

## Output
- **Printed Results**: The function prints the following information:
  - Race distribution.
  - Average age of men.
  - Percentage of people with a Bachelor's degree.
  - Percentage earning more than 50K among individuals with higher education, and separately among those without.
  - Minimum work hours per week.
  - Percentage of those working the fewest hours who earn more than 50K.
  - Country with the highest percentage of individuals earning more than 50K, and that percentage.
  - The top occupation for individuals earning more than 50K in India.

- **Return Value**: The function returns a dictionary containing the calculated values for further use or analysis.
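
One possible use of the returned dictionary is to compare it against known values. The sketch below assumes a hypothetical helper, `mismatches`, and uses the scalar values from the printed output above as the expected results. The `results` dictionary here is a stand-in for `calculate_demographic_data(print_data=False)` with one value deliberately altered, so the example runs without the CSV file.

```python
# Scalar values copied from the printed output; race_count is a Series and is left out.
expected = {
    "average_age_men": 39.4,
    "percentage_bachelors": 16.4,
    "higher_education_rich": 46.5,
    "lower_education_rich": 17.4,
    "min_work_hours": 1,
    "rich_percentage": 10.0,
    "highest_earning_country": "Iran",
    "highest_earning_country_percentage": 41.9,
    "top_IN_occupation": "Prof-specialty",
}

def mismatches(results, expected):
    """Return {key: (actual, expected)} for each expected key whose value differs (hypothetical helper)."""
    return {
        key: (results.get(key), want)
        for key, want in expected.items()
        if results.get(key) != want
    }

# Stand-in for the function's return value, with min_work_hours changed on purpose.
results = dict(expected, min_work_hours=2)
print(mismatches(results, expected))
```

An empty dictionary means every compared value matched.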
