# 1. Demographic Data Analyzer

### 1.1 About
This is my first project working with sample data, completed as part of the [freeCodeCamp Data Analysis with Python Certification](https://www.freecodecamp.org/learn/data-analysis-with-python/).
The project analyzes demographic data with Pandas. The dataset provided was extracted from the 1994 Census database.

### 1.2 Questions Asked
1. How many people of each race are represented in this dataset? This should be a Pandas series with race names as the index labels. (race column)
2. What is the average age of men?
3. What is the percentage of people who have a Bachelor's degree?
4. What percentage of people with advanced education (Bachelors, Masters, or Doctorate) make more than 50K?
5. What percentage of people without advanced education make more than 50K?
6. What is the minimum number of hours a person works per week?
7. What percentage of the people who work the minimum number of hours per week have a salary of more than 50K?
8. What country has the highest percentage of people that earn >50K and what is that percentage?
9. Identify the most popular occupation for those who earn >50K in India.
    
*Round all decimals to the nearest tenth.*

In [12]:
import pandas as pd
df = pd.read_csv('adult.data.csv')

In [13]:
# Visualization of the first 5 rows of the data
df.head(5)

| Unnamed: 0 | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | State-gov | 77516 | Bachelors | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | <=50K |
| 1 | 50 | Self-emp-not-inc | 83311 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States | <=50K |
| 2 | 38 | Private | 215646 | HS-grad | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 3 | 53 | Private | 234721 | 11th | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | <=50K |
| 4 | 28 | Private | 338409 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba | <=50K |
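
The `Unnamed: 0` column in the preview suggests that the CSV stores a saved row index as its first column. The sketch below uses a small inline CSV (made-up sample rows, not the real file) to show how `index_col=0` would read that column as the index rather than as data. Whether this is desirable for the actual `adult.data.csv` is an assumption on my part.

```python
import io
import pandas as pd

# Tiny stand-in for adult.data.csv (illustrative rows only). The empty
# leading header cell is what pandas names "Unnamed: 0" by default.
sample_csv = io.StringIO(
    ",age,sex,salary\n"
    "0,39,Male,<=50K\n"
    "1,50,Male,<=50K\n"
    "2,28,Female,>50K\n"
)

# index_col=0 treats the first column as the row index instead of data.
sample_df = pd.read_csv(sample_csv, index_col=0)
print(sample_df.columns.tolist())
```

With this option the data columns no longer include the saved index, which keeps it out of later aggregations.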


**1. How many people of each race are represented in this dataset? This should be a Pandas series with race names as the index labels. (race column)**

In [14]:
race_count = df['race'].value_counts()
print ("Number of each race:\n ", race_count)

Number of each race:
  race
White                 27816
Black                  3124
Asian-Pac-Islander     1039
Amer-Indian-Eskimo      311
Other                   271
Name: count, dtype: int64


**2. What is the average age of men?**

In [15]:
average_age_men = df[df['sex'] == 'Male']['age'].mean().round(1)
print ("Average age of men: ", average_age_men)

Average age of men:  39.4


**3. What is the percentage of people who have a Bachelor's degree?**

In [16]:
percentage_bachelors = round((len(df[df.education == 'Bachelors']) / len(df.education) * 100), 1)
print ("Percentage with Bachelors degrees: ", percentage_bachelors)

Percentage with Bachelors degrees:  16.4
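
An equivalent way to get this kind of percentage is to take the mean of a boolean mask, since the mean of `True`/`False` values is the fraction that are `True`. The sketch below uses a toy DataFrame with made-up values, so the number it prints is not the dataset's result.

```python
import pandas as pd

# Toy data standing in for the census DataFrame (values are made up).
toy = pd.DataFrame(
    {"education": ["Bachelors", "HS-grad", "Masters", "Bachelors", "11th"]}
)

# Comparing a column to a value gives a boolean Series; its mean is the
# fraction of True values, so multiplying by 100 yields a percentage.
pct_bachelors = round((toy["education"] == "Bachelors").mean() * 100, 1)
print(pct_bachelors)
```

This avoids building a filtered copy of the DataFrame just to count its rows.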


**4. What percentage of people with advanced education (Bachelors, Masters, or Doctorate) make more than 50K?**

In [17]:
higher_education = df[(df.education == 'Bachelors') | (df.education == 'Masters')| (df.education == 'Doctorate')]
higher_education_rich = round((len(higher_education[higher_education['salary'] == '>50K']) / len(higher_education) * 100), 1)
print ("Percentage with higher education that earn >50K: ", higher_education_rich)

Percentage with higher education that earn >50K:  46.5


**5. What percentage of people without advanced education make more than 50K?**

In [18]:
lower_education = df[(df.education != 'Bachelors') & (df.education != 'Masters') & (df.education != 'Doctorate')] 
lower_education_rich = round((len(lower_education[lower_education['salary'] == '>50K']) / len(lower_education) * 100), 1)
print ("Percentage without higher education that earn >50K: ", lower_education_rich)

Percentage without higher education that earn >50K:  17.4
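
Questions 4 and 5 split the data on the same condition, so one possible simplification is to build a single mask with `isin` and negate it for the complementary group. This is only a sketch on a toy DataFrame with invented rows; the names `advanced` and `high_pay` are illustrative and do not come from the notebook.

```python
import pandas as pd

# Toy data standing in for the census DataFrame (values are made up).
toy = pd.DataFrame({
    "education": ["Bachelors", "Masters", "Doctorate", "HS-grad",
                  "HS-grad", "11th", "Bachelors", "Some-college"],
    "salary": [">50K", ">50K", "<=50K", "<=50K",
               ">50K", "<=50K", "<=50K", "<=50K"],
})

# One mask for "advanced education", one for "earns more than 50K".
advanced = toy["education"].isin(["Bachelors", "Masters", "Doctorate"])
high_pay = toy["salary"] == ">50K"

# Share of high earners within each group; ~advanced is the complement.
higher_rich = round(high_pay[advanced].mean() * 100, 1)
lower_rich = round(high_pay[~advanced].mean() * 100, 1)
print(higher_rich, lower_rich)
```

Deriving both groups from one mask guarantees that they are exact complements, which is harder to check by eye when the conditions are written out twice.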


**6. What is the minimum number of hours a person works per week?**


In [19]:
min_work_hours = df['hours-per-week'].min()
print ("Min work time: ", min_work_hours, "hours/week")

Min work time:  1 hours/week


**7. What percentage of the people who work the minimum number of hours per week have a salary of more than 50K?**

In [20]:
workers = df[df['hours-per-week'] == min_work_hours] 
rich_percentage = round(len(workers[workers['salary'] == '>50K']) / len(workers) * 100, 1)
print("Percentage of rich among those who work fewest hours: ", rich_percentage)

Percentage of rich among those who work fewest hours:  10.0


**8. What country has the highest percentage of people that earn >50K and what is that percentage?**

In [21]:
grouped_percent = df.groupby('native-country')['salary'].apply(lambda x: (x == '>50K').mean() * 100).reset_index()
highest_percent = grouped_percent[grouped_percent['salary'] == grouped_percent['salary'].max()]
highest_earning_country = highest_percent.iloc[0]['native-country']
highest_earning_country_percentage = round(highest_percent.iloc[0]['salary'], 1)
print ("Country with the highest percentage of rich: ", highest_earning_country, highest_earning_country_percentage)

Country with the highest percentage of rich:  Iran 41.9
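
The same per-country lookup can be sketched more compactly by grouping a boolean mask and reading the winner with `idxmax`. The toy data below uses placeholder country labels and made-up rows, so it illustrates the technique only.

```python
import pandas as pd

# Toy data standing in for the census DataFrame (values are made up).
toy = pd.DataFrame({
    "native-country": ["A", "A", "B", "B", "B", "C"],
    "salary": [">50K", "<=50K", ">50K", ">50K", "<=50K", "<=50K"],
})

# Percentage of >50K earners per country, as a Series indexed by country.
rich_share = (toy["salary"] == ">50K").groupby(toy["native-country"]).mean() * 100

# idxmax returns the index label (the country) of the largest value.
top_country = rich_share.idxmax()
top_pct = round(rich_share.max(), 1)
print(top_country, top_pct)
```

Because this ranks percentages, a country with very few rows can come out on top, so it may be worth checking the group sizes alongside the result.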


**9. Identify the most popular occupation for those who earn >50K in India.**

In [22]:
india_50k = df[(df['native-country'] == 'India') & (df['salary'] == '>50K')]
top_IN_occupation = india_50k['occupation'].mode()[0]
print ("Top occupations in India: ", top_IN_occupation)

Top occupations in India:  Prof-specialty
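
`mode()[0]` returns only the first of any tied values, so it can hide a close race between occupations. As a hedged alternative, `value_counts()` shows the full ranking. The rows below are invented for illustration and are not the dataset's counts.

```python
import pandas as pd

# Toy data standing in for the census DataFrame (values are made up).
toy = pd.DataFrame({
    "native-country": ["India", "India", "India", "India", "Cuba"],
    "salary": [">50K", ">50K", ">50K", "<=50K", ">50K"],
    "occupation": ["Prof-specialty", "Exec-managerial", "Prof-specialty",
                   "Sales", "Sales"],
})

# Keep only high earners whose native country is India.
subset = toy[(toy["native-country"] == "India") & (toy["salary"] == ">50K")]

# value_counts sorts occupations by frequency, most common first.
occupation_counts = subset["occupation"].value_counts()
top_occupation = occupation_counts.idxmax()
print(occupation_counts)
```

Printing the counts makes it visible how far ahead the top occupation is, which a single mode value does not show.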


### 1.3 Conclusions
The results below summarize how demographic and income-related factors are distributed in the dataset. They can serve as a starting point for further analysis.

**Representation of Races:**
- White: 27,816 individuals
- Black: 3,124 individuals
- Asian-Pac-Islander: 1,039 individuals
- Amer-Indian-Eskimo: 311 individuals
- Other: 271 individuals

These figures provide a clear breakdown of racial representation within the dataset.


**Average Age of Men:** The average age of men in the dataset is approximately 39.4 years.

**Percentage with Bachelor's Degree:** Around 16.4% of individuals in the dataset hold a Bachelor's degree.

**Percentage of High Earners with Advanced Education:** Among individuals with advanced education (Bachelor's, Master's, or Doctorate), approximately 46.5% earn more than $50K annually.

**Percentage of High Earners without Advanced Education:** Conversely, among individuals without advanced education, approximately 17.4% earn more than $50K annually.

**Minimum Working Hours per Week:** The minimum number of hours worked per week is 1 hour.

**Percentage of High Earners among Minimum-Hours Workers:** Surprisingly, 10% of individuals working the minimum number of hours per week (1 hour) earn above $50K, which may point to outliers or to high earners who work part-time.

**Country with Highest Proportion of High Earners:** The native country with the highest percentage of individuals earning above $50K is Iran, with approximately 41.9% of people from that country falling into this category.

**Most Popular Occupation among High Earners from India:** Among individuals from India who earn above $50K, the most common occupation is 'Prof-specialty', which suggests that professional roles are prominent in this group.
