# Make Sense of Census

# Problem Statement
Hello!

You have been hired by 'CACT' (Census Analysis and Collection Team) to put your NumPy programming skills to work. Your main task for today involves census record management and data analysis.

About the Dataset
A snapshot of the data you will be working with: __*Census_data*__

![Sample1](images/file.png)
 

#### The dataset has details of 1000 people with the following 8 features

![Sample2](images/Feature_Description.png)

### Why solve this project
After completing this project, you will have a better grip on working with NumPy. In this project, you will apply the following concepts:

- Array Appending
- Array Slicing
- Array Filtering
- Array Aggregation

## __*Step 1*__ : Data Reading
In this first task, we will load the data to a numpy array and add a new record to it.


In [1]:
import numpy as np
path = './file.csv'

#New record
new_record=[[50,  9,  4,  1,  0,  0, 40,  0]]

#Loading data file and saving it into a new numpy array 
data = np.genfromtxt(path, delimiter=",", skip_header=1)
print(f"The Shape of the Data is :{data.shape}")

#Concatenating the new record to the existing numpy array
census=np.concatenate((data, new_record),axis = 0)
print(f"The Shape of the Census Data is : {census.shape}")

The Shape of the Data is :(1000, 8)
The Shape of the Census Data is : (1001, 8)
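
The loading and appending step can be tried without the data file. The sketch below is only an illustration: the CSV text, its column names, and the values are made up, and `io.StringIO` stands in for `file.csv`.

```python
import io
import numpy as np

# Hypothetical three-row CSV standing in for file.csv (values are made up)
csv_text = "age,education,hours\n39,13,40\n50,13,13\n38,9,40\n"

# skip_header=1 drops the header row so the remaining fields parse as floats
sample = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)
print(sample.shape)

# The new record is a nested list, so it has shape (1, 3)
# and can be stacked below the existing rows with axis=0
new_row = [[28, 10, 35]]
extended = np.concatenate((sample, new_row), axis=0)
print(extended.shape)
```

The record is written with double brackets because `np.concatenate` needs both inputs to have the same number of dimensions; a flat list would be one-dimensional and could not be joined to the two-dimensional array.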


## __*Step 2*__ : Young Country? Old Country?
We often associate the potential of a country with the age distribution of the people residing there. Let's do a simple analysis of the age distribution.

In [2]:
# Subsetting the array to include only 'Age' column
age = census[:,0]
print(f"Age : {age}")

# Finding the max value of age
max_age = age.max()
print(f"Maximum value of age is : {max_age}")

# Find the min value of age
min_age = age.min()
print(f"Minimum value of age is : {min_age}")

# Find the mean of age
age_mean = age.mean()
print(f"Mean of Age is : {age_mean}")

# Find the standard deviation of age
age_std = age.std()
print(f"Standard Deviation of Age is : {age_std}")

Age : [39. 50. 38. ... 40. 39. 50.]
Maximum value of age is : 90.0
Minimum value of age is : 17.0
Mean of Age is : 38.06293706293706
Standard Deviation of Age is : 13.341478176165857
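
One detail worth knowing about the last statistic: `ndarray.std()` uses `ddof=0` by default, so the value above is the population standard deviation. The sketch below uses made-up ages to show how the result changes with `ddof=1` (the sample standard deviation); whether that variant is wanted here is an assumption, not something the project states.

```python
import numpy as np

# Made-up ages, used only to show the effect of ddof
ages = np.array([20.0, 30.0, 40.0, 50.0])

population_std = ages.std()      # default ddof=0: divide by N
sample_std = ages.std(ddof=1)    # ddof=1: divide by N - 1

print(f"Mean: {ages.mean()}")
print(f"Population std (ddof=0): {population_std:.4f}")
print(f"Sample std (ddof=1): {sample_std:.4f}")
```

With about a thousand records the two values differ only slightly, but the gap grows as the array gets smaller.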


## __*Step 3*__ : Minority Report
The constitution of the country tries its best to ensure that people of all races are able to live harmoniously. Let's check the country's race distribution to identify the minorities so that the government can help them.

In [3]:
#Creating new subsets based on 'race'
race_0=census[census[:,2]==0]
race_1=census[census[:,2]==1]
race_2=census[census[:,2]==2]
race_3=census[census[:,2]==3]
race_4=census[census[:,2]==4]


#Finding the length of the above created subsets
len_0=len(race_0)
len_1=len(race_1)
len_2=len(race_2)
len_3=len(race_3)
len_4=len(race_4)

#Printing the length of the above created subsets
print('Race_0: ', len_0)
print('Race_1: ', len_1)
print('Race_2: ', len_2)
print('Race_3: ', len_3)
print('Race_4: ', len_4)

#Storing the different race lengths with appropriate indexes
race_list=[len_0, len_1,len_2, len_3, len_4]

#Storing the race with minimum length into a variable 
minority_race=race_list.index(min(race_list))

Race_0:  10
Race_1:  27
Race_2:  110
Race_3:  6
Race_4:  848
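
In the counts above, the smallest group is race 3 with 6 records, so `minority_race` is 3. The same tally can be obtained without writing one filter per race code. The sketch below is an alternative, not the approach used in this project, and it runs on made-up race codes rather than the census file.

```python
import numpy as np

# Made-up race codes stored as floats, as np.genfromtxt would return them
race_codes = np.array([4, 4, 2, 4, 1, 1, 2, 4, 0, 0, 4, 3, 4, 2, 0], dtype=float)

# np.unique returns the sorted distinct codes and how often each occurs
codes, counts = np.unique(race_codes, return_counts=True)
for code, count in zip(codes, counts):
    print(f"Race_{int(code)}: {count}")

# The code with the smallest count is the minority
minority = int(codes[np.argmin(counts)])
print(f"Minority race: {minority}")
```

One caveat: `np.unique` only reports codes that actually occur, so a race with zero records would be missing from the result, whereas the explicit filters above would report a length of 0.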


## __*Step 4*__ : Senior Welfare
Under the new government policy, no citizen above age 60 should be made to work more than 25 hours per week. Let us look at the data and see whether that policy is followed.

In [4]:
#Subsetting the array based on the age 
senior_citizens = census[census[:,0]>60]

#Calculating the total working hours (column 6) of the senior citizens
working_hours_sum = senior_citizens.sum(axis=0)[6]

#Finding the length of the array
senior_citizens_len = len(senior_citizens)

#Finding the average working hours
avg_working_hours = working_hours_sum/senior_citizens_len

#Printing the average working hours
print(f"The Average Working Hours is {avg_working_hours}")


The Average Working Hours is 31.42622950819672
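
The printed average of about 31.4 hours is above the 25-hour limit, so on average the policy is not being followed. The sum-then-divide calculation can also be written as a single filtered aggregation. The sketch below shows that form on made-up records with a simplified two-column layout; `POLICY_LIMIT` is a name introduced here for illustration.

```python
import numpy as np

# Made-up records: column 0 is age, column 1 is hours per week
# (in the census array above, working hours sit in column 6)
records = np.array([
    [65, 20],
    [70, 40],
    [30, 45],
    [61, 30],
    [45, 50],
], dtype=float)

POLICY_LIMIT = 25  # hours per week, taken from the policy described above

# Row mask and column index in a single indexing step, then aggregate
senior_hours = records[records[:, 0] > 60, 1]
avg_hours = senior_hours.mean()

print(f"Average working hours of seniors: {avg_hours}")
print(f"Within policy limit: {avg_hours <= POLICY_LIMIT}")
```

Selecting the column before aggregating avoids summing the seven columns that are not needed.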


## __*Step 5*__ : Education Matters!
Our parents have repeatedly told us that we need to study well in order to get a good (read: higher-paying) job. Let's see whether more highly educated people have better pay in general.


In [5]:
#Creating an array based on 'education' column
high=census[census[:,1]>10]

#Finding the average pay
avg_pay_high=high[:,7].mean()

#Printing the average pay
print(f"Average pay for higher studies : {avg_pay_high}")

#Creating an array based on 'education' column
low=census[census[:,1]<=10]

#Finding the average pay
avg_pay_low=low[:,7].mean()

#Printing the average pay
print(f"Average pay for lower studies : {avg_pay_low}")

Average pay for higher studies : 0.42813455657492355
Average pay for lower studies : 0.13649851632047477
