# Introduction

I have been hired by an organization that strives to improve educational outcomes for children and young people in Chicago. My job is to analyze the census, crime, and school data for a given neighborhood or district. 

I will identify factors that affect the enrollment, safety, health, and environment ratings of schools.

## Selected Socioeconomic Indicators in Chicago

The city of Chicago released a dataset of socioeconomic data to the Chicago City Portal.
This dataset contains a selection of six socioeconomic indicators of public health significance and a “hardship index,” for each Chicago community area, for the years 2008 – 2012.

Scores on the hardship index can range from 1 to 100, with a higher index number representing a greater level of hardship.

A detailed description of the dataset can be found on [the city of Chicago's website](https://data.cityofchicago.org/Health-Human-Services/Census-Data-Selected-socioeconomic-indicators-in-C/kn9c-c2s2?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDB0201ENSkillsNetwork20127838-2022-01-01), but to summarize, the dataset has the following variables:

*   **Community Area Number** (`ca`): Used to uniquely identify each row of the dataset

*   **Community Area Name** (`community_area_name`): The name of the region in the city of Chicago

*   **Percent of Housing Crowded** (`percent_of_housing_crowded`): Percent of occupied housing units with more than one person per room

*   **Percent Households Below Poverty** (`percent_households_below_poverty`): Percent of households living below the federal poverty line

*   **Percent Aged 16+ Unemployed** (`percent_aged_16_unemployed`): Percent of persons over the age of 16 years that are unemployed

*   **Percent Aged 25+ without High School Diploma** (`percent_aged_25_without_high_school_diploma`): Percent of persons over the age of 25 years without a high school education

*   **Percent Aged Under 18 or Over 64** (`percent_aged_under_18_or_over_64`): Percent of population under 18 or over 64 years of age (i.e., dependents)

*   **Per Capita Income** (`per_capita_income_`): Community Area per capita income is estimated as the sum of tract-level aggregate incomes divided by the total population

*   **Hardship Index** (`hardship_index`): Score that incorporates each of the six selected socioeconomic indicators




# Method

First, I created all three tables in the PostgreSQL database.

Then I inserted the data to fill the tables.

The tables are now ready to be analysed.
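The notebook does not show the table creation and loading steps, so the following is only a minimal sketch of what they might look like. It uses an in-memory SQLite database as a stand-in for PostgreSQL so that it runs without a server, and a reduced set of columns. The helper `load_census`, the `SAMPLE_CSV` text, and every row in it are hypothetical placeholders, not values from the real dataset.

```python
import csv
import io
import sqlite3

# Hypothetical CSV excerpt: names and numbers are placeholders, not real data.
SAMPLE_CSV = """community_area_number,community_area_name,per_capita_income,hardship_index
1,Area A,25000,40
2,Area B,61000,12
,CITYWIDE,30000,
"""

def load_census(conn, csv_text):
    """Create a reduced chicago_census table and fill it from CSV text."""
    conn.execute(
        """CREATE TABLE chicago_census (
               community_area_number INTEGER,
               community_area_name   TEXT NOT NULL,
               per_capita_income     INTEGER,
               hardship_index        INTEGER
           )"""
    )
    rows = []
    for rec in csv.DictReader(io.StringIO(csv_text)):
        # Turn empty CSV fields into None so they are stored as NULL
        rows.append(tuple(rec[key] or None for key in rec))
    conn.executemany("INSERT INTO chicago_census VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL connection
inserted = load_census(conn, SAMPLE_CSV)
print(inserted)
```

Storing empty fields as NULL rather than as empty strings matters later, because SQL aggregates such as `MAX` and `COUNT(column)` skip NULL values.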

### Connect to the database


In [2]:
import psycopg2 as pg2
import pandas as pd 

conn = pg2.connect(database='Assignment IBM', user='postgres',password='password')

## Problems

### Problem 1

##### How many rows are in the dataset?


In [3]:
df=pd.read_sql("SELECT * FROM chicago_census", conn)
df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 78 entries, 0 to 77
Data columns (total 9 columns):
 #   Column                                        Non-Null Count  Dtype  
---  ------                                        --------------  -----  
 0   community_area_number                         77 non-null     float64
 1   community_area_name                           78 non-null     object 
 2   percent_of_housing_crowded                    78 non-null     float64
 3   percent_households_below_poverty              78 non-null     float64
 4   percent_aged_16__unemployed                   78 non-null     float64
 5   percent_aged_25__without_high_school_diploma  78 non-null     float64
 6   percent_aged_under_18_or_over_64              78 non-null     float64
 7   per_capita_income                             78 non-null     int64  
 8   hardship_index                                77 non-null     float64
dtypes: float64(7), int64(1), object(1)
memory usage: 5.6+ KB




In [4]:
df=pd.read_sql("SELECT COUNT(*) FROM chicago_census", conn)
df




Unnamed: 0,count
0,78
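The `df.info()` output above lists 78 entries but only 77 non-null values for `community_area_number` and `hardship_index`, so one row has no community area number and one has no hardship index. In SQL, `COUNT(*)` counts every row, while `COUNT(column)` skips rows where that column is NULL. The sketch below illustrates the difference on an in-memory SQLite table (a stand-in for the PostgreSQL table) filled with hypothetical placeholder rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL connection
conn.execute(
    "CREATE TABLE chicago_census ("
    "community_area_number INTEGER, community_area_name TEXT, hardship_index INTEGER)"
)
# Placeholder rows; the last one mimics a row with missing number and index
conn.executemany(
    "INSERT INTO chicago_census VALUES (?, ?, ?)",
    [(1, "Area A", 40), (2, "Area B", 12), (None, "CITYWIDE", None)],
)

# COUNT(*) counts all rows; COUNT(column) ignores NULLs in that column
total_rows, numbered_rows = conn.execute(
    "SELECT COUNT(*), COUNT(community_area_number) FROM chicago_census"
).fetchone()
print(total_rows, numbered_rows)
```

Whether the answer to "how many rows" should include the row without a community area number depends on whether the question is about rows or about community areas.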


### Problem 2

##### How many community areas in Chicago have a hardship index greater than 50.0?


In [8]:
df=pd.read_sql("SELECT COUNT(DISTINCT(community_area_name)) FROM chicago_census \
                WHERE hardship_index>50 ", conn)
df




Unnamed: 0,count
0,38


### Problem 3

##### What is the maximum value of hardship index in this dataset?


In [20]:
df=pd.read_sql("SELECT MAX(hardship_index) FROM chicago_census", conn)
df



Unnamed: 0,max
0,98


### Problem 4

##### Which community area has the highest hardship index?


In [19]:
df=pd.read_sql("SELECT community_area_name, MAX(hardship_index) FROM chicago_census \
               WHERE hardship_index IS NOT NULL \
               GROUP BY community_area_name \
               ORDER BY MAX(hardship_index) DESC \
               LIMIT 1", conn)
df



Unnamed: 0,community_area_name,max
0,Riverdale,98
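An alternative to `ORDER BY ... LIMIT 1` is to compare each row against the maximum in a subquery, which also returns every tied area if more than one shares the top score. The sketch below is one possible formulation, run against an in-memory SQLite table (a stand-in for PostgreSQL) with hypothetical placeholder rows that include a deliberate tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL connection
conn.execute(
    "CREATE TABLE chicago_census (community_area_name TEXT, hardship_index INTEGER)"
)
# Placeholder rows, including a tie for the maximum and a NULL index
conn.executemany(
    "INSERT INTO chicago_census VALUES (?, ?)",
    [("Area A", 40), ("Area B", 90), ("Area C", 90), ("CITYWIDE", None)],
)

# MAX ignores NULLs, so no explicit IS NOT NULL filter is needed here
top_areas = conn.execute(
    """SELECT community_area_name, hardship_index
       FROM chicago_census
       WHERE hardship_index = (SELECT MAX(hardship_index) FROM chicago_census)
       ORDER BY community_area_name"""
).fetchall()
print(top_areas)
```

With `LIMIT 1`, only one of the tied areas would be returned, and which one is not guaranteed unless the ordering breaks the tie.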


### Problem 5

##### Which Chicago community areas have per-capita incomes greater than $60,000?


In [23]:
df=pd.read_sql("SELECT community_area_name, MAX(per_capita_income) FROM chicago_census \
               WHERE per_capita_income IS NOT NULL AND per_capita_income>60000 \
               GROUP BY community_area_name \
               ORDER BY MAX(per_capita_income) DESC", conn)
df




Unnamed: 0,community_area_name,max
0,Near North Side,88669
1,Lincoln Park,71551
2,Loop,65526
3,Lake View,60058
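If the income threshold needs to vary, it can be passed as a bound parameter instead of being written into the SQL string. The sketch below shows the idea with an in-memory SQLite table as a stand-in for PostgreSQL; the helper `areas_above_income` and the rows are hypothetical. SQLite uses `?` as its placeholder, whereas psycopg2 uses `%s`, and `pd.read_sql` accepts the values through its `params` argument.

```python
import sqlite3

def areas_above_income(conn, threshold):
    """Return (name, income) pairs with income above threshold, highest first."""
    return conn.execute(
        "SELECT community_area_name, per_capita_income "
        "FROM chicago_census "
        "WHERE per_capita_income > ? "
        "ORDER BY per_capita_income DESC",
        (threshold,),  # bound parameter, never formatted into the SQL text
    ).fetchall()

conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL connection
conn.execute(
    "CREATE TABLE chicago_census (community_area_name TEXT, per_capita_income INTEGER)"
)
# Placeholder rows, not values from the real dataset
conn.executemany(
    "INSERT INTO chicago_census VALUES (?, ?)",
    [("Area A", 25000), ("Area B", 61000), ("Area C", 75000)],
)

result = areas_above_income(conn, 60000)
print(result)
```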
