# 🧠 Final Project: Practicing MySQL and Python Integration

In this project, we will apply the knowledge gained in **MySQL** and **Python** to analyze real-world datasets from the city of Chicago.

---

## 📂 Datasets Used

### 1. 🏙️ Socioeconomic Indicators in Chicago
This dataset includes six key public health-related socioeconomic indicators and a hardship index for each Chicago community area, covering the years 2008–2012.

🔗 [View full dataset description on the Chicago Data Portal](https://data.cityofchicago.org/Health-Human-Services/Census-Data-Selected-socioeconomic-indicators-in-C/kn9c-c2s2)  
📥 [Download CSV](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork/labs/FinalModule_Coursera_V5/data/ChicagoCensusData.csv)

---

### 2. 🏫 Chicago Public Schools Performance
This dataset provides school-level performance data used to generate CPS School Report Cards for the 2011–2012 academic year.

🔗 [View full dataset description on the Chicago Data Portal](https://data.cityofchicago.org/Education/Chicago-Public-Schools-Progress-Report-Cards-2011-/9xs2-f89t)  
📥 [Download CSV](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork/labs/FinalModule_Coursera_V5/data/ChicagoPublicSchools.csv)

---

### 3. 🚨 Chicago Crime Data
This dataset contains reported crime incidents (excluding murders, which are tracked per victim) from 2001 to the present, excluding the most recent seven days.

🔗 [View full dataset description on the Chicago Data Portal](https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2)  
📥 [Download CSV](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork/labs/FinalModule_Coursera_V5/data/ChicagoCrimeData.csv)

---

## 🛠️ Project Requirements

- Load the datasets using **Pandas** from `.csv` files.
- Create and populate **MySQL tables** with relevant subsets of the data.
- Perform SQL queries to extract insights.
- Use **Python** to automate analysis and generate visualizations.

---

## 🎯 Objective

To combine data engineering and analysis skills by integrating MySQL and Python, enabling meaningful insights into Chicago’s socioeconomic conditions, public school performance, and crime patterns.

---

> 💡 *Tip:* Use `pandas.read_csv()` to load the datasets directly from the URLs. Validate data types before inserting into MySQL to avoid schema mismatches.
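
One way to act on this tip is to compare the dtypes that pandas infers against what each column is expected to hold. The sketch below is illustrative only: the sample CSV text, the `EXPECTED_KINDS` mapping, and the helper `find_dtype_mismatches` are assumptions made for the example and are not part of the project files.

```python
import io

import pandas as pd

# Hypothetical sample standing in for a downloaded CSV file.
SAMPLE_CSV = (
    "COMMUNITY_AREA_NUMBER,COMMUNITY_AREA_NAME,PER_CAPITA_INCOME\n"
    "26,West Garfield Park,10934\n"
    "30,South Lawndale,10402\n"
    ",Example Row Without Area Number,12000\n"
)

# Assumed expectations for this sample; adapt them to the real schema.
EXPECTED_KINDS = {
    "COMMUNITY_AREA_NUMBER": "numeric",
    "COMMUNITY_AREA_NAME": "text",
    "PER_CAPITA_INCOME": "numeric",
}


def find_dtype_mismatches(df, expected_kinds):
    """Return the columns whose inferred dtype differs from the expected kind."""
    mismatches = []
    for column, kind in expected_kinds.items():
        is_numeric = pd.api.types.is_numeric_dtype(df[column])
        if is_numeric != (kind == "numeric"):
            mismatches.append(column)
    return mismatches


sample_df = pd.read_csv(io.StringIO(SAMPLE_CSV))
mismatches = find_dtype_mismatches(sample_df, EXPECTED_KINDS)

print(sample_df.dtypes)
print("Mismatched columns:", mismatches)
```

A missing value in an otherwise integer column makes pandas infer a float column, so an identifier column can end up stored as a floating-point type unless it is converted first.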

**Installing libraries**


In [135]:
!pip install pandas
!pip install ipython-sql prettytable 

import prettytable
import pandas as pd
import sqlite3 as sq

prettytable.DEFAULT = 'DEFAULT'



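**Creating the connection to the database and the cursor**

The notebook stores the tables in a local SQLite file, `FinalDB.db`, through Python's built-in `sqlite3` module.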

In [136]:
con = sq.connect("FinalDB.db")

cur = con.cursor()

**Loading the CSV files into the database using pandas**

In [137]:
df_censusdata = pd.read_csv('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork/labs/FinalModule_Coursera_V5/data/ChicagoCensusData.csv?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDB0201ENSkillsNetwork20127838-2021-01-01')
df_censusdata.to_sql("CHICAGO_CENSUS_DATA", con, if_exists='replace', index=False)

df_publicschools = pd.read_csv('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork/labs/FinalModule_Coursera_V5/data/ChicagoPublicSchools.csv?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDB0201ENSkillsNetwork20127838-2021-01-01')
df_publicschools.to_sql("CHICAGO_PUBLIC_SCHOOLS", con, if_exists='replace', index=False)

df_crimedata = pd.read_csv('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork/labs/FinalModule_Coursera_V5/data/ChicagoCrimeData.csv?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDB0201ENSkillsNetwork20127838-2021-01-01')
df_crimedata.to_sql("CHICAGO_CRIME_DATA", con, if_exists='replace', index=False)


533
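
The three load-and-write steps above follow the same pattern, so they could also be expressed as a loop over a mapping of table names to sources. This is a sketch under assumptions: the helper `load_tables` and the in-memory sample sources are hypothetical and stand in for the real CSV URLs, so the example runs without network access.

```python
import io
import sqlite3

import pandas as pd


def load_tables(sources, connection, reader=pd.read_csv):
    """Load each source into the table named by its key and return row counts."""
    row_counts = {}
    for table_name, source in sources.items():
        frame = reader(source)
        # Same call as in the cells above: replace the table if it exists.
        frame.to_sql(table_name, connection, if_exists="replace", index=False)
        row_counts[table_name] = len(frame)
    return row_counts


# Hypothetical in-memory stand-ins for the CSV URLs.
demo_sources = {
    "DEMO_CENSUS": io.StringIO("AREA,NAME\n1,Alpha\n2,Beta\n"),
    "DEMO_CRIME": io.StringIO("ID,AREA\n10,1\n11,1\n12,2\n"),
}

demo_con = sqlite3.connect(":memory:")
loaded = load_tables(demo_sources, demo_con)
print(loaded)
```

With the real data, the dictionary values would be the three download URLs instead of `io.StringIO` objects.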

**Loading and connecting the SQL magic module**

In [138]:
!pip install ipython-sql
%load_ext sql
%sql sqlite:///FinalDB.db

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [139]:
%sql SELECT name,type FROM PRAGMA_TABLE_INFO('CHICAGO_CRIME_DATA');

 * sqlite:///FinalDB.db
Done.


name,type
ID,INTEGER
CASE_NUMBER,TEXT
DATE,TEXT
BLOCK,TEXT
IUCR,TEXT
PRIMARY_TYPE,TEXT
DESCRIPTION,TEXT
LOCATION_DESCRIPTION,TEXT
ARREST,INTEGER
DOMESTIC,INTEGER
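
The same schema check can be done without the SQL magic, using a plain `sqlite3` connection. The sketch below runs against a small hypothetical table (`DEMO_CRIME`) in an in-memory database. The helper name `column_types` is an assumption for illustration, and passing the table name as a bound parameter relies on SQLite's table-valued `pragma_table_info` function being available.

```python
import sqlite3

# In-memory database with a hypothetical table, used only for this example.
demo_con = sqlite3.connect(":memory:")
demo_con.execute(
    "CREATE TABLE DEMO_CRIME (ID INTEGER, CASE_NUMBER TEXT, ARREST INTEGER)"
)


def column_types(connection, table_name):
    """Return a {column name: declared type} mapping for one table."""
    rows = connection.execute(
        "SELECT name, type FROM pragma_table_info(?)", (table_name,)
    ).fetchall()
    return dict(rows)


schema = column_types(demo_con, "DEMO_CRIME")
print(schema)
```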


In [140]:
%sql SELECT name,type FROM PRAGMA_TABLE_INFO('CHICAGO_CENSUS_DATA');

 * sqlite:///FinalDB.db
Done.


name,type
COMMUNITY_AREA_NUMBER,REAL
COMMUNITY_AREA_NAME,TEXT
PERCENT_OF_HOUSING_CROWDED,REAL
PERCENT_HOUSEHOLDS_BELOW_POVERTY,REAL
PERCENT_AGED_16__UNEMPLOYED,REAL
PERCENT_AGED_25__WITHOUT_HIGH_SCHOOL_DIPLOMA,REAL
PERCENT_AGED_UNDER_18_OR_OVER_64,REAL
PER_CAPITA_INCOME,INTEGER
HARDSHIP_INDEX,REAL


In [141]:
%sql SELECT name,type FROM PRAGMA_TABLE_INFO('CHICAGO_PUBLIC_SCHOOLS');

 * sqlite:///FinalDB.db
Done.


name,type
School_ID,INTEGER
NAME_OF_SCHOOL,TEXT
"Elementary, Middle, or High School",TEXT
Street_Address,TEXT
City,TEXT
State,TEXT
ZIP_Code,INTEGER
Phone_Number,TEXT
Link,TEXT
Network_Manager,TEXT


**Obtaining the total number of crimes recorded in the `CHICAGO_CRIME_DATA` table**

In [142]:
%sql SELECT COUNT(*) FROM CHICAGO_CRIME_DATA;

 * sqlite:///FinalDB.db
Done.


COUNT(*)
533


**Obtaining a list of community area names and numbers with per capita income less than 11000 from the `CHICAGO_CENSUS_DATA` table**

In [143]:
%sql SELECT COMMUNITY_AREA_NUMBER, COMMUNITY_AREA_NAME, PER_CAPITA_INCOME FROM CHICAGO_CENSUS_DATA WHERE PER_CAPITA_INCOME < 11000;



 * sqlite:///FinalDB.db
Done.


COMMUNITY_AREA_NUMBER,COMMUNITY_AREA_NAME,PER_CAPITA_INCOME
26.0,West Garfield Park,10934
30.0,South Lawndale,10402
37.0,Fuller Park,10432
54.0,Riverdale,8201
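
If the income threshold needs to vary, the same filter can be run from Python with a bound parameter instead of a hard-coded value. This is a minimal sketch: the table `CENSUS_DEMO`, its three rows (two copied from the output above, one hypothetical), and the helper `areas_below_income` are assumptions for illustration.

```python
import sqlite3

demo_con = sqlite3.connect(":memory:")
demo_con.execute(
    "CREATE TABLE CENSUS_DEMO ("
    "COMMUNITY_AREA_NUMBER REAL, COMMUNITY_AREA_NAME TEXT, PER_CAPITA_INCOME INTEGER)"
)
demo_con.executemany(
    "INSERT INTO CENSUS_DEMO VALUES (?, ?, ?)",
    [
        (54.0, "Riverdale", 8201),
        (37.0, "Fuller Park", 10432),
        (99.0, "Hypothetical Area", 25000),  # made-up row above the threshold
    ],
)


def areas_below_income(connection, threshold):
    """Return (number, name, income) rows with income below the threshold."""
    query = (
        "SELECT COMMUNITY_AREA_NUMBER, COMMUNITY_AREA_NAME, PER_CAPITA_INCOME "
        "FROM CENSUS_DEMO WHERE PER_CAPITA_INCOME < ? "
        "ORDER BY PER_CAPITA_INCOME"
    )
    return connection.execute(query, (threshold,)).fetchall()


low_income = areas_below_income(demo_con, 11000)
print(low_income)
```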


**Listing all case numbers for crimes involving minors (children are not considered minors for the purposes of this crime analysis)**

**This analysis joins `CHICAGO_CRIME_DATA` and `CHICAGO_CENSUS_DATA` on the community area number and keeps the areas where `PERCENT_AGED_UNDER_18_OR_OVER_64` is above 30. It then compares the longitude and latitude in `CHICAGO_PUBLIC_SCHOOLS` and `CHICAGO_CRIME_DATA` (within 0.001 degrees on each axis) to find crimes that occurred near schools, as a proxy for crimes likely to involve minors.**

In [144]:
%sql select CRIME.CASE_NUMBER, CRIME.community_area_number, SCHOOLS.LONGITUDE, SCHOOLS.LATITUDE, CENSUS.PERCENT_AGED_UNDER_18_OR_OVER_64 from CHICAGO_CRIME_DATA AS CRIME, CHICAGO_CENSUS_DATA AS CENSUS, CHICAGO_PUBLIC_SCHOOLS AS SCHOOLS \
   WHERE CRIME.community_area_number =  CENSUS.community_area_number and PERCENT_AGED_UNDER_18_OR_OVER_64 > 30 and  ABS(SCHOOLS.LONGITUDE - CRIME.LONGITUDE) < 0.001 \
  AND ABS(SCHOOLS.LATITUDE - CRIME.LATITUDE) < 0.001;

 * sqlite:///FinalDB.db
Done.


CASE_NUMBER,COMMUNITY_AREA_NUMBER,Longitude,Latitude,PERCENT_AGED_UNDER_18_OR_OVER_64
JA550741,16.0,-87.70218806,41.95823045,31.6
HL718742,23.0,-87.71883334,41.89321142,38.0
HH639427,25.0,-87.76763207,41.89037849,37.9
HR286405,25.0,-87.75571809,41.87382439,37.9
HS200939,25.0,-87.75920297,41.87378243,37.9
HT315369,27.0,-87.70801612,41.87847194,43.2
HJ409442,30.0,-87.71598119,41.8485071,33.8
HP716225,35.0,-87.61767315,41.84018775,30.7
HJ723825,39.0,-87.60383761,41.81369471,35.4
HJ723825,39.0,-87.6038735,41.81534262,35.4
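
A more direct alternative, offered here as a sketch and not as the approach used above, is to search the crime `DESCRIPTION` text for the word "MINOR". The table `CRIME_DEMO` and its case numbers are hypothetical. The descriptions are taken from values that appear elsewhere in this notebook's output.

```python
import sqlite3

demo_con = sqlite3.connect(":memory:")
demo_con.execute(
    "CREATE TABLE CRIME_DEMO (CASE_NUMBER TEXT, PRIMARY_TYPE TEXT, DESCRIPTION TEXT)"
)
demo_con.executemany(
    "INSERT INTO CRIME_DEMO VALUES (?, ?, ?)",
    [
        # Case numbers are made up for this example.
        ("DEMO001", "LIQUOR LAW VIOLATION", "SELL/GIVE/DEL LIQUOR TO MINOR"),
        ("DEMO002", "KIDNAPPING", "CHILD ABDUCTION/STRANGER"),
        ("DEMO003", "BATTERY", "SIMPLE"),
    ],
)

# Match only descriptions that mention minors; rows mentioning only
# children are left out, in line with the heading above.
minor_cases = [
    row[0]
    for row in demo_con.execute(
        "SELECT DISTINCT CASE_NUMBER FROM CRIME_DEMO "
        "WHERE DESCRIPTION LIKE '%MINOR%' ORDER BY CASE_NUMBER"
    )
]
print(minor_cases)
```

Against the real table, the equivalent filter would be `DESCRIPTION LIKE '%MINOR%'` on `CHICAGO_CRIME_DATA`.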


**Listing all kidnapping crimes involving a child**

In [145]:
%sql SELECT * FROM CHICAGO_CRIME_DATA WHERE PRIMARY_TYPE = 'KIDNAPPING' and DESCRIPTION LIKE '%CHILD%';

 * sqlite:///FinalDB.db
Done.


ID,CASE_NUMBER,DATE,BLOCK,IUCR,PRIMARY_TYPE,DESCRIPTION,LOCATION_DESCRIPTION,ARREST,DOMESTIC,BEAT,DISTRICT,WARD,COMMUNITY_AREA_NUMBER,FBICODE,X_COORDINATE,Y_COORDINATE,YEAR,LATITUDE,LONGITUDE,LOCATION
5276766,HN144152,2007-01-26,050XX W VAN BUREN ST,1792,KIDNAPPING,CHILD ABDUCTION/STRANGER,STREET,0,0,1533,15,29.0,25.0,20,1143050.0,1897546.0,2007,41.87490841,-87.75024931,"(41.874908413, -87.750249307)"


**Listing the kinds of crimes that were recorded at schools (no repetitions)**

In [146]:
%sql SELECT DISTINCT SCHOOL.COMMUNITY_AREA_NUMBER, SCHOOL.NAME_OF_SCHOOL, CRIME.PRIMARY_TYPE, CRIME.DESCRIPTION FROM CHICAGO_PUBLIC_SCHOOLS AS SCHOOL, CHICAGO_CRIME_DATA AS CRIME \
WHERE SCHOOL.COMMUNITY_AREA_NUMBER = CRIME.COMMUNITY_AREA_NUMBER LIMIT 20;

 * sqlite:///FinalDB.db
Done.


COMMUNITY_AREA_NUMBER,NAME_OF_SCHOOL,PRIMARY_TYPE,DESCRIPTION
7,Abraham Lincoln Elementary School,BATTERY,SIMPLE
7,Abraham Lincoln Elementary School,BURGLARY,FORCIBLE ENTRY
7,Abraham Lincoln Elementary School,LIQUOR LAW VIOLATION,SELL/GIVE/DEL LIQUOR TO MINOR
7,Abraham Lincoln Elementary School,THEFT,$500 AND UNDER
43,Adam Clayton Powell Paideia Community Academy Elementary School,ASSAULT,SIMPLE
43,Adam Clayton Powell Paideia Community Academy Elementary School,BATTERY,DOMESTIC BATTERY SIMPLE
43,Adam Clayton Powell Paideia Community Academy Elementary School,BATTERY,SIMPLE
43,Adam Clayton Powell Paideia Community Academy Elementary School,BURGLARY,FORCIBLE ENTRY
43,Adam Clayton Powell Paideia Community Academy Elementary School,CRIMINAL DAMAGE,TO VEHICLE
43,Adam Clayton Powell Paideia Community Academy Elementary School,NARCOTICS,POSS: CANNABIS 30GMS OR LESS
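
The query above pairs every school with every crime in the same community area, so it lists crimes near schools, not necessarily at them. An alternative, sketched below under assumptions, filters on the `LOCATION_DESCRIPTION` column of the crime table. The table `CRIME_DEMO` and its rows are hypothetical sample data, and the location strings are assumed examples of how school locations might be recorded.

```python
import sqlite3

demo_con = sqlite3.connect(":memory:")
demo_con.execute(
    "CREATE TABLE CRIME_DEMO ("
    "PRIMARY_TYPE TEXT, DESCRIPTION TEXT, LOCATION_DESCRIPTION TEXT)"
)
demo_con.executemany(
    "INSERT INTO CRIME_DEMO VALUES (?, ?, ?)",
    [
        # Hypothetical rows; location strings are assumed examples.
        ("BATTERY", "SIMPLE", "SCHOOL, PUBLIC, GROUNDS"),
        ("BATTERY", "SIMPLE", "SCHOOL, PUBLIC, BUILDING"),
        ("NARCOTICS", "POSS: CANNABIS 30GMS OR LESS", "SCHOOL, PUBLIC, BUILDING"),
        ("THEFT", "$500 AND UNDER", "STREET"),
    ],
)

# DISTINCT removes the repeated BATTERY entry.
school_crime_types = [
    row[0]
    for row in demo_con.execute(
        "SELECT DISTINCT PRIMARY_TYPE FROM CRIME_DEMO "
        "WHERE LOCATION_DESCRIPTION LIKE '%SCHOOL%' ORDER BY PRIMARY_TYPE"
    )
]
print(school_crime_types)
```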


**Listing the 5 community areas with the highest percentage of households below the poverty line**

In [147]:
%sql SELECT COMMUNITY_AREA_NUMBER, COMMUNITY_AREA_NAME, PERCENT_HOUSEHOLDS_BELOW_POVERTY FROM CHICAGO_CENSUS_DATA  ORDER BY PERCENT_HOUSEHOLDS_BELOW_POVERTY DESC LIMIT 5;

 * sqlite:///FinalDB.db
Done.


COMMUNITY_AREA_NUMBER,COMMUNITY_AREA_NAME,PERCENT_HOUSEHOLDS_BELOW_POVERTY
54.0,Riverdale,56.5
37.0,Fuller Park,51.2
68.0,Englewood,46.6
29.0,North Lawndale,43.1
27.0,East Garfield Park,42.4


**Which community area is most crime prone? Display the community area number only.**

In [148]:
%sql SELECT COMMUNITY_AREA_NUMBER  FROM CHICAGO_CRIME_DATA GROUP BY COMMUNITY_AREA_NUMBER ORDER BY COUNT(*) DESC LIMIT 1;


 * sqlite:///FinalDB.db
Done.


COMMUNITY_AREA_NUMBER
25.0
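
When ranking by `COUNT(*)`, rows without a community area number form their own group and could come out on top. The sketch below uses a hypothetical table `CRIME_DEMO` with made-up rows to show the effect of excluding them. It also returns the count alongside the area number, which makes the result easier to check.

```python
import sqlite3

demo_con = sqlite3.connect(":memory:")
demo_con.execute("CREATE TABLE CRIME_DEMO (ID INTEGER, COMMUNITY_AREA_NUMBER REAL)")

# Made-up rows: three crimes in area 25, one in area 23, four with no area.
areas = [25.0, 25.0, 25.0, 23.0, None, None, None, None]
demo_con.executemany(
    "INSERT INTO CRIME_DEMO VALUES (?, ?)", list(enumerate(areas, start=1))
)

RANKING = (
    "SELECT COMMUNITY_AREA_NUMBER, COUNT(*) AS CRIME_COUNT FROM CRIME_DEMO {where} "
    "GROUP BY COMMUNITY_AREA_NUMBER ORDER BY CRIME_COUNT DESC LIMIT 1"
)

# Without a filter, the NULL group wins in this sample.
unfiltered_top = demo_con.execute(RANKING.format(where="")).fetchone()

# With the filter, only rows that have an area number are ranked.
filtered_top = demo_con.execute(
    RANKING.format(where="WHERE COMMUNITY_AREA_NUMBER IS NOT NULL")
).fetchone()

print(unfiltered_top, filtered_top)
```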


**Obtaining the name of the community area with the highest hardship index**

In [149]:
%sql SELECT COMMUNITY_AREA_NAME FROM CHICAGO_CENSUS_DATA WHERE HARDSHIP_INDEX = (SELECT MAX(HARDSHIP_INDEX) FROM CHICAGO_CENSUS_DATA)

 * sqlite:///FinalDB.db
Done.


COMMUNITY_AREA_NAME
Riverdale


**Obtaining the name of the community area with the most crimes**

In [150]:
%sql SELECT COMMUNITY_AREA_NAME FROM CHICAGO_CENSUS_DATA WHERE COMMUNITY_AREA_NUMBER = (SELECT COMMUNITY_AREA_NUMBER FROM CHICAGO_CRIME_DATA GROUP BY COMMUNITY_AREA_NUMBER ORDER BY COUNT(*) DESC LIMIT 1);

 * sqlite:///FinalDB.db
Done.


COMMUNITY_AREA_NAME
Austin
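
The subquery above could also be written as a join, which returns the crime count together with the name. This is a sketch on hypothetical sample tables (`CENSUS_DEMO`, `CRIME_DEMO`) with made-up crime rows, not a rerun of the notebook's data.

```python
import sqlite3

demo_con = sqlite3.connect(":memory:")
demo_con.execute(
    "CREATE TABLE CENSUS_DEMO (COMMUNITY_AREA_NUMBER REAL, COMMUNITY_AREA_NAME TEXT)"
)
demo_con.execute("CREATE TABLE CRIME_DEMO (ID INTEGER, COMMUNITY_AREA_NUMBER REAL)")
demo_con.executemany(
    "INSERT INTO CENSUS_DEMO VALUES (?, ?)",
    [(25.0, "Austin"), (23.0, "Humboldt Park")],
)
# Made-up crime rows: three in area 25, one in area 23.
demo_con.executemany(
    "INSERT INTO CRIME_DEMO VALUES (?, ?)",
    [(1, 25.0), (2, 25.0), (3, 25.0), (4, 23.0)],
)

top_area = demo_con.execute(
    "SELECT CENSUS.COMMUNITY_AREA_NAME, COUNT(*) AS CRIME_COUNT "
    "FROM CRIME_DEMO AS CRIME "
    "JOIN CENSUS_DEMO AS CENSUS "
    "ON CRIME.COMMUNITY_AREA_NUMBER = CENSUS.COMMUNITY_AREA_NUMBER "
    "GROUP BY CENSUS.COMMUNITY_AREA_NAME "
    "ORDER BY CRIME_COUNT DESC LIMIT 1"
).fetchone()
print(top_area)
```

Because an inner join drops crime rows whose area number has no match in the census table, rows with a missing area number are excluded from the ranking automatically.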
