Onyi-RICH/nyc-schools-analysis

🗽 NYC Schools Analysis

I completed this nyc-schools-analysis project as part of my onboarding as an intern at Webeet.io.

The primary objective was to gain hands-on experience with real-world data stacks and workflows while strengthening my understanding of GitHub collaboration, including branch creation, commits, pull requests, and code review.

Throughout the project, I also honed my skills in Google Sheets, Python, SQL, and ETL (Extract, Transform, Load) workflows.


🧩 Project Structure

The repository is organized into four key sections, each focusing on a different stage of the data analytics and database integration workflow:

File: school-safety-report.csv
Tool: Google Sheets

Description: Performed exploratory data analysis (EDA) using Google Sheets. Standardized column names, cleaned the dataset, and used pivot tables to explore school safety incidents across NYC. Summarized borough-level insights, identified trends, and evaluated incident types.
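For readers who prefer code to a spreadsheet, the same two steps (standardizing column names and building a borough-level pivot) can be sketched in Pandas. This is only an illustrative equivalent of the Google Sheets work, not what was run for this section. The sample rows and the column names `Borough Name`, `Incident Type`, and `Count` are assumptions, and `standardize_columns` is a hypothetical helper.

```python
import pandas as pd

# Hypothetical sample rows; the real school-safety-report.csv has its own columns.
incidents = pd.DataFrame({
    "Borough Name": ["BRONX", "BRONX", "QUEENS", "QUEENS", "QUEENS"],
    "Incident Type": ["Major", "Other", "Major", "Major", "Other"],
    "Count": [3, 5, 2, 4, 1],
})


def standardize_columns(df):
    """Return a copy of df with lowercase snake_case column names."""
    out = df.copy()
    out.columns = (
        out.columns.str.strip()
        .str.lower()
        .str.replace(r"[^a-z0-9]+", "_", regex=True)
        .str.strip("_")
    )
    return out


incidents = standardize_columns(incidents)

# Rows are boroughs, columns are incident types, values are summed counts,
# which is the layout a Sheets pivot table would give.
pivot = incidents.pivot_table(
    index="borough_name",
    columns="incident_type",
    values="count",
    aggfunc="sum",
    fill_value=0,
)
print(pivot)
```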


File: high-school-directory.csv
Tool: Python (VS Code Notebook)

Description: Loaded and explored the NYC High School Directory dataset using Pandas. Cleaned data, analyzed demographics and programs, and answered guided analysis questions within a Jupyter notebook environment.
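A minimal sketch of this kind of load, inspect, and summarize pass is shown here. It reads a small inline sample instead of the real file so that it runs on its own. The columns `dbn`, `school_name`, `borough`, and `total_students` and all sample values are assumptions and may not match high-school-directory.csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for high-school-directory.csv; real columns may differ.
SAMPLE_CSV = """dbn,school_name,borough,total_students
01M001,School A,Manhattan,320
02M002,School B,Manhattan,
13K003,School C,Brooklyn,1500
13K003,School C,Brooklyn,1500
"""

directory = pd.read_csv(io.StringIO(SAMPLE_CSV))

# First look: table size and missing values per column.
print(directory.shape)
print(directory.isna().sum())

# Basic cleaning: drop exact duplicate rows.
directory = directory.drop_duplicates().reset_index(drop=True)

# A guided-question style summary: school count and median enrollment per borough.
summary = directory.groupby("borough").agg(
    schools=("dbn", "nunique"),
    median_students=("total_students", "median"),
)
print(summary)
```

With the real file, `pd.read_csv("high-school-directory.csv")` would replace the `io.StringIO` call.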


Tool: Python (VS Code Notebook) + SQL

Description: Connected to a PostgreSQL database using psycopg2.connect and executed SQL queries directly from the notebook. Practiced writing joins, aggregations, and filtering to extract meaningful insights from school data. Identified data inconsistencies and proposed standardization strategies for improved accuracy.
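The query pattern described here (a join, a filter, and an aggregation run from Python) might look roughly like the sketch below. To keep it runnable without a database server, it uses the standard library's `sqlite3` as a stand-in. The function only relies on the DB-API cursor interface, so a connection returned by `psycopg2.connect(...)` with your own credentials could be passed in instead. The tables `schools` and `demographics`, their columns, and the sample rows are all hypothetical.

```python
import sqlite3


def students_per_borough(conn):
    """Run a join + aggregation through a DB-API 2.0 connection."""
    sql = """
        SELECT s.borough,
               COUNT(*) AS schools,
               SUM(d.total_students) AS students
        FROM schools AS s
        JOIN demographics AS d ON d.dbn = s.dbn
        WHERE d.total_students IS NOT NULL
        GROUP BY s.borough
        ORDER BY students DESC
    """
    cur = conn.cursor()
    cur.execute(sql)
    rows = cur.fetchall()
    cur.close()
    return rows


# In-memory stand-in database with hypothetical tables and rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE schools (dbn TEXT PRIMARY KEY, borough TEXT);
    CREATE TABLE demographics (dbn TEXT, total_students INTEGER);
    INSERT INTO schools VALUES
        ('01M001', 'Manhattan'), ('13K003', 'Brooklyn'), ('13K004', 'Brooklyn');
    INSERT INTO demographics VALUES
        ('01M001', 320), ('13K003', 1500), ('13K004', NULL);
""")

rows = students_per_borough(conn)
print(rows)
conn.close()
```

Note that parameter placeholders differ between drivers (`?` in `sqlite3`, `%s` in psycopg2), so a query that takes parameters would need adjusting when switching.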


File: sat-results.csv
Tool: Python (VS Code Notebook)

Description: Designed a database schema and implemented a full ETL workflow. Cleaned and validated SAT results data (e.g., score range checks, NULL handling, header normalization). Uploaded the cleaned dataset to the PostgreSQL database while ensuring data integrity and consistency.
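The cleaning checks listed above (header normalization, NULL handling, score range checks) can be illustrated with a short Pandas sketch. It is not the notebook's actual code. The column names, the sample rows, the `clean_sat_results` helper, and the 200 to 800 per-section score range are assumptions made for the example.

```python
import pandas as pd

# Hypothetical score column names after header normalization.
SCORE_COLUMNS = ["sat_math", "sat_reading", "sat_writing"]


def clean_sat_results(raw):
    """Normalize headers, coerce scores to numbers, and null out-of-range values."""
    df = raw.copy()

    # Header normalization: "SAT Math" becomes "sat_math".
    df.columns = (
        df.columns.str.strip()
        .str.lower()
        .str.replace(r"[^a-z0-9]+", "_", regex=True)
        .str.strip("_")
    )

    for col in SCORE_COLUMNS:
        # Non-numeric placeholders become NaN instead of raising.
        scores = pd.to_numeric(df[col], errors="coerce")
        # Range check: keep values in the assumed 200-800 range, null the rest.
        df[col] = scores.where(scores.between(200, 800))

    # Drop rows without a school identifier, then duplicate identifiers.
    df = df.dropna(subset=["dbn"]).drop_duplicates(subset=["dbn"])
    return df.reset_index(drop=True)


# Hypothetical raw rows with a placeholder, an out-of-range score, and a missing ID.
raw = pd.DataFrame({
    "DBN": ["01M001", "02M002", "13K003", None],
    "SAT Math": ["450", "s", "950", "500"],
    "SAT Reading": ["430", "410", "480", "510"],
    "SAT Writing": ["440", "400", "470", "490"],
})

clean = clean_sat_results(raw)
print(clean)
```

Nulling invalid scores rather than dropping the whole row keeps the school in the table, which matters if other tables reference it by identifier. The cleaned frame would then be the input to the load step into PostgreSQL.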


🛠️ Tools & Technologies

| Category | Tools & Libraries |
| --- | --- |
| Data Exploration | Google Sheets, Pandas |
| Programming | Python (VS Code, Jupyter Notebook) |
| Database Management | PostgreSQL, SQL, psycopg2 |
| Version Control | Git & GitHub |
| Environment | Visual Studio Code |

🚀 Key Takeaways

  • Strengthened understanding of ETL pipelines and data integration.
  • Improved proficiency with Python for data cleaning and SQL for analysis.
  • Gained hands-on experience with real-world datasets and relational databases.
  • Practiced collaborative workflows using Git and GitHub (branching, PRs, and reviews).
  • Developed a holistic view of how data flows through a modern analytics stack.

✅ Summary: This project provided end-to-end exposure to the modern data workflow, from raw CSV exploration to database integration, and reinforced the technical and collaborative skills needed for real-world data engineering and analytics projects.

About

End-to-end data cleaning and database pipeline exploring NYC school data with Python, SQL, and PostgreSQL.
