🗽 NYC Schools Analysis

This nyc-schools-analysis project was completed as part of my onboarding process as an intern at Webeet.io.

The primary objective was to gain hands-on experience with real-world data stacks and workflows while strengthening my understanding of GitHub collaboration, including branch creation, commits, pull requests, and code review.

Throughout the project, I also honed my skills in Google Sheets, Python, SQL, and ETL (Extract, Transform, Load) workflows.

🧩 Project Structure

The repository is organized into four key sections, each focusing on a different stage of the data analytics and database integration workflow:

1. Incident Analysis

File: school-safety-report.csv

Tool: Google Sheets

Description: Performed exploratory data analysis (EDA) using Google Sheets. Standardized column names, cleaned the dataset, and used pivot tables to explore school safety incidents across NYC. Summarized borough-level insights, identified trends, and evaluated incident types.

2. School Directory Exploration

File: high-school-directory.csv

Tool: Python (VS Code Notebook)

Description: Loaded and explored the NYC High School Directory dataset using Pandas. Cleaned data, analyzed demographics and programs, and answered guided analysis questions within a Jupyter notebook environment.
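A minimal sketch of the load-and-clean step described above, assuming pandas is installed. The inline sample rows and the column names (`School Name`, `Borough`, `Total Students`) are hypothetical stand-ins, not the actual headers of high-school-directory.csv, and the notebook in this repository may take a different approach.

```python
import io

import pandas as pd

# Hypothetical sample standing in for high-school-directory.csv.
SAMPLE_CSV = """School Name,Borough,Total Students
Alpha High,Brooklyn,450
Beta Academy,Queens,
Gamma School,Brooklyn,300
"""


def load_directory(source) -> pd.DataFrame:
    """Read the CSV and normalize its headers to snake_case."""
    df = pd.read_csv(source)
    df.columns = (
        df.columns.str.strip().str.lower().str.replace(r"\s+", "_", regex=True)
    )
    return df


df = load_directory(io.StringIO(SAMPLE_CSV))

# Schools and total enrollment per borough; sum() skips missing values.
summary = df.groupby("borough").agg(
    schools=("school_name", "count"),
    students=("total_students", "sum"),
)
print(summary)
```

To run this against the real file, pass its path to `load_directory` instead of the `io.StringIO` sample and adjust the column names in the `agg` call.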

3. Database Queries

Tool: Python (VS Code Notebook) + SQL

Description: Connected to a PostgreSQL database using psycopg2.connect and executed SQL queries directly from the notebook. Practiced writing joins, aggregations, and filtering to extract meaningful insights from school data. Identified data inconsistencies and proposed standardization strategies for improved accuracy.
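The join, aggregation, and filtering pattern can be sketched as follows. To keep the example runnable without a database server, it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL. The table names, columns, and rows are hypothetical and are not taken from the project's database.

```python
import sqlite3

# sqlite3 stands in for PostgreSQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical tables and rows, for illustration only.
cur.execute("CREATE TABLE schools (dbn TEXT PRIMARY KEY, borough TEXT)")
cur.execute("CREATE TABLE incidents (dbn TEXT, incident_count INTEGER)")
cur.executemany(
    "INSERT INTO schools VALUES (?, ?)",
    [("01M001", "Manhattan"), ("13K002", "Brooklyn"), ("13K003", "Brooklyn")],
)
cur.executemany(
    "INSERT INTO incidents VALUES (?, ?)",
    [("01M001", 4), ("13K002", 7), ("13K003", 2)],
)

# Join the tables, aggregate per borough, and keep totals above a threshold.
query = """
    SELECT s.borough, SUM(i.incident_count) AS total
    FROM schools AS s
    JOIN incidents AS i ON i.dbn = s.dbn
    GROUP BY s.borough
    HAVING SUM(i.incident_count) > ?
    ORDER BY total DESC
"""
cur.execute(query, (3,))
rows = cur.fetchall()
print(rows)

conn.close()
```

With psycopg2 the cursor calls look the same, but the connection comes from `psycopg2.connect(...)` and query parameters use `%s` placeholders instead of `?`.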

4. Database Population

File: sat-results.csv

Tool: Python (VS Code Notebook)

Description: Designed a database schema and implemented a full ETL workflow. Cleaned and validated SAT results data (e.g., score range checks, NULL handling, header normalization). Uploaded the cleaned dataset to the PostgreSQL database while ensuring data integrity and consistency.
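The validation steps named above (header normalization, score range checks, NULL handling) might look roughly like this, using only the standard library. The sample rows, the column names, and the `s` marker for a missing score are hypothetical. The 200 to 800 range is the valid range for a single SAT section and is assumed here to be the check applied.

```python
import csv
import io
import re

# Hypothetical sample standing in for sat-results.csv.
SAMPLE_CSV = """DBN,SAT Math Avg. Score,SAT Writing Avg. Score
01M001,450,430
01M002,s,410
01M003,900,390
"""

SCORE_MIN, SCORE_MAX = 200, 800  # assumed valid range for one SAT section


def normalize_header(name: str) -> str:
    """Lowercase the header and collapse non-alphanumeric runs to '_'."""
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")


def clean_score(raw: str):
    """Return an int score, or None if the value is missing or out of range."""
    raw = raw.strip()
    if not raw.isdecimal():
        return None
    score = int(raw)
    return score if SCORE_MIN <= score <= SCORE_MAX else None


def clean_rows(source):
    """Yield one dict per data row with normalized keys and validated scores."""
    reader = csv.reader(source)
    headers = [normalize_header(h) for h in next(reader)]
    for record in reader:
        row = dict(zip(headers, record))
        for key in headers[1:]:  # every column after the school id is a score
            row[key] = clean_score(row[key])
        yield row


cleaned = list(clean_rows(io.StringIO(SAMPLE_CSV)))
for row in cleaned:
    print(row)
```

Rows in this shape can be passed to a psycopg2 cursor's `executemany` with a parameterized `INSERT`; psycopg2 writes Python `None` as SQL `NULL`.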

🛠️ Tools & Technologies

Category	Tools & Libraries
Data Exploration	Google Sheets, Pandas
Programming	Python (VS Code, Jupyter Notebook)
Database Management	PostgreSQL, SQL, psycopg2
Version Control	Git & GitHub
Environment	Visual Studio Code

🚀 Key Takeaways

Strengthened understanding of ETL pipelines and data integration.
Improved proficiency with Python for data cleaning and SQL for analysis.
Gained hands-on experience with real-world datasets and relational databases.
Practiced collaborative workflows using Git and GitHub (branching, PRs, and reviews).
Developed a holistic view of how data flows through a modern analytics stack.

✅ Summary: This project provided end-to-end exposure to a modern data workflow, from raw CSV exploration to database integration, and reinforced the technical and collaborative skills needed for real-world data engineering and analytics projects.

