I completed the nyc-schools-analysis project as part of my onboarding as an intern at Webeet.io.
The primary objective was to gain hands-on experience with real-world data stacks and workflows while strengthening my understanding of GitHub collaboration, including branch creation, commits, pull requests, and code review.
Throughout the project, I also honed my skills in Google Sheets, Python, SQL, and ETL (Extract, Transform, Load) workflows.
The repository is organized into four key sections, each focusing on a different stage of the data analytics and database integration workflow:
File: school-safety-report.csv
Tool: Google Sheets
Description: Performed exploratory data analysis (EDA) using Google Sheets. Standardized column names, cleaned the dataset, and used pivot tables to explore school safety incidents across NYC. Summarized borough-level insights, identified trends, and evaluated incident types.
File: high-school-directory.csv
Tool: Python (VS Code Notebook)
Description: Loaded and explored the NYC High School Directory dataset using Pandas. Cleaned data, analyzed demographics and programs, and answered guided analysis questions within a Jupyter notebook environment.
Tool: Python (VS Code Notebook) + SQL
Description:
- Connected to a PostgreSQL database using `psycopg2.connect` and executed SQL queries directly from the notebook.
- Practiced writing joins, aggregations, and filters to extract meaningful insights from school data.
- Identified data inconsistencies and proposed standardization strategies for improved accuracy.
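
The join, aggregation, and filtering practice described above might look roughly like the sketch below. It is not the notebook's actual code: the table names, column names, and rows are made up for illustration, and an in-memory SQLite database stands in for the PostgreSQL connection so the example runs without a server.

```python
import sqlite3

# Stand-in for the PostgreSQL connection: the project used psycopg2.connect,
# but an in-memory SQLite database lets this sketch run without a server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical tables and columns, invented for illustration only.
cur.execute("CREATE TABLE schools (dbn TEXT PRIMARY KEY, school_name TEXT, borough TEXT)")
cur.execute("CREATE TABLE safety_incidents (dbn TEXT, incident_count INTEGER)")

# Made-up sample rows, not taken from the real datasets.
cur.executemany("INSERT INTO schools VALUES (?, ?, ?)", [
    ("01M001", "School A", "Manhattan"),
    ("02M002", "School B", "Manhattan"),
    ("13K003", "School C", "Brooklyn"),
    ("10X004", "School D", "Bronx"),
])
cur.executemany("INSERT INTO safety_incidents VALUES (?, ?)", [
    ("01M001", 4), ("02M002", 6), ("13K003", 3), ("10X004", 0),
])


def incidents_by_borough(cursor, min_total=1):
    """Join schools to incidents, aggregate per borough, and filter small totals."""
    cursor.execute(
        """
        SELECT s.borough,
               COUNT(*)              AS school_count,
               SUM(i.incident_count) AS total
        FROM schools AS s
        JOIN safety_incidents AS i ON i.dbn = s.dbn
        GROUP BY s.borough
        HAVING SUM(i.incident_count) >= ?
        ORDER BY total DESC
        """,
        (min_total,),
    )
    return cursor.fetchall()


rows = incidents_by_borough(cur)
for borough, school_count, total in rows:
    print(borough, school_count, total)
```

With psycopg2 the same query shape applies, but the parameter placeholder is `%s` rather than `?`.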
File: sat-results.csv
Tool: Python (VS Code Notebook)
Description: Designed a database schema and implemented a full ETL workflow. Cleaned and validated SAT results data (e.g., score range checks, NULL handling, header normalization). Uploaded the cleaned dataset to the PostgreSQL database while ensuring data integrity and consistency.
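
A minimal sketch of the extract, transform, and load steps described above, under several assumptions: the header names and sample rows are invented, the `s` placeholder for a missing score is an assumption about the raw file, and SQLite stands in for PostgreSQL so the example is self-contained. The 200 to 800 bounds are the valid range for a single SAT section score.

```python
import csv
import io
import re
import sqlite3

# Made-up sample rows; the header names only imitate a raw export.
RAW_CSV = """DBN,School Name,SAT Math Avg. Score
01M001,School A,450
02M002,School B,s
13K003,School C,950
10X004,School D,
"""


def normalize_header(name):
    """Lowercase a header and collapse non-alphanumeric runs into underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")


def parse_score(value, low=200, high=800):
    """Return an int score, or None when missing, non-numeric, or out of range."""
    text = (value or "").strip()
    if not text.isdecimal():
        return None
    score = int(text)
    return score if low <= score <= high else None


def extract(text):
    """Read CSV text into dicts keyed by normalized header names."""
    reader = csv.DictReader(io.StringIO(text))
    return [{normalize_header(k): v for k, v in row.items()} for row in reader]


def transform(rows):
    """Trim text fields and turn invalid scores into None (stored as NULL)."""
    return [
        (r["dbn"].strip(), r["school_name"].strip(), parse_score(r["sat_math_avg_score"]))
        for r in rows
    ]


def load(conn, records):
    """Create the target table if needed and insert the cleaned records."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS sat_results (
            dbn TEXT PRIMARY KEY,
            school_name TEXT NOT NULL,
            sat_math_avg_score INTEGER
                CHECK (sat_math_avg_score BETWEEN 200 AND 800)
        )
        """
    )
    conn.executemany("INSERT INTO sat_results VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL connection
load(conn, transform(extract(RAW_CSV)))
loaded = conn.execute(
    "SELECT dbn, sat_math_avg_score FROM sat_results ORDER BY dbn"
).fetchall()
print(loaded)
```

Storing an out-of-range score as NULL keeps the school row in the table. Rejecting the whole row instead is an equally reasonable choice, depending on how downstream queries treat missing scores. The `CHECK` constraint repeats the range rule at the schema level, so a bad value that slips past the Python validation still fails on insert.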
| Category | Tools & Libraries |
|---|---|
| Data Exploration | Google Sheets, Pandas |
| Programming | Python (VS Code, Jupyter Notebook) |
| Database Management | PostgreSQL, SQL, psycopg2 |
| Version Control | Git & GitHub |
| Environment | Visual Studio Code |
- Strengthened understanding of ETL pipelines and data integration.
- Improved proficiency with Python for data cleaning and SQL for analysis.
- Gained hands-on experience with real-world datasets and relational databases.
- Practiced collaborative workflows using Git and GitHub (branching, PRs, and reviews).
- Developed a holistic view of how data flows through a modern analytics stack.
✅ Summary: This project provided end-to-end exposure to the modern data workflow — from raw CSV exploration to database integration — reinforcing technical and collaborative skills essential for real-world data engineering and analytics projects.