A hands-on, beginner-level data science project focused on data wrangling and cleaning in pure Python, without relying on external libraries like Pandas or NumPy.
This project demonstrates how to extract, clean, and process nested JSON data manually while simulating real-world data issues like missing values, duplicate records, and inactive users.
The goal is to build a foundational understanding of how raw, unstructured data can be cleaned, transformed, and used for analysis or recommendation logic, using only core Python.
- Python (Standard Library only)
- Jupyter Notebook
- JSON file handling
- Logic building without third-party tools
| Notebook File | Input JSON File | Purpose |
|---|---|---|
| 01_introduction.ipynb | data.json | Explore and visualize structure |
| 02_data_cleaning.ipynb | data2.json → cleaned_data2.json | Manual data cleaning |
| 03_people_you_may_know.ipynb | massive_data.json | Recommend users using mutual friend logic |
| 04_pages_you_might_like.ipynb | massive_data.json | Recommend pages using similarity logic |
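
The "mutual friend logic" named for `03_people_you_may_know.ipynb` could look roughly like the sketch below. It is not taken from the notebook: the schema (a top-level `users` list whose entries have `id` and `friends` keys) and the function name `people_you_may_know` are assumptions made for illustration, and the real `massive_data.json` may be shaped differently.

```python
def people_you_may_know(user_id, data):
    """Rank non-friends of `user_id` by how many mutual friends they share.

    Assumed (hypothetical) schema: data["users"] is a list of dicts
    with an "id" and a "friends" list of user ids.
    """
    friends_of = {u["id"]: set(u.get("friends", [])) for u in data["users"]}
    if user_id not in friends_of:
        return []

    direct = friends_of[user_id]
    mutual_counts = {}
    for friend in direct:
        # Walk each friend's friends; skip the user and existing friends.
        for candidate in friends_of.get(friend, set()):
            if candidate != user_id and candidate not in direct:
                mutual_counts[candidate] = mutual_counts.get(candidate, 0) + 1

    # Most mutual friends first; ties broken by id for a stable order.
    return sorted(mutual_counts, key=lambda uid: (-mutual_counts[uid], uid))


# Made-up sample data, not from the project's JSON files.
sample = {
    "users": [
        {"id": 1, "friends": [2, 3]},
        {"id": 2, "friends": [1, 3, 4]},
        {"id": 3, "friends": [1, 2, 4, 5]},
        {"id": 4, "friends": [2, 3]},
        {"id": 5, "friends": [3]},
    ]
}

suggestions = people_you_may_know(1, sample)
print(suggestions)  # [4, 5]
```

User 4 ranks first because they share two friends (2 and 3) with user 1, while user 5 shares only one.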
- Load and process nested JSON data
- Remove invalid users (missing names, empty connections)
- Eliminate duplicate pages and friend entries
- Generate simple friend and page recommendations using logic-based filtering
- Write cleaned data to a new JSON output file
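
The cleaning steps listed above can be sketched with the standard library alone. This is a minimal illustration under assumptions, not the notebook's actual code: the schema (`users` with `id`, `name`, `friends`, `liked_pages`; `pages` with `id`, `name`) and the helper names `clean_data` and `write_json` are hypothetical, and "empty connections" is interpreted here as a user with neither friends nor liked pages.

```python
import json


def clean_data(data):
    """Return a cleaned copy of `data` (assumed schema, see the note above)."""
    cleaned_users = []
    for user in data.get("users", []):
        name = (user.get("name") or "").strip()
        if not name:
            continue  # drop users with a missing or blank name

        # dict.fromkeys removes duplicates while keeping first-seen order.
        friends = list(dict.fromkeys(user.get("friends", [])))
        liked = list(dict.fromkeys(user.get("liked_pages", [])))
        if not friends and not liked:
            continue  # drop users with no connections at all (assumption)

        cleaned_users.append(
            {"id": user["id"], "name": name, "friends": friends, "liked_pages": liked}
        )

    # Keep only the first occurrence of each page id.
    unique_pages = {}
    for page in data.get("pages", []):
        unique_pages.setdefault(page["id"], page)

    return {"users": cleaned_users, "pages": list(unique_pages.values())}


def write_json(data, path):
    """Write `data` to `path` as indented JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)


# Made-up sample data, not from the project's JSON files.
raw = {
    "users": [
        {"id": 1, "name": "Amit", "friends": [2, 3, 2], "liked_pages": [101, 101]},
        {"id": 2, "name": "", "friends": [1], "liked_pages": []},
        {"id": 3, "name": "Priya", "friends": [], "liked_pages": []},
        {"id": 4, "name": "  Sara ", "friends": [1], "liked_pages": [102]},
    ],
    "pages": [
        {"id": 101, "name": "Python"},
        {"id": 101, "name": "Python"},
        {"id": 102, "name": "Data"},
    ],
}

cleaned = clean_data(raw)
print(json.dumps(cleaned, indent=2))
```

Calling `write_json(cleaned, "cleaned_data2.json")` would then produce the output file named in the table; the call is left out here so the sketch has no side effects.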
- JSON file handling
- Python loops, conditions, and functions
- Data cleaning logic (without Pandas)
- File I/O
- Recommendation algorithms (rule-based)
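
As one possible reading of rule-based page recommendation, the sketch below scores pages by how much the users who like them overlap with the target user. The scoring rule, the `liked_pages` key, and the function name `pages_you_might_like` are assumptions for illustration; the notebook's own similarity logic may differ.

```python
def pages_you_might_like(user_id, data):
    """Rank pages the user has not liked, weighted by shared likes.

    Assumed (hypothetical) schema: data["users"] is a list of dicts
    with an "id" and a "liked_pages" list of page ids.
    """
    likes = {u["id"]: set(u.get("liked_pages", [])) for u in data["users"]}
    if user_id not in likes:
        return []

    mine = likes[user_id]
    scores = {}
    for other_id, theirs in likes.items():
        if other_id == user_id:
            continue
        shared = len(mine & theirs)  # pages both users like
        if shared == 0:
            continue  # no overlap, so this user tells us nothing
        for page in theirs - mine:
            # Each new page earns the overlap size as its weight.
            scores[page] = scores.get(page, 0) + shared

    # Highest score first; ties broken by page id for a stable order.
    return sorted(scores, key=lambda page: (-scores[page], page))


# Made-up sample data, not from the project's JSON files.
sample_likes = {
    "users": [
        {"id": 1, "liked_pages": [101, 102]},
        {"id": 2, "liked_pages": [101, 102, 103]},
        {"id": 3, "liked_pages": [102, 104]},
        {"id": 4, "liked_pages": [105]},
    ]
}

page_suggestions = pages_you_might_like(1, sample_likes)
print(page_suggestions)  # [103, 104]
```

Page 103 ranks above 104 because user 2 shares two liked pages with user 1, while user 3 shares only one; page 105 is never suggested since user 4 has no overlap.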
🟢 Project Complete
📤 Uploaded to GitHub
📝 Can be added to resume and shared with recruiters
Your Name
Aspiring Data Scientist | BTech CSE Student | Python Enthusiast
GitHub • LinkedIn
This project shows that, even without libraries like Pandas or NumPy, data analysis and cleaning are possible with sound logic and a solid understanding of Python.