Welcome to my Data Analysis Hub, a curated collection of my analytical projects where I explore, clean, visualize, and interpret diverse datasets using Python, Pandas, Matplotlib, Seaborn, and PySpark.
This repository combines multiple standalone projects into one central hub, showcasing my journey in data wrangling, statistical exploration, feature engineering, and storytelling with data.
- Data Exploration: Basic data cleaning, descriptive statistics, and pattern discovery.
- COVID Data Exploration: Analyzing global COVID-19 datasets with trend insights and visualizations.
- World Happiness Analysis: Investigating happiness scores, socio-economic factors, and correlations.
- PySpark Data Analysis: Big data processing and exploration using Apache Spark with Python.
- Languages: Python, SQL
- Libraries: Pandas, NumPy, Matplotlib, Seaborn, Plotly
- Big Data: PySpark, Hadoop Ecosystem
- Tools: Jupyter Notebook, Google Colab
- End-to-end exploratory data analysis (EDA)
- Data cleaning and preprocessing pipelines
- Advanced visualization techniques for storytelling
- Real-world datasets with practical insights
- PySpark workflows for large-scale analysis
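As a rough illustration of the cleaning and descriptive-statistics steps listed above, here is a minimal Pandas sketch. It is not taken from any of the projects: the sample data, the column names (`country`, `score`, `gdp_per_capita`), and the `clean` helper are all hypothetical, and dropping incomplete rows is only one of several reasonable ways to handle missing values.

```python
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
raw = pd.DataFrame(
    {
        "country": ["A", "B", "B", "C", "D", None],
        "score": [7.1, 5.4, 5.4, None, 6.2, 6.0],
        "gdp_per_capita": [1.3, 0.9, 0.9, 1.1, 1.0, 1.0],
    }
)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows, then drop rows with any missing value."""
    return df.drop_duplicates().dropna().reset_index(drop=True)


cleaned = clean(raw)

# Descriptive statistics for the numeric columns.
summary = cleaned[["score", "gdp_per_capita"]].describe()

# Pearson correlation between the two numeric columns.
correlation = cleaned["score"].corr(cleaned["gdp_per_capita"])

print(summary)
print(f"score vs. gdp_per_capita correlation: {correlation:.2f}")
```

On a real dataset, the same pattern would typically start from `pd.read_csv` instead of an inline `DataFrame`, and imputing missing values may be preferable to dropping rows when data is scarce.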
Want to collaborate? Fork the repo, create a branch, and submit a PR. Suggestions are always welcome!
- Author: Hamza
- Email: a275hamza@gmail.com
- GitHub: Hamzi275
✨ Exploring data, uncovering stories, and making sense of the numbers.