Welcome to my comprehensive Data Analysis project using Python! This repository covers the full data analysis pipeline, from data importing and cleaning, through exploratory analysis and visualization, to model development and evaluation. It applies these steps to real-world datasets on car prices, laptop specifications, house sales, and medical insurance charges.
This repository contains the following datasets in .csv format:

- auto.csv, automobile.csv, module_5_auto.csv: Automotive datasets for price prediction and regression modeling.
- kc_house_data_NaN.csv: Real estate data for predicting house prices.
- medical_insurance_dataset.csv: Data for predicting insurance charges.
- laptop_pricing_dataset_mod2.csv, Laptops.csv: Laptop pricing and specification datasets for regression and classification.
- clean_df.csv: Pre-cleaned dataset used in analysis workflows.
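The importing and cleaning step can look roughly like the sketch below. It reads a tiny in-memory CSV instead of one of the files above, so the column names, the values, and the "?" missing-value marker are assumptions for illustration, not the actual schema of any dataset in this repository. It treats "?" as missing, fills missing numeric values with the column mean, and restores an integer type afterwards.

```python
import io

import pandas as pd

# Hypothetical stand-in for a raw CSV file; columns and values are invented.
raw_csv = io.StringIO(
    "price,bedrooms,sqft_living\n"
    "221900,3,1180\n"
    "538000,?,2570\n"
    "180000,2,\n"
    "604000,4,1960\n"
)

# Treat "?" as a missing-value marker while reading (empty fields are NaN by default).
df = pd.read_csv(raw_csv, na_values=["?"])

# Fill missing numeric values with each column's mean.
for col in ["bedrooms", "sqft_living"]:
    df[col] = df[col].fillna(df[col].mean())

# NaN forced "bedrooms" to float; cast it back to an integer type after imputation.
df["bedrooms"] = df["bedrooms"].round().astype(int)

print(df.isna().sum().sum())  # 0 missing values remain
print(df.dtypes)
```

Mean imputation is only one option; dropping incomplete rows or using the median are common alternatives, depending on the column and how much data is missing.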
The notebooks are grouped by stage of the analysis pipeline:

- 📥 Data Importing & Cleaning
  1. Importing_and_Understanding_data.ipynb
  2. Data-Wrangling.ipynb
  3. practice_data_wrangling_LaptopData.ipynb
- 📊 Exploratory Data Analysis (EDA)
  1. Exploratory_data_analysis_cars.ipynb
  2. Practice_Exploratory_data_analysis_Laptop.ipynb
- 🧠 Model Development
  1. Model-Development.ipynb
  2. Practice_Model_Development_Laptops.ipynb
- ✅ Model Evaluation & Refinement
  1. Model_Evaluation_and_Refinement_cars.ipynb
  2. Practice_Model_Evaluation_Laptops.ipynb
- 💼 Real-World Projects
  1. Practice_Project_Medical_Insurance.ipynb
  2. Practice_Loading Laptop_Pricing.ipynb
This repository demonstrates and practices the following key data analysis skills:

- 📥 Data Importing & Cleaning: Handling missing values, formatting issues, and inconsistent data types.
- 📊 Exploratory Data Analysis: Summarizing and visualizing data with pandas, matplotlib, and seaborn.
- 🔎 Feature Engineering: Creating new features and selecting the most informative ones for modeling.
- 📈 Model Development: Building simple linear, multiple linear, and polynomial regression models with scikit-learn.
- 🧪 Model Evaluation: Assessing models with the R² score, RMSE, MAE, and residual plots.
- 🛠️ Practical Projects: Real-world applications in medical insurance charge prediction, laptop pricing, and house price modeling.
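The model development and evaluation skills above can be sketched with scikit-learn. This is a minimal illustration on synthetic data, not code taken from the notebooks: the data-generating formula, the feature columns, and the model labels are assumptions. It fits simple linear, multiple linear, and degree-2 polynomial regression models, then reports R², RMSE, and MAE on a held-out test split.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data (an assumption for illustration): the target is linear in the
# first feature and quadratic in the second, plus Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(0, 1.0, size=200)

# Hold out 25% of the rows so every model is scored on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Each entry maps a label to (model, feature columns it uses).
models = {
    "simple linear": (LinearRegression(), [0]),
    "multiple linear": (LinearRegression(), [0, 1]),
    "polynomial (deg 2)": (
        make_pipeline(StandardScaler(), PolynomialFeatures(degree=2), LinearRegression()),
        [0, 1],
    ),
}

results = {}
for name, (model, cols) in models.items():
    model.fit(X_train[:, cols], y_train)
    pred = model.predict(X_test[:, cols])
    results[name] = {
        "R2": r2_score(y_test, pred),
        "RMSE": np.sqrt(mean_squared_error(y_test, pred)),
        "MAE": mean_absolute_error(y_test, pred),
        # Residuals (actual minus predicted) are what a residual plot displays.
        "residuals": y_test - pred,
    }
    m = results[name]
    print(f"{name:<20} R2={m['R2']:.3f}  RMSE={m['RMSE']:.3f}  MAE={m['MAE']:.3f}")
```

Because the synthetic target contains a squared term, the polynomial pipeline should score best here; plotting each model's stored residuals against its predictions (for example with matplotlib's scatter) is a quick way to see the curvature the purely linear models miss.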
Tools and libraries used:

- Python 3.x
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Scikit-learn
- Jupyter Notebook
📌 How to Use

1. Clone this repository.
2. Open the notebooks in Jupyter Notebook and run them to explore the data analysis workflows end-to-end.
📚 When Data Analysis Hands You a Lapse... "You pivot, visualize, refine — and let Python make sense of the mess." 💡
📬 Contact

Feel free to reach out for collaborations or questions!