This repository contains three data science projects focusing on prediction and analysis tasks. Each project includes datasets, Python code, and a detailed report.
Objective:
Classify Iris flowers into three species (Setosa, Versicolor, Virginica) based on petal and sepal measurements.
Dataset:
- 150 samples with features: Sepal Length, Sepal Width, Petal Length, Petal Width.
- Target column: Species.
Workflow:
- Data Loading & Exploration: Loaded and inspected dataset with pandas.
- Visualization: Created pairplots using seaborn; observed clear clusters for each species.
- Model Building: Used LogisticRegression with an 80/20 train-test split.
- Evaluation: Achieved 100% accuracy on the held-out test set (30 of the 150 samples).
- User Interaction: Added prompts to predict species for new measurements.
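The workflow above can be sketched roughly as follows. This is a minimal illustration, not the contents of iris_project.py: it assumes scikit-learn's bundled copy of the Iris data in place of Iris.csv, and the `random_state`, `stratify`, and `max_iter` settings as well as the `predict_species` helper are hypothetical choices made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumption: the bundled Iris data stands in for Iris.csv.
iris = load_iris()
X, y = iris.data, iris.target

# 80/20 train-test split; random_state and stratify are illustrative choices.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# max_iter is raised so the solver converges on unscaled features.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")


def predict_species(sepal_length, sepal_width, petal_length, petal_width):
    """Hypothetical helper: predict the species name for one flower (cm)."""
    features = [[sepal_length, sepal_width, petal_length, petal_width]]
    label = model.predict(features)[0]
    return str(iris.target_names[label])


print(predict_species(5.1, 3.5, 1.4, 0.2))
```

With only 30 test samples, the reported accuracy can shift noticeably with the random split, so a different `random_state` may give a slightly different score.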
Insights:
- Petal and sepal measurements are sufficient for accurate classification.
- The model can automate species identification in botanical applications.
Files:
- CodeAlpha_Iris_Classification/iris_project.py: Python code
- CodeAlpha_Iris_Classification/Iris.csv: Dataset
- CodeAlpha_Iris_Classification/IRIS FLOWER CLASSIFICATION (REPORT).pdf: Report
Objective:
Predict the selling price of used cars based on features like brand, year, mileage, fuel type, transmission, and ownership.
Dataset:
- 301 entries with columns: Car_Name, Year, Selling_Price, Present_Price, Driven_kms, Fuel_Type, Selling_type, Transmission, Owner.
Workflow:
- Data Loading & Cleaning: Handled missing values and cleaned columns.
- Feature Engineering: Converted categorical features using one-hot encoding.
- Model Preparation: Defined features (X) and target (y = Selling_Price); train-test split (80/20).
- Model Training: LinearRegression from scikit-learn.
- Prediction & Evaluation: MAE = 2.04, R² = 0.6; scatter plots for actual vs predicted prices.
- Visualization: Explored feature impact on prices.
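The encoding, split, and evaluation steps above might look something like the sketch below. The rows are made up purely to mimic a few columns of car_data.csv, so the printed metrics say nothing about the real dataset; `drop_first`, `dtype=int`, and `random_state` are illustrative assumptions rather than settings taken from car_price_prediction.py.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical rows that only imitate the column layout of car_data.csv.
df = pd.DataFrame({
    "Year": [2012, 2013, 2014, 2015, 2016, 2017, 2018, 2014, 2016, 2011],
    "Present_Price": [5.0, 6.5, 7.0, 8.0, 9.0, 9.5, 10.0, 4.5, 12.0, 6.0],
    "Driven_kms": [60000, 52000, 45000, 40000, 30000,
                   20000, 10000, 70000, 25000, 90000],
    "Fuel_Type": ["Petrol", "Diesel", "Petrol", "Diesel", "Petrol",
                  "Diesel", "Petrol", "Petrol", "Diesel", "Petrol"],
    "Transmission": ["Manual", "Manual", "Manual", "Automatic", "Manual",
                     "Automatic", "Manual", "Manual", "Automatic", "Manual"],
    "Selling_Price": [2.5, 3.8, 4.2, 5.5, 6.3, 7.4, 8.1, 2.6, 9.0, 2.0],
})

# One-hot encode the categorical columns; drop_first avoids redundant columns.
encoded = pd.get_dummies(
    df, columns=["Fuel_Type", "Transmission"], drop_first=True, dtype=int
)

# Features (X) and target (y = Selling_Price), then an 80/20 split.
X = encoded.drop(columns="Selling_Price")
y = encoded["Selling_Price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

mae = mean_absolute_error(y_test, predictions)
r2 = r2_score(y_test, predictions)
print(f"MAE = {mae:.2f}, R2 = {r2:.2f}")
```

MAE is expressed in the same units as Selling_Price, while R² is the share of price variance the model explains on the test set.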
Insights:
- Present_Price, Year, mileage (Driven_kms), and fuel type (Fuel_Type) significantly affect selling price.
- Useful for dealer valuations, online marketplaces, or financial assessments.
Files:
- CodeAlpha_Car_Price_Prediction/car_price_prediction.py: Python code
- CodeAlpha_Car_Price_Prediction/car_data.csv: Dataset
- CodeAlpha_Car_Price_Prediction/CAR PRICE PREDICTION (REPORT).pdf: Report
Objective:
Analyze unemployment trends in India, identify seasonal patterns, and examine Covid-19 impact.
Datasets:
- CodeAlpha_Unemployment_Analysis/Unemployment in India.csv: historical employment data
- CodeAlpha_Unemployment_Analysis/Unemployment_Rate_upto_11_2020.csv: monthly unemployment rates up to Nov 2020
Workflow:
- Data Loading & Cleaning: Converted date columns and removed missing values.
- Exploration & Visualization: Statistical summaries, time-series plots, boxplots for seasonal trends.
- Covid-19 Analysis: Compared pre-Covid and during-Covid unemployment; the average rate rose from 9.23% to 12.96%.
- Insights Generation: Identified spikes during pandemic and recurring seasonal trends.
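As a rough sketch of the cleaning and pre/during-Covid comparison described above, the snippet below uses a handful of made-up monthly values. The column names, the day-month-year date format, and the March 2020 cutoff are all assumptions for illustration; the actual CSV headers and the cutoff used in unemployment_analysis.py may differ.

```python
import pandas as pd

# Hypothetical monthly figures; column names are assumed, not the CSV headers.
df = pd.DataFrame({
    "Date": ["31-01-2020", "29-02-2020", "31-03-2020", "30-04-2020",
             "31-05-2020", "30-06-2020", "31-07-2020"],
    "Unemployment_Rate": [7.0, 8.0, 9.0, 23.0, 22.0, 11.0, None],
})

# Convert the date column and drop rows with missing values.
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")
df = df.dropna()

# Assumed boundary between the pre-Covid and during-Covid periods.
cutoff = pd.Timestamp("2020-03-01")
pre_covid_mean = df.loc[df["Date"] < cutoff, "Unemployment_Rate"].mean()
covid_mean = df.loc[df["Date"] >= cutoff, "Unemployment_Rate"].mean()
print(f"Pre-Covid mean: {pre_covid_mean:.2f}%")
print(f"During-Covid mean: {covid_mean:.2f}%")

# Average rate per calendar month, a simple starting point for seasonality.
monthly_mean = df.groupby(df["Date"].dt.month)["Unemployment_Rate"].mean()
print(monthly_mean)
```

Grouping by calendar month only reveals seasonality when the data spans several years; with a single year, the monthly averages simply echo the raw series.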
Insights:
- Sharp rise in unemployment during Covid-19.
- Seasonal patterns highlight temporary/informal employment effects.
- Results can guide policy and economic planning.
Files:
- CodeAlpha_Unemployment_Analysis/unemployment_analysis.py: Python code
- CodeAlpha_Unemployment_Analysis/Unemployment in India.csv: Dataset
- CodeAlpha_Unemployment_Analysis/Unemployment_Rate_upto_11_2020.csv: Dataset
- CodeAlpha_Unemployment_Analysis/EMPLOYMENT ANALYSIS (REPORT).pdf: Report
- Clone the repository:
git clone https://github.com/RafayImraan/codealpha_tasks.git