This repository contains solutions to three tasks involving data manipulation, visualization, and predictive modeling. Each task demonstrates specific data science techniques using Python.
Clean and analyze the `employee_data.csv` dataset.
- Remove duplicate entries.
- Handle missing values (fill them with default values or drop the rows).
- Convert the `JoiningDate` column to a proper datetime format.
- Filter out employees whose `Status` is "Resigned".
- Analyze the data (see the sketch after this list):
  - Find the average salary by department.
  - List employees who joined after 2020.
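A minimal pandas sketch of these steps. `JoiningDate` and `Status` come from the task description; column names such as `Salary`, `Department`, and `Name` are assumptions and may differ in the actual file:

```python
import pandas as pd

# Load the raw data (file name taken from the task description).
df = pd.read_csv("employee_data.csv")

# Remove duplicate entries.
df = df.drop_duplicates()

# Handle missing values: fill the salary gaps with a default,
# then drop any rows that are still incomplete.
df["Salary"] = df["Salary"].fillna(df["Salary"].median())  # assumed column
df = df.dropna()

# Convert JoiningDate to a proper datetime format.
df["JoiningDate"] = pd.to_datetime(df["JoiningDate"], errors="coerce")

# Filter out employees whose Status is "Resigned".
df = df[df["Status"] != "Resigned"]

# Average salary by department (Department is an assumed column name).
avg_salary = df.groupby("Department")["Salary"].mean()

# Employees who joined after 2020.
joined_after_2020 = df[df["JoiningDate"].dt.year > 2020]

print(avg_salary)
print(joined_after_2020[["Name", "JoiningDate"]])  # Name is an assumed column
```

Filling the salary column before dropping the remaining incomplete rows keeps more records than discarding every row with any gap.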
Refer to the script `Task1_Data Manipulation and Cleaning.ipynb`.
Expected output:
- Cleaned DataFrame.
- Average salary per department.
- List of employees who joined after 2020.
Explore a public dataset through visualizations.
Any public dataset can be used; this solution uses the Titanic dataset.
- Load the dataset into a Pandas DataFrame.
- Create four meaningful visualizations using Matplotlib or Seaborn.
  - Examples: bar plots, histograms, box plots, etc.
- Generate a correlation heatmap for numerical variables (a seaborn sketch follows this list).
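A rough sketch of the four plots, assuming the Titanic dataset bundled with seaborn (the column names `class`, `survived`, `age`, and `fare` come from that copy of the data):

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Load the Titanic dataset shipped with seaborn.
titanic = sns.load_dataset("titanic")

fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Survival rate by passenger class (barplot averages the 0/1 survived flag).
sns.barplot(data=titanic, x="class", y="survived", ax=axes[0, 0])
axes[0, 0].set_title("Survival rate by passenger class")

# Histogram of passenger ages.
sns.histplot(data=titanic, x="age", bins=30, ax=axes[0, 1])
axes[0, 1].set_title("Passenger age distribution")

# Box plot of fare distribution.
sns.boxplot(data=titanic, y="fare", ax=axes[1, 0])
axes[1, 0].set_title("Fare distribution")

# Correlation heatmap for numerical variables.
corr = titanic.select_dtypes("number").corr()
sns.heatmap(corr, annot=True, cmap="coolwarm", ax=axes[1, 1])
axes[1, 1].set_title("Correlation heatmap")

plt.tight_layout()
plt.show()
```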
Refer to the script `Task2_Data Visualization.ipynb`.
Expected output:
- Visualizations:
  - Survival rate by passenger class.
  - Histogram of passenger ages.
  - Box plot of fare distribution.
  - Correlation heatmap.
Build a classification model to predict the presence of diabetes based on health metrics.
- Data Preprocessing:
  - Replace missing or undefined values.
  - Convert categorical variables to numerical using encoding techniques.
  - Normalize or scale features if necessary.
- Model Building:
  - Split the dataset into training and testing sets.
  - Train two classification models:
    - Logistic Regression
    - Decision Tree
- Model Evaluation:
  - Evaluate models using accuracy, precision, recall, and F1 score (see the sketch after this list).
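A compact sketch of the full pipeline, assuming a Pima-style `diabetes.csv` with columns such as `Glucose`, `BloodPressure`, `BMI`, and an `Outcome` target; the file and column names are assumptions, so adjust them to the actual dataset:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Load the health-metrics data (file name is a placeholder).
df = pd.read_csv("diabetes.csv")

# Replace undefined zero readings with the column median
# (columns listed here are assumptions for a Pima-style dataset).
for col in ["Glucose", "BloodPressure", "BMI"]:
    df[col] = df[col].replace(0, df[col].median())

X = df.drop(columns="Outcome")
y = df["Outcome"]

# Split the dataset into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scale features (helps Logistic Regression; trees are scale-invariant).
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

models = {
    "Logistic Regression": (LogisticRegression(max_iter=1000), X_train_scaled, X_test_scaled),
    "Decision Tree": (DecisionTreeClassifier(random_state=42), X_train, X_test),
}

# Train each model and report accuracy, precision, recall, and F1 score.
for name, (model, X_tr, X_te) in models.items():
    model.fit(X_tr, y_train)
    preds = model.predict(X_te)
    print(
        f"{name}: accuracy={accuracy_score(y_test, preds):.3f}, "
        f"precision={precision_score(y_test, preds):.3f}, "
        f"recall={recall_score(y_test, preds):.3f}, "
        f"F1={f1_score(y_test, preds):.3f}"
    )
```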
Refer to the script `Task3_Predictive Modeling.ipynb`.
Expected output:
- Preprocessed dataset.
- Performance metrics for Logistic Regression and Decision Tree models:
  - Accuracy, Precision, Recall, F1 Score.
- Clone the repository:

```bash
git clone https://github.com/AbhinavSharma07/Coder_Roots.git
cd Coder_Roots
```