Sara_Portfolio

Data Science Portfolio

Data Analysis

Project: HR Analytics Dashboard using Power BI

Project Overview

The key to success in any organization is attracting and retaining top talent. This dashboard gives an overview of employee attrition data broken down by gender, age group, job role, and other attributes.

The dataset is available on Kaggle.

Project: Employee Turnover

Project Overview

The board of the human capital department of a large corporation is worried about relatively high turnover, and the analytics team must look into ways to reduce the number of employees leaving the company. The team needs to better understand the situation: which employees are more likely to leave, and why. Once it is clear which variables affect employee churn, the team can present its findings along with ideas on how to address the problem.

This project was part of a DataCamp competition, and the dataset was provided by DataCamp.

Machine Learning

Project 1: Rainfall prediction

Project Overview

Obtained the dataset from Kaggle and implemented three rainfall prediction models: Support Vector Regression, Random Forest Regressor, and Decision Tree Regressor.
Performed pre-processing, feature engineering, and EDA, and implemented the machine learning models in R.
Tuned the model hyperparameters to reduce prediction error.
The lowest RMSE achieved by each model across all experiments:
Decision Tree Regressor (70:30 split): 0.2859088
Random Forest Regressor (ntree=500): 0.1569539
Support Vector Regressor (Feature scaled data): 0.361147
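
The models above are compared by RMSE (root mean squared error), where lower is better. The project itself was written in R; the following is only a minimal Python sketch of how the metric is computed. The function name `rmse` and the sample values are hypothetical and do not come from the project dataset.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("inputs must not be empty")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical values, for illustration only (not from the rainfall dataset).
actual = [0.0, 1.0, 2.0, 3.0]
predicted = [0.0, 1.0, 2.0, 5.0]
print(rmse(actual, predicted))  # one error of 2 over four points -> 1.0
```

Because the errors are squared before averaging, a single large miss raises the RMSE more than several small ones.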

Project 2: USA-Housing Price Prediction

Project Overview

Implemented a Linear Regression model to predict house prices in the USA.
Checked for missing values and examined the correlation between the variables in the dataset.
Split the dataset into training and testing sets in an 80:20 ratio.
Used R^2 as the performance measure; the model scored 92.69%.
Predicted the price of a house based on user input.
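
The workflow above (80:20 split, fit, score with R^2) can be sketched in plain Python as follows. This is an illustration under simplifying assumptions, not the notebook's code: it fits a single feature with closed-form least squares, the helper names are hypothetical, and the data is synthetic.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and return (train, test) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(points, slope, intercept):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y for _, y in points) / len(points)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in points)
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    return 1 - ss_res / ss_tot


# Synthetic, perfectly linear data (y = 3x + 2), for illustration only.
points = [(float(x), 3.0 * x + 2.0) for x in range(10)]
train, test = train_test_split(points, test_fraction=0.2)
slope, intercept = fit_line(train)
print(len(train), len(test))
print(r_squared(test, slope, intercept))
```

Scoring on the held-out 20% rather than the training rows is what makes the R^2 value an estimate of how the model performs on unseen houses.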

Deep Learning

Project 1: X-Ray Image Classification using Deep Learning

Project Overview

The dataset for this project is Chest X-Ray Images (Pneumonia), published by Daniel et al. (2018) and made available on Kaggle by Paul Mooney.
Implemented simple CNN and transfer learning models using Python.
A total of five models were implemented for this project: CNN Model 1, CNN Model 2, CNN Model 3, VGG16, and VGG19.
Performed data preparation for the images (image scaling, image augmentation) and exploratory data analysis.
Fine-tuned the models to identify the best performers.
Accuracy (%) of the five models:
CNN Model 1: 94.17
CNN Model 2: 95.13
CNN Model 3: 92.48
VGG16: 82.85
VGG19: 84.45
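
The image scaling and augmentation steps mentioned above can be illustrated without any deep learning framework. The sketch below rescales pixel intensities to the range [0, 1] and applies a horizontal flip as one example of augmentation. It is an assumption that the notebooks used these exact transforms; the function names and the tiny 2x3 "image" are hypothetical.

```python
def scale_pixels(image, max_value=255):
    """Rescale integer pixel intensities to floats in [0, 1]."""
    return [[pixel / max_value for pixel in row] for row in image]


def horizontal_flip(image):
    """Mirror the image left to right (a simple augmentation)."""
    return [list(reversed(row)) for row in image]


# A hypothetical 2x3 grayscale image, for illustration only.
image = [
    [0, 51, 102],
    [153, 204, 255],
]

scaled = scale_pixels(image)
flipped = horizontal_flip(scaled)

print(scaled)
print(flipped)
```

Scaling keeps the inputs in a small, consistent range, and augmentation gives the network extra variants of each training image.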

Natural Language Processing

Project 1: SPAM SMS Classifier

Project Overview

Used an SMS collection dataset stored in text format to classify each SMS as SPAM or HAM.
Applied Natural Language Processing techniques for text pre-processing (removing stopwords, punctuation, etc.).
Implemented a Naive Bayes classification model.
The model achieved an accuracy of 97.91%.
Classified an SMS as SPAM or HAM based on user input.
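
The steps above (pre-processing, Naive Bayes, classifying new input) can be sketched in plain Python. This is a minimal multinomial Naive Bayes with add-one smoothing, written from scratch for illustration; it is not the notebook's code. The stopword list, helper names, and four training messages are all hypothetical.

```python
import math
import string
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the one used in the project).
STOPWORDS = {"a", "the", "to", "is", "you", "at", "for", "are", "now"}


def tokenize(text):
    """Lowercase, strip punctuation, and drop stopwords."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [word for word in cleaned.split() if word not in STOPWORDS]


def train(messages):
    """messages: list of (label, text) pairs with labels 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for label, text in messages:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, label_counts, vocab


def predict(text, model):
    """Return the label with the highest log-posterior score."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy messages, for illustration only.
toy_messages = [
    ("spam", "Win a free prize now!"),
    ("spam", "Free entry, claim your prize"),
    ("ham", "Are you coming to lunch?"),
    ("ham", "See you at the meeting"),
]
model = train(toy_messages)
print(predict("Claim your free prize", model))
print(predict("Lunch meeting today?", model))
```

Working in log space avoids multiplying many small probabilities together, and add-one smoothing keeps a word seen in only one class from forcing the other class's probability to zero.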

Exploratory Data Analysis

Project 1: Data Exploration on Iris Data

Project Overview

Performed exploratory data analysis on the Iris dataset using Python.
Used several Python libraries and selected the feature and label columns.
The visualizations include a pie chart, histograms, and scatter plots.
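
As a rough illustration of what a histogram computes, the sketch below bins a list of measurements and prints a text bar chart. It uses only the standard library rather than a plotting package; the function name and the sample values are hypothetical and are not taken from the Iris dataset.

```python
def histogram_counts(values, bins, low, high):
    """Count how many values fall into each of `bins` equal-width intervals."""
    width = (high - low) / bins
    counts = [0] * bins
    for value in values:
        if low <= value <= high:
            # The upper edge is placed in the last bin.
            index = min(int((value - low) / width), bins - 1)
            counts[index] += 1
    return counts


# Hypothetical measurements in cm, for illustration only.
values = [4.5, 5.0, 5.5, 6.0, 6.5, 7.0]
low, high, bins = 4.0, 7.0, 3
counts = histogram_counts(values, bins, low, high)

width = (high - low) / bins
for i, count in enumerate(counts):
    start = low + i * width
    print(f"{start:.1f}-{start + width:.1f} | {'#' * count}")
```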

Project 2: Data Exploration on 50 Startup Data

Performed exploratory data analysis on the 50 Startup dataset using Python.


Sarakaleem22/Sara_Portfolio
