Sarakaleem22/Sara_Portfolio

Sara_Portfolio

Data Science Portfolio

Data Analysis

Project Overview

The key to success in any organization is attracting and retaining top talent. This dashboard gives an overview of employee attrition data broken down by gender, age group, job role, and other attributes.

The dataset is available on Kaggle.

Project Overview

The board of the human capital department of a large corporation is worried about relatively high turnover, and the analytics team must look into ways to reduce the number of employees leaving the company. The team needs to better understand the situation: which employees are more likely to leave, and why. Once it is clear which variables affect employee churn, the team can present its findings along with ideas on how to address the problem.

This project was part of a DataCamp competition, and the dataset was provided by DataCamp.

Machine Learning

Project Overview

  • Obtained the dataset from Kaggle and implemented three rainfall prediction models: Support Vector Regression, Random Forest Regressor, and Decision Tree Regressor.
  • Performed pre-processing, feature engineering, and EDA, and implemented the machine learning models in R.
  • Tuned the hyperparameters of each model to reduce prediction error.
  • The lowest RMSE achieved by each model across all experiments:
      • Decision Tree Regressor (70:30 split): 0.2859088
      • Random Forest Regressor (ntree=500): 0.1569539
      • Support Vector Regressor (feature-scaled data): 0.361147
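
The models above are compared by RMSE (root mean squared error), where lower is better. The following is a minimal sketch of that comparison, written in Python for illustration even though the project itself used R. The rainfall values and the names `model_a` and `model_b` are hypothetical and do not come from the project's dataset.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical observed rainfall values (not from the project's dataset).
actual = [0.0, 1.2, 0.4, 3.1]

# Hypothetical predictions from two candidate models.
model_predictions = {
    "model_a": [0.1, 1.0, 0.5, 2.9],
    "model_b": [0.5, 0.6, 1.0, 2.0],
}

# Score every model and pick the one with the smallest error.
scores = {name: rmse(actual, preds) for name, preds in model_predictions.items()}
best_model = min(scores, key=scores.get)
```

Because RMSE is expressed in the units of the target variable, the scores are only comparable when every model is evaluated on the same target scale.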

Project Overview

  • Implemented a Linear Regression model to predict house prices in the USA.
  • Checked for missing values and examined the correlation between the variables in the dataset.
  • Split the dataset into training and testing sets in an 80:20 ratio.
  • Used R^2 as the performance measure and obtained a score of 92.69%.
  • Predicted the price of a house based on user input.
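
The workflow above (80:20 split, least-squares fit, R^2 on the held-out data) can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's code: it uses a single hypothetical feature (`areas`), synthetic prices, and a hand-written fit, whereas the project most likely used several features and a library implementation.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Synthetic data (assumption): price grows linearly with area, plus noise.
rng = random.Random(0)
areas = [rng.uniform(500, 3500) for _ in range(100)]
prices = [150.0 * a + 50000.0 + rng.gauss(0, 20000) for a in areas]

# Shuffle, then split 80:20 into training and testing rows.
rows = list(zip(areas, prices))
rng.shuffle(rows)
split = int(0.8 * len(rows))
train, test = rows[:split], rows[split:]

# Fit on the training rows only, then score on the unseen test rows.
slope, intercept = fit_line([a for a, _ in train], [p for _, p in train])
test_pred = [slope * a + intercept for a, _ in test]
score = r_squared([p for _, p in test], test_pred)
```

Scoring on the held-out 20% rather than on the training rows gives a less optimistic estimate of how the model behaves on new houses.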

Deep Learning

Project Overview

  • The dataset for this project is Chest X-Ray Images (Pneumonia), published by Kermany et al. (2018) and made available on Kaggle by Paul Mooney.
  • Implemented simple CNN and transfer learning models using Python.
  • Implemented five models in total: CNN Model 1, CNN Model 2, CNN Model 3, VGG16, and VGG19.
  • Performed data preparation for the images (image scaling, image augmentation) and Exploratory Data Analysis.
  • Fine-tuned the models to identify the best-performing ones.
  • Accuracy (%) of the five models:
      • CNN Model 1: 94.17
      • CNN Model 2: 95.13
      • CNN Model 3: 92.48
      • VGG16: 82.85
      • VGG19: 84.45
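
The image scaling and augmentation steps mentioned above can be illustrated with a small sketch. The project does not state which augmentations it applied, so the horizontal flip here is only an assumed example, and the tiny nested-list `image` stands in for a real X-ray loaded through an imaging library.

```python
def scale_pixels(image):
    """Rescale 8-bit pixel values (0-255) to floats in the range [0, 1]."""
    return [[pixel / 255.0 for pixel in row] for row in image]


def horizontal_flip(image):
    """Mirror the image left to right, a common augmentation."""
    return [list(reversed(row)) for row in image]


# Hypothetical 2x3 grayscale image (not taken from the dataset).
image = [
    [0, 128, 255],
    [64, 32, 16],
]

scaled = scale_pixels(image)
flipped = horizontal_flip(scaled)
```

Scaling keeps the inputs in a small, consistent range for the network, while augmentation creates extra training variations from the same source images.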

Natural Language Processing

Project Overview

  • Used an SMS collection dataset stored in text format to classify each SMS as SPAM or HAM.
  • Applied Natural Language Processing techniques for text pre-processing (removing stopwords, punctuation, etc.).
  • Implemented a Naive Bayes classification model.
  • The model achieved a high accuracy of 97.91%.
  • Classified an SMS as SPAM or HAM based on user input.
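
The pipeline above (pre-processing, Naive Bayes training, classification of new input) can be sketched in a few lines. This is a minimal multinomial Naive Bayes with add-one smoothing, written from scratch for illustration; the project likely relied on a library implementation, and the stopword list and training messages below are hypothetical.

```python
import math
import string
from collections import Counter

# Tiny illustrative stopword list (assumption, not the project's list).
STOPWORDS = {"a", "the", "to", "is", "you", "and", "for", "at", "i", "are", "your"}


def tokenize(text):
    """Lowercase, strip punctuation, and drop stopwords."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [word for word in cleaned.split() if word not in STOPWORDS]


def train(messages):
    """Count words per label from (label, text) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for label, text in messages:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocabulary = model
    total = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        log_prob = math.log(label_counts[label] / total)  # class prior
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for word in tokenize(text):
            if word in vocabulary:  # ignore words never seen in training
                log_prob += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = log_prob
    return max(scores, key=scores.get)


# Hypothetical training messages (not from the SMS dataset).
training = [
    ("spam", "Win a free prize now!"),
    ("spam", "Free cash, claim your prize today"),
    ("ham", "Are you coming to lunch today?"),
    ("ham", "See you at the meeting tomorrow"),
]
model = train(training)
```

For example, `classify("Claim your free prize", model)` labels the message as spam, because every remaining token appears more often in the spam messages than in the ham messages.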

Exploratory Data Analysis

Project Overview

  • Performed Exploratory Data Analysis on the Iris dataset using Python.
  • Used several Python libraries and selected the feature and label columns.
  • Visualized the data with pie charts, histograms, and scatter plots.
  • Performed Exploratory Data Analysis on the 50 Startup dataset using Python.

Profiles
