Data Science Portfolio
The key to success in any organization is attracting and retaining top talent. This dashboard gives an overview of employee attrition data broken down by gender, age group, job role, and other attributes.
The dataset is available on Kaggle.
The board of the human capital department of a large corporation is worried about relatively high turnover, and the analytics team must look into ways to reduce the number of employees leaving the company. The team needs to better understand the situation: which employees are more likely to leave, and why. Once it is clear which variables affect employee churn, the team can present its findings along with ideas on how to address the problem.
This project was part of a DataCamp competition, and the dataset was provided by DataCamp.
- Obtained the dataset from Kaggle and implemented three rainfall prediction models: Support Vector Regression, Random Forest Regressor, and Decision Tree Regressor.
- Performed pre-processing, feature engineering, and EDA, and implemented the machine learning models in R.
- Tuned the models' hyperparameters for better accuracy.
- The lowest RMSE obtained for each of the three models across all experiments is as follows:
- Decision Tree Regressor (70:30 split): 0.2859088
- Random Forest Regressor (ntree=500): 0.1569539
- Support Vector Regressor (Feature scaled data): 0.361147
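The RMSE figures above compare predicted and observed values, and a lower value means a closer fit. The project itself was written in R; the following is only a minimal Python sketch of the metric, and the `observed` and `forecast` values are made up for illustration rather than taken from the rainfall dataset.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("inputs must not be empty")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical values, for illustration only (not from the project data).
observed = [0.0, 1.2, 0.4, 3.1]
forecast = [0.1, 1.0, 0.5, 2.9]
print(f"RMSE: {rmse(observed, forecast):.4f}")
```

Because RMSE is expressed in the units of the target variable, scores are only comparable between models evaluated on the same target and the same scaling.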
- Implemented a Linear Regression model to predict house prices in the USA.
- Checked for missing values and examined the correlations between the variables in the dataset.
- Split the dataset into training and testing sets in an 80:20 ratio.
- Used R^2 as the performance measure and obtained a score of 92.69%.
- Predicted the price of a house based on user input.
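The workflow above (an 80:20 split, a linear fit, and an R^2 score on the held-out data) can be sketched in plain Python. This is a simplified illustration, not the project's code: it assumes a single hypothetical feature (`area`) and synthetic prices, whereas the original model was presumably fitted on several columns of the real dataset. The helper names are also made up for this example.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and return (train, test) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(rows):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(rows, slope, intercept):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y for _, y in rows) / len(rows)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in rows)
    ss_tot = sum((y - mean_y) ** 2 for _, y in rows)
    return 1 - ss_res / ss_tot


# Synthetic (area, price) pairs: a linear trend plus Gaussian noise.
rng = random.Random(42)
data = [(area, 150.0 * area + 20000.0 + rng.gauss(0, 5000))
        for area in range(500, 3000, 25)]

train, test = train_test_split(data, test_fraction=0.2)
slope, intercept = fit_line(train)
print(f"train={len(train)} test={len(test)}")
print(f"R^2 on held-out data: {r_squared(test, slope, intercept):.4f}")
```

Scoring on the held-out 20% rather than on the training rows gives a fairer estimate of how the model behaves on unseen houses.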
- The dataset for this project is the Chest X-Ray Images (Pneumonia) collection, published by Daniel et al. (2018) and made available on Kaggle by Paul Mooney.
- Implemented simple CNN and transfer learning models in Python.
- A total of five models were implemented for this project: CNN Model 1, CNN Model 2, CNN Model 3, VGG16, and VGG19.
- Performed data preparation for the images (image scaling, image augmentation) and exploratory data analysis.
- Fine-tuned the models to identify the best performers.
- Accuracy for the five models is as follows:
- CNN Model 1: 94.17
- CNN Model 2: 95.13
- CNN Model 3: 92.48
- VGG16: 82.85
- VGG19: 84.45
- Used an SMS collection dataset stored in text format to classify each SMS as SPAM or HAM.
- Used Natural Language Processing techniques for text pre-processing (removing stopwords, punctuation, etc.).
- Implemented Naive Bayes Classification Model.
- The model reached an accuracy of 97.91%.
- Classified the SMS as SPAM or HAM based on user input.
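The pipeline described above (strip punctuation and stopwords, then classify with Naive Bayes) can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: the `STOPWORDS` set is a tiny stand-in for a real stopword list, the training messages are invented, and the `NaiveBayes` class is a hand-rolled multinomial variant with add-one smoothing rather than whatever library the project actually used.

```python
import math
import string
from collections import Counter

# Tiny illustrative stopword list (an assumption; real projects use a fuller one).
STOPWORDS = {"a", "an", "the", "to", "is", "you", "your", "for",
             "at", "on", "in", "and", "are", "we", "i"}


def tokenize(text):
    """Lowercase the text, strip punctuation, and drop stopwords."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [word for word in cleaned.split() if word not in STOPWORDS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, messages, labels):
        self.label_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.label_counts}
        for text, label in zip(messages, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        tokens = [w for w in tokenize(text) if w in self.vocab]  # skip unseen words
        scores = {}
        for label, counts in self.word_counts.items():
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.label_counts[label] / total_docs)  # log prior
            for word in tokens:
                score += math.log((counts[word] + 1) / denom)  # log likelihood
            scores[label] = score
        return max(scores, key=scores.get)


# Invented training messages, for illustration only.
train_messages = [
    "Win a free prize now!",
    "Free cash, claim your reward today",
    "Urgent: claim your free voucher",
    "Are we meeting for lunch today?",
    "I will call you at home tonight",
    "See you at the office tomorrow",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = NaiveBayes().fit(train_messages, train_labels)
print(model.predict("Claim your free prize"))
print(model.predict("Lunch at the office tomorrow?"))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, and the add-one smoothing keeps a single unseen word from forcing a class probability to zero.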
- Performed exploratory data analysis on the Iris dataset using Python.
- Used several Python libraries and selected the feature and label columns.
- The visualizations include pie charts, histograms, and scatter plots.
- Performed exploratory data analysis on the 50 Startup dataset using Python.