Minsifye/portfolio

Monika Bagyal

Data Science Portfolio

Sparkify is a fictional music streaming service like Spotify or Pandora. I use Sparkify churn prediction as the problem statement. The main findings of the code can be found in the Medium post available here.

Future Work

  • In the future, I can split the dataset into monthly or weekly windows and predict which customers will churn in the following month or week.
  • I can use more advanced machine learning techniques, such as combining two or more algorithms, to improve the overall prediction rate.
  • I can run this on an AWS cluster to evaluate model performance and use cross-validation to get a better F1 score.
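As a rough illustration of the cross-validated F1 score mentioned above, here is a minimal, dependency-free sketch. It is not the project's code: the helper names (`f1_score`, `k_fold_indices`, `cross_validated_f1`) and the `fit`/`predict` callables are hypothetical, and labels are assumed to be 1 for churned users and 0 otherwise.

```python
from statistics import mean


def f1_score(y_true, y_pred):
    """F1 for binary labels, where 1 marks a churned user (assumed encoding)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_validated_f1(features, labels, fit, predict, k=3):
    """Average F1 over k folds; fit and predict are caller-supplied callables."""
    scores = []
    for train, test in k_fold_indices(len(labels), k):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        predictions = [predict(model, features[i]) for i in test]
        scores.append(f1_score([labels[i] for i in test], predictions))
    return mean(scores)
```

Averaging over folds gives a steadier estimate than a single train/test split, which matters when churned users are a small share of the data. The folds here are contiguous; shuffling or stratifying them is left out for brevity.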

Analyze disaster data from Figure Eight and build a model for an API that classifies disaster messages.

  • The goal of this project is to start from scratch with a dataset, create an ETL pipeline for the data engineering work, and create a machine learning pipeline that trains a model to read text data and predict 36 classification categories.
  • At the end, use the trained and tuned ML model to predict which disaster categories any new message fits.
  • Create a front-end application using Flask to showcase visualizations and the model's disaster category predictions on a web page.
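The ETL step described above might look roughly like the sketch below. It assumes, purely for illustration, that each message arrives with its categories packed into one string of `name-flag` pairs such as `related-1;request-0`; that layout, the table name `messages`, and the helper names are assumptions rather than details taken from the project.

```python
import sqlite3


def parse_categories(raw):
    """Turn 'related-1;request-0' into {'related': 1, 'request': 0}."""
    flags = {}
    for pair in raw.split(";"):
        # rpartition keeps names that contain hyphens or underscores intact
        name, _, value = pair.rpartition("-")
        flags[name] = int(value)
    return flags


def load_messages(rows, conn):
    """Store (message, raw_categories) rows with one integer column per category."""
    rows = list(rows)
    names = list(parse_categories(rows[0][1]))
    columns = ", ".join(f'"{name}" INTEGER' for name in names)
    conn.execute(f"CREATE TABLE messages (message TEXT, {columns})")
    placeholders = ", ".join("?" for _ in range(len(names) + 1))
    for message, raw in rows:
        flags = parse_categories(raw)
        values = [message] + [flags[name] for name in names]
        conn.execute(f"INSERT INTO messages VALUES ({placeholders})", values)
    conn.commit()
    return names
```

With one 0/1 column per category, the training step can treat the problem as multi-label classification: a single message may belong to several of the 36 categories at once.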

Analyze the interactions that users have with articles on the IBM Watson Studio platform, and make recommendations to them about new articles you think they will like.

This project is divided into the following tasks:

  • I. Exploratory Data Analysis
  • II. Rank Based Recommendations
  • III. User-User Based Collaborative Filtering
  • V. Matrix Factorization
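To make the rank-based and user-user tasks above more concrete, here is a small sketch in plain Python. It is not the project's notebook code: the data layout (a dict mapping each user to the set of articles they interacted with) and the function names are assumptions, and similarity is taken to be the number of shared articles.

```python
from collections import Counter


def top_articles(interactions, n):
    """Rank-based: the n articles that the most users interacted with."""
    counts = Counter(a for articles in interactions.values() for a in articles)
    # Sort by count (descending), then by article id so ties are deterministic
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [article for article, _ in ranked[:n]]


def similar_users(interactions, user):
    """Other users ordered by how many articles they share with `user`."""
    seen = interactions[user]
    scores = [
        (len(seen & other_seen), other)
        for other, other_seen in interactions.items()
        if other != user
    ]
    scores.sort(key=lambda score: (-score[0], score[1]))
    return [other for overlap, other in scores if overlap > 0]


def user_user_recs(interactions, user, n):
    """Recommend up to n unseen articles, taken from the most similar users first."""
    seen = interactions[user]
    recs = []
    for other in similar_users(interactions, user):
        for article in sorted(interactions[other] - seen):
            if article not in recs:
                recs.append(article)
            if len(recs) == n:
                return recs
    return recs
```

Rank-based recommendations suit brand-new users with no history, while the user-user approach needs at least a few interactions before it can find neighbours.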

I used the CRISP-DM process during this analysis.

  1. Business Understanding - I started the analysis with the posed questions in mind.
  2. Data Understanding - To better understand the data, I went through the dataset and noted how to use it for my analysis. For example: which columns will help answer a particular question?
  3. Prepare Data - At various points, I had to wrangle and transform the data to get the results. Keeping DRY principles in mind, I also created a function to draw Plotly bar charts, since this code was repeated often.
  4. Model Data - My analysis does not involve a modeling step. I might add one in future work.
  5. Results - I use visualizations such as bar charts and pie charts to convey my findings, and I added result statements after every visualization to make the thought process easy to follow.
  6. Deploy - I am not deploying this code anywhere right now. For now, it is available only as a Jupyter notebook.
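The reusable bar chart helper mentioned under Prepare Data could look something like the sketch below. The actual function is not shown here, so the name `bar_chart_spec` and its parameters are hypothetical. To stay dependency-free, the helper returns a plain figure dictionary in Plotly's figure format instead of importing Plotly itself.

```python
def bar_chart_spec(categories, values, title, x_label, y_label):
    """Build a Plotly-style figure dict for a simple bar chart."""
    categories = list(categories)
    values = list(values)
    if len(categories) != len(values):
        raise ValueError("categories and values must have the same length")
    return {
        "data": [{"type": "bar", "x": categories, "y": values}],
        "layout": {
            "title": {"text": title},
            "xaxis": {"title": {"text": x_label}},
            "yaxis": {"title": {"text": y_label}},
        },
    }
```

If Plotly is installed, a dictionary like this can be rendered with `plotly.io.show(spec)`. Keeping the chart construction in one function means every chart in the notebook shares the same layout choices.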

The main findings of the code can be found in the Medium post available here.

  • Natural Language Processing on Online Reviews Data, a BigView idea. Created at the Pearl Hackathon event at UNC Chapel Hill.
  • Simplifying daily-life buying decisions by getting insights from online reviews faster.
  • Identifying Suspicious Activities in Financial Data: this project explores how suspicious activities can be caught by applying a supervised learning algorithm to existing customer data in the compliance department of a bank or financial institution.
  • The results show that false positives can be reduced using supervised machine learning algorithms, because these algorithms can differentiate between regular and suspicious patterns of customer activity. This project was created for ITCS 6156 - Machine Learning at UNC Charlotte. Created Dec 2019.
  • In this project, I implemented an image classification application using a deep learning model on a dataset of images.
  • First I trained the model to classify new images in a Jupyter notebook, then converted it into a Python application that runs from the command line. A Udacity Data Scientist Nanodegree project, Term 1. Created Mar 2019.
  • The data and design for this project were provided by Arvato Financial Services. I applied unsupervised learning techniques to demographic and spending data for a sample of German households.
  • I preprocessed the data, applied dimensionality reduction techniques, and implemented clustering algorithms to segment customers, with the goal of optimizing customer outreach for a mail-order company. A Udacity Data Scientist Nanodegree project, Term 1. Created Mar 2019.
  • CharityML is a fictitious charity organization that provides financial support for people learning machine learning. To improve donor outreach effectiveness, I built an algorithm that best identifies potential donors.
  • My goal was to evaluate and optimize several different supervised learners to determine which algorithm provides the highest donation yield. A Udacity Data Scientist Nanodegree project, Term 1. Created Jan 2019.
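For the customer segmentation project listed above, the clustering step can be illustrated with a small k-means sketch. This is an assumption-laden toy, not the project's code: the listing does not say which clustering algorithm was used, the function name `kmeans` is hypothetical, and the caller supplies the starting centroids so the run is deterministic.

```python
def kmeans(points, centroids, iterations=10):
    """Cluster points (tuples of numbers) around the given starting centroids."""

    def nearest(point, centers):
        # Index of the center with the smallest squared Euclidean distance
        return min(
            range(len(centers)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centers[i])),
        )

    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            clusters[nearest(point, centroids)].append(point)
        updated = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                # New centroid is the per-dimension mean of the cluster
                updated.append(tuple(sum(dim) / len(cluster) for dim in zip(*cluster)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # assignments have stabilised
        centroids = updated
    labels = [nearest(point, centroids) for point in points]
    return centroids, labels
```

In a segmentation workflow, each resulting label would correspond to one customer segment. The points would typically be the reduced features produced by the dimensionality reduction step, not the raw columns.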

Medium