This repository has been archived by the owner on Jan 14, 2022. It is now read-only.
Dolph Mathews edited this page Nov 8, 2018 · 7 revisions

Short Name

Build a Customer Churn Predictor using Watson Studio and Jupyter Notebooks

Short Description

Use IBM Watson Studio to work through the whole data science pipeline, solving a business problem by predicting customer churn with the Telco customer churn dataset.

Offering Type

Cognitive

Introduction

This journey will walk you through the full cycle of a data science project. You will begin by understanding the business perspective of the problem, which here is customer churn. Then, you will use the available dataset to gain insights and build a predictive model for use on future data. Finally, you will deploy the model into production and use it to score data collected from a user interface.

Author

by Heba El-Shimy and Scott D'Angelo

Code

Demo

Video

Overview

Customer churn is one of the most basic factors in determining the revenues of a business. You need to know which of your customers are loyal and which are at risk of churning, and you need to know the factors that affect these decisions from a customer perspective. In this journey, you will build a machine learning model and use it to predict whether a customer will churn or not. This is a full data science project from A to Z. You can use the findings of your model later for prescriptive analysis or for targeted marketing.

When the reader has completed this journey, they will understand how to:

  • Use Jupyter Notebooks to load, visualize, and analyze data
  • Run Notebooks in IBM Watson Studio
  • Load data from IBM Cloud Object Storage
  • Build, test and compare different machine learning models using Scikit-Learn
  • Deploy a selected machine learning model to production using Watson Studio
  • Create a front-end application that interfaces with the client and consumes your deployed model
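
As an illustration of the "build, test and compare" objective, here is a minimal sketch of comparing two Scikit-Learn classifiers with cross-validation. It is not the notebook's code: the data is synthetic, and the helper names `make_toy_churn_data` and `compare_models`, the two features, and the choice of candidate models are all assumptions made for this example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_toy_churn_data(n_rows=400, seed=0):
    """Generate hypothetical churn data (NOT the Telco dataset)."""
    rng = np.random.default_rng(seed)
    tenure = rng.integers(1, 73, size=n_rows)        # months as a customer
    monthly = rng.uniform(20.0, 120.0, size=n_rows)  # monthly charges
    # Assumption: short tenure and high charges make churn more likely.
    logit = 0.03 * (monthly - 70.0) - 0.06 * (tenure - 36)
    prob = 1.0 / (1.0 + np.exp(-logit))
    churn = (rng.uniform(size=n_rows) < prob).astype(int)
    features = np.column_stack([tenure, monthly])
    return features, churn


def compare_models(features, labels, cv=5):
    """Return the mean cross-validated accuracy of each candidate model."""
    candidates = {
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    }
    return {
        name: float(
            cross_val_score(model, features, labels, cv=cv, scoring="accuracy").mean()
        )
        for name, model in candidates.items()
    }


toy_features, toy_labels = make_toy_churn_data()
cv_scores = compare_models(toy_features, toy_labels)
# Print the candidates from best to worst mean accuracy.
for name, score in sorted(cv_scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean CV accuracy = {score:.3f}")
```

Accuracy is only one possible yardstick. On an imbalanced churn dataset, a metric such as recall or ROC AUC (passed through the `scoring` argument) may be a better basis for choosing the model to deploy.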

Flow

  1. Understand the business problem.
  2. Load the provided notebook into the Watson Studio platform.
  3. Load the Telco customer churn dataset into the Jupyter Notebook.
  4. Describe, analyze and visualize data in the notebook.
  5. Preprocess the data, build machine learning models and test them.
  6. Deploy a selected machine learning model into production.
  7. Interact with and consume your model using a front-end application.
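
Steps 3 to 5 of the flow above can be sketched in a few lines of Python. This is a rough sketch under stated assumptions, not the notebook itself: the rows are generated in memory instead of being read from IBM Cloud Object Storage, the column names (`tenure`, `MonthlyCharges`, `Contract`, `Churn`) are only loosely modelled on the Telco dataset, and the helpers `load_toy_frame` and `preprocess` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def load_toy_frame(n_rows=300, seed=42):
    """Step 3 stand-in: build a synthetic frame instead of loading a CSV."""
    rng = np.random.default_rng(seed)
    contract = rng.choice(["Month-to-month", "One year", "Two year"], size=n_rows)
    tenure = rng.integers(1, 73, size=n_rows)
    monthly = rng.uniform(20.0, 120.0, size=n_rows).round(2)
    # Assumption: month-to-month customers churn more often.
    churn_prob = np.where(contract == "Month-to-month", 0.6, 0.15)
    churn = np.where(rng.uniform(size=n_rows) < churn_prob, "Yes", "No")
    return pd.DataFrame(
        {"tenure": tenure, "MonthlyCharges": monthly,
         "Contract": contract, "Churn": churn}
    )


def preprocess(df):
    """Step 5a: one-hot encode the categorical column, binarize the target."""
    features = pd.get_dummies(
        df.drop(columns=["Churn"]), columns=["Contract"], dtype=float
    )
    target = (df["Churn"] == "Yes").astype(int)
    return features, target


churn_df = load_toy_frame()

# Step 4: describe the data before modelling.
print(churn_df.describe())
print(churn_df["Churn"].value_counts(normalize=True))

# Step 5b: hold out a test set, fit a model, and evaluate it.
X, y = preprocess(churn_df)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"held-out accuracy: {test_accuracy:.3f}")
```

With a real dataset, the call to `load_toy_frame` would be replaced by whatever loading code your Watson Studio project provides for the stored CSV file; the describe, preprocess, and train/test steps would keep the same shape.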

Included Components

  • IBM Watson Studio: Analyze data using RStudio, Jupyter, and Python in a configured, collaborative environment that includes IBM value-adds, such as managed Spark.

Featured Technologies

  • Jupyter Notebooks: An open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and explanatory text.
  • Pandas: An open source library providing high-performance, easy-to-use data structures, and data analysis tools for the Python programming language.
  • Seaborn: A Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.
  • Scikit-Learn: Machine Learning in Python. Simple and efficient tools for data mining and data analysis.
  • Watson Machine Learning Client: A Python library for working with the Watson Machine Learning service on IBM Cloud. Use it to train, test, and deploy your models as APIs for application development, and to share them with colleagues.
  • NodeJS: A JavaScript runtime built on Chrome's V8 JavaScript engine, used for building full-stack JavaScript web applications.
  • ExpressJS: A minimal and flexible Node.js web application framework that provides a robust set of features for web and mobile applications.

Blog

In today's world, many businesses collect vast amounts of data to gain an advantage in the market. One use of that data is tracking customers' behavior and finding the patterns that may lead to them churning. This shows companies where to spend on marketing campaigns, offers, service enhancements, and so on.

Usually, this process is not straightforward and requires several teams to be involved and to cooperate. Business executives understand the problem and the needed outcomes. IT specialists store and maintain the data. Data scientists access the data, collecting it from the multiple places where it may reside. Engineers deploy the models and maintain them in production with the help of the data scientists, who in turn communicate the findings and results back to the business executives so they can make informed decisions.

This full cycle is called CRISP-DM (Cross-Industry Standard Process for Data Mining), and it is widely used when applying data science to business problems. In this pattern, you will learn how to use this methodology in practice and apply it to a real-world scenario using IBM Watson Studio.

Links