This repository has been archived by the owner on Jul 22, 2024. It is now read-only.
Steve Martinelli edited this page Mar 6, 2018 · 6 revisions

Short Name

Analyze open medical data sets to gain insights

Short Description

Use Machine Learning to Predict U.S. Opioid Prescribers with DSX and Scikit Learn

Offering Type

Cognitive & Data Analytics

Introduction

This pattern explores a dataset that covers opioid overdose deaths by state alongside individual physicians: their credentials, their specialties, whether they prescribed opioids in 2014, and the specific drugs they prescribed. Follow along to explore the data in a DSX notebook, visualize a few initial findings with PixieDust, and then use scikit-learn to train several machine learning models and evaluate which give the most accurate predictions of opioid prescribing.

Author

Code

Overview

This pattern shows how to use scikit-learn and Python in DSX to predict opioid prescribers from a 2014 Kaggle dataset. It was created for data scientists and data enthusiasts who are interested in social justice or health issues, or who are new to DSX and machine learning. It guides the user through exploring the data, cleaning it, training models, and evaluating them.

The user will learn:

  • How to use DSX.
  • How to explore multiple dataframes.
  • How to visualize explorations.
  • How to clean the data using Python and pandas.
  • How to build several machine learning models to predict a target variable.
  • How to evaluate the models' performance.
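
The exploring and cleaning steps listed above can be sketched with pandas roughly as follows. This is a minimal illustration, not the pattern's actual notebook code: the two small tables are invented toy data, and the column names (`State`, `Gender`, `Specialty`, `Opioid.Prescriber`, `Abbrev`, `Deaths`) are assumptions about how the prescriber and overdose files might be laid out.

```python
import pandas as pd

# Toy stand-ins for the two tables. Every value and column name here is an
# illustrative assumption, not data taken from the real Kaggle files.
prescribers = pd.DataFrame({
    "NPI": [1001, 1002, 1003, 1004],
    "Gender": ["M", "F", "F", None],
    "State": ["TX", "OH", "ZZ", "CA"],
    "Specialty": ["Family Practice", "Dentist",
                  "Nurse Practitioner", "Family Practice"],
    "Opioid.Prescriber": [1, 0, 1, 1],
})
overdoses = pd.DataFrame({
    "Abbrev": ["TX", "OH", "CA"],
    "Deaths": ["1,200", "950", "3,400"],  # counts stored as text with commas
})

# Explore: peek at the rows and count prescribers per specialty.
print(prescribers.head())
print(prescribers["Specialty"].value_counts())


def clean(prescribers, overdoses):
    """Return one merged, numeric-friendly table built from both inputs."""
    overdoses = overdoses.copy()
    # Turn "1,200" style strings into integers.
    overdoses["Deaths"] = (
        overdoses["Deaths"].str.replace(",", "", regex=False).astype(int)
    )
    # Keep only rows whose state appears in the overdose table,
    # then drop rows with a missing gender.
    known_state = prescribers["State"].isin(overdoses["Abbrev"])
    cleaned = prescribers[known_state].dropna(subset=["Gender"]).copy()
    # Encode the categorical gender column as integers for modeling.
    cleaned["Gender"] = cleaned["Gender"].map({"M": 0, "F": 1})
    # Attach the state-level death counts to each prescriber row.
    return cleaned.merge(overdoses, left_on="State", right_on="Abbrev", how="left")


merged = clean(prescribers, overdoses)
print(merged)
```

In this sketch, rows with an unrecognized state code or a missing gender are dropped rather than imputed. That is only one possible choice, and the right one depends on how much data would be lost.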

Flow

  1. Log into IBM's DSX service.
  2. Upload the data as a data asset into DSX.
  3. Start a notebook in DSX and input the data asset previously created.
  4. Explore the data with pandas.
  5. Create data visualizations with PixieDust.
  6. Train machine learning models with scikit-learn.
  7. Evaluate their prediction performance.
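
The last two steps of the flow (train several models, then compare how well they predict) might look something like the sketch below. The source does not say which models the pattern trains, so the three classifiers are assumptions chosen for illustration. The data is a synthetic stand-in generated with `make_classification`, not the opioid dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary-classification data standing in for the cleaned
# prescriber features (X) and the opioid-prescriber label (y).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Hold out a quarter of the rows so each model is scored on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Candidate models; this particular selection is an assumption.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "k_nearest_neighbors": KNeighborsClassifier(n_neighbors=5),
}

# Fit each model on the training split and record its test accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

# Report the models from most to least accurate.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")

best = max(scores, key=scores.get)
print("Most accurate on the held-out split:", best)
```

Accuracy is used here because the pattern talks about the "most accurate predictions". If the classes turn out to be imbalanced, a metric such as precision, recall, or F1 may be more informative.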

Included Components

  • IBM Data Science Experience: Analyze data using RStudio, Jupyter, and Python in a configured, collaborative environment that includes IBM value-adds, such as managed Spark.
  • Watson Analytics: Watson Analytics guides analysis with automated data visualization and discovery so you can uncover insights on your own.

Featured technologies

  • Data Science: Systems and scientific methods to analyze structured and unstructured data in order to extract knowledge and insights.
  • Python: Python is a programming language that lets you work more quickly and integrate your systems more effectively.
  • pandas: A Python library providing high-performance, easy-to-use data structures.

Blog

Links