Home

Short Name

Analyze open medical data sets to gain insights

Short Description

Use Machine Learning to Predict U.S. Opioid Prescribers with DSX and Scikit Learn

Offering Type

Cognitive & Data Analytics

Introduction

This pattern dives into a dataset which looks at opioid overdose deaths by state as well as different, unique physicians, their credentials, specialties, whether or not they've prescribed opioids in 2014 as well as the specific prescriptions they've prescribed. Follow along to see how to explore the data in a DSX notebook, visualize a few initial findings using Pixie Dust, and then use scikit learn to use machine learning to train several models and evaluate which have the most accurate predictions of opioid prescriptions.

Author

Madison J. Myers

Code

https://github.com/IBM/predict-opioid-prescribers

Overview

A pattern focusing on how to use scikit learn and python (in DSX) to predict opioid prescribers based off of a 2014 kaggle dataset. This code pattern was created for data scientists and data lovers who are interested in social justice issues, health issues and/or those who are new to DSX and machine learning. This will guide the user through exploring data, cleaning data, training models and evaluating them.

The user will learn:

How to use DSX.
How to explore multiple dataframes.
How to visualize explorations.
How to clean the data using python and pandas.
How to build several machine learning models to predict a target variable.
How to evaluate the models' performance.

Flow

Log into IBM's DSX service.
Upload the data as a data asset into DSX.
Start a notebook in DSX and input the data asset previously created.
Explore the data with pandas
Create data visualizations with Pixie Dust.
Train machine learning models with scikit learn.
Evaluate their prediction performance.

Included Components

IBM Data Science Experience: Analyze data using RStudio, Jupyter, and Python in a configured, collaborative environment that includes IBM value-adds, such as managed Spark.
Watson Analytics: Watson Analytics guides analysis with automated data visualization and discovery so you can uncover insights on your own.

Featured technologies

Data Science: Systems and scientific methods to analyze structured and unstructured data in order to extract knowledge and insights.
Python: Python is a programming language that lets you work more quickly and integrate your systems more effectively.
pandas: A Python library providing high-performance, easy-to-use data structures.

Blog

http://developer.ibm.com/code/?p=31248&preview=1&_ppp=b4fbb9398d

Links

DSX: https://datascience.ibm.com/docs/content/analyze-data/creating-notebooks.html.
Pandas:http://pandas.pydata.org/
Pixie Dust: https://ibm-watson-data-lab.github.io/pixiedust/displayapi.html#introduction
Data Set used: https://www.kaggle.com/apryor6/us-opiate-prescriptions/data
Scikit Learn: http://scikit-learn.org/stable/

Provide feedback

Saved searches

Use saved searches to filter your results more quickly