This repository has been archived by the owner on Jul 18, 2024. It is now read-only.
Rich Hagarty edited this page Dec 14, 2017 · 8 revisions

Short Name

Visualizing Food Insecurity in the US with DSX, PixieDust and Watson Analytics

Short Description

Run a data science pipeline that pre-processes and visualizes data: use a notebook, Python libraries and an analytics platform to build charts and plots that communicate your findings to your audience.

Offering Type

Cognitive & Data Analytics

Introduction

Often in data science we do a great deal of work to glean insights that affect society or a subset of it, and yet we often end up not communicating our findings, or communicating them ineffectively, to non-data-science audiences. That's where visualizations are most powerful. By visualizing our insights and predictions, we, as data scientists and data lovers, can make a real impact and educate those around us who might not have had the opportunity to work on a project on the same subject. By visualizing the findings and insights that have the most power to do social good, we can bring awareness and maybe even change. This journey walks you through how to do just that with IBM's Data Science Experience (DSX), pandas, PixieDust and Watson Analytics.

This journey focuses on food insecurity throughout the US. Using open government data, it considers low access, diet-related diseases, race, poverty, geography and other factors. For some context, this problem is increasingly relevant for the United States as obesity and diabetes rise: two out of three adult Americans are considered overweight or obese, one third of American minors are considered overweight or obese, nearly ten percent of Americans have diabetes and nearly fifty percent of the African American population have heart disease. What's more, cardiovascular disease is the leading cause of death globally, accounting for 17.3 million deaths per year, a number that is rising. Native American reservations, more often than not, have no grocery stores. All of these trends are on the rise. The problem lies not only in low access to fresh produce, but also in food culture, low education on healthy eating, and racial and income inequality.

Author

Madison J. Myers https://www.linkedin.com/in/madisonjmyers/

Code

https://github.com/IBM/visualize-food-insecurity

Demo

N/A

Video

https://www.youtube.com/watch?v=TRvABjKkcqE

Overview

The user will learn:

  • How to use DSX.
  • How to remove NaNs and 0s from a pandas dataframe.
  • How to visualize correlations and other findings using matplotlib, bokeh, seaborn and PixieDust.
  • How to download your pandas dataframe from DSX.
  • How to upload your data into Watson Analytics.
  • How to use Watson Analytics to generate visualizations and share them with others.
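As a rough illustration of the NaN and 0 removal step listed above, here is a minimal pandas sketch. The sample data, the column names and the helper `drop_nan_and_zero_rows` are hypothetical and are not taken from the journey's notebook; the sketch also assumes that a 0 in these columns means "not reported" rather than a true measurement.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; county labels, column names and values are invented.
raw = pd.DataFrame({
    "county": ["A", "B", "C", "D", "E"],
    "low_access_pct": [12.5, 0.0, np.nan, 30.1, 22.0],
    "diabetes_pct": [9.8, 11.2, 10.4, 0.0, 13.5],
})


def drop_nan_and_zero_rows(df, columns):
    """Return a copy of df without rows where any of `columns` is NaN or 0."""
    cleaned = df.copy()
    # Assumption: a 0 stands for a missing value, so mark it as NaN first.
    cleaned[columns] = cleaned[columns].replace(0, np.nan)
    # Drop every row that is missing a value in one of the chosen columns.
    return cleaned.dropna(subset=columns).reset_index(drop=True)


clean = drop_nan_and_zero_rows(raw, ["low_access_pct", "diabetes_pct"])
print(clean)
```

Restricting the check to named columns keeps rows where a 0 is legitimate elsewhere in the table. If a 0 is a real value in your data, skip the `replace` step and call `dropna` only.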

This journey was created for data scientists and data lovers who are interested in social justice issues, as well as those who are new to DSX and Watson Analytics. It guides the user through the power of visualizations, how to select them and how to share them.

Flow

  1. Open DSX and create a notebook.
  2. Download the data in DSX and explore it.
  3. Load PixieDust and use it for visualizations.
  4. Download the dataframe as a CSV from DSX.
  5. Upload the CSV to Watson Analytics and visualize it.
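The exploration and export steps in this flow can be sketched with pandas alone. The dataframe below is invented for illustration, and the column names are assumptions rather than fields from the journey's dataset. The sketch computes pairwise correlations, which you could then chart with any of the plotting libraries, and serializes the dataframe to CSV text of the kind you would upload to Watson Analytics.

```python
import io

import pandas as pd

# Hypothetical cleaned data; values are invented for illustration only.
df = pd.DataFrame({
    "low_access_pct": [5.0, 10.0, 15.0, 20.0],
    "poverty_rate": [8.0, 12.0, 16.0, 20.0],
    "grocery_stores_per_1000": [0.40, 0.30, 0.20, 0.10],
})

# Pairwise Pearson correlations between the numeric columns.
corr = df.corr()
print(corr.round(2))

# Serialize to CSV text in memory; in a notebook you would typically
# pass a file path to to_csv() instead of a buffer.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text.splitlines()[0])
```

Passing `index=False` leaves the pandas row index out of the file, so the uploaded table contains only the data columns.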

Included Components

  • IBM Data Science Experience: Analyze data using RStudio, Jupyter, and Python in a configured, collaborative environment that includes IBM value-adds, such as managed Spark.
  • IBM Watson Analytics: Provides smart data discovery, automated predictive analytics and cognitive capabilities that enable users to interact with data conversationally.
  • Jupyter Notebook: An open source web application that allows you to create and share documents that contain live code, equations, visualizations, and explanatory text.
  • PixieDust: Provides a Python helper library for IPython Notebook.
  • Watson Discovery: A cognitive search and content analytics engine for applications to identify patterns, trends, and actionable insights.

Featured technologies

  • Cloud: Accessing computer and information technology resources through the Internet.
  • Data Science: Systems and scientific methods to analyze structured and unstructured data in order to extract knowledge and insights.
  • Python: Python is a programming language that lets you work more quickly and integrate your systems more effectively.
  • pandas: A Python library providing high-performance, easy-to-use data structures.

Links

Blog

https://developer.ibm.com/code/?p=24583&preview=1&_ppp=7279454b7e