Skip to content

Latest commit

 

History

History
170 lines (102 loc) · 11.3 KB

README.md

File metadata and controls

170 lines (102 loc) · 11.3 KB

Visualizing Food Insecurity with Watson Studio and PixieDust

This Code Pattern will guide you through downloading, cleaning and visualizing data using different tools. In particular this Code Pattern showcases food insecurity in the US, along with its associated factors.

Often in data science we do a great deal of work to glean insights that have an impact on society or a subset of it and yet, often, we end up not communicating our findings or communicating them ineffectively to non data science audiences. That's where visualizations become the most powerful. By visualizing our insights and predictions, we, as data scientists and data lovers, can make a real impact and educate those around us that might not have had the same opportunity to work on a project of the same subject. By visualizing our findings and those insights that have the most power to do social good, we can bring awareness and maybe even change. This Code Pattern walks you through how to do just that, with IBM's Watson Studio, Pandas, PixieDust.

For this particular Code Pattern, food insecurity throughout the US is focused on. Low access, diet-related diseases, race, poverty, geography and other factors are considered by using open government data. For some context, this problem is a more and more relevant problem for the United States as obesity and diabetes rise and two out of three adult Americans are considered obese, one third of American minors are considered obese, nearly ten percent of Americans have diabetes and nearly fifty percent of the African American population have heart disease. Even more, cardiovascular disease is the leading global cause of death, accounting for 17.3 million deaths per year, and rising. Native American populations more often than not do not have grocery stores on their reservation... and all of these trends are on the rise. The problem lies not only in low access to fresh produce, but food culture, low education on healthy eating as well as racial and income inequality.

The government data that I use in this Code Pattern has been conveniently combined into a dataset for our use, which you can find in this repo under combined_data.csv.zip. You can find the original, government data from the US Bureau of Labor Statistics https://www.bls.gov/cex/ and The United States Department of Agriculture https://www.ers.usda.gov/data-products/food-environment-atlas/data-access-and-documentation-downloads/.

Notebooks

Flow

  1. Open Watson Studio and create a notebook.
  2. Download the data in Watson Studio and explore it.
  3. Load Pixie Dust and use for visualizations.
  4. Download dataframe as a csv from Watson Studio.

Included components

  • IBM Watson Studio: Analyze data using RStudio, Jupyter, and Python in a configured, collaborative environment that includes IBM value-adds, such as managed Spark.
  • Jupyter Notebook: An open source web application that allows you to create and share documents that contain live code, equations, visualizations, and explanatory text.
  • PixieDust: Provides a Python helper library for IPython Notebook.

Featured technologies

  • Python: Python is a programming language that lets you work more quickly and integrate your systems more effectively.
  • Pandas: A Python library providing high-performance, easy-to-use data structures.

Watch the Video

Steps

Run using a Jupyter notebook in the IBM Watson Studio

  1. Create a new Watson Studio project
  2. Create the notebook
  3. Upload data
  4. Run the notebook
  5. Save and Share

1. Create a new Watson Studio project

  • Log into IBM's Watson Studio. Once in, you'll land on the dashboard.

  • Create a new project by clicking + New project and choosing Data Science:

    studio project

  • Enter a name for the project name and click Create.

  • NOTE: By creating a project in Watson Studio a free tier Object Storage service and Watson Machine Learning service will be created in your IBM Cloud account. Select the Free storage type to avoid fees.

    studio-new-project

  • Upon a successful project creation, you are taken to a dashboard view of your project. Take note of the Assets and Settings tabs, we'll be using them to associate our project with any external assets (datasets and notebooks) and any IBM cloud services.

    studio-project-dashboard

2. Create the Notebook

  • From the new project Overview panel, click + Add to project on the top right and choose the Notebook asset type.

studio-project-dashboard

3. Upload data

  • This project uses the dataset in combined_data.csv.zip. We need to load this asset to our project.

  • Extract the zip file with your favorite unzip tool.

  • From the new project Overview panel, click + Add to project on the top right and choose the Data asset type.

    add asset

  • A panel on the right of the screen will appear to assit you in uploading data. Follow the numbered steps in the image below.

    • Ensure you're on the Load tab. [1]
    • Click on the browse option. From your machine, browse to the location of the combined_data.csv file in this repository, and upload it. [not numbered]
    • Once uploaded, go to the Files tab. [2]
    • Ensure the combined_data.csv appears. [3]

    add data

4. Run the notebook

  • Click the (►) Run button to start stepping through the notebook.

  • Stop at the second cell Insert your data as a Pandas DataFrame.

  • Click on the 1001 data icon in the top right. The data files should show up.

  • Click on each and select Insert Pandas Data Frame. Once you do that, a whole bunch of code will show up in the highlighted cell.

  • Make sure your combined_data.csv is saved as df_data_1, so that it is consistent with my notebook and so you do not have to change the code.

5. Save and Share

How to save your work:

Under the File menu, there are several ways to save your notebook:

  • Save will simply save the current state of your notebook, without any version information.
  • Save Version will save your current state of your notebook with a version tag that contains a date and time stamp. Up to 10 versions of your notebook can be saved, each one retrievable by selecting the Revert To Version menu item.

How to share your work:

You can share your notebook by selecting the Share button located in the top right section of your notebook panel. The end result of this action will be a URL link that will display a “read-only” version of your notebook. You have several options to specify exactly what you want shared from your notebook:

  • Only text and output: will remove all code cells from the notebook view.
  • All content excluding sensitive code cells: will remove any code cells that contain a sensitive tag. For example, # @hidden_cell is used to protect your dashDB credentials from being shared.
  • All content, including code: displays the notebook as is.
  • A variety of download as options are also available in the menu.

Analyzing output

By reviewing our visualizations in Watson Studio, we learn that obesity and diabetes almost go hand in hand, along with food insecurity. We can also learn that this seems to be an inequality issue, both in income and race, with Black and Hispanic populations being more heavily impacted by food insecurity and diet-related diseases than those of the White and Asian populations. We can also see that school-aged children who qualify for reduced lunch are more likely obese than not whereas those that have a farm-to-school program are more unlikely to be obese.

Like many data science investigations, this analysis could have a big impact on policy and people's approach to food insecurity in the U.S. What's best is that we can create many projects much like this in a quick time period and share them with others by using Pandas, PixieDust as well as Watson's predictive and recommended visualizations.

Links

Learn more

  • Data Analytics Code Patterns: Enjoyed this Code Pattern? Check out our other Data Analytics Code Patterns
  • AI and Data Code Pattern Playlist: Bookmark our playlist with all of our Code Pattern videos
  • Watson Studio: Master the art of data science with IBM's Watson Studio

License

This code pattern is licensed under the Apache Software License, Version 2. Separate third party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 (DCO) and the Apache Software License, Version 2.

Apache Software License (ASL) FAQ