
# Using PixieDust for Fast, Flexible, and Easier Data Analysis and Experimentation  

> Interactive notebooks are powerful tools for fast and flexible experimentation and data analysis. Notebooks can contain live code, static text, equations, and visualizations. In this lab, you use Watson Studio to create a notebook that explores and visualizes data to gain insight. You use PixieDust, an open source Python notebook helper library, to visualize the data in different ways (for example, as charts and maps) with one simple call.  

![pixiedust](https://developer.ibm.com/clouddataservices/wp-content/uploads/sites/85/2017/03/pixiedust200.png)

You can access the complete tutorial with step-by-step instructions <a href="http://ibm.biz/pixiedustlab" target="_blank" rel="noopener noreferrer">here</a>.  

This notebook runs on Python 3.6 with Spark.

## Table of contents
1. [Import PixieDust](#install)
2. [Load the data](#loaddata)
3. [View and visualize the data](#view)
4. [Map the data](#mapdata)
5. [Summary](#summary)


## 1. Import PixieDust<a class="anchor" id="install"></a>
Before you can use the PixieDust library, you must import it into the notebook.

In [None]:
import pixiedust

## 2. Load the data<a class="anchor" id="loaddata"></a>
With PixieDust, you can easily load CSV data from a URL into a PySpark DataFrame in the notebook. 
In this example, we load a data set with information about restaurant inspections.

In [2]:
inspections = pixiedust.sampleData("https://opendata.lasvegasnevada.gov/resource/86jg-3buh.csv")

Downloading 'https://opendata.lasvegasnevada.gov/resource/86jg-3buh.csv' from https://opendata.lasvegasnevada.gov/resource/86jg-3buh.csv
Downloaded 363679 bytes
Creating pySpark DataFrame for 'https://opendata.lasvegasnevada.gov/resource/86jg-3buh.csv'. Please wait...
Loading file using 'SparkSession'
Successfully created pySpark DataFrame for 'https://opendata.lasvegasnevada.gov/resource/86jg-3buh.csv'
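`pixiedust.sampleData` handles the download and DataFrame creation for you. As a rough plain-Python analogue of the download-and-parse step (not PixieDust's actual implementation), the sketch below fetches a CSV file with `urllib` and parses it with `csv.DictReader`. The function name `load_csv_rows` is hypothetical, and the usage passes an inline `data:` URL built from a three-column excerpt of the inspection data so that it runs offline.

```python
import csv
import io
import urllib.parse
import urllib.request

def load_csv_rows(url):
    """Download a CSV file and return its rows as a list of dicts (every value is a string)."""
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode("utf-8")
    # csv.DictReader honors quoted fields such as "GREAT AMERICAN PUB, THE"
    return list(csv.DictReader(io.StringIO(text)))

# Hypothetical usage with an inline data: URL so the sketch runs without network access;
# in the notebook the source would be the opendata.lasvegasnevada.gov URL instead.
sample = (
    "restaurant_name,city,current_demerits\n"
    '"GREAT AMERICAN PUB, THE",Henderson,8\n'
    "TOKYO CAFE,North Las Vegas,9\n"
)
rows = load_csv_rows("data:text/csv," + urllib.parse.quote(sample))
print(rows[0]["restaurant_name"], rows[0]["current_demerits"])
```

Unlike the Spark DataFrame, whose schema (printed later by `printSchema()`) types `current_demerits` as an integer, `csv.DictReader` returns every value as a string, so numeric fields would need explicit conversion.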


## 3. View and visualize the data<a class="anchor" id="view"></a>
Use PixieDust's **`display`** API to easily view and visualize the data.

3.1 [Filter the data set](#filter)<br/>
3.2 [Visualize the number of restaurants by category](#category)<br/>
3.3 [Visualize the average number of inspection demerits per category clustered by the inspection grade](#inspectiongrade)<br/>
3.4 [Visualize current demerits vs inspection demerits](#demerits)

In [3]:
display(inspections)

serial_number,permit_number,restaurant_name,location_name,category_name,address,city,state,zip,current_demerits,current_grade,date_current,inspection_date,inspection_time,employee_id,inspection_type,inspection_demerits,inspection_grade,permit_status,inspection_result,violations,record_updated,location_1,location_1_address,location_1_city,location_1_state,location_1_zip,:@computed_region_tnyv_z3b7
DA1800458,PR0113745,TOKYO CAFE,TOKYO CAFE,Restaurant,2595 E CRAIG Rd B,North Las Vegas,Nevada,89030,9,A,2017-08-07 00:00:00,2017-08-07 00:00:00,2017-08-07 12:30:00,EE7000390,Routine Inspection,9,A,A,'A' Grade,212213216291029282930,2017-08-08 11:50:22,POINT (115.1147678 36.2395558),,,,,
DAB9TPYTX,PR0024015,ARIA CC LEVEL 3 BEVERAGE P10,ARIA HOTEL & CASINO,Pantry,3730 S Las Vegas Blvd,Las Vegas,Nevada,89109,0,A,2017-08-07 00:00:00,2017-08-07 00:00:00,2017-08-07 12:05:00,EE7001186,Routine Inspection,0,A,A,'A' Grade,2930,2017-08-07 14:02:03,POINT (115.1765836 36.1073485),,,,,
DA2EN30ER,PR0009660,Wild Wild West L00SE Trax Bar,WILD WILD WEST CASINO,Bar / Tavern,3330 W Tropicana Ave,Las Vegas,Nevada,89103,6,A,2018-02-16 00:00:00,2017-08-08 00:00:00,2017-08-08 09:30:00,EE7000594,Routine Inspection,0,A,A,'A' Grade,2930,2017-08-08 10:55:13,POINT (115.1849824 36.1016875),,,,,
DANJTINLD,PR0118599,SUGARCANE - OYSTER BAR,SUGARCANE RAW BAR AND GRILL @ GCS,Restaurant,3377 S LAS VEGAS,Las Vegas,Nevada,89109,0,A,2017-08-08 00:00:00,2017-08-08 00:00:00,2017-08-08 15:15:00,EE7001184,Routine Inspection,0,A,A,'A' Grade,2930,2017-08-08 15:09:19,POINT (115.1696529 36.1218691),,,,,
DAOGJPXI9,PR0118598,SUGARCANE - WAREWASH & STORAGE,SUGARCANE RAW BAR AND GRILL @ GCS,Special Kitchen,3377 S LAS VEGAS,Las Vegas,Nevada,89109,0,A,2017-08-08 00:00:00,2017-08-08 00:00:00,2017-08-08 14:15:00,EE7001176,Routine Inspection,0,A,A,'A' Grade,2911,2017-08-08 14:55:47,POINT (115.1696529 36.1218691),,,,,
DAUL284FZ,PR0120871,"GREAT AMERICAN PUB, THE","GREAT AMERICAN PUB, THE",Restaurant,9310 S EASTERN,Henderson,Nevada,89123,8,A,2017-08-03 00:00:00,2017-08-03 00:00:00,2017-08-03 13:15:00,EE7001118,Routine Inspection,8,A,A,'A' Grade,20921729122930,2017-08-03 14:33:29,POINT (115.117206 36.0195687),,,,,
DAUQAT4GE,PR0009136,Little Caesars #3374,Little Caesars,Restaurant,7785 N Durango Dr 115,Las Vegas,Nevada,89143,8,A,2017-08-07 00:00:00,2017-03-23 00:00:00,2017-03-23 14:00:00,EE7000327,Re-inspection,0,A,A,'A' Grade,,2017-03-23 14:20:43,POINT (115.2780793 36.173684),,,,,
DA1800641,PR0022364,Lindo Michoacan La Loma,Lindo Michoacan La Loma,Restaurant,645 Carnegie St,Henderson,Nevada,89052-5850,0,A,2017-12-05 00:00:00,2017-07-31 00:00:00,2017-07-31 13:00:00,EE7001275,Routine Inspection,14,B,B,'B' Downgrade,20821521629302955,2017-08-09 11:13:22,POINT (115.095671 36.004504),,,,,
DA00TYWRD,PR0022774,HILTON LLV FISH PREP,HILTON LAKE LAS VEGAS RESORT & SPA,Meat/Poultry/Seafood,1610 Lake Las Vegas Pkwy,Henderson,Nevada,89011-2802,0,A,2019-05-10 00:00:00,2019-05-10 00:00:00,2019-05-10 14:15:00,EE7001361,Routine Inspection,0,A,,'A' Grade,2928,2019-05-11 17:48:12,POINT (114.931908 36.101953),,,,,
DADHMV441,PR0005987,HARD ROCK ROOM SVC WAREWASH,HARD ROCK HOTEL & CASINO,Special Kitchen,4455 S Paradise Rd,Las Vegas,Nevada,89169-6574,0,A,2017-08-08 00:00:00,2017-08-08 00:00:00,2017-08-08 11:35:00,EE7001208,Routine Inspection,0,A,A,'A' Grade,2930,2017-08-08 15:12:42,POINT (115.1538714 36.1100828),,,,,


### 3.1 Filter the data set<a class="anchor" id="filter"></a>
Filter the data set to create a subset of only the Las Vegas restaurants.

In [4]:
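# Register the DataFrame as a temporary view so that it can be queried with SQL
# (createOrReplaceTempView replaces the deprecated registerTempTable)
inspections.createOrReplaceTempView("restaurants")
# Keep only the inspections whose city is exactly 'Las Vegas'
lasDF = sqlContext.sql("SELECT * FROM restaurants WHERE city='Las Vegas'")
lasDF.count()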

811
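The cell above registers the DataFrame as a table and filters it with SQL. The same register-then-query pattern can be tried outside Spark with Python's built-in `sqlite3` module, used here only as a stand-in for Spark SQL, on a few rows copied from the `display(inspections)` output. In PySpark you could also skip SQL entirely and write `inspections.filter(inspections.city == "Las Vegas")`.

```python
import sqlite3

# (restaurant_name, category_name, city) rows copied from the display(inspections) output above
rows = [
    ("TOKYO CAFE", "Restaurant", "North Las Vegas"),
    ("ARIA CC LEVEL 3 BEVERAGE P10", "Pantry", "Las Vegas"),
    ("Wild Wild West L00SE Trax Bar", "Bar / Tavern", "Las Vegas"),
    ("GREAT AMERICAN PUB, THE", "Restaurant", "Henderson"),
    ("Little Caesars #3374", "Restaurant", "Las Vegas"),
]

# An in-memory SQLite table stands in for the Spark temporary view
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE restaurants (restaurant_name TEXT, category_name TEXT, city TEXT)")
conn.executemany("INSERT INTO restaurants VALUES (?, ?, ?)", rows)

# Same WHERE clause as the Spark SQL query
las = conn.execute("SELECT * FROM restaurants WHERE city='Las Vegas'").fetchall()
print(len(las))  # 3
```

Because the `WHERE` clause is an exact string match, rows whose city is `North Las Vegas` (such as TOKYO CAFE) are not part of the subset.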

<br/>  
### 3.2 Visualize the number of restaurants by category<a class="anchor" id="category"></a>  

Now display the number of restaurants for each category:

1. Click the **Chart** dropdown menu, then choose **Bar Chart**.
2. From the **Chart Options** dialog:
	1. Drag the **`category_name`** field and drop it into the **Keys** area.
	2. Drag the **`count`** field and drop it into the **Values** area.
	3. Set the **# of Rows to Display** to 1000.
	4. Click **OK**.
3. Click the **Renderer** dropdown menu, then choose **matplotlib**.
4. Toggle the **Show Legend** bar chart option to show or hide the legend.


In [None]:
#import bokeh

In [5]:
# Count the number of Las Vegas restaurants in each category

bycat = lasDF.groupBy("category_name").count()
display(bycat)
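Each bar in this chart is the `count` column that `groupBy("category_name").count()` adds to the grouped DataFrame. The sketch below reproduces that aggregation in plain Python with `collections.Counter`, using the `category_name` values of the Las Vegas rows in the sample output above; it only illustrates the computation and is not how PixieDust draws the chart.

```python
from collections import Counter

# category_name of each Las Vegas row in the display(inspections) sample output above
categories = [
    "Pantry", "Bar / Tavern", "Restaurant",
    "Special Kitchen", "Restaurant", "Special Kitchen",
]

# Plain-Python equivalent of lasDF.groupBy("category_name").count()
by_category = Counter(categories)
for name, count in by_category.most_common():
    print(f"{name}: {count}")
```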

<br/>  

### 3.3 Visualize the average number of inspection demerits per category clustered by the inspection grade<a class="anchor" id="inspectiongrade"></a>  

1. Click the **Chart** dropdown menu and choose **Bar Chart**.
2. From the **Chart Options** dialog:
	1. Drag the **`category_name`** field and drop it into the **Keys** area.
	2. Drag the **`inspection_demerits`** field and drop it into the **Values** area.
	3. Set the **Aggregation** to AVG.
	4. Set the **# of Rows to Display** to 1000. 
	5. Click **OK**.
3. Click the **Renderer** dropdown menu and choose **matplotlib**.
4. Click the **Cluster By** dropdown menu and choose **inspection_grade**.
5. Click the **Type** dropdown menu and choose the desired bar type, for example, **stacked**.

In [6]:
display(lasDF)
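With **Aggregation** set to AVG and **Cluster By** set to `inspection_grade`, each bar or bar segment is the mean `inspection_demerits` for one (`category_name`, `inspection_grade`) pair. In PySpark the same numbers could be computed with `lasDF.groupBy("category_name", "inspection_grade").avg("inspection_demerits")`. The plain-Python sketch below shows the computation on the ten sample rows from the `display(inspections)` output (which include cities other than Las Vegas).

```python
from collections import defaultdict

# (category_name, inspection_grade, inspection_demerits) from the ten sample rows above
rows = [
    ("Restaurant", "A", 9), ("Pantry", "A", 0), ("Bar / Tavern", "A", 0),
    ("Restaurant", "A", 0), ("Special Kitchen", "A", 0), ("Restaurant", "A", 8),
    ("Restaurant", "A", 0), ("Restaurant", "B", 14), ("Meat/Poultry/Seafood", "A", 0),
    ("Special Kitchen", "A", 0),
]

# Accumulate a running [sum, count] per (category, grade) cluster
totals = defaultdict(lambda: [0, 0])
for category, grade, demerits in rows:
    totals[(category, grade)][0] += demerits
    totals[(category, grade)][1] += 1

averages = {key: total / n for key, (total, n) in totals.items()}
print(averages[("Restaurant", "A")], averages[("Restaurant", "B")])  # 4.25 14.0
```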

### 3.4 Visualize current demerits vs inspection demerits <a class="anchor" id="demerits"></a>

1. Click the **Chart** dropdown menu and choose **Scatter Plot**.
2. From the **Chart Options** dialog:
	1. Set the **Keys** to **`inspection_demerits`**.
	2. Set the **Values** to **`current_demerits`**.
	3. Set the **# of Rows to Display** to 1000.
	4. Click **OK**.
3. Select **bokeh** from the **Renderer** dropdown menu.
4. Select **inspection_grade** from the **Color** dropdown menu.

In [7]:
display(lasDF)
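In this scatter plot each point is one inspection record, with `inspection_demerits` on the x-axis and `current_demerits` on the y-axis, and the **Color** option separates the points by `inspection_grade`. The plain-Python sketch below groups the ten sample rows from the `display(inspections)` output into one series per grade and lists the points that fall off the y = x diagonal. In those sample rows, the off-diagonal points are also the ones whose `date_current` differs from `inspection_date`, which suggests (but does not prove) that `current_demerits` then comes from a different inspection.

```python
from collections import defaultdict

# (inspection_demerits, current_demerits, inspection_grade) from the ten sample rows above
rows = [
    (9, 9, "A"), (0, 0, "A"), (0, 6, "A"), (0, 0, "A"), (0, 0, "A"),
    (8, 8, "A"), (0, 8, "A"), (14, 0, "B"), (0, 0, "A"), (0, 0, "A"),
]

# Group the (x, y) points into one series per inspection_grade, one color each
series = defaultdict(list)
for inspection, current, grade in rows:
    series[grade].append((inspection, current))

# Points off the y = x diagonal: current demerits differ from this inspection's demerits
off_diagonal = [(x, y) for x, y, _ in rows if x != y]
print(dict(series))
print(off_diagonal)  # [(0, 6), (0, 8), (14, 0)]
```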

## 4. Map the data<a class="anchor" id="mapdata"></a>  

Next, visualize the restaurant inspection data together with each restaurant's location on a map.

PixieDust currently provides two map renderers, Google and MapBox, and each requires an access token to display correctly. This section uses the **MapBox** renderer, so if you choose to continue you need to create a <a href="https://www.mapbox.com/help/create-api-access-token/" target="_blank" rel="noopener noreferrer">MapBox API Access Token</a>.

4.1 [Create longitude and latitude fields](#longlat)<br/>
4.2 [Display the map](#viewmapdata)

### 4.1 Create longitude and latitude fields<a class="anchor" id="longlat"></a> 

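The current data stores the longitude and latitude in the **`location_1`** field as a single string such as `POINT (-114.923505 36.114434)`.

However, the map renderers in PixieDust expect the longitude and latitude as separate numeric fields, so you first need to parse the **`location_1`** field into separate longitude and latitude fields. In the sample rows displayed earlier, the longitude appears without its minus sign (for example, `POINT (115.1147678 36.2395558)`), so the parsing code also makes every longitude negative, which is correct for Las Vegas because it lies west of the prime meridian.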

In [8]:
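# Parse the location_1 field into separate longitude and latitude number fields

from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType

def parsePoint(value):
    # Split a 'POINT (lon lat)' string into its two coordinate strings; None if missing or malformed
    if not value or 'POINT (' not in value:
        return None
    coords = value.split('POINT (')[1].strip(')').split()
    return coords if len(coords) == 2 else None

def valueToLon(value):
    coords = parsePoint(value)
    if coords is None:
        return None
    lon = float(coords[0])
    # Las Vegas lies west of the prime meridian, so make every longitude negative
    return None if lon == 0 else -abs(lon)

def valueToLat(value):
    coords = parsePoint(value)
    if coords is None:
        return None
    lat = float(coords[1])
    return None if lat == 0 else lat

# Wrap the functions as Spark UDFs that return doubles
udfValueToLon = udf(valueToLon, DoubleType())
udfValueToLat = udf(valueToLat, DoubleType())

# Add numeric lon and lat columns to the Las Vegas subset
lonDF = lasDF.withColumn("lon", udfValueToLon("location_1"))
lonlatDF = lonDF.withColumn("lat", udfValueToLat("location_1"))

lonlatDF.printSchema()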

root
 |-- serial_number: string (nullable = true)
 |-- permit_number: string (nullable = true)
 |-- restaurant_name: string (nullable = true)
 |-- location_name: string (nullable = true)
 |-- category_name: string (nullable = true)
 |-- address: string (nullable = true)
 |-- city: string (nullable = true)
 |-- state: string (nullable = true)
 |-- zip: string (nullable = true)
 |-- current_demerits: integer (nullable = true)
 |-- current_grade: string (nullable = true)
 |-- date_current: timestamp (nullable = true)
 |-- inspection_date: timestamp (nullable = true)
 |-- inspection_time: timestamp (nullable = true)
 |-- employee_id: string (nullable = true)
 |-- inspection_type: string (nullable = true)
 |-- inspection_demerits: integer (nullable = true)
 |-- inspection_grade: string (nullable = true)
 |-- permit_status: string (nullable = true)
 |-- inspection_result: string (nullable = true)
 |-- violations: string (nullable = true)
 |-- record_updated: timestamp (nullable = true)
 |-- 
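The UDFs in the cell above parse `location_1` by splitting strings. To check that kind of parsing logic outside Spark, the sketch below uses a regular expression instead (a swapped-in technique, not what the notebook runs) with similar rules: zero coordinates become `None` and longitudes are made negative; it also returns `None` for missing or malformed values. `parse_point` and `POINT_RE` are hypothetical names.

```python
import re

# 'POINT (lon lat)' with optional minus signs and decimals, e.g. 'POINT (115.1147678 36.2395558)'
POINT_RE = re.compile(r"POINT \(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)")

def parse_point(value):
    """Return (lon, lat) as floats, or None for missing, malformed, or zero coordinates."""
    if not value:
        return None
    match = POINT_RE.search(value)
    if match is None:
        return None
    lon, lat = float(match.group(1)), float(match.group(2))
    if lon == 0 or lat == 0:
        return None
    # Las Vegas lies west of the prime meridian, so make the longitude negative
    return -abs(lon), lat

print(parse_point("POINT (115.1147678 36.2395558)"))
print(parse_point("POINT (-114.923505 36.114434)"))
print(parse_point(None))
```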

<br/>  

### 4.2 Display the map<a class="anchor" id="viewmapdata"></a>  

Now that you have separate **`lon`** and **`lat`** fields, perform the following steps to display the data on a map:

1. Click the **Chart** dropdown menu, then choose **Map**.
2. From the **Chart Options** dialog:
	1. Drag the **`lon`** field and the **`lat`** field and drop them into the **Keys** area.
	2. Drag the **`current_demerits`** field and drop it into the **Values** area.
	3. Set the **# of Rows to Display** to 1000. 
	4. Enter your access token from MapBox into the **MapBox Access Token** field.
	5. Click **OK**.
3. Click the **Style** dropdown menu and choose **choropleth**.


In [9]:
display(lonlatDF)
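Web map libraries such as Mapbox GL commonly consume point data as GeoJSON, in which each point's coordinates are ordered `[longitude, latitude]`. Whether PixieDust builds GeoJSON internally is not covered here; the sketch below only shows the shape of such data, for example if you want to export the points to another mapping tool. The function name `to_geojson` is hypothetical, and the sample uses two locations from the earlier output after sign correction.

```python
import json

def to_geojson(rows):
    """Build a GeoJSON FeatureCollection from (lon, lat, current_demerits) tuples, skipping missing coordinates."""
    features = []
    for lon, lat, demerits in rows:
        if lon is None or lat is None:
            continue
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude]
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"current_demerits": demerits},
        })
    return {"type": "FeatureCollection", "features": features}

# Two sample locations (ARIA and Wild Wild West) plus a row with missing coordinates
sample = [(-115.1765836, 36.1073485, 0), (-115.1849824, 36.1016875, 6), (None, None, 0)]
print(json.dumps(to_geojson(sample), indent=2))
```

In the notebook, `lonlatDF.select("lon", "lat", "current_demerits").collect()` returns `Row` objects, which unpack like tuples, so a data set of this size could be passed to the function directly.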

## 5. Summary<a class="anchor" id="summary"></a>
Using a data set about restaurant inspections, this notebook showed how a single PixieDust `display` call can visualize the same data in several ways, including bar charts, scatter plots, and maps, to help you gain useful insights.

### Author

**Va Barbosa** is an IBM development advocate.

<hr>
Copyright © IBM Corp. 2017-2019. This notebook and its source code are released under the terms of the MIT License.
