# Visualization for Big Data - Using Tableau

For this exercise, we will be using data contained in the "homework" database on the Big Data for Social Science Class Server. This notebook will walk you through accessing the class homework data using Tableau and help you familiarize yourself with the available class data.

## Table of Contents

- [Database Tables](#Database-Tables)

- [Connecting Tableau to a Database](#Connecting-Tableau-to-a-Database)

    - [Connecting Tableau to a Database - Mac](#Connecting-Tableau-to-Database---Mac)
    - [Connecting Tableau to a Database - Windows](#Connecting-Tableau-to-Database---Windows)

- [Tableau Exercises](#Tableau-Exercises)

    - [Selecting Data Tables](#Selecting-Data-Tables)
    - [Notes About Selecting Data](#Additional-Notes-about-Selecting-Data)
    - [Layout of a Worksheet](#Layout-of-a-worksheet)
    - [Exercise 1: Create a Simple Bar Chart](#Exercise-1:-Create-a-Simple-Bar-Chart)
    - [Exercise 2: Create a Simple Timeline Graph](#Exercise-2:-Create-a-Simple-Timeline-Graph)
    - [Exercise 3: Create a Heat Map](#Exercise-3:-Create-a-Heat-Map)
    - [Exercise 4: Create Agency and Sub-Agency Breakdowns](#Exercise-4:-Create-Agency-and-Sub-Agency-Breakdowns)
    - [Exercise 5: Create a Timeline of Total Payments by Agency](#Exercise-5:-Create-a-Timeline-of-Total-Payments-by-Agency)
    - [Exercise 6: Create a Dashboard](#Exercise-6:-Create-a-Dashboard)
    - [Combining Tables in Tableau - JOINs](#Combining-Tables-in-Tableau---JOINs)

- [Troubleshooting](#Troubleshooting)
- [Resources for Tableau](#Resources-for-Tableau)
- [Resources for Keshif](#Resources-for-Keshif)

## Database Tables

- Back to the [Table of Contents](#Table-of-Contents)

For these exercises we will continue to use tables in the "homework" database. These tables were created using data from the broader starmetrics, umetricsgrants and usptopatents databases. Each of these databases contains different types of information and is available for your use during this class.

You also have a personal database where you can create and modify tables as you wish to support your work. Your personal database has the same name as your username.

For a refresher on the description of the "homework" tables, see the Database Assignment Data Dictionary Word document in Moodle: [http://jpsmonline.umd.edu/mod/resource/view.php?id=2436](http://jpsmonline.umd.edu/mod/resource/view.php?id=2436)

For a refresher on the description of the starmetrics, umetricsgrants and usptopatents databases, see [http://jpsmonline.umd.edu/mod/resource/view.php?id=2458](http://jpsmonline.umd.edu/mod/resource/view.php?id=2458)

## Connecting Tableau to a Database

- Back to the [Table of Contents](#Table-of-Contents)

For this assignment, you will need Tableau (download: http://www.tableau.com/). If you are using a Windows computer, you will also need putty.exe (download: http://the.earth.li/~sgtatham/putty/latest/x86/putty.exe).

Tableau does not yet have built-in SSH capability, so you must first open an SSH tunnel yourself, which makes the connection a little complicated to set up. Explicit directions on setting up a database connection using Tableau can be found in Moodle.

Once you have connected Tableau to the database and saved the workbook, you won't need to repeat the Initial Setup steps. Each time you want to work with the workbook, however, you must open the SSH tunnel before opening Tableau. When you then open your workbook, Tableau will prompt you for your database password and reconnect you automatically.

### Connecting Tableau to Database - Mac

- Back to the [Table of Contents](#Table-of-Contents)

#### Initial Setup

##### Initial Setup - Install MySQL Connector/ODBC

- Go to the mac/mysql section of the Tableau "Download Drivers" page: [Tableau "Download Drivers" page](https://www.tableau.com/en-us/support/drivers?edition=pro&lang=en-us&platform=mac&cpu=64&version=0.0&__full-version=mariner.0.0000.0000#mysql)
- Download `TableauDrivers.dmg`: [http://downloads.tableausoftware.com/drivers/tableau/8.2/TableauDrivers.dmg](http://downloads.tableausoftware.com/drivers/tableau/8.2/TableauDrivers.dmg)
- Double-click the downloaded `TableauDrivers.dmg` file.
- In the Finder window that opens, double-click `MySQL Connector ODBC 5.3.pkg` to run the installer.

    - If you get an "unidentified developer" error message and the file won't run:

        - Control-click on `MySQL Connector ODBC 5.3.pkg`.
        - Choose "Open" from the resulting context menu.
        - In the dialog box that pops up, choose "Open".

- Accept the defaults.

#### Step 1) Set up SSH tunnel

To create an SSH tunnel from your computer to MySQL on the class server, open a terminal window (/Applications/Utilities/Terminal) and do the following:

- Enter this command, replacing `<username>` with your Jupyter username: `ssh -L 127.0.0.1:3306:127.0.0.1:3306 <username>@bigdataforsocialscience.com -N`
- Enter your Jupyter password when prompted.
- If all is working correctly, after you enter your password and press Return, the cursor will move to the next line and just sit there. That is OK. The SSH tunnel is sitting patiently in that window, listening.
- Minimize the terminal window (do not close it).
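Because the tunnel gives no visible feedback once it is running, it can help to confirm that something is actually listening on the forwarded port before you start Tableau. The following is a minimal sketch using only Python's standard library; the function name `tunnel_is_listening` is made up for this illustration and is not part of the class materials.

```python
import socket


def tunnel_is_listening(host="127.0.0.1", port=3306, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection performs the TCP handshake and raises OSError on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if tunnel_is_listening():
    print("Port 3306 is accepting connections; the tunnel appears to be up.")
else:
    print("Nothing is listening on 127.0.0.1:3306; start the SSH tunnel first.")
```

Note that this only shows that some program is listening on port 3306. A MySQL server installed on your own machine would pass the same check, so it is not proof that the class server is on the other end.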

#### Step 2) Open Tableau and Connect to the Database

- Start Tableau.
- For a new workbook:

    - In the "Connect" panel on the left side of Tableau, scroll down to "To a server".
    - Click on "MySQL" (the second option after "Tableau Server").
    - Enter the following into the pop-up box:

        - Server: `127.0.0.1`
        - Port: `3306`
        - Username: "enter your MySQL username" (should be the same as your Jupyter username)
        - Password: "enter your MySQL password" (should be the same as your Jupyter password)
        - Don't check the checkbox next to "Require SSL".

- For an existing workbook, just open the workbook. Tableau will use the database configuration information stored in the file to reconnect to the database and will prompt you for your database password.

If you need more help or run into trouble, additional step-by-step instructions with pictures are available in Moodle.

### Connecting Tableau to Database - Windows

- Back to the [Table of Contents](#Table-of-Contents)

**_If you have MySQL installed on your local machine, before opening putty, do the following:_**

- Click on the "Windows" button on the bottom left side of your screen and type "services.msc" in the program/file search box.
- Press Enter to open the Services window.
- Right-click on MySQL and select "Stop".
- Close Services.

The first time you connect Tableau to the class database, you'll need to create an SSH tunnel from your computer to MySQL on the class server using putty.exe.

#### Initial Setup

##### Initial `putty` configuration - Set up SSH tunnel

To create an SSH tunnel from your computer to MySQL on the class server, open putty.exe and configure putty for port-forwarding:

- Run `putty.exe`.
- In the navigation pane on the left:
    
    - Click on the [+] next to "Connection" (if not already expanded).
    - Click on the [+] next to "SSH" (if not already expanded).
    - Click on "Tunnels".

- In the screen that opened on the right when you clicked "Tunnels", under "Add new forwarded port:", at the bottom of that window, enter the following information:

    - Source port: `3306`
    - Destination: `127.0.0.1:3306`
    - Leave "Local" and "Auto" radio buttons selected.

- Click "Add".
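The "Source port" and "Destination" fields describe the same kind of local forwarding rule that the `-L` option expresses in the `ssh` command used in the Mac instructions. As a rough sketch of that correspondence, the hypothetical helper below (`local_forward_spec` is a name invented for this example) assembles the `-L` argument from the PuTTY-style values:

```python
def local_forward_spec(source_port, destination, bind_address=None):
    """Build the argument for OpenSSH's -L option from PuTTY-style fields."""
    # "Destination" is written as host:port in PuTTY.
    dest_host, dest_port = destination.rsplit(":", 1)
    parts = [str(source_port), dest_host, str(int(dest_port))]
    if bind_address is not None:
        # An optional local address to listen on, as in the Mac command.
        parts.insert(0, bind_address)
    return ":".join(parts)


spec = local_forward_spec(3306, "127.0.0.1:3306", bind_address="127.0.0.1")
print("ssh -L", spec, "<username>@bigdataforsocialscience.com -N")
```

In both cases, connections made to port 3306 on your own computer are carried over SSH and delivered to port 3306 on the class server.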

<img src="http://data.jrn.cas.msu.edu/images/tableau/tableau-win-putty_ssh_tunnel_config.png" />

##### Initial `putty` configuration - Create SSH connection and save your configuration

To then configure putty to log in to SSH on the class server:

- Click Session in the navigation pane on the left.
- Under "Host Name", enter either of the following:

    - `54.85.16.34`
    - `bigdataforsocialscience.com`

- Under "Saved Sessions", enter:

    - Big Data Class

- Click "Save".

<img src="http://data.jrn.cas.msu.edu/images/tableau/tableau-win-putty_server_config.png" />

##### Initial setup - Install Microsoft ODBC Driver Manager and MySQL Connector/ODBC

- Go to the MySQL Connector/ODBC download page: [https://dev.mysql.com/downloads/connector/odbc/](https://dev.mysql.com/downloads/connector/odbc/)
- Choose your operating system (probably "Windows (x86, 64-bit), MSI Installer"), and click download.
- You'll be prompted to log in with an Oracle account.  Just click "No thanks, just start my download" at the bottom of the page.
- Once your file has downloaded, install it.

#### Step 1) Open `putty`, load your saved "Big Data Class" configuration, and connect to the server

- Run `putty.exe`.
- In the "Saved Sessions" list, click on "Big Data Class".
- Click "Load" to load your SSH tunnel and server configuration.
- Click "Open".

**If you are presented with a pop-up (The server's host key...), click "Yes".**

#### Step 2) Login through `putty` using Big Data Class Server Credentials

- Enter your Jupyter username, then your Jupyter password.

#### Step 3) Minimize putty terminal window (do not close)

#### Step 4) Open Tableau and Connect to the Database

- Start Tableau.
- For a new workbook:

    - In the "Connect" panel on the left side of Tableau, scroll down to "To a server".
    - Click on "MySQL" (the second option after "Tableau Server").
    - Enter the following into the pop-up box:

        - Server: `127.0.0.1`
        - Port: `3306`
        - Username: "enter your MySQL username" (should be the same as your Jupyter username)
        - Password: "enter your MySQL password" (should be the same as your Jupyter password)
        - Don't check the checkbox next to "Require SSL".

- For an existing workbook, just open the workbook. Tableau will use the database configuration information stored in the file to reconnect to the database and will prompt you for your database password.

If you need more help or run into trouble, additional step-by-step instructions with pictures are available in Moodle.

## Tableau Exercises

- Back to the [Table of Contents](#Table-of-Contents)

Tableau is data analysis and visualization software that is easy to learn and easy to use. It lets you connect to your data and query it without writing a single line of code, and its drag-and-drop interface lets you move between views to build anything from a single visualization to an interactive dashboard.

Now that you've successfully connected Tableau to the Big Data for Social Science server, you can easily create visualizations to display, navigate and understand the data available to you. These are a few short exercises that will teach you the basics of navigating the Tableau software so you can create your own visuals. 


### Selecting Data Tables

- Back to the [Table of Contents](#Table-of-Contents)

Begin by selecting a table. Under the tab "Database" click on the dropdown menu and select "homework". Two tables will appear: OSU_grant and OSU_vendor (these are the same tables from the Database Basics notebook). 

Next, drag OSU_vendor to the section on the screen that says "Drag tables here" then choose "Automatically Update". If successful, you should see the data at the bottom of the screen. 

Now you are ready to create some visualizations!!

### Additional Notes about Selecting Data

- Back to the [Table of Contents](#Table-of-Contents)

Here are some additional things to consider about our data before you jump into visualization:

1. When combining tables in Tableau, the order in which tables are added determines which one sits on the left and which on the right of a JOIN. The JOIN type tells Tableau how to combine the two tables, and if you choose the wrong type, you'll end up with bad data (usually either incomplete or massively duplicated). For simple data sets, one way to pick the JOIN type is to decide which table is the master table, the one whose rows should all be represented in your data set, and then join on the side of that table. For example, when joining OSU_grant and OSU_vendor, there are multiple vendors per grant, but only one grant per vendor. If you want to include all vendor expenditures and attach the related grant information to each expenditure record, OSU_vendor is your master table. Choose the JOIN type based on where OSU_vendor sits relative to OSU_grant in the Tableau "Drag tables here" window: "LEFT JOIN" if OSU_vendor is on the left side of OSU_grant, or "RIGHT JOIN" if OSU_vendor is on the right side. You also need to set the appropriate JOIN columns.

    - For more details on JOINing OSU_vendor and OSU_grant, see below: [Combining Tables in Tableau - JOINs](#Combining-Tables-in-Tableau---JOINs)
    - For more information on database table JOINs in general, see: [http://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins](http://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins)<br />

2. Before beginning to work with a dataset, check that the variable types are consistent with the types of visualizations you want to create. For example, all dates should be a "Date" type and all numeric variables should be a numeric type. If you forget, you can always change the variable types while building your visualizations.

3. In the OSU datasets, the "Fipscode" variable is automatically reported as a string variable. For these exercises, you will need to give it a "Geographic Role" of type "County". To do this, click "Abc" below the variable header "Fipscode", select "Geographic Role", and then select "County".
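One reason county FIPS codes are usually kept as strings is that they are five digits long (two for the state, three for the county) and can begin with a zero, which is lost if the value is ever treated as a number. If you prepare data outside Tableau, a small check like the sketch below can restore the padding. The function name and sample values are illustrative only and are not taken from the OSU tables.

```python
def normalize_county_fips(value):
    """Return a county FIPS code as a 5-character, zero-padded string."""
    text = str(value).strip()
    if not text.isdigit() or len(text) > 5:
        raise ValueError(f"not a county FIPS code: {value!r}")
    # zfill pads on the left with zeros up to the requested width.
    return text.zfill(5)


print(normalize_county_fips(1001))     # 01001
print(normalize_county_fips("39049"))  # 39049
```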

### Layout of a worksheet

A Tableau worksheet is broken out into a few key areas:

- On the left is a vertical pane that contains "Data" and "Analytics" tabs.  To start, we'll refer to this as the Data pane.  The Data pane has two sections:

    - **Dimensions** - variables from your data set that contain non-numeric data such as strings, dates, or geographic identifiers.
    - **Measures** - variables from your data set that are numeric.
    
- To the right of the Data Pane is the visualization area.  The visualization area is broken out into:

    - a column of small windows called "Cards" that stack up on the left side of the visualization area, just to the right of the Data pane.
    - the "Columns" and "Rows" shelves, which span the top of the area and where you can drag and drop dimensions or measures to add them as columns or rows of the current visualization.
    - the visualization area itself, which takes up the rest of the space to the right of the cards and under the shelves.

- In the upper right corner of the screen by default, there is also a "Show Me" tab which expands to a window when clicked.  This window contains the set of visualization styles that Tableau can generate, with those that can be applied to the currently selected data highlighted and those that cannot grayed out.

### Exercise 1: Create a Simple Bar Chart

- Back to the [Table of Contents](#Table-of-Contents)

To create a simple bar chart to describe the total payment amount of OSU vendors by federal agency: 

- Click on "Sheet 1" (bottom left side of the screen) to start a worksheet

Once you are at a new worksheet, you can do the following: 
- From the Measures section (bottom left side of the screen), drag the "Paymentamount" measure to the "Rows Shelf"
- From the Dimensions section (top left side of the screen), drag "Agency Abbrev" to the "Columns Shelf"

Note that in the Data pane on the left side of the Tableau screen, you can multi-select the Dimensions and Measures you want to include in your analysis by clicking each one while holding the "Control" key on Windows or the "Command" key on Mac. You can then open the "Show Me" menu (top right side of the screen) to see which chart types are available for the selected variables (those that are not available will be grayed out).
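Behind the bar chart, Tableau is summing "Paymentamount" within each value of "Agency Abbrev". The sketch below reproduces that grouping by hand so you can see what each bar represents. The rows are made-up stand-ins for OSU_vendor records, not real class data.

```python
from collections import defaultdict

# Hypothetical rows standing in for OSU_vendor: (agency abbreviation, payment amount).
rows = [
    ("NSF", 1200.00),
    ("NIH", 850.50),
    ("NSF", 300.25),
    ("DOE", 99.75),
    ("NIH", 149.50),
]


def total_by_key(records):
    """Sum the amounts for each distinct key, like SUM(...) grouped by a dimension."""
    totals = defaultdict(float)
    for key, amount in records:
        totals[key] += amount
    return dict(totals)


totals = total_by_key(rows)
for agency in sorted(totals):
    # One line here corresponds to one bar in the chart.
    print(f"{agency}: {totals[agency]:.2f}")
```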

### Exercise 2: Create a Simple Timeline Graph

- Back to the [Table of Contents](#Table-of-Contents)

To create a simple timeline graph of total payment amounts of OSU vendors over time: 

- Create a new worksheet by clicking on the "New Worksheet" icon (bottom left side of the screen)

Once you are at a new worksheet, you can do the following: 
- From the Measures section, drag "Paymentamount" to the "Rows Shelf"
- From the Dimensions section, drag "Periodstartdate" to the "Columns Shelf"
- If you click on the [+] on "YEAR(Periodstartdate)" in the Columns Shelf, you can add quarterly detail to the total payment amount

In addition, you can click on the "Show Me" menu to see other types of charts that can be created to visualize the selected variables.
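Expanding the year with [+] changes the grouping from one total per year to one total per year and quarter. A minimal sketch of those two levels of aggregation, using invented dates and amounts rather than the actual "Periodstartdate" values:

```python
from collections import defaultdict
from datetime import date

# Hypothetical (period start date, payment amount) pairs.
payments = [
    (date(2013, 1, 1), 100.0),
    (date(2013, 5, 1), 250.0),
    (date(2014, 2, 1), 75.0),
    (date(2014, 11, 1), 125.0),
    (date(2014, 12, 1), 50.0),
]


def quarter_of(d):
    """Return the calendar quarter (1 to 4) that a date falls in."""
    return (d.month - 1) // 3 + 1


def sum_by(records, key_func):
    """Sum amounts after mapping each date to a grouping key."""
    totals = defaultdict(float)
    for when, amount in records:
        totals[key_func(when)] += amount
    return dict(sorted(totals.items()))


by_year = sum_by(payments, lambda d: d.year)
by_year_quarter = sum_by(payments, lambda d: (d.year, quarter_of(d)))
print(by_year)
print(by_year_quarter)
```

The yearly totals are simply the quarterly totals added back together, which is why drilling down never changes the overall amount shown.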


### Exercise 3: Create a Heat Map

- Back to the [Table of Contents](#Table-of-Contents)

To create a simple heat map of total distribution of payment amounts of OSU vendors by geographic location: 

- Create a new worksheet
- In the Dimensions section on the left side of the screen, click on "Fipscode" and then click "Show Me" to see the list of allowed charts.
- From "Show Me", choose the second map (the middle icon in the second row).
- Drag the map inside the visualization area to adjust the portion that is visible on the screen.
- From the Measures section on the left, drag "Paymentamount" to the "Color" button on the "Marks" card, just to the right of the column that contains "Dimensions" and "Measures", below "Pages" and "Filters".

This gives you a good visualization of how the money distributed by OSU varies across the counties in which it was distributed. The `SUM(Paymentamount)` card that appeared when you dragged "Paymentamount" to the "Color" button shows you what the range of colors represents.
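One way to read a color legend like this is as a scale running from the smallest county total to the largest. The sketch below assumes a simple linear scale, which is an assumption made for illustration; Tableau's actual palette logic is not described in this notebook and may differ. The county codes and totals are invented.

```python
def color_positions(totals):
    """Map each total to a 0..1 position between the smallest and largest total."""
    low, high = min(totals.values()), max(totals.values())
    span = high - low
    if span == 0:
        # All counties have the same total, so they all share one color.
        return {key: 0.0 for key in totals}
    return {key: (value - low) / span for key, value in totals.items()}


# Hypothetical SUM(Paymentamount) per county FIPS code.
county_totals = {"39049": 5000.0, "39041": 1000.0, "39089": 3000.0}
positions = color_positions(county_totals)
for fips, position in positions.items():
    print(f"{fips}: {position:.2f} of the way from lightest to darkest")
```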


### Exercise 4: Create Agency and Sub-Agency Breakdowns

- Back to the [Table of Contents](#Table-of-Contents)

To create a visualization of the money spent per agency, broken down within agencies into sub-agencies:

- Create a new worksheet
- From Dimensions, drag "Agency Abbrev" into the "Text" button in the "Marks" card (the same card where the "Color" button was located in Exercise 3).
- From Dimensions, drag "Sub Agency Text" into the "Text" button in the "Marks" card.
- From Measures, drag "Paymentamount" into the "Size" button in the "Marks" card.
- From Dimensions, drag "Agency Abbrev" into the "Color" button in the "Marks" card.
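The marks in this view are sized by the payment total for each agency and sub-agency pair. As a rough sketch of that two-level breakdown, the example below nests sub-agency totals inside each agency and reports each one's share of all payments. The agency and sub-agency labels and the amounts are invented for illustration.

```python
from collections import defaultdict

# Hypothetical (agency abbreviation, sub-agency, payment amount) rows.
rows = [
    ("HHS", "NIH", 400.0),
    ("HHS", "CDC", 100.0),
    ("HHS", "NIH", 200.0),
    ("NSF", "BIO", 150.0),
    ("NSF", "ENG", 150.0),
]


def nested_totals(records):
    """Return {agency: {sub_agency: total}} for the given rows."""
    totals = defaultdict(lambda: defaultdict(float))
    for agency, sub_agency, amount in records:
        totals[agency][sub_agency] += amount
    return {agency: dict(subs) for agency, subs in totals.items()}


breakdown = nested_totals(rows)
grand_total = sum(sum(subs.values()) for subs in breakdown.values())
for agency, subs in breakdown.items():
    for sub_agency, total in subs.items():
        share = total / grand_total
        print(f"{agency} / {sub_agency}: {total:.0f} ({share:.0%} of all payments)")
```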


### Exercise 5: Create a Timeline of Total Payments by Agency

- Back to the [Table of Contents](#Table-of-Contents)

To create a timeline of total OSU payments by agency (one line in the chart per agency): 

- Create a new worksheet
- Drag "Paymentamount" from Measures to the Rows Shelf
- Drag "Periodstartdate" from Dimensions to the Columns Shelf
- Change the granularity of the timeline from annual to monthly (from "YEAR(Periodstartdate)" to "MONTH(Periodstartdate)") by either right-clicking on the cell for "YEAR(Periodstartdate)" in the Columns Shelf or clicking on the little triangle on the right side of that cell, then selecting "Month -- May 2015" rather than "Year -- 2015".

    - Make sure to select the "Month" whose description on the right includes the month and the year (in the second set of Date identifiers), rather than the "Month" that just includes the month name ("Month -- May", for example).  "Month -- May 2015" will break out the X axis by month and year - this is your basic timeline.  "Month -- May" will just lump all values of month together, regardless of year, giving you a visualization of spending per month, independent of year.

- Drag "Agency Text" onto the "Filters" card and select everything except "NULL". Click "Apply" and then click "OK".
- Drag "Agency Abbrev" onto the "Color" button in the "Marks" card and choose "Add All Members".
- Drag "Agency Abbrev" onto the "Label" button in the "Marks" card.
- Rename the sheet to "Timeline, Total Payment ($), Per Agency".

To annotate lines in the resulting chart, right click one of the lines in the chart and choose "Annotate". 
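The difference between the two "Month" options in the steps above is whether the year is kept as part of the grouping key. A small sketch with invented dates and amounts shows how the totals change:

```python
from collections import defaultdict
from datetime import date

# Hypothetical (period start date, payment amount) pairs spanning two years.
payments = [
    (date(2014, 5, 1), 100.0),
    (date(2015, 5, 1), 40.0),
    (date(2015, 6, 1), 60.0),
]


def sum_by(records, key_func):
    """Sum amounts after mapping each date to a grouping key."""
    totals = defaultdict(float)
    for when, amount in records:
        totals[key_func(when)] += amount
    return dict(sorted(totals.items()))


# Like "Month -- May 2015": year and month together, a true timeline.
by_month_and_year = sum_by(payments, lambda d: (d.year, d.month))
# Like "Month -- May": month only, so May 2014 and May 2015 are lumped together.
by_month_only = sum_by(payments, lambda d: d.month)

print(by_month_and_year)
print(by_month_only)
```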

### Exercise 6: Create a Dashboard

- Back to the [Table of Contents](#Table-of-Contents)

To create a dashboard of visualizations:  

- From the Tableau toolbar, click on Dashboard and create "New Dashboard"
- In a Dashboard, the pane on the left contains a new set of windows, the top one of which, labeled "Dashboard", contains a list of all of the worksheets in your workbook.
- From this "Dashboard" area in the left pane, add worksheets to your dashboard by either dragging them from the pane on the left to the visualization area on the right (which initially contains "Drop Sheets Here"), or by double-clicking on a given worksheet in the pane on the left.

### Combining Tables in Tableau - JOINs

- Back to the [Table of Contents](#Table-of-Contents)

A quick example on how to JOIN two tables together in Tableau: 

- To start, go to the "Data Source" section of your Tableau file (where you begin when you create a new file) by clicking on the "Data Source" button in the lower left corner of your Tableau window.
- Begin by selecting and adding the OSU_vendor table to the data area. (If you've gone through all the exercises above, OSU_vendor is already selected and added to the data area.)
- Drag the OSU_grant table to the data area to add it, as well.
- To choose the way the data from the two tables is "JOIN"ed, click on the Venn diagram (the two overlapping circles between and connected to the two datasets) between the two tables.
- In the pop-up window that results, choose the JOIN type on the side of OSU_vendor (this should be "Left").
- Then, choose the column on which the two tables will be JOINed.  Beneath the options for JOIN types ("Inner", "Left", "Right", "Outer"), there will be a table in which join clauses are specified.  In the first row:

    - On the left, in the "Data Source" column, select "Uniqueawardnumber" from the dropdown ("Agency" will likely be selected by default, so you'll be changing that).
    - Leave the column in the middle set to "=" (the operator can be changed to match rows using a different comparison, but that is an advanced topic).
    - In the right column, labeled "OSU_vendor", click on the dropdown menu and choose "Uniqueawardnumber (OSU_vendor)".
    
- Once you've specified the JOIN criteria, a complete dataset will appear below, in which each row from OSU_vendor contains the information on its associated grant from OSU_grant.  Grant information is now appended to the end of any row where the OSU_vendor record's "Uniqueawardnumber" value matched a given grant's "Uniqueawardnumber".  This allows you to include information on grants in your visualization.
- Go play with the JOINed data!
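To see why the JOIN type matters, it can help to try the same idea on two tiny tables. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table names `vendor_demo` and `grant_demo`, the extra `title` column, and every value are made up for illustration; only the join column name mirrors "Uniqueawardnumber".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE grant_demo (uniqueawardnumber TEXT, title TEXT);
    CREATE TABLE vendor_demo (uniqueawardnumber TEXT, paymentamount REAL);
    INSERT INTO grant_demo VALUES ('A-1', 'Soil study'), ('A-2', 'Sensor network');
    INSERT INTO vendor_demo VALUES ('A-1', 100.0), ('A-1', 50.0), ('A-3', 75.0);
""")

# LEFT JOIN with the vendor table on the left: every vendor row is kept.
left_rows = conn.execute("""
    SELECT v.uniqueawardnumber, v.paymentamount, g.title
    FROM vendor_demo AS v
    LEFT JOIN grant_demo AS g ON v.uniqueawardnumber = g.uniqueawardnumber
    ORDER BY v.uniqueawardnumber, v.paymentamount
""").fetchall()

# INNER JOIN: vendor rows without a matching grant are dropped.
inner_rows = conn.execute("""
    SELECT v.uniqueawardnumber, v.paymentamount, g.title
    FROM vendor_demo AS v
    INNER JOIN grant_demo AS g ON v.uniqueawardnumber = g.uniqueawardnumber
    ORDER BY v.uniqueawardnumber, v.paymentamount
""").fetchall()

print(len(left_rows), "rows from LEFT JOIN")
print(len(inner_rows), "rows from INNER JOIN")
conn.close()
```

In this toy data the payment for award 'A-3' has no matching grant. The LEFT JOIN keeps it with an empty grant title, while the INNER JOIN silently drops it, which is the kind of incomplete data that choosing the wrong JOIN type can produce.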

## Troubleshooting

- Back to the [Table of Contents](#Table-of-Contents)

Troubleshooting notes:

- If you connect to the database, load some tables, and then work with that data for a while, chances are your database connection will have timed out after about 20 minutes.  If you subsequently go back and work with "Data Source"s again, Tableau will likely freeze up.  If this happens, you can either:

    - wait for it to figure out that the connection has timed out and reconnect, which does tend to happen, but which can take a few minutes.
    - or you can force close Tableau and connect again.
    
    Either way, you shouldn't have to do anything to the SSH tunnel - the problem is likely the database connection, not the SSH tunnel.
    
- If Tableau doesn't come back after a few minutes of waiting, or if restarting Tableau doesn't fix the problem, it might be a problem with the SSH tunnel.  If you think the SSH tunnel might be part of the problem, then:

    - stop Tableau
    - stop the SSH Tunnel
    - restart the SSH Tunnel
    - restart Tableau
    - try your database access again.

## Resources for Tableau

- Back to the [Table of Contents](#Table-of-Contents)

Below you will find a 5-minute video that describes how to create visualizations with Tableau using a very simple, but effective, approach. In addition, the handout follows the progression of the video, but is heavily annotated.

Video :: https://www.youtube.com/watch?v=-4uNv6wuGQ8 <br>
Handout :: https://docs.google.com/presentation/d/1bPn44W15Jq3csc87vld0FWXZpu4cnoqe1Qqob57KvTQ/edit#slide=id.p

## Resources for Keshif

- Back to the [Table of Contents](#Table-of-Contents)

Below you will find similar resources for a dashboard visualization program called Keshif.

Video :: https://www.youtube.com/watch?v=3Hmvms-1grU <br>
Handout :: https://docs.google.com/presentation/d/1beCw3KiFjWLdVfgp8EICFPNPiuu2UzX8PFbcirJFQVw/edit#slide=id.gc5246df19_0_81
