# Visualization for Big Data - Connecting to Tableau

For this exercise, we will be using data contained in the "homework" database on the Big Data for Social Science Class Server. This notebook will walk you through accessing the class homework data using IPython Notebook and help familiarize you with the available class data.

## Table of Contents

- [Database Tables](#Database-Tables)

- [Using Tableau to Make a Database Connection](#Using-Tableau-to-Make-a-Database-Connection)

    - [Connecting Tableau to a Database - Mac](#Connecting-Tableau-to-Database---Mac)
    - [Connecting Tableau to a Database - Windows](#Connecting-Tableau-to-Database---Windows)

- [Tableau Exercises](#Tableau-Exercises)

    - [Selecting Data Tables](#Selecting-Data-Tables)
    - [Notes About Selecting Data](#Additional-Notes-about-Selecting-Data)
    - [Exercise 1: Create a Simple Bar Chart](#Exercise-1:-Create-a-Simple-Bar-Chart)
    - [Exercise 2: Create a Simple Timeline Graph](#Exercise-2:-Create-a-Simple-Timeline-Graph)
    - [Exercise 3: Create a Heat Map](#Exercise-3:-Create-a-Heat-Map)
    - [Exercise 4: Create Agency and Sub-Agency Breakdowns](#Exercise-4:-Create-Agency-and-Sub-Agency-Breakdowns)
    - [Exercise 5: Create a Timeline of Total Payments by Agency](#Exercise-5:-Create-a-Timeline-of-Total-Payments-by-Agency)
    - [Exercise 6: Create a Dashboard](#Exercise-6:-Create-a-Dashboard)

- [Resources for Tableau](#Resources-for-Tableau)

## Database Tables

- Back to the [Table of Contents](#Table-of-Contents)

For these exercises we will continue to use tables in the "homework" database. These tables were created using data from the broader starmetrics, umetricsgrants and usptopatents databases. Each of these databases contains different types of information and is available for your use during this class.

You also have a personal database where you can create and modify tables as you wish, to support your work. Your personal database has the same name as your username.

For a refresher on the description of the "homework" tables, see the Database Assignment Data Dictionary Word document in moodle [http://jpsmonline.umd.edu/mod/resource/view.php?id=2436](http://jpsmonline.umd.edu/mod/resource/view.php?id=2436)

For a refresher on the description of the starmetrics, umetricsgrants and usptopatents databases, see (insert link here)

## Using Tableau to Make a Database Connection

- Back to the [Table of Contents](#Table-of-Contents)

For this assignment, you will need to use the Tableau Software (click here to download: http://www.tableau.com/). If you are using a Windows computer, you will also need to download putty.exe (click here to download: http://the.earth.li/~sgtatham/putty/latest/x86/putty.exe). 

Unfortunately, Tableau does not yet have internal SSH capability, so the connection has to go through an SSH tunnel, which can be a little complicated to set up. Explicit directions on setting up a database connection using Tableau can be found in moodle (insert link here).

### Connecting Tableau to Database - Mac

- Back to the [Table of Contents](#Table-of-Contents)

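Step 1) Open a terminal window and enter the following information <br>
- `ssh -L 127.0.0.1:3306:127.0.0.1:3306 <username>@bigdataforsocialscience.com -N`
- Enter your password when prompted (because of `-N`, no remote shell opens and the window will appear to do nothing; this is expected)
- Minimize the terminal window (do not close it)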
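
Before moving on to Step 2, it can help to confirm that the tunnel is listening. The following is a minimal Python sketch and not part of the class instructions. The function name `tunnel_is_listening` is hypothetical. A successful check only shows that something accepts connections on local port 3306, so a MySQL server already running on your own machine would also pass.

```python
import socket


def tunnel_is_listening(host="127.0.0.1", port=3306, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (e.g. connection refused, timeout)
        # when nothing is listening on the given address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Check the local end of the tunnel opened in Step 1.
print(tunnel_is_listening())
```

If this prints `False`, the ssh command from Step 1 is most likely not running or failed to authenticate.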

Step 2) Open Tableau and Connect to the Database
- On the left side of Tableau, scroll down to connect to a server
- Click on MySQL (the second option after Tableau Server)
- Enter the following into the pop-up box:
    - Server: 127.0.0.1
    - Port: 3306
    - Username: "enter your username"
    - Password: "enter your password"

For additional instructions or troubleshooting, step-by-step pictorial instructions are available in moodle (insert link here).

### Connecting Tableau to Database - Windows

Step 1) Open putty.exe and configure putty for port-forwarding <br>
- Click on the [+] next to Connection (if not already expanded)
- Click on the [+] next to SSH (if not already expanded) 
- Click on Tunnels  
- Under "Add new forwarded port:", enter the following information:
    - Source port: 3306
    - Destination: 127.0.0.1:3306
- Click "Add"

Step 2) Login using Server IP address <br>
- Click Session
- Under "Host Name", enter the following information:
    - 54.85.16.34
- Under "Saved Sessions", enter the following information:
    - Big Data Class
- Click "Save"
- Click "Open"

Note: If you are presented with a pop-up (The server's host key...), click "Yes".

Step 3) Login using Big Data Class Server Credentials
- Enter username and password

Step 4) Minimize the PuTTY terminal window (do not close it)

Step 5) Open Tableau and Connect to the Database
- On the left side of Tableau, scroll down to connect to a server
- Click on MySQL (the second option after Tableau Server)
- Enter the following into the pop-up box:
    - Server: 127.0.0.1
    - Port: 3306
    - Username: "enter your username"
    - Password: "enter your password"
    
For additional instructions or troubleshooting, step-by-step pictorial instructions are available in moodle (insert link here).

## Tableau Exercises

- Back to the [Table of Contents](#Table-of-Contents)

Tableau is data analysis and visualization software that is easy to learn and use. The software allows you to connect to your data and perform queries without writing a single line of code. It allows you to shift between views by dragging and dropping, and to build anything from single visualizations to an interactive dashboard.

Now that you've successfully connected Tableau to the Big Data for Social Science server, you can easily create visualizations to display, navigate and understand the data available to you. These are a few short exercises that will teach you the basics of navigating the Tableau software so you can create your own visuals. 


### Selecting Data Tables

- Back to the [Table of Contents](#Table-of-Contents)

Begin by selecting a table. Under "Database" click on the dropdown menu and select "homework". Two tables will appear: OSU_grant and OSU_vendor (these are the same tables from the Database Basics notebook). 

Next, drag OSU_vendor to the section on the screen that says "Drag tables here" then choose "Automatically Update". If successful, you should see the data at the bottom of the screen.

Lastly, drag the OSU_grant table toward the OSU_vendor table (top part of screen). Click on the Venn diagram between the two datasets and choose "LEFT JOIN". Click on the dropdown menu between "Data Source" and "OSU_vendor" and choose "UNIQUEAWARDNUMBER". A complete dataset will appear below.
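
To see what the LEFT JOIN on UNIQUEAWARDNUMBER does, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database rather than the class MySQL server. The rows and the `paymentamount` and `agency` columns are made-up stand-ins for the real tables. It illustrates the join logic only and is not the query Tableau actually sends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Toy versions of the two homework tables (hypothetical rows).
    CREATE TABLE OSU_vendor (UNIQUEAWARDNUMBER TEXT, paymentamount REAL);
    CREATE TABLE OSU_grant  (UNIQUEAWARDNUMBER TEXT, agency TEXT);
    INSERT INTO OSU_vendor VALUES ('A-1', 100.0), ('A-1', 50.0), ('A-2', 75.0);
    INSERT INTO OSU_grant  VALUES ('A-1', 'NSF'), ('A-3', 'NIH');
""")

# OSU_vendor on the left: every vendor payment is kept, matched or not.
vendor_first = conn.execute("""
    SELECT v.UNIQUEAWARDNUMBER, v.paymentamount, g.agency
    FROM OSU_vendor AS v
    LEFT JOIN OSU_grant AS g
      ON v.UNIQUEAWARDNUMBER = g.UNIQUEAWARDNUMBER
    ORDER BY v.UNIQUEAWARDNUMBER, v.paymentamount
""").fetchall()

# OSU_grant on the left: every grant is kept, and unmatched payments drop out.
grant_first = conn.execute("""
    SELECT g.UNIQUEAWARDNUMBER, g.agency, v.paymentamount
    FROM OSU_grant AS g
    LEFT JOIN OSU_vendor AS v
      ON g.UNIQUEAWARDNUMBER = v.UNIQUEAWARDNUMBER
    ORDER BY g.UNIQUEAWARDNUMBER, v.paymentamount
""").fetchall()

for row in vendor_first:
    print("vendor first:", row)
for row in grant_first:
    print("grant first: ", row)
```

With OSU_vendor on the left, the payment for award `A-2` survives with an empty agency. With OSU_grant on the left, that payment disappears and the unpaid grant `A-3` appears instead.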

Now you are ready to create some visualizations!!

### Additional Notes about Selecting Data

- Back to the [Table of Contents](#Table-of-Contents)

1) The order of joining the two datasets is very important and can affect the MySQL joins in your data. For example, if you choose to drag OSU_grant first in the example above, you will need to choose "RIGHT JOIN" instead of "LEFT JOIN" to make sure you have a complete dataset.

2) Before beginning to work with a dataset, check to make sure that the variable types are consistent with the types of visualizations you want to show. For example, all dates should be a "Date" type and all numeric variables should be a "numeric" type. If you forget, you can always change the variable types while building your visualizations.

3) In the OSU datasets, the "FIPSCODE" variable is automatically reported as a string variable. For these exercises, you will need to give it a "Geographic Role" of county type.
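
If you want to run the same kind of type check in the notebook before opening Tableau, the sketch below shows one way to do it in plain Python. The helper names are hypothetical, and the date format `%Y-%m-%d` is an assumption about how dates are stored. County FIPS codes are five digits long, so a code that has lost a leading zero needs to be padded back before it can be matched to a county.

```python
from datetime import date, datetime


def to_county_fips(value):
    """Return a 5-character, zero-padded county FIPS string."""
    text = str(value).strip()
    if not text.isdigit() or len(text) > 5:
        raise ValueError(f"not a county FIPS code: {value!r}")
    return text.zfill(5)


def to_date(value, fmt="%Y-%m-%d"):
    """Parse a date string; the default format is an assumption."""
    return datetime.strptime(value, fmt).date()


def to_amount(value):
    """Convert a payment amount stored as text (e.g. '1,250.50') to a float."""
    return float(str(value).replace(",", ""))


print(to_county_fips(6037))      # integer code that lost its leading zero
print(to_county_fips("39049"))   # already a valid 5-digit string
print(to_date("2012-03-31"))
print(to_amount("1,250.50"))
```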

### Exercise 1: Create a Simple Bar Chart

- Back to the [Table of Contents](#Table-of-Contents)

To create a simple bar chart to describe the total payment amounts of OSU vendors by federal agency, follow these simple instructions: 

- Click on "Sheet 1" (bottom left side of the screen) to start a worksheet

Once you are at a new worksheet, you can do the following: 
- From the Measures section (bottom left side of the screen), drag "Payment Amount" measures to the "Rows Shelf"
- From the Dimensions section (top left side of the screen), drag "Agency" (from OSU_grant) to the "Columns Shelf"

Alternatively, click on the "Payment Amount" measure and then, while holding the "Control" key, click on the "Agency" variable under Dimensions. Then go to the "Show Me" menu (top right side of the screen), where you can see the chart types that are available for the selected variables.
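
The bar chart is an aggregation at heart: Tableau sums the measure (SUM is its default aggregation for measures) within each value of the dimension. The sketch below reproduces that calculation in plain Python. The agency names and amounts are made up, and `total_by_agency` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical (agency, payment amount) pairs standing in for the joined rows.
payments = [
    ("NSF", 100.0),
    ("NIH", 250.0),
    ("NSF", 50.0),
    ("USDA", 75.0),
]


def total_by_agency(rows):
    """Sum payment amounts per agency, one total per bar in the chart."""
    totals = defaultdict(float)
    for agency, amount in rows:
        totals[agency] += amount
    return dict(totals)


totals = total_by_agency(payments)
for agency, total in sorted(totals.items()):
    print(f"{agency:<5} {total:>10.2f}")
```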

### Exercise 2: Create a Simple Timeline Graph

- Back to the [Table of Contents](#Table-of-Contents)

To create a simple timeline graph of total payment amounts of OSU vendors over time, follow these simple instructions: 

- Create a new worksheet by clicking on the "New Worksheet" icon (bottom left side of the screen)

Once you are at a new worksheet, you can do the following: 
- From the Measures section, drag the "Payment Amount" measure to the "Columns Shelf"
- From the Dimensions section, drag "Period End Date" to the "Rows Shelf"
- If you click on the [+] on Year(Periodenddate) in the Rows Shelf, you can add quarterly information to the total payment amount

In addition, you can click on the "Show Me" menu to see other types of charts that can be created to visualize the selected variables.
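
Clicking the [+] to go from years to quarters only changes the grouping key of the same sum. The following sketch shows the idea with made-up dates and amounts. The helper names are hypothetical and this is not how Tableau implements it internally.

```python
from collections import defaultdict
from datetime import date

# Hypothetical (period end date, payment amount) pairs.
payments = [
    (date(2012, 1, 31), 100.0),
    (date(2012, 2, 29), 40.0),
    (date(2012, 7, 31), 60.0),
    (date(2013, 4, 30), 25.0),
]


def quarter_of(day):
    """Return the calendar quarter (1-4) that a date falls in."""
    return (day.month - 1) // 3 + 1


def total_by_period(rows, by_quarter=False):
    """Sum amounts per year, or per (year, quarter) when by_quarter is True."""
    totals = defaultdict(float)
    for day, amount in rows:
        key = (day.year, quarter_of(day)) if by_quarter else (day.year,)
        totals[key] += amount
    return dict(sorted(totals.items()))


yearly = total_by_period(payments)
quarterly = total_by_period(payments, by_quarter=True)
print(yearly)
print(quarterly)
```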


### Exercise 3: Create a Heat Map

- Back to the [Table of Contents](#Table-of-Contents)

To create a simple heat map of total distribution of payment amounts of OSU vendors by geographic location, follow these simple instructions: 

- Create a new worksheet
- From the Dimensions section, click on "FIPSCODE" and then click "Show Me" to see the list of allowed charts
- From Show Me, choose the second map (you can drag the map to change its position on the screen)
- From the Dimensions section, drag "Agency" (OSU grant) to Color on the Marks card

This gives you a good visualization of the distribution of different categories of agency by county. The agency (OSU grant) card shows you what each color represents. 


### Exercise 4: Create Agency and Sub-Agency Breakdowns

- Back to the [Table of Contents](#Table-of-Contents)

To create an agency/sub agency breakdown, follow these simple instructions: 

- Create a new worksheet
- From Dimensions, drag Agency Abbrev into Text Marks
- From Dimensions, drag Sub Agency Text into
- From Measures, drag Paymentamount into Size Mark
- From Dimensions, drag Agency Abbrev into Color Mark


### Exercise 5: Create a Timeline of Total Payments by Agency

- Back to the [Table of Contents](#Table-of-Contents)

To create a timeline of total OSU payments by agency, follow these simple instructions: 

- Create a new worksheet
- Drag Paymentamount to Rows Shelf
- Drag Periodenddate to Columns Shelf
- Change from YEAR(Periodenddate) to MONTH(Periodenddate)
- Drag Agency Text onto the Filter and select everything except "NULL". Click to apply and click "OK"
- Drag Agency Abbrev onto the Color Mark and choose "Add All Members" 
- Drag Agency Abbrev onto Label Mark
- Rename sheet to "Timeline, Total Payment ($), Per Agency"

To annotate your chart, right click the line and choose "Annotate". 
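
The steps above amount to filtering out rows with no agency and then summing payments per agency and month. The sketch below expresses that with Python's built-in `sqlite3` module and an in-memory table of made-up rows. The table and column names are hypothetical, and it groups by year and month together, which is an assumption: if MONTH is used as a discrete date part in Tableau, the same month from different years would be pooled instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical joined rows; the last one has no agency.
    CREATE TABLE payments (agency TEXT, periodenddate TEXT, paymentamount REAL);
    INSERT INTO payments VALUES
        ('NSF', '2012-01-31', 100.0),
        ('NSF', '2012-01-31',  20.0),
        ('NIH', '2012-01-31',  50.0),
        ('NSF', '2012-02-29',  30.0),
        (NULL,  '2012-02-29', 999.0);
""")

monthly = conn.execute("""
    SELECT agency,
           strftime('%Y-%m', periodenddate) AS pay_month,
           SUM(paymentamount) AS total
    FROM payments
    WHERE agency IS NOT NULL          -- same effect as deselecting "NULL"
    GROUP BY agency, pay_month        -- one point per agency line per month
    ORDER BY pay_month, agency
""").fetchall()

for agency, pay_month, total in monthly:
    print(pay_month, agency, total)
```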


### Exercise 6: Create a Dashboard

To create a dashboard of visualizations, follow these simple instructions:  

- From the Tableau toolbar, click on Dashboard and create "New Dashboard"
- From the Dashboard, add your worksheet to "Drop Sheets Here" by dragging it there or by double clicking on the worksheet


### Resources for Tableau

- Back to the [Table of Contents](#Table-of-Contents)

Below you will find a 5-minute video that describes how to create visualizations with Tableau using a very simple, but effective, approach. In addition, the handout follows the progression of the video, but is heavily annotated.

Video https://www.youtube.com/watch?v=-4uNv6wuGQ8 <br>
Handout https://docs.google.com/presentation/d/1bPn44W15Jq3csc87vld0FWXZpu4cnoqe1Qqob57KvTQ/edit#slide=id.p

### Resources for Keshif

- Back to the [Table of Contents](#Table-of-Contents)

Below you will find similar resources for a dashboard visualization program called Keshif.

Video :: https://www.youtube.com/watch?v=3Hmvms-1grU <br>
Handout :: https://docs.google.com/presentation/d/1beCw3KiFjWLdVfgp8EICFPNPiuu2UzX8PFbcirJFQVw/edit#slide=id.gc5246df19_0_81
