# Visualization for Big Data - Using Tableau

For this exercise, we will be using data contained in the "homework" database. This notebook will walk you through accessing the class homework data using IPython Notebook and help to familiarize you with the available class data.

## Table of Contents

- [Database Tables](#Database-Tables)

- [Connecting Tableau to a Database](#Connecting-Tableau-to-a-Database)

- [Tableau Exercises](#Tableau-Exercises)

    - [Selecting Data Tables](#Selecting-Data-Tables)
    - [Notes About Selecting Data](#Additional-Notes-about-Selecting-Data)
    - [Exercise 1: Create a Simple Bar Chart](#Exercise-1:-Create-a-Simple-Bar-Chart)
    - [Exercise 2: Create a Simple Timeline Graph](#Exercise-2:-Create-a-Simple-Timeline-Graph)
    - [Exercise 3: Create a Heat Map](#Exercise-3:-Create-a-Heat-Map)
    - [Exercise 4: Create Agency and Sub-Agency Breakdowns](#Exercise-4:-Create-Agency-and-Sub-Agency-Breakdowns)
    - [Exercise 5: Create a Timeline of Total Payments by Agency](#Exercise-5:-Create-a-Timeline-of-Total-Payments-by-Agency)
    - [Exercise 6: Create a Dashboard](#Exercise-6:-Create-a-Dashboard)
    - [Combining Tables in Tableau - JOINs](#Combining-Tables-in-Tableau---JOINs)

- [Troubleshooting](#Troubleshooting)
- [Resources for Tableau](#Resources-for-Tableau)
- [Resources for Keshif](#Resources-for-Keshif)

## Database Tables

- Back to the [Table of Contents](#Table-of-Contents)

For these exercises we will continue to use tables in the "homework" database. 

## Connecting Tableau to a Database

- Back to the [Table of Contents](#Table-of-Contents)

For this assignment, you will need the Tableau software, which you can download from http://www.tableau.com/.

See the installation guide to set up the database connection before continuing with this notebook.

## Tableau Exercises

- Back to the [Table of Contents](#Table-of-Contents)

Tableau is data analysis and visualization software that is easy to learn and use. It lets you connect to your data and run queries without writing a single line of code, and its drag-and-drop interface lets you shift between views to build anything from a single visualization to an interactive dashboard.

Now that you've successfully connected Tableau to the class data, you can easily create visualizations to display, navigate and understand the data available to you. These are a few short exercises that will teach you the basics of navigating the Tableau software so you can create your own visuals. 


### Selecting Data Tables

- Back to the [Table of Contents](#Table-of-Contents)

Begin by selecting a table. Under the tab "Database" click on the dropdown menu and select "homework". Two tables will appear: ugrant and vendor (these are the same tables from the Database Basics notebook). 

Next, drag vendor to the section on the screen that says "Drag tables here" then choose "Automatically Update". If successful, you should see the data at the bottom of the screen. 

Now you are ready to create some visualizations!

#### Additional Notes about Selecting Data

- Back to the [Table of Contents](#Table-of-Contents)

Here are some additional things to consider about our data before you jump into visualization:

1. When combining tables in Tableau, the order in which datasets are added dictates how you JOIN any two tables.  The JOIN type tells Tableau how to combine the two data sets, and the wrong type produces bad data (usually either incomplete or massively duplicated).  For simple data sets, one way to pick the JOIN type is to decide which table should be the master table, the one whose rows must all appear in your data set, and then set Tableau to join on the side of that table.  For example, if you are joining ugrant and vendor, there are multiple vendors per grant, but only one grant per vendor.  If you want to include all vendor expenditures and attach the related grant information to each expenditure record, vendor is your master table.  You then choose the JOIN based on where vendor sits relative to ugrant in the Tableau "Drag tables here" window: "LEFT JOIN" if vendor is on the left side of ugrant, or "RIGHT JOIN" if vendor is on the right side.  You also need to set the appropriate JOIN columns.

    - For more details on JOINing vendor and ugrant, see below: [Combining Tables in Tableau - JOINs](#Combining-Tables-in-Tableau---JOINs)
    - For more information on database table JOINs in general, see: [http://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins](http://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins)<br />
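The "master table" idea can be sketched outside Tableau with Python's built-in `sqlite3` module. The two miniature tables and their column names (`award_id`, `title`, `paymentamount`) are made up for illustration and are assumptions about the real "homework" schema. With vendor on the left, a LEFT JOIN keeps every vendor row, while an INNER JOIN silently drops the vendor row that has no matching grant.

```python
import sqlite3

# Made-up miniature tables; the real columns in "homework" may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ugrant (award_id TEXT, title TEXT);
    CREATE TABLE vendor (award_id TEXT, paymentamount REAL);
    INSERT INTO ugrant VALUES ('A1', 'Grant one'), ('A2', 'Grant two');
    INSERT INTO vendor VALUES
        ('A1', 100.0), ('A1', 250.0), ('A2', 75.0), ('A9', 40.0);
""")


def count_rows(join_type):
    """Count rows produced when vendor is the left-hand table of the JOIN."""
    sql = (
        f"SELECT COUNT(*) FROM vendor v {join_type} JOIN ugrant g "
        "ON v.award_id = g.award_id"
    )
    return conn.execute(sql).fetchone()[0]


left_count = count_rows("LEFT")    # every vendor row, matched or not
inner_count = count_rows("INNER")  # vendor row 'A9' has no grant and is dropped
print("LEFT JOIN rows:", left_count)
print("INNER JOIN rows:", inner_count)
```

If the two counts differ, some rows in the master table have no match in the other table, which is exactly the "incomplete data" case described above.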

### Verifying data types

1. Before beginning to work with a dataset, check that the variable types are consistent with the types of visualizations you want to show. For example, all dates should be a "Date" type and all numeric variables should be a numeric type. If you forget, you can always change the variable types while building your visualizations. 

2. Make sure geographic identifiers are properly identified.  In the class datasets, for example, the "Fipscode" variable is automatically reported as a string variable. For these exercises, you will need to give it a "Geographic Role" of county type. To do this:

    - Click "Abc" below the variable header "Fipscode".
    - Select "Geographic Role".
    - Then select "County".
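The same type checks can be illustrated in plain Python. This is only a sketch: the field names, the `YYYY-MM-DD` date format, and the sample values are assumptions, not taken from the class data. It also shows why a county FIPS code is better kept as a string than converted to a number.

```python
from datetime import date, datetime

# Hypothetical raw values, as they might look in a text export.
raw_row = {
    "periodstartdate": "2015-05-01",
    "paymentamount": "1234.50",
    "fipscode": "01001",
}


def coerce_row(row):
    """Convert each field to the type the visualizations expect."""
    fips = row["fipscode"]
    if not (len(fips) == 5 and fips.isdigit()):
        raise ValueError(f"not a 5-digit county FIPS code: {fips!r}")
    return {
        "periodstartdate": datetime.strptime(
            row["periodstartdate"], "%Y-%m-%d"
        ).date(),
        "paymentamount": float(row["paymentamount"]),
        "fipscode": fips,  # stays a string so the leading zero survives
    }


clean = coerce_row(raw_row)
print(clean)

# Treating the code as a number loses the leading zero.
fips_as_number = int(raw_row["fipscode"])
print(fips_as_number)
```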

### Layout of a worksheet

A Tableau worksheet is broken out into a few key areas:

- On the left is a vertical pane that contains "Data" and "Analytics" tabs.  To start, we'll refer to this as the Data pane.  The Data pane has two sections:

    - **Dimensions** - variables from your data set that contain non-numeric data such as strings, dates, or geographic identifiers.
    - **Measures** - variables from your data set that are numeric.
    
- To the right of the Data Pane is the visualization area.  The visualization area is broken out into:

    - a column of small windows called "Cards" that stack up on the left side of the visualization area, just to the right of the Data pane.
    - "Columns" and "Rows" "Shelves" that span across the top of the area, and where you can drag and drop dimensions or measures to add them as rows or columns to the current visualization.
    - the visualization area itself, which takes up the rest of the space to the right of the "cards" and under the "shelves".
    
- In the upper right corner of the screen by default, there is also a "Show Me" tab which expands to a window when clicked.  This window shows the styles of visualization that Tableau can generate: those that can be applied to the currently selected data are highlighted, and those that cannot are grayed out.

### Exercise 1: Create a Simple Bar Chart

- Back to the [Table of Contents](#Table-of-Contents)

To create a simple bar chart to describe the total payment amount of vendors by federal agency: 

- Click on "Sheet 1" (bottom left side of the screen) to start a worksheet

Once you are at a new worksheet, you can do the following: 
- From the Measures section (bottom left side of the screen), drag "Paymentamount" measure to the "Rows Shelf"
- From the Dimensions section (top left side of the screen), drag "Agency Abbrev" to the "Columns Shelf"

Note that in the Data pane on the left side of the Tableau screen, you can multi-select the Dimensions and Measures you want to include in your analysis by clicking each one while holding the "Control" key on Windows or the "Command" key on Mac. Then open the "Show Me" menu (top right side of the screen) to see which graphs are allowed with the selected variables (those not allowed will be grayed out). 
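For readers who want to see what the bar chart computes, here is a rough equivalent using Python's built-in `sqlite3` module. The table contents, the agency labels, and the column names `agency_abbrev` and `paymentamount` are invented for illustration. The sketch assumes the measure is aggregated with SUM, one bar per agency.

```python
import sqlite3

# Made-up rows; the agency labels and amounts are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vendor (agency_abbrev TEXT, paymentamount REAL);
    INSERT INTO vendor VALUES
        ('NSF', 100.0), ('NIH', 300.0), ('NSF', 50.0), ('NIH', 25.0);
""")

# One total per agency: this is the height of each bar.
totals = conn.execute("""
    SELECT agency_abbrev, SUM(paymentamount) AS total
    FROM vendor
    GROUP BY agency_abbrev
    ORDER BY agency_abbrev
""").fetchall()

# Crude text rendering of the bars, one '#' per 25 units.
for agency, total in totals:
    print(f"{agency:>4} {'#' * int(total // 25)} {total:.2f}")
```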

### Exercise 2: Create a Simple Timeline Graph

- Back to the [Table of Contents](#Table-of-Contents)

To create a simple timeline graph of total payment amounts of vendors over time: 

- Create a new worksheet by clicking on the "New Worksheet" icon (bottom left side of the screen)

Once you are at a new worksheet, you can do the following: 
- From the Measures section, drag the "Paymentamount" measure to the "Rows Shelf"
- From the Dimensions section, drag "Periodstartdate" to the "Columns Shelf"
- If you click on the [+] on Year(Periodstartdate) in the "Columns Shelf", you can add quarterly information to the total paymentamount

In addition, you can click on the "Show Me" menu to see other types of charts that can be created to visualize the selected variables.
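The yearly totals, and the quarterly breakdown added by the [+] control, correspond roughly to the two GROUP BY queries below. This is a sketch with made-up rows; it assumes dates are stored as `YYYY-MM-DD` text in a column named `periodstartdate`, which may not match the real table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vendor (periodstartdate TEXT, paymentamount REAL);
    INSERT INTO vendor VALUES
        ('2014-02-01', 10.0), ('2014-11-01', 20.0),
        ('2015-01-01', 30.0), ('2015-03-01', 40.0), ('2015-08-01', 50.0);
""")

# Total payments per year.
by_year = conn.execute("""
    SELECT strftime('%Y', periodstartdate) AS yr, SUM(paymentamount)
    FROM vendor
    GROUP BY yr
    ORDER BY yr
""").fetchall()

# Total payments per year and quarter; months 1-3 map to quarter 1, etc.
by_quarter = conn.execute("""
    SELECT strftime('%Y', periodstartdate) AS yr,
           (CAST(strftime('%m', periodstartdate) AS INTEGER) + 2) / 3 AS qtr,
           SUM(paymentamount)
    FROM vendor
    GROUP BY yr, qtr
    ORDER BY yr, qtr
""").fetchall()

print(by_year)
print(by_quarter)
```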


### Exercise 3: Create a Heat Map

- Back to the [Table of Contents](#Table-of-Contents)

To create a simple heat map of total distribution of payment amounts of vendors by geographic location: 

- Create a new worksheet
- Remember that for "Fipscode" to function as a geographic identifier, and so to work with maps, it must have a "Geographic Role" of "County":

    - In the Data pane on the left, in "Dimensions", look to the left of the field "Fipscode".

        - If there is a little globe there, then Fipscode has already been set to have a "Geographic Role" of "County".  Move on to the next step in Exercise 3.
        - If the letters "Abc" appear to the left of the variable header "Fipscode":

            - Click "Abc".  
            - Select "Geographic Role".
            - Then select "County".
- In the Dimensions section on the left side of the screen, click on "Fipscode".
- Click "SHOW ME" to see the list of allowed charts.
- From Show Me, choose the second map - the middle icon in the second row in "Show me".
- Drag the map inside the visualization area to adjust the portion that is visible on the screen.
- From the Measures section on the left, drag "Paymentamount" to the "Color" button on the "Marks" card, just to the right of the column that contains "Dimensions" and "Measures", below "Pages" and "Filters".

This gives you a good visualization of how payment amounts are distributed across the counties in which the money was spent. The `SUM(Paymentamount)` card that appeared when you dragged "Paymentamount" to the "Color" button shows you what the range of colors represents. 
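To make the idea of a color range concrete, the sketch below scales per-county totals linearly between the smallest and largest value. The totals are made up, and the linear scaling is an assumption chosen for illustration; Tableau's actual palette mapping may differ.

```python
# Made-up county totals keyed by FIPS code; values are illustrative only.
county_totals = {"01001": 200.0, "06037": 1000.0, "36061": 600.0}


def color_intensity(totals):
    """Scale each total linearly into [0, 1], where 1 is the darkest shade."""
    low, high = min(totals.values()), max(totals.values())
    span = high - low
    if span == 0:
        # All counties have the same total, so they share one shade.
        return {fips: 1.0 for fips in totals}
    return {fips: (value - low) / span for fips, value in totals.items()}


intensity = color_intensity(county_totals)
for fips, shade in sorted(intensity.items()):
    print(fips, round(shade, 2))
```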


### Exercise 4: Create Agency and Sub-Agency Breakdowns

- Back to the [Table of Contents](#Table-of-Contents)

To create a visualization of the money spent per agency, broken down within agencies into sub-agencies: 

- Create a new worksheet
- From Dimensions, drag "Agency Abbrev" into the "Text" button in the "Marks" card (the same card where the "Color" button was located in Exercise 3).
- From Dimensions, drag "Sub Agency Text" into the "Text" button in the "Marks" card.
- From Measures, drag "Paymentamount" into the "Size" button in the "Marks" card.
- From Dimensions, drag "Agency Abbrev" into the "Color" button in the "Marks" card.


### Exercise 5: Create a Timeline of Total Payments by Agency

- Back to the [Table of Contents](#Table-of-Contents)

To create a timeline of total payments by agency (one line in the chart per agency): 

- Create a new worksheet
- Drag "Paymentamount" from Measures to the Rows Shelf
- Drag "Periodstartdate" from Dimensions to the Columns Shelf
- Change the granularity of the timeline from annual to monthly (from "YEAR(Periodstartdate)" to "MONTH(Periodstartdate)") by either right-clicking on the cell for "YEAR(Periodstartdate)" in the Columns Shelf or clicking on the little triangle on the right side of that cell, then selecting "Month -- May 2015" rather than "Year -- 2015".

    - Make sure to select the "Month" whose description on the right includes the month and the year (in the second set of Date identifiers), rather than the "Month" that just includes the month name ("Month -- May", for example).  "Month -- May 2015" will break out the X axis by month and year - this is your basic timeline.  "Month -- May" will just lump all values of month together, regardless of year, giving you a visualization of spending per month, independent of year.

- Drag "Agency Text" onto the "Filters" card and select everything except "NULL". Click to apply and click "OK"
- Drag "Agency Abbrev" onto the "Color" button in the "Marks" card and choose "Add All Members" 
- Drag "Agency Abbrev" onto the "Label" button in the "Marks" card.
- Rename sheet to "Timeline, Total Payment ($), Per Agency"

To annotate lines in the resulting chart, right click one of the lines in the chart and choose "Annotate". 
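The difference between the two "Month" options in this exercise can be sketched in a few lines of Python. The payment rows are made up. Grouping by year and month gives a true timeline, while grouping by month alone lumps the same month from different years together.

```python
from collections import defaultdict
from datetime import date

# Made-up (period start date, payment amount) pairs.
payments = [
    (date(2014, 5, 1), 10.0),
    (date(2015, 5, 1), 20.0),
    (date(2015, 5, 15), 5.0),
    (date(2015, 6, 1), 40.0),
]

by_year_month = defaultdict(float)  # like "Month -- May 2015"
by_month_only = defaultdict(float)  # like "Month -- May"
for start, amount in payments:
    by_year_month[(start.year, start.month)] += amount
    by_month_only[start.month] += amount

print(dict(by_year_month))
print(dict(by_month_only))
```

In the second grouping, May 2014 and May 2015 collapse into a single bucket, so the result describes spending per calendar month, independent of year.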

### Exercise 6: Create a Dashboard

- Back to the [Table of Contents](#Table-of-Contents)

To create a dashboard of visualizations:  

- From the Tableau toolbar, click on Dashboard and create "New Dashboard"
- In a Dashboard, the pane on the left contains a new set of windows, the top one of which, labeled "Dashboard", contains a list of all of the worksheets in your workbook.
- From this "Dashboard" area in the left pane, add worksheets to your dashboard by either dragging them from the pane on the left to the visualization area on the right (which initially contains "Drop Sheets Here"), or by double-clicking on a given worksheet in the pane on the left.

### Combining Tables in Tableau - JOINs

- Back to the [Table of Contents](#Table-of-Contents)

A quick example on how to JOIN two tables together in Tableau: 

- To start, go to the "Data Source" section of your Tableau file (where you begin when you create a new file) by clicking on the "Data Source" button in the lower left corner of your Tableau window.
- Begin by selecting and adding the vendor table to the data area. (If you've gone through all the exercises above, vendor is already selected and added to the data area.)
- Drag the ugrant table to the data area to add it, as well.
- To choose the way the data from the two tables is "JOIN"ed, click on the Venn diagram (the two overlapping circles between and connected to the two datasets) between the two tables.
- In the pop-up window that results, JOIN on the side of vendor (should be "Left").
- Then, choose the column on which the two tables will be JOINed.  Beneath the options for JOIN types ("Inner", "Left", "Right", "Outer"), there will be a table in which join clauses are specified.  In the first row:

    - On the left, in the "Data Source" column, select "Award_id" from the dropdown ("Agency" will likely be selected by default, so you'll be changing that).
    - Leave the column in the middle set to "=" (you can change the operator here to test if a row in one table should be associated with another, which is advanced).
    - In the right column, labeled "ugrant", click on the dropdown menu and choose "Award_id (ugrant)".
    
- Once you've specified the JOIN criteria, a complete dataset will appear below in which each row from vendor contains the information on its associated grant from ugrant.  Grant information is now appended to the end of any row where the vendor record's "Award_id" value matched a given grant's "Award_id".  This allows you to include information on grants in your visualization.
- Go play with the JOINed data!
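Choosing the right JOIN column matters as much as choosing the JOIN type. The sketch below uses made-up tables and assumes both have `award_id` and `agency` columns. It compares joining on the award identifier with joining on the agency, and shows how the wrong column duplicates vendor rows and inflates totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ugrant (award_id TEXT, agency TEXT);
    CREATE TABLE vendor (award_id TEXT, agency TEXT, paymentamount REAL);
    INSERT INTO ugrant VALUES ('A1', 'AG1'), ('A2', 'AG1'), ('A3', 'AG1');
    INSERT INTO vendor VALUES ('A1', 'AG1', 100.0), ('A2', 'AG1', 50.0);
""")


def joined_summary(column):
    """Return (row count, payment total) after a LEFT JOIN on `column`."""
    sql = f"""
        SELECT COUNT(*), SUM(v.paymentamount)
        FROM vendor v LEFT JOIN ugrant g ON v.{column} = g.{column}
    """
    return conn.execute(sql).fetchone()


by_award = joined_summary("award_id")  # each vendor row matches one grant
by_agency = joined_summary("agency")   # each vendor row matches every grant
                                       # from the same agency
print("JOIN on award_id:", by_award)
print("JOIN on agency:  ", by_agency)
```

A quick sanity check after any JOIN is to compare the row count and a known total against the master table on its own.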

## Troubleshooting

- Back to the [Table of Contents](#Table-of-Contents)

Troubleshooting notes:

- If you connect to the database, load some tables, then work with that data for a while, chances are your database connection will have timed out after about 20 minutes.  If you then go back and work with a "Data Source" again, Tableau will likely freeze up.  If this happens, you can either:

    - wait for it to figure out that the connection has timed out and reconnect, which does tend to happen, but which can take a few minutes.
    - or you can force close Tableau and connect again.
    
    

## Resources for Tableau

- Back to the [Table of Contents](#Table-of-Contents)

Below you will find a 5-minute video that describes how to create visualizations with Tableau using a very simple, but effective, approach. In addition, the handout follows the progression of the video, but is heavily annotated. 

Video :: https://www.youtube.com/watch?v=-4uNv6wuGQ8 <br>
Handout :: https://docs.google.com/presentation/d/1bPn44W15Jq3csc87vld0FWXZpu4cnoqe1Qqob57KvTQ/edit#slide=id.p

## Resources for Keshif

- Back to the [Table of Contents](#Table-of-Contents)

Below you will find similar resources for a dashboard visualization program called Keshif.

Video :: https://www.youtube.com/watch?v=3Hmvms-1grU <br>
Handout :: https://docs.google.com/presentation/d/1beCw3KiFjWLdVfgp8EICFPNPiuu2UzX8PFbcirJFQVw/edit#slide=id.gc5246df19_0_81
