# Visualization for Big Data - Exploring the Data

For this exercise, we will be using data contained in the "starmetrics" database on the Big Data for Social Science Class Server. This notebook will walk you through accessing the starmetrics data using IPython Notebook and help you familiarize yourself with the available class data.

## Table of Contents

- [Database Tables](#Database-Tables)

- [Creating Your Own Table Using Existing Data Tables](#Creating-Your-Own-Table-Using-Existing-Data-Tables)

   - [Exercise 1: Create Table](#Exercise-1:-Create-Table)
   - [Continued Practice with Tableau](#Continued-Practice-with-Tableau)

- [Resources for Tableau](#Resources-for-Tableau)
- [Resources for Keshif](#Resources-for-Keshif)


## Database Tables

- Back to the [Table of Contents](#Table-of-Contents)

For these exercises we will begin to branch out into the starmetrics and umetricsgrants databases to help you explore the available class data. Each of these databases contains different types of information and is available for your use during this class.

You also have a personal database where you can create and modify tables as you wish to support your work. Your personal database has the same name as your username.

For a refresher on the description of the starmetrics, umetricsgrants and usptopatents databases, see (insert link here).

For a refresher on how to make a database connection between Tableau and the class server, see (insert link here).

## Creating Your Own Table Using Existing Data Tables

- Back to the [Table of Contents](#Table-of-Contents)

For this class, it is important not only to understand how to query an existing database, but also to be able to create your own tables. Tableau, although it has powerful visualization capabilities, can slow down significantly as your data becomes larger and more complex. Because of this, especially for this class, we suggest that you use SQL or IPython to filter your data before syncing to Tableau.

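Creating your own table on the class server is very simple. The SQL syntax for creating your own table is below for reference; the quoted names are placeholders for your own username, new table name, source database, and source table.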

  CREATE TABLE "username"."new_table" ( <br>
  SELECT * FROM "database"."existing_table" <br>
  );
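
To try the pattern end to end without touching the class server, here is a minimal sketch that runs the same kind of statement against an in-memory SQLite database. SQLite stands in for the class MySQL server purely so the example is self-contained; the table names (`existing_table`, `new_table`), the columns, and the sample rows are made up for illustration. The statement is written here in the `CREATE TABLE ... AS SELECT` form, with a `WHERE` clause to show how the new table can be filtered down before it is used in Tableau.

```python
import sqlite3

# In-memory SQLite database: a stand-in for the class server (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical source table with made-up rows, for illustration only.
cur.execute("CREATE TABLE existing_table (university TEXT, paymentamount REAL)")
cur.executemany(
    "INSERT INTO existing_table VALUES (?, ?)",
    [("A", 10.0), ("B", 20.0), ("A", 5.0)],
)

# Create a new, smaller table from a filtered SELECT on the existing one.
cur.execute(
    "CREATE TABLE new_table AS "
    "SELECT * FROM existing_table WHERE university = 'A'"
)

# Check what ended up in the new table.
cur.execute("SELECT university, paymentamount FROM new_table ORDER BY paymentamount")
new_rows = cur.fetchall()
print(new_rows)

conn.close()
```

Only the rows that match the filter are copied, so the new table is smaller than its source.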
  
### Exercise 1: Create Table

- Back to the [Table of Contents](#Table-of-Contents)

Query the starmetrics.vendor table and display the universities that are available to you. Then, modify the SQL query that created the homework.OSU_vendor table to include two universities of your choice.

In [None]:
# imports
import MySQLdb
import MySQLdb.cursors

# declare variables - replace the placeholders with your own credentials
user = "<username>"
password = "<password>"
database = "starmetrics"

# invoke the connect() function, passing parameters in variables.
db = MySQLdb.connect( user = user, passwd = password, db = database )

# create mysql cursor that maps column names to values in the query result.
cursor = db.cursor( MySQLdb.cursors.DictCursor )

# declare variables
select_string = ""
column_value = ""

# Query template
select_string = "SELECT DISTINCT university FROM vendor;"
cursor.execute( select_string )

# fetchall() returns one dictionary per row, so loop over the rows
rows = cursor.fetchall()
for row in rows:
    column_value = row[ "university" ]
    print(column_value)
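
With a `DictCursor`, `fetchall()` returns a sequence of dictionaries, one per row, so the column has to be read from each row rather than from the result as a whole. The sketch below shows that step on a hand-built sequence shaped like such a result, so it runs without a database connection. The helper name `column_values` is hypothetical, and the second university name is made up.

```python
def column_values(rows, column):
    """Return the value of `column` from each row dictionary, in order."""
    return [row[column] for row in rows]


# Hand-built stand-in for cursor.fetchall() with a DictCursor (made-up data).
rows = (
    {"university": "OSU"},
    {"university": "Example University"},
)

universities = column_values(rows, "university")
print(universities)
```

With a live connection, the same helper could be called as `column_values( cursor.fetchall(), "university" )`.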

Now use the output to modify the SQL CREATE TABLE query below so that it creates a new table in your personal database.

In [None]:
# use cells at top to connect or re-connect to database and make cursor if needed

# declare variables
select_string = ""

# Query template - replace <username> and <table_name> with your own values
select_string = "CREATE TABLE <username>.<table_name> ("
select_string += " SELECT periodstartdate, periodenddate, v.uniqueawardnumber, recipientaccountnumber, institutionid, paymentamount, v.university, v.cfda,"
select_string += " v.zipcode, fipscode, statecode, countycode, c.agency, agency_abbrev, agency_text, sub_agency_text, program_title"
select_string += " FROM starmetrics.vendor v"
select_string += " LEFT JOIN starmetrics.zip_to_fip z on z.zipcode = v.zipcode"
select_string += " LEFT JOIN starmetrics.cfda c on c.cfda = v.cfda"
select_string += " WHERE v.university = 'OSU'"
select_string += " AND periodstartdate >= '2011-01-01' AND v.zipcode != '' );"

# CREATE TABLE returns no rows, so there is nothing to fetch after executing it
#cursor.execute( select_string )
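
For the two-university variant, one option is to replace the single equality test on `v.university` with an `IN (...)` list and let the driver fill in the values. The following is a minimal sketch, not the only way to do it: the helper name `university_filter` is hypothetical, and the university names are placeholders to be replaced with two values returned by the `SELECT DISTINCT` query. MySQLdb uses `%s` as its parameter marker, and the values are passed as the second argument to `cursor.execute()`.

```python
def university_filter(universities):
    """Build an IN (...) condition with one %s placeholder per university."""
    if not universities:
        raise ValueError("at least one university is required")
    placeholders = ", ".join(["%s"] * len(universities))
    return " WHERE v.university IN (" + placeholders + ")", list(universities)


# Placeholder names; substitute two values returned by the DISTINCT query.
where_clause, params = university_filter(["<university_1>", "<university_2>"])
print(where_clause)
print(params)

# With an open cursor, the clause would be used roughly like this:
# cursor.execute( "SELECT COUNT(*) AS n FROM starmetrics.vendor v" + where_clause, params )
```

Passing the values as parameters instead of pasting them into the string leaves the quoting to the driver.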

### Continued Practice with Tableau

- Back to the [Table of Contents](#Table-of-Contents)

Using the newly created table(s), design your own dashboard of visualizations to compare/contrast/describe the data using Tableau. 

## Resources for Tableau

- Back to the [Table of Contents](#Table-of-Contents)

Below you will find a 5-minute video that describes how to create visualizations with Tableau using a very simple but effective approach. In addition, the handout follows the progression of the video and is heavily annotated.

Video :: https://www.youtube.com/watch?v=-4uNv6wuGQ8 <br>
Handout :: https://docs.google.com/presentation/d/1bPn44W15Jq3csc87vld0FWXZpu4cnoqe1Qqob57KvTQ/edit#slide=id.p

## Resources for Keshif

- Back to the [Table of Contents](#Table-of-Contents)

Below you will find similar resources for a dashboard visualization program called Keshif.

Video :: https://www.youtube.com/watch?v=3Hmvms-1grU <br>
Handout :: https://docs.google.com/presentation/d/1beCw3KiFjWLdVfgp8EICFPNPiuu2UzX8PFbcirJFQVw/edit#slide=id.gc5246df19_0_81
