# A Notebook to Visualize Data Using a Choropleth Map
This notebook shows how to visualize data with a [choropleth map](https://en.wikipedia.org/wiki/Choropleth_map). A choropleth map is a thematic map in which areas are shaded or patterned in proportion to the value of the statistical variable being displayed, such as population density or per capita income.
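
As a rough illustration of "shaded in proportion to the measurement", the sketch below maps values linearly onto a white-to-black scale, the same two end colors this notebook's map uses. The function name `shade` and the sample values are made up for illustration; plotly performs its own interpolation internally.

```python
def shade(value, vmin, vmax):
    """Map a value to a grayscale 'rgb(r,g,b)' string: vmin is white, vmax is black."""
    if vmax == vmin:
        t = 0.0  # all values are equal, so there is no contrast to show
    else:
        t = (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)  # clamp values outside [vmin, vmax]
    level = round(255 * (1.0 - t))
    return f"rgb({level},{level},{level})"

# Made-up measurements keyed by state abbreviation, for illustration only
values = {"CA": 1.0, "TX": 2.0, "NY": 3.0}
low, high = min(values.values()), max(values.values())
colors = {code: shade(v, low, high) for code, v in values.items()}
print(colors)
```

The lowest value comes out white and the highest black, with everything else a gray in between.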

For those of you interested in the code, it uses predefined functions from the [plotly](https://plot.ly) library to plot data and the [pandas](http://pandas.pydata.org) library to store and manage data.

In [None]:
import pandas as pd
from plotly.offline import init_notebook_mode,iplot
import ipywidgets as widgets
from ipywidgets import interact_manual

init_notebook_mode(connected=True) 

def searchFile():
    """Return the full paths of all .csv files in or below the current directory."""
    import os
    topdir='.'
    f = []
    for (dirpath, dirnames, filenames) in os.walk(topdir):
        for filename in filenames:
            if filename.endswith('.csv'):
                f.append(os.path.realpath(os.path.join(dirpath,filename)))
    return f

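def update_breakdown(*args):
    # Observer callback: refresh the column lists when a different file is selected.
    # Relies on the global widgets filename, breakdown, and attribute
    # that are created by createWidgets() in the cells below.
    df = pd.read_csv(filename.value)
    breakdown.options=list(df.columns)
    attribute.options=list(df.columns)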

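def ChoroplethMap(filename,title,attribute,breakdown):
    # Color each state by `attribute` and show the `breakdown` columns on hover.
    df=pd.read_csv(filename)
    # Two-stop color scale: white for the lowest value, black for the highest.
    scl = [[0.0,'rgb(255,255,255)'],[1.0,'rgb(0,0,0)']]
    # Build the hover text: one "column: value" line per selected column.
    df['breakdown']=''
    for col in breakdown:
        df['breakdown']=df['breakdown']+col+': '+df[col].astype('str')+'<br>'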
    data = [ dict(
        type='choropleth',
        colorscale = scl,
        autocolorscale = False,
        locations = df['Code'],
        z = df[attribute].astype('float'),
        locationmode = 'USA-states',
        text = df['breakdown'],
        marker = dict(
            line = dict (
                color = 'rgb(255,255,255)',
                width = 2
            ) ),
        colorbar = dict(
            title = attribute+"<br>(USD)")
        ) ]
    layout = dict(
        title = title,
        geo = dict(
            scope='usa',
            projection=dict( type='albers usa' ),
            showlakes = True,
            lakecolor = 'rgb(255, 255, 255)'),
    )
    fig = dict( data=data, layout=layout )
    iplot( fig, filename='d3-choropleth-map' )

def createWidgets():
    style={'description_width': 'initial'}
    filename=widgets.Dropdown(options=searchFile(),description='Choose a file',\
                              disabled=False,continuous_update=False)
    breakdown=widgets.SelectMultiple(description='Breakdown',disabled=False,\
                                     continuous_update=False)
    attribute=widgets.Select(description='Attribute to color',style=style,\
                             disabled=False,continuous_update=False)
    title=widgets.Text(value='input',description='Name your graph:',style=style,\
                       disabled=False,continuous_update=False)
    return filename,title,attribute,breakdown

def loadHead(dataset):
    # Show the first rows of a CSV file with every column visible.
    from IPython.display import display
    df=pd.read_csv(dataset)
    with pd.option_context('display.max_columns', None):
        display(df.head())

## Data Format

The functions above expect the data in a specific format. The first row contains the feature names, and each remaining row holds the categorical or numerical values of those features. No value may be missing; otherwise the functions may raise an error. The file must also include a column named 'Code' that holds each state's two-letter abbreviation (for example, CA), because the map uses this column to locate each state.
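
The format rules above can be checked before plotting. The following is a minimal sketch that uses only the Python standard library rather than pandas; the function name `check_format` and the sample rows are made up for illustration and are not part of this notebook's functions.

```python
import csv
import io

def check_format(csv_text):
    """Return a list of problems found in CSV text; an empty list means none were found."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    header, body = rows[0], rows[1:]
    problems = []
    if "Code" not in header:
        problems.append("no column named 'Code'")
    for line_no, row in enumerate(body, start=2):  # the header is line 1
        if len(row) != len(header):
            problems.append(f"line {line_no}: expected {len(header)} values, found {len(row)}")
        elif any(cell.strip() == "" for cell in row):
            problems.append(f"line {line_no}: missing value")
    return problems

# Made-up sample with one missing state abbreviation
sample = "State,Code,Income\nAlpha,AL,1\nBeta,,2\n"
print(check_format(sample))
```

To check a real file, read its contents first, for example with `open(path, newline='').read()`, and pass the text to `check_format`.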

We provide an example dataset, ["States ranked by per capita income.csv"](https://github.com/RupertMa/INF549/blob/master/Assignment_Visualization/States%20ranked%20by%20per%20capita%20income.csv), in this directory ("Choropleth Map"). Run the following cell to display the dataset before you visualize it. The dataset is multivariate; which features you visualize, and how many, is up to you.

In [None]:
dataset=input('Please enter the dataset you want to display: ')
loadHead(dataset)

## United States Choropleth Map
The following cell generates a visualization of your data on a US choropleth map. When prompted, choose the csv file containing the data, the attribute you want to color, and the other attributes you want to see when you hover over the map. The "Breakdown" widget, which lists those other attributes, allows multiple selections: hold command (Mac) or control (Windows) while clicking to select several variables.

**Tip:**  
**To choose the first listed file, select another file first and then reselect the first one. The attribute lists are filled in only when the file selection changes.**
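
To make the hover text concrete, here is a small sketch of how the selected breakdown columns become one label per state. It mirrors the string building in `ChoroplethMap` but works on a plain dictionary instead of a DataFrame; the function name `hover_text` and the row values are made up for illustration.

```python
def hover_text(row, breakdown):
    """Build one state's hover text: a 'column: value' entry per selected column."""
    return "".join(f"{col}: {row[col]}<br>" for col in breakdown)

# Made-up row for a single state
row = {"Code": "CA", "Income": 100, "Rank": 1}
print(hover_text(row, ["Income", "Rank"]))
# Income: 100<br>Rank: 1<br>
```

Each `<br>` starts a new line in the hover box, so every selected column appears on its own line.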


In [None]:
filename,title,attribute,breakdown=createWidgets()
interact_manual(ChoroplethMap,title=title,filename=filename,breakdown=breakdown,
                attribute=attribute)
filename.observe(update_breakdown,'value')

## Using Your Own Dataset
To use your own dataset, create a new file and put it in the directory "Choropleth Map". Make sure it follows the format of the datasets in this directory: the first row must contain the feature names, and each remaining row holds the categorical or numerical values of those features. Make sure there are no missing values in your dataset; otherwise you will get an error. You can use any dataset whose rows describe US states. **Also, name your state abbreviation column "Code" so that the map can locate each state.**

**Note that the unit for the attribute you want to color defaults to 'USD'. If you want to change the unit, change the text in the top cell as shown below.**
<img src="https://github.com/RupertMa/INF549/blob/master/Assignment_Visualization/Picture1.png?raw=true" width="40%">
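
As an alternative to editing the text by hand, the unit could be passed in as a parameter. The sketch below is one possible way to build the color bar title; the helper `colorbar_title` is hypothetical and is not part of the functions defined in this notebook.

```python
def colorbar_title(attribute, unit="USD"):
    """Return the color bar title: the attribute name with its unit on a second line."""
    if unit:
        return f"{attribute}<br>({unit})"
    return attribute  # no unit given: show the attribute name only

print(colorbar_title("Income"))
print(colorbar_title("Population", unit="people"))
```

The first call keeps the notebook's current default of USD; the second shows a different unit.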

Once you have created the file, run the cell below.

In [None]:
filename,title,attribute,breakdown=createWidgets()
interact_manual(ChoroplethMap,filename=filename,breakdown=breakdown,
                attribute=attribute,title=title)
filename.observe(update_breakdown,'value')

Now you can print this notebook as a PDF file and turn it in.