# A Notebook to Visualize Data Using Parallel Coordinates

This notebook shows how to visualize data using parallel coordinates, a type of visualization in which each feature is drawn as a vertical axis and each data point as a line crossing those axes.

For those of you interested in the code, it uses predefined functions from the [plotly](https://plotly.com/python/) library to plot the data and the [pandas](https://pandas.pydata.org/) library to load and handle it.

In [8]:
from plotly.offline import init_notebook_mode,iplot
import plotly.graph_objs as go
import pandas as pd 
#import plotly.plotly as py

init_notebook_mode(connected=True) 

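def produceDict(df,label):
    # Build one plotly Parcoords dimension from a single column (a pandas Series).
    dic={}
    if df.dtypes in ('int64','float64'):
        # Numerical column: plot the raw values over their observed range.
        dic['range']=[df.min(),df.max()]
        dic['label']=label
        dic['values']=df
        #dic['visible']=False
    else:
        # Categorical column: encode each category as an integer code
        # and label the axis ticks with the original category names.
        df=df.astype('category')
        encodedLabels=dict(enumerate(df.cat.categories))
        dic['range']=[0,len(encodedLabels)-1]
        dic['tickvals']=list(encodedLabels.keys())
        dic['label']=label
        dic['values']=df.cat.codes
        dic['ticktext']=list(encodedLabels.values())
    return dic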

def ParallelCoordinates():
    # Prompt for the CSV file, the features to plot, and the column used to color the lines.
    filename=input('Please enter the file you want to analyze:')
    attrs=[attr.strip() for attr in input('Please enter the attributes you want to plot:').split(',')]
    classes=input('Please enter the column name of the classes you want to observe:')
    df = pd.read_csv(filename)
    # One axis (dimension) per requested feature.
    dimensions=[]
    for attr in attrs:
        dimensions.append(produceDict(df[attr],attr))
    # The class column determines the line colors.
    lineDict=produceDict(df[classes],classes)
    line=dict(color=lineDict['values'],
              colorscale=[[0.0,'rgb(255,97,100)'],[0.5,'rgb(131,245,115)'],[1.0,'rgb(109,172,244)']],
              showscale=True,
              colorbar=dict(title=classes,ticks='outside'))
    if df[classes].dtypes not in ('int64','float64'):
        # Categorical classes: show the category names on the color bar.
        line['colorbar']['tickvals']=list(lineDict['tickvals'])
        line['colorbar']['ticktext']=list(lineDict['ticktext'])
    data=[go.Parcoords(line=line,dimensions=dimensions)]
    layout = go.Layout(title='Parallel Coordinates',
                       font=dict(color='#292A2A',size=13))
    fig = go.Figure(data = data, layout = layout)
    iplot(fig, filename = 'parcoords-basic')  #,image='svg'
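
To illustrate how the categorical branch of `produceDict` turns text labels into positions on an axis, here is a minimal sketch. The column below is made up for illustration and is not taken from the example datasets.

```python
import pandas as pd

# Hypothetical categorical column, made up for illustration.
species = pd.Series(['setosa', 'virginica', 'setosa', 'versicolor'])

# Convert to a pandas categorical; the categories are stored in sorted order.
encoded = species.astype('category')

# Map each integer code to its category name, as produceDict does.
encoded_labels = dict(enumerate(encoded.cat.categories))

# Integer codes that are plotted along the axis in place of the strings.
codes = encoded.cat.codes.tolist()

print(encoded_labels)
print(codes)
```

The integer codes become the plotted values, while the category names are used only as tick labels.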

## Data Format

The functions above expect data in a specific format: a CSV file whose first row contains the feature names and whose remaining rows contain categorical or numerical values for those features. There must be no missing values for any feature; otherwise the function raises an error.

We provide several example datasets in this directory, such as iris.csv.
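
One way to look at an excerpt of a dataset is to load it with pandas and show its first few rows. The helper below is a small sketch; the name `preview_dataset` is our own and is not part of the notebook's functions.

```python
import pandas as pd

def preview_dataset(filename, n_rows=5):
    """Return the first n_rows of a CSV file as a DataFrame."""
    df = pd.read_csv(filename)
    return df.head(n_rows)
```

For example, running `preview_dataset('iris.csv')` in a cell should display the feature names in the header row followed by the first five data rows, assuming iris.csv is in the same directory as the notebook.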

## Parallel Coordinates
The following function generates a parallel coordinates visualization of your data. When prompted, enter the filename of the dataset, the features to plot as the dimensions (axes) of the plot, and the name of the column that holds the classes, which is used to color the lines. To plot several features, separate them with commas.
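
Because feature names are typed by hand, a misspelled name or a stray space can cause a column lookup to fail. The sketch below shows one way a comma-separated feature list could be parsed and checked against the column names before plotting. The helper `parse_attributes` is hypothetical and is not part of the notebook's functions.

```python
def parse_attributes(raw, columns):
    """Split a comma-separated string into feature names and check them against columns."""
    # Drop surrounding whitespace and ignore empty entries such as a trailing comma.
    names = [name.strip() for name in raw.split(',') if name.strip()]
    # Collect any names that are not actual column names.
    unknown = [name for name in names if name not in columns]
    if unknown:
        raise ValueError('Unknown feature(s): ' + ', '.join(unknown))
    return names

# Column names as they appear in the iris example in this notebook.
columns = ['sepal width in cm', 'petal width in cm']
print(parse_attributes('sepal width in cm, petal width in cm', columns))
```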

In [11]:
ParallelCoordinates()

Please enter the file you want to analyze:iris.csv
Please enter the attributes you want to plot:sepal width in cm
Please enter the column name of the classes you want to observe:petal width in cm


## Using Your Own Dataset
To use your own dataset, save it as a new file in the same directory as this notebook. Make sure it follows the format of the example datasets: the first row must contain the feature names, and the remaining rows must contain categorical or numerical values for those features. The dataset must not contain missing values; otherwise you will get an error.
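
Before plotting, it can help to confirm that a dataset has no missing values. The following is a minimal sketch using pandas; the helper name `count_missing` and the small example table are made up for illustration.

```python
import pandas as pd

def count_missing(df):
    """Return the number of missing values per column, for columns that have any."""
    missing = df.isna().sum()
    return missing[missing > 0].to_dict()

# Made-up data with one missing value in the 'height' column.
example = pd.DataFrame({'height': [1.0, None, 3.0], 'kind': ['a', 'b', 'c']})
print(count_missing(example))
```

An empty result means every column is complete. To check your own file, you could pass it in as `count_missing(pd.read_csv('my_data.csv'))`, where `my_data.csv` stands in for your filename.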

Once you have created the file, run the cell below.

In [None]:
ParallelCoordinates()

Now you can print this notebook as a PDF file and turn it in.