# An Introduction to Visualising In the Spotlight Data Using Python

In this notebook we will introduce a way of producing visualisations of [*In the Spotlight*](https://www.libcrowds.com/collection/playbills) results data, using Python. Our input will be the dataframe of transcription data produced in a [previous notebook](intro_to_analysing_its_data_using_python.ipynb).

[Plotly.py](https://plot.ly/d3-js-for-python-and-pandas-charts/) is a Python graphing library that can be used to produce over 30 chart types that can be viewed in Jupyter notebooks. We will use it here to produce pie and bar charts. In future notebooks we will go on to explore some more complex chart types.

We begin by importing the required Python libraries, pandas and plotly.

In [128]:
import pandas
import plotly

## The dataset

In the notebook [An Introduction to Analysing In the Spotlight Data Using Python](intro_to_analysing_its_data_using_python.ipynb) we imported all of the results data from [*In the Spotlight*](https://www.libcrowds.com/collection/playbills) into a pandas dataframe. Towards the end of the notebook we stored that dataframe to disk. Here, we will load it back into memory so that we can use its contents for our visualisations.

In [129]:
df = pandas.read_json('../data/transcriptions.gz', compression='gzip')

## Pie charts

Pie charts are perhaps one of the most straightforward types of visualisation to get started with; all we need is a list of labels and a list of values.

For this chart, we will plot the top ten genres found in our current dataset. We can get a count of all of our genres by using the `value_counts()` method, which was introduced in an [earlier notebook](intro_to_analysing_its_data_using_python.ipynb).

In [137]:
genre_df = df[df['tag'] == 'genre']
genre_counts = genre_df['transcription'].value_counts()

The first few rows are also displayed below to give us a quick snapshot of the data. It shows that we now have an index of genres against a count of the number of times each genre appears in our dataset.

In [138]:
genre_counts.head()

Comedy     450
Farce      441
Drama      210
Tragedy    161
Play       134
Name: transcription, dtype: int64
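
To see the filtering and counting steps on a small scale, here is a minimal sketch using a handful of invented rows. The `sample_` names and the data are assumptions made for illustration; only the `tag` and `transcription` column names come from the dataframe used above.

```python
import pandas

# A tiny, made-up stand-in for the real transcription dataframe.
sample_df = pandas.DataFrame({
    'tag': ['genre', 'genre', 'genre', 'date', 'genre'],
    'transcription': ['Farce', 'Comedy', 'Farce', '1825-03-04', 'Farce'],
})

# Keep only the rows tagged as genres, then count each distinct value.
sample_genre_df = sample_df[sample_df['tag'] == 'genre']
sample_genre_counts = sample_genre_df['transcription'].value_counts()

# The result is a Series indexed by genre, most frequent first.
print(sample_genre_counts)
```

The date row is dropped by the filter, so only genre values are counted.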

We can now define the labels and values to be used for our chart. Below, the first ten items of the index are converted to a list and assigned to the variable *labels*; a similar operation defines our *values*.

In [131]:
labels = genre_counts[:10].index.tolist()
values = genre_counts[:10].tolist()
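
The slice `[:10]` is positional, so it takes the first ten rows of the counts regardless of the genre names in the index. The sketch below, which uses invented counts and hypothetical `sample_` names, suggests that the slice behaves like `head()` and simply returns everything when fewer than ten items exist.

```python
import pandas

# Invented counts, already ordered from most to least frequent,
# in the way value_counts() would return them.
sample_counts = pandas.Series([3, 2, 1], index=['Farce', 'Comedy', 'Drama'])

# Positional slicing: the first two rows.
top_two = sample_counts[:2]

# Asking for ten rows when only three exist keeps all three.
top_ten = sample_counts[:10]
sample_labels = top_ten.index.tolist()
sample_values = top_ten.tolist()

print(sample_labels, sample_values)
```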

Plotly charts are generated from a list of traces. A trace is just the name we give to a collection of data together with the specification of how we want that data plotted. Some charts can plot multiple traces; in this case, we only have one.

Below, a pie chart trace is defined and used as the only item in a list, to become our chart data. 

In [132]:
trace = plotly.graph_objs.Pie(labels=labels, values=values)
chart_data = [trace]

Finally, we can plot the chart with the following line of code.

In [133]:
plotly.offline.iplot(chart_data)

As with the other types of chart we will see later, pie charts have additional options available for styling the chart, hiding the legend, displaying additional information when hovering over slices of the pie and so on. More details of the options available can be found in the [plotly documentation](https://plot.ly/python/pie-charts/).
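
As a rough illustration of a few such options, the sketch below describes a pie chart as a plain Python dictionary, a form that plotly also accepts in place of its graph objects. The sample data, the title and the `pie_` variable names are assumptions made for this example; `hoverinfo`, `textinfo`, `title` and `showlegend` are existing plotly attributes.

```python
# Invented labels and values standing in for the top genres.
pie_labels = ['Comedy', 'Farce', 'Drama']
pie_values = [450, 441, 210]

pie_trace = {
    'type': 'pie',
    'labels': pie_labels,
    'values': pie_values,
    'hoverinfo': 'label+percent',  # what to show when hovering over a slice
    'textinfo': 'value',           # what to print on each slice
}

pie_layout = {
    'title': 'Top genres (sample data)',
    'showlegend': False,           # hide the legend
}

pie_figure = {'data': [pie_trace], 'layout': pie_layout}

# In a notebook, the figure could then be drawn with:
# plotly.offline.iplot(pie_figure)
```

The same options can be passed as keyword arguments to `plotly.graph_objs.Pie` and `plotly.graph_objs.Layout` if you prefer to stay with the style used earlier in this notebook.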

## Bar charts

Bar charts can be produced with very similar code to the pie chart generated above. Again, we just need a list of labels and a list of values. For this chart, we will look at the date transcriptions to visualise the most popular months of the year identified in our current dataset. 

Before continuing, we need to know that we store our date transcriptions according to the [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) standard, in the format **yyyy-mm-dd**.

The code below uses the pandas [str.split](http://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.Series.str.split.html) method to separate our dates on the hyphen delimiter and expand the results into separate columns.

In [140]:
date_df = df[df['tag'] == 'date']
split_date_df = date_df['transcription'].str.split(pat='-', expand=True)
split_date_df.head()

|       | 0    | 1 | 2  |
|-------|------|---|----|
| 10000 | 1825 | 3 | 4  |
| 10001 | 1825 | 8 | 31 |
| 10002 | 1825 | 6 | 30 |
| 10003 | 1825 | 4 | 13 |
| 10004 | 1825 | 7 | 23 |


As shown above, we now have all of our years in column **0**, our months in column **1** and our days in column **2**. Similar to the operations we performed for our pie chart, we now take the counts of our months and assign the index to the variable **x** and the counts to variable **y**.

In [143]:
date_counts = split_date_df[1].value_counts()
x = date_counts.index.tolist()
y = date_counts.tolist()
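
The splitting and counting steps can be tried out on a few invented dates. This is a minimal sketch under the assumption that every value is a well-formed **yyyy-mm-dd** string; the `sample_` names are hypothetical.

```python
import pandas

# Three made-up ISO 8601 dates.
sample_dates = pandas.Series(['1825-03-04', '1825-08-31', '1826-03-15'])

# Split on the hyphen; expand=True gives one column per part (0, 1, 2).
sample_split = sample_dates.str.split(pat='-', expand=True)

# Count how often each month (column 1) appears.
sample_month_counts = sample_split[1].value_counts()
sample_x = sample_month_counts.index.tolist()
sample_y = sample_month_counts.tolist()

print(sample_x, sample_y)
```

Note that the split parts are strings rather than numbers, which matters if we later want to sort them.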

The bar chart can now be produced as follows.

In [144]:
trace = plotly.graph_objs.Bar(x=x, y=y)
date_data = [trace]
plotly.offline.iplot(date_data)

To view the charts for the year or day, we could simply replace **1** in the code above with **0** or **2**.
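
Because `value_counts()` orders its results by frequency, the bars appear from the most to the least common month. If calendar order is preferred, one possible approach (not part of the original notebook) is to convert the months to numbers and sort by the index. The sample data and variable names below are assumptions, and `pandas.to_numeric` with `errors='coerce'` is used so that any non-numeric value is dropped rather than raising an error.

```python
import pandas

# Invented month values, stored as strings as str.split would return them.
sample_months = pandas.Series(['8', '3', '8', '12', '8', '3'])

# Default ordering: most frequent month first.
by_count = sample_months.value_counts()

# Convert to integers first; as strings, '12' would sort before '3'.
numeric_months = pandas.to_numeric(sample_months, errors='coerce').dropna().astype(int)

# Count, then sort by month number instead of by frequency.
by_month = numeric_months.value_counts().sort_index()
ordered_x = by_month.index.tolist()
ordered_y = by_month.tolist()

print(ordered_x, ordered_y)
```

The `ordered_x` and `ordered_y` lists could then be passed to `plotly.graph_objs.Bar` in the same way as **x** and **y** above.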

## Summary

In this notebook we began visualising our transcription data using Python. We produced a pie chart to plot our top ten genres and a bar chart to plot the most popular months of the year.

Before moving on to produce more complex and interesting chart types we need to do some more work to manipulate our data. The next goal is to build up a dataset of performances, rather than just single transcriptions, so that we can plot the different aspects of the performances against one another.