## Sales Data

In this exercise, we're going to explore our sales data a bit more graphically.

Load the superstore data from `data/superstore_sales.csv` by running the cell below.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Set the style to use the ggplot style as default
plt.style.use('ggplot')


sales = pd.read_csv('data/superstore_sales.csv')
sales.head()

Note that not all of the columns are shown above. To see the names of all of the columns, you can run `sales.columns`.

## Bar Charts

You can view the unique values in a column by running `df['col_name'].unique()`.

Use this on the `"Ship Mode"` column of the `sales` data frame to view the unique shipping modes.

In [None]:
# Write your code to view unique shipping modes here


Draw a bar graph of the three shipping modes and the **average shipping cost per order** for each one. Shipping cost is in the column `"Shipping Cost"`.
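
One possible way to approach this kind of chart is to group by the category, take the mean, and pass the result to a bar plot. The sketch below uses a small made-up data frame (`toy`), not the superstore file, and it assumes each row counts as one order. If an order spans several rows in the real data, you may need to sum the cost per order first.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data, not taken from superstore_sales.csv
toy = pd.DataFrame({
    "Ship Mode": ["Air", "Truck", "Air", "Truck", "Air"],
    "Shipping Cost": [4.0, 30.0, 6.0, 50.0, 8.0],
})

# Mean shipping cost for each shipping mode (one row treated as one order)
mean_cost = toy.groupby("Ship Mode")["Shipping Cost"].mean()

# One bar per shipping mode
fig, ax = plt.subplots()
ax.bar(list(mean_cost.index), mean_cost.values)
ax.set_xlabel("Ship Mode")
ax.set_ylabel("Mean shipping cost")
ax.set_title("Mean shipping cost by shipping mode (toy data)")
# In a notebook the figure renders when the cell finishes;
# in a script you would call plt.show() here.
```

The same pattern should carry over to `sales` by swapping `toy` for the real data frame.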

## Line Chart

Write a function that takes a string in the format 'DD/MM/YYYY' as its parameter and returns only the year as an integer.

Map this function onto the column `"Order Date"` and assign the result to a new column `"Order Year"`.
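
As a rough illustration of these two steps, here is a minimal sketch on made-up dates. The function name `year_from_date` and the `toy` data frame are placeholders chosen for this example, and the function assumes every value really is a 'DD/MM/YYYY' string.

```python
import pandas as pd


def year_from_date(date_string):
    """Return the year of a 'DD/MM/YYYY' string as an integer."""
    # The year is the last of the three '/'-separated parts
    return int(date_string.split("/")[-1])


# Hypothetical toy data, not taken from superstore_sales.csv
toy = pd.DataFrame({"Order Date": ["13/10/2010", "01/02/2012"]})

# Series.map applies the function to every value in the column
toy["Order Year"] = toy["Order Date"].map(year_from_date)
print(toy)
```

Splitting on `/` keeps the example close to the exercise wording. Parsing the column with `pd.to_datetime` and reading `.dt.year` would be another reasonable route.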

Create a line chart showing total annual profits for 2009-2012.
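
A line chart of yearly totals can be sketched by summing within each year and plotting the result against the year. The example below uses invented numbers, and the column name `"Profit"` is an assumption made for this sketch; check `sales.columns` for the actual name.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data; the "Profit" column name is an assumption
toy = pd.DataFrame({
    "Order Year": [2009, 2009, 2010, 2011, 2011],
    "Profit": [10.0, -2.5, 7.0, 3.0, 4.0],
})

# Total profit per year
annual = toy.groupby("Order Year")["Profit"].sum()

fig, ax = plt.subplots()
ax.plot(annual.index, annual.values, marker="o")
# Show one tick per year rather than fractional years
ax.set_xticks(list(annual.index))
ax.set_xlabel("Order Year")
ax.set_ylabel("Total profit")
ax.set_title("Total profit per year (toy data)")
```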

## Boxplots

The manager for the province `Newfoundland` would like to know the spread of the revenues from orders in each of their product categories. The product categories are found in the `"Product Category"` column, and the revenues are found in the `"Sales"` column.

Plot boxplots of revenue for each product category, using only orders from the province Newfoundland.
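
The general shape of this task is to filter the rows first and then draw one box per category. The sketch below shows that on made-up data. The `"Province"` column name and the values in `toy` are assumptions for illustration only.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data; the "Province" column name is an assumption
toy = pd.DataFrame({
    "Province": ["A", "A", "A", "A", "B", "B"],
    "Product Category": ["Furniture", "Furniture", "Technology",
                         "Technology", "Furniture", "Technology"],
    "Sales": [100.0, 300.0, 50.0, 150.0, 999.0, 999.0],
})

# Keep only the rows for the province of interest
subset = toy[toy["Province"] == "A"]

# Collect the sales values for each product category
groups = {
    name: grp["Sales"].to_numpy()
    for name, grp in subset.groupby("Product Category")
}

# One box per category; boxes are drawn at x positions 1, 2, ...
fig, ax = plt.subplots()
ax.boxplot(list(groups.values()))
ax.set_xticks(list(range(1, len(groups) + 1)))
ax.set_xticklabels(list(groups.keys()))
ax.set_ylabel("Sales")
ax.set_title("Sales by product category (toy data)")
```

pandas also offers `DataFrame.boxplot(column=..., by=...)`, which may be a shorter route to a similar picture.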

## Graphing Challenge

You are not expected to be able to reproduce this graph, particularly at this stage of the course. Treat it as a challenge to see how close you can get, and consider coming back to it later.

In [None]:
observed = pd.read_csv('data/river_flow_historical_data.csv', names=['Datetime', 'Flow'])
prediction = pd.read_csv('data/river_flow_prediction_data.csv', names=['Datetime', 'Flow'])

observed['Datetime'] = pd.to_datetime(observed['Datetime'])
prediction['Datetime'] = pd.to_datetime(prediction['Datetime'])

# Note that these are Series, not Data Frames
observed = pd.Series(observed.set_index('Datetime')['Flow'])
prediction = pd.Series(prediction.set_index('Datetime')['Flow'])

print(observed.head())
print(prediction.head())

The graph below was created using matplotlib. Try to reproduce it. The observed and forecast data are loaded for you by the cell above.

![graph challenge graph](data/graph.png)

Some hints:
* `plt.semilogy(x, y)` will give you a semilog line chart with a base-10 logarithmic y axis
* `plt.fill_between(x, y_1, y_2)` will fill the area between the curves `y_1` and `y_2` along `x`
* `plt.axhline(y_scalar)` will give you a horizontal line at `y = y_scalar`
* You can order lines/scatter plots by providing a parameter `zorder` - the higher the z-order, the closer to the front of the graph (i.e. a z-order of 2 will be superimposed on top of 1)
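
To see how the hinted calls fit together before trying the real data, here is a small sketch on invented values. It uses the equivalent `Axes` methods (`ax.semilogy` and so on) rather than the `plt.` functions. The arrays `x` and `y` are placeholders, not the river flow series, and the styling is not meant to match the target graph.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical toy values, not the river flow data
x = np.arange(5)
y = np.array([0.5, 2.0, 8.0, 40.0, 15.0])

fig, ax = plt.subplots()

# Line chart with a logarithmic y axis
ax.semilogy(x, y, lw=2, label="toy flow", zorder=1)

# Shade the area between a constant floor of 0.1 and the curve
ax.fill_between(x, 0.1, y, alpha=0.5)

# Horizontal line at the largest y value
ax.axhline(y.max(), lw=2, linestyle="--")

# A single marker drawn on top of everything else via a higher zorder
ax.scatter([x[-1]], [y[-1]], s=140, edgecolor="white", lw=2, zorder=3)

ax.legend()
```

The fill starts at 0.1 rather than 0 because a log axis cannot display zero.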

In [None]:
# Don't change this code
import datetime

plt.style.use('fivethirtyeight')

now = max(observed.index)
yesterday = now - datetime.timedelta(days=1)
plus_12_hours = now + datetime.timedelta(hours=12)

x_ticks = pd.date_range('2016-12-24 00:00', '2016-12-26 16:00', freq='12H')
x_tick_labels = [x.strftime('%d %b %H:%M') for x in x_ticks]
y_ticks = [0.1, 1, 10, 100]

max_value = max(pd.concat([observed, prediction]))

observed_line_color = '#0050d1'
prediction_line_color = '#d10000'
now_scatter_color = '#f90000'
yesterday_scatter_color = '#00a1b7'
plus_12_hours_scatter_color = '#6a00b7'
observed_fill_color = '#96beff'
prediction_fill_color = '#fc7b7b'
max_value_text_color = '#00a1b7'
max_value_line_color = '#fc7b7b'

plt.figure(figsize=(15, 7))


# Your code here

# Start with your line graphs for observed and forecast points
# These are semilogy plots, the line width is 2, the colors are given above
# Remember to label your lines so they appear on the graph




# Now add the individual scatter plots
# These should be at the front (parameter name `zorder`),
# they have a white edgecolor (parameter name `edgecolor`)
# Their line width is 2 (parameter name `lw`)
# They have a size (parameter name `s`) of 140




# The white lines have no x length, they start and end at the same time
# The lines on the y axis start at 0.1 and extend up to the flow value at that time





# Fill the graphs below between 0.1 and the observed/predicted value
# The opacity (`alpha`) of the fill should be 0.5





# Create a horizontal line at the maximum value of the observed or prediction
# using plt.axhline(), the linewidth should be 2






# You can add text to your graph to indicate the maximum value
# using plt.text(x, y, text)
# the font size should be 14 (parameter name `fontsize`)





# x and y ticks take an array of values as the first parameter
# and an array of labels as their second parameter
# the tick positions and labels are already defined above





# The x limit should be set between the minimum observed date
# and the maximum prediction date






# Add a y label and a title for your graph






# Add your legend. You can put the legend outside of the graph
# by providing parameters `loc='center left', bbox_to_anchor=(1, 0.5)`





plt.show()