# Try It 4.2: Publish to GitHub

In this activity, you will publish this notebook and its plot to [GitHub](https://github.com/) and view the result with nbviewer.

1. Follow the instructions in [this tutorial](https://reproducible-science-curriculum.github.io/sharing-RR-Jupyter/01-sharing-github/) to add your Jupyter Notebook to GitHub.
2. Navigate to `nbviewer.org`.
3. Paste the link to your notebook on GitHub into the search bar on `nbviewer.org`.
4. (Optional) Post the nbviewer link to your published notebook in the Forum for your peers to see!
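
Steps 2 and 3 can also be done by hand, because nbviewer serves GitHub notebooks at a URL that mirrors the GitHub path under a `/github/` prefix. The sketch below shows that mapping; the helper name `github_to_nbviewer` and the user, repository, and file names are hypothetical placeholders, not part of the activity.

```python
from urllib.parse import urlparse


def github_to_nbviewer(github_url: str) -> str:
    """Convert a github.com notebook URL into the matching nbviewer URL."""
    parsed = urlparse(github_url)
    if parsed.netloc not in ("github.com", "www.github.com"):
        raise ValueError(f"Not a github.com URL: {github_url}")
    # nbviewer mirrors the GitHub path (user/repo/blob/branch/file) under /github.
    return "https://nbviewer.org/github" + parsed.path


# Hypothetical user, repository, and file names, for illustration only.
example = "https://github.com/your-username/your-repo/blob/main/try_it_4_2.ipynb"
print(github_to_nbviewer(example))
```

Pasting the GitHub link into the nbviewer search bar should lead to the same page, so this is only a shortcut for building the link you share in the Forum.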

**Data Source Citation**

Kaggle Link: https://www.kaggle.com/kyanyoga/sample-sales-data

_Originally Written by María Carina Roldán, Pentaho Community Member, BI consultant (Assert Solutions), Argentina. This work is licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License. Modified by Gus Segura June 2014._

In [1]:
import pandas as pd
from bokeh.plotting import figure, show, output_notebook
from bokeh.models import ColumnDataSource, FactorRange, NumeralTickFormatter

Set pandas display options and use `output_notebook()` to display plots within Jupyter Notebook.

> Note: The code below will display "BokehJS <version number> successfully loaded" if `output_notebook()` ran successfully. This enables Bokeh's JavaScript to run within your notebook and browser. You don't need to know or write JavaScript for this course, but it is working in the background.

In [2]:
pd.set_option("display.max_columns", 101)
output_notebook()

Read in the data using pandas.

In [3]:
PATH = '~/emeritus/design-work/PCBA-BA/datasets/mod4_datasets/sales_data.csv'

In [4]:
dtypes = {'deal_size': 'category', 'year_id': 'str'}

In [5]:
df = pd.read_csv(PATH, parse_dates=['order_date'], dtype=dtypes)
df['deal_size'] = df['deal_size'].cat.reorder_categories(['Small', 'Medium', 'Large'])

In [6]:
df.head()

| | order_number | quantity_ordered | price_each | order_line_number | sales | order_date | status | qtr_id | month_id | year_id | product_line | msrp | product_code | customer_name | phone | address_line1 | address_line2 | city | state | postal_code | country | territory | contact_last_name | contact_first_name | deal_size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10107 | 30 | 95.7 | 2 | 2871.0 | 2018-02-24 | Shipped | 1 | 2 | 2003 | Motorcycles | 95 | S10_1678 | Land of Toys Inc. | 2125557818 | 897 Long Airport Avenue | | NYC | NY | 10022.0 | USA | | Yu | Kwai | Small |
| 1 | 10121 | 34 | 81.35 | 5 | 2765.9 | 2018-05-07 | Shipped | 2 | 5 | 2003 | Motorcycles | 95 | S10_1678 | Reims Collectables | 26.47.1555 | 59 rue de l'Abbaye | | Reims | | 51100.0 | France | EMEA | Henriot | Paul | Small |
| 2 | 10134 | 41 | 94.74 | 2 | 3884.34 | 2018-07-01 | Shipped | 3 | 7 | 2003 | Motorcycles | 95 | S10_1678 | Lyon Souveniers | +33 1 46 62 7555 | 27 rue du Colonel Pierre Avia | | Paris | | 75508.0 | France | EMEA | Da Cunha | Daniel | Medium |
| 3 | 10145 | 45 | 83.26 | 6 | 3746.7 | 2018-08-25 | Shipped | 3 | 8 | 2003 | Motorcycles | 95 | S10_1678 | Toys4GrownUps.com | 6265557265 | 78934 Hillside Dr. | | Pasadena | CA | 90003.0 | USA | | Young | Julie | Medium |
| 4 | 10159 | 49 | 100.0 | 14 | 5205.27 | 2018-10-10 | Shipped | 4 | 10 | 2003 | Motorcycles | 95 | S10_1678 | Corporate Gift Ideas Co. | 6505551386 | 7734 Strong St. | | San Francisco | CA | | USA | | Brown | Julie | Medium |


## Data Cleaning

- Filter to rows where `status == "Shipped"`
- Group the rows by `order_date` and sum the `sales` column

In [7]:
df = df[df['status'] == 'Shipped'].copy()

In [8]:
df_grp = df.groupby(['order_date'])[['sales']].agg('sum').reset_index()

In [9]:
df_grp.head()

| | order_date | sales |
|---|---|---|
| 0 | 2018-01-06 | 12133.25 |
| 1 | 2018-01-09 | 11432.34 |
| 2 | 2018-01-10 | 6864.05 |
| 3 | 2018-01-29 | 54702.0 |
| 4 | 2018-01-31 | 44621.96 |
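
To see what the filter-and-aggregate step does on a small scale, here is a minimal sketch that applies the same two operations to a few made-up rows. The `toy` DataFrame and its values are invented for illustration and are not taken from the sales dataset.

```python
import pandas as pd

# Made-up rows for illustration; these are not from the sales dataset.
toy = pd.DataFrame(
    {
        "order_date": pd.to_datetime(
            ["2018-01-06", "2018-01-06", "2018-01-09", "2018-01-09"]
        ),
        "status": ["Shipped", "Shipped", "Cancelled", "Shipped"],
        "sales": [100.0, 50.0, 999.0, 25.0],
    }
)

# Keep only shipped orders, then total the sales for each order date.
shipped = toy[toy["status"] == "Shipped"].copy()
toy_grp = shipped.groupby(["order_date"])[["sales"]].agg("sum").reset_index()

print(toy_grp)
```

The cancelled order is dropped before aggregation, so each remaining date contributes only its shipped sales: 150.0 for the first date and 25.0 for the second.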


## Publish this notebook containing the line chart below to GitHub

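Create a Bokeh `figure` using the following arguments:

- `title="Overall Revenue over Time"`
- `x_axis_label="Date"`
- `y_axis_label="Revenue"`
- `x_axis_type="datetime"`
- `tooltips=[("Date", "@order_date{%F}"), ("Revenue", "$y")]`
- `plot_width=900`
- `plot_height=500`

> Note: In a tooltip, `@` refers to a column of the data source, while `$` is reserved for Bokeh's special fields such as `$x` and `$y`. Bokeh 3.0 and later replace `plot_width` and `plot_height` with `width` and `height`.

In [10]:
p = figure(
    title="Overall Revenue over Time",
    x_axis_label="Date",
    y_axis_label="Revenue",
    x_axis_type="datetime",  # format the x-axis ticks as dates
    tooltips=[("Date", "@order_date{%F}"), ("Revenue", "$y")],
    plot_width=900,
    plot_height=500,
    tools="pan,wheel_zoom,reset,hover,save",
)

In [11]:
# Plot from a ColumnDataSource so the tooltip can look up `order_date` by name.
source = ColumnDataSource(df_grp)
p.line(
    x='order_date',
    y='sales',
    source=source,
    legend_label="Sales",
    line_width=2,
)
p.hover.formatters = {"@order_date": "datetime"}  # render the date with %F
show(p)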