# Visualizations with Cosmic notebooks
In this notebook, we'll run some queries against our data and visualize the results using bokeh.

### Setup

- Create a database **RetailDemo** and container **WebsiteData** to hold our data, if they do not already exist.
- Import sample data to visualize


In [1]:
import azure.cosmos
from azure.cosmos.partition_key import PartitionKey

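# cosmos_client is not defined in this notebook; it is assumed to be pre-initialized by the notebook environment
database = cosmos_client.create_database_if_not_exists('RetailDemo')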
print('Database RetailDemo created')

container = database.create_container_if_not_exists(id='WebsiteData', partition_key=PartitionKey(path='/CartID'))
print('Container WebsiteData created')

Database RetailDemo created
Container WebsiteData created


Here's a sample document we will import:

```
{"CartID":5399,
"Action":"Viewed",
"Item":"Cosmos T-shirt",
"Price":350,
"UserName":"Chadrick.Larkin87",
"Country":"Iceland",
"EventDate":"2015-06-25T00:00:00",
"Year":2015,"Latitude":-66.8673,
"Longitude":-29.8214,
"Address":"852 Modesto Loop, Port Ola, Iceland",
"id":"00ffd39c-7e98-4451-9b91-b2bcf2f9a32d"}
```

In [22]:
# Read data from storage
import json
import urllib.request

from azure.cosmos import exceptions

with urllib.request.urlopen("https://cosmosnotebooksdata.blob.core.windows.net/notebookdata/websiteData.json") as url:
    data = json.loads(url.read().decode())

print("Importing data. This will take a few minutes...\n")

for event in data:
    try:
        container.upsert_item(body=event)
    except exceptions.CosmosHttpResponseError as e:
        # Report which document failed, then re-raise to stop the import
        print('Failed to upsert item {0}: {1}'.format(event.get('id'), e))
        raise

# Run a query against the container to see the number of documents
query = 'SELECT VALUE COUNT(1) FROM c'
result = list(container.query_items(query, enable_cross_partition_query=True))

print('Container with id \'{0}\' contains \'{1}\' items'.format(container.id, result[0]))

Importing data. This will take a few minutes...

Container with id 'WebsiteData' contains '2654' items


### Getting our data into a DataFrame

We'll use the syntax:

```
%%sql --database {database_id} --container {container_id} --output outputDataframeVar
{Query text}
```

We'll run the query `SELECT c.Action, c.Price as ItemRevenue, c.Country, c.Item FROM c`. The results will be saved into a pandas DataFrame named `df_cosmos`.
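
Outside the `%%sql` magic, similar rows can be fetched through the SDK. The sketch below is only an illustration: `fetch_rows` is a hypothetical helper name, and it relies solely on the `query_items` call already used earlier in this notebook. It returns plain dictionaries, which `pandas.DataFrame` accepts directly.

```python
from typing import Any, Dict, List

# Same query text as the %%sql cell
QUERY = "SELECT c.Action, c.Price AS ItemRevenue, c.Country, c.Item FROM c"


def fetch_rows(container: Any, query: str = QUERY) -> List[Dict[str, Any]]:
    """Run a cross-partition query and return the results as a list of dicts.

    `container` is assumed to be the WebsiteData container object created
    in the setup cell (hypothetical helper, not part of the SDK).
    """
    # query_items returns an iterable of documents; list() drains it
    return list(container.query_items(query=query, enable_cross_partition_query=True))
```

For example, `pd.DataFrame(fetch_rows(container))` should produce a DataFrame with the same four columns, although the magic command may add an index column or order columns differently.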



In [23]:
%%sql --database RetailDemo --container WebsiteData --output df_cosmos
SELECT c.Action, c.Price as ItemRevenue, c.Country, c.Item FROM c

In [24]:
# See a sample of the result
df_cosmos.head(10)

| | Action | ItemRevenue | Country | Item |
|---|---|---|---|---|
| 0 | Viewed | 9.0 | Tunisia | Black Tee |
| 1 | Viewed | 19.99 | Antigua and Barbuda | Flannel Shirt |
| 2 | Added | 3.75 | Guinea-Bissau | Socks |
| 3 | Viewed | 3.75 | Guinea-Bissau | Socks |
| 4 | Viewed | 55.0 | Czech Republic | Rainjacket |
| 5 | Viewed | 350.0 | Iceland | Cosmos T-shirt |
| 6 | Added | 19.99 | Syrian Arab Republic | Button-Up Shirt |
| 7 | Viewed | 19.99 | Syrian Arab Republic | Button-Up Shirt |
| 8 | Viewed | 33.0 | Tuvalu | Red Top |
| 9 | Viewed | 14.0 | Cape Verde | Flip Flop Shoes |


### Analyzing our data
We'll run a simple group by on the dataframe to sum the total sales revenue for each country and display a sample of the results.

#### Sum revenue by country

In [None]:
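# Sum only the numeric ItemRevenue column; the text columns (Action, Item) should not be summed
df_revenue = df_cosmos.groupby("Country")["ItemRevenue"].sum().reset_index()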

display(df_revenue.head(5))

#### Analyze the top 5 most-purchased items

In [None]:
import pandas as pd

## What are the top 5 purchased items?
pd.DataFrame(df_cosmos[df_cosmos['Action']=='Purchased'].groupby('Item').size().sort_values(ascending=False).head(5), columns=['Count'])

## Visualization #1: Sales revenue by country on a world map

Now that we have our data on revenue from our Cosmos container, we'll visualize it using bokeh. Credit to https://towardsdatascience.com/a-complete-guide-to-an-interactive-geographical-map-using-python-f4c5197e23e0 for inspiration.

In [None]:
import sys
!{sys.executable} -m pip install bokeh --user

### Prepare our data to be plotted

In [25]:
import urllib.request, json 
import geopandas as gpd

# Load country information for mapping
countries = gpd.read_file("https://cosmosnotebooksdata.blob.core.windows.net/notebookdata/countries.json")

# Merge the countries dataframe with our revenue data from Azure Cosmos DB, joining on country name
df_merged = countries.merge(df_revenue, left_on = 'admin', right_on = 'Country', how='left')

# Convert to GeoJSON so bokeh can plot it
merged_json = json.loads(df_merged.to_json())
json_data = json.dumps(merged_json)
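
Because the merge joins on country names with `how='left'`, any revenue row whose `Country` spelling differs from the map's `admin` value is dropped without warning. A minimal sketch of a check for that, assuming exact string matching; `unmatched_countries` is a hypothetical helper and the example names are made up for illustration:

```python
from typing import Iterable, List


def unmatched_countries(revenue_names: Iterable[str], map_names: Iterable[str]) -> List[str]:
    """Return revenue country names that have no exact match among the map names."""
    known = set(map_names)
    return sorted({name for name in revenue_names if name not in known})


# Made-up spellings for illustration only; not taken from the real data sets
example_map_names = ["Iceland", "Tunisia", "Czechia"]
example_revenue_names = ["Iceland", "Tunisia", "Czech Republic"]
print(unmatched_countries(example_revenue_names, example_map_names))
```

In this notebook the call would look like `unmatched_countries(df_revenue["Country"], countries["admin"])`; an empty list means every country with revenue found a shape on the map.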

### Plot the sales revenue on a world map
This may take a few seconds...

In [None]:
from bokeh.io import output_notebook, show
from bokeh.plotting import figure
from bokeh.models import GeoJSONDataSource, LinearColorMapper, ColorBar
from bokeh.palettes import brewer

#Input GeoJSON source that contains features for plotting.
geosource = GeoJSONDataSource(geojson = json_data)

#Choose our choropleth color palette: https://bokeh.pydata.org/en/latest/docs/reference/palettes.html
palette = brewer['YlGn'][8]

#Reverse color order so that dark green is highest revenue
palette = palette[::-1]

#Instantiate LinearColorMapper that linearly maps numbers in a range, into a sequence of colors.
color_mapper = LinearColorMapper(palette = palette, low = 0, high = 1000)

#Define custom tick labels for color bar.
tick_labels = {'0': '$0', '250': '$250', '500':'$500', '750':'$750', '1000':'$1000', '1250':'$1250', '1500':'$1500','1750':'$1750', '2000': '>$2000'}

#Create color bar. 
color_bar = ColorBar(color_mapper=color_mapper, label_standoff=8,width = 500, height = 20,
border_line_color=None,location = (0,0), orientation = 'horizontal', major_label_overrides = tick_labels)

#Create figure object.
p = figure(title = 'Sales revenue by country', plot_height = 600 , plot_width = 1150, toolbar_location = None)
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None

#Add patch renderer to figure. 
p.patches('xs','ys', source = geosource,fill_color = {'field' :'ItemRevenue', 'transform' : color_mapper},
          line_color = 'black', line_width = 0.25, fill_alpha = 1)

#Specify figure layout.
p.add_layout(color_bar, 'below')

#Display figure inline in Jupyter Notebook.
output_notebook()

#Display figure.
show(p)
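
To make the effect of `low = 0` and `high = 1000` concrete, here is a plain-Python approximation of a linear color mapping. It is a sketch of the general idea (equal-width bins, clamping outside the range), not Bokeh's actual implementation; `linear_color`, the `nan_color` default, and the placeholder palette names are assumptions.

```python
from typing import Optional, Sequence


def linear_color(value: Optional[float], palette: Sequence[str],
                 low: float, high: float, nan_color: str = "gray") -> str:
    """Pick a palette entry by splitting [low, high] into equal-width bins."""
    if value is None or value != value:  # None or NaN: no revenue recorded
        return nan_color
    if value <= low:
        return palette[0]
    if value >= high:
        return palette[-1]  # everything at or above `high` shares the last color
    width = (high - low) / len(palette)
    # min() guards against floating-point rounding at the upper edge
    return palette[min(int((value - low) / width), len(palette) - 1)]


# Placeholder color names standing in for an 8-color palette
example_palette = ["c{0}".format(i) for i in range(8)]
print(linear_color(300, example_palette, low=0, high=1000))
print(linear_color(2500, example_palette, low=0, high=1000))
```

With eight colors and a 0 to 1000 range, each bin in this sketch is 125 wide, so a country with 2500 in revenue is indistinguishable from one with 1000.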

## Visualization #2: Conversion rate of Viewed -> Added to cart -> Purchased by item

In our WebsiteData container, each event records a user viewing an item, adding it to their cart, or purchasing it. We can visualize the conversion funnel for each item. Credit to: https://bokeh.pydata.org/en/latest/docs/user_guide/categorical.html for inspiration.
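
The chart in this section shows raw event counts. If an explicit rate is wanted, one possible definition is the number of adds and purchases divided by the number of views for the same item. The sketch below assumes that definition and the three `Action` values seen in the data (`Viewed`, `Added`, `Purchased`); `conversion_rates` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def conversion_rates(events: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Take (item, action) pairs and return per-item rates relative to views."""
    counts = Counter(events)
    items = {item for item, _ in counts}
    rates: Dict[str, Dict[str, float]] = {}
    for item in sorted(items):
        viewed = counts[(item, "Viewed")]
        if viewed == 0:
            continue  # no views recorded, so the rate is undefined
        rates[item] = {
            "added_per_view": counts[(item, "Added")] / viewed,
            "purchased_per_view": counts[(item, "Purchased")] / viewed,
        }
    return rates


# Made-up events for illustration: 4 views, 2 adds, 1 purchase
example_events = [("Socks", "Viewed")] * 4 + [("Socks", "Added")] * 2 + [("Socks", "Purchased")]
print(conversion_rates(example_events))
```

With the notebook's data, the pairs could be supplied as `zip(df_cosmos["Item"], df_cosmos["Action"])`.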

### Plot our data

In [None]:
from bokeh.io import show, output_notebook

from bokeh.plotting import figure
from bokeh.palettes import Spectral3
from bokeh.transform import factor_cmap
from bokeh.models import ColumnDataSource, FactorRange


# Get the top 10 items as an array
top_10_items = df_cosmos[df_cosmos['Action']=='Purchased'].groupby('Item').size().sort_values(ascending=False)[:10].index.values.tolist()

# Filter our data to only these 10 items
df_top10 = df_cosmos[df_cosmos['Item'].isin(top_10_items)]

# Group by Item and Action, sorting by event count
df_top10_sorted = df_top10.groupby(['Item', 'Action']).count().rename(columns={'Country':'ResultCount'}, inplace=False).reset_index().sort_values(['Item', 'ResultCount'], ascending = False).set_index(['Item', 'Action'])

# Get sorted X-axis values - this way, we can display the funnel of view -> add -> purchase
x_axis_values = df_top10_sorted.index.values.tolist()

group = df_top10_sorted.groupby(['Item', 'Action'])

# Specify colors by Action (the second level of each x-axis factor)
index_cmap = factor_cmap('Item_Action', palette=Spectral3, factors=sorted(df_top10.Action.unique()), start=1, end=2)

# Create the plot

p = figure(plot_width=1200, plot_height=500, title="Conversion rate of items from View -> Add to cart -> Purchase", x_range=FactorRange(*x_axis_values), toolbar_location=None, tooltips=[("Number of events", "@ResultCount_max"), ("Item, Action", "@Item_Action")])

# Plot the event count, matching the tooltip and the "Count" axis label
p.vbar(x='Item_Action', top='ResultCount_max', width=1, source=group,
       line_color="white", fill_color=index_cmap)

#Configure how the plot looks
p.y_range.start = 0
p.x_range.range_padding = 0.05
p.xgrid.grid_line_color = None
p.xaxis.major_label_orientation = 1.2
p.outline_line_color = "black"
p.xaxis.axis_label = "Item"
p.yaxis.axis_label = "Count"

#Display figure inline in Jupyter Notebook.
output_notebook()

#Display figure.
show(p)
