[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/databyjp/AcademyXI_DA/blob/main/notebooks/AcademyXi_DA_Module_9_dataviz.ipynb)

## AcademyXi Data Analysis - Explanatory Data Visualisation
### Workshop - Data storytelling with Python
In this workshop module, we will use Python and Plotly to build explanatory data visualisations.  

We've already seen how the Plotly library can be used to quickly build data visualisations in Python. Those charts were built mainly for exploratory purposes, to help us better understand the data. Here we will focus on techniques for building explanatory data visualisations.

### Preparation

Let's prepare our notebook by installing required packages and loading the data.

In [None]:
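%%capture tmp
# Install additional libraries (fsspec and s3fs) required to load files from AWS S3.
# Note: %%capture (which hides pip's output) is a cell magic, so it must be the very first line of the cell.
!pip install fsspec s3fs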

# Import libraries to be used
import plotly.express as px
import numpy as np
import pandas as pd

In [None]:
fname = "wk9_housing_data_pivoted.csv"
# Load data from S3
df = pd.read_csv(f"s3://databyjp/academyxi/{fname}")

In [None]:
df.head()

### Initial figure
Let's quickly build a bubble chart with our data.

In [None]:
fig = px.scatter(df, x="Average Distance", y="Average Price", size="Count of Properties", color="Region")
fig.show()

This gives us a very basic chart, similar to what other tools produce by default. (Note that we built and displayed it with just two lines of code!)

But as you know, a chart like this doesn't do much to guide the reader, so let's start making some changes.

We will lean on Plotly's API to modify these charts further. It's perfectly normal not to know or remember the syntax, even if you have made the same changes before.

So as always, make a habit of consulting the excellent [documentation for Plotly](https://plotly.com/python/) or performing a web search for anything that you can't remember or isn't working as expected.


---

To begin with, let's add a title, and change the layout to something a little more elegant.

In [None]:
fig = px.scatter(df, x="Average Distance", y="Average Price", size="Count of Properties", color="Region",
                 title="Melbourne Housing Prices by Suburb", template="plotly_white")
# Note: Plotly's 'template' feature allows quick changes to the styling - see https://plotly.com/python/templates/
fig.show()

The size of this figure depends on the size of the browser window, so let's set the chart dimensions explicitly instead.

In [None]:
fig = px.scatter(df, x="Average Distance", y="Average Price", size="Count of Properties", color="Region",
                 title="Melbourne Housing Prices by Suburb", template="plotly_white",
                 width=900, height=600)
fig.show()

And let's change our colour palette so that the Northern and Southern Metropolitan areas stand out more. 

Read more on discrete colours: https://plotly.com/python/discrete-color/

In [None]:
# Build a Python dictionary where all categories are assigned a light grey colour except for our key values
color_discrete_map = {k: "lightgrey" for k in df["Region"].unique()}
color_discrete_map["Northern Metropolitan"] = "DodgerBlue"  # Colours chosen from https://en.wikipedia.org/wiki/Web_colors
color_discrete_map["Southern Metropolitan"] = "DarkOrange"

fig = px.scatter(df, x="Average Distance", y="Average Price", size="Count of Properties", color="Region",
                 title="Melbourne Housing Prices by Suburb", template="plotly_white",
                 color_discrete_map=color_discrete_map,
                 width=900, height=600)
fig.show()
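One pitfall with the dictionary approach above is that a typo in a region name silently adds an unused key, and the region you meant to highlight stays grey. The sketch below wraps the same idea in a hypothetical helper, `highlight_colour_map` (not part of Plotly or pandas), that raises an error instead. The region values are placeholders for illustration.

```python
import pandas as pd


def highlight_colour_map(categories, highlights, base="lightgrey"):
    """Hypothetical helper: map every category to `base`, except the ones in `highlights`.

    `highlights` is a dict of {category: colour}. Unknown categories raise a KeyError
    instead of being silently ignored.
    """
    colour_map = {cat: base for cat in categories}
    for cat, colour in highlights.items():
        if cat not in colour_map:
            raise KeyError(f"{cat!r} is not one of the categories")
        colour_map[cat] = colour
    return colour_map


# Placeholder region column for illustration
regions = pd.Series(["Northern Metropolitan", "Southern Metropolitan",
                     "Western Metropolitan", "Northern Metropolitan"])

cmap = highlight_colour_map(regions.unique(), {
    "Northern Metropolitan": "DodgerBlue",
    "Southern Metropolitan": "DarkOrange",
})
print(cmap)
```

The returned dictionary can be passed straight to `color_discrete_map=` in `px.scatter`.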

We'll move the legend to the empty top-right corner, and also add the suburb name to the hover popup (the tooltip that appears when you mouse over a point). Compare the figure below with the one above!

More on hover text: https://plotly.com/python/hover-text-and-formatting/

In [None]:
fig = px.scatter(df, x="Average Distance", y="Average Price", size="Count of Properties", color="Region",
                 title="Melbourne Housing Prices by Suburb", template="plotly_white",
                 color_discrete_map=color_discrete_map,
                 hover_data=["Suburb"],
                 width=900, height=600)
fig.update_layout(legend=dict(
    yanchor="top", y=0.98,
    xanchor="right", x=0.98,
    bordercolor="lightgray", borderwidth=1,
    font=dict(
        size=11,
        color="black"
    ),    
))
fig.show()

Next, we'll add a few annotations. By default, Plotly positions annotations in data coordinates, i.e. using the same x and y values as the plotted points.

More on annotations: https://plotly.com/python/text-and-annotations/

In [None]:
fig = px.scatter(df, x="Average Distance", y="Average Price", size="Count of Properties", color="Region",
                 title="Melbourne Housing Prices by Suburb", template="plotly_white",
                 color_discrete_map=color_discrete_map,
                 hover_data=["Suburb"],
                 width=900, height=600)
fig.update_layout(legend=dict(
    yanchor="top", y=0.98,
    xanchor="right", x=0.98,
    bordercolor="lightgray", borderwidth=1,
    font=dict(
        size=11,
        color="black"
    ),    
))

# This is a little tricky if you're new: the loop goes through a list of suburb names,
# looks up each suburb's x/y values in the DataFrame, and adds an annotation at that point.
for suburb_name in ["Brighton", "Melbourne", "Beaumaris"]:
    fdf = df[df["Suburb"] == suburb_name]
    fig.add_annotation(x=fdf["Average Distance"].values[0], y=fdf["Average Price"].values[0],
                       text=suburb_name,
                       showarrow=True,
                       arrowhead=1)

fig.show()

Next, we'll update the x- and y-axes to make the units clear.

More on axes: https://plotly.com/python/axes/

In [None]:
fig = px.scatter(df, x="Average Distance", y="Average Price", size="Count of Properties", color="Region",
                 title="Melbourne Housing Prices by Suburb", template="plotly_white",
                 color_discrete_map=color_discrete_map,
                 hover_data=["Suburb"],
                 labels={"Average Price": "Average Price (AUD)"},  # Note this change to y-axis label
                 width=900, height=600)
fig.update_layout(legend=dict(
    yanchor="top", y=0.98,
    xanchor="right", x=0.98,
    bordercolor="lightgray", borderwidth=1,
    font=dict(
        size=11,
        color="black"
    ),    
))

for suburb_name in ["Brighton", "Melbourne", "Beaumaris"]:
    fdf = df[df["Suburb"] == suburb_name]
    fig.add_annotation(x=fdf["Average Distance"].values[0], y=fdf["Average Price"].values[0],
                       text=suburb_name,
                       showarrow=True,
                       arrowhead=1)

fig.update_yaxes(tickprefix="AU$")  # Prefix the y-axis tick labels with the currency
fig.update_xaxes(ticksuffix=" km")  # Suffix the x-axis tick labels with the unit

fig.show()

And finally, we'll add a text note. 

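Older versions of Plotly have no built-in option for a note like this (newer releases add a `subtitle` property to the title), so we'll use a workaround: increase the space (margin) between the chart title and the plotting area, and then place the note there as an annotation.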

In [None]:
fig = px.scatter(df, x="Average Distance", y="Average Price", size="Count of Properties", color="Region",
                 title="Melbourne Housing Prices by Suburb", template="plotly_white",
                 color_discrete_map=color_discrete_map,
                 hover_data=["Suburb"],
                 labels={"Average Price": "Average Price (AUD)"},  
                 width=900, height=600)
fig.update_layout(legend=dict(
    yanchor="top", y=0.98,
    xanchor="right", x=0.98,
    bordercolor="lightgray", borderwidth=1,
    font=dict(
        size=11,
        color="black"
    ),    
))

for suburb_name in ["Brighton", "Melbourne", "Beaumaris"]:
    fdf = df[df["Suburb"] == suburb_name]
    fig.add_annotation(x=fdf["Average Distance"].values[0], y=fdf["Average Price"].values[0],
                       text=suburb_name,
                       showarrow=True,
                       arrowhead=1)

fig.update_yaxes(tickprefix="AU$")  
fig.update_xaxes(ticksuffix=" km")

# Increase the top margin to make room for the note, and move the title up slightly
fig.update_layout(
    margin=dict(t=120),
    title=dict(y=0.95, yanchor="top")
)

# Add the subtitle as an annotation positioned in "paper" coordinates:
# 0 to 1 spans the plotting area, so y=1.15 places the note above it
# and x=-0.06 starts it slightly to the left of the plotting area
fig.add_annotation(text="Despite being at similar distances to the CBD, there is a clear divide in average housing prices between the suburbs of the <br>Northern Metropolitan and Southern Metropolitan areas.",
                   xref="paper", yref="paper", align="left",
                   x=-0.06, y=1.15, showarrow=False)

fig.show()

We're done! We've customised many elements of our chart to produce a good-looking explanatory data visualisation. 

The chart can be saved as a screenshot, embedded in a website, or exported as a static image using one of the options described in the Plotly documentation (https://plotly.com/python/static-image-export/).

---


You can probably see now that many data visualisation tools are capable of producing good results.

While Plotly is often used for exploratory visualisation, with a little customisation it is also very capable of producing polished explanatory visualisations, as you've seen here.

As such, the choice of tool in your workflow can largely be a matter of preference.

For instance, some people prefer to carry out their analysis in Python and export the resulting data for visualisation in another tool such as Tableau or Datawrapper, while others stick with Python libraries for the entire pipeline.

So our advice is to choose the tool that suits you best, and focus on producing a visualisation that communicates your message clearly.

Good luck!