# Badge 12: Drawing Simple Graphs - preparing data and feeding them into a graph.

### We're using 'Plotly', an interactive graphing library, but there are many other really good graph libraries out there!

This notebook is very advanced, and introduces a lot of advanced concepts, so there are a few ways to interact with it:

- **To only learn how to make some basic diagrams** - read the first part of this Notebook all the way to the 3 stars and a wish, and the minitask.
- **To just see what Python can do with graphs** - run all the cells in this notebook (even without looking at the code) and see what wonderful and complex graphs can be produced.
- **To learn how to create these complex graphs** - try to understand all the code in this Notebook.

This tutorial is a simplified introduction to the material at 'plot.ly/python/'.


**We will cover:**

- Creating graphs and combining them into more complex infographics.
- Editing/Updating Graphs.
- Types of graphs - just briefly; you can explore the documentation yourself to find out more.

### Some new concepts: 

**CLASSES** are similar to other complicated data types, like LISTS or DICTIONARIES, except that we import CLASSES from libraries or create them ourselves. You usually use a class to ...

**CREATE AN OBJECT OF A CLASS**: This is how you create a new item/instance of a class, just as we created a new list or dictionary with `[1, 2, 3]` or `{'name': 'Pim'}`. Once you have an object you can interact with it, just as we can change a **List** object with `.pop()` or `.sort()`. You create graph objects by using constructors, like ```fig = Figure( some_attributes )```, or you can use a factory method that will create an object and return it to you, like ```fig = px.scatter( ... )```.

**OBJECT**: the primary item you will work with when creating graphs.

Once you have created an object you can call methods on it, like...

```
fig.show()   
fig.add_bar( ... )   
fig.update_yaxes( ... )
``` 

You can also change its attributes like...

```
fig.data[1].marker.line.color = "yellow"
```
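
To make these terms concrete without depending on Plotly, here is a minimal sketch using a made-up `ToyFigure` class. The class, its `bar` factory method, and its attribute names are hypothetical and only mimic the pattern described above; they are not Plotly's API.

```python
class ToyFigure:
    """A hypothetical stand-in for a graph class (not part of Plotly)."""

    def __init__(self, title, values=None):
        # The constructor runs when you write ToyFigure(...).
        self.title = title
        self.values = list(values or [])

    @classmethod
    def bar(cls, values):
        # A factory method: builds an object for you and returns it.
        return cls(title="Untitled bar chart", values=values)

    def add_value(self, value):
        # A method: an action you perform on an existing object.
        self.values.append(value)

    def describe(self):
        return f"{self.title}: {self.values}"


fig_a = ToyFigure("Fruit I like", [5, 2, 4])   # object made with the constructor
fig_b = ToyFigure.bar([1, 2, 3])               # object made with a factory method

fig_a.add_value(7)                             # calling a method
fig_b.title = "Renamed chart"                  # changing an attribute directly

print(fig_a.describe())
print(fig_b.describe())
```

Plotly figures follow the same idea: you build one object, then call methods on it or change its attributes.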

**IMPORTING LIBRARIES** - Is a process via which we give our Notebook the ability to use CLASSES and code written by someone else. We specify a library we want and its short name: 

```
import library_you_want as short_name
```

And the ones we will use the most are: 

```
import pprint as pp

import plotly.graph_objects as go        
import plotly.express as px      
```

**IMPORTING INDIVIDUAL METHODS/CLASSES FROM A LIBRARY** - When we need only a small part of a library, we can import just that part and then use it directly by its own name:

```
from library_you_want import function_or_class_you_want
```

Which will look like this: 

```
from plotly.subplots import make_subplots
```
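
Both import styles can be tried with modules that ship with Python, so nothing needs installing. This is only an illustration; the variable name `fruit_counts` is made up.

```python
import pprint as pp                  # whole library, under a short name
from collections import Counter     # just one class from a library

fruit_counts = Counter(["kiwi", "plum", "kiwi"])

# With 'import ... as', we reach things through the short name:
pp.pprint(dict(fruit_counts))

# With 'from ... import', we use the imported name directly:
print(fruit_counts["kiwi"])
```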

### Installing plotly on your Notebook:

First let's install plotly from the Python Package Index.

Note: ```pip install some_library``` is a command that will install any pip library you request from the Python Package Index. 

The Python Package Index is like a phone book of all available Python libraries. It contains hundreds of thousands of libraries for anything you can imagine: Data, Graphs, Maps, AI, Internet... You can go and browse them yourself at https://pypi.org/.

For now we will only grab Plotly from there with the line of code below. Because every time you start a Noteable Notebook/session you are effectively starting a new instance of a server, you will need to run this code every time you open a new session.
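
If you are unsure whether a library is already available in your session, you can ask Python before installing. This is a small sketch using only the standard library; `is_installed` is a hypothetical helper name and it only checks top-level package names.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can find a top-level package with this name."""
    return importlib.util.find_spec(package_name) is not None


for name in ["json", "plotly"]:
    status = "available" if is_installed(name) else "missing, run pip install"
    print(f"{name}: {status}")
```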

In [None]:
# You need to do this ONCE EVERY TIME YOU START A NOTEBOOK. 
# It will take about a MINUTE, and when done will print something like "Successfully installed plotly".
# Depending on your computer you might see "command not found: pip"; then change !pip to !pip3.
# If you're using a local Anaconda installation (rather than a virtual computer) you might need to install plotly.
!pip install plotly --upgrade


In [None]:
# If at some point your graphs stop showing - restart your Kernel, from the top menu. 

### Your first plot:

# Simple bar chart. Create an object of class Figure with its constructor.
# Constructor takes a number of possible arguments eg. data, layout_title_text.

# You can create an object and call on it a method '.show()':
import plotly.graph_objects as go
import plotly.io as pio
pio.renderers.default='notebook'
import pprint as pp

In [None]:
# Your first graph :D
# have a close look at this code, and try to figure out what each part is for
# can you change things, eg. the data and colour?

fig = go.Figure(
    data=[go.Bar(y=[5, 2, 4], 
                 x=["bananas", "kiwis", "plums"], 
                 marker_color='blue')],
    layout=go.Layout(
        title=go.layout.Title(text="How much I like these fruits")
    )
)
fig.show('notebook')

In [None]:
# Here's the same code, but cleaner:

values = [5, 2, 4]
categories = ["bananas", "kiwis", "plums"]
color = 'blue'

fig = go.Figure(
    data=[go.Bar(y=values, 
                 x=categories, 
                 marker_color=color)],
    layout=go.Layout(
        title=go.layout.Title(text="How much I like these fruits")
    )
)
fig.show('notebook')

In [None]:
# Or with colours and variables:

import plotly.graph_objects as go
import plotly.io as pio
pio.renderers.default='notebook'

colors = ['yellow', 'green', 'purple']
names = ["banana", "kiwi", "plum"]
values = [5, 2, 4]

fig = go.Figure(
    data=[go.Bar(y=values, 
                 x=names, 
                 marker_color = colors)],
    layout=go.Layout(
        title=go.layout.Title(text="How much I like these fruits")
    )
)
fig.show('notebook')

In [None]:
# And from data:

import plotly.graph_objects as go
import plotly.io as pio
pio.renderers.default='notebook'

colors = ['yellow', 'green', 'purple']
fruit_types = ["banana", "kiwi", "plum"]
fruits_I_ate = ["banana", "kiwi","banana", "kiwi","banana", "kiwi", "kiwi","kiwi","plum","banana"]

values = [
    fruits_I_ate.count(fruit)
    for fruit in fruit_types
]


fig = go.Figure(
    data=[go.Bar(y=values, 
                 x=fruit_types, 
                 marker_color = colors)],
    layout=go.Layout(
        title=go.layout.Title(text="Which fruits I eat the most")
    )
)
fig.show('notebook')

In [None]:
# And an even fancier example:

import plotly.graph_objects as go
import plotly.io as pio
pio.renderers.default='notebook'

# Raw data:

fruit_colors = {"banana":"yellow", 
                "kiwi": "green", 
                "plum": "purple"}

fruits_I_ate = ["banana", "kiwi","banana", "kiwi","banana", "kiwi",
                "kiwi","kiwi","plum","banana", "peach","peach", "quince"]

# Prepare data:
    
def create_and_show_fruit_graph(fruits, colors_dict):
    fruit_types = list(set(fruits))
    fruit_types.sort() # Alphabetically.
    fruit_counts = [
        fruits.count(fruit)
        for fruit in fruit_types
    ]
    fruit_colors =  [
        colors_dict.get(fruit, 'grey') # If colour is unknown, use grey.
        for fruit in fruit_types
    ]

    fig = go.Figure(
        data=[go.Bar(y=fruit_counts, 
                     x=fruit_types, 
                     marker_color = fruit_colors)],
        layout=go.Layout(
            title=go.layout.Title(text="Which fruits I eat the most")
        )
    )
    fig.show('notebook')
    
create_and_show_fruit_graph(fruits_I_ate, fruit_colors )

You can find more inspiration here: https://plotly.com/python/bar-charts/
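
The data-preparation step from the last example can also be sketched with `collections.Counter`, which counts every item in one pass instead of calling `.count()` once per fruit. The helper name `prepare_bar_data` and the sample data are assumptions for illustration.

```python
from collections import Counter


def prepare_bar_data(items, colors_dict, default_color="grey"):
    """Turn a raw list of items into three parallel lists for a bar chart."""
    counts = Counter(items)                 # one pass over the data
    categories = sorted(counts)             # alphabetical order
    values = [counts[c] for c in categories]
    colors = [colors_dict.get(c, default_color) for c in categories]
    return categories, values, colors


fruits_I_ate = ["banana", "kiwi", "banana", "kiwi", "plum", "peach"]
fruit_colors = {"banana": "yellow", "kiwi": "green", "plum": "purple"}

categories, values, colors = prepare_bar_data(fruits_I_ate, fruit_colors)
print(categories)
print(values)
print(colors)
```

The three lists line up position by position, so they can be passed as `x`, `y` and `marker_color` to `go.Bar` in the same way as in the cells above.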

## Create and Combine Graphs.

### Overlapping and neighbouring plots: 

In [None]:
from plotly.subplots import make_subplots

# 'make_subplots' takes rows and columns for how many subplots you will want to create.
# E.g. To create a 3 by 3 grid of 9 plots, you would say rows=3, cols=3:

fig = make_subplots(rows=1, cols=4)

# And then we add graphs to that figure with 'add_trace'.
# It takes 3 arguments - the graph to display and its position as row and col:

fig.add_trace(go.Scatter(y=[4, 2, 1], mode="lines"),
              row=1, 
              col=1)
fig.add_trace(go.Scatter(y=[10, 12, 15]), 
              row=1, 
              col=2)
fig.add_trace(go.Bar(y=[2, 1, 3]), row=1, col=3)
fig.add_trace(go.Bar(y=[6, 5, 3]), row=1, col=4)

fig.show('notebook')

Q: But why would I need to specify position of each 'trace' (graph/chart/drawing) separately?

A: Because you can overlay the graphs on top of each other if you use it skillfully:

In [None]:
# Here's the same data, presented in 2 different ways. Spot the difference in code:

from plotly.subplots import make_subplots
fig = make_subplots(rows=2, cols=2)
fig.add_trace(go.Scatter(y=[4, 2, 1], mode="lines"),row=1, col=1)
fig.add_trace(go.Scatter(y=[10, 12, 15]),           row=1, col=2)
fig.add_trace(go.Bar(y=[2, 1, 3]),                  row=2, col=1)
fig.add_trace(go.Bar(y=[6, 5, 3]),                  row=2, col=2)

fig.show('notebook')

In [None]:
# Here's the same data, presented in 2 different ways. Spot the difference in code:

from plotly.subplots import make_subplots
fig = make_subplots(rows=1, cols=2)
fig.add_trace(go.Scatter(y=[4, 2, 1], mode="lines"),row=1, col=1)
fig.add_trace(go.Scatter(y=[10, 12, 15]),           row=1, col=2)
fig.add_trace(go.Bar(y=[2, 1, 3]),                  row=1, col=1)
fig.add_trace(go.Bar(y=[6, 5, 3]),                  row=1, col=2)

fig.show('notebook')

In [None]:
# You can customise almost anything! See here https://plotly.com/python/line-and-scatter/.

hospital_admissions = [150,210,240,180,160,130]
years = [1990,1995,2000,2005,2010,2015] # One year per data point, so all three lists have 6 items.
costs = [50,230,240,180,130,80]

# Let's start with hardcoded data. In a minute you will see how to make it cleaner. 
# E.g. How to automatically create colour from a number.

colours = ['Green','Orange','Red','Orange','Green','Green']

bubble_text = [f"cost was {cost}"
              for cost in costs]

fig = make_subplots(rows=1, cols=1)
fig.add_trace(go.Scatter(y=hospital_admissions, 
                         x=years, 
                         marker_color = colours, 
                         marker_size=costs,
                         text=bubble_text,
                         mode='markers' # Can be 'lines','markers' or 'lines+markers'.
                        ), row=1, col=1)

fig.update_layout(
    title="Cost of hospital admissions in relation to number of admissions, over years",
    xaxis_title="Year",
    yaxis_title="Hospital Admissions",
)
fig.show('notebook')

## Final part: From real data into a graph.

This is a solved example of how a dataset can be used. The steps were:

1. Look at the data. What could I visualise here? Population density and city area sound interesting!
2. Create some functions that distil the city information down to only the info I want.
3. Transform that data into visualisation-specific things, e.g. colours representing some values.
4. Put all that prepared info into a graph.
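
Step 3 can be sketched without any plotting library. The cut-off points and colour names below are assumptions chosen for illustration, and `value_to_colour` is a hypothetical helper. It uses `bisect_right` from the standard library to find which band a number falls into, which scales better than a long `if`/`elif` chain when there are many bands.

```python
from bisect import bisect_right

# Assumed cut-off points: below 4000 -> Green, 4000 to 4999 -> Orange, 5000+ -> Red.
THRESHOLDS = [4000, 5000]
COLOURS = ["Green", "Orange", "Red"]


def value_to_colour(value):
    """Pick the colour band that a number falls into."""
    return COLOURS[bisect_right(THRESHOLDS, value)]


densities = [1430, 4247, 5247]
print([value_to_colour(d) for d in densities])
```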

In [None]:
# Here's an example with real data:

cities = [
 {'Geo': {'AreaKm2': 267.77,
          'Coordinates': {'Latitude': 52.48, 'Longitude': -1.9}},
  'Name': 'Birmingham',
  'Population': 1137100,
  'Year_Founded': 601},
 {'Geo': {'AreaKm2': 551.7,
          'Coordinates': {'Latitude': 53.8, 'Longitude': -1.55}},
  'Name': 'Leeds',
  'Population': 789194,
  'Year_Founded': 1893},
 {'Geo': {'AreaKm2': 142.06,
          'Coordinates': {'Latitude': 53.38, 'Longitude': -1.47}},
  'Name': 'Sheffield',
  'Population': 577800,
  'Year_Founded': 701},
 {'Geo': {'AreaKm2': 110,
          'Coordinates': {'Latitude': 51.45, 'Longitude': -2.58}},
  'Name': 'Bristol',
  'Population': 567111,
  'Year_Founded': 1155},
 {'Geo': {'AreaKm2': 115.6,
          'Coordinates': {'Latitude': 53.47, 'Longitude': -2.23}},
  'Name': 'Manchester',
  'Population': 547627,
  'Year_Founded': 1301},
 {'Geo': {'AreaKm2': 111.84,
          'Coordinates': {'Latitude': 53.41, 'Longitude': -2.99}},
  'Name': 'Liverpool',
  'Population': 513441,
  'Year_Founded': 1207},
 {'Geo': {'AreaKm2': 259,
          'Coordinates': {'Latitude': 55.95, 'Longitude': -3.2}},
  'Name': 'Edinburgh',
  'Population': 488050,
  'Year_Founded': 601}]



In [None]:
# Let's prepare the data first. Our goal is to have all we need in variables first:


def get_density(city):
    return round(city['Population']/city['Geo']['AreaKm2'])


print(get_density(cities[0]) )
assert get_density(cities[0]) == 4247 
# because Birmingham's population (1137100) divided by its area (267.77) is about 4246.55, which rounds to 4247

In [None]:
def get_description(city):
    description = f"{city['Name']} has {city['Population']} people "
    description += f"living in area of {city['Geo']['AreaKm2']}km. "
    description += f"Density is {get_density(city)}"
    return description
    # Notice the usage of '=' and '+='.
    # These will appear when you hover over a bubble on the graph.
    
print(get_description(cities[0]))
assert get_description(cities[0]) == 'Birmingham has 1137100 people living in area of 267.77km. Density is 4247'

In [None]:
def density_into_colour(city):
    density = get_density(city)
    if density < 4000:
        return "Green"
    elif density < 5000:
        return "Orange"
    else:
        return "Red"
    
print(density_into_colour(cities[0]))
assert density_into_colour(cities[0]) =='Orange'

In [None]:
import plotly.graph_objects as go

x_axis = [int(city['Population']) 
          for city in cities]
y_axis = [int(city['Geo']['AreaKm2']) 
          for city in cities]

print(x_axis)
print(y_axis)

In [None]:
scale_multiplier = 0.01 # We can divide/multiply numbers to make graphs readable. 
# Once you've seen the graph, try changing the above to eg. 0.001 or 0.1. What changed?

scaled_densities = [int(get_density(city) * scale_multiplier  )
         for city in cities] 
print(scaled_densities)

In [None]:
colours = [density_into_colour(city) 
           for city in cities]
texts = [ get_description(city)
         for city in cities]


print(colours)
print(texts)

In [None]:
# You can customise almost anything! See here https://plotly.com/python/line-and-scatter/.

fig = go.Figure(
        data=[go.Scatter(y=y_axis, 
                         x=x_axis, 
                         marker_color = colours, 
                         marker_size=scaled_densities,
                         text=texts,
                         mode='markers' # Can be 'lines','markers' or 'lines+markers'.
                        )],
        layout=go.Layout(
            title=go.layout.Title(text="You can set title here, but you can also change it with update")
        )
    )
fig.update_layout(
    title="Density of population in British Cities",
    xaxis_title="City Population",
    yaxis_title="Area in Km2",
    font=dict(
        family="Courier New, monospace",
        size=20,
        color="RebeccaPurple"
    )
)
fig.show('notebook')

In [None]:
# Example of a very different graph type: a map

import plotly.express as px
# gapminder data from 2007
countries = [{'name':"UK", "iso":"GBR", "lifeExp":79.425},
             {'name':"France", "iso":"FRA", "lifeExp":80.657},
             {'name':"Germany", "iso":"DEU", "lifeExp":79.406},
             {'name':"Portugal", "iso":"PRT", "lifeExp":78.098},
             {'name':"Spain", "iso":"ESP", "lifeExp":80.941},
            {'name':"Poland", "iso":"POL", "lifeExp":75.563}]

# first let's do it the old way: change data into lists of just what we need:
names = [country['name'] for country in countries]
isos = [country['iso'] for country in countries]
lifeExps = [country['lifeExp'] for country in countries]

fig = px.choropleth(locations=isos, 
                    color=lifeExps, 
                    hover_name=names,
                    scope = "europe",
                    range_color=[75,81])
fig.show('notebook')

In [None]:
# and now for the simpler, easier way: 
# feed whole data into the graph, and tell it which key/column should be used for what

countries = [{'name':"UK", "iso":"GBR", "lifeExp":79.425},
             {'name':"France", "iso":"FRA", "lifeExp":80.657},
             {'name':"Germany", "iso":"DEU", "lifeExp":79.406},
             {'name':"Portugal", "iso":"PRT", "lifeExp":78.098},
             {'name':"Spain", "iso":"ESP", "lifeExp":80.941},
            {'name':"Poland", "iso":"POL", "lifeExp":75.563}]

fig = px.choropleth(countries,
                    locations='iso', 
                    color='lifeExp', 
                    hover_name='name',
                    scope = "europe",
                    range_color=[75,81])
fig.show('notebook')

# For completeness: here's another plotting library - 'matplotlib.pyplot'.

### More examples of matplotlib can be found here:

https://matplotlib.org/stable/gallery/index.html

Matplotlib is very simple, which means it creates graphs quickly and does not take a lot of memory (unlike plotly). This means it is often very useful for things like online dashboards.

Matplotlib is different from plotly (in which we were operating on fig objects) in that in matplotlib there is an 'implied' graph. We keep adding things to that default graph using functions stored in the library (`plt.`). See below.

In [None]:
# the simplest possible graph example

import matplotlib.pyplot as plt

names = ['Kim','Natasha','Jules']
scores = [25, 34, 32]
plt.bar(names, scores)

plt.show() # this plt refers to the library, and not really to an individual plot. 

### Stacked Bar Graph.

This is an example of creating a stacked bar plot with error bars using ```matplotlib.pyplot.bar```. To create stacked graphs, we create the bottom graph first, then add other graphs to it with a gap underneath them (see the 'bottom' attribute).
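
With more than two layers, each layer's `bottom` is the sum of all the layers beneath it. Here is a minimal sketch of that bookkeeping in plain Python, with made-up numbers (the 'birds' series is an assumption added to show a third layer):

```python
series = {
    "cats": [20, 35, 30],
    "dogs": [25, 32, 34],
    "birds": [5, 8, 6],
}

bottoms = {}
running_total = [0, 0, 0]        # nothing is underneath the first layer
for name, heights in series.items():
    # This layer starts where the layers below it end.
    bottoms[name] = list(running_total)
    running_total = [t + h for t, h in zip(running_total, heights)]

print(bottoms)
print(running_total)             # total height of each stacked bar
```

Each list in `bottoms` is what you would pass as the `bottom` argument when drawing that layer.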

In [None]:
# Note the parameters *yerr*, used for error bars, 
# and *bottom*, used to stack one set of bars on top of the other bars.

import numpy as np
import matplotlib.pyplot as plt


catMeans = (20, 35, 30, 35, 27)
dogMeans = (25, 32, 34, 20, 25)
number_of_items = len(catMeans)
indexes = list(range(number_of_items)) # basically the [0,1,2,3,4]

plotCats = plt.bar(indexes, catMeans)
plotDogs = plt.bar(indexes, dogMeans,  bottom=catMeans)
# here we can (but don't have to) capture sub-graphs into variables
# you'll see that used soon

plt.show()

In [None]:
# with more annotations and fancy stuff:

import numpy as np # needed for np.arange below
import matplotlib.pyplot as plt

catMeans = (20, 35, 30, 35, 27)
dogMeans = (25, 32, 34, 20, 25)
# yes, this is a very 'made up' example :D here are nonsense whiskers :D
catStd = (2, 3, 4, 1, 2) 
dogStd = (3, 5, 2, 3, 3)
number_of_items = len(catMeans)
indexes = list(range(number_of_items)) # basically the [0,1,2,3,4]
width = 0.35       # The width of the bars

plotCats = plt.bar(indexes, catMeans, width, yerr=catStd)
plotDogs = plt.bar(indexes, dogMeans, width, yerr=dogStd,  bottom=catMeans)

plt.ylabel('Scores')
plt.title('Scores by group and animal')
plt.xticks(indexes, ('G1', 'G2', 'G3', 'G4', 'G5'))
plt.yticks(np.arange(0, max(catMeans)+ max(dogMeans), 10))
plt.legend((plotCats[0], plotDogs[0]), ('Cats', 'Dogs'))
# give subplots legend entries

plt.show()

## ⭐️⭐️⭐️💥 What you have learned in this session: Three stars and a wish.
**In your own words** write in your Learn diary:

- 3 things you would like to remember from this badge.
- 1 thing you wish to understand better in the future or a question you'd like to ask.


# ⛏ Minitask 1: Recreate a simple graph from a newspaper:

Use anything you saw in this badge to recreate a graph you have seen recently in a paper. 

You can choose something easy by just doing a Google image search for the words "graph example". This will show you a number of simple graphs, which might still be quite challenging to recreate in Plotly! Here is one I found for you: https://img.chandoo.org/c/chart-with-lines-instead-of-columns.png


If you'd like to try something much harder, look at one of these outlets:

- Guardian: https://www.theguardian.com/data
- Economist: https://www.economist.com/graphic-detail/

Keep in mind that you do not need to recreate the graph exactly, just try to mimic it a little. You do not need real data either: just guess some plausible values and, e.g., create a list of numbers.

# This concludes the basic part of the Notebook.

### The next parts show many more advanced types of graphs, and also use built-in training datasets. You really do not need to understand any of it. Just have a look at what's possible.

## ⭐️⭐️ TOP TIP ⭐️⭐️: If you have a minute, simply run the rest of this Notebook (Shift + Enter) and look at how interesting (and easy to make) some graphs are.



## Examples of different graph types.

### Advanced features of a 'Scatter Plot':

In [None]:
import plotly.express as px
iris = px.data.iris()
fig = px.scatter(iris, x="sepal_width", y="sepal_length", color="species", marginal_y="violin",
           marginal_x="box", trendline="ols")
fig.show('notebook')


In [None]:
import plotly.express as px
iris = px.data.iris()
iris["e"] = iris["sepal_width"]/100
fig = px.scatter(iris, x="sepal_width", y="sepal_length", color="species", error_x="e", error_y="e")
fig.show('notebook')


In [None]:
# For easier eyeballing of your data, you can use scatter matrix:

import plotly.express as px
iris = px.data.iris()
fig = px.scatter_matrix(iris, dimensions=["sepal_width", "sepal_length", "petal_width", "petal_length"], 
                        color="species")
fig.show('notebook')


In [None]:
# Here we're using a different data set: bills and tips in a restaurant - 'px.data.tips()':

import plotly.express as px
tips = px.data.tips()
fig = px.scatter(tips, x="total_bill", y="tip", facet_row="time", facet_col="day", color="smoker",
                 trendline="ols",
          category_orders={"day": ["Thur", "Fri", "Sat", "Sun"], "time": ["Lunch", "Dinner"]})
fig.show('notebook')


In [None]:
# Parallel coordinates:

import plotly.express as px
iris = px.data.iris()
fig = px.parallel_coordinates(iris, color="species_id", labels={"species_id": "Species",
                  "sepal_width": "Sepal Width", "sepal_length": "Sepal Length",
                  "petal_width": "Petal Width", "petal_length": "Petal Length", },
                    color_continuous_scale=px.colors.diverging.Tealrose, color_continuous_midpoint=2)
fig.show('notebook')


In [None]:
import plotly.express as px
tips = px.data.tips()
fig = px.parallel_categories(tips, color="size", color_continuous_scale=px.colors.sequential.Inferno)
fig.show('notebook')

import plotly.express as px
tips = px.data.tips()
fig = px.scatter(tips, x="total_bill", y="tip", color="size", facet_col="sex",
           color_continuous_scale=px.colors.sequential.Viridis, render_mode="webgl")
fig.show('notebook')


In [None]:
import plotly.express as px
gapminder = px.data.gapminder()
fig = px.scatter(gapminder.query("year==2007"), x="gdpPercap", y="lifeExp", size="pop", color="continent",
           hover_name="country", log_x=True, size_max=60)
fig.show('notebook')


### Interactive Graphs.

In [None]:
# You can animate it yourself, or press play:

import plotly.express as px
gapminder = px.data.gapminder()
fig = px.scatter(gapminder, x="gdpPercap", y="lifeExp", animation_frame="year", animation_group="country",
           size="pop", color="continent", hover_name="country", facet_col="continent",
           log_x=True, size_max=45, range_x=[100,100000], range_y=[25,90])
fig.show('notebook')



In [None]:
import plotly.express as px
election = px.data.election()
fig = px.scatter_3d(election, x="Joly", y="Coderre", z="Bergeron", color="winner", size="total", 
                    hover_name="district",
                  symbol="result", color_discrete_map = {"Joly": "blue", "Bergeron": "green", "Coderre":"red"})
fig.show('notebook')

# You can interact with it with your mouse!!!

In [None]:
import plotly.express as px
wind = px.data.wind()
fig = px.bar_polar(wind, r="frequency", theta="direction", color="strength", template="plotly_dark",
            color_discrete_sequence= px.colors.sequential.Plasma[-2::-1])
fig.show('notebook')


In [None]:
import plotly.express as px
gapminder = px.data.gapminder()
fig = px.line_geo(gapminder.query("year==2007"), locations="iso_alpha", color="continent", 
                  projection="orthographic")
fig.show()

In [None]:
import plotly.express as px
gapminder = px.data.gapminder()
fig = px.choropleth(gapminder, locations="iso_alpha", color="lifeExp", hover_name="country", 
                    animation_frame="year", range_color=[20,80])
fig.show('notebook')
