# Introduction to data reporting with Python

Guardian Visuals

7 December, 2022

Workbook 👉 [bit.ly/gv-python](https://bit.ly/gv-python)

## Aims

1. Introducing JupyterLab/basic data visualisation using Altair
2. Workflow for private collaborative data reporting
3. Introducing kepler.gl for geospatial visualisation

## 1. Introducing JupyterLab/basic Python data visualisation using Altair

* https://youtu.be/A5YyoCKxEOU
* https://youtu.be/A5YyoCKxEOU?t=281

In [1]:
# I'm a code cell. Type Shift-Enter to execute me!

x = 42

print(x)

42


### Load example data

In [2]:
from vega_datasets import data

df_cars = data.cars()

print(data.cars.description)

df_cars

Acceleration, horsepower, fuel efficiency, weight, and other characteristics of different makes and models of cars. This dataset was originally published by Donoho et al (1982) [1]_, and was made public at http://lib.stat.cmu.edu/datasets/


| | Name | Miles_per_Gallon | Cylinders | Displacement | Horsepower | Weight_in_lbs | Acceleration | Year | Origin |
|---|---|---|---|---|---|---|---|---|---|
| 0 | chevrolet chevelle malibu | 18.0 | 8 | 307.0 | 130.0 | 3504 | 12.0 | 1970-01-01 | USA |
| 1 | buick skylark 320 | 15.0 | 8 | 350.0 | 165.0 | 3693 | 11.5 | 1970-01-01 | USA |
| 2 | plymouth satellite | 18.0 | 8 | 318.0 | 150.0 | 3436 | 11.0 | 1970-01-01 | USA |
| 3 | amc rebel sst | 16.0 | 8 | 304.0 | 150.0 | 3433 | 12.0 | 1970-01-01 | USA |
| 4 | ford torino | 17.0 | 8 | 302.0 | 140.0 | 3449 | 10.5 | 1970-01-01 | USA |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 401 | ford mustang gl | 27.0 | 4 | 140.0 | 86.0 | 2790 | 15.6 | 1982-01-01 | USA |
| 402 | vw pickup | 44.0 | 4 | 97.0 | 52.0 | 2130 | 24.6 | 1982-01-01 | Europe |
| 403 | dodge rampage | 32.0 | 4 | 135.0 | 84.0 | 2295 | 11.6 | 1982-01-01 | USA |
| 404 | ford ranger | 28.0 | 4 | 120.0 | 79.0 | 2625 | 18.6 | 1982-01-01 | USA |
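
A quick summary often helps before charting. The sketch below is a minimal, self-contained example: it rebuilds only the nine rows displayed in the output (not the full dataset) into a small DataFrame, named `df_sample` here purely for illustration, and averages fuel efficiency by origin. In the workbook, the same `groupby` could presumably be run directly on `df_cars`.

```python
import pandas as pd

# Only the nine rows displayed in the output; `df_sample` is an
# illustrative name and is not part of the workbook.
df_sample = pd.DataFrame(
    {
        "Name": [
            "chevrolet chevelle malibu",
            "buick skylark 320",
            "plymouth satellite",
            "amc rebel sst",
            "ford torino",
            "ford mustang gl",
            "vw pickup",
            "dodge rampage",
            "ford ranger",
        ],
        "Miles_per_Gallon": [18.0, 15.0, 18.0, 16.0, 17.0, 27.0, 44.0, 32.0, 28.0],
        "Horsepower": [130.0, 165.0, 150.0, 150.0, 140.0, 86.0, 52.0, 84.0, 79.0],
        "Origin": ["USA", "USA", "USA", "USA", "USA", "USA", "Europe", "USA", "USA"],
    }
)

# Mean fuel efficiency for each region of origin
mpg_by_origin = df_sample.groupby("Origin")["Miles_per_Gallon"].mean()

print(mpg_by_origin)
```

Because this uses only the displayed rows, the averages will differ from those of the full dataset.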


### An interactive graphic in <strike>three</strike> <strike>six</strike> a few lines of code

In [None]:
import altair as alt

alt.Chart(df_cars).mark_circle().encode(
    x="Horsepower",  # X axis variable!
    y="Miles_per_Gallon",  # Y axis variable!
    color="Origin",  # Colour variable!
    tooltip=["Name", "Origin", "Horsepower", "Miles_per_Gallon"]  # Tooltips!
)

In [None]:
import altair as alt

alt.Chart(df_cars).mark_circle(size=60).encode(  # Increase size of circles
    x="Horsepower",
    y="Miles_per_Gallon",
    color="Origin",
    tooltip=["Name", "Origin", "Horsepower", "Miles_per_Gallon"]
)

In [None]:
import altair as alt

alt.Chart(df_cars).mark_circle(size=60).encode(
    x="Horsepower",
    y="Miles_per_Gallon",
    color="Origin",
    tooltip=["Name", "Origin", "Horsepower", "Miles_per_Gallon"]
).interactive()  # Enable pan and zoom :o

In [None]:
import altair as alt

alt.Chart(df_cars).mark_circle(size=60).encode(
    x="Horsepower",
    y="Miles_per_Gallon",
    color="Origin",
    tooltip=["Name", "Origin", "Horsepower", "Miles_per_Gallon"],
).interactive().properties(width=600, height=400)  # Adjust dimensions

### 👉 altair-viz.github.io/gallery/index.html

Try recreating one or two of the Altair examples in your workbook. Many of them use vega_datasets for their example data.

<img src="images/02-altair-gallery.png" alt="The Altair example gallery" style="height: 400px;"/>

## 2. Workflow for private collaborative data reporting

### Create a secret gist

1. Log in to github.com (github.com/login).
2. Click the plus menu in the top-right corner, then ‘New gist’.

<img src="images/03-gist-create-new.png" alt="Creating a new GitHub gist" style="height: 400px;"/>

3. In the text box where it says ‘Filename including extension…’, give your gist the filename `ira-users-hashed.csv`. **The filename must end in .csv for it to format correctly on GitHub.**
4. Open your local copy of `ira-users-hashed.csv` in a text editor (Atom, Sublime Text and Visual Studio Code should all be installed on your machine). Select the entire contents of the file (Command-A), then copy them to the clipboard (Command-C).
5. Paste the contents of the file into the text box (where there's a `1`).

<img src="images/04-gist-paste-data.png" alt="Pasting CSV data into a new GitHub gist" style="height: 400px;"/>

6. Click ‘Create secret gist’. You should see some nicely formatted data!
7. Click the ‘Raw’ button to open the plain-text URL of your new gist, then copy that URL to the clipboard.

<img src="images/05-gist-formatted.png" alt="Formatted CSV gist" style="height: 400px;"/>

### Load data using pandas

In [None]:
import pandas as pd

df_twitter = pd.read_csv("your-plain-text-gist-url")

df_twitter

In [None]:
import pandas as pd

df_twitter = pd.read_csv("http://bit.ly/ira-users-hashed")

df_twitter
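
Before going further, it can help to confirm that the file loaded with the columns the later steps rely on. The sketch below is self-contained and rests on stated assumptions: it reads a tiny in-memory CSV with invented values in place of the gist URL, and `REQUIRED_COLUMNS` is a hypothetical name listing the three columns used later in this workbook (`days_active`, `tweet_count`, `account_language`). `pd.read_csv` accepts a file-like object as well as a URL.

```python
import io

import pandas as pd

# Invented rows standing in for the real file; the values are illustrative only
csv_text = """days_active,tweet_count,account_language
0,12,en
31,250,ru
365,4000,en
"""

df_check = pd.read_csv(io.StringIO(csv_text))

# Hypothetical list of the columns the later cells depend on
REQUIRED_COLUMNS = ["days_active", "tweet_count", "account_language"]
missing = [col for col in REQUIRED_COLUMNS if col not in df_check.columns]

print(df_check.dtypes)
print("Missing columns:", missing)
```

With the real data, the same check could be run on `df_twitter` in place of `df_check`.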

### Reverse-engineer account creation date

In [None]:
# Subtract each account's days_active from 2018-10-01 to estimate its creation date
df_twitter["date_created"] = df_twitter["days_active"].apply(
    lambda x: pd.to_datetime("2018-10-01") - pd.DateOffset(days=x)
)

df_twitter
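
The same calculation can also be written without `apply`. This is an alternative sketch, not the workbook's method: it subtracts a whole column of day counts at once using `pd.to_timedelta`. The three `days_active` values are invented for illustration, and `df_demo` and `reference_date` are hypothetical names.

```python
import pandas as pd

# Invented day counts, for illustration only
df_demo = pd.DataFrame({"days_active": [0, 31, 365]})

# The date used in the cell above
reference_date = pd.Timestamp("2018-10-01")

# Convert the whole column to day-length timedeltas and subtract in one step
df_demo["date_created"] = reference_date - pd.to_timedelta(
    df_demo["days_active"], unit="D"
)

print(df_demo)
```

On large DataFrames the vectorised form is usually faster than a row-by-row `apply`, and the resulting column has a datetime dtype that Altair can read as temporal (`:T`).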

### Plot using Altair

In [None]:
alt.Chart(df_twitter).mark_circle(size=60).encode(
    x="date_created:T",
    y="tweet_count:Q",
    color="account_language:N",
    tooltip=["date_created", "tweet_count", "account_language"]
).interactive().properties(width=600, height=400)

## 3. Introducing kepler.gl for geospatial visualisation

New notebook 👉 bit.ly/exploratory-geo

### Load data using pandas

In [None]:
import pandas as pd

df_ghosn = pd.read_csv("http://bit.ly/flight-data")

df_ghosn

### Create empty map

In [None]:
from keplergl import KeplerGl

map_1 = KeplerGl(height=600)
config = {
    "version": "v1",
    "config": {
        "mapState": {
            "latitude": 29.9511,
            "longitude": -90.0715,
            "zoom": 5,
        }
    }
}
map_1.config = config

map_1

<code>User Guide: <a>https://github.com/keplergl/kepler.gl/blob/master/docs/keplergl-jupyter/user-guide.md</a></code>

<img src="images/06-keplergl-empty.png" alt="An empty kepler.gl map" style="height: 400px;"/>
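
The `config` dictionary is plain Python, so a small helper can produce it when you want maps centred on different places. `make_map_config` and `config_demo` are hypothetical names, not part of keplergl; the helper only reproduces the `mapState` structure used in the cell above.

```python
def make_map_config(latitude, longitude, zoom=5):
    """Return a kepler.gl-style config dictionary centred on the given point."""
    return {
        "version": "v1",
        "config": {
            "mapState": {
                "latitude": latitude,
                "longitude": longitude,
                "zoom": zoom,
            }
        },
    }


# Same centre and zoom as the config in the cell above
config_demo = make_map_config(29.9511, -90.0715)

print(config_demo["config"]["mapState"])
```

The result could then be assigned in the same way as before, with `map_1.config = config_demo`.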

### Add data to map

In [None]:
map_1.add_data(data=df_ghosn, name="Ghosn transponder data")

<img src="images/07-keplergl-data.png" alt="A kepler.gl map with data" style="height: 400px;"/>

### Save map to HTML

In [None]:
map_1.save_to_html(
    data={"Ghosn transponder data": df_ghosn},
    config=config,
    file_name="keplergl_map.html",
)

<code>Map saved to keplergl_map.html!</code>

1. Click on the Jupyter logo in the top-left corner to see all the files in the current project (click ‘Yes’ if it asks you to confirm).
2. Click on `keplergl_map.html`
3. Et voilà 🤓

<img src="images/08-repo-files.png" alt="The Jupyter notebook file browser" style="height: 400px;"/>

## 🤓 Further reading

* [pandas documentation](https://pandas.pydata.org/docs/)
* [Python for Data Analysis, 3rd Edition](https://wesmckinney.com/book/) - Wes McKinney
* [Data Visualization with Python and JavaScript, 2nd Edition](https://www.oreilly.com/library/view/data-visualization-with/9781098111861/) - Kyran Dale