# Introduction

This tutorial shows you how to use `pandas`, `altair`, and `datapane` to build a COVID dashboard. We're going to pull today's COVID data down from OurWorldInData and use Datapane to create a report on the COVID situation in various countries.


In [1]:
import pandas as pd
import altair as alt
import datapane as dp

dataset = pd.read_csv('https://covid.ourworldindata.org/data/owid-covid-data.csv')
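
By default `read_csv` loads the `date` column as plain strings. A minimal sketch, assuming you would rather have real timestamps for sorting or filtering: pass `parse_dates` when loading. The inline CSV is a made-up stand-in for the OurWorldInData file so the snippet runs offline; `sample_csv`, `sample` and `recent` are illustrative names, not part of the tutorial.

```python
import io
import pandas as pd

# Hypothetical three-row stand-in for the real CSV (placeholder values).
sample_csv = io.StringIO(
    "location,date,new_cases\n"
    "Brazil,2020-10-06,10\n"
    "Brazil,2020-10-07,12\n"
    "Iran,2020-10-07,7\n"
)

# parse_dates converts the named column to datetime64 while loading.
sample = pd.read_csv(sample_csv, parse_dates=["date"])

# With real timestamps, date comparisons work directly.
recent = sample[sample["date"] >= "2020-10-07"]
print(sample.dtypes)
print(recent)
```

The same `parse_dates=["date"]` argument should work with the URL used above, although Altair's `date:T` shorthand can also parse the string form on its own.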

## Looking at the data

Now that we have our dataset, let's see what columns we can plot.

In [2]:
dataset.columns

Index(['iso_code', 'continent', 'location', 'date', 'total_cases', 'new_cases',
       'new_cases_smoothed', 'total_deaths', 'new_deaths',
       'new_deaths_smoothed', 'total_cases_per_million',
       'new_cases_per_million', 'new_cases_smoothed_per_million',
       'total_deaths_per_million', 'new_deaths_per_million',
       'new_deaths_smoothed_per_million', 'new_tests', 'total_tests',
       'total_tests_per_thousand', 'new_tests_per_thousand',
       'new_tests_smoothed', 'new_tests_smoothed_per_thousand',
       'tests_per_case', 'positive_rate', 'tests_units', 'stringency_index',
       'population', 'population_density', 'median_age', 'aged_65_older',
       'aged_70_older', 'gdp_per_capita', 'extreme_poverty',
       'cardiovasc_death_rate', 'diabetes_prevalence', 'female_smokers',
       'male_smokers', 'handwashing_facilities', 'hospital_beds_per_thousand',
       'life_expectancy', 'human_development_index'],
      dtype='object')

OK, we've got quite a lot of information here, including some columns which aren't COVID-related but could be great for comparisons.

If we want to compare a few countries, what do we have to choose from?

In [8]:
dataset['location'].unique()

array(['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola',
       'Anguilla', 'Antigua and Barbuda', 'Argentina', 'Armenia', 'Aruba',
       'Australia', 'Austria', 'Azerbaijan', 'Bahamas', 'Bahrain',
       'Bangladesh', 'Barbados', 'Belarus', 'Belgium', 'Belize', 'Benin',
       'Bermuda', 'Bhutan', 'Bolivia', 'Bonaire Sint Eustatius and Saba',
       'Bosnia and Herzegovina', 'Botswana', 'Brazil',
       'British Virgin Islands', 'Brunei', 'Bulgaria', 'Burkina Faso',
       'Burundi', 'Cambodia', 'Cameroon', 'Canada', 'Cape Verde',
       'Cayman Islands', 'Central African Republic', 'Chad', 'Chile',
       'China', 'Colombia', 'Comoros', 'Congo', 'Costa Rica',
       "Cote d'Ivoire", 'Croatia', 'Cuba', 'Curacao', 'Cyprus',
       'Czech Republic', 'Democratic Republic of Congo', 'Denmark',
       'Djibouti', 'Dominica', 'Dominican Republic', 'Ecuador', 'Egypt',
       'El Salvador', 'Equatorial Guinea', 'Eritrea', 'Estonia',
       'Ethiopia', 'Faeroe Islands', 'Falkland Isla
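
The full list of locations is long, so it can help to search it rather than read it. The following is a small sketch, not part of the original notebook: `locations` is a hypothetical stand-in for `dataset['location']`, and the search term is arbitrary.

```python
import pandas as pd

# Hypothetical stand-in for dataset['location']; the real column has far more rows.
locations = pd.Series(
    ["Brazil", "Brazil", "New Zealand", "Iran", "New Caledonia", "Iran"],
    name="location",
)

# Case-insensitive substring match on the de-duplicated names.
names = pd.Series(locations.unique())
matches = names[names.str.contains("new", case=False)].tolist()
print(matches)
```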

Let's pick a few we're interested in and filter the dataframe.

In [9]:
# Choose your countries from above! 
countries = ['Brazil', 'New Zealand', 'Iran']
df = dataset[dataset.location.isin(countries)]
df

Unnamed: 0,iso_code,continent,location,date,total_cases,new_cases,new_cases_smoothed,total_deaths,new_deaths,new_deaths_smoothed,...,gdp_per_capita,extreme_poverty,cardiovasc_death_rate,diabetes_prevalence,female_smokers,male_smokers,handwashing_facilities,hospital_beds_per_thousand,life_expectancy,human_development_index
6303,BRA,South America,Brazil,2019-12-31,0.0,0.0,,0.0,0.0,,...,14103.452,3.4,177.961,8.11,10.1,17.9,,2.20,75.88,0.759
6304,BRA,South America,Brazil,2020-01-01,0.0,0.0,,0.0,0.0,,...,14103.452,3.4,177.961,8.11,10.1,17.9,,2.20,75.88,0.759
6305,BRA,South America,Brazil,2020-01-02,0.0,0.0,,0.0,0.0,,...,14103.452,3.4,177.961,8.11,10.1,17.9,,2.20,75.88,0.759
6306,BRA,South America,Brazil,2020-01-03,0.0,0.0,,0.0,0.0,,...,14103.452,3.4,177.961,8.11,10.1,17.9,,2.20,75.88,0.759
6307,BRA,South America,Brazil,2020-01-04,0.0,0.0,,0.0,0.0,,...,14103.452,3.4,177.961,8.11,10.1,17.9,,2.20,75.88,0.759
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
32131,NZL,Oceania,New Zealand,2020-10-05,1499.0,1.0,3.143,25.0,0.0,0.0,...,36085.843,,128.797,8.08,14.8,17.2,,2.61,82.29,0.917
32132,NZL,Oceania,New Zealand,2020-10-06,1502.0,3.0,3.286,25.0,0.0,0.0,...,36085.843,,128.797,8.08,14.8,17.2,,2.61,82.29,0.917
32133,NZL,Oceania,New Zealand,2020-10-07,1505.0,3.0,3.571,25.0,0.0,0.0,...,36085.843,,128.797,8.08,14.8,17.2,,2.61,82.29,0.917
32134,NZL,Oceania,New Zealand,2020-10-08,1508.0,3.0,4.000,25.0,0.0,0.0,...,36085.843,,128.797,8.08,14.8,17.2,,2.61,82.29,0.917
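
One thing to watch with `isin` is that a misspelt country name is silently ignored: you simply get fewer rows back. A hedged sketch of a guard, assuming you want an error instead; `filter_countries` is a hypothetical helper and `toy` is a stand-in for `dataset`.

```python
import pandas as pd

def filter_countries(data: pd.DataFrame, wanted: list[str]) -> pd.DataFrame:
    """Return rows for the wanted locations, failing loudly on unknown names."""
    known = set(data["location"].unique())
    missing = [name for name in wanted if name not in known]
    if missing:
        raise ValueError(f"Unknown locations: {missing}")
    return data[data["location"].isin(wanted)]

# Hypothetical toy data with placeholder values.
toy = pd.DataFrame(
    {"location": ["Brazil", "Iran", "Chile"], "new_cases": [1.0, 2.0, 3.0]}
)
subset = filter_countries(toy, ["Brazil", "Iran"])
print(subset)
```

Calling `filter_countries(toy, ["Brasil"])` raises a `ValueError` naming the unknown entry rather than returning an empty frame.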


This is now a more manageable size for plotting with Altair. There are a few columns which might be interesting for our dashboard - but let's choose `new_cases_smoothed_per_million`, `total_deaths_per_million`, and `new_deaths_smoothed_per_million`.
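
The preview above shows empty values in the smoothed columns for the earliest dates, so it may be worth checking how much of each chosen metric is missing before plotting. A minimal sketch under that assumption; `toy` is a hypothetical stand-in for `df` with placeholder values, and `missing_share` is an illustrative name.

```python
import pandas as pd

metrics = [
    "new_cases_smoothed_per_million",
    "total_deaths_per_million",
    "new_deaths_smoothed_per_million",
]

# Hypothetical stand-in for df: some early rows have no smoothed values yet.
toy = pd.DataFrame({
    "location": ["Brazil", "Brazil", "Iran", "Iran"],
    "new_cases_smoothed_per_million": [None, 1.5, None, 2.0],
    "total_deaths_per_million": [0.0, 0.1, 0.0, 0.2],
    "new_deaths_smoothed_per_million": [None, 0.01, 0.02, 0.03],
})

# Fraction of missing values per metric, broken down by country.
missing_share = toy[metrics].isna().groupby(toy["location"]).mean()
print(missing_share)
```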

The great thing about Altair is that we can create a common base and reuse it across multiple plots. We set a black stroke and reduce the opacity to make the report a bit more visually appealing. We also make the chart interactive, so users can zoom in and out. Lastly, we tell Altair to make the width responsive; this isn't strictly necessary, but it lets the chart fit the user's screen.


In [12]:
base = (
    alt.Chart(df)
    .encode(x='date:T', color='location')
    .mark_area(opacity=0.5, stroke='black')
    .interactive()
    .properties(width='container')
)


Once we have our base, we only need to choose the `y` axis for each chart (remember, we can put the `x` axis in our base because all three charts use `date`).


In [17]:
c1 = base.encode(y='new_cases_smoothed_per_million')
c2 = base.encode(y='new_deaths_smoothed_per_million')
c3 = base.encode(y='total_deaths_per_million')

Let's preview what these plots look like. Because we're in Jupyter, we have to specify a fixed width instead of the responsive one.

In [15]:
c1.properties(width=500)

# Building a report

Now that we have some plots and a DataFrame, we can build our report using `datapane`. Datapane provides components, such as `Table` and `Plot`, which wrap the plots and datasets in our notebook. We're also using the `Blocks` component, which lets us place components in a column or grid layout. `Table` and `Plot` both take an optional `caption` parameter where we can provide some further info.

Because we're in Jupyter, we can use the `preview` method on our report to see what it will look like.

In [18]:
report = dp.Report(
  dp.Table(df, caption=f'Dataset for {countries}'),
  dp.Blocks(dp.Plot(c1), dp.Plot(c2), columns=2),
  dp.Plot(c3, caption='Total deaths per million')
)

report.preview()

Great! The plots look a little squashed in the preview, but they are responsive, so they will expand to fit the full-size report when we publish it.

Next, let's publish our new report to Datapane.

> Make sure you've logged in to Datapane with your API token before publishing.


In [None]:
dp.Report(
  dp.Table(df, caption=f'Dataset for {countries}'),
  dp.Blocks(dp.Plot(c1), dp.Plot(c2), columns=2),
  dp.Plot(c3, caption='Total deaths per million')
).publish(name='covid_report', open=True)
