# Oxford COVID-19 Government Response Tracker API

The Oxford Coronavirus Government Response Tracker (OxCGRT) project calculates a Stringency Index, a composite measure of nine of its response metrics (also called individual indicators).

The individual indicators capture all government measures related to a specific domain, including formally adopted laws, policies promulgated by executive or regulatory authorities, and softer guidance or advice.

They are categorized into 5 groups:
* **C** - containment and closure policies
* **E** - economic policies
* **H** - health system policies
* **M** - miscellaneous policies
* **V** - vaccination policies

To make it easier to describe government responses in aggregate, OxCGRT calculates simple indices that combine individual indicators to provide an overall measure of the intensity of government response across a family of indicators. These indices are designed to provide a simple snapshot of the number and degree of government responses in a particular domain.

OxCGRT publishes four indices that group different families of policy indicators:

* Government response index or "GRI" (all categories)
* Stringency index (containment and closure policies, sometimes referred to as lockdown policies)
* Containment and health index or "CHI" (containment and closure and health policies)
* Economic support index or "ESI" (economic support measures)

Each index is composed of a series of individual policy response indicators. 

For each indicator, the authors created a score by taking the ordinal value and subtracting half a point if the policy is targeted rather than general, if applicable.

Then they rescaled each of these by their maximum value to create a score between 0 and 100, with a missing value contributing 0. These scores are then averaged to obtain the composite indices. 
This calculation is described in equation (1) below:

$index = \frac{1}{k}\sum \limits _{j=1} ^{k} I_{j}$

where k is the number of component indicators in an index and $I_{j}$ is the sub-index score of an individual indicator.

Each subindex score (I) for any given indicator (j) on any given day (t) is calculated by the function in the equation below, based on the following parameters:
* The maximum value of the indicator ($N_{j}$)
* Whether that indicator has a flag ($F_{j}=1$ if the indicator has a flag variable and $F_{j}=0$ if it does not)
* The recorded policy value on the ordinal scale ($v_{j,t}$)
* The recorded binary flag for that indicator ($f_{j,t}$)

This normalizes the different ordinal scales to produce a subindex score between 0 and 100, where each full point on the ordinal scale is equally spaced. For indicators that do have a flag variable, if this flag is recorded as 0 (so measures are targeted) this is treated as a half-step between ordinal values.

$I_{j,t} = 100 \frac{v_{j,t} - 0.5(F_{j}-f_{j,t})}{N_{j}}$

N.B. The database only contains flag values if the indicator has a non-zero value. If a government has no policy for a given indicator (that is, the indicator equals zero), the corresponding flag is blank/null in the database. For the purpose of calculating the index, this is equivalent to a subindex score of zero. In other words, $I_{j,t}=0$ if $v_{j,t}=0$ (that is, if $v_{j,t}=0$, the term $F_{j}-f_{j,t}$ is also treated as 0).
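
The formula and the zero-value rule above can be sketched in Python as follows. The function name `sub_index` and its parameter names are illustrative choices, not part of the OxCGRT code base, and requiring an explicit flag for a non-zero value on a flagged indicator is an assumption of this sketch.

```python
from typing import Optional


def sub_index(value: Optional[float], max_value: int, has_flag: bool,
              flag: Optional[int] = None) -> float:
    """Sub-index score I_{j,t} on a 0-100 scale (hypothetical helper).

    value     -> v_{j,t}, the recorded ordinal value (None if missing)
    max_value -> N_j, the maximum of the ordinal scale
    has_flag  -> F_j, whether the indicator has a flag variable
    flag      -> f_{j,t}: 1 = general, 0 = targeted (None if blank)
    """
    # A missing value or "no policy" (v = 0) scores 0, whatever the flag.
    if value is None or value == 0:
        return 0.0
    if not has_flag:
        # F_j = 0 and no flag is recorded, so F_j - f_{j,t} = 0.
        return 100 * value / max_value
    if flag not in (0, 1):
        # Assumption of this sketch: a non-zero value needs an explicit flag.
        raise ValueError("flag must be 0 (targeted) or 1 (general)")
    # A targeted policy (flag = 0) loses half a point before rescaling.
    return 100 * (value - 0.5 * (1 - flag)) / max_value


print(sub_index(2, 3, True, flag=0))  # 50.0
print(sub_index(3, 3, True, flag=1))  # 100.0
print(sub_index(4, 4, False))         # 100.0
```

For instance, a targeted policy at level 2 on a 0-3 scale scores 100 × 1.5 / 3 = 50, halfway between the general level-1 and level-2 scores.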

The Stringency Index is built from the following nine indicators:

- **C - CONTAINMENT AND CLOSURE POLICIES**

    * **C1 - School closing**: it records closings of schools and universities. This indicator uses ordinal values and ranks the policies on a simple numerical scale that goes from 0 (no measures) to 3 (require closing all levels). It has an additional flag variable that can be either 0 (targeted - valid for individual school districts) or 1 (general - all schools in a jurisdiction are closed).
    * **C2 - Workplace closing**: it records closings of workplaces. Its values can go from 0 (no measures) to 3 (require closing (or work from home) for all-but-essential workplaces (e.g. grocery stores, doctors)). It has an additional flag variable that can be either 0 (targeted) or 1 (general).
    * **C3 - Cancel public events**: it records cancelling public events. Its values can go from 0 (no measures) to 2 (require cancelling). It has an additional flag variable that can be either 0 (targeted) or 1 (general).
    * **C4 - Restrictions on gatherings**: it records limits on gatherings. Its values can go from 0 (no restrictions) to 4 (restrictions on gatherings of 10 people or fewer). It has an additional flag variable that can be either 0 (targeted) or 1 (general).
    * **C5 - Close public transport**: it records closing of public transport. Its values can go from 0 (no measures) to 2 (require closing (or prohibit most citizens from using it)). It has an additional flag variable that can be either 0 (targeted) or 1 (general).
    * **C6 - Stay at home requirements**: it records orders to "shelter-in-place" and otherwise confine to the home. Its values can go from 0 (no measures) to 3 (require not leaving house with minimal exceptions (eg allowed to leave once a week, or only one person can leave at a time, etc)). It has an additional flag variable that can be either 0 (targeted) or 1 (general).
    * **C7 - Restrictions on internal movement**: it records restrictions on internal movement between cities/regions. Its values can go from 0 (no measures) to 2 (internal movement restrictions in place). It has an additional flag variable that can be either 0 (targeted) or 1 (general).
    * **C8 - International travel controls**: it records restrictions on international travel (for foreign travellers, not citizens). Its values can go from 0 (no measures) to 4 (ban on all regions or total border closure). Unlike the other indicators listed here, it has no flag variable in the OxCGRT codebook.

- **H - HEALTH SYSTEM POLICIES**

    * **H1 - Public information campaigns**: it records presence of public info campaigns. Its values can go from 0 (no Covid-19 public information campaign) to 2 (coordinated public information campaign (eg across traditional and social media)). It has an additional flag variable that can be either 0 (targeted) or 1 (general).
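
Putting the nine indicators together with equation (1), the Stringency Index for one country on one day could be computed roughly as below. This is a minimal sketch: the names `STRINGENCY_COMPONENTS` and `stringency_index` are hypothetical, the maxima come from the list above, C8 is marked as having no flag (as in the OxCGRT codebook), and treating a missing flag on a non-zero value as general is an assumption made here for simplicity.

```python
# (maximum ordinal value N_j, has a flag variable F_j) for each component
STRINGENCY_COMPONENTS = {
    "C1": (3, True),
    "C2": (3, True),
    "C3": (2, True),
    "C4": (4, True),
    "C5": (2, True),
    "C6": (3, True),
    "C7": (2, True),
    "C8": (4, False),  # no flag variable in the OxCGRT codebook
    "H1": (2, True),
}


def stringency_index(values, flags):
    """Average the nine sub-index scores, as in equation (1).

    values: dict code -> ordinal value (missing, None or 0 scores 0)
    flags:  dict code -> 1 (general) or 0 (targeted)
    """
    total = 0.0
    for code, (max_value, has_flag) in STRINGENCY_COMPONENTS.items():
        value = values.get(code)
        if not value:  # None or 0: the sub-index is 0
            continue
        # Assumption: a missing flag on a non-zero value counts as general (1).
        penalty = 0.5 * (1 - flags.get(code, 1)) if has_flag else 0.0
        total += 100 * (value - penalty) / max_value
    return total / len(STRINGENCY_COMPONENTS)


# Every indicator at its maximum, applied generally
all_max = {code: n for code, (n, _) in STRINGENCY_COMPONENTS.items()}
all_general = {code: 1 for code in STRINGENCY_COMPONENTS}
print(stringency_index(all_max, all_general))  # 100.0
```

Because missing indicators still count in the denominator, a country with a single targeted level-2 school closure and nothing else would score 50 / 9, about 5.56.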

The API outputs two values for the stringency index: the actual index, `stringency_actual`, which is the calculated value (or `null` if the index has been rejected for that date because of insufficient data), and a ‘smoothed’ value, `stringency`.

The smoothed value differs from the actual value only within the most recent week, where it is set to the most recent valid index (if there is one, else `null`). The motivation for this is to provide a stabilized value for display purposes.
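
One possible reading of this rule is sketched below. The function `display_value` is hypothetical, and the 7-day `window` is my interpretation of "the past week"; the API's exact implementation is not documented here.

```python
def display_value(actual_values, window=7):
    """Return the most recent non-null index among the last `window` days.

    actual_values: daily `stringency_actual` values ordered from oldest to
    newest, ending at the date of interest (None marks a rejected index).
    """
    for value in reversed(actual_values[-window:]):
        if value is not None:
            return value
    return None  # no valid index within the window


print(display_value([40.0, 45.0, None, None]))  # 45.0
print(display_value([50.0] + [None] * 7))       # None
```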

The API also outputs a `stringency_legacy` index that approximates the logic of the former version of the Stringency Index, which only had seven components. This legacy indicator should only be used for continuity purposes.
The legacy indicator is calculated through the logic above, but only uses seven of the nine indicators.
Specifically, it chooses between C3 and C4, and between C6 and C7, selecting from each pair whichever indicator provides the higher sub-index score.
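
As a rough illustration of the pair selection, the sketch below takes already-computed sub-index scores and averages seven components. The function name `legacy_stringency` and the dictionary layout are assumptions made for this example, not the API's own code.

```python
def legacy_stringency(sub_scores):
    """Legacy index from a dict mapping indicator code -> sub-index (0-100)."""
    components = [
        sub_scores["C1"],
        sub_scores["C2"],
        max(sub_scores["C3"], sub_scores["C4"]),  # higher of C3 and C4
        sub_scores["C5"],
        max(sub_scores["C6"], sub_scores["C7"]),  # higher of C6 and C7
        sub_scores["C8"],
        sub_scores["H1"],
    ]
    return sum(components) / len(components)  # seven components


scores = {"C1": 0, "C2": 0, "C3": 50, "C4": 100, "C5": 0,
          "C6": 0, "C7": 100, "C8": 0, "H1": 0}
print(round(legacy_stringency(scores), 2))  # 28.57
```

Here C4 and C7 are selected from their pairs, so the result is (100 + 100) / 7.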

# Bibliography

* https://www.nature.com/articles/s41562-021-01079-8#Sec6
* https://ourworldindata.org/metrics-explained-covid19-stringency-index
* https://www.bsg.ox.ac.uk/research/research-projects/covid-19-government-response-tracker
* https://github.com/OxCGRT/covid-policy-tracker/blob/master/documentation/index_methodology.md
* https://github.com/OxCGRT/covid-policy-tracker/blob/master/documentation/codebook.md
* https://github.com/OxCGRT/covid-policy-tracker/blob/master/documentation/interpretation_guide.md
* https://www.bsg.ox.ac.uk/sites/default/files/2022-04/BSG-WP-2020-032-v13.pdf
* https://www.bsg.ox.ac.uk/sites/default/files/Calculation%20and%20presentation%20of%20the%20Stringency%20Index.pdf

### CODE FOR DATA RETRIEVAL

#### Necessary Libraries

In [1]:
import requests
import pandas as pd

#### Choose the time period and set the request URL

Input the desired start date and end date in the format `YYYY-MM-DD`.

To match the transactions dataset, we need start date `2020-01-01` and end date `2021-06-21`.

In [2]:
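# Read the period boundaries from the user (format: YYYY-MM-DD)
start_date = input('Set the start date(YYYY-MM-DD): ')
end_date = input('Set the end date(YYYY-MM-DD): ')

# Build the request URL for the chosen date range
url = f'https://covidtrackerapi.bsg.ox.ac.uk/api/v2/stringency/date-range/{start_date}/{end_date}'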

Set the start date(YYYY-MM-DD): 2020-01-01
Set the end date(YYYY-MM-DD): 2021-06-21
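
Since the dates are typed in by hand, it can help to validate them before sending the request. The helper below is a hedged sketch: `build_url` and `BASE_URL` are names introduced here for illustration, and the endpoint is the one used in the cell above.

```python
from datetime import date

BASE_URL = "https://covidtrackerapi.bsg.ox.ac.uk/api/v2/stringency/date-range"


def build_url(start_date: str, end_date: str) -> str:
    """Validate two YYYY-MM-DD strings and return the request URL."""
    # date.fromisoformat raises ValueError on malformed or impossible dates
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    if start > end:
        raise ValueError("start_date must not be after end_date")
    return f"{BASE_URL}/{start.isoformat()}/{end.isoformat()}"


print(build_url("2020-01-01", "2021-06-21"))
```

A mistyped date such as `2020-13-01` then fails immediately with a `ValueError` instead of producing an unexpected API response.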


#### Data Request

In [4]:
response = requests.get(url)
print(response.status_code)
# Stop here if the request failed, before trying to parse the body
response.raise_for_status()
my_data = response.json()

200


#### Get dictionary keys

The data are stored under the `data` key of the response, in a dictionary whose keys are the dates of the chosen period. We collect these keys in a list called `dates`, which the next step uses to extract the corresponding values from the JSON before converting them into a DataFrame.

We also store the list of all country codes, found under the `countries` key, for use in later steps.

In [5]:
# Dates of the chosen period (keys of the "data" dictionary)
dates = list(my_data["data"].keys())
# Country codes covered by the response
countries = my_data["countries"]

#### Data split by date

Now we use the dates list to get the corresponding values, which are actually nested dictionaries: doing this we create a list of dictionaries, each having the data of just one day for all countries.

In [6]:
# One dictionary per date, each holding that day's data for all countries
daily_json = [my_data["data"][d] for d in dates]

#### Creating dataframe

We first convert the first element of the `daily_json` list into a DataFrame, to which we will then append the data of the following elements to build a single dataset.
Each daily dictionary is keyed by country code, so pandas initially places the countries in the columns; the DataFrame is therefore transposed so that `date_value`, `country_code`, `confirmed`, `deaths`, etc. become the column names.

In [7]:
df = pd.DataFrame(daily_json[0]).transpose()
df.head()

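|  | date_value | country_code | confirmed | deaths | stringency_actual | stringency | stringency_legacy | stringency_legacy_disp |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CUB | 2020-01-01 | CUB |  |  | 0 | 0 | 0 | 0 |
| MWI | 2020-01-01 | MWI |  |  | 0 | 0 | 0 | 0 |
| BIH | 2020-01-01 | BIH |  |  | 0 | 0 | 0 | 0 |
| PNG | 2020-01-01 | PNG |  |  | 0 | 0 | 0 | 0 |
| PRI | 2020-01-01 | PRI |  |  | 0 | 0 | 0 | 0 |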


#### Import all data into the created dataframe

Starting from the second element of the `daily_json` list, we convert each JSON element into a DataFrame and append it to the end of the one we have already created.

The resulting DataFrame has the country code as index: as we already have the country code in the dataset as a column, we can drop the index to avoid redundancy.

In [8]:
for i in range(1, len(daily_json)):
    dataframe = pd.DataFrame(daily_json[i]).transpose()
    df = pd.concat([df, dataframe])

# reset_index returns a new DataFrame, so the result must be assigned back
df = df.reset_index(drop=True)
df

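|  | date_value | country_code | confirmed | deaths | stringency_actual | stringency | stringency_legacy | stringency_legacy_disp |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2020-01-01 | CUB |  |  | 0 | 0 | 0 | 0 |
| 1 | 2020-01-01 | MWI |  |  | 0 | 0 | 0 | 0 |
| 2 | 2020-01-01 | BIH |  |  | 0 | 0 | 0 | 0 |
| 3 | 2020-01-01 | PNG |  |  | 0 | 0 | 0 | 0 |
| 4 | 2020-01-01 | PRI |  |  | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 99525 | 2021-06-21 | IRL | 269321 | 4979 | 50 | 50 | 63.81 | 63.81 |
| 99526 | 2021-06-21 | BTN | 1939 | 1 | 65.28 | 65.28 | 69.76 | 69.76 |
| 99527 | 2021-06-21 | MNG | 95819 | 459 | 59.26 | 59.26 | 60.71 | 60.71 |
| 99528 | 2021-06-21 | EST | 130818 | 1268 | 35.19 | 35.19 | 45.24 | 45.24 |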
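
Concatenating inside a loop copies the growing DataFrame at every step. As an alternative, the nested dictionary can be flattened into a plain list of records in a single pass. The sketch below uses only the standard library; `flatten_records` is a hypothetical helper, the sample data are made up, and it assumes each record already carries its own `date_value` and `country_code`, as the output above suggests.

```python
def flatten_records(data):
    """Flatten {date: {country_code: record}} into a list of records."""
    rows = []
    for by_country in data.values():      # one dictionary per date
        rows.extend(by_country.values())  # one record per country
    return rows


# Tiny made-up sample with the same nesting as my_data["data"]
sample = {
    "2020-01-01": {
        "CUB": {"date_value": "2020-01-01", "country_code": "CUB", "stringency": 0},
        "MWI": {"date_value": "2020-01-01", "country_code": "MWI", "stringency": 0},
    },
    "2020-01-02": {
        "CUB": {"date_value": "2020-01-02", "country_code": "CUB", "stringency": 0},
    },
}
rows = flatten_records(sample)
print(len(rows))  # 3
```

Passing the result to `pd.DataFrame(flatten_records(my_data["data"]))` should yield the same rows with a default integer index, without the transpose and concatenation steps.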


#### DataFrame check

To check whether all the data have been converted and combined into the single DataFrame, we can calculate the number of days in the given period (total number of dates) and the number of countries present in the dataset (which should always be equal to 185).

If the number of dates multiplied by the number of countries is equal to the total number of rows, we can be reasonably confident that all the data have been loaded.

In [9]:
#total number of dates
dates_number=len(dates)
print(dates_number)

#total number of countries
countries_number=len(countries)
print(countries_number)

#total number of rows
total_rows=len(df.index)
print(total_rows)

538
185
99530


In [10]:
check_value= dates_number*countries_number
is_check_ok= check_value==total_rows
print(is_check_ok)

True
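
If the check ever prints `False`, the row count alone does not say which entries are missing. A small sketch of a more specific check follows; `missing_pairs` is a hypothetical helper and the sample rows are invented for illustration.

```python
from itertools import product


def missing_pairs(rows, dates, countries):
    """Return the (date, country_code) combinations absent from `rows`."""
    seen = {(row["date_value"], row["country_code"]) for row in rows}
    return [pair for pair in product(dates, countries) if pair not in seen]


sample_rows = [
    {"date_value": "2020-01-01", "country_code": "CUB"},
    {"date_value": "2020-01-01", "country_code": "MWI"},
    {"date_value": "2020-01-02", "country_code": "CUB"},
]
print(missing_pairs(sample_rows, ["2020-01-01", "2020-01-02"], ["CUB", "MWI"]))
# [('2020-01-02', 'MWI')]
```

With the notebook's variables, the rows could be obtained with `df.to_dict("records")` and compared against `dates` and `countries`.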


#### Export raw data into csv

If the row check is OK, we can now export the DataFrame to a CSV file whose name includes the start and end dates.

In [13]:
# A forward slash avoids the invalid '\R' escape sequence and works on every OS
df.to_csv(f'data/Raw_data_COVID19API_{start_date}_{end_date}.csv', na_rep='na', index=False)
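
The export fails if the `data` folder does not exist yet. One way to guard against that is to build the path with `pathlib` and create the folder first. This is only a sketch: `output_path` is a hypothetical helper, and the file name pattern is copied from the cell above.

```python
from pathlib import Path


def output_path(start_date, end_date, folder="data"):
    """Create `folder` if needed and return the CSV path for the period."""
    directory = Path(folder)
    directory.mkdir(parents=True, exist_ok=True)  # no error if it exists
    return directory / f"Raw_data_COVID19API_{start_date}_{end_date}.csv"
```

The returned path can be passed directly to `df.to_csv(...)` in place of the hand-built string.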