According to the ACLED history:<br>
"In 2019, ACLED introduced new event and sub-event types to improve the project’s core methodology. By 2020,
the project expanded geographic coverage to Europe, Central Asia and the Caucasus, East Asia, Latin America
and the Caribbean, and the United States. In 2022, ACLED completed a final geographic expansion to Canada,
Oceania, Antarctica, and all remaining small states and territories."

This means that events in Canada and in Oceania (Australia, New Zealand, etc.) are unlikely to have entries with <code>event_date</code> values before 2022.
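One way to check this coverage assumption on loaded data is to look at the earliest <code>event_date</code> recorded for each country. The following is a minimal sketch on a small made-up frame: the column names <code>country</code> and <code>event_date</code> are assumed to match the ACLED export, and the rows are invented purely for illustration.

```python
import pandas as pd

# Toy rows standing in for ACLED data; the values are invented for illustration.
sample = pd.DataFrame({
    'country': ['Canada', 'Canada', 'Italy', 'Australia'],
    'event_date': ['2022-03-01', '2022-01-15', '2020-02-24', '2022-02-10'],
})
sample['event_date'] = pd.to_datetime(sample['event_date'])

# Earliest recorded event per country.
first_seen = sample.groupby('country')['event_date'].min()
print(first_seen)
```

Running the same <code>groupby</code> on the real dataframe would show directly which countries only start appearing in 2022.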

In [3]:
import pandas as pd
from ipynb.fs.full.data_pipeline import get_url, get_acled_dataframe

Our project focuses on COVID-19-related events, but the ACLED dataset covers many other kinds of events, so we filter the data to keep only the events related to COVID-19. We do this by adding filters to the ACLED API requests. We assume that any COVID-19-related event mentions the disease in the <code>notes</code> column, so we request rows whose <code>notes</code> contain the string 'coronavirus' and, in a second request, rows whose <code>notes</code> contain 'COVID-19'.

In [7]:
url = get_url(limit=1000000, notes='coronavirus')
cor_df = get_acled_dataframe(url)
len(cor_df)

64480

In [8]:
url = get_url(limit=1000000, notes='COVID-19')
cov_df = get_acled_dataframe(url)
len(cov_df)

1929
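How the API matches the search term (for example, whether it ignores case) is not stated here, so it can be worth spot-checking the returned rows locally. A minimal sketch, assuming <code>notes</code> is a string column; the toy rows below are invented and only illustrate a case-insensitive literal match with pandas.

```python
import pandas as pd

# Invented rows; only the 'notes' column matters for this check.
toy = pd.DataFrame({'notes': [
    'Protest over Coronavirus lockdown',
    'Rally about COVID-19 rules',
    'Unrelated clash',
]})

# Literal (non-regex), case-insensitive substring tests.
has_cor = toy['notes'].str.contains('coronavirus', case=False, regex=False)
has_cov = toy['notes'].str.contains('COVID-19', case=False, regex=False)

# Rows mentioning either term.
matches = toy[has_cor | has_cov]
print(len(matches))
```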

Some rows likely contain both strings and were therefore returned by both requests, so we concatenate the two results and drop the duplicate rows.

In [12]:
df_raw = pd.concat([cor_df, cov_df])
df = df_raw.drop_duplicates()

print(f'Dropping {len(df_raw) - len(df)} duplicates.')

Dropping 1303 duplicates.
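Calling <code>drop_duplicates()</code> with no arguments only removes rows that are identical in every column. If the two responses could differ in some column for the same event, deduplicating on an identifier is an alternative. The sketch below assumes that <code>event_id_cnty</code> uniquely identifies an ACLED event; the rows are invented for illustration.

```python
import pandas as pd

# Invented stand-ins for the two query results; 'USA2' appears in both.
cor = pd.DataFrame({
    'event_id_cnty': ['ITA1', 'USA2'],
    'notes': ['coronavirus protest', 'coronavirus and COVID-19 rally'],
})
cov = pd.DataFrame({
    'event_id_cnty': ['USA2', 'CAN3'],
    'notes': ['coronavirus and COVID-19 rally', 'COVID-19 march'],
})

combined = pd.concat([cor, cov], ignore_index=True)

# Keep the first occurrence of each event identifier (assumed unique per event).
deduped = combined.drop_duplicates(subset='event_id_cnty')
print(f'Dropping {len(combined) - len(deduped)} duplicates.')
```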


We then save our filtered dataset as a local CSV file to easily read and use in our analysis.

In [14]:
# index=False keeps the non-unique, post-concat row index out of the file.
df.to_csv('acled_covid19.csv', index=False)
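For later analysis, the saved file can be read back with <code>pd.read_csv</code>. The sketch below round-trips a tiny invented frame through a temporary file rather than the real CSV; the file name is hypothetical, and parsing <code>event_date</code> as a date is an assumption about how the column will be used.

```python
import os
import tempfile

import pandas as pd

# Invented one-row frame standing in for the filtered dataset.
toy = pd.DataFrame({'event_date': ['2020-03-01'], 'notes': ['coronavirus protest']})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'acled_covid19_demo.csv')  # hypothetical file name
    toy.to_csv(path, index=False)
    # parse_dates converts the column to datetime64 while loading.
    loaded = pd.read_csv(path, parse_dates=['event_date'])

print(loaded.dtypes)
```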