In [4]:
import pandas as pd

df = pd.read_csv('../data/acled_covid19.csv')

We want to find out what actually gets loaded into ACLED. Are there news articles online about COVID-19
demonstrations that are not reflected in ACLED?

The Reuters article below should be in the dataset based on its contents.
https://www.reuters.com/world/europe/protest-against-coronavirus-restrictions-turns-violent-brussels-2021-12-05/

We will attempt to find the event and see if things match up.

In [5]:
df[(df['location'].str.contains('Brussels')) & (df['event_date'] == '2021-12-05')]

Unnamed: 0.1,Unnamed: 0,data_id,iso,event_id_cnty,event_id_no_cnty,event_date,year,time_precision,event_type,sub_event_type,...,location,latitude,longitude,geo_precision,source,source_scale,notes,fatalities,timestamp,iso3
8149,8149,8692936,56,BEL1617,1617,2021-12-05,2021,1,Protests,Peaceful protest,...,Brussels - Schaerbeek,50.8676,4.3738,1,La Capitale,National,"On 5 December 2021, around 200 people staged a...",0,1639520926,BEL
8162,8162,8876360,56,BEL1622,1622,2021-12-05,2021,1,Riots,Violent demonstration,...,Brussels,50.8468,4.3525,1,BRUZZ,Subnational,"On 5 December 2021, around 8,000 people staged...",0,1646327105,BEL


In [6]:
for i in (8692936, 8876360):
    notes = df[df['data_id'] == i]['notes'].iloc[0]
    print(f'data_id {i}:\n{notes}\n')

data_id 8692936:
On 5 December 2021, around 200 people staged a protest at the RTBF broadcasting service in Brussels - Schaerbeek to denounce the role of the media in legitimizing the coronavirus regulations of the Belgian government. [size=around 200]

data_id 8876360:
On 5 December 2021, around 8,000 people staged a demonstration march through Brussels against the coronavirus protection and vaccination policy of the Belgian government. Disaffected firefighters and health personnel, for whom the coronavirus vaccination is mandatory, were also present during the march. Towards the end of the march, rioters confronted police forces at several locations, who responded with water canons and tear gas. 23 people were arrested, and around 20 police officers were injured. [size=around 8,000]



The event appears to be logged under data_id 8876360. Let's check which source ACLED credits for this row and whether it matches the article.

In [7]:
print(f'''Source: {df[df['data_id'] == 8876360]['source'].iloc[0]}''')

Source: BRUZZ


ACLED credits BRUZZ, not Reuters, for this event. To make sure the Reuters article was not logged under a different row, check whether any event on that date lists Reuters as a source.

In [8]:
# Parenthesize each comparison: & binds more tightly than ==
df[(df['source'].str.contains('Reuters')) & (df['event_date'] == '2021-12-05')]

Unnamed: 0.1,Unnamed: 0,data_id,iso,event_id_cnty,event_id_no_cnty,event_date,year,time_precision,event_type,sub_event_type,...,location,latitude,longitude,geo_precision,source,source_scale,notes,fatalities,timestamp,iso3


Next, confirm that Reuters is credited as a source anywhere in the dataset.

In [9]:
df[df['source'].str.contains('Reuters')]

Unnamed: 0.1,Unnamed: 0,data_id,iso,event_id_cnty,event_id_no_cnty,event_date,year,time_precision,event_type,sub_event_type,...,location,latitude,longitude,geo_precision,source,source_scale,notes,fatalities,timestamp,iso3
918,918,8984124,840,USA41347,41347,2022-04-01,2022,1,Protests,Peaceful protest,...,Greenbrier,36.4275,-86.8047,1,Reuters,International,"On 1 April 2022, about 100 vehicles affiliated...",0,1649186733,USA
1414,1414,8915380,840,USA40566,40566,2022-03-14,2022,1,Protests,Peaceful protest,...,Hagerstown,39.6428,-77.72,3,News2Share; New York Times; Breitbart; WUSA9; ...,New media-National,"On 14 March 2022, hundreds of trucks and other...",0,1647986195,USA
1610,1610,9424298,840,USA40346,40346,2022-03-06,2022,1,Protests,Peaceful protest,...,Hagerstown,39.6428,-77.72,3,Reuters; Epoch Times; Breitbart; Twitter; News...,New media-National,"On 6 March 2022, hundreds of trucks and other ...",0,1658870533,USA
16272,16272,8810461,840,USA33340,33340,2021-07-31,2021,1,Protests,Peaceful protest,...,Washington DC - National Mall,38.8875,-77.0364,1,Hill; Washington Post; In These Times; Reuters...,Subnational-National,"On 31 July 2021, after an overnight demonstrat...",0,1643756366,USA
16356,16356,8810462,840,USA33410,33410,2021-07-30,2021,1,Protests,Peaceful protest,...,Washington DC - National Mall,38.8875,-77.0364,1,Hill; CBS News; Washington Post; In These Time...,Subnational-National,"On 30 July 2021, and overnight to 31 July, US ...",0,1643756366,USA
18075,18075,8425696,364,IRN10184,10184,2021-07-04,2021,1,Strategic developments,Change to group/activity,...,Tehran,35.6944,51.4215,3,Iranian Student News Agency; Manoto; Al Jazeer...,National-Regional,"Security measures: On 4 July 2021, restriction...",0,1630957877,IRN
29711,29711,7708180,266,GAB283,283,2021-02-18,2021,1,Riots,Violent demonstration,...,Libreville,0.3901,9.4544,1,RFI; AFP; Gabo News; Africa 1; Gabon Actu; Reu...,National-Regional,"On 18 February 2021, residents demonstrated fo...",2,1618489769,GAB
29712,29712,7708181,266,GAB284,284,2021-02-18,2021,1,Riots,Violent demonstration,...,Port Gentil,-0.7167,8.7833,1,Gabon Actu; Africa 1; Reuters,National-Regional,"On 18 February 2021, residents demonstrated wi...",0,1614037121,GAB
29790,29790,7708715,266,GAB282,282,2021-02-17,2021,1,Protests,Peaceful protest,...,Libreville,0.3901,9.4544,1,Reuters,International,"On 17 February 2021, residents demonstrated wi...",0,1618489810,GAB
35405,35405,8867450,826,GBR2309,2309,2020-12-06,2020,1,Protests,Protest with intervention,...,London - Westminster,51.4973,-0.1372,1,Reuters,International,"On 6 December 2020, groups from Indian communi...",0,1646172106,GBR


Reuters is credited for other events, but for whatever reason not for this one. This raises a few questions:
- Why is the Reuters article not listed as an additional source, as happens for multi-sourced events?
- How do the articles and sources behind a multi-sourced event differ from this one?
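One way to start on these questions is to compare how often an outlet is credited alone versus alongside other sources. The sketch below assumes the `source` column separates outlets with `;`, as in the outputs above. The helper names `split_sources` and `source_profile` are hypothetical, and the three sample rows only mirror values shown above rather than the full dataset.

```python
import pandas as pd

def split_sources(source_field):
    """Split a 'source' string such as 'RFI; AFP; Reuters' into a list of outlet names."""
    if not isinstance(source_field, str):
        return []  # missing values (NaN) have no sources
    return [s.strip() for s in source_field.split(';') if s.strip()]

def source_profile(frame, outlet):
    """Count events that credit `outlet` alone versus together with other sources."""
    source_lists = frame['source'].apply(split_sources)
    credited = source_lists[source_lists.apply(lambda names: outlet in names)]
    n_sources = credited.apply(len)
    return {
        'single_source': int((n_sources == 1).sum()),
        'multi_source': int((n_sources > 1).sum()),
    }

# Toy rows mirroring values seen in the outputs above; not the real dataset.
sample = pd.DataFrame({
    'data_id': [7708715, 7708181, 8876360],
    'source': ['Reuters', 'Gabon Actu; Africa 1; Reuters', 'BRUZZ'],
})
profile = source_profile(sample, 'Reuters')
print(profile)
```

Running `source_profile(df, 'Reuters')` on the full dataset would give the same breakdown for all events. Matching on the exact outlet name, rather than a substring, avoids counting similarly named outlets.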

ACLED lists its local data collection partners here:
https://acleddata.com/about-acled/our-partners/local-data-collection-partners/


Let's look at another article.

https://www.theguardian.com/world/2021/nov/19/the-netherlands-rotterdam-police-open-fire-as-covid-protest-turns-violent

The event described in this Guardian article reportedly occurred on 20 November 2021 in Rotterdam. Let's look at events in Rotterdam that occurred in November 2021.

In [15]:
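# Inclusive bounds cover the whole month; string comparison works because event_date is ISO formatted (YYYY-MM-DD)
temp = df[(df['location'].str.contains('Rotterdam')) & (df['event_date'] >= '2021-11-01') & (df['event_date'] <= '2021-11-30')]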
temp

Unnamed: 0.1,Unnamed: 0,data_id,iso,event_id_cnty,event_id_no_cnty,event_date,year,time_precision,event_type,sub_event_type,...,location,latitude,longitude,geo_precision,source,source_scale,notes,fatalities,timestamp,iso3
8947,8947,8885757,528,NLD1343,1343,2021-11-21,2021,1,Riots,Violent demonstration,...,Rotterdam,51.9214,4.4604,1,RTV Rijnmond,Subnational,"On 21 November 2021, hundreds of Feyenoord foo...",0,1646327173,NLD
9116,9116,8875211,528,NLD1323,1323,2021-11-19,2021,1,Riots,Violent demonstration,...,Rotterdam,51.9214,4.4604,1,De Telegraaf; RTV Rijnmond,Subnational-National,"On 19 November 2021, hundreds of people staged...",0,1646327101,NLD
9144,9144,8885737,528,NLD1322,1322,2021-11-18,2021,1,Protests,Peaceful protest,...,Rotterdam,51.9214,4.4604,1,RTV Rijnmond,Subnational,"On 18 November 2021, between 100 and 150 port ...",0,1646327173,NLD


In [17]:
for i in temp['notes']:
    print(i + '\n')

On 21 November 2021, hundreds of Feyenoord football supporters assembled at the Feyenoord stadium in Rotterdam to support their club who was playing at that moment, and to denounce the ban on attending football matches in the context of new coronavirus containment measures of the Dutch government. Several attendants turned violent, throwing fireworks to police forces. 26 people were arrested. [size=hundreds]


On 18 November 2021, between 100 and 150 port workers staged a protest at the Maasvlakte at the Port of Rotterdam, blocking traffic to denounce the coronavirus policy of the Dutch government and the the coronavirus vaccination requirements on the work floor. The protest was an initiative of Dockers United. [size=between 100 and 150]



data_id 8875211 appears to describe the event reported by The Guardian. The Guardian reportedly placed the event on Saturday, while ACLED dates it to Friday, 19 November 2021. Let's check which sources ACLED credits for this event.

In [18]:
print(f'''Source: {df[df['data_id'] == 8875211]['source'].iloc[0]}''')

Source: De Telegraaf; RTV Rijnmond
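Because the reported date and the ACLED date can differ by a day, an exact-date filter may miss the matching row. The sketch below searches a small window around the reported date instead. The helper name `events_near` and its `days` parameter are hypothetical, and the sample rows only reproduce the three Rotterdam events shown above.

```python
import pandas as pd

def events_near(frame, place, reported_date, days=1):
    """Return rows whose location mentions `place` within +/- `days` of `reported_date`."""
    dates = pd.to_datetime(frame['event_date'])
    center = pd.Timestamp(reported_date)
    window = pd.Timedelta(days=days)
    # Series.between is inclusive on both ends by default
    in_window = dates.between(center - window, center + window)
    # regex=False treats `place` as a literal string; na=False skips missing locations
    in_place = frame['location'].str.contains(place, na=False, regex=False)
    return frame[in_place & in_window]

# Toy rows mirroring the Rotterdam events above; not the real dataset.
sample = pd.DataFrame({
    'data_id': [8885757, 8875211, 8885737],
    'event_date': ['2021-11-21', '2021-11-19', '2021-11-18'],
    'location': ['Rotterdam'] * 3,
})
nearby = events_near(sample, 'Rotterdam', '2021-11-20')
print(nearby['data_id'].tolist())
```

With a one-day window around 20 November, the events dated 19 and 21 November are returned and the 18 November protest is excluded.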


Interestingly, the sources do not include The Guardian. Let's check whether The Guardian is credited for any entries in this dataset.

In [28]:
distinct_guardian_sources = list()
for i in list(set(df[df['source'].str.contains('Guardian')]['source'])):
    for j in i.split(';'):
        distinct_guardian_sources.append(j.strip())
        
list(set([i for i in distinct_guardian_sources if 'Guardian' in i]))

['Guardian (United Kingdom)',
 'Guardian (Canada)',
 'Lancaster Guardian',
 'The Nassau Guardian',
 'Tamil Guardian',
 'South Wales Guardian',
 'Guardian (Nigeria)',
 'Mail & Guardian (South Africa)',
 'Guardian (Belize)',
 'Guardian',
 'Trinidad and Tobago Guardian']

The Guardian does in fact have logged events. Together with the Reuters case, this suggests that some selection criterion or shortcoming in data collection causes certain articles to be left out.
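The list of names above does not show how many events each outlet is credited for, and a substring match on "Guardian" also picks up unrelated outlets such as Lancaster Guardian. A minimal sketch for counting events per individual outlet follows. The helper name `outlet_counts` is hypothetical and the sample rows are invented for illustration, assuming outlets are separated by `;` in the `source` column.

```python
import pandas as pd

def outlet_counts(frame, keyword):
    """Count how many times each individual outlet containing `keyword` is credited."""
    outlets = (
        frame['source']
        .dropna()          # ignore rows without a source
        .str.split(';')    # one list of outlets per event
        .explode()         # one row per (event, outlet) pair
        .str.strip()
    )
    matching = outlets[outlets.str.contains(keyword, regex=False)]
    return matching.value_counts()

# Invented rows for illustration only; not taken from the real dataset.
sample = pd.DataFrame({
    'source': [
        'Guardian (United Kingdom); Reuters',
        'Lancaster Guardian',
        'Guardian (United Kingdom)',
        'De Telegraaf; RTV Rijnmond',
    ]
})
counts = outlet_counts(sample, 'Guardian')
print(counts.to_dict())
```

Applied to the full dataset as `outlet_counts(df, 'Guardian')`, this would show which Guardian variants account for most of the logged events.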