Because we want to examine the data in the context of its usage, our focus is on how ACLED uses information from its sources to tag events. One way to approach this is to check whether ACLED relies on sources that report on more than one country. The idea is that each event could then be tagged according to whether its source is known to report on multiple countries.

In [1]:
import pandas as pd

from pandasql import sqldf

sql = lambda q: sqldf(q, globals())

base_df = pd.read_csv('../data/acled_covid19.csv')
extd_df = pd.read_csv('../data/EXPANDED_acled_covid19.csv')

Here we count how many distinct countries each source in the ACLED dataset reports on.

In [2]:
df = sql('''
SELECT source_singular, count(*) country_count FROM
    (SELECT DISTINCT source_singular, country
    FROM extd_df) a
    GROUP BY source_singular
''')

print(f'{len(df[df.country_count > 1])} out of {len(df)} sources report on multiple countries.')
df[df.country_count > 1]

297 out of 5362 sources report on multiple countries.


Unnamed: 0,source_singular,country_count
26,168 Hours,2
33,20 Minutes,2
42,24 Heures,3
44,24 Horas,2
79,7 Sur 7,2
...,...,...
5276,YNA,2
5279,Yabiladi,3
5281,Yahoo News,4
5302,Ynet,2
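
As a cross-check on the SQL above, the same count can be sketched in plain pandas. This is a minimal sketch on a small hypothetical frame (`toy_df`) that only borrows the `source_singular` and `country` column names from `extd_df`; the rows are made up for illustration. One assumption to keep in mind: `nunique()` ignores missing countries by default, whereas the `SELECT DISTINCT` subquery would count a NULL country as its own row.

```python
import pandas as pd

# Hypothetical toy rows standing in for extd_df (one row per event-source pair).
toy_df = pd.DataFrame({
    "source_singular": ["YNA", "YNA", "YNA", "Ynet", "Ynet", "HK01"],
    "country": ["South Korea", "North Korea", "South Korea",
                "Israel", "Palestine", "China"],
})

# Distinct countries per source. NaN countries are skipped by nunique().
country_counts = (
    toy_df.groupby("source_singular")["country"]
    .nunique()
    .rename("country_count")
    .reset_index()
)

# Keep only the sources that appear in more than one country.
multi_country = country_counts[country_counts.country_count > 1]
print(f"{len(multi_country)} out of {len(country_counts)} sources report on multiple countries.")
```

If the two approaches disagree on the real data, missing values in `country` would be the first thing to check.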


We can see that some sources do in fact report on multiple countries. Next, let's try to figure out whether these countries are geographically close to each other. As a rough proxy, we can check whether the same sources are also reporting on multiple regions.

In [3]:
multi_cntry_src_df = df[df.country_count > 1]

df2 = sql('''
SELECT source_singular, count(*) region_count FROM
    (SELECT DISTINCT main.source_singular, main.region
    FROM extd_df main
    JOIN multi_cntry_src_df m ON m.source_singular = main.source_singular
    ) a
    GROUP BY source_singular
''')

print(f'{len(df2[df2.region_count > 1])} out of {len(multi_cntry_src_df)} sources that report on multiple countries are also reporting in multiple regions.')
df2[df2.region_count > 1]

96 out of 297 sources that report on multiple countries are also reporting in multiple regions.


Unnamed: 0,source_singular,region_count
1,20 Minutes,2
2,24 Heures,2
3,24 Horas,2
4,7 Sur 7,2
6,AFP,13
...,...,...
289,World Socialist Web Site,2
291,Xinhua,7
293,Yabiladi,2
294,Yahoo News,3


In [4]:
count_df = sql('''
SELECT a.source_singular, country_count, region_count
FROM df a JOIN df2 b ON a.source_singular = b.source_singular
''')
count_df

Unnamed: 0,source_singular,country_count,region_count
0,168 Hours,2,1
1,20 Minutes,2,2
2,24 Heures,3,2
3,24 Horas,2,2
4,7 Sur 7,2,2
...,...,...,...
292,YNA,2,1
293,Yabiladi,3,2
294,Yahoo News,4,3
295,Ynet,2,1
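
The country and region counts above are built in separate queries and then joined. As an alternative, both can be sketched in a single pandas pass with named aggregation. The frame below (`toy_df`) is hypothetical and only reuses the column names `source_singular`, `country`, and `region`; it is not the real ACLED data.

```python
import pandas as pd

# Hypothetical toy rows; column names mirror extd_df, values are illustrative.
toy_df = pd.DataFrame({
    "source_singular": ["Yabiladi", "Yabiladi", "Yabiladi", "YNA", "YNA", "HK01"],
    "country": ["Morocco", "Spain", "Algeria", "South Korea", "North Korea", "China"],
    "region": ["Northern Africa", "Europe", "Northern Africa",
               "East Asia", "East Asia", "East Asia"],
})

# One groupby computes both distinct counts per source.
counts = (
    toy_df.groupby("source_singular")
    .agg(country_count=("country", "nunique"),
         region_count=("region", "nunique"))
    .reset_index()
)

# Restrict to multi-country sources, as count_df does through its join.
counts = counts[counts.country_count > 1].reset_index(drop=True)
print(counts)
```

A source with `country_count > 1` but `region_count == 1` is one whose countries all fall in the same region, which is the case we read as "geographically close".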


In [5]:
src_ctry_df = sql('''

    SELECT source_singular, group_concat(country) countries FROM (
        SELECT DISTINCT a.source_singular, a.country FROM extd_df a
        JOIN count_df b
            ON a.source_singular = b.source_singular
        ) a
    GROUP BY a.source_singular
''')
src_ctry_df

Unnamed: 0,source_singular,countries
0,168 Hours,"Azerbaijan,Armenia"
1,20 Minutes,"France,Martinique"
2,24 Heures,"France,Switzerland,Canada"
3,24 Horas,"Mexico,Angola"
4,7 Sur 7,"Belgium,Democratic Republic of Congo"
...,...,...
292,YNA,"South Korea,North Korea"
293,Yabiladi,"Morocco,Spain,Algeria"
294,Yahoo News,"United States,Japan,Sri Lanka,India"
295,Ynet,"Israel,Palestine"


In [6]:
src_rgn_df = sql('''
    SELECT source_singular, group_concat(region) regions FROM (
        SELECT DISTINCT a.source_singular, a.region FROM extd_df a
        JOIN count_df b
            ON a.source_singular = b.source_singular
        WHERE b.region_count > 1
        ) a
    GROUP BY a.source_singular
''')
src_rgn_df

Unnamed: 0,source_singular,regions
0,20 Minutes,"Caribbean,Europe"
1,24 Heures,"Europe,North America"
2,24 Horas,"Middle Africa,North America"
3,7 Sur 7,"Europe,Middle Africa"
4,AFP,"Caribbean,Caucasus and Central Asia,Central Am..."
...,...,...
91,World Socialist Web Site,"North America,South Asia"
92,Xinhua,"Caucasus and Central Asia,East Asia,Europe,Mid..."
93,Yabiladi,"Europe,Northern Africa"
94,Yahoo News,"East Asia,North America,South Asia"


In [7]:
sql('''
    SELECT source_singular,countries FROM (
        SELECT source_singular, group_concat(country) countries FROM (
            SELECT DISTINCT a.source_singular, a.country FROM extd_df a
            JOIN count_df b
                ON a.source_singular = b.source_singular
            ) a
        GROUP BY a.source_singular
        ) b
        WHERE countries LIKE '%United States%'
''')

Unnamed: 0,source_singular,countries
0,AP,"North Korea,Tonga,Guadeloupe,Romania,Greece,Uk..."
1,Albanian Daily News,"Albania,United States"
2,BBC News,"Iran,United Kingdom,Guatemala,Cuba,India,Leban..."
3,CITY News,"Canada,United States"
4,CNBC,"Indonesia,United States"
5,CNN,"United States,Philippines,Israel"
6,Daily Kos,"Canada,United States"
7,Daily Mail,"United States,United Kingdom"
8,Daily Mail (United Kingdom),"United States,United Kingdom"
9,Dominion Post,"New Zealand,United States"


In [9]:
# QUESTION
# Should we be answering the question "The source that ACLED used for this event ________"?
#
#     ex: "The source ACLED used for information on this event is the most commonly used source associated with violent events"
#         "The source ACLED used for information on this event has not been used since DD-MM-YYYY"
# Inspect the base data, including the `source` column.
base_df

Unnamed: 0.1,Unnamed: 0,data_id,iso,event_id_cnty,event_id_no_cnty,event_date,year,time_precision,event_type,sub_event_type,...,location,latitude,longitude,geo_precision,source,source_scale,notes,fatalities,timestamp,iso3
0,0,9498574,862,VEN12964,12964,2022-09-17,2022,1,Protests,Peaceful protest,...,Ciudad Bolivar,8.1292,-63.5409,1,Diario Primicia,Subnational,"On 17 September 2022, in Ciudad Bolivar (Boliv...",0,1664226314,VEN
1,1,9491030,410,KOR25174,25174,2022-09-16,2022,1,Protests,Peaceful protest,...,Seoul City - Seocho,37.4744,127.0304,1,EDaily,National,"On 16 September 2022, members of the All-Korea...",0,1663685720,KOR
2,2,9491038,156,CHN12137,12137,2022-09-15,2022,1,Protests,Peaceful protest,...,Hong Kong - Central and Western,22.2811,114.1598,1,HK01,Subnational,"On 15 September 2022, three representatives of...",0,1663685720,CHN
3,3,9491260,410,KOR25204,25204,2022-09-15,2022,1,Protests,Peaceful protest,...,Seoul City - Yeongdeungpo,37.5223,126.9075,1,YNA,National,"On 15 September 2022, members of the COVID-19 ...",0,1663685720,KOR
4,4,9492137,250,FRA18626,18626,2022-09-15,2022,1,Protests,Peaceful protest,...,Pau,43.2951,-0.3708,1,France Bleu,National,"On 15 September 2022, around 30 opponents of c...",0,1663691322,FRA
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
65101,1922,8873745,380,ITA521,521,2020-02-06,2020,1,Protests,Peaceful protest,...,Como,45.8115,9.0829,1,Varese7press; Qui Como,Subnational,"On 6 February 2020, about 50 people, including...",0,1646327097,ITA
65102,1924,9441128,380,ITA477,477,2020-02-03,2020,1,Protests,Peaceful protest,...,Busto Arsizio,45.6054,8.8434,1,Malpensa24,National,"On 3 February 2020, a group of New Force activ...",0,1660059936,ITA
65103,1925,8873632,380,ITA473,473,2020-02-02,2020,1,Protests,Peaceful protest,...,Milano,45.4613,9.1595,1,Milano Today; La Repubblica,Subnational-National,"On 2 February 2020, around 100 people, most of...",0,1646327097,ITA
65104,1926,9441129,380,ITA468,468,2020-02-02,2020,1,Protests,Peaceful protest,...,Brescia,45.5354,10.2236,1,Rai News,National,"On 2 February 2020, an unspecified number of N...",0,1660059936,ITA
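
One possible starting point for the question above is a per-source usage summary: how many events cite each source and when it was last used. This is only a sketch under assumptions. It uses a hypothetical `toy_df` with the `source` and `event_date` column names from `base_df`, and it assumes that multiple sources in one cell are separated by `"; "`, as they appear to be in the output above.

```python
import pandas as pd

# Hypothetical toy rows; column names mirror base_df, values are illustrative.
toy_df = pd.DataFrame({
    "event_date": ["2022-09-15", "2020-02-06", "2020-02-02", "2020-02-02"],
    "source": ["YNA", "Varese7press; Qui Como",
               "Milano Today; La Repubblica", "Rai News"],
})
toy_df["event_date"] = pd.to_datetime(toy_df["event_date"])

# Split multi-source cells (assumed "; " separator) into one row per source.
exploded = (
    toy_df.assign(source_singular=toy_df["source"].str.split("; "))
    .explode("source_singular")
)

# Per source: number of events citing it and the most recent event date.
usage = (
    exploded.groupby("source_singular")["event_date"]
    .agg(event_count="count", last_used="max")
    .sort_values("event_count", ascending=False)
)
print(usage)
```

Filtering `exploded` by `event_type` before grouping would give the same summary for a single kind of event.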
