# osiris

![img](https://dm2301files.storage.live.com/y4mmRC1xelS6Y6MEqUnZ-k2vjpADHpo6UMZAaZWROunr9-Ml5FYDlZ6WMxCGedy7NDhwDpusZdF5E1oLR5Qn6momydHe7tYUOMwNeFeGW7pUWkBjGPSnZp2sacYWs9IKkose6xjhSySL_v2tbfItRI7T_Pw_Tayhaa2F_vrwW6ucyr6WPa6s9DWH_if9Y5Y3yAU?width=375&height=250&cropmode=none)


osiris is a Python data processing and analysis environment for computational conflict forecasting. It works with very large datasets using graph-based methods, models, and visualization, and is powered by scalable graph databases.

You can use osiris to analyze causal chains and networks of conflict and violence around the world using [automatically-encoded political event data](https://parusanalytics.com/eventdata/papers.dir/Schrodt_Yonamine_NewDirectionsInText.pdf) that is updated in real time by projects like GDELT. This notebook gives an overview of the osiris project and the [GDELT project](https://www.gdeltproject.org/) data that it uses. It shows how to import political event data with osiris, either from the GDELT file server or from Google BigQuery, how to visualize and analyze that data in Python, and how to load it into a TigerGraph graph server instance so that graph-centric queries can efficiently retrieve vertex-edge event data for further analysis.

## Notebook Environment Setup

In [115]:
import os, sys
# Check if running inside Colab or Kaggle
IN_COLAB = 'COLAB_GPU' in os.environ
IN_KAGGLE = 'KAGGLE_KERNEL_RUN_TYPE' in os.environ
IN_HOSTED_NB = IN_COLAB or IN_KAGGLE
# If we're not in a hosted nb env assume we're running Jupyter from the osiris project directory root
OSIRIS_PATH = '..' if not IN_HOSTED_NB else 'osiris'

In [None]:
# Uncomment and run below if you need to install osiris from GitHub e.g. if running inside Colab or Kaggle
# !if [ -d "osiris" ]; then rm -Rf osiris; fi
# !git clone https://github.com/allisterb/osiris --recurse-submodules
# !cd osiris && ./install

In [116]:
# Import the osiris code and set the runtime env. 
sys.path.append(os.path.join(OSIRIS_PATH, 'osiris'))
sys.path.append(os.path.join(OSIRIS_PATH, 'ext'))
from osiris_global import set_runtime_env
set_runtime_env(interactive_nb=True)

## GDELT Event Data

*From the [GDELT project](https://www.gdeltproject.org/) website*:
>The GDELT Project is a realtime network diagram and database of global human society for open research.
![gf](https://www.gdeltproject.org/images/spinningglobe.gif)

>The GDELT Project is an initiative to construct a catalog of human societal-scale behavior and beliefs across all countries of the world, connecting every person, organization, location, count, theme, news source, and event across the planet into a single massive network that captures what's happening around the world, what its context is and who's involved, and how the world is feeling about it, every single day.

The GDELT [event data](http://data.gdeltproject.org/documentation/GDELT-Event_Codebook-V2.0.pdf) contains hundreds of millions of automatically coded events, extracted daily from news stories using NLU methods and models. Each event data row contains the following groups of fields:
1. *Actors*: People, organizations, or states that initiate or are the target of event actions. Actors may have geographic information but not temporal information. An event references two actors, Actor1 and Actor2, although either may be missing in a given row.
2. *Actions*: Codes and other information which describe each event. Actions have both temporal and spatial attributes: an event time plus some geo information like latitude / longitude.  
3. *SourceURL*: a URL that locates the *story* from which the event data was extracted.
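
One way to picture these three groups of fields is as a small record type in which each event is a directed edge from Actor1 to Actor2. The sketch below is only an illustration: the class names, attribute names, and example values are hypothetical and are not the osiris data model or a TigerGraph schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Actor:
    # Hypothetical container for the Actor1* / Actor2* columns
    code: Optional[str]
    name: Optional[str]
    country_code: Optional[str]


@dataclass
class Event:
    # Hypothetical container for one event row
    event_id: int
    actor1: Actor
    actor2: Actor
    event_code: str          # CAMEO code, kept as a string to preserve leading zeros
    sql_date: int            # date in YYYYMMDD form, as in the SQLDATE column
    lat: Optional[float]     # action geo latitude
    lon: Optional[float]     # action geo longitude
    source_url: str


def to_edge(event: Event) -> Tuple[Optional[str], str, Optional[str]]:
    """Return the event as a (source vertex, edge label, target vertex) triple."""
    return (event.actor1.code, event.event_code, event.actor2.code)


# Made-up example values, for illustration only
example = Event(
    event_id=1,
    actor1=Actor("RUS", "RUSSIA", "RUS"),
    actor2=Actor("UKR", "UKRAINE", "UKR"),
    event_code="190",
    sql_date=20220414,
    lat=49.0,
    lon=32.0,
    source_url="https://example.org/story",
)
print(to_edge(example))
```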

osiris can extract data directly from the GDELT file server. The advantage of this method is that you don't need any special credentials or server access (remember, we're interested in *open-source* indicators). All the data is downloaded directly to your client machine or notebook environment.

In [161]:
# Import data directly from GDELT file server
from data.gdelt import DataSource
import pandas as pd
gdelt = DataSource()

In [119]:
# Get event data for a 1 week period
events = gdelt.import_data('events', 'Apr-14-2022', 'Apr-20-2022')

Importing GDELT events data for 7 day(s) from 04-14-2022 to 04-20-2022...


Import GDELT events data:   0%|          | 0/7 [00:00<?, ?day/s]

Importing GDELT events data for 7 day(s) from 04-14-2022 to 04-20-2022 completed in 67.87 s.
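
To make the idea of fetching one file per day concrete, here is a minimal sketch that expands a date range in the same `Apr-14-2022` format into a list of download URLs. It assumes the GDELT 1.0 daily export naming pattern; whether osiris uses this feed or another one is not shown in this notebook, and `daily_export_urls` is a hypothetical helper, not part of osiris. Nothing is downloaded here.

```python
from datetime import datetime, timedelta
from typing import List

# Assumption: GDELT 1.0 daily event export files; osiris may use a different feed.
GDELT_V1_EVENTS_URL = "http://data.gdeltproject.org/events/{day}.export.CSV.zip"


def daily_export_urls(start: str, end: str, fmt: str = "%b-%d-%Y") -> List[str]:
    """Build one export URL per day in the inclusive range [start, end]."""
    first = datetime.strptime(start, fmt).date()
    last = datetime.strptime(end, fmt).date()
    n_days = (last - first).days + 1
    return [
        GDELT_V1_EVENTS_URL.format(day=(first + timedelta(days=i)).strftime("%Y%m%d"))
        for i in range(n_days)
    ]


urls = daily_export_urls("Apr-14-2022", "Apr-20-2022")
print(len(urls), "files, first:", urls[0])
```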


A week's worth of event data in 2022 consists of about 700K events and takes up about 340MB of RAM.

In [120]:
events.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 707186 entries, 0 to 125669
Data columns (total 62 columns):
 #   Column                 Non-Null Count   Dtype  
---  ------                 --------------   -----  
 0   GLOBALEVENTID          707186 non-null  int64  
 1   SQLDATE                707186 non-null  int64  
 2   MonthYear              707186 non-null  int64  
 3   Year                   707186 non-null  int64  
 4   FractionDate           707186 non-null  float64
 5   Actor1Code             640700 non-null  object 
 6   Actor1Name             640700 non-null  object 
 7   Actor1CountryCode      408112 non-null  object 
 8   Actor1KnownGroupCode   9610 non-null    object 
 9   Actor1EthnicCode       3423 non-null    object 
 10  Actor1Religion1Code    10452 non-null   object 
 11  Actor1Religion2Code    2561 non-null    object 
 12  Actor1Type1Code        296023 non-null  object 
 13  Actor1Type2Code        19713 non-null   object 
 14  Actor1Type3Code        495 non-null 
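
The figures above give a rough feel for how the data scales. The back-of-the-envelope calculation below assumes that memory grows linearly with the number of events and that the event rate of this one week holds for a whole year; both are simplifying assumptions, not measurements.

```python
# Figures quoted in this notebook for the week of Apr 14-20, 2022
n_events = 707_186   # rows reported by events.info()
ram_mb = 340         # approximate RAM used by the dataframe
days = 7

# Average footprint of one event row in the dataframe
bytes_per_event = ram_mb * 1024**2 / n_events

# Average number of events coded per day
events_per_day = n_events / days

# Linear extrapolation to a full year held in memory (assumption)
year_gb = ram_mb * 365 / days / 1024

print(f"{bytes_per_event:.0f} bytes per event")
print(f"{events_per_day:,.0f} events per day")
print(f"{year_gb:.1f} GB for one year")
```

Under these assumptions a single year of events needs on the order of 17 GB of RAM as a dataframe, which is one reason to move larger time spans into a graph database rather than keep them in memory.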

In [5]:
events

Unnamed: 0,GLOBALEVENTID,SQLDATE,MonthYear,Year,FractionDate,Actor1Code,Actor1Name,Actor1CountryCode,Actor1KnownGroupCode,Actor1EthnicCode,...,ActionGeo_Type,ActionGeo_FullName,ActionGeo_CountryCode,ActionGeo_ADM1Code,ActionGeo_ADM2Code,ActionGeo_Lat,ActionGeo_Long,ActionGeo_FeatureID,DATEADDED,SOURCEURL
0,1039303078,20210414,202104,2021,2021.2849,CAN,CANADA,CAN,,,...,4,"Port Elgin, Ontario, Canada",CA,CA08,12643,44.4333,-81.38330,-571576,20220414014500,https://www.lakeshoreadvance.com/news/local-ne...
1,1039303079,20210414,202104,2021,2021.2849,CAN,CANADA,CAN,,,...,4,"Port Elgin, Ontario, Canada",CA,CA08,12643,44.4333,-81.38330,-571576,20220414014500,https://www.lakeshoreadvance.com/news/local-ne...
2,1039303080,20210414,202104,2021,2021.2849,CHN,CHINA,CHN,,,...,4,"Shanghai, Shanghai, China",CH,CH23,13243,31.2222,121.45800,-1924465,20220414014500,https://news.yahoo.com/zealand-court-rules-all...
3,1039303081,20210414,202104,2021,2021.2849,CVL,SCIENTIST,,,,...,4,"Paris, France (general), France",FR,FR00,16282,48.8667,2.33333,-1456928,20220414014500,http://www.jordantimes.com/news/features/first...
4,1039303082,20210414,202104,2021,2021.2849,MNCUSAMED,GOOGLE,USA,,,...,2,"California, United States",US,USCA,,36.1700,-119.74600,CA,20220414014500,https://menafn.com/1104016162/Google-to-invest...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
125665,1040383216,20220420,202204,2022,2022.3014,cre,CREE,,,cre,...,0,,,,,,,,20220420234500,https://www.cjvr.com/2022/04/20/first-nations-...
125666,1040383217,20220420,202204,2022,2022.3014,cre,CREE,,,cre,...,0,,,,,,,,20220420234500,https://www.cjvr.com/2022/04/20/first-nations-...
125667,1040383218,20220420,202204,2022,2022.3014,cre,CREE,,,cre,...,0,,,,,,,,20220420234500,https://www.cjvr.com/2022/04/20/first-nations-...
125668,1040383219,20220420,202204,2022,2022.3014,telOPP,TELUGU,,,tel,...,0,,,,,,,,20220420234500,https://www.deccanchronicle.com/nation/politic...


Event data is highly denormalized, with many redundancies for ease of querying, and is coded using a hierarchical coding system called [CAMEO](http://data.gdeltproject.org/documentation/CAMEO.Manual.1.1b3.pdf) (Conflict and Mediation Event Observations).

In [6]:
events[['EventCode', 'CAMEOCodeDescription']]

Unnamed: 0,EventCode,CAMEOCodeDescription
0,012,Make pessimistic comment
1,020,"Appeal, not specified below"
2,0213,Appeal for judicial cooperation
3,043,Host a visit
4,0311,Express intent to cooperate economically
...,...,...
125665,060,"Engage in material cooperation, not spec below"
125666,073,Provide humanitarian aid
125667,090,"Investigate, not specified below"
125668,043,Host a visit
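
The hierarchy in CAMEO codes is positional: the first two digits give the root category, the first three give the base code, and an optional fourth digit refines it further. In the output above, for example, `020` and `0213` both fall under root `02` (appeals). The sketch below splits a code into these levels; `cameo_levels` is a hypothetical helper, not an osiris function. GDELT event data also provides `EventRootCode` and `EventBaseCode` columns with the same information.

```python
from typing import Dict


def cameo_levels(event_code: str) -> Dict[str, str]:
    """Split a CAMEO event code into its root, base, and full levels."""
    code = str(event_code)
    return {"root": code[:2], "base": code[:3], "full": code}


for code in ["012", "0213", "190"]:
    print(code, cameo_levels(code))

# Codes must stay strings: converting to int drops the leading zero
print(int("0213"))
```

Because the levels are prefixes, selecting a whole category is a string prefix match on `EventCode`, which is why the codes are kept as strings rather than integers.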


We can query and filter event data directly using the Pandas dataframe:

In [128]:
# Find all events that were geolocated in Ukraine (GDELT geo fields use FIPS 10-4 country codes, so Ukraine is 'UP')
uka_events = events[(events.ActionGeo_CountryCode == 'UP')]
uka_events

Unnamed: 0,GLOBALEVENTID,SQLDATE,MonthYear,Year,FractionDate,Actor1Code,Actor1Name,Actor1CountryCode,Actor1KnownGroupCode,Actor1EthnicCode,...,ActionGeo_Type,ActionGeo_FullName,ActionGeo_CountryCode,ActionGeo_ADM1Code,ActionGeo_ADM2Code,ActionGeo_Lat,ActionGeo_Long,ActionGeo_FeatureID,DATEADDED,SOURCEURL
126,1039301016,20220414,202204,2022,2022.2849,,,,,,...,4,"Kharkiv, Kharkivs'ka Oblast', Ukraine",UP,UP07,25036,49.9808,36.2527,-1041320,20220414013000,https://www.understandingwar.org/backgrounder/...
127,1039301017,20220414,202204,2022,2022.2849,,,,,,...,4,"Kherson, Khersons'ka Oblast', Ukraine",UP,UP08,28557,46.6558,32.6178,-1041356,20220414013000,https://www.understandingwar.org/backgrounder/...
260,1039301150,20220414,202204,2022,2022.2849,CAN,CANADA,CAN,,,...,1,Ukraine,UP,UP,,49.0000,32.0000,UP,20220414013000,https://www.castanet.net/news/Canada/365990/Se...
276,1039301166,20220414,202204,2022,2022.2849,CAN,CANADA,CAN,,,...,1,Ukraine,UP,UP,,49.0000,32.0000,UP,20220414013000,https://www.castanet.net/news/Canada/365990/Se...
286,1039301176,20220414,202204,2022,2022.2849,CAN,CANADIAN,CAN,,,...,1,Ukraine,UP,UP,,49.0000,32.0000,UP,20220414013000,https://www.castanet.net/news/Canada/365990/Se...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
125374,1040370052,20220420,202204,2022,2022.3014,UKRPTY,UKRAINIAN,UKR,,,...,4,"Kiev, Ukraine (general), Ukraine",UP,UP00,28554,50.4333,30.5167,-1044367,20220420214500,https://www.bignewsnetwork.com/news/272500617/...
125375,1040370053,20220420,202204,2022,2022.3014,UKRREF,UKRAINIAN,UKR,,,...,1,Ukraine,UP,UP,,49.0000,32.0000,UP,20220420214500,https://www.irishmirror.ie/news/irish-news/tao...
125376,1040370054,20220420,202204,2022,2022.3014,UKRREF,UKRAINE,UKR,,,...,1,Ukraine,UP,UP,,49.0000,32.0000,UP,20220420214500,https://www.irishmirror.ie/news/irish-news/tao...
125439,1040370117,20220420,202204,2022,2022.3014,USA,UNITED STATES,USA,,,...,1,Ukraine,UP,UP,,49.0000,32.0000,UP,20220420214500,https://www.oann.com/report-pentagon-monitorin...


So about 50K of the 700K events in this week were coded as happening in Ukraine, which is not surprising given recent events. Many of those relate to the use of military force.

In [134]:
# CAMEO code 190 denotes 'Use conventional military force, not specified below'
uka_events[uka_events.EventCode.str.startswith('190')]

Unnamed: 0,GLOBALEVENTID,SQLDATE,MonthYear,Year,FractionDate,Actor1Code,Actor1Name,Actor1CountryCode,Actor1KnownGroupCode,Actor1EthnicCode,...,ActionGeo_Type,ActionGeo_FullName,ActionGeo_CountryCode,ActionGeo_ADM1Code,ActionGeo_ADM2Code,ActionGeo_Lat,ActionGeo_Long,ActionGeo_FeatureID,DATEADDED,SOURCEURL
568,1039301458,20220414,202204,2022,2022.2849,GOV,GOVERNOR,,,,...,4,"Kharkiv, Kharkivs'ka Oblast', Ukraine",UP,UP07,25036,49.9808,36.2527,-1041320,20220414013000,http://www.koreatimes.co.kr/www/world/2022/04/...
894,1039301784,20220414,202204,2022,2022.2849,RUS,RUSSIAN,RUS,,,...,5,"Kherson Oblast, Khersons'ka Oblast', Ukraine",UP,UP08,28550,46.5000,34.0000,-1041362,20220414013000,https://www.understandingwar.org/backgrounder/...
913,1039301803,20220414,202204,2022,2022.2849,RUS,RUSSIAN,RUS,,,...,4,"Kherson, Khersons'ka Oblast', Ukraine",UP,UP08,28557,46.6558,32.6178,-1041356,20220414013000,https://www.understandingwar.org/backgrounder/...
915,1039301805,20220414,202204,2022,2022.2849,RUS,RUSSIA,RUS,,,...,5,"Kherson Oblast, Khersons'ka Oblast', Ukraine",UP,UP08,28550,46.5000,34.0000,-1041362,20220414013000,https://www.understandingwar.org/backgrounder/...
916,1039301806,20220414,202204,2022,2022.2849,RUS,RUSSIAN,RUS,,,...,4,"Rubizhne, Luhans'ka Oblast', Ukraine",UP,UP14,25090,49.0123,38.3797,-1052568,20220414013000,https://www.understandingwar.org/backgrounder/...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
118569,1040373136,20220420,202204,2022,2022.3014,RUS,RUSSIAN,RUS,,,...,4,"Vadym, Khersons'ka Oblast', Ukraine",UP,UP08,28553,46.1827,33.5971,-1057325,20220420221500,https://www.news8000.com/i/elderly-in-ukraine-...
118650,1040373217,20220420,202204,2022,2022.3014,UKR,UKRAINE,UKR,,,...,4,"Chernihiv, Chernihivs'ka Oblast', Ukraine",UP,UP02,28554,51.5055,31.2849,-1037057,20220420221500,http://www.msn.com/en-us/news/world/a-bomb-sni...
118725,1040373292,20220420,202204,2022,2022.3014,USA,UNITED STATES,USA,,,...,4,"Kyiv, Kyyiv, Misto, Ukraine",UP,UP12,28554,50.4333,30.5167,-1044367,20220420221500,http://www.msn.com/en-us/news/world/as-a-new-u...
123964,1040382951,20220420,202204,2022,2022.3014,UKR,UKRAINIAN,UKR,,,...,1,Ukraine,UP,UP,,49.0000,32.0000,UP,20220420234500,https://www.agassizharrisonobserver.com/news/a...
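
The two filters used above can also be combined into a single boolean mask. The sketch below shows the pattern on a tiny made-up dataframe that reuses three of the GDELT column names; the rows are invented for illustration and are not real events.

```python
import pandas as pd

# Made-up rows using GDELT column names, for illustration only
toy = pd.DataFrame({
    "ActionGeo_CountryCode": ["UP", "UP", "CA", "UP"],
    "EventCode": ["190", "043", "190", "0311"],
    "Actor1CountryCode": ["RUS", "UKR", "CAN", None],
})

# Each condition is a boolean Series aligned with the rows of the dataframe
in_ukraine = toy.ActionGeo_CountryCode == "UP"
military = toy.EventCode.str.startswith("190")

# Combine conditions with & (and) or | (or), then index the dataframe
hits = toy[in_ukraine & military]
print(hits)

# Count how often each event code occurs among the Ukraine rows
code_counts = toy[in_ukraine].EventCode.value_counts()
print(code_counts)
```

Naming the masks keeps each condition readable and lets you reuse them, for example to compare military force events inside and outside one country.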


In [158]:
# Import Folium to plot these military force events on a map
import folium
folium.Map(
    location=[48., 31.], 
    tiles="Stamen Toner",
    zoom_start=6
)

In [162]:
# Create a base map centered on Ukraine
uka_map = folium.Map(
    location=[48., 31.], 
    #tiles="Stamen Toner",
    zoom_start=6
)
# Take a random sample of 100 military force events and add one marker per event
uka_events_sample = uka_events[uka_events.EventCode.str.startswith('190')].sample(n=100)
for r in uka_events_sample.itertuples():
    m = folium.Marker(location=[r.ActionGeo_Lat, r.ActionGeo_Long],
                      icon=folium.Icon(color="red", icon="fire", prefix="glyphicon"),
                      # Tooltip format: actor 1 country -> event code and description -> actor 2 country on date
                      tooltip=f'{r.Actor1CountryCode}->{r.EventCode} {r.CAMEOCodeDescription}->{r.Actor2CountryCode} on {r.SQLDATE}'
                     )
    m.add_to(uka_map)
# Display the map with all markers
uka_map

In [None]:
# Uncomment and run below if running inside Colab and you want to pull env variables from a file called vars.env on your GDrive
# !pip install colab-env --upgrade
# import colab_env