# osiris GDELT data

This notebook describes the [GDELT](https://www.gdeltproject.org/) project data that osiris uses and shows how to import it with osiris, either from the GDELT file server or from Google BigQuery.

*From the GDELT website*:
>The GDELT Project is a realtime network diagram and database of global human society for open research.
![gf](https://www.gdeltproject.org/images/spinningglobe.gif)
>The GDELT Project is an initiative to construct a catalog of human societal-scale behavior and beliefs across all countries of the world, connecting every person, organization, location, count, theme, news source, and event across the planet into a single massive network that captures what's happening around the world, what its context is and who's involved, and how the world is feeling about it, every single day.

In [1]:
# Import the osiris code and set the runtime env 
import os, sys
sys.path.append(os.path.join('..', 'osiris'))
sys.path.append(os.path.join('..', 'ext'))
from osiris_global import set_runtime_env
set_runtime_env(debug=False, interactive_nb=True)

In [5]:
# Import data directly from GDELT file server
from data.gdelt import DataSource
gdelt = DataSource()

## GDELT Event data

The GDELT [event data](http://data.gdeltproject.org/documentation/GDELT-Event_Codebook-V2.0.pdf) contains hundreds of millions of events that are automatically extracted and coded from news stories every day.

In [6]:
events = gdelt.import_data('events', 'Nov-02-2021 02:00:00', 'Nov-02-2021 04:00:00')

Importing GDELT events data for 2 hour(s) from 11-02-2021 02:00:00 to 11-02-2021 04:00:00...
Importing GDELT events data for 2 hour(s) from 11-02-2021 02:00:00 to 11-02-2021 04:00:00 completed in 2.63 s.
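
To make the time window above more concrete, here is a minimal sketch of how a start/end pair could map onto the GDELT 2.0 file server, which publishes one update file every 15 minutes named by its timestamp. The helper `gdelt_file_urls`, the half-open window, and the assumption that the start time falls on a 15-minute boundary are illustrative choices; this is not how osiris necessarily resolves the window internally.

```python
from datetime import datetime, timedelta

# Assumed layout of the GDELT 2.0 file server: one file per 15-minute update,
# named by its timestamp (YYYYMMDDHHMMSS).
GDELT_V2_BASE = "http://data.gdeltproject.org/gdeltv2/"
SUFFIXES = {"events": "export.CSV.zip", "gkg": "gkg.csv.zip"}
TIME_FORMAT = "%b-%d-%Y %H:%M:%S"  # matches 'Nov-02-2021 02:00:00'


def gdelt_file_urls(table, start, end):
    """Hypothetical helper: list update-file URLs for the window [start, end)."""
    t = datetime.strptime(start, TIME_FORMAT)
    stop = datetime.strptime(end, TIME_FORMAT)
    urls = []
    while t < stop:
        # Format the timestamp as YYYYMMDDHHMMSS and append the table suffix.
        urls.append(f"{GDELT_V2_BASE}{t:%Y%m%d%H%M%S}.{SUFFIXES[table]}")
        t += timedelta(minutes=15)
    return urls


urls = gdelt_file_urls("events", "Nov-02-2021 02:00:00", "Nov-02-2021 04:00:00")
print(len(urls))
print(urls[0])
```

Treating the end of the window as exclusive keeps adjacent windows from overlapping; an inclusive end would add one more file.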


In [7]:
events.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 916 entries, 0 to 915
Data columns (total 62 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   GLOBALEVENTID          916 non-null    int64  
 1   SQLDATE                916 non-null    int64  
 2   MonthYear              916 non-null    int64  
 3   Year                   916 non-null    int64  
 4   FractionDate           916 non-null    float64
 5   Actor1Code             825 non-null    object 
 6   Actor1Name             825 non-null    object 
 7   Actor1CountryCode      483 non-null    object 
 8   Actor1KnownGroupCode   11 non-null     object 
 9   Actor1EthnicCode       8 non-null      object 
 10  Actor1Religion1Code    7 non-null      object 
 11  Actor1Religion2Code    6 non-null      object 
 12  Actor1Type1Code        460 non-null    object 
 13  Actor1Type2Code        25 non-null     object 
 14  Actor1Type3Code        0 non-null      float64
 15  Actor2
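
The listing above shows `SQLDATE` stored as `int64`. In the GDELT event codebook this field holds the event date as a `YYYYMMDD` integer, so it usually needs converting before any date arithmetic. The sketch below uses a small stand-in frame with illustrative values rather than the real `events` data, and the `EventDate` column name is an assumption introduced here.

```python
import pandas as pd

# Toy stand-in for the `events` frame; values are illustrative, not real GDELT rows.
sample = pd.DataFrame({"SQLDATE": [20211102, 20211101, 20201102]})

# Convert the YYYYMMDD integers to timestamps via their string form.
sample["EventDate"] = pd.to_datetime(sample["SQLDATE"].astype(str), format="%Y%m%d")

print(sample.dtypes)
print(sample)
```

The same conversion should apply to `events["SQLDATE"]` in this notebook, assuming the column holds only valid `YYYYMMDD` values.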

Each event row in the data is highly denormalized and coded using a hierarchical coding system called [CAMEO](http://data.gdeltproject.org/documentation/CAMEO.Manual.1.1b3.pdf) (Conflict and Mediation Event Observations).
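
As a rough illustration of that hierarchy: a CAMEO code's first two digits give its root category and its first three digits give the base event, with an optional fourth digit for the most specific level. The helper `cameo_levels` below is hypothetical and not part of osiris. Codes are handled as strings, since a leading zero (as in `042`) would be lost in an integer.

```python
def cameo_levels(event_code):
    """Split a CAMEO event code string into (root, base, full) levels."""
    code = str(event_code)
    root = code[:2]  # two-digit root category, e.g. '04'
    base = code[:3]  # three-digit base event, e.g. '042'
    return root, base, code


for code in ["042", "190", "1724"]:
    print(code, cameo_levels(code))
```

For `1724` this yields root `17`, base `172`, and the full code `1724`. The GDELT event codebook also defines `EventRootCode` and `EventBaseCode` fields that carry these levels directly.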

In [8]:
events[['EventCode', 'CAMEOCodeDescription']]

|     | EventCode | CAMEOCodeDescription |
| --- | --- | --- |
| 0 | 190 | Use conventional military force, not specifie... |
| 1 | 010 | Make statement, not specified below |
| 2 | 042 | Make a visit |
| 3 | 1724 | Impose state of emergency or martial law |
| 4 | 043 | Host a visit |
| ... | ... | ... |
| 911 | 042 | Make a visit |
| 912 | 042 | Make a visit |
| 913 | 051 | Praise or endorse |
| 914 | 042 | Make a visit |


In [None]:
gkg = gdelt.import_data('gkg', 'Nov-02-2021 02:00:00', 'Nov-02-2021 04:00:00')

In [None]:
gkg.info()

In [None]:
gkg.head(100)