# osiris GDELT data

This notebook describes the [GDELT](https://www.gdeltproject.org/) project data that osiris uses and shows how to import it with osiris, either from the GDELT file server or from Google BigQuery.

*From the GDELT website*:
>The GDELT Project is a realtime network diagram and database of global human society for open research.
![gf](https://www.gdeltproject.org/images/spinningglobe.gif)
>The GDELT Project is an initiative to construct a catalog of human societal-scale behavior and beliefs across all countries of the world, connecting every person, organization, location, count, theme, news source, and event across the planet into a single massive network that captures what's happening around the world, what its context is and who's involved, and how the world is feeling about it, every single day.

In [1]:
# Import the osiris code and set the runtime env 
import os, sys
sys.path.append(os.path.join('..', 'osiris'))
sys.path.append(os.path.join('..', 'ext'))
from osiris_global import set_runtime_env
set_runtime_env(debug = False, interactive_nb=True)

In [4]:
# Import data directly from GDELT file server
from data.gdelt import DataSource
gdelt = DataSource()

## GDELT Event data

The GDELT [event data](http://data.gdeltproject.org/documentation/GDELT-Event_Codebook-V2.0.pdf) contains hundreds of millions of automatically coded events extracted from news stories daily.

In [None]:
# Import event data for a two-hour window on 2 November 2021
events = gdelt.import_data('events', 'Nov-02-2021 02:00:00', 'Nov-02-2021 04:00:00')

In [None]:
events.info()
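
As a rough illustration of what a time-window import from the file server involves, the sketch below lists the 15-minute update files that would cover the window used above. It is not osiris's actual implementation: the function `update_file_urls`, the suffix mapping, the inclusive end bound, and the URL layout are all assumptions, based on how GDELT 2.0 publishes its update files, and should be checked against the file server's master file list.

```python
from datetime import datetime, timedelta

# Assumed layout of the GDELT 2.0 file server (verify against the master file list).
GDELT_V2_BASE = "http://data.gdeltproject.org/gdeltv2"
# Hypothetical mapping from the table names used in this notebook to file suffixes.
SUFFIXES = {"events": "export.CSV.zip", "gkg": "gkg.csv.zip"}
# Matches timestamps written like 'Nov-02-2021 02:00:00'.
TIME_FORMAT = "%b-%d-%Y %H:%M:%S"


def update_file_urls(table, start, end):
    """List the 15-minute update file URLs covering start..end (end inclusive)."""
    step = timedelta(minutes=15)
    current = datetime.strptime(start, TIME_FORMAT)
    stop = datetime.strptime(end, TIME_FORMAT)
    # Round the start down to the previous 15-minute boundary.
    current = current.replace(
        minute=current.minute - current.minute % 15, second=0, microsecond=0
    )
    urls = []
    while current <= stop:
        urls.append(f"{GDELT_V2_BASE}/{current:%Y%m%d%H%M%S}.{SUFFIXES[table]}")
        current += step
    return urls


urls = update_file_urls("events", "Nov-02-2021 02:00:00", "Nov-02-2021 04:00:00")
print(len(urls), urls[0])
```

Under these assumptions, a two-hour window maps to nine files, one for each 15-minute boundary from 02:00 through 04:00.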

Each event row in the data is highly denormalized and coded using a hierarchical coding system called [CAMEO](http://data.gdeltproject.org/documentation/CAMEO.Manual.1.1b3.pdf) (Conflict and Mediation Event Observations).

In [None]:
events[['EventCode', 'CAMEOCodeDescription']]
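
To make the hierarchy concrete, the sketch below splits a CAMEO event code into its levels. In the GDELT event codebook, the first two digits of a code give the root category and the first three give the base category. The helper `cameo_levels` is hypothetical and not part of osiris. Codes are handled as strings here because they can start with a zero.

```python
def cameo_levels(event_code):
    """Split a CAMEO event code into its root, base, and full levels."""
    code = str(event_code).strip()
    # CAMEO event codes are two to four digits long and may start with '0'.
    if not code.isdigit() or not 2 <= len(code) <= 4:
        raise ValueError(f"not a CAMEO event code: {event_code!r}")
    # A two-digit code is its own base level, since there is nothing below the root.
    return {"root": code[:2], "base": code[:3], "full": code}


# '0251' (appeal for easing of administrative sanctions) sits under
# '025' (appeal to yield), which sits under the root '02' (appeal).
print(cameo_levels("0251"))
```

Grouping rows by the `root` level gives a coarse view of event types, while the `full` code keeps the most specific category.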

In [None]:
# Import GDELT Global Knowledge Graph (GKG) data for the same two-hour window
gkg = gdelt.import_data('gkg', 'Nov-02-2021 02:00:00', 'Nov-02-2021 04:00:00')

In [None]:
gkg.info()

In [None]:
gkg.head(100)

In [2]:
from data.tables import events
eve = events('2022-01-01', maxrows=30000)
print(eve)

Fetching row iterator for table osiris-347701.gdelt_snapshots.events_actions_20220000_20220421...
Fetching row iterator for table osiris-347701.gdelt_snapshots.events_actions_20220000_20220421 completed in 0.00 s.



              ID                      Actor1ID                      Actor2ID  \
0     1029322782  TzDUyfR/lqodWVk8ks3USmv88VM=  XOU985vr8VIlggJ5793XLI9ba84=   
1     1028350424  ubHwXHRJviEVW2vtL0ObeedhrJw=  lMZTtnHvU3s1GlvlvJo5k3+jyMs=   
2     1021550572  l7eim/rHniS2aUx7zMwSffGqHJE=  iu+wbEJuB6CmcaHiSItIWNaUpzA=   
3     1022318261  S7Oqkp1MxndhbgKltH/KITlJaqs=  iu+wbEJuB6CmcaHiSItIWNaUpzA=   
4     1024310578  jOEZ/5w36CLp1du3XMqBH4DB9k4=  iu+wbEJuB6CmcaHiSItIWNaUpzA=   
...          ...                           ...                           ...   
9995  1029197704  DAWq7VcwC5sZeZIqhbqWXbeuK14=  szM+hTrcccbeZXO98H9lv2aPmvI=   
9996  1023368287  wuGYV2nALD0u88o/mteURWYlmIA=  iu+wbEJuB6CmcaHiSItIWNaUpzA=   
9997  1038791987  fmt8ICK6NDXXn12xQjtO0K02YIE=  iu+wbEJuB6CmcaHiSItIWNaUpzA=   
9998  1035826050  d8NXpQNosnWLngixXg1F1C1b+HI=  YVRh8I4t0XfIbSgkQr8Vouh8v+I=   
9999  1022890353  CPKznfipz4SPamwt3F9dDhvtu2o=  8+e5VArriK2GkyX+pXT1x6eqMBY=   

            Date  IsRoot CAMEOCode Base