# osiris GDELT data

This notebook describes the [GDELT](https://www.gdeltproject.org/) project data that osiris uses and shows how to import it with osiris, either from the GDELT file server or from Google BigQuery.

*From the GDELT website*:
>The GDELT Project is a realtime network diagram and database of global human society for open research.
![gf](https://www.gdeltproject.org/images/spinningglobe.gif)
>The GDELT Project is an initiative to construct a catalog of human societal-scale behavior and beliefs across all countries of the world, connecting every person, organization, location, count, theme, news source, and event across the planet into a single massive network that captures what's happening around the world, what its context is and who's involved, and how the world is feeling about it, every single day.

In [3]:
# Import the osiris code and set the runtime env 
import os, sys
sys.path.append(os.path.join('..', 'osiris'))
sys.path.append(os.path.join('..', 'ext'))
from osiris_global import set_runtime_env
set_runtime_env(debug = False, interactive_nb=True)

In [4]:
from data.tables import events
eve = events('2022-01-01', maxrows=30000)
print(eve)

Fetching row iterator for table osiris-347701.gdelt_snapshots.events_actions_20220000_20220421...
Fetching row iterator for table osiris-347701.gdelt_snapshots.events_actions_20220000_20220421 completed in 0.00 s.


100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:10<00:00,  3.47s/it]

              ID                      Actor1ID                      Actor2ID  \
0     1029322782  TzDUyfR/lqodWVk8ks3USmv88VM=  XOU985vr8VIlggJ5793XLI9ba84=   
1     1028350424  ubHwXHRJviEVW2vtL0ObeedhrJw=  lMZTtnHvU3s1GlvlvJo5k3+jyMs=   
2     1021550572  l7eim/rHniS2aUx7zMwSffGqHJE=  iu+wbEJuB6CmcaHiSItIWNaUpzA=   
3     1022318261  S7Oqkp1MxndhbgKltH/KITlJaqs=  iu+wbEJuB6CmcaHiSItIWNaUpzA=   
4     1024310578  jOEZ/5w36CLp1du3XMqBH4DB9k4=  iu+wbEJuB6CmcaHiSItIWNaUpzA=   
...          ...                           ...                           ...   
9995  1029197704  DAWq7VcwC5sZeZIqhbqWXbeuK14=  szM+hTrcccbeZXO98H9lv2aPmvI=   
9996  1023368287  wuGYV2nALD0u88o/mteURWYlmIA=  iu+wbEJuB6CmcaHiSItIWNaUpzA=   
9997  1038791987  fmt8ICK6NDXXn12xQjtO0K02YIE=  iu+wbEJuB6CmcaHiSItIWNaUpzA=   
9998  1035826050  d8NXpQNosnWLngixXg1F1C1b+HI=  YVRh8I4t0XfIbSgkQr8Vouh8v+I=   
9999  1022890353  CPKznfipz4SPamwt3F9dDhvtu2o=  8+e5VArriK2GkyX+pXT1x6eqMBY=   

            Date  IsRoot CAMEOCode Base




In [2]:
# Import data directly from GDELT file server
from data.gdelt import DataSource
gdelt = DataSource()

## GDELT Event data

The GDELT [event data](http://data.gdeltproject.org/documentation/GDELT-Event_Codebook-V2.0.pdf) contains hundreds of millions of automatically coded events, extracted daily from news stories.

In [3]:
events = gdelt.import_data('events', 'Nov-02-2021 02:00:00', 'Nov-02-2021 04:00:00')

Importing GDELT events data for 2 hour(s) from 11-02-2021 02:00:00 to 11-02-2021 04:00:00...
Importing GDELT events data for 2 hour(s) from 11-02-2021 02:00:00 to 11-02-2021 04:00:00 completed in 2.68 s.
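The import call above takes its time range as strings such as `'Nov-02-2021 02:00:00'`. As a rough illustration of how such a range can be interpreted, the sketch below parses the two strings and lists the 15-minute timestamps inside the window in the `YYYYMMDDHHMMSS` form that GDELT uses for its record dates. The helper name `quarter_hour_stamps` and the 15-minute step are assumptions made for this example; how osiris itself maps the range onto GDELT files is not shown in this notebook.

```python
from datetime import datetime, timedelta

# Format of the strings passed to import_data in the cell above.
FMT = "%b-%d-%Y %H:%M:%S"

def quarter_hour_stamps(start, end):
    """Hypothetical helper: list 15-minute timestamps in [start, end)."""
    t0 = datetime.strptime(start, FMT)
    t1 = datetime.strptime(end, FMT)
    stamps = []
    t = t0
    while t < t1:
        stamps.append(t.strftime("%Y%m%d%H%M%S"))
        t += timedelta(minutes=15)
    return stamps

stamps = quarter_hour_stamps("Nov-02-2021 02:00:00", "Nov-02-2021 04:00:00")
hours = len(stamps) / 4  # four 15-minute slots per hour
print(hours, stamps[0], stamps[-1])
```

The end of the range is treated as exclusive here, which is a design choice of the sketch rather than a documented osiris behaviour.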


In [17]:
events.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 916 entries, 0 to 915
Data columns (total 62 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   GLOBALEVENTID          916 non-null    int64  
 1   SQLDATE                916 non-null    int64  
 2   MonthYear              916 non-null    int64  
 3   Year                   916 non-null    int64  
 4   FractionDate           916 non-null    float64
 5   Actor1Code             825 non-null    object 
 6   Actor1Name             825 non-null    object 
 7   Actor1CountryCode      483 non-null    object 
 8   Actor1KnownGroupCode   11 non-null     object 
 9   Actor1EthnicCode       8 non-null      object 
 10  Actor1Religion1Code    7 non-null      object 
 11  Actor1Religion2Code    6 non-null      object 
 12  Actor1Type1Code        460 non-null    object 
 13  Actor1Type2Code        25 non-null     object 
 14  Actor1Type3Code        0 non-null      float64
 15  Actor2
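The `SQLDATE` column is stored as an `int64` in `YYYYMMDD` form, so it usually needs converting before any time-based filtering. A minimal sketch follows, using a small hand-made frame (`sample` is illustrative, not osiris output) in place of the real `events` DataFrame:

```python
import pandas as pd

# Hand-made stand-in for the events DataFrame; the values are illustrative only.
sample = pd.DataFrame({
    "GLOBALEVENTID": [1, 2, 3],
    "SQLDATE": [20211102, 20211101, 20201102],
})

# Convert the YYYYMMDD integers to proper timestamps.
sample["EventDate"] = pd.to_datetime(sample["SQLDATE"].astype(str), format="%Y%m%d")

# Keep only events dated in November 2021.
in_range = (sample["EventDate"] >= "2021-11-01") & (sample["EventDate"] < "2021-12-01")
nov_2021 = sample[in_range]
print(nov_2021)
```

The same two steps should apply to the imported frame, assuming its `SQLDATE` values follow the format described in the GDELT event codebook.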

Each event row in the data is highly denormalized and is coded using a hierarchical coding system called [CAMEO](http://data.gdeltproject.org/documentation/CAMEO.Manual.1.1b3.pdf) (Conflict and Mediation Event Observations).

In [5]:
events[['EventCode', 'CAMEOCodeDescription']]

Unnamed: 0,EventCode,CAMEOCodeDescription
0,190,"Use conventional military force, not specifie..."
1,010,"Make statement, not specified below"
2,042,Make a visit
3,1724,Impose state of emergency or martial law
4,043,Host a visit
...,...,...
911,042,Make a visit
912,042,Make a visit
913,051,Praise or endorse
914,042,Make a visit
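Because CAMEO codes are hierarchical, the first two digits of an `EventCode` give the root category and the first three give the base event, so `1724` sits under base `172` and root `17`. The sketch below makes that explicit; `cameo_levels` is a hypothetical helper written for this example, not part of osiris. The GDELT event codebook also defines `EventRootCode` and `EventBaseCode` columns that hold these prefixes, though whether the osiris frame keeps them is not visible in the truncated output above.

```python
def cameo_levels(event_code):
    """Hypothetical helper: split a CAMEO event code into its hierarchy levels."""
    code = str(event_code)
    return {
        "root": code[:2],  # two-digit root category, e.g. "04"
        "base": code[:3],  # three-digit base event, e.g. "042"
        "full": code,      # the full code as given
    }

# Codes taken from the table above; they stay strings to preserve leading zeros.
for code in ["190", "010", "042", "1724"]:
    print(code, cameo_levels(code))
```

Keeping the codes as strings matters: converting `"010"` to an integer would drop the leading zero and break the prefix logic.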


In [25]:
gkg = gdelt.import_data('gkg', 'Nov-02-2021 02:00:00', 'Nov-02-2021 04:00:00')

Importing GDELT gkg data for 2 hour(s) from 11-02-2021 02:00:00 to 11-02-2021 04:00:00...
Importing GDELT gkg data for 2 hour(s) from 11-02-2021 02:00:00 to 11-02-2021 04:00:00 completed in 8.24 s.


In [26]:
gkg.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1504 entries, 0 to 1503
Data columns (total 27 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   GKGRECORDID                 1504 non-null   object 
 1   DATE                        1504 non-null   int64  
 2   SourceCollectionIdentifier  1504 non-null   int64  
 3   SourceCommonName            1504 non-null   object 
 4   DocumentIdentifier          1504 non-null   object 
 5   Counts                      186 non-null    object 
 6   V2Counts                    186 non-null    object 
 7   Themes                      1360 non-null   object 
 8   V2Themes                    1360 non-null   object 
 9   Locations                   1053 non-null   object 
 10  V2Locations                 1051 non-null   object 
 11  Persons                     1158 non-null   object 
 12  V2Persons                   1150 non-null   object 
 13  Organizations               1088 
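The `info()` output shows that GKG columns vary widely in how often they are populated; `Counts`, for instance, is non-null in only 186 of 1504 rows (roughly 12%). A quick way to rank columns by coverage is sketched below on a toy frame (`toy` and its values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the gkg DataFrame; the values are illustrative only.
toy = pd.DataFrame({
    "GKGRECORDID": ["a-0", "a-1", "a-2", "a-3"],
    "Counts": [np.nan, "x", np.nan, np.nan],
    "Themes": ["T1;T2", "T3", np.nan, "T4"],
})

# Fraction of non-null values per column, sparsest columns first.
coverage = toy.notna().mean().sort_values()
print(coverage)
```

Running the same expression on `gkg` should give the per-column fill rate directly, which helps decide which fields are dense enough to analyze.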

In [28]:
gkg.head(100)

Unnamed: 0,GKGRECORDID,DATE,SourceCollectionIdentifier,SourceCommonName,DocumentIdentifier,Counts,V2Counts,Themes,V2Themes,Locations,...,GCAM,SharingImage,RelatedImages,SocialImageEmbeds,SocialVideoEmbeds,Quotations,AllNames,Amounts,TranslationInfo,Extras
0,20211102234500-0,20211102234500,1,edie.net,https://www.edie.net/news/9/COP26-Covered-Podc...,,,EPU_ECONOMY_HISTORIC;UNGP_FORESTS_RIVERS_OCEAN...,"TAX_FNCACT_CHIEF,1715;TAX_FNCACT_CHIEF,2080;TA...","4#Glasgow, Glasgow City, United Kingdom#UK#UKV...",...,"wc:448,c1.2:2,c1.3:7,c12.1:17,c12.10:36,c12.12...",https://e2k9ube.cloudimg.io/s/cdn/x/https://ed...,,,,,"Corporate Leaders Group,539;Triodos Bank,585;U...","3,conundrums,54;4,episode of COP26 Covered,192...",,<PAGE_LINKS>https://open.spotify.com/show/0Cwq...
1,20211102234500-1,20211102234500,1,dailyrecord.co.uk,https://www.dailyrecord.co.uk/news/business-co...,,,DISABILITY;WB_1458_HEALTH_PROMOTION_AND_DISEAS...,"TAX_FNCACT_CHIEF,853;MARITIME,966;DISABILITY,8...",1#United Kingdom#UK#UK#54#-4#UK,...,"wc:200,c12.1:10,c12.10:20,c12.12:4,c12.13:9,c1...",https://i2-prod.dailyrecord.co.uk/incoming/art...,,,,,"Purple Tuesday,351;Purple Tuesday,549;United K...","3000000,on Mondays,354;1000000,people with a d...",,<PAGE_LINKS>https://www.birminghammail.co.uk/w...
2,20211102234500-2,20211102234500,1,wnmufm.org,https://www.wnmufm.org/2021-11-02/around-12-00...,,,MILITARY;GENERAL_HEALTH;HEALTH_VACCINATION;WB_...,"EPU_CATS_REGULATION,188;GENERAL_HEALTH,44;HEAL...",,...,"wc:29,c12.1:3,c12.10:4,c12.12:4,c12.3:1,c12.4:...",,,,,,"Air Force,14",,,<PAGE_LINKS>https://www.cpr.org/</PAGE_LINKS><...
3,20211102234500-3,20211102234500,1,hannibal.net,https://www.hannibal.net/news/local/monroe-cou...,,,TAX_FNCACT;TAX_FNCACT_MAN;WB_1428_INJURY;WB_14...,"WB_635_PUBLIC_HEALTH,550;WB_2165_HEALTH_EMERGE...","4#Paris, France (General), France#FR#FR00#48.8...",...,"wc:79,c12.1:5,c12.10:2,c12.12:1,c12.14:1,c12.3...",,,,,,"Missouri State Highway Patrol,152;Saint Mary,5...","154,east of MO 15,171;",,<PAGE_AUTHORS>STAFF REPORT</PAGE_AUTHORS><PAGE...
4,20211102234500-4,20211102234500,1,southwalesargus.co.uk,https://www.southwalesargus.co.uk/news/nationa...,,,,,"4#Huntingdon, Cambridgeshire, United Kingdom#U...",...,"wc:725,c1.2:1,c12.1:30,c12.10:63,c12.12:32,c12...",https://www.southwalesargus.co.uk/resources/im...,https://image.assets.pressassociation.io/v2/im...,,,,"High Court,149;Manchester City,321;Manchester ...","8,men were sexually,559;5,City scouts,942;8,me...",,<PAGE_PRECISEPUBTIMESTAMP>20211102211700</PAGE...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,20211102234500-95,20211102234500,1,komu.com,https://www.komu.com/news/covid19/vaccine/um-s...,,,EDUCATION;SOC_POINTSOFINTEREST;SOC_POINTSOFINT...,"SOC_POINTSOFINTEREST_UNIVERSITIES,1597;SOC_POI...","2#Missouri, United States#US#USMO#38.4623#-92....",...,"wc:519,c1.2:1,c1.3:1,c1.4:1,c12.1:30,c12.10:42...",https://bloximages.newyork1.vip.townnews.com/k...,,,https://youtube.com/user/komunews;,2514|26||cooperate fully and timely,"Missouri System President Mun Choi,102;Joe Bid...","12,counts,1605;21,states,1677;",,<PAGE_LINKS>http://mizzoudata.imodules.com/con...
96,20211102234500-96,20211102234500,1,wvasfm.org,https://www.wvasfm.org/?page=16571,,,TAX_DISEASE;TAX_DISEASE_CORONAVIRUS;TAX_WORLDL...,"GENERAL_HEALTH,170;MEDICAL,170;TAX_WORLDLANGUA...","2#Alabama, United States#US#USAL#32.799#-86.80...",...,"wc:90,c12.1:2,c12.10:8,c12.12:5,c12.13:1,c12.1...",,,,,,"Alabama Department,158;Public Health,175","906,COVID,154;67,counties have seen their,315;",,"<PAGE_TITLE>WVAS | Jazz, Blues, News & Views</..."
97,20211102234500-97,20211102234500,1,argusobserver.com,https://www.argusobserver.com/national/agricul...,,,TAX_FOODSTAPLES;TAX_FOODSTAPLES_CORN;TAX_FNCAC...,"WB_471_ECONOMIC_GROWTH,1785;WB_1078_DETERMINAN...",,...,"wc:300,c1.2:3,c12.1:27,c12.10:49,c12.12:12,c12...",https://bloximages.chicago2.vip.townnews.com/a...,,,https://youtube.com/channel/UCl34oEhs-vlNGK15-...,,"Total Farm Marketing,164;Northern Hemisphere,687",,,<PAGE_PRECISEPUBTIMESTAMP>20211102220000</PAGE...
98,20211102234500-98,20211102234500,1,independentsentinel.com,https://www.independentsentinel.com/lunatic-pu...,AFFECT#5##1#United States#US#US#39.828175#-98....,AFFECT#5##1#United States#US#US#39.828175#-98....,CRISISLEX_CRISISLEXREC;ECON_TAXATION;USPEC_POL...,"ECON_INFLATION,1126;ECON_INFLATION,3175;ECON_I...","2#California, United States#US#USCA#36.17#-119...",...,"wc:524,c1.2:3,c1.3:3,c12.1:47,c12.10:54,c12.11...",https://www.independentsentinel.com/wp-content...,,https://pic.twitter.com/BoITpv2yCi;https://pic...,https://youtube.com/channel/UCqNfXxfww4ATBZsbv...,1280|177||Rents are up x2013 ; the cost of eve...,"Joe Biden,10;Joe Biden,336;Jewish Deplorable,1...",,,<PAGE_LINKS>https://t.co/0h3IXfrdCV;https://t....
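Several GKG fields pack many values into one delimited string. In the output above, `V2Themes` entries look like `TAX_FNCACT_CHIEF,1715;TAX_FNCACT_CHIEF,2080;...`, which are semicolon-separated pairs of a theme and a character offset. The sketch below splits such a string and counts themes; `parse_v2_themes` is a hypothetical helper, and the sample string is assembled from the visible prefixes of the rows above rather than from a complete record.

```python
from collections import Counter

def parse_v2_themes(value):
    """Hypothetical helper: split a V2Themes string into (theme, offset) pairs."""
    if not isinstance(value, str):
        return []  # missing values (NaN/None) yield no themes
    pairs = []
    for entry in value.split(";"):
        theme, sep, offset = entry.rpartition(",")
        if not sep:
            continue  # skip empty or malformed entries
        pairs.append((theme, int(offset)))
    return pairs

# Illustrative sample built from the truncated values shown above.
sample = "TAX_FNCACT_CHIEF,1715;TAX_FNCACT_CHIEF,2080;GENERAL_HEALTH,44;"
pairs = parse_v2_themes(sample)
theme_counts = Counter(theme for theme, _ in pairs)
print(pairs)
print(theme_counts.most_common())
```

Applied column-wise, for example with `gkg['V2Themes'].map(parse_v2_themes)`, this would turn each cell into a list that is easier to explode or aggregate.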


In [2]:
from data.shaped import events
eve = events('2022-01-01')
print(eve)

Checking C:\Users\Allister\Downloads\osiris-347701-686569037d50.json for explicit credentials as part of auth process...
Checking C:\Users\Allister\Downloads\osiris-347701-686569037d50.json for explicit credentials as part of auth process...
Checking C:\Users\Allister\Downloads\osiris-347701-686569037d50.json for explicit credentials as part of auth process...
Fetching row iterator for table osiris-347701.gdelt_snapshots.events_actions_20220000_20220421...
Fetching row iterator for table osiris-347701.gdelt_snapshots.events_actions_20220000_20220421 completed in 0.00 s.


TypeError: 'generator' object cannot be interpreted as an integer
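The traceback is cut off, so the failing call inside osiris is not visible. In general, though, Python raises this exact message when a generator is handed to something that needs an integer, such as `range()`. The sketch below reproduces the message with a hypothetical `row_batches` generator (not an osiris function) and shows one way around it, which is to materialize the generator first.

```python
def row_batches():
    """Hypothetical generator standing in for a lazily evaluated row iterator."""
    yield from ([1, 2], [3, 4], [5])

# Passing the generator where an integer is required reproduces the error.
try:
    range(row_batches())
except TypeError as exc:
    message = str(exc)
    print(message)

# Materializing the generator gives a list whose length is a valid integer.
batches = list(row_batches())
for i in range(len(batches)):
    print(i, batches[i])
```

Whether this is the actual cause here is an assumption; it is only the most common way this message arises.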