In [11]:
import pandas as pd

# GDELT GKG Media Categorization Metadata
GDELT ingests a huge range of websites: spammy SEO blogs, PR newswires, and, of course, a great deal of news media. Because the result is a bit of a soup, we've created media lists that fit the categories academics tend to study: elite media, wire media, traditional news media, online partisan media, and emerging media.

## For a review of the methodology that created this data, see:

Vargo, C., & Guo, L. (2017). Networks, big data, and intermedia agenda-setting: An analysis of traditional, partisan, and emerging online U.S. news. Journalism & Mass Communication Quarterly, 94(4), 1031–1055. http://chrisjvargo.com/wp-content/uploads/2016/12/1FinalPDFJMCQ.pdf

Guo, L., & Vargo, C. (2018). “Fake news” and emerging online media ecosystem: An integrated intermedia agenda-setting analysis of the 2016 U.S. presidential election. Communication Research. Preprint published online, June 4, 2018. https://www.dropbox.com/s/knjgj2ior6r9k8c/1FinalPDF.pdf?dl=1

# The Data
The data is stored in one Excel file with five sheets: elite media, wire media, traditional news media, online partisan media, and emerging media.

In [12]:
gdeltsources = pd.ExcelFile("GDELT-Global_Knowledge_Graph_SourceLists.xlsx")
gdeltsources.sheet_names

['Elite', 'Wire', 'Traditional', 'Online partisan', 'Emerging']
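
If you would rather work with one long table than five separate sheets, something like the sketch below may help. It assumes each sheet keeps its domains in the first column that isn't an `Unnamed` index column, as the outputs in this notebook suggest. The helper `combine_sheets` and the output column names `domain` and `category` are hypothetical, and the sample frames are small in-memory stand-ins for the real sheets.

```python
import pandas as pd


def combine_sheets(sheets):
    """Stack per-category sheets into one table with `domain` and `category` columns."""
    frames = []
    for category, df in sheets.items():
        # Assumption: the first column not named "Unnamed: ..." holds the domains.
        data_cols = [c for c in df.columns if not str(c).startswith("Unnamed")]
        domains = df[data_cols[0]].astype(str).str.strip()
        frames.append(pd.DataFrame({"domain": domains, "category": category}))
    return pd.concat(frames, ignore_index=True)


# In-memory stand-ins that mimic the shape of the Elite and Wire sheets.
sample_sheets = {
    "Elite": pd.DataFrame(
        {"Unnamed: 0": [0, 1], "NYT/WaPo": ["nytimes.com", "washingtonpost.com"]}
    ),
    "Wire": pd.DataFrame({"Unnamed: 0": [0, 1], "Wires": ["upi.com", "ap.org"]}),
}

all_sources = combine_sheets(sample_sheets)
print(all_sources)
```

With the real workbook, `pd.read_excel(path, sheet_name=None)` returns a dict of DataFrames keyed by sheet name, which can be passed straight to the helper.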

# Elite and Wire Media
The elite and wire sheets are self-explanatory: each holds a short list of domains.

In [13]:
# Pass the path directly so pandas opens and closes the file itself.
elitemedia = pd.read_excel('GDELT-Global_Knowledge_Graph_SourceLists.xlsx', sheet_name="Elite")

In [14]:
elitemedia.head()

Unnamed: 0,NYT/WaPo
0,nytimes.com
1,washingtonpost.com


In [15]:
# Pass the path directly so pandas opens and closes the file itself.
wiremedia = pd.read_excel('GDELT-Global_Knowledge_Graph_SourceLists.xlsx', sheet_name="Wire")

In [21]:
wiremedia.head()

Unnamed: 0,Wires
0,upi.com
1,ap.org


# Traditional Media
Note that we define traditional media as non-elite, non-wire, non-partisan sources that are newspapers, TV stations, or radio stations. They're traditional in the sense that they're tied to legacy media channels. (more: http://chrisjvargo.com/wp-content/uploads/2016/12/1FinalPDFJMCQ.pdf)

Beyond that, there is nothing unusual here, other than that there are a lot of them.

In [22]:
# Pass the path directly so pandas opens and closes the file itself.
traditionalmedia = pd.read_excel('GDELT-Global_Knowledge_Graph_SourceLists.xlsx', sheet_name="Traditional")

In [23]:
traditionalmedia.head()

Unnamed: 0,Traditional
0,1011now.com
1,10news.com
2,10tv.com
3,11alive.com
4,12news.com


# Emerging Media
Emerging media are non-elite, non-wire, non-partisan sources that aren't tied to traditional sources as above. They're emerging in that they're online only. (more: http://chrisjvargo.com/wp-content/uploads/2016/12/1FinalPDFJMCQ.pdf)

In [24]:
# Pass the path directly so pandas opens and closes the file itself.
emergingmedia = pd.read_excel('GDELT-Global_Knowledge_Graph_SourceLists.xlsx', sheet_name="Emerging")

In [20]:
emergingmedia.head()

Unnamed: 0,Emerging
0,yahoo.com
1,wickedlocal.com
2,marketwatch.com
3,patch.com
4,digitaljournal.com
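
The lists store bare domains such as `patch.com`, so article URLs need to be reduced to a domain before they can be matched. Below is a minimal sketch of that step, assuming your article records carry full URLs. The helpers `domain_of` and `categorize` and the tiny `CATEGORY_BY_DOMAIN` lookup are hypothetical; in practice the lookup would be built from the sheets themselves.

```python
from urllib.parse import urlparse


def domain_of(url):
    """Return the lowercase host of a URL, without a leading 'www.'."""
    # urlparse only recognizes the host when a scheme or '//' is present.
    if "//" not in url:
        url = "//" + url
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


# Hypothetical lookup using a few domains from the outputs shown in this notebook.
CATEGORY_BY_DOMAIN = {
    "nytimes.com": "Elite",
    "upi.com": "Wire",
    "10news.com": "Traditional",
    "patch.com": "Emerging",
}


def categorize(url):
    """Return the category for a URL's domain, or None if it is not listed."""
    return CATEGORY_BY_DOMAIN.get(domain_of(url))


print(categorize("https://www.nytimes.com/section/politics"))
print(categorize("patch.com/colorado/boulder"))
print(categorize("https://example.org/"))
```

This is an exact-match lookup, so a URL on a subdomain other than `www.` will not match its parent domain without extra handling.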


# Online Partisan
Finally, we have created a category for online partisan media. These media are online news media that aren't in the categories above. (more: http://chrisjvargo.com/wp-content/uploads/2016/12/1FinalPDFJMCQ.pdf)
## Partisanship
For this category, we also add the partisan association as a separate column, where 1 = liberal, 2 = conservative and 3 = libertarian.

In [30]:
# Pass the path directly so pandas opens and closes the file itself.
partisanmedia = pd.read_excel('GDELT-Global_Knowledge_Graph_SourceLists.xlsx', sheet_name="Online partisan")
partisanmedia.head()

Unnamed: 0,Online partisan,Liberal -1; Conservative -2; libertarian-3
0,addictinginfo.org,1
1,alan.com,1
2,blogforiowa.com,1
3,bluenationreview.com,1
4,carbonated.tv,1
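
The numeric codes are easier to read once they are mapped to labels. The sketch below shows one way to do that on a small in-memory stand-in for the sheet. The column header is copied from the output above, and the codes follow the 1/2/3 scheme described in this section; the new column names (`domain`, `partisan_code`, `partisanship`) are hypothetical choices.

```python
import pandas as pd

# Header of the code column exactly as it appears in the sheet output.
CODE_COLUMN = "Liberal -1; Conservative -2; libertarian-3"
PARTISAN_LABELS = {1: "liberal", 2: "conservative", 3: "libertarian"}

# In-memory stand-in using the first rows shown above.
sample = pd.DataFrame(
    {
        "Online partisan": ["addictinginfo.org", "alan.com", "blogforiowa.com"],
        CODE_COLUMN: [1, 1, 1],
    }
)

# Give the columns shorter names, then translate each code to its label.
labeled = sample.rename(
    columns={"Online partisan": "domain", CODE_COLUMN: "partisan_code"}
)
labeled["partisanship"] = labeled["partisan_code"].map(PARTISAN_LABELS)
print(labeled)
```

Codes missing from `PARTISAN_LABELS` come out as `NaN`, which makes unexpected values in the sheet easy to spot.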


### Disclaimer
This list isn't perfect, and we'd argue that no media source list is. However, we've done the best we can. If you have suggestions to improve or enhance the list, feel free to drop me a line at christopher.vargo@colorado.edu or to put in a request here on this repo.