# Exploring GDELT 2.0 TV News API

In this notebook, I explore the GDELT 2.0 TV News API to work out how best to use it, especially for recommendations. Notes, problems, and discoveries are listed here:

- Example:
    - https://api.gdeltproject.org/api/v2/tv/tv?query=Kris%20Kobach%20Jeff%20Colyer%20market:%22National%22&mode=clipgallery&format=json&datanorm=perc&timelinesmooth=0&datacomb=sep&last24=yes&timezoom=yes&TIMESPAN=14days#
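
The example URL above packs the search terms and a `market:"National"` filter into one `query` parameter, followed by a fixed set of display options. A minimal sketch of building such a URL programmatically is below. The helper name `build_tv_url` and its arguments are my own invention, and the parameter values are copied from the example URL rather than checked against the API documentation.

```python
from urllib.parse import urlencode, quote

BASE_URL = "https://api.gdeltproject.org/api/v2/tv/tv"

def build_tv_url(terms, market="National", timespan="14days"):
    """Build a clip-gallery query URL (hypothetical helper, not part of the API)."""
    # Search terms and the market filter share the single `query` parameter.
    query = f'{" ".join(terms)} market:"{market}"'
    params = {
        "query": query,
        "mode": "clipgallery",
        "format": "json",
        "datanorm": "perc",
        "timelinesmooth": 0,
        "datacomb": "sep",
        "last24": "yes",
        "timezoom": "yes",
        "TIMESPAN": timespan,
    }
    # quote_via=quote encodes spaces as %20 rather than "+".
    return f"{BASE_URL}?{urlencode(params, quote_via=quote)}"

url = build_tv_url(["Kris Kobach", "Jeff Colyer"])
print(url)
```

Encoding the whole query this way also escapes the colon and the double quotes, which the hand-written URL leaves partly unescaped.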

In [3]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
import requests
import json
import csv

Assume that, at this point, we have already parsed an article for entities to use in a search query.

In [17]:
with open("search_result.json", "r") as f:
    search_result = json.load(f)
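
Later cells read two keys from this file: `POI`, which is presumably a list of strings since it is passed to `" ".join(...)`, and `document`, the article text. The sketch below shows one possible shape for the file and a small check for it. Only the two key names come from this notebook; the values and the helper `validate_search_result` are hypothetical.

```python
import json

# Hypothetical contents; only the key names "POI" and "document" come from the notebook.
example = {
    "POI": ["Alex Jones"],
    "document": "Full text of the parsed article goes here.",
}

def validate_search_result(result):
    """Check that the two keys read by later cells are present and well-typed."""
    missing = [key for key in ("POI", "document") if key not in result]
    if missing:
        raise KeyError(f"search_result is missing keys: {missing}")
    poi = result["POI"]
    if not isinstance(poi, list) or not all(isinstance(p, str) for p in poi):
        raise TypeError("'POI' should be a list of strings")
    if not isinstance(result["document"], str):
        raise TypeError("'document' should be a string")
    return result

# Round-trip through JSON to mimic loading the file from disk.
search_result_example = validate_search_result(json.loads(json.dumps(example)))
```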

In [24]:
query = " ".join(search_result['POI'])
# `fmt` fills the URL's `format` parameter; the URL's `mode` parameter is fixed to clipgallery.
fmt = "json"
url = f'https://api.gdeltproject.org/api/v2/tv/tv?query={query}%20market:%22National%22&mode=clipgallery&format={fmt}&datanorm=perc&timelinesmooth=0&datacomb=sep&last24=yes&timezoom=yes&TIMESPAN=14days'

In [25]:
res = requests.get(url)
print(res.status_code)

200


In [27]:
df = pd.DataFrame(res.json()['clips'])
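
Indexing the payload with `['clips']` raises a `KeyError` if that key is absent. I have not confirmed how the API responds when nothing matches, so the sketch below is a defensive guess rather than documented behavior. The helper `extract_clips` and both sample payloads are made up for illustration.

```python
def extract_clips(payload):
    """Return the list of clip records from a decoded response, or [] if absent."""
    clips = payload.get("clips", []) if isinstance(payload, dict) else []
    # Keep only dict records so a DataFrame built from them has uniform rows.
    return [clip for clip in clips if isinstance(clip, dict)]

# Hypothetical payloads for illustration only.
sample = {"clips": [{"station": "MSNBC", "snippet": "example text"}]}
print(len(extract_clips(sample)))  # 1
print(len(extract_clips({})))      # 0
```

The returned list can be passed straight to `pd.DataFrame(...)`; an empty list gives an empty frame instead of an exception.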

In [28]:
df.shape

(32, 8)

In [29]:
df.head()

Unnamed: 0,date,ia_show_id,preview_thumb,preview_url,show,show_date,snippet,station
0,20180807T185405Z,MSNBCW_20180807_180000_MSNBC_Live_With_Katy_Tur,https://archive.org/download/MSNBCW_20180807_1...,https://archive.org/details/MSNBCW_20180807_18...,MSNBC Live With Katy Tur,20180807T180000Z,"in, with conservative media. alex jones is not...",MSNBC
1,20180809T211130Z,BLOOMBERG_20180809_210000_Bloomberg_Technology,https://archive.org/download/BLOOMBERG_2018080...,https://archive.org/details/BLOOMBERG_20180809...,Bloomberg Technology,20180809T210000Z,alex jones. our discussion coming up. emily: t...,Bloomberg
2,20180808T223437Z,FBC_20180808_220000_Making_Money_With_Charles_...,https://archive.org/download/FBC_20180808_2200...,https://archive.org/details/FBC_20180808_22000...,Making Money With Charles Payne,20180808T220000Z,adam: twitter is letting alex jones spew his s...,FOX Business
3,20180807T151450Z,CNBC_20180807_150000_Squawk_Alley,https://archive.org/download/CNBC_20180807_150...,https://archive.org/details/CNBC_20180807_1500...,Squawk Alley,20180807T150000Z,it is far from stuff that alex jones has done ...,CNBC
4,20180807T074331Z,FOXNEWSW_20180807_070000_Fox_News__Night_With_...,https://archive.org/download/FOXNEWSW_20180807...,https://archive.org/details/FOXNEWSW_20180807_...,Fox News Night With Shannon Bream,20180807T070000Z,related to alex jones. the move comes after ap...,FOX News
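
The `date` and `show_date` columns hold compact timestamps such as `20180807T185405Z`. A short sketch of parsing them with the standard library follows. It assumes the trailing `Z` means UTC, as in ISO 8601, which I have not verified against the API documentation; `parse_gdelt_timestamp` is a hypothetical helper name.

```python
from datetime import datetime, timezone

def parse_gdelt_timestamp(value):
    """Parse a compact timestamp such as '20180807T185405Z' (assumed to be UTC)."""
    return datetime.strptime(value, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)

# Values taken from row 0 of the table above.
clip_time = parse_gdelt_timestamp("20180807T185405Z")
show_time = parse_gdelt_timestamp("20180807T180000Z")

# How far into the show the clip starts.
offset = clip_time - show_time
print(offset)  # 0:54:05
```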


In [9]:
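# `as_matrix()` was removed in pandas 1.0; `to_numpy(copy=True)` returns an independent array.
snippet_matrix = df.snippet.to_numpy(copy=True)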

In [12]:
for doc in snippet_matrix[0:5]:
    print(doc)
    print()

in, with conservative media. alex jones is not a conservative. alex jones is a liar. alex jones is a conspiracy spreader. he's not someone saying that i believe that the tax policy that president obama put in place

alex jones. our discussion coming up. emily: twitter ceo jack dorsey continues to defend twitter's refusal to ban conspiracy mogul alex jones.

adam: twitter is letting alex jones spew his stuff. i'm not a fan of alex jones but at least they areletting him talk. i think facebook needs to be

it is far from stuff that alex jones has done and said. isn't this a situation it is all fun and games until somebody gets popular alex jones was popular even though apple doesn't host this stuff, discovery demands

related to alex jones. the move comes after apple and spotify, he claims it is censorship, the research center said i don't support alex jones, he is not a conservative but



In [61]:
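# Overwrite position 0 with the article text so it is vectorized alongside the clips.
# Note that this discards the snippet of clip 0.
snippet_matrix.put(0, search_result['document'])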

In [62]:
snippet_matrix

array(['Apple and Google\xa0have both removed some\xa0Alex Jones\xa0content from some of their platforms, but his Infowars app is still available for download in both app stores and his accounts on Twitter and Instagram are still active.\n\nListed under the “News” sections of the iOS App Store and the Google Play store, the Infowars app offers its subscribers livestreams and articles. It’s ranked as high as No. 23 among free news apps on Apple, according to CNN Money.\n\nApple did remove some of Jones’ podcasts from iTunes, while YouTube, which is owned by Google, removed his channel.\xa0\n\n“Apple does not tolerate hate speech, and we have clear guidelines that creators and developers must follow to ensure we provide a safe environment for all of our users,” the company said in a statement. “Podcasts that violate these guidelines are removed from our directory making them no longer searchable or available for download or streaming. We believe in representing a wide range of views, so 

In [39]:
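# Unigrams and bigrams with English stop words removed. Despite the name `bow`,
# the matrix holds TF-IDF weights rather than raw counts.
vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
bow = vectorizer.fit_transform(snippet_matrix)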

In [40]:
# `get_feature_names()` was removed in scikit-learn 1.2; use `get_feature_names_out()`.
bow_df = pd.DataFrame(bow.toarray(), columns=vectorizer.get_feature_names_out())

In [41]:
bow_df.shape

(32, 984)

In [16]:
bow_df.head()

Unnamed: 0,11th,11th attack,30,30 days,50,50 day,90,90 day,absolutely,absolutely absolutely,...,years,years intense,years saying,yesterday,yesterday know,youtube,youtube particular,youtube suspended,zuckerberg,zuckerberg saying
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [42]:
from sklearn.metrics.pairwise import pairwise_distances, cosine_distances

In [43]:
distances = pairwise_distances(bow, metric='cosine') 
distance_df = pd.DataFrame(distances, index=bow_df.index, columns=bow_df.index)
distance_df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,22,23,24,25,26,27,28,29,30,31
0,0.0,0.933566,0.948288,0.960185,0.856741,0.953557,0.961746,0.960047,0.893921,0.962768,...,0.938089,0.956582,0.979815,0.96306,0.918457,0.95127,0.9611,0.9611,0.915128,0.961816
1,0.933566,0.0,0.937997,0.973269,0.966661,0.968819,0.974316,0.973176,0.945842,0.975002,...,0.969302,0.970849,0.986448,0.932333,0.970887,0.929197,0.960012,0.960012,0.974749,0.963718
2,0.948288,0.937997,0.0,0.921985,0.962277,0.93935,0.970939,0.969649,0.961336,0.958226,...,0.940289,0.967016,0.984666,0.960911,0.941046,0.950489,0.94573,0.94573,0.971429,0.964061
3,0.960185,0.973269,0.921985,0.0,0.936609,0.962162,0.968834,0.976632,0.952819,0.961026,...,0.973256,0.974605,0.988194,0.961415,0.934581,0.944554,0.977247,0.977247,0.978002,0.954776
4,0.856741,0.966661,0.962277,0.936609,0.0,0.926056,0.96113,0.970855,0.962873,0.97284,...,0.966646,0.968327,0.985275,0.951877,0.968368,0.975113,0.971623,0.971623,0.972565,0.929829
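
For reference, the cosine distance between two vectors is one minus the cosine of the angle between them. Since TF-IDF weights are non-negative, the values fall between 0 (same direction) and 1 (no shared terms), so the 0.93 to 0.98 entries above indicate very little shared vocabulary. The sketch below recomputes the same quantity with plain NumPy on made-up toy vectors. It illustrates the formula and is not the scikit-learn implementation.

```python
import numpy as np

def cosine_distance_matrix(X):
    """Pairwise cosine distance between the rows of a dense 2-D array."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    unit = X / norms
    dist = 1.0 - unit @ unit.T
    return np.clip(dist, 0.0, 2.0)  # clamp tiny negative values from rounding

# Toy vectors (made up): rows 0 and 1 point the same way, row 2 is orthogonal.
toy = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]])
D = cosine_distance_matrix(toy)
print(D)
```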


In [57]:
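# Rank all rows by cosine distance to row 31, nearest first.
# The article text was written to position 0 by `snippet_matrix.put(0, ...)`,
# so use `distance_df.loc[0, :]` to rank clips against the article instead.
order = distance_df.loc[31, :].sort_values(ascending=True).index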
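
Sorting a full row of `distance_df` puts the reference row itself first, because its distance to itself is 0. For recommendations it is probably more useful to drop that row and keep only the closest few. A minimal NumPy sketch follows; the helper `rank_neighbours`, its `top_k` default, and the toy matrix are all hypothetical.

```python
import numpy as np

def rank_neighbours(distances, query_idx, top_k=5):
    """Indices of the top_k rows closest to query_idx, excluding the query itself."""
    row = np.asarray(distances, dtype=float)[query_idx]
    order = np.argsort(row, kind="stable")  # nearest first
    order = order[order != query_idx]       # drop the query row
    return order[:top_k].tolist()

# Toy symmetric distance matrix (made-up numbers).
toy = np.array([
    [0.0, 0.9, 0.2],
    [0.9, 0.0, 0.5],
    [0.2, 0.5, 0.0],
])
print(rank_neighbours(toy, query_idx=0))  # [2, 1]
```

With the real data this would be called as `rank_neighbours(distance_df.to_numpy(), query_idx=...)`, using whichever row holds the article text.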

In [60]:
for i in order:
    print(df.loc[i, "preview_url"])

https://archive.org/details/FOXNEWSW_20180808_060000_The_Ingraham_Angle#start/143/end/178
https://archive.org/details/CNNW_20180807_140000_CNN_Newsroom_With_John_Berman_and_Poppy_Harlow#start/3432/end/3467
https://archive.org/details/FOXNEWSW_20180808_020000_The_Ingraham_Angle#start/141/end/176
https://archive.org/details/FOXNEWSW_20180807_070000_Fox_News__Night_With_Shannon_Bream#start/2596/end/2631
https://archive.org/details/CNNW_20180811_190000_CNN_Newsroom_With_Ana_Cabrera#start/1833/end/1868
https://archive.org/details/FBC_20180812_120000_The_Journal_Editorial_Report#start/1999/end/2034
https://archive.org/details/FOXNEWSW_20180811_190000_Americas_News_HQ#start/1998/end/2033
https://archive.org/details/CNBC_20180810_150000_Squawk_Alley#start/1096/end/1131
https://archive.org/details/FBC_20180811_220000_The_Journal_Editorial_Report#start/1998/end/2033
https://archive.org/details/CNBC_20180810_150000_Squawk_Alley#start/1143/end/1178
https://archive.org/details/CNBC_20180807_150000_