# Exploring GDELT 2.0 TV News API

In this notebook, I explore the GDELT 2.0 TV News API to work out how best to use it, especially for recommendations. Notes, problems, and discoveries are listed here:

- Example:
    - https://api.gdeltproject.org/api/v2/tv/tv?query=Kris%20Kobach%20Jeff%20Colyer%20market:%22National%22&mode=clipgallery&format=json&datanorm=perc&timelinesmooth=0&datacomb=sep&last24=yes&timezoom=yes&TIMESPAN=14days#

In [62]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
import requests
import json
import csv

Assume that, at this point, we have already parsed an article for entities to use in a search query.

In [37]:
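from urllib.parse import quote

query = "alex jones social media"
fmt = "json"  # value of the `format` parameter; `mode` is fixed to clipgallery in the URL
# Percent-encode the search terms so spaces and special characters are safe in the URL
url = f'https://api.gdeltproject.org/api/v2/tv/tv?query={quote(query)}%20market:%22National%22&mode=clipgallery&format={fmt}&datanorm=perc&timelinesmooth=0&datacomb=sep&last24=yes&timezoom=yes&TIMESPAN=14days'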
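Since I expect to vary the query and time span a lot while experimenting, here is a minimal sketch that keeps every parameter in a dict and lets `urllib.parse.urlencode` do the escaping. The parameter names and values are copied from the example URL above; `build_tv_url` and its arguments are hypothetical helper names of mine, not part of the API.

```python
from urllib.parse import urlencode, quote

BASE_URL = "https://api.gdeltproject.org/api/v2/tv/tv"


def build_tv_url(query, market="National", timespan="14days"):
    """Return a clip-gallery request URL for the given search terms."""
    params = {
        # The market filter is appended to the search terms, as in the example URL
        "query": f'{query} market:"{market}"',
        "mode": "clipgallery",
        "format": "json",
        "datanorm": "perc",
        "timelinesmooth": "0",
        "datacomb": "sep",
        "last24": "yes",
        "timezoom": "yes",
        "TIMESPAN": timespan,
    }
    # quote_via=quote encodes spaces as %20 instead of the default '+'
    return f"{BASE_URL}?{urlencode(params, quote_via=quote)}"


url = build_tv_url("alex jones social media")
```

Passing `timespan="7days"` or another market is then a one-argument change; whether the API accepts a given value is something I would still check against its documentation.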

In [38]:
res = requests.get(url)
print(res.status_code)

200
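A 200 status only tells me the request went through, not that the body contains what I expect. The sketch below pulls the clip list out defensively. It assumes (I have not confirmed this) that a query with no matches may come back without a `clips` key or with a non-JSON body; `extract_clips` is a hypothetical helper name.

```python
import json


def extract_clips(text):
    """Parse a response body and return its list of clips (possibly empty)."""
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        # Body was not valid JSON (assumed possible for empty or error responses)
        return []
    if not isinstance(payload, dict):
        return []
    # Fall back to an empty list when the 'clips' key is absent
    return payload.get("clips", [])
```

With this in place, `pd.DataFrame(extract_clips(res.text))` gives an empty frame instead of raising a `KeyError` when nothing matches.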


In [39]:
df = pd.DataFrame(res.json()['clips'])

In [77]:
df.shape

(14, 8)

In [40]:
df.head()

Unnamed: 0,date,ia_show_id,preview_thumb,preview_url,show,show_date,snippet,station
0,20180811T222716Z,FBC_20180811_220000_The_Journal_Editorial_Report,https://archive.org/download/FBC_20180811_2200...,https://archive.org/details/FBC_20180811_22000...,The Journal Editorial Report,20180811T220000Z,ruling rulers as a result of this as well. pau...,FOX Business
1,20180812T122717Z,FBC_20180812_120000_The_Journal_Editorial_Report,https://archive.org/download/FBC_20180812_1200...,https://archive.org/details/FBC_20180812_12000...,The Journal Editorial Report,20180812T120000Z,ruling rulers as a result of this as well. pau...,FOX Business
2,20180810T194547Z,MSNBCW_20180810_190000_MSNBC_Live_With_Ali_Velshi,https://archive.org/download/MSNBCW_20180810_1...,https://archive.org/details/MSNBCW_20180810_19...,MSNBC Live With Ali Velshi,20180810T190000Z,out over infowars on various social media. fal...,MSNBC
3,20180808T002435Z,FBC_20180808_000000_Kennedy,https://archive.org/download/FBC_20180808_0000...,https://archive.org/details/FBC_20180808_00000...,Kennedy,20180808T000000Z,"kennedy: we have to pay the bills, congressman...",FOX Business
4,20180808T042436Z,FBC_20180808_040000_Kennedy,https://archive.org/download/FBC_20180808_0400...,https://archive.org/details/FBC_20180808_04000...,Kennedy,20180808T040000Z,"kennedy: we have to pay the bills, congressman...",FOX Business
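Two things stand out in this output: the timestamps use a compact `YYYYMMDDTHHMMSSZ` layout, and rows 0/1 and 3/4 look like re-airings of the same show with what appears to be the same snippet (the display is truncated, so I cannot be sure they are identical). The sketch below parses the timestamps and keeps only the earliest airing per distinct snippet. It assumes the trailing `Z` means UTC, as in ISO 8601; the helper names and the placeholder snippet text are mine.

```python
from datetime import datetime, timezone


def parse_gdelt_time(stamp):
    """Convert a stamp such as '20180811T222716Z' to an aware UTC datetime."""
    return datetime.strptime(stamp, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)


def first_airings(clips):
    """Keep one clip per distinct snippet, preferring the earliest air time."""
    earliest = {}
    for clip in clips:
        seen = earliest.get(clip["snippet"])
        if seen is None or parse_gdelt_time(clip["date"]) < parse_gdelt_time(seen["date"]):
            earliest[clip["snippet"]] = clip
    # Return the survivors in chronological order
    return sorted(earliest.values(), key=lambda c: parse_gdelt_time(c["date"]))


# Illustrative rows: dates come from the output above, snippet text is a placeholder
sample = [
    {"date": "20180812T122717Z", "snippet": "snippet A"},
    {"date": "20180811T222716Z", "snippet": "snippet A"},
    {"date": "20180810T194547Z", "snippet": "snippet B"},
]
unique = first_airings(sample)
```

On the DataFrame itself, `df.drop_duplicates(subset="snippet")` does a similar job, though it keeps the first row in frame order rather than the earliest airing.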


In [45]:
snippet_matrix = df.snippet.to_numpy()  # Series.as_matrix() was removed in pandas 1.0

In [72]:
vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1,2) )
bow = vectorizer.fit_transform(snippet_matrix)

In [73]:
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
bow_df = pd.DataFrame(bow.toarray(), columns=vectorizer.get_feature_names_out())

In [74]:
bow_df.shape

(14, 266)

In [76]:
bow_df.head()

Unnamed: 0,89,89 days,ahead,ahead facebook,al,al social,alex,alex jones,allowing,allowing conspiracy,...,virtually al,wars,wars anger,wars crowd,wasn,wasn just,yeah,yeah define,youtube,youtube banned
0,0.0,0.0,0.149832,0.149832,0.0,0.0,0.064534,0.064534,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.149832,0.149832,0.0,0.0,0.064534,0.064534,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.047312,0.047312,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.080051,0.080051,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.080051,0.080051,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
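For the recommendation use case, one plausible next step (an idea to try, not something I have validated here) is to project the query into the same TF-IDF space and rank snippets by cosine similarity. The sketch below uses three made-up snippets so it runs without calling the API; with real data, `snippets` would be the `snippet` column from the clips DataFrame.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up snippets for illustration only
snippets = [
    "alex jones banned from social media platforms",
    "the weather today is sunny with light wind",
    "congress debates the new budget",
]
query = "alex jones social media"

# Same vectorizer settings as the cell above
vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
doc_matrix = vectorizer.fit_transform(snippets)

# transform() reuses the vocabulary learned from the snippets
query_vec = vectorizer.transform([query])

# One similarity score per snippet, then indices from most to least similar
scores = cosine_similarity(query_vec, doc_matrix).ravel()
ranking = scores.argsort()[::-1]
```

Snippets that share no terms with the query score exactly 0, so a small threshold on `scores` could filter out unrelated clips before recommending anything.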
