# My Breakfast Papers

An hourly-updated news feed with custom filters, powered by [Treebeard](https://treebeard.io) and [News API](https://newsapi.org/).

## Setup

```python
# !pip install jupyter_dashboards
# !jupyter dashboards quick-setup --sys-prefix
```

```python
# !pip install -U treebeard
!treebeard version
```

```
0.0.47
```

I put my News API key in a file named `secrets.json`, which I store securely using the Treebeard secrets store.  
I also add `secrets.json` to my `.gitignore` file so it doesn't get committed to a repository.
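A minimal sketch of reading the key back out with a clearer error when something is missing, assuming `secrets.json` holds a single object such as `{"newsapi_key": "<your key>"}`. The `newsapi_key` field name matches the one used when building the request headers; `load_api_key` is a hypothetical helper, not part of Treebeard.

```python
import json
from pathlib import Path


def load_api_key(path: str = "secrets.json", field: str = "newsapi_key") -> str:
    """Read the News API key from a git-ignored JSON file.

    Assumes (hypothetically) a file shaped like {"newsapi_key": "<your key>"}.
    """
    secrets_file = Path(path)
    if not secrets_file.exists():
        raise FileNotFoundError(f"{path} not found; create it or pull it from your secrets store")
    secrets = json.loads(secrets_file.read_text())
    try:
        return secrets[field]
    except KeyError:
        # re-raise with a message that names the missing field
        raise KeyError(f"{path} has no {field!r} entry") from None
```

Usage would be `api_key = load_api_key()`; failing loudly here is easier to debug than a 401 from the API later.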

```python
# !treebeard secrets push -f secrets.json
```

```python
import json

with open("secrets.json") as f:
    secrets = json.load(f)
```

```python
# import some libraries that I'll be using
import requests
import pandas as pd
```

## Call News API and get some news

News API recommends sending the API key in the `X-Api-Key` or `Authorization` HTTP header, rather than as part of the query string.
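To see the difference, here is a sketch that builds (but does not send) a request with `requests` and inspects where the key ends up. One common reason for the recommendation is that query strings tend to be recorded in server and proxy logs, while headers usually are not. `API_KEY` is a placeholder, not a real key.

```python
import requests

API_KEY = "your-api-key"  # placeholder, not a real key

# Build (but don't send) a request so we can inspect where the key ends up.
request = requests.Request(
    "GET",
    "https://newsapi.org/v2/top-headlines",
    params={"country": "gb"},
    headers={"X-Api-Key": API_KEY},
).prepare()

print(request.url)                   # https://newsapi.org/v2/top-headlines?country=gb
print(request.headers["X-Api-Key"])  # your-api-key
```

The URL carries only the `country` filter; the key travels in the header.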

```python
# construct headers to query News API
headers = {"Authorization": secrets["newsapi_key"]}
```

News API offers a few different endpoints, such as `/v2/top-headlines` for the most popular and breaking news stories, which can be filtered by country. I'd like to cast a wide net and then filter here, so I could query `/v2/everything`, but then I'd have to set query parameters.  
I'd rather take whatever news items come back and write some exclusion rules afterwards.  
Their [documentation](https://newsapi.org/docs/endpoints/everything) is very informative and breaks down the available parameters.
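A minimal sketch of building these URLs with `urllib.parse.urlencode`, which escapes spaces and special characters in search terms instead of pasting them into an f-string. Per the linked docs, `/v2/everything` takes its search term as `q`; `build_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE_URL = "https://newsapi.org/v2"


def build_url(endpoint: str, **params: str) -> str:
    """Build a News API URL, URL-encoding the query parameters."""
    return f"{BASE_URL}/{endpoint}?{urlencode(params)}"


top_url = build_url("top-headlines", country="gb")
everything_url = build_url("everything", q="breakfast news")
print(top_url)         # https://newsapi.org/v2/top-headlines?country=gb
print(everything_url)  # https://newsapi.org/v2/everything?q=breakfast+news
```

Passing a `params` dict to `requests.get` achieves the same thing; the helper just makes the final URL easy to print and check.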

```python
query = "cake"
country = "gb"
# /v2/everything takes the search term as `q`
everything_url = f"https://newsapi.org/v2/everything?q={query}"
top_url = f"https://newsapi.org/v2/top-headlines?country={country}"
```

```python
r = requests.get(top_url, headers=headers)
```

```python
stories = pd.DataFrame(r.json()["articles"])
stories.head(1)
```

| | source | author | title | description | url | urlToImage | publishedAt | content |
|---|---|---|---|---|---|---|---|---|
| 0 | `{'id': None, 'name': 'Sky.com'}` | Greg Heffer | Budget 2020: Chancellor Rishi Sunak's plan to ... | The government will provide a £30bn stimulus t... | https://news.sky.com/story/coronavirus-uk-one-... | https://e3.365dm.com/20/03/1600x900/skynews-ri... | 2020-03-11T12:45:00Z | |
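If the request fails (a bad key, for instance), the JSON body has no `articles` field and indexing it raises a bare `KeyError`. A sketch of a guard, assuming News API's JSON envelope of `{"status": "ok", ...}` on success and `{"status": "error", "code": ..., "message": ...}` on failure; `extract` is a hypothetical helper.

```python
def extract(payload: dict, field: str) -> list:
    """Return payload[field], raising if News API reported an error.

    Assumes the envelope {"status": "ok", ...} on success and
    {"status": "error", "code": ..., "message": ...} on failure.
    """
    if payload.get("status") != "ok":
        raise RuntimeError(f"News API error {payload.get('code')}: {payload.get('message')}")
    return payload[field]


# With a live response: stories = pd.DataFrame(extract(r.json(), "articles"))
ok = {"status": "ok", "totalResults": 1, "articles": [{"title": "example"}]}
print(extract(ok, "articles"))  # [{'title': 'example'}]
```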


```python
# flatten the nested source dict into separate id and name columns
stories['id'] = stories['source'].apply(lambda x: x['id'])
stories['name'] = stories['source'].apply(lambda x: x['name'])
stories = stories.drop('source', axis=1)
```
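An alternative sketch for the same clean-up: `pd.json_normalize` expands nested dicts into dotted column names (`source.id`, `source.name`), which can then be renamed. The two records below reuse values from the tables in this notebook, trimmed to a few fields.

```python
import pandas as pd

# Two records shaped like the News API articles (fields trimmed).
articles = [
    {"source": {"id": None, "name": "Sky.com"}, "author": "Greg Heffer"},
    {"source": {"id": "bbc-news", "name": "BBC News"}, "author": "https://www.facebook.com/bbcnews"},
]

# json_normalize flattens nested dicts into source.id / source.name columns
flat = pd.json_normalize(articles).rename(columns={"source.id": "id", "source.name": "name"})
print(flat[["author", "id", "name"]])
```

This avoids one `apply` per nested field, which helps if more nested fields turn up later.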

```python
stories.head(5)
```

| | author | title | description | url | urlToImage | publishedAt | content | id | name |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Greg Heffer | Budget 2020: Chancellor Rishi Sunak's plan to ... | The government will provide a £30bn stimulus t... | https://news.sky.com/story/coronavirus-uk-one-... | https://e3.365dm.com/20/03/1600x900/skynews-ri... | 2020-03-11T12:45:00Z | | | Sky.com |
| 1 | Alex Matthews | MP Nadine Dorries sparks Parliament coronaviru... | | http://www.thesun.co.uk/news/politics/11146410... | https://www.thesun.co.uk/wp-content/uploads/20... | 2020-03-11T12:39:28Z | NADINE Dorries has sparked more Parliament cor... | | Thesun.co.uk |
| 2 | Samantha Lock | British woman, 53, killed by coronavirus in Ba... | | http://www.thesun.co.uk/news/11146278/british-... | https://www.thesun.co.uk/wp-content/uploads/20... | 2020-03-11T12:39:26Z | A BRITISH woman has died in Bali after contrac... | | Thesun.co.uk |
| 3 | https://www.facebook.com/bbcnews | Coronavirus: Up to 70% of Germany could become... | The chancellor warns that as many as 58 millio... | https://www.bbc.co.uk/news/world-us-canada-518... | https://ichef.bbci.co.uk/news/1024/branded_new... | 2020-03-11T12:06:41Z | Image copyrightReutersImage caption\r\n Chance... | bbc-news | BBC News |
| 4 | Adam Marshall | United's training squad ahead of LASK tie - Ma... | See which players were spotted by our cameras ... | https://www.manutd.com/en/news/detail/man-utd-... | https://www.manutd.com/AssetPicker/images/0/0/... | 2020-03-11T12:06:23Z | Training squad: David De Gea, Sergio Romero, N... | | Manutd.com |


We're in business  

News API also offers a convenient `sources` endpoint listing the sources that feed into top headlines. I'll use that to power a dashboard where I can choose which sources to include.
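A sketch of how a chosen set of sources might be turned into a top-headlines request. The ids would come from the `id` column of the sources table. As I read the News API docs, the `sources` parameter takes a comma-separated list and can't be combined with `country` or `category`, so this hypothetical `headlines_params` helper sets neither.

```python
from __future__ import annotations


def headlines_params(source_ids: list[str]) -> dict[str, str]:
    """Build query parameters for /v2/top-headlines restricted to chosen sources.

    `sources` is assumed not to mix with `country`/`category`, so neither is set.
    """
    if not source_ids:
        raise ValueError("choose at least one source id")
    return {"sources": ",".join(source_ids)}


params = headlines_params(["bbc-news", "independent"])
print(params)  # {'sources': 'bbc-news,independent'}
# r = requests.get("https://newsapi.org/v2/top-headlines", headers=headers, params=params)
```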

```python
sources_url = "https://newsapi.org/v2/sources"
r = requests.get(sources_url, headers=headers)
sources = pd.DataFrame(r.json()["sources"])
sources.head()
```

| | id | name | description | url | category | language | country |
|---|---|---|---|---|---|---|---|
| 0 | abc-news | ABC News | Your trusted source for breaking news, analysi... | https://abcnews.go.com | general | en | us |
| 1 | abc-news-au | ABC News (AU) | Australia's most trusted source of local, nati... | http://www.abc.net.au/news | general | en | au |
| 2 | aftenposten | Aftenposten | Norges ledende nettavis med alltid oppdaterte ... | https://www.aftenposten.no | general | no | no |
| 3 | al-jazeera-english | Al Jazeera English | News, analysis from the Middle East and worldw... | http://www.aljazeera.com | general | en | us |
| 4 | ansa | ANSA.it | Agenzia ANSA: ultime notizie, foto, video e ap... | http://www.ansa.it | general | it | it |


Straight away it's clear I should filter by language, as regrettably I can only read English.  
Let's do that and restrict it to UK sources as well.
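The same filter can be wrapped in a small function that accepts lists, which may be handy if the dashboard later allows several languages or countries. `filter_sources` is a hypothetical helper; the sample rows come from the sources table.

```python
import pandas as pd


def filter_sources(sources: pd.DataFrame, languages: list, countries: list) -> pd.DataFrame:
    """Keep only sources whose language and country are in the given lists."""
    mask = sources["language"].isin(languages) & sources["country"].isin(countries)
    return sources[mask]


# A few rows from the sources list.
sample = pd.DataFrame({
    "id": ["abc-news", "aftenposten", "bbc-news"],
    "language": ["en", "no", "en"],
    "country": ["us", "no", "gb"],
})
print(filter_sources(sample, ["en"], ["gb"])["id"].tolist())  # ['bbc-news']
```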

```python
sources[(sources['language']=='en') & (sources['country']=='gb')]
```

| | id | name | description | url | category | language | country |
|---|---|---|---|---|---|---|---|
| 11 | bbc-news | BBC News | Use BBC News for up-to-the-minute news, breaki... | http://www.bbc.co.uk/news | general | en | gb |
| 12 | bbc-sport | BBC Sport | The home of BBC Sport online. Includes live sp... | http://www.bbc.co.uk/sport | sports | en | gb |
| 19 | business-insider-uk | Business Insider (UK) | Business Insider is a fast-growing business si... | http://uk.businessinsider.com | business | en | gb |
| 38 | four-four-two | FourFourTwo | The latest football news, in-depth features, t... | http://www.fourfourtwo.com/news | sports | en | gb |
| 53 | google-news-uk | Google News (UK) | Comprehensive, up-to-date UK news coverage, ag... | https://news.google.com | general | en | gb |
| 60 | independent | Independent | National morning quality (tabloid) includes fr... | http://www.independent.co.uk | general | en | gb |
| 76 | mtv-news-uk | MTV News (UK) | All the latest celebrity news, gossip, exclusi... | http://www.mtv.co.uk/news | entertainment | en | gb |
| 102 | talksport | TalkSport | Tune in to the world's biggest sports radio st... | http://talksport.com | sports | en | gb |
| 113 | the-lad-bible | The Lad Bible | The LAD Bible is one of the largest community ... | https://www.theladbible.com | entertainment | en | gb |
| 115 | the-sport-bible | The Sport Bible | TheSPORTbible is one of the largest communitie... | https://www.thesportbible.com | sports | en | gb |


Interesting: some newspapers I would expect to see are missing.  
I think this source list changes frequently as the top headlines change.

```python
# URLs are lowercase (e.g. news.sky.com), so match case-insensitively
sources[sources['url'].str.contains('sky', case=False)]
```

| | id | name | description | url | category | language | country |
|---|---|---|---|---|---|---|---|


```python
stories[stories['name'].str.contains('Sky')].head(1)
```

| | author | title | description | url | urlToImage | publishedAt | content | id | name |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Greg Heffer | Budget 2020: Chancellor Rishi Sunak's plan to ... | The government will provide a £30bn stimulus t... | https://news.sky.com/story/coronavirus-uk-one-... | https://e3.365dm.com/20/03/1600x900/skynews-ri... | 2020-03-11T12:45:00Z | | | Sky.com |


Another thing I don't understand: the top stories include sources that don't appear in the sources list.  
Maybe I need to give the `sources` endpoint the `country` code as well?

```python
sources_url = f"https://newsapi.org/v2/sources?country={country}"
r = requests.get(sources_url, headers=headers)
sources = pd.DataFrame(r.json()["sources"])
```

```python
sources[sources['url'].str.contains('sky', case=False)]
```

| | id | name | description | url | category | language | country |
|---|---|---|---|---|---|---|---|


Nope, it still doesn't find them.
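A case-sensitive `'Sky'` search over URLs can never match a lowercase domain like `news.sky.com`, so a more thorough check searches several text columns case-insensitively. `find_sources` is a hypothetical helper; the sample rows come from the UK sources table.

```python
import pandas as pd


def find_sources(sources: pd.DataFrame, term: str, columns=("id", "name", "url")) -> pd.DataFrame:
    """Case-insensitive substring search over several text columns."""
    mask = pd.Series(False, index=sources.index)
    for col in columns:
        # regex=False treats the term literally; na=False skips missing values
        mask |= sources[col].str.contains(term, case=False, regex=False, na=False)
    return sources[mask]


sample = pd.DataFrame({
    "id": ["bbc-news", "bbc-sport", "independent"],
    "name": ["BBC News", "BBC Sport", "Independent"],
    "url": ["http://www.bbc.co.uk/news", "http://www.bbc.co.uk/sport", "http://www.independent.co.uk"],
})
print(find_sources(sample, "BBC")["id"].tolist())  # ['bbc-news', 'bbc-sport']
print(find_sources(sample, "sky").empty)           # True
```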

```python
sources
```

| | id | name | description | url | category | language | country |
|---|---|---|---|---|---|---|---|
| 0 | bbc-news | BBC News | Use BBC News for up-to-the-minute news, breaki... | http://www.bbc.co.uk/news | general | en | gb |
| 1 | bbc-sport | BBC Sport | The home of BBC Sport online. Includes live sp... | http://www.bbc.co.uk/sport | sports | en | gb |
| 2 | business-insider-uk | Business Insider (UK) | Business Insider is a fast-growing business si... | http://uk.businessinsider.com | business | en | gb |
| 3 | four-four-two | FourFourTwo | The latest football news, in-depth features, t... | http://www.fourfourtwo.com/news | sports | en | gb |
| 4 | google-news-uk | Google News (UK) | Comprehensive, up-to-date UK news coverage, ag... | https://news.google.com | general | en | gb |
| 5 | independent | Independent | National morning quality (tabloid) includes fr... | http://www.independent.co.uk | general | en | gb |
| 6 | mtv-news-uk | MTV News (UK) | All the latest celebrity news, gossip, exclusi... | http://www.mtv.co.uk/news | entertainment | en | gb |
| 7 | talksport | TalkSport | Tune in to the world's biggest sports radio st... | http://talksport.com | sports | en | gb |
| 8 | the-lad-bible | The Lad Bible | The LAD Bible is one of the largest community ... | https://www.theladbible.com | entertainment | en | gb |
| 9 | the-sport-bible | The Sport Bible | TheSPORTbible is one of the largest communitie... | https://www.thesportbible.com | sports | en | gb |


This is just a filtered version of the previous list.  
It's a mystery for now. 
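One clue in the earlier stories table: the BBC News story carries a source `id` (`bbc-news`), while the Sky.com, Thesun.co.uk and Manutd.com stories have an empty `id`. My unconfirmed guess is that top headlines can include articles from publishers outside the curated `sources` list, and those come back without an `id`. A quick sketch that splits stories on that column, using the id/name pairs from the table:

```python
import pandas as pd

# id/name pairs from the five stories in the earlier table.
sample_stories = pd.DataFrame({
    "id": [None, None, None, "bbc-news", None],
    "name": ["Sky.com", "Thesun.co.uk", "Thesun.co.uk", "BBC News", "Manutd.com"],
})

listed = sample_stories[sample_stories["id"].notna()]
unlisted = sample_stories[sample_stories["id"].isna()]
print(listed["name"].tolist())             # ['BBC News']
print(unlisted["name"].unique().tolist())  # ['Sky.com', 'Thesun.co.uk', 'Manutd.com']
```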

## Get the news I want

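```python
language = 'en'     # only keep English-language stories
my_sources = []     # sources to include, chosen from the sources list
banned_topics = []  # topics whose stories should be excluded
```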
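The cell above only sets up empty filters. Here is one hypothetical way they might be applied to the `stories` DataFrame. It matches `my_sources` against the `name` column, because many stories have an empty `id`. It also treats a topic as banned if it appears, ignoring case, in the title or description. `apply_filters` and both rules are assumptions, not the notebook's final logic.

```python
import pandas as pd


def apply_filters(stories: pd.DataFrame, my_sources: list, banned_topics: list) -> pd.DataFrame:
    """Keep stories from chosen sources and drop any mentioning a banned topic.

    Hypothetical rules: an empty my_sources keeps every source, and a topic is
    banned if it appears (case-insensitively) in the title or description.
    """
    keep = pd.Series(True, index=stories.index)
    if my_sources:
        keep &= stories["name"].isin(my_sources)
    text = stories["title"].fillna("") + " " + stories["description"].fillna("")
    for topic in banned_topics:
        keep &= ~text.str.contains(topic, case=False, regex=False)
    return stories[keep]


# Three stories from the earlier table (truncated text kept as shown there).
sample = pd.DataFrame({
    "name": ["Sky.com", "BBC News", "Manutd.com"],
    "title": [
        "Budget 2020: Chancellor Rishi Sunak's plan to ...",
        "Coronavirus: Up to 70% of Germany could become...",
        "United's training squad ahead of LASK tie - Ma...",
    ],
    "description": [
        "The government will provide a £30bn stimulus t...",
        "The chancellor warns that as many as 58 millio...",
        "See which players were spotted by our cameras ...",
    ],
})
print(apply_filters(sample, [], ["coronavirus"])["name"].tolist())  # ['Sky.com', 'Manutd.com']
```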