### Problem Statement - Search for news articles that mention 'bitcoin'

In [1]:
# Create an API key for NewsAPI.org (or find another site that serves news).
# Use that API to create queries.
# Understand the API. Pick a topic, for example: all articles published by the Wall Street Journal in the last 6 months, sorted by most recent first.
# Fetch the recent articles published about the topic you have selected.
# For each article, extract the following: [source-id, source-name, author, title, description, content]
# Convert the JSON output to a data frame with each parameter as a column and each article as a row.

In [73]:
import json
import urllib.request
import pandas as pd
from datetime import datetime

#### The current plan returns at most 100 news articles; beyond that, the website asks for a plan upgrade. So, let's get the latest 100 articles.

In [74]:
#newsURL = "http://newsapi.org/v2/top-headlines?country=us&category=business&apiKey=b9fe15a99b3a44df803f8fe5fc9beb23"
#newsURL = "http://newsapi.org/v2/top-headlines?country=us&pageSize=100&apiKey=b9fe15a99b3a44df803f8fe5fc9beb23"
newsURL = ('http://newsapi.org/v2/everything?q=bitcoin&from=2020-07-16&sortBy=publishedAt&pageSize=100&'
           'apiKey=b9fe15a99b3a44df803f8fe5fc9beb23')
newsData = json.load(urllib.request.urlopen(newsURL)) # Load the json response into variable
#print(newsData)
#print(newsData['articles'])
#print(newsData['articles'][0])
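
The query string above is assembled by hand, which works for a single word like `bitcoin` but breaks as soon as the search term contains spaces or special characters. A minimal sketch of building the same URL with `urllib.parse.urlencode` follows. The helper name `build_news_url` and the placeholder key `YOUR_API_KEY` are assumptions for illustration, not part of the original notebook.

```python
from urllib.parse import urlencode


def build_news_url(api_key, query="bitcoin", from_date="2020-07-16", page_size=100):
    """Return an 'everything' endpoint URL with URL-encoded query parameters.

    Hypothetical helper: the parameter names mirror the ones used in newsURL above.
    """
    params = {
        "q": query,
        "from": from_date,
        "sortBy": "publishedAt",
        "pageSize": page_size,
        "apiKey": api_key,
    }
    # urlencode escapes spaces and special characters in each value.
    return "http://newsapi.org/v2/everything?" + urlencode(params)


# A multi-word query is encoded safely instead of producing an invalid URL.
example_url = build_news_url("YOUR_API_KEY", query="bitcoin price")
print(example_url)
```

Keeping the parameters in a dict also makes it easy to change the topic or date range without editing a long string literal.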

#### Collect the required fields from each news article and build a dataframe from them

In [75]:
# Columns required in the final dataframe
columns = ['source-id', 'source-name', 'author', 'title', 'description', 'content', 'url', 'publishedAt']

# Collect one dict per article. DataFrame.append was removed in pandas 2.0, and
# building the dataframe once from a list avoids copying it on every iteration.
rows = []
for article in newsData['articles']:
    rows.append({'source-id': article['source']['id'], 'source-name': article['source']['name'],
                 'author': article['author'], 'title': article['title'],
                 'description': article['description'], 'content': article['content'],
                 'url': article['url'], 'publishedAt': article['publishedAt']})

df = pd.DataFrame(rows, columns=columns)

#print(df)
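
As an alternative to looping over the articles, `pd.json_normalize` can flatten the nested `source` object in one call. The sketch below assumes a response shaped like `newsData['articles']`; the two sample records are made up for illustration and are not real API output.

```python
import pandas as pd

# Hypothetical sample shaped like the 'articles' list of the JSON response.
sample_articles = [
    {"source": {"id": None, "name": "Example News"}, "author": "A. Writer",
     "title": "First title", "description": "First description",
     "content": "First content", "url": "https://example.com/1",
     "publishedAt": "2020-08-15T10:00:00Z"},
    {"source": {"id": "example-id", "name": "Example Wire"}, "author": None,
     "title": "Second title", "description": "Second description",
     "content": "Second content", "url": "https://example.com/2",
     "publishedAt": "2020-08-15T11:00:00Z"},
]

# json_normalize expands the nested 'source' dict into 'source.id' and 'source.name'.
flat = pd.json_normalize(sample_articles)

# Rename the flattened columns to the names used in this notebook.
flat = flat.rename(columns={"source.id": "source-id", "source.name": "source-name"})

# Keep only the required columns, in the required order.
columns = ["source-id", "source-name", "author", "title",
           "description", "content", "url", "publishedAt"]
flat = flat[columns]
print(flat)
```

With the real response, `sample_articles` would be replaced by `newsData['articles']`.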

#### The API already returns the data sorted by publish date (publishedAt), but let's sort it explicitly to guarantee the order

In [76]:
#date1 = datetime.now().strftime("%Y-%m-%dT%H:%M:%SZ") 
#print(date1)
#date2 = datetime.now().strftime("%Y-%m-%d::%H:%M:%S") 
#print(date2)
df['publishedAt'] = pd.to_datetime(df['publishedAt'], format="%Y-%m-%dT%H:%M:%SZ")
df.sort_values(by=['publishedAt'], inplace=True, ascending=False)
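
To see what the parse-and-sort step does, here is a small sketch on made-up timestamps in the same format as `publishedAt`. Note that `sort_values` keeps the original index labels; the optional `reset_index(drop=True)` renumbers the rows to match the new order. The sample data and variable names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical timestamps in the ISO 8601 form used by the 'publishedAt' field.
sample = pd.DataFrame({
    "title": ["older", "newest", "middle"],
    "publishedAt": ["2020-08-14T19:57:00Z", "2020-08-16T02:00:00Z", "2020-08-15T23:32:00Z"],
})

# Parse the strings into datetime64 values so that sorting is chronological.
sample["publishedAt"] = pd.to_datetime(sample["publishedAt"], format="%Y-%m-%dT%H:%M:%SZ")

# Sort newest first, then renumber the rows 0..n-1 to match the new order.
sample_sorted = sample.sort_values(by="publishedAt", ascending=False).reset_index(drop=True)
print(sample_sorted)
```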

#### Print the final dataframe before writing it to a CSV file

In [77]:
df

Unnamed: 0,source-id,source-name,author,title,description,content,url,publishedAt
0,,Boing Boing,Boing Boing's Shop,Here are 10 good reasons you should hop on the...,"Ever since the great TP shortage of 2020, it's...","Ever since the great TP shortage of 2020, it's...",https://boingboing.net/2020/08/15/here-are-10-...,2020-08-16 02:00:00
1,,Yahoo Entertainment,Bob Mason,The Crypto Daily – Movers and Shakers – August...,"It’s a bearish start to the day, Failure to mo...","Bitcoin, BTC to USD, rose by 0.70% on Saturday...",https://finance.yahoo.com/news/crypto-daily-mo...,2020-08-16 00:28:22
2,,newsBTC,Nick Chong,Is Bitcoin Really In a Bull Market? Here’s Why...,Bitcoin has done extremely well over the past ...,Bitcoin has done extremely well over the past ...,https://www.newsbtc.com/2020/08/16/bitcoin-rea...,2020-08-16 00:00:19
3,,Bitcoinist,Nick Chong,These 4 Factors Show Why It’s Hard to Be Beari...,There remain some Bitcoin bears despite the on...,<ul><li>There remain some Bitcoin bears despit...,https://bitcoinist.com/5-factors-hard-bearish-...,2020-08-15 23:59:14
4,,Cointelegraph,Cointelegraph By Ray Salmond,Bulls Stampede Toward $12K Bitcoin Price as We...,Bitcoin price continues to meet resistance at ...,<ul><li>Bitcoin price is making a strong push ...,https://cointelegraph.com/news/bulls-stampede-...,2020-08-15 23:32:00
...,...,...,...,...,...,...,...,...
95,business-insider,Business Insider,Emily Graffeo,"Gold 'certainly' could soar as high as $3000, ...",<ul>\n<li>Michael Novogratz told Bloomberg the...,REUTERS/Rick Wilking\r\n<ul><li>Michael Novogr...,https://www.businessinsider.com/gold-price-out...,2020-08-14 20:13:44
96,,newsBTC,Tony Spilotro,Indicator Shows XRP Has Most Explosive Momentu...,"XRP’s recent rally has taken a pause, leaving ...","XRP’s recent rally has taken a pause, leaving ...",https://www.newsbtc.com/2020/08/14/xrp-strengt...,2020-08-14 20:00:23
97,,PRNewswire,,"Controllable Load Resource (""CLR"") Market Lead...","HOUSTON, Aug. 14, 2020 /PRNewswire/ -- Lancium...","HOUSTON, Aug. 14, 2020 /PRNewswire/ -- Lancium...",https://www.prnewswire.com/news-releases/contr...,2020-08-14 19:57:00
98,,Yahoo Entertainment,PR Newswire,"Controllable Load Resource (""CLR"") Market Lead...","Lancium LLC, the leader in data center power r...","HOUSTON, Aug. 14, 2020 /PRNewswire/ -- Lancium...",https://news.yahoo.com/controllable-load-clr-m...,2020-08-14 19:57:00


#### Write the dataframe to a CSV file

In [78]:
df.to_csv('./NewsApi_Anuroop_Ajmera.csv', index=False)
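
A quick way to check the export is to read the file back and confirm the shape and column types. The sketch below does this with a small made-up dataframe and a temporary file, so it does not touch the CSV written above. The file name `sample_news.csv` is hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical two-row dataframe standing in for the real df.
sample = pd.DataFrame({
    "title": ["first", "second"],
    "publishedAt": pd.to_datetime(["2020-08-16 02:00:00", "2020-08-15 23:32:00"]),
})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "sample_news.csv")
    # index=False keeps the row numbers out of the file, as in the cell above.
    sample.to_csv(csv_path, index=False)
    # parse_dates restores the datetime type, which CSV stores as plain text.
    restored = pd.read_csv(csv_path, parse_dates=["publishedAt"])

print(restored.dtypes)
```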