## NEWS DATA FETCHING
* We are going to fetch news by category and country from a news API using the `requests` Python module.
> [API](https://newsapi.org/)

In [81]:
import requests
import json
import pandas as pd
import numpy as np
import csv

In [30]:
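class News:
    """Container for one article returned by the news API."""
    def __init__(self, urlToImage, author, title, description, url, publishedAt, content):
        self.urlToImage = urlToImage
        self.author = author
        self.title = title
        self.description = description
        self.url = url
        self.publishedAt = publishedAt  # was accepted but never stored
        self.content = content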
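
The `News` class is not used again in this notebook, so here is a minimal sketch of how an article dictionary from the API could be mapped onto such an object. `NewsItem` and `from_article` are hypothetical names introduced for illustration, and the sample article is made up.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class NewsItem:
    # Same fields as the News class above; the API may return None for any of them
    urlToImage: Optional[str]
    author: Optional[str]
    title: Optional[str]
    description: Optional[str]
    url: Optional[str]
    publishedAt: Optional[str]
    content: Optional[str]

    @classmethod
    def from_article(cls, article: dict) -> "NewsItem":
        # Missing keys become None instead of raising KeyError
        return cls(**{f.name: article.get(f.name) for f in fields(cls)})


item = NewsItem.from_article({"title": "Example headline", "url": "https://example.com/story"})
print(item.title, item.author)
```

A dataclass keeps the field list in one place, so a field such as `publishedAt` cannot be accepted by the constructor and then silently dropped.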

## Get all countries
> Get the list of countries from the `countries.csv` file.

In [48]:
df = pd.read_csv('countries.csv')
countries =[
'ae','ar','at','au','be','bg', 'br',
'ca','ch','cn','co','cu','cz', 'de', 'eg', 'fr', 'gb',
'gr','hk', 'hu', 'id','ie','il','in','it','jp','kr','lt','lv',
'ma','mx','my','ng','nl','no','nz','ph','pl','pt','ro','rs','ru','sa','se','sg','si','sk',
'th','tr','tw','ua','us','ve','za'
]
countries[:2]

['ae', 'ar']

In [71]:
countries = ['za', 'us', 'gb', 'eg', 'fr', 'rs', 'gr', 'ca', 'ng', 'ru','sa','cn','co','no']

In [72]:
len(countries)

14

In [73]:
API_KEY = "bb90b68b8d384ef78bc42501584aea44"
categories = ["business","entertainment","general","health","science","sports","technology"]
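
Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. One possible alternative, sketched here, is to read it from an environment variable. The variable name `NEWS_API_KEY` and the helper `load_api_key` are assumptions, not something the API requires.

```python
import os


def load_api_key(env_var="NEWS_API_KEY"):
    """Return the API key stored in the given environment variable."""
    key = os.environ.get(env_var)
    if not key:
        # Fail early with a clear message instead of sending an empty key
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

In the notebook this would be used as `API_KEY = load_api_key()` after exporting the variable in the shell that starts Jupyter.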

In [74]:
END_POINT = "https://newsapi.org/v2/top-headlines?country=us&category=business&apiKey=150acdfbe4964f5b94b9c5fab701191b&pageSize=100"
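
The endpoint is a base URL followed by four query parameters (`country`, `category`, `apiKey`, `pageSize`). As a rough sketch, the same URL can be assembled from a dictionary so that each parameter is escaped properly. `build_endpoint` is a hypothetical helper and `YOUR_API_KEY` is a placeholder.

```python
from urllib.parse import urlencode

BASE_URL = "https://newsapi.org/v2/top-headlines"


def build_endpoint(country, category, api_key, page_size=100):
    """Return the top-headlines URL for one country/category pair."""
    query = urlencode({
        "country": country,
        "category": category,
        "apiKey": api_key,
        "pageSize": page_size,
    })
    return f"{BASE_URL}?{query}"


print(build_endpoint("us", "business", "YOUR_API_KEY"))
```

`requests.get` also accepts a `params` dictionary and performs the same encoding, which avoids building the query string by hand.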

## Now let's fetch data and create a list of news

In [76]:
news = []

# One request per (category, country) pair: 7 x 14 = 98 requests in total
for category in categories:
    for country in countries:
        END_POINT = f"https://newsapi.org/v2/top-headlines?country={country}&category={category}&apiKey={API_KEY}&pageSize=100"
        res = requests.get(END_POINT)
        # An error response (bad key, rate limit) has no "articles" key, so fall back to an empty list
        data = res.json().get("articles", [])
        for article in data:
            news_dict = {
                "category": category.upper(),
                "country_code": country,
                'urlToImage': article['urlToImage'],
                'author': article['author'],
                'title': article['title'],
                'description': article['description'],
                'url': article['url'],
                'publishedAt': article['publishedAt'],
                'content': article['content']
            }
            news.append(news_dict)
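
The loop assumes every response contains an `articles` list. The API reports failures in the JSON body with a `status` of `"error"` plus a `code` and `message`, so a small check can surface those instead of hiding them. This is a sketch only: `extract_articles` is a hypothetical helper and the payloads below are made-up examples.

```python
def extract_articles(payload):
    """Return the article list from a decoded response, or raise on an API error."""
    if payload.get("status") != "ok":
        raise ValueError(
            f"News API error {payload.get('code')}: {payload.get('message')}"
        )
    return payload.get("articles", [])


ok_payload = {"status": "ok", "totalResults": 1, "articles": [{"title": "Example headline"}]}
print(len(extract_articles(ok_payload)))
```

In the loop above this would be called as `extract_articles(res.json())`, so that a bad key or an exhausted quota stops the run instead of producing a short dataset.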

In [77]:
len(news)

5279

> Now we have `5279` news articles from 14 different countries and 7 different categories.
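
To see how the articles are spread across categories and countries, the dictionaries can be tallied with `collections.Counter`. The sketch below uses a tiny made-up `sample_news` list in place of the real `news` list, so the counts shown are illustrative only.

```python
from collections import Counter

# Stand-in for the `news` list built above (made-up rows)
sample_news = [
    {"category": "BUSINESS", "country_code": "za"},
    {"category": "BUSINESS", "country_code": "us"},
    {"category": "SPORTS", "country_code": "za"},
]

per_category = Counter(item["category"] for item in sample_news)
per_country = Counter(item["country_code"] for item in sample_news)

print(per_category.most_common())
print(per_country.most_common())
```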

In [78]:
news[0]

{'category': 'BUSINESS',
 'country_code': 'za',
 'urlToImage': 'https://cdn.24.co.za/files/Cms/General/d/6839/a8b85dc80ca641c78aa261c139ce8da5.jpg',
 'author': None,
 'title': 'CEO of $2bn start-up ousted for microdosing LSD at work - News24',
 'description': '',
 'url': 'https://www.news24.com/fin24/economy/world/ceo-of-2bn-start-up-ousted-for-microdosing-lsd-at-work-20210428',
 'publishedAt': '2021-04-28T07:15:39Z',
 'content': 'Marketing startup Iterable dismissed its chief executive officer over violations of company policy, Iterable said in a note to employees on Monday.\xa0\r\nThe fired CEO, Justin Zhu, said the board’s chief… [+1553 chars]'}

### Save News
> We are going to save our news as `news_categories.csv`

In [79]:
path_name = "news_categories.csv"
keys = news[0].keys()
keys

dict_keys(['category', 'country_code', 'urlToImage', 'author', 'title', 'description', 'url', 'publishedAt', 'content'])

In [91]:
with open(path_name, 'w', newline='', encoding="utf-8") as writer:
    dict_writer = csv.DictWriter(writer, keys)
    dict_writer.writeheader()
    dict_writer.writerows(news)

print("A NEWS csv FILE HAS BEEN CREATED!!")

A NEWS csv FILE HAS BEEN CREATED!!
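
A quick way to check the saved file is to read it back. The sketch below writes a one-row sample to a temporary directory and reloads it with `csv.DictReader`. The sample row and the file name `news_sample.csv` are made up for illustration.

```python
import csv
import os
import tempfile

rows = [{"category": "BUSINESS", "title": "Example headline", "author": None}]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "news_sample.csv")

    # Write the rows the same way as the cell above
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)

    # Read them back; every value comes back as a string
    with open(path, newline="", encoding="utf-8") as f:
        loaded = list(csv.DictReader(f))

print(loaded[0])
```

Note that `None` values such as a missing `author` are written as empty fields and come back as empty strings, which matters for the cleaning step.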


> We are done fetching the data. Next, we need to clean it.