<a href="https://colab.research.google.com/github/jgamel/learn_n_dev/blob/python_web_scrapping/GoogleNews_example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Google News

PyGoogleNews, created by the NewsCatcher Team, acts like a Python wrapper for Google News or an unofficial Google News API. It is based on one simple trick: it exploits a lightweight Google News RSS feed.

What data points can it fetch for you?

* Top stories
* Topic-related news feeds
* Geolocation-specific news feed
* An extensive query-based search feed

Mount Drive:

In [3]:
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)

Mounted at /content/gdrive


In [4]:
import sys
sys.path.append('/content/gdrive/My Drive/Colab Notebooks')

### Example 1:

Pull Top Headlines from Google News

In [3]:
from pygooglenews import GoogleNews
import json
import time

gn = GoogleNews()
top = gn.top_news()

entries = top["entries"]
count = 0
for entry in entries:
  count = count + 1
  print(
    str(count) + ". " + entry["title"] + entry["published"]
  )
  time.sleep(0.25)

1. One million refugees flee Ukraine as Russia escalates bombardment of key cities - CNNThu, 03 Mar 2022 14:21:00 GMT
2. Former Illinois House Speaker Michael Madigan Charged With Racketeering - NBC ChicagoThu, 03 Mar 2022 03:21:55 GMT
3. Jan. 6 Committee Lays Out Potential Criminal Charges Against Trump - The New York TimesThu, 03 Mar 2022 12:34:00 GMT
4. Russian forces seize key Ukrainian port, pressure others - The Associated Press - en EspañolThu, 03 Mar 2022 13:23:45 GMT
5. Russian aircraft now banned from US airspace - NPRThu, 03 Mar 2022 04:57:00 GMT
6. Putin's War in Ukraine Has Russian Oligarchs Losing Power—and Billions - BloombergThu, 03 Mar 2022 08:57:00 GMT
7. Adopted parents charged with killing toddlers months before reporting them missing - New York PostThu, 03 Mar 2022 05:40:00 GMT
8. Santa Fe police officer, another driver killed in I-25 chase - Santa Fe New MexicanThu, 03 Mar 2022 02:33:12 GMT
9. Ex-cop says he 'absolutely' did nothing wrong in Breonna Taylor raid | 

The code above shows how you can extract certain data points from the top news articles in the Google News RSS feed. You can replace “gn.top_news()” with “gn.topic_headlines('business')” to get the top headlines related to “Business”, or with “gn.geo_headlines('San Fran')” to get the top news for the San Francisco region.
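
One detail in the Example 1 output is that each title runs straight into its timestamp, because the two strings are concatenated without a separator. The sketch below shows one way to format an entry more readably. `format_entry` is a hypothetical helper, not part of pygooglenews, and the sample entry is made up for illustration; it only assumes that entries behave like dictionaries, as the indexing in Example 1 suggests.

```python
def format_entry(position, entry, fields=("title", "published")):
    """Return one numbered line with the chosen fields separated by ' | '.

    Hypothetical helper for illustration; not part of pygooglenews.
    """
    values = [str(entry.get(field, "")) for field in fields]
    return f"{position}. " + " | ".join(values)


# Made-up entry that mimics the keys used in Example 1.
sample_entry = {
    "title": "Example headline - Example Source",
    "published": "Thu, 03 Mar 2022 14:21:00 GMT",
}

for position, entry in enumerate([sample_entry], start=1):
    print(format_entry(position, entry))
```

With a real feed, the same loop could iterate over `top["entries"]` instead of the sample list.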

You can also use complex queries such as “gn.search('boeing OR airbus')” to find news articles mentioning Boeing or Airbus or “gn.search('boeing -airbus')” to find all news articles that mention Boeing but not Airbus.
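
When queries are assembled from lists of terms, it may help to build the string in one place. The following is a minimal sketch, assuming only the `OR` and `-term` syntax shown above; `build_query` is a hypothetical helper and not part of pygooglenews.

```python
def build_query(any_of=(), exclude=()):
    """Compose a search string from alternative terms and excluded terms.

    Hypothetical helper for illustration; it only joins strings and
    does not contact Google News.
    """
    parts = []
    if any_of:
        # Alternatives are joined with the OR operator.
        parts.append(" OR ".join(any_of))
    # Each excluded term is prefixed with a minus sign.
    parts.extend(f"-{term}" for term in exclude)
    return " ".join(parts)


print(build_query(any_of=["boeing", "airbus"]))            # boeing OR airbus
print(build_query(any_of=["boeing"], exclude=["airbus"]))  # boeing -airbus
```

The resulting string can then be passed to `gn.search(...)` in place of a hand-written query.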

For every news entry you capture with this library, you get the following data points, which you can use for data processing, training a machine learning model, or running NLP scripts:

* Title - contains the Headline for the article
* Link - the original link for the article
* Published - the date on which it was published
* Summary - the article summary
* Source - the website on which it was published
* Sub-Articles - list of titles, publishers, and links that are on the same topic
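
As a rough sketch, the fields listed above could be flattened into plain dictionaries for later processing. The key names `title`, `link`, and `published` appear in the examples in this notebook; `summary`, `source` (assumed to be a mapping with a `title` key), and `sub_articles` are assumptions about how the remaining fields are named, so check them against a real entry before relying on them. `entry_to_record` is a hypothetical helper, and the sample entry is made up.

```python
def entry_to_record(entry):
    """Flatten one feed entry into a plain dict with six fields.

    Hypothetical helper; missing keys fall back to empty values.
    """
    source = entry.get("source") or {}
    return {
        "title": entry.get("title", ""),
        "link": entry.get("link", ""),
        "published": entry.get("published", ""),
        # Assumption: the summary is stored under "summary".
        "summary": entry.get("summary", ""),
        # Assumption: "source" is a mapping that carries the site name in "title".
        "source": source.get("title", ""),
        # Assumption: related coverage is stored under "sub_articles".
        "sub_articles": entry.get("sub_articles") or [],
    }


# Made-up entry for illustration only.
sample_entry = {
    "title": "Example headline - Example Source",
    "link": "https://example.com/article",
    "published": "Thu, 03 Mar 2022 14:21:00 GMT",
    "source": {"title": "Example Source"},
}

record = entry_to_record(sample_entry)
print(record["source"], record["sub_articles"])
```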

We extracted only a few of the available data points, but you can extract the others as well, based on your requirements.

The following cells show the results for a topic feed, a geolocation feed, and a search query that excludes a term:

In [13]:
from pygooglenews import GoogleNews
import json
import time

gn = GoogleNews()
top = gn.topic_headlines('business')

entries = top["entries"]
count = 0
for entry in entries:
  count = count + 1
  print(
    str(count) + ". " + entry["title"] + entry["published"] + entry["link"]
  )
  time.sleep(0.25)

1. Stocks Turn Lower; Oil Briefly Tops $116 - The Wall Street JournalThu, 03 Mar 2022 15:59:00 GMThttps://news.google.com/__i/rss/rd/articles/CBMiVGh0dHBzOi8vd3d3Lndzai5jb20vYXJ0aWNsZXMvZ2xvYmFsLXN0b2Nrcy1tYXJrZXRzLWRvdy11cGRhdGUtMDMtMDMtMjAyMi0xMTY0NjI5NjQ1MtIBAA?oc=5
2. LIVE: Federal Reserve Chair Powell testifies on monetary policy before Senate committee — 3/3/22 - CNBC TelevisionThu, 03 Mar 2022 14:56:43 GMThttps://news.google.com/__i/rss/rd/articles/CBMiK2h0dHBzOi8vd3d3LnlvdXR1YmUuY29tL3dhdGNoP3Y9TC1lWnJNcHYxQ0XSAQA?oc=5
3. Wall Street praises Ford's EV plans but questions its sales and profit margin targets - CNBCThu, 03 Mar 2022 12:48:08 GMThttps://news.google.com/__i/rss/rd/articles/CBMiamh0dHBzOi8vd3d3LmNuYmMuY29tLzIwMjIvMDMvMDMvd2FsbC1zdHJlZXQtcHJhaXNlcy1mb3Jkcy1ldi1wbGFucy1xdWVzdGlvbnMtc2FsZXMtYW5kLXByb2ZpdC10YXJnZXRzLmh0bWzSAQA?oc=5
4. Volkswagen stops car production in Russia and suspends shipments - CNNThu, 03 Mar 2022 16:45:00 GMThttps://news.google.com/__i/rss/rd/artic

In [9]:
from pygooglenews import GoogleNews
import json
import time

gn = GoogleNews()
top = gn.geo_headlines('Saint Louis') 

entries = top["entries"]
count = 0
for entry in entries:
  count = count + 1
  print(
    str(count) + ". " + entry["title"] + entry["published"]
  )
  time.sleep(0.25)



1. Find a St. Louis fish fry with this 2022 map for Lent - KTVI Fox 2 St. LouisWed, 02 Mar 2022 19:44:32 GMT
2. Where car break-ins are most common and what's being done about them - KSDK.comWed, 02 Mar 2022 22:15:38 GMT
3. St. Louis alderman says police chief offered to void his traffic-stop ticket - St. Louis Post-DispatchThu, 03 Mar 2022 00:15:00 GMT
4. Gov. Parson visits St. Louis area to announce education partnership with McDonald’s - KTVI Fox 2 St. LouisWed, 02 Mar 2022 23:52:23 GMT
5. 'Oh, to be a thorn!' St. Louis artisans make patches, T-shirts to protest local book banning - St. Louis Post-DispatchThu, 03 Mar 2022 13:40:00 GMT
6. Cheers, applause: Emotional video shows St. Louis hospital thanking Navy doctors - KSDK.comWed, 02 Mar 2022 17:32:00 GMT
7. Trapped In Traffic With the Insufferable 'People's Convoy' - Riverfront TimesWed, 02 Mar 2022 14:35:54 GMT
8. Days says proposal for north St. Louis County recreation facility 'close,' but won't release details - St. Louis Post
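
The headlines above are not in chronological order. If ordering matters, the `published` strings can be parsed with the standard library and used as a sort key. This is a minimal sketch, assuming every entry carries a `published` value in the format shown in the output; `sort_by_published` is a hypothetical helper, the timestamps are copied from the output above, and the titles are placeholders.

```python
from email.utils import parsedate_to_datetime


def sort_by_published(entries, newest_first=True):
    """Return the entries sorted by their parsed 'published' timestamp.

    Hypothetical helper; assumes each entry has an RFC 2822 style date
    such as 'Wed, 02 Mar 2022 19:44:32 GMT'.
    """
    return sorted(
        entries,
        key=lambda entry: parsedate_to_datetime(entry["published"]),
        reverse=newest_first,
    )


# Timestamps taken from the output above; titles are placeholders.
sample_entries = [
    {"title": "A", "published": "Wed, 02 Mar 2022 19:44:32 GMT"},
    {"title": "B", "published": "Thu, 03 Mar 2022 00:15:00 GMT"},
    {"title": "C", "published": "Wed, 02 Mar 2022 23:52:23 GMT"},
]

for entry in sort_by_published(sample_entries):
    print(entry["published"], entry["title"])
```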

In [None]:
from pygooglenews import GoogleNews

gn = GoogleNews()
s = gn.search('russia -putin') 


for entry in s["entries"]:
    print(entry["title"])

UPS and FedEx halting shipments to Russia and Ukraine - Reuters
FIFA impose measures on Russia in response to its invasion of Ukraine - The Athletic
Factbox-Companies With Exposure to Russia - U.S. News & World Report
Ukrainian minister says Russia lost some 4300 men in invasion - Reuters
The 'unprecedented' sanctions on Russia could make war unsustainable, expert says - NPR
Analyzing the state of Russia's military - NPR
Russia continues to advance on Kyiv in attempt to topple Ukrainian government - NPR
3 ways Russia's invasion of Ukraine will impact the American economy - NBC News
U.S. banks prepare for cyber attacks after latest Russia sanctions - Reuters
SWIFT ban prevents Russia from moving money easily. It also has unintended effects - NPR
Boxing's governing organizations won't sanction any title bouts in Russia due to invasion of Ukraine - ESPN
France urges its citizens making short-term visits to Russia to leave - Reuters
Biden sanctions spare Russia's energy sector. What that m

### Example 2:

Search Google News and Save to CSV File

In [None]:
import csv
from pygooglenews import GoogleNews

gn = GoogleNews(lang='en', country='UK')

# Search for articles with "russia" in the title, published during 2022
russiasearch = gn.search('intitle:russia', helper=True, from_='2022-01-01', to_='2022-12-31')

# The feed title echoes the query that was sent
print(russiasearch['feed'].title)

for item in russiasearch['entries']:
  print(item['title'])

# Write one title per row; the with block closes the file automatically
with open('/content/gdrive/My Drive/russia_search.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['russiasearch'])  # header row

    for item in russiasearch['entries']:
        writer.writerow([item['title']])

"intitle:russia after:2022-01-01 before:2022-12-31" - Google News
Russia-Ukraine live updates: Kyiv to hold talks with Moscow - Al Jazeera English
Russian ex-official: Putin's plan is full victory by March 2 - Al Jazeera English
Russia homes in on Kyiv and Kharkiv and pushes across Black Sea coast - Financial Times
Ukraine, Russia agree to talks ‘without preconditions’: Zelenskyy - Al Jazeera English
Putin signals escalation as he puts Russia’s nuclear force on high alert - The Guardian
As Russia invades Ukraine, Iraqis remember painful war memories - Al Jazeera English
More than 2,000 arrested at anti-war protests in Russia - Al Jazeera English
Ukraine appeals for foreign volunteers to join fight against Russia - The Guardian
Two top Russian billionaires speak out against invasion of Ukraine - The Guardian
Russia’s invasion of Ukraine: List of key developments from Day 4 - Al Jazeera English
‘A global financial pariah’: how central bank sanctions could hobble Russia - Financial Times
