## **Using pygooglenews (Unofficial Google News API)**

In [1]:
from pygooglenews import GoogleNews
import json
import time

In [2]:
# create a google news object
gn = GoogleNews()
results = gn.search('Apple company latest news')
for entry in results['entries']:
    print(entry['title'])

Apple’s Alibaba A.I. Deal Provokes Washington’s Resistance - The New York Times
Universal Music Group and Apple Music announce Sound Therapy - Apple
Apple blocks Fortnite game on iPhones, video game company says - CBS News
Trump Not Happy With Apple Moving iPhone Production To India - Investor's Business Daily
Why Apple won’t find it easy to move iPhone production from India to US - Times of India
Trump says he has a ‘little problem’ with Tim Cook - CNN
Apple Pay was down — live updates on the outage - Tom's Guide
Trump tells Apple’s CEO to stop expanding iPhone production in India - TechCrunch
Trump wants Apple to make iPhones in the US. Will it ever happen? - ABC News
Apple India Manufacturing: No Change in iPhone Production Shift Plans, Says Source - Deccan Herald
Why Apple can’t afford to surrender to Trump’s latest anti-India diktat - The Economic Times
India may gain from Apple’s Inc exit if it spurs deeper manufacturing push: GTRI | Company Business News - Mint
Trump wants Apple
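
The headlines printed above follow a `Headline - Publisher` pattern. Here is a minimal sketch of separating the two parts, assuming that pattern holds for every title. `split_title` is a hypothetical helper written for this example, not part of pygooglenews.

```python
def split_title(title):
    """Split 'Headline - Publisher' into (headline, publisher).

    Splits on the last ' - ' only, so dashes inside the headline are kept.
    Returns (title, None) when no separator is present.
    """
    headline, sep, publisher = title.rpartition(" - ")
    if not sep:
        return title, None
    return headline, publisher


# Two titles copied from the output above.
titles = [
    "Apple blocks Fortnite game on iPhones, video game company says - CBS News",
    "Trump wants Apple to make iPhones in the US. Will it ever happen? - ABC News",
]
for t in titles:
    headline, publisher = split_title(t)
    print(f"{publisher}: {headline}")
```

Splitting on the last separator rather than the first keeps headlines that contain their own ` - ` intact.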

## **Using newspaper3k**
newspaper3k is a Python library that scrapes a news article given only its URL. Many of the libraries covered earlier return the article content mixed with HTML tags and other junk data. newspaper3k instead extracts the clean article text, along with a few more data points such as the title, from almost any news article on the web.

This Python web scraping library can be combined with any of the libraries above to extract the full-text body of the article.

To install, run: `pip install newspaper3k`

```query = "Apple company news site:cnn.com OR site:bbc.com OR site:reuters.com"```
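
The query above restricts results with `site:` operators joined by `OR`. As a small sketch, the same string can be assembled from a list of domains. `build_site_query` is a hypothetical helper introduced here for illustration only.

```python
def build_site_query(company_name, domains):
    """Build a search query limited to the given news domains."""
    site_filter = " OR ".join(f"site:{domain}" for domain in domains)
    return f"{company_name} company news {site_filter}"


query = build_site_query("Apple", ["cnn.com", "bbc.com", "reuters.com"])
print(query)
# Apple company news site:cnn.com OR site:bbc.com OR site:reuters.com
```

Keeping the domains in a list makes it easier to add or drop a source without editing the query string by hand.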

In [10]:
from googlesearch import search
from newspaper import Article
import time

In [None]:
# Restrict the search to news from major sites such as BBC, Reuters, and CNN
company_name = 'Apple'
query = f"{company_name} company news site:cnn.com OR site:bbc.com OR site:reuters.com"
# Get the matching URLs
urls = search(query, num = 3) # request 3 results
# traverse each url item in urls
for url in urls:
    try:
        # instantiate a newspaper3k Article object
        article = Article(url)
        # download the article
        article.download()
        # parse the downloaded article - get title, url, text and other metadata
        article.parse()
        print(f"Title : {article.title}")
        print(f"Site url : {url}")
        print(f"Content : {article.text[:200]} [...]\n\n")
        # pause between requests
        time.sleep(1)
    except Exception as e:
        # report the failure and continue with the next URL
        print('-' * 100)
        print(f"Could not process {url}: {e}")
    

In [19]:
import googlesearch
print(googlesearch.__file__)

/home/devansh-rathore/Desktop/scraping/myenv/lib/python3.12/site-packages/googlesearch/__init__.py


## **Using feedparser**
feedparser is a Python library used to parse RSS and Atom feeds. It turns an RSS or Atom XML feed into a Python dictionary-like object, making it easy to extract news headlines, links, publication dates, etc.

### What does feedparser.parse() return?
It returns a dictionary-like object with this structure:
```
{
  'feed': {...},     # Metadata about the feed itself
  'entries': [...],  # A list of articles/items
  'bozo': 0 or 1,    # Whether there was a parsing error
}

```
Each entry (article) may contain:

- `title`: Title of the article
- `link`: URL to the full article
- `summary`: Summary or snippet
- `published`: Date published
- `author`: Author name (optional)
- `media_content`: Embedded media (if any)
- `content`: Full text (rare; newspaper3k is usually used to get the full content)
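
Because several of these fields are optional, reading them with `.get()` avoids errors when a feed omits one. The sketch below uses a plain dict as a stand-in for a feed entry (feedparser entries are dictionary-like, so the same calls should apply). `summarize_entry` and the sample values are hypothetical.

```python
def summarize_entry(entry):
    """Return the common fields of a feed entry, with defaults for missing ones."""
    return {
        "title": entry.get("title", ""),
        "link": entry.get("link", ""),
        "published": entry.get("published", "unknown"),
        "author": entry.get("author", "unknown"),
    }


# A plain dict standing in for one item of feed.entries (made-up values).
sample_entry = {
    "title": "Example headline",
    "link": "https://example.com/article",
}
summary = summarize_entry(sample_entry)
print(summary["title"], "|", summary["author"])
```
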

In [1]:
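import feedparser
from newspaper import Article

# Google News RSS search feed for the query "apple"
feed_url = "https://news.google.com/rss/search?q=apple"
feed = feedparser.parse(feed_url)

# Print the first five entries, then fetch each article's text with newspaper3k
for entry in feed.entries[:5]:
    print(entry.title, entry.link)
    article = Article(entry.link)
    article.download()
    article.parse()
    # show the first 300 characters of the extracted text
    print(article.text[:300])
    print('-' * 40)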


"Don't Want You Building In India": Donald Trump To Apple CEO Tim Cook - NDTV https://news.google.com/rss/articles/CBMizwFBVV95cUxNalFoZ1l5MGRzaU4tNHphdEFEcDRvR09kcHA2d3lGSF9WY3dIT2pybnMydVljRnc2b0twWXA0N0Vmd3lWbHZod3c0cGJLV1VMbUhpcDdZRDN4TmI3SnFIU3l1eVVrcjdZUjhTVjJrNDdubkdQRkVuaXkzNzg2TklpYlE4a2pIZzhKczZUZDBkMjliZ0JwY0VVR1NEZEo2RkQxOGxjSlN5b1pyT0k3U29UcjdjUy1BVkpXRXg5eVM1OTNhR1FaTnRaeHhMMGdvNGfSAdcBQVVfeXFMTTlPTnVxYTByWV9YLW9Fc1p5aU1wTW5xTkZNaWlFc3oyS2lQWkZ6Y3dnbFVoQW5iWHV0SnduVEVRbFptR0psTEsyMVlzX3JwUUZzMnNNRVB0d0dnX3R6M1ozeW9MYWI1dFJ0QTFIR0dJVGZBcDUyclJaOWR4S1hzTjlWWkhJSXZQTzdhY3d3cWc1N1VWbmdJbXFGUXpsaDc3UmsxR2ZheWdHaUVNSndXM3ZlbTlBTlBhMGYtY0Y4NlJ0TXY1YlN4MV9nUTFXUnZFeGJvbnNKZ0E?oc=5

----------------------------------------
iPhone17, iPhone 17 Pro, iPhone 17 Pro Max launch: Check prices in India, Dubai and the USA, launch date and key specs - The Financial Express https://news.google.com/rss/articles/CBMigAJBVV95cUxQR2RzVk9FUjRzMzl3X1owb2lyN2oyWWJTYTFvS1ppbzVaOHhGVFpLNFZFWW5JUVVBcE