# Automatic News Scraping with Python, Newspaper and Feedparser

The problem we are trying to solve is extracting relevant information from news articles, such as the title, author, publish date, and main content. This information can then be used for purposes such as creating a personal news feed, analyzing trends in the news, or building a dataset for natural language processing tasks. In this article, we will look at how to use Python, along with the Newspaper and Feedparser modules, to scrape and parse news articles from various sources.

## Automatic news scraping with Python
The Newspaper module extracts and parses news articles from various sources, while the Feedparser module parses RSS feeds. RSS (Really Simple Syndication) is a web feed format that allows users and applications to access updates to websites in a standardized, computer-readable form. These updates can include blog entries, news articles, audio, video, and any other content that can be provided in a feed.
#### Required modules

In [None]:
!pip3 install newspaper3k
!pip install feedparser

### Some of the Important Methods are:
The Newspaper and Feedparser modules have several useful methods for extracting and parsing news articles:
- newspaper.build(): Builds a news source object from a given site URL.
- newspaper.Article(): Creates an article object for a given article URL.
- article.download(): Downloads the HTML of the article. It is called on an article object, not on the newspaper module itself.
- article.parse(): Parses the downloaded HTML and extracts relevant information such as the title, authors, publish date, and main content of the article.
- feedparser.parse(): Parses an RSS feed and extracts relevant information such as the title, author, publish date, and link of each entry.

Now that we have an understanding of the modules and methods we will be using, let’s look at how we can use them to scrape and parse news articles from various sources.

## Code Implementation
First, we import the required modules, newspaper and feedparser. Next, we define a function called scrape_news_from_feed() which takes a feed URL as input. Inside the function, we first parse the RSS feed using the feedparser.parse() method. This returns a dictionary-like object containing information about the feed and its entries.

For each entry in the feed, we create a newspaper article object using the newspaper.Article() constructor, passing it the link of the entry. We then download and parse the article using the article.download() and article.parse() methods, and extract the title, authors, publish date, and main content. This information is appended to a list of articles, which the function returns at the end. A second function, looking_details(), prints the extracted fields of each article.
#### Libraries

In [21]:
import newspaper
import feedparser

#### Functions

In [27]:
def scrape_news_from_feed(feed_url):
    """Download and parse every article linked from an RSS feed."""
    articles = []
    feed = feedparser.parse(feed_url)
    for entry in feed.entries:
        # create a newspaper article object
        article = newspaper.Article(entry.link)
        # download and parse the article; skip it if the download fails,
        # so that one unreachable page does not abort the whole run
        try:
            article.download()
            article.parse()
        except newspaper.ArticleException:
            continue
        # extract relevant information
        articles.append({
            'title': article.title,
            'author': article.authors,
            'publish_date': article.publish_date,
            'content': article.text
        })
    return articles


def looking_details(articles):
    # print the extracted articles
    for article in articles:
        print('Title:', article['title'], '\n')
        print('Author:', article['author'], '\n')
        print('Publish Date:', article['publish_date'], '\n')
        print('Content:', article['content'], '\n')
        print('--------------------------------------------------------------------------------------------------')

Many websites publish an RSS feed at its own URL. One way to find it is [Get RSS Feed URL](https://chromewebstore.google.com/detail/get-rss-feed-url/kfghpdldaipanmkhfpdcjglncmilendn/related?hl=en&pli=1), a **Chrome** browser extension that is *free*.
For example, we use the home page feed of the New York Times as the ***feed URL***.
#### *Note*:
The feed URL for the New York Times home page was obtained with the Get RSS Feed URL extension, as explained above. It is different from the address of the page itself, which is:

## [New York Times](http://nytimes.com/international/)

## There are also other ways to get RSS feed URLs:
- Look for the RSS icon: Some websites display an orange RSS icon to show users where they can find the RSS feed. If you can find such an icon on the website, click it. This will take you to the RSS link.
- Get the RSS feed from the page source: A reliable way to obtain the RSS link is to find it directly in the page source. To do so, go to the website of your choice and right-click anywhere on the page. Then, click on “View page source”. You can also open the page source from the keyboard, with Ctrl+U on Windows computers and Cmd+U on MacBooks. Once you are on the page source, press Ctrl+F (or Cmd+F) and type RSS in the search bar. If the website has an RSS feed, this should find it. Then, all you have to do is copy the link.
- Get the feed by typing the link: Sometimes, you cannot find the link inside the page source. If this is the case, you may be able to find the RSS feed by guessing its link. The most common RSS links are made of the main domain followed by one of the following options:

  - /feed/
  - /rss/
  - /blog/feed/
  - /blog/rss/
  - /rss.xml
  - /blog/rss.xml
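
The page-source and link-guessing approaches above can also be scripted. The sketch below uses only the Python standard library and assumes the page advertises its feed through a `<link rel="alternate">` tag in its HTML, which is a common convention but not guaranteed for every site. The names FeedLinkFinder, find_feed_links(), and guess_feed_urls() are hypothetical helpers written for this example, and the sample HTML and example.com URLs are made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types commonly used to advertise feeds in a page's <head>.
FEED_TYPES = {'application/rss+xml', 'application/atom+xml'}

# Common feed paths, taken from the list above.
COMMON_FEED_PATHS = ['/feed/', '/rss/', '/blog/feed/', '/blog/rss/',
                     '/rss.xml', '/blog/rss.xml']


class FeedLinkFinder(HTMLParser):
    """Collect href values of <link rel="alternate"> tags that point to feeds."""

    def __init__(self):
        super().__init__()
        self.feed_links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'link':
            return
        attributes = dict(attrs)
        rel = (attributes.get('rel') or '').lower().split()
        link_type = (attributes.get('type') or '').lower()
        if 'alternate' in rel and link_type in FEED_TYPES and attributes.get('href'):
            self.feed_links.append(attributes['href'])


def find_feed_links(page_html, page_url):
    """Return absolute feed URLs advertised in the given page source."""
    finder = FeedLinkFinder()
    finder.feed(page_html)
    return [urljoin(page_url, href) for href in finder.feed_links]


def guess_feed_urls(site_url):
    """Return candidate feed URLs built from the common paths."""
    return [urljoin(site_url, path) for path in COMMON_FEED_PATHS]


# Made-up page source for illustration.
SAMPLE_HTML = """<html><head>
<link rel="stylesheet" href="/style.css">
<link rel="alternate" type="application/rss+xml" title="News" href="/feed/">
</head><body></body></html>"""

print(find_feed_links(SAMPLE_HTML, 'https://example.com/news/'))
print(guess_feed_urls('https://example.com'))
```

The guessed URLs are only candidates. Each one still has to be checked, for example by passing it to feedparser.parse() and seeing whether any entries come back.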





In [28]:
feed_url = 'https://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml'
articles = scrape_news_from_feed(feed_url)

In [29]:
looking_details(articles)

Title: Before Taking Office, L.A.’s Mayor Said She Would Not Go Abroad 

Author: ['Shawn Hubler', 'Soumya Karlamangla'] 

Publish Date: 2025-01-12 00:00:00 

Content: After the first rally in her campaign for mayor of Los Angeles in 2021, Karen Bass spoke candidly about what she saw as a potential drawback to the job — a lack of world travel and involvement in global affairs.

Ms. Bass was accustomed to circling the globe as a Democratic member of Congress and of the House Foreign Affairs Committee, and had spent decades working on U.S.-Africa relations. It was one of the most absorbing parts of her political career, she told The New York Times in an interview on Oct. 17, 2021, at her home in the Baldwin Vista neighborhood of Los Angeles.

“I went to Africa every couple of months, all the time,” she said, adding, “The idea of leaving that, especially the international work and the Africa work, I was like, ‘Mmm, I don’t think I want to do that.’”

She ultimately decided that she did, te
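
If the scraped articles are meant to become a dataset, one option is to store them as JSON. The sketch below is an assumption about how that could be done, not part of the original workflow. The helper articles_to_json() is a hypothetical name, and the sample record is made up. The publish date needs special handling because it is a datetime object, which the json module cannot serialize directly.

```python
import json
from datetime import datetime


def articles_to_json(articles):
    """Serialize a list of article dictionaries to a JSON string."""
    def encode(value):
        # Convert datetime values to ISO 8601 text.
        if isinstance(value, datetime):
            return value.isoformat()
        raise TypeError(f'Cannot serialize {type(value).__name__}')

    return json.dumps(articles, default=encode, ensure_ascii=False, indent=2)


# Made-up record with the same kinds of fields the scraper extracts
# (the article body is left out for brevity).
sample_articles = [{
    'title': 'First story',
    'author': ['A. Writer'],
    'publish_date': datetime(2025, 1, 12),
}]

json_text = articles_to_json(sample_articles)
print(json_text)
```

The resulting string can be written to a file and loaded back later with json.loads(). Dates come back as plain strings.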

#### References and links:
- https://www.geeksforgeeks.org/automatic-news-scraping-with-python-newspaper-and-feedparser/
- https://help.socialbee.com/article/78-how-can-i-find-the-rss-feed-of-a-website#feed
- https://newspaper.readthedocs.io/en/latest/
- https://chromewebstore.google.com/detail/get-rss-feed-url/kfghpdldaipanmkhfpdcjglncmilendn/related?hl=en&pli=1
- https://www.nytimes.com/international/