## Automate news article collection for training the model using a web scraper

#### Import libraries for web scraping

In [1]:
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

In this example, we scrape news articles from the website **Factly**. You can substitute any site of your choice, adjusting the URL and the HTML selectors to match.

In [2]:
url = "https://factly.in/category/english/"
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req)

In [3]:
html = webpage.read().decode("utf-8")
soup = BeautifulSoup(html, "html.parser")
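
The fetch step above raises an exception if the site is slow or unreachable. Below is a minimal sketch of a more forgiving variant, using the same `Request` and `urlopen` calls. The helper name `fetch_html` and the 10-second default timeout are assumptions made for illustration and are not part of the original notebook.

```python
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10):
    """Return the decoded body of `url`, or None if the request fails.

    `fetch_html` and the default timeout are hypothetical choices for this sketch.
    """
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urlopen(req, timeout=timeout) as response:
            # Use the charset declared by the server, falling back to UTF-8
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except OSError as exc:
        # URLError, HTTPError and socket timeouts are all OSError subclasses
        print(f"Could not fetch {url}: {exc}")
        return None
```

Calling `fetch_html(url)` with the `url` defined earlier would return the same HTML string as `webpage.read().decode("utf-8")`, provided the page is served as UTF-8.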

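Inspecting the page's HTML shows that each article link sits in the `<a>` of an `<h2>` inside an `<article>` element, all within the container with class `col-8 main-content`. The logic below uses that structure to collect the article URLs.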

In [4]:
main_content = soup.find(class_="col-8 main-content")
articles = [article.find("h2").find("a").get("href") for article in main_content.find_all("article")]
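
The one-line extraction assumes every `<article>` contains an `<h2>` with a link, and fails with an `AttributeError` otherwise. The sketch below shows one possible defensive version of the same lookup. The function name `extract_article_links` and the sample HTML are hypothetical and only mimic the structure described above; they are not taken from the Factly page.

```python
from bs4 import BeautifulSoup


def extract_article_links(soup):
    """Collect article URLs, skipping entries without a heading link."""
    container = soup.find(class_="col-8 main-content")
    if container is None:
        return []
    links = []
    for article in container.find_all("article"):
        heading = article.find("h2")
        anchor = heading.find("a") if heading else None
        href = anchor.get("href") if anchor else None
        if href:
            links.append(href)
    return links


# Hypothetical markup that imitates the listing page layout
sample_html = """
<div class="col-8 main-content">
  <article><h2><a href="https://example.com/a">A</a></h2></article>
  <article><h2>No link here</h2></article>
  <article><p>No heading</p></article>
  <article><h2><a href="https://example.com/b">B</a></h2></article>
</div>
"""
sample_links = extract_article_links(BeautifulSoup(sample_html, "html.parser"))
print(sample_links)
```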

In [5]:
articles

['https://factly.in/video-showing-golden-lord-meenakshi-idol-at-chemmanur-international-jewellers-shared-as-from-madurai-meenakshi-temple/',
 'https://factly.in/this-times-business-newspaper-clipping-alleging-losses-in-5g-auction-is-an-edited-one/',
 'https://factly.in/no-amazon-has-not-launched-bitcoinprime-the-link-given-in-this-post-is-fraudulent/',
 'https://factly.in/rishi-sunak-is-an-indian-origin-british-mp-who-right-now-is-just-a-contender-to-become-the-next-pm-of-the-uk/',
 'https://factly.in/the-shivalingam-seen-in-this-video-is-located-in-malaysia-not-tamil-nadu/',
 'https://factly.in/an-old-video-of-jaipur-floods-is-being-shared-as-that-of-recent-jodhpur-floods/',
 'https://factly.in/old-video-of-justin-trudeau-before-he-became-the-pm-of-canada-shared-as-him-living-now-without-security/',
 'https://factly.in/ram-nath-kovind-has-not-made-these-statements-praising-mahatma-gandhi-and-the-congress-party/',
 'https://factly.in/ruchi-soyas-debt-of-%e2%82%b92212-crores-was-written

In [6]:
for article_url in articles:
    # Skip links marked as reviews, data stories, or explainers
    if "review:" not in article_url and "data:" not in article_url and "explainer:" not in article_url:
        # Fetch and parse the individual article page
        article_req = Request(article_url, headers={'User-Agent': 'Mozilla/5.0'})
        article_page = urlopen(article_req)
        article_html = article_page.read().decode("utf-8")
        article_soup = BeautifulSoup(article_html, "html.parser")
        # The article body is in the element with class "post-content-right"
        article_content = article_soup.find(class_ = "post-content-right").get_text()
        # The third <strong> in the first <blockquote> is read as the article's nature
        article_nature = article_soup.find("blockquote").find_all("strong")[2].get_text()
        print(f"Article: \n{article_content}\n")
        print(f"Nature of the article: {article_nature}")
        print("\n\n")
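
The loop above depends on every article having a `post-content-right` element and a `<blockquote>` with at least three `<strong>` tags. As a hedged sketch, the same lookups can be wrapped in a function that returns `None` for whatever is missing instead of raising. The name `parse_article` and the sample markup are assumptions for illustration only; the real page layout may differ.

```python
from bs4 import BeautifulSoup


def parse_article(article_html):
    """Return the article text and its nature, using None for missing parts."""
    article_soup = BeautifulSoup(article_html, "html.parser")

    body = article_soup.find(class_="post-content-right")
    content = body.get_text() if body else None

    # Same rule as the loop: third <strong> inside the first <blockquote>
    quote = article_soup.find("blockquote")
    strong_tags = quote.find_all("strong") if quote else []
    nature = strong_tags[2].get_text() if len(strong_tags) > 2 else None

    return {"content": content, "nature": nature}


# Hypothetical markup imitating an article page
sample_article = """
<div class="post-content-right"><p>Body text.</p>
<blockquote><p><strong>Claim:</strong> X. <strong>Fact:</strong> Y.
Hence the claim is <strong>MISLEADING</strong>.</p></blockquote></div>
"""
parsed = parse_article(sample_article)
incomplete = parse_article("<div class='post-content-right'>Only text</div>")
print(parsed["nature"], incomplete["nature"])
```

Returning a dictionary per article also makes it straightforward to gather the results in a list rather than printing them, which may suit later use as training data.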

Article: 
A video is being shared on social media claiming it as the visuals of the Madurai Meenakshi idol decorated in golden attire. This post claims that golden attire will be offered to the Madurai Meenakshi idol once a year in the Arulmigu Meenakshi Sundareshwarar Temple in Madurai. Let’s verify the claim made in the post.Claim: Visuals of the Madurai Meenakshi idol when decorated with golden attire.Fact: The video shared in the post shows Madurai Meenakshi idol displayed at Chemmanur International Jewellers showroom in Madurai, Tamil Nadu. This idol was carved using 210 kg of gold and silver. This video does not show the Meenakshi idol inside the Arulmigu Meenakshi Sundareshwarar Temple in Madurai. Hence, the claim made in the post is MISLEADING. On reverse image searching the screenshots in the video, a similar picture of a golden idol was found on the Getty Images website. The description of the photo states, “Idol of Goddess Madurai Meenakshi Amman made from 210kg of pure gold

Article: 
A post accompanied by a video showing a Shivalingam is being shared on social media. The post claims that the Shivalingam is located in Tamil Nadu. It further claims that throughout the year, water from the Sky falls only on the Shivalingam, leaving its surroundings. Let’s fact-check these claims in this article.Claim: Water from the Sky falls only on this Shivalingam throughout the year, and this is located in Tamil NaduFact: The Shivalingam seen in the video is located in Malaysia. It does not rain throughout the year on the Shivalingam. Hence the claims made in the post are MISLEADING.A reverse image search with the screenshots of the video led us to a few YouTube videos. One of these videos is the same as the one in the viral post, and its title, written in Tamil, mentions the location as Cork highway, Malaysia. Another video showing the exact Shivalingam titled Shivan Temple (Bentong Karak Old Road) was also found.We searched on the internet with these clues and relevant

Article: 
A post is being widely shared on social media claiming that the loan of  ₹2,212 crores taken by Baba Ramdev’s company Ruchi Soya was waived off. Let’s fact-check the claim made in the post.Claim: Ramdev baba owned Ruchi soya company’s debt of ₹2,212 crores was waived off.Fact: According to the RBI report, Ruchi soya’s debt of ₹2,212 crores were technically written off, which means, the lending banks clean up the bad loans from their balance sheet. However, the loan account still stays to continue with the lending bank as they can try to recover it later. Moreover, in December 2017, Ruchi Soya Industries entered the Corporate Insolvency Resolution Process because of its total debt of about ₹12,000 crores. In December 2019, Patanjali Ayurved, co-founded by Baba Ramdev acquired the debt ridden Ruchi Soya for ₹4,350 crores. Hence, the claim made in the post is MISLEADING.We we searched for the list of the ‘Top 50 Wilful defaulters in India’ and found the details in the Twitter ac