Web scraping using Beautiful Soup.
Target - Kitco.com

In [116]:
import requests
from bs4 import BeautifulSoup

Load the news section and get the content of "Kitco Latest News".
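The cell below calls `requests.get` with no timeout and no status check. If the server hangs, the notebook hangs too. If it returns an error page, that page gets parsed as if it were news. The helper below is a minimal sketch that adds a timeout, a status check and a short pause between requests. It assumes a shared `requests.Session` with a custom `User-Agent` is acceptable to the site. `make_session`, `fetch`, the delay and the timeout values are hypothetical illustrations, not part of the original notebook.

```python
import time

import requests


def make_session(user_agent):
    """Create a reusable session that sends the given User-Agent header (hypothetical helper)."""
    session = requests.Session()
    session.headers.update({'User-Agent': user_agent})
    return session


def fetch(session, url, delay=1.0, timeout=10):
    """Download `url` after a short pause and return the raw bytes (hypothetical helper)."""
    time.sleep(delay)                            # be polite between consecutive requests
    response = session.get(url, timeout=timeout)
    response.raise_for_status()                  # 4xx/5xx -> requests.HTTPError
    return response.content
```

With these helpers the first download would read `coverpage = fetch(make_session('my-scraper/0.1'), 'https://www.kitco.com/news/')`. Reusing one session also keeps the connection alive across the per-article requests further down.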

In [117]:
coverpage = requests.get('https://www.kitco.com/news/').content
soup = BeautifulSoup(coverpage, 'html.parser')
front_news = soup.find_all('div', class_='news-info')
sec_news = soup.find_all('div', class_="news-list last-list")
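`class_="news-list last-list"` behaves differently from `class_='news-info'`. A string containing a space is compared against the exact `class` attribute value, so the order of class names matters. A single name matches any tag whose class list contains it. The self-contained sketch below demonstrates this on a small hypothetical HTML snippet. It also shows a CSS selector, which matches both classes in any order.

```python
from bs4 import BeautifulSoup

html = """
<div class="news-list last-list"><div>A</div></div>
<div class="last-list news-list"><div>B</div></div>
<div class="news-list"><div>C</div></div>
"""
soup = BeautifulSoup(html, 'html.parser')

# A string with a space must equal the whole class attribute value, in order.
exact = soup.find_all('div', class_='news-list last-list')

# A CSS selector matches every tag carrying both classes, in any order.
both = soup.select('div.news-list.last-list')

# A single class name matches any tag whose class list contains it.
single = soup.find_all('div', class_='news-list')

print(len(exact), len(both), len(single))  # 1 2 3
```

If Kitco ever reorders those class names, the exact-string lookup silently returns an empty list. `sec_news[0]` would then raise an `IndexError`. The `select` form is more tolerant of that.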

Some articles appear both in the front news (with images) and in the follow-up list. Keep only the unique ones and collect their URLs.

In [119]:
unique_articles = []

for elem in front_news + sec_news[0].find_all('div', class_=""):
    # Each item wraps its headline link in <span class="title"><a href="...">.
    link = elem.find_all('span', class_='title')[0].find_all('a')[0]
    unique_articles.append((link.text, link['href']))

# Drop duplicates while keeping the order in which articles appear on the page.
unique_articles = list(dict.fromkeys(unique_articles))
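The loop above assumes every item contains a `span.title` with a link. A single item without one raises an `IndexError`. The sketch below is a more defensive variant. It skips such items, resolves each `href` against the site root with `urljoin`, and deduplicates in first-seen order. The HTML snippet, the `/news/a.html` paths and the `extract_links` name are hypothetical stand-ins for Kitco's real markup.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = 'https://www.kitco.com'

# Hypothetical markup mimicking the structure the notebook relies on:
# every item wraps its headline link in <span class="title"><a href="...">.
html = """
<div class="news-info"><span class="title"><a href="/news/a.html">Story A</a></span></div>
<div class="news-info"><span class="title"><a href="/news/b.html">Story B</a></span></div>
<div><span class="title"><a href="/news/a.html">Story A</a></span></div>
<div><span class="title">No link here</span></div>
"""


def extract_links(items, base_url=BASE_URL):
    """Return unique (title, absolute_url) pairs in first-seen order."""
    pairs = []
    for item in items:
        link = item.select_one('span.title a[href]')
        if link is None:
            continue  # skip items with no usable headline link
        pairs.append((link.get_text(strip=True), urljoin(base_url, link['href'])))
    # dict.fromkeys keeps the first occurrence of each pair and preserves order.
    return list(dict.fromkeys(pairs))


soup = BeautifulSoup(html, 'html.parser')
articles = extract_links(soup.find_all('div'))
for title, url in articles:
    print(title, url)
```

Resolving URLs at extraction time means the duplicate check compares absolute URLs. A relative and an absolute link to the same story would then collapse into one entry.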

Finally, read the articles one by one, storing each one's URL, title, description (if any), publication time, modification time, and content.
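The code below indexes `soup.find(...)['content']` directly. That works while every page carries every meta tag. If a page lacks one, `find` returns `None` and the loop stops with a `TypeError`. The sketch below is a minimal, hedged alternative that falls back to an empty string instead. `get_meta` is a hypothetical helper, and the sample HTML is invented to show a page without `og:description`.

```python
from bs4 import BeautifulSoup


def get_meta(soup, prop, default=''):
    """Return the `content` of <meta property=prop>, or `default` if the tag or attribute is missing."""
    tag = soup.find('meta', attrs={'property': prop})
    if tag is None:
        return default
    return tag.get('content', default).strip()


# Hypothetical article page: it has a title but no og:description.
html = """
<html><head>
<meta property="og:title" content="Sample headline">
</head><body><article itemprop="articleBody"><p>Body text.</p></article></body></html>
"""
soup = BeautifulSoup(html, 'html.parser')

title = get_meta(soup, 'og:title')
descr = get_meta(soup, 'og:description')   # '' instead of a TypeError on None
body = soup.find('article', attrs={'itemprop': 'articleBody'})
text = body.get_text(strip=True) if body is not None else ''
print(repr(title), repr(descr), repr(text))
```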

In [120]:
from datetime import datetime
from urllib.parse import urljoin

# Container class for storing news information.
class News_Piece():
    def __init__(self, url, title, descr, time_post, time_mod, text):
        self.url = url
        self.title = title
        self.description = descr
        self.time_post = time_post
        self.time_mod = time_mod
        self.text = text.strip('\n')  # remove leading/trailing newlines

scrapped_news = []

for _, href in unique_articles:
    # urljoin handles both relative ('/news/...') and absolute hrefs.
    url = urljoin('https://www.kitco.com', href)

    content = requests.get(url).content
    soup = BeautifulSoup(content, 'html.parser')

    # Article metadata is read from Open Graph / article:* meta tags.
    title = soup.find('meta', {'property' : 'og:title'})['content']
    descr = soup.find('meta', {'property' : 'og:description'})['content']
    # [:-5] trims the last 5 characters, a suffix fromisoformat() cannot parse.
    t_pub = soup.find('meta', {'property' : 'article:published_time'})['content'][:-5]
    t_mod = soup.find('meta', {'property' : 'article:modified_time'})['content'][:-5]
    text = soup.find('article', {'itemprop' : 'articleBody'}).text

    # Store publication and modification times separately.
    scrapped_news.append(News_Piece(url, title, descr, t_pub, t_mod, text))

    print('---Title--- ', title)
    print('---Description---', descr)
    print('---Publication time---', datetime.fromisoformat(t_pub), '\n')

---Title---  Sanders narrowly wins New Hampshire Democratic primary, Biden lags badly
---Description---   
---Publication time--- 2020-02-12 04:38:00 

---Title---  RBC's Gero: gold eases but look for bargain hunting on big price dips
---Description--- Market Nuggets compiles all of the day's top expert analysis on the gold market. They offer a short synopsis of bank forecasts, and the outlook of famed economists like Dennis Gartman and Nouriel Roubini. Kitco Nuggets are constantly updated throughout the day. 
---Publication time--- 2020-02-11 13:56:00 

---Title---  Gold bulls hanging tough in face of rallying global stock markets
---Description--- Senior Technical Analyst Jim Wyckoff prepares investors with an overview of how the markets opened and closed. What moved metal prices? How do the technicals look? By looking at important developments 
---Publication time--- 2020-02-12 18:22:00 

---Title---  Palladium deficit expected to widen in 2020 - Johnson Matthey
---Description---  
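The loop drops the last five characters of each timestamp with `[:-5]` before calling `datetime.fromisoformat`. This only works while the unparseable suffix is exactly five characters long, and it discards the timezone. The sketch below keeps the timezone instead. It assumes the meta tags hold ISO 8601 values ending in either `Z` or a `+HH:MM`/`-HH:MM` offset. The notebook never shows the raw string Kitco emits, so `parse_meta_time` and the sample value are hypothetical.

```python
from datetime import datetime, timezone


def parse_meta_time(value):
    """Parse an ISO 8601 timestamp into a timezone-aware datetime.

    Assumes the value ends in 'Z' or a '+HH:MM' / '-HH:MM' offset.
    fromisoformat() accepts 'Z' only from Python 3.11, so it is
    rewritten as '+00:00' first for older interpreters.
    """
    value = value.strip()
    if value.endswith('Z'):
        value = value[:-1] + '+00:00'
    return datetime.fromisoformat(value)


t_pub = parse_meta_time('2020-02-12T04:38:00-05:00')  # hypothetical sample value
print(t_pub)                             # 2020-02-12 04:38:00-05:00
print(t_pub.astimezone(timezone.utc))    # 2020-02-12 09:38:00+00:00
```

Timezone-aware values can be compared and sorted correctly across articles. Naive times stripped of their offsets cannot be compared reliably if the site ever mixes offsets.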
