#A.I.N.A. 1.4

####**Artificially Intelligent News Anchor (A.I.N.A.)** is an AI that fetches current news, summarizes it, and reads it aloud. The inspiration behind AINA came from my desire to keep up with financial news without having to browse many different sources. Additionally, the news sources and podcasts I tried that covered financial news did not give me the kind of information I wanted. So I decided to create AINA, a way to quickly consume financial news in one place.

**Version 1.4 updates:**


 Added functions that can:
  * Get the latest news and return it
  * Scrape for general news and return it
  * Get news related to the specific tickers passed to the function

Fixes for errors such as:
  * Duplicated articles
  * Articles from websites that require a subscription (these are now ignored)
  * 'Sign in to your portfolio' rambling (which is related to the subscription error)

This version can now return the full articles and titles of news hosted on *Yahoo Finance*.
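
The two fixes listed above (duplicates and subscription-only pages) can be illustrated with a minimal sketch. This is not AINA's actual code: the helper names `dedupe_keep_order` and `drop_paywalled` are hypothetical, and treating the 'Sign in to your portfolio' text as a paywall marker is an assumption based on the error described above.

```python
def dedupe_keep_order(links):
    """Return links with duplicates removed, keeping first-seen order."""
    # dict preserves insertion order, so the first occurrence of each link wins
    return list(dict.fromkeys(links))


def drop_paywalled(articles, marker="Sign in to your portfolio"):
    """Keep (link, body) pairs whose body does not contain the marker text."""
    # Assumption: a body containing the marker came from a subscription page
    return [(link, body) for link, body in articles if marker not in body]


links = dedupe_keep_order(["a.html", "b.html", "a.html"])
articles = drop_paywalled([
    ("a.html", "Shares rose on Thursday."),
    ("b.html", "Sign in to your portfolio to read more."),
])
print(links)
print(articles)
```

Keeping the first-seen order matters here because the scraped pages list the newest stories first.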



---


**Cautions:**


*   Call the functions with reasonable arguments. For example, take care when requesting more than 8 general news articles: it should work, but it could also cause an error.
*   The same caution applies to the related news function. It should be able to handle a long list of tickers, but it's advised to pass only the tickers you're interested in.
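
One way to follow these cautions is to normalize the arguments before calling AINA. The sketch below is only an illustration: `prepare_request` and `MAX_GENERAL_ARTICLES` are hypothetical names that do not exist in the class, and the limit of 8 is simply the threshold mentioned in the first caution.

```python
MAX_GENERAL_ARTICLES = 8  # assumption: threshold taken from the caution above


def prepare_request(general_amt, tickers):
    """Clamp the article count and remove blank or repeated tickers."""
    safe_amt = max(0, min(general_amt, MAX_GENERAL_ARTICLES))
    # Upper-case and strip each ticker, then drop repeats while keeping order
    cleaned = (t.strip().upper() for t in tickers if t.strip())
    unique_tickers = list(dict.fromkeys(cleaned))
    return safe_amt, unique_tickers


amt, tickers = prepare_request(12, ["aapl", "MSFT", "AAPL", " "])
print(amt, tickers)
```

The returned values could then be passed on as `aina.call_news_functions(amt, tickers, True)`.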





##Constructing A.I.N.A

In [1]:
!pip install gtts

Collecting gtts
  Downloading gTTS-2.5.2-py3-none-any.whl.metadata (4.1 kB)
Downloading gTTS-2.5.2-py3-none-any.whl (29 kB)
Installing collected packages: gtts
Successfully installed gtts-2.5.2


In [2]:
import pandas as pd
import numpy as np
import yfinance as yf

from copy import copy

import requests

# Web scraping
from bs4 import BeautifulSoup

#Transformer
from transformers import pipeline

# Audio
from gtts import gTTS
from IPython.display import Audio, display

In [3]:
class AINA:

  def __init__(self) -> None:
    pass

#-------------------------------------------------------------------------------------------------#

  def get_story(self, urls):

    titles = []
    body = []

    for index, url in enumerate(urls):
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}

        response = requests.get(url, headers=headers)

        # Fetch the article
        html_content = response.text

        # Parse the HTML content
        soup = BeautifulSoup(html_content, 'html.parser')

        # Extract the article title from the <title> tag (used instead of <h1>)
        title_tag = soup.find('title')
        if title_tag:
            titles.append(title_tag.get_text())
        else:
            titles.append("No title found")

        # Extract the main content of the article
        article_body = soup.find_all('p')
        body_content = [paragraph.get_text() for paragraph in article_body[:-3]]
        body.append(' '.join(body_content) if body_content else "No content found")

    return titles, body

#-------------------------------------------------------------------------------------------------#

  def latest_news(self, latest=True):
    article_location = 0

    if not latest:
        return [], [], []

    # Define headers
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
    }

    # Get the latest news page
    response = requests.get('https://finance.yahoo.com/topic/latest-news/', headers=headers)
    if response.status_code != 200:
        return [], [], []

    # Parse the HTML content
    soup = BeautifulSoup(response.text, 'html.parser')

    # Extract relevant article links
    article_links = [a.get('href') for a in soup.find_all('a', href=True) if '/news/' in a.get('href') and '.html' in a.get('href')]
    unique_links = []
    for link in article_links:
      if link not in unique_links:
        unique_links.append(link)

    # Fetch article details
    titles, bodies = [], []
    for url in unique_links:
        article_response = requests.get(url, headers=headers)
        if article_response.status_code != 200:
            titles.append("No title found")
            bodies.append("No content found")
            continue

        article_soup = BeautifulSoup(article_response.text, 'html.parser')

        # Extract title
        title_tag = article_soup.find('title')
        titles.append(title_tag.get_text() if title_tag else "No title found")

        # Extract body content
        article_body = article_soup.find_all('p')
        body_content = [paragraph.get_text() for paragraph in article_body[:-3]]
        bodies.append(' '.join(body_content) if body_content else "No content found")


    # Return the first article whose body was actually retrieved
    search = 'No content found'
    for link, title, body in zip(unique_links, titles, bodies):
        if body != search:
            return [link], [title], [body]

    # No article with usable content (or no links at all)
    return [], [], []

#-------------------------------------------------------------------------------------------------#

  def general_news(self, article_amount):
    links = []

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
    }

    response = requests.get('https://www.yahoo.com/', headers=headers)
    if response.status_code != 200:
        return [], [], []

    # Fetch the article
    html_content = response.text

    # Parse the HTML content
    soup = BeautifulSoup(html_content, 'html.parser')

    # Extract the main content of the article
    article_body = soup.find_all('a', href=True)

    # Filter out the relevant links
    article_links = [link.get('href') for link in article_body if '/news/' in link.get('href')]
    specific_links = ['https://www.yahoo.com/news/' + link for link in article_links if '.html' in link]

    # Remove duplicates while keeping the original order
    links = list(dict.fromkeys(specific_links))
    # Limit to the requested number of articles
    links = links[:article_amount]

    # Fetch the articles and their titles
    titles = []
    body = []

    for url in links:
        response = requests.get(url, headers=headers)
        if response.status_code != 200:
            titles.append("No title found")
            body.append("No content found")
            continue

        html_content = response.text
        soup = BeautifulSoup(html_content, 'html.parser')

        # Extract the title of the article
        title_tag = soup.find('title')
        if title_tag:
            titles.append(title_tag.get_text())
        else:
            titles.append("No title found")

        # Extract the main content of the article
        article_body = soup.find_all('p')
        body_content = [paragraph.get_text() for paragraph in article_body]
        body.append(' '.join(body_content) if body_content else "No content found")

    return links, titles, body

#-------------------------------------------------------------------------------------------------#

  def related_news(self, tickers_list):

    links = []
    titles = []
    publisher = []

    # Collect news articles
    for tick in tickers_list:
        news = yf.Ticker(tick).news
        if news:
            links.append(news[0]['link'])
            titles.append(news[0]['title'])
            publisher.append(news[0]['publisher'])

    # Defined up front so it also exists when only one ticker is passed
    article_loc = 1

    if len(tickers_list) > 1:
      title_list = titles.copy()
      link_list = links.copy()
      publisher_list = publisher.copy()

      actual_title = list(set(titles))  # Get unique titles

      search_dup = 'Duplicate'
      target = len(titles)  # One article per ticker that actually returned news
      actual = len(actual_title)

      article_loc = 1

      while actual < target:
          title_list_copy = title_list.copy()

          for i in range(len(titles)):
              if len(title_list_copy) > 0:
                  title_list_copy.pop(0)
              else:
                  break
              if titles[i] in title_list_copy:
                  title_list[i] = search_dup

          find_dup = np.where(np.array(title_list) == search_dup)  # Find duplicates
          while search_dup in title_list:
              for i in find_dup[0]:
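                  try:
                      news = yf.Ticker(tickers_list[i]).news
                      if len(news) > article_loc:
                          title_list[i] = news[article_loc]['title']
                          link_list[i] = news[article_loc]['link']
                          publisher_list[i] = news[article_loc]['publisher']
                      else:
                          # Out of articles: a unique placeholder lets both loops terminate
                          title_list[i] = f'No more articles ({tickers_list[i]}, {i})'
                  except Exception as e:
                      print(f"Error fetching news for {tickers_list[i]}: {e}")
                      # Replace the marker so a failed fetch cannot loop forever
                      title_list[i] = f'No more articles ({tickers_list[i]}, {i})'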

          article_loc += 1
          actual = len(set(title_list))  # Recalculate unique titles
      links = link_list
      titles = title_list
      publisher = publisher_list

  ##------------------------------------------------------------------------------##

    ### This is to get the stories and titles from the links ###
    # Initialized at method level so they also exist when only one ticker is passed
    title_story = {}
    titles = []
    body = []

    for index, url in enumerate(links):
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}

        response = requests.get(url, headers=headers)

        # Fetch the article
        html_content = response.text

        # Parse the HTML content
        soup = BeautifulSoup(html_content, 'html.parser')

        # Extract the article title from the <title> tag (used instead of <h1>)
        title_tag = soup.find('title')
        if title_tag:
            titles.append(title_tag.get_text())
        else:
            titles.append("No title found")

        # Extract the main content of the article
        article_body = soup.find_all('p')
        body_content = [paragraph.get_text() for paragraph in article_body[:-3]]
        body.append(' '.join(body_content) if body_content else "No content found")

        title_story[titles[index]] = body[index]

  ##------------------------------------------------------------------------------##
  # This is to search for 'No content found' and replace it with an actual article

    search = 'No content found'
    target1 = len(body)  # Number of articles actually fetched
    actual1 = len([i for i in body if i != search])

    # Create a list in the event 'NO content found' is in body
    if actual1 < target1:
      body_copy = body.copy()
      title_copy = titles.copy()
      link_copy = links.copy()

      finder1 = np.where(np.char.startswith(body, search)) # Find where the 'NCF' is/are
      article_position = article_loc + 1 ## Can start from article_loc from previous if statement

      while actual1 < target1:
        for i in finder1[0]:
          link = [yf.Ticker(tickers_list[i]).news[article_position]['link']]
          # Separate names avoid overwriting the title_story dict built above
          story_titles, story_bodies = self.get_story(link)
          body_copy[i] = story_bodies[0]
          title_copy[i] = story_titles[0]
          link_copy[i] = link[0]

        article_position += 1
        actual1 = len([i for i in body_copy if i != search])
        # print(f'Target: {target1} | Actual: {actual1}')

      body = body_copy
      titles = title_copy
      links = link_copy
    else:
      pass
    return links, titles, body

#-------------------------------------------------------------------------------------------------#

  def call_news_functions(self, general_amt, tickers_list, latest = True):
    url_specific, title_specific, body_specific = self.latest_news(latest)
    url_news, title_news, body_news = self.general_news(general_amt)
    link_related, title_related, body_related = self.related_news(tickers_list)
    # link_finance = financial_news(financial_amt)

    title_story = {}

    # Combine the results of the three sources, keeping the same order in each list
    link = [*url_specific, *url_news, *link_related]
    body = [*body_specific, *body_news, *body_related]
    title = [*title_specific, *title_news, *title_related]

    for i in range(len(link)):
      title_story[title[i]] = body[i]

    return link, title_story

#-------------------------------------------------------------------------------------------------#

  def summarize_stories(self, stories_dict):
    summarizer = pipeline("summarization", model='facebook/bart-large-cnn')

    summarized_stories = {}
    for title, story in stories_dict.items():
        # Skip placeholder text stored when an article could not be scraped
        if story not in ("No content found", "Failed to retrieve the article."):
            # Join the paragraphs into a single string if story is a list
            if isinstance(story, list):
                story = " ".join(story)

            # Ensure the story is not too long
            story = story[:2000]  # Truncate to the first 2000 characters

            summary = summarizer(story, max_length=200, min_length=100, do_sample=False)

            if summary:
                summarized_stories[title] = summary[0]['summary_text']
            else:
                summarized_stories[title] = "Summary could not be generated."
        else:
            summarized_stories[title] = story
    return summarized_stories

#-------------------------------------------------------------------------------------------------#

  def sanitize_filename(self, title): # This gets rid of any special characters in the title, allowing the file to be saved.
    # Replace invalid characters with underscores
    return ''.join(c if c.isalnum() or c in (' ', '_') else '_' for c in title)

#-------------------------------------------------------------------------------------------------#

  def voice(self, summarized):
    # Iterate over the dictionary and convert each story to speech
    for title, story in summarized.items():
        print(f"Title: '{title}'")
        tts = gTTS(text=story, lang='en')
        # Replace invalid characters in the title for file naming

        title = self.sanitize_filename(title)
        # Save the audio file
        audio_file = f"{title}.mp3"
        tts.save(audio_file)

        # Play the audio file
        display(Audio(audio_file))
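

To show what `sanitize_filename` does to a headline before `voice` saves the audio, here is a standalone sketch of the same character rule. The `audio_filename` wrapper and its `max_len` cap are hypothetical additions for illustration; the class itself does not limit the filename length.

```python
def sanitize_filename(title):
    """Replace every character that is not alphanumeric, a space, or '_'."""
    return ''.join(c if c.isalnum() or c in (' ', '_') else '_' for c in title)


def audio_filename(title, max_len=100):
    """Hypothetical helper: sanitize the title, cap its length, add '.mp3'."""
    # Assumption: very long headlines are shortened to keep filenames manageable
    return sanitize_filename(title)[:max_len].strip() + ".mp3"


print(audio_filename("AI is creating a 'new conversation' for Meta: Analyst"))
# AI is creating a _new conversation_ for Meta_ Analyst.mp3
```

Note that two different headlines can map to the same filename once punctuation is replaced, in which case the later audio file would overwrite the earlier one.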


##Calling A.I.N.A

In [8]:
if __name__ == '__main__':
  tickers = ['AAPL', 'MSFT', 'META', '^GSPC', 'LLY']
  aina = AINA()
  link, story = aina.call_news_functions(2, tickers, True)
  summary = aina.summarize_stories(story)
  aina.voice(summary)

Title: 'Montenegro court approves extradition of cryptocurrency mogul Do Kwon to native South Korea'


Title: 'US reaches plea deal with alleged 9/11 mastermind Khalid Sheikh Mohammed'


Title: 'Huge prisoner swap takes place in Turkey after days of speculation'


Title: 'Apple, Amazon Results Are Crucial for Nasdaq 100’s Next Leg'


Title: 'Here's What Analysts Are Forecasting For Microsoft Corporation (NASDAQ:MSFT) After Its Annual Results'


Title: 'AI is creating a 'new conversation' for Meta: Analyst'


Title: 'Stock market news today: Stocks slide after weak economic data as 10-year yield falls below 4%'


Title: 'Lilly's tirzepatide successful in phase 3 study showing benefit in adults with heart failure with preserved ejection fraction and obesity'
