<a href="https://colab.research.google.com/github/Flychuban/Stocks-Crypto-Research/blob/main/Stocks_Crypto_Research.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install transformers
!pip install sentencepiece

In [2]:
from transformers import PegasusTokenizer, PegasusForConditionalGeneration
from bs4 import BeautifulSoup
import requests

In [3]:
model_name = "human-centered-summarization/financial-summarization-pegasus"
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name) 

In [4]:
url = "https://uk.finance.yahoo.com/news/d-put-2-000-tesla-043029225.html"
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
paragraphs = soup.find_all('p')

In [None]:
paragraphs

In [6]:
text = [paragraph.text for paragraph in paragraphs]
words = ' '.join(text).split(' ')[:400]
ARTICLE = ' '.join(words)
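
The word-limit step above can be wrapped in a small reusable helper. This is a minimal sketch, not part of the original notebook: the name `truncate_words` and the `max_words` parameter are hypothetical. It also uses `split()` with no argument, which collapses runs of whitespace, whereas the cell above uses `split(' ')`.

```python
def truncate_words(paragraph_texts, max_words=400):
    """Join paragraph strings and keep at most max_words words."""
    # split() with no argument splits on any whitespace and drops empty strings.
    words = ' '.join(paragraph_texts).split()
    return ' '.join(words[:max_words])


# Made-up sample paragraphs, only for illustration.
sample = ['Tesla shares are up.', 'NIO  shares are down.']
print(truncate_words(sample, max_words=5))  # Tesla shares are up. NIO
```

With the notebook's variables, this would be called as `truncate_words(text)`.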

In [7]:
ARTICLE

'RCB’s owners have focused on Virat Kohli to leverage their commercial brand Both Tesla (NASDAQ: TSLA) and NIO (NYSE: NIO) stock declined by more than 50% in value in 2022. It was a dreadful year for most growth shares, including electric vehicle (EV) companies. But what if I’d taken a contrarian stance and decided to invest £1,000 in each of these fallen stocks as a New Year gift for myself? How much would I have today? Well, Tesla shares are up a very impressive 65% so far this year. In contrast, NIO shares have declined 17% since the end of December and now sit at just under $8 per share. This means that my Tesla holding would be worth £1,650, while the value of my position in its Chinese EV rival would have fallen to £830. So, my overall investment would be worth £2,480 today. That’s a gain of 24%, which is an exceptional return after just a few months. But what about the future? Should I buy either or both stocks today? There seem to be two big reasons why Tesla stock has come bac

In [8]:
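# truncation=True clips the input to the tokenizer's maximum length so long articles don't overflow the model.
input_ids = tokenizer.encode(ARTICLE, return_tensors='pt', truncation=True)
output = model.generate(input_ids, max_length=55, num_beams=5, early_stopping=True)
summary = tokenizer.decode(output[0], skip_special_tokens=True)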

In [9]:
summary

'Tesla stock is up 65% so far this year, while NIO shares are down 17%.'

In [10]:
monitored_tickers = ['GME', 'TSLA', 'BTC']

In [11]:
def search_stock_news_urls(ticker):
  # Query Google News for Yahoo Finance coverage of the ticker.
  search_url = f'https://www.google.com/search?q=yahoo+finance+{ticker}&tbm=nws'
  r = requests.get(search_url)
  soup = BeautifulSoup(r.text, 'html.parser')
  # href=True skips anchor tags without an href, which would raise KeyError below.
  atags = soup.find_all('a', href=True)
  hrefs = [link['href'] for link in atags]
  return hrefs
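
The search URL above is built with an f-string, which works for plain tickers such as `GME`. For tickers that contain reserved characters, one option is to let the standard library do the escaping. The sketch below is an assumption-laden illustration: `build_search_url` is a hypothetical helper name, and only the query parameters already used in the notebook (`q` and `tbm`) are included.

```python
from urllib.parse import urlencode


def build_search_url(ticker):
    """Build the Google News search URL with the query string percent-encoded."""
    # urlencode escapes reserved characters and turns spaces into '+'.
    params = {'q': f'yahoo finance {ticker}', 'tbm': 'nws'}
    return 'https://www.google.com/search?' + urlencode(params)


print(build_search_url('GME'))
# https://www.google.com/search?q=yahoo+finance+GME&tbm=nws
```

For `GME` this produces the same URL as the f-string version.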

In [14]:
raw_urls = {ticker: search_stock_news_urls(ticker) for ticker in monitored_tickers}

In [None]:
raw_urls

In [16]:
import re

In [17]:
exclude_list = ['maps', 'policies', 'preferences', 'accounts', 'support']

In [18]:
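def strip_unwanted_urls(urls, exclude_list):
    val = []
    for url in urls:
        # Keep only hrefs that embed an https link and contain none of the excluded words.
        if 'https://' in url and not any(exclude_word in url for exclude_word in exclude_list):
            # Extract the embedded URL and drop any trailing '&...' query parameters.
            res = re.findall(r'(https?://\S+)', url)[0].split('&')[0]
            val.append(res)
    # De-duplicate; note that set() does not preserve order.
    return list(set(val))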
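
As an alternative to the exclude list, the links could be filtered with an allow-list on the host name, so that only Yahoo Finance articles survive. This is a sketch under stated assumptions, not the notebook's method: `extract_yahoo_urls` is a hypothetical name, the sample hrefs are made up, and the code assumes Google wraps each result as `/url?q=<target>&...`, which is the shape the regex above also relies on.

```python
from urllib.parse import urlparse, parse_qs


def extract_yahoo_urls(hrefs):
    """Return sorted, de-duplicated Yahoo Finance links from Google-style hrefs."""
    found = set()
    for href in hrefs:
        # Assumed shape: /url?q=<target>&sa=...; hrefs without a q parameter are skipped.
        for target in parse_qs(urlparse(href).query).get('q', []):
            host = urlparse(target).netloc
            # Accept finance.yahoo.com and regional subdomains such as ca.finance.yahoo.com.
            if host == 'finance.yahoo.com' or host.endswith('.finance.yahoo.com'):
                found.add(target)
    return sorted(found)


# Made-up hrefs for illustration only.
sample_hrefs = [
    '/url?q=https://finance.yahoo.com/news/example-article-123.html&sa=U&ved=abc',
    '/url?q=https://www.google.com/search%3Fq%3Dyahoo&sa=U',
    '/search?q=yahoo+finance+GME&tbm=nws',
    '/url?q=https://ca.finance.yahoo.com/news/another-example-456.html&sa=U',
]
print(extract_yahoo_urls(sample_hrefs))
```

An allow-list drops Google's own links and other sites without needing a matching entry in `exclude_list`. The trade-off is that coverage from any host not on the list is discarded as well.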

In [19]:
cleaned_urls = {ticker:strip_unwanted_urls(raw_urls[ticker], exclude_list) for ticker in monitored_tickers}
cleaned_urls

{'GME': ['https://finance.yahoo.com/news/defi-protocol-dflow-raises-5-130725669.html',
  'https://finance.yahoo.com/news/10-stocks-targeted-short-sellers-195952283.html',
  'https://finance.yahoo.com/news/12-most-popular-retail-investor-182453995.html',
  'https://finance.yahoo.com/news/bath-body-works-bbwi-reports-133001553.html',
  'https://www.google.com/search?q%3Dyahoo%2Bfinance%2BGME%26tbm%3Dnws%26pccc%3D1',
  'https://ca.finance.yahoo.com/news/gamestop-saga-getting-movie-seth-024131941.html',
  'https://finance.yahoo.com/news/are-banks-the-new-meme-stocks-193655826.html',
  'https://finance.yahoo.com/news/bullish-signals-flashing-suggest-stock-003002221.html',
  'https://seekingalpha.com/article/4604467-gamestop-investor-overoptimism-weak-fundamentals-threaten-stock',
  'https://finance.yahoo.com/news/robinhood-announces-plans-to-launch-24-hour-trading-with-names-like-apple-tesla-130110794.html',
  'https://finance.yahoo.com/news/elektra-healths-actuarial-cost-report-130000892.h

In [20]:
def scrape_and_process(URLs):
  ARTICLES = []
  for url in URLs:
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    paragraphs = soup.find_all('p')
    text = [paragraph.text for paragraph in paragraphs]
    words = ' '.join(text).split(' ')[:350]
    ARTICLE = ' '.join(words)
    ARTICLES.append(ARTICLE)
  return ARTICLES

In [None]:
articles = {ticker: scrape_and_process(cleaned_urls[ticker]) for ticker in monitored_tickers}
articles