# Concept: Stock buy/sell action through News
- https://www.zenrows.com/blog/403-web-scraping#set-fake-user-agent >> Handling errors
- https://www.youtube.com/watch?v=o-zM8onpQZY >> Using finviz to get stocks and NLTK processing
- https://medium.datadriveninvestor.com/sentiment-analysis-of-stocks-from-financial-news-using-python-82ebdcefb638 >> Using finviz to get stocks and NLTK processing
- https://towardsdatascience.com/stock-news-sentiment-analysis-with-python-193d4b4378d4 >> Using finviz to get stocks and NLTK processing



We will try to build a script that scrapes news about the top-performing stocks as of October 2023 and creates a recommendation for each one from the sentiment analysis of that news.<br>
The script will try to gauge performance from news related to the company (stock).
<br>As our portfolio, we will use the best S&P 500 stocks as of October 2023.
<br>Company and ticker symbol (best performers in 2023):
- Tesla (TSLA)	
- Royal Caribbean Cruises (RCL)
- Carnival Corporation (CCL)	
- General Electric (GE)	

In [33]:
# First, import the libraries we use
from datetime import datetime
from urllib.request import urlopen, Request

import nltk
import pandas as pd
from bs4 import BeautifulSoup
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Download the VADER lexicon used by SentimentIntensityAnalyzer
nltk.download('vader_lexicon')

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\joaqu\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [34]:
# Set the finviz quote URL from which we extract the stock news; finviz was the easiest and most widely used source in the examples checked beforehand
finwiz_url = 'https://finviz.com/quote.ashx?t='
tickers = ['TSLA','RCL','CCL','GE']
n = 4  # number of headlines to print per ticker in the visual check below
hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
   'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
   'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
   'Accept-Encoding': 'none',
   'Accept-Language': 'en-US,en;q=0.8',
   'Connection': 'keep-alive'}




In [35]:
# Iterate over the tickers in our portfolio and store the news table of each one
news_tables = {}

for ticker in tickers:
    url = finwiz_url + ticker
    req = Request(url=url,headers=hdr) 
    resp = urlopen(req)    
    html = BeautifulSoup(resp, features="lxml")
    news_table = html.find(id='news-table')
    news_tables[ticker] = news_table
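
The loop above assumes every request succeeds. The first reference in the list at the top deals with 403 errors, so here is a minimal sketch of a more defensive fetch. The helper name `fetch_html` and its `opener`, `retries` and `delay` parameters are hypothetical and only serve the illustration; it relies solely on the standard library.

```python
import time
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def fetch_html(url, headers, opener=urlopen, retries=3, delay=1.0):
    """Return the response body as bytes, or None if every attempt fails.

    `opener`, `retries` and `delay` are illustrative parameters,
    not part of the original notebook.
    """
    for attempt in range(1, retries + 1):
        try:
            with opener(Request(url=url, headers=headers)) as resp:
                return resp.read()
        except HTTPError as err:
            # HTTPError must be caught before URLError (it is a subclass).
            # A 403 usually means the site rejected the request.
            print(f"{url}: HTTP {err.code} on attempt {attempt}")
        except URLError as err:
            print(f"{url}: network error ({err.reason}) on attempt {attempt}")
        if attempt < retries:
            time.sleep(delay)  # wait before trying again
    return None
```

When the helper returns bytes, they can be passed to `BeautifulSoup` in place of `resp`; when it returns `None`, the ticker can simply be skipped.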

In [42]:
# Get the table rows (tr) for each ticker and print the first n headlines as a visual check
for ticker in tickers:
    news_table = news_tables.get(ticker)
    if news_table is None:
        # No news table was scraped for this ticker
        continue

    print('\n')
    print('Recent News Headlines for {}: '.format(ticker))

    shown = 0
    for table_row in news_table.find_all('tr'):
        # Skip rows that lack a headline link or timestamp cell
        if table_row.a is None or table_row.td is None:
            continue
        a_text = table_row.a.text
        td_text = table_row.td.text.strip()
        print(a_text, '(', td_text, ')')
        shown += 1
        if shown == n:
            break



Recent News Headlines for TSLA: 
Stock Traders Face Pivotal Week as Apple Steals Fed Spotlight ( Today 12:05PM )
Tesla's Competition Are Feeling the Heat ( 10:30AM )
Even With Shrinking Profits, Tesla's Business Is Supercharged Compared to the Competition ( 10:00AM )
Dan Niles looking to re-short Tesla, Apple: Insider trades & hedge funds weekly ( 09:39AM )


Recent News Headlines for RCL: 
Carnival Cruise Line shares key dining rule some passengers ignore ( Today 09:36AM )
Royal Caribbean Cruises Ltd. (NYSE:RCL) Q3 2023 Earnings Call Transcript ( Oct-28-23 03:26PM )
Q3 2023 Royal Caribbean Cruises Ltd Earnings Call ( Oct-27-23 02:06AM )
Royal Caribbean Cruises (RCL) Q3 2023 Earnings Call Transcript ( Oct-26-23 06:00PM )


Recent News Headlines for CCL: 
2 Reasons You Might Seriously Regret Buying Carnival Cruise Stock ( Today 05:15AM )
Fresh Puerto Rican Tostones and Stone Crab Highlight New Caribbean Menu on Holland America Line ( Oct-27-23 02:18PM )
SEABOURN UNVEILS NEW EXPEDITION

In [37]:
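# Iterate over the news tables to get the ticker, date, time and headline text
parsed_news = []
for ticker, news_table in news_tables.items():
    if news_table is None:
        # No news table was scraped for this ticker
        continue
    date = None  # set by the first row that carries a date
    for x in news_table.find_all('tr'):
        # Skip rows that lack a headline link or timestamp cell
        if x.a is None or x.td is None:
            continue
        text = x.a.get_text()
        date_scrape = x.td.text.split()

        if len(date_scrape) == 1:
            # Only a time is given: reuse the date of the previous row
            time = date_scrape[0]
        else:
            date = date_scrape[0]
            time = date_scrape[1]

        parsed_news.append([ticker, date, time, text])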
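
The timestamp cells come in two shapes, as the printed headlines show: a date plus a time (`Today 12:05PM`, `Oct-28-23 03:26PM`) or a time alone (`10:30AM`), which belongs to the most recent date seen. The carry-forward rule can be isolated in a small function. This is a sketch; `split_timestamps` is a hypothetical name, and the input is assumed to be a list of cell strings in page order.

```python
def split_timestamps(cells):
    """Return (date, time) pairs, reusing the last seen date for time-only cells."""
    pairs = []
    current_date = None  # stays None until a cell supplies a date
    for cell in cells:
        parts = cell.split()
        if not parts:
            # Empty cell: nothing to parse
            continue
        if len(parts) == 1:
            # Time only: keep the date from the previous cell
            time_text = parts[0]
        else:
            current_date, time_text = parts[0], parts[1]
        pairs.append((current_date, time_text))
    return pairs


# Cells taken from the headlines printed earlier in this notebook
print(split_timestamps(["Today 12:05PM", "10:30AM", "Oct-28-23 03:26PM"]))
```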
        


In [38]:
# Sentiment Analysis
analyzer = SentimentIntensityAnalyzer()

columns = ['Ticker', 'Date', 'Time', 'Headline']
news = pd.DataFrame(parsed_news, columns=columns)
scores = news['Headline'].apply(analyzer.polarity_scores).tolist()

df_scores = pd.DataFrame(scores)
news = news.join(df_scores, rsuffix='_right')
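
Of the four scores VADER returns, `compound` is the one used later for the mean sentiment. The VADER authors suggest reading a compound score of 0.05 or more as positive, -0.05 or less as negative, and anything in between as neutral. A minimal sketch of that reading follows. The function names and the mapping of labels to buy/hold/sell actions are assumptions made for illustration, not a trading rule taken from the references.

```python
def label_compound(score, threshold=0.05):
    """Classify a VADER compound score using the commonly cited 0.05 cut-offs."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


def suggest_action(score):
    """Map a sentiment label to an action (illustrative assumption only)."""
    actions = {"positive": "buy", "negative": "sell", "neutral": "hold"}
    return actions[label_compound(score)]


# Compound values like the ones this notebook prints for TSLA headlines
for value in (-0.5106, 0.0, 0.4404):
    print(value, label_compound(value), suggest_action(value))
```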



In [44]:
## Replace the string 'Today' in news['Date'] with today's date and convert every other entry to a date

# Today's date
today = datetime.today()

# Rows whose date was scraped as the string 'Today'
is_today = news['Date'] == 'Today'

# Parse the remaining dates (e.g. 'Oct-28-23'); the 'Today' rows become NaT for now
parsed_dates = pd.to_datetime(news['Date'].where(~is_today), format='%b-%d-%y')

# Fill the 'Today' rows with today's date and keep only the date part
news['Date'] = parsed_dates.fillna(pd.Timestamp(today.date())).dt.date
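
The date and time are still kept in separate columns. If a single sortable timestamp is wanted, the two strings can be combined with the standard library. The sketch below assumes the formats seen in the scraped output (`Oct-28-23` for dates, `03:26PM` for times, and the literal `Today`) and an English locale for the month abbreviation; `to_timestamp` is a hypothetical helper name.

```python
from datetime import date, datetime


def to_timestamp(date_text, time_text, today=None):
    """Combine scraped date and time strings into one datetime."""
    today = today or date.today()
    if date_text == "Today":
        day = today
    else:
        # Assumed format of dated rows, e.g. "Oct-28-23"
        day = datetime.strptime(date_text, "%b-%d-%y").date()
    # Assumed 12-hour clock format, e.g. "03:26PM"
    clock = datetime.strptime(time_text, "%I:%M%p").time()
    return datetime.combine(day, clock)


print(to_timestamp("Oct-28-23", "03:26PM"))
print(to_timestamp("Today", "12:05PM", today=date(2023, 10, 29)))
```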



In [45]:
# Create a list of the unique tickers
unique_ticker = news['Ticker'].unique().tolist()
news_dict = {name: news.loc[news['Ticker'] == name] for name in unique_ticker}

# Calculate and store the mean sentiment from the compound values
values = []
for ticker in tickers: 
    dataframe = news_dict[ticker]
    dataframe = dataframe.set_index('Ticker')
    dataframe = dataframe.drop(columns = ['Headline'])
    print ('\n')
    print (dataframe.head())
    
    mean = round(dataframe['compound'].mean(), 2)
    values.append(mean)

   
df = pd.DataFrame(list(zip(tickers, values)), columns =['Ticker', 'Mean Sentiment']) 
df = df.set_index('Ticker')
df = df.sort_values('Mean Sentiment', ascending=False)
print ('\n')
print (df)



              Date     Time    neg    neu    pos  compound
Ticker                                                    
TSLA    2023-10-29  12:05PM  0.268  0.732  0.000   -0.5106
TSLA    2023-10-29  10:30AM  0.000  0.769  0.231    0.1280
TSLA    2023-10-29  10:00AM  0.000  0.791  0.209    0.4404
TSLA    2023-10-29  09:39AM  0.000  1.000  0.000    0.0000
TSLA    2023-10-29  09:05AM  0.000  1.000  0.000    0.0000


              Date     Time    neg   neu    pos  compound
Ticker                                                   
RCL     2023-10-29  09:36AM  0.197  0.63  0.173   -0.0772
RCL     2023-10-28  03:26PM  0.000  1.00  0.000    0.0000
RCL     2023-10-27  02:06AM  0.000  1.00  0.000    0.0000
RCL     2023-10-26  06:00PM  0.000  1.00  0.000    0.0000
RCL     2023-10-26  05:52PM  0.000  0.70  0.300    0.4576


              Date     Time    neg    neu    pos  compound
Ticker                                                    
CCL     2023-10-29  05:15AM  0.391  0.609  0.000   -0.542