<a href="https://colab.research.google.com/github/RonitGandhi/Stock-X/blob/main/StockX.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Comprehensive Stock Recommendation System

Objective: Collect financial news articles and stock data from relevant sources for use in the stock recommendation system.

Steps:

- Identify relevant APIs and data sources for collecting financial news and stock data.
- Implement code to fetch data from the selected APIs.
- Store the fetched data in a suitable format for further processing.

Purpose: The Python code in this phase fetches financial news articles and stock data from external APIs.

Functions:

fetch_financial_news(api_key, query, language='en', max_results=10): Fetches financial news articles matching the query keywords from the NewsAPI `/v2/everything` endpoint (newsapi.org). It accepts the API key, the query, a language code, and the maximum number of results to fetch.

fetch_stock_data(symbol): Fetches fundamental data (P/E, P/B, debt-to-equity, ROE, ROA, dividend yield) for a given stock symbol from Yahoo Finance through the yfinance library. It accepts the stock symbol as its only parameter.

Example Usage:

- Replace 'YOUR_NEWS_API_KEY' with your actual NewsAPI key.
- Use the fetch_financial_news function to fetch news articles by providing a query keyword (e.g., 'stock market').
- Use the fetch_stock_data function to fetch stock data by providing a stock symbol (e.g., 'AAPL' for Apple Inc.).

The fetched news articles and stock data are processed further in the subsequent phases of the stock recommendation system.
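
The request that fetch_financial_news sends can be sketched without touching the network. The helper name `build_news_url` is hypothetical and not part of the notebook; mapping `max_results` to the `pageSize` query parameter follows the URL used in this notebook's own code. Treat this as a minimal sketch of how the parameters end up in the URL, not as the notebook's implementation.

```python
from urllib.parse import urlencode

NEWS_ENDPOINT = "https://newsapi.org/v2/everything"

def build_news_url(api_key, query, language="en", max_results=10):
    """Return the request URL with every parameter URL-encoded (hypothetical helper)."""
    params = {
        "q": query,
        "language": language,
        "pageSize": max_results,
        "apiKey": api_key,
    }
    return f"{NEWS_ENDPOINT}?{urlencode(params)}"

url = build_news_url("YOUR_NEWS_API_KEY", "stock market")
print(url)
```

Encoding the parameters matters for multi-word queries such as 'stock market', where the space must not appear raw in the URL.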

In [None]:
!pip install newsapi-python

Collecting newsapi-python
  Downloading newsapi_python-0.2.7-py2.py3-none-any.whl (7.9 kB)
Installing collected packages: newsapi-python
Successfully installed newsapi-python-0.2.7


In [None]:
import requests
import yfinance as yf
import pandas as pd

def fetch_financial_news(api_key, query, language='en', max_results=10):
    # Query the NewsAPI "everything" endpoint; requests URL-encodes the params
    url = 'https://newsapi.org/v2/everything'
    params = {'q': query, 'language': language, 'pageSize': max_results, 'apiKey': api_key}
    response = requests.get(url, params=params, timeout=10)
    if response.status_code == 200:
        news_data = response.json()
        articles = news_data.get('articles', [])
        return articles
    else:
        print("Error fetching news articles:", response.status_code)
        return []  # an empty list keeps downstream code iterable


def fetch_stock_data(symbol):
    # Get fundamental data on this ticker from Yahoo Finance via yfinance
    ticker_data = yf.Ticker(symbol)
    info = ticker_data.info
    metrics = {
        'P/E Ratio': info.get('trailingPE'),
        'P/B Ratio': info.get('priceToBook'),
        'Debt/Equity Ratio': info.get('debtToEquity'),
        'Return on Equity (ROE)': info.get('returnOnEquity'),
        'Return on Assets (ROA)': info.get('returnOnAssets'),
        'Dividend Yield': info.get('dividendYield'),
    }

    # Print each metric, then return the dictionary so callers can reuse it
    for name, value in metrics.items():
        print(f"{name}:", value)
    return metrics

# Example usage:
if __name__ == "__main__":
    # Fetch financial news articles
    news_api_key = 'YOUR_NEWS_API_KEY'  # replace with your own key; never commit a real key
    query = 'finance'
    news_articles = fetch_financial_news(news_api_key, query, max_results=100)

    print("Fetched news articles:", news_articles)
    df = pd.DataFrame(news_articles)
    print(df)
    df.to_csv('news_articles.csv', index=False)


    # Fetch stock data
    symbol = 'AAPL'  # Apple Inc.
    stock_data = fetch_stock_data(symbol)



                                               source  \
0   {'id': 'business-insider', 'name': 'Business I...   
1                    {'id': 'wired', 'name': 'Wired'}   
2            {'id': 'the-verge', 'name': 'The Verge'}   
3                   {'id': None, 'name': '[Removed]'}   
4     {'id': None, 'name': 'Harvard Business Review'}   
..                                                ...   
95                       {'id': None, 'name': 'CNET'}   
96                       {'id': None, 'name': 'CNET'}   
97  {'id': 'business-insider', 'name': 'Business I...   
98          {'id': None, 'name': 'Project Syndicate'}   
99               {'id': None, 'name': 'Theonion.com'}   

                                        author  \
0                                  Huileng Tan   
1                              Aarian Marshall   
2                                 Emilia David   
3                                         None   
4                                         None   
..             
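
The `source` column in the output above holds a nested dictionary, which a CSV file can only store as a string. One possible way to store the articles in a flatter format is sketched below using only the standard library. The helper `flatten_article` and the sample records are hypothetical (the records are merely shaped like the rows shown above), and the in-memory buffer stands in for a real file.

```python
import csv
import io

def flatten_article(article):
    """Copy an article dict, replacing the nested 'source' dict with flat keys."""
    flat = {key: value for key, value in article.items() if key != "source"}
    source = article.get("source") or {}
    flat["source_id"] = source.get("id")
    flat["source_name"] = source.get("name")
    return flat

# Hypothetical sample records, shaped like the rows in the output above
articles = [
    {"source": {"id": "wired", "name": "Wired"}, "author": "Aarian Marshall"},
    {"source": {"id": None, "name": "CNET"}, "author": None},
]

rows = [flatten_article(article) for article in articles]

# io.StringIO stands in for open('news_articles.csv', 'w', newline='')
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["source_id", "source_name", "author"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

With flat `source_id` and `source_name` columns, later phases can filter by publisher without parsing a dictionary back out of a string.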

Objective: Preprocess the fetched financial news articles, extract relevant information (such as stock ticker symbols), and run sentiment analysis on the text to gauge the sentiment towards the mentioned stocks.

Steps:

- Implement Named Entity Recognition (NER) to identify stock ticker symbols within news articles.
- Clean and preprocess the text data by removing stop words, punctuation, etc.
- Perform sentiment analysis on the cleaned text to determine the sentiment (positive, negative, neutral) towards the mentioned stocks.

Functions:

extract_stock_symbols(text): Extracts stock ticker symbols from the input text using Named Entity Recognition (NER) and returns a list of the unique symbols found. In the code for this phase, the step is implemented by extract_company_names, which matches the article text against S&P 500 company names scraped from Wikipedia rather than running NER.

preprocess_text(text): Preprocesses the input text by converting it to lowercase, removing punctuation, tokenizing, and removing stop words. It returns the preprocessed text string.

analyze_sentiment(text): Performs sentiment analysis on the input text using the VADER sentiment analyzer. It returns the negative, neutral, positive, and compound sentiment scores.

Example Usage:

- Use the extract_stock_symbols function to extract stock symbols from financial news articles.
- Preprocess the text using the preprocess_text function before performing sentiment analysis.
- Analyze the sentiment of the preprocessed text using the analyze_sentiment function.

The code in this phase preprocesses the financial news articles, extracts company mentions, and scores sentiment; its output feeds the filtering and scoring phases that follow.

**Compound Score**: The compound score is a single numerical value that ranges between -1 (most negative) and +1 (most positive). It represents the overall sentiment of the text. A score of 0 typically indicates a neutral sentiment.

Positive values (greater than 0) indicate positive sentiment.
Negative values (less than 0) indicate negative sentiment.
Values closer to +1 or -1 indicate stronger sentiment intensity.
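
The sign rule described above can be sketched as a small labelling function. The name `label_compound` and its `threshold` parameter are assumptions introduced for illustration. A threshold of 0.0 reproduces the rule as stated here; VADER's own documentation suggests a cutoff around 0.05 so that near-zero scores count as neutral, which the second call illustrates.

```python
def label_compound(score, threshold=0.0):
    """Map a VADER compound score in [-1, 1] to a sentiment label (hypothetical helper)."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"

print(label_compound(0.4))          # strictly above 0, so positive
print(label_compound(0.03, 0.05))   # inside the +/-0.05 band, so neutral
print(label_compound(-0.6))         # strictly below 0, so negative
```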

In [None]:
!pip install selenium
!apt-get update
!apt install chromium-chromedriver

Collecting selenium
  Downloading selenium-4.19.0-py3-none-any.whl (10.5 MB)
Collecting trio~=0.17 (from selenium)
  Downloading trio-0.25.0-py3-none-any.whl (467 kB)
Collecting trio-websocket~=0.9 (from selenium)
  Downloading trio_websocket-0.11.1-py3-none-any.whl (17 kB)
Collecting outcome (from trio~=0.17->selenium)
  Downloading outcome-1.3.0.post0-py2.py3-none-any.whl (10 kB)
Collecting wsproto>=0.14 (from trio-websocket~=0.9->selenium)
  Downloading wsproto-1.2.0-py3-none-any.whl (24 kB)
Collecting h11<1,>=0.9.0 (from wsproto>=0.14->trio-websocket~=0.9->selenium)
  Downloading h11-0.14.0-py3-none-any.whl (58 kB)

In [None]:
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import string
import pandas as pd
import spacy
import yfinance as yf
import requests
import re
from bs4 import BeautifulSoup  # For scraping Wikipedia (adjust if using a different source)
import csv
import selenium

nltk.download('punkt')
nltk.download('stopwords')
nltk.download('vader_lexicon')

# Load the CSV file into a DataFrame
df = pd.read_csv('news_articles.csv')
df['description'] = df['description'].fillna('')

# Load the spaCy English model
nlp = spacy.load('en_core_web_sm')
def extract_top_500_company_names():
    # Wikipedia URL for the S&P 500 list
    url = "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"

    # Send a GET request to the URL
    response = requests.get(url)

    # Parse the HTML content of the page
    soup = BeautifulSoup(response.content, "html.parser")

    # Find the table containing the company names
    table = soup.find("table", {"class": "wikitable sortable"})

    # Initialize a list to store company names
    company_names = []

    # Iterate through the rows of the table
    for row in table.find_all("tr")[1:]:  # Skip the header row
        # Extract the company name from the second column (index 1);
        # the first column holds the ticker symbol
        company_name = row.find_all("td")[1].text.strip()
        company_names.append(company_name)

    return company_names

# Extract the names of the top 500 companies
top_500_company_names = extract_top_500_company_names()

# Print the list of company names
print(top_500_company_names)
companies_and_tickers=[]

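def extract_company_names(news_headline):
    # Match each company name as a whole phrase: a word-by-word comparison
    # can never equal a multi-word name such as "Apple Inc."
    extracted_names = []
    for company_name in top_500_company_names:
        # Lookarounds instead of \b so names ending in '.' still match
        pattern = r'(?<!\w)' + re.escape(company_name) + r'(?!\w)'
        if re.search(pattern, news_headline, flags=re.IGNORECASE):
            extracted_names.append(company_name)
    return extracted_names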

def preprocess_text(text):

    # Convert text to lowercase
    text = text.lower()

    # Remove punctuation
    text = text.translate(str.maketrans('', '', string.punctuation))

    # Tokenize text
    tokens = word_tokenize(text)

    # Remove stop words
    stop_words = set(stopwords.words('english'))
    filtered_tokens = [word for word in tokens if word not in stop_words]

    # Join tokens back into text
    preprocessed_text = ' '.join(filtered_tokens)

    return preprocessed_text

# Create the VADER analyzer once and reuse it for every row
sid = SentimentIntensityAnalyzer()

# Function to perform sentiment analysis
def analyze_sentiment(text):
    sentiment_scores = sid.polarity_scores(text)
    return sentiment_scores['neg'], sentiment_scores['neu'], sentiment_scores['pos'], sentiment_scores['compound']

# Apply the functions to extract company names, preprocess text, and analyze sentiment for each headline
# companies_and_tickers = extract_company_names('news_articles.csv')
# print(companies_and_tickers)
# for company, ticker in companies_and_tickers:
#     print(f"Company: {company}, Ticker: {ticker}")

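df['Company Names'] = df['description'].apply(extract_company_names)
df['Preprocessed Description'] = df['description'].apply(preprocess_text)
# Note: VADER treats capitalization, punctuation, and negations as cues, and NLTK's
# stop-word list includes words such as 'not', so scores on the preprocessed text
# can differ from scores on the raw description
df['Negative'], df['Neutral'], df['Positive'], df['Compound'] = zip(*df['Preprocessed Description'].apply(analyze_sentiment))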

# Save the DataFrame back to a new CSV file
df.to_csv('output_with_sentiment.csv', index=False)
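
The description of this phase names an extract_stock_symbols function, while the code above stops at company names. One possible way to get from names to ticker symbols is a plain dictionary lookup, sketched below. This is a lookup technique, not NER, and the `NAME_TO_TICKER` mapping is a hypothetical three-entry sample; a real run would need a mapping built from the full S&P 500 table.

```python
import re

# Hypothetical sample mapping; not taken from the notebook
NAME_TO_TICKER = {
    "Apple Inc.": "AAPL",
    "Microsoft": "MSFT",
    "Alphabet Inc.": "GOOGL",
}

def extract_stock_symbols(text, name_to_ticker=NAME_TO_TICKER):
    """Return the sorted, unique tickers whose company name appears in the text."""
    symbols = set()
    for name, ticker in name_to_ticker.items():
        # Lookarounds instead of \b so names ending in '.' still match
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            symbols.add(ticker)
    return sorted(symbols)

print(extract_stock_symbols("Microsoft and Apple Inc. both reported earnings."))
```

Matching whole phrases avoids false hits inside longer words, so a headline about "Pineapple Inc." would not be attributed to Apple.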


Objective: Filter stocks based on sentiment and financial fundamentals, and recommend the top-scoring stocks for further analysis or investment.

Steps:

- Filter stocks based on sentiment and financial metrics such as net profit, market cap, and PE ratio.
- Score each stock based on a weighted combination of sentiment score and financial metrics.
- Recommend the top-scoring stocks for further analysis or investment.

Functions:

filter_stocks(stock_symbols, news_sentiments, financial_data): Keeps the stocks whose compound sentiment score, net profit, and market cap are all greater than zero. It returns a list of filtered stock symbols.

score_stocks(filtered_stocks, news_sentiments, financial_data, weights): Scores stocks as a weighted sum of the compound sentiment score and the financial metrics (net profit and market cap). It returns a dictionary containing the score for each stock symbol.

recommend_stocks(scores, top_n): Returns the top_n stock symbols with the highest scores, best first.

Example Usage:

- Use the filter_stocks function to filter stocks based on sentiment and financial metrics.
- Score the filtered stocks using the score_stocks function with specified weights for sentiment score and financial metrics.
- Recommend the top-scoring stocks using the recommend_stocks function with the desired number of recommendations.

Together these functions filter, score, and rank stocks. Adjust the weights and filter criteria as needed to customize the recommendation algorithm.
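
A weighted sum only behaves as the weights suggest when its inputs share a scale: a compound score lies in [-1, 1], whereas raw net profit and market cap are typically many orders of magnitude larger and would dominate the total. The sketch below assumes min-max scaling of the financial metrics before weighting. The function names, the scaling choice, and the sample figures are illustrative assumptions, not the notebook's method.

```python
def min_max(values):
    """Scale a {symbol: value} mapping to [0, 1]; constant inputs map to 0.0."""
    if not values:
        return {}
    low, high = min(values.values()), max(values.values())
    span = high - low
    return {s: ((v - low) / span if span else 0.0) for s, v in values.items()}

def score_stocks_normalized(sentiments, financial_data, weights=None):
    """Weighted sum of compound sentiment and min-max scaled financial metrics."""
    weights = weights or {"sentiment": 0.5, "net_profit": 0.3, "market_cap": 0.2}
    profit = min_max({s: financial_data[s]["net_profit"] for s in sentiments})
    cap = min_max({s: financial_data[s]["market_cap"] for s in sentiments})
    return {
        s: weights["sentiment"] * sentiments[s]
        + weights["net_profit"] * profit[s]
        + weights["market_cap"] * cap[s]
        for s in sentiments
    }

# Hypothetical symbols and figures, for illustration only
sentiments = {"AAA": 0.8, "BBB": 0.2}
financial_data = {
    "AAA": {"net_profit": 1e9, "market_cap": 5e10},
    "BBB": {"net_profit": 5e9, "market_cap": 9e10},
}
scores = score_stocks_normalized(sentiments, financial_data)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In this sample, "BBB" ranks first despite its weaker sentiment because it leads on both financial metrics, which is the trade-off the weights are meant to express.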

In [None]:
!pip install yahoo_fin

Objective: The objective of this phase is to integrate the components developed in previous steps to create a functional stock recommendation system. This system will fetch financial news articles, analyze sentiment, filter stocks, score them based on sentiment and financial metrics, and recommend top stocks for investment.

Steps:

- Integrate data collection, text processing, sentiment analysis, stock filtering, and recommendation components into a cohesive system.
- Implement error handling and logging to handle exceptions gracefully and track system behavior.
- Test the system with sample data to ensure functionality and accuracy.
- Deploy the system to a suitable environment for production use.
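
The error handling and logging step listed above could look roughly like the following sketch, which uses Python's standard logging module. The logger name `stock_recommender`, the helper `process_articles`, and the sample handler are all hypothetical. The idea is that one malformed article is logged and skipped instead of aborting the whole run.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(name)s: %(message)s")
logger = logging.getLogger("stock_recommender")  # hypothetical logger name

def process_articles(articles, handler):
    """Apply handler to each article; log and skip the ones that raise."""
    results = []
    for index, article in enumerate(articles):
        try:
            results.append(handler(article))
        except Exception:
            # logger.exception records the traceback alongside the message
            logger.exception("Skipping article %d", index)
    logger.info("Processed %d of %d articles", len(results), len(articles))
    return results

# Hypothetical handler: raises TypeError when 'content' is None
def content_length(article):
    return len(article["content"])

lengths = process_articles(
    [{"content": "Markets rally"}, {"content": None}],
    content_length,
)
print(lengths)
```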

In [None]:
import requests
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import string

nltk.download('punkt')
nltk.download('stopwords')
nltk.download('vader_lexicon')

def fetch_financial_news(api_key, query, language='en', max_results=10):
    """
    Fetch financial news articles using NewsAPI (newsapi.org).

    Args:
    - api_key: API key for accessing NewsAPI.
    - query: Query keywords to search for relevant news articles (e.g., "stock market", "finance").
    - language: Language code for the news articles (default is 'en' for English).
    - max_results: Maximum number of news articles to fetch (default is 10).

    Returns:
    - List of dictionaries containing details of each news article (empty on failure).
    """
    url = 'https://newsapi.org/v2/everything'
    # Let requests URL-encode the parameters (queries such as "stock market" contain spaces)
    params = {'q': query, 'language': language, 'pageSize': max_results, 'apiKey': api_key}
    response = requests.get(url, params=params, timeout=10)
    if response.status_code == 200:
        news_data = response.json()
        articles = news_data.get('articles', [])
        return articles
    else:
        print("Error fetching news articles:", response.status_code)
        return []

def fetch_stock_data(symbol):
    """
    Fetch stock data for a given symbol from Yahoo Finance's quote endpoint.

    Args:
    - symbol: Ticker symbol of the stock (e.g., 'AAPL' for Apple Inc.).

    Returns:
    - Dictionary containing stock data, or None if the request fails.
    """
    url = 'https://query1.finance.yahoo.com/v7/finance/quote'
    # Pass the symbol as a query parameter and bound the wait with a timeout
    response = requests.get(url, params={'symbols': symbol}, timeout=10)
    if response.status_code == 200:
        stock_data = response.json()
        return stock_data
    else:
        print("Error fetching stock data:", response.status_code)
        return None

def extract_stock_symbols(text):
    """
    Extract stock ticker symbols from text using Named Entity Recognition (NER).

    Args:
    - text: Input text containing financial news article.

    Returns:
    - List of unique stock ticker symbols extracted from the text.
    """
    # Placeholder function for NER implementation
    # Replace with your NER implementation to extract stock symbols
    # Example: return ['AAPL', 'MSFT', 'GOOGL']
    return []

def preprocess_text(text):
    """
    Preprocess text by removing stop words, punctuation, and converting to lowercase.

    Args:
    - text: Input text to be preprocessed.

    Returns:
    - Preprocessed text string.
    """
    # Convert text to lowercase
    text = text.lower()

    # Remove punctuation
    text = text.translate(str.maketrans('', '', string.punctuation))

    # Tokenize text
    tokens = word_tokenize(text)

    # Remove stop words
    stop_words = set(stopwords.words('english'))
    filtered_tokens = [word for word in tokens if word not in stop_words]

    # Join tokens back into text
    preprocessed_text = ' '.join(filtered_tokens)

    return preprocessed_text

def analyze_sentiment(text):
    """
    Perform sentiment analysis on text using VADER sentiment analyzer.

    Args:
    - text: Input text to be analyzed.

    Returns:
    - Dictionary of VADER scores with the keys 'neg', 'neu', 'pos', and 'compound'.
    """
    sid = SentimentIntensityAnalyzer()
    sentiment_scores = sid.polarity_scores(text)
    return sentiment_scores

def filter_stocks(stock_symbols, news_sentiments, financial_data):
    """
    Filter stocks based on sentiment and financial metrics.

    Args:
    - stock_symbols: List of stock ticker symbols mentioned in the news article.
    - news_sentiments: Dictionary containing sentiment scores for each stock symbol.
    - financial_data: Dictionary containing financial data for each stock symbol.

    Returns:
    - List of filtered stock symbols based on sentiment and financial metrics.
    """
    filtered_stocks = []
    for symbol in stock_symbols:
        if symbol in news_sentiments:
            sentiment_score = news_sentiments[symbol]['compound']  # Using compound sentiment score
            financial_metrics = financial_data.get(symbol, {})
            # Filter criteria: Positive sentiment and strong financial metrics
            if sentiment_score > 0 and financial_metrics.get('net_profit', 0) > 0 and financial_metrics.get('market_cap', 0) > 0:
                filtered_stocks.append(symbol)
    return filtered_stocks

def score_stocks(filtered_stocks, news_sentiments, financial_data, weights=None):
    """
    Score stocks based on a weighted combination of sentiment score and financial metrics.

    Args:
    - filtered_stocks: List of filtered stock symbols.
    - news_sentiments: Dictionary containing sentiment scores for each stock symbol.
    - financial_data: Dictionary containing financial data for each stock symbol.
    - weights: Dictionary containing weights for sentiment score and financial metrics
      (defaults to sentiment 0.5, net_profit 0.3, market_cap 0.2).

    Returns:
    - Dictionary containing scores for each stock symbol.
    """
    # Avoid a mutable default argument; fall back to the default weights here
    if weights is None:
        weights = {'sentiment': 0.5, 'net_profit': 0.3, 'market_cap': 0.2}
    scores = {}
    for symbol in filtered_stocks:
        sentiment_score = news_sentiments[symbol]['compound'] * weights['sentiment']
        financial_metrics = financial_data.get(symbol, {})
        net_profit_score = financial_metrics.get('net_profit', 0) * weights['net_profit']
        market_cap_score = financial_metrics.get('market_cap', 0) * weights['market_cap']
        total_score = sentiment_score + net_profit_score + market_cap_score
        scores[symbol] = total_score
    return scores

def recommend_stocks(scores, top_n=5):
    """
    Recommend top-scoring stocks based on scores.

    Args:
    - scores: Dictionary containing scores for each stock symbol.
    - top_n: Number of top-scoring stocks to recommend.

    Returns:
    - List of recommended stock symbols.
    """
    recommended_stocks = sorted(scores, key=scores.get, reverse=True)[:top_n]
    return recommended_stocks

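def main(api_key, query):
    # Fetch financial news articles (treat a failed fetch as "no articles")
    news_articles = fetch_financial_news(api_key, query) or []

    # Placeholder: this notebook never builds financial_data. It should map each
    # symbol to a dict with 'net_profit' and 'market_cap' before filtering.
    financial_data = {}

    # Process each news article
    for article in news_articles:
        text = article.get('content') or ''  # content can be None
        stock_symbols = extract_stock_symbols(text)
        preprocessed_text = preprocess_text(text)
        article_sentiment = analyze_sentiment(preprocessed_text)

        # filter_stocks and score_stocks expect sentiment keyed by symbol,
        # so attribute the article-level scores to each symbol mentioned
        news_sentiments = {symbol: article_sentiment for symbol in stock_symbols}

        # Filter, score, and recommend stocks
        filtered_stocks = filter_stocks(stock_symbols, news_sentiments, financial_data)
        scores = score_stocks(filtered_stocks, news_sentiments, financial_data)
        recommended_stocks = recommend_stocks(scores, top_n=5)
        print("Recommended stocks:", recommended_stocks)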

Objective: Ensure the reliability, functionality, and availability of the stock recommendation system by implementing error handling, testing thoroughly, and deploying the system to a production environment.

Steps:

- Implement error handling to gracefully handle exceptions and errors that may occur during data collection, processing, or recommendation.
- Conduct thorough testing of the system to ensure functionality, accuracy, and robustness under various scenarios.
- Deploy the system to a suitable environment for production use, such as a cloud server or local machine.
- Monitor the deployed system for performance, availability, and any potential issues or errors.
- Iterate on the system based on feedback, user testing, and performance metrics to continuously improve its quality and reliability.

Functions:

main(api_key, query): The main entry point of the system. It fetches financial news articles, processes each article, filters stocks, scores them, and recommends top stocks, catching any exception raised along the way and printing it.

Example Usage:

- Replace 'YOUR_NEWS_API_KEY' with your actual NewsAPI key.
- Define the query keywords for fetching financial news articles.
- Call the main function with the API key and query to run the stock recommendation system.

The main function in this phase wraps the whole pipeline in a single try/except block; testing and deployment are described here but not implemented in the notebook.
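
The testing step listed above is not implemented in the notebook. As a minimal sketch of what it could look like, the block below exercises recommend_stocks with Python's built-in unittest module. The function body is repeated so the block runs on its own, and the test class name and sample scores are hypothetical.

```python
import unittest

def recommend_stocks(scores, top_n=5):
    """Same logic as the notebook's recommend_stocks: highest scores first."""
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

class RecommendStocksTest(unittest.TestCase):
    def test_orders_by_score_descending(self):
        scores = {"AAA": 0.2, "BBB": 0.9, "CCC": 0.5}
        self.assertEqual(recommend_stocks(scores, top_n=2), ["BBB", "CCC"])

    def test_empty_scores_give_empty_list(self):
        self.assertEqual(recommend_stocks({}), [])

    def test_top_n_larger_than_input(self):
        self.assertEqual(recommend_stocks({"AAA": 1.0}, top_n=5), ["AAA"])

if __name__ == "__main__":
    # These argv/exit settings keep the call usable inside a notebook cell
    unittest.main(argv=["ignored"], exit=False)
```

Pure functions such as recommend_stocks, score_stocks, and filter_stocks are the easiest to cover this way because they need no network access.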

In [None]:
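def main(api_key, query):
    try:
        # Fetch financial news articles (treat a failed fetch as "no articles")
        news_articles = fetch_financial_news(api_key, query) or []

        # Placeholder: this notebook never builds financial_data. It should map each
        # symbol to a dict with 'net_profit' and 'market_cap' before filtering.
        financial_data = {}

        # Process each news article
        for article in news_articles:
            text = article.get('content') or ''  # content can be None
            stock_symbols = extract_stock_symbols(text)
            preprocessed_text = preprocess_text(text)
            article_sentiment = analyze_sentiment(preprocessed_text)

            # filter_stocks and score_stocks expect sentiment keyed by symbol,
            # so attribute the article-level scores to each symbol mentioned
            news_sentiments = {symbol: article_sentiment for symbol in stock_symbols}

            # Filter stocks on sentiment and financial metrics
            filtered_stocks = filter_stocks(stock_symbols, news_sentiments, financial_data)

            # Score and recommend stocks
            scores = score_stocks(filtered_stocks, news_sentiments, financial_data)
            recommended_stocks = recommend_stocks(scores, top_n=5)

            print("Recommended stocks:", recommended_stocks)

    except Exception as e:
        print("An error occurred:", str(e))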

# Example usage:
if __name__ == "__main__":
    api_key = 'YOUR_NEWS_API_KEY'
    query = 'stock market'
    main(api_key, query)


Objective: The objective of this phase is to further refine and enhance the stock recommendation system to improve its accuracy, performance, and user experience.

Steps:

- Analyze the performance and user feedback of the deployed system to identify areas for improvement.
- Refine the sentiment analysis algorithm to better capture nuances in financial news sentiment.
- Explore additional features and metrics to incorporate into the recommendation algorithm, such as technical indicators or alternative data sources.
- Optimize the system's codebase and data processing pipeline for improved efficiency and scalability.
- Implement data visualization dashboards to provide users with insights into trending news, stock mentions, and recommendation performance.
- Conduct A/B testing and user surveys to evaluate the effectiveness of new features and refinements.
- Continuously monitor market trends, news sentiment, and system performance to adapt and evolve the recommendation system over time.
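
As one example of the technical indicators mentioned in the steps above, a simple moving average (SMA) can be computed from a list of closing prices. The sketch below is an illustrative assumption about what such a feature might look like; the function name and the sample prices are hypothetical and do not come from the notebook.

```python
def simple_moving_average(prices, window):
    """Return the SMA of each full window; yields len(prices) - window + 1 values."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(prices[i - window:i]) / window
        for i in range(window, len(prices) + 1)
    ]

closes = [10.0, 11.0, 12.0, 13.0, 14.0]  # hypothetical closing prices
sma3 = simple_moving_average(closes, window=3)
print(sma3)  # [11.0, 12.0, 13.0]
```

A value derived from the SMA, such as whether the latest close sits above it, could then enter the weighted score alongside sentiment and the financial metrics.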