
#Financial Chatbot

Assignment: Proof of Concept Chatbot for Financial Sentiment Analysis Using Gradio in a Colab Notebook

This document describes the Financial Chatbot, a proof of concept that retrieves stock price data and answers user queries with either a simulated stock price or a sentiment analysis of the most relevant data rows.

#Table of Contents

1. Overview
2. Setup
3. Financial Data Retrieval
4. Vectorizing Financial Data
5. Vector Search and Sentiment Analysis
6. Simulating Real-Time Stock Prices
7. Financial Chatbot Implementation
8. Random Question Generation
9. Gradio Integration

#1. Overview<a name="overview"></a>

The Financial Chatbot analyzes financial data and provides sentiment analysis for user queries. It retrieves intraday price data from the Alpha Vantage API, converts each row to text and vectorizes it with TF-IDF, and searches the resulting vectors with Faiss. Sentiment analysis is performed with a Hugging Face Transformers pipeline.

#2. Setup<a name="setup"></a>
#3. Financial Data Retrieval<a name="financial-data-retrieval"></a>


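The get_financial_data function fetches intraday price data (1-minute bars by default) for a stock symbol through the Alpha Vantage API and returns it as a pandas DataFrame. With output_size='compact', the API returns the latest 100 data points.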

In [1]:
!pip install alpha_vantage pandas

from alpha_vantage.timeseries import TimeSeries
import pandas as pd

def get_financial_data(api_key, symbol, interval='1min', output_size='compact'):
    ts = TimeSeries(key=api_key, output_format='pandas')
    data, _ = ts.get_intraday(symbol=symbol, interval=interval, outputsize=output_size)
    return data

api_key = 'YOUR_ALPHA_VANTAGE_API_KEY'  # placeholder: supply your own key and keep it out of shared notebooks
symbol = 'AAPL'
financial_data = get_financial_data(api_key, symbol)

financial_data.head()

Collecting alpha_vantage
  Downloading alpha_vantage-2.3.1-py3-none-any.whl (31 kB)
Installing collected packages: alpha_vantage
Successfully installed alpha_vantage-2.3.1


date,1. open,2. high,3. low,4. close,5. volume
2023-12-08 19:59:00,195.8,195.84,195.71,195.84,170.0
2023-12-08 19:58:00,195.77,195.8,195.75,195.795,801.0
2023-12-08 19:57:00,195.75,195.77,195.745,195.76,225.0
2023-12-08 19:56:00,195.71,195.75,195.71,195.745,71.0
2023-12-08 19:55:00,195.73,195.75,195.71,195.74,107.0
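
Hard-coding the API key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of one alternative, assuming the key is stored in an environment variable; the variable name `ALPHA_VANTAGE_API_KEY` and the helper `load_api_key` are hypothetical and not part of the original notebook:

```python
import os

def load_api_key(env_var="ALPHA_VANTAGE_API_KEY"):
    """Return the API key stored in the given environment variable.

    The variable name is an assumption for illustration; use whatever
    name your environment defines.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running this cell.")
    return key
```

The returned value could then be passed as the `api_key` argument of `get_financial_data`.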


#4. Vectorizing Financial Data<a name="vectorizing-financial-data"></a>

The vectorize_financial_data function joins the numeric columns of each row into a single text string and then applies TF-IDF vectorization to those strings, producing one sparse vector per row.

In [2]:
from sklearn.feature_extraction.text import TfidfVectorizer

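def vectorize_financial_data(data):
    selected_features = ['1. open', '2. high', '3. low', '4. close', '5. volume']

    # Join the numeric columns of each row into one space-separated string
    data['text_representation'] = data[selected_features].astype(str).agg(' '.join, axis=1)

    vectorizer = TfidfVectorizer()
    vectors = vectorizer.fit_transform(data['text_representation'])

    # Return the fitted vectorizer as well; the chatbot needs it to vectorize user queries
    return vectors, vectorizer

vectors, vectorizer = vectorize_financial_data(financial_data)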

print("Vectorized Data Shape:", vectors.shape)


Vectorized Data Shape: (100, 94)
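
The 94 columns correspond to the distinct tokens TF-IDF found in the text representations. `TfidfVectorizer` tokenizes with the default pattern `(?u)\b\w\w+\b`, which splits on the decimal point and discards single-character tokens, so a price such as 195.8 contributes only the token 195. The sketch below applies that same pattern with the standard `re` module to the first row's text; the helper name `default_tokens` is illustrative:

```python
import re

# Default token pattern of scikit-learn's TfidfVectorizer:
# runs of two or more word characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")

def default_tokens(text):
    """Split text the way the default token pattern would (after lowercasing)."""
    return TOKEN_PATTERN.findall(text.lower())

row_text = "195.8 195.84 195.71 195.84 170.0"
print(default_tokens(row_text))
# ['195', '195', '84', '195', '71', '195', '84', '170']
```

Because integer and fractional parts become separate tokens, the resulting vectors are best read as a demonstration of the pipeline rather than as a measure of how close two prices are.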


In [3]:
financial_data['vectors'] = list(vectors)

financial_data.head()


date,1. open,2. high,3. low,4. close,5. volume,text_representation,vectors
2023-12-08 19:59:00,195.8,195.84,195.71,195.84,170.0,195.8 195.84 195.71 195.84 170.0,"(0, 27)\t0.4171690883074654\n (0, 79)\t0.12..."
2023-12-08 19:58:00,195.77,195.8,195.75,195.795,801.0,195.77 195.8 195.75 195.795 801.0,"(0, 90)\t0.524918620794338\n (0, 89)\t0.524..."
2023-12-08 19:57:00,195.75,195.77,195.745,195.76,225.0,195.75 195.77 195.745 195.76 225.0,"(0, 39)\t0.47291648479014414\n (0, 87)\t0.4..."
2023-12-08 19:56:00,195.71,195.75,195.71,195.745,71.0,195.71 195.75 195.71 195.745 71.0,"(0, 85)\t0.5712774979640958\n (0, 86)\t0.34..."
2023-12-08 19:55:00,195.73,195.75,195.71,195.74,107.0,195.73 195.75 195.71 195.74 107.0,"(0, 5)\t0.5783602061105864\n (0, 84)\t0.449..."


In [4]:
financial_data.to_csv('financial_data_with_vectors.csv')  # keep the timestamp index in the saved file

In [5]:
from google.colab import files
files.download('financial_data_with_vectors.csv')


In [7]:
!apt-get install libomp-dev
!pip install faiss-cpu


In [9]:
import numpy as np
import faiss
vectorized_data = financial_data['vectors'].to_list()
vectorized_data = [vector.toarray().flatten() for vector in vectorized_data]

indexed_vectors = np.array(vectorized_data).astype('float32')

def vector_search(query_vector, indexed_vectors, k=5):
    index = faiss.IndexFlatL2(indexed_vectors.shape[1])
    index.add(indexed_vectors)
    _, result_index = index.search(query_vector.reshape(1, -1).astype('float32'), k)
    return result_index.flatten()

query_vector = vectorized_data[0]  # Use the vector from the first entry as an example query
top_k_results = vector_search(query_vector, indexed_vectors)

financial_data.iloc[top_k_results]


date,1. open,2. high,3. low,4. close,5. volume,text_representation,vectors
2023-12-08 19:59:00,195.8,195.84,195.71,195.84,170.0,195.8 195.84 195.71 195.84 170.0,"(0, 27)\t0.4171690883074654\n (0, 79)\t0.12..."
2023-12-08 18:51:00,195.7,195.71,195.7,195.71,7.0,195.7 195.71 195.7 195.71 7.0,"(0, 79)\t0.5845006228166181\n (0, 32)\t0.81..."
2023-12-08 19:20:00,195.7,195.7,195.7,195.7,4.0,195.7 195.7 195.7 195.7 4.0,"(0, 32)\t1.0"
2023-12-08 19:15:00,195.71,195.71,195.71,195.71,3.0,195.71 195.71 195.71 195.71 3.0,"(0, 79)\t0.821505970863965\n (0, 32)\t0.570..."
2023-12-08 19:13:00,195.71,195.71,195.71,195.71,3.0,195.71 195.71 195.71 195.71 3.0,"(0, 79)\t0.821505970863965\n (0, 32)\t0.570..."
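
`faiss.IndexFlatL2` performs an exhaustive search: it compares the query with every stored vector and ranks them by squared Euclidean distance. The following pure-Python sketch reproduces that ranking on a toy dataset without Faiss; `brute_force_l2_search` and the sample vectors are illustrative and not part of the notebook:

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def brute_force_l2_search(query, vectors, k=5):
    """Return the positions of the k vectors closest to the query."""
    distances = [(squared_l2(query, v), i) for i, v in enumerate(vectors)]
    distances.sort()  # ascending distance; ties broken by position
    return [i for _, i in distances[:k]]

toy_vectors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
]
print(brute_force_l2_search(toy_vectors[0], toy_vectors, k=2))  # [0, 2]
```

As in the notebook output, the query's own row comes back first because its distance to itself is zero.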


In [10]:
!pip install transformers

from transformers import pipeline

sentiment_analyzer = pipeline('sentiment-analysis')

def analyze_sentiment(text):
    result = sentiment_analyzer(text)
    return result[0]['label'], result[0]['score']




No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.



#5. Vector Search and Sentiment Analysis<a name="vector-search-and-sentiment-analysis"></a>

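The vector_search function builds a Faiss IndexFlatL2 index over the row vectors and returns the positions of the k rows nearest to a query vector. Sentiment analysis uses the Hugging Face Transformers sentiment-analysis pipeline, which defaults to the distilbert-base-uncased-finetuned-sst-2-english model when no model is specified.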
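
Each pipeline call returns a label (`POSITIVE` or `NEGATIVE` for the default model) and a confidence score. One way to summarize the sentiment of several retrieved rows is to convert each result to a signed score and average them. This is a sketch under that assumption; `signed_score` and `average_sentiment` are hypothetical helpers, and the sample results are made up for illustration:

```python
def signed_score(label, score):
    """Map a (label, score) pair to the range [-1, 1]; negative labels become negative."""
    return score if label.upper() == "POSITIVE" else -score

def average_sentiment(results):
    """Average the signed scores of a list of {'sentiment', 'score'} dicts."""
    if not results:
        return 0.0
    total = sum(signed_score(r["sentiment"], r["score"]) for r in results)
    return total / len(results)

# Made-up results in the shape returned by financial_chatbot
sample_results = [
    {"sentiment": "POSITIVE", "score": 0.9},
    {"sentiment": "NEGATIVE", "score": 0.6},
    {"sentiment": "POSITIVE", "score": 0.3},
]
print(round(average_sentiment(sample_results), 2))  # 0.2
```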

#6. Simulating Real-Time Stock Prices<a name="simulating-real-time-stock-prices"></a>
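The get_current_stock_price function simulates real-time price retrieval: it returns a random price drawn from a hard-coded range for each known symbol, and a price of None for unknown symbols.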
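
Because the simulated prices come from `random.uniform`, every call returns a different value. For repeatable demos or tests, a dedicated random generator with a fixed seed can be used. The sketch below assumes a simplified price table; `simulate_price`, `PRICE_RANGES`, and the seed value are illustrative:

```python
import random

# Illustrative price ranges, mirroring the style of the notebook's simulation
PRICE_RANGES = {"AAPL": (100, 200), "MSFT": (150, 250)}

def simulate_price(symbol, rng):
    """Return a simulated quote for symbol, or a None price if it is unknown."""
    bounds = PRICE_RANGES.get(symbol.upper())
    price = rng.uniform(*bounds) if bounds else None
    return {"symbol": symbol.upper(), "current_price": price, "currency": "USD"}

# Two generators with the same seed produce the same sequence of prices
first = simulate_price("aapl", random.Random(42))
second = simulate_price("AAPL", random.Random(42))
print(first == second)  # True
```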

#7. Financial Chatbot Implementation<a name="financial-chatbot-implementation"></a>
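The financial_chatbot function routes user queries. If the query contains the phrase "stock price", it treats the last word of the query as the stock symbol and returns a simulated price; otherwise it vectorizes the query, retrieves the nearest rows with vector search, and runs sentiment analysis on each of them.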
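
The symbol extraction relies on the symbol being the last word of the query. The sketch below isolates that step to show where it works and where it does not; `extract_symbol` is an illustrative helper that mirrors the expression used in `financial_chatbot`:

```python
def extract_symbol(user_query):
    """Take the last whitespace-separated word, drop '?', and uppercase it."""
    return user_query.split()[-1].replace("?", "").upper()

print(extract_symbol("What is the current stock price of AAPL?"))  # AAPL
print(extract_symbol("What is the stock price of AAPL today?"))    # TODAY
```

The second query shows the limitation: any trailing word is treated as the symbol, so a more robust version might match the query against a list of known symbols instead.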


#8. Random Question Generation<a name="random-question-generation"></a>
The generate_random_question function generates random user queries by filling a randomly chosen template with a randomly chosen stock symbol.

In [45]:
import random
import pandas as pd

def get_current_stock_price(stock_symbol):
    # Simulating the retrieval of real-time stock price for the given stock symbol
    simulated_stock_prices = {
        "AAPL": {'symbol': 'AAPL', 'current_price': random.uniform(100, 200), 'currency': 'USD'},
        "MSFT": {'symbol': 'MSFT', 'current_price': random.uniform(150, 250), 'currency': 'USD'},
        "GOOGLE": {'symbol': 'GOOGLE', 'current_price': random.uniform(2000, 3000), 'currency': 'USD'},
        "MICROSOFT": {'symbol': 'MICROSOFT', 'current_price': random.uniform(100, 200), 'currency': 'USD'},
        # Add more simulated stock prices as needed
    }

    if stock_symbol.upper() in simulated_stock_prices:
        return simulated_stock_prices[stock_symbol.upper()]
    else:
        return {'symbol': stock_symbol, 'current_price': None, 'currency': 'USD'}

def vectorize_user_query(user_query, vectorizer):
    # A fitted TfidfVectorizer exposes vocabulary_; fit_transform exists even before fitting
    if not hasattr(vectorizer, 'vocabulary_'):
        raise ValueError("The vectorizer must be fitted before vectorizing queries.")
    query_vector = vectorizer.transform([user_query]).toarray().flatten()
    return query_vector

def vector_search(query_vector, indexed_vectors, k=5):
    index = faiss.IndexFlatL2(query_vector.shape[0])  # Use the dimension of the query vector
    index.add(indexed_vectors)
    _, result_index = index.search(query_vector.reshape(1, -1).astype('float32'), k)
    return result_index.flatten()

def financial_chatbot(user_query, financial_data, indexed_vectors, vectorizer):
    if "stock price" in user_query.lower():
        stock_symbol = user_query.split()[-1].replace("?", "").upper()
        stock_price_info = get_current_stock_price(stock_symbol)
        return [stock_price_info]
    else:
        query_vector = vectorize_user_query(user_query, vectorizer)
        top_k_results = vector_search(query_vector, indexed_vectors)

        try:
            relevant_data = financial_data.iloc[top_k_results]
        except IndexError:
            return [{"error": "No relevant data found."}]

        sentiment_results = []
        # iterrows() yields (index label, row); the label is the timestamp of the data point
        for timestamp, row in relevant_data.iterrows():
            text_for_sentiment_analysis = f"Financial data for {timestamp}: {row['text_representation']}"
            sentiment_label, sentiment_score = analyze_sentiment(text_for_sentiment_analysis)
            sentiment_results.append({'timestamp': str(timestamp), 'sentiment': sentiment_label, 'score': sentiment_score})
        return sentiment_results

# List of top 10 tech companies
top_tech_companies = ["AAPL", "MSFT", "GOOGLE", "AMZN", "FB", "TSLA", "NVDA", "INTC", "CSCO", "IBM"]

def generate_random_question(stock_symbols=top_tech_companies):
    question_templates = [
        "How did [stock] perform in the last [time_period]?",
        "What is the current stock price of [stock]?",
        "Tell me about [stock]'s recent product releases.",
    ]
    template = random.choice(question_templates)
    random_stock_symbol = random.choice(stock_symbols)
    random_question = template.replace("[stock]", random_stock_symbol).replace("[time_period]", "quarter")
    return random_question

random_question = generate_random_question()
chatbot_response_random = financial_chatbot(random_question, financial_data, indexed_vectors, vectorizer)

print("Random Question:", random_question)
print("Chatbot Response for Random Question:", chatbot_response_random)

current_stock_price_query = "What is the current stock price of Microsoft?"
chatbot_response_stock_price = financial_chatbot(current_stock_price_query, financial_data, indexed_vectors, vectorizer)

print("Query for Current Stock Price:", current_stock_price_query)
print("Chatbot Response for Current Stock Price Query:", chatbot_response_stock_price)

Random Question: What is the current stock price of AAPL?
Chatbot Response for Random Question: [{'symbol': 'AAPL', 'current_price': 170.69083094858664, 'currency': 'USD'}]
Query for Current Stock Price: What is the current stock price of Microsoft?
Chatbot Response for Current Stock Price Query: [{'symbol': 'MICROSOFT', 'current_price': 176.9452565623725, 'currency': 'USD'}]
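
In the function above, `[time_period]` is always replaced with "quarter". A small variation, sketched here with a hypothetical `time_periods` argument and an optional seeded generator for reproducibility, also randomizes the time period:

```python
import random

def generate_question(stock_symbols, time_periods, rng=random):
    """Fill a randomly chosen template with a random symbol and time period."""
    templates = [
        "How did [stock] perform in the last [time_period]?",
        "What is the current stock price of [stock]?",
    ]
    template = rng.choice(templates)
    question = template.replace("[stock]", rng.choice(stock_symbols))
    return question.replace("[time_period]", rng.choice(time_periods))

print(generate_question(["AAPL", "MSFT"], ["week", "month", "quarter"], random.Random(0)))
```

Passing a `random.Random` instance with a fixed seed makes the generated question repeatable; omitting it falls back to the module-level generator.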


In [31]:
!pip install gradio
!pip install typing-extensions==4.5.0

Collecting typing-extensions==4.5.0
  Downloading typing_extensions-4.5.0-py3-none-any.whl (27 kB)
Installing collected packages: typing-extensions
  Attempting uninstall: typing-extensions
    Found existing installation: typing_extensions 4.9.0
    Uninstalling typing_extensions-4.9.0:
      Successfully uninstalled typing_extensions-4.9.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
lida 0.0.10 requires kaleido, which is not installed.
fastapi 0.104.1 requires typing-extensions>=4.8.0, but you have typing-extensions 4.5.0 which is incompatible.
pydantic 2.5.2 requires typing-extensions>=4.6.1, but you have typing-extensions 4.5.0 which is incompatible.
pydantic-core 2.14.5 requires typing-extensions!=4.7.0,>=4.6.0, but you have typing-extensions 4.5.0 which is incompatible.
Successfully installed typing-extensions-4.5.0


In [32]:
!pip uninstall -y typing-extensions
!pip install tensorflow-probability==0.22.0 gradio


Found existing installation: typing_extensions 4.5.0
Uninstalling typing_extensions-4.5.0:
  Successfully uninstalled typing_extensions-4.5.0
Collecting typing-extensions<4.6.0 (from tensorflow-probability==0.22.0)
  Using cached typing_extensions-4.5.0-py3-none-any.whl (27 kB)

#9. Gradio Integration<a name="gradio-integration"></a>
The get_chatbot_response function wraps financial_chatbot with the notebook's data, indexed vectors, and vectorizer so that Gradio can call it with a single text argument.

In [33]:
import gradio as gr
def get_chatbot_response(query):
    return financial_chatbot(query, financial_data, indexed_vectors, vectorizer)

iface = gr.Interface(
    fn=get_chatbot_response,
    inputs=gr.Textbox(),
    outputs="text"
)

iface.launch(share=True)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://9ee6da6f3d1ec79845.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
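
With `outputs="text"`, the interface shows the returned value converted to text, so the list of dictionaries appears as raw Python. A sketch of a formatter that turns the response into readable lines; `format_response` is a hypothetical helper, not part of the notebook:

```python
def format_response(results):
    """Render a list of result dicts as one 'key: value' line per result."""
    lines = []
    for item in results:
        parts = []
        for key, value in item.items():
            if isinstance(value, float):
                value = f"{value:.2f}"  # shorten long floats for display
            parts.append(f"{key}: {value}")
        lines.append(", ".join(parts))
    return "\n".join(lines)

sample = [{"symbol": "AAPL", "current_price": 170.69083094858664, "currency": "USD"}]
print(format_response(sample))  # symbol: AAPL, current_price: 170.69, currency: USD
```

`get_chatbot_response` could return `format_response(...)` applied to the chatbot's result instead of the raw list.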


