## How My Financial Chatbot Works

The AI-powered financial chatbot retrieves key financial metrics derived from corporate 10-K and 10-Q reports. It uses spaCy for Natural Language Processing (NLP): it extracts the company name, the financial metric, and the year from a user's query, then looks up the matching value in a predefined dataset. Queries are answered as soon as they are submitted, and the response contains the requested metric (e.g., revenue growth, net income growth, liabilities growth).
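
As a rough illustration of the three things the chatbot pulls out of a query, the sketch below uses only the standard library. It is a simplified stand-in for the spaCy pipeline, not the notebook's implementation: `parse_query`, `KNOWN_COMPANIES`, and `KNOWN_METRICS` are hypothetical names, and plain substring and regex matching take the place of the tokenizer, the named-entity recognizer, and the PhraseMatcher.

```python
import re

# Hypothetical stand-ins for the companies and metrics held in the dataset
KNOWN_COMPANIES = ["Tesla", "Apple", "Microsoft"]
KNOWN_METRICS = ["revenue growth", "net income growth", "debt to equity ratio"]


def parse_query(query):
    """Return (company, metric, year); each item is None if it was not found."""
    # Treat curly apostrophes like straight ones
    text = query.replace("\u2019", "'")
    lowered = text.lower()

    # Company: first known name that appears as a whole word
    company = next(
        (c for c in KNOWN_COMPANIES if re.search(rf"\b{re.escape(c)}\b", text)),
        None,
    )
    # Metric: first known phrase contained in the lowercased query
    metric = next((m for m in KNOWN_METRICS if m in lowered), None)
    # Year: first standalone four-digit number starting with 19 or 20
    year_match = re.search(r"\b(?:19|20)\d{2}\b", text)
    year = int(year_match.group()) if year_match else None

    return company, metric, year


print(parse_query("What was Tesla\u2019s net income growth in 2022?"))
```

A lookup can only proceed when all three values are present, which is why a query that omits the year or paraphrases the metric gets no answer.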

## Predefined Queries the Chatbot Can Respond To

The chatbot is programmed to respond to various financial queries, such as:

 * "What was Tesla’s net income growth in 2022?"
 * "Show me Apple's assets growth for 2022."
 * "What was Microsoft's revenue growth in 2023?"
 * "Can you tell me Tesla's cash flow growth for 2023?"
 * "What is Apple's debt to equity ratio for 2022?"

The chatbot recognizes growth metrics (revenue growth, net income growth, liabilities growth, assets growth, and cash flow growth) as well as ratios such as net profit margin, return on assets, and debt to equity ratio. It looks these up in preloaded datasets covering Apple, Tesla, and Microsoft for 2021 through 2023. Growth figures are not available for 2021, the first year in the data.
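
The recognizable phrases can be derived from the dataset's column headers rather than typed by hand. The sketch below shows one way to do this, assuming the header names used in this notebook; `to_term` and `term_to_column` are hypothetical names introduced for illustration.

```python
# A subset of the column headers used in this notebook's DataFrames
columns = [
    "Revenue Growth (%)",
    "Assets Growth (%)",
    "Return on Assets (ROA) (%)",
    "Debt to Equity Ratio",
]


def to_term(column):
    """Turn a column header into the lowercase phrase a user would type."""
    return column.replace("_", " ").replace("(%)", "").strip().lower()


# Reverse lookup: phrase found in a query -> original column header
term_to_column = {to_term(col): col for col in columns}

for term, col in term_to_column.items():
    print(f"{term!r} -> {col!r}")
```

Building the lookup this way keeps the phrase list and the column names in sync when a column is added. Because each phrase mirrors its header exactly, the derived phrase is "assets growth" (plural).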

## Limitations

1. Predefined Data Scope: The chatbot relies on static datasets that contain financial information for a limited set of companies and years. It cannot handle queries outside of the available data.

2. Phrase Matching: The chatbot uses rule-based matching via the PhraseMatcher, so it may not understand complex or highly varied queries. It works best with queries that closely match predefined financial terms.

3. Real-Time Data: The current version does not support real-time financial updates. It uses historical data, and any changes to the data would require manual updates to the dataset.

Within these limits, the chatbot gives quick answers about corporate financial performance for the companies and years it covers.
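
The first two limitations can be seen in a small standard-library sketch. It assumes the coverage described in this notebook (three companies, years 2021 to 2023) and uses plain substring matching as a stand-in for the PhraseMatcher; `in_scope` and `has_known_metric` are hypothetical helpers, not part of the chatbot.

```python
# Coverage assumed from the notebook's datasets
AVAILABLE_COMPANIES = {"Apple", "Microsoft", "Tesla"}
AVAILABLE_YEARS = {2021, 2022, 2023}
KNOWN_PHRASES = ["revenue growth", "net income growth", "cash flow growth"]


def in_scope(company, year):
    """Limitation 1: only preloaded companies and years can be answered."""
    return company in AVAILABLE_COMPANIES and year in AVAILABLE_YEARS


def has_known_metric(query):
    """Limitation 2: a query matches only if it contains a known phrase verbatim."""
    lowered = query.lower()
    return any(phrase in lowered for phrase in KNOWN_PHRASES)


print(in_scope("Tesla", 2022))    # covered by the dataset
print(in_scope("Amazon", 2022))   # company not loaded
print(in_scope("Apple", 2019))    # year not loaded
print(has_known_metric("What was Apple's revenue growth in 2022?"))
print(has_known_metric("How much did Apple's revenue grow in 2022?"))  # paraphrase
```

The last call returns `False` even though a person would read both questions the same way, which is the practical cost of exact phrase matching.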

In [1]:
import pandas as pd

# Load your data into pandas DataFrames
growth_data = pd.DataFrame({
    'Date': ['2021-09-25', '2022-09-24', '2023-09-30', '2021-06-30', '2022-06-30', '2023-06-30', '2021-12-31', '2022-12-31', '2023-12-31'],
    'Company': ['Apple Inc.', 'Apple Inc.', 'Apple Inc.', 'Microsoft', 'Microsoft', 'Microsoft', 'Tesla', 'Tesla', 'Tesla'],
    'Assets Growth (%)': [None, 0.4994, -0.0488, None, 9.3059, 12.9196, None, 32.5232, 29.4882],
    'Cash Flow Growth (%)': [None, 17.41, -9.503, None, 16.0216, -1.6319, None, 28.0682, -9.9701],
    'Liabilities Growth (%)': [None, 4.922, -3.8552, None, 3.3928, 3.7595, None, 19.2877, 18.0269],
    'Net Income Growth (%)': [None, 5.4109, -2.8135, None, 18.7152, -0.5183, None, 127.505, 19.4409],
    'Revenue Growth (%)': [None, 7.7938, -2.8005, None, 17.9561, 6.882, None, 51.3517, 18.7953]
})

ratios_data = pd.DataFrame({
    'Date': ['2021-09-25', '2022-09-24', '2023-09-30', '2021-06-30', '2022-06-30', '2023-06-30', '2021-12-31', '2022-12-31', '2023-12-31'],
    'Company': ['Apple Inc.', 'Apple Inc.', 'Apple Inc.', 'Microsoft', 'Microsoft', 'Microsoft', 'Tesla', 'Tesla', 'Tesla'],
    'Net Profit Margin (%)': [25.88, 25.31, 25.30, 36.45, 36.68, 34.15, 10.25, 15.41, 15.50],
    'Return on Assets (ROA) (%)': [26.97, 28.29, 27.51, 18.36, 19.94, 17.56, 8.88, 15.25, 14.07],
    'Debt to Assets Ratio (%)': [82.03, 85.64, 82.37, 57.46, 54.35, 49.94, 49.17, 44.26, 40.34],
    'Equity to Assets Ratio (%)': [17.97, 14.36, 17.63, 42.54, 45.65, 50.06, 50.83, 55.74, 59.66],
    'Debt to Equity Ratio': [4.56, 5.96, 4.67, 1.35, 1.19, 0.99, 0.97, 0.79, 0.68],
    'Operating Cash Flow to Revenue Ratio (%)': [28.44, 30.98, 28.84, 45.65, 44.91, 41.33, 21.36, 18.07, 13.70],
})


In [10]:
import ipywidgets as widgets
from IPython.display import display, clear_output
import spacy
from spacy.matcher import PhraseMatcher

# Load spaCy's English NLP model
nlp = spacy.load("en_core_web_sm")

# Normalize text (replace curly apostrophes with straight ones)
def normalize_text(text):
    return text.replace("’", "'")

# List of known companies
known_companies = ["Tesla", "Apple", "Microsoft"]

# Extract financial terms (dynamically from your DataFrame columns)
growth_columns = [col for col in growth_data.columns if col not in ['Company', 'Date']]
ratios_columns = [col for col in ratios_data.columns if col not in ['Company', 'Date']]
all_columns = growth_columns + ratios_columns
financial_terms = [col.replace("_", " ").replace("(%)", "").strip().lower() for col in all_columns]

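# Initialize the matcher for financial terms; attr="LOWER" makes matching
# case-insensitive, so "Revenue Growth" in a query matches the lowercased term
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")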
patterns = [nlp.make_doc(term) for term in financial_terms]
matcher.add("FIN_TERMS", patterns)

# Function to extract entities from the query
def extract_entities_from_query(query):
    # Normalize the query to handle apostrophes
    query = normalize_text(query)
    
    # Process query through the NLP pipeline
    doc = nlp(query)
    
    # Initialize variables
    company = None
    year = None
    metric = None
    
    # Identify company using a list of known companies
    for token in doc:
        if token.text in known_companies:
            company = token.text
    
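    # Identify year using spaCy's entity recognition (DATE label)
    for ent in doc.ents:
        if ent.label_ == 'DATE':
            # Keep only the four-digit year, so "fiscal 2022" still yields "2022"
            for token in ent:
                if token.is_digit and len(token.text) == 4:
                    year = token.text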
    
    # Use PhraseMatcher to find the financial term (metric) in the query
    matches = matcher(doc)
    for match_id, start, end in matches:
        metric = doc[start:end].text.lower()  # Match the financial term
    
    # Debugging: print what we extracted
    print(f"Extracted company: {company}, metric: {metric}, year: {year}")
    
    return company, metric, year

def get_financial_metric(company, metric, year, growth_data, ratios_data):
    # Convert the metric to a column name format (reverse the cleanup process)
    column_map = {
        'revenue growth': 'Revenue Growth (%)',
        'net income growth': 'Net Income Growth (%)',
        'liabilities growth': 'Liabilities Growth (%)',
        'assets growth': 'Assets Growth (%)',
        'cash flow growth': 'Cash Flow Growth (%)',
        'net profit margin': 'Net Profit Margin (%)',
        'return on assets (roa)': 'Return on Assets (ROA) (%)',
        'debt to assets ratio': 'Debt to Assets Ratio (%)',
        'equity to assets ratio': 'Equity to Assets Ratio (%)',
        'debt to equity ratio': 'Debt to Equity Ratio',
        'operating cash flow to revenue ratio': 'Operating Cash Flow to Revenue Ratio (%)',
    }
    
    # Look for the correct column in the data
    column_name = column_map.get(metric)
    
    if column_name is None:
        return "Metric not recognized."
    
    try:
        # Convert the Date column to just the year for comparison
        growth_data['Year'] = pd.to_datetime(growth_data['Date']).dt.year
        ratios_data['Year'] = pd.to_datetime(ratios_data['Date']).dt.year
        
        # Check first in growth_data
        if column_name in growth_data.columns:
            print(f"Looking for {company} {metric} in {year} in growth_data...")
            result = growth_data[(growth_data['Company'].str.contains(company, case=False)) & (growth_data['Year'] == int(year))]
            
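            if not result.empty:
                value = result[column_name].values[0]
                # Growth is undefined for the first year in the dataset (stored as NaN)
                if pd.isna(value):
                    return f"No {metric} value is available for {company} in {year}."
                return f"{company} {metric} in {year}: {value}"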
            else:
                print("No matching data found in growth_data.")
        
        # Check in ratios_data if not found in growth_data
        if column_name in ratios_data.columns:
            print(f"Looking for {company} {metric} in {year} in ratios_data...")
            result = ratios_data[(ratios_data['Company'].str.contains(company, case=False)) & (ratios_data['Year'] == int(year))]
            
            if not result.empty:
                value = result[column_name].values[0]
                return f"{company} {metric} in {year}: {value}"
            else:
                print("No matching data found in ratios_data.")
        
    except (KeyError, ValueError) as e:
        # ValueError covers a year string that int() cannot parse
        print(f"{type(e).__name__}: {e}")
        return "Data not available for this query."
    
    return "No data found for the specified query."

# Create widgets
query_input = widgets.Text(description="Query:", style={'description_width': 'initial'})
submit_button = widgets.Button(description="Submit")
output = widgets.Output()

# Example queries
example_queries = [
    "What was Tesla's net income growth in 2022?",
    "How much did Apple's revenue growth change in 2022?",
    "Show me Microsoft's liabilities growth for 2023.",
    "What is Apple's assets growth in 2022?",
    "Can you tell me Tesla’s cash flow growth for 2023?",
    "What was Microsoft's return on assets (roa) in 2022?",
    "Give me the net profit margin for Tesla in 2021.",
    "How did Apple's operating cash flow to revenue ratio change in 2023?",
    "What is the debt to equity ratio for Microsoft in 2022?",
    "Show the liabilities growth for Tesla in 2023."
]

example_dropdown = widgets.Dropdown(
    options=example_queries,
    description='Example Queries:',
    style={'description_width': 'initial'}
)

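# Info text (companies and years are derived from the loaded data,
# since they are not defined anywhere else in the cells shown here)
companies = sorted(growth_data['Company'].unique())
years = sorted(pd.to_datetime(growth_data['Date']).dt.year.unique())
info_text = widgets.HTML(
    value=f"<b>Available Companies:</b> {', '.join(companies)}<br>"
          f"<b>Available Years:</b> {', '.join(map(str, years))}"
)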

def on_submit_button_clicked(b):
    with output:
        clear_output()
        query = query_input.value
        company, metric, year = extract_entities_from_query(query)
        if company and metric and year:
            response = get_financial_metric(company, metric, year, growth_data, ratios_data)
            print(response)
        else:
            print(f"Could not extract enough information from query: {query}")

def on_example_selected(change):
    query_input.value = change.new

submit_button.on_click(on_submit_button_clicked)
example_dropdown.observe(on_example_selected, names='value')

# Display the app
display(info_text)
display(example_dropdown)
display(query_input)
display(submit_button)
display(output)

HTML(value='<b>Available Companies:</b> Apple Inc., Microsoft, Tesla<br><b>Available Years:</b> 2021, 2022, 20…

Dropdown(description='Example Queries:', options=("What was Tesla's net income growth in 2022?", "How much did…

Text(value='', description='Query:', style=TextStyle(description_width='initial'))

Button(description='Submit', style=ButtonStyle())

Output()