Building a fully functional AI chatbot for financial analysis is a complex process involving advanced programming and deep learning techniques. However, to fit our learning objectives and time constraints, we've tailored a simplified task. This streamlined version will introduce you to the basics of chatbot development, focusing on creating a prototype that responds to predefined financial queries. It's a first step into the world of AI chatbots, offering a glimpse into their potential without the need for extensive development time or advanced technical skills.

# Step 1: Preparation

In [1]:
import pandas as pd
import numpy as np

# Step 2: Chatbot design and data preparation

In [2]:
# Load CSV into pandas dataframe
df = pd.read_csv('10-k-fillings-update.csv')
df

Unnamed: 0,Company,Fiscal Year,Total Revenue,Net Income,Total Assets,Total Liabilities,Cash Flow From Ops,Revenue Growth (%),Net Income Growth (%)
0,Apple,2022,394328,99803,352755,302083,122151,0.0,0.0
1,Apple,2023,383285,96995,352583,290437,110543,-2.800461,-2.813543
2,Apple,2024,391035,93736,364980,308030,118254,2.021994,-3.359967
3,Microsoft,2022,198270,72738,364840,198298,89035,0.0,0.0
4,Microsoft,2023,211915,72361,411976,205753,87582,6.88203,-0.518299
5,Microsoft,2024,245122,88136,512163,243686,118548,15.669962,21.800417
6,Tesla,2022,81462,12587,82338,36440,14724,0.0,0.0
7,Tesla,2023,96773,14974,106618,43009,13256,18.795267,18.96401
8,Tesla,2024,97690,7153,122070,48390,14923,0.947578,-52.230533
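
The two growth columns in the table appear consistent with a year-over-year percentage change computed per company, with the first fiscal year set to 0. The following is a minimal sketch of how such columns could be derived, assuming that interpretation; the `sample` frame is a hypothetical subset holding only the Apple rows copied from the table, not the original CSV.

```python
import pandas as pd

# Three rows copied from the table above (Apple only), without the growth columns.
sample = pd.DataFrame({
    "Company": ["Apple", "Apple", "Apple"],
    "Fiscal Year": [2022, 2023, 2024],
    "Total Revenue": [394328, 383285, 391035],
    "Net Income": [99803, 96995, 93736],
})

# Sort so that pct_change compares each year with the previous one.
sample = sample.sort_values(["Company", "Fiscal Year"])

# Year-over-year change per company, in percent. The first year has no
# predecessor, so its NaN is replaced with 0 to match the table.
for source, target in [("Total Revenue", "Revenue Growth (%)"),
                       ("Net Income", "Net Income Growth (%)")]:
    sample[target] = (
        sample.groupby("Company")[source].pct_change().fillna(0) * 100
    )

print(sample[["Fiscal Year", "Revenue Growth (%)", "Net Income Growth (%)"]])
```

Grouping by `Company` before calling `pct_change` keeps the first year of one company from being compared with the last year of another.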


In [3]:
# Ensure all columns are numeric, except the company name
for col in df.columns:
    if col != 'Company':
        # Remove dollar signs, commas, or other formatting if present
        df[col] = df[col].replace(r'[\$,]', '', regex=True)
        # Convert to a numeric dtype; values that cannot be parsed become NaN
        df[col] = pd.to_numeric(df[col], errors='coerce')
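
The CSV loaded above is already numeric, so the cleaning step has no visible effect on it. To illustrate what the step does on messier input, here is a small sketch on a hypothetical series of hand-formatted values (the strings in `raw` are made up for illustration and do not come from the dataset).

```python
import pandas as pd

# Hypothetical raw values, as they might appear in a hand-edited spreadsheet.
raw = pd.Series(["$394,328", "383,285", "n/a"])

# Strip dollar signs and thousands separators, then convert to numbers.
stripped = raw.replace(r"[\$,]", "", regex=True)
cleaned = pd.to_numeric(stripped, errors="coerce")

print(cleaned.tolist())       # the unparseable entry becomes NaN
print(cleaned.isna().sum())   # number of values that failed to parse
```

Because `errors="coerce"` silently turns bad values into `NaN`, counting missing values afterwards is a cheap way to notice parsing problems.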

In [4]:
# Check the data type of each column in the DataFrame
df.dtypes

Unnamed: 0,0
Company,object
Fiscal Year,int64
Total Revenue,int64
Net Income,int64
Total Assets,int64
Total Liabilities,int64
Cash Flow From Ops,int64
Revenue Growth (%),float64
Net Income Growth (%),float64
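
Reading the dtype listing by eye works for nine columns, but the same check can be automated. The sketch below assumes a frame shaped like the one above (the two Tesla rows are copied from the table; the variable names `non_numeric` and `missing` are illustrative) and reports any non-numeric column and any missing value.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Two rows copied from the table above.
df = pd.DataFrame({
    "Company": ["Tesla", "Tesla"],
    "Fiscal Year": [2023, 2024],
    "Total Revenue": [96773, 97690],
    "Revenue Growth (%)": [18.795267, 0.947578],
})

# Columns other than 'Company' that did not end up with a numeric dtype.
non_numeric = [c for c in df.columns
               if c != "Company" and not is_numeric_dtype(df[c])]

# Total number of missing values across the whole frame.
missing = int(df.isna().sum().sum())

print(non_numeric, missing)
```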


In [5]:
df

Unnamed: 0,Company,Fiscal Year,Total Revenue,Net Income,Total Assets,Total Liabilities,Cash Flow From Ops,Revenue Growth (%),Net Income Growth (%)
0,Apple,2022,394328,99803,352755,302083,122151,0.0,0.0
1,Apple,2023,383285,96995,352583,290437,110543,-2.800461,-2.813543
2,Apple,2024,391035,93736,364980,308030,118254,2.021994,-3.359967
3,Microsoft,2022,198270,72738,364840,198298,89035,0.0,0.0
4,Microsoft,2023,211915,72361,411976,205753,87582,6.88203,-0.518299
5,Microsoft,2024,245122,88136,512163,243686,118548,15.669962,21.800417
6,Tesla,2022,81462,12587,82338,36440,14724,0.0,0.0
7,Tesla,2023,96773,14974,106618,43009,13256,18.795267,18.96401
8,Tesla,2024,97690,7153,122070,48390,14923,0.947578,-52.230533


# Step 3: Basic chatbot development

In [21]:
def simple_chatbot():
   # Start the chatbot interaction
   print("Welcome to BCGX financial Chatbot! Which company's data are you looking for?")
   companies = df['Company'].unique()
   # Loop through the companies and print their index
   for i, company in enumerate(companies):
      print(f"{i+1}. {company}")
   # Add an "Other" option at the end
   print(f"{len(companies)+1}. Other (currently unavailable)")

   # User input for company selection
   company_choice = input("Please enter the number corresponding to the company: ")

   if company_choice.isdigit() and int(company_choice) >= 1 and int(company_choice) <= len(companies):
     selected_company = companies[int(company_choice)-1]
   elif company_choice == str(len(companies)+1):
     print("Sorry, I can only provide information on the companies listed above.")
     return
   else:
     print("Invalid selection! Please choose a valid number.")
     return

   # Now ask the user what information they want about the company
   print(f"\nYou have selected {selected_company}. What information would you like?")
   columns = df.columns.tolist()
   # Show the options for columns (excluding 'Company')
   for i, column in enumerate(columns):
      if column != 'Company':
         print(f"{i}. {column}")
   # Add an option to view all data for the company
   print(f"{len(columns)}. View all data for {selected_company}")
   # Add an option to other/exit
   print(f"{len(columns) + 1}. Other (currently unavailable)")

   # User input for column selection
   column_choice = input("Please enter the number corresponding to the data you want: ")

   # Print the corresponding information
   # "View all" is numbered len(columns): 'Company' sits at index 0 and is
   # skipped in the menu, so the data columns occupy 1..len(columns)-1
   if column_choice == str(len(columns)):
      company_data = df[df['Company'] == selected_company].reset_index(drop=True)
      print(f"\nShowing all data for {selected_company}:")
      print(company_data)
   elif column_choice.isdigit() and 1 <= int(column_choice) < len(columns):
      # Handle valid column choices; the menu number equals the column's index
      selected_column = columns[int(column_choice)]
      company_data = df[df['Company'] == selected_company][['Company', selected_column]].reset_index(drop=True)
      print(f"\nShowing {selected_column} data for {selected_company}:")
      print(company_data)
   elif column_choice == str(len(columns) + 1):
      print("Sorry, I can only provide information on the columns listed above.")
   else:
      print("Invalid selection! Please choose a valid number.")
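
The menu numbering in `simple_chatbot` relies on `Company` being the first column, and the lookup logic is interleaved with `input()` calls, which makes it hard to test. One possible refactoring, sketched below, separates the two concerns. The helper names `build_menu` and `lookup` are hypothetical and not part of the notebook.

```python
import pandas as pd


def build_menu(columns, skip="Company"):
    """Map menu numbers (1, 2, ...) to column names, leaving out `skip`."""
    names = [c for c in columns if c != skip]
    return {i: name for i, name in enumerate(names, start=1)}


def lookup(df, company, column=None):
    """Return the rows for `company`; restrict to one column if given."""
    rows = df[df["Company"] == company].reset_index(drop=True)
    if column is None:
        return rows
    return rows[["Company", column]]


# Two rows copied from the table above; the column order is deliberately
# changed to show that the menu no longer depends on it.
df = pd.DataFrame({
    "Fiscal Year": [2023, 2024],
    "Company": ["Microsoft", "Microsoft"],
    "Total Revenue": [211915, 245122],
})

menu = build_menu(df.columns)
print(menu)
print(lookup(df, "Microsoft", menu[2]))
```

With this split, the interactive function only has to print the menu, read a number, and pass the mapped column name to `lookup`.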

# Step 4: Demonstration and documentation

In [24]:
simple_chatbot()

Welcome to BCGX financial Chatbot! Which company's data are you looking for?
1. Apple
2. Microsoft
3. Tesla
4. Other (currently unavailable)
Please enter the number corresponding to the company: 2

You have selected Microsoft. What information would you like?
1. Fiscal Year
2. Total Revenue
3. Net Income
4. Total Assets
5. Total Liabilities
6. Cash Flow From Ops
7. Revenue Growth (%)
8. Net Income Growth (%)
9. View all data for Microsoft
10. Other (currently unavailable)
Please enter the number corresponding to the data you want: 9

Showing all data for Microsoft:
     Company  Fiscal Year  Total Revenue  Net Income  Total Assets  \
0  Microsoft         2022         198270       72738        364840   
1  Microsoft         2023         211915       72361        411976   
2  Microsoft         2024         245122       88136        512163   

   Total Liabilities  Cash Flow From Ops  Revenue Growth (%)  \
0             198298               89035            0.000000   
1             20575

Our chatbot answers user queries from predefined information. \
It presents numbered menus in the terminal so that users can quickly extract the information they need. \
The company and column menus are built from the input CSV at run time, so nothing about the dataset is hard-coded apart from the `Company` column name. \
Restrictions:
 * Information is limited to the input CSV file.
 * Queries are predefined: users select from fixed menus and cannot use natural-language input.
 * Output is fixed: users cannot ask follow-up questions or otherwise interact with the chatbot after a result is shown.
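
As one possible way to loosen the natural-language restriction without any machine learning, a query could be matched against known company names, years, and metric names as plain substrings. This is only a rough sketch under that assumption, not part of the chatbot above; the `DATA` dictionary and the `answer` function are hypothetical, with values copied from the Microsoft rows of the table.

```python
# Values copied from the Microsoft rows of the table above.
DATA = {
    ("Microsoft", 2023): {"Total Revenue": 211915, "Net Income": 72361},
    ("Microsoft", 2024): {"Total Revenue": 245122, "Net Income": 88136},
}


def answer(query):
    """Answer a free-text query by matching known names as substrings."""
    text = query.lower()
    companies = {company for company, _ in DATA}
    years = {year for _, year in DATA}
    metrics = {m for row in DATA.values() for m in row}

    # Pick the first known company, year, and metric mentioned in the query.
    company = next((c for c in sorted(companies) if c.lower() in text), None)
    year = next((y for y in sorted(years) if str(y) in text), None)
    metric = next((m for m in sorted(metrics) if m.lower() in text), None)

    if None in (company, year, metric):
        return "Sorry, I could not match a company, year, and metric."
    return f"{company} {metric} in {year}: {DATA[(company, year)][metric]}"


print(answer("What was Microsoft net income in 2024?"))
```

Substring matching is brittle (it cannot handle synonyms such as "profit" or "sales"), so it should be read as a stepping stone rather than real language understanding.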