# Task Three: Use of NLP

In [2]:
import pandas as pd
from fuzzywuzzy import process
import spacy

# Load dataset
dataset_path = "10K_Financial_Data_Enhanced.csv"
df = pd.read_csv(dataset_path)

df



| Unnamed: 0 | Company | Fiscal Year | Total Revenue ($M) | Net Income ($M) | Total Assets ($M) | Total Liabilities ($M) | Cash Flow from Operating Activities ($M) | Revenue Growth (%) | Net Income Growth (%) | Assets Growth (%) | Liabilities Growth (%) | Cash Flow Growth (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Apple | 2022 | 394328.0 | 99803.0 | 352755.0 | 302083.0 | 122151.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Apple | 2023 | 383285.0 | 96995.0 | 352583.0 | 290437.0 | 110543.0 | -2.800461 | -2.813543 | -0.048759 | -3.855232 | -9.502992 |
| 2 | Apple | 2024 | 391035.0 | 93736.0 | 364980.0 | 308030.0 | 118254.0 | 2.021994 | -3.359967 | 3.516052 | 6.057424 | 6.975566 |
| 3 | Microsoft | 2022 | 198270.0 | 72738.0 | 364840.0 | 198298.0 | 89035.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Microsoft | 2023 | 211915.0 | 72361.0 | 411976.0 | 205753.0 | 87582.0 | 6.88203 | -0.518299 | 12.919636 | 3.759493 | -1.631942 |
| 5 | Microsoft | 2024 | 245122.0 | 88136.0 | 512163.0 | 243686.0 | 118548.0 | 15.669962 | 21.800417 | 24.31865 | 18.436183 | 35.35658 |
| 6 | Tesla | 2022 | 81462.0 | 12587.0 | 82338.0 | 36440.0 | 14724.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7 | Tesla | 2023 | 96773.0 | 14974.0 | 106618.0 | 43009.0 | 13256.0 | 18.795267 | 18.96401 | 29.488207 | 18.026894 | -9.970117 |
| 8 | Tesla | 2024 | 97690.0 | 7153.0 | 122070.0 | 48390.0 | 14923.0 | 0.947578 | -52.230533 | 14.492862 | 12.511335 | 12.575438 |


We will implement a **financial chatbot** that allows users to query financial data for various companies. It processes user inputs using **Natural Language Processing (NLP)** and **fuzzy matching** to extract relevant information such as:

- Financial metrics (e.g., revenue, net income, assets, liabilities)
- Company names
- Fiscal years

### 🔍 Features:
- Retrieves specific financial metrics for a given company and year.
- Calculates **net income change** over time.
- Identifies **the company with the highest cash flow** for a given year.
- Uses **fuzzy matching** to handle variations in user input.
- Employs **spaCy** for entity recognition and **pandas** for data handling.

🚀 Just ask a question like:  
*"What was Tesla's net income in 2022?"*  
And the chatbot will fetch the relevant data!
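
To make the fuzzy-matching idea concrete, the sketch below maps a slightly misspelled metric name to a dataset column. It is only an illustration: it uses `difflib` from the Python standard library rather than FuzzyWuzzy, so its 0 to 1 similarity scale is not comparable to FuzzyWuzzy's 0 to 100 scores, and the `match_metric` helper and the shortened `METRICS` list are assumptions made for this example.

```python
import difflib

# Hypothetical subset of the dataset's metric columns, used only for illustration.
METRICS = [
    "Total Revenue ($M)",
    "Net Income ($M)",
    "Total Assets ($M)",
    "Total Liabilities ($M)",
]

def match_metric(query, metrics, cutoff=0.6):
    """Return the metric whose name is most similar to `query`, or None."""
    # Compare case-insensitively, but return the original column name.
    lowered = {name.lower(): name for name in metrics}
    hits = difflib.get_close_matches(query.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None

print(match_metric("net incom", METRICS))  # Net Income ($M)
print(match_metric("hello", METRICS))      # None
```

Because `difflib` compares whole strings, this sketch suits short metric phrases rather than full sentences.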

In [12]:
# Load NLP model
nlp = spacy.load("en_core_web_sm")

# Extract unique company names and years
company_names = {name.lower(): name for name in df['Company'].unique()}
available_years = set(map(str, df['Fiscal Year'].unique()))
# Metric columns only: skip the identifier columns and the leftover CSV index column
financial_metrics = [col for col in df.columns if col not in ['Unnamed: 0', 'Company', 'Fiscal Year']]

def find_best_match(query, choices):
    """Return the choice that best fuzzy-matches `query`, or None if the score is 70 or lower."""
    match, score = process.extractOne(query, choices)
    return match if score > 70 else None

def get_financial_data(company, year, metric):
    """Retrieve financial data based on user query."""
    metric_match = find_best_match(metric, financial_metrics)
    
    if not metric_match:
        return "I couldn't find a matching financial metric. Can you rephrase?"
    
    result = df[(df['Company'] == company) & (df['Fiscal Year'] == int(year))]
    
    if result.empty:
        return f"No data available for {company} in {year}."
    
    return f"{company}'s {metric_match} in {year}: {result[metric_match].values[0]}"

def get_net_income_change(company):
    """Calculate the change in net income from the previous year."""
    result = df[df['Company'] == company].sort_values(by='Fiscal Year', ascending=False)
    if len(result) > 1:
        current_year = result.iloc[0]
        previous_year = result.iloc[1]
        change = ((current_year['Net Income ($M)'] - previous_year['Net Income ($M)']) / previous_year['Net Income ($M)']) * 100
        return f"The net income for {company} changed by {change:.2f}% from {previous_year['Fiscal Year']} to {current_year['Fiscal Year']}."
    else:
        return "Not enough data to calculate net income change."

def get_assets_liabilities(company, year):
    """Retrieve total assets and liabilities for a given company and year."""
    result = df[(df['Company'] == company) & (df['Fiscal Year'] == int(year))]
    if not result.empty:
        assets = result['Total Assets ($M)'].values[0]
        liabilities = result['Total Liabilities ($M)'].values[0]
        return f"In {year}, {company} had total assets of ${assets} million and total liabilities of ${liabilities} million."
    else:
        return "No data available for the requested company and year."

def get_highest_cash_flow(year):
    """Find the company with the highest cash flow from operating activities in a given year."""
    result = df[df['Fiscal Year'] == int(year)]
    if not result.empty:
        max_cash_flow = result['Cash Flow from Operating Activities ($M)'].max()
        company = result[result['Cash Flow from Operating Activities ($M)'] == max_cash_flow]['Company'].values[0]
        return f"In {year}, {company} had the highest cash flow from operating activities, totaling ${max_cash_flow} million."
    else:
        return "No data available for the requested year."

def chatbot_response(user_input):
    """Generate chatbot response dynamically."""
    doc = nlp(user_input)
    
    # Extract the company from spaCy's ORG entities: exact lookup first, fuzzy match as a fallback
    extracted_company = next((ent.text.lower() for ent in doc.ents if ent.label_ == "ORG"), None)
    company = company_names.get(extracted_company, None)
    
    if not company and extracted_company:
        best_match, score = process.extractOne(extracted_company, list(company_names.keys()))
        if score > 70:
            company = company_names[best_match]
    
    # Extract year (ensure it exists in dataset)
    year = next((ent.text for ent in doc.ents if ent.label_ == "DATE" and ent.text in available_years), None)
    
    # Extract financial metric from user input
    metric = find_best_match(user_input, financial_metrics)
    
    if "net income change" in user_input.lower() and company:
        return get_net_income_change(company)
    
    if "assets" in user_input.lower() or "liabilities" in user_input.lower():
        if company and year:
            return get_assets_liabilities(company, year)
    
    if "cash flow" in user_input.lower() and "highest" in user_input.lower():
        if year:
            return get_highest_cash_flow(year)
    
    if company and year and metric:
        return get_financial_data(company, year, metric)
    
    return "I couldn't fully understand your request. Could you specify a company, year, and financial metric?"

# Interactive chatbot loop
print("Welcome to the Financial Chatbot! Ask me anything about company finances. Type 'stop' or 'quit' to exit.")
while True:
    user_input = input("You: ")
    if user_input.lower() in ["stop", "quit"]:
        print("Chatbot: Goodbye!")
        break
    response = chatbot_response(user_input)
    print(f"Chatbot: {response}")


Welcome to the Financial Chatbot! Ask me anything about company finances. Type 'stop' or 'quit' to exit.


You:  what is the Cash Flow from Operating Activities in 2023 for Tesla


Chatbot: Tesla's Cash Flow from Operating Activities ($M) in 2023: 13256.0


You:  hello


Chatbot: I couldn't fully understand your request. Could you specify a company, year, and financial metric?


You:  quit


Chatbot: Goodbye!



## 🔧 Limitations & Future Improvements

### ❌ Limitations:
- **Inconsistent entity recognition:** The chatbot relies on spaCy's `ORG` entities to identify company names, so it may fail when a company has an uncommon name or abbreviation, or when the model does not tag the name as an organization.
- **Ambiguous year extraction:** A year is recognized only when spaCy tags it as a `DATE` entity whose text exactly matches a fiscal year in the dataset (e.g., *"2023"*); phrasings such as *"FY23"* or *"last year"* are not resolved.
- **Fuzzy matching errors:** While **FuzzyWuzzy** helps with metric recognition, it may still produce incorrect matches if the input is too vague.
- **Lack of contextual understanding:** The chatbot does not maintain conversation history, so it cannot handle follow-up questions like *"What about the year after?"*.
- **No real-time data updates:** The chatbot only works with the dataset provided (`10K_Financial_Data_Enhanced.csv`). It cannot fetch **live** financial data from online sources.
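
The year-extraction limitation could be softened with a regular-expression fallback that does not depend on entity tagging. The sketch below is one possible approach, not part of the chatbot above; `YEAR_PATTERN` and `extract_year` are hypothetical names, and the pattern assumes fiscal years are written as four digits starting with 19 or 20.

```python
import re

# Four-digit years starting with 19 or 20 that are not part of a longer number.
YEAR_PATTERN = re.compile(r"(?<!\d)(?:19|20)\d{2}(?!\d)")

def extract_year(text, available_years):
    """Return the first year in `text` that exists in `available_years`, else None."""
    for candidate in YEAR_PATTERN.findall(text):
        if candidate in available_years:
            return candidate
    return None

years = {"2022", "2023", "2024"}  # same string form as `available_years` in the chatbot
print(extract_year("Tesla net income for FY2023", years))  # 2023
print(extract_year("Apple revenue in 2019", years))        # None
```

Using digit lookarounds instead of word boundaries lets the pattern find the year inside tokens such as `FY2023`.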

### ✅ Future Improvements:
- **Enhance entity recognition** by fine-tuning the spaCy model with company names from the dataset.
- **Improve year detection** by handling relative references (e.g., *"last year"*).
- **Implement memory for context tracking**, enabling follow-up questions.
- **Expand dataset integration**, potentially linking to APIs for real-time financial data.
- **Refine fuzzy matching** to avoid misinterpretations of financial terms.
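
As a rough illustration of context tracking and relative year references, the sketch below keeps the last company, year, and metric in a small object and resolves a follow-up such as *"What about the year after?"*. All names here (`ConversationContext`, `resolve_relative_year`) are hypothetical, and treating *"last year"* as relative to the remembered year rather than the calendar is an assumption of this example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ConversationContext:
    """Remembers the most recent company, year, and metric (hypothetical helper)."""
    company: Optional[str] = None
    year: Optional[int] = None
    metric: Optional[str] = None

    def update(self, company=None, year=None, metric=None):
        """Overwrite only the slots that the latest message actually supplied."""
        if company is not None:
            self.company = company
        if year is not None:
            self.year = year
        if metric is not None:
            self.metric = metric

def resolve_relative_year(text, context):
    """Map phrases such as 'the year after' to a year relative to the remembered one."""
    if context.year is None:
        return None  # no earlier question to be relative to
    lowered = text.lower()
    if "year after" in lowered or "next year" in lowered:
        return context.year + 1
    # Assumption: "last year" means the year before the one discussed previously.
    if "year before" in lowered or "previous year" in lowered or "last year" in lowered:
        return context.year - 1
    return None

context = ConversationContext()
context.update(company="Tesla", year=2022, metric="Net Income ($M)")
follow_up_year = resolve_relative_year("What about the year after?", context)
context.update(year=follow_up_year)
print(context.company, context.year, context.metric)  # Tesla 2023 Net Income ($M)
```

A resolved year would still need to be checked against the fiscal years present in the dataset before any lookup.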