# **BUSINESS CASE 4: CHATBOT FIDELIDADE**  


## 🎓 Master’s Program in Data Science & Advanced Analytics 
**Nova IMS** | May 2025   
**Course:** Business Cases with Data Science

## 👥 Team **Group A**  
- **Alice Viegas** | 20240572  
- **Bernardo Faria** | 20240579  
- **Dinis Pinto** | 20240612  
- **Daan van Holten** | 20240681
- **Philippe Dutranoit** | 20240518

## 📊 Project Overview  
This notebook demonstrates a prototype chatbot built on the Azure OpenAI Assistants API, designed to support Fidelidade's sales agents. <br>
This solution is platform-agnostic, meaning it can be integrated into any system or interface that best suits Fidelidade's needs.

References:
- https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/assistant

## Setup

In [60]:
# Packages loading
import os
import json
import time
import pandas as pd
from bs4 import BeautifulSoup
from openai import AzureOpenAI
from PIL import Image
from IPython.display import Markdown, display
import pickle
from datetime import datetime, timedelta
import requests
from PyPDF2 import PdfReader

In [61]:
# Set API key and endpoint (the key is read from the environment instead of being hardcoded)
api_key = os.environ["AZURE_OPENAI_API_KEY"]
endpoint = 'https://ai-bcds.openai.azure.com/'

Based on Fidelidade’s presentation, we identified several websites that the chatbot can scrape to provide accurate and relevant responses. The list is flexible: sources can be updated, replaced, or removed as needed. <br>
Some of the websites are blocked. Obtaining approval from these public websites to use their content could enrich the chatbot further.
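
One way to find out in advance whether a site permits automated access is to read its `robots.txt`. The sketch below is an illustration only: it assumes that "blocked" means the site disallows crawlers, which the notebook does not confirm, and both `is_scraping_allowed` and the sample rules are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def is_scraping_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits fetching `url`."""
    parser = RobotFileParser()
    # parse() accepts the file content as a list of lines, so no network call is needed
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, for illustration only
sample_robots = """User-agent: *
Disallow: /private/
"""

print(is_scraping_allowed(sample_robots, "https://example.com/poupancas"))
print(is_scraping_allowed(sample_robots, "https://example.com/private/rates"))
```

In practice the `robots.txt` text would be downloaded from each site first. A site can also refuse requests in other ways (for example with an HTTP 403), so this check is only a first filter.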


In [62]:
# Constants
current_folder = os.getcwd()
data_folder = current_folder
data_folder_full_path = os.path.abspath(data_folder)
url_list = [
    # Banks
    "https://www.cgd.pt/Particulares/Poupanca-Investimento/Depositos-a-Prazo-e-Poupanca/Pages/Depositos-a-Prazo-e-Contas-Poupanca.aspx",
    "https://www.santander.pt/poupancas",
    "https://www.millenniumbcp.pt/poupanca/reforco-frequente",
    "https://www.bancobpi.pt/particulares/poupar-investir/depositos-prazo",

    # Investment funds
    # "https://www.cgd.pt/Particulares/Poupanca-Investimento/Fundos-de-Investimento/Pages/Fundos-de-Investimento.aspx",
    "https://www.casadeinvestimentos.pt/",
    "https://optimize.pt/",

    # Savings Certificates
    "https://www.igcp.pt/pt/noticias/taxas-de-juro-dos-certificados-de-aforro-das-series-b-c-d-e-e-f-em-janeiro-de-2025",

    # Tax, Regulatory Guidelines, Financial Literacy
    "https://www.consumidor.asf.com.pt/poupan%C3%A7a/seguros-de-capitaliza%C3%A7%C3%A3o",
    # "https://bpstat.bportugal.pt/conteudos/noticias/2463/",
    # "https://www.cmvm.pt/",
    # "https://www.todoscontam.pt/"
]
agent_data = 'AgentFiles.pkl'     

assistantFilename = 'AssistantID.TXT'
assistant_id = None
assistant = None

displayedMessagesIDs = []

In [63]:
# Initialize the Azure OpenAI client
client = AzureOpenAI(
    azure_endpoint = endpoint,
    api_key= api_key,
    api_version="2024-05-01-preview")

## 🧠 Prompt Structure and Rationale

### 🎯 **Purpose of the Prompt**
The prompt defines the behavior, tone, and limitations of a virtual assistant designed to support Fidelidade’s sales agents. Given the sensitive nature of financial information and the need for accuracy in customer service, the prompt is deliberately **descriptive** and **structured** to minimize ambiguity and ensure consistency. Some example responses are also provided as part of the prompt, so the tone and structure of the responses are very clear.

---

### 🔍 **Prompt Structure Breakdown**

1. **### Função (Role)**  
   This section outlines the **core responsibilities** and **communication style** expected from the assistant:
   - Defines the assistant as a **specialist** in Fidelidade’s savings and insurance products.
   - Emphasizes the importance of **clarity, accuracy, and professionalism**.
   - Specifies expectations for **tone** (empathetic, accessible) and **style** (FAQ-like, user-friendly).
   - Encourages the assistant to offer **helpful next steps** and cite **sources** to build trust and transparency.

2. **### Restrições (Constraints)**  
   Clearly defines **boundaries** and **rules** the assistant must follow:
   - **Language Handling**: Maintains consistency with the user’s language (e.g., Portuguese from Portugal vs. English).
   - **Scope Control**: Gently redirects users who go off-topic, keeping the assistant focused and relevant.
   - **Source Reliance**: Prohibits fabrication of information, reinforcing factual accuracy.
   - **Ambiguity Handling**: Instructs the assistant to ask clarifying questions when needed.
   - **Escalation Guidance**: Recommends next steps when the assistant can't fully address a user’s question.

In [64]:
# Optimized Prompt
aRole = """### Função
- És um assistente virtual da Fidelidade, especializado em apoiar os agentes de vendas.

- A tua principal função é fornecer informações claras, rigorosas e atualizadas sobre os produtos de poupança My Savings e PPR Evoluir, incluindo comparações quando apropriado.

- Deves também responder a perguntas frequentes sobre literacia financeira, bem como esclarecer dúvidas sobre seguros e produtos financeiros da Fidelidade.

- Mantém sempre um tom profissional, empático e acessível, com uma comunicação natural e fluida, própria de uma interação de apoio ao cliente ou agente.

- As tuas respostas devem ser objetivas, diretas ao ponto, fáceis de compreender, sem linguagem excessivamente técnica ou termos complexos não explicados. Foca-te sempre nas informações essenciais, garantindo que a resposta é útil e adequada ao contexto.

- Nunca inventes informação: apoia-te exclusivamente nos conteúdos fornecidos.

- Mostra sempre a fonte da informação usada, indicando o nome do documento ou, sempre que possível, um link direto para que o utilizador possa explorar mais detalhes por conta própria.

- Quando adequado, sugere próximos passos úteis, como consultar outras secções do documento, usar ferramentas da Fidelidade, ou contactar um agente de outra área para apoio adicional.

O estilo de resposta deve seguir o modelo dos documentos de FAQs oficiais da Fidelidade.
Exemplo de estrutura:
Pergunta: O que é o Fidelidade Savings?
Resposta: O Fidelidade Savings é uma solução de Poupança/Investimento assente numa plataforma 100% digital e inovadora da Fidelidade. Permite-lhe definir Objetivos de Poupança ou Objetivos de Investimento de forma simples e flexível, com comissões nulas ou reduzidas, conforme previsto na documentação contratual e pré-contratual.

### Restrições
- Linguagem: Responde sempre na mesma língua utilizada pelo utilizador. Se a pergunta for feita em português, responde em português de Portugal. Se for feita em inglês, responde em inglês. Nunca mistures idiomas na mesma resposta.

- Manter o foco: Se o utilizador se desviar do tema, redireciona educadamente a conversa com um tom amigável e compreensivo. Exemplo de resposta: "Peço desculpa, mas só posso responder a perguntas sobre os produtos da Fidelidade e literacia financeira. Posso ajudar com alguma dessas questões?"

- Limites de conhecimento: Baseia as tuas respostas exclusivamente na informação presente nos documentos fornecidos. Se a pergunta estiver fora desse âmbito, responde com cordialidade: "Lamento, mas não tenho informação sobre esse tema específico. Posso ajudar com alguma questão sobre produtos da Fidelidade ou literacia financeira?"

- Ambiguidade: Se a pergunta for ambígua ou incompleta, pede gentilmente mais detalhe antes de responder.

- Encaminhamento: Quando não conseguires ajudar ou a pergunta exigir análise personalizada, sugere contactar um agente Fidelidade ou visitar os canais oficiais."""


## Support functions

In [65]:
# Function to create the assistant with file search capability
def create_assistant():
    """
    Creates an Azure OpenAI assistant with the file search tool enabled.
    
    The assistant is initialized with a name ("Fidz") and the custom prompt defined in the variable `aRole`.
    It uses the "gpt-4o-2" model with low temperature and top_p values to keep answers consistent.
    After creation, the assistant's unique ID is saved to the file specified by `assistantFilename` for future use.

    Returns:
        assistant (openai.Assistant): The created assistant object.
    """
    assistant = client.beta.assistants.create(
        name="Fidz",
        instructions=aRole,
        model="gpt-4o-2",
        temperature=0.15,
        top_p=0.35,
        tools=[{"type": "file_search"}]
    )
    assistant_id = assistant.id
    
    # Save the ID to a file
    with open(assistantFilename, "w") as file:
        file.write(assistant_id)
    
    return assistant

In [66]:
# Function to create a new thread for a conversation
def create_thread():
    """
    Creates a new conversation thread using the OpenAI API.

    Clears the global list `displayedMessagesIDs` (used to track message IDs already shown to the user)
    and returns the created thread object.

    Returns:
        thread (openai.Thread): The newly created conversation thread.
    """
    thread = client.beta.threads.create()
    # Clear in place: a plain assignment here would only create a local variable
    displayedMessagesIDs.clear()
    return thread

In [67]:
# Function to check if the assistant exists
def check_assistant_exists(assistant_id):
    """
    Checks whether an assistant with the given ID exists in the OpenAI system.

    Args:
        assistant_id (str): The ID of the assistant to check.

    Returns:
        (bool, object): 
            - True and the assistant object if found.
            - False and None if not found or an error occurs.
    """
    try:
        response = client.beta.assistants.retrieve(assistant_id)
        return bool(response), response
    except Exception as e:
        print(f"An error occurred while checking the assistant {assistant_id}: {e}")
        return False, None


In [68]:
# Function to add a message to the thread
def add_message_to_thread(thread_id, user_message):
    """
    Adds a user's message to the specified conversation thread.

    Args:
        thread_id (str): The ID of the thread to which the message will be added.
        user_message (str): The content of the message from the user.

    Returns:
        message (openai.ThreadMessage): The created message object.
    """
    message = client.beta.threads.messages.create(
        thread_id=thread_id,
        role="user",
        content=user_message
    )
    return message

In [69]:
  
# Define chunking logic  
def chunk_text(text, chunk_size=400, chunk_overlap=80):  
    """
    Splits the input text into overlapping chunks.

    Parameters:
        text (str): The input text to be divided.
        chunk_size (int, optional): The maximum number of characters in each chunk. Default is 400.
        chunk_overlap (int, optional): The number of overlapping characters between consecutive chunks. Default is 80.

    Returns:
        list of str: A list of text chunks, each of size up to `chunk_size`, with `chunk_overlap` characters 
        shared between neighboring chunks to preserve context.

    Example:
        For text of length 1000, chunk_size=400, and chunk_overlap=80, the output will contain:
        - Chunk 1: characters 0–399
        - Chunk 2: characters 320–719
        - Chunk 3: characters 640–999 (trimmed at the end of the text)
        - Chunk 4: characters 960–999

    Useful for:
        - Preparing long texts for NLP tasks like embedding, summarization, or search.
    """
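    # Guard against an infinite loop: the start index must advance on every iteration
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    chunks = []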
    start = 0  
    while start < len(text):  
        end = min(start + chunk_size, len(text))  
        chunk = text[start:end]  
        chunks.append(chunk)  
        start += chunk_size - chunk_overlap  
    return chunks  
  
# Function to extract text from PDF or TXT files  
def extract_text(file_path):  
    """
    Extracts text content from a PDF or TXT file.

    Parameters:
        file_path (str): The path to the input file. Supported formats are .pdf and .txt.

    Returns:
        str: The extracted text content from the file. If the file format is unsupported, returns an empty string.

    Logic:
        - If the file ends with '.pdf':
            Uses PyPDF2's PdfReader to extract and join text from all pages.
        - If the file ends with '.txt':
            Opens and reads the entire file content as plain text.
        - For unsupported file types:
            Returns an empty string.
    """
    if file_path.endswith(".pdf"):  
        reader = PdfReader(file_path)  
        text = "\n".join(page.extract_text() or "" for page in reader.pages)  
        return text  
    elif file_path.endswith(".txt"):  
        with open(file_path, "r", encoding="utf-8") as f:  
            return f.read()  
    else:  
        return ""  # Add other file types if needed  
  
# Function to scrape website content  
def scrape_website(url):  
    """
    Extracts visible text from a webpage, removing scripts and styles.

    Parameters:
        url (str): The website URL to scrape.

    Returns:
        str: Cleaned text content from the page, or an empty string on error.
    """
    try:  
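        response = requests.get(url, timeout=30)
        # Treat HTTP errors (e.g. 403 from a blocked site) as failures instead of chunking the error page
        response.raise_for_status()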
        soup = BeautifulSoup(response.content, 'html.parser')  
        for tag in soup(['script', 'style']):  
            tag.decompose()  
        text = soup.get_text(separator=' ', strip=True)  
        return text  
    except Exception as e:  
        print(f"Error scraping {url}: {e}")  
        return ""  
  
# Function to process and chunk files
def process_files(file_paths, chunk_store):
    """
    Extracts and chunks text from files, storing results in chunk_store.

    Parameters:
        file_paths (list): List of file paths to process (.pdf or .txt).
        chunk_store (list): List to append chunk dictionaries to.

    Each chunk includes:
        - 'text': Chunked text content
        - 'source_file': Originating file path
        - 'chunk_id': Index of the chunk
    """
    for file_path in file_paths:
        text = extract_text(file_path)
        if text.strip():
            chunks = chunk_text(text, chunk_size=400, chunk_overlap=80)
            for i, chunk in enumerate(chunks):
                chunk_store.append({
                    "text": chunk,
                    "source_file": file_path,
                    "chunk_id": i
                })

# Function to scrape websites and add to chunk store
def process_websites(url_list, chunk_store):
    """
    Scrapes and chunks text from websites, storing results in chunk_store.

    Parameters:
        url_list (list): List of URLs to scrape.
        chunk_store (list): List to append chunk dictionaries to.

    Each chunk includes:
        - 'text': Chunked text content
        - 'source_file': Originating URL
        - 'chunk_id': Index of the chunk
    """
    for url in url_list:  
        text = scrape_website(url)  
        if text.strip():  
            chunks = chunk_text(text, chunk_size=400, chunk_overlap=80)  
            for i, chunk in enumerate(chunks):  
                chunk_store.append({  
                    "text": chunk,  
                    "source_file": url,  
                    "chunk_id": i  
                })  
  
# Final export to Azure-compatible JSON format  
def save_chunk_store(chunk_store, output_file="azure_vector_chunks.json"): 
    """
    Saves chunked data to a JSON file in Azure-compatible format.

    Parameters:
        chunk_store (list): List of chunk dictionaries to export.
        output_file (str): Output filename (default: 'azure_vector_chunks.json').

    Each output entry contains:
        - 'text': The chunked text
        - 'metadata': Includes 'source_file' and 'chunk_id'
    """ 
    azure_chunks = []  
    for c in chunk_store:  
        azure_chunks.append({  
            "text": c["text"],  
            "metadata": {  
                "source_file": c["source_file"],  
                "chunk_id": c["chunk_id"]  
            }  
        })  
    with open(output_file, "w", encoding="utf-8") as f:  
        json.dump(azure_chunks, f, ensure_ascii=False, indent=2)  
    print(f"Saved {len(azure_chunks)} chunks to {output_file}")  
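
The chunk boundaries produced by the stepping rule in `chunk_text` can be worked out without any text at all. The sketch below mirrors that rule on lengths only; `chunk_spans` is a hypothetical helper introduced for illustration, and its end offsets are exclusive.

```python
def chunk_spans(text_length, chunk_size=400, chunk_overlap=80):
    """Return (start, end) offsets using the same stepping rule as chunk_text."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    spans = []
    start = 0
    while start < text_length:
        end = min(start + chunk_size, text_length)
        spans.append((start, end))
        # Each new chunk starts chunk_size - chunk_overlap characters after the previous one
        start += chunk_size - chunk_overlap
    return spans


print(chunk_spans(1000))
# [(0, 400), (320, 720), (640, 1000), (960, 1000)]
```

For a 1000-character text the last chunk (960 to 1000) lies entirely inside the previous one, so the tail of a document is stored twice. That is harmless for retrieval but worth knowing when counting chunks.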
  




In [70]:
def load_and_upload_files(url_list=url_list):
    """
    Loads PDF/TXT files and URLs, extracts and chunks content, saves it in Azure-compatible format,
    and uploads it to a vector store.

    Parameters:
        url_list (list): List of URLs to scrape and include in the chunk store.

    Returns:
        vector_store (object): The created and populated vector store, or None on failure.
    """  
    folder_path = "."  # Current working directory  
    print("Scanning current folder for PDF and TXT files...")  
    file_paths = [  
        os.path.join(folder_path, file)  
        for file in os.listdir(folder_path)  
        if file.lower().endswith(('.pdf', '.txt'))  # Add .txt files for extraction  
    ]  
    print("Files found:", file_paths)  
  
    if not file_paths and not url_list:  
        print("No files or URLs provided.")  
        return None  
  
    # Chunk store to hold extracted text chunks  
    chunk_store = []  
  
    # Process files and add to chunk store  
    if file_paths:  
        print("Processing files for chunking...")  
        process_files(file_paths, chunk_store)  
  
    # Process URLs for web scraping and add to chunk store  
    if url_list:  
        print("Processing websites for chunking...")  
        print("URLs to process:", url_list)
        process_websites(url_list, chunk_store)  
  
    # Save chunk store to Azure-compatible JSON format  
    save_chunk_store(chunk_store, output_file="azure_vector_chunks.json")  
  
    # Upload chunks to vector store  
    print("Uploading chunks to vector store...")  
    try:  
        # Create a new vector store (API client must be set up already)  
        vector_store = client.vector_stores.create(name="Documents and Websites")  
        print("Vector store created:", vector_store)  
  
        # Upload the exported JSON file and wait until it has been processed
        with open("azure_vector_chunks.json", "rb") as chunk_file:
            file_batch = client.vector_stores.file_batches.upload_and_poll(
                vector_store_id=vector_store.id,
                files=[chunk_file]
            )
        print("Upload batch result:", file_batch)  
  
        # Poll file statuses until done  
        while True:  
            files = list(client.vector_stores.files.list(vector_store_id=vector_store.id))  
            statuses = [f.status for f in files]  
            print("File statuses:", statuses)  
            if all(s in ("completed", "failed") for s in statuses) and len(files) == 1:  
                break  
            time.sleep(2)  
  
        for f in files:  
            print(f"File: {f.id}, Status: {f.status}")  
  
    except Exception as e:  
        print(f"Unexpected error: {str(e)}")  
        return None  
  
    # Save vector store object for later use  
    try:
        # Use the `agent_data` constant so the main cell reads back the same file
        with open(agent_data, "wb") as file:
            pickle.dump(vector_store, file)
        print(f"Vector store saved to {agent_data}")
    except Exception as e:  
        print(f"Error saving vector store: {str(e)}")  
  
    print("Vector store created and populated:", vector_store)  
    return vector_store  

In [71]:
# Function to display messages not displayed yet

def display_messages(thread, message):
    """
    Displays new assistant messages from a conversation thread that have not been shown yet.

    This function fetches messages after the given message ID in ascending order, checks if 
    they are from the assistant and not yet displayed, then processes and shows them accordingly:
      - For text messages, it extracts content, handles file or URL annotations by downloading 
        files or scraping URLs, and displays text in Markdown.
      - For image messages, it downloads and opens the image for viewing.
      
    Messages already displayed are tracked using the global `displayedMessagesIDs` list to avoid duplicates.

    Args:
        thread (openai.Thread): The conversation thread object.
        message (openai.ThreadMessage): The last displayed message object in the thread.

    Returns:
        None
    """
    chunk_store = []  # Collects chunks from files or URLs referenced in annotations
    messages = client.beta.threads.messages.list(thread_id=thread.id, order='asc', after=message.id)
    for new_message in messages:
        # Skip messages already shown, messages not from the assistant, and messages with no content yet
        if new_message.id in displayedMessagesIDs or new_message.role != 'assistant' or not new_message.content:
            continue

        # Load JSON message into a Python object
        answer = json.loads(new_message.model_dump_json(indent=2))
        try:
            messageType = new_message.content[0].type
            # Add message id to the list as displayed
            displayedMessagesIDs.append(new_message.id)

            # Text message
            if messageType == 'text':
                text = answer['content'][0]['text']
                value = text['value']

                # Check for file or URL annotations
                for annotation in text.get('annotations') or []:
                    if annotation['type'] == 'file_path':
                        # Remove the link marker from the displayed text
                        value = value.replace(annotation['text'], '')
                        # Download the file and chunk it
                        file_id = annotation['file_path'].get('file_id')
                        fileName = os.path.basename(annotation['text'])
                        client.files.content(file_id).write_to_file(fileName)
                        process_files([fileName], chunk_store)

                    elif annotation['type'] == 'url':
                        # Process the website and chunk it
                        process_websites([annotation['text']], chunk_store)

                # Display as Markdown
                display(Markdown('Assistant: ' + value))

            # Image
            elif messageType == 'image_file':
                # Get the ID of the image
                fileID = answer['content'][0]['image_file']['file_id']

                # Download the image (the .png extension is an assumption)
                fileName = f"{fileID}.png"
                client.files.content(fileID).write_to_file(fileName)

                # Display the image in the default image viewer
                image = Image.open(fileName)
                image.show()

        except Exception as e:
            # Report the problem instead of silently dropping the message
            print(f"Could not display message {new_message.id}: {e}")


In [72]:
# Function to send a message to the assistant and get a response
def send_message_to_assistant(thread, user_input):
    """
    Sends a user message to the assistant within a conversation thread, triggers the assistant's response,
    and continuously monitors the assistant's run status until completion. During the process, it displays
    any new messages from the assistant as they become available.

    Args:
        thread (openai.Thread): The conversation thread object where the message is sent.
        user_input (str): The user's message content to send to the assistant.

    Returns:
        None
    """
    # Present feedback to user
    print("You:", user_input)
    print("Thinking...")
    
    # Add message to thread
    message = add_message_to_thread(thread.id, user_input)

    # Run the thread with the assistant
    run = client.beta.threads.runs.create(
        thread_id=thread.id,
        assistant_id=assistant.id
    )

    # Loop until the run completes or fails
    while run.status in ['queued', 'in_progress', 'cancelling']:
        time.sleep(1)
        run = client.beta.threads.runs.retrieve(
            thread_id=thread.id,
            run_id=run.id
        )
        # Display any messages that become available while the run is in progress
        try:
            display_messages(thread, message)
        except Exception:
            continue

    # Handle the response from the assistant
    if run.status == 'completed':
        # Display the messages
        display_messages(thread, message)
        
    elif run.status == 'requires_action':
        print("The assistant requires additional actions.")
    else:
        print(f"Run status: {run.status}")
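
The polling loop in `send_message_to_assistant` waits for as long as the run stays active. A minimal sketch of the same pattern with an upper bound on the waiting time follows. It assumes a hypothetical `fetch_status` callable standing in for `client.beta.threads.runs.retrieve(...).status`; the helper name and the default timeout are illustrative and not part of the notebook.

```python
import time


def wait_for_terminal_status(fetch_status,
                             active_statuses=("queued", "in_progress", "cancelling"),
                             interval=1.0,
                             timeout=120.0):
    """Poll fetch_status() until it leaves the active statuses or the timeout expires."""
    deadline = time.monotonic() + timeout
    status = fetch_status()
    while status in active_statuses:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Run still '{status}' after {timeout} seconds")
        time.sleep(interval)
        status = fetch_status()
    return status


# Simulated status sequence, so the example runs without calling any API
simulated = iter(["queued", "in_progress", "completed"])
print(wait_for_terminal_status(lambda: next(simulated), interval=0))
```

Raising `TimeoutError` lets the caller decide whether to cancel the run or tell the user to retry, rather than leaving the notebook cell busy indefinitely.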

## Main

In this section the functions created above are used to create the chatbot and start the thread where the user can ask questions.

In [73]:
# Check for an existing assistant
if os.path.exists(assistantFilename):
    with open(assistantFilename, "r") as file:
        assistant_id = file.read().strip()
    
    # Check on Azure OpenAI
    exists, assistant = check_assistant_exists(assistant_id)
    if exists:
        # Get the vector_store
        with open(agent_data, "rb") as file:  # Use binary mode to read
            vector_store = pickle.load(file)
        
        assistant = client.beta.assistants.update(
            assistant_id=assistant_id,
            tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}}
        )
        
        # Provide feedback
        print("Using assistant ", assistant_id)
    else:
        assistant_id = None

Using assistant  asst_oVC7I85pF4er48LTB1eghU1i


In [74]:
# Create an assistant if one does not exist - Load the data
if assistant_id is None:
    # Create assistant
    print("Creating new assistant")
    assistant = create_assistant()
    assistant_id = assistant.id
    
    # Load files from the local folder and upload to vector store
    vector_store = load_and_upload_files()
    if vector_store is None:
        raise RuntimeError("Vector store creation failed; see the messages above.")
    print("Files uploaded.")
    
    # Update the assistant with the new files
    assistant = client.beta.assistants.update(
        assistant_id=assistant.id,
        tool_resources = {"file_search": {"vector_store_ids": [vector_store.id]}}
    )
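
The two cells above reuse a saved assistant ID when one exists and create a new assistant otherwise. The persistence half of that pattern can be sketched in isolation. `load_saved_id` and `save_id` are hypothetical helpers, not functions from this notebook, and the ID value is made up.

```python
import os
import tempfile
from pathlib import Path


def load_saved_id(path):
    """Return the stored ID, or None if the file is missing or empty."""
    file = Path(path)
    if not file.exists():
        return None
    # strip() removes stray whitespace or newlines that would make the ID invalid
    saved = file.read_text(encoding="utf-8").strip()
    return saved or None


def save_id(path, value):
    """Write the ID to disk so that a later session can reuse it."""
    Path(path).write_text(value, encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    id_file = os.path.join(tmp, "AssistantID.TXT")
    print(load_saved_id(id_file))        # no file yet
    save_id(id_file, "asst_example\n")   # made-up ID with a trailing newline
    print(load_saved_id(id_file))
```

Returning `None` for both a missing and an empty file means the caller needs only one check before deciding to create a new assistant.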

In [75]:
# New thread
thread = create_thread()

## Main Loop

Welcome! Type your message to chat with the assistant. <br>
Type "RESET" to start a new chat or "QUIT" to exit.

In [76]:
print("Hi, I'm Fidz! Can I help you with anything related to Fidelidade's savings products, PPRs, or financial literacy? Type 'QUIT' to exit or 'RESET' to start a new chat.")
while True:
    user_input = input("You: ")

    if user_input.strip().upper() == "QUIT":
        print("Exiting. Ciao!")
        break
    elif user_input.strip().upper() == "RESET":
        print("Starting a new chat...")
        thread = create_thread()
    else:
        send_message_to_assistant(thread, user_input)

Hi, I'm Fidz! Can I help you with anything related to Fidelidade's savings products, PPRs, or financial literacy? Type 'QUIT' to exit or 'RESET' to start a new chat.
Exiting. Ciao!
