## Author: Bourahima COULIBALY


### Subject:
#### Context

Modjo is a Conversational Intelligence platform that helps its clients leverage their customer interactions (audio calls and video calls) to coach their client-facing teams, share key insights with the rest of the company, and keep business data accurate and relevant (notably by filling CRM fields).

Because audio and video recordings are long to listen to and hard to skim, extracting the key insights and bringing them to the users' attention is crucial.

In this small exercise you will work on how to summarize information from a call. This summary would be displayed next to the call playback, sent to the CRM etc.

#### Exercise

Using Python, and keeping your code clean, organized and documented, please perform the following tasks. You can apply them to the provided transcript.

Deliverable includes:

- A zip file of the functional python project
- A document with explanations and the results of a run on the provided transcript. Choose whatever format suits you best: Jupyter Notebook, PDF from a text editor, slides…


## Libraries

In [23]:
from langchain.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import TextLoader
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from openai import OpenAI

import pandas as pd
import requests
import json
from typing import List, Dict, Any

import os
from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv())
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')

## Data Extractor

In this section, we pre-process the data so that only the relevant information is used to generate summaries and answer users' questions.

We therefore extract only the essentials: each speaker's label and what they said. Timestamps and other fields are not needed, because the segments are kept in their original order and consecutive segments from the same speaker are merged into a single turn.
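
As a minimal sketch of this pre-processing step, the snippet below merges adjacent segments from the same speaker using `itertools.groupby`. It assumes each segment is a dict with `speaker` and `content` keys; the sample segments are made up for illustration, and `merge_consecutive_turns` is a hypothetical helper, not part of the class defined in the next cell.

```python
from itertools import groupby
from typing import Dict, List


def merge_consecutive_turns(segments: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Collapse runs of adjacent segments from the same speaker into one turn."""
    turns = []
    for speaker, run in groupby(segments, key=lambda seg: seg["speaker"]):
        # Join every segment of the run, so runs of three or more stay complete
        content = " ".join(seg["content"] for seg in run)
        turns.append({"speaker": speaker, "content": content})
    return turns


# Made-up segments, shaped like the entries under "segments" in the transcript JSON
sample = [
    {"speaker": "spk0", "content": "Bonjour."},
    {"speaker": "spk0", "content": "Comment vas-tu ?"},
    {"speaker": "spk0", "content": "Tu m'entends ?"},
    {"speaker": "spk2", "content": "Très bien."},
]
turns = merge_consecutive_turns(sample)
print("\n".join(f"{t['speaker']}: {t['content']}" for t in turns))
```

Grouping whole runs at once, rather than comparing each row with its successor, avoids losing text when a speaker has three or more segments in a row.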

In [24]:
class DataExtractor:
    """
    This class is designed to extract, transform, and format conversation data from a specified API.

    Attributes:
        API_URL (str): The URL of the API to fetch data from. 
        session (requests.Session): A session object for making HTTP requests.
        res (List[Dict[str, Any]]): Raw data extracted from the API.
        data (pd.DataFrame): Processed data stored as a pandas DataFrame.

    Methods:
        extract_data() -> None:
            Fetches data from the API and stores it in self.res.

        transform_data() -> pd.DataFrame:
            Processes the raw data into a structured DataFrame.

        data_frame_to_text() -> str:
            Converts the DataFrame to a formatted text string.

        dataframe_to_json() -> str:
            Converts the DataFrame to a JSON string.

        get_conversation_data(format: str = 'dataframe') -> Any:
            Returns the conversation data in the specified format.
    """
    
    def __init__(self, api_url: str = None):
        """
        Initializes the DataExtractor with an optional API URL.

        Args:
            api_url (str, optional): The URL of the API to fetch data from.
                                    If not provided, a default URL is used.
        """
        self.API_URL = api_url or "https://file.notion.so/f/f/a81bac85-8169-4578-8cb9-a0c53a5432d9/368a9500-48b4-4c7d-a82d-dd00f0b4e61f/segments_(4).json?id=36bd1e43-fbfa-4f7f-b3f1-42b2609f3100&table=block&spaceId=a81bac85-8169-4578-8cb9-a0c53a5432d9&expirationTimestamp=1719568800000&signature=t5MQ1qg7ezFEzpGfzTPXTflJvJHBGZ-XziORO8C9Ruk&downloadName=demo-segments.json"
        self.session = requests.Session()
        self.res: List[Dict[str, Any]] = []
        self.data: pd.DataFrame = pd.DataFrame()

    def extract_data(self) -> None:
        """
        Extracts data from the API and stores it in self.res.

        Request errors are caught and reported; self.res is then left empty.
        """
        try:
            # A timeout keeps the call from hanging if the server does not respond
            response = self.session.get(self.API_URL, timeout=30)
            response.raise_for_status()
            res = response.json()
            self.res = res.get("segments", [])
        except requests.RequestException as e:
            print(f"Error while extracting the data: {e}")
            self.res = []

    def transform_data(self) -> pd.DataFrame:
        """
        Transforms the extracted data into a structured DataFrame.

        This method keeps only the speaker and content of each segment and
        merges consecutive segments from the same speaker into a single turn.

        Returns:
            pd.DataFrame: The processed data as a pandas DataFrame.
        """
        if not self.res:
            self.extract_data()

        merged: List[Dict[str, Any]] = []
        for item in self.res:
            speaker, content = item.get("speaker"), item.get("content") or ""
            if merged and merged[-1]["speaker"] == speaker:
                # Same speaker as the previous turn: append to it, so that runs
                # of three or more segments are merged without losing content
                merged[-1]["content"] += " " + content
            else:
                merged.append({"speaker": speaker, "content": content})

        self.data = pd.DataFrame(merged, columns=["speaker", "content"])
        return self.data

    def data_frame_to_text(self) -> str:
        """
        Converts the DataFrame to a formatted text string.

        Returns:
            str: A string representation of the conversation, with each line
                formatted as "speaker: content".
        """
        if self.data.empty:
            self.transform_data()
        
        return "\n".join(f"{row.speaker}: {row.content}" for _, row in self.data.iterrows())

    def dataframe_to_json(self) -> str:
        """
        Converts the DataFrame to a JSON string.

        Returns:
            str: A JSON string representation of the DataFrame.
        """
        if self.data.empty:
            self.transform_data()
        
        return self.data.to_json(orient='records')

    def get_conversation_data(self, format: str = 'dataframe') -> Any:
        """
        Returns the conversation data in the specified format.

        Args:
            format (str): The desired output format. 
                        Options are 'dataframe', 'text', or 'json'.

        Returns:
            Any: The conversation data in the requested format.

        Raises:
            ValueError: If an unsupported format is specified.
        """
        if format == 'dataframe':
            return self.transform_data()
        elif format == 'text':
            return self.data_frame_to_text()
        elif format == 'json':
            return self.dataframe_to_json()
        else:
            raise ValueError("Unsupported format. Use 'dataframe', 'text', or 'json'.")

dataextractor = DataExtractor()

In [25]:
data = dataextractor.get_conversation_data()
data.head()

|   | speaker | content |
|---|---------|---------|
| 0 | spk2 | Oui bonjour Celeste. |
| 1 | spk0 | Bonjour Merlin. Comment vas-tu ? |
| 2 | spk2 | Très bien et toi ? |
| 3 | spk0 | Super. Écoute, ça va super. Je suis ravie de c... |
| 4 | spk2 | Non, là je t'avoue que avec le gros brume que ... |


In [26]:
data_json = dataextractor.get_conversation_data(format='json')

In [27]:
texte = dataextractor.get_conversation_data(format='text')

os.makedirs("data", exist_ok=True)  # create the output directory if it is missing
with open("data/conversation.txt", "w", encoding="utf-8") as file:
    file.write(texte)

print(texte)

spk2: Oui bonjour Celeste.
spk0: Bonjour Merlin. Comment vas-tu ?
spk2: Très bien et toi ?
spk0: Super. Écoute, ça va super. Je suis ravie de commencer mon après-midi avec toi. Je vois que la caméra n'a pas été réparée. Je sais pas si t'es tout seul ou t'es avec...
spk2: Non, là je t'avoue que avec le gros brume que j'ai...
spk0: Ah oui ! Ok, pas de problème. Non, mais je ne savais pas si du coup, Arthur et l'autre Merlin se connectaient derrière toi ou s'il y avait d'autres personnes qui arrivaient.
spk2: Non, alors Merlin, je ne sais pas, mais Arthur, ben voilà, il est là, parfait.
spk0: Ben voilà. Bonjour Arthur, enchanté. Salut Arthur.
spk1: Je suis là, bonjour Celeste.
spk2: Ah c'est Merlin, parfait. Donc, je suis juste contente de reposer le contexte par rapport à notre échange de la dernière fois et puis l'objectif de cette petite call tous ensemble.
spk0: Oui, effectivement, l'idée c'était, si ça vous va, l'agenda du coup juste que je vous propose effectivement, c'est peut-être

# 1. Summarization

**Using GPT-3.5-turbo**, implement in Python summarizers that take as input a JSON like the one you were provided and that produce the following:

1.1  a **60-word max. summary**. 

1.2  a **free-format summary**. It will be used by the callers’ peers to know at a glance what happened during the call. It can be a manager who comes to coach the caller, to get information on the deal, another rep who will be involved on the deal too and needs to know what happened during the call etc. Please justify the choices you make regarding the format you choose.


To address this need, we use prompt engineering to find the configuration that gives the best summary of the conversation. We also aim to minimize hallucinations (information added by the model that is not present in the original data) and to improve the overall quality of the summary.
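
A prompt can ask for 60 words, but the model is not guaranteed to comply, and it may echo the backtick delimiters used in the prompt. The sketch below shows one way to post-check a summary. `clean_summary`, `count_words` and `check_word_limit` are hypothetical helpers introduced here for illustration, and counting whitespace-separated tokens is only a rough proxy for a word count.

```python
def clean_summary(summary: str) -> str:
    """Strip surrounding whitespace and stray backtick delimiters echoed by the model."""
    return summary.strip().strip("`").strip()


def count_words(text: str) -> int:
    """Count whitespace-separated tokens as an approximation of the word count."""
    return len(text.split())


def check_word_limit(summary: str, max_words: int = 60) -> bool:
    """Return True when the cleaned summary respects the word limit."""
    return count_words(clean_summary(summary)) <= max_words


# Made-up model output, wrapped in backticks the way a model might echo them
raw = "```The team reviewed pricing and agreed on a follow-up meeting.```"
print(clean_summary(raw))
print(count_words(clean_summary(raw)), check_word_limit(raw))
```

If the check fails, one option is to call the summarizer again with a stricter instruction rather than truncating the text, since truncation can cut a sentence in half.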

In [28]:
class OpenAIClient:
    """
    A client for interacting with the OpenAI API.

    Attributes:
        system_message (dict): The system message to be used in the chat prompt.
        client (OpenAI): The OpenAI client instance.

    Methods:
        __init__(api_key): Initializes the OpenAI client with the given API key.
        get_completion(prompt, model): Sends a prompt to the OpenAI API and returns the completion.
        get_system_message(system_message): Sets the system message for the client.
    """
    
    def __init__(self, api_key=OPENAI_API_KEY):
        """
        Initializes the OpenAI client.

        Parameters:
            api_key (str): The API key for the OpenAI API.
        """
        self.system_message = None
        if api_key is None:
            raise ValueError("OpenAI API key is missing. Please set it in the .env file.")
        else:
            self.client = OpenAI(api_key=api_key)
        

    def get_completion(self, prompt, model="gpt-3.5-turbo"):
        """
        Sends a prompt to the OpenAI API and returns the completion.

        Parameters:
            prompt (str): The prompt to be sent to the OpenAI API.
            model (str): The model to be used for generating the completion. Default is "gpt-3.5-turbo".

        Returns:
            str: The content of the completion response.
        """
        try:
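            messages = [{"role": "user", "content": prompt}]
            if self.system_message is not None:
                # Prepend the system message only when one has been set
                messages.insert(0, self.system_message)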
            response = self.client.chat.completions.create(
                model=model,
                messages=messages,
                temperature=0,
            )
            return response.choices[0].message.content
        except Exception as e:
            print(f"An error occurred when calling the OpenAI API: {e}")
            return None
    
    def get_system_message(self, system_message):
        """
        Sets the system message for the client.

        Parameters:
            system_message (dict): The system message to be used in the chat prompt.
        """
        self.system_message = system_message


class Summarizer:
    """
    A summarizer for generating conversation summaries using the OpenAI API.

    Attributes:
        openai_client (OpenAIClient): An instance of the OpenAIClient.

    Methods:
        __init__(openai_client): Initializes the summarizer with the given OpenAI client.
        summarize_60_words(texte): Generates a 60-word summary of the given conversation.
        summarize_free_format(texte): Generates a comprehensive summary of the given conversation.
        summarize_structured(texte): Generates a structured summary of the given conversation.
    """

    def __init__(self, openai_client):
        """
        Initializes the summarizer with the given OpenAI client.

        Parameters:
            openai_client (OpenAIClient): An instance of the OpenAIClient.
        """
        self.openai_client = openai_client
        self.openai_client.get_system_message({
            "role": "system",
            "content": (
                "You are a helpful assistant and an expert in conversation summarization."
            )
        })

    def summarize_60_words(self, texte):
        """
        Generates a 60-word summary of the given conversation.

        Parameters:
            texte (str): The conversation text to be summarized.

        Returns:
            str: The 60-word summary of the conversation.
        """
        prompt = f"""
        You are an expert in conversation summarization. Your task is to create a concise and informative summary of the following conversation, delimited by triple backticks.

        Guidelines:
        1. Identify the key points, Persons and main themes of the conversation.
        2. Capture the essence of the exchange, including important opinions or decisions made.
        3. Use clear and precise language.
        4. Avoid superfluous details and focus on the essential.
        5. Strictly adhere to the 60-word limit.
        6. Ensure the summary is coherent and self-contained.

        Conversation: ```{texte}```

        Summary (max 60 words):
        """
        return self.openai_client.get_completion(prompt)

    def summarize_free_format(self, texte):
        """
        Generates a comprehensive summary of the given conversation.

        Parameters:
            texte (str): The conversation text to be summarized.

        Returns:
            str: The comprehensive summary of the conversation.
        """
        prompt = f"""
        You are a highly skilled sales conversation analyst. Your task is to create a comprehensive and actionable summary of the following conversation, delimited by triple backticks.

        Objective: 
        Generate a detailed summary that will be invaluable for sales managers, coaches, and team members involved in the deal.

        Guidelines:
        1. Identify and highlight the key points, Persons, outcomes, and any critical decisions made during the call.
        2. Outline the main topics discussed and the flow of the conversation.
        3. Capture important customer information, including needs, pain points, and objections.
        4. Note any commitments made by either party or next steps agreed upon.
        5. Highlight potential areas for improvement or coaching opportunities for the sales representative.
        6. Include relevant context about the deal's status, size, or importance if mentioned.
        7. Use clear, professional language and organize the summary in a logical structure.
        8. Provide insights that could be useful for deal strategy or future interactions with the customer.

        Your summary should be thorough enough to give a clear understanding of the call to someone who wasn't present, yet concise enough to be quickly digestible by busy professionals.

        Conversation: ```{texte}```

        Detailed Summary:
        """
        return self.openai_client.get_completion(prompt)

    def summarize_structured(self, texte):
        """
        Generates a structured summary of the given conversation.

        Parameters:
            texte (str): The conversation text to be summarized.

        Returns:
            str: The structured summary of the conversation.
        """
        prompt = f"""
        As an expert in business communication analysis, create a comprehensive and structured summary of the following conversation. Your summary should be valuable for team members who did not participate in the call, providing them with clear insights and actionable information.

        Structure your summary as follows:

        1. Purpose of the Call:
        - Clearly state the main objective(s) of the conversation.
        - Include any context or background information that sets the stage for the call.

        2. Key Points Discussed:
        - Provide a bulleted list of the main topics covered.
        - For each point, include brief but essential details.
        - Highlight any significant insights, challenges, or opportunities mentioned.

        3. Results and Next Steps:
        - Summarize the outcomes of the conversation.
        - List any decisions made or conclusions reached.
        - Outline the agreed-upon next steps or future plans.

        4. Action Points:
        - Create a clear, bulleted list of specific tasks or actions to be taken.
        - For each action item, specify who is responsible (if mentioned) and any deadlines.
        - Include any follow-up meetings or communications planned.

        5. Additional Insights (optional):
        - Note any underlying issues, potential risks, or opportunities not explicitly discussed but implied.
        - Provide any strategic recommendations based on the conversation content.

        Guidelines:
        - Use clear, concise language.
        - Focus on factual information and avoid personal interpretations unless specifically relevant.
        - Ensure the summary is self-contained and understandable without additional context.
        - Aim for a balance between comprehensiveness and brevity.

        Conversation: ```{texte}```

        Detailed Summary:
        """
        return self.openai_client.get_completion(prompt)


    
openai_client = OpenAIClient()
summarizer = Summarizer(openai_client)

### Generating the summaries

In [29]:
dicto_summary = {
        "60-word summary": summarizer.summarize_60_words(texte),
        "Free-format summary": summarizer.summarize_free_format(texte),
        "Structured summary": summarizer.summarize_structured(texte)
    }

In [30]:
dicto_summary["60-word summary"]

"```spk0, spk1, and spk2 discuss Modjo's features, sales performance, and onboarding. They address call tagging, objection handling, and AI usage. Pricing and user feedback are key points. They plan a follow-up meeting for further discussion. The conversation emphasizes Modjo's potential for improving sales efficiency and collaboration.```"

In [31]:
dicto_summary["Free-format summary"]

"The conversation involved three participants: Celeste, Merlin, and Arthur. They discussed various topics related to improving sales performance, particularly through the use of Modjo, an AI tool. Celeste highlighted the importance of understanding the level of knowledge and objectives for the call. They touched on topics such as improving commercial performance through AI, workshops for handling objections, and tracking sales performance metrics like conversion rates and revenue targets.\n\nMerlin and Arthur shared insights on their operational processes, including conducting workshops, tracking objections, and setting sales targets. They discussed the challenges of note-taking during calls and the need for better tools to enhance productivity. The conversation also delved into the pricing model for Modjo and the potential for future product enhancements without additional costs.\n\nThe participants agreed to schedule a follow-up meeting for a detailed presentation of Modjo's features

In [32]:
dicto_summary["Structured summary"]

"1. Purpose of the Call:\n- The main objective of the conversation was to discuss the performance improvement in commercial activities, particularly through the use of AI to enhance sales scripts and pitch demos.\n- The call aimed to provide context from previous discussions and establish the purpose of the group call.\n\n2. Key Points Discussed:\n- Discussion on improving commercial performance through AI, focusing on script similarity and pitch demo testing.\n- Operational responsibilities shared between team members, with a focus on workshops and objection handling.\n- Exploration of interactions with potential clients like Furnitures and SquareChair.\n- Review of current sales performance metrics, including conversion rates and revenue targets.\n- Challenges in tracking and analyzing sales contributions and objections.\n- Introduction to Modjo's capabilities in call analysis, note-taking, and CRM integration.\n- Business model discussions, including pricing, user licenses, and pote

# 2. Question Answering

**Using GPT-3.5-turbo,** implement in Python a question-answering method that takes as input a JSON like the one you were provided and that can answer the following questions:

1. What are the next steps?
2. Was budget mentioned, if so what was the amount?
3. Who is Lancelot? 
4. Which CRM does the prospect use? 
5. What are the main objections raised ?
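
Since the same five questions are asked of every approach, it can help to run them in one loop. The sketch below assumes an answering function that takes `(question, conversation)` and returns a string; `answer_all` and `echo_answer` are hypothetical names, and `echo_answer` is only a placeholder so that the example runs without calling a model.

```python
from typing import Callable, Dict, Sequence

QUESTIONS: Sequence[str] = (
    "What are the next steps?",
    "Was budget mentioned, if so what was the amount?",
    "Who is Lancelot?",
    "Which CRM does the prospect use?",
    "What are the main objections raised?",
)


def answer_all(
    answer_fn: Callable[[str, str], str],
    conversation: str,
    questions: Sequence[str] = QUESTIONS,
) -> Dict[str, str]:
    """Run every question through answer_fn(question, conversation)."""
    return {question: answer_fn(question, conversation) for question in questions}


def echo_answer(question: str, conversation: str) -> str:
    """Placeholder standing in for a real model call, for offline testing only."""
    return f"[no model called] {question}"


answers = answer_all(echo_answer, "spk0: Bonjour Merlin.")
for question, answer in answers.items():
    print(f"Q: {question}\nA: {answer}\n")
```

Any method with the same two-argument signature, such as the `rag` method defined in Version 1, could be passed in place of `echo_answer`.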

## Version 1

In [33]:
class MarketingResearchAssistant_v1:
    """
    A virtual assistant for answering questions based on a given conversation using the OpenAI API.

    Attributes:
        openai_client (OpenAI): The OpenAI client instance.
        model (str): The model used for generating responses.
        system_message (dict): The system message containing the guidelines for generating responses.

    Methods:
        __init__(model): Initializes the assistant with the specified model.
        rag(query, conversation): Generates a response to the query based on the given conversation.
        set_model(model): Sets the model for the assistant.
        get_model(): Returns the current model of the assistant.
    """
    
    def __init__(self, model="gpt-3.5-turbo"):
        """
        Initializes the marketing research assistant with the specified model.

        Parameters:
            model (str): The model used for generating responses. Default is "gpt-3.5-turbo".
        """
        self.openai_client = OpenAI(api_key=OPENAI_API_KEY)
        self.model = model
        self.system_message = {
            "role": "system",
            "content": (
                "You are an expert AI assistant tasked with answering questions using only the information provided in the given conversation. Follow these guidelines: "
                "1. Carefully analyze the question and the provided conversation. "
                "2. Answer solely based on the information in the conversation. Do not make assumptions or add outside information. "
                "3. If the answer is not in the conversation, clearly state 'I cannot answer this question based on the provided conversation.' and explain why. "
                "4. Cite relevant parts of the conversation to support your answer. "
                "5. If the conversation contains contradictory information, mention it and explain the different perspectives. "
                "6. Structure your response logically: "
                "a) Start with a direct sentence answering the question. "
                "b) Provide additional details or explanations if necessary. "
                "c) Conclude by summarizing the key points. "
                "7. Limit your response to three sentences maximum, unless more details are absolutely necessary for a complete and accurate answer. "
                "8. If you are uncertain about any element of your answer, clearly indicate your level of certainty."
            )
        }

    def rag(self, query, conversation):
        """
        Generates a response to the query based on the given conversation.

        Parameters:
            query (str): The question to be answered.
            conversation (str): The conversation text to be analyzed.

        Returns:
            str: The generated response from the OpenAI API.
        """
        messages = [
            self.system_message,
            {
                "role": "user",
                "content": f"Question: {query}\nConversation: {conversation}"
            }
        ]

        try:
            response = self.openai_client.chat.completions.create(
                model=self.model,
                messages=messages,
            )
            return response.choices[0].message.content
        except Exception as e:
            print(f"An error occurred while processing the query: {e}")
            return None

    def set_model(self, model):
        """
        Sets the model for the assistant.

        Parameters:
            model (str): The model to be set.
        """
        self.model = model

    def get_model(self):
        """
        Returns the current model of the assistant.

        Returns:
            str: The current model used by the assistant.
        """
        return self.model


assistant = MarketingResearchAssistant_v1()

### Answering questions

In [34]:
query = "What are the next steps? "
assistant.rag(query, texte)

"The next steps involve setting up a meeting on Monday at 4:30 pm for a more in-depth discussion regarding Modjo, addressing questions about onboarding and product features. The pricing structure remains stable with no extra costs for additional features mentioned during the conversation.\n\nIn summary, the next steps include a detailed discussion on Monday at 4:30 pm to further explore Modjo's functionalities and onboarding processes. The pricing remains unchanged, ensuring no additional costs for new features."

In [35]:
query = "Was budget mentioned, if so what was the amount? "
assistant.rag(query, texte)

'The budget amount was not explicitly mentioned in the conversation. The discussion mainly focused on demonstrating the value proposition and the pricing model of the tool. There were references to a monthly price of 99 euros for five users, totaling around 495 euros without additional features or discounts. Further details were not provided regarding a specific budget amount mentioned.'

In [36]:
query = "Who is Lancelot? "
assistant.rag(query, texte)

'Answer: Lancelot is not mentioned or discussed in the conversation provided. \n\nI cannot answer this question based on the provided conversation.'

In [37]:
query = "Which CRM does the prospect use?  "
assistant.rag(query, texte)

'Answer: The prospect uses Hubspot as their CRM tool.  \nIn the conversation, the salesperson mentions, "Je suis à eux depuis trois mois maintenant et du coup, qui de mieux qu\'un utilisateur Modjo pourra vous partager des insights," indicating that the prospect uses Hubspot as their CRM.  \n'

In [38]:
query = "What are the main objections raised ? "
assistant.rag(query, texte)

'The main objections raised during the conversation pertain to the pricing and value proposition of the tool Modjo, with considerations about the ROI, the budget constraints, and the necessity for discounts or added features. Additionally, discussions around the effectiveness and setup process of the tool, especially regarding onboarding and future product developments, were also prominent topics. The conversation highlighted the importance of user involvement, trackable results, and the need for consistent performance improvement.'

## Version 2 (LangChain doc retriever with ChromaDB)

In [39]:
class DocumentProcessor:
    """
    A class to process documents by loading, splitting, and creating a vector store.

    Attributes:
        api_key (str): The API key for OpenAI.
        persist_directory (str): The directory to persist the vector store.
        embedding (OpenAIEmbeddings): The embeddings instance from OpenAI.

    Methods:
        load_and_split_documents(file_path): Loads and splits documents from the given file path.
        create_vector_store(texts): Creates and persists a vector store from the given texts.
    """
    
    def __init__(self):
        """
        Initializes the DocumentProcessor with the OpenAI API key and embedding.
        """
        self.api_key = OPENAI_API_KEY
        self.persist_directory = 'data'
        self.embedding = OpenAIEmbeddings()

    def load_and_split_documents(self, file_path):
        """
        Loads and splits documents from the given file path.

        Parameters:
            file_path (str): The path to the document file.

        Returns:
            list: A list of split document chunks.
        """
        # The transcript was written as UTF-8, so read it back with the same encoding
        loader = TextLoader(file_path, encoding="utf-8")
        documents = loader.load()
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=200,
            separators=["\n"])
        return text_splitter.split_documents(documents)

    def create_vector_store(self, texts):
        """
        Creates and persists a vector store from the given texts.

        Parameters:
            texts (list): A list of text documents.

        Returns:
            Chroma: The created and persisted vector store.
        """
        vectordb = Chroma.from_documents(
            documents=texts,
            embedding=self.embedding,
            persist_directory=self.persist_directory
        )
        vectordb.persist()
        return Chroma(
            persist_directory=self.persist_directory,
            embedding_function=self.embedding
        )

class MarketingResearchAssistant_v2:
    """
    A virtual assistant for answering questions based on a given context using a retrieval chain.

    Attributes:
        llm (ChatOpenAI): The language model used for generating responses.

    Methods:
        create_retrieval_chain(vectordb): Creates a retrieval chain using the given vector store.
        process_query(chain, query): Processes the query using the given chain and returns the response.
    """

    def __init__(self):
        """
        Initializes the marketing research assistant with a ChatOpenAI instance.
        """
        self.llm = ChatOpenAI(temperature=0.0, api_key=OPENAI_API_KEY)

    def create_retrieval_chain(self, vectordb):
        """
        Creates a retrieval chain using the given vector store.

        Parameters:
            vectordb (Chroma): The vector store to be used for retrieval.

        Returns:
            RetrievalChain: The created retrieval chain.
        """
        retriever = vectordb.as_retriever()
        system_prompt = (
            "You are an expert AI assistant tasked with answering questions using only the information provided in the given context. Follow these guidelines: "
            "1. Carefully analyze the question and the provided context. "
            "2. Answer solely based on the information in the context. Do not make assumptions or add outside information. "
            "3. If the answer is not in the context, clearly state 'I cannot answer this question based on the provided context.' and explain why. "
            "4. Cite relevant parts of the context to support your answer. "
            "5. If the context contains contradictory information, mention it and explain the different perspectives. "
            "6. Structure your response logically: "
            "a) Start with a direct sentence answering the question. "
            "b) Provide additional details or explanations if necessary. "
            "c) Conclude by summarizing the key points. "
            "7. Limit your response to three sentences maximum, unless more details are absolutely necessary for a complete and accurate answer. "
            "8. If you are uncertain about any element of your answer, clearly indicate your level of certainty. "
            "Context: {context}"
        )
        prompt = ChatPromptTemplate.from_messages([
            ("system", system_prompt),
            ("human", "{input}"),
        ])
        question_answer_chain = create_stuff_documents_chain(self.llm, prompt)
        return create_retrieval_chain(retriever, question_answer_chain)

    def process_query(self, chain, query):
        """
        Processes the query using the given chain and returns the response.

        Parameters:
            chain (RetrievalChain): The retrieval chain to be used for processing the query.
            query (str): The query to be processed.

        Returns:
            str: The generated response from the retrieval chain.
        """
        llm_response = chain.invoke({"input": query})
        return llm_response['answer']


doc_processor = DocumentProcessor()
query_processor = MarketingResearchAssistant_v2()

# Load, split and index the documents
texts = doc_processor.load_and_split_documents("data/conversation.txt")
vectordb = doc_processor.create_vector_store(texts)

# Build the retrieval chain used to answer queries
chain = query_processor.create_retrieval_chain(vectordb)

### Answering questions

In [40]:
result = query_processor.process_query(chain, "Was budget mentioned, if so what was the amount?")
print(result)

I cannot answer this question based on the provided context.


In [41]:
result = query_processor.process_query(chain, "What are the next steps?")
print(result)

The next steps involve progressing towards a specific number by 2024 to achieve the set objective, despite a significant gap between the high and low points. The focus seems to be on reaching a certain target to fulfill the goal, indicating a need for strategic planning and actions to bridge the existing gap.


In [42]:
result = query_processor.process_query(chain, "Who is Lancelot?")
print(result)

I cannot answer this question based on the provided context.


In [43]:
result = query_processor.process_query(chain, "Which CRM does the prospect use?")
print(result)

I cannot answer this question based on the provided context.


In [44]:
result = query_processor.process_query(chain, "What are the main objections raised?")
print(result)

I cannot answer this question based on the provided context.


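Concise comparison of the MarketingResearchAssistant v1 and v2 methods:

Approach:
- v1: Direct use of the OpenAI API
- v2: Retrieval chain with vector storage

Data processing:
- v1: No extra preprocessing; the entire conversation is sent with every query
- v2: Document preprocessing (loading, splitting, vectorization)

Storage and retrieval:
- v1: No storage, on-the-fly processing
- v2: Persistent vector storage (Chroma)

Results on the provided transcript:
- v1: Gave a substantive answer to four questions and reported that Lancelot is not mentioned; the quote it cites for the CRM question does not mention HubSpot, so that answer should be checked against the transcript
- v2: Returned "I cannot answer this question based on the provided context." for four of the five questions, and its "next steps" answer does not match the follow-up meeting reported by v1

v2 advantages: Scales better to large volumes, sends only the retrieved chunks to the model, potentially faster for repeated queries

v2 disadvantages: Increased complexity, higher initial resource requirements, and answers that depend on the retriever surfacing the right chunks, which it did not do reliably in the runs above

Both methods use the same answer guidelines in their system prompts.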
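
To make the retrieval step of v2 concrete, here is a toy sketch of top-k retrieval in plain Python. It is not the LangChain or Chroma implementation: word-count vectors stand in for the OpenAI embeddings, a plain list stands in for the vector store, and the chunks are made up. `bag_of_words`, `cosine` and `top_k` are hypothetical helpers.

```python
import math
import re
from collections import Counter
from typing import List


def bag_of_words(text: str) -> Counter:
    """Represent a text as lowercase word counts (a stand-in for an embedding)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the query."""
    query_vec = bag_of_words(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, bag_of_words(c)), reverse=True)
    return ranked[:k]


# Made-up chunks for illustration, not taken from the transcript
chunks = [
    "spk0: Bonjour Merlin. Comment vas-tu ?",
    "spk2: On utilise un CRM pour suivre les deals.",
    "spk0: On se revoit lundi pour la suite.",
]
print(top_k("Quel CRM ?", chunks, k=1))
```

Only the chunks returned by `top_k` would reach the model. If the relevant passage is not among them, the model has nothing to answer from, whereas v1 always sees the whole conversation.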