# Mojo 

Using GPT-3 for summarization and prompt answering

## How to run

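This Jupyter notebook requires the following libraries, installable via pip (`json` and `configparser`, also used below, ship with the Python standard library and need no installation):
```pip
langchain
langchain-community
langchain-chroma
langchain-openai
openai
tiktoken
```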

- Place "demo-segments.json" in the same directory as the notebook
- Create a file config.cfg with the following pattern:
```cfg
[OPENAI]
OPENAI_API_KEY = <token>
```
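
As a minimal sketch of how this file can be consumed (the helper name `load_api_key` is hypothetical and not part of the notebook), the key can be read with `configparser`, raising a clear error when the file, section or option is missing:

```python
import configparser


def load_api_key(path="config.cfg"):
    """Return the OpenAI key stored under [OPENAI] in the given config file."""
    config = configparser.ConfigParser()
    # read() silently skips missing files and returns the list of files parsed
    if not config.read(path):
        raise FileNotFoundError(f"Config file not found: {path}")
    try:
        return config["OPENAI"]["OPENAI_API_KEY"]
    except KeyError as exc:
        raise KeyError("Expected an [OPENAI] section with OPENAI_API_KEY") from exc
```

Failing early here avoids a less readable authentication error later, when the first request is sent.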


## Summarization

We have a JSON document that contains the timestamped transcript of a conversation. We want to obtain:
1. A summary of at most 60 words.
2. A free-format summary that the callers' peers can use to know what happened during the call.

### 1. 60 word summary

Let us first check out the json

In [1]:
import json

with open("demo-segments.json") as f:
    data = json.load(f)
data

{'language': 'fr-FR',
 'isStereo': False,
 'jobname': 'modjo-zoom-b2a540b4da309e1f',
 'segments': [{'startTime': 13.62,
   'endTime': 15.36,
   'content': 'Oui bonjour Celeste.',
   'speaker': 'spk2',
   'uuid': 'd4478b0c-ae03-4ee8-bb08-a68a9a357b5b',
   'blockUuid': '9e6a9250-9fd7-4b49-9b76-0df57ec043a6'},
  {'startTime': 15.58,
   'endTime': 18.24,
   'content': 'Bonjour Merlin. Comment vas-tu ?',
   'speaker': 'spk0',
   'uuid': '6b37c91b-6ba0-4911-b551-6338c042988d',
   'blockUuid': 'd340f539-7c41-4a7c-afb7-68b099ef3f70'},
  {'startTime': 18.24,
   'endTime': 19.6,
   'content': 'Très bien et toi ?',
   'speaker': 'spk2',
   'uuid': 'e133d694-bec6-47d1-8190-8fc1e1c71231',
   'blockUuid': '2383fb4d-fb85-4e37-a66e-875d1600303e'},
  {'startTime': 19.6,
   'endTime': 25.98,
   'content': "Super. Écoute, ça va super. Je suis ravie de commencer mon après-midi avec toi. Je vois que la caméra n'a",
   'speaker': 'spk0',
   'uuid': '7976e5e3-af10-4130-aeda-0426e023ce39',
   'blockUuid': '96

Based on this excerpt, several strategies are possible:
- Directly ask ChatGPT for a summary if the context is not too big
- Store the JSON contents in a local vector DB and then ask for a summary
- Keep only the text, with a "-" symbol whenever the speaker changes, and then get the summary (with or without local vectors)

The prompt must include the limit of 60 words as a constraint.
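
Since a model does not always respect a length constraint stated in the prompt, the constraint can also be checked after the call. Below is a minimal sketch, assuming a hypothetical callable `generate(prompt)` that wraps the LLM call and returns a string; the helper names are illustrative and not part of the notebook.

```python
def count_words(text):
    """Count whitespace-separated words."""
    return len(text.split())


def summarize_with_limit(generate, max_words=60, max_attempts=3):
    """Call generate(prompt) until the summary fits max_words or attempts run out.

    `generate` is a hypothetical callable wrapping the LLM call: it takes a
    prompt string and returns the summary string.
    """
    prompt = (
        f"Summarize the call in at most {max_words} words, "
        "in the language of the transcript."
    )
    summary = generate(prompt)
    for _ in range(max_attempts - 1):
        if count_words(summary) <= max_words:
            break
        # Feed the overlong draft back and ask for a shorter version
        prompt = (
            f"Shorten this summary to at most {max_words} words, "
            f"keeping its language:\n{summary}"
        )
        summary = generate(prompt)
    return summary
```

The last draft is returned even if it is still too long, so the caller should re-check the length when the limit is strict.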

In [2]:
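def organize_json(json_dict):
    """Flatten the transcript into text, one "- " line per speaker turn."""
    out = ""
    tmp = ""
    previous_speaker = ""
    for block in json_dict["segments"]:
        speaker = block["speaker"]
        if speaker == previous_speaker:
            # Same speaker: extend the current turn
            tmp += " " + block["content"]
        else:
            # Speaker change: flush the previous turn and start a new one
            out += tmp
            tmp = "\n- " + block["content"]
            previous_speaker = speaker
    # Flush the last turn, which the loop alone would drop
    out += tmp
    return out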


textified_conversation = organize_json(data)
print(textified_conversation)


- Oui bonjour Celeste.
- Bonjour Merlin. Comment vas-tu ?
- Très bien et toi ?
- Super. Écoute, ça va super. Je suis ravie de commencer mon après-midi avec toi. Je vois que la caméra n'a pas été réparée. Je sais pas si t'es tout seul ou t'es avec...
- Non, là je t'avoue que avec le gros brume que j'ai...
- Ah oui ! Ok, pas de problème. Non, mais je ne savais pas si du coup, Arthur et l'autre Merlin se connectaient derrière toi ou s'il y avait d'autres personnes qui arrivaient.
- Non, alors Merlin, je ne sais pas, mais Arthur, ben voilà, il est là, parfait.
- Ben voilà. Bonjour Arthur, enchanté. Salut Arthur.
- Je suis là, bonjour Celeste.
- Ah c'est Merlin, parfait. Donc, je suis juste contente de reposer le contexte par rapport à notre échange de la dernière fois et puis l'objectif de cette petite call tous ensemble.
- Oui, effectivement, l'idée c'était, si ça vous va, l'agenda du coup juste que je vous propose effectivement, c'est peut-être juste de faire un petit tour de table avan

In [3]:
import configparser

# import openai
from langchain.text_splitter import RecursiveCharacterTextSplitter

# from langchain.llms import OpenAI
from langchain.chat_models import (
    ChatOpenAI,
)  # bug with langchain_openai library on my machine
from langchain.chains.question_answering import load_qa_chain
from langchain.schema import Document

config = configparser.ConfigParser()
config.read("config.cfg")
OPENAI_API_KEY = config["OPENAI"]["OPENAI_API_KEY"]
# openai.api_key = OPENAI_API_KEY

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_text(textified_conversation)
documents = [Document(page_content=text) for text in texts]
# llm = OpenAI(openai_api_key=OPENAI_API_KEY)
llm = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model_name="gpt-3.5-turbo-16k")
qa_chain = load_qa_chain(llm, chain_type="map_reduce")

question = """Provide a summary in maximum 60 words of what happened during the call, given the transcript provided. Use the language of the transcript for summarization"""
answer = qa_chain.run(input_documents=documents, question=question)
print(answer)



During the call, the participants discussed the benefits and implementation of Modjo, a tool for transparently sharing interactions with prospects and clients. They emphasized the importance of improving commercial performance, enhancing discourse, tracking sales activities, and using AI for note-taking and coaching. The speaker mentioned budget limitations and negotiations, the onboarding process, and scheduling a meeting for further discussion.


In [4]:
# Checking that the 60 words condition is respected
len(answer.split(" "))

59
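
Splitting on a single space works for a one-paragraph answer like this one, but the count can be off when the text contains newlines or repeated spaces (for instance in a bulleted summary). A small sketch of the difference, using a made-up sample string:

```python
sample = "Titre:\n- point un\n- point  deux"

# split(" ") only breaks on single spaces: newlines keep words glued together
# and a double space produces an empty string
naive = len(sample.split(" "))

# split() with no argument breaks on any run of whitespace
robust = len(sample.split())

print(naive, robust)  # prints "6 7"
```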

In [14]:
# # Trying directly through a prompt template
# from langchain.chains.llm import LLMChain
# from langchain.chains.combine_documents.stuff import StuffDocumentsChain
# from langchain_core.prompts import PromptTemplate
# query_template = f"""
# Write a concise summary of the following text, in fewer than 60 words using the same language as the original text which is here below:
# "{textified_conversation}"
# concise summary:
# """

# llm_chain = LLMChain(llm=llm, prompt=PromptTemplate(input_variables=[textified_conversation], template=query_template
#         ))
# result = llm_chain.run("summary")

# # failure: too long text?

In [18]:
# Problem using the jq library on Windows, hence using a custom definition of JSONLoader following https://github.com/langchain-ai/langchain/issues/4396
import json
from pathlib import Path
from typing import Callable, Dict, List, Optional, Union

from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader


# def organize_json(json_dict):
#     out = ""
#     tmp = ""
#     previous_speaker = ""
#     for block in json_dict["segments"]:
#         speaker = block["speaker"]
#         if speaker == previous_speaker:
#             tmp += " " + block["content"]
#         else:
#             out += tmp
#             tmp = "\n- " + block["content"]
#             previous_speaker = speaker
#     return out


class JSONLoader(BaseLoader):
    def __init__(
        self,
        file_path: Union[str, Path],
        metadata_func: Optional[Callable[[dict, dict], dict]] = None,
    ):
        """
        Initializes the JSONLoader with a file path and an optional metadata function
        (stored, but not used by this simplified loader).
        """
        self.file_path = Path(file_path).resolve()
        self.metadata_func = metadata_func

    def create_documents(self, processed_data):
        """
        Creates Document objects from processed data.
        """
        documents = []
        for item in processed_data:
            content = item.get("content", "")
            metadata = item.get("metadata", {})
            document = Document(page_content=content, metadata=metadata)
            documents.append(document)
        return documents

    def process_json(self, data):
        """
        Processes JSON data to prepare for document creation: one record per transcript
        segment, with the segment text as content and the speaker id as metadata.
        """
        processed_data = []
        for item in data["segments"]:
            content = item["content"]
            metadata = {"speaker": item["speaker"]}
            processed_data.append({"content": content, "metadata": metadata})
        return processed_data

    def load(self) -> List[Document]:
        """
        Load and return documents from the JSON file.
        """
        docs = []
        with open(self.file_path, mode="r", encoding="utf-8") as json_file:
            try:
                data = json.load(json_file)
                processed_json = self.process_json(data)
                docs = self.create_documents(processed_json)
            except json.JSONDecodeError:
                print("Error: Invalid JSON format in the file.")
        return docs

In [20]:
# Trying directly with the full json and appropriate langchain tools
# from langchain_community.document_loaders import JSONLoader

loader = JSONLoader(file_path="demo-segments.json")
docs = loader.load()
docs

[Document(page_content='Oui bonjour Celeste.', metadata={'speaker': 'spk2'}),
 Document(page_content='Bonjour Merlin. Comment vas-tu ?', metadata={'speaker': 'spk0'}),
 Document(page_content='Très bien et toi ?', metadata={'speaker': 'spk2'}),
 Document(page_content="Super. Écoute, ça va super. Je suis ravie de commencer mon après-midi avec toi. Je vois que la caméra n'a", metadata={'speaker': 'spk0'}),
 Document(page_content="pas été réparée. Je sais pas si t'es tout seul ou t'es avec...", metadata={'speaker': 'spk0'}),
 Document(page_content="Non, là je t'avoue que avec le gros brume que j'ai...", metadata={'speaker': 'spk2'}),
 Document(page_content="Ah oui ! Ok, pas de problème. Non, mais je ne savais pas si du coup, Arthur et l'autre Merlin", metadata={'speaker': 'spk0'}),
 Document(page_content="se connectaient derrière toi ou s'il y avait d'autres personnes qui arrivaient.", metadata={'speaker': 'spk0'}),
 Document(page_content='Non, alors Merlin, je ne sais pas, mais Arthur, be

In [35]:
from langchain.chains.llm import LLMChain
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain_core.prompts import PromptTemplate

# document_prompt = PromptTemplate(
#     input_variables=["page_content", "metadata"], template="""{metadata["speaker"]}: {page_content}"""
# )
document_prompt = PromptTemplate(
    # Metadata keys are exposed as template variables, so the variable is "speaker"
    input_variables=["page_content", "speaker"],
    template="""- {speaker}: {page_content}""",
)
document_variable_name = "context"

query = f"""
Write a concise summary of the following text, in fewer than 60 words using the same language as the original text:french
"""

stuff_prompt_override = """Given this text extracts:
-----
{context}
-----
Please answer the following question:
{query}"""

prompt = PromptTemplate(
    template=stuff_prompt_override, input_variables=["context", "query"]
)

llm = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model_name="gpt-3.5-turbo-16k")

# Instantiate the chain
llm_chain = LLMChain(llm=llm, prompt=prompt)

chain = StuffDocumentsChain(
    llm_chain=llm_chain,
    document_prompt=document_prompt,
    document_variable_name=document_variable_name,
)

chain.run(input_documents=docs, query=query)

BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 16385 tokens. However, your messages resulted in 24790 tokens. Please reduce the length of the messages.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
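
One way to anticipate this error is to estimate the prompt size before sending the request and keep only as many documents as fit. The sketch below is an illustration under stated assumptions: `fit_to_budget` is a hypothetical helper, and the default counter is a rough heuristic of about 4 characters per token. An exact counter, for example one built on `tiktoken`, can be passed in through `count_tokens`.

```python
def fit_to_budget(texts, max_tokens, count_tokens=None, reserve=500):
    """Keep the leading texts whose combined token estimate fits the budget."""
    if count_tokens is None:
        # Rough heuristic (assumption): about 4 characters per token
        count_tokens = lambda s: len(s) // 4 + 1
    budget = max_tokens - reserve  # leave room for the prompt and the answer
    kept, used = [], 0
    for text in texts:
        cost = count_tokens(text)
        if used + cost > budget:
            break
        kept.append(text)
        used += cost
    return kept, used


# Toy usage: each 40-character text is estimated at 11 tokens
kept, used = fit_to_budget(["a" * 40, "b" * 40, "c" * 40], max_tokens=30, reserve=5)
print(len(kept), used)  # prints "2 22"
```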

In [44]:
# The context is too long for this model, but not by much: per the error above, the request used 24790 / 16385 ≈ 1.5 times the maximum number of tokens, so about a 50% overflow
# Let us try retrieving the most relevant documents from a Chroma DB and reordering them with LongContextReorder
from langchain_community.document_transformers import (
    LongContextReorder,
)
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

retriever = Chroma.from_documents(
    docs, embedding=OpenAIEmbeddings(api_key=OPENAI_API_KEY)
).as_retriever(
    search_kwargs={"k": int(794 / 1.4)}  # taking into account the previous remark
)
query = "Write a concise summary of the following text, in fewer than 60 words using the same language as the original text: french"

# Get relevant documents ordered by relevance score
docs2 = retriever.invoke(query)

reordering = LongContextReorder()
reordered_docs = reordering.transform_documents(docs2)

chain.run(input_documents=reordered_docs, query=query)

"La conversation concerne la prise de notes et la synthèse des appels téléphoniques afin d'améliorer la productivité et les performances des commerciaux. L'objectif est de créer un outil d'intelligence conversationnelle qui analyse les appels et génère des résumés automatiques pour faciliter la formation et le coaching des équipes de vente."

In [46]:
# Problem using the jq library on Windows, hence using a custom definition of JSONLoader following https://github.com/langchain-ai/langchain/issues/4396
import json
from pathlib import Path
from typing import Callable, Dict, List, Optional, Union

from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader


class JSONLoader(BaseLoader):
    def __init__(
        self,
        file_path: Union[str, Path],
        metadata_func: Optional[Callable[[dict, dict], dict]] = None,
    ):
        """
        Initializes the JSONLoader with a file path and an optional metadata function
        (stored, but not used by this simplified loader).
        """
        self.file_path = Path(file_path).resolve()
        self.metadata_func = metadata_func

    def create_documents(self, processed_data):
        """
        Creates Document objects from processed data.
        """
        documents = []
        for item in processed_data:
            content = item.get("content", "")
            metadata = item.get("metadata", {})
            document = Document(page_content=content, metadata=metadata)
            documents.append(document)
        return documents

    def process_json(self, data):
        """
        Processes JSON data to prepare for document creation, merging consecutive
        segments from the same speaker into a single record.
        """
        processed_data = []
        tmp = ""
        previous_speaker = ""
        for item in data["segments"]:
            content = item["content"]
            speaker = item["speaker"]
            if speaker == previous_speaker:
                tmp += " " + content
            else:
                # Speaker change: flush the accumulated turn. On the very first
                # segment this emits an empty placeholder record.
                processed_data.append(
                    {"content": tmp, "metadata": {"speaker": previous_speaker}}
                )
                tmp = content
                previous_speaker = speaker
        # Flush the last turn, which the loop alone would drop
        if tmp:
            processed_data.append(
                {"content": tmp, "metadata": {"speaker": previous_speaker}}
            )
        return processed_data

    def load(self) -> List[Document]:
        """
        Load and return documents from the JSON file.
        """
        docs = []
        with open(self.file_path, mode="r", encoding="utf-8") as json_file:
            try:
                data = json.load(json_file)
                processed_json = self.process_json(data)
                docs = self.create_documents(processed_json)
            except json.JSONDecodeError:
                print("Error: Invalid JSON format in the file.")
        return docs

In [47]:
loader = JSONLoader(file_path="demo-segments.json")
docs = loader.load()
docs

[Document(page_content='', metadata={'speaker': ''}),
 Document(page_content='Oui bonjour Celeste.', metadata={'speaker': 'spk2'}),
 Document(page_content='Bonjour Merlin. Comment vas-tu ?', metadata={'speaker': 'spk0'}),
 Document(page_content='Très bien et toi ?', metadata={'speaker': 'spk2'}),
 Document(page_content="Super. Écoute, ça va super. Je suis ravie de commencer mon après-midi avec toi. Je vois que la caméra n'a pas été réparée. Je sais pas si t'es tout seul ou t'es avec...", metadata={'speaker': 'spk0'}),
 Document(page_content="Non, là je t'avoue que avec le gros brume que j'ai...", metadata={'speaker': 'spk2'}),
 Document(page_content="Ah oui ! Ok, pas de problème. Non, mais je ne savais pas si du coup, Arthur et l'autre Merlin se connectaient derrière toi ou s'il y avait d'autres personnes qui arrivaient.", metadata={'speaker': 'spk0'}),
 Document(page_content='Non, alors Merlin, je ne sais pas, mais Arthur, ben voilà, il est là, parfait.', metadata={'speaker': 'spk2'})

In [70]:
retriever = Chroma.from_documents(
    docs, embedding=OpenAIEmbeddings(api_key=OPENAI_API_KEY)
).as_retriever(search_kwargs={"k": 200})
query = "Write a concise summary of the following text, in fewer than 60 words (this is a hard requirement: the summary cannot exceed 60 words) using the same language as the original text: french"

# Get relevant documents ordered by relevance score
docs2 = retriever.invoke(query)

reordering = LongContextReorder()
reordered_docs = reordering.transform_documents(docs2)

result = chain.run(input_documents=reordered_docs, query=query)
print(f"""The result is {len(result.strip().split(" "))} words long""")
result

The result is 65 words long


"L'IA est utilisée pour faire un résumé automatique des appels téléphoniques, en captant les informations importantes telles que les prochaines étapes, les objections et les problèmes des prospects. Cela permet de créer un compte rendu synthétique et homogène de l'échange, facilitant ainsi la prise de notes et l'accompagnement des commerciaux dans leurs réponses. L'IA permet également de repérer des mots clés et d'analyser les performances."

This is not a bad summary at all despite the quite unconventional use of one document per line of dialog.
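
The reordering step used in the cells above is meant to place the most relevant documents at the beginning and at the end of the context, leaving the least relevant ones in the middle. The following is a sketch of that idea in plain Python, not LangChain's actual implementation; the function name is hypothetical and the input is assumed to be sorted from most to least relevant.

```python
def lost_in_the_middle_order(items):
    """Reorder items (most relevant first) so the extremes hold the best ones."""
    front, back = [], []
    for i, item in enumerate(items):
        # Alternate: even ranks go to the front half, odd ranks to the back half
        (front if i % 2 == 0 else back).append(item)
    # Reverse the back half so relevance rises again towards the end
    return front + back[::-1]


print(lost_in_the_middle_order(["a", "b", "c", "d", "e"]))
# prints ['a', 'c', 'e', 'd', 'b']
```

Here "a" (rank 1) opens the context, "b" (rank 2) closes it, and "e" (the least relevant) sits in the middle.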

Looking for alternatives to summarize documents, based on [this article](https://medium.com/@abonia/summarization-with-langchain-b3d83c030889):

1. Stuff: the stuff chain inserts all documents into a single prompt and makes one LLM call. It works best when the documents are small enough to fit together in the context window, and it does not scale to large amounts of documents, which is the token limit we dealt with above.
2. Map-Reduce: summarizes each chunk individually, then combines the partial summaries into a final one. In our case this could be interesting to avoid the token limit: we could treat the whole conversation as a single document, then cut it into chunks just big enough for the context window to handle. The major drawback that I can see is that we can cut an important dialog midway and lose the "order" of the steps in the conversation if it is really long. It is supposedly more scalable but requires a bigger setup, and it may be better suited to a summarization task here.
3. Refine: iterates over the chunks and "updates" a running summary with each new piece of information. This is not exactly what we are looking for, although [langchain's tutorial](https://python.langchain.com/v0.1/docs/use_cases/summarization/#quickstart) presents it as a summarization option.
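
The difference between the two chunk-based strategies can be sketched with plain callables standing in for the LLM. This is an illustration of the control flow only, not LangChain's implementation: `summarize` and `refine` are hypothetical functions.

```python
def map_reduce_summarize(chunks, summarize):
    """Summarize each chunk independently, then summarize the joined partials."""
    partial = [summarize(chunk) for chunk in chunks]
    return summarize("\n".join(partial))


def refine_summarize(chunks, summarize, refine):
    """Summarize the first chunk, then update the running summary chunk by chunk."""
    summary = summarize(chunks[0])
    for chunk in chunks[1:]:
        summary = refine(summary, chunk)
    return summary
```

In the map-reduce variant the per-chunk calls are independent and could run in parallel, while the refine variant is sequential but sees the chunks in conversation order.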

Let us try the `load_summarize_chain` tool from langchain, using the full document as input as in [this tutorial](https://dxiaochuan.medium.com/summarising-your-meeting-with-chatgpt-and-langchain-8eb646cfcdd1), which shows good results.

In [114]:
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain


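def retranscription_from_json(json_dict):
    """Flatten the transcript into text, one " - speaker: ..." line per turn."""
    out = ""
    tmp = ""
    previous_speaker = ""
    for block in json_dict["segments"]:
        speaker = block["speaker"]
        if speaker == previous_speaker:
            # Same speaker: extend the current turn
            tmp += " " + block["content"]
        else:
            # Speaker change: flush the previous turn and start a new one
            out += tmp
            tmp = f"""\n - {speaker}: """ + block["content"]
            previous_speaker = speaker
    # Flush the last turn, which the loop alone would drop
    out += tmp
    return out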


textified_conversation = retranscription_from_json(data)

# Choosing appropriate chunk size to avoid the maximum limit size specified by OpenAI ChatGPT 3.5
target_len = 300  # for each intermediary summary
chunk_size = 3000  # <4k
chunk_overlap = 200

text_splitter = CharacterTextSplitter(
    separator="\n -",
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap,
    length_function=len,
)

texts = text_splitter.split_text(
    textified_conversation,
)

# Create Document objects for the texts
docs = [Document(page_content=t) for t in texts[:]]

llm = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model_name="gpt-3.5-turbo-16k")

prompt_template = """Act as a professional technical meeting minutes writer. 
Tone: formal
Format: Technical meeting summary
Length:  300
Tasks:
- highlight action items and owners
- highlight the agreements
- Use bullet points (maximum 20)
{text}
CONCISE SUMMARY IN FRENCH:"""
prompt = PromptTemplate(template=prompt_template, input_variables=["text"])
refine_template = (
    "Your job is to produce a final summary\n"
    "We have provided an existing summary up to a certain point: {existing_answer}\n"
    "We have the opportunity to refine the existing summary "
    "(only if needed) with some more context below.\n"
    "------------\n"
    "{text}\n"
    "------------\n"
    f"Given the new context, refine the original summary in French within {target_len} words, following the format:\n"
    "Title: <title of the meeting>\n"
    "Participants: <participants>\n"
    "Discussed: <Discussed-items-with-bullet-points>\n"
    "Follow-up actions: <a-list-of-follow-up-actions-with-owner-names>\n"
    "Conclusion: <a-global-summary-with-maximum-60-words-and-much-less-if-possible>\n"
    "------------\n"
    "If the context isn't useful, return the original summary. Highlight agreements and follow-up actions and owners in the answer. "
    "If any participants correspond to generic speakers such as spk0, spk1, spk2, do not include them in the list of participants. "
    "The final summary must not exceed 60 words."
)
refine_prompt = PromptTemplate(
    input_variables=["existing_answer", "text"],
    template=refine_template,
)
chain = load_summarize_chain(
    llm,
    chain_type="refine",
    return_intermediate_steps=True,
    question_prompt=prompt,
    refine_prompt=refine_prompt,
)
resp = chain({"input_documents": docs}, return_only_outputs=True)
print(resp["output_text"])

Created a chunk of size 3001, which is longer than the specified 3000


Title: Réunion de mise en œuvre de Modjo
Participants: Celeste, Merlin, Arthur, Lancelot
Discuté:
- Problèmes techniques avec la caméra
- Récapitulatif de la discussion précédente avec Merlin
- Potentiel de Modjo pour améliorer les ventes et le service client chez RoundTable
- Planification de proposition de tarification personnalisée
- Formalisation des routines de coaching et du processus
- Ateliers de formation commerciale et scripts
- Objectifs d'amélioration des performances et défis
- Suivi des contributions aux ventes et individualisation
- Amélioration du processus de prise de notes et système automatisé de prise de notes
- Intégration de Modjo avec des outils VOIP, Visio et CRM
- Transcription automatique des conversations pour l'analyse des performances
- Discussion sur l'importance des différentes parties de l'appel et des objections
Actions de suivi:
- Celeste fera un suivi avec Merlin et Arthur sur la fonctionnalité de la caméra
- Celeste planifiera une réunion avec Lancel

In [None]:
# Not bad at all. Having read the JSON myself, I believe Lancelot was not there though (when I rerun the code, he is sometimes in the list and sometimes not)

In [118]:
################### Trying out the same thing for the 60-word summary -> for some reason it does not respect the 60-word constraint
# Choosing appropriate chunk size to avoid the maximum limit size specified by OpenAI ChatGPT 3.5

llm = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model_name="gpt-3.5-turbo-16k")

prompt_template = """Act as a professional technical meeting minutes writer. 
Tone: formal
Format: Technical meeting summary
Length:  60-80 words
{text}
CONCISE SUMMARY IN FRENCH:"""
prompt = PromptTemplate(template=prompt_template, input_variables=["text"])
refine_template = (
    "Your job is to produce a final summary\n"
    "We have provided an existing summary up to a certain point: {existing_answer}\n"
    "We have the opportunity to refine the existing summary "
    "(only if needed) with some more context below.\n"
    "------------\n"
    "{text}\n"
    "------------\n"
    "Given the new context, refine the original summary in French with minimum 50 words and maximum 80 words"
)
refine_prompt = PromptTemplate(
    input_variables=["existing_answer", "text"],
    template=refine_template,
)
chain = load_summarize_chain(
    llm,
    chain_type="refine",
    return_intermediate_steps=True,
    question_prompt=prompt,
    refine_prompt=refine_prompt,
)
resp = chain({"input_documents": docs}, return_only_outputs=True)
print(resp["output_text"])

Modjo est un outil de coaching et d'analyse des appels qui permet de simplifier et d'optimiser les interactions commerciales. Son intelligence artificielle enregistre les appels et génère automatiquement un résumé avec les prochaines étapes et les objections. De plus, il offre des fonctionnalités d'analyse et de formation continue pour aider les managers à coacher leurs équipes. Gymlib est un client satisfait de Modjo, qui a constaté une amélioration de son processus de vente et une économie de temps. Des négociations sont possibles pour réduire le prix de la licence mensuelle de 99 euros.


In [119]:
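# Note: len() on a string counts characters, not words;
# len(resp["output_text"].split()) would give the word count
len(resp["output_text"])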

596

### 2. Free-format summary

When thinking about how the summary will be used, we can improve it a little by providing the following elements:
- Asking GPT for the summary above in the form of bullet points
- Asking GPT for the topic of the discussion and generating a title from it
- Getting the names of all the speakers (filtering on the JSON "speaker" field and asking GPT to find the names), and summarizing their contributions
- Asking GPT for
    - problem raised if any
    - problem solved if any
    - future action that must be taken


Anything else would be context specific: for example, in a sales context, linking customers to a CRM, or in a software development context, linking problems to issues raised in a ticket system. We won't tackle this here.
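
For the speaker-related element, the generic speaker ids and how much each one spoke can be read directly from the JSON, without any LLM call. A minimal sketch, relying on the `speaker`, `startTime` and `endTime` fields shown in the excerpt earlier (the function name `speaking_time` is hypothetical):

```python
from collections import defaultdict


def speaking_time(segments):
    """Return total speaking time in seconds per speaker id, in order of appearance."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg["speaker"]] += seg["endTime"] - seg["startTime"]
    return dict(totals)


# First three segments of the transcript shown earlier
segments = [
    {"startTime": 13.62, "endTime": 15.36, "speaker": "spk2"},
    {"startTime": 15.58, "endTime": 18.24, "speaker": "spk0"},
    {"startTime": 18.24, "endTime": 19.6, "speaker": "spk2"},
]
totals = speaking_time(segments)
print({speaker: round(seconds, 2) for speaker, seconds in totals.items()})
```

Mapping ids such as `spk0` to actual names still has to come from the conversation itself, for instance by asking GPT as suggested above.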

In [44]:
question = "Can you find a title for this conversation (using the same language as the provided transcript)?"
title = qa_chain.run(input_documents=documents, question=question)

question = """Provide a summary given the transcript provided in 60 words max. Use the language of the transcript for summarization. Make a maximum of 10 bullet points in this summary with the key elements. Use the same language as the provided transcript"""
summary = qa_chain.run(input_documents=documents, question=question)

question = """Provide for each speaker in the provided transcript its main impact in the meeting in the form: - speaker_name : impact with a bullet point for each speaker"""
speakers_summary = qa_chain.run(input_documents=documents, question=question)

question = """Provide a conclusion for the meeting: with max 10 bullet points, mention the problems solved, the problems raised, and future actions decided in this meeting"""
conclusion = qa_chain.run(input_documents=documents, question=question)

In [45]:
output = f"""
title: {title}

summary: {summary}

speakers_summary: {speakers_summary}

conclusions: {conclusion}
"""

print(output)


title: Title: "Discussion de planification pour une réunion"

summary: - Celeste discusses the repair of the camera and greets Merlin and Arthur.
- The purpose of the call is to discuss knowledge, scopes, and goals at RoundTable.
- The speaker wants visibility on the business model and a customized pricing proposal.
- Modjo is a tool to improve sales and CSM goals by enhancing sales techniques and addressing objections.
- The team is working on scripts and speech techniques to increase sales performance.
- Workshops and call reviews are conducted to improve performance and reach sales goals.
- Modjo is a tool for transcribing and analyzing conversations to improve performance and productivity.
- The tool integrates with VOIP tools, Visio tools, and Hubspot CRM.
- Modjo helps in understanding pitches, objection handling, and successful strategies.
- The speaker emphasizes the importance of data visibility, collaboration, and training within teams.

speakers_summary: - Celeste: Confirms

This is not very satisfactory. It would make more sense to try other input formats: for example, one "page" per sentence instead of the recursive splitter may be better suited here.
The conclusions also missed the next call that was decided at the end. Maybe there should be several dedicated prompts for actionables such as "next meetings", whose answers could then be aggregated neatly.
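
The idea of dedicated prompts can be sketched as follows. The prompt wording, the `ACTION_PROMPTS` names and the `ask` callable (question in, answer out, for example a thin wrapper around a QA chain) are all assumptions made for illustration.

```python
ACTION_PROMPTS = {
    "next_meeting": "Was a next meeting scheduled? If so, when and with whom?",
    "action_items": "List the follow-up actions and their owners.",
    "open_issues": "List the problems raised that remain unsolved.",
}


def build_report(ask, prompts=ACTION_PROMPTS):
    """Run one focused question per topic and join the answers into a report.

    `ask` is a hypothetical callable (question -> answer).
    """
    sections = []
    for name, question in prompts.items():
        answer = ask(question).strip()
        sections.append(f"{name}:\n{answer}")
    return "\n\n".join(sections)
```

Keeping one narrow question per call makes it easier to see which actionable was missed and to adjust that prompt alone.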

## Question answering

Using GPT-3.5-turbo, provide a method that can answer the following questions given the path to the JSON file:
1. What are the next steps?
2. Was budget mentioned, if so what was the amount?
3. Who is Lancelot? 
4. Which CRM does the prospect use? 
5. What are the main objections raised ?

In [41]:
def qa_method(json_path: str, question: str) -> str:
    """
    Answer the question on the transcript

    Parameters
    ----------
    - `json_path` (str) : path of the json transcript
    - `question` (str) : plain text question

    Returns
    -------
    - (str): the reply to the question
    """
    with open(json_path) as f:
        data = json.load(f)
    textified_conversation = organize_json(data)
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    texts = text_splitter.split_text(textified_conversation)
    documents = [Document(page_content=text) for text in texts]
    llm = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model_name="gpt-3.5-turbo-16k")
    qa_chain = load_qa_chain(llm, chain_type="map_reduce")
    answer = qa_chain.run(input_documents=documents, question=question)
    return answer


answer = qa_method("demo-segments.json", "What are the next steps")
print(answer)

The next steps involve analyzing the data to determine the typology of calls and the individuals driving the push. The goal is to gain visibility on this information. This allows the managers to identify who is effectively pushing the live note and making sales. They can then use this information to coach the rest of the team who may be struggling to push certain offers. The process also involves continuous training and listening to qualifications and demos from top performers using Modjo.


In [47]:
answer = qa_method("demo-segments.json", "Who is Lancelot? What's his job")
print(answer)

I don't know who Lancelot is or what his job is. The given text does not provide any information about Lancelot or his job.


How disappointing... This is probably because the text should not be cut with a recursive splitter but rather "line by line".

In [52]:
def qa_method(json_path: str, question: str) -> str:
    """
    Answer the question on the transcript

    Parameters
    ----------
    - `json_path` (str) : path of the json transcript
    - `question` (str) : plain text question

    Returns
    -------
    - (str): the reply to the question
    """
    with open(json_path) as f:
        data = json.load(f)
    textified_conversation = organize_json(data)
    documents = [
        Document(page_content=text) for text in textified_conversation.split("- ")
    ]
    llm = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model_name="gpt-3.5-turbo-16k")
    qa_chain = load_qa_chain(llm, chain_type="map_reduce")
    answer = qa_chain.run(input_documents=documents, question=question)
    return answer


answer = qa_method("demo-segments.json", "Who is Lancelot? What's his job")
print(answer)

I don't know who Lancelot is or what his job is. The provided text does not contain any information about Lancelot or his job.


It looks like the problem is deeper; maybe the QA chain itself is at fault here. I would investigate how it works and check how to make sure important bits of information are not ignored.
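
A quick first check, before digging into the chain, is to see whether the name appears in the chunks at all, and in how many of them. If only a few chunks out of many mention it, most per-chunk answers may be "I don't know", which could then influence the combined answer. A minimal sketch with a hypothetical helper:

```python
def chunks_mentioning(chunks, term):
    """Return (index, chunk) pairs whose text contains the term, ignoring case."""
    needle = term.casefold()
    return [(i, c) for i, c in enumerate(chunks) if needle in c.casefold()]


# Toy usage with made-up chunks
toy_chunks = ["Bonjour Arthur.", "On verra avec LANCELOT demain.", "Merci."]
print(chunks_mentioning(toy_chunks, "Lancelot"))
# prints [(1, 'On verra avec LANCELOT demain.')]
```

Applied to the `documents` list built in `qa_method`, this would show which chunks the chain would need to rely on to answer the question.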

In [None]:
#################################### Going further

In [121]:
# Using the same character split as above in the refine/load_summarize_qa_chain
docs

[Document(page_content="spk2: Oui bonjour Celeste.\n - spk0: Bonjour Merlin. Comment vas-tu ?\n - spk2: Très bien et toi ?\n - spk0: Super. Écoute, ça va super. Je suis ravie de commencer mon après-midi avec toi. Je vois que la caméra n'a pas été réparée. Je sais pas si t'es tout seul ou t'es avec...\n - spk2: Non, là je t'avoue que avec le gros brume que j'ai...\n - spk0: Ah oui ! Ok, pas de problème. Non, mais je ne savais pas si du coup, Arthur et l'autre Merlin se connectaient derrière toi ou s'il y avait d'autres personnes qui arrivaient.\n - spk2: Non, alors Merlin, je ne sais pas, mais Arthur, ben voilà, il est là, parfait.\n - spk0: Ben voilà. Bonjour Arthur, enchanté. Salut Arthur.\n - spk1: Je suis là, bonjour Celeste.\n - spk2: Ah c'est Merlin, parfait. Donc, je suis juste contente de reposer le contexte par rapport à notre échange de la dernière fois et puis l'objectif de cette petite call tous ensemble.\n - spk0: Oui, effectivement, l'idée c'était, si ça vous va, l'agenda 

In [132]:
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

vectorstore = Chroma.from_documents(
    documents=docs, embedding=OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)
)
prompt_template = """Act as a professional technical meeting summarizer.
Tone: formal
Format: Question Answering, concise.
Length:  The minimum amount of words required up to 100. No more.
Point of View: act as an external tool, you are not part of a specific team. You don't say "you", you just mention the name of the actors at play.
Use the following pieces of context provided (a transcription of a professional videocall) to answer the question at the end.
If you don't know the answer just say it, don't make up an answer.
Answer the question precisely without adding unrelated information.
-----------\n
context: {context}
-----------\n
CONCISE AND HELPFUL ANSWER IN FRENCH:"""
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectorstore.as_retriever(),
    chain_type_kwargs={"prompt": PromptTemplate.from_template(prompt_template)},
)
answer = qa_chain({"query": "Who is Lancelot?"})
print(answer["result"])

Lancelot est le CEO de SquareChair et il est dans le même incubateur que vous.


In [133]:
answer = qa_chain(
    {
        "query": "What are the next steps as follow-up to the current conversation that people agree to throughout the conversation?"
    }
)
print(answer["result"])

La question posée à la fin de la réunion est de savoir s'il faut mettre en place cet outil maintenant ou attendre que l'équipe soit plus organisée pour l'améliorer sur une base solide.


In [134]:
answer = qa_chain(
    {
        "query": """Was there any clear mention of a budget? If so answer with the following format:
                   budget = <budget here with dots between every thousand expressed in euros> € 
                   If there was no budget don't make up an answer, say it clearly"""
    }
)
print(answer["result"])

Le budget actuel de l'entreprise est serré et il n'est pas possible de dépenser 1000 euros ou même 500 euros par mois pour l'outil en question. Ils utilisent déjà Intercom et Hubspot, qui sont coûteux. En ce qui concerne les utilisateurs, ils envisagent de commencer avec 4 licences et si le manager est inclus, cela passerait à 5.
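
The reply above does not follow the requested `budget = ...` format. One way to catch this automatically, sketched here under the assumption that a compliant reply contains something like `budget = 12.000 €`, is to parse the reply with a regular expression and return `None` when the format is absent. `parse_budget` and `format_budget` are hypothetical helpers introduced for illustration only.

```python
import re
from typing import Optional

# Matches e.g. "budget = 500 €" or "budget = 12.000 €" (dots as thousands separators)
BUDGET_RE = re.compile(r"budget\s*=\s*(\d{1,3}(?:\.\d{3})*)\s*€")


def parse_budget(reply: str) -> Optional[int]:
    """Return the budget in euros if the reply follows the requested format, else None."""
    match = BUDGET_RE.search(reply)
    if match is None:
        return None
    return int(match.group(1).replace(".", ""))


def format_budget(amount: int) -> str:
    """Render an amount in the requested format, with dots between thousands."""
    return f"budget = {amount:,} €".replace(",", ".")


print(parse_budget("budget = 12.000 €"))
print(parse_budget("Le budget actuel est serré."))
print(format_budget(12000))
```

A `None` result could then trigger a retry with a stricter prompt, or simply be reported as "no budget stated".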


In [136]:
answer = qa_chain({"query": """What CRM does the prospect use"""})
print(answer["result"])

Le sujet principal de la réunion était l'intégration automatique des données dans le CRM HubSpot.


In [137]:
answer = qa_chain({"query": """What are the main objections raised"""})
print(answer["result"])

Le participant a mentionné qu'il était important d'être respecté et de se faire challenger par rapport à l'IA. Il souhaite que des propositions d'amélioration lui soient faites pour améliorer leur pitch.
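
Several of the answers above read like generic summaries of the retrieved context rather than replies to the query. One possible cause, offered as a hypothesis rather than a confirmed diagnosis: `prompt_template` contains a `{context}` placeholder but no `{question}` placeholder, so the query may only be used for retrieval and never reach the model. The sketch below uses only the standard library to list the placeholders of a template. `template_fields` is a hypothetical helper, and both template strings are shortened stand-ins.

```python
from string import Formatter


def template_fields(template: str) -> set[str]:
    """Return the names of all {placeholders} found in a format-style template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


# Shortened stand-in for the template used above: only {context} is present
current = "context: {context}\nCONCISE AND HELPFUL ANSWER:"
# Possible variant that also forwards the query to the model
variant = "context: {context}\nquestion: {question}\nCONCISE AND HELPFUL ANSWER:"

print(template_fields(current))
print("question" in template_fields(variant))
```

If the hypothesis holds, adding a `question: {question}` line before the final instruction of the template would be the first change to try.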


In [None]:
# TODO:
# Create the repo structure:
# cleaned_notebook.py
# python file for the summary task
# python file for the QA task
# final commented python file main.py
# 1 test per exp for the jsonloader readme