# All endpoints for model v1

## Imports

In [26]:
import json
import os
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
import ollama
from ollama import chat
import re

In [None]:
# MODEL 
MODEL = 'qwen2.5:7b'
# MODEL = 'qwen2.5:1.5b'

# After several tests: with 3000 it does not generate the chocolate cake recipe, unlike with -1
MAX_OUTPUT_TOKENS = 3000  # maximum number of tokens to generate (Ollama `num_predict`); -1 = no limit

In [28]:
# To preload a model and leave it in memory (for faster inference)
!curl -X POST http://localhost:11434/api/generate -H "Content-Type: application/json" -d "{\"model\": \"qwen2.5:7b\", \"keep_alive\": -1}"
# To unload a model and free up memory
# !curl -X POST http://localhost:11434/api/generate -H "Content-Type: application/json" -d "{\"model\": \"qwen2.5:7b\", \"keep_alive\": 0}"

EMBEDDING_MODEL_NAME = "thenlper/gte-small"
# Load embeddings
embedding_model = HuggingFaceEmbeddings(
    model_name=EMBEDDING_MODEL_NAME,
    multi_process=True,
    model_kwargs={"device": "cpu"},  # replace 'cpu' by 'cuda' if you have Nvidia gpu
    encode_kwargs={"normalize_embeddings": True},  # Set `True` for cosine similarity
)
KNOWLEDGE_VECTOR_DATABASE = FAISS.load_local("../outputs/rag_embeddings_thenlper_gte-small", embedding_model, allow_dangerous_deserialization=True)

{"model":"qwen2.5:7b","created_at":"2025-03-27T10:01:18.5565449Z","response":"","done":true,"done_reason":"load"}


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   154  100   113  100    41    495    179 --:--:-- --:--:-- --:--:--   675
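
The same preload and unload requests can also be issued from Python rather than through `curl`. The following is a minimal sketch using only the standard library. The helper names `build_keep_alive_payload` and `set_keep_alive` are hypothetical, and it assumes the Ollama server listens on `http://localhost:11434` as in the cell above.

```python
import json
import urllib.request

OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"  # same endpoint as the curl call


def build_keep_alive_payload(model, keep_alive):
    """Build the JSON body: keep_alive=-1 keeps the model loaded, 0 unloads it."""
    return json.dumps({"model": model, "keep_alive": keep_alive}).encode("utf-8")


def set_keep_alive(model, keep_alive, url=OLLAMA_GENERATE_URL):
    """Send the request to a running Ollama server and return the decoded reply."""
    request = urllib.request.Request(
        url,
        data=build_keep_alive_payload(model, keep_alive),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (requires a running server, so it is left commented out):
# set_keep_alive("qwen2.5:7b", -1)   # preload
# set_keep_alive("qwen2.5:7b", 0)    # unload
```

Building the payload in a separate function keeps the request body easy to inspect without contacting the server.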


## Fake data from the database (BDD) for tests

In [29]:
# In this example, the 3 questions do not come from the same subcategory
# MAGB, I'll leave writing this function to you

questions_mcq_from_bdd = """ 
{
        "question": "An international application is published, together with the search report drawn up by the CNIPA, 18 months + 1 day after the filing date (no priority claimed). The application has no more than 35 pages and 15 claims. Which statement reflects all actions the CN applicant needs to take for entry into the EP phase 25 months after the filing date? A request for early processing has been filed.",
        "options": [
            "A. Complete and file Form 1200 and pay the filing fee and the search fee",
            "B. Complete and file Form 1200 and pay the filing fee, the search fee and the renewal fee for the third year",
            "C. Complete and file Form 1200, pay the filing fee and the search fee and appoint a representative",
            "D. None of the above statements"
        ]
}
{
        "question": "A European patent application was filed on 6 February 2023. The search, filing and designation fees were paid within a month of the date of filing. What is the latest point in time for withdrawing the application, if the applicant wishes to obtain a refund of the designation fee?",
        "options": [
            "A. Six months after the date of mention of the publication of the European search report",
            "B. Date of mention of the publication of the European search report",
            "C. The designation fee was validly paid and can no longer be refunded",
            "D. Date of the start of substantive examination"
        ]
}
{
        "question": "On 10 October 2024, an applicant files a request for entry into the European phase, together with a debit order, according to which the filing fee, the designation fee, the examination fee and the renewal fee for the third year are to be debited from the applicant's deposit account. It is specified that the debit order is to be executed on 18 October 2024. On the evening of 10 October 2024, the applicant notices that the renewal fee is not yet due and should not be debited from the deposit account. What is the latest point in time for revoking the order to debit the renewal fee in Central Fee Payment (CFP)?",
        "options": [
            "A. 10 October 2024",
            "B. 17 October 2024",
            "C. 18 October 2024",
            "D. A debit order cannot be revoked in part"
        ]
}"""

questions_open_from_bdd = """ 
International application WO-X was filed at the EPO on 27 August 2024. No fees have been paid. 1. What fees are due on filing for WO-X? Fee amounts need not be mentioned. 2. What is the time limit for paying these fees? 3. What happens if these fees are not paid within the time limit, and what can you do about it?

On 25 October 2019, the Spanish University Isabel II and the company Tomato Matters filed a European patent application in Spanish, accompanied by a translation into English. Tomato Matters employs more than 260 employees. The University Isabel II has filed two patent applications with the EPO over the past five years. On 10 October 2024, Tomato Matters transfers its rights to Naranjas Navel, a company which employs 9 members of staff and whose annual turnover is EUR 1 million. Naranjas Navel has never filed any patent applications with the EPO. In a communication from the EPO under Rule 71(3) EPC dated 10 October 2024, the name of the applicants is given as: Isabe III (clerical error) and Tomato Matters. 1. What has to be done to obtain a Unitary Patent as soon as possible for Isabel II and Naranjas Navel? Is it possible to benefit from the compensation scheme? Please list the necessary steps at minimum cost. You should identify the fees that have to be paid, but you do not need to specify their amounts. 2. Let us now suppose that the request for unitary effect has been refused. What is the time limit for lodging an application to reverse this decision, and to whom should the application be addressed?

In March 2018, a European patent application was filed in French. A European patent was granted in June 2023. Unitary effect has been registered and the proprietor has filed a statement concerning licences of right. The patent has also been validated in Spain and in Croatia; the European patent is still in force in these states. The proprietor filed a request for limitation of the patent. The examining division has issued an interlocutory decision, indicating that the patent with amended claims and an amended description meets the requirements of the EPC. The mention of the limitation will be published in the last European Patent Bulletin of 2024. 1. To maintain the existing patents, what translations must be filed, at which offices? 2. Do any fees have to be paid? Fee amounts need not be mentioned.
"""

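The MCQ examples above are stored as several JSON objects written back to back in one string, which `json.loads` cannot parse in a single call. If they ever need to be handled individually, one possible approach is sketched below. `parse_concatenated_json` is a hypothetical helper, and the sample data is invented for illustration.

```python
import json


def parse_concatenated_json(text):
    """Split a string of back-to-back JSON objects into a list of Python objects."""
    decoder = json.JSONDecoder()
    objects = []
    position = 0
    while position < len(text):
        # Skip whitespace between objects
        if text[position].isspace():
            position += 1
            continue
        # raw_decode parses one value and reports where it ended
        obj, position = decoder.raw_decode(text, position)
        objects.append(obj)
    return objects


sample = """
{"question": "Q1?", "options": ["A. x", "B. y"]}
{"question": "Q2?", "options": ["A. z", "B. w"]}
"""
parsed = parse_concatenated_json(sample)
print(len(parsed), parsed[0]["question"])
```
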
## *get_context*

In [30]:
def get_context(query, k=5, KNOWLEDGE_VECTOR_DATABASE=KNOWLEDGE_VECTOR_DATABASE):
    """ 
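    Retrieves relevant context for a given query.

    Parameters:
    query (str): The input query for which context is needed.
    k (int, optional): The number of relevant context elements to retrieve (default is 5).
    KNOWLEDGE_VECTOR_DATABASE (FAISS, optional): The vector store to search (defaults to the index loaded above).

    Returns:
    list: The k most similar documents (each exposes `page_content` and `metadata`).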
    """

    # Retrieve docs
    retrieved_docs = KNOWLEDGE_VECTOR_DATABASE.similarity_search(query=query, k=k)

    return retrieved_docs
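
To make the retrieval step more concrete, here is a toy, pure-Python illustration of ranking documents by cosine similarity on normalized vectors. It only conveys the ranking idea and is not how FAISS implements its search. `top_k_by_cosine` and the two-dimensional vectors are assumptions made for the example.

```python
import math


def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def top_k_by_cosine(query_vector, doc_vectors, k=5):
    """Return the indices of the k document vectors most similar to the query."""
    query = normalize(query_vector)
    scores = []
    for index, vector in enumerate(doc_vectors):
        doc = normalize(vector)
        # On unit vectors, the dot product equals the cosine similarity
        scores.append((sum(q * d for q, d in zip(query, doc)), index))
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return [index for _, index in scores[:k]]


toy_docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
print(top_k_by_cosine([0.9, 0.1], toy_docs, k=2))
```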

## *generate_mcq*

In [31]:
import json
import re

def validate_json_format_mcq(llm_output, type):
    """
    Attempts to extract and validate a JSON structure from the LLM output.

    Parameters:
    llm_output (str): Raw output from the LLM.
    type (str): 'question' or 'answer'.

    Returns:
    dict: A valid JSON object if found and correctly formatted, otherwise None.
    """

    # Keys that must be present for each kind of output
    required_keys = {
        'question': ("question", "options"),
        'answer': ("Answer", "Justification"),
    }.get(type)
    if required_keys is None:
        return None

    # Greedy match from the first '{' to the last '}' in the output
    json_match = re.search(r'\{.*\}', llm_output, re.DOTALL)
    if not json_match:
        return None

    try:
        cleaned_json = json.loads(json_match.group())
    except json.JSONDecodeError:
        return None

    if all(key in cleaned_json for key in required_keys):
        return cleaned_json
    return None


def call_formatting_llm_mcq(llm_output, type):
    """
    Calls an LLM specialized in formatting text into the correct JSON format.

    Parameters:
    llm_output (str): Raw output from the initial LLM.
    type (str): question or answer

    Returns:
    dict: A valid JSON object with the expected keys, or None if the formatted output is still invalid.
    """

    if type == 'question':
        SYSTEM_PROMPT = """You are an AI specialized in converting multiple-choice legal questions into JSON format.
        Ensure the output strictly follows this structure:
        ```json
        {"question": "...", "options": ["A ....", "B ...", "C ...", "D ..."]}
        ```
        """

    elif type == 'answer':
        SYSTEM_PROMPT = """You are an AI specialized in converting answers to multiple-choice legal questions into JSON format.
        Ensure the output strictly follows this structure:
        ```json
        {"Answer": "...", "Justification": "..."}
        ```
        """

    user_prompt = f"""
    The following text needs to be formatted as valid JSON:
    {llm_output}
    
    Please convert it into the required JSON format.
    """

    response = chat(model=MODEL, messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ])
    
    return validate_json_format_mcq(response['message']['content'], type)


def clean_generate_mcq_output(llm_output, type):
    """
    Cleans and extracts a valid JSON object (MCQ question or answer) from the LLM output.
    If the initial output is not valid JSON, a specialized LLM is called to correct it.

    Parameters:
    llm_output (str): Raw output from the LLM.
    type (str): 'question' or 'answer'.

    Returns:
    dict: A properly formatted multiple-choice question or answer.

    Raises:
    ValueError: If neither the raw output nor the reformatted output is valid.
    """
    result = validate_json_format_mcq(llm_output, type)
    if result:
        return result
    
    # If not valid, call formatting LLM
    formatted_result = call_formatting_llm_mcq(llm_output, type)
    if formatted_result:
        return formatted_result
    
    raise ValueError("Failed to convert LLM output into valid JSON format.")
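
The extraction step relies on a greedy regular expression that spans from the first `{` to the last `}`. The standalone sketch below shows what that implies in practice: text wrapped around a single object is tolerated, while two separate objects in the same output are rejected. `extract_json_object` is a hypothetical, simplified stand-in for the validation function above, and the sample strings are invented.

```python
import json
import re


def extract_json_object(text, required_keys):
    """Parse the first-to-last-brace span as JSON; return it if all keys are present."""
    match = re.search(r'\{.*\}', text, re.DOTALL)
    if not match:
        return None
    try:
        candidate = json.loads(match.group())
    except json.JSONDecodeError:
        return None
    if all(key in candidate for key in required_keys):
        return candidate
    return None


# Prose and a Markdown fence around a single object are ignored by the regex
wrapped = 'Here is the question:\n```json\n{"question": "Q?", "options": ["A. x", "B. y"]}\n```'
print(extract_json_object(wrapped, ("question", "options")))

# Two separate objects make the greedy span invalid JSON, so this returns None
two_objects = '{"question": "Q1", "options": []} and {"question": "Q2", "options": []}'
print(extract_json_object(two_objects, ("question", "options")))
```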

In [32]:
def generate_mcq(questions):
    """
    Generates an MCQ question.

    Parameters:
    questions (str): String of validated MCQ questions from one subcategory, used as examples.

    Returns:
    question (dict): {'question': '...',
                      'options': ['A ....', 'B ...', ...]}
    """

    # Retrieve context
    retrieved_docs = get_context(questions, k=3)
    context = "\nExtracted documents:\n"
    # Double quotes inside the f-string keep it valid on Python versions before 3.12
    context += "".join([f'Content: {doc.page_content} \nSource: {doc.metadata["ref"]}\n\n' for doc in retrieved_docs])
    # context_sources = "".join([f'\nSource: {doc.metadata["ref"]}, Url: {doc.metadata.get("url", "N/A")}' for doc in retrieved_docs])


    # Build prompt
    SYSTEM_PROMPT = f"""
    You are an AI specialized in generating multiple-choice legal questions based on given legal texts.
    ### Instructions:
    - Generate a new legal multiple-choice question based on the provided context.
    - Ensure the question aligns with the style and complexity of the given examples.
    - Provide four answer options (A, B, C, D), with only one being correct.
    - Format the output strictly as a JSON object with the following structure:
        ```json
        {{"question": "...", "options": ["A ....", "B ...", "C ...", "D ..."]}}
        ```
    """

    user_prompt = f"""
    ### Context:
    {context}

    ### Examples of Previous Questions:
    {questions}

    Generate a new question that follows the same format and is correct based on the context. Write it in a json.
    """

    # Try up to max_attempts times to get a well-formed question
    attempt_count = 0
    max_attempts = 3  # Limit number of attempts to prevent infinite loops

    while attempt_count < max_attempts:
        question_mcq = chat(model=MODEL,
                            messages=[{"role":"system", "content":SYSTEM_PROMPT},
                                      {"role":"user","content":user_prompt}],
                            options = {"num_predict":MAX_OUTPUT_TOKENS}
                            )
        
        # Put question in correct json format
        try:
            cleaned_question_mcq = clean_generate_mcq_output(question_mcq['message']['content'], type='question')
            return cleaned_question_mcq  # If valid, return it
        except ValueError:
            attempt_count += 1  # Increment attempt count
            print(f"Attempt {attempt_count} failed. Retrying...")
    
    # If all attempts fail, raise an exception
    raise ValueError("Failed to generate a valid MCQ after multiple attempts.")
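
The generate-then-validate loop above can be expressed as a small reusable pattern. The sketch below reproduces it with stand-in functions instead of real LLM calls. `retry_until_valid`, `fake_generate`, and `fake_clean` are hypothetical names introduced only for this illustration.

```python
def retry_until_valid(generate, clean, max_attempts=3):
    """Call generate() until clean() accepts its output, at most max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        raw_output = generate()
        try:
            return clean(raw_output)
        except ValueError:
            print(f"Attempt {attempt} failed. Retrying...")
    raise ValueError("Failed to produce a valid output after multiple attempts.")


# Stand-ins for the LLM call and the cleaning step (for illustration only)
fake_outputs = iter(["not json", "still not json", '{"question": "Q?", "options": []}'])


def fake_generate():
    return next(fake_outputs)


def fake_clean(text):
    if not text.startswith("{"):
        raise ValueError("invalid")
    return text


result = retry_until_valid(fake_generate, fake_clean)
print(result)
```

Bounding the number of attempts matters because each retry is a full model call.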

## *generate_mcq_answer*

In [33]:
def generate_mcq_answer(question):
    """
    Generates an answer for an MCQ question.

    Parameters:
    question (str or dict): The input question for which an answer is needed.

    Returns:
    answer (dict): {'Answer': '...', 'Justification': '...'}, with the sources used appended to the justification.
    """

    # Convert the question to a string, in case it was passed as a dict.
    question = str(question)

    # Retrieve context
    retrieved_docs = get_context(question, k=5)
    context = "\nExtracted documents:\n"
    # Double quotes inside the f-strings keep them valid on Python versions before 3.12
    context += "".join([f'Content: {doc.page_content} \nSource: {doc.metadata["ref"]}\n\n' for doc in retrieved_docs])
    context_sources = "".join([f'\nSource: {doc.metadata["ref"]}, Url: {doc.metadata.get("url", "N/A")}' for doc in retrieved_docs])

    # Build prompt
    SYSTEM_PROMPT = f"""
    You are an AI specialized in answering legal multiple-choice questions based on provided legal texts.
    ### Instructions:
    - When given a multiple-choice legal question, provide the correct answer followed by an explanation.
    - Your answer should begin with the correct choice (e.g., "Answer A").
    - After that, explain why this choice is correct based on the provided legal context.
    - Then, explain why the other choices (B, C, D) are incorrect, using relevant legal reasoning from the context.
    - Use the legal context provided to back up your reasoning.
    - Make sure to clearly distinguish between the correct answer and the incorrect ones.
    """

    user_prompt = f"""
    ### Context:
    {context}

    ### Legal Question:
    {question}

    Answer the question by:
    1. Starting with the correct answer (e.g., "Answer A").
    2. Explaining why this choice is correct according to the provided legal text.
    3. Explaining why the other options (B, C, D) are incorrect based on the legal context.
    """

    # Initial attempt to get the answer
    attempt_count = 0
    max_attempts = 3  # Limit number of attempts to prevent infinite loops

    while attempt_count < max_attempts:
        answer_mcq = chat(model=MODEL,
                            messages=[{"role":"system", "content":SYSTEM_PROMPT},
                                      {"role":"user","content":user_prompt}],
                            options = {"num_predict":MAX_OUTPUT_TOKENS}
                            )

        # Put answer in correct json format
        try:
            cleaned_answer_mcq = clean_generate_mcq_output(answer_mcq['message']['content'], type='answer')
            # Add context to Justification
            cleaned_answer_mcq['Justification'] += f'\n\nSources:\n{context_sources}'
            return cleaned_answer_mcq  # If valid, return it
        except ValueError:
            attempt_count += 1  # Increment attempt count
            print(f"Attempt {attempt_count} failed. Retrying...")
    
    # If all attempts fail, raise an exception
    raise ValueError("Failed to generate a valid MCQ answer after multiple attempts.")
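
The context block and the source list are built the same way in several functions of this notebook. A possible way to factor that out is sketched below. `format_context`, `format_sources`, and the `FakeDoc` stand-in are hypothetical, and the sample reference is invented. Real retrieved documents are only assumed to expose `page_content` and `metadata`, as they do in the code above.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Stand-in exposing the two attributes the notebook reads from retrieved documents."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_context(docs):
    """Render documents as the 'Extracted documents' block placed in the prompt."""
    body = "".join(
        f'Content: {doc.page_content} \nSource: {doc.metadata["ref"]}\n\n' for doc in docs
    )
    return "\nExtracted documents:\n" + body


def format_sources(docs):
    """Render one 'Source: ..., Url: ...' entry per document."""
    return "".join(
        f'\nSource: {doc.metadata["ref"]}, Url: {doc.metadata.get("url", "N/A")}' for doc in docs
    )


docs = [FakeDoc("Text of the provision.", {"ref": "Rule 1"})]
print(format_context(docs))
print(format_sources(docs))
```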

## *generate_open*

In [34]:
def generate_open(questions):
    """
    Generates an Open question.

    Parameters:
    questions (str): String of validated questions from one subcategory, used as examples.

    Returns:
    question (str): The new question.
    """
    

    # Retrieve context
    retrieved_docs = get_context(questions, k=5)
    context = "\nExtracted documents:\n"
    # Double quotes inside the f-string keep it valid on Python versions before 3.12
    context += "".join([f'Content: {doc.page_content} \nSource: {doc.metadata["ref"]}\n\n' for doc in retrieved_docs])
    # context_sources = "".join([f'\nSource: {doc.metadata["ref"]}, Url: {doc.metadata.get("url", "N/A")}' for doc in retrieved_docs])


    # Build prompt
    SYSTEM_PROMPT = f"""You are an AI designed to generate legal questions based on the provided legal context. Your task is to generate a **detailed legal scenario** followed by **three to five structured questions**, ensuring that all questions can be answered using the given legal texts.

    ### **Instructions:**
    1. **Replicate the Structure**:
    - Review the example questions provided and follow the same format.
    - Start with a **detailed contextualization** of the scenario (at least 3-5 sentences).
    - Follow the scenario with **three to five sub-questions** labeled (a), (b), (c), etc.

    2. **Ensure Answerability**:
    - The generated questions must be fully answerable using the provided legal texts.
    - Ensure that each question directly relates to legal principles or procedures found in the given legal extracts.

    3. **Maintain Complexity & Relevance**:
    - Use real-world legal situations and terminology.
    - Keep the complexity and depth similar to the example questions.

    ### **Example Question Format:**
    *"On 12 August 2022, a divisional European patent application EP-F3 is filed in Italian per fax by three joint applicants: A, B, and C. On 12 September 2022, a translation of EP-F3 in the language of the proceedings of its parent application is filed. EP-F3's parent application is EP-F2, which is a divisional application of EP-F1. EP-F3 comprises 1 page abstract, 40 pages description, and 2 pages with 13 claims. A is an Italian university. B is an Italian enterprise which employs 500 persons, and which has an annual turnover of EUR 40 million and an annual balance sheet total of EUR 40 million. C is an Italian national resident in the USA. On 4 October 2022, a noting of loss of rights is sent because no fees have been paid. A transfer of rights is planned for 19 December 2022: Applicant B will transfer its rights in respect of EP-F3 to applicant C.*

    **a.** What procedural steps must be taken for the transfer of rights to be recorded?  
    **b.** Under what circumstances is the filing in Italian valid? What steps need to be taken and what fees need to be paid to ensure that EP-F3 remains pending?  
    **c.** What needs to be done if the applicants want to pay the examination fee at the reduced rate provided for in Article 14(1) of the Rules relating to Fees?"  
    """

    user_prompt = f"""### **Example Questions:**
    {questions}

    ### **Legal Text Extracts:**
    {context}

    ### **Generate a New Question:**
    - Create a **detailed scenario** (3-5 sentences) based on the provided legal context.
    - Follow the scenario with **three to five structured sub-questions** (labeled a, b, c, etc.).
    - Ensure that each sub-question is **answerable using the provided legal texts** and **maintains the complexity of the example questions**.
    - The output should **only contain the generated question**, without additional explanations.
    """

    # Generate the question
    question_open = chat(model=MODEL,
                            messages=[{"role":"system", "content":SYSTEM_PROMPT},
                                      {"role":"user","content":user_prompt}],
                            options = {"num_predict":MAX_OUTPUT_TOKENS}
                            )

    return question_open['message']['content']
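
Unlike the MCQ functions, `generate_open` returns the raw model output without checking it. If a light sanity check were wanted, one option is to count the sub-question labels the prompt asks for. The sketch below is a heuristic under that assumption: `count_sub_questions` and `has_expected_structure` are hypothetical helpers, the sample text is invented, and a line that happens to start with something like "e.g." would be miscounted.

```python
import re

# Matches labels such as "a.", "(b)", "**c.**" or "d)" at the start of a line
SUB_QUESTION_LABEL = re.compile(r'^\s*\**\(?([a-e])[.)]', re.MULTILINE)


def count_sub_questions(generated_question):
    """Count the distinct sub-question labels (a to e) found at line starts."""
    labels = SUB_QUESTION_LABEL.findall(generated_question)
    return len(set(labels))


def has_expected_structure(generated_question, minimum=3, maximum=5):
    """Check that the text contains between minimum and maximum sub-questions."""
    return minimum <= count_sub_questions(generated_question) <= maximum


sample = """On 1 March 2024, applicant A files an application.

**a.** What fees are due?
**b.** What is the time limit?
(c) What happens if the fees are not paid?
"""
print(count_sub_questions(sample), has_expected_structure(sample))
```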

## *generate_open_answer*

In [35]:
def generate_open_answer(question):
    """
    Generates an answer to a given open question.

    Parameters:
    question (str): The input question for which an answer is needed.

    Returns:
    answer (str): The generated response from the AI with the context used.
    """


    # Retrieve context
    retrieved_docs = get_context(question)
    context = "\nExtracted documents:\n"
    # Double quotes inside the f-strings keep them valid on Python versions before 3.12
    context += "".join([f'Content: {doc.page_content}\nSource: {doc.metadata["ref"]}\n' for doc in retrieved_docs])
    context_sources = "".join([f'\nSource: {doc.metadata["ref"]}, Url: {doc.metadata.get("url", "N/A")}' for doc in retrieved_docs])


    # Build prompt
    SYSTEM_PROMPT = f"""You are an AI specialized in answering open-ended legal questions based on provided legal texts. Your task is to generate a detailed, accurate, and well-reasoned answer to the given question using the provided legal texts. Every answer must be supported by specific legal sources from the context provided.
    ### Instructions:
    1. **Answer Generation**:
    - Provide a clear, well-explained answer to the user's legal question.
    - The answer must strictly be based on the provided legal texts. Do not include any additional information not supported by the given texts.
    - For each part of the answer, explain how the relevant legal sources from the context support your reasoning.
    
    2. **Source Citation**:
    - After each point in the answer, cite the specific legal text(s) that were used to form that part of the answer.
    - Cite articles, sections, or specific clauses of the law, clearly linking them to the answer.
    
    3. **Explanation of Relevance**:
    - For each source used, provide a brief explanation of why that particular legal text is relevant to the question and how it supports the answer.
    
    4. **Validity**:
    - Your answer is only valid if it is directly supported by the legal texts provided in the context.

    5. **Legal Terminology**:
    - Use correct legal terminology and ensure clarity when referencing legal sources.

    ### Example Answer Flow:
    **Question**: "What conditions must be met for a contract to be voidable due to duress under the Civil Code?"

    **Answer**:
    - A contract may be voidable if one party was under duress, but this duress must be severe enough to impair the will of the affected party. According to Article 123 of the Civil Code, duress must be such that the affected party was left with no free choice in entering the contract.
    - **Source**: Article 123 of the Civil Code states: "A contract may be voidable if it was entered into under duress, provided that the duress was so severe that it compromised the free will of the affected party."
    - **Explanation of Relevance**: This article defines duress and explicitly ties the concept to the condition that it must be severe enough to affect free will. The wording "so severe" emphasizes that the severity of duress is a key factor in determining the validity of the contract.

    **Question**: "Can a contract be voidable due to lack of consent?"

    **Answer**:
    - Yes, under the Civil Code, a contract may be voidable if one party lacked the capacity to give consent. This includes situations where the individual was unable to understand the nature of the contract. Article 123 of the Civil Code outlines that contracts entered into by individuals lacking the legal capacity to understand the terms are voidable.
    - **Source**: Article 123 of the Civil Code states: "A contract is voidable when one party lacks the legal capacity to understand the terms of the agreement."
    - **Explanation of Relevance**: This article provides the legal basis for the voidability of a contract when consent is impaired due to the lack of understanding, which directly addresses the question about lack of consent.

    ### Example Legal Texts:
    - **Legal Text 1**: "A contract may be voidable if it was entered into under duress, provided that the duress was so severe that it compromised the free will of the affected party."
    - **Legal Text 2**: "A contract is voidable when one party lacks the legal capacity to understand the terms of the agreement, as specified in Article 123 of the Civil Code."
    """
    
    user_prompt = f"""### Legal Texts:
    {context}

    ### Question:
    {question}

    ### Answer:
    Please provide a detailed, accurate answer to the question. 
    1. Cite the relevant legal text(s) used in your answer.
    2. For each citation, explain why that source is relevant to the answer and how it supports your reasoning.
    3. Ensure your answer is strictly based on the legal texts provided. If the question cannot be answered using the available legal texts, state that explicitly.
    4. Use correct legal terminology and ensure clarity when referencing the sources.
    """

    # Write the answer
    answer = chat(model=MODEL,
                            messages=[{"role":"system", "content":SYSTEM_PROMPT},
                                      {"role":"user","content":user_prompt}],
                            options = {"num_predict":MAX_OUTPUT_TOKENS}
                            )


    # Assemble answer and context_sources
    final_answer = f'{answer["message"]["content"]}\n\nSources:{context_sources}'

    return final_answer

## *generate_feedback*

In [36]:
def generate_feedback(ai_question, ai_answer, user_answer):
    """
    Generates AI feedback on the user's answer.
    The ai_question and ai_answer were generated beforehand: when an ai_question is given to a user,
    the matching ai_answer is kept as well. Once the user has answered, the ai_question, the ai_answer
    (the correct one) and the user_answer are all passed to the model to produce the feedback.

    Parameters:
    ai_question (str): The question generated by AI.
    ai_answer (str): The correct answer.
    user_answer (str): The user's answer.

    Returns:
    feedback (str): The correct answer and an explanation of why the user is right or wrong, followed by the sources used.
    """

    # Convert the question and the answer to strings, in case they were passed as dicts.
    ai_question = str(ai_question)
    ai_answer = str(ai_answer)

    # Retrieve context
    ai_question_context = get_context(ai_question, k=3)
    ai_answer_context = get_context(ai_answer, k=3)
    user_answer_context = get_context(user_answer, k=3)
    # Combine all retrieved contexts
    all_contexts = ai_question_context + ai_answer_context + user_answer_context
    context = "\nExtracted documents:\n"
    # Double quotes inside the f-strings keep them valid on Python versions before 3.12
    context += "".join([f'Content: {doc.page_content} \nSource: {doc.metadata["ref"]}\n\n' for doc in all_contexts])
    context_sources = "".join([f'\nSource: {doc.metadata["ref"]}, Url: {doc.metadata.get("url", "N/A")}' for doc in all_contexts])

    # Build prompt
    SYSTEM_PROMPT = f"""You are an AI designed to provide feedback on legal answers, both for multiple-choice questions (MCQs) and open-ended responses. When a user answers a question, your task is to explain whether their answer is correct or incorrect, using the legal context and specific sources to support your feedback.
    ### Instructions:
    - If the user answers a multiple-choice question (MCQ):
        1. Start by acknowledging the user's chosen answer.
        2. If the user's answer is correct, explain why it is correct using the legal context and cite relevant legal sources.
        3. If the user's answer is incorrect, explain why it is wrong, referencing specific legal articles or sections from the context.
        4. Provide the correct answer and back it up with legal reasoning from the context.
        5. If the user selected a partially correct answer, explain the distinction and provide clarification on what was missed.

    - If the user answers an open-ended question:
        1. Acknowledge the user's answer and assess its correctness.
        2. If the answer is correct, explain why it is correct using relevant legal context and sources.
        3. If the answer is incorrect or incomplete, explain where it went wrong, citing the legal context and relevant articles or sections.
        4. Provide the correct explanation and elaborate on any nuances or details the user might have missed.
        5. If the user is partially correct, explain what is correct and where they need to elaborate or correct their understanding.

    ### Example Response for an MCQ:
    User Answer: "Answer B."

    If the answer is wrong:
    - Start with: "Your answer, 'Answer B', is incorrect."
    - Explain the error: "According to Article 123 of the Civil Code, the correct interpretation is..."
    - Provide the correct answer: "The correct answer is 'Answer A' because..."
    - Cite specific legal sources: "As stated in Article 45 of the Civil Code, the situation described aligns with..."

    If the answer is partially correct:
    - Start with: "Your answer, 'Answer B', is partially correct."
    - Explain the partial correctness: "You correctly identified that the issue involves Article 123, but the application to the scenario is incomplete."
    - Clarify the distinction: "The key point is that Article 123 applies in a different context. Therefore, the correct answer is 'Answer A.'"

    ### Example Response for Open-Ended Questions:
    User Answer: "The law allows the contract to be voided if it was signed under duress."

    If the answer is wrong:
    - Start with: "Your answer is incorrect."
    - Explain the error: "While duress may lead to a contract being voidable, it is important to note that the law specifically requires that the duress must have been severe enough to affect the will of the person involved, as outlined in Article 123 of the Civil Code."
    - Provide the correct explanation: "The contract can only be voided if it meets the specific conditions outlined in Article 123, which states that..."

    If the answer is partially correct:
    - Start with: "Your answer is partially correct."
    - Explain the correct parts: "You are right that duress can impact the validity of a contract."
    - Clarify the missed details: "However, the law also specifies that the duress must have been significant enough to prevent free consent. Therefore, the correct interpretation includes this additional detail."
    """
    
    user_prompt = f"""### Context:
    {context}
    
    ### Correct Answer:
    {ai_answer}

    ### User's Answer:
    {user_answer}

    ### Legal Question:
    {ai_question}

    ### Instructions:
    - Provide feedback on the user's answer (both for multiple-choice and open-ended responses).
    - If it's an MCQ, explain why the answer is correct or incorrect, using legal context and citing relevant articles.
    - If it's an open-ended response, assess whether the answer is correct or not, and explain using the legal context and articles.
    - Provide the correct answer or explanation and back it up with legal sources from the context.
    """

    # Write the feedback
    feedback = chat(model=MODEL,
                            messages=[{"role":"system", "content":SYSTEM_PROMPT},
                                      {"role":"user","content":user_prompt}],
                            options = {"num_predict":MAX_OUTPUT_TOKENS}
                            )
                    

    # Assemble final answer
    final_answer = f'{feedback["message"]["content"]}\n\nContext:{context_sources}'

    return final_answer
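
Because the context here is the concatenation of three separate searches, the same document can appear more than once in the prompt and in the source list. A possible order-preserving deduplication step is sketched below. `deduplicate_docs` and `FakeDoc` are hypothetical, and real documents are only assumed to expose `page_content` and `metadata["ref"]` as in the code above.

```python
def deduplicate_docs(docs):
    """Keep the first occurrence of each (ref, content) pair, preserving order."""
    seen = set()
    unique_docs = []
    for doc in docs:
        key = (doc.metadata.get("ref"), doc.page_content)
        if key not in seen:
            seen.add(key)
            unique_docs.append(doc)
    return unique_docs


class FakeDoc:
    """Stand-in for a retrieved document (for illustration only)."""

    def __init__(self, page_content, ref):
        self.page_content = page_content
        self.metadata = {"ref": ref}


combined = [FakeDoc("text A", "ref-1"), FakeDoc("text B", "ref-2"), FakeDoc("text A", "ref-1")]
print([doc.metadata["ref"] for doc in deduplicate_docs(combined)])
```

Applying such a step before building the context would shorten the prompt when the three searches overlap.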

## *chat_with_ai*

In [37]:
def chat_with_ai(conversation_history, user_message):
    """
    Answers a new user message based on the conversation history, which is initially filled with the question, user_answer and feedback.

    Parameters:
    conversation_history (str): Initially the question, user_answer and feedback. New messages are appended to the history.
    user_message (str): New message from the user.

    Returns:
    answer (str): The answer to the user_message, based on the context from the history.
    context_sources (str): The context used to answer, with the real links.
    """


    # Retrieve context
    conversation_history_context = get_context(conversation_history, k=5)
    user_message_context = get_context(user_message, k=3)
    # Combine all retrieved contexts
    all_contexts = conversation_history_context + user_message_context
    context = "\nExtracted documents:\n"
    context += "".join([f"Content: {doc.page_content} \nSource: {doc.metadata['ref']}\n\n" for doc in all_contexts])
    context_sources = "".join([f"\nSource: {doc.metadata['ref']}, Url: {doc.metadata.get('url', 'N/A')}" for doc in all_contexts])

    # Build prompt
    SYSTEM_PROMPT = f"""You are an AI specialized in helping users understand legal concepts and answer legal questions. The conversation history and legal texts provided are your sources for generating responses. Your role is to engage in an ongoing conversation with the user, answering their questions, explaining legal concepts, and clarifying misunderstandings based on the legal context provided.

    ### Instructions:
    1. **Conversation History**: Refer to the conversation history as context for understanding the user's current question or doubt. Always base your responses on the conversation history and legal texts provided.
    2. **Legal Context**: Use the legal context (texts, articles, or sections) provided to answer questions or clarify points. If needed, quote specific legal articles or reference them when explaining a concept.
    3. **Discussion Flow**: Engage with the user in a conversational style. If they ask why their answer isn't correct, provide a detailed explanation using the legal context and reasoning.
    4. **Interactive Exploration**: Encourage the user to ask follow-up questions or seek further clarification about specific parts of the legal text. Offer suggestions to explore the legal texts together and make sure to reference specific articles when necessary.
    5. **Supportive Dialogue**: If the user's understanding of a legal concept or their answer is incorrect, explain where they went wrong and guide them towards the correct interpretation of the law. Use the legal context to back up your explanation.

    ### Example Flow:
    User: "I think the contract is voidable due to duress. Isn't that right?"
    AI: "Let's take a look at the legal context. According to Article 123 of the Civil Code, a contract may be voidable if one party was under severe duress. However, for duress to be a valid reason to void the contract, it must meet specific criteria. Let me walk you through the exact conditions outlined in the law."

    User: "But the text just mentions duress, doesn't it?"
    AI: "Yes, the term 'duress' is mentioned, but it's crucial to understand that the law specifies the severity of duress required. For instance, Article 123 requires the duress to be 'so severe that it compromises the freedom of choice of the person involved.' This distinction is important. Let's dive deeper into what 'severe' means under the law."
    """
    
    user_prompt = f"""### Conversation History:
    {conversation_history}

    ### Legal Context:
    {context}

    ### User's Question:
    {user_message}

    ### Instructions:
    - Provide a conversational response based on the conversation history and the legal context.
    - If the user has a misunderstanding or an incorrect answer, explain why it is wrong using the relevant legal text and guide them to the correct understanding.
    - Encourage the user to ask more questions if they need further clarification on specific legal points or sections.
    - Reference legal articles and sections as needed to back up your explanation.
    - Keep the conversation open and interactive, so the user feels comfortable discussing and exploring the legal concepts.
    """

    # Generate an answer
    answer = chat(model=MODEL,
                            messages=[{"role":"system", "content":SYSTEM_PROMPT},
                                      {"role":"user","content":user_prompt}],
                            options = {"num_predict":MAX_OUTPUT_TOKENS}
                            )

    # Assemble answer
    final_answer = f"{answer['message']['content']}\n\nContext:\n{context_sources}"

    return final_answer
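
Because `chat_with_ai` concatenates two retrieval results, the same passage may appear twice in the prompt and in the source list. Below is a minimal sketch of de-duplicating the documents before formatting them. It assumes each document exposes `page_content` and a `metadata` dict with a `ref` key, as in the code above. `Doc` and `dedupe_docs` are hypothetical names used only for illustration; `Doc` stands in for whatever objects `get_context` returns.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def dedupe_docs(docs):
    """Keep the first occurrence of each (ref, content) pair, preserving order."""
    seen = set()
    unique = []
    for doc in docs:
        key = (doc.metadata.get("ref"), doc.page_content)
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


# Illustrative data only: one passage is returned by both retrievals.
history_docs = [Doc("Text A", {"ref": "Rule 1"}), Doc("Text B", {"ref": "Rule 2"})]
message_docs = [Doc("Text A", {"ref": "Rule 1"}), Doc("Text C", {"ref": "Rule 3"})]

all_contexts = dedupe_docs(history_docs + message_docs)
context_sources = "".join(
    f"\nSource: {doc.metadata['ref']}, Url: {doc.metadata.get('url', 'N/A')}"
    for doc in all_contexts
)
print(context_sources)
```

Keying on both the reference and the text keeps distinct chunks of the same section, which may or may not be what is wanted here.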

# Example of an exchange

## MCQ

In [38]:
# Initialize history
history_mcq = ''

In [39]:
# Generate a new question
question_mcq = generate_mcq(questions_mcq_from_bdd)
history_mcq += f'Question:\n{question_mcq}'
print(question_mcq)

{'question': 'An applicant has filed a European patent application on 15 March 2023, but failed to pay the examination fee within the given time frame after receiving a communication from the EPO. The applicant wishes to proceed with further processing. According to the guidelines provided, what is the latest point in time for filing a request for further processing and which actions should be taken?', 'options': ['A. File the request by 15 September 2023 and pay the further processing fee, as per Rule 70(1).', 'B. File the request by 15 December 2023 and pay the further processing fee, as per Rule 70(1).', 'C. File the request by 15 March 2024 and pay the further processing fee, as per Rule 70(1).', 'D. None of the above statements']}


In [40]:
# Generate ai_answer (the correct one)
correct_answer_mcq = generate_mcq_answer(question_mcq)
history_mcq += f'\n\nCorrect answer:\n{correct_answer_mcq}'
print(correct_answer_mcq)

{'Answer': 'A', 'Justification': 'Option A is correct because it falls within a reasonable timeframe for responding to an EPO communication after receiving it. The applicant should aim to file their request promptly to avoid losing rights, as per Rule 39(1) and the guidelines provided in E-IX, 2.1. Options B, C, and D are incorrect based on the specified deadlines.\n\nSources:\n\nSource: Guidelines for Examination in the EPO, E-VIII, 2, Url: https://www.epo.org/en/legal/guidelines-epc/2024/e_viii_2.html\nSource: Guidelines for Examination in the EPO, C-II, 1, Url: https://www.epo.org/en/legal/guidelines-epc/2024/c_ii_1.html\nSource: Guidelines for Examination in the EPO, A-III, 15, Url: https://www.epo.org/en/legal/guidelines-epc/2024/a_iii_15.html\nSource: Guidelines for Examination in the EPO, A-III, 11.2.5, Url: https://www.epo.org/en/legal/guidelines-epc/2024/a_iii_11_2_5.html\nSource: Guidelines for Examination in the EPO, A-VI, 2.5, Url: https://www.epo.org/en/legal/guidelines-ep

In [41]:
# The user answer
# I used an answer from another question.
user_answer_mcq = """Answer D
Legal basis Rule 126(2) EPC The document was delivered to the addressee nine days after the date it bears, so the period expires later by the number of days by which the seven days were exceeded, i.e. 12 January 2025 + two days. Evidence of late receipt needs to be filed with the response.
"""
history_mcq += f'\n\nUser Answer:\n{user_answer_mcq}'

In [42]:
# Generate AI feedback
feed_back_mcq = generate_feedback(question_mcq, correct_answer_mcq, user_answer_mcq)
history_mcq += f'\n\nFeedback:\n{feed_back_mcq}'
print(feed_back_mcq)

### Feedback on User’s Answer:

The user's answer (Option D) is incorrect. According to Rule 70(1) EPC, a request for further processing must be filed within nine months of the date on which the applicant received the communication from the EPO regarding failure to pay the examination fee.

Here's an explanation with legal context and relevant sources:

### Explanation:
The correct answer is **Option A**. The user's selection (Option D) is incorrect because it does not adhere to the deadline specified by Rule 70(1) EPC, which requires that a request for further processing be filed within nine months of receiving the communication from the EPO.

- According to Rule 70(1) EPC: "If the applicant has failed to pay an amount due under Articles 96 or 98, 99 or 104, 135, 233, 234, 235, 237, 242, 283 or Rule 71(1), the EPO shall communicate that fact to him and inform him that he has nine months from receipt of this communication in which to pay the amount due."

Given the user's scenario:
- T

In [None]:
# Open chat
user_message_mcq = "I don't really understand why it's not the answer D."
chat_answer_mcq = chat_with_ai(history_mcq, user_message_mcq)
history_mcq += f'\n\nUser message:\n{user_message_mcq}'
history_mcq += f'\n\nChat answer:\n{chat_answer_mcq}'
print(chat_answer_mcq)

Hey there! It sounds like you're having some trouble understanding why option A isn't correct. Let's break it down step by step using the information from the Guidelines for Examination in the EPO and Case Law.

First, let’s look at the section on ambiguity (F-III, 11) where we discuss sufficiency of disclosure. The key point here is that an ambiguity in a claim can lead to different types of objections under Art. 83 and Art. 84 of the EPO Rules of Procedure.

### Ambiguity and Insufficiency Under Art. 83
- **Art. 83 (Sufficiency of disclosure)**: The claims must be sufficiently disclosed so that a person skilled in the art can carry out the invention without undue burden.
- **Insufficiency**: If an ambiguity makes it impossible to carry out the entire scope of the claim, then Art. 83 might apply.

### Ambiguity and Scope Under Art. 84
- **Art. 84 (Clarity)**: The claims must be clear and concise.
- **Ambiguity in Claims**: If an ambiguity does not affect the whole scope of the claim o
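
The cells above grow `history_mcq` by hand with repeated `+=` statements, which makes it easy to mix up a label and the variable it should carry. Here is a small sketch of a helper that appends one labelled turn at a time. `append_turn` is a hypothetical name, not part of this notebook, and the block format simply mirrors the strings used above.

```python
def append_turn(history, label, content):
    """Return the history with a new labelled block appended."""
    block = f"{label}:\n{content}"
    # The first block has no separator; later blocks are preceded by a blank line.
    return block if not history else f"{history}\n\n{block}"


# Illustrative values only.
history = ""
history = append_turn(history, "Question", "What is the time limit?")
history = append_turn(history, "Correct answer", "Two months.")
history = append_turn(history, "User Answer", "Three months.")
print(history)
```

Returning a new string rather than mutating a global keeps each cell re-runnable without silently duplicating turns, provided the history is re-initialized first.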

## Open

In [44]:
# Initialize history
history_open = ''

In [45]:
# Generate a new question
question_open = generate_open(questions_open_from_bdd)
history_open += f'Question:\n{question_open}'
print(question_open)

 Application XYZ was filed on March 31, 2024, in a non-official EPO language (Chinese). The applicant has not yet submitted the required translation of the application within two months from the filing date. 

a) According to Rule 6(1), what is the time limit for submitting the translation?

b) If the applicant fails to submit the translation by the deadline, under which provision will the application be deemed withdrawn?

c) What steps must the applicant take if they receive an invitation from the EPO under Rule 58 to rectify this deficiency?


In [46]:
# Generate ai_answer (the correct one)
correct_answer_open = generate_open_answer(question_open)
history_open += f'\n\nCorrect answer:\n{correct_answer_open}'
print(correct_answer_open)

 

**Answer:**

a) According to Rule 40(3), the time limit for submitting the translation is **two months from the filing date** of the application. The relevant legal text excerpt states:

"Rule 40(3): If the previously filed application is not in an official EPO language, the applicant must also file a translation into one such language within two months of the filing date (Rule 40(3))."

This rule directly addresses the situation where an application is initially filed in a non-official EPO language and mandates that a translation be submitted within this specified time frame.

b) If the applicant fails to submit the translation by the deadline, the application will be deemed withdrawn under **Article 14(2)**. This can be seen from the following relevant legal text excerpt:

"Rule 58: Where the translation is not filed in due time, the EPO will invite the applicant to rectify this deficiency within a non-extendable period of two months (see Art. 14(2))."

The failure to file the req

In [47]:
# The user answer
# I took an answer from another question; it is unrelated to this one.
user_answer_open = """Q1 The applicant has to request entry into the regional phase before the EPO either by using Form 1200 or in a separate letter giving all the information required by Form 1200. The applicant has to: select the box for early processing on Form 1200 pay the filing fees and search fees PCT-C was filed in Chinese, searched by the CNIPA as International Searching Authority and, in accordance with Rule 48.3(a) PCT, published in Chinese. Since the Euro-PCT application was published in 'another language', Article 153(4) EPC requires that a translation into German, English or French be filed. In accordance with Rule 159(1)(a) EPC, this translation has to be filed upon entry into the European phase, i.e. today. submit a translation of the amended claims filed under Article 34 PCT pay claims fees for 4 claims and the examination and designation fees The applicant has to specify the application documents on which the European grant procedure is to be based (Rule 159(1)(b) EPC). The applicant must file a valid request for examination, which includes paying the examination fee (see point 15 of the notice from the EPO concerning the request for early processing, OJ EPO 2013, 156). The applicant should waive the right to be asked under Rule 70(2) EPC whether it wishes to proceed further (see point 16 of the notice from the EPO), and waive the right to receive the communication under Rule 161(2) EPC, (see OJ EPO 2011, 354)."""
history_open += f'\n\nUser Answer:\n{user_answer_open}'

In [48]:
# Generate AI feedback
feed_back_open = generate_feedback(question_open, correct_answer_open, user_answer_open)
history_open += f'\n\nFeedback:\n{feed_back_open}'
print(feed_back_open)

### Feedback on the User’s Answer:

#### Multiple-Choice Question:
The user's answer for the multiple-choice question is mostly correct. The applicant needs to file a translation of the application in German, English, or French within two months from entry into the European phase (Rule 159(1)(a) EPC). However, there are some minor inaccuracies and unnecessary details.

#### Open-Ended Questions:
The user’s answer for the open-ended questions contains several correct elements but lacks detail and clarity. Here is a more structured response with explanations:

### Q1: Required Steps if PCT-C was filed in Chinese
- **Correct**: The applicant must submit a translation of the application into German, English, or French.
- **Correct**: This submission must be done upon entry into the European phase (Rule 159(1)(a) EPC).
- **Incorrect**: Paying claims fees and examination fees is not directly related to submitting the required translation for an initial non-official language filing. These are

In [49]:
# Open chat
user_message_open = "Explain me with more details and references."
chat_answer_open = chat_with_ai(history_open, user_message_open)
history_open += f'\n\nUser message:\n{user_message_open}'
history_open += f'\n\nChat answer:\n{chat_answer_open}'
print(chat_answer_open)

 It sounds like you're interested in understanding how claims involving "use" and defining a physical entity by reference to another entity are handled during examination at the European Patent Office (EPO). Let's break this down step-by-step.

### Use Claims

When it comes to use claims, such as "the use of substance X as an insecticide," these are essentially viewed as equivalent to process claims. Specifically, a use claim like this is interpreted as a process of killing insects using the substance X, not necessarily the substance itself in its final form (e.g., with additional additives). 

For example:
- **Claim:** "The use of a transistor in an amplifying circuit."
  - This is equivalent to: "A process of amplifying using a circuit containing the transistor."

It's important to note that these claims are not interpreted as referring to the specific setup or the process of building such a circuit. The key here is understanding what the claim is really about.

### Defining Physical

In [50]:
# Open chat
user_message_open = "Give me chocolate cake recipies."
chat_answer_open = chat_with_ai(history_open, user_message_open)
history_open += f'\n\nUser message:\n{user_message_open}'
history_open += f'\n\nChat answer:\n{chat_answer_open}'
print(chat_answer_open)

 It seems like you might have mixed up your requests! You asked for a chocolate cake recipe but we've been discussing legal guidelines related to patent applications. If you're interested in learning more about how inventions are protected or any other legal topic, feel free to ask! For now, let's talk about the chocolate cake recipe you were looking for.

Here’s a simple and delicious chocolate cake recipe that you can try:

### Ingredients:
- 125g softened butter
- 175g caster sugar
- 4 large eggs
- 170g self-raising flour
- 50g cocoa powder
- 1.5 tsp baking powder
- ½ tsp salt
- 2 tbsp milk
- A pinch of cream of tartar (optional, but it helps with a better texture)

### Method:
1. **Preheat** the oven to 180°C and grease and line a 23cm round cake tin.
2. In a large bowl, **cream together** the softened butter and sugar until light and fluffy.
3. **Beat in** the eggs one at a time, ensuring each is fully incorporated before adding the next.
4. Sift in the flour, cocoa powder, baking