<img src="https://raw.githubusercontent.com/Hilma-Corporation/logo/main/logo_transparent_background.png" alt="Hilma Corporation logo" style="width: 100%; max-width: 600px; height: auto;" />


**Notice:**

This code is the property of Hilma Corporation. Reproduction, distribution, or use of this code without the explicit consent of Hilma Corporation is strictly prohibited.


# Introduction

This Jupyter Notebook demonstrates how to generate frequently asked questions (FAQs) and their corresponding answers using the OpenAI API. The process involves generating questions based on given themes, querying the OpenAI API for answers, and saving the results in structured files for further use.

## Steps Involved

1. **Load Environment Variables**
   - Load the API key from a `.env` file to interact with the OpenAI API.

2. **Define Necessary Functions**
   - Define functions to query the OpenAI API and generate questions based on given themes.
   - Define functions to create training and validation files.

3. **Set Themes and Generate Questions**
   - Define the themes for which FAQs are to be generated.
   - Query the OpenAI API to generate 50 questions per theme.
   - Save the generated questions into an `instructions.json` file.

4. **Load Instructions and Generate Q&A Pairs**
   - Load the instructions from the `instructions.json` file.
   - Query the OpenAI API to generate an answer and a follow-up question for each question.
   - Save the results in structured training and validation files.

## Code Implementation

The implementation involves four main parts:

1. **Loading Environment Variables**: This step involves loading the API key from the `.env` file to ensure secure access to the OpenAI API.

2. **Defining Necessary Functions**: Functions are defined to query the OpenAI API to generate questions and responses. Another function is used to create training and validation files.

3. **Setting Themes and Generating Questions**: Themes are defined, and the OpenAI API is queried to generate 50 questions for each theme. The questions are then saved to an `instructions.json` file.

4. **Loading Instructions and Generating Q&A Pairs**: The instructions are loaded from the `instructions.json` file, and the OpenAI API is used to generate answers and follow-up questions for each instruction. The results are saved in structured training and validation files.


### Part 1

The provided Python code performs the following tasks:

1. **Imports necessary modules**:
   - `os`: This module provides a way to interact with the operating system, especially for accessing environment variables.
   - `dotenv`: This module is used to load environment variables from a `.env` file.

2. **Loads environment variables**:
   - `load_dotenv()`: This function loads the environment variables defined in the `.env` file into the environment.

3. **Retrieves the API key**:
   - `os.getenv('OPENAI_API_KEY')`: This function fetches the value of the `OPENAI_API_KEY` environment variable and stores it in the `api_key` variable.

4. **Validates the API key**:
   - The code checks if `api_key` is `None` or empty. If it is, it raises a `ValueError` with a message indicating that the API key was not found and instructs the user to set the `OPENAI_API_KEY` in the `.env` file.


In [1]:
import os
from dotenv import load_dotenv

# Load environment variables from the .env file
load_dotenv()

# Retrieve the API key from the environment variable
api_key = os.getenv('OPENAI_API_KEY')

if not api_key:
    raise ValueError("API key not found. Please set the OPENAI_API_KEY in the .env file.")


### Part 2


The provided Python code accomplishes the following tasks:

1. **Imports necessary modules**:
   - `json`: This module is used to work with JSON data, enabling encoding and decoding of JSON.
   - `requests`: This module allows sending HTTP requests, which is used to interact with the OpenAI API.

2. **Defines a function to query OpenAI API for questions**:
   - `query_openai_for_questions(api_key, topic, model='gpt-4', temperature=0.7, max_tokens=1500)`: This function generates 50 frequently asked questions about a given topic using the OpenAI API. 
   - **Parameters**:
     - `api_key`: The API key to access the OpenAI API.
     - `topic`: The topic for which questions are to be generated.
     - `model`: The model to use for generation (default is "gpt-4").
     - `temperature`: Sampling temperature; higher values produce more varied output (default is 0.7).
     - `max_tokens`: The maximum number of tokens in the generated text (default is 1500).
   - **Process**:
     - Creates a prompt specifying the requirements for the questions.
     - Sets up headers including the authorization using the provided API key.
     - Sends a POST request to the OpenAI API with the prompt and additional parameters.
     - Raises an error on a non-success HTTP status, then extracts the message content from the response.
     - Parses that content as JSON to obtain the list of questions.

3. **Defines a function to generate questions for multiple themes**:
   - `generate_questions_for_themes(api_key, themes)`: This function iterates over a list of themes, generates questions for each theme using the previously defined function, and collects all questions.
   - **Parameters**:
     - `api_key`: The API key to access the OpenAI API.
     - `themes`: A list of themes for which questions are to be generated.
   - **Process**:
     - Initializes an empty list to store the questions.
     - Iterates over each theme, prints a message indicating the current theme, generates questions using `query_openai_for_questions()`, and extends the list with the new questions.
     - Returns the collected list of questions.
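
Because `response.raise_for_status()` raises on any HTTP error, a single transient failure stops the whole run. The following is a minimal sketch of a retry wrapper, not part of the notebook: `call_with_retries` and its parameters are hypothetical names, and it catches every exception for brevity, where real code would more likely catch only request-related errors.

```python
import time


def call_with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on an exception, wait and retry with exponential backoff.

    Hypothetical helper: `attempts`, `base_delay`, and `sleep` are
    illustrative names, not part of the notebook.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles after each failure: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Under these assumptions, a call inside the theme loop could be wrapped as `call_with_retries(lambda: query_openai_for_questions(api_key, theme))`.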


In [2]:
import json
import requests

def query_openai_for_questions(api_key, topic, model='gpt-4', temperature=0.7, max_tokens=1500):
    """
    Generates 50 frequently asked questions about a given topic using the OpenAI API.

    Parameters:
    api_key (str): The API key for accessing the OpenAI API.
    topic (str): The topic for which to generate the questions.
    model (str): The model to use for generation (default is "gpt-4").
    temperature (float): Sampling temperature to control the creativity of the model (default is 0.7).
    max_tokens (int): The maximum number of tokens in the generated text (default is 1500).

    Returns:
    list: Generated questions by the OpenAI API.
    """
    prompt_content = f"""
    Please list in JSON format 50 frequently asked questions about {topic} from all levels of users. The questions should start with any of the following: "Where do I", "Is it okay to", "Can you help me", "I need to", "Is there a", "Do you know", "Where is the", "Can you tell me", "Can I change", "What are the", "How do I", "When is it", "Does {topic} have", "How to", "What is the difference", "Can users", "Can I", "What is". You do not need to provide an answer or category to each question. The list should be a single dimension array of only questions.
    """

    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}"
    }

    data = {
        "model": model,
        "messages": [
            {"role": "user", "content": prompt_content}
        ],
        "temperature": temperature,
        "max_tokens": max_tokens
    }

    response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, data=json.dumps(data))
    response.raise_for_status()
    generated_questions = response.json()["choices"][0]["message"]["content"].strip()
    
    return json.loads(generated_questions)

def generate_questions_for_themes(api_key, themes):
    instructions = []
    for theme in themes:
        print(f"Generating questions for theme: {theme}")
        questions = query_openai_for_questions(api_key, theme)
        instructions.extend(questions)
    return instructions
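
`query_openai_for_questions` passes the model's reply straight to `json.loads`, which fails if the reply is wrapped in a Markdown code fence or preceded by a sentence of prose. The sketch below shows one way to tolerate that, assuming the reply contains a single JSON array; `parse_question_list` is a hypothetical helper, not part of the notebook.

```python
import json


def parse_question_list(raw_text):
    """Parse a model reply into a list of question strings.

    Hypothetical helper: extracts the outermost JSON array so that a
    surrounding code fence or extra prose does not break decoding.
    """
    start = raw_text.find("[")
    end = raw_text.rfind("]")
    if start == -1 or end < start:
        raise ValueError("No JSON array found in the model reply.")
    # json.loads raises json.JSONDecodeError (a ValueError) on malformed JSON.
    questions = json.loads(raw_text[start:end + 1])
    # Keep only non-empty strings, trimmed of surrounding whitespace.
    return [q.strip() for q in questions if isinstance(q, str) and q.strip()]
```

With such a helper, the last line of `query_openai_for_questions` could return `parse_question_list(generated_questions)` instead of calling `json.loads` directly.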


### Part 3

The provided Python code does the following:

1. **Defines a list of themes**:
   - A list named `themes` is created, containing various topics related to finance, insurance, and real estate. Each item in the list is a string representing a specific theme.

2. **Generates questions for each theme**:
   - The function `generate_questions_for_themes(api_key, themes)` is called with the API key and the list of themes as arguments. This function generates questions for each theme and returns a list of all the questions.

3. **Saves the generated questions to a JSON file**:
   - The generated questions are saved to a JSON file named `instructions.json`.
   - The `open` function is used to create and open the file in write mode with UTF-8 encoding.
   - `json.dump` is used to write the list of questions to the file in JSON format, with pretty printing (indentation) for readability.
   - A confirmation message is printed to indicate that the questions were successfully saved to the file.
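
The notebook saves the list exactly as the API returned it. As a minimal sketch, assuming one wanted to drop empty or repeated questions before saving, the steps above could be extended as follows. `dedupe_questions` and `save_questions` are hypothetical helpers, and the sample questions are illustrative only.

```python
import json
import os
import tempfile


def dedupe_questions(questions):
    """Drop empty and duplicate questions, keeping first-seen order."""
    seen = set()
    unique = []
    for question in questions:
        key = question.strip().lower()  # case-insensitive comparison key
        if key and key not in seen:
            seen.add(key)
            unique.append(question.strip())
    return unique


def save_questions(questions, path):
    """Write the questions as pretty-printed UTF-8 JSON, as the notebook does."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(questions, f, ensure_ascii=False, indent=4)


# Illustrative input, not real API output.
sample = ["What is risk?", "what is risk? ", "", "How do I hedge?"]
cleaned = dedupe_questions(sample)

# Round-trip through a temporary file to check that the saved JSON reloads.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "instructions.json")
    save_questions(cleaned, path)
    with open(path, "r", encoding="utf-8") as f:
        reloaded = json.load(f)
```

Reading the file back with the same `utf-8` encoding matters because `ensure_ascii=False` writes non-ASCII characters unescaped.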


In [3]:
# Define the themes
themes = [
    "Risk Management",
    "Portfolio Management",
    "Financial Modeling",
    "Insurance Underwriting",
    "Real Estate Investment",
    "Financial Derivatives",
    "Corporate Finance",
    "Investment Banking",
    "Personal Finance",
    "Behavioral Finance",
    "Insurance Claims Management",
    "Real Estate Appraisal",
    "Market Analysis",
    "Asset Management",
    "Credit Risk",
    "Financial Planning",
    "Insurance Law",
    "Mortgage Finance",
    "Capital Markets",
    "Insurance Fraud Detection",
    "Property Management",
    "Financial Regulation",
    "Hedge Funds",
    "Real Estate Development",
    "Wealth Management",
    "Insurance Product Development",
    "Financial Technology (FinTech)",
    "Real Estate Finance",
    "Actuarial Science",
    "Real Estate Economics"
]


# Generate the questions for each theme
instructions = generate_questions_for_themes(api_key, themes)

# Save the instructions to a JSON file
instructions_file = "instructions.json"
with open(instructions_file, 'w', encoding='utf-8') as f:
    json.dump(instructions, f, ensure_ascii=False, indent=4)

print(f"Questions successfully saved to {instructions_file}")


Generating questions for theme: Risk Management
Generating questions for theme: Portfolio Management
Generating questions for theme: Financial Modeling
Generating questions for theme: Insurance Underwriting
Generating questions for theme: Real Estate Investment
Generating questions for theme: Financial Derivatives
Generating questions for theme: Corporate Finance
Generating questions for theme: Investment Banking
Generating questions for theme: Personal Finance
Generating questions for theme: Behavioral Finance
Generating questions for theme: Insurance Claims Management
Generating questions for theme: Real Estate Appraisal
Generating questions for theme: Market Analysis
Generating questions for theme: Asset Management
Generating questions for theme: Credit Risk
Generating questions for theme: Financial Planning
Generating questions for theme: Insurance Law
Generating questions for theme: Mortgage Finance
Generating questions for theme: Capital Markets
Generating questions for theme: In

### Part 4

The provided Python code performs several tasks related to generating and managing training and validation data for a machine learning model using the OpenAI API. Here's a detailed breakdown:

1. **Imports necessary modules**:
   - `json`: This module is used to work with JSON data.
   - `requests`: This module allows sending HTTP requests, used here to interact with the OpenAI API.
   - `pathlib.Path`: This class provides an object-oriented interface to filesystem paths.

2. **Defines a function to query the OpenAI API**:
   - `query_openai(api_key, prompt, model='gpt-4', temperature=0.7, max_tokens=150)`: This function sends a prompt to the OpenAI API and retrieves a generated response and a follow-up question.
   - **Parameters**:
     - `api_key`: The API key to access the OpenAI API.
     - `prompt`: The initial prompt to be sent to the API.
     - `model`: The model to use for generation (default is "gpt-4").
     - `temperature`: Sampling temperature; higher values produce more varied output (default is 0.7).
     - `max_tokens`: The maximum number of tokens in the generated text (default is 150).
   - **Process**:
     - Sends a POST request to the OpenAI API with the prompt and other parameters.
     - Extracts the generated response from the API's reply.
     - Constructs a follow-up prompt based on the initial response and sends another request to get a follow-up question.
     - Returns both the initial response and the follow-up question.

3. **Defines a function to create a validation file**:
   - `create_validation_file(train_file, valid_file, split_ratio)`: This function splits the data in the training file into training and validation sets based on a specified ratio.
   - **Parameters**:
     - `train_file`: The file containing the training data.
     - `valid_file`: The file to store the validation data.
     - `split_ratio`: The ratio of the total data to be used for validation.
   - **Process**:
     - Reads all lines from the training file.
     - Assigns the first `int(len(lines) * split_ratio)` lines to the validation set and the remaining lines to the training set.
     - Overwrites the training file with the remaining lines and writes the validation lines to the validation file.

4. **Defines a function to generate QA pairs**:
   - `generate_qa_pairs(instructions, train_file)`: This function processes each instruction to generate a QA pair using the OpenAI API and appends the results to a training file. It reads `api_key` from the notebook's global scope rather than taking it as a parameter.
   - **Parameters**:
     - `instructions`: A list of instructions for which to generate QA pairs.
     - `train_file`: The file to store the generated QA pairs.
   - **Process**:
     - Iterates over each instruction.
     - For each instruction, retrieves the response and follow-up question using the `query_openai` function.
     - Writes the formatted result to the training file.

5. **Main script execution**:
   - Loads the list of instructions from a JSON file named `instructions.json`.
   - Calls `generate_qa_pairs` to generate QA pairs and store them in a file named `train.jsonl`.
   - Calls `create_validation_file` to split the `train.jsonl` file into training and validation files based on the specified split ratio.
   - Prints a completion message.
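
The split takes the first lines of the file as the validation set. Since the questions are written theme by theme, an unshuffled validation set contains only questions from the earliest themes. The sketch below reproduces the same slicing on an in-memory list and adds an optional seeded shuffle; `split_lines` and its `seed` parameter are hypothetical and not part of the notebook.

```python
import random


def split_lines(lines, split_ratio, seed=None):
    """Return (train_lines, valid_lines).

    With seed=None this mirrors the slicing described above: the first
    int(len(lines) * split_ratio) lines become the validation set.
    Passing a seed shuffles a copy first (an assumption, not notebook behavior).
    """
    lines = list(lines)  # copy so the caller's list is left untouched
    if seed is not None:
        random.Random(seed).shuffle(lines)
    n_valid = int(len(lines) * split_ratio)
    return lines[n_valid:], lines[:n_valid]


# Illustrative records standing in for lines of train.jsonl.
records = [f"record {i}\n" for i in range(10)]

# 10 lines at split_ratio=0.2 give 2 validation lines and 8 training lines.
train, valid = split_lines(records, 0.2)

# Same sizes, but drawn from across the whole list.
train_shuffled, valid_shuffled = split_lines(records, 0.2, seed=0)
```

Using a fixed seed keeps the shuffled split reproducible between runs.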


In [4]:
import json
import requests
from pathlib import Path

def query_openai(api_key, prompt, model='gpt-4', temperature=0.7, max_tokens=150):
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}"
    }

    data = {
        "model": model,
        "messages": [
            {"role": "user", "content": prompt}
        ],
        "temperature": temperature,
        "max_tokens": max_tokens
    }

    response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, data=json.dumps(data))
    response.raise_for_status()
    generated_response = response.json()["choices"][0]["message"]["content"].strip()

    followup_prompt = generated_response + "\nWhat is a likely follow-up question or request? Return just the text of one question or request."
    data["messages"] = [{"role": "user", "content": followup_prompt}]
    followup_response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, data=json.dumps(data))
    followup_response.raise_for_status()
    generated_followup = followup_response.json()["choices"][0]["message"]["content"].strip().replace("\"", "")

    return generated_response, generated_followup

def create_validation_file(train_file, valid_file, split_ratio):
    with open(train_file, 'r') as file:
        lines = file.readlines()
    valid_lines = lines[:int(len(lines) * split_ratio)]
    train_lines = lines[int(len(lines) * split_ratio):]
    with open(train_file, 'w') as file:
        file.writelines(train_lines)
    with open(valid_file, 'w') as file:
        file.writelines(valid_lines)

def generate_qa_pairs(instructions, train_file):
    for i, instruction in enumerate(instructions, start=1):
        print(f"Processing ({i}/{len(instructions)}): {instruction}")
        answer, followup_question = query_openai(api_key, instruction)
        result = json.dumps({
            'text': f'<s>[INST] {instruction}[/INST] {answer}</s>[INST]{followup_question}[/INST]'
        }) + "\n"
        with open(train_file, 'a') as file:
            file.write(result)

# Output file paths and validation split ratio
train_file = "train.jsonl"
valid_file = "valid.jsonl"
split_ratio = 0.2

# Load the instructions from the file (written as UTF-8 in Part 3)
with open("instructions.json", 'r', encoding='utf-8') as file:
    instructions = json.load(file)

# Generate the questions and answers
generate_qa_pairs(instructions, train_file)

# Create the validation file
create_validation_file(train_file, valid_file, split_ratio)

print("Done! Training and validation JSONL files created.")


Processing (1/1514): Where do I start with risk management?
Processing (2/1514): Is it okay to ignore minor risks in risk management?
Processing (3/1514): Can you help me understand the difference between risk assessment and risk management?
Processing (4/1514): I need to create a risk management plan. Where do I start?
Processing (5/1514): Is there a standard risk management process that I can follow?
Processing (6/1514): Do you know any good risk management techniques?
Processing (7/1514): Where is the line drawn between acceptable and unacceptable risks?
Processing (8/1514): Can you tell me more about the role of risk management in a business?
Processing (9/1514): Can I change the risk management strategy midway through a project?
Processing (10/1514): What are the key elements of a risk management plan?
Processing (11/1514): How do I assess risks in my business operations?
Processing (12/1514): When is it appropriate to take a risk in business?
Processing (13/1514): Does Risk Manag