<a href="https://colab.research.google.com/github/AlbertoB12/AIMeditationCoach/blob/main/Dataset_creation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install colab-xterm
!pip install ollama
!pip install openai
!apt-get install -y pciutils
!apt-get install -y lshw
import json, torch, psutil, ollama
import numpy as np
from google.colab import runtime
from openai import OpenAI
from IPython.display import Javascript
%load_ext colabxterm
!curl -fsSL https://ollama.com/install.sh | bash
!nohup ollama serve &
!ollama pull llama3.1:8b

client = OpenAI(
  base_url="http://localhost:11434/v1",
  api_key="ollama",
)

# Detect device and use it (GPU or CPU)
if torch.cuda.is_available():
  print("Using GPU")
  device = torch.device("cuda:0")
else:
  print("Using CPU")
  device = torch.device("cpu")

with device:
  input_file_path = ""  # Base dataset
  output_file_path = ""  # Path for the new dataset
  print("Program to create a dataset for fine-tuning LLM")

  # Critical RAM threshold (e.g., stop processing if available RAM < 2GB)
  CRITICAL_RAM_THRESHOLD = 2 * 1024 * 1024 * 1024  # 2GB

  print("Creating empty dataset")
  dataset = []
  with open(input_file_path, "r", encoding="UTF-8") as file:
    json_objects = json.load(file)

  # Print the total number of examples left to generate
  print("Total number of paragraphs: %s" % len(json_objects))
  print("Starting the generation of examples for the dataset")
  model = 'llama3.1:8b'
  round = 0

  def check_memory():
    """Check available RAM and return whether processing should stop."""
    available_ram = psutil.virtual_memory().available
    return available_ram < CRITICAL_RAM_THRESHOLD

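  while json_objects:
    # Stop early if available RAM has dropped below the critical threshold
    if check_memory():
      print("Available RAM is below the critical threshold. Stopping generation.")
      break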
    try:
      print("Generating example number %s" % (round+1))
      json_object = json_objects.pop(0)  # Get and remove the first paragraph from the list
      # Initialize empty dictionary
      dictionary_responses = {}
      # Get suggested techniques
      techniques = json_object['suggested_techniques'].split(',')
      techniques = [technique.strip() for technique in techniques]
      # Generate outputs
      instruction_output1 = client.chat.completions.create(
        model=model,
        messages=[
          {
            "role": "user",
            "content": f"""
      You are a meditation coach specialized in guiding users through meditation sessions tailored to their emotional states. Your goal is to provide calming, personalized instructions for a variety of emotions, helping users manage feelings like stress, happiness, worry, or anxiety.

      Based on the prompt given, generate an introduction to the meditation session. The introduction should set a calming tone, establish a connection with the user, and prepare them for the techniques that follow. Incorporate pauses clearly marked in seconds (e.g., [5]), and take the user's intended outcome into account. The pauses won't be read aloud, so don't say 'to the count of [4] seconds' or anything similar. The tone should be supportive, calming, and focused on helping the user feel at ease.

      Provide only the meditation text as raw output without any titles, formatting, or explanations. The text will continue later, so don't end it with sentences like 'as we approach the end of the session' or anything similar.

      User input: '{json_object}'."""
          }
        ]
      ).choices[0].message.content

      instruction_output2 = client.chat.completions.create(
        model=model,
        messages=[
          {
            "role": "user",
            "content": f"""
      You are a meditation coach specialized in guiding users through meditation sessions tailored to their emotional states. Your goal is to provide calming, personalized instructions for a variety of emotions, helping users manage feelings like stress, happiness, worry, or anxiety.

      Based on the user's prompt, generate the next part of the meditation session using the first suggested technique ({techniques[0]}). Ensure this segment maintains a consistent flow and tone, aligns with the intended session duration, and helps achieve the user's intended outcome. Incorporate pauses clearly marked in seconds (e.g., [5]). The pauses won't be read aloud, so don't say 'to the count of [4] seconds' or anything similar.

      Provide only the meditation text as raw output, appending to the previous script seamlessly. Do not include any titles, formatting, or explanations. The text will continue later, so don't end it with sentences like 'as we approach the end of the session' or anything similar.

      User input: '{json_object}'.
      Generated script so far: '{instruction_output1}'."""
          }
        ]
      ).choices[0].message.content

      instruction_output3 = client.chat.completions.create(
        model=model,
        messages=[
          {
            "role": "user",
            "content": f"""
      You are a meditation coach specialized in guiding users through meditation sessions tailored to their emotional states. Your goal is to provide calming, personalized instructions for a variety of emotions, helping users manage feelings like stress, happiness, worry, or anxiety. Offer grounding techniques, mindfulness practices, breathing exercises, and affirmations to create a comforting and positive environment.

      Based on the user's emotional context and experience level, generate the next part of the meditation session using the second suggested technique ({techniques[1]}). Ensure this segment maintains a consistent flow and tone, aligns with the intended session duration, and helps achieve the user's intended outcome. Incorporate pauses clearly marked in seconds (e.g., [5]). The pauses won't be read aloud, so don't say 'to the count of [4] seconds' or anything similar.

      Provide only the meditation text as raw output, appending to the previous script seamlessly. Do not include any titles, formatting, or explanations. The text will continue later, so don't end it with sentences like 'as we approach the end of the session' or anything similar.

      User input: '{json_object}'.
      Generated script so far: '{instruction_output1} {instruction_output2}'."""
          }
        ]
      ).choices[0].message.content

      instruction_output4 = client.chat.completions.create(
        model=model,
        messages=[
          {
            "role": "user",
            "content": f"""
      You are a meditation coach specialized in guiding users through meditation sessions tailored to their emotional states. Your goal is to provide calming, personalized instructions for a variety of emotions, helping users manage feelings like stress, happiness, worry, or anxiety. Offer grounding techniques, mindfulness practices, breathing exercises, and affirmations to create a comforting and positive environment.

      Based on the user's emotional context and experience level, generate the next part of the meditation session using the third suggested technique ({techniques[2]}). Ensure this segment maintains a consistent flow and tone, aligns with the intended session duration, and helps achieve the user's intended outcome. Incorporate pauses clearly marked in seconds (e.g., [5]). The pauses won't be read aloud, so don't say 'to the count of [4] seconds' or anything similar.

      Provide only the meditation text as raw output, appending to the previous script seamlessly. Do not include any titles, formatting, or explanations. The text will continue later, so don't end it with sentences like 'as we approach the end of the session' or anything similar.

      User input: '{json_object}'.
      Generated script so far: '{instruction_output1} {instruction_output2} {instruction_output3}'."""
          }
        ]
      ).choices[0].message.content

      instruction_output5 = client.chat.completions.create(
        model=model,
        messages=[
          {
            "role": "user",
            "content": f"""
      You are a meditation coach specialized in guiding users through meditation sessions tailored to their emotional states. Your goal is to provide calming, personalized instructions for a variety of emotions, helping users manage feelings like stress, happiness, worry, or anxiety. Offer grounding techniques, mindfulness practices, breathing exercises, and affirmations to create a comforting and positive environment.

      Based on the user's emotional context and experience level, generate the final part of the meditation session to complete it. This section should gently transition the user out of the meditation state and back into full awareness. Ensure this segment smoothly concludes the session, aligns with the intended session duration, and reinforces the user's intended outcome. Include pauses clearly marked in seconds (e.g., [5]). The pauses won't be read aloud, so don't say 'to the count of [4] seconds' or anything similar.

      Provide only the meditation text as raw output, appending to the previous script seamlessly. Do not include any titles, formatting, or explanations. The text ends here, so include some closing sentences. At the very end, add a wise and supportive sentence from Stoicism or another philosophical tradition.

      User input: '{json_object}'.
      Generated script so far: '{instruction_output1} {instruction_output2} {instruction_output3} {instruction_output4}'."""
          }
        ]
      ).choices[0].message.content

      # Append example generated
      example = {
          "instruction": "You are a domain expert and content creator for an AI-driven app specializing in meditation and wellness.",
          "input": json_object,
          "output": instruction_output1 + ' ' + instruction_output2 + ' ' + instruction_output3 + ' ' + instruction_output4 + ' ' + instruction_output5
      }
      dataset.append(example)
      print("Example number %s generated" % (round+1))
      round+=1
      # Save the current dataset to the JSON file
      with open(output_file_path, "w", encoding='UTF-8') as json_file:
        json.dump(dataset, json_file, indent=4)
      # Remove the processed paragraph from the input file
      with open(input_file_path, "w", encoding="UTF-8") as file:
        json.dump(json_objects, file, indent=4)
    except Exception as e:
      print(f"An error occurred: {e}")
      print("Finalizing execution.")
      # Stop the loop; the disconnect call after the loop still runs
      break
  else:
    # Runs only when the loop finished without hitting a break
    print("No more examples to generate. Finalizing execution.")
# Disconnect from Google Colab
display(Javascript('google.colab.kernel.disconnect()'))
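
The five generation calls above differ only in the instruction for each step and in how much of the script has been generated so far, and indexing `techniques[1]` and `techniques[2]` directly raises an `IndexError` when an entry lists fewer than three techniques. A minimal sketch of the same chaining idea as a loop follows. The names `parse_techniques`, `build_script`, and `generate` are hypothetical helpers that are not part of the notebook, and the step wording is shortened for illustration; in practice `generate` would wrap `client.chat.completions.create(...)` with the full prompts.

```python
from typing import Callable, Sequence


def parse_techniques(raw: str) -> list[str]:
    """Split a comma-separated technique string, dropping empty entries."""
    return [t.strip() for t in raw.split(",") if t.strip()]


def build_script(
    generate: Callable[[str], str],
    user_input: dict,
    techniques: Sequence[str],
) -> str:
    """Generate an intro, one segment per technique, and a closing, in order.

    `generate` is an assumed callable that takes a prompt string and
    returns the generated text for that step.
    """
    steps = ["Generate an introduction to the meditation session."]
    steps += [
        f"Generate the next part of the session using the technique '{t}'."
        for t in techniques
    ]
    steps.append("Generate the final part of the session to complete it.")

    parts: list[str] = []
    for step in steps:
        # Each prompt carries everything generated so far, as in the cell above
        so_far = " ".join(parts)
        prompt = (
            f"{step}\n"
            f"User input: '{user_input}'.\n"
            f"Generated script so far: '{so_far}'."
        )
        parts.append(generate(prompt))
    return " ".join(parts)
```

Because the number of steps follows the number of parsed techniques, an entry with one or two techniques produces a shorter script instead of an exception. Passing a stub function as `generate` also makes it possible to check the chaining logic without a running Ollama server.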

Collecting colab-xterm
  Downloading colab_xterm-0.2.0-py3-none-any.whl.metadata (1.2 kB)
Downloading colab_xterm-0.2.0-py3-none-any.whl (115 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 115.6/115.6 kB 3.6 MB/s eta 0:00:00
Installing collected packages: colab-xterm
Successfully installed colab-xterm-0.2.0
Collecting ollama
  Downloading ollama-0.4.7-py3-none-any.whl.metadata (4.7 kB)
Downloading ollama-0.4.7-py3-none-any.whl (13 kB)
Installing collected packages: ollama
Successfully installed ollama-0.4.7
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  libpci3 pci.ids
The following NEW packages will be installed:
  libpci3 pci.ids pciutils
0 upgraded, 3 newly installed, 0 to remove and 30 not upgraded.
Need to get 343 kB of archives.
After this operation, 1,581 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/u