### Test Case Situations 

#### NOTE: Memoro II is still being tested

- Add Arabic
- Add queryless option
- Voice detection and saving personas

##### S1: One conversation between two people (immediate)
- Prompt to ask direct questions about the conversation
- Prompt to ask questions about things which have been mentioned at different instances

##### S2: One conversation between two people (past)
- Prompt to ask direct questions about the conversation
- Prompt to ask questions about things which have been mentioned at different instances

##### S3: One conversation between three (or more) people (immediate)
- Prompt to ask direct questions about the conversation
- Prompt to ask questions about things which have been mentioned at different instances

##### S4: One conversation between three (or more) people (past)
- Prompt to ask direct questions about the conversation
- Prompt to ask questions about things which have been mentioned at different instances

##### S5: One conversation between two conflicting people (immediate)
The two people have a conflicting "opinion" about a subject
- Prompt to ask direct questions about the conversation
- Prompt to ask questions about the opinion of either of the speakers

##### S6: One conversation between two conflicting people (past)
The two people have a conflicting "opinion" about a subject
- Prompt to ask direct questions about the conversation
- Prompt to ask questions about the opinion of either of the speakers

##### S7: One conversation between three (or more) conflicting people (immediate)
The people have conflicting "opinions" about a subject
- Prompt to ask direct questions about the conversation
- Prompt to ask questions about the opinion of any of the speakers

##### S8: One conversation between three (or more) conflicting people (past)
The people have conflicting "opinions" about a subject
- Prompt to ask direct questions about the conversation
- Prompt to ask questions about the opinion of any of the speakers
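
Scenarios S1 to S8 appear to be every combination of three attributes: conflict or no conflict, two or three (or more) speakers, and immediate or past. The sketch below enumerates them in that order. The names `build_scenarios`, `CONFLICT`, `PARTICIPANTS`, and `TIMING` are illustrative and not part of Memoro II.

```python
from itertools import product

# Dimensions inferred from the S1-S8 headings (an assumption, not project code)
CONFLICT = (False, True)
PARTICIPANTS = ("two", "three (or more)")
TIMING = ("immediate", "past")


def build_scenarios():
    """Return a dict mapping scenario IDs (S1..S8) to their attributes."""
    scenarios = {}
    combos = product(CONFLICT, PARTICIPANTS, TIMING)
    for number, (conflict, participants, timing) in enumerate(combos, start=1):
        scenarios[f"S{number}"] = {
            "participants": participants,
            "conflict": conflict,
            "timing": timing,
        }
    return scenarios


if __name__ == "__main__":
    for name, attrs in build_scenarios().items():
        print(name, attrs)
```

A table like this can help confirm that each test file covers the intended combination.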

##### S9: Ask a question which requires information from two (or more) different conversations
- Do I know any marketing managers?
- What are the different meetings I have had over the past week? (Specify duration)

##### S10: Ask a question about information which has different forms in different conversations
Example
- Sarah was promoted to Head of the Marketing Department (in conversation 1)
- Sarah was promoted to Head of the PR Department (in conversation 2)

### Run this cell to check your `openai` version

In [1]:
pip show openai

Name: openai
Version: 1.35.14
Summary: The official Python library for the openai API
Home-page: 
Author: 
Author-email: OpenAI <support@openai.com>
License: 
Location: /Users/muddassirkhalidi/anaconda3/lib/python3.11/site-packages
Requires: anyio, distro, httpx, pydantic, sniffio, tqdm, typing-extensions
Required-by: langchain-openai
Note: you may need to restart the kernel to use updated packages.
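
The same check can be done from Python, which makes it easy to compare against the version this notebook expects. This is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    expected = "1.35.14"  # version named in this notebook
    found = installed_version("openai")
    if found is None:
        print("openai is not installed")
    elif found != expected:
        print(f"openai {found} is installed; this notebook expects {expected}")
    else:
        print(f"openai {found} matches")
```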


### Use this cell to make installations

#### You need openai version 1.35.14

In [None]:
!pip install playsound
!pip install openai==1.35.14
!pip install -U openai-whisper
!pip install pyaudio
!pip install numpy
!pip install transformers
!pip install python-dotenv tiktoken

### `FFmpeg` Installation

#### On Windows:

##### Download
Go to the FFmpeg Official Website and download the latest build for Windows.

##### Extract
Extract the downloaded ZIP file to a directory, for example, C:\FFmpeg.

##### Environment Variable:
- Right-click on 'This PC' or 'Computer' on your desktop or File Explorer, and select 'Properties'.

- Click on 'Advanced system settings' and then 'Environment Variables'.

- Under 'System Variables', find and select 'Path', then click 'Edit'.

- Click 'New' and add the path to your FFmpeg bin directory, e.g., C:\FFmpeg\bin.

- Click 'OK' to close all dialog boxes.


#### On macOS:

You can install `ffmpeg` using Homebrew:

`brew install ffmpeg`

#### On Linux:
For Ubuntu and other Debian-based distributions, you can install ffmpeg from the apt repository:

`sudo apt update`

`sudo apt install ffmpeg`
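
After installing, it may be worth confirming that `ffmpeg` is visible on the `PATH` before running the transcription cells, since Whisper calls it to decode audio files. A minimal sketch using only the standard library (the helper name `ffmpeg_available` is an assumption):

```python
import shutil


def ffmpeg_available(executable="ffmpeg"):
    """Return True if the executable can be found on the PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found on PATH")
    else:
        print("ffmpeg not found; check the installation steps for your OS")
```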



### Use this cell to import any libraries

In [None]:
import os
import openai
from openai import OpenAI
from dotenv import load_dotenv, find_dotenv
from playsound import playsound
import pyaudio
import wave
import numpy as np
import whisper
import warnings
from datetime import datetime
import tiktoken
from transformers import pipeline

# Load environment variables from .env file
load_dotenv(dotenv_path=os.path.join(os.getcwd(), '.env'))
classifier = pipeline("sentiment-analysis", model="distilbert/distilbert-base-uncased-finetuned-sst-2-english")
os.environ["TOKENIZERS_PARALLELISM"] = "false"
tokenizer = tiktoken.get_encoding('cl100k_base')

2024-07-29 10:51:59.620982: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


### Microphone Device Selection

#### The `PyAudio` library requires you to choose a device with which you want to input speech. 

#### Use the cell below to decide on which audio device you will use for your microphone.

### RUN THIS CELL BEFORE THE MAIN CODE CELL

In [2]:
def list_audio_devices():
    """
    Lists all available audio devices.

    Returns:
    - list: A list of (index, name) tuples, one per device.
    """
    p = pyaudio.PyAudio()
    devices = []
    for i in range(p.get_device_count()):
        device_info = p.get_device_info_by_index(i)
        devices.append((i, device_info['name']))
    p.terminate()
    return devices

def get_device_index_by_name(name): 
    """
    Finds the index of an audio device by its name.

    Args:
    - name (str): The name of the device.

    Returns:
    - int: The index of the device, or None if no device matches.
    
    Note: This is a helper function which will be used in getAudio().
    """
    for index, device_name in list_audio_devices():
        if name.lower() in device_name.lower():
            return index
    return None

device_list = list_audio_devices()
devices = []
print('Index | Device')
for index, name in device_list:
    devices.append(name)
    print(f'{index}     |  {name}')
    
device = input('Choose a device from the list above by name: ')
while device not in devices:
    device = input('Choose a valid device: ')

Index | Device
0     |  MacBook Pro Microphone
1     |  MacBook Pro Speakers
2     |  ZoomAudioDevice
Choose a device from the list above by name: MacBook Pro Microphone
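
The name lookup can be tried without a microphone by separating the matching logic from `PyAudio`. The sketch below assumes a list of `(index, name)` pairs like the output above; `find_device_index` is a hypothetical helper, not a `PyAudio` call.

```python
def find_device_index(devices, query):
    """Return the index of the first device whose name contains `query`.

    Matching is case-insensitive. Returns None when nothing matches or the
    query is empty.
    """
    needle = query.strip().lower()
    if not needle:
        return None
    for index, name in devices:
        if needle in name.lower():
            return index
    return None


# Sample data copied from the device listing in this notebook
sample_devices = [
    (0, "MacBook Pro Microphone"),
    (1, "MacBook Pro Speakers"),
    (2, "ZoomAudioDevice"),
]

if __name__ == "__main__":
    print(find_device_index(sample_devices, "macbook pro microphone"))
    print(find_device_index(sample_devices, "zoom"))
```

Because this is a substring match, a short query such as "MacBook" resolves to the first matching device, so typing the full name is the safer choice.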


### Functions Definition Cell
#### Recording Audio using `pyAudio`
#### Speech to Text using `Whisper`
#### GPT Model: `gpt-4o-mini`

In [3]:
def get_OPENAI_API():
    """
    Loads the OpenAI API key from the environment variables.

    Returns:
    - str: The OpenAI API key.
    """
    openai.api_key = os.getenv('OPENAI_API_KEY')
    if not openai.api_key:
        raise ValueError("OpenAI API key is not set. Please set the 'OPENAI_API_KEY' environment variable in your .env file.")
    return openai.api_key

def is_positive(text, classifier):
    # Preprocess text if necessary
    processed_text = text  # Assuming no specific preprocessing is required

    # Use Transformers pipeline for sentiment analysis
    result = classifier(processed_text)

    # Extract the sentiment label from the result
    sentiment_label = result[0]['label']
    if sentiment_label == 'NEGATIVE':
        return False
    else:
        return True
    
def getAudio(device_name=device, chunk_size=1024, 
             format=pyaudio.paInt16, channels=1, rate=16000, silence_threshold=1000, silence_duration=5):
    """
    Records audio until a period of silence is detected and saves it to a file.

    Args:
    - device_name (str): Name of the input audio device.
    - chunk_size (int): Number of frames per buffer.
    - format: Audio format (e.g., pyaudio.paInt16).
    - channels (int): Number of audio channels.
    - rate (int): Sampling rate in Hz.
    - silence_threshold (int): Amplitude threshold for silence detection.
    - silence_duration (int): Duration of silence required to stop recording (in seconds).

    Returns:
    - str: The name of the saved audio file.
    
    Note: Start talking only when you see the message "Please start speaking. Recording..." 
    If your conversation/prompt is over, but Memoro continues to record, just interrupt it.
    """
    device_index = get_device_index_by_name(device_name)
    if device_index is None:
        raise ValueError(f"Device '{device_name}' not found.")

    # Variables to store audio frames and silence detection
    audio_frames = []
    silent_chunks = 0
    max_silent_chunks = int(rate / chunk_size * silence_duration)

    def is_silent(data, threshold=silence_threshold):
        """Returns 'True' if below the silence threshold."""
        max_amplitude = np.max(np.abs(data))
        return max_amplitude < threshold

    def callback(in_data, frame_count, time_info, status):
        nonlocal silent_chunks, audio_frames
        audio_frames.append(in_data)
        audio_data = np.frombuffer(in_data, dtype=np.int16)
        if is_silent(audio_data):
            silent_chunks += 1
        else:
            silent_chunks = 0
        if silent_chunks > max_silent_chunks:
            return (None, pyaudio.paComplete)
        return (in_data, pyaudio.paContinue)

    # Initialize PyAudio
    p = pyaudio.PyAudio()

    try:
        # Open stream
        stream = p.open(format=format,
                        channels=channels,
                        rate=rate,
                        input=True,
                        frames_per_buffer=chunk_size,
                        stream_callback=callback,
                        input_device_index=device_index)

        print("Please start speaking. Recording...")
        stream.start_stream()

        # Keep the stream active while recording
        while stream.is_active():
            pass

        # Stop and close the stream
        stream.stop_stream()
        stream.close()

    except KeyboardInterrupt: 
        # Handle keyboard interruption for noisy environments
        print("Recording interrupted by user.")
    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        p.terminate()

    # Save the recorded audio to a file
    output_filename = os.path.join(os.getcwd(), 'audios','recorded_speech.wav')
    try:
        with wave.open(output_filename, 'wb') as wf:
            wf.setnchannels(channels)
            # Use the module-level helper because `p` has already been terminated
            wf.setsampwidth(pyaudio.get_sample_size(format))
            wf.setframerate(rate)
            wf.writeframes(b''.join(audio_frames))
    except Exception as e:
        print(f"Failed to save audio file: {e}")

    return output_filename

def play_audio(file_path):
    """
    Plays an audio file.

    Args:
    - file_path (str): The path of the audio file.
    """
    playsound(file_path)
    
def speech_to_text():
    """
    Converts recorded audio to text using Whisper model.

    Returns:
    - str: The transcribed text.
    """
    audio = getAudio()

    # Suppress the FP16 warning
    warnings.filterwarnings("ignore", category=UserWarning, message="FP16 is not supported on CPU; using FP32 instead")

    # Load the Whisper model
    model = whisper.load_model("base")  
    # Choose among the tiny, base, small, medium, and large models.
    # Larger models are more accurate but take longer to transcribe the audio.

    print('Processing speech...')
    # Transcribe the audio file
    result = model.transcribe(audio)
    print('Transcribed!')
    text = result['text']
    print(text)
    return text

def text_to_speech(text):
    """
    Converts text to speech and plays the audio.

    Args:
    - text (str): The text to be converted to speech.
    """
    response = openai.audio.speech.create(
        model="tts-1",
        voice="onyx",
        input=text
    )
    response_path = os.path.join(os.getcwd(), 'audios', 'response_voice.mp3')  # Contains the audio you hear when Memoro responds
    warnings.filterwarnings("ignore", category=DeprecationWarning)
    response.stream_to_file(response_path)
    play_audio(response_path)

def write_to_file(text):
    """
    Writes the text to a file.

    Args:
    - text (str): The text to be written.

    Returns:
    - str: The file path.
    """
    file_path = os.path.join(os.getcwd(), 'buffer', 'short_term_buffer.txt')
    with open(file_path, 'a', encoding='utf-8') as file:
        file.write(text)

    return file_path

def read_from_file(file_path):
    """
    Reads text from a file.

    Args:
    - file_path (str): The path of the file.

    Returns:
    - str: The read text.
    
    Note: This function may be discarded after integrating Memoro II
    with Pinecone.
    """
    with open(file_path, 'r', encoding='utf-8') as file:
        text = file.read()
    return text

def get_context():
    text = speech_to_text()
    context = f'\n\nTimestamp: {str(datetime.now())}\nConversation:\n{text}'
    write_to_file(context)

def get_prompt():
    query = speech_to_text()
    prompt = f'\n\nTimestamp: {str(datetime.now())}\nQuestion: {query}'
    write_to_file(prompt)
    short_buffer = os.path.join(os.getcwd(), 'buffer', 'short_term_buffer.txt')
    long_buffer = os.path.join(os.getcwd(), 'buffer', 'long_term_buffer.txt')
    buffers = [short_buffer, long_buffer]
    for position, file in enumerate(buffers):
        context = read_from_file(file)
        response = openai.chat.completions.create(
            model='gpt-4o-mini',
            messages=[
                {"role": "system", "content": "Your name is Memoro and you are a memory assistant listening to my conversations. You are given context with timestamps for the different conversations I have had. After you answer the question, ask for a follow up question."},
                {"role": "user", "content": context},
                {"role": "user", "content": prompt}

            ]
        )
        text = response.choices[0].message.content
        if is_positive(text, classifier):
            break
        # Announce the fallback only while another buffer remains to be searched
        if position < len(buffers) - 1:
            text_to_speech('Searching long term buffer...')
            
#     response = text + '\nIs there anything else I can help you with?'
    text_to_speech(text)
    print(text)

def buffer_exceeded():
    short_buffer = os.path.join(os.getcwd(), 'buffer', 'short_term_buffer.txt')
    text = read_from_file(short_buffer)
    # cl100k_base is used only to estimate the buffer size in tokens;
    # gpt-4o-mini itself uses o200k_base, so the count is approximate
    tokenizer = tiktoken.get_encoding('cl100k_base')

    # Encode the buffer text and count the tokens
    num_tokens = len(tokenizer.encode(text))
    return num_tokens > 2000
    
def move_to_long_term_buffer():
    # Read the entire content of the short term buffer file
    short_term_buffer = os.path.join(os.getcwd(), 'buffer', 'short_term_buffer.txt')
    long_term_buffer = os.path.join(os.getcwd(), 'buffer', 'long_term_buffer.txt')
    with open(short_term_buffer, 'r', encoding='utf-8') as file:
        text = file.read()

    # Tokenize the text
    tokens = tokenizer.encode(text)

    # Split the tokens into two parts
    first_n_tokens = tokens[:2000]
    remaining_tokens = tokens[2000:]

    # Decode the first 2000 tokens and remaining tokens back to text
    first_n_text = tokenizer.decode(first_n_tokens)
    remaining_text = tokenizer.decode(remaining_tokens)

    # Write the first n tokens to the new file
    with open(long_term_buffer, 'a', encoding='utf-8') as file:
        file.write(first_n_text)

    # Update the original file with the remaining tokens
    with open(short_term_buffer, 'w', encoding='utf-8') as file:
        file.write(remaining_text)
        
def intro():
    file_path = os.path.join(os.getcwd(), 'audios','intro_prompt_voice.mp3')
    play_audio(file_path)
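
# The silence logic in getAudio() can be checked on its own, without a
# microphone. This is a minimal sketch: it uses the standard library's
# `array` module in place of numpy so it runs with no extra installs, and the
# helper names below are illustrative rather than part of Memoro II.

```python
from array import array


def max_silent_chunks(rate, chunk_size, silence_duration):
    """Number of consecutive silent chunks tolerated before recording stops."""
    return int(rate / chunk_size * silence_duration)


def is_silent(chunk_bytes, threshold):
    """Return True if every 16-bit sample in the chunk is below the threshold."""
    samples = array("h", chunk_bytes)  # signed 16-bit, native byte order
    if not samples:
        return True
    return max(abs(sample) for sample in samples) < threshold


if __name__ == "__main__":
    # Defaults from getAudio(): 16 kHz, 1024-frame chunks, 5 s of silence
    print(max_silent_chunks(16000, 1024, 5))
    quiet = array("h", [0, 10, -999]).tobytes()
    loud = array("h", [0, 1000]).tobytes()
    print(is_silent(quiet, 1000), is_silent(loud, 1000))
```

With the defaults, each chunk covers 1024 / 16000 = 0.064 seconds, so recording stops after roughly five seconds of continuous silence.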

# Main Method

In [5]:
intro()

while True:
    if buffer_exceeded():
        move_to_long_term_buffer()
    
    try:
        choice = input('Enter 1 to record and 2 to retrieve memories: ')
        while choice not in ['1','2']:
            choice = input('Enter a valid choice [1,2]: ')
        if choice == '1':
            get_context()
        else:
            get_prompt()
        print('-'*50)
        print('Interrupt the kernel to end the program')
    except KeyboardInterrupt:
        print("Thank you!")
        break

Enter 1 to record and 2 to retrieve memories: 1
Please start speaking. Recording...
Processing speech...
Transcribed!
 Starts me. Let's talk. Whatever talk. I see how long it records. Does it have an SD thing? Okay, I'll stop this. I have a memorial.
--------------------------------------------------
Interrupt the kernel to end the program
Thank you!
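
The buffer rotation performed at the top of the main loop can be illustrated without `tiktoken`. The sketch below uses whitespace-separated words as stand-in tokens, which is an assumption made purely for illustration; the notebook itself counts `cl100k_base` tokens. The helper names are hypothetical.

```python
def needs_rotation(tokens, limit=2000):
    """Return True when the short-term buffer holds more than `limit` tokens."""
    return len(tokens) > limit


def split_at_limit(tokens, limit=2000):
    """Split tokens into (oldest `limit` tokens, remaining tokens)."""
    return tokens[:limit], tokens[limit:]


if __name__ == "__main__":
    # Stand-in tokens: plain words instead of tiktoken token IDs
    words = "a b c d e".split()
    if needs_rotation(words, limit=3):
        to_long_term, stays_short_term = split_at_limit(words, limit=3)
        print(to_long_term, stays_short_term)
```

Note that a split at a fixed token count can land in the middle of a conversation entry, so the oldest entry may end up divided between the two buffers.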


### Testing Cases

#### Use the functions below to get audio files from the text files. 
- Get the content for the text files from ChatGPT
- After getting the content, create text files in the MEMORO-II/memoro-ii/test_files/text
- Name the text files like this: S{test case number}.txt
- Once the text files are created, run the cell below.
- The cell may take a while to finish running.


In [6]:
def read_from_file(file_path):
    """
    Reads text from a file.

    Args:
    - file_path (str): The path of the file.

    Returns:
    - str: The read text.
    
    Note: This function may be discarded after integrating Memoro II
    with Pinecone.
    """
    with open(file_path, 'r', encoding='utf-8') as file:
        text = file.read()
    return text

def text_to_speech(text, index):
    """
    Converts text to speech and saves the audio as S{index}.mp3.

    Args:
    - text (str): The text to be converted to speech.
    - index (int): The test case number used to name the output file.
    """
    response = openai.audio.speech.create(
        model="tts-1",
        voice="onyx",
        input=text
    )
    response_path = os.path.join(os.getcwd(),'test_files', 'text', f'S{index}.mp3')  # Audio file generated for test case S{index}
    warnings.filterwarnings("ignore", category=DeprecationWarning)
    response.stream_to_file(response_path)
#     play_audio(response_path)

#Change the range to (6,11) for test cases 6-10
for x in range(1,6):
    text = read_from_file(os.path.join(os.getcwd(), 'test_files', 'text', f'S{x}.txt'))
    print(text)
    text_to_speech(text, x)
    print('-'*50)
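
Before running the loop above, it can help to confirm that every expected `S{n}.txt` file exists, since each text-to-speech call is a paid API request. A minimal sketch, assuming the text files live in `test_files/text` as described in the steps above; `missing_case_files` is a hypothetical helper.

```python
from pathlib import Path


def missing_case_files(text_dir, case_numbers):
    """Return the names of expected S{n}.txt files that are not present."""
    text_dir = Path(text_dir)
    return [
        f"S{n}.txt"
        for n in case_numbers
        if not (text_dir / f"S{n}.txt").is_file()
    ]


if __name__ == "__main__":
    # Directory layout assumed from the instructions in this notebook
    text_dir = Path.cwd() / "test_files" / "text"
    missing = missing_case_files(text_dir, range(1, 6))
    if missing:
        print("Missing test files:", ", ".join(missing))
    else:
        print("All test files found")
```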
    