### Test Case Situations 

#### NOTE: Memoro II is still being tested

##### S1: One conversation between two people (immediate)
- Prompt to ask direct questions about the conversation
- Prompt to ask questions about things which have been mentioned at different instances

##### S2: One conversation between two people (past)
- Prompt to ask direct questions about the conversation
- Prompt to ask questions about things which have been mentioned at different instances

##### S3: One conversation between three (or more) people (immediate)
- Prompt to ask direct questions about the conversation
- Prompt to ask questions about things which have been mentioned at different instances

##### S4: One conversation between three (or more) people (past)
- Prompt to ask direct questions about the conversation
- Prompt to ask questions about things which have been mentioned at different instances

##### S5: One conversation between two conflicting people (immediate)
The two people have a conflicting "opinion" about a subject
- Prompt to ask direct questions about the conversation
- Prompt to ask questions about the opinion of either of the speakers

##### S6: One conversation between two conflicting people (past)
The two people have a conflicting "opinion" about a subject
- Prompt to ask direct questions about the conversation
- Prompt to ask questions about the opinion of either of the speakers

##### S7: One conversation between three (or more) conflicting people (immediate)
The speakers have conflicting "opinions" about a subject
- Prompt to ask direct questions about the conversation
- Prompt to ask questions about the opinion of any of the speakers

##### S8: One conversation between three (or more) conflicting people (past)
The speakers have conflicting "opinions" about a subject
- Prompt to ask direct questions about the conversation
- Prompt to ask questions about the opinion of any of the speakers

##### S9: Ask a question which requires information from two (or more) different conversations
- Do I know any marketing managers?
- What are the different meetings I have had over the past week? (Specify duration)

##### S10: Ask a question about information which has different forms in different conversations
Example
- Sarah was promoted to Head of the Marketing Department (in conversation 1)
- Sarah was promoted to Head of the PR Department (in conversation 2)

### Run this cell to check your `openai` version

In [11]:
pip show openai

Name: openai
Version: 1.35.13
Summary: The official Python library for the openai API
Home-page: 
Author: 
Author-email: OpenAI <support@openai.com>
License: 
Location: /Users/muddassirkhalidi/anaconda3/lib/python3.11/site-packages
Requires: anyio, distro, httpx, pydantic, sniffio, tqdm, typing-extensions
Required-by: langchain-openai
Note: you may need to restart the kernel to use updated packages.


### Use this cell to make installations

#### You need openai version 1.35.13

In [None]:
!pip install playsound
!pip install openai==1.35.13
!pip install -U openai-whisper
!pip install pyaudio
!pip install numpy
!pip install tqdm
!pip install pinecone
!pip install nltk
!pip install python-dotenv

### `FFmpeg` Installation

#### On Windows:

##### Download
Go to the FFmpeg Official Website and download the latest build for Windows.

##### Extract
Extract the downloaded ZIP file to a directory, for example, C:\FFmpeg.

##### Environment Variable:
- Right-click on 'This PC' or 'Computer' on your desktop or File Explorer, and select 'Properties'.

- Click on 'Advanced system settings' and then 'Environment Variables'.

- Under 'System Variables', find and select 'Path', then click 'Edit'.

- Click 'New' and add the path to your FFmpeg bin directory, e.g., C:\FFmpeg\bin.

- Click 'OK' to close all dialog boxes.


#### On macOS:

You can install `ffmpeg` using Homebrew:

`brew install ffmpeg`

#### On Linux:
For Ubuntu and other Debian-based distributions, you can install ffmpeg from the apt repository:

`sudo apt update`

`sudo apt install ffmpeg`
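
Whisper calls the `ffmpeg` executable to decode audio, so it has to be reachable on `PATH`. One way to confirm that the installation worked is to look for it from Python. This is a minimal sketch using only the standard library; the helper name `find_tool` is made up for illustration and is not part of the notebook.

```python
import shutil


def find_tool(name="ffmpeg"):
    """Return the full path of an executable on PATH, or None if it is missing."""
    return shutil.which(name)


path = find_tool()
if path is None:
    print("ffmpeg was not found on PATH; revisit the installation steps.")
else:
    print(f"ffmpeg found at {path}")
```

If the check fails right after installing on Windows, restarting the notebook server so it picks up the new `Path` value may help.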



### Use this cell to import any libraries

In [None]:
import os
import datetime
import warnings
import wave
from time import sleep

import numpy as np
import pyaudio
import nltk
import openai
import pinecone
import whisper
from dotenv import load_dotenv
from pinecone import Pinecone
from playsound import playsound
from tqdm.auto import tqdm

# Sentence tokenizer data used by nltk.sent_tokenize
nltk.download('punkt')

# Load environment variables from .env file
load_dotenv(dotenv_path=os.path.join(os.getcwd(), '.env'))

### Microphone Device Selection

#### The `PyAudio` library requires you to choose the input device used to capture speech.

#### The function `getAudio()` takes a `device_name` argument. Before running the main cell, change its default value from `MacBook Pro Microphone` to the device you want to use.
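
The device name is matched case-insensitively as a substring, so a partial name is enough. The sketch below shows that matching on a hypothetical device list. Unlike the notebook's helper, it also skips devices that have no input channels; that filter is an assumption added for illustration, and the function name `pick_input_device` is made up.

```python
def pick_input_device(devices, name):
    """Return the index of the first input-capable device whose name contains `name`."""
    wanted = name.lower()
    for index, device_name, max_input_channels in devices:
        if max_input_channels > 0 and wanted in device_name.lower():
            return index
    return None


# Hypothetical device list in (index, name, max input channels) form.
example_devices = [
    (0, "MacBook Pro Speakers", 0),
    (1, "MacBook Pro Microphone", 1),
    (2, "USB Headset", 1),
]

print(pick_input_device(example_devices, "macbook pro"))  # 1 (the speakers are skipped)
print(pick_input_device(example_devices, "headset"))      # 2
```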


### RUN THIS CELL BEFORE THE MAIN CODE CELL

In [29]:
def list_audio_devices():
    p = pyaudio.PyAudio()
    for i in range(p.get_device_count()):
        device_info = p.get_device_info_by_index(i)
        print(device_info['name'])
    p.terminate()

list_audio_devices()

### Main Code Cell
#### Recording Audio using `pyAudio`
#### Speech to Text using `Whisper`
#### GPT Model: `gpt-3.5-turbo`
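
Recording stops once enough consecutive audio buffers stay below an amplitude threshold. The sketch below reproduces that arithmetic with the default values of `getAudio()` (16000 Hz, 1024 frames per buffer, 5 seconds of silence, threshold 1000). It uses the standard-library `array` module instead of NumPy so it runs on its own; the function names are illustrative.

```python
from array import array


def max_silent_chunks(rate=16000, chunk_size=1024, silence_duration=5):
    """Number of consecutive quiet buffers that add up to `silence_duration` seconds."""
    return int(rate / chunk_size * silence_duration)


def is_silent(chunk_bytes, threshold=1000):
    """True if the loudest 16-bit sample in the buffer is below `threshold`."""
    samples = array("h", chunk_bytes)  # "h" = signed 16-bit integers
    return max((abs(s) for s in samples), default=0) < threshold


quiet = array("h", [0, 12, -40, 7]).tobytes()
loud = array("h", [0, 1500, -40, 7]).tobytes()

print(max_silent_chunks())                 # 78
print(is_silent(quiet), is_silent(loud))   # True False
```

In a noisy room the loudest sample may never drop below 1000, which is why the recording sometimes has to be interrupted by hand; raising `silence_threshold` is one thing to try.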

In [3]:
def get_OPENAI_API():
    """
    Loads the OpenAI API key from the environment variables.

    Returns:
    - str: The OpenAI API key.
    """
    openai.api_key = os.getenv('OPENAI_API_KEY')
    if not openai.api_key:
        raise ValueError("OpenAI API key is not set. Please set the 'OPENAI_API_KEY' environment variable in your .env file.")
    return openai.api_key

def get_Model(): 
    """
    Determines the appropriate GPT-3.5 model based on the current date.

    Returns:
    - str: The model name to use.
    
    Note: We are not using this function right now 
    because the process_prompt function already decides the model.
    """
    current_date = datetime.datetime.now().date()
    target_date = datetime.date(2024, 6, 12)

    # Select the model based on the current date
    if current_date > target_date:
        llm_model = "gpt-3.5-turbo"
    else:
        llm_model = "gpt-3.5-turbo-0301"
    return llm_model

def list_audio_devices():
    """
    Lists all available audio input devices.

    Returns:
    - list: A list of tuples containing device index, name, max input channels, and default sample rate.
    """
    p = pyaudio.PyAudio()
    devices = []
    for i in range(p.get_device_count()):
        device_info = p.get_device_info_by_index(i)
        devices.append((i, device_info['name'], device_info['maxInputChannels'], device_info['defaultSampleRate']))
    p.terminate()
    return devices

def get_device_index_by_name(name): 
    """
    Finds the index of an audio device by its name.

    Args:
    - name (str): The name of the device.

    Returns:
    - int: The index of the device.
    
    Note: This is a helper function which will be used in getAudio().
    """
    devices = list_audio_devices()
    for index, device_name, _, _ in devices:
        if name.lower() in device_name.lower():
            return index
    return None

def getAudio(output_filename="recorded_speech.wav", device_name="MacBook Pro Microphone", chunk_size=1024, 
             format=pyaudio.paInt16, channels=1, rate=16000, silence_threshold=1000, silence_duration=5):
    """
    Records audio until a period of silence is detected and saves it to a file.

    Args:
    - output_filename (str): Name of the output WAV file.
    - device_name (str): Name of the input audio device.
    - chunk_size (int): Number of frames per buffer.
    - format: Audio format (e.g., pyaudio.paInt16).
    - channels (int): Number of audio channels.
    - rate (int): Sampling rate in Hz.
    - silence_threshold (int): Amplitude threshold for silence detection.
    - silence_duration (int): Duration of silence required to stop recording (in seconds).

    Returns:
    - str: The name of the saved audio file.
    
    Note: Start talking only when you see the message "Please start speaking. Recording..." 
    If your conversation/prompt is over, but Memoro continues to record, just interrupt it.
    """
    device_index = get_device_index_by_name(device_name)
    if device_index is None:
        raise ValueError(f"Device '{device_name}' not found.")

    # Variables to store audio frames and silence detection
    audio_frames = []
    silent_chunks = 0
    max_silent_chunks = int(rate / chunk_size * silence_duration)

    def is_silent(data, threshold=silence_threshold):
        """Returns 'True' if below the silence threshold."""
        max_amplitude = np.max(np.abs(data))
        return max_amplitude < threshold

    def callback(in_data, frame_count, time_info, status):
        nonlocal silent_chunks, audio_frames
        audio_frames.append(in_data)
        audio_data = np.frombuffer(in_data, dtype=np.int16)
        if is_silent(audio_data):
            silent_chunks += 1
        else:
            silent_chunks = 0
        if silent_chunks > max_silent_chunks:
            return (None, pyaudio.paComplete)
        return (in_data, pyaudio.paContinue)

    # Initialize PyAudio
    p = pyaudio.PyAudio()

    try:
        # Open stream
        stream = p.open(format=format,
                        channels=channels,
                        rate=rate,
                        input=True,
                        frames_per_buffer=chunk_size,
                        stream_callback=callback,
                        input_device_index=device_index)

        print("Please start speaking. Recording...")
        stream.start_stream()

        # Keep the stream active while recording; sleep briefly instead of busy-waiting
        while stream.is_active():
            sleep(0.1)

        # Stop and close the stream
        stream.stop_stream()
        stream.close()

    except KeyboardInterrupt: 
        # Handle keyboard interruption for noisy environments
        print("Recording interrupted by user.")
    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        p.terminate()

    # Save the recorded audio to a file
    try:
        with wave.open(output_filename, 'wb') as wf:
            wf.setnchannels(channels)
            wf.setsampwidth(pyaudio.get_sample_size(format))
            wf.setframerate(rate)
            wf.writeframes(b''.join(audio_frames))
        print(f"Audio saved to {output_filename}")
    except Exception as e:
        print(f"Failed to save audio file: {e}")

    return output_filename

def create_metadata(sentences, window=20, stride=4):
    """
    Creates metadata for sentences with a sliding window approach.

    Args:
    - sentences (list): List of tokenized sentences.
    - window (int): Number of sentences in each metadata entry.
    - stride (int): Step size between each metadata entry.

    Returns:
    - list: List of metadata entries.
    """
    data = []
    for i in tqdm(range(0, len(sentences), stride)):
        i_end = min(len(sentences), i + window)
        text = ' '.join(sentences[i:i_end])
        data.append({
            'text': text,
            'id': i,
        })
    return data

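def get_PINECONE_API():
    """
    Initializes connection to Pinecone API and returns the index.

    Returns:
    - pinecone.Index: The Pinecone index.
    """
    from pinecone import ServerlessSpec  # Needed by create_index in the v3+ client

    api_key = os.getenv('PINECONE_API_KEY')
    if not api_key:
        raise ValueError("Pinecone API key is not set. Please set the 'PINECONE_API_KEY' environment variable in your .env file.")

    index_name = 'memoro'
    # Initialize connection to Pinecone
    pc = Pinecone(api_key=api_key)
    if index_name not in pc.list_indexes().names():
        # If the index does not exist, create it.
        # ASSUMPTION: a serverless index on AWS us-east-1; change the spec to match your Pinecone project.
        pc.create_index(
            name=index_name,
            dimension=1536,  # Output size of text-embedding-ada-002
            metric='cosine',
            spec=ServerlessSpec(cloud='aws', region='us-east-1')
        )

    # Connect to index
    index = pc.Index(index_name)
    return index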

def upsert_vectors(sentences, embed_model='text-embedding-ada-002'):
    """
    Upserts vectors into Pinecone from sentences.

    Args:
    - sentences (list): List of tokenized sentences.
    - embed_model (str): Model to use for generating embeddings.
    """
    new_data = create_metadata(sentences)
    batch_size = 100  # How many embeddings we create and insert at once

    # Connect once instead of once per batch
    index = get_PINECONE_API()
    print('Connected to the Pinecone API')

    for i in tqdm(range(0, len(new_data), batch_size)):
        # Find end of batch
        i_end = min(len(new_data), i + batch_size)
        meta_batch = new_data[i:i_end]
        # Get IDs
        ids_batch = [str(x['id']) for x in meta_batch]
        # Get texts to encode
        texts = [x['text'] for x in meta_batch]

        # Create embeddings, retrying every 5 seconds on failure (e.g. RateLimitError)
        res = None
        while res is None:
            try:
                res = openai.embeddings.create(input=texts, model=embed_model)
            except Exception as e:
                print(f"Error creating embeddings, retrying in 5 seconds: {e}")
                sleep(5)

        embeds = [record.embedding for record in res.data]
        # Cleanup metadata
        meta_batch = [{'text': x['text']} for x in meta_batch]

        to_upsert = list(zip(ids_batch, embeds, meta_batch))

        # Upsert to Pinecone
        try:
            index.upsert(vectors=to_upsert)
            print(f"Upserted batch {i // batch_size + 1}")
        except Exception as e:
            print(f"Error upserting batch {i // batch_size + 1}: {e}")

def speech_to_text():
    """
    Converts recorded audio to text using Whisper model.

    Returns:
    - str: The transcribed text.
    """
    audio = getAudio()

    # Suppress the FP16 warning
    warnings.filterwarnings("ignore", category=UserWarning, message="FP16 is not supported on CPU; using FP32 instead")

    # Load the Whisper model. Available sizes: tiny, base, small, medium, large.
    # Larger models are more accurate but take longer to transcribe.
    model = whisper.load_model("base")

    print('Processing speech...')
    # Transcribe the audio file
    result = model.transcribe(audio)
    print('Transcribed!')
    text = result['text']
    write_to_file(text)
    return text

def process_context():
    """
    Processes the context by recording speech, converting it to text, and upserting to Pinecone.
    """
    text = speech_to_text()
    print('Tokenizing...')
    sentences = nltk.sent_tokenize(text)
    print('Upserting...')
    upsert_vectors(sentences)
    print('Upserted.')

def text_to_speech(text):
    """
    Converts text to speech and plays the audio.

    Args:
    - text (str): The text to be converted to speech.
    """
    response = openai.audio.speech.create(
        model="tts-1",
        voice="onyx",
        input=text
    )
    response_path = os.path.join(os.getcwd(), 'response_voice.mp3')  # Contains the audio you hear when Memoro responds
    warnings.filterwarnings("ignore", category=DeprecationWarning)
    response.stream_to_file(response_path)
    play_audio(response_path)

def write_to_file(text):
    """
    Writes the text to a file.

    Args:
    - text (str): The text to be written.

    Returns:
    - str: The file path.
    """
    with open('STT_file.txt', 'w', encoding='utf-8') as file:
        file.write(text)
        
    return os.path.join(os.getcwd(), 'STT_file.txt')

def read_from_file(file_path):
    """
    Reads text from a file.

    Args:
    - file_path (str): The path of the file.

    Returns:
    - str: The read text.
    
    Note: We are not using this function right now and may discard it after 
    integrating Memoro with Pinecone.
    """
    with open(file_path, 'r', encoding='utf-8') as file:
        text = file.read()
    return text

def play_audio(file_path):
    """
    Plays an audio file.

    Args:
    - file_path (str): The path of the audio file.
    """
    playsound(file_path)
    
def get_prompt(embed_model='text-embedding-ada-002'):
    """
    Retrieves the prompt by converting speech to text and finding relevant contexts from Pinecone.

    Args:
    - embed_model (str): The embedding model to use.

    Returns:
    - str: The constructed prompt.
    """
    limit = 3750
    intro_path = os.path.join(os.getcwd(), 'intro_prompt_voice.mp3')
    play_audio(intro_path)
    query = speech_to_text()
    print("Recognized Prompt:", query)
    
    res = openai.embeddings.create(
        input=[query],
        model=embed_model
    )
    
    index = get_PINECONE_API()
    # Embedding of the spoken query, used to search Pinecone
    xq = res.data[0].embedding
    
    # Get relevant contexts
    res = index.query(vector=xq, top_k=3, include_metadata=True)
    contexts = [
        x['metadata']['text'] for x in res['matches']
    ]

    # Build our prompt with the retrieved contexts included
    prompt_start = (
        "Answer the question based on the context below.\n\n" +
        "Context:\n"
    )
    prompt_end = (
        f"\n\nQuestion: {query}\nAnswer:"
    )
    # Append contexts (best match first) until the character limit is reached.
    # This also covers the cases where Pinecone returns zero or one match.
    selected = []
    for context in contexts:
        if len("\n\n---\n\n".join(selected + [context])) >= limit:
            break
        selected.append(context)
    prompt = prompt_start + "\n\n---\n\n".join(selected) + prompt_end
    return prompt

def process_prompt():
    """
    Processes the prompt by generating a response from the GPT-3.5-turbo model and converting it to speech.
    """
    prompt = get_prompt()
    # Query gpt-3.5-turbo
    response = openai.chat.completions.create(
        model='gpt-3.5-turbo',
        messages=[
            {"role": "system", "content": "You are a memory assistant listening to my conversations."},
            {"role": "user", "content": prompt}
        ]
    )
    text = response.choices[0].message.content
    text_to_speech(text)


# Record a conversation and store it in Pinecone
get_OPENAI_API()
process_context()
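
#### Sketch: how `create_metadata()` chunks a transcript

Sentences are grouped into overlapping windows so that each stored chunk carries some surrounding context. The sketch below repeats that sliding-window logic without the progress bar, using a smaller `window` and `stride` than the notebook's defaults (20 and 4) so the overlap is easy to see. The function name `sliding_windows` is illustrative.

```python
def sliding_windows(sentences, window=20, stride=4):
    """Group sentences into overlapping chunks, keyed by the index of the first sentence."""
    chunks = []
    for start in range(0, len(sentences), stride):
        end = min(len(sentences), start + window)
        chunks.append({"id": start, "text": " ".join(sentences[start:end])})
    return chunks


sentences = [f"Sentence {n}." for n in range(10)]
for chunk in sliding_windows(sentences, window=4, stride=2):
    print(chunk["id"], "->", chunk["text"])
```

The ids restart at 0 for every transcript. Since the notebook uses them as Pinecone vector IDs and an upsert replaces any vector with the same ID, a later recording appears likely to overwrite chunks from an earlier one. Adding a per-recording prefix to the ids is one possible remedy, though that is a suggestion and not something the notebook currently does.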

### Run the cell below to prompt Memoro II

In [None]:
process_prompt()
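
#### Sketch: how the prompt is assembled from retrieved contexts

`get_prompt()` joins the chunks returned by Pinecone and caps the combined context at `limit` characters before appending the question. The following is a minimal standalone sketch of that idea, written so it also handles zero or one retrieved chunk; `build_prompt` and the sample strings are made up for illustration.

```python
SEPARATOR = "\n\n---\n\n"


def build_prompt(query, contexts, limit=3750):
    """Join retrieved contexts, in ranked order, until the character limit is reached."""
    selected = []
    for context in contexts:
        if len(SEPARATOR.join(selected + [context])) >= limit:
            break
        selected.append(context)
    return (
        "Answer the question based on the context below.\n\n"
        "Context:\n"
        + SEPARATOR.join(selected)
        + f"\n\nQuestion: {query}\nAnswer:"
    )


# With a tiny limit only the first (best-ranked) chunk fits.
print(build_prompt("Who was promoted?", ["Sarah was promoted.", "Tom took leave."], limit=30))
```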