## <b><font color='darkblue'>Preface</font></b>
From this notebook, we would like to explore how to integrate our Model A (STT) into [**LangChain**](https://python.langchain.com/v0.1/docs/get_started/introduction) framework.

In [1]:
import os
from os import walk

### <b><font color='darkgreen'>Model A candidates</font></b>
From this [issue #30](https://github.com/Cockatoo-AI-Org/Cockatoo.AI/issues/30), we have evaluate a few model a models:
```
Raw data (LangEnum.cn)
+-------------------------+------------------------------+-------+----------+
|          Input          |             Name             | Score | Time (s) |
+-------------------------+------------------------------+-------+----------+
| cn_20240108_johnlee.wav |    SpeechRecognition/GCP     |  0.36 |   6.27   |
| cn_20240108_johnlee.wav | SpeechRecognition/WhisperAPI |  0.88 |   2.42   |
|  cn_20240121_chung.wav  |    SpeechRecognition/GCP     |  0.93 |   15.1   |
|  cn_20240121_chung.wav  | SpeechRecognition/WhisperAPI |  0.96 |   2.54   |
+-------------------------+------------------------------+-------+----------+

Ranking Table
+------------------------------+------------+--------------+
|             Name             | Avg. score | Avg. Time(s) |
+------------------------------+------------+--------------+
| SpeechRecognition/WhisperAPI |    0.92    |     2.48     |
|    SpeechRecognition/GCP     |    0.65    |    10.68     |
+------------------------------+------------+--------------+
```

"Although the current dataset size may limit our ability to fully assess Model A's capabilities (which we plan to address), <b>we will proceed with integrating `WhisperAPI` due to its superior performance to date</b>.

### <b><font color='darkgreen'>Testing dataset</font></b>
We will put our testing dataset under path `Cockatoo.AI/experiments/voice_dataset` (The root of repo is `Cockatoo.AI/`).

In [2]:
TEST_DATASET_PATH = '../voice_dataset'

In [3]:
for (dirpath, dirnames, filenames) in walk(TEST_DATASET_PATH):
    for file_name in filenames:
        file_path = os.path.join(dirpath, file_name)
        if file_path.endswith('.wav'):
            print(f'{file_path}')

../voice_dataset/what_is_cock_ai.wav


## <b><font color='darkblue'>Integration work</font></b>
Here we are going to explain the integration work step by step.

### <b><font color='darkgreen'>API to use model A</font></b>
Here we are going to introduce how to use our wrapper in usine model a to do speech to text transformation.

In [4]:
from speech_recognition_wrapper import SRWhisperWrapper
from dotenv import load_dotenv
import openai
import wrapper

load_dotenv()
openai.api_key = os.environ['OPENAI_API_KEY']

stt_agent = SRWhisperWrapper(wrapper.LangEnum.en)

In [5]:
resp = stt_agent.audio_2_text('../voice_dataset/what_is_cock_ai.wav')

In [6]:
resp.text

'Could you explain to me what Cockatoo AI is?'

### <b><font color='darkgreen'>LangChain API</font></b>
Here we are going to study and introduce how to integrate our model a wrapper into LangChain API. Let's recall what we have introduced how chains work in LangChain in this section **["Demonstrate the usage of RAG"](https://github.com/Cockatoo-AI-Org/Cockatoo.AI/blob/master/experiments/langchain_lab/README.md#demonstrate-the-usage-of-rag)**:
```python
  chat_model = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)
  retriever = vectorstore.as_retriever(k=most_relevant_doc_count)
  qa_chain_with_rag = (
      {"context": retriever, "question": RunnablePassthrough()}  # 1) Integrate RAG
      | review_prompt_template                                   # 2) Put context retrieved RAG into question
      | chat_model                                               # 3) Feed in LLM
      | StrOutputParser()                                        # 4) Format the output
  )
  return qa_chain_with_rag
```

So we could play with chatbot with RAG this way:
```python
>>> from scripts import doc_loader
>>> doc_path = '...'  # Please replace `...` with doc path we want LLM to search for.
>>> vectorstore = doc_loader.create_vector_db_of_repo([doc_path])
>>> question = 'Where do we put the sample code of LangChain?'
>>> from chatbot_demo import rag
>>> rag_chatbot = rag.get_chatbot(vectorstore)
```

Let's play with it a bit from this notebook:

In [7]:
from langchain.prompts import (
    PromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
    ChatPromptTemplate,
)
from langchain.schema.runnable import RunnablePassthrough
# from langchain_community.chat_models import ChatOpenAI
from langchain_openai import ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from chroma_utils_in_digest_repo_info import prepare_chroma_data, CHROMA_PATH, EMBEDDING_FUNC_NAME, COLLECTION_NAME, TEST_DOC_PATHS
import chromadb
from chromadb.utils import embedding_functions
from more_itertools import batched

import os
import json
import openai
import textwrap

In [8]:
qa_system_template_str = textwrap.dedent("""
Your job is to play senior engineer
to answer questions about how to use the repo `Cockatoo.AI` on Github.

Use the following context to answer questions. Be as detailed as possible,
but don't make up any information that's not from the context.

If you don't know an answer, say you don't know.

{context}
""").strip()

review_system_prompt = SystemMessagePromptTemplate(
    prompt=PromptTemplate(
        input_variables=["context"], template=qa_system_template_str))

review_human_prompt = HumanMessagePromptTemplate(
    prompt=PromptTemplate(
        input_variables=["question"], template="{question}"))

In [9]:
messages = [review_system_prompt, review_human_prompt]
review_prompt_template = ChatPromptTemplate(
    input_variables=["context", "question"],
    messages=messages)

In [10]:
resp = review_prompt_template.invoke(
    {
        'question': 'What is Cockatoo AI?',
        'context': 'Cockatoo AI is a framework to integrate model A as STT, model B as LLM and model C as TTS for further usage.'})

In [11]:
resp

ChatPromptValue(messages=[SystemMessage(content="Your job is to play senior engineer\nto answer questions about how to use the repo `Cockatoo.AI` on Github.\n\nUse the following context to answer questions. Be as detailed as possible,\nbut don't make up any information that's not from the context.\n\nIf you don't know an answer, say you don't know.\n\nCockatoo AI is a framework to integrate model A as STT, model B as LLM and model C as TTS for further usage."), HumanMessage(content='What is Cockatoo AI?')])

Then we could leverage below script to digest the knowledge of Cockatoo AI preserved in the repo. So we could integrate [**RAG**](https://python.langchain.com/v0.2/docs/tutorials/rag/) (Retrieval Augmented Generation) for later usage:
```shell
./chroma_utils_in_digest_repo_info.py 

$ ls -hl cockatoo_repo_embeddings/
total 172K
drwxr-x--- 2 johnkclee primarygroup 4.0K May 12 02:49 4a02abcf-767a-4962-bf71-14117ad65422
-rw-r----- 1 johnkclee primarygroup 164K May 12 02:49 chroma.sqlite3
```

Folder <font color='olive'>`cockatoo_repo_embeddings`</font> will be created to store the knowledge/information of our repo as vectorstore.

In [12]:
test_chroma_dataset = prepare_chroma_data(TEST_DOC_PATHS)

Total 5 documents collected!


In [13]:
test_chroma_dataset['ids']

['1', '2', '3', '4', '5']

In [14]:
document_indices = list(range(len(test_chroma_dataset['documents'])))

print(f'Ingesting doc({len(document_indices)}) ...')
for batch in batched(document_indices, 166):
    print(batch)

Ingesting doc(5) ...
(0, 1, 2, 3, 4)


Then we could retrieve the vectorstore instance for RAG query:

In [15]:
CHROMA_PATH

'cockatoo_repo_embeddings'

In [16]:
chromadb_client = chromadb.PersistentClient(CHROMA_PATH)

In [17]:
# embedding_func = embedding_functions.SentenceTransformerEmbeddingFunction(
#     model_name=EMBEDDING_FUNC_NAME)
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
      model_name="text-embedding-ada-002")

def embed_query(text, embedding_func=openai_ef):
    return embedding_func(text)[0]

openai_ef.embed_query = embed_query

In [18]:
embed_query('test')

[-0.015062220394611359,
 -0.009412189945578575,
 0.008481836877763271,
 -0.016855018213391304,
 -0.0310480035841465,
 0.012631077319383621,
 -0.015469674952328205,
 -0.02628079056739807,
 -0.006105020642280579,
 -0.014125076122581959,
 0.00924241729080677,
 0.02446082979440689,
 0.011951987631618977,
 -0.003948909696191549,
 -0.014070749282836914,
 0.010125234723091125,
 0.04498293623328209,
 0.00306439446285367,
 0.006152557209134102,
 -0.02683764509856701,
 -0.005076199304312468,
 0.010566643439233303,
 -0.005476862657815218,
 -0.004183195531368256,
 -0.009656663052737713,
 -0.0022325089666992426,
 0.02110612392425537,
 -0.03743145242333412,
 0.012013105675578117,
 -0.016909345984458923,
 0.014546112157404423,
 -0.012468095868825912,
 -0.023863229900598526,
 -0.037105489522218704,
 -0.03702399879693985,
 -0.026375863701105118,
 -0.010953725315630436,
 -0.0036059690173715353,
 0.0031900261528789997,
 0.007368128746747971,
 0.010539479553699493,
 0.02144566923379898,
 -0.00651247519999

In [19]:
collection = chromadb_client.get_collection(
    name=COLLECTION_NAME, embedding_function=openai_ef)

In [20]:
# Number of ingested documents
len(collection.get()['ids'])

5

In [21]:
resp = collection.query(
    query_texts=["What is Cockatoo AI?"],
    n_results=2)

In [22]:
resp["documents"][0][0]

'# Text to Speech (TTS) model notes\n\n## General: \n - [TTS Intro](https://huggingface.co/docs/transformers/tasks/text-to-speech)\n - [What is TTS](https://huggingface.co/tasks/text-to-speech) (Suno Bark, MS SpeechT5, MMS TTS)\n\n\n## Resources\n\n### Open AI TTS API\n - Pricing:\n   Model | Usage \n   -- | -- \n   Whisper | $0.006\xa0/ minute (rounded to the nearest second)\n   TTS | $0.015\xa0/ 1K characters\n   TTS HD | $0.030\xa0/ 1K characters\n - [API Tutorial](https://platform.openai.com/docs/guides/text-to-speech)\n\n### [Speech-T5-TTS](https://huggingface.co/microsoft/speecht5_tts)\n - [Interactive Demo](https://huggingface.co/spaces/Zhenhong/text-to-speech-SpeechT5-demo): only support for English, and the generated sound is too monotonic without cadence.\n - Currently the English TTS are good for most models that I have tried. The problem is that when it comes to Chinese-English mixed text, it is hard for model to synthesize the output voice\n\n### [Suno Bark Github](https:/

Finally, let's apply RAG in LLM query:

In [23]:
question = "What is Cockatoo AI?"
context = """
You are a senior software engineer in Cockatoo.AI to answer questions: {}
"""

In [24]:
good_reviews = collection.query(
    query_texts=[question],
    n_results=2,
    include=["documents"],
    # where={"Rating": {"$gte": 3}},
)

In [25]:
reviews_str = ",".join(good_reviews["documents"][0])

In [26]:
'''
answer_resp = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": context.format(reviews_str)},
        {"role": "user", "content": question},
    ],
    temperature=0,
    n=1)
'''
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": context.format(reviews_str)},
        {"role": "user", "content": question},
    ],
    temperature=0,
    n=1)

In [27]:
response

ChatCompletion(id='chatcmpl-9zZSCZgVWnfImVBDNAay00ABj6fE9', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Cockatoo AI is a fictional company in this context, where you are a senior software engineer working on developing a language tutor focused on listening and speaking skills. The goal of Cockatoo AI is to provide users with a platform where they can practice listening and speaking in a foreign language, interact with AI language tutors, and improve their language skills effectively. The language tutor uses various models such as text-to-speech (TTS) and large language models (LLM) to create a personalized and interactive learning experience for users.', role='assistant', function_call=None, tool_calls=None, refusal=None))], created=1724461320, model='gpt-3.5-turbo-0125', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=107, prompt_tokens=1422, total_tokens=1529))

In [28]:
print(response.choices[0].message.content)

Cockatoo AI is a fictional company in this context, where you are a senior software engineer working on developing a language tutor focused on listening and speaking skills. The goal of Cockatoo AI is to provide users with a platform where they can practice listening and speaking in a foreign language, interact with AI language tutors, and improve their language skills effectively. The language tutor uses various models such as text-to-speech (TTS) and large language models (LLM) to create a personalized and interactive learning experience for users.


### <b><font color='darkgreen'>Integrate RAG into LangChain</font></b>
<b><font size='3ptx'>A retrieval system is defined as something that can take string queries and return the most 'relevant' Documents from some source.</font></b>

A retriever follows the standard Runnable interface, and should be used via the standard runnable methods of `invoke`, `ainvoke`, `batch`, `abatch`.

When implementing a custom retriever, the class should implement the `_get_relevant_documents` method to define the logic for retrieving documents. Optionally, an async native implementations can be provided by overriding the `_aget_relevant_documents` method.

Example: A retriever that returns the first 5 documents from a list of documents
```python
from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever
from typing import List

class SimpleRetriever(BaseRetriever):
    docs: List[Document]
    k: int = 5
    
    def _get_relevant_documents(self, query: str) -> List[Document]:
        """Return the first k documents from the list of documents"""
        return self.docs[:self.k]

    async def _aget_relevant_documents(self, query: str) -> List[Document]:
        """(Optional) async native implementation."""
        return self.docs[:self.k]
```

In [29]:
from langchain_chroma import Chroma
from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever

class ChromaRetriever(BaseRetriever):
    pass

In [30]:
#from langchain_community.vectorstores import Chroma


In [31]:
# embedding_func = embedding_functions.SentenceTransformerEmbeddingFunction(
#     model_name=EMBEDDING_FUNC_NAME)
# vectorstore = Chroma(CHROMA_PATH, openai_ef)
vectorstore = Chroma(
    persist_directory=CHROMA_PATH, embedding_function=openai_ef)

In [32]:
vectorstore.similarity_search('model a')

[]

In [33]:
chat_model = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
qa_chain_with_rag = (
    {"context": retriever, "question": RunnablePassthrough()}  # 1) Integrate RAG
    | review_prompt_template                                   # 2) Put context retrieved RAG into question
    | chat_model                                               # 3) Feed in LLM
    | StrOutputParser()                                        # 4) Format the output
)

In [34]:
question = "What is Cockatoo AI?"
qa_chain_with_rag.invoke(question)

'Cockatoo AI is a repository on Github that contains code for a conversational AI system. The system is designed to interact with users in a natural language format, allowing for conversations and interactions similar to those with a human. The repository likely includes code for natural language processing, machine learning models, and other components necessary for building and deploying a conversational AI system.'

### <b><font color='darkgreen'>Integrate model A into LangChain</font></b>
<b><font size='3ptx'>Our final step is to turn voice into text before feeding into LangChain.</font></b>

Here we will leverage [**LCEL**](https://python.langchain.com/v0.1/docs/expression_language/) (Langchain Expression Language) to achieve the goal:

In [35]:
from langchain_core.runnables import RunnableLambda

stt_agent = SRWhisperWrapper(wrapper.LangEnum.en)

def parse_audio_file(data_dict):
    audio_file_path = data_dict['audio_file_path']
    context = data_dict.get('context', 'You are a software engineer and passinate with AI.')
    resp = stt_agent.audio_2_text(audio_file_path)
    return {'question': resp.text, 'context': context}

input_audio_parser = RunnableLambda(parse_audio_file)

In [36]:
prompt_str = "{context}\n{question}"
prompt = ChatPromptTemplate.from_template(prompt_str)

In [37]:
chain = (
    input_audio_parser                                           # 1) Turn speech into text
    | {"context": retriever, "question": RunnablePassthrough()}  # 2) Integrate RAG
    | prompt                                                     # 3) Put context retrieved RAG into question
    | chat_model                                                 # 4) Feed in LLM
    | StrOutputParser()                                          # 5) Format the output
)

In [38]:
resp = chain.invoke({'audio_file_path': '../voice_dataset/what_is_cock_ai.wav'})

In [39]:
resp

'Cockatoo AI is a software platform that utilizes artificial intelligence to provide personalized customer interactions and insights. It uses machine learning algorithms to analyze customer data and behavior, allowing businesses to better understand their customers and tailor their marketing strategies accordingly. As a software engineer passionate about AI, you may find Cockatoo AI to be an exciting tool to work with in developing innovative solutions for businesses.'

## <b><font color='darkblue'>Supplement</font></b>
* [Pinecone - LangChain Expression Language Explained](https://github.com/johnklee/ml_articles/blob/master/others/Langchain_expression_language/notebook.ipynb)
* [OpenAI - Introducing APIs for GPT-3.5 Turbo and Whisper](https://openai.com/index/introducing-chatgpt-and-whisper-apis/)
* [LangChain doc - Chains](https://python.langchain.com/v0.1/docs/modules/chains/)
* [LangChain doc - LangChain Expression Language (LCEL)](https://python.langchain.com/v0.1/docs/expression_language/)
> LangChain Expression Language, or LCEL, is a declarative way to easily compose chains together.
* [LangChain doc > Vectorstore: Chroma](https://python.langchain.com/v0.1/docs/integrations/vectorstores/chroma/)
* [RealPython: Playing and Recording Sound in Python](https://realpython.com/playing-and-recording-sound-python/#recording-audio)
* [Tutorial: Use Chroma and OpenAI to Build a Custom Q&A Bot](https://thenewstack.io/tutorial-use-chroma-and-openai-to-build-a-custom-qa-bot/)