## Clone Repo

In [None]:
!git clone -b vatsal/make_2_5x_faster https://github.com/metavoiceio/metavoice-src.git
%cd metavoice-src

Cloning into 'metavoice-src'...
remote: Enumerating objects: 589, done.
remote: Counting objects: 100% (323/323), done.
remote: Compressing objects: 100% (192/192), done.
remote: Total 589 (delta 220), reused 180 (delta 122), pack-reused 266 (from 1)
Receiving objects: 100% (589/589), 20.25 MiB | 14.38 MiB/s, done.
Resolving deltas: 100% (325/325), done.
/content/metavoice-src


## Install deps

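Upload config.ini to the metavoice-src folder under /content.

Upload sample-audio.mp3 to /content.

Update requirements.txt so that it lists the packages below (shown commented out so that the cell is a no-op when run).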

In [None]:
# torch>=2.1.0  # required to install audiocraft
# librosa
# tqdm
# tiktoken==0.5.1
# audiocraft
# numpy
# ninja
# fastapi
# uvicorn
# tyro
# deepfilternet
# pydub
# gradio
# huggingface_hub
# pinecone-client
# pinecone-text
# pinecone-notebooks
# langchain-community
# langchain-huggingface
# #faster-whisper
# langchain_groq
# groq
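
The code in this notebook reads its API keys from config.ini through configparser, using an [api] section with the keys ngrok_auth_token, pinecone_api_key and groq_api_key. A minimal sketch of a matching file and of how it parses, assuming placeholder values (the strings and the load_api_config helper below are hypothetical, not real credentials or part of the notebook):

```python
import configparser

# Hypothetical config.ini contents; the section and key names match what the
# notebook reads, the values are placeholders to be replaced with real keys.
SAMPLE_CONFIG = """
[api]
ngrok_auth_token = YOUR_NGROK_AUTH_TOKEN
pinecone_api_key = YOUR_PINECONE_API_KEY
groq_api_key = YOUR_GROQ_API_KEY
"""

REQUIRED_KEYS = ("ngrok_auth_token", "pinecone_api_key", "groq_api_key")


def load_api_config(text):
    """Parse config text and return the [api] section as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    api = dict(parser["api"])
    # Fail early with a clear message if a key the notebook needs is absent
    missing = [key for key in REQUIRED_KEYS if key not in api]
    if missing:
        raise KeyError(f"config.ini is missing keys: {missing}")
    return api


api_config = load_api_config(SAMPLE_CONFIG)
print(sorted(api_config))
```

Checking for the required keys up front is a design choice of this sketch; it surfaces a missing key before any network call is attempted.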

In [None]:
!pip install -r requirements.txt
!pip install --upgrade torch torchaudio
!pip install -e .

Collecting torch>=2.1.0 (from -r requirements.txt (line 1))
  Using cached torch-2.1.0-cp310-cp310-manylinux1_x86_64.whl.metadata (25 kB)
Collecting torchaudio<2.1.2,>=2.0.0 (from audiocraft->-r requirements.txt (line 5))
  Using cached torchaudio-2.1.1-cp310-cp310-manylinux1_x86_64.whl.metadata (6.4 kB)
Collecting nvidia-cudnn-cu12==8.9.2.26 (from torch>=2.1.0->-r requirements.txt (line 1))
  Using cached nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-nccl-cu12==2.18.1 (from torch>=2.1.0->-r requirements.txt (line 1))
  Using cached nvidia_nccl_cu12-2.18.1-py3-none-manylinux1_x86_64.whl.metadata (1.8 kB)
Collecting triton==2.1.0 (from torch>=2.1.0->-r requirements.txt (line 1))
  Using cached triton-2.1.0-0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.3 kB)
INFO: pip is looking at multiple versions of torchaudio to determine which version is compatible with other requirements. This could take a while.
Collecting 

In [None]:
!pip install huggingface_hub pinecone-client pinecone-text pinecone-notebooks langchain-community langchain-huggingface langchain_groq groq



## API


In [None]:
!pip install pyngrok

Collecting pyngrok
  Downloading pyngrok-7.2.0-py3-none-any.whl.metadata (7.4 kB)
Downloading pyngrok-7.2.0-py3-none-any.whl (22 kB)
Installing collected packages: pyngrok
Successfully installed pyngrok-7.2.0


In [None]:
## API
import configparser
import os
import time
import warnings

from pinecone import Pinecone, ServerlessSpec
from langchain_community.retrievers import PineconeHybridSearchRetriever
from pinecone_text.sparse import BM25Encoder
# Import from the submodules; the top-level langchain imports are deprecated
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
# HuggingFaceEmbeddings is provided by langchain-huggingface (installed above)
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_groq import ChatGroq
from groq import Groq
from IPython.display import Audio, display
from flask import Flask, request, jsonify, send_file, render_template
from pyngrok import ngrok

warnings.filterwarnings("ignore")

# # Loading Text-to-Speech model --Metavoice
# from fam.llm.fast_inference import TTS
# tts = TTS()

app = Flask(__name__)
config = configparser.ConfigParser()
config.read('config.ini')

# Get the API details
ngrok_auth_token = config['api']['ngrok_auth_token']

ngrok.set_auth_token(ngrok_auth_token)

# Function to connect to Pinecone DB
def connect_to_pinecone(api_key, index_name):
    # Initialize the Pinecone client
    pc = Pinecone(api_key=api_key)

    # Create the index if it does not exist yet
    if index_name not in pc.list_indexes().names():
        pc.create_index(
            name=index_name,
            dimension=768,  # dimension of the dense vectors (all-mpnet-base-v2)
            metric='dotproduct',  # hybrid (sparse + dense) search requires dotproduct
            spec=ServerlessSpec(cloud='aws', region="us-east-1")
        )
    # Return a handle to the index for later queries
    index = pc.Index(index_name)
    return index


# Function to retrieve context using Pinecone
def context_retriever(user_query, index, k=5):
    # Load the dense embedding model
    embeddings = HuggingFaceEmbeddings(model_name='all-mpnet-base-v2')

    # Load the BM25 encoder (default parameters) for sparse vectors
    bm25_encoder = BM25Encoder().default()

    # Use the hybrid search retriever for context extraction
    retriever = PineconeHybridSearchRetriever(
        embeddings=embeddings, sparse_encoder=bm25_encoder, index=index, top_k=k
    )

    context = retriever.invoke(user_query)

    return context



# Function to process audio query
def process_audio(api_key, audio_file_path, model_name):
    # Expose the API key through the environment; Groq() reads GROQ_API_KEY.
    # The variable stays set for the session unless it is deleted below.
    os.environ['GROQ_API_KEY'] = api_key

    # Initialize the Groq client
    client = Groq()

    # Open the audio file and send it for transcription
    with open(audio_file_path, "rb") as file:
        transcription = client.audio.transcriptions.create(
            file=(audio_file_path, file.read()),
            model=model_name,
            # prompt="Specify context or spelling",  # Optional
            # response_format="json",  # Optional
            # language="en",  # Optional
            # temperature=0.0  # Optional
        )

    # # Clean up the environment variable (optional)
    # del os.environ['GROQ_API_KEY']

    # Return the transcribed text
    return transcription.text


# Function to generate LLM response
def generate_llm_response(context, question, api_key, model_name):
    # Use LLaMA 3 for response generation
    llm = ChatGroq(groq_api_key=api_key, model_name= model_name, temperature= 0.2)

    # Define the prompt template using PromptTemplate
    prompt_template = PromptTemplate(
        template="""
        You are an AI assistant of Growexx AI Solutions. Your job is to generate answers for the asked question on the basis of context provided.
        The generated response will further be provided to a voice cloning text-to-speech model to convert the text into your company's
        CEO Vikas Agarwal's voice response. The voice accent is Indian english.

        Ensure the response is clear, concise, and includes appropriate punctuation like
        commas and full stops for natural speech pauses.

        If the required information is not available in the context, respond with:
        "The information you requested is not available in the provided context."

        Context: {context}

        Question: {question}

        Provide the answer with proper punctuation for voice clarity:
        """,
        input_variables=["context", "question"]
    )
    # Optional extra prompt instruction (disabled): "You need to generate responses in not more than 50 words."
    # Chain for RAG (Retrieval-Augmented Generation)
    rag_chain = LLMChain(
        llm=llm,
        prompt=prompt_template
    )

    if not context:
        # Handle case where context is empty
        formatted_context = "No context available."
    else:
        formatted_context = "\n".join(str(item) for item in context)

    response = rag_chain.run(context=formatted_context, question=question)
    return response


# Function to split input text into smaller chunks
def split_text(text, max_length=120):
    if len(text) <= max_length:
        return [text]

    # Drop empty fragments so a trailing '.' does not produce a stray '..'
    sentences = [s.strip() for s in text.split('.') if s.strip()]
    result = []
    current_chunk = ''

    for sentence in sentences:
        if not current_chunk:
            current_chunk = sentence
        elif len(current_chunk) + len(sentence) + 3 <= max_length:
            # Rejoin with '. ' so the space after each full stop is kept
            current_chunk += '. ' + sentence
        else:
            result.append(current_chunk + ".")
            current_chunk = sentence

    if current_chunk:
        result.append(current_chunk + ".")

    return result

def produce_output(user_query):
    # Read the config file
    config = configparser.ConfigParser()
    config.read('config.ini')

    # Get the API details
    api_config = config['api']
    pinecone_api_key = api_config['pinecone_api_key']
    groq_api_key = api_config['groq_api_key']

    # Required index
    index_name = "knowledge-base"
    llm_model = 'llama3-70b-8192'
    stt_model = 'whisper-large-v3'

    # Check if user_query is audio or text
    if user_query.endswith('.mp3'):
        text_query = process_audio(groq_api_key, user_query, stt_model)
    else:
        text_query = user_query

    # Connect to knowledge base
    index = connect_to_pinecone(pinecone_api_key, index_name)

    # Retrieve context
    content = context_retriever(text_query, index)
    context = []
    for i in content:
        context.append(i.page_content)

    # Generate response using LLM
    text_response = generate_llm_response(context, text_query, groq_api_key, llm_model)

    # Split the LLM response into chunks
    splitted_text = split_text(text_response)
    # print("Splitted text chunks:")
    # for chunk in splitted_text:
    #     print(chunk, "len of chunk", len(chunk))

    return text_response





In [None]:
print(produce_output("What is Machine learning"))

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


Machine Learning is a part or branch of Artificial Intelligence that trains machines to learn from experience. ML models can be fed labeled training data, known as supervised learning, or raw, unlabeled data, known as unsupervised learning. These models then copy the way humans learn, by trial or error, and after some time, the well-trained models will make accurate predictions.


In [None]:
import time

# Function to connect to Pinecone DB
def connect_to_pinecone(api_key, index_name):
    start_time = time.time()

    # Initialize pinecone client
    pc = Pinecone(api_key=api_key)

    # create index if it does not exist
    if index_name not in pc.list_indexes().names():
        pc.create_index(
            name=index_name,
            dimension=768,  # dimension of dense vector
            metric='dotproduct',  # hybrid (sparse + dense) search requires dotproduct
            spec=ServerlessSpec(cloud='aws', region="us-east-1")
        )

    index = pc.Index(index_name)

    end_time = time.time()
    print(f"Time taken to connect to Pinecone: {end_time - start_time:.2f} seconds")

    return index


# Function to retrieve context using Pinecone
def context_retriever(user_query, index, k=5):
    start_time = time.time()

    # Load the model for embeddings
    embeddings = HuggingFaceEmbeddings(model_name='all-mpnet-base-v2')

    # Load the model for sparse matrix encoding
    bm25_encoder = BM25Encoder().default()

    # Use the hybrid search retriever for context extraction
    retriever = PineconeHybridSearchRetriever(embeddings=embeddings, sparse_encoder=bm25_encoder, index=index, top_k=k)
    context = retriever.invoke(user_query)

    end_time = time.time()
    print(f"Time taken to retrieve context: {end_time - start_time:.2f} seconds")

    return context


# Function to process audio query
def process_audio(api_key, audio_file_path, model_name):
    start_time = time.time()

    os.environ['GROQ_API_KEY'] = api_key
    client = Groq()

    with open(audio_file_path, "rb") as file:
        transcription = client.audio.transcriptions.create(
            file=(audio_file_path, file.read()),
            model=model_name,
        )

    end_time = time.time()
    print(f"Time taken to process audio: {end_time - start_time:.2f} seconds")

    return transcription.text


# Function to generate LLM response
def generate_llm_response(context, question, api_key, model_name):
    start_time = time.time()

    llm = ChatGroq(groq_api_key=api_key, model_name=model_name, temperature=0.2)
    prompt_template = PromptTemplate(
        template="""You are an AI assistant of Growexx AI Solutions. Your job is to generate answers for the asked question on the basis of context provided.
        The generated response will further be provided to a voice cloning text-to-speech model to convert the text into your company's
        CEO Vikas Agarwal's voice response. The voice accent is Indian english.

        Ensure the response is clear, concise, and includes appropriate punctuation like
        commas and full stops for natural speech pauses.

        If the required information is not available in the context, respond with:
        "The information you requested is not available in the provided context."

        Context: {context}

        Question: {question}

        Provide the answer with proper punctuation for voice clarity:
        """,
        input_variables=["context", "question"]
    )

    rag_chain = LLMChain(llm=llm, prompt=prompt_template)

    formatted_context = "\n".join(str(item) for item in context) if context else "No context available."
    response = rag_chain.run(context=formatted_context, question=question)

    end_time = time.time()
    print(f"Time taken to generate LLM response: {end_time - start_time:.2f} seconds")

    return response


# Function to split input text into smaller chunks
def split_text(text, max_length=120):
    start_time = time.time()

    # Short responses return early, so no timing line is printed for them
    if len(text) <= max_length:
        return [text]

    # Drop empty fragments so a trailing '.' does not produce a stray '..'
    sentences = [s.strip() for s in text.split('.') if s.strip()]
    result = []
    current_chunk = ''

    for sentence in sentences:
        if not current_chunk:
            current_chunk = sentence
        elif len(current_chunk) + len(sentence) + 3 <= max_length:
            # Rejoin with '. ' so the space after each full stop is kept
            current_chunk += '. ' + sentence
        else:
            result.append(current_chunk + ".")
            current_chunk = sentence

    if current_chunk:
        result.append(current_chunk + ".")

    end_time = time.time()
    print(f"Time taken to split text: {end_time - start_time:.2f} seconds")

    return result


def produce_output(user_query):
    start_time = time.time()

    config = configparser.ConfigParser()
    config.read('config.ini')

    api_config = config['api']
    pinecone_api_key = api_config['pinecone_api_key']
    groq_api_key = api_config['groq_api_key']

    index_name = "knowledge-base"
    llm_model = 'llama3-70b-8192'
    stt_model = 'whisper-large-v3'

    if user_query.endswith('.mp3'):
        text_query = process_audio(groq_api_key, user_query, stt_model)
    else:
        text_query = user_query

    index = connect_to_pinecone(pinecone_api_key, index_name)
    content = context_retriever(text_query, index)

    context = [i.page_content for i in content]
    text_response = generate_llm_response(context, text_query, groq_api_key, llm_model)
    splitted_text = split_text(text_response)

    end_time = time.time()
    print(f"Time taken to produce output: {end_time - start_time:.2f} seconds")

    return text_response
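
Each function in the cell above repeats the same start_time/end_time pattern. As one possible simplification, the sketch below wraps that pattern in a decorator. The names timed and first_sentence are hypothetical helpers for illustration, not part of the notebook.

```python
import functools
import time


def timed(label):
    """Decorator that prints how long each call to the wrapped function takes."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start_time = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs on every exit path, including early returns
                elapsed = time.perf_counter() - start_time
                print(f"Time taken to {label}: {elapsed:.2f} seconds")
        return wrapper
    return decorator


@timed("split text")
def first_sentence(text):
    # Stand-in for a real pipeline step
    return text.split('.')[0]


print(first_sentence("Hello there. General remarks."))
```

Because the print sits in a finally block, it also fires when the wrapped function returns early, which the inline timing in split_text does not do.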


In [None]:
print(produce_output("Who is Founder of Growexx?"))

Time taken to connect to Pinecone: 0.88 seconds
Time taken to retrieve context: 10.75 seconds
Time taken to generate LLM response: 0.57 seconds
Time taken to produce output: 12.31 seconds
The Founder of GrowExx is Vikas Agarwal.


In [None]:
print(produce_output("Tell me about Vikas Agarwal?"))

Time taken to connect to Pinecone: 0.72 seconds
Time taken to retrieve context: 10.39 seconds
Time taken to generate LLM response: 0.68 seconds
Time taken to split text: 0.00 seconds
Time taken to produce output: 11.89 seconds
Vikas Agarwal is the Founder of GrowExx, a Digital Product Development Company specializing in Product Engineering, Data Engineering, Business Intelligence, Web and Mobile Applications. His expertise lies in Technology Innovation, Product Management, Building, and nurturing strong and self-managed high-performing Agile teams.


In [None]:
print(produce_output("What is Pinecone?"))

Time taken to connect to Pinecone: 0.78 seconds
Time taken to retrieve context: 9.70 seconds
Time taken to generate LLM response: 1.12 seconds
Time taken to split text: 0.00 seconds
Time taken to produce output: 11.77 seconds
Pinecone is a completely controlled vector database that can transform your AI and ML projects. With our expertise, you can integrate Pinecone with ease, provide real-time data updates, and offer reasonably priced solutions.


In [None]:
print(produce_output("What is Machine learning?"))

Time taken to connect to Pinecone: 0.66 seconds
Time taken to retrieve context: 11.05 seconds
Time taken to generate LLM response: 6.77 seconds
Time taken to produce output: 18.60 seconds
Machine Learning is a part or branch of Artificial Intelligence that trains machines to learn from experience.


In [None]:
print(produce_output("What is Product Roadmap for a Digital Platform for Algo Trading Software?"))

Time taken to connect to Pinecone: 0.74 seconds
Time taken to retrieve context: 11.41 seconds
Time taken to generate LLM response: 0.84 seconds
Time taken to split text: 0.00 seconds
Time taken to produce output: 13.10 seconds
The Product Roadmap for a Digital Platform for Algo Trading Software is a solution that integrates all products and services on one platform, managing access to paid and free subscribers. This roadmap was created through a product discovery workshop, where we held an ice-breaking conversation with the founders to understand the domain, defined and analyzed user personas and competition, and asked tough questions to understand the complete picture. The primary goals of this exercise were to clearly define the product, chart out a roadmap towards building the product in less time, and prepare a scale-up roadmap.
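
In the runs above, retrieving context takes roughly 10 to 11 seconds per query, while connecting to Pinecone takes under a second. One likely cause is that context_retriever builds the embedding model and the BM25 encoder again on every call. A minimal sketch of loading an expensive object only once, using functools.lru_cache; load_model and retrieve are hypothetical stand-ins, and the sleep simulates a slow model load rather than performing one:

```python
import functools
import time

load_count = 0


@functools.lru_cache(maxsize=None)
def load_model(name):
    """Hypothetical stand-in for an expensive model load; runs once per name."""
    global load_count
    load_count += 1
    time.sleep(0.1)  # simulate slow initialization
    return {"name": name}


def retrieve(query):
    # The cached loader returns the same object after the first call
    model = load_model("all-mpnet-base-v2")
    return f"{model['name']}: {query}"


retrieve("first query")   # pays the load cost
retrieve("second query")  # reuses the cached model
print(load_count)
```

An equivalent alternative is to create the objects once at module level and let the functions refer to them.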


In [None]:
## API
import configparser
import os
import time
import warnings

from pinecone import Pinecone, ServerlessSpec
from langchain_community.retrievers import PineconeHybridSearchRetriever
from pinecone_text.sparse import BM25Encoder
# Import from the submodules; the top-level langchain imports are deprecated
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
# HuggingFaceEmbeddings is provided by langchain-huggingface (installed above)
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_groq import ChatGroq
from groq import Groq
from IPython.display import Audio, display
from flask import Flask, request, jsonify, send_file, render_template
from pyngrok import ngrok

warnings.filterwarnings("ignore")


In [None]:
config = configparser.ConfigParser()
config.read('config.ini')

api_config = config['api']
pinecone_api_key = api_config['pinecone_api_key']
groq_api_key = api_config['groq_api_key']
ngrok_auth_token = api_config['ngrok_auth_token']
ngrok.set_auth_token(ngrok_auth_token)

index_name = "knowledge-base"
llm_model = 'llama3-70b-8192'
stt_model = 'whisper-large-v3'

# Initialize the Pinecone client once, at module level
pc = Pinecone(api_key=pinecone_api_key)

# Create the index if it does not exist
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=768,  # dimension of dense vector
        metric='dotproduct',  # hybrid (sparse + dense) search requires dotproduct
        spec=ServerlessSpec(cloud='aws', region="us-east-1")
    )
index = pc.Index(index_name)


In [None]:
# Load the dense embedding model once, at module level
embeddings = HuggingFaceEmbeddings(model_name='all-mpnet-base-v2')

# Load the BM25 encoder for sparse vectors
bm25_encoder = BM25Encoder().default()

# Build the hybrid retriever once so that every query reuses it
retriever = PineconeHybridSearchRetriever(embeddings=embeddings, sparse_encoder=bm25_encoder, index=index, top_k=5)

In [None]:
# Function to retrieve context using the module-level hybrid retriever
def context_retriever(user_query):
    start_time = time.time()

    # Query the shared hybrid search retriever
    context = retriever.invoke(user_query)

    end_time = time.time()
    print(f"Time taken to retrieve context: {end_time - start_time:.2f} seconds")

    return context


# Function to process audio query
def process_audio(api_key, audio_file_path, model_name):
    start_time = time.time()

    os.environ['GROQ_API_KEY'] = api_key
    client = Groq()

    with open(audio_file_path, "rb") as file:
        transcription = client.audio.transcriptions.create(
            file=(audio_file_path, file.read()),
            model=model_name,
        )

    end_time = time.time()
    print(f"Time taken to process audio: {end_time - start_time:.2f} seconds")

    return transcription.text


# Function to generate LLM response
def generate_llm_response(context, question, api_key, model_name):
    start_time = time.time()

    llm = ChatGroq(groq_api_key=api_key, model_name=model_name, temperature=0.2)
    prompt_template = PromptTemplate(
        template="""You are an AI assistant of Growexx AI Solutions. Your job is to generate answers for the asked question on the basis of context provided.
        The generated response will further be provided to a voice cloning text-to-speech model to convert the text into your company's
        CEO Vikas Agarwal's voice response. The voice accent is Indian english.

        Ensure the response is clear, concise, and includes appropriate punctuation like
        commas and full stops for natural speech pauses.

        If the required information is not available in the context, respond with:
        "The information you requested is not available in the provided context."

        Context: {context}

        Question: {question}

        Provide the answer with proper punctuation for voice clarity:
        """,
        input_variables=["context", "question"]
    )

    rag_chain = LLMChain(llm=llm, prompt=prompt_template)

    formatted_context = "\n".join(str(item) for item in context) if context else "No context available."
    response = rag_chain.run(context=formatted_context, question=question)

    end_time = time.time()
    print(f"Time taken to generate LLM response: {end_time - start_time:.2f} seconds")

    return response


# Function to split input text into smaller chunks
def split_text(text, max_length=120):
    start_time = time.time()

    # Short responses return early, so no timing line is printed for them
    if len(text) <= max_length:
        return [text]

    # Drop empty fragments so a trailing '.' does not produce a stray '..'
    sentences = [s.strip() for s in text.split('.') if s.strip()]
    result = []
    current_chunk = ''

    for sentence in sentences:
        if not current_chunk:
            current_chunk = sentence
        elif len(current_chunk) + len(sentence) + 3 <= max_length:
            # Rejoin with '. ' so the space after each full stop is kept
            current_chunk += '. ' + sentence
        else:
            result.append(current_chunk + ".")
            current_chunk = sentence

    if current_chunk:
        result.append(current_chunk + ".")

    end_time = time.time()
    print(f"Time taken to split text: {end_time - start_time:.2f} seconds")

    return result


def produce_output(user_query):
    start_time = time.time()


    if user_query.endswith('.mp3'):
        text_query = process_audio(groq_api_key, user_query, stt_model)
    else:
        text_query = user_query


    content = context_retriever(text_query)

    context = [i.page_content for i in content]
    text_response = generate_llm_response(context, text_query, groq_api_key, llm_model)
    splitted_text = split_text(text_response)

    end_time = time.time()
    print(f"Time taken to produce output: {end_time - start_time:.2f} seconds")

    return text_response


In [None]:
print(produce_output("Who is Founder of Growexx?"))

Time taken to retrieve context: 1.17 seconds
Time taken to generate LLM response: 0.91 seconds
Time taken to produce output: 2.08 seconds
The Founder of GrowExx is Vikas Agarwal.


In [None]:
print(produce_output("Tell me about Vikas Agarwal?"))

Time taken to retrieve context: 0.55 seconds
Time taken to generate LLM response: 1.10 seconds
Time taken to split text: 0.00 seconds
Time taken to produce output: 1.65 seconds
Vikas Agarwal is the Founder of GrowExx, a Digital Product Development Company specializing in Product Engineering, Data Engineering, Business Intelligence, Web and Mobile Applications. His expertise lies in Technology Innovation, Product Management, Building, and nurturing strong and self-managed high-performing Agile teams.


In [None]:
print(produce_output("What is Pinecone?"))

Time taken to retrieve context: 0.57 seconds
Time taken to generate LLM response: 0.57 seconds
Time taken to produce output: 1.16 seconds
Pinecone is a completely controlled vector database that can transform your AI and ML projects.


In [None]:
print(produce_output("What is Machine learning?"))

Time taken to retrieve context: 0.69 seconds
Time taken to generate LLM response: 1.00 seconds
Time taken to split text: 0.00 seconds
Time taken to produce output: 1.69 seconds
Machine Learning is a part of Artificial Intelligence that trains machines to learn from experience. It's a branch of AI that enables machines to learn, just like humans do, through trial and error. Machine Learning models can be fed labeled training data, known as supervised learning, or raw, unlabeled data, known as unsupervised learning. Over time, these well-trained models make accurate predictions, and they're widely used in applications that involve forecasting, segmentation, clustering, and recommendations.


In [None]:
print(produce_output("What is Product Roadmap for a Digital Platform for Algo Trading Software?"))

Time taken to retrieve context: 0.61 seconds
Time taken to generate LLM response: 3.81 seconds
Time taken to split text: 0.00 seconds
Time taken to produce output: 4.42 seconds
The Product Roadmap for a Digital Platform for Algo Trading Software is a solution that integrates all products and services on one platform, managing access to paid and free subscribers. It involves a product discovery workshop, where we define the product, chart out a roadmap towards building the product in less time, and prepare a scale-up roadmap.
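
The "Time taken to produce output" figures logged in this notebook can be compared directly: the earlier runs connected to Pinecone and built the retriever inside every call, while these runs build them once at module level. A short sketch that computes the per-query speedup from those logged values (the dictionary keys are abbreviations of the five queries, chosen here for illustration):

```python
# Total "produce output" times in seconds, copied from the logged runs
before = {"founder": 12.31, "vikas": 11.89, "pinecone": 11.77,
          "ml": 18.60, "roadmap": 13.10}
after = {"founder": 2.08, "vikas": 1.65, "pinecone": 1.16,
         "ml": 1.69, "roadmap": 4.42}

# Ratio of the old total to the new total for each query
speedups = {name: before[name] / after[name] for name in before}

for name, ratio in speedups.items():
    print(f"{name:>8}: {before[name]:6.2f}s -> {after[name]:5.2f}s ({ratio:.1f}x)")

mean_before = sum(before.values()) / len(before)
mean_after = sum(after.values()) / len(after)
print(f"mean: {mean_before:.2f}s -> {mean_after:.2f}s")
```

These ratios are indicative only: LLM latency varies between runs, and the generated answers differ in length from one run to the next.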
