<a href="https://colab.research.google.com/github/durgaprasad-2103/Audio-Chatbot--A-Generative-AI-Project/blob/main/Audio_Chatbot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Audio-Chatbot--A-Generative-AI-Project
Audio Chatbot is a Generative AI project for spoken question answering over your own documents. It loads PDFs, indexes them in a vector store, and answers questions with a LangChain conversational retrieval chain backed by OpenAI's GPT-3.5. In this notebook, questions are typed into a Gradio chat interface and each answer is converted to speech with gTTS.

The !pip install command below installs the Python packages the notebook needs: openai, langchain, tiktoken, pypdf, unstructured[local-inference], gradio, and chromadb.

In [1]:
%%capture
!pip install openai langchain  tiktoken pypdf unstructured[local-inference] gradio chromadb


This cell imports the langchain components used throughout the notebook: CharacterTextSplitter for splitting text, the Pinecone and Chroma vector stores, OpenAIEmbeddings for embedding text, ConversationalRetrievalChain for question answering with chat history, and the ChatOpenAI chat model. ChatOpenAI is imported here, although the chain built later uses the OpenAI model class instead.

In [2]:
import os
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Pinecone, Chroma
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI

Here the OpenAI API key is set as an environment variable. Replace the placeholder with your own key.

In [3]:

os.environ['OPENAI_API_KEY'] ="use openai api key "
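
Hardcoding a key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming the key is either already present in the environment or typed in at a hidden prompt. The helper name ensure_api_key is hypothetical and not part of openai or langchain.

```python
import os
from getpass import getpass


def ensure_api_key(name="OPENAI_API_KEY", env=None, prompt=getpass):
    """Make sure `name` is set in `env`, asking for it only when it is missing.

    `ensure_api_key` is a hypothetical helper for illustration.
    """
    env = os.environ if env is None else env
    if not env.get(name, "").strip():
        # getpass hides the typed value, so the key never appears in cell output
        env[name] = prompt(f"Enter {name}: ").strip()
    return env[name]


# Usage in a notebook cell (commented out so the sketch runs without prompting):
# ensure_api_key("OPENAI_API_KEY")
```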

The code uses langchain's DirectoryLoader to load every PDF found in the specified Google Drive directory. The glob parameter makes the search recursive, so PDFs in subdirectories are loaded too. The commented-out loaders show how Markdown and text files could be loaded in the same way.

In [4]:
from langchain.document_loaders import DirectoryLoader

pdf_loader = DirectoryLoader('/content/drive/MyDrive/team1 gen ai/', glob="**/*.pdf")
#readme_loader = DirectoryLoader('/content/Documents/', glob="**/*.md")
#txt_loader = DirectoryLoader('/content/Documents/', glob="**/*.txt")
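
To see which files a recursive pattern such as the one passed to glob picks up, here is a small sketch that uses pathlib as a stand-in for the loader's file discovery. It assumes the matching behaves the same way. The helper find_pdfs and the file names other than attention.pdf are made up for illustration.

```python
import tempfile
from pathlib import Path


def find_pdfs(root):
    """Return PDF paths under root (relative to it) matched by the recursive pattern."""
    root = Path(root)
    return sorted(p.relative_to(root).as_posix() for p in root.glob("**/*.pdf"))


# Build a throwaway directory tree to try the pattern on
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "papers").mkdir()
    (base / "attention.pdf").touch()            # PDF at the top level
    (base / "papers" / "gpt4all.pdf").touch()   # PDF in a subdirectory
    (base / "notes.txt").touch()                # ignored: not a PDF
    matched = find_pdfs(base)

print(matched)  # ['attention.pdf', 'papers/gpt4all.pdf']
```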

The code collects the loaders in a list (here only pdf_loader), calls load() on each one, and extends the documents list with the result. Adding another source only requires appending its loader to the list.

In [5]:
# collect all the loaders
loaders = [pdf_loader]

# load every document into a single list
documents = []
for loader in loaders:
    documents.extend(loader.load())

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


The code prints the number of loaded documents and the character count of the first document's content.

In [6]:
print(f'You have {len(documents)} document(s) in your data')
print(f'There are {len(documents[0].page_content)} characters in your document')

You have 2 document(s) in your data
There are 33002 characters in your document


In [7]:
documents[0]

Document(page_content='Attention Is All You Need\n\nAshish Vaswani∗ Google Brain avaswani@google.com\n\nNoam Shazeer∗ Google Brain noam@google.com\n\nNiki Parmar∗ Google Research nikip@google.com\n\nJakob Uszkoreit∗ Google Research usz@google.com\n\nLlion Jones∗ Google Research llion@google.com\n\nAidan N. Gomez∗ † University of Toronto aidan@cs.toronto.edu\n\nŁukasz Kaiser∗ Google Brain lukaszkaiser@google.com\n\nIllia Polosukhin∗ ‡ illia.polosukhin@gmail.com\n\nAbstract\n\nThe dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and

The code initializes a CharacterTextSplitter with a chunk size of 1000 characters and a chunk overlap of 40 characters. It then splits the documents into smaller chunks that can be embedded and retrieved individually, and prints the resulting number of chunks.

In [8]:
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=40) #chunk overlap seems to work better
documents = text_splitter.split_documents(documents)
print(len(documents))

49


In [9]:
documents[0]

Document(page_content='Attention Is All You Need\n\nAshish Vaswani∗ Google Brain avaswani@google.com\n\nNoam Shazeer∗ Google Brain noam@google.com\n\nNiki Parmar∗ Google Research nikip@google.com\n\nJakob Uszkoreit∗ Google Research usz@google.com\n\nLlion Jones∗ Google Research llion@google.com\n\nAidan N. Gomez∗ † University of Toronto aidan@cs.toronto.edu\n\nŁukasz Kaiser∗ Google Brain lukaszkaiser@google.com\n\nIllia Polosukhin∗ ‡ illia.polosukhin@gmail.com\n\nAbstract', metadata={'source': '/content/drive/MyDrive/team1 gen ai/attention.pdf'})

In [10]:
documents[1]

Document(page_content='Abstract\n\nThe dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring signiﬁcantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English- to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.0 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the l
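
The first two chunks above both contain "Abstract": the short trailing piece of one chunk is repeated at the start of the next because it fits inside the overlap budget. The sketch below imitates that behavior with a simplified splitter. It is an approximation written for illustration, not CharacterTextSplitter's actual implementation, and the function name split_text and the sample text are assumptions.

```python
def split_text(text, chunk_size=1000, chunk_overlap=40, separator="\n\n"):
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size
    characters, repeating trailing pieces that fit within chunk_overlap."""
    pieces = [p for p in text.split(separator) if p]
    chunks, current = [], []
    for piece in pieces:
        candidate = separator.join(current + [piece])
        if current and len(candidate) > chunk_size:
            chunks.append(separator.join(current))
            # Carry over trailing pieces while they still fit in the overlap budget
            carried = []
            for prev in reversed(current):
                if len(separator.join([prev] + carried)) > chunk_overlap:
                    break
                carried.insert(0, prev)
            current = carried
        current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks


# Small made-up text so the overlap is easy to see
sample = "\n\n".join(["Title", "Authors " + "x" * 50, "Abstract", "Body " + "y" * 60])
chunks = split_text(sample, chunk_size=80, chunk_overlap=10)
for chunk in chunks:
    print(repr(chunk[:30]), len(chunk))
```

With these toy settings, "Abstract" (8 characters) fits in the 10-character overlap and therefore ends the first chunk and starts the second.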

The code creates an OpenAIEmbeddings instance, which uses OpenAI's embedding models to convert text into numerical vectors that capture semantic meaning. These vectors make semantic search and question answering over the documents possible.

In [11]:
embeddings = OpenAIEmbeddings()



Both Chroma and Pinecone were tried as vector stores. The Chroma version comes first.

In [12]:
from langchain.vectorstores import Chroma

In [13]:
vectorstore = Chroma.from_documents(documents, embeddings)

Next, the Pinecone client library is installed.

In [14]:
%%capture
!pip install pinecone-client

In [15]:
!pip install -q --upgrade pinecone-client==2.2.4
import pinecone

In [16]:
%%capture
!pip install -q pinecone-client==2.2.4  # stay on the 2.x client: pinecone.init() used below belongs to that API


The code initializes Pinecone, a managed vector search service, with an API key and environment. It then embeds the documents with the provided embeddings and stores the vectors in the index named "langchain-demo" for similarity search.

In [None]:
import pinecone

# initialize pinecone
pinecone.init(
    api_key="use pine cone api key",  # find at app.pinecone.io
    environment= "gcp-starter"  # next to api key in console
)

index_name = "langchain-demo"

vectorstore = Pinecone.from_documents(documents, embeddings, index_name=index_name)

In [17]:
# if you already have an index, you can load it like this
import pinecone
from tqdm.autonotebook import tqdm

# initialize pinecone
pinecone.init(
    api_key="use pine cone api key",  # find at app.pinecone.io
    environment= "gcp-starter" # next to api key in console
)

index_name = "langchain-demo"
vectorstore = Pinecone.from_existing_index(index_name, embeddings)

The code defines a query and runs a similarity search against the existing vector store. The results are stored in docs, the number of matching documents is shown, and the content of the top two matches is printed.

In [18]:
query = "Who are the authors of gpt4all paper ?"
docs = vectorstore.similarity_search(query)

In [19]:
len(docs)

4
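
Conceptually, a similarity search embeds the query, scores every stored vector against it, and returns the best matches. The sketch below shows that idea with cosine similarity on hand-written toy vectors. Both are assumptions: the real vectors come from OpenAIEmbeddings, and the metric actually used depends on how the vector store or index is configured.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, indexed, k=4):
    """Return the texts of the k stored (text, vector) pairs closest to the query."""
    ranked = sorted(indexed, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy 3-dimensional "embeddings" (made up for illustration)
indexed = [
    ("author list", [1.0, 0.0, 0.0]),
    ("training cost", [0.0, 1.0, 0.0]),
    ("license and authors", [0.7, 0.7, 0.0]),
]
query_vec = [1.0, 0.1, 0.0]  # pretend embedding of a question about authors

print(top_k(query_vec, indexed, k=2))  # ['author list', 'license and authors']
```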

In [20]:
print(docs[0].page_content)

4 Use Considerations

The authors release data and training details in hopes that it will accelerate open LLM research, particularly in the domains of alignment and inter- pretability. GPT4All model weights and data are intended and licensed only for research purposes and any commercial use is prohibited. GPT4All is based on LLaMA, which has a non-commercial license. The assistant data is gathered from Ope- nAI’s GPT-3.5-Turbo, whose terms of use pro-

hibit developing models that compete commer- cially with OpenAI.

References

Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2021. Lora: Low-rank adaptation of large language models.

Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li, Carlos Guestrin, Percy Liang, and Tatsunori B. Hashimoto. 2023. Stan- ford alpaca: An instruction-following llama model. https://github.com/tatsu-lab/ stanford_alpaca.


In [21]:
print(docs[1].page_content)

GPT4All: Training an Assistant-style Chatbot with Large Scale Data Distillation from GPT-3.5-Turbo

Yuvanesh Anand yuvanesh@nomic.ai

Zach Nussbaum zanussbaum@gmail.com

Brandon Duderstadt brandon@nomic.ai

Benjamin Schmidt ben@nomic.ai

Andriy Mulyar andriy@nomic.ai

Abstract

This preliminary technical report describes the development of GPT4All, a chatbot trained over a massive curated corpus of assistant in- teractions including word problems, story de- scriptions, multi-turn dialogue, and code. We openly release the collected data, data cura- tion procedure, training code, and final model weights to promote open research and repro- ducibility. Additionally, we release quantized 4-bit versions of the model allowing virtually anyone to run the model on CPU.

1 Data Collection and Curation



The code turns the vector store into a retriever that performs similarity search and returns the two closest chunks (k=2). A ConversationalRetrievalChain then combines this retriever with OpenAI's model to answer questions using the chat history. After each query, the question and answer pair is appended to chat_history so that follow-up questions keep their conversational context.

In [22]:
from langchain.llms import OpenAI

In [23]:
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k":2})
qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), retriever)



In [24]:
chat_history = []
query = "How much is spent for training the model?"
result = qa({"question": query, "chat_history": chat_history})
result["answer"]



' $800'

In [25]:
chat_history.append((query, result["answer"]))
chat_history

[('How much is spent for training the model?', ' $800')]

In [26]:
query = "What is this number multiplied by 10?"
result = qa({"question": query, "chat_history": chat_history})
result["answer"]

'\nIf this number was multiplied by 10, the total cost for training the model would be $1000.'
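
A follow-up such as "What is this number multiplied by 10?" only makes sense together with the earlier turns. A conversational retrieval chain handles this by first asking the model to rewrite the follow-up as a standalone question, and only then retrieving documents. The sketch below shows roughly what such a rewriting prompt could look like. The function names and the exact template wording are illustrative assumptions, not the chain's actual code.

```python
def format_history(chat_history):
    """Render (question, answer) tuples as alternating Human/Assistant lines."""
    return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in chat_history)


def build_standalone_prompt(chat_history, question):
    """Build the prompt that asks the model to rewrite a follow-up question.

    With an empty history there is nothing to resolve, so the question is
    returned unchanged.
    """
    if not chat_history:
        return question
    return (
        "Given the following conversation and a follow up question, "
        "rephrase the follow up question to be a standalone question.\n\n"
        f"Chat History:\n{format_history(chat_history)}\n"
        f"Follow Up Input: {question}\n"
        "Standalone question:"
    )


history = [("How much is spent for training the model?", " $800")]
print(build_standalone_prompt(history, "What is this number multiplied by 10?"))
```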

The code sets up a simple chat interface inside Colab using IPython widgets. Users enter questions in the text box, and each question and answer is displayed below it. This interface only works inside the notebook and was added as a quick experiment.

In [28]:
from IPython.display import display
import ipywidgets as widgets

In [29]:
chat_history = []

def on_submit(_):
    query = input_box.value
    input_box.value = ""

    if query.lower() == 'exit':
        print("Thanks for the chat!")
        return

    result = qa({"question": query, "chat_history": chat_history})
    chat_history.append((query, result['answer']))

    display(widgets.HTML(f'User: {query}'))
    display(widgets.HTML(f'Chatbot: {result["answer"]}'))

print("Chat with your data. Type 'exit' to stop")

input_box = widgets.Text(placeholder='Please enter your question:')
input_box.on_submit(on_submit)

display(input_box)

Chat with your data. Type 'exit' to stop


Text(value='', placeholder='Please enter your question:')

HTML(value='User: User: who are the author')

HTML(value='Chatbot:  The authors are Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-An…

HTML(value='User: what is gpt?')

HTML(value='Chatbot: \nGPT stands for Generative Pre-trained Transformer, which is a type of large language mo…

THE MAIN CHATBOT


The code sets up a Gradio interface with a Chatbot block, a Textbox for user input, and a Clear button. The respond function passes the user message and the chat history to the Conversational Retrieval Chain, appends the answer to the history, and converts the answer to speech with gTTS so it can be played back. The Clear button resets the conversation.

In [30]:
%%capture
!pip install SpeechRecognition pyttsx3 gtts
!pip install gradio --upgrade
!sudo apt-get install -y espeak

In [31]:
import gradio as gr
from gtts import gTTS
from IPython.display import Audio, display

def text_to_speech(text):
    tts = gTTS(text=text, lang='en')
    tts.save('output.mp3')
    display(Audio(filename='output.mp3', autoplay=True))

def respond(user_message, chat_history):
    print(user_message)
    print(chat_history)
    if chat_history:
        chat_history = [tuple(sublist) for sublist in chat_history]
        print(chat_history)

    # Get response from QA chain
    response = qa({"question": user_message, "chat_history": chat_history})

    # Append user message and response to chat history
    chat_history.append((user_message, response["answer"]))
    print(chat_history)

    # Convert the response to speech and play it
    text_to_speech(response["answer"])

    return "", chat_history

with gr.Blocks() as demo:
    chatbot = gr.Chatbot()
    msg = gr.Textbox()
    clear = gr.Button("Clear")

    msg.submit(respond, [msg, chatbot], [msg, chatbot], queue=False)
    clear.click(lambda: None, None, chatbot, queue=False)

demo.launch(debug=True, share=True)


Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
Running on public URL: https://7ae0e8c0d59ed12e5b.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)


what is the context of this ? 
[]
[('what is the context of this ? ', ' This is a piece of context from a paper presented at the 31st Conference on Neural Information Processing Systems (NIPS 2017) in Long Beach, CA, USA. The paper discusses the limitations of recurrent models in terms of parallelization and recent efforts to improve computational efficiency.')]


what is the use of this 
[['what is the context of this ? ', 'This is a piece of context from a paper presented at the 31st Conference on Neural Information Processing Systems (NIPS 2017) in Long Beach, CA, USA. The paper discusses the limitations of recurrent models in terms of parallelization and recent efforts to improve computational efficiency.']]
[('what is the context of this ? ', 'This is a piece of context from a paper presented at the 31st Conference on Neural Information Processing Systems (NIPS 2017) in Long Beach, CA, USA. The paper discusses the limitations of recurrent models in terms of parallelization and recent efforts to improve computational efficiency.')]
[('what is the context of this ? ', 'This is a piece of context from a paper presented at the 31st Conference on Neural Information Processing Systems (NIPS 2017) in Long Beach, CA, USA. The paper discusses the limitations of recurrent models in terms of parallelization and recent efforts to improve computational 

what is the cost of the model 
[['what is the context of this ? ', 'This is a piece of context from a paper presented at the 31st Conference on Neural Information Processing Systems (NIPS 2017) in Long Beach, CA, USA. The paper discusses the limitations of recurrent models in terms of parallelization and recent efforts to improve computational efficiency.'], ['what is the use of this ', 'The paper presented at the 31st Conference on Neural Information Processing Systems (NIPS 2017) discusses the limitations of sequential computation in recurrent models and explores methods for improving computational efficiency and model performance.']]
[('what is the context of this ? ', 'This is a piece of context from a paper presented at the 31st Conference on Neural Information Processing Systems (NIPS 2017) in Long Beach, CA, USA. The paper discusses the limitations of recurrent models in terms of parallelization and recent efforts to improve computational efficiency.'), ('what is the use of this

gpt3 /
[['what is the context of this ? ', 'This is a piece of context from a paper presented at the 31st Conference on Neural Information Processing Systems (NIPS 2017) in Long Beach, CA, USA. The paper discusses the limitations of recurrent models in terms of parallelization and recent efforts to improve computational efficiency.'], ['what is the use of this ', 'The paper presented at the 31st Conference on Neural Information Processing Systems (NIPS 2017) discusses the limitations of sequential computation in recurrent models and explores methods for improving computational efficiency and model performance.'], ['what is the cost of the model ', 'The cost of the model is not mentioned in the given context.']]
[('what is the context of this ? ', 'This is a piece of context from a paper presented at the 31st Conference on Neural Information Processing Systems (NIPS 2017) in Long Beach, CA, USA. The paper discusses the limitations of recurrent models in terms of parallelization and rece

Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7860 <> https://7ae0e8c0d59ed12e5b.gradio.live
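
The logs above show why respond converts the history before calling the chain: on the second message Gradio hands the history back as a list of lists, while the chain in this notebook is given a list of (question, answer) tuples. A minimal sketch of that step in isolation, with the chain replaced by a stub. The names normalize_history, respond_core, and answer_fn are hypothetical.

```python
def normalize_history(chat_history):
    """Convert a list-of-lists history into a list of (question, answer) tuples."""
    return [tuple(pair) for pair in (chat_history or [])]


def respond_core(user_message, chat_history, answer_fn):
    """Answer one message and return ("", updated_history).

    The empty string is what clears the textbox when it is wired as an output.
    answer_fn stands in for the retrieval chain call.
    """
    history = normalize_history(chat_history)
    answer = answer_fn(user_message, history)
    history.append((user_message, answer))
    return "", history


# Stub that reports how many earlier turns it received
def fake_answer(question, history):
    return f"answer to '{question}' after {len(history)} turn(s)"


cleared, history = respond_core("what is gpt?", [["hi", "hello"]], fake_answer)
print(cleared == "", history)
```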


