## Installing the necessary libraries

In [1]:
!pip install groq langchain langchain-core langchain-groq chromadb pypdf gradio sentence-transformers langchain_community


Collecting groq
  Downloading groq-0.11.0-py3-none-any.whl.metadata (13 kB)
Collecting langchain
  Downloading langchain-0.2.16-py3-none-any.whl.metadata (7.1 kB)
Collecting langchain-core
  Downloading langchain_core-0.2.39-py3-none-any.whl.metadata (6.2 kB)
Collecting langchain-groq
  Downloading langchain_groq-0.1.9-py3-none-any.whl.metadata (2.9 kB)
Collecting chromadb
  Downloading chromadb-0.5.5-py3-none-any.whl.metadata (6.8 kB)
Collecting pypdf
  Downloading pypdf-4.3.1-py3-none-any.whl.metadata (7.4 kB)
Collecting gradio
  Downloading gradio-4.44.0-py3-none-any.whl.metadata (15 kB)
Collecting sentence-transformers
  Downloading sentence_transformers-3.0.1-py3-none-any.whl.metadata (10 kB)
Collecting langchain_community
  Downloading langchain_community-0.2.16-py3-none-any.whl.metadata (2.7 kB)
Collecting httpx<1,>=0.23.0 (from groq)
  Downloading httpx-0.27.2-py3-none-any.whl.metadata (7.1 kB)
Collecting langchain-text-splitters<0.3.0,>=0.2.0 (from langchain)
  Downloading lan

## Document Q&A chatbot with LangChain, Groq, and Gradio

This notebook uses the LangChain framework with Groq as the LLM provider to build a chatbot that answers questions about a provided PDF, CSV, or text file. The document is split into chunks, embedded, and stored in a Chroma vector store. For each question, the most relevant chunks are retrieved and passed to the model as context. Users interact with the chatbot through a Gradio interface: they type a question about the data, and the chatbot generates an answer from the retrieved context.

In [2]:
import os
import time
import textwrap

import gradio as gr
from google.colab import userdata
from langchain_groq import ChatGroq
from langchain_community.document_loaders import TextLoader, PyPDFLoader
from langchain_community.document_loaders.csv_loader import CSVLoader
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate

In [3]:
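from langchain_core.documents import Document


class TextFileLoader:
    """Load a plain-text file as a one-element list of Documents (like the other loaders)."""

    def __init__(self, file_path):
        self.file_path = file_path

    def load(self):
        with open(self.file_path, "r", encoding="utf-8") as file:
            text_data = file.read()
        # split_documents() expects Document objects, not a raw string.
        return [Document(page_content=text_data, metadata={"source": self.file_path})]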

In [4]:
file_type = input("Enter file type (PDF, CSV, or TXT): ").strip().upper()

# Pick a loader for the chosen file type; each load() returns a list of Documents.
if file_type == "PDF":
    loader = PyPDFLoader("/content/Chatbotpdf-converted (1).pdf")
elif file_type == "CSV":
    loader = CSVLoader("/content/TSLA (2).csv")
elif file_type == "TXT":
    loader = TextFileLoader("/content/chatbot.txt")
else:
    # exit() does not cleanly stop a notebook cell; raising makes the error visible.
    raise ValueError("Invalid file type. Please provide either 'PDF', 'CSV', or 'TXT'.")

text = loader.load()

Enter file type (PDF, CSV, or TXT): pdf
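Instead of asking for the file type, it could be inferred from the file extension. A minimal sketch, assuming a hypothetical helper `detect_file_type` and the extension mapping below (neither is part of LangChain):

```python
from pathlib import Path

# Hypothetical mapping from file extension to the file-type labels used above.
EXTENSION_TO_TYPE = {".pdf": "PDF", ".csv": "CSV", ".txt": "TXT"}


def detect_file_type(file_path: str) -> str:
    """Return 'PDF', 'CSV' or 'TXT' based on the file extension.

    Raises ValueError for any other extension.
    """
    suffix = Path(file_path).suffix.lower()
    try:
        return EXTENSION_TO_TYPE[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file extension: {suffix!r}") from None


print(detect_file_type("/content/TSLA (2).csv"))  # → CSV
```

The returned label can then drive the same `if`/`elif` chain that selects the loader.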


In [5]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(text)
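`RecursiveCharacterTextSplitter` tries separators in order (by default paragraph breaks, newlines, spaces, then single characters) so that each chunk stays within `chunk_size` characters, and neighbouring chunks share up to `chunk_overlap` characters. The sketch below reproduces only the size/overlap arithmetic with a plain sliding window; it is a simplification, not LangChain's algorithm, and `sliding_window_chunks` is a hypothetical helper.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only: unlike RecursiveCharacterTextSplitter,
    it ignores separators such as paragraph breaks and spaces.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "x" * 2500
pieces = sliding_window_chunks(sample)
print([len(p) for p in pieces])  # → [1000, 1000, 900]
```

With `chunk_size=1000` and `chunk_overlap=200`, each window starts 800 characters after the previous one, so text near a boundary appears in two chunks and is less likely to be cut off from its context.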

In [6]:
from langchain_community.embeddings import HuggingFaceEmbeddings

# Sentence-transformers model used to embed the document chunks (runs on CPU here).
model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {"device": "cpu"}

embeddings_hf = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



* **sentence-transformers/all-mpnet-base-v2:** A Sentence Transformers model built on the MPNet base model. It maps sentences and paragraphs to 768-dimensional dense vectors, so chunks with similar meaning end up close together in vector space.

* **Groq:** Groq builds the LPU (Language Processing Unit), hardware designed for fast LLM inference, and offers a cloud API for running hosted models. This notebook uses that API through `ChatGroq`.

* **Chroma database:** Chroma is an open-source vector database. It stores each chunk's text alongside its embedding and supports similarity search, which the retriever uses to find the chunks most relevant to a question.
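Conceptually, the retriever embeds the question with the same model and returns the stored chunks whose vectors are closest to it. The toy sketch below ranks hand-made 3-dimensional vectors by cosine similarity; the vectors, chunk ids, and helper names are invented for illustration, and Chroma's real index and default distance metric may differ.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k stored vectors most similar to query_vec."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine_similarity(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy 3-dimensional "embeddings"; real all-mpnet-base-v2 vectors have 768 dimensions.
store = {
    "market_cap_chunk": [0.9, 0.1, 0.0],
    "revenue_chunk": [0.7, 0.3, 0.1],
    "weather_chunk": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.0]
print(top_k(query, store))  # → ['market_cap_chunk', 'revenue_chunk']
```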

In [7]:
vectorstore = Chroma.from_documents(
    documents = chunks,
    collection_name= "groq_embeds",
    embedding = embeddings_hf,
)

retriever = vectorstore.as_retriever()

In [8]:
import os
from google.colab import userdata
from langchain_groq import ChatGroq

# Read the Groq API key from Colab secrets (saved under the name GROQ_API_KEY)
# instead of hard-coding it in the notebook.
os.environ["GROQ_API_KEY"] = userdata.get("GROQ_API_KEY")

# temperature=0 makes answers as repeatable as possible; Groq retires hosted models over time.
llm = ChatGroq(temperature=0, model_name="mixtral-8x7b-32768")
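`ChatGroq` reads `GROQ_API_KEY` from the environment when no key is passed explicitly. Outside Colab, where `google.colab.userdata` is unavailable, the key can be exported in the shell instead. A small guard makes a missing key fail with a clear message; `require_env` is a hypothetical helper, and `DEMO_API_KEY` below is a placeholder used only for the example.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, failing loudly if it is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it as a Colab secret or export it in the shell before running."
        )
    return value


# Placeholder example; in the notebook this would be require_env("GROQ_API_KEY")
# called before constructing ChatGroq.
os.environ.setdefault("DEMO_API_KEY", "placeholder")
print(require_env("DEMO_API_KEY"))
```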


* **VectorStore:** The Chroma store built above, holding one embedding per chunk. `vectorstore.as_retriever()` wraps it as a retriever that, for each question, returns the most similar chunks (four by default).

* **rag_template (Retrieval-Augmented Generation):** The prompt template for the RAG step. It tells the LLM to answer using only the retrieved chunks, which fill the `{context}` placeholder, with the user's question in `{question}`.

* **Gradio:** A Python library for wrapping a function in a simple web UI; here it puts a text box in front of the question-answering chain.
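`RetrievalQA.from_chain_type` uses the "stuff" chain type by default: all retrieved chunks are placed into the `{context}` slot of a single prompt. A plain-Python sketch of that assembly, assuming `"\n\n"` as the separator between chunks; `build_rag_prompt` is a hypothetical helper, not the LangChain implementation.

```python
# Same template text as rag_template in the next cell.
RAG_TEMPLATE = """Answer the question based only on the following context:
{context}
Question: {question}
"""


def build_rag_prompt(chunks: list[str], question: str, separator: str = "\n\n") -> str:
    """Concatenate ("stuff") the retrieved chunks into {context} and fill in {question}."""
    context = separator.join(chunks)
    return RAG_TEMPLATE.format(context=context, question=question)


prompt = build_rag_prompt(
    ["Chunk A text.", "Chunk B text."],
    "What is the market cap of Infosys?",
)
print(prompt)
```

Because every retrieved chunk goes into one prompt, the number and size of chunks must fit within the model's context window.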

In [9]:
from langchain.chains import RetrievalQA

rag_template = """Answer the question based only on the following context:
{context}
Question: {question}
"""

rag_prompt = ChatPromptTemplate.from_template(rag_template)
qa_chain = RetrievalQA.from_chain_type(
    llm, retriever=retriever, chain_type_kwargs={"prompt": rag_prompt},
)

In [10]:
response = qa_chain.invoke("What is the market cap of Infosys?")

In [11]:
print(response)

{'query': 'What is the market cap of Infosys?', 'result': 'The market capitalization of Infosys stock is Rs 5,87,285 Cr.'}


In [12]:
response['result']

'The market capitalization of Infosys stock is Rs 5,87,285 Cr.'

In [13]:
def process_question(user_question):
    """Run the RAG chain on a question and return only the answer text."""
    response = qa_chain.invoke(user_question)
    return response["result"]
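The chain returns a dict with `query` and `result` keys, as the printed response above shows. A sketch of a slightly more defensive callback that rejects empty input before calling the chain; `FakeChain` and `make_process_question` are hypothetical names used so the wrapper can be tried without an API key.

```python
class FakeChain:
    """Stand-in for qa_chain so the wrapper can be tried without an API key."""

    def invoke(self, query):
        return {"query": query, "result": f"(answer to: {query})"}


def make_process_question(chain):
    """Build a Gradio-friendly callback that validates input before calling the chain."""

    def process_question(user_question):
        question = (user_question or "").strip()
        if not question:
            return "Please type a question about the document."
        response = chain.invoke(question)
        return response["result"]

    return process_question


ask = make_process_question(FakeChain())
print(ask("  What is the market cap of Infosys?  "))  # → (answer to: What is the market cap of Infosys?)
print(ask("   "))  # → Please type a question about the document.
```

With the real chain, `make_process_question(qa_chain)` would produce a function that can be passed as `fn` to `gr.Interface`.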

In [14]:
interface = gr.Interface(fn= process_question,
                         inputs= gr.Textbox(lines=2, placeholder="Type your question here"),
                         outputs= gr.Textbox(),
                         title= "Chatbot for Stock Market Assistance",
                         description="Ask any question about your documents")

In [15]:
interface.launch()

Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://73d9df0f97f971188c.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)




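* **Other model names that work with Groq:** `model_name` must be a model ID hosted by Groq, not a Hugging Face repository name or a model from another provider. Examples of Groq model IDs available around the time of this notebook:

1. "llama3-8b-8192"
2. "llama3-70b-8192"
3. "gemma2-9b-it"
4. "llama-3.1-8b-instant"

Groq retires models over time, so check the current model list in the Groq documentation before choosing one.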

