**Installing Required Libraries**

These libraries are used for:

- streamlit: Building the web application
- pyngrok: Creating a public URL for the application
- langchain: Natural language processing and document retrieval
- huggingface_hub: Accessing pre-trained language models
- faiss-cpu: Efficient similarity search and clustering
- PyPDF2: Reading PDF files
- python-docx: Reading Word documents
- langchain_community: Community-driven extensions for LangChain


In [10]:
!pip install streamlit pyngrok langchain huggingface_hub faiss-cpu PyPDF2 python-docx
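Several of these packages are imported under a different name than the one given to `pip` (`faiss-cpu` is imported as `faiss`, `python-docx` as `docx`). The following is a minimal sketch for confirming that the install cell worked. It relies only on the standard library's `importlib.util.find_spec`; the `PIP_TO_MODULE` mapping and the `missing_modules` helper are hypothetical names introduced for illustration.

```python
import importlib.util

# Hypothetical mapping: pip distribution name -> module name used in `import`.
# langchain_community is installed in a later cell, so it is not checked here.
PIP_TO_MODULE = {
    "streamlit": "streamlit",
    "pyngrok": "pyngrok",
    "langchain": "langchain",
    "huggingface_hub": "huggingface_hub",
    "faiss-cpu": "faiss",
    "PyPDF2": "PyPDF2",
    "python-docx": "docx",
}


def missing_modules(mapping):
    """Return the pip names whose top-level module cannot be found."""
    return [
        pip_name
        for pip_name, module in mapping.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_modules(PIP_TO_MODULE)
    print("Missing:", missing if missing else "none")
```

An empty result means every module in the mapping can be located on the current `sys.path`. It does not guarantee that each module imports without errors.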




In [2]:
!pip install langchain_community


**Writing the Streamlit App and Building the Chatbot**


In [9]:
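%%writefile app.py
import streamlit as st
import pandas as pd
import PyPDF2
# T5Tokenizer also requires the sentencepiece package to be installed.
from transformers import T5ForConditionalGeneration, T5Tokenizer


@st.cache_resource
def load_model():
    # Load t5-base once per server process instead of on every script rerun.
    tokenizer = T5Tokenizer.from_pretrained("t5-base")
    model = T5ForConditionalGeneration.from_pretrained("t5-base")
    return tokenizer, model


def read_file(file):
    """Return the text of an uploaded CSV, TXT, or PDF file ("" otherwise)."""
    # Dispatch on the extension: browsers report CSV MIME types inconsistently.
    name = file.name.lower()
    if name.endswith(".csv"):
        df = pd.read_csv(file)
        # Flatten every cell, column by column, into one space-separated string.
        return " ".join(" ".join(df[col].astype(str).tolist()) for col in df.columns)
    elif name.endswith(".txt"):
        return file.read().decode("utf-8")
    elif name.endswith(".pdf"):
        pdf_reader = PyPDF2.PdfReader(file)
        # extract_text() may return None or "" for pages without a text layer.
        return " ".join(page.extract_text() or "" for page in pdf_reader.pages)
    return ""


st.title("Document-based Chatbot")

uploaded_file = st.file_uploader("Upload a file", type=["csv", "txt", "pdf"])
if uploaded_file is not None:
    text = read_file(uploaded_file)
    tokenizer, model = load_model()

    question = st.text_input("Ask a question")
    if st.button("Submit") and question:
        input_text = f"question: {question} context: {text}"
        # t5-base was pre-trained on 512-token inputs, so longer documents are
        # truncated here and only their beginning is used as context.
        inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=512)
        outputs = model.generate(**inputs, max_length=500)
        answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
        st.write(answer)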

Overwriting app.py
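The app places the whole document after `context:` in a single T5 prompt, and `t5-base` was pre-trained on inputs of at most 512 tokens, so long files fit poorly. The FAISS and LangChain packages installed earlier are typically used to retrieve only the relevant passage first. The sketch below substitutes a much simpler technique, word-overlap scoring over fixed-size word windows, to illustrate the idea without dependencies. The helper names and the 300-word window with a 50-word overlap are illustrative assumptions, not tuned values.

```python
import re


def split_into_chunks(text, chunk_words=300, overlap=50):
    """Split text into overlapping windows of `chunk_words` words."""
    if overlap >= chunk_words:
        raise ValueError("overlap must be smaller than chunk_words")
    words = text.split()
    step = chunk_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_words]))
        if start + chunk_words >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def _word_set(s):
    # Lowercased set of word characters, used for a crude overlap score.
    return set(re.findall(r"\w+", s.lower()))


def best_chunk(text, question, **kwargs):
    """Return the chunk sharing the most distinct words with the question."""
    chunks = split_into_chunks(text, **kwargs)
    if not chunks:
        return ""
    q = _word_set(question)
    # max() keeps the earliest chunk on ties.
    return max(chunks, key=lambda c: len(q & _word_set(c)))


def build_t5_input(text, question, **kwargs):
    # Same "question: ... context: ..." prompt format the app uses.
    return f"question: {question} context: {best_chunk(text, question, **kwargs)}"
```

In `app.py`, `build_t5_input(text, question)` could stand in for the f-string that builds `input_text`. Replacing the overlap score with embeddings searched through a FAISS index would bring the sketch closer to the retrieval role the library list describes.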


**Setting up Ngrok**

In [4]:
# Replace the placeholder with your own token from the ngrok dashboard; never publish a real token.
!ngrok authtoken YOUR_NGROK_AUTHTOKEN

Authtoken saved to configuration file: /root/.config/ngrok/ngrok.yml
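Pasting the authtoken into a notebook cell exposes it to anyone who can read the notebook. One alternative is to keep the token in an environment variable and pass it to pyngrok's `ngrok.set_auth_token`. The sketch below assumes a variable named `NGROK_AUTHTOKEN` (a naming choice made here) and introduces the hypothetical helpers `load_authtoken` and `configure_ngrok`.

```python
import os


def load_authtoken(env_var="NGROK_AUTHTOKEN"):
    """Read the ngrok authtoken from an environment variable (name assumed)."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return token


def configure_ngrok(env_var="NGROK_AUTHTOKEN"):
    # Imported lazily so load_authtoken() works even without pyngrok installed.
    from pyngrok import ngrok

    # Saves the token to the ngrok config file, like `ngrok authtoken` does.
    ngrok.set_auth_token(load_authtoken(env_var))
```

Calling `configure_ngrok()` once before `ngrok.connect` replaces the shell command that contains the literal token.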


In [5]:
from pyngrok import ngrok

In [6]:
!streamlit run app.py &>/dev/null&
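The shell `&` above detaches Streamlit and sends its output to `/dev/null`. That leaves no handle for stopping the server and no log to inspect if it fails to start. The following sketch uses `subprocess.Popen` instead. `--server.port` and `--server.headless` are Streamlit configuration options. The helper names and the `streamlit.log` path are assumptions.

```python
import subprocess
import sys


def streamlit_command(script="app.py", port=8501):
    """Build the `streamlit run` command line for a headless server."""
    return [
        sys.executable, "-m", "streamlit", "run", script,
        "--server.port", str(port),
        "--server.headless", "true",
    ]


def start_background(cmd, log_path="streamlit.log"):
    """Start `cmd` without blocking; stdout and stderr go to `log_path`."""
    # The child keeps its own copy of the file descriptor, so the parent's
    # handle can be closed as soon as Popen returns.
    with open(log_path, "w") as log:
        return subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
```

For example, `proc = start_background(streamlit_command())` starts the app. `proc.terminate()` stops it later, and `streamlit.log` holds any startup errors.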

In [7]:
import time
time.sleep(5)
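A fixed `time.sleep(5)` may be too short on a slow first start and longer than necessary otherwise. The sketch below instead polls until the port accepts TCP connections. It assumes Streamlit is listening on the default port 8501 on the local machine. `wait_for_port` is a hypothetical helper and is not part of Streamlit or pyngrok.

```python
import socket
import time


def wait_for_port(port, host="127.0.0.1", timeout=30.0, interval=0.5):
    """Poll until a TCP server accepts connections on host:port.

    Returns True once a connection succeeds, or False after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: the server is not up yet, so retry shortly.
            time.sleep(interval)
    return False
```

A call such as `if not wait_for_port(8501): raise RuntimeError("Streamlit did not start")` could run before `ngrok.connect`.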

In [8]:
public_url = ngrok.connect(addr="8501")
print(public_url)

NgrokTunnel: "https://6bb4-34-127-109-69.ngrok-free.app" -> "http://localhost:8501"
