## Installing Libraries

In [93]:
!pip install wikipedia-api chromadb langchain langchain_community sentence-transformers transformers torch accelerate streamlit

!npm install -g localtunnel

[1G[0K⠙[1G[0K⠹[1G[0K⠸[1G[0K⠼[1G[0K⠴[1G[0K⠦[1G[0K
changed 22 packages in 806ms
[1G[0K⠦[1G[0K
[1G[0K⠦[1G[0K3 packages are looking for funding
[1G[0K⠦[1G[0K  run `npm fund` for details
[1G[0K⠦[1G[0K

## Importing Libraries

In [87]:
import wikipediaapi
from langchain.schema import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA
import streamlit as st

## Input

In [88]:
email = 'lokeshkumardas2004@example.com'
topic = "idli"
query = 'famous idli in india?'

## Getting Info

In [None]:
def get_wikipedia_summary_to_file(place_name):
    """Return the full plain text of the Wikipedia page for `place_name`.

    Despite the name, nothing is written to disk; the text is returned directly.
    """
    # Wikipedia asks API clients to identify themselves with a descriptive User-Agent
    custom_user_agent = f'SceneTravelGuideApp/1.0 ({email})'

    # Initialize the Wikipedia API client with that User-Agent
    wiki_wiki = wikipediaapi.Wikipedia(
        language='en',
        user_agent=custom_user_agent
    )

    # Fetch the page
    page = wiki_wiki.page(place_name)

    # Fail loudly so an error message is never chunked and embedded as page content
    if not page.exists():
        raise ValueError(f"No Wikipedia page found for {place_name!r}.")

    # page.text holds the complete article text (summary plus all sections)
    return page.text


text = get_wikipedia_summary_to_file(topic)


## Splitting the Collected Text into Chunks

In [89]:

# Create a Document object from the text variable
document = Document(page_content=text)

# Initialize the text splitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)

# Split the document into chunks
text_chunks = text_splitter.split_documents([document])  # Note the list around the document

# Access the first chunk's content
print(text_chunks[0].page_content)


Idli or idly (; plural: idlis) or iddali or iddena is a type of savoury rice cake, originating from South India, popular as a breakfast food in Southern India and in Sri Lanka. The cakes are made by steaming a batter consisting of fermented de-husked black lentils and rice. The fermentation process
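To make the effect of `chunk_size=300` and `chunk_overlap=50` concrete, here is a minimal sketch of a fixed-window character splitter. It is a simplification and not what `RecursiveCharacterTextSplitter` does internally: the real splitter tries to break on separators such as paragraphs and spaces before falling back to raw characters. The function name `split_with_overlap` and the sample string are made up for illustration.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 300, chunk_overlap: int = 50) -> List[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Hypothetical 700-character sample text
sample = "".join(str(i % 10) for i in range(700))
chunks = split_with_overlap(sample)
print([len(c) for c in chunks])          # [300, 300, 200]
print(chunks[0][-50:] == chunks[1][:50])  # True: neighbouring chunks share 50 characters
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of some duplicated text in the retrieved context.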


## Initializing LLM Model

In [90]:
# Load the DistilBART model
model_name = 'sshleifer/distilbart-cnn-12-6'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Create the pipeline
summarizer = pipeline(
    'text2text-generation',
    model=model,
    tokenizer=tokenizer,
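    max_length=150,  # Upper bound on generated tokens; adjust as needed
    min_length=80,   # Lower bound on generated tokens
    num_beams=4      # Beam search; temperature is dropped because it only applies when do_sample=True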
)

## Retrieve chunk based on the query

In [91]:
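import os
import shutil

# Embed chunks with the default sentence-transformers model used by HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings()

# Remove any on-disk database left over from an earlier run
if os.path.exists('db'):
    shutil.rmtree('db')

# Build the Chroma vector store from the chunks.
# No persist_directory is passed, so the store is kept in memory rather than written to 'db'.
db = Chroma.from_documents(
    documents=text_chunks,
    embedding=embeddings
)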




## Using LLM to generate appropriate output

In [None]:
# Query-based summarization
def summarize_with_query(text, query):
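    # Note: distilbart-cnn-12-6 is a summarization checkpoint, so it tends to summarize
    # the whole prompt rather than follow the instruction strictly.
    input_text = f"Given the context below, give a descriptive answer to the query.\n\nQuery: {query}\n\nContext: {text}"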
    summary = summarizer(input_text)
    return summary[0]['generated_text']
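Because the whole context is pasted into the prompt, the input can outgrow what the model accepts if `k` or `chunk_size` is raised. BART-family checkpoints are commonly limited to 1024 input tokens; treat that figure as an assumption to check against the model config. The sketch below caps the context with a character budget, which is only a rough stand-in for a token count. The helper name `build_context` and the default budget of 1500 characters are hypothetical.

```python
from typing import List


def build_context(chunks: List[str], max_chars: int = 1500) -> str:
    """Join chunks with spaces, keeping whole chunks until the character budget is used up."""
    selected = []
    used = 0
    for chunk in chunks:
        # Account for the joining space in front of every chunk except the first
        extra = len(chunk) + (1 if selected else 0)
        if used + extra > max_chars:
            break  # adding this chunk would exceed the budget
        selected.append(chunk)
        used += extra
    return " ".join(selected)


# Tiny example with a 7-character budget
print(build_context(["aaa", "bbb", "ccc"], max_chars=7))  # aaa bbb
```

Since retrievers return the most similar chunks first, dropping from the end discards the least relevant text.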

In [None]:
# Retrieve relevant documents based on the query
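retriever = db.as_retriever(search_kwargs={'k': 5})  # return the 5 most similar chunks
relevant_docs = retriever.invoke(query)  # invoke() replaces the deprecated get_relevant_documents()

# Concatenate the retrieved chunks into one context string
combined_text = " "
for doc in relevant_docs:
    combined_text += doc.page_content + " "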


print(combined_text)

 The earliest extant Tamil work to mention idli (as itali) is Maccapuranam, dated to the 17th century. In 2015, Chennai-based Idli caterer Eniyavan started celebrating March 30 as "World Idli Day". appear in the Indian works only after 1250 CE. Food historian K. T. Achaya speculates that the modern idli recipe might have originated in present-day Indonesia, which has a long tradition of fermented food. According to him, the cooks employed by the Hindu kings of the Indianised kingdoms might Idli Day
March 30 is celebrated as World Idli Day. It was first celebrated in 2015 at Chennai. Idli or idly (; plural: idlis) or iddali or iddena is a type of savoury rice cake, originating from South India, popular as a breakfast food in Southern India and in Sri Lanka. The cakes are made by steaming a batter consisting of fermented de-husked black lentils and rice. The fermentation process the Hindu kings of the Indianised kingdoms might have invented the steamed idli there, and brought the recipe 
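The retrieval step above boils down to embedding the query and returning the `k` chunks whose vectors are closest to it. Here is a minimal sketch of that idea using cosine similarity on toy two-dimensional vectors. The vectors, the function names, and the choice of cosine similarity are all assumptions for illustration; the vector store's actual index and distance metric may differ.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], doc_vecs: List[Sequence[float]], k: int = 5) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k vectors most similar to the query."""
    scored = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings" standing in for three chunks and one query
toy_docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
toy_query = [1.0, 0.1]
ranked = top_k(toy_query, toy_docs, k=2)
print([index for index, _ in ranked])  # [0, 2]
```

A real store avoids scoring every vector by using an approximate nearest-neighbour index, but the ranking principle is the same.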

In [None]:

output = summarize_with_query(combined_text, query)




## Output

In [None]:
print(output)

 The earliest extant Tamil work to mention idli (as itali) is Maccapuranam, dated to the 17th century . In 2015, Chennai-based Idli caterer Eniyavan started celebrating March 30 as World Idli Day . Food historian K. T. Achaya speculates that the modern idli recipe might have originated in present-day Indonesia . According to him, the cooks employed by the Hindu kings of the Indianised kingdoms might have invented the steamed idli .


## Deployment on Streamlit

Import Streamlit (it was installed in the first cell).

In [94]:
import streamlit as st

In [95]:
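# localtunnel asks for a tunnel password on first visit; it is this machine's public IP
print('Tunnel password (public IP): ')
!wget -q -O - ipv4.icanhazip.com
# Start the Streamlit app in the background and expose port 8501 through localtunnel
!streamlit run app.py & npx localtunnel --port 8501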

34.170.201.105

Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://172.28.0.12:8501
  External URL: http://34.170.201.105:8501

your url is: https://plenty-friends-crash.loca.lt
2024-11-27 17:34:24.513648: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-11-27 17:34:24.562945: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-11-27 17:34:24.576516: E external/local_xla/xla/stream_executor/cuda/cu
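The `app.py` launched above is not included in this notebook. As a rough idea of what such a file could look like, here is a hypothetical sketch that wires a fetch, retrieve, and summarize pipeline into a small Streamlit form. Everything in it is an assumption: the three `placeholder_*` functions stand in for `get_wikipedia_summary_to_file`, the chunking and retrieval cells, and `summarize_with_query`, and the page layout is invented.

```python
# app.py (hypothetical sketch; the app.py used above is not shown in this notebook)
from typing import Callable, List


def placeholder_fetch(topic: str) -> str:
    """Stand-in for get_wikipedia_summary_to_file."""
    return f"(text for {topic} would be fetched here)"


def placeholder_retrieve(text: str, query: str) -> List[str]:
    """Stand-in for chunking, embedding, and retrieving the top chunks."""
    return [text]


def placeholder_summarize(context: str, query: str) -> str:
    """Stand-in for summarize_with_query."""
    return context


def answer_query(
    topic: str,
    query: str,
    fetch_text: Callable[[str], str],
    retrieve: Callable[[str, str], List[str]],
    summarize: Callable[[str, str], str],
) -> str:
    """Run fetch -> retrieve -> summarize and return the answer text."""
    if not topic.strip() or not query.strip():
        return "Please enter both a topic and a query."
    text = fetch_text(topic)
    chunks = retrieve(text, query)
    context = " ".join(chunks)
    return summarize(context, query)


def main() -> None:
    # Imported here so answer_query can be used without Streamlit installed
    import streamlit as st

    st.title("Wikipedia Q&A")
    topic = st.text_input("Topic", value="idli")
    query = st.text_input("Query", value="famous idli in india?")
    if st.button("Answer"):
        # Swap the placeholders for the notebook's real functions
        answer = answer_query(
            topic, query, placeholder_fetch, placeholder_retrieve, placeholder_summarize
        )
        st.write(answer)


if __name__ == "__main__":
    try:
        main()
    except ModuleNotFoundError:
        print("Streamlit is not installed; run `pip install streamlit` first.")
```

Keeping the pipeline in `answer_query` and the widgets in `main` makes it possible to test the logic without starting a Streamlit server.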

In [None]:
from google.colab import drive
drive.mount('/content/drive')