Skip to content

SaiPavankumar22/WikiRetriver

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 

Repository files navigation

📚 Material-Generator

Material-Generator is a Generative AI-powered knowledge assistant that retrieves high-quality content from Wikipedia, semantically chunks the data, stores it in a FAISS vector database, and provides rewritten, easy-to-understand responses based on user queries using LLaMA-3 via Groq. This tool is ideal for educational use cases, personalized study material generation, and intelligent content summarization.


🚀 Features

  • 🔍 Wikipedia Search Integration: Retrieves accurate and relevant content from Wikipedia using WikipediaRetriever.
  • 🧠 Semantic Chunking: Groups related sentences using sentence similarity and cosine similarity thresholds.
  • FAISS Vector Store: Stores and indexes semantically meaningful text chunks for efficient retrieval.
  • 🧾 Reranked Responses: Returns the top 5 semantically closest responses to a user query.
  • ✍️ LLM-based Rewriting: Uses LLaMA-3 (via Groq) to rewrite chunks into simpler, user-friendly language.
  • 🌐 Wikipedia URL Retrieval: Provides direct Wikipedia URLs related to the user query for further exploration.

🛠️ Tech Stack & Tools

Technology Purpose
Langchain Framework for building modular LLM applications
Langchain-Community For WikipediaRetriever integration
Langchain-Groq Integration with Groq’s ultra-fast LLaMA-3 LLM
SentenceTransformers Embedding generator and semantic similarity
FAISS Efficient vector similarity search for chunk retrieval
BeautifulSoup Cleans and parses HTML content from Wikipedia pages
ChatPromptTemplate Custom prompt design for LLM rewriting and URL generation
Python Core programming language

🧪 Semantic Chunking Logic

Text is semantically chunked based on cosine similarity between sentence embeddings. Sentences with a similarity above a set threshold (default: 0.75) are grouped into a single chunk to preserve contextual relevance.


📦 Installation

Run the following to install all required dependencies:

!pip install langchain-community
!pip install langchain-experimental
!pip install langchain-groq
!pip install faiss-cpu
!pip install wikipedia
!pip install sentence-transformers
!pip install beautifulsoup4

🔐 Environment Variables

Set the following environment variables before running the script:

os.environ["LANGSMITH_API_KEY"] = "YOUR_LANGSMITH_KEY"
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["GROQ_API_KEY"] = "YOUR_GROQ_API_KEY"

📋 How It Works

  1. User enters a query (e.g., "Quantum Computing").
  2. WikipediaRetriever fetches related documents.
  3. HTML is cleaned using BeautifulSoup.
  4. Documents are semantically chunked using SentenceTransformer.
  5. Chunks are stored in a FAISS index.
  6. FAISS is queried with the user input and reranked.
  7. Top 5 chunks are passed through LLaMA-3 for rewriting.
  8. Final outputs are displayed along with related Wikipedia links.

🖥️ Example Output

Enter your query: Quantum Computing

Quantum computing is a form of computation that uses quantum bits, or qubits, which can exist in multiple states simultaneously. This allows for more complex and faster processing compared to traditional computers.
<chunk_id_1>

...

Relevant Wikipedia URLs:
https://en.wikipedia.org/wiki/Quantum_computing

🧠 Use Cases

  • Student learning and revision
  • Summarizing technical or complex topics
  • Creating simple explanations for educational tools
  • Building AI tutors or study companions

✍️ Future Enhancements

  • File upload support for user-provided documents
  • Integration with other public data APIs
  • PDF export of generated content
  • Web UI using Flask or Streamlit

📄 License

This project is open-source and available under the MIT License.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors