Material-Generator is a Generative AI-powered knowledge assistant that retrieves high-quality content from Wikipedia, semantically chunks the data, stores it in a FAISS vector database, and provides rewritten, easy-to-understand responses based on user queries using LLaMA-3 via Groq. This tool is ideal for educational use cases, personalized study material generation, and intelligent content summarization.
- 🔍 Wikipedia Search Integration: Retrieves accurate and relevant content from Wikipedia using
WikipediaRetriever. - 🧠 Semantic Chunking: Groups related sentences using sentence similarity and cosine similarity thresholds.
- ⚡ FAISS Vector Store: Stores and indexes semantically meaningful text chunks for efficient retrieval.
- 🧾 Reranked Responses: Returns the top 5 semantically closest responses to a user query.
- ✍️ LLM-based Rewriting: Uses LLaMA-3 (via Groq) to rewrite chunks into simpler, user-friendly language.
- 🌐 Wikipedia URL Retrieval: Provides direct Wikipedia URLs related to the user query for further exploration.
| Technology | Purpose |
|---|---|
Langchain |
Framework for building modular LLM applications |
Langchain-Community |
For WikipediaRetriever integration |
Langchain-Groq |
Integration with Groq’s ultra-fast LLaMA-3 LLM |
SentenceTransformers |
Embedding generator and semantic similarity |
FAISS |
Efficient vector similarity search for chunk retrieval |
BeautifulSoup |
Cleans and parses HTML content from Wikipedia pages |
ChatPromptTemplate |
Custom prompt design for LLM rewriting and URL generation |
Python |
Core programming language |
Text is semantically chunked based on cosine similarity between sentence embeddings. Sentences with a similarity above a set threshold (default: 0.75) are grouped into a single chunk to preserve contextual relevance.
Run the following to install all required dependencies:
!pip install langchain-community
!pip install langchain-experimental
!pip install langchain-groq
!pip install faiss-cpu
!pip install wikipedia
!pip install sentence-transformers
!pip install beautifulsoup4Set the following environment variables before running the script:
os.environ["LANGSMITH_API_KEY"] = "YOUR_LANGSMITH_KEY"
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["GROQ_API_KEY"] = "YOUR_GROQ_API_KEY"- User enters a query (e.g., "Quantum Computing").
- WikipediaRetriever fetches related documents.
- HTML is cleaned using BeautifulSoup.
- Documents are semantically chunked using
SentenceTransformer. - Chunks are stored in a FAISS index.
- FAISS is queried with the user input and reranked.
- Top 5 chunks are passed through LLaMA-3 for rewriting.
- Final outputs are displayed along with related Wikipedia links.
Enter your query: Quantum Computing
Quantum computing is a form of computation that uses quantum bits, or qubits, which can exist in multiple states simultaneously. This allows for more complex and faster processing compared to traditional computers.
<chunk_id_1>
...
Relevant Wikipedia URLs:
https://en.wikipedia.org/wiki/Quantum_computing
- Student learning and revision
- Summarizing technical or complex topics
- Creating simple explanations for educational tools
- Building AI tutors or study companions
- File upload support for user-provided documents
- Integration with other public data APIs
- PDF export of generated content
- Web UI using Flask or Streamlit
This project is open-source and available under the MIT License.