LLM Query Agent

A conversational AI agent capable of answering questions based on lecture notes and other resources using advanced NLP techniques and pre-trained language models. Link to the approach

Table of Contents

  • Introduction
  • Features
  • Installation
  • Usage
  • Data Collection and Preprocessing
  • Creating and Storing Embeddings
  • Building the Query Agent
  • Future Work

Introduction

This project aims to create an AI agent that can answer questions from a specific context (lecture notes, web resources). The agent utilizes NLP techniques and pre-trained language models to provide accurate and meaningful responses.

Features

  • Data Collection from PDFs and websites
  • Data Preprocessing using NLP techniques
  • Embedding creation and storage using FAISS
  • Query processing and response generation using Google's Gemini Pro LLM
  • Follow-up question handling using Langchain's Conversation Buffer Memory
  • User interface built with Streamlit

Installation

Prerequisites

  • Python 3.9+
  • Pip

Steps

# Clone the repository
git clone https://github.com/DevG10/Ema-s-NLQ.git

# Navigate to the project directory
cd Ema-s-NLQ

# Install the dependencies
pip install -r requirements.txt

Usage

Run the main.py file:

cd src
python main.py

Note: Before running the main file, create a .env file in the project root and paste your Gemini API key into it as GOOGLE_API_KEY = 'your_api_key'.
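As a sketch of how the key in the .env file can be picked up at startup, here is a standard-library-only reader. The load_env helper is hypothetical (the project may well use python-dotenv instead); it is shown only to illustrate the expected file format.

```python
import os
from pathlib import Path

# Hypothetical helper: parse KEY = 'value' lines from a .env file using
# only the standard library. The actual project may use python-dotenv.
def load_env(path: str = ".env") -> dict:
    env = {}
    p = Path(path)
    if p.exists():
        for line in p.read_text().splitlines():
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                env[key.strip()] = value.strip().strip("'\"")
    return env

# Fall back to the process environment if no .env file is present.
api_key = load_env().get("GOOGLE_API_KEY") or os.getenv("GOOGLE_API_KEY")
```
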

Data Collection and Preprocessing

Data Sources

  • Lecture Notes
  • Websites from well-known institutes

Tools and Techniques

  • BeautifulSoup for web scraping
  • PyMuPDF for PDF extraction
  • NLTK for preprocessing (stopwords removal, lemmatization, etc.)

Creating and Storing Embeddings

  • Embeddings convert text into numerical vectors, enabling semantic search and retrieval.
  • The embeddings are created and stored with FAISS, which is open source and fast, as it is implemented in C++.
  • The embeddings were computed with sentence-transformers' all-MiniLM-L6-v2 pre-trained model.

Building the Query Agent

Query Processing

  • The user's query is first converted into an embedding using the same model that was used to embed the lecture notes.
  • This query embedding is then matched against the stored vectors via similarity search, and the matching documents are returned.

Handling Follow-Up Questions

  • To enable follow-up questions, Langchain's Conversation Buffer Memory is used to keep track of the context and questions asked.

Future Work

Potential improvements and additional features:

  • Enhancing the accuracy of responses
  • Adding more data sources
  • Improving the user interface
