Question-Answering with Astra DB and LangChain: A Vector Search Demo

Overview

This project demonstrates how to build a question-answering (QA) system using LangChain, OpenAI, and Astra DB. The system processes a PDF document, stores its content in a vector database, and allows interactive querying to retrieve relevant information.

Introduction

This project builds a QA system that answers questions about a PDF document by combining text embeddings with vector similarity search. It uses:

LangChain: A framework for building applications with large language models (LLMs).
OpenAI: Provides language models and embeddings to process and understand the text.
Astra DB: A scalable, cloud-native database that stores text chunks together with their embedding vectors and retrieves them by similarity.
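
The Astra DB entry above refers to text chunks. Below is a minimal sketch of how extracted text might be split into overlapping chunks before embedding. It assumes fixed-size character windows; the function name `split_into_chunks` and the default sizes are illustrative and are not taken from the project's notebook.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 800, overlap: int = 200) -> List[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk made only of overlap is produced.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for piece in split_into_chunks("abcdefghij", chunk_size=4, overlap=1):
        print(piece)
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks, which tends to help retrieval at the cost of storing some text twice.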

Prerequisites

To run this demo, ensure you have the following:

- An Astra DB Serverless (Cassandra) database with Vector Search. Sign in or create an account at https://accounts.datastax.com/session-service/v1/login
  - Obtain a DB Token with the Database Administrator role.
  - Copy your Database ID. The token and the Database ID are the connection parameters the demo needs.
  - For detailed instructions, see https://docs.datastax.com/en/astra-db-serverless/get-started/quickstart.html#_prepare_for_using_your_vector_database
- An OpenAI API key, which the demo requires in order to function. See https://cassio.org/start_here/#llm-access
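
One way to keep these three secrets out of the notebook is to read them from environment variables. The sketch below assumes the variable names `ASTRA_DB_APPLICATION_TOKEN`, `ASTRA_DB_ID`, and `OPENAI_API_KEY`; these names, and the `DemoCredentials` and `load_credentials` helpers, are illustrative choices rather than something this repository prescribes.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Assumed variable names; adjust them to match your own setup.
REQUIRED_VARS = ("ASTRA_DB_APPLICATION_TOKEN", "ASTRA_DB_ID", "OPENAI_API_KEY")


@dataclass(frozen=True)
class DemoCredentials:
    astra_token: str
    astra_database_id: str
    openai_api_key: str


def load_credentials(env: Mapping[str, str] = os.environ) -> DemoCredentials:
    """Read the three connection parameters, failing early if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return DemoCredentials(
        astra_token=env["ASTRA_DB_APPLICATION_TOKEN"],
        astra_database_id=env["ASTRA_DB_ID"],
        openai_api_key=env["OPENAI_API_KEY"],
    )
```

Failing early with the list of missing names is usually easier to debug than an authentication error raised later by the database or the OpenAI client.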

Technologies Used

LangChain: Framework for building applications with LLMs.
OpenAI: Provides the embedding model and the language model used to answer questions.
Astra DB: Cloud-native database service by DataStax.
PyPDF2: Library for reading PDF files.
Datasets: Hugging Face library for loading and processing datasets.
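
A rough sketch of how these pieces could fit together: PyPDF2 extracts the text, and a LangChain vector store backed by Astra DB holds the embedded chunks. This is not the notebook's code. The helper names and the table name `qa_demo` are hypothetical, the use of the `cassio` package is an assumption based on the cassio.org link above, and the LangChain import paths shown (`langchain_community`, `langchain_openai`) differ between LangChain versions, so check them against your installed release.

```python
from typing import Iterable, List, Optional


def join_page_texts(page_texts: Iterable[Optional[str]]) -> str:
    """Concatenate per-page text, skipping pages where extraction returned nothing."""
    return "\n".join(t.strip() for t in page_texts if t and t.strip())


def read_pdf_text(path: str) -> str:
    """Extract the text of every page in a PDF with PyPDF2."""
    # Imported lazily so join_page_texts can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    return join_page_texts(page.extract_text() for page in reader.pages)


def store_chunks(chunks: List[str], token: str, database_id: str,
                 table_name: str = "qa_demo"):
    """Embed chunks with OpenAI and write them to an Astra DB table (assumed setup)."""
    # Lazy imports: these packages are only needed when talking to Astra DB.
    import cassio
    from langchain_community.vectorstores import Cassandra
    from langchain_openai import OpenAIEmbeddings

    # cassio.init opens the Astra DB connection that the vector store reuses.
    cassio.init(token=token, database_id=database_id)

    # OpenAIEmbeddings reads OPENAI_API_KEY from the environment.
    store = Cassandra(
        embedding=OpenAIEmbeddings(),
        table_name=table_name,  # hypothetical table name
        session=None,
        keyspace=None,
    )
    store.add_texts(chunks)
    return store
```

With the store in place, a call such as `store.similarity_search("your question", k=4)` would return the chunks closest to the question, which can then be passed to the language model as context.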

Abbreviations and Definitions

QA (Question-Answering): A type of information retrieval that involves answering questions posed by users based on a given dataset or document.
NLP (Natural Language Processing): A field of AI that focuses on the interaction between computers and human languages.
LLM (Large Language Model): A type of machine learning model trained on a large dataset of text to understand and generate human-like text.
Vector Store: A database that stores text data as vectors (numerical representations) to enable efficient similarity search.
Embedding: A representation of text data in a continuous vector space, often used to measure the similarity between different pieces of text.
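
To make the Vector Store and Embedding definitions concrete, here is a small pure-Python sketch of similarity search using cosine similarity. The two-dimensional vectors are hand-made toy values, not real OpenAI embeddings, and the function names are illustrative; a production vector store such as Astra DB performs this ranking with an index rather than a full scan.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], store: Dict[str, Sequence[float]],
          k: int = 2) -> List[Tuple[str, float]]:
    """Return the k stored texts whose vectors are most similar to the query."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy "embeddings": the first axis loosely means tax, the second health.
    toy_store = {
        "tax": [1.0, 0.0],
        "health": [0.0, 1.0],
        "mixed": [1.0, 1.0],
    }
    for text, score in top_k([1.0, 0.1], toy_store, k=2):
        print(text, round(score, 3))
```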

Thank you for visiting my repository! Your interest and support are greatly appreciated.

Folders and files

- LICENSE
- README.md
- budget_speech.pdf
- main.ipynb

Deba951/Querying-PDF-With-Astra-and-LangChain
