This project demonstrates how to build a question-answering (QA) system using LangChain, OpenAI, and Astra DB. The system processes a PDF document, stores its content in a vector database, and allows interactive querying to retrieve relevant information.
![Untitled](https://private-user-images.githubusercontent.com/83878346/331717180-5d2c656d-01c1-4a71-bb62-d251ec6cc27b.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjAxNTg4NjYsIm5iZiI6MTcyMDE1ODU2NiwicGF0aCI6Ii84Mzg3ODM0Ni8zMzE3MTcxODAtNWQyYzY1NmQtMDFjMS00YTcxLWJiNjItZDI1MWVjNmNjMjdiLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MDUlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzA1VDA1NDkyNlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWNlNmE2YjEyZjhlOWRkNjQ5ZmM1MjU4MzViZjExYTY2ODA4YzMyODJlYzE1ZTllMTgyMWQ0MDFkNTRlMzk1ODAmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.XJ7-JmrZy4GBhYNXy1v-gMf-HShNwk__ZIMGekPCjk8)
This project builds the QA system by combining NLP (Natural Language Processing) tools that embed text as vectors and search it by similarity. It uses:
- LangChain: A framework for building applications with large language models (LLMs).
- OpenAI: Provides language models and embeddings to process and understand the text.
- Astra DB: A scalable, cloud-native database that stores and retrieves text chunks as vectors.
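Before text can be stored as vectors, it has to be cut into chunks. The sketch below shows one common way to do that, with fixed-size character windows that overlap so a sentence split at a boundary still appears whole in one chunk. The function name `split_into_chunks` and the default sizes are illustrative assumptions, not values taken from this project; in practice a LangChain text splitter would likely fill this role.

```python
def split_into_chunks(text: str, chunk_size: int = 800, overlap: int = 200) -> list[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`.

    Hypothetical helper for illustration; sizes are assumed defaults.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in split_into_chunks("abcdefghij", chunk_size=4, overlap=2):
        print(chunk)
```

Larger overlaps preserve more context across chunk boundaries at the cost of storing more vectors.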
To run this demo, ensure you have the following:
- Serverless Cassandra with Vector Search Database on Astra DB: https://accounts.datastax.com/session-service/v1/login
  - Obtain a DB Token with the role of Database Administrator.
  - Copy your Database ID. These connection parameters will be required shortly.
  - For detailed instructions, see https://docs.datastax.com/en/astra-db-serverless/get-started/quickstart.html#_prepare_for_using_your_vector_database
- OpenAI API Key:
  - You will need an API key from OpenAI for this demo to function. See https://cassio.org/start_here/#llm-access
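One way to keep the three connection parameters above out of the source code is to read them from environment variables and fail early if any is missing. This is a minimal sketch: the variable names `ASTRA_DB_APPLICATION_TOKEN`, `ASTRA_DB_ID`, and `OPENAI_API_KEY` and the helper `load_credentials` are assumptions chosen for illustration, not names confirmed by this project.

```python
import os

# Assumed variable names; adjust them to match your own setup.
REQUIRED_VARS = ("ASTRA_DB_APPLICATION_TOKEN", "ASTRA_DB_ID", "OPENAI_API_KEY")


def load_credentials(env=None) -> dict[str, str]:
    """Return the required settings, raising if any is missing or empty.

    `env` defaults to os.environ; passing a plain dict makes testing easier.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


if __name__ == "__main__":
    try:
        creds = load_credentials()
        print("Loaded settings:", ", ".join(sorted(creds)))
    except RuntimeError as err:
        print(err)
```

The returned values would then be handed to the database and OpenAI clients when they are created.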
- LangChain: Framework for building applications with LLMs.
- OpenAI: Provides the embedding model and the GPT language model used to embed and interpret the text.
- Astra DB: Cloud-native database service by DataStax.
- PyPDF2: Library for reading PDF files.
- Datasets: Hugging Face library for loading and processing datasets.
- QA (Question-Answering): A type of information retrieval that involves answering questions posed by users based on a given dataset or document.
- NLP (Natural Language Processing): A field of AI that focuses on the interaction between computers and human languages.
- LLM (Large Language Model): A type of machine learning model trained on a large dataset of text to understand and generate human-like text.
- Vector Store: A database that stores text data as vectors (numerical representations) to enable efficient similarity search.
- Embedding: A representation of text data in a continuous vector space, often used to measure the similarity between different pieces of text.
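To make the last two terms concrete, here is a toy sketch of similarity search over embeddings. It uses cosine similarity, one common measure, and a small in-memory class standing in for a vector store. The class `InMemoryVectorStore` and the two-dimensional vectors are hypothetical, hand-made values for illustration; real embeddings come from a model and have far more dimensions, and Astra DB performs this search server-side.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Toy stand-in for a vector store: keeps (text, vector) pairs in a list."""

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float]]] = []

    def add(self, text: str, vector: list[float]) -> None:
        self._items.append((text, vector))

    def search(self, query_vector: list[float], k: int = 2) -> list[tuple[float, str]]:
        """Return the k stored texts most similar to the query, best first."""
        scored = [(cosine_similarity(query_vector, vec), text) for text, vec in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    store = InMemoryVectorStore()
    store.add("chunk about cats", [1.0, 0.0])     # made-up vectors, not real embeddings
    store.add("chunk about dogs", [0.9, 0.1])
    store.add("chunk about finance", [0.0, 1.0])
    for score, text in store.search([1.0, 0.0], k=2):
        print(f"{score:.3f}  {text}")
```

A QA system follows the same pattern: the question is embedded, the nearest chunks are retrieved, and those chunks are used to form the answer.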
Thank you for visiting my repository! Your interest and support are greatly appreciated.