This application integrates a PDF text extraction feature and store as embedding vector in vector database (FAISS) and use OpenAI's ChatGPT model to provide an interactive question-answering system. It allows users to query a PDF document and receive contextually relevant answers from vector database which is converted into more meaningful answer by ChatGPT.
- PDF Text Extraction: Extract text from any given page of a PDF document.
- Embedding Vector: Store the extracted text into embedding vector form in a vector datbase.
- ChatGPT-Powered Responses: Generate answers to questions based on the content of a specified page in the PDF document.
- Web-Based Interface: The application is accessible through a web interface, allowing for easy interaction and use.
- PDF Selection and Page Reference: Users can specify a page number from a pre-defined PDF document.
- Question Input: Users can input a question related to the content of the selected page.
- Similarity Search: Answers which are similar to the asked questions is search from the vector database.
- Answer Generation: The application processes the extracted text from the PDF page and the user's question, leveraging ChatGPT to generate a relevant answer.
- Response Display: The generated answer is displayed to the user, providing insights or information based on the PDF's content.
- FastAPI: Powers the backend of the application, handling web requests and server-side logic.
- PDFMiner: Used for extracting text from PDF documents.
- FAISS: Vector database to store the embedding vectors and perform similarity search.
- OpenAI's ChatGPT: Provides the AI model for generating answers to user queries.
- Uvicorn: Serves as the ASGI server for hosting the application.
Before you start, ensure you have the following installed:
- Python 3.6 or higher
- Pip (Python package installer)
- Clone the Repository: If the application is hosted in a Git repository, provide instructions to clone it. Otherwise, skip this step if the user is setting it up directly from provided files.
git clone [your-repository-link]
cd [repository-name]
- Environment Setup: It's recommended to use a virtual environment for Python projects. This keeps dependencies required by different projects separate and organized.
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
- Install Dependencies: Install the required Python packages using pip.
pip install -r requirements.txt
- Environment Variables: Set up the necessary environment variables. Create a .env file in the root directory of the project and add the following variables:
OPENAI_API_KEY=your_openai_api_key
CHATGPT_MODEL=model_name # for example, "davinci"
Replace your_openai_api_key and model_name with your actual OpenAI API key and the model name you intend to use.
- Start the API Server:
Run the following command to start the FastAPI server:
python main.py
- Start the streamlit server in new terminal tab.
It will open application in browser at http://localhost:8501/
streamlit run app.py
- Through the frontend, you can test the PDF text extraction and question-answering features.
- Select a page number and input your question related to the content on that page.
- Submit the request, and the application will display the generated answer based on the PDF's content.
- Ensure that the PDF file (data/Attention_Is_All You_Need.pdf) is placed in the correct directory as specified in the code.
- The API key and model name must be valid and active for the OpenAI service to work correctly.