# Exploration Notebook

This notebook is for exploring and experimenting with the Retrieval-Augmented Generation (RAG) pipeline. It covers the following stages:

1. **PDF Ingestion**: Loading and parsing PDF files.
2. **Text Vectorization**: Converting text into vector representations.
3. **Chunking**: Splitting text into manageable chunks.
4. **Embedding Generation**: Creating embeddings for the text.
5. **Retrieval**: Querying the vector store for relevant documents.
6. **Question Generation**: Generating questions based on retrieved content.
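
To make the chunking, vectorization, and retrieval stages concrete, here is a minimal, self-contained sketch that uses only the standard library. It is a toy illustration, not the project's `Chunker`, `Embedder`, or `Retriever`: the function names (`chunk_text`, `vectorize`, `cosine`, `retrieve`) are hypothetical, and word counts stand in for real embeddings.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into windows of `chunk_size` words that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


def vectorize(text: str) -> Counter:
    """Toy stand-in for an embedding (assumption): lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 1) -> list[str]:
    """Return the `top_k` chunks most similar to the query."""
    query_vec = vectorize(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, vectorize(c)), reverse=True)
    return ranked[:top_k]


text = (
    "Photosynthesis converts light into chemical energy. "
    "Mitochondria produce ATP for the cell. "
    "Ribosomes assemble proteins from amino acids."
)
chunks = chunk_text(text, chunk_size=6, overlap=0)
print(retrieve("What do mitochondria produce?", chunks))
# prints ['Mitochondria produce ATP for the cell.']
```

A real pipeline would typically replace `vectorize` with a learned embedding model and the linear scan in `retrieve` with a vector store; the overall flow of chunk, embed, and rank stays the same.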

## Setup

Before running the code, make sure the required libraries (such as `pandas`) are installed and that the project's `src` package is importable, for example by starting the notebook from the repository root.
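
One way to check the environment before running the imports is sketched below. The helper name `missing_modules` is hypothetical, and the module list is copied from this notebook's import cell; adjust it if the project layout differs.

```python
import importlib.util


def missing_modules(names: list[str]) -> list[str]:
    """Return the module names from `names` that cannot be located."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # A missing parent package (e.g. `src`) raises instead of returning None.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    required = [
        "pandas",
        "src.ingestion.ingestion",
        "src.chunking.chunker",
        "src.embedding.embedder",
        "src.retrieval.retriever",
        "src.generation.generator",
    ]
    print("Missing modules:", missing_modules(required))
```

An empty list means every import in the cell below should resolve. A missing `src.*` entry usually points to the working directory rather than to an uninstalled package.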

In [None]:
# Import necessary libraries
import os
import pandas as pd
from src.ingestion.ingestion import IngestionPipeline
from src.chunking.chunker import Chunker
from src.embedding.embedder import Embedder
from src.retrieval.retriever import Retriever
from src.generation.generator import ExamGenerator

# Initialize the ingestion pipeline
ingestion_pipeline = IngestionPipeline()

# Directory that holds the raw PDF files (relative to the current working directory)
pdf_directory = os.path.join('data', 'raw')

# Ingest PDFs
ingestion_pipeline.ingest_pdfs(pdf_directory)

# Further analysis and experimentation can be added below.
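
Before calling `ingest_pdfs`, it can help to confirm that the directory actually contains PDF files, since a wrong working directory is an easy mistake in a notebook. The sketch below is independent of `IngestionPipeline`, whose behaviour on an empty or missing directory is not shown here. The helper names `has_pdf_suffix` and `list_pdfs` are hypothetical.

```python
from pathlib import Path


def has_pdf_suffix(path) -> bool:
    """True if the file name ends in .pdf (case-insensitive)."""
    return Path(path).suffix.lower() == ".pdf"


def list_pdfs(directory) -> list[Path]:
    """Return the PDF files directly inside `directory`, sorted by name.

    Returns an empty list if the directory does not exist.
    """
    root = Path(directory)
    if not root.is_dir():
        return []
    return sorted(p for p in root.iterdir() if p.is_file() and has_pdf_suffix(p))


pdf_files = list_pdfs(Path("data") / "raw")
print(f"Found {len(pdf_files)} PDF file(s)")
for pdf in pdf_files:
    print(" -", pdf.name)
```

If the count is zero, check the notebook's working directory with `os.getcwd()` before rerunning the ingestion step.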