# Neural Network RAG Advisor - Workflow Notebook

This notebook orchestrates the steps for the RAG system:
1.  **Setup:** Installs dependencies and imports necessary modules.
2.  **Configuration:** Sets up LlamaIndex global settings (Embedding Model, LLM).
3.  **(Optional) Data Collection:** Runs the arXiv scraper.
4.  **Index Building:** Creates or updates the FAISS vector index.
5.  **Querying:** Loads the index and answers questions using the configured LLM.

## 1. Setup

### 1.1 Working Directory

Ensure this notebook is running with the project's root directory (`V1`) as the working directory so imports from `src` work.

In [None]:
import os
# Get current working directory
cwd = os.getcwd()
print(f"Current Working Directory: {cwd}")
# Verify 'src' directory exists - adjust path if notebook is not in V1
src_path = os.path.join(cwd, 'src')
if not os.path.isdir(src_path):
    print("\nERROR: 'src' directory not found.")
    print("Please ensure you are running this notebook from the project root ('V1') directory.")
else:
    print("'src' directory found. Imports should work.")

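If the notebook was started from a subdirectory, one option is to walk up the directory tree until a folder containing `src` is found, then switch to it. The following is a minimal sketch only; the helper names `find_project_root` and `activate_project_root` are hypothetical and not part of the project.

```python
import os
import sys
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = "src") -> Optional[Path]:
    """Return the nearest directory (start or one of its ancestors) that contains `marker`."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    return None


def activate_project_root(start: Path, marker: str = "src") -> Optional[Path]:
    """Change into the project root and put it on sys.path so `src` can be imported."""
    root = find_project_root(start, marker)
    if root is not None:
        os.chdir(root)  # relative paths such as "storage" now resolve from the root
        if str(root) not in sys.path:
            sys.path.insert(0, str(root))
    return root
```

Calling `activate_project_root(Path.cwd())` before the imports in 1.3 would make both the `src` imports and the relative data paths resolve from the project root. It returns `None` when no ancestor contains `src`.
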
### 1.2 Dependencies

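Install the required packages with pip. If you are running locally rather than in a hosted environment such as Colab or Paperspace, activate your virtual environment (`venv`) first. If any of these packages were already imported in the current kernel, restart the kernel after installing so the new versions are picked up.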

In [None]:
%pip install --upgrade --force-reinstall numpy==1.26.4 scipy==1.10.1 protobuf==3.20.3 fsspec==2024.6.1
%pip install --upgrade llama-index-core llama-index-vector-stores-faiss llama-index-readers-file llama-index-embeddings-huggingface llama-index-llms-huggingface sentence-transformers faiss-cpu accelerate bitsandbytes transformers huggingface_hub arxiv
# Install specific PyTorch version for your CUDA (Example: CUDA 12.1 Stable)
# Adjust 'cu121' if you have a different CUDA version (e.g., 'cu118', or use nightly '--pre --index-url .../nightly/cu128' for CUDA 12.8)
%pip uninstall torch torchvision torchaudio -y
%pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Optional: Install CUDA-enabled bitsandbytes if you have a compatible GPU and want 4-bit quantization
# %pip uninstall bitsandbytes -y
# %pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.41.1-py3-none-win_amd64.whl

# Verify PyTorch CUDA
import torch
print(f'PyTorch version: {torch.__version__}')
print(f'CUDA Available: {torch.cuda.is_available()}')
if torch.cuda.is_available():
    print(f'CUDA Version detected by PyTorch: {torch.version.cuda}')

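Because the first install line pins exact versions, it can help to confirm what actually ended up in the environment before moving on. This is a small sketch using only the standard library; the `PINNED` mapping simply mirrors the pins in the cell above, and the helper name `check_pins` is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict

# Mirrors the pinned versions from the install cell above
PINNED = {
    "numpy": "1.26.4",
    "scipy": "1.10.1",
    "protobuf": "3.20.3",
    "fsspec": "2024.6.1",
}


def check_pins(pins: Dict[str, str]) -> Dict[str, str]:
    """Return {package: problem} for each package that is missing or at a different version."""
    problems = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if installed != expected:
            problems[name] = f"installed {installed}, expected {expected}"
    return problems


# Report any mismatch; prints nothing when every pin matches
for name, problem in check_pins(PINNED).items():
    print(f"{name}: {problem}")
```

`importlib.metadata` reads the installed package metadata from disk, so it reports what pip installed even if an older copy of a module is still loaded in the running kernel.
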
### 1.3 Imports & Logging

In [None]:
import logging
import torch
import faiss # Ensure faiss is imported if needed elsewhere
from transformers import BitsAndBytesConfig

# LlamaIndex imports
from llama_index.core import (
    Settings,
    PromptTemplate,
    VectorStoreIndex,
    StorageContext
)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.huggingface import HuggingFaceLLM
from llama_index.vector_stores.faiss import FaissVectorStore

# Import functions from our .py files
from src.arxiv_scraper import scrape_arxiv
from src.rag_pipeline import build_faiss_index, load_faiss_index

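# Configure logging. force=True (Python 3.8+) replaces any handlers already attached to the
# root logger; without it, basicConfig does nothing when the notebook kernel has set up logging.
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True)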

## 2. Configuration (Global Settings)

Configure the Embedding Model and LLM globally using `llama_index.core.Settings`. This ensures consistency across index loading and querying.

In [None]:
logging.info("Configuring global LlamaIndex settings...")

# 1. Configure Embedding Model
Settings.embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
logging.info(f"Using embedding model: {Settings.embed_model.model_name}")

# 2. Configure LLM (Llama 3.1 8B Instruct - No Quantization based on qa_system.py)
llm_model_name = "meta-llama/Meta-Llama-3.1-8B-Instruct"
logging.info(f"Setting up LLM: {llm_model_name}")

# --- RAG Prompt Template for Llama 3 Instruct ---
query_wrapper_prompt = PromptTemplate(
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "You are an expert Q&A assistant specialized in neural network architectures. "
    "Your goal is to answer the user's query accurately based *only* on the provided context information. "
    "If the context does not contain the information needed to answer the query, "
    "state that the answer is not found in the context. Do not add information "
    "that is not present in the context. Keep your answers concise and directly relevant to the query."
    "<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
    "Context information:\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, answer the query.\n"
    "Query: {query_str}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)

# --- LLM Initialization (No Quantization, using float16) ---
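# float16 weights for an 8B model need roughly 16 GB of VRAM. If you have less, enable 4-bit
# quantization below; removing torch_dtype would fall back to float32 and need about twice as much.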
use_quantization = False # Set to True if you successfully installed CUDA bitsandbytes and want to try it
quantization_config = None
model_kwargs = {"torch_dtype": torch.float16} # Use float16 by default

if use_quantization:
    try:
        # Define 4-bit quantization config
        quantization_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.float16
        )
        model_kwargs["quantization_config"] = quantization_config
        # If using quantization, might not need torch_dtype explicitly
        model_kwargs.pop("torch_dtype", None)
        logging.info("Using 4-bit quantization configuration.")
    except Exception as e:
        logging.error(f"Failed to configure quantization: {e}. Disabling quantization.", exc_info=True)
        use_quantization = False
        quantization_config = None
        model_kwargs.pop("quantization_config", None)
        # Fallback to float16 if quantization fails
        model_kwargs.setdefault("torch_dtype", torch.float16)

if not use_quantization:
    logging.info("Quantization disabled. Using torch_dtype: %s", model_kwargs.get("torch_dtype", "Default (likely float32)"))

try:
    # Log in to Hugging Face Hub (required for Llama 3 models)
    # Ensure you have run `huggingface-cli login` in your terminal previously,
    # OR set the HF_TOKEN environment variable.
    from huggingface_hub import login
    # login() # Call this if needed, or rely on CLI login / env var

    Settings.llm = HuggingFaceLLM(
        model_name=llm_model_name,
        tokenizer_name=llm_model_name,
        query_wrapper_prompt=query_wrapper_prompt,
        context_window=131072,
        max_new_tokens=512,
        model_kwargs=model_kwargs,
        generate_kwargs={
            "temperature": 0.7,
            "do_sample": True,
        },
        device_map="auto",
    )
    logging.info(f"LLM '{llm_model_name}' configured successfully.")
except Exception as e:
    logging.error(f"Failed to initialize LLM: {e}", exc_info=True)
    logging.error("Ensure you have accepted Llama 3 terms and logged into Hugging Face.")
    # Disable the LLM so other parts of the notebook can still run.
    # Note: LlamaIndex substitutes a placeholder MockLLM when Settings.llm is set to None.
    Settings.llm = None
    print("\n!!! LLM INITIALIZATION FAILED. Querying will not work. Please check errors above. !!!\n")


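The VRAM guidance in the cell above follows from simple arithmetic: weight memory is roughly the parameter count times the bytes per parameter. Below is a back-of-the-envelope sketch, assuming 8 billion parameters and ignoring activations, the KV cache, and framework overhead, all of which add to the total.

```python
# Approximate bytes needed to store one parameter at each precision
BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "4-bit": 0.5}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate memory for the model weights alone, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: ~{weight_memory_gb(8e9, dtype):.0f} GB")
```

By this estimate, float32 doubles the requirement relative to float16, so 4-bit quantization is the option that lowers it.
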
## 3. (Optional) Data Collection

Run the arXiv scraper to fetch research papers. This only needs to be done once or when you want to update the data.

In [None]:
run_scraper = False # Set to True to run the scraper

if run_scraper:
    logging.info("Running arXiv scraper...")
    try:
        # You can customize the query and max_results here
        scrape_arxiv(query="neural network architecture OR large language model", max_results=500)
        logging.info("arXiv scraper finished.")
    except Exception as e:
        logging.error(f"arXiv scraper failed: {e}", exc_info=True)
else:
    logging.info("Skipping arXiv scraper.")

## 4. Index Building

Create the FAISS vector index from the downloaded documents. This uses the `build_faiss_index` function from `src/rag_pipeline.py` and relies on the global `Settings.embed_model` configured earlier.

In [None]:
run_index_build = True # Set to False to skip building the index
force_rebuild = False # Set to True to rebuild even if an index already exists
data_directory = "data/research_papers"
persist_directory = "storage"

# Skip the build if an index already exists, unless a rebuild is forced
faiss_binary_path = os.path.join(persist_directory, "vector_store.faiss")
if run_index_build and not force_rebuild and os.path.exists(faiss_binary_path):
    logging.info(f"Index already exists at {persist_directory}. Set force_rebuild=True to rebuild it.")
    run_index_build = False # Avoid accidental rebuild

if run_index_build:
    logging.info("Building FAISS index...")
    # Make sure the data directory exists
    if not os.path.isdir(data_directory) or not os.listdir(data_directory):
        logging.error(f"Data directory '{data_directory}' is empty or does not exist. Cannot build index.")
        logging.error("Please run the scraper (Step 3) or place documents in the directory first.")
    else:
        # Note: build_faiss_index in rag_pipeline.py might reassign Settings.llm (e.g. to None).
        # Settings is a global singleton, so such a change would persist after the call.
        # Save the configured LLM here and restore it once the build is done.
        configured_llm = Settings.llm
        try:
            build_faiss_index(data_dir=data_directory, persist_dir=persist_directory)
            logging.info("Index building process finished.")
        except Exception as e:
            logging.error(f"Index building failed: {e}", exc_info=True)
        finally:
            Settings.llm = configured_llm
else:
    logging.info("Skipping index build.")

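Rather than toggling `run_index_build` by hand, one option is to rebuild only when the documents are newer than the persisted index. This is a sketch under assumptions: it reuses the `vector_store.faiss` file name checked in the cell above, and the helper `index_is_stale` is hypothetical, not part of `src/rag_pipeline.py`.

```python
from pathlib import Path


def index_is_stale(data_dir: str, index_file: str) -> bool:
    """Return True if the index file is missing or older than any file under data_dir."""
    index_path = Path(index_file)
    if not index_path.is_file():
        return True  # no index yet, so it must be built
    index_mtime = index_path.stat().st_mtime

    data_path = Path(data_dir)
    if not data_path.is_dir():
        return False  # no documents, so nothing can be newer than the index

    # Any document modified after the index was written makes the index stale
    return any(
        p.is_file() and p.stat().st_mtime > index_mtime
        for p in data_path.rglob("*")
    )
```

For example, `index_is_stale("data/research_papers", "storage/vector_store.faiss")` could decide whether to run the build. Modification times are only a heuristic: deleted documents, for instance, are not detected.
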
## 5. Querying

Load the index and ask questions. This uses the `load_faiss_index` function and the globally configured `Settings` (including the LLM).

In [None]:
# Define a query function specific to the notebook
def run_query(question: str, index_persist_dir="storage"):
    """Loads index, creates engine, and runs query using global Settings."""
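    # LlamaIndex resolves `Settings.llm = None` to a placeholder MockLLM rather than storing None,
    # so check for the concrete LLM class instead of comparing against None.
    if not isinstance(Settings.llm, HuggingFaceLLM):
        logging.error("LLM is not configured (Settings.llm is not a HuggingFaceLLM). Cannot run query.")
        return "Error: LLM not initialized."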

    logging.info(f"Loading index from '{index_persist_dir}' for querying...")
    index = load_faiss_index(persist_dir=index_persist_dir)

    if index is None:
        logging.error("Index loading failed.")
        return "Error: Could not load the index. Please build it first."
    logging.info("Index loaded successfully.")

    logging.info("Creating query engine...")
    try:
        # Create engine using global Settings (LLM + Embed Model)
        query_engine = index.as_query_engine()
        logging.info("Query engine ready.")
    except Exception as e:
        logging.error(f"Failed to create query engine: {e}", exc_info=True)
        return f"Error creating query engine: {e}"

    logging.info(f"Sending query to LLM: '{question}'")
    try:
        response = query_engine.query(question)
        logging.info("LLM processing finished.")
        answer_text = str(response.response).strip()
        # Clean up potential end token
        if answer_text.endswith("<|eot_id|>"):
            answer_text = answer_text[:-len("<|eot_id|>")].strip()
        return answer_text
    except Exception as e:
        logging.error(f"An error occurred during querying: {e}", exc_info=True)
        return f"Error during query: {e}"


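The end-of-turn cleanup in `run_query` can be factored into a small helper and tested on its own. This is a sketch; `strip_end_token` is an illustrative name, and the only token handled by default is the `<|eot_id|>` marker used in the prompt template above.

```python
EOT_TOKEN = "<|eot_id|>"


def strip_end_token(text: str, token: str = EOT_TOKEN) -> str:
    """Trim whitespace and remove any trailing end-of-turn tokens from generated text."""
    text = text.strip()
    if not token:
        return text  # nothing to strip; also avoids looping on an empty token
    while text.endswith(token):
        text = text[: -len(token)].rstrip()
    return text
```

Unlike the inline version, this also handles a token that is repeated at the end of the output.
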
### Run a Sample Query

In [None]:
# --- Define your question here ---
my_question = "I have a finance advisor AI model. I want to evolve it using genetic algorithms. What must I do?"
#my_question = "What is ResNet?"
#my_question = "Summarize recent advancements in transformer architectures."

print(f"Asking: {my_question}\n")

# Run the query function
answer = run_query(my_question)

print("\nSynthesized Answer:")
print(answer)

## 6. Next Steps

*   Experiment with different questions in the cell above.
*   Modify the LLM configuration (e.g., enable 4-bit quantization once a CUDA-enabled `bitsandbytes` build is installed, or try different `temperature` values).
*   Update the scraper query and rebuild the index with different data.
*   Integrate the `data_structures.py` content if needed.
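
To see why different `temperature` values change the answers, recall that sampling divides the model's logits by the temperature before applying softmax: values below 1 sharpen the distribution, values above 1 flatten it. The following is a self-contained sketch with made-up logits; the numbers are illustrative only and not taken from any model.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after scaling them by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
for t in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

Lower temperatures concentrate probability on the top-scoring token, which tends to give more repeatable answers. Higher temperatures spread probability across alternatives.
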