{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Example 2: Programmatic Usage and Advanced Extraction\n",
    "\n",
    "This notebook demonstrates how to use `evidence-extractor` as a Python library, rather than just a command-line tool.\n",
    "\n",
    "This is useful for integrating the extraction logic into your own custom analysis workflows.\n",
    "\n",
    "We will cover:\n",
    "1.  Importing and using the core functions programmatically.\n",
    "2.  Focusing on specific extractions, like PICO and Quality Scores.\n",
    "3.  Directly accessing the structured Pydantic data models."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Setup and Imports\n",
    "\n",
    "First, we import the functions we need from the library. We also instantiate a `GeminiClient`, because the extraction functions used below take it as their first argument."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "# Import the core functions we want to use\n",
    "from evidence_extractor.core.ingest import ingest_pdf\n",
    "from evidence_extractor.core.preprocess import extract_text_from_doc, clean_and_consolidate_text\n",
    "from evidence_extractor.extraction.pico import extract_pico_elements\n",
    "from evidence_extractor.extraction.methods import extract_methods_and_quality\n",
    "from evidence_extractor.integration.gemini_client import GeminiClient\n",
    "\n",
    "# Ensure your GEMINI_API_KEY is set in your .env file.\n",
    "# The GeminiClient will load it automatically.\n",
    "gemini_client = GeminiClient()\n",
    "\n",
    "# Fail fast so that later cells do not run without a client or an input file\n",
    "if not gemini_client.is_configured():\n",
    "    raise RuntimeError(\"Gemini client is not configured. Please check your .env file.\")\n",
    "\n",
    "pdf_path = \"../data/raw/sample.pdf\"\n",
    "\n",
    "if not os.path.exists(pdf_path):\n",
    "    raise FileNotFoundError(f\"Sample PDF not found at {pdf_path}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Preprocessing the Document\n",
    "\n",
    "As in the main CLI, the first step is to ingest the PDF, extract the text of each page, and consolidate it into a single cleaned string. That string is the input for the Gemini-backed extraction functions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "document = ingest_pdf(pdf_path)\n",
    "pages_text = extract_text_from_doc(document)\n",
    "_, cleaned_text = clean_and_consolidate_text(pages_text)\n",
    "\n",
    "# Keep only the first 8,000 characters to limit the prompt size;\n",
    "# PICO and methods details usually appear in the abstract and introduction.\n",
    "text_snippet = cleaned_text[:8000]\n",
    "\n",
    "print(f\"Successfully processed PDF. Text snippet length: {len(text_snippet)} characters.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Extracting PICO Elements\n",
    "\n",
    "Now we can call the `extract_pico_elements` function directly. It takes the Gemini client and the text snippet as input and returns a Pydantic `PICO` object. The cell below checks that a result came back before reading its fields."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pico_result = extract_pico_elements(gemini_client, text_snippet)\n",
    "\n",
    "if pico_result:\n",
    "    print(\"--- Extracted PICO Elements ---\")\n",
    "    # We can access the fields of the Pydantic model directly\n",
    "    print(f\"Population: {pico_result.population}\")\n",
    "    print(f\"Intervention: {pico_result.intervention}\")\n",
    "    print(f\"Comparison: {pico_result.comparison}\")\n",
    "    print(f\"Outcome: {pico_result.outcome}\")\n",
    "\n",
    "    # The object also contains the default correction metadata\n",
    "    print(f\"\\nValidation Status: {pico_result.correction_metadata.status.value}\")\n",
    "else:\n",
    "    # Report the empty result instead of finishing silently\n",
    "    print(\"PICO extraction returned no result.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Extracting Methodological Quality\n",
    "\n",
    "Similarly, we can call the `extract_methods_and_quality` function to get a `QualityScore` object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "quality_result = extract_methods_and_quality(gemini_client, text_snippet)\n",
    "\n",
    "if quality_result:\n",
    "    print(\"--- Extracted Quality Score ---\")\n",
    "    print(f\"Score Name: {quality_result.score_name}\")\n",
    "    print(f\"Score Value: {quality_result.score_value}\")\n",
    "    print(f\"Justification: {quality_result.justification}\")\n",
    "else:\n",
    "    # Report the empty result instead of finishing silently\n",
    "    print(\"Quality extraction returned no result.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conclusion\n",
    "\n",
    "By importing functions directly, you can build custom workflows. For example, you could loop over a directory of PDFs, extract only the PICO elements for each one, and save the results directly to a CSV file, bypassing the main `extract` command's full pipeline."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}