# Golden Solution: Research Paper Entity Extraction & Citation Analysis

Reference implementation. Runs end-to-end without manual intervention.

**Deliverables demonstrated:**
- Data loaded from files (environment interaction)
- Entity extraction with resolution and disambiguation
- Citation network analysis with anomaly detection
- Artifact generation (final_report.json saved to disk)
- All unit tests pass
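Entity resolution and disambiguation happen inside the `benchmark` package. As an illustration only, here is a minimal sketch of one common approach. It assumes mentions are plain strings matched against a known list of canonical names. Each name is normalized (case, accents, punctuation, whitespace), and the sketch falls back to fuzzy matching with the standard-library `difflib`. The function names and the 0.85 cutoff are hypothetical and are not the package's API.

```python
import difflib
import re
import unicodedata


def normalize_name(name: str) -> str:
    """Lowercase, strip accents (combining marks) and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^\w\s]", "", stripped.lower())
    return " ".join(cleaned.split())


def resolve_entities(mentions, canonical, cutoff=0.85):
    """Map each raw mention to its closest canonical name, or None if nothing is close enough."""
    # Hypothetical helper: index canonical names by their normalized form.
    norm_to_canon = {normalize_name(c): c for c in canonical}
    resolved = {}
    for mention in mentions:
        key = normalize_name(mention)
        if key in norm_to_canon:
            resolved[mention] = norm_to_canon[key]  # exact match after normalization
            continue
        # Fall back to the single most similar normalized name above the cutoff.
        match = difflib.get_close_matches(key, norm_to_canon.keys(), n=1, cutoff=cutoff)
        resolved[mention] = norm_to_canon[match[0]] if match else None
    return resolved


print(resolve_entities(
    ["Stanford Univeristy", "eth zurich", "M.I.T.", "Unknown Lab"],
    ["MIT", "Stanford University", "ETH Zürich"],
))
```

Exact matches after normalization are tried first, so fuzzy matching only has to handle genuine misspellings such as the transposed letters in "Univeristy".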

## Setup

In [None]:
import os
import shutil
import subprocess
import sys

REPO_URL = "https://github.com/EhsanKA/agentic_task.git"
REPO_DIR = "/content/agentic_task"

# Remove any previous checkout so every run starts from a fresh clone.
if os.path.exists(REPO_DIR):
    shutil.rmtree(REPO_DIR)
subprocess.run(["git", "clone", REPO_URL, REPO_DIR], check=True)
# Install the cloned repository as a package so `benchmark` is importable.
subprocess.run([sys.executable, "-m", "pip", "install", "-q", REPO_DIR], check=True)

## Data Generation & Loading

In [None]:
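import json
from benchmark.data.loader import setup_data

# setup_data() returns the raw papers, citations, and affiliations along with
# the directory that holds the data files.
papers_raw, citations_raw, affiliations_raw, DATA_DIR = setup_data()
print(f"Data directory: {DATA_DIR}")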

## Run Pipeline

In [None]:
from benchmark.pipeline.runner import run_pipeline

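results = run_pipeline(papers_raw, citations_raw, affiliations_raw, data_dir=DATA_DIR)

# Expose each pipeline output (validation_results, final_report, ...) as a
# top-level variable so the cells below can refer to it by name.
for k, v in results.items():
    globals()[k] = v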
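Writing every key into `globals()` means later cells depend on names that never appear in an assignment. A minimal alternative wraps the outputs in `types.SimpleNamespace` so they stay grouped under one name. It is sketched here with an illustrative stand-in dict; `demo_results` and its keys are hypothetical.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the dict returned by run_pipeline.
demo_results = {"orphan_citations": ["p9"], "self_citations": [], "final_report": {"status": "ok"}}

# One namespace object instead of one global per key.
# Keys must be strings, and must be valid identifiers for dot access.
ns = SimpleNamespace(**demo_results)
print(len(ns.orphan_citations), ns.final_report["status"])  # → 1 ok
```

Later cells would then read `ns.final_report` instead of a bare `final_report`, which makes the origin of each value explicit.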

## Results

In [None]:
print(json.dumps(validation_results, indent=2))
print(f"\nOrphans: {len(orphan_citations)}, Self-cites: {len(self_citations)}")
print(f"Temporal anomalies: {len(temporal_anomalies)}, Ring papers: {len(citation_ring_papers)}")
print(f"Typos: {len(typo_corrections)}, Affiliation conflicts: {len(affiliation_conflicts)}")
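These counts come from `run_pipeline`, whose exact rules are not shown here. Below is a minimal sketch of how such categories could be computed on a hypothetical toy graph. It assumes the following definitions:

- An orphan citation has an endpoint missing from the paper set.
- A self-citation shares at least one author.
- A temporal anomaly cites a paper published later.
- A ring paper lies on a directed citation cycle.

These definitions, the function names, and the data layout are assumptions, not the benchmark's implementation.

```python
from collections import defaultdict


def ring_members(graph):
    """Return the papers that lie on a directed citation cycle."""
    members = set()
    for start in list(graph):
        # Iterative depth-first search from start's successors;
        # reaching start again means start is on a cycle.
        stack, seen = list(graph[start]), set()
        while stack:
            node = stack.pop()
            if node == start:
                members.add(start)
                break
            if node in seen:
                continue
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return members


def detect_anomalies(papers, citations):
    """papers: id -> {"year": int, "authors": set}; citations: (citing, cited) pairs."""
    orphans, self_cites, temporal = [], [], []
    graph = defaultdict(set)
    for src, dst in citations:
        if src not in papers or dst not in papers:
            orphans.append((src, dst))  # an endpoint is not a known paper
            continue
        if papers[src]["authors"] & papers[dst]["authors"]:
            self_cites.append((src, dst))  # at least one shared author
        if papers[dst]["year"] > papers[src]["year"]:
            temporal.append((src, dst))  # cites a paper published later
        graph[src].add(dst)
    return orphans, self_cites, temporal, ring_members(graph)


toy_papers = {
    "A": {"year": 2018, "authors": {"li"}},
    "B": {"year": 2019, "authors": {"kim"}},
    "C": {"year": 2020, "authors": {"li", "ng"}},
    "D": {"year": 2021, "authors": {"ng"}},
}
toy_citations = [("B", "A"), ("C", "B"), ("A", "C"), ("C", "A"), ("D", "A"), ("B", "Z")]

orphans, self_cites, temporal, rings = detect_anomalies(toy_papers, toy_citations)
print(orphans, self_cites, temporal, sorted(rings))
# → [('B', 'Z')] [('A', 'C'), ('C', 'A')] [('A', 'C')] ['A', 'B', 'C']
```

The per-node search is quadratic in the worst case, which is fine for a toy graph. A strongly-connected-components algorithm (for example, Tarjan's) would scale better on a large citation network.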

In [None]:
print(json.dumps(final_report, indent=2, default=str))

## Unit Tests

In [None]:
from benchmark.evaluation.tests import set_context, run_all_tests

# Include the data directory in the test context so the tests can locate files on disk.
results["_data_dir"] = DATA_DIR
set_context(results)
test_result = run_all_tests()
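`set_context` hands the pipeline outputs to the test module. The summary cell calls `wasSuccessful()` and reads `failures` and `errors` on `test_result`, which is consistent with the standard `unittest.TestResult` interface. The pattern can be sketched with the standard-library `unittest` alone. The `demo_*` names and both test cases are hypothetical stand-ins, not the contents of `benchmark.evaluation.tests`.

```python
import io
import unittest

# Module-level store that the hypothetical test cases read from.
_DEMO_CONTEXT = {}


def demo_set_context(ctx):
    """Replace the stored context with the given pipeline outputs."""
    _DEMO_CONTEXT.clear()
    _DEMO_CONTEXT.update(ctx)


class DemoReportTests(unittest.TestCase):
    def test_final_report_is_dict(self):
        self.assertIsInstance(_DEMO_CONTEXT.get("final_report"), dict)

    def test_data_dir_recorded(self):
        self.assertIn("_data_dir", _DEMO_CONTEXT)


def demo_run_all_tests(stream=None):
    """Run the suite quietly and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DemoReportTests)
    runner = unittest.TextTestRunner(stream=stream or io.StringIO(), verbosity=0)
    return runner.run(suite)


demo_set_context({"final_report": {}, "_data_dir": "/tmp/data"})
demo_result = demo_run_all_tests()
print(demo_result.wasSuccessful(), len(demo_result.failures), len(demo_result.errors))  # → True 0 0
```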

## Summary

In [None]:
import os
print("=" * 60)
print(f"Artifact saved: {os.path.exists(os.path.join(DATA_DIR, 'final_report.json'))}")
if test_result.wasSuccessful():
    print("ALL TESTS PASSED")
else:
    print(f"FAILED: {len(test_result.failures)} failures, {len(test_result.errors)} errors")
print("=" * 60)
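The summary only checks that `final_report.json` exists. A slightly stricter check also confirms that the file parses. It is sketched here assuming the report's top level is a JSON object; `check_artifact` is a hypothetical helper.

```python
import json
import os


def check_artifact(path):
    """Return (ok, message): the file must exist and parse as a JSON object."""
    if not os.path.isfile(path):
        return False, f"missing: {path}"
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except json.JSONDecodeError as exc:
        return False, f"invalid JSON: {exc}"
    if not isinstance(data, dict):
        return False, "top-level value is not an object"
    return True, f"{len(data)} top-level keys"
```

In the notebook it would be called as `check_artifact(os.path.join(DATA_DIR, "final_report.json"))`.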