## Getting the reasoning logs

In [5]:
from langchain_ollama import ChatOllama
from langchain_core.messages import HumanMessage
model = ChatOllama(model="qwen3:8b")  # reasoning=True can also be passed here instead of at invoke time
msg = [HumanMessage("why is quantum physics so hard?")]

response = model.invoke(msg, reasoning=True)
print("Answer:\n", response.content, "\n")

Answer:
 Quantum physics is notoriously difficult to grasp for several interconnected reasons, rooted in its abstract concepts, mathematical complexity, and departure from classical intuition. Here's a breakdown of the key factors:

### 1. **Counterintuitive Nature of Reality**  
   - **Wave-Particle Duality**: Particles like electrons exhibit both wave-like and particle-like behavior, which defies classical logic. For example, light behaves as a wave in some experiments and as a particle (photon) in others.  
   - **Superposition**: Quantum systems can exist in multiple states simultaneously (e.g., a particle being in two places at once) until measured. This is fundamentally different from our everyday experience.  
   - **Entanglement**: Particles can become "spooky" connected, with their states correlated instantaneously across vast distances, even if separated by space. This challenges classical notions of locality and causality.  
   - **Uncertainty Principle**: Certain pairs of p

In [7]:
print("Thinking / reasoning trace:\n", response.additional_kwargs.get("reasoning_content") or "No reasoning field found.")

Thinking / reasoning trace:
 Okay, the user is asking why quantum physics is so hard. Let me start by breaking down the main reasons. First, quantum mechanics deals with particles at a very small scale, which is different from the macroscopic world we're used to. So, the concepts like superposition and entanglement don't align with our everyday experiences. That's probably a big part of the difficulty.

Then there's the mathematical complexity. The equations in quantum mechanics, like the Schrödinger equation, are more advanced than classical physics. Students might struggle with the abstract mathematics involved, especially if they're not comfortable with linear algebra or differential equations.

Another point is the counterintuitive nature of the theory. Things like wave-particle duality and the uncertainty principle challenge our classical intuitions. People might find it hard to grasp because these concepts don't fit with how we perceive reality. For example, the idea that a parti
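
If the separate reasoning field is not populated, some reasoning models may instead leave the trace inline in `response.content`, wrapped in `<think>...</think>` tags. The sketch below shows one possible fallback under that assumption. `split_reasoning` is a hypothetical helper, not part of LangChain or Ollama, and the tag format is assumed rather than guaranteed.

```python
import re
from typing import Tuple

# Assumed inline format: <think>...</think> blocks inside the answer text.
_THINK_RE = re.compile(r"<think>(.*?)</think>", flags=re.DOTALL)


def split_reasoning(text: str) -> Tuple[str, str]:
    """Return (reasoning, answer) from text that may contain <think> blocks."""
    # Collect every think block, then strip them all from the answer.
    reasoning = "\n".join(m.strip() for m in _THINK_RE.findall(text))
    answer = _THINK_RE.sub("", text).strip()
    return reasoning, answer


reasoning, answer = split_reasoning("<think>2 + 2 is 4.</think>\nThe answer is 4.")
print("reasoning:", reasoning)
print("answer:", answer)
```

With a real response, this would be called on `response.content` only when `response.additional_kwargs.get("reasoning_content")` is empty.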

In [None]:
from langchain_core.messages import HumanMessage, ToolMessage
from langchain_ollama import ChatOllama
from langchain_tavily import TavilySearch

# Get the model
model = ChatOllama(model="llama3.2", temperature=0)
model.invoke("tell me a joke")

# Define the tool
search_tool = TavilySearch()

messages = [HumanMessage("add 35 + 156")]
# Bind the tool to the model, then call it
tool_llm = model.bind_tools(tools=[search_tool])
response = tool_llm.invoke(messages)
print("Tool call:", response.tool_calls)

# Append the AI message once, then run each requested tool manually
# and append one ToolMessage per tool call
messages.append(response)
for tc in response.tool_calls:
    # Invoke the tool itself (not the LLM) with the arguments the model produced
    result = search_tool.invoke(tc["args"])
    messages.append(ToolMessage(content=str(result), tool_call_id=tc["id"]))

final = tool_llm.invoke(messages)
print(final)
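
The manual tool loop above can be sketched without any model or network access. This is a minimal illustration, assuming tool calls arrive as dicts with `name`, `args`, and `id` keys (the shape the cell above reads from `response.tool_calls`). The `add` tool, the `TOOLS` registry, and `run_tool_calls` are hypothetical names introduced only for this example.

```python
from typing import Any, Callable, Dict, List


def add(a: int, b: int) -> int:
    """Toy tool: add two integers."""
    return a + b


# Hypothetical registry mapping tool names to plain callables.
TOOLS: Dict[str, Callable[..., Any]] = {"add": add}


def run_tool_calls(tool_calls: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Execute each requested call and return one result record per call id."""
    results = []
    for tc in tool_calls:
        fn = TOOLS.get(tc["name"])
        if fn is None:
            # Report unknown tools back instead of raising.
            content = f"Unknown tool: {tc['name']}"
        else:
            content = str(fn(**tc["args"]))
        results.append({"tool_call_id": tc["id"], "content": content})
    return results


calls = [{"name": "add", "args": {"a": 35, "b": 156}, "id": "call_1"}]
print(run_tool_calls(calls))
```

Each result record carries the `tool_call_id` it answers, which is what lets a chat model match tool outputs to the calls it made.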

## Testing search tools

### Wikipedia

In [2]:
from typing import Dict, Any, List, Set, Tuple, Optional
import time
import wikipedia
from wikipedia.exceptions import DisambiguationError, PageError

class WikipediaScraper:
    """
    Simple Wikipedia scraper using the 'wikipedia' package.
    Returns page metadata + content + a link-edge list.
    """

    DEPTH_CONFIG = {
        # (max_hops, max_pages_total, max_links_per_page)
        "shallow":  (0,  1,  20),  # only the main page
        "moderate": (1, 10,  15),  # main page + follow top links (1 hop)
        "deep":     (2, 50,  10),  # up to 2 hops and more pages allowed
    }

    def __init__(
        self,
        user_agent: str = "LangGraph-WikiScraper",
        delay_between_requests: float = 0.5,
    ):

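        wikipedia.set_lang("en")
        # Apply the custom User-Agent to requests made through the package
        wikipedia.set_user_agent(user_agent)
        self.delay = delay_between_requests
        self.user_agent = user_agent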

    def _safe_get_page(self, title: str) -> Optional[wikipedia.WikipediaPage]:
        """
        Try to fetch a page object for title. Handles disambiguation and page errors.
        Returns None on failure.
        """
        try:
            # This may raise DisambiguationError or PageError
            page = wikipedia.page(title, auto_suggest=False, preload=False)
            return page
        except DisambiguationError as e:
            # pick the first non-ambiguous option as fallback (best-effort)
            # e.options is a list of possible titles
            for option in e.options:
                try:
                    page = wikipedia.page(option, auto_suggest=False, preload=False)
                    return page
                except Exception:
                    continue
            return None
        except PageError:
            return None
        except Exception:
            # any other unexpected error -> None
            return None

    def scrape(
        self,
        query: str,
        depth: str = "moderate",
        full_text: bool = False,
        max_pages_override: Optional[int] = None,
        max_links_per_page_override: Optional[int] = None,
    ) -> Dict[str, Any]:
        """
        Scrape wikipedia starting from `query`.
        depth: one of 'shallow', 'moderate', 'deep'
        full_text: whether to store full page.content (can be large). If False we store page.summary.
        max_pages_override: optionally override configured max pages.
        max_links_per_page_override: optionally override per-page link limit.

        Returns:
        {
            "start_query": query,
            "depth": depth,
            "pages": { title: { "title", "url", "summary", "content"(opt), "links": [...], "hop" } },
            "edges": [ (from_title, to_title), ... ],
            "visited_order": [title1, title2, ...],
            "warnings": [...],
        }
        """
        if depth not in self.DEPTH_CONFIG:
            raise ValueError(f"depth must be one of {list(self.DEPTH_CONFIG.keys())}")

        max_hops, cfg_max_pages, cfg_max_links = self.DEPTH_CONFIG[depth]
        if max_pages_override is not None:
            cfg_max_pages = int(max_pages_override)
        if max_links_per_page_override is not None:
            cfg_max_links = int(max_links_per_page_override)

        # BFS-style crawl up to given hops, but stop when total pages >= cfg_max_pages
        from collections import deque
        queue: deque[Tuple[str, int]] = deque()  # (title, current_hop)
        visited: Set[str] = set()
        pages: Dict[str, Dict[str, Any]] = {}
        edges: List[Tuple[str, str]] = []
        warnings: List[str] = []

        # Start by resolving the query to a page (try the page directly, then fall back to search)
        try:
            # Try direct page first (auto_suggest off to avoid surprises)
            start_page = self._safe_get_page(query)
            if start_page:
                start_title = start_page.title
            else:
                # fallback: use wikipedia.search to find best match
                search_results = wikipedia.search(query, results=5)
                if not search_results:
                    warnings.append(f"No search results for query '{query}'.")
                    return {
                        "start_query": query,
                        "depth": depth,
                        "pages": pages,
                        "edges": edges,
                        "visited_order": [],
                        "warnings": warnings,
                    }
                start_title = search_results[0]
        except Exception as e:
            warnings.append(f"Failed to resolve start query '{query}': {e}")
            return {
                "start_query": query,
                "depth": depth,
                "pages": pages,
                "edges": edges,
                "visited_order": [],
                "warnings": warnings,
            }

        queue.append((start_title, 0))

        while queue and len(pages) < cfg_max_pages:
            title, hop = queue.popleft()
            if title in visited:
                continue
            # polite pause
            time.sleep(self.delay)

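            page = self._safe_get_page(title)
            # mark the requested title as seen, even if it resolves to a different page title
            visited.add(title)
            if page is None:
                warnings.append(f"Could not fetch page '{title}'. Skipping.")
                continue
            if page.title in pages:
                # a redirect or disambiguation fallback led to a page we already stored
                continue

            visited.add(page.title)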
            # gather content
            try:
                summary = page.summary
            except Exception:
                summary = ""

            content = None
            if full_text:
                try:
                    content = page.content
                except Exception:
                    content = None

            # get links (titles)
            try:
                raw_links = page.links or []
            except Exception:
                raw_links = []

            # normalize / limit links
            links = []
            for link in raw_links:
                if len(links) >= cfg_max_links:
                    break
                # simple exclusions: skip files and external links (page.links typically only titles)
                if link.lower().startswith("file:"):
                    continue
                links.append(link)

            pages[page.title] = {
                "title": page.title,
                "url": page.url if hasattr(page, "url") else f"https://en.wikipedia.org/wiki/{page.title.replace(' ', '_')}",
                "summary": summary,
                "content": content,
                "links": links,
                "hop": hop,
            }

            # record edges
            for to_title in links:
                edges.append((page.title, to_title))

            # enqueue next hop links if we still have hops left and page limit not reached
            if hop < max_hops:
                for to_title in links:
                    if to_title not in visited and len(pages) + len(queue) < cfg_max_pages:
                        queue.append((to_title, hop + 1))

        visited_order = list(pages.keys())
        return {
            "start_query": query,
            "depth": depth,
            "pages": pages,
            "edges": edges,
            "visited_order": visited_order,
            "warnings": warnings,
        }


# Example usage:
if __name__ == "__main__":
    scraper = WikipediaScraper(delay_between_requests=0.2)
    result = scraper.scrape("Graph neural network", depth="deep", full_text=True)
    print("Visited:", result["visited_order"])
    # inspect one page
    for title, meta in result["pages"].items():
        print("TITLE:", title)
        print("URL:", meta["url"])
        print("SUMMARY (first 300 chars):", meta["summary"][:300])
        print("LINKS (first 10):", meta["links"][:10])
        break


Visited: ['Graph neural network', '15.ai', 'AAAI Conference on Artificial Intelligence', 'AI alignment', 'AI safety', 'Action selection', 'Activation function', 'Active learning (machine learning)', 'Adjacency matrix', 'Adobe Firefly', 'Adversarial machine learning', '/mlp/', 'NetEase', '2001: A Space Odyssey', '44,100 Hz', '4chan', 'Ethics of artificial intelligence', 'Partnership on AI', 'AI boom', 'Academic conference', 'Alberta', 'Amazon (company)', 'Anaheim, California', 'Anaheim Convention Center', 'Artificial intelligence', 'Association for the Advancement of Artificial Intelligence', 'Atlanta', 'Austin, Texas', 'Baidu', 'AI-assisted software development', 'AI bubble', 'AI capability control', 'Artificial intelligence industry in China', 'AI takeover', 'AI winter', 'Alan Turing', 'AI21 Labs', 'AI Safety Institute', 'AI Safety Summit', 'AI Seoul Summit']
TITLE: Graph neural network
URL: https://en.wikipedia.org/wiki/Graph_neural_network
SUMMARY (first 300 chars): Graph neural net
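
The three limits in `DEPTH_CONFIG` (hops, total pages, links per page) are easier to see on a toy in-memory graph. The sketch below is a simplified stand-in for the crawl, not the scraper itself: `bounded_bfs` and the `toy` graph are hypothetical, and unlike the class above it marks nodes as seen when they are enqueued rather than budgeting the queue length.

```python
from collections import deque
from typing import Dict, List


def bounded_bfs(
    graph: Dict[str, List[str]],
    start: str,
    max_hops: int,
    max_pages: int,
    max_links: int,
) -> List[str]:
    """Visit nodes breadth-first, limited by hop count, page budget, and links per node."""
    queue = deque([(start, 0)])
    seen = {start}
    visited: List[str] = []
    while queue and len(visited) < max_pages:
        node, hop = queue.popleft()
        visited.append(node)
        if hop >= max_hops:
            continue  # hop limit reached: do not expand this node
        for neighbor in graph.get(node, [])[:max_links]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hop + 1))
    return visited


toy = {"A": ["B", "C", "D"], "B": ["E"], "C": ["A", "F"]}
print(bounded_bfs(toy, "A", max_hops=0, max_pages=1, max_links=20))   # ['A']
print(bounded_bfs(toy, "A", max_hops=1, max_pages=10, max_links=2))   # ['A', 'B', 'C']
print(bounded_bfs(toy, "A", max_hops=2, max_pages=50, max_links=10))  # ['A', 'B', 'C', 'D', 'E', 'F']
```

The three calls reuse the numeric limits of the "shallow", "moderate", and "deep" presets, with `max_links` lowered to 2 in the second call so the per-node cap is visible on such a small graph.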

### Arxiv

In [3]:
import time
from typing import Dict, Any, List, Set, Tuple, Optional
import arxiv   # pip install arxiv
from collections import deque

class ArxivScraper:
    """
    Scraper that uses the `arxiv` python package to fetch paper metadata and
    expands the search neighborhood by following author publications and
    same-category papers as an approximation of 'linked' papers.

    Depth presets:
      - shallow:  (0 hops, top 1-3 papers)
      - moderate: (1 hop, expand by authors + same category)
      - deep:     (2 hops, larger expansion)
    """

    # config tuples: (max_hops, max_papers_total, max_papers_per_seed)
    DEPTH_CONFIG = {
        "shallow":  (0,  3,  3),
        "moderate": (1, 25,  8),
        "deep":     (2, 80, 12),
    }

    def __init__(
        self,
        results_per_query: int = 10,
        delay_between_requests: float = 0.5,
        max_results_returned_from_arxiv_call: int = 50,
    ):
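        # NOTE: this only sets an attribute on the module; the arxiv package does not
        # appear to read a module-level timeout, so it is likely a no-op.
        arxiv.timeout = 30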
        self.results_per_query = results_per_query
        self.delay = delay_between_requests
        # the arxiv library paginates; cap how many results we fetch for any helper call
        self._max_results_per_call = max_results_returned_from_arxiv_call

    def _sleep(self):
        if self.delay and self.delay > 0:
            time.sleep(self.delay)

    def _search(self, query: str, max_results: int = 10) -> List[arxiv.Result]:
        """
        Run an arxiv search and return arxiv.Result objects (list).
        """
        max_results = min(max_results, self._max_results_per_call)
        search = arxiv.Search(
            query=query,
            max_results=max_results,
            sort_by=arxiv.SortCriterion.Relevance
        )
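        results = []
        try:
            # Search.results() is deprecated in recent arxiv releases; iterate through a Client instead
            for r in arxiv.Client().results(search):
                results.append(r)
        except Exception:
            # on transient network / parsing errors, return what we have
            pass
        return results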

    def _papers_to_meta(self, r: arxiv.Result) -> Dict[str, Any]:
        """
        Convert arxiv.Result to a serializable metadata dict.
        """
        return {
            "id": getattr(r, "entry_id", None),
            "arxiv_id": getattr(r, "get_short_id", lambda: None)() if hasattr(r, "get_short_id") else None,
            "title": getattr(r, "title", None),
            "authors": [a.name for a in getattr(r, "authors", [])] if getattr(r, "authors", None) else [],
            "summary": getattr(r, "summary", None),
            "published": getattr(r, "published", None),
            "updated": getattr(r, "updated", None),
            "pdf_url": next((l.href for l in getattr(r, "links", []) if getattr(l, "title", "") == "pdf"), None)
                       or getattr(r, "pdf_url", None),
            "primary_category": getattr(r, "primary_category", None) if hasattr(r, "primary_category") else None,
            "categories": getattr(r, "categories", None) if hasattr(r, "categories") else None,
            "authors_inferred_query": ", ".join([a.name for a in getattr(r, "authors", [])]) if getattr(r, "authors", None) else "",
        }

    def _get_by_author(self, author_name: str, max_results: int = 5) -> List[arxiv.Result]:
        """
        Search for other papers by the same author. We craft an author: query.
        """
        q = f'au:"{author_name}"'
        return self._search(q, max_results=max_results)

    def _get_by_category(self, category: str, max_results: int = 5) -> List[arxiv.Result]:
        """
        Search for recent / relevant papers in the given category.
        category should be like 'cs.AI' or 'stat.ML' — but we accept whatever arXiv expects.
        We'll use the 'cat:' field if the user-specified category looks valid.
        """
        # primary query: restrict results to the category via the 'cat:' field
        q = f'cat:{category}'
        results = self._search(q, max_results=max_results)
        if not results:
            # fallback to simple token search
            results = self._search(category, max_results=max_results)
        return results

    def scrape(
        self,
        query: str,
        depth: str = "moderate",
        max_results_override: Optional[int] = None,
        max_per_seed_override: Optional[int] = None,
    ) -> Dict[str, Any]:
        """
        Scrape arXiv starting from a textual query and expand according to depth.

        Returns a dict:
          {
            "start_query": query,
            "depth": depth,
            "papers": { unique_key: metadata_dict },
            "edges": [ (from_id, to_id, relation) ],
            "visited_order": [unique_key, ...],
            "warnings": [...],
          }

        Notes:
          - Because arXiv doesn't provide structured references/citations, 'edges' are derived
            heuristically:
              * "author": link from a paper to another paper by the same author discovered
              * "category": link from a paper to another paper found in same primary category
              * "search_seed": initial query -> paper
          - Unique key used for nodes is arXiv short id if available else entry_id or title.
        """
        if depth not in self.DEPTH_CONFIG:
            raise ValueError(f"depth must be one of {list(self.DEPTH_CONFIG.keys())}")

        max_hops, cfg_max_papers, cfg_max_per_seed = self.DEPTH_CONFIG[depth]
        if max_results_override is not None:
            cfg_max_papers = int(max_results_override)
        if max_per_seed_override is not None:
            cfg_max_per_seed = int(max_per_seed_override)

        # Start: search for the query
        warnings: List[str] = []
        seed_results = self._search(query, max_results=self.results_per_query)
        if not seed_results:
            warnings.append(f"No arXiv results for query '{query}'.")
            return {
                "start_query": query,
                "depth": depth,
                "papers": {},
                "edges": [],
                "visited_order": [],
                "warnings": warnings,
            }

        # BFS-like expansion using seeds, but nodes are unique by arxiv short id / entry_id / title
        def node_key(r: arxiv.Result):
            # prefer arXiv short id like '2101.00001'
            try:
                sid = r.get_short_id()
            except Exception:
                sid = None
            if sid:
                return sid
            if getattr(r, "entry_id", None):
                return getattr(r, "entry_id")
            return getattr(r, "title", None)

        queue = deque()
        papers: Dict[str, Dict[str, Any]] = {}
        edges: List[Tuple[str, str, str]] = []  # (from_key, to_key, relation)
        visited: Set[str] = set()
        visited_order: List[str] = []

        # Enqueue seed results as hop=0 (keys are computed when each item is dequeued)
        for r in seed_results[:cfg_max_per_seed]:
            queue.append((r, 0, "seed"))

        while queue and len(papers) < cfg_max_papers:
            r, hop, relation_from = queue.popleft()
            key = node_key(r)
            if not key or key in visited:
                continue
            self._sleep()
            # Convert to metadata dict and save
            meta = self._papers_to_meta(r)
            meta["hop"] = hop
            papers[key] = meta
            visited.add(key)
            visited_order.append(key)

            # record an edge from the seed query if relation_from == "seed"
            if relation_from == "seed":
                edges.append((f"query::{query}", key, "search_seed"))

            # Heuristic expansions:
            if hop < max_hops:
                # 1) Add other papers by each author
                authors = meta.get("authors", []) or []
                for author in authors:
                    # limit how many author-search results we consider per author
                    try:
                        author_results = self._get_by_author(author, max_results=cfg_max_per_seed)
                    except Exception:
                        author_results = []
                    for ar in author_results:
                        k2 = node_key(ar)
                        if not k2 or k2 in visited:
                            continue
                        edges.append((key, k2, "author"))
                        # enqueue only if we still have budget
                        if len(papers) + len(queue) < cfg_max_papers:
                            queue.append((ar, hop + 1, "author"))

                # 2) Add papers from same primary_category (if available)
                prim_cat = meta.get("primary_category") or (meta.get("categories") and meta.get("categories")[0])
                if prim_cat:
                    try:
                        cat_results = self._get_by_category(prim_cat, max_results=cfg_max_per_seed)
                    except Exception:
                        cat_results = []
                    for ar in cat_results:
                        k2 = node_key(ar)
                        if not k2 or k2 in visited:
                            continue
                        edges.append((key, k2, "category"))
                        if len(papers) + len(queue) < cfg_max_papers:
                            queue.append((ar, hop + 1, "category"))

        # assemble the final structure
        return {
            "start_query": query,
            "depth": depth,
            "papers": papers,
            "edges": edges,
            "visited_order": visited_order,
            "warnings": warnings,
        }

# Example usage:
if __name__ == "__main__":
    scraper = ArxivScraper(results_per_query=5, delay_between_requests=0.2)
    res = scraper.scrape("graph neural networks", depth="moderate")
    print("Visited count:", len(res["visited_order"]))
    for k in res["visited_order"][:5]:
        p = res["papers"][k]
        print(k, "-", p["title"], "|", p["authors"][:3])



Visited count: 24
2307.00865v1 - A Survey on Graph Classification and Link Prediction based on GNN | ['Xingyu Liu', 'Juan Chen', 'Quan Wen']
2007.06559v2 - Graph Structure of Neural Networks | ['Jiaxuan You', 'Jure Leskovec', 'Kaiming He']
2011.01412v1 - Sampling and Recovery of Graph Signals based on Graph Neural Networks | ['Siheng Chen', 'Maosen Li', 'Ya Zhang']
1908.00187v1 - Graph Neural Networks for Small Graph and Giant Network Representation Learning: An Overview | ['Jiawei Zhang']
1902.10042v2 - Graph Neural Processes: Towards Bayesian Graph Neural Networks | ['Andrew Carr', 'David Wingate']
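
Because the `edges` list mixes heuristic relations ("search_seed", "author", "category"), it can help to break it down before using it as a graph. A minimal sketch, assuming the `(from_key, to_key, relation)` tuple shape described in the `scrape` docstring. `summarize_edges` is a hypothetical helper, and the keys in `toy_edges` are placeholders rather than real arXiv ids.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (from_key, to_key, relation)


def summarize_edges(edges: List[Edge]):
    """Count edges per relation and build an adjacency list keyed by source node."""
    counts = Counter(rel for _, _, rel in edges)
    adjacency: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for src, dst, rel in edges:
        adjacency[src].append((dst, rel))
    return counts, dict(adjacency)


# Placeholder data for illustration only.
toy_edges = [
    ("query::demo", "paper-1", "search_seed"),
    ("paper-1", "paper-2", "author"),
    ("paper-1", "paper-3", "category"),
    ("paper-1", "paper-4", "author"),
]
counts, adjacency = summarize_edges(toy_edges)
print(counts)
print(adjacency["paper-1"])
```

On real output this would be called as `summarize_edges(res["edges"])`. Note that edges may point at papers that were never stored in `res["papers"]`, since edges are recorded before the budget check.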


### Semantic Scholar

In [4]:
"""
SemanticScholarScraper

Requires: pip install requests

This class talks to the Semantic Scholar Academic Graph API (graph/v1).
It supports:
 - start from a textual query (search) OR a paper id/DOI
 - expand via citations, references, authors' other papers, and recommendations
 - depth presets: shallow / moderate / deep (BFS on citation/reference graph)
 - simple rate-limit/backoff handling and optional API key support

Notes:
 - Pass api_key=... to the constructor if you have a key (recommended).
 - Respect Semantic Scholar's terms of use & rate limits. See docs.
"""

import time
import requests
from typing import Dict, Any, List, Set, Tuple, Optional
from collections import deque

class SemanticScholarScraper:
    # (max_hops, max_nodes_total, max_links_per_node)
    DEPTH_CONFIG = {
        "shallow":  (0,  2,  20),  # only seed results (2 papers)
        "moderate": (1, 25,  30),  # follow citations/references 1 hop
        "deep":     (2, 80,  40),  # up to 2 hops (be careful with quota)
    }

    BASE = "https://api.semanticscholar.org/graph/v1"

    def __init__(
        self,
        api_key: Optional[str] = None,
        sleep: float = 0.0,
        max_retries: int = 5,
        backoff_factor: float = 1.5,
        user_agent: str = "LangGraph-SemanticScholarScraper/1.0",
    ):
        """
        api_key: optional x-api-key for higher rate limits (recommended).
        sleep: base sleep between requests (if you want additional throttling).
        max_retries/backoff_factor: for 429/5xx handling.
        """
        self.api_key = api_key
        self.sleep = float(sleep)
        self.max_retries = int(max_retries)
        self.backoff_factor = float(backoff_factor)
        self.session = requests.Session()
        self.session.headers.update({"User-Agent": user_agent})
        if api_key:
            self.session.headers.update({"x-api-key": api_key})

    # ---- low-level request with backoff ----
    def _request(self, method: str, url: str, params: Optional[Dict[str, Any]] = None, json: Optional[Dict[str, Any]] = None):
        delay = self.sleep
        for attempt in range(1, self.max_retries + 1):
            if delay:
                time.sleep(delay)
            try:
                resp = self.session.request(method, url, params=params, json=json, timeout=30)
            except requests.RequestException:
                # network-level error -> grow the delay (at least 0.5s) and retry
                delay = max(delay * self.backoff_factor, 0.5) if delay else 0.5
                continue
            # handle 200
            if resp.status_code == 200:
                # return json or empty dict
                try:
                    return resp.json()
                except ValueError:
                    return {}
            # handle rate limit
            if resp.status_code in (429, 503):
                # backoff then retry
                # try to parse Retry-After header
                ra = resp.headers.get("Retry-After")
                if ra:
                    try:
                        sleep_for = float(ra)
                    except Exception:
                        sleep_for = delay if delay else 1.0
                else:
                    sleep_for = delay if delay else 1.0
                # increase delay for next round
                delay = (delay or 1.0) * self.backoff_factor
                time.sleep(sleep_for)
                continue
            # client error -> return None
            return None
        # exhausted retries
        return None

    # ---- convenience helpers for API endpoints ----
    def _search_papers(self, query: str, limit: int = 5) -> List[Dict[str, Any]]:
        """
        uses: GET /paper/search?query=...&limit=...
        fields: we request a compact set of fields helpful for graph navigation
        """
        url = f"{self.BASE}/paper/search"
        params = {
            "query": query,
            "limit": limit,
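            # DOIs are exposed through externalIds; a top-level "doi" field is not
            # among the Graph API paper fields as far as documented
            "fields": "paperId,title,abstract,authors,year,venue,externalIds,fieldsOfStudy,url,citationCount,referenceCount"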
        }
        data = self._request("GET", url, params=params)
        if not data:
            return []
        # Semantic Scholar returns 'data' key with list of hits
        return data.get("data", []) if isinstance(data, dict) else []

    def _get_paper(self, paper_id: str, fields: str = None) -> Optional[Dict[str, Any]]:
        """
        GET /paper/{paperId}?fields=...
        Accepts arXiv id, DOI, S2 id etc.
        """
        if fields is None:
            # DOIs come back inside externalIds, so no separate "doi" field is requested
            fields = ",".join([
                "paperId","title","abstract","authors","year","venue","externalIds",
                "fieldsOfStudy","url","citationCount","referenceCount","influentialCitationCount"
            ])
        url = f"{self.BASE}/paper/{requests.utils.quote(paper_id, safe='')}"
        params = {"fields": fields}
        return self._request("GET", url, params=params)

    def _get_related(self, paper_id: str, relation: str = "citations", limit: int = 25) -> List[Dict[str, Any]]:
        """
        relation: 'citations' or 'references' (recommendations use a different endpoint
        and are not handled here).
        GET /paper/{paperId}/citations?fields=paperId,title,authors,...
        """
        if relation not in ("citations", "references"):
            raise ValueError("relation must be 'citations' or 'references'")
        url = f"{self.BASE}/paper/{requests.utils.quote(paper_id, safe='')}/{relation}"
        # Both endpoints take plain paper field names; they apply to the nested
        # citingPaper (citations) or citedPaper (references) object.
        params = {
            "limit": limit,
            "fields": "paperId,title,authors,year,venue,externalIds,citationCount",
        }
        data = self._request("GET", url, params=params)
        if not data:
            return []
        # Each item wraps the related paper under 'citingPaper' or 'citedPaper'
        items = data.get("data", []) if isinstance(data, dict) else []
        flattened = []
        for it in items:
            if "citingPaper" in it:
                flattened.append(it["citingPaper"])
            elif "citedPaper" in it:
                flattened.append(it["citedPaper"])
            else:
                flattened.append(it)
        return flattened

    def _get_author_papers(self, author_id: str, limit: int = 10) -> List[Dict[str, Any]]:
        """
        GET /author/{authorId}/papers - returns papers by an author.
        We'll request compact fields.
        """
        url = f"{self.BASE}/author/{requests.utils.quote(author_id, safe='')}/papers"
        params = {
            "limit": limit,
            "fields": "paperId,title,year,venue,externalIds,citationCount"
        }
        data = self._request("GET", url, params=params)
        if not data:
            return []
        return data.get("data", []) if isinstance(data, dict) else []

    # ---- top-level scrape API ----
    def scrape(
        self,
        query_or_id: str,
        depth: str = "moderate",
        use_citations: bool = True,
        use_references: bool = True,
        use_author_papers: bool = True,
        use_recommendations: bool = False,
        max_per_seed_override: Optional[int] = None,
        max_nodes_override: Optional[int] = None,
    ) -> Dict[str, Any]:
        """
        query_or_id: free-text query (search) OR a paper id/DOI (detect heuristically).
        depth: shallow/moderate/deep
        Returns:
          {
            "start": query_or_id,
            "depth": depth,
            "papers": { paperId: metadata_dict },
            "edges": [ (from_paperId, to_paperId, relation) ],
            "visited_order": [...],
            "warnings": [...]
          }
        """
        if depth not in self.DEPTH_CONFIG:
            raise ValueError(f"depth must be one of {list(self.DEPTH_CONFIG.keys())}")

        max_hops, cfg_max_nodes, cfg_max_per_seed = self.DEPTH_CONFIG[depth]
        if max_per_seed_override is not None:
            cfg_max_per_seed = int(max_per_seed_override)
        if max_nodes_override is not None:
            cfg_max_nodes = int(max_nodes_override)

        warnings: List[str] = []
        papers: Dict[str, Dict[str, Any]] = {}
        edges: List[Tuple[str, str, str]] = []
        visited: Set[str] = set()
        visited_order: List[str] = []

        # Decide if input is a paper id (contains '/' or ':' for DOI/arXiv or looks like S2 id)
        is_paper_id = any(tok in query_or_id for tok in ("/", ":", "arXiv")) or (len(query_or_id) > 10 and " " not in query_or_id)

        seeds: List[Dict[str, Any]] = []
        if is_paper_id:
            res = self._get_paper(query_or_id)
            if res:
                seeds.append(res)
            else:
                # fallback: attempt search
                seeds = self._search_papers(query_or_id, limit=cfg_max_per_seed)
        else:
            seeds = self._search_papers(query_or_id, limit=cfg_max_per_seed)

        if not seeds:
            warnings.append(f"No seeds found for '{query_or_id}'")
            return {
                "start": query_or_id,
                "depth": depth,
                "papers": papers,
                "edges": edges,
                "visited_order": visited_order,
                "warnings": warnings
            }

        # BFS queue of tuples (paper_meta_dict, hop, relation_from)
        queue = deque()
        for s in seeds[:cfg_max_per_seed]:
            queue.append((s, 0, "seed"))

        def get_key(pp: Dict[str, Any]) -> Optional[str]:
            return pp.get("paperId") or pp.get("doi") or pp.get("url") or pp.get("title")

        while queue and len(papers) < cfg_max_nodes:
            node_meta, hop, relation_from = queue.popleft()
            key = get_key(node_meta)
            if not key:
                continue
            if key in visited:
                continue
            # Normalize: if this entry has no canonical paperId (e.g. a sparse search hit),
            # replace it with the top search result for its title.
            if not node_meta.get("paperId"):
                # Search by title (1 result) to recover the canonical id.
                title = node_meta.get("title")
                if title:
                    search_hits = self._search_papers(title, limit=1)
                    if search_hits:
                        node_meta = search_hits[0]
            # Resolve the identifier used to fetch the full record.
            pid = get_key(node_meta)
            if not pid:
                continue

            # Fetch full paper metadata if not detailed already
            full = self._get_paper(pid)
            if full is None:
                # keep minimal meta if fetching failed
                full = node_meta

            # store node (skip if the resolved key was already stored under another alias)
            papers_key = get_key(full)
            if not papers_key or papers_key in visited:
                continue
            papers[papers_key] = {
                "paperId": full.get("paperId"),
                "title": full.get("title"),
                "abstract": full.get("abstract"),
                "authors": [{"name": a.get("name"), "authorId": a.get("authorId")} for a in full.get("authors", [])] if isinstance(full.get("authors"), list) else [],
                "year": full.get("year"),
                "venue": full.get("venue"),
                "doi": full.get("doi"),
                "url": full.get("url"),
                "citationCount": full.get("citationCount"),
                "referenceCount": full.get("referenceCount"),
                "fieldsOfStudy": full.get("fieldsOfStudy"),
                "hop": hop,
            }
            visited.add(papers_key)
            visited_order.append(papers_key)

            # record edge from seed query if needed
            if relation_from == "seed":
                edges.append((f"query::{query_or_id}", papers_key, "search_seed"))

            # If we can expand, push citations/references/author papers
            if hop < max_hops:
                # 1) citations
                if use_citations:
                    try:
                        related = self._get_related(papers_key, relation="citations", limit=cfg_max_per_seed)
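                    except Exception as exc:
                        # Record the failure instead of dropping it silently.
                        warnings.append(f"citations fetch failed for '{papers_key}': {exc}")
                        related = []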
                    for r in related[:cfg_max_per_seed]:
                        k2 = get_key(r)
                        if not k2 or k2 in visited:
                            continue
                        edges.append((papers_key, r.get("paperId") or k2, "cites_me"))  # r cites this paper
                        if len(papers) + len(queue) < cfg_max_nodes:
                            queue.append((r, hop+1, "citation"))

                # 2) references
                if use_references:
                    try:
                        related = self._get_related(papers_key, relation="references", limit=cfg_max_per_seed)
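                    except Exception as exc:
                        # Record the failure instead of dropping it silently.
                        warnings.append(f"references fetch failed for '{papers_key}': {exc}")
                        related = []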
                    for r in related[:cfg_max_per_seed]:
                        k2 = get_key(r)
                        if not k2 or k2 in visited:
                            continue
                        edges.append((papers_key, r.get("paperId") or k2, "references"))  # this paper -> referenced paper
                        if len(papers) + len(queue) < cfg_max_nodes:
                            queue.append((r, hop+1, "reference"))

                # 3) author papers (optional)
                if use_author_papers:
                    for a in papers[papers_key].get("authors", [])[:3]:  # limit to top 3 authors to prevent explosion
                        aid = a.get("authorId")
                        if not aid:
                            continue
                        try:
                            author_papers = self._get_author_papers(aid, limit=min(10, cfg_max_per_seed))
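                        except Exception as exc:
                            # Record the failure instead of dropping it silently.
                            warnings.append(f"author papers fetch failed for author '{aid}': {exc}")
                            author_papers = []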
                        for ap in author_papers[:cfg_max_per_seed]:
                            k2 = get_key(ap)
                            if not k2 or k2 in visited:
                                continue
                            edges.append((papers_key, ap.get("paperId") or k2, "author"))
                            if len(papers) + len(queue) < cfg_max_nodes:
                                queue.append((ap, hop+1, "author"))

                # 4) recommendations (optional, omitted by default)
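                if use_recommendations:
                    # Not implemented: the recommendations endpoint is separate and is not called here.
                    # Warn once so the caller knows the flag had no effect.
                    note = "use_recommendations=True is not implemented and was ignored"
                    if note not in warnings:
                        warnings.append(note)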

        return {
            "start": query_or_id,
            "depth": depth,
            "papers": papers,
            "edges": edges,
            "visited_order": visited_order,
            "warnings": warnings
        }
