# Typical activation potential: analysis of each proof N against context

This notebook implements the *typical activation potential* from `main.tex`.

Notation (from `main.tex`):
- Phi_T(r, C) is the typical activation potential for resource r in context C.
- Phi_h(r, C) is the historical component of activation potential.
- Phi_Tc(r, C) is the typical co-occurrence component (Phi_{T_c}).
- delta (DELTA in code) is the weight in Phi_T = delta * Phi_h + (1 - delta) * Phi_Tc, with delta in [0, 1].
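
As a quick numeric illustration of the weighted sum above, here is a minimal sketch. The function name `combine_potentials` and the sample values are hypothetical and are not taken from `main.tex`; only the formula itself comes from the notation list.

```python
def combine_potentials(phi_h: float, phi_tc: float, delta: float) -> float:
    """Return delta * phi_h + (1 - delta) * phi_tc for delta in [0, 1]."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError(f"delta must be in [0, 1], got {delta}")
    return delta * phi_h + (1.0 - delta) * phi_tc


# Hypothetical values: with delta = 0.5 both components count equally,
# so the result is the midpoint of 0.2 and 0.4 (about 0.3).
example_phi_t = combine_potentials(phi_h=0.2, phi_tc=0.4, delta=0.5)
```

Setting delta to 1 reduces Phi_T to the historical component alone, and setting it to 0 reduces it to the co-occurrence component alone.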

Assumptions (matching the existing SPARQL queries):

- The SPARQL queries used in this notebook were verified to implement the paper's definitions of context C and "together" (per definition/postulate/common notion or proposition/proof). If the query scopes are changed, results may no longer match the formulas in `main.tex`.
- Context C for proof n is the set of resources in the definitions, postulates, common notions, propositions up to n (included), and proofs up to n-1 (included).
- "Together" for co-occurrence means resources that co-occur within the same definition, postulate, common notion, proposition, or proof.
- Phi_h is computed from history queries; Phi_Tc is computed from Hebbian pair degrees (co-occurrence links); Phi_T uses the weighted sum above.
- Empty denominators yield 0 for the corresponding potential.
- TYPE_SELECTION toggles type-based co-occurrence for propositions/proofs (relation/operation types) when true.
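
The co-occurrence assumption can be made concrete with a toy example. The sketch below assumes that Phi_Tc for a resource is its weighted degree divided by the total weighted degree over all Hebbian pairs, which is how this notebook computes it; the authoritative definition is the one in `main.tex`. The function name `degree_shares` and the resource names are hypothetical.

```python
from collections import defaultdict


def degree_shares(pairs: list[tuple[str, str, float]]) -> dict[str, float]:
    """Map each resource to its share of the total weighted degree."""
    degrees: dict[str, float] = defaultdict(float)
    for left, right, links in pairs:
        # Each co-occurrence link contributes to both endpoints.
        degrees[left] += links
        degrees[right] += links
    total = sum(degrees.values())
    if total == 0:
        # Empty denominator: every potential is 0.
        return {resource: 0.0 for resource in degrees}
    return {resource: value / total for resource, value in degrees.items()}


# Hypothetical toy data: (resource, resource, number of co-occurrence links).
toy_pairs = [("point", "line", 2.0), ("line", "circle", 1.0)]
toy_shares = degree_shares(toy_pairs)
# Degrees are point=2, line=3, circle=1 (total 6), so "line" gets 3/6.
```

A resource that appears in proof N but in no pair of the context is simply absent from the mapping and is treated as having potential 0.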


# SPEC

GOALS: 
(1) Apply the description of typical activation potential provided in main.tex to compute the activation potential of resources used in proof N against the context of resources used in definitions, postulates, common notions, propositions up to N (included), and proofs up to N-1 (included; if N=1, then there is no proof to include in the context). 
(2) Note when proof N _directly_ uses a resource that has not appeared anywhere in the context (i.e., in the definitions, postulates, common notions, propositions up to N included, and proofs up to N-1 included). For proof N, use queries.direct_template_propositions_proofs() if TYPE_SELECTION = False and queries.direct_template_last_item_types() if TYPE_SELECTION = True. Define new_resources as the subset of the direct-usage set from the 'HOW TO FIND RESOURCES IN PROOF N' step that does not occur in the context.

PREPARATION: review
    - main.tex (for the definition of typical activation potential),
    - analyses.ipynb (for strategies to compare proofs with their context of previous material),
    - typical.ipynb (for an algorithm implementing typical activation potential), and
    - ./modules/queries.py (for SPARQL queries used to work with ontological resources)

NOTES:
    - cache results of SPARQL queries for faster re-runs (QueryRunner)

INPUTS: PROOF_N, DELTA, HISTORY_WEIGHTS (exactly 3 weights required), TYPE_SELECTION
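
To show how the three history weights enter Phi_h, here is a small sketch. It assumes, as this notebook's implementation does, that each family (direct, hierarchical, mereological) contributes the resource's share of that family's links, scaled by the family's weight; the exact definition is in `main.tex`. The function name `weighted_history` and the share values are hypothetical.

```python
def weighted_history(
    family_shares: tuple[float, float, float],
    weights: tuple[float, float, float],
) -> float:
    """Combine per-family link shares (direct, hierarchical, mereological)."""
    if len(family_shares) != 3 or len(weights) != 3:
        raise ValueError("exactly 3 shares and 3 weights are required")
    return sum(share * weight for share, weight in zip(family_shares, weights))


toy_weights = (6 / 9, 1 / 9, 2 / 9)  # direct, hierarchical, mereological
# Hypothetical shares: 30% of direct links, no hierarchical links,
# 90% of mereological links.
toy_family_shares = (0.3, 0.0, 0.9)
toy_phi_h = weighted_history(toy_family_shares, toy_weights)
# 0.3 * 6/9 + 0.0 * 1/9 + 0.9 * 2/9 is about 0.4
```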

HOW TO CREATE THE CONTEXT OF PROOF N:
- directly used resources: use queries.direct_definitions(), queries.direct_postulates(), queries.direct_common_notions(), and queries.direct_template_propositions_proofs() for propositions up to N (included) and proofs up to N-1 (included) [for propositions and proofs, one needs to state the IRIs as VALUES in the SPARQL queries]
- hierarchically used resources: use queries.hierarchical_definitions(), queries.hierarchical_postulates(), queries.hierarchical_common_notions(), and queries.hierarchical_template_propositions_proofs() for propositions up to N (included) and proofs up to N-1 (included) [for propositions and proofs, one needs to state the IRIs as VALUES in the SPARQL queries]
- mereologically used resources: use queries.mereological_definitions(), queries.mereological_postulates(), queries.mereological_common_notions(), and queries.mereological_template_propositions_proofs() for propositions up to N (included) and proofs up to N-1 (included) [for propositions and proofs, one needs to state the IRIs as VALUES in the SPARQL queries]
- hebbian co-occurrence: use queries.hebb_definitions(), queries.hebb_postulates(), queries.hebb_common_notions(), and queries.hebb_template_propositions_proofs() (or queries.hebb_template_propositions_proofs_types() if TYPE_SELECTION = True) for propositions up to N (included) and proofs up to N-1 (included) [for propositions and proofs, one needs to state the IRIs as VALUES in the SPARQL queries]

EDGE CASE:
- N = 1: the context includes only definitions, postulates, common notions, and proposition 1.
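
The scoping rule and its edge case can be sketched as follows. The function name `context_sources` and the short labels are hypothetical stand-ins for the real IRIs; the point is only that the proof range is empty when N = 1.

```python
def context_sources(n: int) -> tuple[list[str], list[str]]:
    """Return the (propositions, proofs) that belong to the context of proof n."""
    if n < 1:
        raise ValueError(f"n must be at least 1, got {n}")
    # Propositions up to n (included).
    propositions = [f"proposition_{i}" for i in range(1, n + 1)]
    # Proofs up to n-1 (included); range(1, 1) is empty when n == 1.
    proofs = [f"proof_{i}" for i in range(1, n)]
    return propositions, proofs


first_context = context_sources(1)
third_context = context_sources(3)
```

For N = 1 the second list is empty, so only definitions, postulates, common notions, and proposition 1 feed the context.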

HOW TO FIND RESOURCES IN PROOF N:
- if TYPE_SELECTION = False: use queries.direct_template_propositions_proofs() for proof N (the list of values for the query should contain only the IRI of proof N)
- if TYPE_SELECTION = True: use queries.direct_template_last_item_types() for proof N (the list of values for the query should contain only the IRI of proof N)

OUTPUT: 
    - csv with columns "proof", "resource_used_in_proof", "number_of_resources_used_in_proof", "phi_h", "phi_tc", "phi_t", "new_resources", "number_of_new_resources"
    - include only resources used in proof N (for each N) in the column "resource_used_in_proof"
    - output path: put the csv file in the "./output" folder
    - output naming convention: typical_weights-<history_weights>_type-<true_or_false>_<timestamp>.csv
    - "number_of_resources_used_in_proof" is a column of scalars that counts how many resources a proof contains (set-size, not multiplicity; it is a per-proof piece of data but we keep it in this csv)
    - "new_resources" contains a list of resources that are new in proof N (never used before)
    - "new_resources", "number_of_new_resources" are per-proof data; however I want to include them in a single csv; therefore, it is ok to repeat these data on several rows for the same proof
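
The naming convention can be illustrated with a short sketch. It assumes the weights are written with four decimals and joined by hyphens, and that the timestamp uses the YYYYMMDD-HHMMSS format, as described in STEP 9 of this notebook. The function name `output_filename` and the fixed date are hypothetical.

```python
import datetime as dt


def output_filename(
    history_weights: tuple[float, float, float],
    type_selection: bool,
    now: dt.datetime,
) -> str:
    """Build typical_weights-<history_weights>_type-<true_or_false>_<timestamp>.csv."""
    weights_label = "-".join(f"{weight:.4f}" for weight in history_weights)
    type_label = str(type_selection).lower()
    timestamp = now.strftime("%Y%m%d-%H%M%S")
    return f"typical_weights-{weights_label}_type-{type_label}_{timestamp}.csv"


# Hypothetical fixed timestamp so the result is reproducible.
example_name = output_filename(
    (6 / 9, 1 / 9, 2 / 9), False, dt.datetime(2024, 1, 2, 3, 4, 5)
)
# example_name == "typical_weights-0.6667-0.1111-0.2222_type-false_20240102-030405.csv"
```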

In [None]:
from __future__ import annotations

import datetime as dt
from pathlib import Path

import pandas as pd

from modules import rdf_utils, file_utils
from modules.query_runner import QueryRunner

In [None]:
# STEP 0: define parameters
DELTA = 0.5  # delta in Phi_T = delta * Phi_h + (1 - delta) * Phi_Tc
HISTORY_WEIGHTS = (6 / 9, 1 / 9, 2 / 9)  # Phi_h weights: direct, hierarchical, mereological
TYPE_SELECTION = False  # toggle type-based co-occurrence in propositions/proofs

# NOTE: Although the SPEC lists PROOF_N as an input, 
# this notebook runs a batch analysis by iterating over a range of proofs. 
# We therefore use START_PROPOSITION/END_PROPOSITION and 
# treat each iteration as the current PROOF_N
START_PROPOSITION = 1
END_PROPOSITION = 48

def validate_params() -> None:
    if not (0.0 <= DELTA <= 1.0):
        raise ValueError(f"DELTA must be in [0, 1], got {DELTA}.")
    if len(HISTORY_WEIGHTS) != 3:
        raise ValueError(
            f"HISTORY_WEIGHTS must have length 3, got {len(HISTORY_WEIGHTS)}."
        )
    if any((w < 0.0 or w > 1.0) for w in HISTORY_WEIGHTS):
        raise ValueError("All HISTORY_WEIGHTS must be in [0, 1].")
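    total = sum(HISTORY_WEIGHTS)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"HISTORY_WEIGHTS must sum to 1, got {total}.")
    # An inverted or non-positive range would silently produce no rows.
    if not (1 <= START_PROPOSITION <= END_PROPOSITION):
        raise ValueError(
            "Expected 1 <= START_PROPOSITION <= END_PROPOSITION, "
            f"got {START_PROPOSITION} and {END_PROPOSITION}."
        )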

validate_params()

OUTPUT_DIR = Path('output')
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

In [None]:
# STEP 1: Load latest ontology TTL and reuse a cached QueryRunner
INPUT_TTL = file_utils.latest_file(folder=Path('ontologies'), filename_fragment='ontology_', extension='ttl')
graph = rdf_utils.load_graph(INPUT_TTL)
runner = QueryRunner(graph)

Note on reuse: `rdf_utils.sparql_to_concat_df` could replace the local aggregation helpers because it is query-agnostic,
but `calculate_activation_potential.hebb` returns pair-level potentials (this notebook needs per-resource degrees),
and `calculate_activation_potential.history` scopes propositions/proofs up to N-1 and does not support type selection,
so they are not drop-in fits for this spec.


In [None]:
# STEP 2: function(s) to create context for proof N
# - Implement a function that, for a given N, builds the context set C.
# - Use the SPEC’s query families: direct, hierarchical, mereological, hebbian.
# - Apply each family per source type: definitions, postulates, common notions, propositions, proofs.
# - Collect hebbian co-occurrence data here (queries.hebb_*) for Phi_Tc.
# - Sources: definitions, postulates, common notions (all); propositions <= N; proofs <= N-1 (none if N=1).
# - For propositions/proofs, pass explicit IRI lists as VALUES to the queries.
# - Return de-duplicated context resources plus the hebb co-occurrence set.
# - Keep it pure: no printing, no file I/O; just compute and return.

from typing import Iterable

from modules import queries


def _iri_for_proposition(proof_n: int) -> str:
    return f"<https://www.foom.com/core#proposition_{proof_n}>"


def _iri_for_proof(proof_n: int) -> str:
    return f"<https://www.foom.com/core#proof_{proof_n}>"


def _iris_for_context(proof_n: int) -> list[str]:
    propositions = [_iri_for_proposition(i) for i in range(1, proof_n + 1)]
    proofs = [_iri_for_proof(i) for i in range(1, proof_n)]
    return propositions + proofs


def _values_clause(values: Iterable[str]) -> str | None:
    tokens = [value for value in values if value]
    if not tokens:
        return None
    return " ".join(tokens)


def _fetch_sum_links(runner: QueryRunner, queries_to_run: Iterable[str]) -> pd.DataFrame:
    frames = []
    for query in queries_to_run:
        df = runner.fetch(query)
        if df.empty or "o" not in df.columns:
            continue
        if "links" in df.columns:
            frame = df[["o", "links"]].copy()
        else:
            frame = df[["o"]].copy()
            frame["links"] = 1
        frames.append(frame)
    if not frames:
        return pd.DataFrame(columns=["o", "links"])
    return (
        pd.concat(frames, ignore_index=True)
        .groupby("o", as_index=False)["links"]
        .sum()
    )


def _fetch_hebb_links(runner: QueryRunner, queries_to_run: Iterable[str]) -> pd.DataFrame:
    frames = []
    for query in queries_to_run:
        df = runner.fetch(query)
        if df.empty or "o1" not in df.columns or "o2" not in df.columns:
            continue
        if "links" in df.columns:
            frame = df[["o1", "o2", "links"]].copy()
        else:
            frame = df[["o1", "o2"]].copy()
            frame["links"] = 1
        frames.append(frame)
    if not frames:
        return pd.DataFrame(columns=["o1", "o2", "links"])
    return (
        pd.concat(frames, ignore_index=True)
        .groupby(["o1", "o2"], as_index=False)["links"]
        .sum()
    )


def build_context_for_proof(
    proof_n: int,
    *,
    runner: QueryRunner,
    type_selection: bool,
) -> tuple[set[str], dict[str, pd.DataFrame], pd.DataFrame]:
    values = _values_clause(_iris_for_context(proof_n))

    direct_queries = [
        queries.direct_definitions(),
        queries.direct_postulates(),
        queries.direct_common_notions(),
    ]
    if values:
        direct_queries.append(queries.direct_template_propositions_proofs(values))
    direct_df = _fetch_sum_links(runner, direct_queries)

    hierarchical_queries = [
        queries.hierarchical_definitions(),
        queries.hierarchical_postulates(),
        queries.hierarchical_common_notions(),
    ]
    if values:
        hierarchical_queries.append(queries.hierarchical_template_propositions_proofs(values))
    hierarchical_df = _fetch_sum_links(runner, hierarchical_queries)

    mereological_queries = [
        queries.mereological_definitions(),
        queries.mereological_postulates(),
        queries.mereological_common_notions(),
    ]
    if values:
        mereological_queries.append(queries.mereological_template_propositions_proofs(values))
    mereological_df = _fetch_sum_links(runner, mereological_queries)

    hebb_queries = [
        queries.hebb_definitions(),
        queries.hebb_postulates(),
        queries.hebb_common_notions(),
    ]
    if values:
        if type_selection:
            hebb_queries.append(queries.hebb_template_propositions_proofs_types(values))
        else:
            hebb_queries.append(queries.hebb_template_propositions_proofs(values))
    hebb_df = _fetch_hebb_links(runner, hebb_queries)

    context_resources: set[str] = set()
    for df in (direct_df, hierarchical_df, mereological_df):
        if not df.empty and "o" in df.columns:
            context_resources.update(df["o"].dropna().astype(str))

    family_dfs = {
        "direct": direct_df,
        "hierarchical": hierarchical_df,
        "mereological": mereological_df,
    }
    return context_resources, family_dfs, hebb_df


In [None]:
# STEP 3: function(s) to gather resources from proof N
# - Implement a function that, for a given N, returns the direct-usage set for proof N.
# - If TYPE_SELECTION is False: use queries.direct_template_propositions_proofs() on proof N.
# - If TYPE_SELECTION is True: use queries.direct_template_last_item_types() on proof N.
# - Pass a single proof IRI as VALUES.
# - Keep it pure: no printing, no file I/O; just compute and return.

def resources_in_proof(
    proof_n: int,
    *,
    runner: QueryRunner,
    type_selection: bool,
) -> set[str]:
    proof_iri = _iri_for_proof(proof_n)
    if type_selection:
        query = queries.direct_template_last_item_types(proof_iri)
    else:
        query = queries.direct_template_propositions_proofs(proof_iri)
    df = runner.fetch(query)
    if "o" not in df.columns:
        return set()
    return set(df["o"].dropna().astype(str))


In [None]:
# STEP 4: function(s) to calculate the historical activation potential Phi_h
# - Implement a function that computes Phi_h for each resource used in proof N.
# - Use HISTORY_WEIGHTS for direct/hierarchical/mereological components per SPEC.
# - Compare only against the context built in STEP 2 for the same N.
# - If a denominator is empty, return 0 for that component.
# - Keep it pure: no printing, no file I/O; just compute and return.

def compute_phi_h(
    proof_resources: Iterable[str],
    family_dfs: dict[str, pd.DataFrame],
    weights: tuple[float, float, float],
) -> pd.DataFrame:
    resources = sorted(set(proof_resources))
    if not resources:
        return pd.DataFrame(columns=["resource_used_in_proof", "phi_h"])

    phi_h = {resource: 0.0 for resource in resources}
    for family_name, weight in zip(
        ("direct", "hierarchical", "mereological"),
        weights,
    ):
        df = family_dfs.get(family_name, pd.DataFrame())
        if df.empty or "links" not in df.columns:
            continue
        total_links = float(df["links"].sum())
        if total_links == 0:
            continue
        # Map each context resource to its link count for this family.
        link_map = dict(zip(df["o"].astype(str), df["links"].astype(float)))
        for resource in resources:
            links = link_map.get(resource, 0.0)
            if links:
                phi_h[resource] += (links * weight) / total_links

    return pd.DataFrame({
        "resource_used_in_proof": resources,
        "phi_h": [phi_h[resource] for resource in resources],
    })


In [None]:
# STEP 5: function(s) to calculate the co-occurrence activation potential Phi_Tc
# - Implement a function that computes Phi_Tc for each resource used in proof N.
# - Use hebbian co-occurrence data from STEP 2 (queries.hebb_*).
# - Compare only against the context built in STEP 2 for the same N.
# - If a denominator is empty, return 0.
# - Keep it pure: no printing, no file I/O; just compute and return.

def compute_phi_tc(
    proof_resources: Iterable[str],
    hebb_df: pd.DataFrame,
) -> pd.DataFrame:
    resources = sorted(set(proof_resources))
    if not resources:
        return pd.DataFrame(columns=["resource_used_in_proof", "phi_tc"])

    degrees: dict[str, float] = {}
    if not hebb_df.empty and "links" in hebb_df.columns:
        for _, row in hebb_df.iterrows():
            o1 = str(row["o1"])
            o2 = str(row["o2"])
            weight = float(row["links"])
            degrees[o1] = degrees.get(o1, 0.0) + weight
            degrees[o2] = degrees.get(o2, 0.0) + weight

    total_degree = sum(degrees.values())
    if total_degree == 0:
        phi_tc = {resource: 0.0 for resource in resources}
    else:
        phi_tc = {
            resource: degrees.get(resource, 0.0) / total_degree
            for resource in resources
        }

    return pd.DataFrame({
        "resource_used_in_proof": resources,
        "phi_tc": [phi_tc[resource] for resource in resources],
    })


In [None]:
# STEP 6: function(s) to calculate the total activation potential Phi_T
# - Implement a function that combines Phi_h and Phi_Tc.
# - Phi_T = DELTA * Phi_h + (1 - DELTA) * Phi_Tc.
# - Compute only for resources used in proof N.
# - Keep it pure: no printing, no file I/O; just compute and return.

def compute_phi_t(
    proof_resources: Iterable[str],
    phi_h_df: pd.DataFrame,
    phi_tc_df: pd.DataFrame,
    delta: float,
) -> pd.DataFrame:
    resources = sorted(set(proof_resources))
    if not resources:
        return pd.DataFrame(columns=["resource_used_in_proof", "phi_t"])

    phi_h_map = {}
    if not phi_h_df.empty:
        phi_h_map = dict(zip(phi_h_df["resource_used_in_proof"], phi_h_df["phi_h"]))
    phi_tc_map = {}
    if not phi_tc_df.empty:
        phi_tc_map = dict(zip(phi_tc_df["resource_used_in_proof"], phi_tc_df["phi_tc"]))

    phi_t = {
        resource: delta * float(phi_h_map.get(resource, 0.0))
        + (1 - delta) * float(phi_tc_map.get(resource, 0.0))
        for resource in resources
    }
    return pd.DataFrame({
        "resource_used_in_proof": resources,
        "phi_t": [phi_t[resource] for resource in resources],
    })


In [None]:
# STEP 7: function(s) to find resources in proof N that are not in the context
# - Implement a function that computes new_resources for proof N.
# - new_resources = direct-usage set from STEP 3 minus context from STEP 2.
# - Return the list plus a count for output convenience.
# - Keep it pure: no printing, no file I/O; just compute and return.

def compute_new_resources(
    proof_resources: Iterable[str],
    context_resources: Iterable[str],
) -> list[str]:
    proof_set = set(proof_resources)
    context_set = set(context_resources)
    return sorted(proof_set - context_set)


In [None]:
# STEP 8: run a loop with all steps for proofs N = START_PROPOSITION to END_PROPOSITION
# - For each N, build context (STEP 2), proof resources (STEP 3), Phi_h (STEP 4), Phi_Tc (STEP 5), Phi_T (STEP 6), new_resources (STEP 7).
# - Handle the N=1 edge case (no proofs in context).
# - Accumulate row data for each resource used in proof N.

results_rows: list[dict[str, object]] = []
for proof_n in range(START_PROPOSITION, END_PROPOSITION + 1):
    context_resources, family_dfs, hebb_df = build_context_for_proof(
        proof_n,
        runner=runner,
        type_selection=TYPE_SELECTION,
    )
    proof_resources = resources_in_proof(
        proof_n,
        runner=runner,
        type_selection=TYPE_SELECTION,
    )
    proof_resources_sorted = sorted(proof_resources)

    phi_h_df = compute_phi_h(proof_resources_sorted, family_dfs, HISTORY_WEIGHTS)
    phi_tc_df = compute_phi_tc(proof_resources_sorted, hebb_df)
    phi_t_df = compute_phi_t(proof_resources_sorted, phi_h_df, phi_tc_df, DELTA)

    new_resources = compute_new_resources(proof_resources_sorted, context_resources)
    proof_count = len(proof_resources_sorted)
    new_count = len(new_resources)

    phi_h_map = dict(zip(phi_h_df["resource_used_in_proof"], phi_h_df["phi_h"]))
    phi_tc_map = dict(zip(phi_tc_df["resource_used_in_proof"], phi_tc_df["phi_tc"]))
    phi_t_map = dict(zip(phi_t_df["resource_used_in_proof"], phi_t_df["phi_t"]))

    for resource in proof_resources_sorted:
        results_rows.append({
            "proof": proof_n,
            "resource_used_in_proof": resource,
            "number_of_resources_used_in_proof": proof_count,
            "phi_h": float(phi_h_map.get(resource, 0.0)),
            "phi_tc": float(phi_tc_map.get(resource, 0.0)),
            "phi_t": float(phi_t_map.get(resource, 0.0)),
            "new_resources": new_resources,
            "number_of_new_resources": new_count,
        })


In [None]:
# STEP 9: print df of the results; print a summary of the results; save results to CSV
# - Build a DataFrame with columns: proof, resource_used_in_proof, number_of_resources_used_in_proof, phi_h, phi_tc, phi_t, new_resources, number_of_new_resources.
# - Repeat per-proof scalars (counts, new_resources) across each resource row.
# - Serialize history_weights as hyphen-joined decimals (e.g., 0.6667-0.1111-0.2222).
# - Use timestamp format YYYYMMDD-HHMMSS and save to ./output as:
#   typical_weights-<history_weights>_type-<true_or_false>_<timestamp>.csv

results_df = pd.DataFrame(
    results_rows,
    columns=[
        "proof",
        "resource_used_in_proof",
        "number_of_resources_used_in_proof",
        "phi_h",
        "phi_tc",
        "phi_t",
        "new_resources",
        "number_of_new_resources",
    ],
)

history_weights_label = "-".join(f"{w:.4f}" for w in HISTORY_WEIGHTS)
timestamp = dt.datetime.now().strftime("%Y%m%d-%H%M%S")
type_label = str(TYPE_SELECTION).lower()
output_path = OUTPUT_DIR / f"typical_weights-{history_weights_label}_type-{type_label}_{timestamp}.csv"

print(results_df)
print(f"Total proofs processed: {END_PROPOSITION - START_PROPOSITION + 1}")
print(f"Total rows: {len(results_df)}")

results_df.to_csv(output_path, index=False)
print(f"Saved: {output_path}")


Note on TYPE_SELECTION: in analyses.ipynb, type_selection only switches proof-level direct resources to relation/operation types,
while history and co-occurrence stay concept-based. In typical_proof.ipynb, TYPE_SELECTION switches both proof resources
and Hebbian co-occurrence queries to type-based variants; history context remains concept-based.
