# Typical activation potential: analysis of each proof N against context

This notebook implements the *typical activation potential* from `main.tex`.

Notation (from `main.tex`):
- Phi_T(r, C) is the typical activation potential for resource r in context C.
- Phi_h(r, C) is the historical component of activation potential.
- Phi_Tc(r, C) is the typical co-occurrence component (Phi_{T_c}).
- delta (DELTA in code) is the weight in Phi_T = delta * Phi_h + (1 - delta) * Phi_Tc, with delta in [0, 1].
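
The weighted sum above can be illustrated with a minimal sketch. The function name `combine_potentials` and the numeric values are hypothetical and chosen only for illustration; the notebook itself delegates this step to `compute_phi_t`, whose implementation is not shown here.

```python
def combine_potentials(phi_h: float, phi_tc: float, delta: float) -> float:
    """Return Phi_T = delta * Phi_h + (1 - delta) * Phi_Tc (hypothetical helper)."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError(f"delta must be in [0, 1], got {delta}.")
    return delta * phi_h + (1.0 - delta) * phi_tc


# Illustrative values only: with delta = 0.5 both components count equally.
example_phi_t = combine_potentials(phi_h=0.8, phi_tc=0.2, delta=0.5)
```

With delta = 1 the result reduces to Phi_h alone, and with delta = 0 to Phi_Tc alone.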

Assumptions (matching the existing SPARQL queries):

- The SPARQL queries used in this notebook were verified to implement the paper's definitions of context C and "together" (per definition/postulate/common notion or proposition/proof). If query scopes are changed, results may no longer match the formulas in `main.tex`.
- Context C for proof n is the set of resources in definitions, postulates, common notions, propositions up to n (included), and proofs up to n-1 (included).
- "Together" for co-occurrence means resources that co-occur within the same definition, postulate, common notion, proposition, or proof.
- Phi_h is computed from history queries; Phi_Tc is computed from Hebbian pair degrees (co-occurrence links); Phi_T uses the weighted sum above.
- Empty denominators yield 0 for the corresponding potential.
- When true, TYPE_SELECTION switches proof resources and Hebbian co-occurrence for propositions/proofs to type-based variants (relation/operation types); history context remains concept-based.
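
The empty-denominator convention can be sketched as a small guard. `safe_ratio` is a hypothetical helper name, not a function from the notebook's modules; it only shows the rule that a zero denominator maps to a potential of 0 instead of raising.

```python
def safe_ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    if denominator == 0:
        return 0.0
    return numerator / denominator


# A resource counted in an empty context gets potential 0 rather than an error.
empty_context_potential = safe_ratio(3, 0)
regular_potential = safe_ratio(3, 4)
```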


# SPEC

GOALS: 
(1) Apply the description of typical activation potential provided in main.tex to compute the activation potential of the resources used in proof N against the context of resources used in definitions, postulates, common notions, propositions up to N (included), and proofs up to N-1 (included; if N=1, there is no proof to include in the context).
(2) Note when proof N _directly_ uses a resource that has not appeared anywhere in the context (i.e. in the definitions, postulates, common notions, propositions up to N included, and proofs up to N-1 included). To find the direct-usage set of proof N, use direct_template_propositions_proofs if TYPE_SELECTION = False and direct_template_last_item_types if TYPE_SELECTION = True (see 'HOW TO FIND RESOURCES IN PROOF N'). Define new_resources as the subset of that direct-usage set that does not occur in the context.
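
Goal (2) amounts to a set difference. The sketch below assumes resources are represented as plain strings; `find_new_resources` and the `ex:` identifiers are hypothetical placeholders, and the notebook's own `compute_new_resources` may differ in details such as ordering.

```python
from collections.abc import Iterable


def find_new_resources(
    proof_resources: Iterable[str],
    context_resources: Iterable[str],
) -> list[str]:
    """Return the resources used directly in the proof that never occur in the context."""
    return sorted(set(proof_resources) - set(context_resources))


# Placeholder identifiers, not taken from the ontology.
new_in_proof = find_new_resources(
    proof_resources=["ex:circle", "ex:point", "ex:radius"],
    context_resources=["ex:point", "ex:line"],
)
```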

PREPARATION: review
    - main.tex (for the definition of typical activation potential),
    - analyses.ipynb (for strategies to compare proofs with their context of previous material),
    - typical.ipynb (for an algorithm implementing typical activation potential), and
    - ./modules/queries.py (for SPARQL queries used to work with ontological resources)

NOTES:
    - cache results of SPARQL queries for faster re-runs (QueryRunner)

INPUTS: PROOF_N, DELTA, HISTORY_WEIGHTS (exactly 3 weights required), TYPE_SELECTION

HOW TO CREATE THE CONTEXT OF PROOF N:
- directly used resources: use queries.direct_definitions(), queries.direct_postulates(), queries.direct_common_notions(), and queries.direct_template_propositions_proofs() for propositions up to N (included) and proofs up to N-1 (included) [for propositions and proofs, one needs to state the IRIs as VALUES in the SPARQL queries]
- hierarchically used resources: use queries.hierarchical_definitions(), queries.hierarchical_postulates(), queries.hierarchical_common_notions(), and queries.hierarchical_template_propositions_proofs() for propositions up to N (included) and proofs up to N-1 (included) [for propositions and proofs, one needs to state the IRIs as VALUES in the SPARQL queries]
- mereologically used resources: use queries.mereological_definitions(), queries.mereological_postulates(), queries.mereological_common_notions(), and queries.mereological_template_propositions_proofs() for propositions up to N (included) and proofs up to N-1 (included) [for propositions and proofs, one needs to state the IRIs as VALUES in the SPARQL queries]
- hebbian co-occurrence: use queries.hebb_definitions(), queries.hebb_postulates(), queries.hebb_common_notions(), and queries.hebb_template_propositions_proofs() for propositions up to N (included) and proofs up to N-1 (included) [for propositions and proofs, one needs to state the IRIs as VALUES in the SPARQL queries]

EDGE CASE:
- N = 1: the context includes only definitions, postulates, common notions, and proposition 1.
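
The scoping rule and the N = 1 edge case can be sketched as follows. Both helper names and the `http://example.org/` IRIs are assumptions made for illustration; the real IRI pattern lives in the ontology and the real query templates in `./modules/queries.py`.

```python
def context_scope(proof_n: int) -> tuple[list[int], list[int]]:
    """Return (proposition numbers, proof numbers) that form the context of proof N."""
    if proof_n < 1:
        raise ValueError(f"proof_n must be >= 1, got {proof_n}.")
    propositions = list(range(1, proof_n + 1))  # up to N, included
    proofs = list(range(1, proof_n))            # up to N-1, included; empty when N = 1
    return propositions, proofs


def values_clause(variable: str, iris: list[str]) -> str:
    """Render a SPARQL VALUES clause binding one variable to the given IRIs."""
    joined = " ".join(f"<{iri}>" for iri in iris)
    return f"VALUES ?{variable} {{ {joined} }}"


propositions_for_1, proofs_for_1 = context_scope(1)
# Hypothetical IRIs, used only to show the shape of the clause.
clause = values_clause("item", ["http://example.org/proof_1", "http://example.org/proof_2"])
```

For N = 1 the proof list is empty, so only definitions, postulates, common notions, and proposition 1 contribute to the context.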

HOW TO FIND RESOURCES IN PROOF N:
- if TYPE_SELECTION = False: use queries.direct_template_propositions_proofs() for proof N (the list of values for the query should contain only the IRI of proof N)
- if TYPE_SELECTION = True: use queries.direct_template_last_item_types() for proof N (the list of values for the query should contain only the IRI of proof N)

OUTPUT: 
    - csv with columns "proof", "resource_used_in_proof", "number_of_resources_used_in_proof", "phi_h", "phi_tc", "phi_t", "new_resources", "number_of_new_resources"
    - include only resources used in proof N (for each N) in the column "resource_used_in_proof"
    - output path: put the csv file in the "./output" folder
    - output naming convention: typical_weights-<history_weights>_type-<true_or_false>_<timestamp>.csv
    - "number_of_resources_used_in_proof" is a column of scalars that counts how many resources a proof contains (set-size, not multiplicity; it is a per-proof piece of data but we keep it in this csv)
    - "new_resources" contains the list of resources that are new in proof N (i.e. that do not occur anywhere in its context)
    - "new_resources" and "number_of_new_resources" are per-proof data; they are nevertheless kept in this single csv, so it is ok to repeat them on every row of the same proof
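
Because the per-proof columns are repeated on every row and "new_resources" holds a list, a consumer of the csv has to collapse the rows and parse the list back from text. The sketch below uses only the standard library and made-up rows; it assumes the list is written as its Python string representation, which is what happens when a list-valued column is saved to csv.

```python
import ast
import csv
import io

# Made-up rows mimicking the output layout (subset of columns, placeholder values).
rows = [
    {"proof": 1, "resource_used_in_proof": "ex:a", "new_resources": ["ex:b", "ex:c"]},
    {"proof": 1, "resource_used_in_proof": "ex:b", "new_resources": ["ex:b", "ex:c"]},
    {"proof": 1, "resource_used_in_proof": "ex:c", "new_resources": ["ex:b", "ex:c"]},
]

# Write to an in-memory csv; the list is stored as the text "['ex:b', 'ex:c']".
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
buffer.seek(0)

# Read back: keep one entry per proof and turn the text into a list again.
new_resources_by_proof: dict[int, list[str]] = {}
for row in csv.DictReader(buffer):
    new_resources_by_proof[int(row["proof"])] = ast.literal_eval(row["new_resources"])
```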

Note on reuse: `rdf_utils.sparql_to_concat_df` could replace the local aggregation helpers because it is query-agnostic,
but `calculate_activation_potential.hebb` returns pair-level potentials (this notebook needs per-resource degrees),
and `calculate_activation_potential.history` scopes propositions/proofs up to N-1 and does not support type selection,
so they are not drop-in fits for this spec.
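
The difference between pair-level output and per-resource degrees can be made concrete with a small sketch. `degrees_from_pairs` is a hypothetical helper; it assumes a degree counts distinct unordered co-occurrence partners, which may not match how the notebook's Hebbian queries weight repeated pairs.

```python
from collections import Counter
from collections.abc import Iterable


def degrees_from_pairs(pairs: Iterable[tuple[str, str]]) -> Counter:
    """Count, for each resource, how many distinct resources it co-occurs with."""
    # Treat (a, b) and (b, a) as the same link and ignore self-pairs.
    unique_links = {frozenset(pair) for pair in pairs if pair[0] != pair[1]}
    degree: Counter = Counter()
    for link in unique_links:
        degree.update(link)  # adds 1 to each of the two endpoints
    return degree


# Placeholder identifiers; the duplicate (b, a) link is counted once.
degree = degrees_from_pairs([("ex:a", "ex:b"), ("ex:b", "ex:a"), ("ex:a", "ex:c")])
```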

Note on TYPE_SELECTION: in analyses.ipynb, type_selection only switches proof-level direct resources to relation/operation types,
while history and co-occurrence stay concept-based. In typical_proof.ipynb, TYPE_SELECTION switches both proof resources
and Hebbian co-occurrence queries to type-based variants; history context remains concept-based.

In [None]:
from __future__ import annotations

import datetime as dt
from pathlib import Path

import pandas as pd

from modules import file_utils, rdf_utils
from modules.query_runner import QueryRunner
# function to create the context for proof N
from modules.typical_context import build_context_for_proof
# function to gather resources from proof N
from modules.typical_proof_resources import resources_in_proof
# compute_phi_h: historical activation potential Phi_h
# compute_phi_tc: co-occurrence activation potential Phi_Tc
# compute_phi_t: total activation potential Phi_T
# compute_new_resources: resources in proof N that are not in the context
from modules.typical_activation import (
    compute_new_resources,
    compute_phi_h,
    compute_phi_t,
    compute_phi_tc,
)

OUTPUT_DIR = Path('output')
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)


In [None]:
# STEP 0: Load latest ontology TTL and reuse a cached QueryRunner
INPUT_TTL = file_utils.latest_file(folder=Path('ontologies'), filename_fragment='ontology_', extension='ttl')
graph = rdf_utils.load_graph(INPUT_TTL)
runner = QueryRunner(graph)

#####################################################################
# STEP 1: define parameters
DELTA = 0.5  # delta in Phi_T = delta * Phi_h + (1 - delta) * Phi_Tc
HISTORY_WEIGHTS = (6 / 9, 1 / 9, 2 / 9)  # Phi_h weights: direct, hierarchical, mereological

# NOTE: Although the SPEC lists PROOF_N as an input, 
# this notebook runs a batch analysis by iterating over a range of proofs. 
# We therefore use START_PROPOSITION/END_PROPOSITION and 
# treat each iteration as the current PROOF_N
START_PROPOSITION = 1
END_PROPOSITION = 48


def validate_params() -> None:
    """Check that DELTA is in [0, 1] and that the 3 HISTORY_WEIGHTS are in [0, 1] and sum to 1."""
    if not (0.0 <= DELTA <= 1.0):
        raise ValueError(f"DELTA must be in [0, 1], got {DELTA}.")
    if len(HISTORY_WEIGHTS) != 3:
        raise ValueError(
            f"HISTORY_WEIGHTS must have length 3, got {len(HISTORY_WEIGHTS)}."
        )
    if any((w < 0.0 or w > 1.0) for w in HISTORY_WEIGHTS):
        raise ValueError("All HISTORY_WEIGHTS must be in [0, 1].")
    total = sum(HISTORY_WEIGHTS)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"HISTORY_WEIGHTS must sum to 1, got {total}.")
    
validate_params()

#####################################################################
def run_analysis(type_selection: bool) -> tuple[pd.DataFrame, Path]:
    """Run the full analysis for a given type selection, save CSV, and return results."""
    results_rows: list[dict[str, object]] = []
    for proof_n in range(START_PROPOSITION, END_PROPOSITION + 1):
        context_resources, family_dfs, hebb_df = build_context_for_proof(
            proof_n,
            runner=runner,
            type_selection=type_selection,
        )
        proof_resources = resources_in_proof(
            proof_n,
            runner=runner,
            type_selection=type_selection,
        )
        proof_resources_sorted = sorted(proof_resources)

        phi_h_df = compute_phi_h(proof_resources_sorted, family_dfs, HISTORY_WEIGHTS)
        phi_tc_df = compute_phi_tc(proof_resources_sorted, hebb_df)
        phi_t_df = compute_phi_t(proof_resources_sorted, phi_h_df, phi_tc_df, DELTA)

        new_resources = compute_new_resources(proof_resources_sorted, context_resources)
        proof_count = len(proof_resources_sorted)
        new_count = len(new_resources)

        phi_h_map = dict(zip(phi_h_df["resource_used_in_proof"], phi_h_df["phi_h"]))
        phi_tc_map = dict(zip(phi_tc_df["resource_used_in_proof"], phi_tc_df["phi_tc"]))
        phi_t_map = dict(zip(phi_t_df["resource_used_in_proof"], phi_t_df["phi_t"]))

        for resource in proof_resources_sorted:
            results_rows.append({
                "proof": proof_n,
                "resource_used_in_proof": resource,
                "number_of_resources_used_in_proof": proof_count,
                "phi_h": float(phi_h_map.get(resource, 0.0)),
                "phi_tc": float(phi_tc_map.get(resource, 0.0)),
                "phi_t": float(phi_t_map.get(resource, 0.0)),
                "new_resources": new_resources,
                "number_of_new_resources": new_count,
            })

    results_df = pd.DataFrame(
        results_rows,
        columns=[
            "proof",
            "resource_used_in_proof",
            "number_of_resources_used_in_proof",
            "phi_h",
            "phi_tc",
            "phi_t",
            "new_resources",
            "number_of_new_resources",
        ],
    )

    history_weights_label = "-".join(f"{w:.4f}" for w in HISTORY_WEIGHTS)
    timestamp = dt.datetime.now().strftime("%Y%m%d-%H%M%S")
    type_label = str(type_selection).lower()
    output_path = OUTPUT_DIR / (
        f"typical_weights-{history_weights_label}_type-{type_label}_{timestamp}.csv"
    )

    print(f"Total proofs processed: {END_PROPOSITION - START_PROPOSITION + 1}")
    print(f"Total rows: {len(results_df)}")

    results_df.to_csv(output_path, index=False)
    print(f"Saved: {output_path}")
    return results_df, output_path


In [None]:
results_df_false, output_path_false = run_analysis(False)
print("TYPE_SELECTION=False")
results_df_false

In [None]:
results_df_true, output_path_true = run_analysis(True)
print("TYPE_SELECTION=True")
results_df_true