# Typical activation potential: analysis of each proof N against its context

This notebook implements the *typical activation potential* from `main.tex`.

Notation (from `main.tex`):
- Phi_T(r, C) is the typical activation potential for resource r in context C.
- Phi_h(r, C) is the historical component of activation potential.
- Phi_Tc(r, C) is the typical co-occurrence component (Phi_{T_c}).
- delta (DELTA in code) is the weight in Phi_T = delta * Phi_h + (1 - delta) * Phi_Tc, with delta in [0, 1].

Assumptions (matching the existing SPARQL queries):

- The SPARQL queries used in this notebook were verified to implement the paper's definitions of context C and of "together" (within the same definition, postulate, common notion, proposition, or proof). If the query scopes change, the results may no longer match the formulas in `main.tex`.
- Context C for proof N consists of the resources in the definitions, postulates, and common notions, in propositions up to N (inclusive), and in proofs up to N-1 (inclusive).
- "Together", for co-occurrence, means that resources co-occur within the same definition, postulate, common notion, proposition, or proof.
- Phi_h is computed from history queries; Phi_Tc is computed from Hebbian pair degrees (co-occurrence links); Phi_T uses the weighted sum above.
- Empty denominators yield 0 for the corresponding potential.
- TYPE_SELECTION = True enables type-based co-occurrence (relation/operation types) for propositions/proofs.


# SPEC

GOALS:
(1) Apply the definition of typical activation potential in main.tex to compute the activation potential of the resources used in proof N against the context of resources used in definitions, postulates, common notions, propositions up to N (inclusive), and proofs up to N-1 (inclusive; if N=1, no proof is included in the context).
(2) Record when proof N _directly_ uses a resource that appears nowhere in the context (i.e., in the definitions, postulates, common notions, propositions up to N inclusive, and proofs up to N-1 inclusive). For proof N, use direct_template_propositions_proofs if TYPE_SELECTION = False and direct_template_last_item_types if TYPE_SELECTION = True. Define new_resources as the subset of the direct-usage set from the 'HOW TO FIND RESOURCES IN PROOF N' step that does not occur in the context.

PREPARATION: review
    - main.tex (for the definition of typical activation potential)
    - analyses.ipynb (for strategies to compare proofs with their context of previous material)
    - typical.ipynb (for an algorithm implementing typical activation potential)
    - ./modules/queries.py (for SPARQL queries used to work with ontological resources)

NOTES:
    - cache results of SPARQL queries for faster re-runs (QueryRunner)

INPUTS: PROOF_N, DELTA, HISTORY_WEIGHTS (exactly 3 weights required), TYPE_SELECTION

HOW TO CREATE THE CONTEXT OF PROOF N:
- directly used resources: use queries.direct_definitions(), queries.direct_postulates(), queries.direct_common_notions(), and queries.direct_template_propositions_proofs() for propositions up to N (inclusive) and proofs up to N-1 (inclusive) [for propositions and proofs, pass their IRIs in a VALUES clause of the SPARQL queries]
- hierarchically used resources: use queries.hierarchical_definitions(), queries.hierarchical_postulates(), queries.hierarchical_common_notions(), and queries.hierarchical_template_propositions_proofs() for propositions up to N (inclusive) and proofs up to N-1 (inclusive) [for propositions and proofs, pass their IRIs in a VALUES clause of the SPARQL queries]
- mereologically used resources: use queries.mereological_definitions(), queries.mereological_postulates(), queries.mereological_common_notions(), and queries.mereological_template_propositions_proofs() for propositions up to N (inclusive) and proofs up to N-1 (inclusive) [for propositions and proofs, pass their IRIs in a VALUES clause of the SPARQL queries]
- Hebbian co-occurrence: use queries.hebb_definitions(), queries.hebb_postulates(), queries.hebb_common_notions(), and queries.hebb_template_propositions_proofs() for propositions up to N (inclusive) and proofs up to N-1 (inclusive) [for propositions and proofs, pass their IRIs in a VALUES clause of the SPARQL queries].

EDGE CASE:
- N = 1: the context includes only definitions, postulates, common notions, and proposition 1.
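The item selection above (propositions 1..N, proofs 1..N-1, and therefore no proofs when N = 1) and the VALUES requirement can be sketched as follows. This is a minimal sketch: the IRI templates are hypothetical placeholders, and `context_item_iris` / `values_clause` are illustrative helpers, not functions from `./modules/queries.py`.

```python
from __future__ import annotations

# ASSUMPTION: hypothetical IRI templates; the ontology's real IRIs may differ.
PROPOSITION_IRI = "http://example.org/onto#proposition_{n}"
PROOF_IRI = "http://example.org/onto#proof_{n}"


def context_item_iris(n: int) -> list[str]:
    """IRIs of propositions 1..n (inclusive) and proofs 1..n-1 (inclusive)."""
    if n < 1:
        raise ValueError(f"n must be >= 1, got {n}.")
    propositions = [PROPOSITION_IRI.format(n=k) for k in range(1, n + 1)]
    # range(1, n) is empty when n == 1, so no proof enters the context (EDGE CASE).
    proofs = [PROOF_IRI.format(n=k) for k in range(1, n)]
    return propositions + proofs


def values_clause(variable: str, iris: list[str]) -> str:
    """Render a SPARQL VALUES block such as: VALUES ?item { <iri1> <iri2> }."""
    terms = " ".join(f"<{iri}>" for iri in iris)
    return f"VALUES ?{variable} {{ {terms} }}"


print(values_clause("item", context_item_iris(1)))
# VALUES ?item { <http://example.org/onto#proposition_1> }
```

For the direct-usage query of proof N itself (see below), the same helper would receive a single-element list holding only the IRI of proof N.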

HOW TO FIND RESOURCES IN PROOF N:
- if TYPE_SELECTION = False: use queries.direct_template_propositions_proofs() for proof N (the list of values for the query should contain only the IRI of proof N)
- if TYPE_SELECTION = True: use queries.direct_template_last_item_types() for proof N (the list of values for the query should contain only the IRI of proof N)

OUTPUT: 
    - csv with columns "proof", "resource_used_in_proof", "number_of_resources_used_in_proof", "phi_h", "phi_tc", "phi_t", "new_resources", "number_of_new_resources"
    - include only resources used in proof N (for each N) in the column "resource_used_in_proof"
    - output path: put the csv file in the "./output" folder
    - output naming convention: typical_weights-<history_weights>_type-<true_or_false>_<timestamp>.csv
    - "number_of_resources_used_in_proof" is a column of scalars that counts how many resources a proof contains (it is a per-proof piece of data but we keep it in this csv)
    - "new_resources" contains a list of resources that are new in proof N (i.e., directly used in proof N but absent from its context)
    - "new_resources", "number_of_new_resources" are per-proof data; however I want to include them in a single csv; therefore, it is ok to repeat these data on several rows for the same proof
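A possible way to assemble the output path from the naming convention above. The SPEC does not fix how `<history_weights>` or `<timestamp>` are rendered, so the three-decimal weights joined by `-`, the lowercase `true`/`false`, and the `%Y%m%d-%H%M%S` timestamp are assumptions; `output_csv_path` is a hypothetical helper.

```python
from __future__ import annotations

import datetime as dt
from pathlib import Path


def output_csv_path(
    history_weights: tuple[float, float, float],
    type_selection: bool,
    output_dir: Path = Path("output"),
    now: dt.datetime | None = None,
) -> Path:
    """Build typical_weights-<w>_type-<true|false>_<timestamp>.csv (rendering assumed)."""
    # ASSUMPTION: each weight rendered with 3 decimals, weights joined by "-".
    weights = "-".join(f"{w:.3f}" for w in history_weights)
    # ASSUMPTION: timestamp format; pass `now` to get reproducible names.
    stamp = (now or dt.datetime.now()).strftime("%Y%m%d-%H%M%S")
    flag = str(type_selection).lower()  # True -> "true", False -> "false"
    return output_dir / f"typical_weights-{weights}_type-{flag}_{stamp}.csv"


print(output_csv_path((6 / 9, 1 / 9, 2 / 9), False, now=dt.datetime(2024, 1, 2, 3, 4, 5)))
```

Passing `now` explicitly keeps file names reproducible in tests; in the notebook it can be left as the default.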

In [None]:
from __future__ import annotations

import datetime as dt
from pathlib import Path

import pandas as pd

from modules import rdf_utils, file_utils
from modules.calculate_activation_potential import history as history_potential
from modules.calculate_activation_potential import hebb as hebb_potential
from modules.query_runner import QueryRunner

In [None]:
# STEP 0: define parameters
DELTA = 0.5  # delta in Phi_T = delta * Phi_h + (1 - delta) * Phi_Tc
HISTORY_WEIGHTS = (6 / 9, 1 / 9, 2 / 9)  # Phi_h weights: direct, hierarchical, mereological
TYPE_SELECTION = False  # toggle type-based co-occurrence in propositions/proofs

# NOTE: Although the SPEC lists PROOF_N as an input, 
# this notebook runs a batch analysis by iterating over a range of proofs. 
# We therefore use START_PROPOSITION/END_PROPOSITION and 
# treat each iteration as the current PROOF_N
START_PROPOSITION = 1
END_PROPOSITION = 48

def validate_params() -> None:
    if not (0.0 <= DELTA <= 1.0):
        raise ValueError(f"DELTA must be in [0, 1], got {DELTA}.")
    if len(HISTORY_WEIGHTS) != 3:
        raise ValueError(
            f"HISTORY_WEIGHTS must have length 3, got {len(HISTORY_WEIGHTS)}."
        )
    if any((w < 0.0 or w > 1.0) for w in HISTORY_WEIGHTS):
        raise ValueError("All HISTORY_WEIGHTS must be in [0, 1].")
    total = sum(HISTORY_WEIGHTS)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"HISTORY_WEIGHTS must sum to 1, got {total}.")

validate_params()

OUTPUT_DIR = Path('output')
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

In [None]:
# STEP 1: Load latest ontology TTL and reuse a cached QueryRunner
INPUT_TTL = file_utils.latest_file(folder=Path('ontologies'), filename_fragment='ontology_', extension='ttl')
graph = rdf_utils.load_graph(INPUT_TTL)
runner = QueryRunner(graph)

In [None]:
# STEP 2: function(s) to create context for proof N
# - Implement a function that, for a given N, builds the context set C.
# - Use the SPEC’s query families: direct, hierarchical, mereological, hebb.
# - Sources: definitions, postulates, common notions (all); propositions <= N; proofs <= N-1 (none if N=1).
# - For propositions/proofs, pass explicit IRI lists as VALUES to the queries.
# - Return de-duplicated context resources (and hebb pairs if needed for Phi_Tc).
# - Keep it pure: no printing, no file I/O; just compute and return.
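```python
# A minimal sketch of STEP 2. `fetch(family, source, iris)` is a HYPOTHETICAL
# callable wrapping the cached QueryRunner: for family "direct", "hierarchical" or
# "mereological" and source "definitions", "postulates", "common_notions" or
# "propositions_proofs", it is assumed to run the matching query from
# modules/queries.py and return resource IDs; for family "hebb" it is assumed to
# return co-occurring (resource_a, resource_b) pairs. `iris` is None for the
# all-items sources and the VALUES list (propositions <= N, proofs <= N-1) otherwise.
from __future__ import annotations

from collections.abc import Callable, Iterable

FAMILIES = ("direct", "hierarchical", "mereological")
ALL_ITEM_SOURCES = ("definitions", "postulates", "common_notions")


def build_context(
    item_iris: list[str],
    fetch: Callable[[str, str, list[str] | None], Iterable],
) -> tuple[dict[str, set[str]], set[frozenset[str]]]:
    """Return de-duplicated context resources per family and undirected hebb pairs."""
    sources = [(s, None) for s in ALL_ITEM_SOURCES]
    sources.append(("propositions_proofs", item_iris))

    resources: dict[str, set[str]] = {}
    for family in FAMILIES:
        found: set[str] = set()
        for source, iris in sources:
            found.update(str(r) for r in fetch(family, source, iris))
        resources[family] = found

    # ASSUMPTION: co-occurrence is symmetric, so (a, b) and (b, a) are one pair.
    pairs: set[frozenset[str]] = set()
    for source, iris in sources:
        for a, b in fetch("hebb", source, iris):
            if a != b:  # skip self-pairs
                pairs.add(frozenset((str(a), str(b))))
    return resources, pairs

# The full context set is the union of the three families:
# context = set().union(*resources.values())
```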


In [None]:
# STEP 3: function(s) to gather resources from proof N
# - SPEC is source of truth; use TYPE_SELECTION to choose the direct query.
# - Query only proof N (single IRI in VALUES) and normalize to match context resource IDs.
# - Create functions only. We will call them in a loop below.
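```python
# A minimal sketch of STEP 3. The two query names come from the SPEC; `run` is a
# HYPOTHETICAL callable (query_name, iris) -> rows standing in for the cached
# QueryRunner, and normalizing with str().strip() is an ASSUMPTION about how
# context resource IDs are represented.
from __future__ import annotations

from collections.abc import Callable, Iterable


def direct_query_name(type_selection: bool) -> str:
    """Pick the direct-usage query for proof N according to TYPE_SELECTION."""
    if type_selection:
        return "direct_template_last_item_types"
    return "direct_template_propositions_proofs"


def resources_in_proof(
    proof_iri: str,
    type_selection: bool,
    run: Callable[[str, list[str]], Iterable],
) -> list[str]:
    """Directly used resources of a single proof, normalized and de-duplicated."""
    rows = run(direct_query_name(type_selection), [proof_iri])  # only proof N in VALUES
    return sorted({str(r).strip() for r in rows})
```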


In [None]:
# STEP 4: function(s) to calculate the historical activation potential Phi_h for all resources in proof N
# - SPEC is source of truth; use HISTORY_WEIGHTS and return 0 on empty denominators.
# - Compute Phi_h only for resources in proof N against the context built in STEP 2.
# - Create functions only. We will call them in a loop below.
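```python
# A minimal sketch of STEP 4 under an ASSUMED formula: Phi_h(r, C) is taken here as
# a weighted sum of r's relative frequency in the direct, hierarchical and
# mereological histories of C, weighted by HISTORY_WEIGHTS. The actual definition
# is in main.tex (and modules.calculate_activation_potential.history); check it
# before relying on this. Empty denominators yield 0, as the SPEC requires.
from __future__ import annotations

from collections import Counter

FAMILY_ORDER = ("direct", "hierarchical", "mereological")  # matches HISTORY_WEIGHTS


def relative_frequency(counts: Counter, resource: str) -> float:
    total = sum(counts.values())
    return counts[resource] / total if total else 0.0  # empty denominator -> 0


def phi_h(
    proof_resources: list[str],
    history: dict[str, Counter],
    weights: tuple[float, float, float],
) -> dict[str, float]:
    """Phi_h for each resource in proof N, from occurrence counts in the context."""
    return {
        r: sum(
            w * relative_frequency(history.get(family, Counter()), r)
            for w, family in zip(weights, FAMILY_ORDER)
        )
        for r in proof_resources
    }
```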


In [None]:
# STEP 5: function(s) to calculate the co-occurrence activation potential Phi_Tc for all resources in proof N
# - SPEC is source of truth; use hebbian co-occurrence queries and return 0 on empty denominators.
# - Compute Phi_Tc only for resources in proof N against the context built in STEP 2.
# - Create functions only. We will call them in a loop below.
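```python
# A minimal sketch of STEP 5 under an ASSUMED formula: Phi_Tc(r, C) is taken here as
# r's degree in the undirected Hebbian co-occurrence graph of C (the number of
# distinct resources it appears "together" with), divided by the maximum degree in
# that graph. The real normalization is defined in main.tex and
# modules.calculate_activation_potential.hebb. No pairs (empty denominator) -> 0.
from __future__ import annotations

from collections import defaultdict
from collections.abc import Iterable


def phi_tc(
    proof_resources: list[str],
    hebb_pairs: Iterable[tuple[str, str]],
) -> dict[str, float]:
    neighbours: dict[str, set[str]] = defaultdict(set)
    for a, b in hebb_pairs:
        if a != b:  # ignore self-pairs
            neighbours[a].add(b)
            neighbours[b].add(a)
    max_degree = max((len(n) for n in neighbours.values()), default=0)
    return {
        # .get() does not insert missing keys into the defaultdict
        r: (len(neighbours.get(r, ())) / max_degree if max_degree else 0.0)
        for r in proof_resources
    }
```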


In [None]:
# STEP 6: function(s) to calculate the total activation potential Phi_T for all resources in proof N
# - SPEC is source of truth; Phi_T = DELTA * Phi_h + (1 - DELTA) * Phi_Tc.
# - Create functions only. We will call them in a loop below.
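```python
# A minimal sketch of STEP 6, following the SPEC formula
# Phi_T = DELTA * Phi_h + (1 - DELTA) * Phi_Tc. Treating a resource missing from
# one of the two dicts as 0 is an ASSUMPTION of this sketch.
from __future__ import annotations


def phi_t(
    phi_h: dict[str, float],
    phi_tc: dict[str, float],
    delta: float,
) -> dict[str, float]:
    if not 0.0 <= delta <= 1.0:
        raise ValueError(f"delta must be in [0, 1], got {delta}.")
    resources = phi_h.keys() | phi_tc.keys()
    return {
        r: delta * phi_h.get(r, 0.0) + (1.0 - delta) * phi_tc.get(r, 0.0)
        for r in resources
    }
```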


In [None]:
# STEP 7: function(s) to find resources in proof N that are not in the context
# - SPEC is source of truth; new_resources = direct-usage set (STEP 3) minus context (STEP 2).
# - Create functions only. We will call them in a loop below.
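```python
# A minimal sketch of STEP 7: new_resources = direct-usage set of proof N (STEP 3)
# minus the context resources (STEP 2). Sorting gives a stable order in the CSV.
from __future__ import annotations

from collections.abc import Iterable


def new_resources(proof_resources: Iterable[str], context: set[str]) -> list[str]:
    return sorted(set(proof_resources) - context)
```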


In [None]:
# STEP 8: run a loop with all steps for proofs N = START_PROPOSITION to END_PROPOSITION
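```python
# A minimal sketch of STEP 8. The three callables are HYPOTHETICAL stand-ins for the
# STEP 2-6 functions (names and signatures are assumptions). The loop emits one row
# per resource used in proof N and repeats the per-proof values on every row, as
# the SPEC allows. Using N itself as the "proof" value is also an assumption.
from __future__ import annotations

from collections.abc import Callable


def run_batch(
    start: int,
    end: int,
    build_context: Callable[[int], set[str]],
    resources_in_proof: Callable[[int], list[str]],
    potentials: Callable[[list[str], set[str]], dict[str, tuple[float, float, float]]],
) -> list[dict]:
    rows: list[dict] = []
    for n in range(start, end + 1):  # each n is treated as the current PROOF_N
        context = build_context(n)
        used = resources_in_proof(n)
        scores = potentials(used, context)  # resource -> (phi_h, phi_tc, phi_t)
        new = sorted(set(used) - context)   # STEP 7
        for r in used:
            phi_h, phi_tc, phi_t = scores[r]
            rows.append({
                "proof": n,
                "resource_used_in_proof": r,
                "number_of_resources_used_in_proof": len(used),
                "phi_h": phi_h,
                "phi_tc": phi_tc,
                "phi_t": phi_t,
                "new_resources": new,
                "number_of_new_resources": len(new),
            })
    return rows

# e.g. rows = run_batch(START_PROPOSITION, END_PROPOSITION, ...)
```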

In [None]:
# STEP 9: print df of the results; print a summary of the results; save results to CSV
# - SPEC is source of truth; output columns, path, and naming convention must match.
# - Repeat per-proof scalars (counts, new_resources) across rows for each resource.
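```python
# A minimal sketch of STEP 9, assuming `rows` is a list of dicts keyed by the SPEC's
# column names (one row per resource, per-proof values repeated). Joining
# new_resources with "; " is an ASSUMPTION so each CSV cell holds a plain string;
# the output path is passed in and should follow the SPEC's naming convention.
from __future__ import annotations

from pathlib import Path

import pandas as pd

COLUMNS = [
    "proof", "resource_used_in_proof", "number_of_resources_used_in_proof",
    "phi_h", "phi_tc", "phi_t", "new_resources", "number_of_new_resources",
]


def results_frame(rows: list[dict]) -> pd.DataFrame:
    df = pd.DataFrame(rows, columns=COLUMNS)  # enforces the SPEC column order
    df["new_resources"] = df["new_resources"].map(lambda xs: "; ".join(xs))
    return df


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    # One line per proof: per-proof counts plus the mean Phi_T of its resources.
    return df.groupby("proof").agg(
        resources=("number_of_resources_used_in_proof", "first"),
        new=("number_of_new_resources", "first"),
        mean_phi_t=("phi_t", "mean"),
    )


def save(df: pd.DataFrame, path: Path) -> Path:
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(path, index=False)
    return path

# e.g. df = results_frame(rows); print(df); print(summarize(df)); save(df, path)
```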
