- HiCD is a hyperbolic geometry–based cognitive diagnosis model designed to address long-tail sparsity and semantic heterogeneity in educational graphs.
- Unlike traditional Euclidean or single-graph methods, HiCD segments the educational graph into semantically distinct subgraphs and embeds students, exercises, and concepts into curvature-adaptive hyperbolic spaces. This preserves hierarchical and power-law structures while improving diagnostic accuracy and robustness.
- Hyperbolic Embedding for Long-Tail Sparsity: Learns representations in hyperbolic (Lorentz) space to preserve hierarchical and tree-like relations in sparse educational graphs.
- Curvature-Aware Graph Decomposition: Decomposes the educational graph into three semantically distinct subgraphs (correct, incorrect, and exercise–concept) and assigns adaptive curvature to each.
- Multi-Level Fusion Mechanism: Integrates semantic and structural embeddings across subgraphs through attention and knowledge-aware fusion.
- Hyperbolic Diagnostic Function: Defines prediction functions directly in hyperbolic space using either Möbius subtraction or Lorentzian distance.
- Comprehensive Evaluation & Robustness: Achieves state-of-the-art results across multiple datasets (ASSIST-0910 and Junyi), especially in sparse and long-tail scenarios.
Educational data are characterized by high sparsity and semantic heterogeneity:
- Long-tail Sparsity: Most students and concepts interact with very few exercises, making representation learning unstable.
- Heterogeneous Semantics: Different edge types (e.g., correct/incorrect responses, exercise–concept links) exhibit distinct structures and distributions, which a single embedding space cannot represent faithfully.
Traditional cognitive diagnosis (CD) models such as IRT, NCDM, RCD, and SVGCD do not fully capture this structural diversity, which leads to biased or distorted embeddings. HiCD addresses both challenges through hyperbolic geometry and graph decomposition.
HiCD addresses the above issues through three coordinated modules:
- Splits the educational graph into three subgraphs:
  - (a) students’ correct graph 𝐺𝑐𝑟,
  - (b) students’ incorrect graph 𝐺𝑖𝑐𝑟,
  - (c) exercise–concept association graph 𝐺𝑒𝑐.
- Each subgraph captures unique semantic and structural relationships.
- Each subgraph is mapped into a Lorentz-model hyperbolic space with an estimated curvature 𝑘𝑖.
- Performs message passing in tangent space (via GraphSAGE) and projects results back to hyperbolic space.
- Multi-level fusion integrates semantic information across subgraphs through attention and Q-matrix–based enhancement.
- Two geometric diagnostic functions are defined:
  - (a) Möbius Subtraction (HiCD-sub): models the ability–difficulty difference in the Poincaré ball,
  - (b) Lorentz Distance (HiCD-dist): computes hyperbolic distance directly in Lorentz space for stability.
- The Fermi–Dirac decoder converts geometric distances into probabilistic predictions.
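To make the distance-based diagnostic function and the decoder concrete, here is a minimal pure-Python sketch. It is not taken from the HiCD code: the function names, the curvature convention (points satisfy `<x, x>_L = -1/c` with `c > 0`), and the decoder parameters `r` and `t` are assumptions chosen for illustration, following the commonly used form of the Fermi–Dirac decoder.

```python
import math

def lorentz_inner(x, y):
    """Minkowski inner product: -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def lift_to_hyperboloid(v, c=1.0):
    """Attach the time-like coordinate so that <x, x>_L = -1/c (assumed convention)."""
    x0 = math.sqrt(1.0 / c + sum(a * a for a in v))
    return [x0] + list(v)

def lorentz_distance(x, y, c=1.0):
    """Geodesic distance on the hyperboloid with curvature -c."""
    # Clamp to 1.0 so rounding error cannot push acosh out of its domain.
    arg = max(1.0, -c * lorentz_inner(x, y))
    return math.acosh(arg) / math.sqrt(c)

def fermi_dirac(dist, r=2.0, t=1.0):
    """Map a distance to a probability; r (radius) and t (temperature) are hypothetical values."""
    return 1.0 / (math.exp((dist - r) / t) + 1.0)

# Toy student and exercise embeddings (illustrative values only).
student = lift_to_hyperboloid([0.3, -0.2])
exercise = lift_to_hyperboloid([1.5, 0.8])
d = lorentz_distance(student, exercise)
print(f"distance={d:.4f}, p(correct)={fermi_dirac(d):.4f}")
```

In this sketch a smaller distance yields a higher predicted probability, and a distance equal to `r` maps to exactly 0.5.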
EduCWO/
├── model/ # Model implementations
│ ├── HiCD/ # Main Hyperbolic Cognitive Diagnosis (HiCD) model
│ ├── manifolds/ # Lorentz and Poincaré ball manifolds
│ └── __init__.py
│
├── scripts/ # Scripts for data processing, training, and evaluation
│ ├── data/ # Dataset directory
│ │ ├── rawdata/ # Original input datasets
│ │ │ ├── assist0910/ # ASSIST-0910 dataset (raw)
│ │ │ └── junyi/ # Junyi dataset (raw)
│ └── run.py
│
├── utils/ # Helper functions and visualization tools
│
└── README.md # Project description and usage guide
| Dataset | #Students | #Exercises | #Concepts | #Logs | Sparsity |
|---|---|---|---|---|---|
| ASSIST-0910 | 2,493 | 17,676 | 123 | 267,423 | 99.39% |
| Junyi | 10,000 | 734 | 734 | 408,057 | 94.44% |
- ASSIST-0910: Response data from ASSISTments 2009–2010.
- Junyi: Real-world learning logs from the Junyi Academy platform.
- Each dataset includes a Q matrix specifying exercise–concept mappings.
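The sparsity column can be reproduced from the other columns of the table. The sketch below assumes sparsity is defined as one minus the fraction of observed student–exercise pairs, and that each log corresponds to a distinct pair; the function name is illustrative.

```python
def sparsity(n_students, n_exercises, n_logs):
    """Fraction of student-exercise pairs with no recorded response."""
    return 1.0 - n_logs / (n_students * n_exercises)

# (students, exercises, logs) as listed in the table above.
datasets = {
    "ASSIST-0910": (2493, 17676, 267423),
    "Junyi": (10000, 734, 408057),
}

for name, (students, exercises, logs) in datasets.items():
    print(f"{name}: {sparsity(students, exercises, logs):.2%}")
# ASSIST-0910: 99.39%
# Junyi: 94.44%
```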
- AUC: Area under the ROC curve
- ACC: Accuracy with 0.5 probability threshold
- RMSE: Root mean square error
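For reference, the three metrics can be computed as in the following sketch. This is an illustrative pure-Python version, not the evaluation code of this repository; it assumes a prediction equal to the threshold counts as correct, and it uses a simple pairwise AUC that is quadratic in the number of samples.

```python
import math

def accuracy(y_true, y_prob, threshold=0.5):
    """Share of responses whose thresholded prediction matches the label."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)

def rmse(y_true, y_prob):
    """Root mean square error between labels and predicted probabilities."""
    return math.sqrt(sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true))

def auc(y_true, y_prob):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Toy labels (1 = correct response) and predicted probabilities.
y_true = [1, 0, 1, 1, 0]
y_prob = [0.9, 0.4, 0.6, 0.3, 0.2]
print(f"ACC={accuracy(y_true, y_prob):.4f}")   # ACC=0.8000
print(f"AUC={auc(y_true, y_prob):.4f}")        # AUC=0.8333
print(f"RMSE={rmse(y_true, y_prob):.4f}")      # RMSE=0.4147
```

For full-size datasets a rank-based AUC, such as the one provided by scikit-learn, would be the more practical choice.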
HiCD achieves consistent improvement over strong baselines across datasets.
Highlights:
- On ASSIST-0910, HiCD-dist achieves AUC = 0.7972 and ACC = 0.7531.
- On Junyi, HiCD-dist attains AUC = 0.8207, outperforming SVGCD by 1.6%.
- HiCD consistently improves diagnostic robustness under extreme sparsity.
- HiCD decomposes the educational interaction graph into three semantically distinct subgraphs: students’ correct graph 𝐺𝑐𝑟, students’ incorrect graph 𝐺𝑖𝑐𝑟, and exercise–concept association graph 𝐺𝑒𝑐.
- This decomposition isolates different behavioral and structural semantics: the correct and incorrect graphs capture distinct cognitive behaviors, while the exercise–concept graph reflects the underlying knowledge structure.
- By learning from these specialized subgraphs, HiCD enhances representation diversity and captures fine-grained relational patterns that are often overlooked in unified graph formulations.
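The decomposition described above can be sketched as a simple split of the response logs. The input format here is an assumption made for illustration: logs are `(student, exercise, correct)` triples and the Q-matrix is a mapping from each exercise to its concepts. The actual preprocessing in this repository may differ.

```python
def decompose(logs, q_matrix):
    """Split response logs and a Q-matrix into three edge lists.

    logs:      iterable of (student, exercise, correct) triples (assumed format)
    q_matrix:  dict mapping exercise -> list of concepts (assumed format)
    """
    g_cr, g_icr = [], []
    for student, exercise, correct in logs:
        # Correct responses go to G_cr, incorrect ones to G_icr.
        (g_cr if correct else g_icr).append((student, exercise))
    # Exercise-concept edges come straight from the Q-matrix.
    g_ec = [(e, c) for e, concepts in q_matrix.items() for c in concepts]
    return g_cr, g_icr, g_ec

# Toy data with hypothetical identifiers.
logs = [("s1", "e1", 1), ("s1", "e2", 0), ("s2", "e1", 0), ("s2", "e3", 1)]
q_matrix = {"e1": ["c1"], "e2": ["c1", "c2"], "e3": ["c2"]}

g_cr, g_icr, g_ec = decompose(logs, q_matrix)
print("G_cr: ", g_cr)
print("G_icr:", g_icr)
print("G_ec: ", g_ec)
```

Every response edge lands in exactly one of the two behavior graphs, so the two edge lists together cover the full log.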
- To effectively integrate heterogeneous semantics, HiCD embeds each subgraph into a hyperbolic space with an adaptively estimated curvature, aligning the geometric properties with the relational complexity of each subgraph. This curvature-aware mapping preserves hierarchical and non-Euclidean characteristics, improving the fidelity of embeddings under long-tail sparsity.
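Moving between a curved space and its tangent space is usually done with exponential and logarithmic maps. The sketch below shows these maps at the origin of the Lorentz model for a given curvature parameter `c` (curvature `-c`, `c > 0`). It illustrates the standard formulas only; how HiCD estimates the curvature of each subgraph is not covered here, and the function names are assumptions.

```python
import math

def expmap0(v, c=1.0):
    """Map a tangent vector v at the origin onto the hyperboloid <x, x>_L = -1/c."""
    sqrt_c = math.sqrt(c)
    norm = math.sqrt(sum(a * a for a in v))
    if norm < 1e-12:
        # The zero vector maps to the origin of the hyperboloid.
        return [1.0 / sqrt_c] + [0.0] * len(v)
    scale = math.sinh(sqrt_c * norm) / (sqrt_c * norm)
    return [math.cosh(sqrt_c * norm) / sqrt_c] + [scale * a for a in v]

def logmap0(x, c=1.0):
    """Inverse of expmap0: return the tangent vector at the origin that reaches x."""
    sqrt_c = math.sqrt(c)
    spatial = x[1:]
    norm = math.sqrt(sum(a * a for a in spatial))
    if norm < 1e-12:
        return [0.0] * len(spatial)
    # Clamp so rounding error cannot push acosh out of its domain.
    dist = math.acosh(max(1.0, sqrt_c * x[0])) / sqrt_c
    return [dist * a / norm for a in spatial]

# Round trip for several curvature values (illustrative).
v = [0.4, -1.1, 0.7]
for c in (0.5, 1.0, 2.0):
    x = expmap0(v, c)
    constraint = -x[0] ** 2 + sum(a * a for a in x[1:])
    print(f"c={c}: <x,x>_L={constraint:.6f}, recovered={[round(a, 6) for a in logmap0(x, c)]}")
```

A larger `c` bends the space more strongly, so the same tangent vector lands further out on the hyperboloid in coordinate terms while its geodesic distance from the origin stays equal to its norm.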
- HiCD introduces a multi-level fusion mechanism that integrates representations from multiple subgraphs through three stages:
  - Behavioral Fusion: merges student behaviors from correct and incorrect response graphs.
  - Attention Fusion: balances the relative contributions of different subgraphs using attention weighting.
  - Knowledge-Aware Enhancement: aligns exercise representations with related concepts based on the Q-matrix.
- This unified fusion framework ensures that the final embeddings jointly encode behavioral, structural, and conceptual information, yielding robust and interpretable student–exercise representations.
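As a rough illustration of the attention fusion stage, the sketch below combines several per-subgraph views of the same entity with softmax weights. It is a generic attention-weighted sum in tangent (Euclidean) coordinates, not the fusion layer used by HiCD: the dot-product scoring, the `query` vector, and all names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(views, query):
    """Weight each subgraph view by its dot product with a query vector, then sum."""
    scores = [sum(a * b for a, b in zip(view, query)) for view in views]
    weights = softmax(scores)
    dim = len(views[0])
    fused = [sum(w * view[i] for w, view in zip(weights, views)) for i in range(dim)]
    return fused, weights

# One student seen through the correct and incorrect graphs (toy values).
view_correct = [0.8, 0.1, -0.3]
view_incorrect = [-0.2, 0.5, 0.4]
query = [1.0, 0.0, 0.5]  # hypothetical attention vector; learned in a real model

fused, weights = attention_fuse([view_correct, view_incorrect], query)
print("weights:", [round(w, 4) for w in weights])
print("fused:  ", [round(f, 4) for f in fused])
```

Because the weights sum to one, the fused vector is a convex combination of the input views.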
- HiCD defines the diagnostic process directly within hyperbolic space, where student and exercise embeddings are compared through geometric distances that naturally reflect their hierarchical relations.
- Two complementary formulations — Möbius subtraction and Lorentz distance — are employed under different hyperbolic models to measure the relational strength between entities.
- The resulting geometric distances are then decoded into cognitive response probabilities through the Fermi–Dirac function, achieving smooth probabilistic outputs while maintaining geometric interpretability.
- This design allows HiCD to perform cognitive diagnosis in a geometrically consistent and probabilistically stable manner.
