CyberXie/HiCD

HiCD: Hyperbolic Insight through Decomposed Educational Graphs for Long-Tailed Cognitive Diagnosis

Python 3.8+ | PyTorch

Overview

  • HiCD is a hyperbolic geometry–based cognitive diagnosis model designed to address long-tail sparsity and semantic heterogeneity in educational graphs.

  • Unlike traditional Euclidean or single-graph methods, HiCD segments the educational graph into semantically distinct subgraphs and embeds students, exercises, and concepts into curvature-adaptive hyperbolic spaces. This preserves hierarchical and power-law structures while improving diagnostic accuracy and robustness.

Key Features

  • Hyperbolic Embedding for Long-Tail Sparsity: Learns representations in hyperbolic (Lorentz) space to preserve hierarchical and tree-like relations in sparse educational graphs.

  • Curvature-Aware Graph Decomposition: Decomposes the educational graph into three semantically distinct subgraphs (correct, incorrect, and exercise–concept) and assigns adaptive curvature to each.

  • Multi-Level Fusion Mechanism: Integrates semantic and structural embeddings across subgraphs through attention and knowledge-aware fusion.

  • Hyperbolic Diagnostic Function: Defines prediction functions directly in hyperbolic space using either Möbius subtraction or Lorentzian distance.

  • Comprehensive Evaluation & Robustness: Achieves state-of-the-art results on two datasets (ASSIST-0910 and Junyi), especially in sparse and long-tail scenarios.

Problem Statement

Educational data are characterized by high sparsity and semantic heterogeneity:

  • Long-tail Sparsity: Most students and concepts interact with very few exercises, making representation learning unstable.

  • Heterogeneous Semantics: Different edge types (e.g., correct/incorrect responses, exercise–concept links) exhibit distinct structures and distributions, which a single embedding space cannot represent faithfully.

Traditional cognitive diagnosis (CD) models (IRT, NCDM, RCD, SVGCD) do not fully capture such structural diversity, which leads to biased or distorted embeddings. HiCD addresses these challenges through hyperbolic geometry and graph decomposition.

Solution: HiCD Framework

HiCD addresses the above issues through three coordinated modules:

Graph Decomposition

  • Splits the educational graph into three subgraphs:

    • (a) students’ correct graph 𝐺𝑐𝑟,
    • (b) students’ incorrect graph 𝐺𝑖𝑐𝑟,
    • (c) exercise–concept association graph 𝐺𝑒𝑐.
  • Each subgraph captures unique semantic and structural relationships.
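
The decomposition step can be sketched as follows. This is a minimal illustration, not the repository's code: the `decompose` function, the `(student, exercise, score)` log format, and the dictionary form of the Q-matrix are all assumptions made for the example.

```python
def decompose(logs, q_matrix):
    """Split response logs into three edge lists (hypothetical helper).

    logs:     iterable of (student_id, exercise_id, score), score in {0, 1}
    q_matrix: dict mapping exercise_id -> iterable of concept ids
    """
    g_cr, g_icr = [], []
    for student, exercise, score in logs:
        # Correct responses go to G_cr, incorrect ones to G_icr.
        (g_cr if score == 1 else g_icr).append((student, exercise))
    # Exercise-concept edges come straight from the Q-matrix.
    g_ec = [(e, c) for e, concepts in sorted(q_matrix.items())
            for c in sorted(concepts)]
    return g_cr, g_icr, g_ec


# Toy data, invented for illustration only.
logs = [(0, 10, 1), (0, 11, 0), (1, 10, 0), (2, 12, 1)]
q_matrix = {10: [100], 11: [100, 101], 12: [102]}
g_cr, g_icr, g_ec = decompose(logs, q_matrix)
```

Keeping the three edge lists separate is what allows each subgraph to receive its own curvature later on.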

Hyperbolic Representation Learning

  • Each subgraph is mapped into a Lorentz-model hyperbolic space with an estimated curvature 𝑘𝑖.

  • Message passing is performed in the tangent space (via GraphSAGE), and the results are projected back to hyperbolic space.

  • Multi-level fusion integrates semantic information across subgraphs through attention and Q-matrix–based enhancement.
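
The "map to tangent space, aggregate, project back" pattern can be illustrated with the standard Lorentz-model exponential and logarithmic maps at the origin. The sketch below is an assumption-laden simplification: it uses plain Python instead of PyTorch, a parameter `K > 0` for a space of curvature `-K`, and an unweighted neighbor mean in place of a learned GraphSAGE layer. The function names are hypothetical and do not come from the repository.

```python
import math


def lorentz_inner(x, y):
    """Lorentzian inner product: -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def exp_map_origin(v, K=1.0):
    """Map a spatial tangent vector at the origin onto the hyperboloid <x,x> = -1/K."""
    sk = math.sqrt(K)
    norm = math.sqrt(sum(a * a for a in v))
    if norm < 1e-12:
        return [1.0 / sk] + [0.0] * len(v)
    scale = math.sinh(sk * norm) / (sk * norm)
    return [math.cosh(sk * norm) / sk] + [scale * a for a in v]


def log_map_origin(x, K=1.0):
    """Inverse of exp_map_origin: hyperboloid point -> spatial tangent vector."""
    sk = math.sqrt(K)
    spatial = x[1:]
    norm = math.sqrt(sum(a * a for a in spatial))
    if norm < 1e-12:
        return [0.0] * len(spatial)
    dist = math.acosh(max(1.0, sk * x[0])) / sk  # clamp guards rounding error
    return [dist * a / norm for a in spatial]


def aggregate(neighbors, K=1.0):
    """Mean-aggregate neighbors in tangent space, then project back (no learned weights)."""
    tangent = [log_map_origin(x, K) for x in neighbors]
    mean = [sum(col) / len(tangent) for col in zip(*tangent)]
    return exp_map_origin(mean, K)


# Two toy neighbor embeddings on the unit-curvature hyperboloid.
neighbors = [exp_map_origin([0.2, 0.0]), exp_map_origin([0.0, 0.2])]
agg = aggregate(neighbors)
```

Working at the origin keeps the tangent vectors purely spatial, which is why the sketch can treat them as ordinary Euclidean vectors during aggregation.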

Hyperbolic Diagnosis

  • Two geometric diagnostic functions are defined:

    • (a) Möbius Subtraction (HiCD-sub) — models ability–difficulty difference in the Poincaré ball,
    • (b) Lorentz Distance (HiCD-dist) — computes hyperbolic distance directly in Lorentz space for stability.
  • The Fermi–Dirac decoder converts geometric distances into probabilistic predictions.
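
As a rough illustration of the distance-based variant, the sketch below computes the standard Lorentz geodesic distance and feeds it to a Fermi–Dirac decoder. The radius `r` and temperature `t` values, the `lift` helper, and the choice to decode the plain (not squared) distance are assumptions for this example; the source does not specify them.

```python
import math


def lorentz_inner(x, y):
    """Lorentzian inner product: -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def lift(spatial, K=1.0):
    """Hypothetical helper: place a spatial vector on the hyperboloid <x,x> = -1/K."""
    return [math.sqrt(1.0 / K + sum(a * a for a in spatial))] + list(spatial)


def lorentz_distance(x, y, K=1.0):
    """Geodesic distance in the Lorentz model with curvature -K."""
    arg = max(1.0, -K * lorentz_inner(x, y))  # clamp guards rounding error
    return math.acosh(arg) / math.sqrt(K)


def fermi_dirac(dist, r=2.0, t=1.0):
    """Fermi-Dirac decoder: small distances map to probabilities near 1."""
    return 1.0 / (math.exp((dist - r) / t) + 1.0)


# Toy student and exercise embeddings, invented for illustration.
student = lift([0.2, 0.1])
exercise = lift([0.5, -0.3])
prob = fermi_dirac(lorentz_distance(student, exercise))
```

The decoder is monotonically decreasing in distance and returns exactly 0.5 when the distance equals `r`, so `r` acts as a decision radius and `t` controls how sharp the transition is.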

Project Structure

EduCWO/
├── model/                      # Model implementations
│   ├── HiCD/                   # Main Hyperbolic Cognitive Diagnosis (HiCD) model
│   ├── manifolds/              # Lorentz and Poincaré ball manifolds
│   └── __init__.py
│
├── scripts/                    # Scripts for data processing, training, and evaluation
│   ├── data/                   # Dataset directory
│   │   └── rawdata/            # Original input datasets
│   │       ├── assist0910/     # ASSIST-0910 dataset (raw)
│   │       └── junyi/          # Junyi dataset (raw)
│   └── run.py
│
├── utils/                      # Helper functions and visualization tools
│
└── README.md                   # Project description and usage guide

Datasets

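| Dataset     | #Students | #Exercises | #Concepts | #Logs   | Sparsity |
| ----------- | --------- | ---------- | --------- | ------- | -------- |
| ASSIST-0910 | 2,493     | 17,676     | 123       | 267,423 | 99.39%   |
| Junyi       | 10,000    | 734        | 734       | 408,057 | 94.44%   |
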
  • ASSIST-0910: Response data from ASSISTments 2009–2010.
  • Junyi: Real-world learning logs from the Junyi Academy platform.
  • Each dataset includes a Q matrix specifying exercise–concept mappings.
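
The sparsity column is consistent with the usual definition over the student × exercise matrix, i.e. one minus the fraction of observed cells. That definition is an assumption here (the source does not state it), but it reproduces both reported values:

```python
def sparsity(n_students, n_exercises, n_logs):
    """Fraction of empty cells in the student x exercise interaction matrix."""
    return 1.0 - n_logs / (n_students * n_exercises)


# Counts taken from the dataset statistics above.
assist = sparsity(2_493, 17_676, 267_423)
junyi = sparsity(10_000, 734, 408_057)

print(f"ASSIST-0910: {assist:.2%}")  # 99.39%
print(f"Junyi:       {junyi:.2%}")   # 94.44%
```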

Evaluation Metrics

  • AUC: Area under the ROC curve
  • ACC: Accuracy with 0.5 probability threshold
  • RMSE: Root mean square error
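
For reference, the three metrics can be computed with a few lines of plain Python. This is a generic sketch, not the repository's evaluation code: the function names are illustrative, AUC is computed by direct pairwise comparison (fine for small examples, quadratic in general), and treating a probability of exactly 0.5 as a positive prediction is an assumption.

```python
import math


def accuracy(y_true, y_prob, threshold=0.5):
    """Share of predictions on the correct side of the threshold."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


def rmse(y_true, y_prob):
    """Root mean square error between labels and predicted probabilities."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true))


def auc(y_true, y_prob):
    """Probability that a random positive is ranked above a random negative."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and predictions, invented for illustration.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.4, 0.6, 0.7]
acc_value, auc_value, rmse_value = accuracy(y_true, y_prob), auc(y_true, y_prob), rmse(y_true, y_prob)
```

On this toy input, three of four predictions fall on the correct side of 0.5 and three of the four positive-negative pairs are ranked correctly, so ACC and AUC are both 0.75.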

Experimental Results

HiCD achieves consistent improvements over strong baselines on both datasets.

Highlights:

  • On ASSIST-0910, HiCD-dist achieves AUC = 0.7972 and ACC = 0.7531.

  • On Junyi, HiCD-dist attains AUC = 0.8207, outperforming SVGCD by 1.6%.

  • HiCD consistently improves diagnostic robustness under extreme sparsity.

Key Innovations

1. Graph Decomposition

  • HiCD decomposes the educational interaction graph into three semantically distinct subgraphs: students’ correct graph 𝐺𝑐𝑟, students’ incorrect graph 𝐺𝑖𝑐𝑟, and exercise–concept association graph 𝐺𝑒𝑐.

  • This decomposition isolates different behavioral and structural semantics: the correct and incorrect graphs capture distinct cognitive behaviors, while the exercise–concept graph reflects the underlying knowledge structure.

  • By learning from these specialized subgraphs, HiCD enhances representation diversity and captures fine-grained relational patterns that are often overlooked in unified graph formulations.

2. Curvature-Aware Mapping and Multi-Level Fusion

  • To effectively integrate heterogeneous semantics, HiCD embeds each subgraph into a hyperbolic space with an adaptively estimated curvature, aligning the geometric properties with the relational complexity of each subgraph. This curvature-aware mapping preserves hierarchical and non-Euclidean characteristics, improving the fidelity of embeddings under long-tail sparsity.

  • HiCD introduces a multi-level fusion mechanism that integrates representations from multiple subgraphs through three stages:

    • Behavioral Fusion — merges student behaviors from correct and incorrect response graphs.

    • Attention Fusion — balances the relative contributions of different subgraphs using attention weighting.

    • Knowledge-Aware Enhancement — aligns exercise representations with related concepts based on the Q-matrix.

  • This unified fusion framework ensures that the final embeddings jointly encode behavioral, structural, and conceptual information, yielding robust and interpretable student–exercise representations.
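
The attention stage of the fusion above can be sketched as a softmax-weighted sum of per-subgraph embeddings. This is only an illustration under stated assumptions: the attention scores are passed in directly rather than learned, the embeddings are treated as tangent-space (Euclidean) vectors, and `attention_fuse` is a hypothetical name, not a function from the repository.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(embeddings, scores):
    """Weighted sum of same-sized embeddings, with weights softmax(scores)."""
    weights = softmax(scores)
    dim = len(embeddings[0])
    return [sum(w * e[i] for w, e in zip(weights, embeddings)) for i in range(dim)]


# One exercise embedding per subgraph (toy values), with toy attention scores.
from_correct = [0.4, 0.1]
from_incorrect = [0.0, 0.3]
from_concept = [0.2, 0.2]
fused = attention_fuse([from_correct, from_incorrect, from_concept], [1.0, 0.5, 0.2])
```

With equal scores the fusion reduces to a plain average, so the attention weights only matter when the subgraphs disagree in how informative they are.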

3. Hyperbolic Diagnosis

  • HiCD defines the diagnostic process directly within hyperbolic space, where student and exercise embeddings are compared through geometric distances that naturally reflect their hierarchical relations.
  • Two complementary formulations — Möbius subtraction and Lorentz distance — are employed under different hyperbolic models to measure the relational strength between entities.
  • The resulting geometric distances are then decoded into cognitive response probabilities through the Fermi–Dirac function, achieving smooth probabilistic outputs while maintaining geometric interpretability.
  • This design allows HiCD to perform cognitive diagnosis in a geometrically consistent and probabilistically stable manner.
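
For the subtraction-based formulation, the standard Möbius addition on the Poincaré ball gives a natural notion of "difference": x ⊖ y is defined as x ⊕ (−y). The sketch below implements that textbook formula with a curvature parameter `c > 0` (ball radius 1/√c). How HiCD turns the resulting vector into a prediction is not described here, so the example stops at the difference itself; the embedding values are invented.

```python
import math


def mobius_add(x, y, c=1.0):
    """Mobius addition on the Poincare ball with curvature -c."""
    xy = sum(a * b for a, b in zip(x, y))
    x2 = sum(a * a for a in x)
    y2 = sum(b * b for b in y)
    denom = 1.0 + 2.0 * c * xy + c * c * x2 * y2
    coef_x = (1.0 + 2.0 * c * xy + c * y2) / denom
    coef_y = (1.0 - c * x2) / denom
    return [coef_x * a + coef_y * b for a, b in zip(x, y)]


def mobius_sub(x, y, c=1.0):
    """Mobius subtraction: x (+) (-y)."""
    return mobius_add(x, [-b for b in y], c)


# Toy ability and difficulty embeddings inside the unit ball.
ability = [0.3, 0.2]
difficulty = [0.1, 0.4]
diff = mobius_sub(ability, difficulty)
diff_norm = math.sqrt(sum(d * d for d in diff))
```

Unlike ordinary vector subtraction, the result is guaranteed to stay inside the ball, which keeps the ability–difficulty difference a valid hyperbolic point.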
