catgraph

catgraph provides weighted association networks for categorical data at two complementary levels.

Variable-level association network. Variables are nodes. Edges are weighted by one of four association paradigms — frequentist (classical and bias-corrected Cramér's V), information-theoretic (Normalised Mutual Information and Adjusted Mutual Information), or Bayesian (Dirichlet-smoothed Cramér's V) — selected via the method argument. The workflow supports structural exploration, edge pruning with multiple-testing adjustment, bootstrap confidence intervals, and descriptive network summaries such as centrality and community structure.
Modality-level co-association network. Modalities (factor levels) are nodes. Cross-variable edges are weighted by the association measure chosen at the variable level, with signed standardised Pearson residuals stored separately to indicate whether co-occurrence is above or below independence expectation. The workflow supports edge pruning, signed edge visualisation, and community detection over modalities.

The modality layer sits in the tradition of Multiple Correspondence Analysis and two-mode affiliation networks. It operates on pairwise associations and does not model higher-order interactions or conditional dependencies. It is a descriptive category co-association map, not a respondent-segmentation tool. For respondent segmentation use poLCA or FactoMineR::HCPC.
Modality gravity indices. A novel extension to standard graph centrality that incorporates the empirical prevalence of each modality. The Modality Gravity Index (MGI) and Orbital Score (OS) identify which modalities act as gravitational attractors (dominant, pulling rarer modalities toward them) and which are satellites (minority modalities orbiting more prevalent ones). This addresses a fundamental limitation of standard centrality indices, which treat all nodes as exchangeable regardless of their empirical frequency.

Installation

# install.packages("remotes")
remotes::install_github("AtinaKosta/catgraph")

What's new in 0.11.0

Four association paradigms are now supported as edge weights via the new method argument in catgraph(), build_graph(), assoc_similarity(), and build_modality_graph():

`method`	Paradigm	Description
`"cramers_v"`	Frequentist	Classical phi / Cramér's V (default)
`"cramers_v_corrected"`	Frequentist	Bias-corrected via Bergsma (2013)
`"nmi"`	Information-theoretic	Normalised Mutual Information
`"ami"`	Information-theoretic	Adjusted MI, corrects for chance
`"bayesian_cramers_v"`	Bayesian	Dirichlet-smoothed Cramér's V

Two new exported functions: nmi_assoc() and bayesian_cramers_v() for computing information-theoretic and Bayesian association on arbitrary vector pairs.
alpha argument controls the Dirichlet prior concentration for method = "bayesian_cramers_v" (default 0.5, Jeffreys prior).
Backward compatible: all existing code using catgraph(df), catgraph(df, corrected = TRUE), or build_graph(df) works without any changes.

Quick start

library(catgraph)
data(survey_health)

# --- Variable-level: choose your association paradigm ---

# Frequentist (default)
cg <- catgraph(survey_health, method = "cramers_v_corrected")

# Information-theoretic
cg_nmi <- catgraph(survey_health, method = "nmi")
cg_ami <- catgraph(survey_health, method = "ami")

# Bayesian
cg_bay <- catgraph(survey_health, method = "bayesian_cramers_v", alpha = 0.5)

# Prune and plot
cg_p <- prune_edges(cg, min_weight = 0.05, max_p = 0.05, p_adjust = "BH")
plot(cg_p)

# --- Modality-level: same method flows through ---
mg <- build_modality_graph(survey_health, method = "cramers_v_corrected")
mg <- prune_modality_edges(mg, min_weight = 0.10, max_p = 0.05)
mg <- cluster_modalities(mg)
plot(mg, color_by = "cluster", signed = TRUE)

# --- Gravity indices: which modalities are attractors vs satellites? ---
grav <- modality_gravity(mg)
print(grav)                       # role-grouped: ATTRACTORS / SATELLITES
summary(grav)                     # role counts, Spearman rho diagnostic
plot_gravity(mg)                  # 6-panel: traditional centrality vs MGI
plot_gravity_scatter(grav, mg)    # eigenvector vs dMGI contradiction plot

# --- Compare gravity profiles across subgroups ---
mg_f <- build_conditional_modality_graph(survey_health, given = list(sex = "female"))
mg_m <- build_conditional_modality_graph(survey_health, given = list(sex = "male"))
mg_f <- prune_modality_edges(mg_f, min_weight = 0.10, max_p = 0.05)
mg_m <- prune_modality_edges(mg_m, min_weight = 0.10, max_p = 0.05)
compare_gravity(list(female = mg_f, male = mg_m))

# --- Formal test: do two groups differ in overall association structure? ---
test_modality_graph_equality(mg_f, mg_m, n_perm = 500)

See vignette("introduction", package = "catgraph") for the full variable-level and modality-level workflow, and vignette("comparison", package = "catgraph") for a worked comparison of catgraph's modality layer with MCA, bipartite affiliation networks, and naive co-occurrence projection.

Scope — use `catgraph` for

Question	Tool
Which categorical variables co-vary pairwise?	`catgraph()`
Which association paradigm is most stable?	`catgraph(method = ...)`
Which category levels bundle together across variables?	`build_modality_graph()`
Which modalities are structurally dominant vs peripheral?	`modality_gravity()`
How do category-level association patterns differ across groups?	`compare_*_graphs()`
What's the unprojected respondent-modality incidence like?	`bipartite_modality_graph()`

Scope — do not use `catgraph` for

Causal inference or claims about direct effects.
Conditional-independence structure — use a graphical model instead (bnlearn, gRim).
Respondent segmentation or latent-class analysis — use poLCA or FactoMineR::HCPC().
Estimating conditional relationships between variables — edges are marginal and do not control for other variables.
Mixed continuous/categorical data — use linkspotter or corrgrapher instead.

Interpretation

All graphs produced by catgraph represent marginal association structure. Edge weights quantify pairwise dependence under the chosen paradigm and should be interpreted descriptively. High rank agreement across paradigms (Spearman ρ > 0.90) indicates robust structural conclusions; divergence reveals cases where the choice of metric matters substantively.

Gravity indices (MGI, OS) incorporate empirical modality prevalence. A positive dMGI indicates a modality that exerts net gravitational pull over less prevalent neighbours; a negative dMGI indicates a satellite modality being pulled toward more prevalent ones.

License

GPL (>= 3)

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
.github		.github
R		R
data		data
dev/tests_manual		dev/tests_manual
inst		inst
man		man
tests		tests
vignettes		vignettes
.RData		.RData
.Rbuildignore		.Rbuildignore
.Rhistory		.Rhistory
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
NAMESPACE		NAMESPACE
NEWS.md		NEWS.md
README.md		README.md
_pkgdown.yml		_pkgdown.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

catgraph

Installation

What's new in 0.11.0

Quick start

Scope — use `catgraph` for

Scope — do not use `catgraph` for

Interpretation

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

catgraph

Installation

What's new in 0.11.0

Quick start

Scope — use catgraph for

Scope — do not use catgraph for

Interpretation

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Scope — use `catgraph` for

Scope — do not use `catgraph` for

Packages