# ZNIB Gravity Model

In [1]:
from maa.config.utils import configure_logging, LogLevel
from maa.config.constants import CONFIGURATION_PATH
from maa.znib.znib import create_znib_gravity_models_from_config
import warnings
warnings.filterwarnings("ignore", module="statsmodels")
configure_logging(level=LogLevel.WARNING)
%load_ext autoreload
%autoreload 2

## Fit Gravity Models

In this step, two gravity models are fitted for each dataset—an **intra-institutional** and an **inter-institutional** model—using
the file paths and parameters defined in the configuration.

This step constructs the link and graph datasets for both the **unfiltered (`CoAffAll`)** and **filtered (`CoAffStable`)** versions,
following the workflow outlined in the `create_co_affiliation_networks.ipynb` notebook.
From the set of `n` unique affiliations, we generate the ZNIB model input by constructing all unique affiliation pairs, resulting in
a complete undirected network of `n × (n - 1) / 2` possible edges (self-pairs excluded).
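
As a quick check on the pair construction, the sketch below enumerates unordered pairs with `itertools.combinations`. A complete undirected network without self-loops has n(n - 1)/2 edges. The helper name `unique_pairs` and the toy labels are illustrative assumptions and are not part of the `maa` package. The observation counts reported in the model summaries in this notebook (1,027,461 and 96,141) match this formula for n = 1434 and n = 439.

```python
from itertools import combinations
from math import comb


def unique_pairs(affiliations):
    """Return all unordered pairs of distinct affiliations (no self-pairs)."""
    # Deduplicate and sort so each pair appears exactly once, in a stable order.
    return list(combinations(sorted(set(affiliations)), 2))


# Toy example with four hypothetical affiliation labels.
pairs = unique_pairs(["A", "B", "C", "D", "A"])
print(len(pairs))  # 4 * 3 / 2 = 6 pairs

# The number of pairs for n nodes is "n choose 2" = n * (n - 1) / 2.
print(comb(1434, 2))  # 1027461, the CoAffAll observation count
print(comb(439, 2))   # 96141, the CoAffStable observation count
```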

These affiliation pairs are then enriched with:

- **Intra-organizational dummy variables**
  (e.g., `uni_uni`)
- **Inter-organizational dummy variables**
  (e.g., `coll_med`)
- **Travel times** between affiliation locations
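
The dummy coding can be sketched as follows. This is a minimal illustration, not the `maa` implementation: the records, the helper `pair_dummy`, and the rule of joining the two alphabetically sorted type labels with an underscore are assumptions inferred from the variable names in the summaries (`uni_uni`, `coll_med`).

```python
from itertools import combinations

# Hypothetical affiliation records: (identifier, organisation type).
affiliations = [("A", "uni"), ("B", "uni"), ("C", "med"), ("D", "coll")]


def pair_dummy(type_a, type_b):
    """Name the dummy for an unordered pair of organisation types."""
    # Sorting makes the name independent of the order of the two endpoints.
    first, second = sorted((type_a, type_b))
    return f"{first}_{second}"


edges = []
for (id_a, type_a), (id_b, type_b) in combinations(affiliations, 2):
    edges.append(
        {
            "source": id_a,
            "target": id_b,
            "dummy": pair_dummy(type_a, type_b),
            # Same type on both ends: intra-organizational pair.
            "is_intra": type_a == type_b,
        }
    )

for edge in edges:
    print(edge)
```

Each edge carries exactly one pair-type label, which can then be expanded into 0/1 columns for the intra- and inter-organizational models.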

Finally, the enriched edge dataset is used to fit the **ZNIB gravity models**, separately for intra- and inter-institutional
relationships.
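
For orientation, the structure of a zero-inflated negative binomial gravity model can be written as below. This is a sketch under assumptions: the symbols are introduced here for illustration ($a_i$ for the article count of affiliation $i$, $t_{ij}$ for travel time, $D^{(k)}_{ij}$ for the pair-type dummies), and the mapping to the reported coefficients (`inflate_const`, `inflate_ln_prod_article_count`, `const`, `ln_prod_article_count`, `ln_duration`) is inferred from the summary tables rather than taken from the package code.

$$
\Pr(Y_{ij} = y) =
\begin{cases}
\pi_{ij} + (1 - \pi_{ij})\, f_{\mathrm{NB}}(0;\, \mu_{ij}, \alpha) & \text{if } y = 0 \\
(1 - \pi_{ij})\, f_{\mathrm{NB}}(y;\, \mu_{ij}, \alpha) & \text{if } y > 0
\end{cases}
$$

$$
\operatorname{logit}(\pi_{ij}) = \gamma_0 + \gamma_1 \ln(a_i a_j)
$$

$$
\ln \mu_{ij} = \beta_0 + \beta_1 \ln(a_i a_j) + \beta_2 \ln t_{ij} + \sum_k \delta_k D^{(k)}_{ij}
$$

Here $\pi_{ij}$ is the probability of a structural zero (the logit, or inflation, component), $\mu_{ij}$ is the mean of the negative binomial count component, and $\alpha$ is its dispersion parameter.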

In [2]:
models = create_znib_gravity_models_from_config(config_path=CONFIGURATION_PATH)

2025-12-17T13:15:30.775814Z Dropped 246 affiliation link(s) from the DataFrame with no coordinates.


### Results for the intra-institutional proximity models

Across both datasets, longer travel time reduces the expected number of co-affiliation links (`ln_duration` is about -0.57 in `CoAffAll` and -0.58 in `CoAffStable`), confirming that geographical proximity remains a strong driver of simultaneous affiliations.
The variable `ln_prod_article_count`, the log product of the two institutions’ publication counts, is consistently positive and significant, indicating that larger institutions are more likely to be connected through co-affiliations simply because they host more researchers.

Institutional pairing effects reveal additional structural patterns. University–university (`uni_uni`) links show strong, highly significant positive effects in both datasets, highlighting universities’ central role as co-affiliation hubs.
Positive coefficients for medical–medical (`med_med`) and college–college (`coll_coll`) pairs appear in the `CoAffAll` dataset but weaken or disappear in `CoAffStable`; `med_med`, for example, moves from 0.40 (p = 0.022) to an insignificant -0.28 (p = 0.454). This reduction likely reflects two factors: the stricter filtering in the stable dataset, which removes many short-term or low-activity affiliations that are common in medical and college settings, and the generally lower publication volume in these sectors.
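
To make the magnitudes concrete, the sketch below converts count-model coefficients into multiplicative effects on the expected count. It assumes a log link in the count component (so a dummy coefficient `b` corresponds to a rate ratio `exp(b)`) and that `ln_duration` is the natural log of travel time (so its coefficient acts as an elasticity). The coefficient values are copied from the `CoAffAll` intra-institutional summary; the function names are illustrative.

```python
import math

# Count-model coefficients from the CoAffAll intra-institutional summary.
coefs = {"ln_duration": -0.5690, "uni_uni": 2.5586, "med_med": 0.4016}


def rate_ratio(coef):
    """Multiplicative change in the expected count when a dummy goes 0 -> 1."""
    return math.exp(coef)


def scaling_effect(elasticity, factor):
    """Multiplicative change in the expected count when a logged regressor's
    underlying value is multiplied by `factor`."""
    return factor ** elasticity


doubling = scaling_effect(coefs["ln_duration"], 2.0)
uni_ratio = rate_ratio(coefs["uni_uni"])
med_ratio = rate_ratio(coefs["med_med"])

print(f"Doubling travel time multiplies the expected count by {doubling:.2f}")
print(f"uni_uni pairs: expected count x {uni_ratio:.1f} relative to the baseline")
print(f"med_med pairs: expected count x {med_ratio:.2f} relative to the baseline")
```

These ratios apply to the mean of the count component only; the overall expected count also depends on the structural-zero probability from the inflation component.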

In [3]:
models.all.znib_intra_model.summary

| | | | |
|---|---|---|---|
| Dep. Variable: | y | No. Observations: | 1027461.0 |
| Model: | ZeroInflatedNegativeBinomialP | Df Residuals: | 1027451.0 |
| Method: | MLE | Df Model: | 9.0 |
| Date: | Wed, 17 Dec 2025 | Pseudo R-squ.: | 0.2369 |
| Time: | 14:36:12 | Log-Likelihood: | -22388.0 |
| converged: | True | LL-Null: | -29339.0 |
| Covariance Type: | HC0 | LLR p-value: | 0.0 |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| inflate_const | 5.5929 | 0.157 | 35.642 | 0.000 | 5.285 | 5.900 |
| inflate_ln_prod_article_count | -0.3275 | 0.011 | -29.835 | 0.000 | -0.349 | -0.306 |
| const | 1.0205 | 0.568 | 1.797 | 0.072 | -0.093 | 2.134 |
| ln_prod_article_count | 0.2514 | 0.023 | 10.941 | 0.000 | 0.206 | 0.296 |
| ln_duration | -0.5690 | 0.053 | -10.765 | 0.000 | -0.673 | -0.465 |
| uni_uni | 2.5586 | 0.159 | 16.117 | 0.000 | 2.247 | 2.870 |
| res_res | -0.0635 | 0.245 | -0.260 | 0.795 | -0.543 | 0.416 |
| med_med | 0.4016 | 0.176 | 2.284 | 0.022 | 0.057 | 0.746 |
| comp_comp | -0.4980 | 0.178 | -2.805 | 0.005 | -0.846 | -0.150 |


In [4]:
models.stable.znib_intra_model.summary

| | | | |
|---|---|---|---|
| Dep. Variable: | y | No. Observations: | 96141.0 |
| Model: | ZeroInflatedNegativeBinomialP | Df Residuals: | 96131.0 |
| Method: | MLE | Df Model: | 9.0 |
| Date: | Wed, 17 Dec 2025 | Pseudo R-squ.: | 0.1631 |
| Time: | 14:36:13 | Log-Likelihood: | -7293.6 |
| converged: | True | LL-Null: | -8715.3 |
| Covariance Type: | HC0 | LLR p-value: | 0.0 |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| inflate_const | 6.1487 | 0.222 | 27.673 | 0.000 | 5.713 | 6.584 |
| inflate_ln_prod_article_count | -0.3513 | 0.014 | -24.846 | 0.000 | -0.379 | -0.324 |
| const | 2.8866 | 0.767 | 3.765 | 0.000 | 1.384 | 4.389 |
| ln_prod_article_count | 0.2280 | 0.031 | 7.360 | 0.000 | 0.167 | 0.289 |
| ln_duration | -0.5801 | 0.079 | -7.307 | 0.000 | -0.736 | -0.424 |
| uni_uni | 1.6615 | 0.201 | 8.252 | 0.000 | 1.267 | 2.056 |
| res_res | -0.6412 | 0.322 | -1.989 | 0.047 | -1.273 | -0.009 |
| med_med | -0.2841 | 0.379 | -0.749 | 0.454 | -1.027 | 0.459 |
| comp_comp | -1.3610 | 0.451 | -3.015 | 0.003 | -2.246 | -0.476 |


### Results for the inter-institutional proximity models

Two main patterns shape the count model results.
(1) Distance decay: higher travel time strongly reduces expected co-affiliation counts (`ln_duration` is about -0.81 in `CoAffAll` and -0.79 in `CoAffStable`).
(2) Productivity effect: joint publication output (`ln_prod_article_count`) is positive and highly significant in `CoAffAll` (0.59); in `CoAffStable` it remains significant but roughly halves (0.31, p = 0.002), suggesting that productivity drives short-term or project-based co-affiliations more strongly than persistent ones.

Institutional pairing effects show a clear hierarchy:

- univ–resi is the only cross-type pairing with a positive, significant effect.
- Most other cross-type pairs (gov, npo, med) are strongly negative, especially in the filtered network, indicating that these ties are rare and typically short-lived.
- univ–coll is near zero in the full dataset and mildly negative in the filtered one.

Overall, filtering for temporal persistence removes short-term ties and sharpens structural differences across sectors: cross-type coefficients become more negative, reflecting the scarcity of durable inter-sector co-affiliations. The logit component shows that higher productivity reduces the probability of structural zeros, and the positive constant highlights the overall sparsity of inter-institutional co-affiliations.
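
The logit component can be illustrated with a short calculation. The sketch below assumes the standard logistic form for the inflation part and plugs in the two inflation coefficients from the `CoAffAll` inter-institutional summary; the evaluation points for `ln_prod_article_count` are arbitrary illustrative values, not observed data.

```python
import math

# Inflation (logit) coefficients from the CoAffAll inter-institutional summary.
INFLATE_CONST = 4.7356
INFLATE_LN_PROD = -1.2908


def structural_zero_probability(ln_prod_article_count):
    """Probability that a pair is a structural zero under the logit component."""
    z = INFLATE_CONST + INFLATE_LN_PROD * ln_prod_article_count
    return 1.0 / (1.0 + math.exp(-z))


# Arbitrary illustrative values of the logged productivity term.
for x in (0.0, 2.0, 4.0, 6.0):
    print(f"ln_prod_article_count = {x:.1f} -> P(structural zero) = "
          f"{structural_zero_probability(x):.3f}")
```

The large positive constant puts the baseline probability of a structural zero close to 1, and the negative slope lowers it as joint productivity grows.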

In [5]:
models.all.znib_inter_model.summary

| | | | |
|---|---|---|---|
| Dep. Variable: | y | No. Observations: | 1027461.0 |
| Model: | ZeroInflatedNegativeBinomialP | Df Residuals: | 1027437.0 |
| Method: | MLE | Df Model: | 23.0 |
| Date: | Wed, 17 Dec 2025 | Pseudo R-squ.: | 0.2535 |
| Time: | 14:36:13 | Log-Likelihood: | -21903.0 |
| converged: | True | LL-Null: | -29339.0 |
| Covariance Type: | HC0 | LLR p-value: | 0.0 |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| inflate_const | 4.7356 | 0.118 | 40.167 | 0.000 | 4.505 | 4.967 |
| inflate_ln_prod_article_count | -1.2908 | 0.038 | -34.278 | 0.000 | -1.365 | -1.217 |
| const | -1.1633 | 0.207 | -5.620 | 0.000 | -1.569 | -0.758 |
| ln_prod_article_count | 0.5925 | 0.057 | 10.321 | 0.000 | 0.480 | 0.705 |
| ln_duration | -0.8087 | 0.044 | -18.476 | 0.000 | -0.895 | -0.723 |
| coll_comp | -0.9697 | 0.539 | -1.800 | 0.072 | -2.026 | 0.086 |
| coll_gov | -2.7638 | 0.493 | -5.612 | 0.000 | -3.729 | -1.799 |
| coll_med | -2.5623 | 0.225 | -11.406 | 0.000 | -3.003 | -2.122 |
| coll_npo | -1.5974 | 0.504 | -3.166 | 0.002 | -2.586 | -0.609 |


In [6]:
models.stable.znib_inter_model.summary

| | | | |
|---|---|---|---|
| Dep. Variable: | y | No. Observations: | 96141.0 |
| Model: | ZeroInflatedNegativeBinomialP | Df Residuals: | 96117.0 |
| Method: | MLE | Df Model: | 23.0 |
| Date: | Wed, 17 Dec 2025 | Pseudo R-squ.: | 0.1777 |
| Time: | 14:36:14 | Log-Likelihood: | -7166.7 |
| converged: | True | LL-Null: | -8715.3 |
| Covariance Type: | HC0 | LLR p-value: | 0.0 |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| inflate_const | 4.2986 | 0.199 | 21.612 | 0.000 | 3.909 | 4.688 |
| inflate_ln_prod_article_count | -1.4067 | 0.057 | -24.641 | 0.000 | -1.519 | -1.295 |
| const | 1.3208 | 0.332 | 3.982 | 0.000 | 0.671 | 1.971 |
| ln_prod_article_count | 0.3063 | 0.099 | 3.089 | 0.002 | 0.112 | 0.501 |
| ln_duration | -0.7942 | 0.079 | -10.047 | 0.000 | -0.949 | -0.639 |
| coll_comp | -0.8437 | 0.832 | -1.014 | 0.310 | -2.474 | 0.787 |
| coll_gov | -3.1286 | 0.766 | -4.085 | 0.000 | -4.630 | -1.627 |
| coll_med | -8.2206 | 0.189 | -43.473 | 0.000 | -8.591 | -7.850 |
| coll_npo | -2.7275 | 1.319 | -2.069 | 0.039 | -5.312 | -0.143 |
