# ZNIB Gravity Model

In [1]:
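from maa.config.utils import configure_logging, LogLevel
from maa.config.constants import CONFIGURATION_PATH
from maa.znib.znib import create_znib_gravity_models_from_config
import warnings

# Suppress warnings raised from statsmodels modules during model fitting.
warnings.filterwarnings("ignore", module="statsmodels")
# Configure project logging at the WARNING level.
configure_logging(level=LogLevel.WARNING)

# IPython extension: reload edited modules automatically before each cell runs.
%load_ext autoreload
%autoreload 2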

## Fit Gravity Models

In this step, two gravity models are fitted for each dataset, an **intra-institutional** and an **inter-institutional** model, using
the file paths and parameters defined in the configuration.

The function constructs the link and graph datasets for both the **unfiltered (`CoAffAll`)** and **filtered (`CoAffStable`)** versions,
following the workflow outlined in the `create_co_affiliation_networks.ipynb` notebook.
From the set of n unique affiliations, we generate the ZNIB model input by constructing all unique affiliation pairs, resulting in
a complete undirected network of `n × (n − 1) / 2` possible edges.
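A minimal sketch of the pair construction, assuming affiliations are identified by hashable IDs (the IDs below are hypothetical): `itertools.combinations` yields each unordered pair of distinct affiliations exactly once, giving n(n − 1)/2 pairs. Under this formula, the 1,079,715 observations reported in the model summaries correspond to n = 1,470 affiliations.

```python
from itertools import combinations


def unique_pairs(affiliation_ids):
    """Return every unordered pair of distinct affiliations exactly once."""
    ids = sorted(set(affiliation_ids))  # deduplicate and fix a stable order
    return list(combinations(ids, 2))


def n_unique_pairs(n):
    """Number of unordered pairs of n distinct items: n(n - 1) / 2."""
    return n * (n - 1) // 2


# Hypothetical affiliation IDs, for illustration only.
pairs = unique_pairs(["A", "B", "C", "D"])
print(pairs)  # [('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')]
print(n_unique_pairs(1470))  # 1079715
```

Self-pairs are excluded because an affiliation cannot be co-affiliated with itself, which is why the count is n(n − 1)/2 rather than n²/2.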

These affiliation pairs are then enriched with:

- **Intra-organizational dummy variables**
  (e.g., `uni_uni`)
- **Inter-organizational dummy variables**
  (e.g., `univ_resi`)
- **Travel times** between affiliation locations (entered in logs as `ln_duration`)
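The dummy construction can be sketched as follows. This is a minimal illustration, not the package's implementation: the seven type labels are taken from the coefficient names in the model summaries (`uni`, `res`, `med`, `comp`, `coll`, `gov`, `npo`), and the rule that a cross-type label sorts the two types alphabetically (so a university–research pair would be `res_uni`) is an assumption consistent with the names shown, such as `coll_comp` and `coll_gov`. With seven types this gives 7 intra-type and 21 cross-type dummies, which is consistent with the reported Df Model values of 9 and 23 once the two continuous regressors (`ln_prod_article_count`, `ln_duration`) are added.

```python
from itertools import combinations

# Institution-type labels as they appear in the coefficient names of the
# model summaries; the mapping from affiliations to types is hypothetical.
TYPES = ["coll", "comp", "gov", "med", "npo", "res", "uni"]


def pair_label(type_a, type_b):
    """Order-independent label for a pair of types, e.g. ('uni', 'coll') -> 'coll_uni'."""
    first, second = sorted((type_a, type_b))
    return f"{first}_{second}"


INTRA_COLUMNS = [f"{t}_{t}" for t in TYPES]  # 7 same-type dummies
INTER_COLUMNS = [pair_label(a, b) for a, b in combinations(TYPES, 2)]  # 21 cross-type dummies


def pair_dummies(type_a, type_b):
    """Return the intra and inter dummy values (0/1) for one affiliation pair."""
    label = pair_label(type_a, type_b)
    intra = {col: int(col == label) for col in INTRA_COLUMNS}
    inter = {col: int(col == label) for col in INTER_COLUMNS}
    return intra, inter


intra, inter = pair_dummies("uni", "uni")
print(intra["uni_uni"], sum(inter.values()))  # 1 0
intra, inter = pair_dummies("uni", "coll")
print(sum(intra.values()), inter["coll_uni"])  # 0 1
```

In each model, the pairs with all dummies equal to zero form the reference category: cross-type pairs in the intra-institutional model and same-type pairs in the inter-institutional model.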

Finally, the enriched edge dataset is used to fit the **ZNIB (zero-inflated negative binomial) gravity models**, separately for intra- and inter-institutional
relationships.
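The structure of the fitted model can be sketched as follows, assuming the NB2 variant (p = 2, the default of statsmodels' `ZeroInflatedNegativeBinomialP`) with a logit inflation link. The count part is a log-linear gravity equation in covariates such as `ln_prod_article_count` and `ln_duration`, and the inflation part gives the probability π that a pair is a structural zero. This is an illustrative stand-alone implementation, not the code behind `create_znib_gravity_models_from_config`; the dispersion parameter α is not shown in the truncated summaries, so the values used here are hypothetical.

```python
import math


def nb2_pmf(y, mu, alpha):
    """NB2 probability of count y, with mean mu and variance mu + alpha * mu**2."""
    r = 1.0 / alpha  # NB size (dispersion) parameter
    log_p = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


def zinb_pmf(y, mu, alpha, pi):
    """Zero-inflated NB2: a structural zero with probability pi, otherwise an NB2 draw."""
    p = (1.0 - pi) * nb2_pmf(y, mu, alpha)
    return pi + p if y == 0 else p


def gravity_mu(const, coefs, covariates):
    """Count-part mean: log(mu) = const + sum(coef * covariate)."""
    return math.exp(const + sum(c * x for c, x in zip(coefs, covariates)))


def logit_pi(const, coefs, covariates):
    """Inflation probability: logit(pi) = const + sum(coef * covariate)."""
    z = const + sum(c * x for c, x in zip(coefs, covariates))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical values for illustration only.
mu, alpha, pi = 2.0, 0.5, 0.3
print(round(zinb_pmf(0, mu, alpha, pi), 3))  # 0.475
```

The expected count under this model is (1 − π)μ, so covariates that enter only the count part scale the overall mean directly, while covariates in the inflation part shift the share of structural zeros.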

In [2]:
models = create_znib_gravity_models_from_config(config_path=CONFIGURATION_PATH)

[2m2025-11-27T17:35:51.547824Z[0m [1mDropped 249 affiliation link(s) from the DataFrame with no coordinates.[0m


### Results for the intra-institutional proximity models

Across both datasets, longer travel time (`ln_duration`) reduces the expected number of co-affiliation links, consistent with geographical proximity remaining a strong driver of simultaneous affiliations.
The variable `ln_prod_article_count`, the log product of the two institutions' publication counts, is consistently positive and significant, indicating that larger institutions are more likely to be connected through co-affiliations simply because they host more researchers.
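Because travel time enters in logs, its coefficient acts as an elasticity of the count-model mean: scaling travel time by a factor k multiplies μ by k^β. A small sketch using the `ln_duration` coefficients from the intra-institutional summaries below (the dictionary and helper names are illustrative) shows that doubling travel time lowers the expected count by roughly 32% in CoAffAll and 42% in CoAffStable, other covariates held fixed. Since `ln_duration` does not appear in the inflation component of the reported models, the same factor applies to the overall expected count (1 − π)μ.

```python
# Coefficients on ln_duration copied from the intra-institutional summaries.
LN_DURATION_COEF = {"CoAffAll": -0.5630, "CoAffStable": -0.7933}


def mean_ratio(beta, factor):
    """Multiplicative change in the count-model mean when travel time is scaled by `factor`.

    With log(mu) = ... + beta * ln(duration), scaling duration by k scales mu by k**beta.
    """
    return factor ** beta


for name, beta in LN_DURATION_COEF.items():
    print(name, round(mean_ratio(beta, 2.0), 3))
# CoAffAll 0.677
# CoAffStable 0.577
```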

Institutional pairing effects reveal additional structural patterns. University–university (`uni_uni`) links show strong, highly significant positive effects in both datasets, highlighting universities' central role as co-affiliation hubs.
Positive coefficients for college–college (`coll_coll`) pairs appear in the CoAffAll dataset but weaken or disappear in CoAffStable. This reduction likely reflects the stricter filtering of the stable dataset, which removes many short-term or low-activity affiliations, and the generally lower publication volume in the college sector. Medical–medical (`med_med`) pairs do not follow this pattern in the reported estimates: the coefficient is positive but only marginally significant in CoAffAll (0.326, p = 0.065) and larger and significant in CoAffStable (0.487, p = 0.004).
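Dummy coefficients can be read as incidence rate ratios: in the count component, a coefficient β multiplies the expected count by exp(β) relative to pairs for which every included intra-type dummy is zero, other covariates held fixed. A minimal sketch with the `uni_uni` and `med_med` estimates from the summaries below (helper names are illustrative) puts the university–university effect at roughly 30 times the reference count in CoAffAll and about 21 times in CoAffStable.

```python
import math

# (CoAffAll, CoAffStable) count-model coefficients from the intra-institutional summaries.
INTRA_COEFS = {
    "uni_uni": (3.4144, 3.0349),
    "med_med": (0.3264, 0.4868),
}


def incidence_rate_ratio(beta):
    """Multiplicative effect of switching a dummy from 0 to 1 on the count-model mean."""
    return math.exp(beta)


for name, (beta_all, beta_stable) in INTRA_COEFS.items():
    print(name, round(incidence_rate_ratio(beta_all), 2), round(incidence_rate_ratio(beta_stable), 2))
# uni_uni 30.4 20.8
# med_med 1.39 1.63
```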

In [3]:
models.all.znib_intra_model.summary

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable: | y | No. Observations: | 1079715 |
| Model: | ZeroInflatedNegativeBinomialP | Df Residuals: | 1079705 |
| Method: | MLE | Df Model: | 9 |
| Date: | Thu, 27 Nov 2025 | Pseudo R-squ.: | 0.2389 |
| Time: | 19:08:44 | Log-Likelihood: | -22816.0 |
| converged: | True | LL-Null: | -29977.0 |
| Covariance Type: | HC0 | LLR p-value: | 0.0 |

|  | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| inflate_const | 5.6067 | 0.158 | 35.440 | 0.000 | 5.297 | 5.917 |
| inflate_ln_prod_article_count | -0.3294 | 0.011 | -29.757 | 0.000 | -0.351 | -0.308 |
| const | 0.8702 | 0.574 | 1.517 | 0.129 | -0.254 | 1.995 |
| ln_prod_article_count | 0.2536 | 0.023 | 11.103 | 0.000 | 0.209 | 0.298 |
| ln_duration | -0.5630 | 0.054 | -10.434 | 0.000 | -0.669 | -0.457 |
| uni_uni | 3.4144 | 0.145 | 23.600 | 0.000 | 3.131 | 3.698 |
| res_res | 0.6221 | 0.253 | 2.455 | 0.014 | 0.125 | 1.119 |
| med_med | 0.3264 | 0.177 | 1.843 | 0.065 | -0.021 | 0.673 |
| comp_comp | -0.2401 | 0.167 | -1.437 | 0.151 | -0.568 | 0.087 |


In [4]:
models.stable.znib_intra_model.summary

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable: | y | No. Observations: | 1079715 |
| Model: | ZeroInflatedNegativeBinomialP | Df Residuals: | 1079705 |
| Method: | MLE | Df Model: | 9 |
| Date: | Thu, 27 Nov 2025 | Pseudo R-squ.: | 0.2388 |
| Time: | 19:08:44 | Log-Likelihood: | -22819.0 |
| converged: | True | LL-Null: | -29977.0 |
| Covariance Type: | HC0 | LLR p-value: | 0.0 |

|  | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| inflate_const | 4.6957 | 0.116 | 40.635 | 0.000 | 4.469 | 4.922 |
| inflate_ln_prod_article_count | -1.2133 | 0.041 | -29.452 | 0.000 | -1.294 | -1.133 |
| const | -2.7108 | 0.175 | -15.462 | 0.000 | -3.054 | -2.367 |
| ln_prod_article_count | 0.9154 | 0.086 | 10.690 | 0.000 | 0.748 | 1.083 |
| ln_duration | -0.7933 | 0.078 | -10.127 | 0.000 | -0.947 | -0.640 |
| uni_uni | 3.0349 | 0.154 | 19.701 | 0.000 | 2.733 | 3.337 |
| res_res | 0.5429 | 0.257 | 2.116 | 0.034 | 0.040 | 1.046 |
| med_med | 0.4868 | 0.168 | 2.893 | 0.004 | 0.157 | 0.817 |
| comp_comp | -0.6998 | 0.190 | -3.685 | 0.000 | -1.072 | -0.328 |


### Results for the inter-institutional proximity models

Two main patterns shape the count model results:

1. Distance decay: higher travel time strongly reduces expected co-affiliation counts.
2. Productivity effect: joint publication strength (`ln_prod_article_count`) is positive and highly significant in CoAffAll, but becomes small and insignificant in CoAffStable, suggesting that productivity drives short-term or project-based co-affiliations but not persistent ones.

Institutional pairing effects show a clear hierarchy:

- univ–resi is the only cross-type pairing with a positive, significant effect.
- Most other cross-type pairs (gov, npo, med) are strongly negative, especially in the filtered network, indicating that these ties are rare and typically short-lived.
- univ–coll is near zero in the full dataset and mildly negative in the filtered one.

Overall, filtering for temporal persistence removes short-term ties and sharpens structural differences across sectors: cross-type coefficients become more negative, reflecting the scarcity of durable inter-sector co-affiliations. In the logit (inflation) component, the negative coefficient on `inflate_ln_prod_article_count` shows that higher productivity reduces the probability of structural zeros, while the large positive `inflate_const` reflects the overall sparsity of inter-institutional co-affiliations.
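In the inflation component, π = logistic(γ₀ + γ₁ · ln_prod_article_count). A minimal sketch with the inter-institutional CoAffAll estimates from the summary below (γ₀ = 4.7823, γ₁ = −1.3410; constant and helper names are illustrative) shows how quickly the structural-zero probability falls with productivity: it is about 0.99 at ln_prod_article_count = 0 and reaches 0.5 at about 3.57.

```python
import math

# Inflation-component coefficients copied from the inter-institutional CoAffAll summary.
INFLATE_CONST = 4.7823
INFLATE_LN_PROD = -1.3410


def structural_zero_prob(ln_prod_article_count):
    """Logit inflation: probability that a pair is a structural zero."""
    z = INFLATE_CONST + INFLATE_LN_PROD * ln_prod_article_count
    return 1.0 / (1.0 + math.exp(-z))


# Value of ln_prod_article_count at which a structural zero has probability 0.5.
threshold = -INFLATE_CONST / INFLATE_LN_PROD

for x in (0.0, threshold, 7.0):
    print(round(x, 2), round(structural_zero_prob(x), 3))
# 0.0 0.992
# 3.57 0.5
# 7.0 0.01
```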

In [5]:
models.all.znib_inter_model.summary

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable: | y | No. Observations: | 1079715 |
| Model: | ZeroInflatedNegativeBinomialP | Df Residuals: | 1079691 |
| Method: | MLE | Df Model: | 23 |
| Date: | Thu, 27 Nov 2025 | Pseudo R-squ.: | 0.2587 |
| Time: | 19:08:45 | Log-Likelihood: | -22222.0 |
| converged: | True | LL-Null: | -29977.0 |
| Covariance Type: | HC0 | LLR p-value: | 0.0 |

|  | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| inflate_const | 4.7823 | 0.123 | 38.945 | 0.000 | 4.542 | 5.023 |
| inflate_ln_prod_article_count | -1.3410 | 0.039 | -34.287 | 0.000 | -1.418 | -1.264 |
| const | -0.8814 | 0.236 | -3.737 | 0.000 | -1.344 | -0.419 |
| ln_prod_article_count | 0.4656 | 0.061 | 7.584 | 0.000 | 0.345 | 0.586 |
| ln_duration | -0.8836 | 0.046 | -19.417 | 0.000 | -0.973 | -0.794 |
| coll_comp | -1.4036 | 0.447 | -3.141 | 0.002 | -2.279 | -0.528 |
| coll_gov | -2.8036 | 0.404 | -6.940 | 0.000 | -3.595 | -2.012 |
| coll_med | -2.8581 | 0.258 | -11.061 | 0.000 | -3.365 | -2.352 |
| coll_npo | -1.9058 | 0.480 | -3.974 | 0.000 | -2.846 | -0.966 |


In [6]:
models.stable.znib_inter_model.summary

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable: | y | No. Observations: | 1079715 |
| Model: | ZeroInflatedNegativeBinomialP | Df Residuals: | 1079691 |
| Method: | MLE | Df Model: | 23 |
| Date: | Thu, 27 Nov 2025 | Pseudo R-squ.: | 0.2587 |
| Time: | 19:08:45 | Log-Likelihood: | -22222.0 |
| converged: | True | LL-Null: | -29977.0 |
| Covariance Type: | HC0 | LLR p-value: | 0.0 |

|  | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| inflate_const | 4.7823 | 0.123 | 38.945 | 0.000 | 4.542 | 5.023 |
| inflate_ln_prod_article_count | -1.3410 | 0.039 | -34.287 | 0.000 | -1.418 | -1.264 |
| const | -0.8814 | 0.236 | -3.737 | 0.000 | -1.344 | -0.419 |
| ln_prod_article_count | 0.4656 | 0.061 | 7.584 | 0.000 | 0.345 | 0.586 |
| ln_duration | -0.8836 | 0.046 | -19.417 | 0.000 | -0.973 | -0.794 |
| coll_comp | -1.4036 | 0.447 | -3.141 | 0.002 | -2.279 | -0.528 |
| coll_gov | -2.8036 | 0.404 | -6.940 | 0.000 | -3.595 | -2.012 |
| coll_med | -2.8581 | 0.258 | -11.061 | 0.000 | -3.365 | -2.352 |
| coll_npo | -1.9058 | 0.480 | -3.974 | 0.000 | -2.846 | -0.966 |
