# Tables
This notebook contains the code that generates the tables in the paper from the data.

In [1]:
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings("ignore")
import session_info

session_info.show(html=False)

-----
numpy               1.26.4
pandas              1.5.3
session_info        1.0.0
-----
IPython             8.18.1
jupyter_client      8.6.3
jupyter_core        5.8.1
jupyterlab          4.4.5
notebook            7.4.5
-----
Python 3.9.5 | packaged by conda-forge | (default, Jun 19 2021, 00:27:35) [Clang 11.1.0 ]
macOS-15.6.1-x86_64-i386-64bit
-----
Session information updated at 2025-09-19 10:17


## Table 1.
Distribution of HLA and tri-SNP haplotypes among TEDDY samples.


### Import metadata
Metadata for Table 1 contains all samples for which both HLA type and tri-SNP genotype were available (n=7759).

In [2]:
meta = pd.read_csv("../data/meta_hla.csv")
meta.head()

Unnamed: 0,mp257_maskid,hla_type,tri
0,996512,DR4/DR4,010/010
1,474979,DR4/DR8,000/010
2,581412,DR4/DR4,010/010
3,864158,DR3/DR4,010/101
4,669152,DR3/DR4,010/101


### HLA type distribution among all samples

Convert the HLA type and tri-SNP genotype columns to ordered categoricals so that rows and columns appear in a consistent order in the tables.

In [3]:
hla_order = ['DR3/DR3', 'DR3/DR4', 'DR4/DR4', 'DR4/DR8', 'DR1/DR4',
             'DR4/DR13', 'DR3/DR9', 'DR4/DR9', 'DR4/DR4*030X/020X',
             'DR4/DR4*030X/0304']

meta["hla_type"] = pd.Categorical(
    meta["hla_type"], categories=hla_order, ordered=True)

In [4]:
tri_order = ['010/010', '010/101','101/101',
             '000/010', '001/010', '000/101',
             '000/000', '000/001', '001/101',
             '011/101', '010/011']

meta["tri"] = pd.Categorical(
    meta["tri"], categories=tri_order,  ordered=True)
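
The effect of an ordered categorical on table layout can be illustrated with a small sketch. The toy data frame and its `sample_id` column are invented for illustration and are not TEDDY data; `observed=False` is passed explicitly only so that the result does not depend on the default of the installed pandas version.

```python
import pandas as pd

# Hypothetical toy data, not taken from the TEDDY metadata.
toy = pd.DataFrame({
    "sample_id": [1, 2, 3, 4],
    "hla_type": ["DR4/DR8", "DR3/DR3", "DR4/DR4", "DR3/DR3"],
    "tri": ["000/010", "101/101", "010/010", "101/101"],
})

# A deliberately non-alphabetical order, to show that the
# categorical (not string sorting) decides the row order.
order = ["DR4/DR8", "DR3/DR3", "DR4/DR4"]
toy["hla_type"] = pd.Categorical(
    toy["hla_type"], categories=order, ordered=True)

counts = toy.pivot_table(
    index="hla_type", columns="tri", values="sample_id",
    aggfunc="count", observed=False).fillna(0).astype(int)

# Rows follow the category order given above.
print(list(counts.index))
```

Without the conversion, the same pivot table would list the HLA types in alphabetical order.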


Create the distribution table. This is Table 1 in the main text.

In [5]:
dist = meta.pivot_table(
    index="hla_type", columns="tri", values="mp257_maskid",
    aggfunc="count").fillna(0).astype(int)
dist

hla_type \ tri,010/010,010/101,101/101,000/010,001/010,000/101,000/000,000/001,001/101,011/101,010/011
DR3/DR3,62,348,1177,13,2,11,0,0,2,1,0
DR3/DR4,408,2550,3,19,8,30,0,1,1,1,1
DR4/DR4,1480,5,3,39,3,1,0,0,0,0,0
DR4/DR8,14,0,0,1297,5,0,17,1,0,0,0
DR1/DR4,0,0,0,15,143,0,0,2,0,0,0
DR4/DR13,0,0,0,0,56,0,0,1,0,0,0
DR3/DR9,2,16,0,0,0,0,0,0,0,0,0
DR4/DR9,13,1,0,0,0,0,0,0,0,0,0
DR4/DR4*030X/020X,4,0,0,0,0,0,0,0,0,0,0
DR4/DR4*030X/0304,3,0,0,0,0,0,0,0,0,0,0


Show the count for each tri-SNP genotype, with its percentage of all samples. This is the "All" row of Table 1 in the published paper.

In [6]:
sum_row = dist.sum().astype(str) + " (" + (round(dist.sum().div(dist.sum().sum()) * 100, 1)).astype(str) + "%)"
sum_row

tri
010/010    1986 (25.6%)
010/101    2920 (37.6%)
101/101    1183 (15.2%)
000/010    1383 (17.8%)
001/010      217 (2.8%)
000/101       42 (0.5%)
000/000       17 (0.2%)
000/001        5 (0.1%)
001/101        3 (0.0%)
011/101        2 (0.0%)
010/011        1 (0.0%)
dtype: object

Show the count for each HLA type, with its percentage of all samples. This is the "All" column of Table 1 in the published paper.

In [7]:
sum_col = dist.sum(axis=1).astype(str) + " (" + (
    round(dist.sum(axis=1).div(dist.sum(axis=1).sum(), axis=0) * 100, 1)).astype(str) + "%)"
sum_col

hla_type
DR3/DR3              1616 (20.8%)
DR3/DR4              3022 (38.9%)
DR4/DR4              1531 (19.7%)
DR4/DR8              1334 (17.2%)
DR1/DR4                160 (2.1%)
DR4/DR13                57 (0.7%)
DR3/DR9                 18 (0.2%)
DR4/DR9                 14 (0.2%)
DR4/DR4*030X/020X        4 (0.1%)
DR4/DR4*030X/0304        3 (0.0%)
dtype: object
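
The two margin calculations above share the same "count (percent)" formatting. As a minimal sketch, it could be factored into a helper; the function name `count_with_pct` is hypothetical and not part of the notebook. The example reuses the tri-SNP totals printed above to check that the helper reproduces them.

```python
import pandas as pd


def count_with_pct(counts, decimals=1):
    """Format each count as 'n (p%)', where p is its share of the total."""
    pct = (counts / counts.sum() * 100).round(decimals)
    return counts.astype(str) + " (" + pct.astype(str) + "%)"


# Column totals copied from the "All" row output shown above.
tri_totals = pd.Series(
    [1986, 2920, 1183, 1383, 217, 42, 17, 5, 3, 2, 1],
    index=["010/010", "010/101", "101/101", "000/010", "001/010",
           "000/101", "000/000", "000/001", "001/101", "011/101",
           "010/011"])

formatted = count_with_pct(tri_totals)
print(formatted)
print(tri_totals.sum())  # total number of samples in Table 1
```

With the `dist` table from the cells above, `count_with_pct(dist.sum())` and `count_with_pct(dist.sum(axis=1))` would correspond to the "All" row and "All" column, respectively.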

## Supplementary Tables 1-6
Descriptive characteristics of children with respect to tested outcomes.

### Import metadata
These tables are created from the "filtered" final analysis set, which contains 7703 children.

In [8]:
filtered = pd.read_csv("../data/filtered_meta_final.csv")
filtered.head()

Unnamed: 0,mp257_maskid,family_id,immunochip_id,sex,Cntry,POP,ancestry,POP_reported,t1d,t1d_diag_age_censor,...,rs61751041 (LAMB1),rs6967298 (AUTS2),rs72704176 (ASH1L),rs72717025 (FCGR2A),rs73043122 (RNASET2/MIR3939),rs77532435 (GRB10),rs8013918 (FOS),rs9934817 (RBFOX1),rs389884,rs926552
0,996512,9000247.0,9000247_996512,Male,SWE,EUR,EUR,,1,135.972,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
1,474979,474979.0,474979_474979,Male,SWE,EUR,EUR,,0,183.443,...,1.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0
2,581412,581412.0,581412_581412,Male,SWE,EUR,EUR,,0,186.235,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,864158,864158.0,864158_864158,Male,SWE,EUR,EUR,EUR,0,167.017,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,1.0
4,669152,669152.0,669152_669152,Male,SWE,EUR,EUR,,0,174.08,...,0.0,1.0,0.0,0.0,1.0,0.0,2.0,0.0,1.0,0.0


Convert the covariates used in the tables to ordered categoricals for consistency.

In [9]:
select_pops = ["EUR", "AMR", "AFR"]
filtered["POP"] = pd.Categorical(
    filtered["POP"], categories=select_pops, ordered=True)

In [10]:
ctry = ["US", "SWE", "FIN", "GER"]
filtered["Country"] = pd.Categorical(
    filtered["Cntry"], categories=ctry, ordered=True)
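
One property of `pd.Categorical` is worth keeping in mind here: any value not listed in `categories` is silently converted to `NaN` and therefore drops out of later pivot tables. The sketch below shows a check one might run; the label `"EAS"` is a made-up example of an unlisted value, and nothing here implies that such values occur in the filtered data set.

```python
import pandas as pd

# Hypothetical labels; "EAS" is not in the category list below.
raw = pd.Series(["EUR", "AMR", "EAS", "AFR"])

cat = pd.Series(pd.Categorical(
    raw, categories=["EUR", "AMR", "AFR"], ordered=True))

# Count the values that became NaN because they were unlisted.
n_unlisted = int(cat.isna().sum())
print(n_unlisted)
```

Comparing the number of missing values before and after the conversion is a quick way to confirm that no category was left out of the list.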

Change column names for the tested outcomes to match those reported in the paper.

In [11]:
filtered.rename(
    columns={"t1d": "T1D", "persist_conf_ab": "IA",
             "celiac_diagnosis": "CD", "tga_persist": "CDA"},
    inplace=True)

Add a column that has the same value in every row. It serves as a pseudo-covariate for computing the totals row of each table.

In [12]:
filtered["All"] = "All"

Create lists of the covariates and outcomes used in the six tables.

In [13]:
covariates = ["sex", "POP", "hla_type", "Country", "FDR", "All"]
outcomes = ["IA", "T1D", "GADA first", "IAA first", "CD", "CDA"]          

Loop through the outcomes and covariates and print Supplementary Tables 1-6.
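
The loop reports column percentages for the actual covariates (each cell divided by its column total) and row percentages for the "All" pseudo-covariate (each cell divided by the overall row total). The following toy sketch isolates that logic; the data frame, the `id` column, and the helper `count_table` are invented for illustration and are not part of the analysis set.

```python
import pandas as pd

# Hypothetical toy data: 8 children, one covariate, one outcome.
toy = pd.DataFrame({
    "id": range(1, 9),
    "sex": ["Female"] * 4 + ["Male"] * 4,
    "IA": [0, 0, 0, 1, 0, 1, 1, 1],
    "All": "All",
})


def count_table(df, cov, outcome):
    """Count children per covariate level and outcome, plus a row total."""
    tab = df.pivot_table(
        index=cov, columns=outcome, values="id", aggfunc="count")
    tab = tab.rename(columns={0: "No", 1: "Yes"})
    tab["All"] = tab.sum(axis=1)
    return tab


# Actual covariate: divide each column by its column total.
by_sex = count_table(toy, "sex", "IA")
col_pct = by_sex.div(by_sex.sum(), axis=1) * 100

# Pseudo-covariate "All": divide the single row by its row total.
totals = count_table(toy, "All", "IA")
row_pct = totals.div(totals["All"], axis=0) * 100

print(col_pct)
print(row_pct)
```

In this toy case, 3 of the 4 children without IA are female, so the Female/No cell is 75%, while the totals row shows that half of all children have the outcome.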

In [14]:
for i, o in enumerate(outcomes):
    tab_list = []
    # set the correct FDR column for Celiac vs T1D
    if o.startswith("CD"):
        filtered["FDR"] = filtered["celiac_fdr"].replace(
            {0: "No", 1: "Yes"})
    else:
        filtered["FDR"] = filtered["FDR-T1D"].replace(
            {0: "No", 1: "Yes"})

    # loop through covariates and create a pivot table for each
    for cov in covariates:
        tab = filtered.pivot_table(
            index=cov, columns=o,
            values="mp257_maskid", aggfunc="count")
        tab.rename(columns={0: "No", 1: "Yes"}, inplace=True)

        # create "All" column in the pivot table for total counts
        tab_sum = tab.sum(axis=1)
        tab["All"] = tab_sum

        # Determine total numbers to use for calculating percentage
        # values. These will be different for actual covariates and
        # the pseudo-covariate "All".
        # We use the "All" column from the main data as a covariate
        # for the totals row. This is different from the "All" column
        # in the pivot table created above.
        if cov == "All":
            total = tab_sum
            axis = 0
        else:
            total = tab.sum()
            axis = 1

        # calculate values as percentages and add within parentheses
        tab_pct = tab.astype(str) + " (" + (
            round(tab.div(total, axis=axis) * 100, 1)).astype(str) + ")"
        tab_list.append(tab_pct)
    # concatenate tables from all covariates for this outcome
    outcome_tab = pd.concat(tab_list, keys=covariates)

    # print the table
    print("Supplementary Table {}.".format(i+1))
    print()
    print(outcome_tab)
    print()
    print()


  

Supplementary Table 1.

IA                          No         Yes           All
sex      Female    3388 (49.5)  390 (45.5)   3778 (49.0)
         Male      3457 (50.5)  468 (54.5)   3925 (51.0)
POP      EUR       6122 (89.4)  796 (92.8)   6918 (89.8)
         AMR         647 (9.5)    56 (6.5)     703 (9.1)
         AFR          76 (1.1)     6 (0.7)      82 (1.1)
hla_type DR1/DR4     142 (2.1)    18 (2.1)     160 (2.1)
         DR3/DR3   1486 (21.7)  121 (14.1)   1607 (20.9)
         DR3/DR4   2599 (38.0)  418 (48.7)   3017 (39.2)
         DR4/DR13     47 (0.7)    10 (1.2)      57 (0.7)
         DR4/DR4   1371 (20.0)  158 (18.4)   1529 (19.8)
         DR4/DR8   1200 (17.5)  133 (15.5)   1333 (17.3)
Country  US        2885 (42.1)  297 (34.6)   3182 (41.3)
         SWE       2019 (29.5)  290 (33.8)   2309 (30.0)
         FIN       1482 (21.7)  214 (24.9)   1696 (22.0)
         GER         459 (6.7)    57 (6.6)     516 (6.7)
FDR      No        6155 (89.9)  691 (80.5)   6846 (88.9)
       