[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/rsinghlab/pyaging/blob/main/tutorials/tutorial_dnam.ipynb) [![Open In nbviewer](https://img.shields.io/badge/View%20in-nbviewer-orange)](https://nbviewer.jupyter.org/github/rsinghlab/pyaging/blob/main/tutorials/tutorial_dnam.ipynb)

# RRBS DNA methylation

This tutorial focuses on predicting age from Mus musculus reduced-representation bisulfite sequencing (RRBS) data. Several clocks have been trained directly on RRBS data. In addition, Horvath's mammalian clocks can be applied by mapping genomic locations to the corresponding probes on Horvath's mammalian methylation array.

In [1]:
import pandas as pd
import pyaging as pya
import os
import numpy as np

## Download and load example data

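Let's download the publicly available dataset GSE130735, which contains mouse RRBS samples. Because RRBS measures methylation at individual CpG sites, the data contain a very large number of features (about 1.78 million CpG sites in this subset).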

In [2]:
pya.data.download_example_data('GSE130735')

|-----> 🏗️ Starting download_example_data function
|-----------> Data found in pyaging_data/GSE130735_subset.pkl
|-----> 🎉 Done! [0.5066s]


In [3]:
df = pd.read_pickle('pyaging_data/GSE130735_subset.pkl')

Note that the features for RRBS clocks are genomic coordinates in the format `chr:position` (for example, `chr1:3020814`), as shown below.

In [4]:
df.head()

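| | chr1:3020814 | chr1:3020842 | chr1:3020877 | chr1:3020891 | chr1:3020945 | chr1:3020971 | chr1:3020987 | chr1:3021012 | chr1:3037802 | chr1:3037820 | ... | chrY:1825397 | chrY:4682362 | chrY:32122892 | chrY:85867071 | chrY:85867083 | chrY:85867117 | chrY:85867137 | chrY:85867139 | chrY:85867178 | chrY:88224179 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GSM3752631 | 0.609 | 0.25 | 0.408 | 0.189 | 0.068 | 0.373 | 0.571 | 0.252 | 0.333 | 0.158 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| GSM3752625 | NaN | NaN | 0.973 | 0.984 | 0.912 | 0.915 | 0.987 | 0.974 | 0.991 | 0.932 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| GSM3752634 | NaN | NaN | 0.526 | 0.131 | 0.0 | 0.038 | 0.469 | 0.769 | 0.772 | 0.146 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| GSM3752620 | 0.931 | 0.92 | 0.988 | 0.949 | 0.897 | 0.921 | 0.907 | 0.958 | 1.0 | 0.867 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| GSM3752622 | NaN | NaN | 0.205 | 0.382 | 0.091 | 0.132 | 0.174 | 0.227 | 0.108 | 0.053 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |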
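
If your own RRBS data come as per-sample methylation calls in long format, a sketch like the following could pivot them into the samples × CpGs layout shown above. It assumes hypothetical column names (`sample`, `chrom`, `pos`, `beta`) and made-up toy values; it is plain pandas, not a pyaging function.

```python
import pandas as pd

# Hypothetical long-format methylation calls (one row per sample and CpG).
# Column names and values are made up for illustration.
calls = pd.DataFrame({
    "sample": ["S1", "S1", "S2", "S2", "S2"],
    "chrom": ["chr1", "chr1", "chr1", "chr1", "chrY"],
    "pos": [3020814, 3020842, 3020814, 3020877, 1825397],
    "beta": [0.609, 0.25, 0.931, 0.988, 0.5],
})

# Build feature names in the 'chr:position' format used by the RRBS clocks
calls["feature"] = calls["chrom"] + ":" + calls["pos"].astype(str)

# Pivot to a samples x CpGs matrix; CpGs not covered in a sample become NaN
wide = calls.pivot(index="sample", columns="feature", values="beta")
wide.columns.name = None
print(wide)
```

Chromosome names need the `chr` prefix, and positions must follow the same coordinate convention as the clock features; otherwise the feature names will not match.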


## Convert data to AnnData object

AnnData objects are highly flexible and are thus our preferred method of organizing data for age prediction.

In [5]:
adata = pya.pp.df_to_adata(df, imputer_strategy='mean')  # 'knn' imputation can be slow with this many features

|-----> 🏗️ Starting df_to_adata function
|-----> ⚙️ Create anndata object started
|-----> ✅ Create anndata object finished [3.8270s]
|-----> ⚙️ Add metadata to anndata started
|-----------? No metadata provided. Leaving adata.obs empty
|-----> ⚠️ Add metadata to anndata finished [0.0012s]
|-----> ⚙️ Log data statistics started
|-----------> There are 14 observations
|-----------> There are 1778324 features
|-----------> Total missing values: 6322346
|-----------> Percentage of missing values: 25.39%
|-----> ✅ Log data statistics finished [0.0586s]
|-----> ⚙️ Impute missing values started
|-----------> Imputing missing values using mean strategy
|-----> ✅ Impute missing values finished [1.3584s]
|-----> ⚙️ Add imputer strategy to adata.uns started
|-----> ✅ Add imputer strategy to adata.uns finished [0.0006s]
|-----> 🎉 Done! [5.2660s]
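
Conceptually, the missing-value statistics and the `mean` imputation logged above amount to something like the pandas sketch below: each missing value is replaced by the mean of that CpG's observed values across samples. This is an illustration on toy data, not pyaging's actual implementation.

```python
import numpy as np
import pandas as pd

# Toy samples x CpGs matrix; NaN marks a CpG not covered in that sample
toy = pd.DataFrame(
    {"chr1:100": [0.2, np.nan, 0.4], "chr1:200": [np.nan, 0.9, 0.7]},
    index=["S1", "S2", "S3"],
)

# Percentage of missing values over the whole matrix (2 of 6 entries here)
pct_missing = 100 * toy.isna().to_numpy().sum() / toy.size

# Column-wise mean imputation: each NaN is replaced by the mean of the
# observed values of that CpG across samples (mean() skips NaNs)
imputed = toy.fillna(toy.mean())

print(f"Percentage of missing values: {pct_missing:.2f}%")
print(imputed)
```

A CpG with no observed values in any sample would remain NaN in this sketch.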



## Predict age with RRBS clocks

We can predict age with one clock at a time or with several clocks at once. For convenience, let's pass all four available RRBS clocks at once. Clock names are case-insensitive.

In [6]:
pya.pred.predict_age(adata, ['Thompson', 'Meer', 'Petkovich', 'Stubbs'])

|-----> 🏗️ Starting predict_age function
|-----> ⚙️ Set PyTorch device started
|-----------> Using device: cpu
|-----> ✅ Set PyTorch device finished [0.0018s]


|-----> 🕒 Processing clock: thompson


|-----------> ⚙️ Load clock started


|-----------------> Downloading data to pyaging_data/thompson.pt


|-----------------> in progress: 100.0000%


|-----------> ✅ Load clock finished [0.9989s]


|-----------> ⚙️ Check features in adata started


|-----------------? 1 out of 582 features (0.17%) are missing: ['chr4:91376687'], etc.


|-----------------> Filling missing features entirely with 0


|-----------------> Added prepared input matrix to adata.obsm[X_thompson]


|-----------> ⚠️ Check features in adata finished [0.0459s]


|-----------> ⚙️ Predict ages with model started


|-----------------> There is no preprocessing necessary


|-----------------> There is no postprocessing necessary


|-----------------> in progress: 100.0000%


|-----------> ✅ Predict ages with model finished [0.0014s]


|-----------> ⚙️ Add predicted ages and clock metadata to adata started


|-----------> ✅ Add predicted ages and clock metadata to adata finished [0.0006s]


|-----> 🕒 Processing clock: meer


|-----------> ⚙️ Load clock started


|-----------------> Downloading data to pyaging_data/meer.pt


|-----------------> in progress: 100.0000%


|-----------> ✅ Load clock finished [0.9468s]


|-----------> ⚙️ Check features in adata started


|-----------------? 225 out of 435 features (51.72%) are missing: ['chr10:111559529', 'chr10:115250413', 'chr10:127620127'], etc.


|-----------------> Filling missing features entirely with 0


|-----------------> Added prepared input matrix to adata.obsm[X_meer]


|-----------> ⚠️ Check features in adata finished [0.0252s]


|-----------> ⚙️ Predict ages with model started


|-----------------> There is no preprocessing necessary


|-----------------> There is no postprocessing necessary


|-----------------> in progress: 100.0000%


|-----------> ✅ Predict ages with model finished [0.0010s]


|-----------> ⚙️ Add predicted ages and clock metadata to adata started


|-----------> ✅ Add predicted ages and clock metadata to adata finished [0.0006s]


|-----> 🕒 Processing clock: petkovich


|-----------> ⚙️ Load clock started


|-----------------> Downloading data to pyaging_data/petkovich.pt


|-----------------> in progress: 100.0000%


|-----------> ✅ Load clock finished [0.9550s]


|-----------> ⚙️ Check features in adata started


|-----------------? 58 out of 90 features (64.44%) are missing: ['chr19:23893237', 'chr18:45589182', 'chr16:10502162'], etc.


|-----------------> Filling missing features entirely with 0


|-----------------> Added prepared input matrix to adata.obsm[X_petkovich]


|-----------> ⚠️ Check features in adata finished [0.0147s]


|-----------> ⚙️ Predict ages with model started


|-----------------> There is no preprocessing necessary


|-----------------> The postprocessing method is petkovich


|-----------------> in progress: 100.0000%


|-----------> ✅ Predict ages with model finished [0.0010s]


|-----------> ⚙️ Add predicted ages and clock metadata to adata started


|-----------> ✅ Add predicted ages and clock metadata to adata finished [0.0006s]


|-----> 🕒 Processing clock: stubbs


|-----------> ⚙️ Load clock started


|-----------------> Downloading data to pyaging_data/stubbs.pt


|-----------------> in progress: 100.0000%


|-----------> ✅ Load clock finished [1.4773s]


|-----------> ⚙️ Check features in adata started


|-----------------? 8889 out of 17992 features (49.41%) are missing: ['chr1:10038066', 'chr1:106173313', 'chr1:106759301'], etc.


|-----------------> Using reference feature values for stubbs


|-----------------> Added prepared input matrix to adata.obsm[X_stubbs]


|-----------> ⚠️ Check features in adata finished [0.8155s]


|-----------> ⚙️ Predict ages with model started


|-----------------> The preprocessing method is quantile_normalization_and_scale_with_gold_standard


|-----------------> The postprocessing method is stubbs


|-----------------> in progress: 100.0000%


|-----------> ✅ Predict ages with model finished [0.0204s]


|-----------> ⚙️ Add predicted ages and clock metadata to adata started


|-----------> ✅ Add predicted ages and clock metadata to adata finished [0.0005s]


|-----> 🎉 Done! [5.5415s]


All four RRBS clocks report predicted age in months.

In [7]:
adata.obs.head()

| | thompson | meer | petkovich | stubbs |
|---|---|---|---|---|
| GSM3752631 | 19.634113 | 7.315183 | 8.075177 | 0.95777 |
| GSM3752625 | -1.410461 | 0.028221 | 2.953822 | -0.074265 |
| GSM3752634 | 61.058783 | 21.322178 | 9.640489 | 1.389193 |
| GSM3752620 | -2.663815 | 1.611947 | 3.019351 | -0.09271 |
| GSM3752622 | 20.594114 | 7.592145 | 7.104766 | 0.667168 |
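
To compare these predictions with clocks reported in years, the outputs can simply be divided by 12. The sketch below does this for the five samples shown above (values copied from the table); with the real object, the same operation would be `adata.obs[['thompson', 'meer', 'petkovich', 'stubbs']] / 12`.

```python
import pandas as pd

# Predictions (in months) for the first five samples, copied from adata.obs above
rrbs_months = pd.DataFrame(
    {
        "thompson": [19.634113, -1.410461, 61.058783, -2.663815, 20.594114],
        "meer": [7.315183, 0.028221, 21.322178, 1.611947, 7.592145],
        "petkovich": [8.075177, 2.953822, 9.640489, 3.019351, 7.104766],
        "stubbs": [0.95777, -0.074265, 1.389193, -0.09271, 0.667168],
    },
    index=["GSM3752631", "GSM3752625", "GSM3752634", "GSM3752620", "GSM3752622"],
)

# Divide by 12 to express the RRBS clock outputs in years
rrbs_years = rrbs_months / 12
print(rrbs_years.round(2))
```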


Printing this much information can be overwhelming, particularly when running several clocks at once. In such cases, set `verbose=False`.

In [9]:
pya.data.download_example_data('GSE130735', verbose=False)
df = pd.read_pickle('pyaging_data/GSE130735_subset.pkl')
adata = pya.pp.df_to_adata(df, imputer_strategy='mean', verbose=False)
pya.pred.predict_age(adata, ['Thompson', 'Meer', 'Petkovich', 'Stubbs'], verbose=False)

In [10]:
adata.obs.head()

| | thompson | meer | petkovich | stubbs |
|---|---|---|---|---|
| GSM3752631 | 19.634113 | 7.315183 | 8.075177 | 0.95777 |
| GSM3752625 | -1.410461 | 0.028221 | 2.953822 | -0.074265 |
| GSM3752634 | 61.058783 | 21.322178 | 9.640489 | 1.389193 |
| GSM3752620 | -2.663815 | 1.611947 | 3.019351 | -0.09271 |
| GSM3752622 | 20.594114 | 7.592145 | 7.104766 | 0.667168 |


After age prediction, each clock's predictions are stored as a column in `adata.obs`. The percentage of missing features for each clock, the list of missing features, and other clock metadata are stored in `adata.uns`.

In [11]:
adata

AnnData object with n_obs × n_vars = 14 × 1778324
    obs: 'thompson', 'meer', 'petkovich', 'stubbs'
    var: 'percent_na'
    uns: 'imputer_strategy', 'thompson_percent_na', 'thompson_missing_features', 'thompson_metadata', 'meer_percent_na', 'meer_missing_features', 'meer_metadata', 'petkovich_percent_na', 'petkovich_missing_features', 'petkovich_metadata', 'stubbs_percent_na', 'stubbs_missing_features', 'stubbs_metadata'
    layers: 'X_original', 'X_imputed'
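
The logs report, for each clock, how many of its features are absent from the data (for example, 1 out of 582 for `thompson` and 225 out of 435 for `meer`); those features are then filled with 0 or with reference values. A hypothetical helper like the one below computes the same kind of summary from two lists of feature names, which can be handy for checking the coverage of a new dataset before running a clock. How to retrieve a clock's feature list from pyaging is not shown here.

```python
def missing_feature_report(clock_features, data_features):
    """Summarize which of a clock's features are absent from a dataset.

    Hypothetical helper for illustration; it is not part of pyaging.
    Returns (n_missing, n_total, percent_missing, missing_features).
    """
    available = set(data_features)
    missing = [f for f in clock_features if f not in available]
    n_total = len(clock_features)
    percent = 100 * len(missing) / n_total if n_total else 0.0
    return len(missing), n_total, percent, missing


# Toy feature lists; the missing CpG name is taken from the thompson log above
clock_features = ["chr1:3020814", "chr1:3020877", "chr4:91376687", "chrY:1825397"]
data_features = ["chr1:3020814", "chr1:3020842", "chr1:3020877", "chrY:1825397"]

n_missing, n_total, percent, missing = missing_feature_report(clock_features, data_features)
print(f"{n_missing} out of {n_total} features ({percent:.2f}%) are missing: {missing}")
```

With real data, `adata.var_names` could be passed as `data_features`.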

## Predict age with mammalian clocks

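We can also predict age with Horvath's mammalian clocks by mapping each genomic location to the corresponding probe on Horvath's mammalian methylation array. The mapping comes from the Mus musculus manifest in the Mammalian Methylation Consortium repository.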

In [None]:
import shutil
import subprocess

# Clone the Mammalian Methylation Consortium repository, which contains the array manifests
subprocess.run(['git', 'clone', 'https://github.com/shorvath/MammalianMethylationConsortium.git'], check=True)

# Read the Mus musculus manifest for the mammalian methylation array
annotation_df = pd.read_csv('MammalianMethylationConsortium/Annotations, Amin Haghani/Mammals/Mus_musculus.grcm38.100.HorvathMammalMethylChip40.v1.csv', index_col=0)
# Drop probes without a mapped genomic location
annotation_df = annotation_df[~annotation_df.seqnames.isna()]
# Build 'chr:position' keys that match the RRBS feature names
mm_genomic_locations = 'chr' + annotation_df['seqnames'].astype(str) + ':' + annotation_df['CGstart'].astype(int).astype(str)
mm_genomic_locations = mm_genomic_locations.tolist()
mammalian_probes = annotation_df['CGid'].tolist()
mm_loc_to_probe = dict(zip(mm_genomic_locations, mammalian_probes))

# Keep only the RRBS genomic locations present in the manifest
# (sorted so that the column order is reproducible across runs)
common_columns = sorted(set(df.columns).intersection(mm_loc_to_probe))
df_converted = df[common_columns].copy()

# Then, convert each genomic location to its probe name
df_converted.columns = [mm_loc_to_probe[col] for col in df_converted.columns]

# Remove the cloned repository
shutil.rmtree('MammalianMethylationConsortium')
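
The conversion above can be wrapped in a reusable function. The sketch below is a hypothetical helper (not part of pyaging) that assumes a manifest with `seqnames`, `CGstart` and `CGid` columns, as in the Mus musculus annotation file; it is exercised on a toy manifest with made-up probe IDs.

```python
import pandas as pd


def rrbs_to_mammalian_probes(data, manifest):
    """Rename 'chr:position' columns to mammalian array probe IDs.

    Hypothetical helper. Assumes `manifest` has 'seqnames', 'CGstart' and
    'CGid' columns; only columns of `data` found in the manifest are kept.
    """
    manifest = manifest[manifest["seqnames"].notna()]
    locations = (
        "chr" + manifest["seqnames"].astype(str)
        + ":" + manifest["CGstart"].astype(int).astype(str)
    )
    loc_to_probe = dict(zip(locations, manifest["CGid"]))
    # A list comprehension keeps the original column order of `data`
    shared = [col for col in data.columns if col in loc_to_probe]
    return data[shared].rename(columns=loc_to_probe)


# Toy manifest (one probe has no location) and toy RRBS data
toy_manifest = pd.DataFrame({
    "seqnames": ["1", "1", None, "Y"],
    "CGstart": [3020814, 3020877, 5, 1825397],
    "CGid": ["cg_a", "cg_b", "cg_c", "cg_d"],
})
toy_data = pd.DataFrame(
    {"chr1:3020814": [0.609], "chr1:3020842": [0.25], "chrY:1825397": [0.5]},
    index=["S1"],
)

converted = rrbs_to_mammalian_probes(toy_data, toy_manifest)
print(converted)
```

If several probes share the same location, the dictionary keeps the last one.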

In [None]:
df_converted.head()

Now we can load the dataframe into pyaging. First, we specify the species by adding a `Mus musculus` column set to 1.

In [None]:
df_converted['Mus musculus'] = 1
adata_mammalian = pya.pp.df_to_adata(df_converted, imputer_strategy='mean')

|-----> ✅ Create anndata object finished [0.0039s]


|-----> ⚙️ Add metadata to anndata started


|-----------? No metadata provided. Leaving adata.obs empty


|-----> ⚠️ Add metadata to anndata finished [0.0004s]


|-----> ⚙️ Log data statistics started


|-----------> There are 14 observations


|-----------> There are 5150 features


|-----------> Total missing values: 17862


|-----------> Percentage of missing values: 24.77%


|-----> ✅ Log data statistics finished [0.0011s]


|-----> ⚙️ Impute missing values started


|-----------> Imputing missing values using mean strategy


|-----> ✅ Impute missing values finished [0.0029s]


|-----> ⚙️ Add imputer strategy to adata.uns started


|-----> ✅ Add imputer strategy to adata.uns finished [0.0003s]


|-----> 🎉 Done! [0.0108s]


Let's use these five mammalian predictors.

In [None]:
pya.pred.predict_age(adata_mammalian, ['Mammalian1', 'Mammalian2', 'Mammalian3', "MammalianLifespan", "MammalianFemale"])

|-----------> Using device: cpu


|-----> ✅ Set PyTorch device finished [0.0004s]


|-----> 🕒 Processing clock: mammalian1


|-----------> ⚙️ Load clock started


|-----------------> Data found in pyaging_data/mammalian1.pt


|-----------> ✅ Load clock finished [0.5462s]


|-----------> ⚙️ Check features in adata started


|-----------------? 274 out of 335 features (81.79%) are missing: ['cg00249943', 'cg00250826', 'cg00292639'], etc.


|-----------------> Filling missing features entirely with 0


|-----------------> Added prepared input matrix to adata.obsm[X_mammalian1]


|-----------> ⚠️ Check features in adata finished [0.0148s]


|-----------> ⚙️ Predict ages with model started


|-----------------> There is no preprocessing necessary


|-----------------> The postprocessing method is anti_logp2


|-----------------> in progress: 100.0000%


|-----------> ✅ Predict ages with model finished [0.0011s]


|-----------> ⚙️ Add predicted ages and clock metadata to adata started


|-----------> ✅ Add predicted ages and clock metadata to adata finished [0.0008s]


|-----> 🕒 Processing clock: mammalian2


|-----------> ⚙️ Load clock started


|-----------------> Data found in pyaging_data/mammalian2.pt


|-----------> ✅ Load clock finished [0.4681s]


|-----------> ⚙️ Check features in adata started


|-----------------? 2406 out of 2572 features (93.55%) are missing: ['cg00020468', 'cg00096922', 'cg00098422'], etc.


|-----------------> Using reference feature values for mammalian2


|-----------------> Added prepared input matrix to adata.obsm[X_mammalian2]


|-----------> ⚠️ Check features in adata finished [0.0295s]


|-----------> ⚙️ Predict ages with model started


|-----------------> There is no preprocessing necessary


|-----------------> The postprocessing method is mammalian2


|-----------------> in progress: 100.0000%


|-----------> ✅ Predict ages with model finished [0.0014s]


|-----------> ⚙️ Add predicted ages and clock metadata to adata started


|-----------> ✅ Add predicted ages and clock metadata to adata finished [0.0006s]


|-----> 🕒 Processing clock: mammalian3


|-----------> ⚙️ Load clock started


|-----------------> Data found in pyaging_data/mammalian3.pt


|-----------> ✅ Load clock finished [0.4942s]


|-----------> ⚙️ Check features in adata started


|-----------------? 2299 out of 2467 features (93.19%) are missing: ['cg00101675', 'cg06259996', 'cg15168457'], etc.


|-----------------> Using reference feature values for mammalian3


|-----------------> Added prepared input matrix to adata.obsm[X_mammalian3]


|-----------> ⚠️ Check features in adata finished [0.0315s]


|-----------> ⚙️ Predict ages with model started


|-----------------> There is no preprocessing necessary


|-----------------> The postprocessing method is mammalian3


|-----------------> in progress: 100.0000%


|-----------> ✅ Predict ages with model finished [0.0017s]


|-----------> ⚙️ Add predicted ages and clock metadata to adata started


|-----------> ✅ Add predicted ages and clock metadata to adata finished [0.0005s]


|-----> 🕒 Processing clock: mammalianlifespan


|-----------> ⚙️ Load clock started


|-----------------> Data found in pyaging_data/mammalianlifespan.pt


|-----------> ✅ Load clock finished [0.4444s]


|-----------> ⚙️ Check features in adata started


|-----------------? 133 out of 152 features (87.50%) are missing: ['cg00039845', 'cg00300233', 'cg00810217'], etc.


|-----------------> Using reference feature values for mammalianlifespan


|-----------------> Added prepared input matrix to adata.obsm[X_mammalianlifespan]


|-----------> ⚠️ Check features in adata finished [0.0095s]


|-----------> ⚙️ Predict ages with model started


|-----------------> There is no preprocessing necessary


|-----------------> There is no postprocessing necessary


|-----------------> in progress: 100.0000%


|-----------> ✅ Predict ages with model finished [0.0026s]


|-----------> ⚙️ Add predicted ages and clock metadata to adata started


|-----------> ✅ Add predicted ages and clock metadata to adata finished [0.0006s]


|-----> 🕒 Processing clock: mammalianfemale


|-----------> ⚙️ Load clock started


|-----------------> Data found in pyaging_data/mammalianfemale.pt


|-----------> ✅ Load clock finished [0.4082s]


|-----------> ⚙️ Check features in adata started


|-----------------? 73 out of 101 features (72.28%) are missing: ['cg01145947', 'cg02053792', 'cg02407848'], etc.


|-----------------> Filling missing features entirely with 0


|-----------------> Added prepared input matrix to adata.obsm[X_mammalianfemale]


|-----------> ⚠️ Check features in adata finished [0.0130s]


|-----------> ⚙️ Predict ages with model started


|-----------------> There is no preprocessing necessary


|-----------------> The postprocessing method is sigmoid


|-----------------> in progress: 100.0000%


|-----------> ✅ Predict ages with model finished [0.0009s]


|-----------> ⚙️ Add predicted ages and clock metadata to adata started


|-----------> ✅ Add predicted ages and clock metadata to adata finished [0.0004s]


|-----> 🎉 Done! [2.9784s]


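Note that the RRBS clocks report age in months, whereas the mammalian age clocks (Mammalian1, Mammalian2 and Mammalian3) report age in years. MammalianLifespan predicts maximum lifespan rather than age, and MammalianFemale outputs a probability that the sample is female (hence its sigmoid postprocessing). Also keep in mind that most mammalian clock features are missing from this RRBS dataset (between about 72% and 94% per clock, according to the logs above), so these predictions should be interpreted with caution.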

In [None]:
adata_mammalian.obs

## Get citation

The DOI, citation, and other metadata for each clock are automatically added to the AnnData object under `adata.uns['<clockname>_metadata']`, where `<clockname>` is the lowercase clock name.

In [None]:
adata.uns['thompson_metadata']

In [None]:
adata.uns['meer_metadata']

In [None]:
adata.uns['petkovich_metadata']

In [None]:
adata.uns['stubbs_metadata']

In [None]:
adata_mammalian.uns['mammalian1_metadata']

In [None]:
adata_mammalian.uns['mammalianlifespan_metadata']
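
Rather than querying each clock separately, all metadata entries can be gathered in one pass. The hypothetical helper below collects every key ending in `_metadata` from a `.uns`-like mapping; the toy dictionary uses placeholder values, and its inner keys are made up rather than taken from real pyaging metadata.

```python
def collect_clock_metadata(uns):
    """Collect every '<clock>_metadata' entry from an AnnData-style .uns mapping.

    Hypothetical helper for illustration; it is not part of pyaging.
    """
    suffix = "_metadata"
    return {
        key[: -len(suffix)]: value
        for key, value in uns.items()
        if key.endswith(suffix)
    }


# Toy stand-in for adata.uns; metadata values are placeholders, not real citations
toy_uns = {
    "imputer_strategy": "mean",
    "thompson_missing_features": ["chr4:91376687"],
    "thompson_metadata": {"doi": "<placeholder>"},
    "meer_metadata": {"doi": "<placeholder>"},
}

for clock, metadata in collect_clock_metadata(toy_uns).items():
    print(clock, metadata)
```

With the real objects, `collect_clock_metadata(adata.uns)` and `collect_clock_metadata(adata_mammalian.uns)` would return one entry per clock.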