# DeepTopic

Sample notebook to train a DeepTopic model.

In [1]:
import crested

We can use the {func}`crested.import_topics` function to import data into an {class}`anndata.AnnData` object,
with the imported topics as the `AnnData.obs` and the consensus peak regions as the `AnnData.var`.

In [2]:
adata = crested.import_topics(
    topics_folder="/staging/leuven/stg_00002/lcb/lmahieu/projects/DeepTopic/biccn_test/otsu",
    peaks_file="/staging/leuven/stg_00002/lcb/lmahieu/projects/DeepTopic/biccn_test/consensus_peaks_bicnn.bed",
    compress=True,
    # topics_subset=["topic_1", "topic_2"], # optional subset of topics to import
)
adata



AnnData object with n_obs × n_vars = 80 × 546993
    obs: 'file_path', 'n_open_regions'
    var: 'n_topics', 'chr', 'start', 'end'

The `import_topics` function also adds two summary columns, `AnnData.obs.n_open_regions` and `AnnData.var.n_topics`, which you can use to inspect your data and get a feel for it.
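
To make these two columns concrete, here is a minimal sketch in plain Python. It assumes, based on the column names, that `n_open_regions` counts the regions in each topic and `n_topics` counts the topics each region belongs to. The toy matrix, topic names, and region names are made up for illustration and do not come from `crested`.

```python
# Toy binary membership matrix: rows are topics (obs), columns are regions (var).
# A 1 means the region is part of the topic. All names and values are hypothetical.
topics = ["topic_1", "topic_2", "topic_3"]
regions = ["chr1:100-600", "chr1:700-1200", "chr2:50-550", "chr2:900-1400"]
membership = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]

# Row sums: number of regions per topic (assumed meaning of obs.n_open_regions).
n_open_regions = {t: sum(row) for t, row in zip(topics, membership)}

# Column sums: number of topics per region (assumed meaning of var.n_topics).
n_topics = {r: sum(col) for r, col in zip(regions, zip(*membership))}

# Regions that belong to no topic at all, i.e. n_topics == 0.
unassigned = [r for r, n in n_topics.items() if n == 0]

print(n_open_regions)  # {'topic_1': 2, 'topic_2': 2, 'topic_3': 1}
print(unassigned)      # ['chr2:900-1400']
```

Under this reading, a region with `n_topics == 0` is a consensus peak that was not assigned to any imported topic.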

To train a model, we'll need to add a *split* column to our dataset, which we can do using {func}`crested.pp.train_val_test_split`.  
We can set a `random_state` so that the data is split the same way on future runs when `shuffle=True` (the default).

In [3]:
# We can split randomly on the regions
crested.pp.train_val_test_split(
    adata, type="random", val_size=0.1, test_size=0.1, random_state=42
)

# Or, choose the chromosomes for the validation and test sets
# crested.pp.train_val_test_split(
#     adata, type="chr", chr_val=["chr4", "chrX"], chr_test=["chr2", "chr3"]
# )

print(adata.var["split"].value_counts())
adata.var

split
train    437593
test      54700
val       54700
Name: count, dtype: int64


| region | n_topics | chr | start | end | split |
|---|---|---|---|---|---|
| chr1:3094805-3095305 | 5 | chr1 | 3094805 | 3095305 | train |
| chr1:3095470-3095970 | 0 | chr1 | 3095470 | 3095970 | train |
| chr1:3112174-3112674 | 1 | chr1 | 3112174 | 3112674 | test |
| chr1:3113534-3114034 | 2 | chr1 | 3113534 | 3114034 | train |
| chr1:3119746-3120246 | 8 | chr1 | 3119746 | 3120246 | train |
| ... | ... | ... | ... | ... | ... |
| chrX:169879313-169879813 | 3 | chrX | 169879313 | 169879813 | train |
| chrX:169880181-169880681 | 0 | chrX | 169880181 | 169880681 | train |
| chrX:169925477-169925977 | 1 | chrX | 169925477 | 169925977 | train |
| chrX:169948550-169949050 | 0 | chrX | 169948550 | 169949050 | train |
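
The two splitting strategies can be illustrated with a small standalone sketch. This is not the `crested` implementation: `random_split_labels` and `chr_split_labels` are hypothetical helpers, and rounding the validation and test sizes up is an assumption, chosen because it matches the counts printed above (0.1 × 546993 = 54699.3, reported as 54700).

```python
import math
import random
from collections import Counter


def random_split_labels(n_regions, val_size=0.1, test_size=0.1, random_state=None):
    """Label each region index 'train', 'val' or 'test' after a seeded shuffle."""
    order = list(range(n_regions))
    # A fixed seed gives the same shuffle, and so the same split, on every run.
    random.Random(random_state).shuffle(order)
    n_val = math.ceil(val_size * n_regions)    # assumption: sizes are rounded up
    n_test = math.ceil(test_size * n_regions)
    labels = ["train"] * n_regions
    for i in order[:n_val]:
        labels[i] = "val"
    for i in order[n_val:n_val + n_test]:
        labels[i] = "test"
    return labels


def chr_split_labels(chroms, chr_val, chr_test):
    """Label each region by its chromosome; chromosomes not listed go to 'train'."""
    val, test = set(chr_val), set(chr_test)
    return ["val" if c in val else "test" if c in test else "train" for c in chroms]


labels = random_split_labels(546993, val_size=0.1, test_size=0.1, random_state=42)
counts = Counter(labels)
print(counts["train"], counts["val"], counts["test"])  # 437593 54700 54700

chr_labels = chr_split_labels(
    ["chr1", "chr2", "chr4", "chrX"], chr_val=["chr4", "chrX"], chr_test=["chr2", "chr3"]
)
print(chr_labels)  # ['train', 'test', 'val', 'val']
```

A random split mixes regions from all chromosomes across the three sets, whereas a chromosome split holds out entire chromosomes for validation and testing.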


## Train