# Which CREs control the cell-lineage-specific genes

Here we assign CREs (cis-regulatory elements) to cell-lineage-specific genes. We reuse the regression model from the notebook regression_model.ipynb and the per-gene assignment of promoters and enhancers from enhancers_promoters_regression.ipynb.
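The table loaded below repeats the same r2 (0.752231) on every Pcmtd1 row. That pattern is consistent with one multivariate model per gene: the gene's expression is regressed on the accessibility of its nearby peaks, giving one coefficient per peak and one R² per model. The following is a minimal sketch of that idea. It assumes ordinary least squares and uses synthetic data, so the sample count, effect sizes and noise level are all made up. The actual model in regression_model.ipynb may differ; for example, it may be regularized or use different inputs.

```python
import numpy as np

# Synthetic stand-in data (hypothetical): accessibility of 3 peaks near one gene
# across 20 cell populations, and that gene's expression in the same populations.
rng = np.random.default_rng(0)
n_samples, n_peaks = 20, 3
X = rng.normal(size=(n_samples, n_peaks))                   # peak accessibility
true_coef = np.array([2.0, -1.5, 0.0])                      # assumed effect sizes
y = X @ true_coef + rng.normal(scale=0.5, size=n_samples)   # gene expression

# Ordinary least squares with an intercept column:
# one model per gene, one coefficient per peak.
X1 = np.column_stack([np.ones(n_samples), X])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
intercept, coef = beta[0], beta[1:]

# A single (training) R^2 for the whole model, shared by every peak of this gene.
y_hat = X1 @ beta
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

for i, c in enumerate(coef):
    print(f"peak_{i}: coefficient={c:+.3f}, r2={r2:.3f}")
```

In this picture, a positive coefficient marks a peak whose accessibility rises with expression, and a negative one marks a peak associated with lower expression. This matches the activator/repressor split in the `role` column.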

In [2]:
# Load the enhancer/promoter-gene pairs
import pandas as pd

peak_gene_pairs = pd.read_csv("high_conf_filtered.csv")
peak_gene_pairs.head()

| Unnamed: 0.1 | Unnamed: 0 | gene | peak_ID | coefficient | r2 | distance_to_tss | role |
|---|---|---|---|---|---|---|---|
| 0 | 13 | Pcmtd1 | ImmGenATAC1219.peak_376 | -1.367742 | 0.752231 | 394.0 | repressor |
| 1 | 14 | Pcmtd1 | ImmGenATAC1219.peak_377 | -2.494725 | 0.752231 | 144.0 | repressor |
| 2 | 15 | Pcmtd1 | ImmGenATAC1219.peak_378 | 3.276884 | 0.752231 | 132.0 | promoter |
| 3 | 16 | Pcmtd1 | ImmGenATAC1219.peak_380 | 5.020475 | 0.752231 | 7143.0 | activator |
| 4 | 17 | Pcmtd1 | ImmGenATAC1219.peak_408 | 0.9831 | 0.752231 | 58306.0 | activator |
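The two `Unnamed` columns are most likely old row indices written by earlier `to_csv` calls that kept the index. The following is a minimal sketch that drops them and tallies how many distinct peaks of each role every gene has. To make it run on its own, it pastes in the five rows shown above rather than reading the full file.

```python
import io
import pandas as pd

# The five rows printed above, pasted as CSV so this sketch is self-contained.
csv_text = """Unnamed: 0.1,Unnamed: 0,gene,peak_ID,coefficient,r2,distance_to_tss,role
0,13,Pcmtd1,ImmGenATAC1219.peak_376,-1.367742,0.752231,394.0,repressor
1,14,Pcmtd1,ImmGenATAC1219.peak_377,-2.494725,0.752231,144.0,repressor
2,15,Pcmtd1,ImmGenATAC1219.peak_378,3.276884,0.752231,132.0,promoter
3,16,Pcmtd1,ImmGenATAC1219.peak_380,5.020475,0.752231,7143.0,activator
4,17,Pcmtd1,ImmGenATAC1219.peak_408,0.9831,0.752231,58306.0,activator
"""
pairs = pd.read_csv(io.StringIO(csv_text))

# Drop leftover index columns (saving future exports with index=False avoids them).
pairs = pairs.loc[:, ~pairs.columns.str.startswith("Unnamed")]

# One row per gene, one column per role, counting distinct peaks.
role_counts = (
    pairs.groupby(["gene", "role"])["peak_ID"]
    .nunique()
    .unstack(fill_value=0)
)
print(role_counts)
```

On the full table, replace the pasted string with `pd.read_csv("high_conf_filtered.csv")`. The result then gives a per-gene overview of promoter, activator and repressor peaks.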


In [None]:
import pandas as pd

# 1) CRE-assignment table: reuse the high-confidence peak-gene pairs loaded above.
#    Columns: gene, peak_ID, coefficient, r2, distance_to_tss,
#    role (promoter / activator / repressor).
assign_df = peak_gene_pairs.copy()

# 2) Define the lineage-specific gene sets
#    (use the same lists as in the regression step; the lists below are
#    incomplete, extend them where marked "…").
#    The ATAC table uses mouse-style symbols (e.g. Pcmtd1), so matching
#    below is case-insensitive.
lineage_genes = {
    "NKT":         ["CD56", "NCAM1"],   # …
    "abT":         ["TRAC", "TRBC1"],   # …
    "Progenitor":  ["SOX2", "POU5F1"],  # …
    "Tact":        ["IFNG", "GZMB"],    # …
}

# 3) Flatten into an upper-cased gene → lineage lookup
#    (a gene listed under several lineages keeps only the last one)
gene2lineage = {
    gene.upper(): lineage
    for lineage, genes in lineage_genes.items()
    for gene in genes
}

# 4) Annotate each CRE with the lineage of its target gene
assign_df["Lineage"] = (
    assign_df["gene"].str.upper().map(gene2lineage).fillna("Other")
)

# 5) Filter or summarize by lineage, e.g. all NKT CREs:
nkt_cres = assign_df[assign_df["Lineage"] == "NKT"]

#    or the number of distinct promoter / activator / repressor peaks per lineage:
summary = (
    assign_df
    .groupby(["Lineage", "role"])["peak_ID"]
    .nunique()
    .unstack(fill_value=0)
)
print(summary)
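Two points about the cell above are worth checking on real data. First, the dict comprehension in step 3 keeps only the last lineage for a gene that appears in several lists. Second, the summary counts CREs but does not say which ones matter most. The following is a minimal sketch that handles both on a made-up toy table with made-up marker lists. It builds a long gene-lineage table, so that a shared gene is annotated with every lineage it belongs to, and flags such genes. It then picks, for each lineage, the CRE with the largest absolute regression coefficient. Ranking by |coefficient| is an assumption here: it is only meaningful if the accessibility inputs were on a comparable scale.

```python
import pandas as pd

# Hypothetical toy table shaped like high_conf_filtered.csv; apart from
# Pcmtd1, the genes, peak IDs and coefficients are made up for illustration.
pairs = pd.DataFrame({
    "gene":        ["Pcmtd1", "Pcmtd1", "Ifng", "Ifng", "Gzmb"],
    "peak_ID":     ["peak_376", "peak_380", "peak_a", "peak_b", "peak_c"],
    "coefficient": [-1.37, 5.02, 2.10, -0.80, 1.40],
    "role":        ["repressor", "activator", "activator", "repressor", "promoter"],
})

# Hypothetical marker lists; GZMB is deliberately listed under two lineages.
lineage_genes = {
    "Tact": ["IFNG", "GZMB"],
    "NKT":  ["GZMB"],
}

# Long (gene, lineage) table: a gene in several lineages gets several rows
# instead of being overwritten, as it would be in a gene -> lineage dict.
lineage_long = pd.DataFrame(
    [(g.upper(), lin) for lin, genes in lineage_genes.items() for g in genes],
    columns=["gene_key", "Lineage"],
)
dup_mask = lineage_long["gene_key"].duplicated(keep=False)
ambiguous = sorted(lineage_long.loc[dup_mask, "gene_key"].unique())
print("genes listed in several lineages:", ambiguous)

# Case-insensitive left join; genes without a lineage are labelled "Other".
annotated = (
    pairs.assign(gene_key=pairs["gene"].str.upper())
    .merge(lineage_long, on="gene_key", how="left")
    .fillna({"Lineage": "Other"})
    .drop(columns="gene_key")
)

# For each lineage, the CRE with the largest absolute coefficient.
strongest = annotated.loc[
    annotated["coefficient"].abs().groupby(annotated["Lineage"]).idxmax()
]
print(strongest[["Lineage", "gene", "peak_ID", "coefficient", "role"]])
```

On the real data, `pairs` would be `assign_df` and `lineage_genes` the full marker lists. Whether a shared gene should count toward every lineage or be excluded is a design choice; the sketch keeps every lineage and reports the overlap.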
