**All Notebooks Used & What Each Contains:**

All notebooks are in .ipynb format.

### **1. ATAC-seq_filtering**
<p align="justify">
Processes the raw ATAC-seq dataset by annotating cell types, filtering low-quality cells, and preparing it for downstream analysis through basic visual and statistical checks.
</p>

### **2. Stats_Across_Cells**
<p align="justify">
Performs preliminary statistical analysis on unfiltered ATAC-seq data across immune cell types to summarize accessibility distributions. These early results informed later filtering decisions.
</p>

### **3. Stats_Across_Peaks**
<p align="justify">
Computes descriptive statistics of accessibility across ATAC-seq peaks and summarizes cell type-specific patterns.
</p>

### **4. qc_vs_signal**
<p align="justify">
Assesses data quality by analyzing correlations between ATAC signal and quality control metrics. Finds minimal correlation, suggesting overall good data quality.
</p>

### **5. tss_distance_part1** and **tss_distance_part2**
<p align="justify">
In these notebooks, we computed each peak’s distance to its nearest TSS, plotted the distance distribution, merged in the mean ATAC signal, and assessed and plotted the signal–distance relationship using the Pearson correlation.
</p>
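
The distance and correlation steps described above can be sketched as follows. This is a minimal, dependency-free illustration and not the notebooks' actual code: the function names (`nearest_tss_distance`, `pearson_r`) and all coordinates and signal values are made-up assumptions, and a single chromosome is assumed.

```python
from bisect import bisect_left
from math import sqrt


def nearest_tss_distance(peak_pos, sorted_tss):
    """Absolute distance (bp) from a peak midpoint to the closest TSS.

    Assumes `sorted_tss` is a non-empty, ascending list of TSS positions
    on the same chromosome as the peak (hypothetical helper).
    """
    i = bisect_left(sorted_tss, peak_pos)
    # Only the TSS just left and just right of the peak can be the nearest.
    candidates = sorted_tss[max(i - 1, 0):i + 1]
    return min(abs(peak_pos - tss) for tss in candidates)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Toy data (illustrative values only, not taken from the dataset).
tss_positions = sorted([1_000, 50_000, 120_000])
peak_midpoints = [900, 10_000, 49_500, 90_000, 121_000]
mean_signal = [8.0, 2.5, 6.0, 1.0, 5.5]

distances = [nearest_tss_distance(p, tss_positions) for p in peak_midpoints]
r = pearson_r(distances, mean_signal)
print(distances, round(r, 3))
```

In this toy example the signal falls as the distance grows, so `r` is negative. On real data the sign and strength of the relationship have to be read from the notebooks' own results.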

### **6. signal_comparison**
<p align="justify">
This notebook represents an early attempt to classify regulatory elements as promoters or enhancers based solely on their distance to the transcription start site (TSS). Although we were aware that such a naive distance-based classification would likely be insufficient, this trial allowed us to explore the feasibility of this approach. The method was not robust or conclusive and ultimately was not used in downstream analyses.
</p>

### **7. ATAC_Clustering_Analysis**
<p align="justify">
This notebook contains the main ATAC-seq clustering analysis, performed on the filtered dataset. We reduced dimensionality using PCA, UMAP, and t-SNE, and visualized the accessibility profiles across immune cell types. Clustering patterns were inspected to assess whether related cell types (specifically progenitors, αβ T cells, and activated T cells) group together based on their chromatin accessibility. This analysis directly addressed our goal of defining similarities and differences in chromatin landscapes between cell types.
</p>


### **8. gene_expression_clustering**
<p align="justify">
This notebook mirrors the ATAC-seq clustering approach, applied to RNA-seq data.
</p>

### **9. comparison_ATAC_RNA**
<p align="justify">
In this notebook, we directly compared ATAC-seq and RNA-seq clustering results by aligning the dimensionality reduction plots (PCA, UMAP, t-SNE) of both datasets side by side. We matched and renamed cell types across datasets to unify labels and enable visual comparison. The aim was to assess whether chromatin accessibility and gene expression reflect similar relationships between immune cell types, and to what extent their clustering patterns overlap or diverge.
</p>

### **10. abT_Tact_gene_clusters**
<p align="justify">
Clusters of high-variance genes are assigned to αβ T or activated T cells based on mean expression, followed by GO enrichment to interpret their biological roles. Initial GO terms were broad, so results were refined by focusing on biological processes, selecting the most significant genes, and filtering out general terms.
</p>

### **11. regression_model**
<p align="justify">
This notebook implements a regression-based model to identify potential regulatory relationships between chromatin accessibility at peaks (CREs) and gene expression. Using LASSO regression, the model links each gene to its nearby peaks and selects only those with non-zero coefficients, which may have a regulatory influence. We computed R² scores to estimate how well each gene's expression is explained by nearby CREs, enabling downstream analysis of regulatory strength across genes.
</p>
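
To illustrate the idea of selecting peaks with non-zero LASSO coefficients for one gene, here is a small, dependency-free sketch. It is not the notebook's implementation, which may well rely on a library such as scikit-learn. The coordinate-descent routine `lasso_fit`, the penalty `alpha=0.1`, and the simulated data are all assumptions made for this example.

```python
import random


def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold`; exactly 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_fit(X, y, alpha, n_iter=200):
    """Coordinate-descent LASSO minimising (1/2n)*||y - Xw||^2 + alpha*||w||_1.

    X is a list of rows (samples x peaks); columns and target are centred,
    so no intercept is fitted. Returns (coefficients, training R^2).
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    cols = [[row[j] - col_means[j] for row in X] for j in range(p)]
    y_centred = [v - y_mean for v in y]

    w = [0.0] * p
    residual = y_centred[:]  # equals y_centred - Xw, with w = 0 initially
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            z = sum(c * c for c in col)
            if z == 0.0:
                continue  # constant column carries no information
            rho = sum(c * (r + w[j] * c) for c, r in zip(col, residual))
            new_w = soft_threshold(rho / n, alpha) / (z / n)
            delta = new_w - w[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
            w[j] = new_w

    ss_res = sum(r * r for r in residual)
    ss_tot = sum(v * v for v in y_centred)
    return w, 1.0 - ss_res / ss_tot


# Simulated example: 40 samples, 5 nearby peaks, only peaks 0 and 3 matter.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(40)]
y = [2.0 * row[0] - 1.5 * row[3] + rng.gauss(0, 0.1) for row in X]

coefs, r2 = lasso_fit(X, y, alpha=0.1)
selected_peaks = [j for j, c in enumerate(coefs) if c != 0.0]
print(selected_peaks, [round(c, 2) for c in coefs], round(r2, 3))
```

The sign of each retained coefficient indicates the direction of association, and the R² value summarises how much of the gene's expression variance the selected peaks explain on the training data.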

### **12. regression_vs_correlation**
<p align="justify">
This notebook compares the predictive power of correlation-based versus regression-based peak–gene associations. After filtering the peak–gene links for proximity and data availability, we visualized and quantified the overlap between methods. Additionally, we explored the direction of effect (positive/negative) of each CRE, separated them into putative activators and repressors, and analyzed whether clusters of peaks correlate with clusters of genes—potentially revealing coordinated regulatory programs.
</p>

### **13. enhancers_promoters_regression**
<p align="justify">
This notebook investigates whether promoters and enhancers differ in their influence on gene expression using the results from LASSO regression. Peaks were annotated as promoters or enhancers based on their genomic distance to transcription start sites. We compared the regression coefficients and R² values between the two groups, examined the number of target genes per CRE type, and visualized distributions to assess whether regulatory strength differs between enhancers and promoters. This analysis helped characterize the functional roles of different cis-regulatory element classes.
</p>
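
A distance-based annotation followed by a per-class summary might look like the sketch below. The 1,000 bp cutoff, the `annotate_cre` helper, and the example peak–gene links are hypothetical and chosen only for illustration; the cutoff actually used is defined in the notebook.

```python
from statistics import mean

# Hypothetical cutoff (bp): peaks within this distance of a TSS count as promoters.
PROMOTER_MAX_DISTANCE = 1_000


def annotate_cre(distance_to_tss, cutoff=PROMOTER_MAX_DISTANCE):
    """Label a peak as 'promoter' or 'enhancer' from its signed TSS distance."""
    return "promoter" if abs(distance_to_tss) <= cutoff else "enhancer"


# Made-up peak-gene links with LASSO coefficients (not real results).
links = [
    {"peak": "peak_1", "gene": "gene_A", "distance": 200, "coef": 0.8},
    {"peak": "peak_2", "gene": "gene_A", "distance": 15_000, "coef": 0.3},
    {"peak": "peak_3", "gene": "gene_B", "distance": -600, "coef": 0.5},
    {"peak": "peak_4", "gene": "gene_B", "distance": 40_000, "coef": -0.2},
    {"peak": "peak_5", "gene": "gene_C", "distance": -8_000, "coef": 0.1},
]

# Group links by CRE class.
by_class = {"promoter": [], "enhancer": []}
for link in links:
    by_class[annotate_cre(link["distance"])].append(link)

# Per-class summary: link count, distinct target genes, mean |coefficient|.
summary = {
    cre_class: {
        "n_links": len(group),
        "n_genes": len({link["gene"] for link in group}),
        "mean_abs_coef": mean(abs(link["coef"]) for link in group),
    }
    for cre_class, group in by_class.items()
}
print(summary)
```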

### **14. CREs_per_cell_lineage**
<p align="justify">
This notebook quantifies how many unique cis-regulatory elements (CREs) regulate genes in each immune cell lineage. By combining the LASSO regression output with cell-type-specific gene expression, we identified peaks contributing to gene regulation in either abT or T.act cells. We then compared the distributions of these lineage-specific CREs to assess whether distinct sets of regulatory elements are used in different T cell lineages.
</p>
