# **Brain Polycomb project lab notebook**

## Links:

- **Telegram chat**: `https://web.telegram.org/a/#-4738561813`
- **Github repo**: `https://github.com/Ivan-chich/VAU1-Brain-Polycomb-ncRNA-2025`
- **Kanban board**: `https://app.holst.so/share/b/8b0c40b1-920a-461f-83ad-9014988b04e6`
- **Shared materials folder**: `https://msuru-my.sharepoint.com/:f:/g/personal/i_v_chicherin_g_lecturer_msu_ru/ErGoHKfAJ9FIq5EO8DY3Z_YBtj9RGZTYBWhUgU6FtfvR3Q`

# **Introduction**

*Date: 30.01.2025*

We start with the paper:

Pletenev, I. A., Bazarevich, M., Zagirova, D. R., Kononkova, A. D., Cherkasov, A. V., Efimova, O. I., Tiukacheva, E. A., Morozov, K. V., Ulianov, K. A., Komkov, D., Tvorogova, A. V., Golimbet, V. E., Kondratyev, N. V., Razin, S. V., Khaitovich, P., Ulianov, S. V., & Khrameeva, E. E. (2024). Extensive long-range polycomb interactions and weak compartmentalization are hallmarks of human neuronal 3D genome. Nucleic acids research, 52(11), 6234–6252. https://doi.org/10.1093/nar/gkae271

In particular, we focus on the *neuron-specific dots* summarized in `polycomb_dots_hand_coords_update.tsv`, which lists all gene contacts annotated in these dots.

# **Plan:**

1. Extract **gene set** from starting dataset
2. Extract all transcription factors (**TF set**)
3. Annotate target genes for all TFs (**Target set**)
4. Build **GO enrichment** plots for target genes
5. Analyse **co-expression** for TF set and Target set

# 1. **Building total gene set**

*Date: 11.02.2025*

- `polycomb_dots_hand_coords_update.tsv` was inspected in RStudio
- The `parce_tsv.py` script was used to build the gene set
- Total number of genes: 482
- The result was saved as **polycomb gene_list**
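
The gene-set extraction can be sketched roughly as follows. This is not the actual `parce_tsv.py`: the column names `genes_left` and `genes_right`, the comma separator, and the toy rows are all assumptions made for illustration, since the real layout of the TSV is not described here.

```python
import csv
import io


def collect_genes(tsv_text, gene_columns, sep=","):
    """Return a sorted list of unique gene symbols found in the given columns."""
    genes = set()
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        for column in gene_columns:
            # A cell may hold several symbols; empty cells are skipped.
            for symbol in (row.get(column) or "").split(sep):
                symbol = symbol.strip()
                if symbol:
                    genes.add(symbol)
    return sorted(genes)


# Toy input: column names and rows are hypothetical, not taken from the real file.
example = (
    "chrom\tgenes_left\tgenes_right\n"
    "chr1\tFOXG1,EMX2\tPAX6\n"
    "chr2\tPAX6\t\n"
)
gene_list = collect_genes(example, ["genes_left", "genes_right"])
print(len(gene_list), gene_list)
```

Using a set removes genes that appear in more than one dot, so the final count reflects unique genes only.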

# 2. **Building TF gene subset**

*Date: 19.02.2025*

### We solve this task with **GO annotations**

1. Visit https://go.princeton.edu/

2. Open the **Generic GO Term Finder** tool

3. Upload `polycomb gene_list` from previous step

4. Choose options:

    - Ontology Aspects: Function
    - Choose annotation: GOA + HGNC Xrefs - H. sapiens (Human)
    - Choose Your Output Format: HTML table + GO tree view images

5. Run GO terms annotation

6. Save results in **GO_annotations/GO_term_finder** folder

7. Open the **Generic GO Term Mapper** tool

8. Choose options:

    - Ontology Aspects: Function
    - Organism: Homo sapiens (GOA @EBI + Ensembl)
    - Choose Your Output Format: HTML table

9. Run GO terms annotation

10. Save results in **GO_annotations/GO_term_mapper** folder

11. Run the `GOtermsFinderFunction.py` script to select the GO terms of interest

12. Results are saved as `GO_TF_set.txt`
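
The term-selection step can be illustrated with a minimal sketch. It assumes the GO output has already been parsed into a mapping from GO term name to gene symbols; the function name `select_tf_genes`, the substring matching, and the toy annotations are hypothetical and may differ from what `GOtermsFinderFunction.py` actually does.

```python
def select_tf_genes(term_to_genes,
                    keywords=("dna binding", "transcription factor activity")):
    """Collect genes annotated with any GO term whose name contains a keyword."""
    selected = set()
    for term, genes in term_to_genes.items():
        if any(keyword in term.lower() for keyword in keywords):
            selected.update(genes)
    return sorted(selected)


# Toy annotations for illustration only (not real GO Term Mapper output).
annotations = {
    "DNA binding": ["PAX6", "FOXG1"],
    "DNA-binding transcription factor activity": ["FOXG1", "EMX2"],
    "kinase activity": ["CDK5"],
}
tf_set = select_tf_genes(annotations)
print(tf_set)
```

Substring matching is permissive: it also keeps broader terms that merely contain "DNA binding", so the resulting list is worth checking by eye.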

### 197 genes were annotated as TFs with GO terms “DNA binding” and “transcription factor activity”

# 3. **Building TF Partner gene set**

### Options:

- UniProt https://www.uniprot.org
- BioGRID https://thebiogrid.org
- STRING https://string-db.org
- KEGG https://www.kegg.jp
- NCBI https://ncbi.nlm.nih.gov/gene

### More options:

- TRANSFAC: https://genexplain.com/transfac-product/
- ChIP-Atlas: https://chip-atlas.org/
- Ensembl: https://www.ensembl.org/index.html
- GeneCards: https://www.genecards.org/
- GTEX: https://www.gtexportal.org/home/multiGeneQueryPage

### Bioconductor R packages:

- CoRegNet http://bioconductor.riken.jp/packages/3.1/bioc/html/CoRegNet.html
- FGNet https://bioconductor.org/packages/release/bioc/html/FGNet.html

**Discussion on this topic:** https://www.biostars.org/p/2148/

### **TFTenricher**: doesn't work

- Guide: https://github.com/rasma774/Tftenricher
- This tool could not be used: I was unable to build a working environment for it.

### **Tftargets**: this is our choice

1. Run `GOtermsFinderFunction.py` (read the comments in the script first). It prints the TF list

2. Paste the printed TF list into `rtargets.R`

3. Run the `rtargets.R` script (read the comments in the script first). It produces the TF target list. Do not clear the global environment in RStudio: the data will be required by the next script!

4. Target set is saved in `tf_targets.csv`
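
The lookup itself is done in R by `rtargets.R`. The Python sketch below only illustrates the idea of collecting targets for a TF list from a TF-to-targets mapping. The function `collect_targets`, the toy mapping, and the `tf`/`target` column names are assumptions; the TF-target pairs are invented and are not results.

```python
import csv
import io


def collect_targets(tf_list, tf_to_targets):
    """Return (tf, target) pairs for every TF that has entries in the mapping."""
    pairs = []
    for tf in tf_list:
        for target in sorted(tf_to_targets.get(tf, [])):
            pairs.append((tf, target))
    return pairs


# Invented mapping for illustration; a real database would be loaded instead.
toy_db = {"PAX6": ["SOX2", "NEUROG2"], "FOXG1": ["SOX2"]}
pairs = collect_targets(["PAX6", "FOXG1", "EMX2"], toy_db)
unique_targets = sorted({target for _, target in pairs})

# Write the pairs as CSV text (an in-memory buffer stands in for a file).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["tf", "target"])
writer.writerows(pairs)
print(buffer.getvalue())
```

TFs without any entry (here `EMX2`) simply contribute no rows, and a target shared by several TFs is counted once in `unique_targets`.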

### 664 target genes were found with “Tftargets”

# 4. **GO enrichment graphs**

Run `go_enrichment.R` script
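
For orientation, GO enrichment is commonly computed as an over-representation (hypergeometric) test. Whether `go_enrichment.R` uses exactly this test is an assumption not confirmed here. The sketch below computes the one-sided p-value for a single GO term; all numbers are toy values.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) when drawing n genes from N, of which K carry the GO term.

    k: genes in our set annotated with the term
    n: size of our gene set
    K: genes in the background annotated with the term
    N: size of the background
    """
    upper = min(n, K)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Toy numbers: 2 of our 4 genes carry a term held by 5 of 20 background genes.
p_value = hypergeom_enrichment_p(k=2, n=4, K=5, N=20)
print(round(p_value, 4))
```

In a real analysis this p-value is computed for every GO term and then corrected for multiple testing.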

## 4.1. Biological process

![alt text](../data/GO_bar_BP.png "GO_BP")

## 4.2. Subcellular localization

![alt text](../data/GO_bar_CC.png "GO_CC")

## 4.3. Molecular function

![alt text](../data/GO_bar_MF.png "GO_MF")

# 5. **Co-expression of Polycomb-regulated genes**

- t-SNE and UMAP data: `PcG_coexpression.ipynb`
- Co-expression: `CS_CORE_PcG.ipynb`
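
As a simple point of reference for what co-expression measures, the sketch below computes plain Pearson correlation between expression profiles. This is deliberately not the method used in `CS_CORE_PcG.ipynb`; the gene names and expression values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical expression values across four samples.
expression = {
    "TF_A": [1.0, 2.0, 3.0, 4.0],
    "TARGET_B": [2.0, 4.0, 6.0, 8.0],
    "TARGET_C": [4.0, 3.0, 2.0, 1.0],
}
r_ab = pearson(expression["TF_A"], expression["TARGET_B"])
r_ac = pearson(expression["TF_A"], expression["TARGET_C"])
print(r_ab, r_ac)
```

A value near 1 indicates genes that rise and fall together across samples, while a value near -1 indicates opposite behaviour.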