# Loop Annotation

## Input files

Polaris requires a `.mcool` file as input. You can obtain `.mcool` files in the following ways:

### 1. Download from the 4DN Database

- Visit the [4DN Data Portal](https://data.4dnucleome.org/).
- Search for and download `.mcool` files suitable for your study.

### 2. Convert Files Using cooler

If you have data in formats such as `.pairs` or `.cool`, you can convert them to `.mcool` format using the Python library [cooler](https://cooler.readthedocs.io/en/latest/index.html). Follow these steps:

- **Install cooler**

  Ensure you have installed cooler using the following command:
  ```bash
  pip install cooler
  ```
- **Convert .pairs to .cool**

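   If you are starting with a `.pairs` file (e.g., normalized contact data with columns for chrom1, pos1, chrom2, pos2), use this command to create a `.cool` file:
   ```bash
   cooler cload pairs --assembly <genome_version> -c1 <chrom1_col> -p1 <pos1_col> -c2 <chrom2_col> -p2 <pos2_col> <chromsizes_file>:<resolution> <pairs_file> <resolution>.cool
   ```
   Replace `<genome_version>` with the appropriate genome assembly (e.g., hg38), `<resolution>` with the desired bin size in base pairs, and `<chromsizes_file>` with a chromosome sizes file for that assembly. The `-c1`, `-p1`, `-c2`, and `-p2` options take the one-based column numbers of the chrom1, pos1, chrom2, and pos2 fields in the `.pairs` file (2, 3, 4, and 5 in the standard 4DN pairs layout).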
- **Generate a Multiresolution .mcool File**

   To convert a single-resolution .cool file into a multiresolution .mcool file, use the following command:

   ```bash
   cooler zoomify <input.cool>
   ```

The resulting `.mcool` file can be directly used as input for Polaris.
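
The conversion steps above can be scripted. The sketch below is only an illustration: it assumes a 4DN-style `.pairs` header containing a `#columns:` line, and that `cooler cload pairs` takes one-based column numbers followed by `<chromsizes_file>:<resolution>`, the pairs file, and the output path. The helper names `pairs_column_numbers` and `build_cooler_commands`, and the file names `sample.pairs` and `hg38.chrom.sizes`, are hypothetical. It prints the two commands rather than running them.

```python
import shlex


def pairs_column_numbers(header_line, names=("chr1", "pos1", "chr2", "pos2")):
    """Return one-based column numbers for the given field names.

    `header_line` is assumed to be a '#columns:' line from a .pairs header.
    """
    prefix = "#columns:"
    if not header_line.startswith(prefix):
        raise ValueError("expected a '#columns:' header line")
    fields = header_line[len(prefix):].split()
    return [fields.index(name) + 1 for name in names]


def build_cooler_commands(pairs_file, chromsizes_file, resolution, assembly, columns):
    """Build the `cooler cload pairs` and `cooler zoomify` argument lists."""
    c1, p1, c2, p2 = columns
    cool_path = f"{resolution}.cool"
    cload = [
        "cooler", "cload", "pairs", "--assembly", assembly,
        "-c1", str(c1), "-p1", str(p1), "-c2", str(c2), "-p2", str(p2),
        f"{chromsizes_file}:{resolution}", pairs_file, cool_path,
    ]
    zoomify = ["cooler", "zoomify", cool_path]
    return cload, zoomify


# Hypothetical header line and file names, for illustration only.
columns = pairs_column_numbers("#columns: readID chr1 pos1 chr2 pos2 strand1 strand2")
for command in build_cooler_commands("sample.pairs", "hg38.chrom.sizes", 5000, "hg38", columns):
    print(shlex.join(command))
```

If your `.pairs` file names its columns differently (for example `chrom1` instead of `chr1`), pass the matching names through the `names` argument.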

## Loop Annotation by Polaris

Polaris provides two methods to generate loop annotations for an input `.mcool` file. Both methods yield consistent loop results.

### Method 1: polaris loop pred

This is the simplest approach, allowing you to directly predict loops in a single step.
The command below will take approximately 30 seconds, depending on your device, to identify loops in GM12878 data (250M valid read pairs).

```bash
polaris loop pred --chrom chr15,chr16,chr17 -i GM12878_250M.bcool -o GM12878_250M_chr151617_loops.bedpe
```

Output:

```
use gping cuda:0

Analysing chroms: ['chr15', 'chr16', 'chr17']

[analyzing chr17]: 100%|██████████| 3/3 [00:24<00:00,  8.31s/it]
[Runing clustering on chr15]: 100%|██████████| 3/3 [00:01<00:00,  1.87it/s]

1830 loops saved to  GM12878_250M_chr151617_loops.bedpe
```


> **Note:** If you encounter a `CUDA OUT OF MEMORY` error, please:
> - Check your GPU's status and available memory.
> - Reduce the `--batchsize` parameter. (The default value of 128 requires approximately 36 GB of CUDA memory. Setting it to 24 reduces the requirement to less than 10 GB.)
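
The advice in the note can be automated by retrying with progressively smaller batch sizes. This is a rough sketch under several assumptions that are not confirmed by the text above: that `--batchsize` is passed directly to `polaris loop pred`, that an out-of-memory failure produces a non-zero exit code, and that the message on stderr contains "out of memory" in some capitalization. The function names and the intermediate value 64 are hypothetical.

```python
import subprocess


def run_polaris(batch_size):
    """Run `polaris loop pred` once and return (exit_code, stderr_text)."""
    command = [
        "polaris", "loop", "pred",
        "--chrom", "chr15,chr16,chr17",
        "--batchsize", str(batch_size),  # assumed placement of the flag
        "-i", "GM12878_250M.bcool",
        "-o", "GM12878_250M_chr151617_loops.bedpe",
    ]
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode, result.stderr


def predict_with_fallback(batch_sizes=(128, 64, 24), runner=run_polaris):
    """Try each batch size in turn and return the first one that succeeds."""
    for batch_size in batch_sizes:
        exit_code, stderr_text = runner(batch_size)
        if exit_code == 0:
            return batch_size
        # Only retry on memory errors; surface anything else immediately.
        if "out of memory" not in stderr_text.lower():
            raise RuntimeError(f"polaris failed for another reason:\n{stderr_text}")
    raise RuntimeError("out of memory at every batch size tried")
```

Calling `predict_with_fallback()` returns the batch size that worked, which you can then reuse for later runs on the same GPU.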

### Method 2: polaris loop score and polaris loop pool

This method involves two steps: generating loop scores for each pixel in the contact map and clustering these scores to call loops.


**Step 1: Generate Loop Scores**

Run the following command to calculate the loop score for each pixel in the input contact map and save the result in `GM12878_250M_chr151617_loop_score.bedpe`.

```bash
polaris loop score --chrom chr15,chr16,chr17 -i GM12878_250M.bcool -o GM12878_250M_chr151617_loop_score.bedpe
```

Output:

```
use gping cuda:0

Analysing chroms: ['chr15', 'chr16', 'chr17']

[analyzing chr17]: 100%|██████████| 3/3 [00:34<00:00, 11.37s/it]
```


**Step 2: Call Loops from Loop Candidates**

Use the following command to call loops by clustering the scores in the generated loop score file.

```bash
polaris loop pool -i GM12878_250M_chr151617_loop_score.bedpe -o GM12878_250M_chr151617_loops_method2.bedpe
```

Output:

```
[Runing clustering on chr16]: 100%|██████████| 3/3 [00:01<00:00,  1.72it/s]

1830 loops saved to  GM12878_250M_chr151617_loops_method2.bedpe
```


Both methods yield the same number of loops (1830).
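
Equal counts alone do not show that the two files contain the same loops. The sketch below compares the loop anchors directly. It assumes the outputs follow the usual BEDPE convention, with the first six columns being chrom1, start1, end1, chrom2, start2, end2; the exact column layout of Polaris output is not stated here, so check it against your files. The function names are hypothetical.

```python
def read_loop_anchors(path):
    """Return the set of (chrom1, start1, end1, chrom2, start2, end2) tuples in a BEDPE file."""
    anchors = set()
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            # Skip blank lines, comments, and any header row with non-numeric coordinates.
            if len(fields) < 6 or line.startswith("#") or not fields[1].isdigit():
                continue
            anchors.add(tuple(fields[:6]))
    return anchors


def compare_loop_files(path_a, path_b):
    """Count loops shared by both files and loops unique to each."""
    loops_a = read_loop_anchors(path_a)
    loops_b = read_loop_anchors(path_b)
    return {
        "shared": len(loops_a & loops_b),
        "only_a": len(loops_a - loops_b),
        "only_b": len(loops_b - loops_a),
    }


# Example call with the two output files produced above:
# compare_loop_files("GM12878_250M_chr151617_loops.bedpe",
#                    "GM12878_250M_chr151617_loops_method2.bedpe")
```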

We can then perform [Aggregate Peak Analysis](https://github.com/ai4nucleome/Polaris/blob/master/example/APA/APA.ipynb) to visualize these results.