# Generate Flanking Regions for Common ChIP-seq Peaks

This notebook takes one or more “common peaks” BED files and extends each peak on both sides by several flank sizes (100–500 bp) using `bedtools slop`.  
It then uses `bedtools subtract` to remove the original peak intervals, so that only the flanking regions remain.  

**Key steps:**
1. Import modules and define a helper to run shell commands.  
2. Configure datasets and desired flank sizes.  
3. Loop over each dataset and flank size:  
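   - run `bedtools slop -b <size>` to extend each peak by `size` bp on both sides (clipped to the chromosome ends listed in the genome file)  
   - run `bedtools subtract` to remove the bases covered by any original peak, leaving up to two flank intervals per peak  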
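
The geometry of these two steps for a single peak can be sketched in plain Python. The functions `slop` and `subtract` below are hypothetical stand-ins written for illustration, not the bedtools implementation. They assume 0-based, half-open BED coordinates and subtract only one interval. The real `bedtools subtract` removes every overlapping interval in `-b`, so a flank can be trimmed further when a neighboring peak falls inside it.

```python
# Hypothetical illustration of `bedtools slop -b n` followed by `bedtools subtract`
# for ONE peak. BED coordinates are 0-based, half-open: [start, end).

def slop(start, end, n, chrom_size):
    """Extend [start, end) by n bp on both sides, clipped to [0, chrom_size]."""
    return max(0, start - n), min(chrom_size, end + n)


def subtract(a, b):
    """Remove interval b from interval a; return the remaining pieces of a."""
    a_start, a_end = a
    b_start, b_end = b
    pieces = []
    if b_start > a_start:  # part of a to the left of b survives
        pieces.append((a_start, min(a_end, b_start)))
    if b_end < a_end:  # part of a to the right of b survives
        pieces.append((max(a_start, b_end), a_end))
    return pieces


peak = (1_000, 1_250)
extended = slop(*peak, n=100, chrom_size=10_000)
print(extended)                   # (900, 1350)
print(subtract(extended, peak))   # [(900, 1000), (1250, 1350)]

# Near the chromosome start, the upstream flank is clipped at 0
near_start = (50, 200)
print(subtract(slop(*near_start, n=100, chrom_size=10_000), near_start))
# [(0, 50), (200, 300)]
```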


In [None]:
# ─── Cell 1: Imports & Command Runner Helper ───────────────────────────────────
# - subprocess: used to execute shell commands from Python
import subprocess

# run_cmd: helper function to execute a shell command and report its output or any errors
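def run_cmd(cmd):
    """
    Execute a shell command, print the command being run, and then print
    stdout on success. On failure, print stderr and raise, so that a later
    step (e.g. `bedtools subtract`) does not run on a missing or empty file.

    Args:
        cmd (str): The shell command to execute.

    Raises:
        subprocess.CalledProcessError: If the command exits with a non-zero status.
    """
    # Print the command for logging
    print(f"Running:\n{cmd}")
    process = subprocess.run(
        cmd,
        shell=True,
        capture_output=True,
        text=True
    )
    if process.returncode != 0:
        # On error, print the stderr stream and stop the calling loop
        print(f"Error:\n{process.stderr}")
        raise subprocess.CalledProcessError(
            process.returncode, cmd, output=process.stdout, stderr=process.stderr
        )
    # On success, print the stdout stream (usually empty, since output is redirected to a file)
    print(f"✅ Done:\n{process.stdout}")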
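
`run_cmd` relies on `shell=True` so that the `>` redirection in the bedtools commands works. A possible alternative, sketched below under the assumption that `bedtools` is on `PATH`, passes the arguments as a list and lets Python write stdout to the output file. `run_to_file` is a hypothetical name, not part of the original notebook. It also deletes the partial output file when the command fails, so no empty BED is left behind.

```python
import os
import shlex
import subprocess


def run_to_file(args, out_path):
    """Run `args` (a list of strings, no shell) and write its stdout to `out_path`.

    On a non-zero exit status, print stderr, delete the partial output file
    and raise subprocess.CalledProcessError so the calling loop stops.
    """
    print(f"Running:\n{shlex.join(args)} > {out_path}")
    with open(out_path, "w") as out:
        result = subprocess.run(args, stdout=out, stderr=subprocess.PIPE, text=True)
    if result.returncode != 0:
        print(f"Error:\n{result.stderr}")
        os.remove(out_path)  # do not leave an empty or truncated BED behind
        raise subprocess.CalledProcessError(result.returncode, args, stderr=result.stderr)
    print("Done")


# Example call (BED_ORIG, GENOME, size and slop_file as defined in the cells below):
# run_to_file(["bedtools", "slop", "-i", BED_ORIG, "-g", GENOME, "-b", str(size)], slop_file)
```

Passing a list instead of a shell string also means paths containing spaces or shell metacharacters need no extra quoting.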


In [3]:
# ─── Cell 2: Configure Paths & Generate Flanking Regions ────────────────────
# - Specify the original “common peaks” BED file
# - Specify the genome chromosome sizes file (for bedtools slop)
# - Specify the output directory for all generated files

# Input BED file of GPS2–CTCF common peaks
BED_ORIG = "/projectnb/perissilab/Xinyu/GPS2_CHIPseq/Silencing/GPS2/results/annotation/sictl_filtered_by_GPS2_CTCF_common.bed"

# Chromosome sizes file for mm39 genome
GENOME = "/projectnb/perissilab/Xinyu/GPS2_CHIPseq/Adapters_and_Annotations/mm39.genome"

# Directory where output BEDs will be written
OUTDIR = "/projectnb/perissilab/Xinyu/GPS2_CHIPseq/Silencing/GPS2/results/annotation"

# List of flank sizes (in base pairs) to extend on both sides of each peak
sizes = [100, 200, 300, 400, 500]

# Loop over each flank size to:
# 1) extend original peaks by `size` bp using bedtools slop
# 2) subtract the original peaks to retain only the new flanking regions
for size in sizes:
    print(f"=== Processing {size}bp ===")

    # 1) Create extended peaks
    slop_file = f"{OUTDIR}/sictl_filtered_by_GPS2_CTCF_common_slop{size}bp.bed"
    cmd_slop = f"bedtools slop -i {BED_ORIG} -g {GENOME} -b {size} > {slop_file}"
    run_cmd(cmd_slop)

    # 2) Subtract original peaks to get only the flanks
    flank_file = f"{OUTDIR}/sictl_filtered_by_GPS2_CTCF_common_slop{size}bpflanks_only.bed"
    cmd_subtract = f"bedtools subtract -a {slop_file} -b {BED_ORIG} > {flank_file}"
    run_cmd(cmd_subtract)
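
A quick sanity check on the results: after subtraction, each flank lies entirely on one side of its peak, within the `size` bp added by `slop`, so no flank interval should be longer than `size` bp. The helper below is a hypothetical addition, not part of the original notebook. It reads only columns 2 and 3 of each BED line and skips blank, `#`, `track` and `browser` lines.

```python
def check_flank_lengths(flank_path, size):
    """Return the number of flank intervals; raise if any is empty or longer than `size` bp."""
    n = 0
    with open(flank_path) as fh:
        for line_no, line in enumerate(fh, start=1):
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue  # skip blank, comment and header lines
            fields = line.rstrip("\n").split("\t")
            start, end = int(fields[1]), int(fields[2])
            length = end - start
            if not 0 < length <= size:
                raise ValueError(
                    f"{flank_path}:{line_no}: flank length {length} not in (0, {size}]"
                )
            n += 1
    return n


# Example (Cell 2 naming):
# for size in sizes:
#     flank_file = f"{OUTDIR}/sictl_filtered_by_GPS2_CTCF_common_slop{size}bpflanks_only.bed"
#     print(size, check_flank_lengths(flank_file, size))
```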


In [1]:
# ─── Cell 3: ATF4–CTCF Common Peaks Flank Generation ─────────────────────────
# This cell generates flanking regions for the ATF4–CTCF common peaks BED file.
# Steps for each flank size:
# 1) Extend the original peaks by `size` bp using `bedtools slop`
# 2) Subtract the original peaks to retain only the new flanking regions

# Input BED file of ATF4–CTCF common promoter peaks
BED_ORIG = "/projectnb/perissilab/Xinyu/GPS2_CHIPseq/ATF4_3T3L1/results/annotation/ATF4_filtered_by_ATF4_CTCF_common_genes_promoter_only.bed"

# Chromosome sizes file for the mm39 genome (required by bedtools slop)
GENOME = "/projectnb/perissilab/Xinyu/GPS2_CHIPseq/Adapters_and_Annotations/mm39.genome"

# Output directory for all generated BED files
OUTDIR = "/projectnb/perissilab/Xinyu/GPS2_CHIPseq/ATF4_3T3L1/results/annotation"

# List of flank sizes (in base pairs)
sizes = [100, 200, 300, 400, 500]

for size in sizes:
    print(f"=== Processing {size}bp ===")

    # 1) Create extended peaks file
    slop_file = f"{OUTDIR}/ATF4_filtered_by_ATF4_CTCF_common_slop{size}bp.bed"
    cmd_slop = f"bedtools slop -i {BED_ORIG} -g {GENOME} -b {size} > {slop_file}"
    run_cmd(cmd_slop)

    # 2) Subtract original peaks to get only flank regions
    flank_file = f"{OUTDIR}/ATF4_filtered_by_ATF4_CTCF_common_slop{size}bpflanks_only.bed"
    cmd_subtract = f"bedtools subtract -a {slop_file} -b {BED_ORIG} > {flank_file}"
    run_cmd(cmd_subtract)


=== Processing 100bp ===


NameError: name 'run_cmd' is not defined
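
The `NameError` above means this cell ran in a fresh kernel (execution count `[1]`) before Cell 1 had defined `run_cmd`; running Cell 1 first resolves it. Because Cells 2–4 repeat the same loop and change only the input BED, output directory and file prefix, one option is to factor the loop into a function. The names `build_flank_commands` and `make_flanks` are hypothetical; the command strings and file-name pattern are copied from the cells in this notebook.

```python
import os


def build_flank_commands(bed_orig, genome, outdir, prefix, sizes):
    """Return one (size, slop_file, cmd_slop, flank_file, cmd_subtract) tuple per size.

    Output names follow the notebook's pattern:
    {prefix}_slop{size}bp.bed and {prefix}_slop{size}bpflanks_only.bed
    """
    jobs = []
    for size in sizes:
        slop_file = os.path.join(outdir, f"{prefix}_slop{size}bp.bed")
        flank_file = os.path.join(outdir, f"{prefix}_slop{size}bpflanks_only.bed")
        # 1) extend each peak by `size` bp on both sides
        cmd_slop = f"bedtools slop -i {bed_orig} -g {genome} -b {size} > {slop_file}"
        # 2) remove the original peaks, keeping only the flanks
        cmd_subtract = f"bedtools subtract -a {slop_file} -b {bed_orig} > {flank_file}"
        jobs.append((size, slop_file, cmd_slop, flank_file, cmd_subtract))
    return jobs


def make_flanks(bed_orig, genome, outdir, prefix, sizes, runner):
    """Run the slop/subtract pair for every size; `runner` executes one shell command."""
    for size, _slop_file, cmd_slop, _flank_file, cmd_subtract in build_flank_commands(
        bed_orig, genome, outdir, prefix, sizes
    ):
        print(f"=== Processing {size}bp ===")
        runner(cmd_slop)
        runner(cmd_subtract)
```

With this in place, this cell would reduce to `make_flanks(BED_ORIG, GENOME, OUTDIR, "ATF4_filtered_by_ATF4_CTCF_common", sizes, runner=run_cmd)`. The prefixes matching the other cells are `sictl_filtered_by_GPS2_CTCF_common` (Cell 2) and `gps2_day6_filtered_by_CTCF_GPS2_d6_common_genes` (Cell 4).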

In [3]:
# ─── Cell 4: GPS2 Day6–CTCF Common Peaks Flank Generation ────────────────────
# This cell generates flanking regions for the GPS2 day6–CTCF common promoter peaks.
# For each flank size:
# 1) Extend the original peaks by `size` bp using `bedtools slop`
# 2) Subtract the original peaks to retain only the new flanking regions

# Input BED file of GPS2 day6–CTCF common promoter peaks
BED_ORIG = "/projectnb/perissilab/Xinyu/GPS2_CHIPseq/Adipocyte_differentiation/GPS2/results/annotation/gps2_day6_filtered_by_CTCF_GPS2_d6_common_genes_promoter_only.bed"

# Chromosome sizes file for the mm39 genome (required by bedtools slop)
GENOME = "/projectnb/perissilab/Xinyu/GPS2_CHIPseq/Adapters_and_Annotations/mm39.genome"

# Output directory for all generated BED files
OUTDIR = "/projectnb/perissilab/Xinyu/GPS2_CHIPseq/Adipocyte_differentiation/GPS2/results/annotation"

# List of flank sizes (in base pairs) to extend on both sides of each peak
sizes = [100, 200, 300, 400, 500]

for size in sizes:
    print(f"=== Processing {size}bp ===")

    # 1) Create extended peaks file
    slop_file = f"{OUTDIR}/gps2_day6_filtered_by_CTCF_GPS2_d6_common_genes_slop{size}bp.bed"
    cmd_slop = f"bedtools slop -i {BED_ORIG} -g {GENOME} -b {size} > {slop_file}"
    run_cmd(cmd_slop)

    # 2) Subtract original peaks to keep only the flank regions
    flank_file = f"{OUTDIR}/gps2_day6_filtered_by_CTCF_GPS2_d6_common_genes_slop{size}bpflanks_only.bed"
    cmd_subtract = f"bedtools subtract -a {slop_file} -b {BED_ORIG} > {flank_file}"
    run_cmd(cmd_subtract)


=== Processing 100bp ===
Running:
bedtools slop -i /projectnb/perissilab/Xinyu/GPS2_CHIPseq/Adipocyte_differentiation/GPS2/results/annotation/gps2_day6_filtered_by_CTCF_GPS2_d6_common_genes_promoter_only.bed -g /projectnb/perissilab/Xinyu/GPS2_CHIPseq/Adapters_and_Annotations/mm39.genome -b 100 > /projectnb/perissilab/Xinyu/GPS2_CHIPseq/Adipocyte_differentiation/GPS2/results/annotation/gps2_day6_filtered_by_CTCF_GPS2_d6_common_genes_slop100bp.bed
✅ Done:

Running:
bedtools subtract -a /projectnb/perissilab/Xinyu/GPS2_CHIPseq/Adipocyte_differentiation/GPS2/results/annotation/gps2_day6_filtered_by_CTCF_GPS2_d6_common_genes_slop100bp.bed -b /projectnb/perissilab/Xinyu/GPS2_CHIPseq/Adipocyte_differentiation/GPS2/results/annotation/gps2_day6_filtered_by_CTCF_GPS2_d6_common_genes_promoter_only.bed > /projectnb/perissilab/Xinyu/GPS2_CHIPseq/Adipocyte_differentiation/GPS2/results/annotation/gps2_day6_filtered_by_CTCF_GPS2_d6_common_genes_slop100bpflanks_only.bed
✅ Done:

=== Processing 200bp =