### Step 7 - Converting tab files to BED format
This notebook converts miRanda's tab-formatted outputs (produced by miRanda_output_formating_2025.01.13.sh) into BED files. The second sequence
column (Seq2) is parsed to extract the location of each target site.

Previous step: miRanda_output_formating_2025.01.13.sh (/home/administrator/Documents/Kaas/Venom_ncRNA_project/Scripts/miRanda/miranda_formating/miRanda_output_formating_2025.01.13.sh)

Next step: 3UTR_gtf_generation_2024-5-13.sh (/home/administrator/Documents/Kaas/Venom_ncRNA_project/Scripts/GTF_generation/2024-5-13/3UTR_gtf_generation_2024-5-13.sh)

5UTR_gtf_generation_2024-5-13.sh (/home/administrator/Documents/Kaas/Venom_ncRNA_project/Scripts/GTF_generation/2024-5-13/5UTR_gtf_generation_2024-5-13.sh)

Exon_gtf_generation_2024-5-13.sh (/home/administrator/Documents/Kaas/Venom_ncRNA_project/Scripts/GTF_generation/2024-5-13/Exon_gtf_generation_2024-5-13.sh)

In [4]:
# Import modules
import pandas as pd
import polars as pl
import os
from pathlib import Path

# Set the directory for all of the tab files
miranda_dir = Path('/home/administrator/Documents/Kaas/Venom_ncRNA_project/Results/miRanda/miRanda_2025-01-12')

# Create a list of the tab files to be converted to BED
miranda_tabs = list(miranda_dir.glob('*.tab'))

# Check if the files were found
if not miranda_tabs:
    print(f"Error: No miRanda output files found in {miranda_dir}")
else:
    print("Success: miRanda output files found. The following will be processed:")
    for file in miranda_tabs:
        print(file)

Success: miRanda output files found. The following will be processed:
/home/administrator/Documents/Kaas/Venom_ncRNA_project/Results/miRanda/miRanda_2025-01-12/Crotalus_viridis_reference_CDS_miranda_miRNA_targets.tab
/home/administrator/Documents/Kaas/Venom_ncRNA_project/Results/miRanda/miRanda_2025-01-12/consensus_CV1087_viridis_North_F_five_prime_utr_miranda_miRNA_targets.tab
/home/administrator/Documents/Kaas/Venom_ncRNA_project/Results/miRanda/miRanda_2025-01-12/consensus_CV0985_concolor_Other_F_CDS_miranda_miRNA_targets.tab
/home/administrator/Documents/Kaas/Venom_ncRNA_project/Results/miRanda/miRanda_2025-01-12/consensus_CV1082_viridis_South_M_three_prime_utr_miranda_miRNA_targets.tab
/home/administrator/Documents/Kaas/Venom_ncRNA_project/Results/miRanda/miRanda_2025-01-12/consensus_CV0857_viridis_North_M_three_prime_utr_miranda_miRNA_targets.tab
/home/administrator/Documents/Kaas/Venom_ncRNA_project/Results/miRanda/miRanda_2025-01-12/consensus_CV1082_viridis_South_M_CDS_miranda_

In [8]:
# Convert one miRanda tab-formatted output file into a BED-style data frame
def tab_to_bed(file_path):
    # Read the miRanda tab-formatted output into a Polars data frame
    df = pl.read_csv(file_path, separator="\t")

    # Seq2 is expected to look like 'chrom:start-end'; isolate the 'start-end' part once
    coords = pl.col('Seq2').str.split(':').list.get(1).str.split('-')

    # Format the data frame with Polars
    bed_df = (
        df.with_columns(
            # 'chrom': the part of Seq2 before the ':'
            pl.col('Seq2').str.split(':').list.get(0).alias('chrom'),

            # 'chromStart': the first coordinate after the ':' (kept as a string)
            coords.list.get(0).alias('chromStart'),

            # 'chromEnd': the second coordinate after the ':' (kept as a string)
            coords.list.get(1).alias('chromEnd'),
        )
        # Keep only the three required BED columns
        .select('chrom', 'chromStart', 'chromEnd')
    )
    return bed_df
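
To make the string handling in `tab_to_bed` concrete, here is a minimal plain-Python sketch of the same split logic applied to a single value. The helper name `parse_seq2` and the example identifier are hypothetical, and the `chrom:start-end` layout is an assumption inferred from the Polars expressions above rather than confirmed from the miRanda output.

```python
def parse_seq2(seq2):
    """Split a 'chrom:start-end' string into (chrom, start, end).

    Hypothetical helper that mirrors the Polars expressions in tab_to_bed.
    """
    # Mirrors .str.split(':').list.get(0) and .list.get(1)
    parts = seq2.split(':')
    chrom = parts[0]
    # Mirrors .str.split('-').list.get(0) and .list.get(1)
    coords = parts[1].split('-')
    return chrom, coords[0], coords[1]


# Made-up identifier, used only for illustration
example = parse_seq2('scaffold_1:1500-1522')
print(example)
```

As in the Polars version, the coordinates come back as strings. If a sequence name itself contained a `:`, index 1 would no longer be the coordinate part, so it may be worth checking a few Seq2 values by eye before relying on the split.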

In [9]:
# Loop over the tab files and write a BED file for each one
for tab_file in miranda_tabs:
    # Convert the tab file into a BED-formatted data frame
    mir_df = tab_to_bed(tab_file)

    # Build the output path: same directory and file name, with the '.tab' suffix swapped for '.bed'
    out_path = tab_file.with_suffix('.bed')

    # Save the data frame to the new file path.
    # The header row is kept here; BED has no column-header line, so set
    # include_header = False if a downstream tool rejects it.
    mir_df.write_csv(
        file = out_path,
        separator = '\t',
        include_header = True
    )
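
Once the files are written, a quick sanity check can catch malformed rows before the GTF generation steps. The following is a sketch only, using the standard library: the function name `check_bed`, its `has_header` parameter, and the demo file contents are all hypothetical, and it assumes the three-column, tab-separated layout with a header row produced by the loop above.

```python
import csv
import tempfile
from pathlib import Path


def check_bed(bed_path, has_header=True):
    """Return a list of problems found in a three-column, tab-separated BED file."""
    problems = []
    with open(bed_path, newline='') as handle:
        reader = csv.reader(handle, delimiter='\t')
        if has_header:
            next(reader, None)  # skip the header row written with include_header = True
        first_line = 2 if has_header else 1
        for line_no, row in enumerate(reader, start=first_line):
            if len(row) != 3:
                problems.append(f"line {line_no}: expected 3 columns, found {len(row)}")
                continue
            chrom, start, end = row
            if not (start.isdigit() and end.isdigit()):
                problems.append(f"line {line_no}: non-integer coordinates")
            elif int(start) >= int(end):
                problems.append(f"line {line_no}: start is not less than end")
    return problems


# Demonstrate on a small made-up file rather than the real results
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / 'demo.bed'
    demo.write_text('chrom\tchromStart\tchromEnd\n'
                    'scaffold_1\t1500\t1522\n'
                    'scaffold_2\t90\t40\n')
    demo_problems = check_bed(demo)
    print(demo_problems)
```

An empty list means no problems were detected. Rows where the start is not less than the end are reported rather than fixed, since the right correction depends on how the coordinates in Seq2 were defined upstream.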