# Convert MAF file to a Bedgraph file
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Eitan177/EitanAmrom/blob/main/makebaf_bedgraphfile.ipynb)

This notebook processes MAF (Mutation Annotation Format) files to generate a bedgraph file representing the B-allele frequency (BAF) at each variant position.

**1. Libraries and initial setup:**

* `import pandas as pd`: Imports the pandas library for data manipulation.
* `import os`: Imports the os module for interacting with the operating system, specifically for file operations.
* `import shutil`: Imports the shutil module, used in the final cell to move the generated files.

**2. Function Definition (`create_bedgraph`)**

* Encapsulates the entire process into a reusable function.
* Takes the input MAF file path and the output bedgraph file path as arguments.


**2A. Reading the MAF file:**

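* The code first attempts to read the MAF file with `pd.read_csv`.
* `sep='\t'`: Specifies that the file is tab-separated.
* `skiprows=1`: Skips the first line of the file (presumably a version or comment line rather than the column header).
* `header=0`: Uses the first remaining line, i.e. the second line of the file, as the column header.
* **Error Handling:** The `try...except` block catches errors and prints a message instead of raising:
    * `FileNotFoundError`: If the specified file does not exist.
    * `pd.errors.ParserError`: If there's an issue parsing the file (e.g., incorrect format).
    * `ValueError`: If any of the required columns (`DP4`, `Start_Position`, `Hugo_Symbol`, `Chromosome`) is missing.
    * `Exception`: Catches any other unexpected errors.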
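
The interaction between `skiprows=1` and `header=0` can be checked on a small in-memory file. The sketch below is illustrative only: the two records and the `#version 2.4` first line are made-up sample data, assuming the real files begin with a single line of that kind.

```python
import io
import pandas as pd

# Hypothetical two-record MAF: one leading version line, then the real header row.
maf_text = (
    "#version 2.4\n"
    "Hugo_Symbol\tChromosome\tStart_Position\tDP4\n"
    "TP53\tchr17\t7675088\t10,12,9,11\n"
    "EGFR\tchr7\t55191822\t20,18,0,1\n"
)

# skiprows=1 drops the version line; header=0 then takes the next line as column names.
df = pd.read_csv(io.StringIO(maf_text), sep="\t", skiprows=1, header=0)

print(list(df.columns))
print(len(df))
```

If a file has no such leading line, `skiprows=1` would consume the real header and the first data row would be misread as column names, which the missing-column check in `create_bedgraph` would then report.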


**2B. Data Processing:**

* The code adds an `AF` (allele frequency) column to the DataFrame, holding the BAF computed from the `DP4` column.
* The `DP4` column likely contains four comma-separated read counts. In the convention used by samtools/bcftools these are the reference-forward, reference-reverse, alternate-forward, and alternate-reverse counts.
* The code splits these values, converts them to integers, and divides the sum of the 3rd and 4th values (the alternate-allele reads) by the sum of all four.
* It also copies `Start_Position`, `Hugo_Symbol`, and `Chromosome` into new columns named `POS`, `GENE`, and `CHROM`.
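
As a worked example of the calculation described above, the following sketch computes the frequency for three made-up `DP4` strings. The values are hypothetical, and the column-wise arithmetic is one possible way to write the same ratio.

```python
import pandas as pd

# Hypothetical DP4 strings: ref-forward, ref-reverse, alt-forward, alt-reverse.
dp4 = pd.Series(["10,10,5,5", "0,0,8,8", "30,30,0,0"])

# Split into four integer columns (labelled 0..3).
counts = dp4.str.split(",", expand=True).astype(int)

# BAF = (3rd + 4th count) / (sum of all four counts)
alt_reads = counts.iloc[:, 2:4].sum(axis=1)
total_reads = counts.sum(axis=1)
baf = alt_reads / total_reads

print(baf.tolist())
```

The first record has 10 alternate reads out of 30, the second is entirely alternate, and the third is entirely reference. A record with four zero counts would produce `NaN`, so such rows may need filtering before export.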

**2C. Creating the Bedgraph File:**

* The code opens the path passed as `output_bedgraph_file` in write mode.
* Writes a header line defining the track type, name, description, and visualization parameters for the bedgraph file.
* Iterates through each row of the DataFrame and writes a line to the bedgraph file. The bedgraph format follows: chromosome, start position, end position, value.
  * The end position is calculated as `row['POS'] + 200`, so every variant is drawn as a fixed 200-base interval.
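
The line layout can be seen by writing to an in-memory buffer instead of a file. This is a minimal sketch with made-up positions and frequencies. It reproduces the `POS` to `POS + 200` arithmetic as described, with no coordinate adjustment, even though bedGraph coordinates are zero-based and MAF `Start_Position` values are normally one-based.

```python
import io

# Hypothetical (chromosome, position, allele frequency) records.
records = [("chr17", 7675088, 0.5), ("chr7", 55191822, 0.25)]

buf = io.StringIO()
# Track line: tells a genome browser how to display the data.
buf.write('track type=bedGraph name="BAF" description="B allele frequency" '
          'visibility=full color=0,0,255\n')
for chrom, pos, af in records:
    # One tab-separated line per variant: chromosome, start, end, value.
    buf.write(f"{chrom}\t{pos}\t{pos + 200}\t{af}\n")

lines = buf.getvalue().splitlines()
print(lines[1])
```

Because the window is fixed at 200 bases, variants closer together than that produce overlapping intervals, which some bedGraph consumers may reject.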


In [None]:
import pandas as pd
import os

def create_bedgraph(input_maf_file, output_bedgraph_file):
    """
    Creates a bedgraph file from a MAF file.

    Args:
        input_maf_file: Path to the input MAF file.
        output_bedgraph_file: Path to the output bedgraph file.
    """
    try:
        df = pd.read_csv(input_maf_file, sep='\t', skiprows=1, header=0)

        # Error Handling for missing columns
        required_cols = ['DP4', 'Start_Position', 'Hugo_Symbol', 'Chromosome']
        missing_cols = set(required_cols) - set(df.columns)
        if missing_cols:
            raise ValueError(f"Missing required columns in the MAF file: {', '.join(missing_cols)}")

        chart_data = df
        # DP4 holds four comma-separated read counts; BAF = (3rd + 4th) / sum of all four
        counts = chart_data['DP4'].str.split(',', expand=True).astype(int)
        chart_data['AF'] = counts.iloc[:, 2:4].sum(axis=1) / counts.sum(axis=1)
        chart_data['POS'] = chart_data['Start_Position']
        chart_data['GENE'] = chart_data['Hugo_Symbol']
        chart_data['CHROM'] = chart_data['Chromosome']

        with open(output_bedgraph_file, "w") as f:
            f.write('track type=bedGraph name="BAF" description="B allele frequency" visibility=full color=0,0,255\n')
            for index, row in chart_data.iterrows():
                f.write(f"{row['CHROM']}\t{row['POS']}\t{row['POS'] + 200}\t{row['AF']}\n")
        print(f"bedgraph file '{output_bedgraph_file}' created successfully!")

    except FileNotFoundError:
        print(f"Error: File not found. Please check the file path: {input_maf_file}")
    except pd.errors.ParserError:
        print(f"Error: Could not parse the file. Please check the file format: {input_maf_file}")
    except ValueError as ve:
        print(f"Error: {ve}")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")


**Iterating over .maf Files**

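* The loop finds all files ending with `.snpbackbone.maf` in the `/content` directory.
* For each matching file, it calls `create_bedgraph`, producing one bedgraph file per MAF file. The output name replaces the `.snpbackbone.maf` suffix with `.bedgraph`, and the file is written to the current working directory.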

In [None]:
for filename in os.listdir('/content/'):
    if filename.endswith('.snpbackbone.maf'):
        input_maf_file = os.path.join('/content/', filename)
        output_bedgraph_file = filename.replace('.snpbackbone.maf', '.bedgraph')  # Create output filename
        create_bedgraph(input_maf_file, output_bedgraph_file)
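
The file matching and renaming can be tried without any real MAF files. The sketch below is a variation rather than the notebook's method: `plan_outputs` is a hypothetical helper, it uses `pathlib` instead of `os.listdir`, and it places each output next to its input rather than in the working directory.

```python
import tempfile
from pathlib import Path

def plan_outputs(input_dir, suffix=".snpbackbone.maf"):
    """Hypothetical helper: pair each matching MAF path with a bedgraph output path."""
    pairs = []
    for maf_path in sorted(Path(input_dir).glob(f"*{suffix}")):
        # Strip the full suffix, then append the new extension.
        out_name = maf_path.name[: -len(suffix)] + ".bedgraph"
        pairs.append((maf_path, maf_path.with_name(out_name)))
    return pairs

# Demonstrate on empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["a.snpbackbone.maf", "b.snpbackbone.maf", "notes.txt"]:
        (Path(tmp) / name).touch()
    planned = [(src.name, dst.name) for src, dst in plan_outputs(tmp)]

print(planned)
```

Each pair could then be passed to `create_bedgraph(str(src), str(dst))`.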

**File Organization**

* Creates a directory named `bedgraph_files`, if it does not already exist, to store the bedgraph files.
* Moves every `.bedgraph` file in the current working directory into the `bedgraph_files` folder.

**Zipping Files**

* Creates a zip archive named `bedgraph_files.zip` containing all files in the `bedgraph_files` directory.

In [None]:
import shutil
import os

# Create the folder if it doesn't exist
folder_name = "bedgraph_files"
if not os.path.exists(folder_name):
    os.makedirs(folder_name)

# Move all .bedgraph files to the folder
for filename in os.listdir():
    if filename.endswith(".bedgraph"):
        source_path = os.path.join(os.getcwd(), filename)
        destination_path = os.path.join(os.getcwd(), folder_name, filename)
        shutil.move(source_path, destination_path)
        print(f"Moved '{filename}' to '{folder_name}'")

!zip -r /content/bedgraph_files.zip /content/bedgraph_files
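
The `!zip` line only runs inside IPython or Colab. Outside a notebook, the same archive could be built with the standard library, as in the sketch below. It uses a temporary directory and a made-up `sample.bedgraph` file so it can run anywhere; the paths are illustrative, not the notebook's `/content` paths.

```python
import os
import shutil
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the bedgraph_files folder, holding one hypothetical file.
    folder = os.path.join(tmp, "bedgraph_files")
    os.makedirs(folder, exist_ok=True)
    with open(os.path.join(folder, "sample.bedgraph"), "w") as f:
        f.write("chr1\t100\t300\t0.5\n")

    # make_archive appends ".zip" to the base name and returns the archive path.
    archive_path = shutil.make_archive(
        os.path.join(tmp, "bedgraph_files"), "zip",
        root_dir=tmp, base_dir="bedgraph_files",
    )

    # List the archive contents to confirm the file was included.
    with zipfile.ZipFile(archive_path) as zf:
        names = zf.namelist()

print(names)
```

Setting `base_dir` keeps the `bedgraph_files/` prefix inside the archive, so it unpacks into a folder of that name.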