# MAF to VCF and MAF Conversion

This notebook demonstrates how to:
1. Read a MAF file using `read_maf` with assembly=37
2. Export the PyMutation object to VCF format using `to_vcf`
3. Export the PyMutation object to MAF format using `to_maf`


## Import the necessary functions


In [1]:
import os
from pyMut import read_maf

print("✅ Functions imported correctly")


✅ Functions imported correctly


## Define the path to the MAF file


In [2]:
# Path to the MAF file
maf_path = "../../../src/pyMut/data/examples/MAF/tcga_laml.maf.gz"

print("📁 File to process:")
print(f"  - MAF file: {maf_path}")

# Verify that the file exists
if os.path.exists(maf_path):
    print("✅ File found")
else:
    print("❌ File not found")


📁 File to process:
  - MAF file: ../../../src/pyMut/data/examples/MAF/tcga_laml.maf.gz
✅ File found


## Read the MAF file with assembly=37


In [3]:
print("📖 Reading MAF file...")

try:
    # Read the MAF file with assembly=37
    pymutation_obj = read_maf(maf_path, "37")
    
    print("✅ PyMutation object created successfully")
    print(f"   DataFrame shape: {pymutation_obj.data.shape}")
    print(f"   Number of variants: {len(pymutation_obj.data)}")
    print(f"   Number of columns: {len(pymutation_obj.data.columns)}")
    print(f"   Number of samples: {len(pymutation_obj.samples)}")
    
except Exception as e:
    print(f"❌ Error reading the file: {e}")
    import traceback
    traceback.print_exc()


2025-08-01 01:51:46,880 | INFO | pyMut.input | Starting MAF reading: ../../../src/pyMut/data/examples/MAF/tcga_laml.maf.gz
2025-08-01 01:51:46,881 | INFO | pyMut.input | Loading from cache: ../../../src/pyMut/data/examples/MAF/.pymut_cache/tcga_laml.maf_8bfbda65c4b23428.parquet
2025-08-01 01:51:46,910 | INFO | pyMut.input | Cache loaded successfully in 0.03 seconds


📖 Reading MAF file...
✅ PyMutation object created successfully
   DataFrame shape: (2091, 216)
   Number of variants: 2091
   Number of columns: 216
   Number of samples: 193
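The counts above come from pyMut's parsed object. As an independent cross-check, you can also read the raw MAF with the Python standard library. The sketch below assumes a tab-separated MAF whose header row follows any `#` comment lines and includes a `Tumor_Sample_Barcode` column. `summarize_maf` is a hypothetical helper, not part of pyMut. pyMut may drop or merge rows while building the PyMutation object, so the raw row count is not guaranteed to match the variant count printed above.

```python
import csv
import gzip


def summarize_maf(path):
    """Count data rows and distinct Tumor_Sample_Barcode values in a MAF file.

    Hypothetical helper. Assumes a tab-separated MAF (optionally gzipped)
    with '#' comment lines before the header row.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        # Skip comment lines such as '#version 2.4' before the header row
        lines = (line for line in handle if not line.startswith("#"))
        reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
        n_rows = 0
        samples = set()
        for row in reader:
            n_rows += 1
            samples.add(row["Tumor_Sample_Barcode"])
    return n_rows, len(samples)
```

In this notebook it could be called as `summarize_maf(maf_path)`, and the two numbers compared with the values reported by pyMut. Streaming rows through `csv` keeps memory use low, at the cost of doing no type conversion.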


## Show the first rows of the DataFrame


In [4]:
print("🔍 First 3 rows of the DataFrame:")
pymutation_obj.head(3)


🔍 First 3 rows of the DataFrame:


|   | CHROM | POS | ID | REF | ALT | QUAL | FILTER | TCGA-AB-2988 | TCGA-AB-2869 | TCGA-AB-3009 | ... | Strand | Variant_Classification | Variant_Type | Reference_Allele | Tumor_Seq_Allele1 | Tumor_Seq_Allele2 | Tumor_Sample_Barcode | Protein_Change | i_TumorVAF_WU | i_transcript_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | chr9 | 100077177 | . | T | C | . | . | T\|T | T\|T | T\|T | ... | + | SILENT | SNP | T | T | C | TCGA-AB-2886 | p.T431T | 9.76 | NM_020893.1 |
| 1 | chr9 | 100085148 | . | G | A | . | . | G\|G | G\|G | G\|G | ... | + | MISSENSE_MUTATION | SNP | G | G | A | TCGA-AB-2917 | p.R581H | 18.4 | NM_020893.1 |
| 2 | chr9 | 100971322 | . | A | C | . | . | A\|A | A\|A | A\|A | ... | + | MISSENSE_MUTATION | SNP | A | A | C | TCGA-AB-2841 | p.L593R | 45.83 | NM_018421.3 |
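In the sample columns above, a sample that does not carry the row's variant appears as `REF|REF` (for example `T|T` in the first row). If a carrier instead shows the ALT allele in one of the two positions, you can find the carriers of a variant by looking for any allele other than REF. This is a minimal sketch: `carrier_samples` is a hypothetical helper, and the `T|C` genotype in the toy row is an assumption, since no carrier column is visible in the preview.

```python
import re


def carrier_samples(row, sample_columns):
    """Return the sample columns whose genotype contains a non-REF allele.

    Hypothetical helper. Assumes genotype strings such as 'T|T' (no variant)
    or 'T|C' (carrier). Missing alleles ('.') are not treated as variants.
    """
    ref = row["REF"]
    carriers = []
    for sample in sample_columns:
        # Accept both phased ('|') and unphased ('/') separators
        alleles = re.split(r"[|/]", row[sample])
        if any(allele not in (ref, ".") for allele in alleles):
            carriers.append(sample)
    return carriers


# Toy row: the 'T|C' genotype is assumed, because carrier columns are not shown above
row = {"REF": "T", "ALT": "C", "TCGA-AB-2988": "T|T", "TCGA-AB-2886": "T|C"}
print(carrier_samples(row, ["TCGA-AB-2988", "TCGA-AB-2886"]))  # → ['TCGA-AB-2886']
```

The function only uses `row[...]` lookups, so a pandas `Series` from `pymutation_obj.data.iterrows()` should work as well as a dict. Check whether `pymutation_obj.samples` holds the matching column names before using it as `sample_columns`.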


## Define output paths for VCF and MAF exports


In [5]:
# Create output directory if it doesn't exist
output_dir = "./output"
os.makedirs(output_dir, exist_ok=True)

# Define output paths
vcf_output_path = os.path.join(output_dir, "maf_to_vcf_output.vcf")
maf_output_path = os.path.join(output_dir, "maf_to_maf_output.maf")

print("📁 Output files will be saved to:")
print(f"  - VCF output: {vcf_output_path}")
print(f"  - MAF output: {maf_output_path}")


📁 Output files will be saved to:
  - VCF output: ./output/maf_to_vcf_output.vcf
  - MAF output: ./output/maf_to_maf_output.maf


## Export to VCF format


In [6]:
print("📝 Exporting to VCF format...")

try:
    # Export to VCF format
    pymutation_obj.to_vcf(vcf_output_path)
    
    # Check if the file was created
    if os.path.exists(vcf_output_path):
        print(f"✅ VCF file created successfully: {vcf_output_path}")
        print(f"   File size: {os.path.getsize(vcf_output_path) / (1024 * 1024):.2f} MB")
    else:
        print("❌ VCF file was not created")
        
except Exception as e:
    print(f"❌ Error exporting to VCF: {e}")
    import traceback
    traceback.print_exc()


2025-08-01 01:51:47,147 | INFO | pyMut.output | Starting VCF export to: output/maf_to_vcf_output.vcf
2025-08-01 01:51:47,150 | INFO | pyMut.output | Starting to process 2091 variants from 193 samples


📝 Exporting to VCF format...


2025-08-01 01:51:47,263 | INFO | pyMut.output | Processing genotype data to replace bases with indices
2025-08-01 01:51:50,943 | INFO | pyMut.output | Writing 2091 variants to file
2025-08-01 01:51:51,008 | INFO | pyMut.output | Progress: 2091/2091 variants written (100.0%)
2025-08-01 01:51:51,011 | INFO | pyMut.output | VCF export completed successfully: 2091 variants processed and written to output/maf_to_vcf_output.vcf
2025-08-01 01:51:51,012 | INFO | pyMut.output | Conversion summary: 193 samples, 2091 input variants, 2091 output variants


✅ VCF file created successfully: ./output/maf_to_vcf_output.vcf
   File size: 1.84 MB
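The log line `Processing genotype data to replace bases with indices` indicates that `to_vcf` converts base-level genotypes such as `T|C` into VCF allele indices. Under the VCF convention, `0` is REF and `1`, `2`, ... are the ALT alleles in listed order. The function below is a minimal sketch of that mapping, not pyMut's implementation. `bases_to_indices` is a hypothetical name.

```python
def bases_to_indices(genotype, ref, alts):
    """Convert a base-level genotype such as 'T|C' to VCF allele indices ('0|1').

    Hypothetical helper. Index 0 is REF and indices 1..n follow the order of
    `alts`. The separator ('|' phased or '/' unphased) is preserved, and a
    missing allele '.' stays '.'.
    """
    allele_index = {ref: "0"}
    for i, alt in enumerate(alts, start=1):
        allele_index[alt] = str(i)

    sep = "|" if "|" in genotype else "/"
    indices = []
    for base in genotype.split(sep):
        if base == ".":
            indices.append(".")  # missing call stays missing
        elif base in allele_index:
            indices.append(allele_index[base])
        else:
            raise ValueError(f"allele {base!r} is neither REF nor a listed ALT")
    return sep.join(indices)


print(bases_to_indices("T|C", "T", ["C"]))  # → 0|1
print(bases_to_indices("T|T", "T", ["C"]))  # → 0|0
```

Raising an error on an allele that is neither REF nor a listed ALT surfaces inconsistent input, whereas silently writing `.` would turn it into an ordinary missing call.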


## Export to MAF format


In [7]:
print("📝 Exporting to MAF format...")

try:
    # Export to MAF format
    pymutation_obj.to_maf(maf_output_path)
    
    # Check if the file was created
    if os.path.exists(maf_output_path):
        print(f"✅ MAF file created successfully: {maf_output_path}")
        print(f"   File size: {os.path.getsize(maf_output_path) / (1024 * 1024):.2f} MB")
    else:
        print("❌ MAF file was not created")
        
except Exception as e:
    print(f"❌ Error exporting to MAF: {e}")
    import traceback
    traceback.print_exc()


2025-08-01 01:51:51,038 | INFO | pyMut.output | Starting MAF export to: output/maf_to_maf_output.maf
2025-08-01 01:51:51,039 | INFO | pyMut.output | Starting to process 2091 variants from 193 samples
2025-08-01 01:51:51,043 | INFO | pyMut.output | Processing sample 1/193: TCGA-AB-2988 (0.5%)
2025-08-01 01:51:51,056 | INFO | pyMut.output | Sample TCGA-AB-2988: 15 variants found
2025-08-01 01:51:51,085 | INFO | pyMut.output | Processing sample 3/193: TCGA-AB-3009 (1.6%)
2025-08-01 01:51:51,098 | INFO | pyMut.output | Sample TCGA-AB-3009: 42 variants found
2025-08-01 01:51:51,132 | INFO | pyMut.output | Processing sample 6/193: TCGA-AB-2920 (3.1%)
2025-08-01 01:51:51,144 | INFO | pyMut.output | Sample TCGA-AB-2920: 11 variants found
2025-08-01 01:51:51,180 | INFO | pyMut.output | Processing sample 9/193: TCGA-AB-2999 (4.7%)
2025-08-01 01:51:51,191 | INFO | pyMut.output | Sample TCGA-AB-2999: 11 variants found
2025-08-01 01:51:51,224 | INFO | pyMut.output | Processing sample 12/193: TCGA-A

📝 Exporting to MAF format...


2025-08-01 01:51:51,235 | INFO | pyMut.output | Sample TCGA-AB-2923: 23 variants found
2025-08-01 01:51:51,269 | INFO | pyMut.output | Processing sample 15/193: TCGA-AB-2931 (7.8%)
2025-08-01 01:51:51,280 | INFO | pyMut.output | Sample TCGA-AB-2931: 11 variants found
2025-08-01 01:51:51,312 | INFO | pyMut.output | Processing sample 18/193: TCGA-AB-2906 (9.3%)
2025-08-01 01:51:51,322 | INFO | pyMut.output | Sample TCGA-AB-2906: 15 variants found
2025-08-01 01:51:51,354 | INFO | pyMut.output | Processing sample 21/193: TCGA-AB-2945 (10.9%)
2025-08-01 01:51:51,363 | INFO | pyMut.output | Sample TCGA-AB-2945: 13 variants found
2025-08-01 01:51:51,396 | INFO | pyMut.output | Processing sample 24/193: TCGA-AB-2952 (12.4%)
2025-08-01 01:51:51,407 | INFO | pyMut.output | Sample TCGA-AB-2952: 15 variants found
2025-08-01 01:51:51,439 | INFO | pyMut.output | Processing sample 27/193: TCGA-AB-2862 (14.0%)
2025-08-01 01:51:51,451 | INFO | pyMut.output | Sample TCGA-AB-2862: 11 variants found
2025-

✅ MAF file created successfully: ./output/maf_to_maf_output.maf
   File size: 0.35 MB
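The progress log reports a count for each sample (for example `Sample TCGA-AB-2988: 15 variants found`). A minimal way to recompute such counts from the written file is to tally rows by `Tumor_Sample_Barcode`. The sketch assumes a tab-separated MAF, and `variants_per_sample` is a hypothetical helper. It locates the first `Tumor_Sample_Barcode` column by position, so it still works when the header repeats that name, as the exported header in the next section does.

```python
import csv
from collections import Counter


def variants_per_sample(maf_path):
    """Count data rows per Tumor_Sample_Barcode in a tab-separated MAF file.

    Hypothetical helper. Uses the first header column named
    'Tumor_Sample_Barcode', so repeated header names do not break it.
    """
    with open(maf_path, newline="") as handle:
        rows = csv.reader(
            (line for line in handle if not line.startswith("#")),
            delimiter="\t",
            quoting=csv.QUOTE_NONE,
        )
        header = next(rows)
        barcode_col = header.index("Tumor_Sample_Barcode")
        # Skip blank lines, which csv.reader yields as empty lists
        return Counter(row[barcode_col] for row in rows if row)
```

For example, `variants_per_sample(maf_output_path)["TCGA-AB-2988"]` can be compared with the 15 variants reported in the log.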


## Examine the exported files


In [8]:
# Show the first few lines of the exported VCF file
print("🔍 First 10 lines of the exported VCF file:")
!head -10 {vcf_output_path}


🔍 First 10 lines of the exported VCF file:
##fileformat=VCFv4.3
##fileDate=20250801
##source=https://github.com/Luisruimor/pyMut
##reference=37
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=9>
##contig=<ID=X>
##contig=<ID=14>
##contig=<ID=2>
##contig=<ID=12>
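`!head` is an IPython shell escape, so it only works inside a notebook on systems that provide the `head` command. A portable alternative is to read the header in Python. In VCF, meta-information lines start with `##`. A single `#CHROM` line follows them, listing the fixed columns and, when genotypes are present, `FORMAT` plus one column per sample. The sketch below collects both and extracts the `##contig` IDs. The helper names are hypothetical, and the parsing assumes no commas inside contig attribute values.

```python
def read_vcf_header(path):
    """Return (meta_lines, column_names) from a plain-text VCF file.

    Hypothetical helper. Meta-information lines start with '##'; the first
    line starting with a single '#' is the column header.
    """
    meta = []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if line.startswith("##"):
                meta.append(line)
            elif line.startswith("#"):
                return meta, line[1:].split("\t")
            else:
                break  # reached data lines without a '#CHROM' header
    raise ValueError("no '#CHROM' header line found")


def contig_ids(meta_lines):
    """Extract the ID values from '##contig=<ID=...>' meta lines."""
    prefix = "##contig=<"
    ids = []
    for line in meta_lines:
        if line.startswith(prefix) and line.endswith(">"):
            for field in line[len(prefix):-1].split(","):
                key, _, value = field.partition("=")
                if key == "ID":
                    ids.append(value)
    return ids
```

With the file exported above, `read_vcf_header(vcf_output_path)` returns the meta lines and the column names. When `FORMAT` is present, `len(columns) - 9` gives the number of sample columns.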


In [9]:
# Show the first few lines of the exported MAF file
print("🔍 First 10 lines of the exported MAF file:")
!head -10 {maf_output_path}


🔍 First 10 lines of the exported MAF file:
Hugo_Symbol	Entrez_Gene_Id	Center	NCBI_Build	NCBI_Build	Chromosome	Start_Position	Start_Position	End_Position	Strand	Variant_Classification	Variant_Type	Reference_Allele	Reference_Allele	Tumor_Seq_Allele1	Tumor_Seq_Allele1	Tumor_Seq_Allele2	Tumor_Seq_Allele2	dbSNP_RS	Tumor_Sample_Barcode	Tumor_Sample_Barcode	FILTER	i_TumorVAF_WU	End_position	Protein_Change	i_transcript_name	QUAL
BAAT	570	genome.wustl.edu	37	37	9	104124840	104124840	104124840	+	MISSENSE_MUTATION	SNP	G	G	G	G	A	A	.	TCGA-AB-2988	TCGA-AB-2988	.	48.35	104124840	p.T376M	NM_001701.1	.
TKTL1	8277	genome.wustl.edu	37	37	X	153557894	153557894	153557894	+	SILENT	SNP	C	C	C	C	T	T	.	TCGA-AB-2988	TCGA-AB-2988	.	41.11	153557894	p.A549A	NM_012253.1	.
ANG	283	genome.wustl.edu	37	37	14	21161742	21161742	21161742	+	MISSENSE_MUTATION	SNP	G	G	G	G	A	A	.	TCGA-AB-2988	TCGA-AB-2988	.	47.43	21161742	p.V7I	NM_001097577.2	.
DNMT3A	1788	genome.wustl.edu	37	37	2	25457161	25457161	25457161	+	MISSENSE_MUTA
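In the header above, several names occur twice (`NCBI_Build`, `Start_Position`, `Reference_Allele`, `Tumor_Seq_Allele1`, `Tumor_Seq_Allele2`, `Tumor_Sample_Barcode`). `End_Position` and `End_position` also differ only in case. Duplicated names can confuse downstream readers; pandas, for example, renames a repeated column to `Start_Position.1`. A small check such as the hypothetical `duplicated_columns` below can flag them before the file is passed on.

```python
from collections import Counter


def duplicated_columns(header_line, ignore_case=False):
    """Return header names that occur more than once, in first-seen order.

    Hypothetical helper. With ignore_case=True, names are compared after
    str.casefold(), so 'End_Position' and 'End_position' count as one name.
    """
    names = header_line.rstrip("\r\n").split("\t")
    keys = [name.casefold() if ignore_case else name for name in names]
    counts = Counter(keys)  # preserves first-seen order of keys
    return [key for key, n in counts.items() if n > 1]


# Usage in this notebook (maf_output_path is defined in an earlier cell):
# with open(maf_output_path) as handle:
#     print(duplicated_columns(handle.readline(), ignore_case=True))
```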

## Summary

In this notebook, we demonstrated how to:
1. Read a MAF file using `read_maf` with assembly=37
2. Export the PyMutation object to VCF format using `to_vcf`
3. Export the PyMutation object to MAF format using `to_maf`

These conversions make the same mutation data available both to tools that expect VCF input and to tools that expect MAF input.