# MAF to VCF and MAF Conversion

This notebook demonstrates how to:
1. Read a MAF file using `read_maf` with assembly=37
2. Export the PyMutation object to VCF format using `to_vcf`
3. Export the PyMutation object to MAF format using `to_maf`


In [1]:
import sys
import os

# Add the project's src/ directory to the Python path so pyMut can be imported
project_root = os.path.abspath(os.path.join(os.getcwd(), '..', '..', '..', 'src'))
if project_root not in sys.path:
    sys.path.append(project_root)

print('✅ PYTHONPATH configured to include:', project_root)


✅ PYTHONPATH configured to include: /home/luisruimore/Escritorio/TFG/src
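The cell above builds the `src` path by going three directories up from the current working directory, so it only works when the notebook is started from its own folder. The sketch below assumes the same `../../../src` layout and adds a check that the directory exists before `sys.path` is changed. The helper name `add_src_to_path` is hypothetical and is not part of pyMut.

```python
import os
import sys


def add_src_to_path(start_dir: str, levels_up: int = 3) -> str:
    """Hypothetical helper: add <start_dir>/../(x levels_up)/src to sys.path."""
    # Build e.g. "../../../src" for levels_up=3 and turn it into an absolute path.
    parts = [".."] * levels_up + ["src"]
    src_dir = os.path.abspath(os.path.join(start_dir, *parts))
    # Fail early with a clear message instead of a later ImportError.
    if not os.path.isdir(src_dir):
        raise FileNotFoundError(f"src directory not found: {src_dir}")
    # Skip the append when the cell is re-run, so sys.path has no duplicates.
    if src_dir not in sys.path:
        sys.path.append(src_dir)
    return src_dir


# Usage in this notebook (assumes it is launched from its own folder):
# project_root = add_src_to_path(os.getcwd())
```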


## Import the necessary functions


In [2]:
from pyMut import read_maf

print("✅ Functions imported correctly")


✅ Functions imported correctly


## Define the path to the MAF file


In [3]:
# Path to the MAF file
maf_path = "../../../src/pyMut/data/examples/MAF/tcga_laml.maf.gz"

print("📁 File to process:")
print(f"  - MAF file: {maf_path}")

# Verify that the file exists
if os.path.exists(maf_path):
    print("✅ File found")
else:
    print("❌ File not found")


📁 File to process:
  - MAF file: ../../../src/pyMut/data/examples/MAF/tcga_laml.maf.gz
✅ File found
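MAF files are tab-separated, may start with `#` comment lines (for example a `#version` line), and are often gzip-compressed, as `tcga_laml.maf.gz` is. Before handing the file to `read_maf`, a quick standard-library sketch like the one below can list the header columns and count the data rows. `peek_maf` is a hypothetical helper, not a pyMut function.

```python
import gzip


def peek_maf(path: str):
    """Hypothetical helper: return (column_names, n_rows) for a plain or gzipped MAF."""
    # Pick the opener from the file extension; both accept text mode "rt".
    opener = gzip.open if path.endswith(".gz") else open
    columns, n_rows = None, 0
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            # Skip comment/metadata lines such as "#version 2.4" and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            if columns is None:
                # The first non-comment line is the tab-separated header.
                columns = line.rstrip("\r\n").split("\t")
            else:
                n_rows += 1
    return columns, n_rows
```

For example, `columns, n_rows = peek_maf(maf_path)` gives a first look at the file without loading it into a DataFrame.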


## Read the MAF file with assembly=37


In [4]:
print("📖 Reading MAF file...")

try:
    # Read the MAF file with assembly=37
    pymutation_obj = read_maf(maf_path, "37")
    
    print("✅ PyMutation object created successfully")
    print(f"   DataFrame shape: {pymutation_obj.data.shape}")
    print(f"   Number of variants: {len(pymutation_obj.data)}")
    print(f"   Number of columns: {len(pymutation_obj.data.columns)}")
    print(f"   Number of samples: {len(pymutation_obj.samples)}")
    
except Exception as e:
    print(f"❌ Error reading the file: {e}")
    import traceback
    traceback.print_exc()


2025-07-30 22:48:24,678 | INFO | pyMut.input | Starting MAF reading: ../../../src/pyMut/data/examples/MAF/tcga_laml.maf.gz
2025-07-30 22:48:24,679 | INFO | pyMut.input | Loading from cache: ../../../src/pyMut/data/examples/MAF/.pymut_cache/tcga_laml.maf_3515f757055e6890.parquet
2025-07-30 22:48:24,703 | INFO | pyMut.input | Cache loaded successfully in 0.03 seconds


📖 Reading MAF file...
✅ PyMutation object created successfully
   DataFrame shape: (2207, 216)
   Number of variants: 2207
   Number of columns: 216
   Number of samples: 193


## Show the first rows of the DataFrame


In [5]:
print("🔍 First 3 rows of the DataFrame:")
pymutation_obj.head(3)


🔍 First 3 rows of the DataFrame:


| | CHROM | POS | ID | REF | ALT | QUAL | FILTER | TCGA-AB-2988 | TCGA-AB-2869 | TCGA-AB-3009 | ... | Strand | Variant_Classification | Variant_Type | Reference_Allele | Tumor_Seq_Allele1 | Tumor_Seq_Allele2 | Tumor_Sample_Barcode | Protein_Change | i_TumorVAF_WU | i_transcript_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | chr17 | 67170917 | . | T | C | . | . | T\|C | T\|T | T\|T | ... | + | SPLICE_SITE | SNP | T | T | C | TCGA-AB-2988 | p.K960R | 45.66 | NM_080282.3 |
| 1 | chr1 | 94490594 | . | C | T | . | . | C\|C | C\|T | C\|C | ... | + | MISSENSE_MUTATION | SNP | C | C | T | TCGA-AB-2869 | p.R1517H | 38.12 | NM_000350.2 |
| 2 | chr2 | 169780250 | . | G | A | . | . | G\|G | G\|G | G\|A | ... | + | MISSENSE_MUTATION | SNP | G | G | A | TCGA-AB-3009 | p.A1283V | 46.972177 | NM_003742.2 |
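In the preview, each sample column (e.g. `TCGA-AB-2988`) holds a phased genotype written with bases: `T|C` for a sample carrying the alternate allele and `T|T` for one that does not. Assuming that encoding and a single ALT allele per row, the pandas sketch below lists the carrier samples for each variant. The function name `carriers` is hypothetical, and the toy rows are copied from the three rows shown above.

```python
import pandas as pd


def carriers(df: pd.DataFrame, sample_cols: list) -> pd.Series:
    """Hypothetical helper: list, per variant row, the samples whose GT contains ALT."""
    result = []
    for _, row in df.iterrows():
        # Assumes a single ALT allele and genotypes written as bases, e.g. "T|C".
        result.append([s for s in sample_cols if row["ALT"] in str(row[s]).split("|")])
    return pd.Series(result, index=df.index)


# Toy rows copied from the preview (only the three visible sample columns).
toy = pd.DataFrame({
    "CHROM": ["chr17", "chr1", "chr2"],
    "POS": [67170917, 94490594, 169780250],
    "REF": ["T", "C", "G"],
    "ALT": ["C", "T", "A"],
    "TCGA-AB-2988": ["T|C", "C|C", "G|G"],
    "TCGA-AB-2869": ["T|T", "C|T", "G|G"],
    "TCGA-AB-3009": ["T|T", "C|C", "G|A"],
})
samples = ["TCGA-AB-2988", "TCGA-AB-2869", "TCGA-AB-3009"]
print(carriers(toy, samples).tolist())
# prints [['TCGA-AB-2988'], ['TCGA-AB-2869'], ['TCGA-AB-3009']]
```

The carriers found for the three toy rows match the `Tumor_Sample_Barcode` column in the preview. On the full object, something like `carriers(pymutation_obj.data, pymutation_obj.samples)` should work, assuming `samples` holds the sample column names.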


## Define output paths for VCF and MAF exports


In [6]:
# Create output directory if it doesn't exist
output_dir = "./output"
os.makedirs(output_dir, exist_ok=True)

# Define output paths
vcf_output_path = os.path.join(output_dir, "maf_to_vcf_output.vcf")
maf_output_path = os.path.join(output_dir, "maf_to_maf_output.maf")

print("📁 Output files will be saved to:")
print(f"  - VCF output: {vcf_output_path}")
print(f"  - MAF output: {maf_output_path}")


📁 Output files will be saved to:
  - VCF output: ./output/maf_to_vcf_output.vcf
  - MAF output: ./output/maf_to_maf_output.maf


## Export to VCF format


In [7]:
print("📝 Exporting to VCF format...")

try:
    # Export to VCF format
    pymutation_obj.to_vcf(vcf_output_path)
    
    # Check if the file was created
    if os.path.exists(vcf_output_path):
        print(f"✅ VCF file created successfully: {vcf_output_path}")
        print(f"   File size: {os.path.getsize(vcf_output_path) / (1024 * 1024):.2f} MB")
    else:
        print("❌ VCF file was not created")
        
except Exception as e:
    print(f"❌ Error exporting to VCF: {e}")
    import traceback
    traceback.print_exc()


2025-07-30 22:48:24,897 | INFO | pyMut.output | Starting VCF export to: output/maf_to_vcf_output.vcf
2025-07-30 22:48:24,899 | INFO | pyMut.output | Starting to process 2207 variants from 193 samples


📝 Exporting to VCF format...


2025-07-30 22:48:25,010 | INFO | pyMut.output | Processing genotype data to replace bases with indices
2025-07-30 22:48:28,960 | INFO | pyMut.output | Writing 2207 variants to file
2025-07-30 22:48:29,011 | INFO | pyMut.output | Progress: 2207/2207 variants written (100.0%)
2025-07-30 22:48:29,012 | INFO | pyMut.output | VCF export completed successfully: 2207 variants processed and written to output/maf_to_vcf_output.vcf
2025-07-30 22:48:29,012 | INFO | pyMut.output | Conversion summary: 193 samples, 2207 input variants, 2207 output variants


✅ VCF file created successfully: ./output/maf_to_vcf_output.vcf
   File size: 1.94 MB
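The export log mentions "Processing genotype data to replace bases with indices". In VCF, the `GT` field refers to alleles by index: `0` is REF and `1`, `2`, ... are the ALT alleles in the order listed in the ALT column. The sketch below shows that convention, assuming base-level genotypes such as `T|C` and a comma-separated ALT column. It illustrates the VCF rule and is not pyMut's internal code; `bases_to_gt` is a hypothetical name.

```python
def bases_to_gt(genotype: str, ref: str, alt: str) -> str:
    """Hypothetical helper: convert a base genotype such as 'T|C' into a VCF GT such as '0|1'."""
    # Allele index 0 is REF; ALT alleles are numbered from 1 in their listed order.
    alleles = [ref] + alt.split(",")
    # Keep the separator: '|' marks phased genotypes, '/' unphased ones.
    sep = "|" if "|" in genotype else "/"
    indices = []
    for base in genotype.split(sep):
        if base in alleles:
            indices.append(str(alleles.index(base)))
        else:
            # Alleles that are not REF or ALT are written as '.' (missing).
            indices.append(".")
    return sep.join(indices)


print(bases_to_gt("T|C", "T", "C"))  # 0|1
print(bases_to_gt("C|C", "C", "T"))  # 0|0
```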


## Export to MAF format


In [8]:
print("📝 Exporting to MAF format...")

try:
    # Export to MAF format
    pymutation_obj.to_maf(maf_output_path)
    
    # Check if the file was created
    if os.path.exists(maf_output_path):
        print(f"✅ MAF file created successfully: {maf_output_path}")
        print(f"   File size: {os.path.getsize(maf_output_path) / (1024 * 1024):.2f} MB")
    else:
        print("❌ MAF file was not created")
        
except Exception as e:
    print(f"❌ Error exporting to MAF: {e}")
    import traceback
    traceback.print_exc()


2025-07-30 22:48:29,118 | INFO | pyMut.output | Starting MAF export to: output/maf_to_maf_output.maf
2025-07-30 22:48:29,120 | INFO | pyMut.output | Starting to process 2207 variants from 193 samples
2025-07-30 22:48:29,124 | INFO | pyMut.output | Processing sample 1/193: TCGA-AB-2988 (0.5%)
2025-07-30 22:48:29,137 | INFO | pyMut.output | Sample TCGA-AB-2988: 15 variants found
2025-07-30 22:48:29,165 | INFO | pyMut.output | Processing sample 3/193: TCGA-AB-3009 (1.6%)
2025-07-30 22:48:29,180 | INFO | pyMut.output | Sample TCGA-AB-3009: 42 variants found


📝 Exporting to MAF format...


2025-07-30 22:48:29,222 | INFO | pyMut.output | Processing sample 6/193: TCGA-AB-2920 (3.1%)
2025-07-30 22:48:29,235 | INFO | pyMut.output | Sample TCGA-AB-2920: 11 variants found
2025-07-30 22:48:29,267 | INFO | pyMut.output | Processing sample 9/193: TCGA-AB-2999 (4.7%)
2025-07-30 22:48:29,277 | INFO | pyMut.output | Sample TCGA-AB-2999: 11 variants found
2025-07-30 22:48:29,307 | INFO | pyMut.output | Processing sample 12/193: TCGA-AB-2923 (6.2%)
2025-07-30 22:48:29,317 | INFO | pyMut.output | Sample TCGA-AB-2923: 23 variants found
2025-07-30 22:48:29,348 | INFO | pyMut.output | Processing sample 15/193: TCGA-AB-2931 (7.8%)
2025-07-30 22:48:29,359 | INFO | pyMut.output | Sample TCGA-AB-2931: 11 variants found
2025-07-30 22:48:29,395 | INFO | pyMut.output | Processing sample 18/193: TCGA-AB-2906 (9.3%)
2025-07-30 22:48:29,410 | INFO | pyMut.output | Sample TCGA-AB-2906: 15 variants found
2025-07-30 22:48:29,453 | INFO | pyMut.output | Processing sample 21/193: TCGA-AB-2945 (10.9%)

✅ MAF file created successfully: ./output/maf_to_maf_output.maf
   File size: 0.35 MB
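The MAF export log walks through the samples one at a time and reports how many variants each carries. This matches the layout difference between the two formats: MAF stores one row per sample and variant, while the in-memory table holds one genotype column per sample. A simplified pandas sketch of that reshaping uses `melt` and keeps only genotypes that contain the ALT allele. It assumes the `REF|ALT`-style genotype strings seen in the preview, covers only a few MAF columns, and `to_long_maf_rows` is a hypothetical name.

```python
import pandas as pd


def to_long_maf_rows(df: pd.DataFrame, sample_cols: list) -> pd.DataFrame:
    """Hypothetical sketch: one row per (variant, sample) pair that carries ALT."""
    # Reshape the per-sample genotype columns into (Tumor_Sample_Barcode, GT) pairs.
    long = df.melt(
        id_vars=["CHROM", "POS", "REF", "ALT"],
        value_vars=sample_cols,
        var_name="Tumor_Sample_Barcode",
        value_name="GT",
    )
    # Keep only genotypes that contain the ALT allele (assumes "T|C"-style bases).
    has_alt = [alt in str(gt).split("|") for alt, gt in zip(long["ALT"], long["GT"])]
    long = long.loc[has_alt]
    # Rename to MAF-style column names and drop the temporary GT column.
    long = long.rename(columns={
        "CHROM": "Chromosome",
        "POS": "Start_Position",
        "REF": "Reference_Allele",
        "ALT": "Tumor_Seq_Allele2",
    })
    return long.drop(columns="GT").reset_index(drop=True)
```

A per-sample count like the one in the log can then be computed with `to_long_maf_rows(df, sample_cols).groupby("Tumor_Sample_Barcode").size()`.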


## Examine the exported files


In [9]:
# Show the first few lines of the exported VCF file
print("🔍 First 10 lines of the exported VCF file:")
!head -10 {vcf_output_path}


🔍 First 10 lines of the exported VCF file:
##fileformat=VCFv4.3
##fileDate=20250730
##source=https://github.com/Luisruimor/pyMut
##reference=37
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=17>
##contig=<ID=1>
##contig=<ID=2>
##contig=<ID=16>
##contig=<ID=6>


In [10]:
# Show the first few lines of the exported MAF file
print("🔍 First 10 lines of the exported MAF file:")
!head -10 {maf_output_path}


🔍 First 10 lines of the exported MAF file:
Hugo_Symbol	Entrez_Gene_Id	Center	NCBI_Build	NCBI_Build	Chromosome	Start_Position	Start_Position	End_Position	Strand	Variant_Classification	Variant_Type	Reference_Allele	Reference_Allele	Tumor_Seq_Allele1	Tumor_Seq_Allele1	Tumor_Seq_Allele2	Tumor_Seq_Allele2	dbSNP_RS	Tumor_Sample_Barcode	Tumor_Sample_Barcode	FILTER	End_position	i_transcript_name	QUAL	i_TumorVAF_WU	Protein_Change
ABCA10	10349	genome.wustl.edu	37	37	17	67170917	67170917	67170917	+	SPLICE_SITE	SNP	T	T	T	T	C	C	.	TCGA-AB-2988	TCGA-AB-2988	.	67170917	NM_080282.3	.	45.66	p.K960R
ANG	283	genome.wustl.edu	37	37	14	21161742	21161742	21161742	+	MISSENSE_MUTATION	SNP	G	G	G	G	A	A	.	TCGA-AB-2988	TCGA-AB-2988	.	21161742	NM_001097577.2	.	47.43	p.V7I
BAAT	570	genome.wustl.edu	37	37	9	104124840	104124840	104124840	+	MISSENSE_MUTATION	SNP	G	G	G	G	A	A	.	TCGA-AB-2988	TCGA-AB-2988	.	104124840	NM_001701.1	.	48.35	p.T376M
CDH4	1002	genome.wustl.edu	37	37	20	60318829	60318829	60318829	+	MISSENSE_M
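Several column names appear twice in the exported MAF header shown here (for example `NCBI_Build`, `Start_Position` and `Tumor_Sample_Barcode`). When such a file is loaded with `pandas.read_csv`, repeated names are renamed to `Start_Position.1` and so on, so the header may be worth checking before downstream use. A small standard-library sketch (`duplicated_columns` is a hypothetical name):

```python
from collections import Counter


def duplicated_columns(path: str) -> dict:
    """Hypothetical helper: return header columns that occur more than once, with counts."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            # Skip comment lines such as '#version 2.4' and use the first real line.
            if line.startswith("#") or not line.strip():
                continue
            counts = Counter(line.rstrip("\r\n").split("\t"))
            return {name: n for name, n in counts.items() if n > 1}
    return {}
```

Calling `duplicated_columns(maf_output_path)` returns an empty dict when every header name is unique. The comparison is case-sensitive, so `End_Position` and `End_position` are counted as different names.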

## Summary

In this notebook, we demonstrated how to:
1. Read a MAF file using `read_maf` with assembly=37
2. Export the PyMutation object to VCF format using `to_vcf`
3. Export the PyMutation object to MAF format using `to_maf`

These conversions let the same mutation data be passed to tools that expect VCF input as well as to tools that expect MAF input.