missing required fields from MAF: Hugo_Symbol #397

Closed
gianfilippo opened this issue Sep 30, 2019 · 23 comments

Comments

@gianfilippo

Hi,

I have a GATK-generated MAF file and I am trying to read it with read.maf. It fails at the validation stage with the following:

-Reading
-Validating
Error in validateMaf(maf = maf, isTCGA = isTCGA, rdup = removeDuplicatedVariants, :
missing required fields from MAF: Hugo_Symbol

Do you have any suggestions?

Thanks

@PoisonAlien
Owner

Hello,
Make sure your file follows the MAF specification. Could you please provide a sample file, maybe the top 100 rows?

@gianfilippo
Author

Hi,

the sample file is attached. I left the header as well, for you to see the gatk commands I used to generate it.

sample.maf.zip

Thanks
Gianfilippo

@PoisonAlien
Owner

Hello,
Thank you for the sample file. I took a look at it, and although it looks like a MAF file, it is oddly formatted. R has trouble reading it because the columns are separated neither by tabs nor by commas. I see that you used Funcotator for annotation, which I have no experience with. Could you try vcf2maf on your raw Mutect output?
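One way to check what the reader actually sees is to inspect the first non-comment line of the file. The sketch below is only an illustration in base R: `check_maf_header` is a hypothetical helper (not part of maftools), and the list of required columns is an assumption based on the fields maftools is documented to validate. It splits the header on tabs and reports which required names are absent.

```r
# Hypothetical helper (not a maftools function): report which required
# MAF columns are missing from the header line of a tab-separated file.
check_maf_header <- function(path,
                             required = c("Hugo_Symbol", "Chromosome",
                                          "Start_Position", "End_Position",
                                          "Reference_Allele", "Tumor_Seq_Allele2",
                                          "Variant_Classification", "Variant_Type",
                                          "Tumor_Sample_Barcode")) {
  lines <- readLines(path, n = 5000, warn = FALSE)
  # Drop blank lines and "#" comment lines (Funcotator writes a long "#" preamble)
  body <- lines[!startsWith(lines, "#") & nzchar(lines)]
  if (length(body) == 0) stop("no non-comment lines found in the first 5000 lines")
  # The first remaining line is assumed to be the column header
  header <- strsplit(body[1], "\t", fixed = TRUE)[[1]]
  list(n_columns = length(header), missing = setdiff(required, header))
}

# Toy file with made-up content, only to show the call
toy <- tempfile(fileext = ".maf")
writeLines(c("#version 2.4",
             "Hugo_Symbol\tChromosome\tStart_Position",
             "TP53\t17\t7577121"), toy)
print(check_maf_header(toy))
```

If `n_columns` comes back as 1, the header was not split on tabs at all, which points to a delimiter problem rather than a genuinely missing column.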

@gianfilippo
Author

Hi,

thanks. I will do that and let you know.

Best

@sahuno

sahuno commented Jan 24, 2020

Dear all,

vcf2maf came with its own problems while installing on our servers/getting it to work.

Use the read.delim() function in R to first read the .maf file generated by Funcotator (with the options shown below), and then pass the result to read.maf() to get a MAF object.

Example code below:

single.maf.delim <- read.delim(file = "/path/to/maf.file.generated.from.foncotator.maf", sep = "\t", header = TRUE, fill = TRUE, comment.char = "#")
maf.object <- read.maf(maf=single.maf.delim) ###optional, add clinical annotation file with "clinicalData" argument in read.maf()

You can use the same approach to merge multiple .maf files produced by Funcotator, and then use maftools for your downstream analysis and visualizations.

###reading multiple .maf files as a large list
setwd("/path/to/annotated.with.foncotator.maf.files.maf")
maf.filenames <- list.files(full.names=TRUE)  
list.all.maf.files <- lapply(maf.filenames,function(i){
  read.delim(i, sep = "\t", header = TRUE, fill = TRUE, comment.char = "#")
})                        

###merging all the .maf files
merged_mafs_files <- maftools::merge_mafs(list.all.maf.files)                                      

###check to see if everything is okay  
getSampleSummary(merged_mafs_files)                                    
getGeneSummary(merged_mafs_files) 

Best of luck

PoisonAlien added a commit that referenced this issue Jan 28, 2020
@PoisonAlien
Owner

Okay, I found a workaround. data.table's fread has a quote argument which helps. Could you please try again and let me know if read.maf works on your Funcotator output? You will have to re-install from GitHub.
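To illustrate why a quote setting can matter, here is a minimal base R sketch using read.delim, whose default is `quote = "\""`. The toy text is made up, and whether stray quotes are what broke the Funcotator file in this thread is an assumption. With `quote = ""`, quote handling is turned off, so an unbalanced double quote inside a field stays ordinary text instead of swallowing the following lines.

```r
# Toy MAF-like text with a stray double quote in one field (made-up data)
txt <- paste(
  "#version 2.4",
  "Hugo_Symbol\tTumor_Sample_Barcode\tNote",
  "TP53\tS1\tan \"odd value",
  "KRAS\tS1\tfine",
  sep = "\n"
)

# quote = "" disables quote handling; comment.char = "#" skips the preamble
maf_df <- read.delim(
  text = txt, sep = "\t", header = TRUE, fill = TRUE,
  comment.char = "#", quote = "", stringsAsFactors = FALSE
)
print(maf_df)
```

The same idea applies to data.table::fread, which also accepts `quote = ""`.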

@jungminchoilab

@PoisonAlien Could you please elaborate how to fix this using data.table? I am having the same issue after annotating mutect2.vcf with the funcotator. I also tried suggestions from @sahuno which did not work in my hands.

Thanks so much for making this tool available to the field.

@PoisonAlien
Owner

Hi @jungminchoilab ,
Is it possible to share a reproducible sample output file?

@jungminchoilab

Hi @PoisonAlien, thanks for your prompt response. Please see below and let me know if you are having issues downloading a file.

an example maf file

@PoisonAlien
Owner

Thanks for sharing your file. It seems to work for me. What version of the package are you using?
Another thing I noticed in your file is that the entries in the Tumor_Sample_Barcode column are set to NA. This will cause trouble, and you will have to populate this column with the corresponding sample names.

@jungminchoilab

Thanks again for checking @PoisonAlien . The version that I installed is maftools_2.2.10. Which one am I supposed to use?

Apparently, funcotator does not annotate tumor sample ID. I will re-annotate it properly. Is normal sample ID also required?

@PoisonAlien
Owner

Could you please install the tool from GitHub and try again.
I am considering adding an argument to read.maf that fills in sample names parsed from file names, for cases where the Tumor_Sample_Barcode column is empty, as it is here. Would that be helpful?
No, normal sample IDs are not required.
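Until such an argument exists, the column can be filled by hand before calling read.maf. The following is a rough base R sketch, not maftools code: `fill_barcode_from_filename` is a hypothetical helper, and it assumes the sample name is everything before the first dot in the file name, which may not match your naming scheme.

```r
# Hypothetical helper: fill empty Tumor_Sample_Barcode entries with a
# sample name derived from the file name.
fill_barcode_from_filename <- function(maf_df, path) {
  # Assumption: the sample name is the part of the file name before the first "."
  sample_name <- sub("\\..*$", "", basename(path))
  if (!"Tumor_Sample_Barcode" %in% names(maf_df)) {
    maf_df[["Tumor_Sample_Barcode"]] <- rep(NA_character_, nrow(maf_df))
  }
  barcode <- as.character(maf_df[["Tumor_Sample_Barcode"]])
  # Treat NA and empty strings as missing; leave existing values untouched
  empty <- is.na(barcode) | !nzchar(barcode)
  barcode[empty] <- sample_name
  maf_df[["Tumor_Sample_Barcode"]] <- barcode
  maf_df
}

# Toy data frame with made-up rows, only to show the call
toy <- data.frame(Hugo_Symbol = c("TP53", "KRAS"),
                  Tumor_Sample_Barcode = c(NA, ""),
                  stringsAsFactors = FALSE)
print(fill_barcode_from_filename(toy, "/path/to/SAMPLE01.funcotated.maf"))
```

The returned data frame can then be passed to read.maf through its maf argument.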

@jungminchoilab

As suggested, I re-installed the tool from GitHub and the version is now maftools_2.3.10 .

I also ran funcotator again and properly annotated a tumor_sample_barcode column (please see below).

re-annotated maf

Unfortunately, I still get the same error message and wonder what might have gone wrong.

> library(maftools)
> #path to MAF file
> test.maf = system.file('extdata', 'YUL0207P.combined.mutect2.strelka2.funcotated.maf', package = 'maftools') 
> test.ar = read.maf(maf = test.maf)
-Reading
-Validating
Error in validateMaf(maf = maf, isTCGA = isTCGA, rdup = removeDuplicatedVariants,  : 
  missing required fields from MAF: Hugo_Symbol

I do appreciate your kind help @PoisonAlien .

@PoisonAlien
Owner

You don't have to run the system.file command. It is only for accessing example files that are bundled with packages. Just pass the path to your MAF file.

e.g.:

x = maftools::read.maf(maf = "~/Downloads/YUL0207P.combined.mutect2.strelka2.funcotated (1).maf")

P.S. maftools is of little help with a single sample; you need two or more samples to take advantage of all the functions.

@jungminchoilab

@PoisonAlien hmm, same error... and yes, I have multiple cases for the analysis but wanted to troubleshoot with one sample first.

> test.maf = maftools::read.maf(maf = "/Users/jungminchoi/mutect2/YUL0207P.combined.mutect2.strelka2.funcotated.maf")
-Reading
-Validating
Error in validateMaf(maf = maf, isTCGA = isTCGA, rdup = removeDuplicatedVariants,  : 
  missing required fields from MAF: Hugo_Symbol

@PoisonAlien
Owner

PoisonAlien commented Apr 5, 2020

I cannot think of any reason why it should give an error. It works for me with the same file.
Anyway, could you try the following commands:

> mymaf <-
      data.table::fread(
        file = "/Users/jungminchoi/mutect2/YUL0207P.combined.mutect2.strelka2.funcotated.maf",
        sep = "\t",
        stringsAsFactors = FALSE,
        verbose = FALSE,
        data.table = TRUE,
        showProgress = TRUE,
        header = TRUE,
        fill = TRUE,
        skip = "Hugo_Symbol",
        quote = ""
      )
> test.maf = maftools::read.maf(maf = mymaf)

@jungminchoilab

Thanks for your patience, @PoisonAlien ! Now getting another error...

> test.maf = maftools::read.maf(maf = mymaf)
-Validating
-Silent variants: 12 
-Summarizing
-Processing clinical data
--Missing clinical data
Error in maftools::read.maf(maf = mymaf) : 
  lazy-load database '/Library/Frameworks/R.framework/Versions/3.6/Resources/library/maftools/R/maftools.rdb' is corrupt
In addition: Warning message:
In maftools::read.maf(maf = mymaf) : internal error -3 in R_decompress1

@PoisonAlien
Owner

No problem. I am sorry that you are getting these errors.
For the above error, restart your R session and it should be fine.
If you still get an error, would you mind opening a new issue? I don't want to spam others linked to this issue.

@jungminchoilab

Will do. Thanks so much again for solving this for me.

@naiem836

naiem836 commented Oct 4, 2023

I have a problem running this code:

mymaf <-
  data.table::fread(
    file = "data_mutations.maf",
    sep = "\t",
    stringsAsFactors = FALSE,
    verbose = FALSE,
    data.table = TRUE,
    showProgress = TRUE,
    header = TRUE,
    fill = TRUE,
    skip = "Hugo_Symbol",
    quote = ""
  )

test.maf = maftools::read.maf(maf = mymaf)


#Get the sample information
sample_info = read.table(file="clinicaldata_tcga_bc_f.tsv", sep="\t", header=T)

laml_tcga = read.maf(maf = test.maf, clinicalData = sample_info )


#### Console output ####
test.maf = maftools::read.maf(maf = mymaf)
-Validating
--Removed 5396 duplicated variants
-Silent variants: 51626
-Summarizing
--Possible FLAGS among top ten genes:
  TTN
  MUC16
  SYNE1
-Processing clinical data
--Missing clinical data
-Finished in 3.300s elapsed (3.090s cpu)
> #Get the sample information
> sample_info = read.table(file="clinicaldata_tcga_bc_f.tsv", sep="\t", header=T)

laml_tcga = read.maf(maf = test.maf, clinicalData = sample_info ) 
-Reading
Error in file.info(file) : invalid filename argument

@PoisonAlien
Owner

Hi,

Note that the maf argument should be a path to the MAF file or a data.frame/data.table. In your last command, test.maf is already a MAF object, so it cannot be passed to read.maf again.

@naiem836

naiem836 commented Oct 4, 2023

Thank you. The code worked, but another issue has popped up.

After running everything:

mymaf <-
  data.table::fread(
    file = "data_mutations.maf", sep = "\t",stringsAsFactors = FALSE,verbose = FALSE,data.table = TRUE,showProgress = TRUE,
    header = TRUE,
    fill = TRUE,
    skip = "Hugo_Symbol",
    quote = "")
    
##Get the sample information
sample_info = read.table(file="clinicaldata_tcga_bc_f.tsv", sep="\t", header=T)
laml_tcga = read.maf(maf = mymaf, clinicalData = sample_info,verbose = TRUE )

#################
Issue: Error in summarizeMaf(maf = nonSyn, anno = clinicalData, chatty = verbose) : 
  Tumor_Sample_Barcode column not found in provided clinical data. Rename column containing sample names to Tumor_Sample_Barcode if necessary.

############### 
Some people said this happens when the nonsynonymous mutations in the MAF (mymaf here) are not recognized, so I added vc_nonSyn:

#################
mymaf <-
  data.table::fread(
    file = "data_mutations.maf", sep = "\t",stringsAsFactors = FALSE,verbose = FALSE,data.table = TRUE,showProgress = TRUE,
    header = TRUE,
    fill = TRUE,
    skip = "Hugo_Symbol",
    quote = "",vc_nonSyn=c("Missense_Mutation", "Nonsense_Mutation"))

##### but 
Error in data.table::fread(file = "data_mutations.maf", sep = "\t", stringsAsFactors = FALSE,  : 
  unused argument (vc_nonSyn = c("Missense_Mutation", "Nonsense_Mutation"))
#########
The argument is still not recognized.

@PoisonAlien
Owner

Hi,

You could directly use read.maf instead of fread. Also note that vc_nonSyn is not an argument to fread.

my_maf <- "data_mutations.maf"

# You need to make sure that this tsv has a column with the column name Tumor_Sample_Barcode containing sample names 
clinical_data <- "clinicaldata_tcga_bc_f.tsv" 

maftools::read.maf(maf = my_maf, clinicalData = clinical_data, vc_nonSyn = c("Missense_Mutation", "Nonsense_Mutation"))

You should only use the vc_nonSyn argument when you want to customise which variant classes are considered as affecting. Missense_Mutation and Nonsense_Mutation are considered nonsynonymous by default. Please see ?read.maf for more details. Hope this helps.
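For the Tumor_Sample_Barcode error above, the clinical table can be fixed in R before it is handed to read.maf. This is a minimal base R sketch under assumptions: `rename_sample_column` is a hypothetical helper, and `SAMPLE_ID` is a made-up column name standing in for whatever your clinical file calls its sample column.

```r
# Hypothetical helper: rename the sample-ID column of a clinical table
# to Tumor_Sample_Barcode, which the error message above asks for.
rename_sample_column <- function(clinical_df, sample_col) {
  # Nothing to do if the expected column is already present
  if ("Tumor_Sample_Barcode" %in% names(clinical_df)) return(clinical_df)
  if (!sample_col %in% names(clinical_df)) {
    stop("column '", sample_col, "' not found in clinical data")
  }
  names(clinical_df)[names(clinical_df) == sample_col] <- "Tumor_Sample_Barcode"
  clinical_df
}

# Toy clinical table with made-up values; SAMPLE_ID is an assumed column name
clin <- data.frame(SAMPLE_ID = c("S1", "S2"), AGE = c(50, 61),
                   stringsAsFactors = FALSE)
print(names(rename_sample_column(clin, "SAMPLE_ID")))
```

The sample names in this column also need to match the Tumor_Sample_Barcode values in the MAF itself for the annotations to line up.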
