-
Notifications
You must be signed in to change notification settings - Fork 429
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Get ref genome from meta_study instead of command line #7969
Conversation
@@ -37,7 +37,7 @@ | |||
ADD_CASE_LIST_CLASS = "org.mskcc.cbio.portal.scripts.AddCaseList" | |||
VERSION_UTIL_CLASS = "org.mskcc.cbio.portal.util.VersionUtil" | |||
|
|||
# provides a key for data types to metafile specification dict. | |||
# provides a key for data types to metafile specification dict. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
somehow my editor automatically removes trailing whitespace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some sidenotes
@@ -685,7 +686,6 @@ def validate_types_and_id(meta_dictionary, logger, filename): | |||
def parse_metadata_file(filename, | |||
logger, | |||
study_id=None, | |||
genome_name=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
somehow this parameter is passed on but never used.
required=False) | ||
parser.add_argument('-ncbi', '--ncbi_build_number', type=str, default='37', | ||
help='NCBI reference genome build number (default: assumed 37 for UCSC reference genome build hg19)', | ||
required=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
arguments became obsolete now
'hg38':('human','38','hg38'), | ||
'mm10':('mouse','38','mm10') | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if we should also allow GRCxXX values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should follow the values defined in portal.properties
, so it's good like this. If you want to add the GRCxXX values we should change it in the properties. And just a thought: if we want to change the genome build properties in portal.properties
I think we should simplify it to one property: hg19
/hg38
/mm10
already gives you all the information given from the three properties that we currently have.
@@ -4864,8 +4892,6 @@ def validate_data_relations(validators_by_meta_type, logger): | |||
def request_from_portal_api(server_url, api_name, logger): | |||
"""Send a request to the portal API and return the decoded JSON object.""" | |||
|
|||
print(api_name) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
relic from previous work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! LGTM! Could you also update the docs https://docs.cbioportal.org/5.1-data-loading/data-loading/using-the-dataset-validator#validation-of-non-human-data ?
And maybe double check if https://github.com/cBioPortal/cbioportal-docker-compose still works for grch38. Thanks! |
Just tested, it still works. I did not change anything there. |
portal_instance.species, portal_instance.ncbi_build, portal_instance.genome_build = \ | ||
reference_genome_map[meta_dictionary['reference_genome']] | ||
else: | ||
logger.info('No reference genome specified -- using default (hg19)', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is wrong to assume that the default is hg19. The default is the genome specified in portal.properties
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option would be to set the default to hg19 instead of being defined in portal.properties
. But it should be checked if the genome build properties in portal.properties
are only used here before removing them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The portal.properties
are not used in the validator. AFAIK, those are only used in the Java code, and some parameters are passed on to the frontend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SRodenburg I see. I was confused by the portal_instance
name.
Defaults for these properties are, indeed, set here: https://github.com/cBioPortal/cbioportal/pull/7969/files/e805ff2f295eae71032d52dfcd754849a07a401c#diff-9be14d6564536099d9c431cf330dc2ad62abfecbbc191ac7c5814855dc37685eL305
So I think this three values can be simplified into one. As I mentioned, hg19/hg38/mm10 already gives you all the information given from the three properties that we currently have, and there is no need to deviate from the reference_genome
property
'hg38':('human','38','hg38'), | ||
'mm10':('mouse','38','mm10') | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should follow the values defined in portal.properties
, so it's good like this. If you want to add the GRCxXX values we should change it in the properties. And just a thought: if we want to change the genome build properties in portal.properties
I think we should simplify it to one property: hg19
/hg38
/mm10
already gives you all the information given from the three properties that we currently have.
@inodb docs are updated ✔️ |
|
||
``` | ||
|
||
* You can also overwrite default genome values by using the following options of the importer script. For that, you have to specify the `reference_genome` field in the `meta_study.txt` file of your study. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not entirely clear here what the values are supposed to be but will update that
* In accordance with cBioPortal/cbioportal#7969 which is merged since version 3.4.17
* In accordance with cBioPortal/cbioportal#7969 which is merged since version 3.4.17
* In accordance with cBioPortal/cbioportal#7969 which is merged since version 3.4.17
The reference genome (and thereby species and build) is now retrieved from the
meta_study.txt
file after the optionalreference_genome
tag.Allowed values are hg19, hg38 and mm10. The validator will raise an error when any other value is encountered, and will default to hg19.
I also added a more specific error when validating the
NCBI_Build
column in MAF files, validating whether the correct reference genome is used.This was already present in the FIle Formats documentation.