Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get ref genome from meta_study instead of command line #7969

Merged
merged 2 commits into from
Oct 16, 2020

Conversation

SRodenburg
Copy link

The reference genome (and thereby species and build) is now retrieved from the meta_study.txt file after the optional reference_genome tag.

Allowed values are hg19, hg38 and mm10. The validator will raise an error when any other value is encountered, and will default to hg19.

I also added a more specific error when validating the NCBI_Build column in MAF files, validating whether the correct reference genome is used.

This was already present in the FIle Formats documentation.

@SRodenburg SRodenburg requested review from pvannierop, oplantalech and inodb and removed request for pvannierop October 15, 2020 15:22
@@ -37,7 +37,7 @@
ADD_CASE_LIST_CLASS = "org.mskcc.cbio.portal.scripts.AddCaseList"
VERSION_UTIL_CLASS = "org.mskcc.cbio.portal.util.VersionUtil"

# provides a key for data types to metafile specification dict.
# provides a key for data types to metafile specification dict.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

somehow my editor automatically removes trailing whitespace

Copy link
Author

@SRodenburg SRodenburg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some sidenotes

@@ -685,7 +686,6 @@ def validate_types_and_id(meta_dictionary, logger, filename):
def parse_metadata_file(filename,
logger,
study_id=None,
genome_name=None,
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

somehow this parameter is passed on but never used.

required=False)
parser.add_argument('-ncbi', '--ncbi_build_number', type=str, default='37',
help='NCBI reference genome build number (default: assumed 37 for UCSC reference genome build hg19)',
required=False)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arguments became obsolete now

'hg38':('human','38','hg38'),
'mm10':('mouse','38','mm10')
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if we should also allow GRCxXX values

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should follow the values defined in portal.properties, so it's good like this. If you want to add the GRCxXX values we should change it in the properties. And just a thought: if we want to change the genome build properties in portal.properties I think we should simplify it to one property: hg19/hg38/mm10 already gives you all the information given from the three properties that we currently have.

@@ -4864,8 +4892,6 @@ def validate_data_relations(validators_by_meta_type, logger):
def request_from_portal_api(server_url, api_name, logger):
"""Send a request to the portal API and return the decoded JSON object."""

print(api_name)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

relic from previous work

Copy link
Member

@inodb inodb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@inodb
Copy link
Member

inodb commented Oct 15, 2020

And maybe double check if https://github.com/cBioPortal/cbioportal-docker-compose still works for grch38. Thanks!

@SRodenburg
Copy link
Author

And maybe double check if https://github.com/cBioPortal/cbioportal-docker-compose still works for grch38. Thanks!

Just tested, it still works. I did not change anything there.

portal_instance.species, portal_instance.ncbi_build, portal_instance.genome_build = \
reference_genome_map[meta_dictionary['reference_genome']]
else:
logger.info('No reference genome specified -- using default (hg19)',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is wrong to assume that the default is hg19. The default is the genome specified in portal.properties, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option would be to set the default to hg19 instead of being defined in portal.properties. But it should be checked if the genome build properties in portal.properties are only used here before removing them.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The portal.properties are not used in the validator. AFAIK, those are only used in the Java code, and some parameters are passed on to the frontend.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SRodenburg I see. I was confused by the portal_instance name.

Defaults for these properties are, indeed, set here: https://github.com/cBioPortal/cbioportal/pull/7969/files/e805ff2f295eae71032d52dfcd754849a07a401c#diff-9be14d6564536099d9c431cf330dc2ad62abfecbbc191ac7c5814855dc37685eL305

So I think this three values can be simplified into one. As I mentioned, hg19/hg38/mm10 already gives you all the information given from the three properties that we currently have, and there is no need to deviate from the reference_genome property

'hg38':('human','38','hg38'),
'mm10':('mouse','38','mm10')
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should follow the values defined in portal.properties, so it's good like this. If you want to add the GRCxXX values we should change it in the properties. And just a thought: if we want to change the genome build properties in portal.properties I think we should simplify it to one property: hg19/hg38/mm10 already gives you all the information given from the three properties that we currently have.

@SRodenburg
Copy link
Author

@inodb docs are updated ✔️

@inodb inodb changed the title impl. ref genome parsing from meta_study Get ref genome from meta_study instead of command line Oct 16, 2020
@inodb inodb merged commit 0f23ed4 into cBioPortal:master Oct 16, 2020

```

* You can also overwrite default genome values by using the following options of the importer script. For that, you have to specify the `reference_genome` field in the `meta_study.txt` file of your study.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not entirely clear here what the values are supposed to be but will update that

@oplantalech oplantalech deleted the metafile_ref_genome_parsing branch November 19, 2020 09:50
oplantalech added a commit to thehyve/cbioportal-staging that referenced this pull request Feb 26, 2021
* In accordance with cBioPortal/cbioportal#7969 which is merged since version 3.4.17
oplantalech added a commit to thehyve/cbioportal-staging that referenced this pull request Feb 26, 2021
* In accordance with cBioPortal/cbioportal#7969 which is merged since version 3.4.17
oplantalech added a commit to thehyve/cbioportal-staging that referenced this pull request Mar 1, 2021
* In accordance with cBioPortal/cbioportal#7969 which is merged since version 3.4.17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants