Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

how to create or find "genes.refGene" file for hg19 and hg38 #39

Open
anopperl opened this issue Aug 18, 2017 · 11 comments
Open

how to create or find "genes.refGene" file for hg19 and hg38 #39

anopperl opened this issue Aug 18, 2017 · 11 comments

Comments

@anopperl
Copy link

how to create or find "genes.refGene" file for hg19, hg38.
i have got "genes.refGene" file from USSC but these are not working for my case

error shows :

Traceback (most recent call last):
File "first_py.py", line 38, in
hgvs_name, genome, get_transcript=get_transcript)
File "build/bdist.linux-x86_64/egg/pyhgvs/init.py", line 1356, in parse_hgvs_name
ValueError: transcript is required

@naegelyd
Copy link
Contributor

@anopperl Could you post the code you are trying to run that caused the error shown above? Might help in coming up with a solution. Thanks.

@lacek
Copy link

lacek commented Aug 29, 2017

There is already a genes.refGene in the directory pyhgvs/data of this repository. It is simply old but working.

"genes.refGene" is not directly available at UCSC. However, you should be able to created one with latest tables from UCSC database. Refer #26 for an example for hg19. For hg38, you simply need to replace hg19 by hg38 in the example command. (Or download the refGene.txt.gz from ftp://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/refGene.txt.gz)

@wxc0218
Copy link

wxc0218 commented Aug 9, 2019

I used the following command to get the "genes.refGene" file, run the Example usage, still still have the above error 。
mysql -ugenomep --password=password -hgenome-mysql.cse.ucsc.edu -ACD hg19 -BNe "SELECT r.bin,CONCAT(r.name, '.', g.version) AS name,r.chrom,r.strand,r.txStart,r.txEnd,r.cdsStart,r.cdsEnd,r.exonCount,r.exonStarts,r.exonEnds,r.score,r.name2,r.cdsStartStat,r.cdsEndStat,r.exonFrames FROM hg19.refGene r, hgFixed.gbCdnaInfo g WHERE r.name = g.acc" > genes.refGene

@davmlaw
Copy link

davmlaw commented Aug 21, 2019

This is a dupe, see other issue for scripts: #26 (comment)

@davmlaw
Copy link

davmlaw commented Feb 3, 2022

I've made a Python package that provides ~800k transcripts (both RefSeq and Ensembl) for PyHGVS

https://github.com/SACGF/cdot

You can either download a JSON.gz file, or use a REST service. To use it:

from cdot.pyhgvs.pyhgvs_transcript import JSONPyHGVSTranscriptFactory, RESTPyHGVSTranscriptFactory

factory = RESTPyHGVSTranscriptFactory()
# factory = JSONPyHGVSTranscriptFactory(["./cdot-0.2.1.refseq.grch38.json.gz"])  # Uses local JSON file
pyhgvs.parse_hgvs_name(hgvs_c, genome, get_transcript=factory.get_transcript_grch37)

@GuoFengWang
Copy link

I've made a Python package that provides ~800k transcripts (both RefSeq and Ensembl) for PyHGVS

https://github.com/SACGF/cdot

You can either download a JSON.gz file, or use a REST service. To use it:

from cdot.pyhgvs.pyhgvs_transcript import JSONPyHGVSTranscriptFactory, RESTPyHGVSTranscriptFactory

factory = RESTPyHGVSTranscriptFactory()
# factory = JSONPyHGVSTranscriptFactory(["./cdot-0.2.1.refseq.grch38.json.gz"])  # Uses local JSON file
pyhgvs.parse_hgvs_name(hgvs_c, genome, get_transcript=factory.get_transcript_grch37)

Hi,@davmlaw, I want to get HGVS cdot from REF/ALT using pyhgvs(not parse hgvs string), how could I use the cdot package to help me? Could you show me some example of scripts? Thanks

@davmlaw
Copy link

davmlaw commented Feb 27, 2023

First, install cdot:

python3 -m pip install cdot

Then get the cdot data:

wget https://github.com/SACGF/cdot/releases/download/v0.2.12/cdot-0.2.12.refseq.grch37_grch38.json.gz

This is based off the pyhgvs README - basically we need to load cdot data files to populate the pyhgvs "transcript" record (they wrote their own get_transcript() method in the README)

import pyhgvs as hgvs
from pysam.libcfaidx import FastaFile
from cdot.pyhgvs.pyhgvs_transcript import JSONPyHGVSTranscriptFactory

factory = JSONPyHGVSTranscriptFactory(["./cdot-0.2.12.refseq.grch37_grch38.json.gz"])
transcript = factory.get_transcript_grch37("NM_000352.3")  # change to 38 if you want that build
chrom, offset, ref, alt = ('chr11', 17496508, 'T', 'C')
genome = FastaFile("/data/annotation/fasta/GCF_000001405.25_GRCh37.p13_genomic.fna.gz")
hgvs_name = hgvs.format_hgvs_name(
    chrom, offset, ref, alt, genome, transcript)

@GuoFengWang
Copy link

@davmlaw Thank you very much! And I wonder is there any way to get all the relevant transcripts based on chrom and offset?

@davmlaw
Copy link

davmlaw commented Mar 1, 2023

The easiest way to do that is probably using bedtools on a GTF then pulling out transcripts from the cut down GTF

Another way would be to use another Python library for doing genomic type stuff that I wrote called pyreference untested code would be:

my_region_iv = GenomicInterval("chr1", 100, 10000)
for iv, transcript_set in reference.genomic_transcripts[my_region_iv]
    pass

@GuoFengWang
Copy link

@davmlaw Hi, your work is great! I used your package and pyhgvs to generate HGVS name. Then, I used another package: hgvs to parse the variant, but it raised HGVSParserError: raise HGVSParseError("{s}: char {exc.position}: {reason}".format( hgvs.exceptions.HGVSParseError: NM_001286350.2(DACT2):c.681_692dup12: char 35: expected EOF . Why hgvs can not parse the result which generated by pyhgvs? How could I solve it?

@davmlaw
Copy link

davmlaw commented Mar 5, 2023

Hi, HGVS is very broad spec and each library only parses a subset of it.

I think the issue is the 12 at the end (this is redundant, and you could work it out from the difference between positions) you need to regenerate them without the numbers or strip them off

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants