how to create or find "genes.refGene" file for hg19 and hg38 #39

anopperl · 2017-08-18T07:14:44Z

how to create or find "genes.refGene" file for hg19, hg38.
i have got "genes.refGene" file from USSC but these are not working for my case

error shows :

Traceback (most recent call last):
File "first_py.py", line 38, in
hgvs_name, genome, get_transcript=get_transcript)
File "build/bdist.linux-x86_64/egg/pyhgvs/init.py", line 1356, in parse_hgvs_name
ValueError: transcript is required

naegelyd · 2017-08-18T16:49:43Z

@anopperl Could you post the code you are trying to run that caused the error shown above? Might help in coming up with a solution. Thanks.

lacek · 2017-08-29T09:49:11Z

There is already a genes.refGene in the directory pyhgvs/data of this repository. It is simply old but working.

"genes.refGene" is not directly available at UCSC. However, you should be able to created one with latest tables from UCSC database. Refer #26 for an example for hg19. For hg38, you simply need to replace hg19 by hg38 in the example command. (Or download the refGene.txt.gz from ftp://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/refGene.txt.gz)

wxc0218 · 2019-08-09T02:41:06Z

I used the following command to get the "genes.refGene" file, run the Example usage, still still have the above error 。
mysql -ugenomep --password=password -hgenome-mysql.cse.ucsc.edu -ACD hg19 -BNe "SELECT r.bin,CONCAT(r.name, '.', g.version) AS name,r.chrom,r.strand,r.txStart,r.txEnd,r.cdsStart,r.cdsEnd,r.exonCount,r.exonStarts,r.exonEnds,r.score,r.name2,r.cdsStartStat,r.cdsEndStat,r.exonFrames FROM hg19.refGene r, hgFixed.gbCdnaInfo g WHERE r.name = g.acc" > genes.refGene

davmlaw · 2019-08-21T14:23:06Z

This is a dupe, see other issue for scripts: #26 (comment)

davmlaw · 2022-02-03T06:56:48Z

I've made a Python package that provides ~800k transcripts (both RefSeq and Ensembl) for PyHGVS

https://github.com/SACGF/cdot

You can either download a JSON.gz file, or use a REST service. To use it:

from cdot.pyhgvs.pyhgvs_transcript import JSONPyHGVSTranscriptFactory, RESTPyHGVSTranscriptFactory

factory = RESTPyHGVSTranscriptFactory()
# factory = JSONPyHGVSTranscriptFactory(["./cdot-0.2.1.refseq.grch38.json.gz"])  # Uses local JSON file
pyhgvs.parse_hgvs_name(hgvs_c, genome, get_transcript=factory.get_transcript_grch37)

GuoFengWang · 2023-02-24T08:43:06Z

I've made a Python package that provides ~800k transcripts (both RefSeq and Ensembl) for PyHGVS

https://github.com/SACGF/cdot

You can either download a JSON.gz file, or use a REST service. To use it:
from cdot.pyhgvs.pyhgvs_transcript import JSONPyHGVSTranscriptFactory, RESTPyHGVSTranscriptFactory

factory = RESTPyHGVSTranscriptFactory()
# factory = JSONPyHGVSTranscriptFactory(["./cdot-0.2.1.refseq.grch38.json.gz"])  # Uses local JSON file
pyhgvs.parse_hgvs_name(hgvs_c, genome, get_transcript=factory.get_transcript_grch37)

Hi,@davmlaw, I want to get HGVS cdot from REF/ALT using pyhgvs(not parse hgvs string), how could I use the cdot package to help me? Could you show me some example of scripts? Thanks

davmlaw · 2023-02-27T05:34:59Z

First, install cdot:

python3 -m pip install cdot

Then get the cdot data:

wget https://github.com/SACGF/cdot/releases/download/v0.2.12/cdot-0.2.12.refseq.grch37_grch38.json.gz

This is based off the pyhgvs README - basically we need to load cdot data files to populate the pyhgvs "transcript" record (they wrote their own get_transcript() method in the README)

import pyhgvs as hgvs
from pysam.libcfaidx import FastaFile
from cdot.pyhgvs.pyhgvs_transcript import JSONPyHGVSTranscriptFactory

factory = JSONPyHGVSTranscriptFactory(["./cdot-0.2.12.refseq.grch37_grch38.json.gz"])
transcript = factory.get_transcript_grch37("NM_000352.3")  # change to 38 if you want that build
chrom, offset, ref, alt = ('chr11', 17496508, 'T', 'C')
genome = FastaFile("/data/annotation/fasta/GCF_000001405.25_GRCh37.p13_genomic.fna.gz")
hgvs_name = hgvs.format_hgvs_name(
    chrom, offset, ref, alt, genome, transcript)

GuoFengWang · 2023-03-01T09:02:43Z

@davmlaw Thank you very much! And I wonder is there any way to get all the relevant transcripts based on chrom and offset?

davmlaw · 2023-03-01T09:36:48Z

The easiest way to do that is probably using bedtools on a GTF then pulling out transcripts from the cut down GTF

Another way would be to use another Python library for doing genomic type stuff that I wrote called pyreference untested code would be:

my_region_iv = GenomicInterval("chr1", 100, 10000)
for iv, transcript_set in reference.genomic_transcripts[my_region_iv]
    pass

GuoFengWang · 2023-03-03T08:11:33Z

@davmlaw Hi, your work is great! I used your package and pyhgvs to generate HGVS name. Then, I used another package: hgvs to parse the variant, but it raised HGVSParserError: raise HGVSParseError("{s}: char {exc.position}: {reason}".format( hgvs.exceptions.HGVSParseError: NM_001286350.2(DACT2):c.681_692dup12: char 35: expected EOF . Why hgvs can not parse the result which generated by pyhgvs? How could I solve it?

davmlaw · 2023-03-05T07:58:00Z

Hi, HGVS is very broad spec and each library only parses a subset of it.

I think the issue is the 12 at the end (this is redundant, and you could work it out from the difference between positions) you need to regenerate them without the numbers or strip them off

davmlaw mentioned this issue Mar 1, 2023

hgvs/pyhgvs/data/genes.refGen file #63

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

how to create or find "genes.refGene" file for hg19 and hg38 #39

how to create or find "genes.refGene" file for hg19 and hg38 #39

anopperl commented Aug 18, 2017

naegelyd commented Aug 18, 2017

lacek commented Aug 29, 2017

wxc0218 commented Aug 9, 2019

davmlaw commented Aug 21, 2019

davmlaw commented Feb 3, 2022

GuoFengWang commented Feb 24, 2023

davmlaw commented Feb 27, 2023

GuoFengWang commented Mar 1, 2023

davmlaw commented Mar 1, 2023 •

edited

Loading

GuoFengWang commented Mar 3, 2023

davmlaw commented Mar 5, 2023

how to create or find "genes.refGene" file for hg19 and hg38 #39

how to create or find "genes.refGene" file for hg19 and hg38 #39

Comments

anopperl commented Aug 18, 2017

naegelyd commented Aug 18, 2017

lacek commented Aug 29, 2017

wxc0218 commented Aug 9, 2019

davmlaw commented Aug 21, 2019

davmlaw commented Feb 3, 2022

GuoFengWang commented Feb 24, 2023

davmlaw commented Feb 27, 2023

GuoFengWang commented Mar 1, 2023

davmlaw commented Mar 1, 2023 • edited Loading

GuoFengWang commented Mar 3, 2023

davmlaw commented Mar 5, 2023

davmlaw commented Mar 1, 2023 •

edited

Loading