Description
I have a file with the output from intersecting the C. virginica mRNA GFF with DMRs, found here. The last column holds information about the gene in which each DMR was found. Specifically, it contains a "product=" tag that describes the protein encoded by that mRNA. For example:
ID=rna48;Parent=gene35;Dbxref=GeneID:111114201,Genbank:XM_022452489.1;Name=XM_022452489.1;Note=The sequence of the model RefSeq transcript was modified relative to this genomic sequence to represent the inferred CDS: inserted 2 bases in 2 codons;exception=unclassified transcription discrepancy;gbkey=mRNA;gene=LOC111114201;model_evidence=Supporting evidence includes similarity to: 4 Proteins%2C and 99%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 9 samples with support for all annotated introns;product=vacuolar protein sorting-associated protein 13B-like;transcript_id=XM_022452489.1
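In GFF3, this last column is a list of tag=value pairs separated by semicolons, with reserved characters percent-encoded (the %2C and %25 in model_evidence above are a comma and a percent sign). Because the tags vary from line to line, splitting by position is fragile, but splitting by tag name is not. A minimal Python sketch follows, assuming the column follows that GFF3 convention. The function name `parse_gff_attributes` and the shortened example string are illustrative only.

```python
from urllib.parse import unquote


def parse_gff_attributes(attr_field: str) -> dict:
    """Split a GFF3 attribute string (column 9) into a {tag: value} dict.

    Pairs are separated by ';' and each pair is 'tag=value'. Values are
    percent-decoded, so '%2C' becomes ',' and '%25' becomes '%'.
    """
    attrs = {}
    for pair in attr_field.strip().split(";"):
        if not pair:
            continue  # skip empty pieces, e.g. from a trailing ';'
        # partition splits on the first '=' only, so values may contain '='
        tag, sep, value = pair.partition("=")
        attrs[tag] = unquote(value) if sep else ""
    return attrs


# Shortened version of the example line above (illustrative).
example = (
    "ID=rna48;Parent=gene35;gbkey=mRNA;gene=LOC111114201;"
    "model_evidence=Supporting evidence includes similarity to: "
    "4 Proteins%2C and 99%25 coverage;"
    "product=vacuolar protein sorting-associated protein 13B-like;"
    "transcript_id=XM_022452489.1"
)

print(parse_gff_attributes(example)["product"])
```

Looking up `["product"]` works no matter how many tags precede it, which is exactly what positional splitting in a spreadsheet cannot guarantee.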
For each line, I want to isolate the product information in this column so I can easily describe and visualize the different product functions. I tried splitting this column in Excel with semicolons as the delimiter, but each line contains a different number of fields, so the product information does not end up in the same column throughout the document. Is there a way to extract the product information and put it into a new column? Could I use product= as a delimiter to separate it into its own column?
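One way to get the product into its own column is to search for the tag by name instead of relying on its position. The sketch below assumes the intersect output is tab-delimited with the GFF attribute string as its last column, as described above. It copies each line and appends the decoded product= value as a new final column, leaving it empty when a line has no product tag. The function names and file names are hypothetical.

```python
import re
from urllib.parse import unquote

# Captures the product value: text after "product=" up to the next ';' or
# the end of the field. "(?:^|;)" keeps it from matching e.g. "old_product=".
PRODUCT_RE = re.compile(r"(?:^|;)product=([^;]*)")


def extract_product(attr_field: str) -> str:
    """Return the percent-decoded product= value, or '' if there is none."""
    match = PRODUCT_RE.search(attr_field)
    return unquote(match.group(1)) if match else ""


def add_product_column(in_path: str, out_path: str) -> None:
    """Copy a tab-delimited file, appending the product from its last column."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines
            last_field = line.split("\t")[-1]
            dst.write(line + "\t" + extract_product(last_field) + "\n")
```

Usage, with hypothetical file names: `add_product_column("dmr_mrna_intersect.tab", "dmr_mrna_with_product.tab")`. The output opens in Excel as an ordinary tab-delimited file, with the product in the final column of every row.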