Added functionality for writeBambuOutput() to write to gff3 #567
hafiz-ismail wants to merge 1 commit into devel_pre_v4 from
Pull request overview
Adds optional GFF3 annotation export support to writeBambuOutput() so Bambu results can be written as either GTF (default) or GFF3, and updates user-facing documentation accordingly.
Changes:
- Added `outputAnnFormat` parameter to `writeBambuOutput()` to select `"gtf"` vs `"gff3"`.
- Implemented `writeToGFF3()` / `writeAnnotationsToGFF3()` helpers analogous to existing GTF writers.
- Updated README text to mention GFF3 output support.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| README.md | Documents the new GFF3 option for annotation output. |
| R/readWrite.R | Adds outputAnnFormat to writeBambuOutput() and introduces GFF3-writing helpers. |
      For a full description of the other outputs see [Output Description](#Output-Description)

    - The full output can be written to a file using writeBambuOutput(). Using this function will generate six files, including 4 .gtf files(detailed below), and two .txt files for the expression counts at transcript and gene levels.
    + The full output can be written to a file using writeBambuOutput(). Using this function will generate six files, including four .gtf/.gff3 files(detailed below), and two .txt files for the expression counts at transcript and gene levels.
      #' @param path the destination of the output files
      #' (gtf, transcript counts, and gene counts)
      #' @param prefix the prefix of the output files
    - #' @details The function will write the output from Bambu to files. The
    - #' annotations will be written to a .gtf file, transcript counts (total counts,
    + #' @param outputAnnFormat the file format in which to output annotations, must
    + #' be one of \code{"gtf"} or \code{"gff3"}. \code{"gtf"} is specified by default.
    + #' @details The function will write the output from Bambu to files. The
    + #' annotations will be written to a .gtf or .gff3 file, transcript counts (total counts,
      #' CPM, full-length counts, partial-length counts, and unique counts) and gene counts
    - #' will be written to .txt files.
    + #' will be written to .txt files.
      writeBambuOutput <- function(se, path, prefix = "", outputExtendedAnno = TRUE,
    -     outputAll = TRUE, outputBambuModels = TRUE, outputNovelOnly = TRUE, seperateSamples = FALSE) {
    +     outputAll = TRUE, outputBambuModels = TRUE, outputNovelOnly = TRUE, seperateSamples = FALSE,
    +     outputAnnFormat = "gtf") {
          if (missing(se) | missing(path)) {
              stop("Both summarizedExperiment object from bambu and
    if (outputAnnFormat == "gtf") {
        gtf <- writeAnnotationsToGTF(annotation = transcript_grList,
            file = transcript_annfn, outputExtendedAnno = outputExtendedAnno,
            outputAll = outputAll, outputBambuModels = outputBambuModels, outputNovelOnly = outputNovelOnly)
    }

    if (outputAnnFormat == "gff3") {
        gff3 <- writeAnnotationsToGFF3(annotation = transcript_grList,
            file = transcript_annfn, outputExtendedAnno = outputExtendedAnno,
            outputAll = outputAll, outputBambuModels = outputBambuModels, outputNovelOnly = outputNovelOnly)
    }
    if(outputAll){
        annotationAll = setNDR(annotation, 1)
        if(length(annotationAll) == length(annotation))
            message("The current NDR threshold already outputs all transcript models. This may result in reduced precision for th extendedAnnotations and supportedTranscriptModels gtfs")
Code review

Found 3 issues:

- Lines 338 to 341 in 67f184d
- Lines 320 to 355 in 67f184d
- Lines 376 to 378 in 67f184d

🤖 Generated with Claude Code
        gtf <- writeAnnotationsToGTF(annotation = transcript_grList,
            file = transcript_gtffn, outputExtendedAnno = outputExtendedAnno,
            outputAll = outputAll, outputBambuModels = outputBambuModels, outputNovelOnly = outputNovelOnly)
    } else if (!outputAnnFormat %in% c("gtf", "gff3")) {
Maybe we should also accept "gff", and assume that "gff" means "gff3"?
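One way the suggestion above could look, as a minimal sketch only. `normaliseAnnFormat()` is a hypothetical helper that does not exist in the PR, and treating a bare "gff" as GFF3 is the assumption raised in the comment, not settled behaviour.

```r
# Hypothetical helper (not part of the PR): normalise the requested
# annotation format and fail early on anything unsupported.
normaliseAnnFormat <- function(outputAnnFormat = "gtf") {
    stopifnot(is.character(outputAnnFormat), length(outputAnnFormat) == 1)
    fmt <- tolower(outputAnnFormat)
    # Assumption: a bare "gff" is shorthand for GFF3.
    if (identical(fmt, "gff")) fmt <- "gff3"
    if (!fmt %in% c("gtf", "gff3")) {
        stop("outputAnnFormat must be one of \"gtf\", \"gff3\" (or \"gff\"), not \"",
            outputAnnFormat, "\"")
    }
    fmt
}

normaliseAnnFormat("GFF")
normaliseAnnFormat()
```

Calling this once at the top of writeBambuOutput() would let the later branches compare against only "gtf" and "gff3".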
    }

    if (outputAnnFormat == "gff3") {
        gff3 <- writeAnnotationsToGFF3(annotation = transcript_grList,
writeAnnotationsToGFF3 and writeAnnotationsToGTF already write to files, so there is no need to assign the result to gff3 (or to gtf above). Neither variable is used in the following lines.
    - gff3 <- writeAnnotationsToGFF3(annotation = transcript_grList,
    + writeAnnotationsToGFF3(annotation = transcript_grList,
    }

    writeAnnotationsToGFF3 <- function(annotation, file, geneIDs = NULL, outputExtendedAnno = TRUE,
Maybe this function can be combined with writeAnnotationsToGTF into a single writeAnnotations function, with one more parameter that selects whether writeToGTF or writeToGFF3 is called later?
to do in the future as part of deduplicating code
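For when that deduplication happens, a rough sketch of the dispatch idea. `writeAnnotationsSketch()` and the two stub writers are hypothetical names used only to keep the example self-contained; the real writeToGTF() / writeToGFF3() take more arguments than shown here.

```r
# Stand-in writers so the sketch runs on its own; in the package these
# would be the existing writeToGTF() / writeToGFF3() helpers.
writeToGTFStub <- function(annotation, file) paste("GTF ->", file)
writeToGFF3Stub <- function(annotation, file) paste("GFF3 ->", file)

# Hypothetical combined entry point: one function, one extra argument.
writeAnnotationsSketch <- function(annotation, file,
    outputAnnFormat = c("gtf", "gff3")) {
    # match.arg() rejects unknown formats and defaults to the first choice.
    outputAnnFormat <- match.arg(outputAnnFormat)
    writer <- switch(outputAnnFormat,
        gtf = writeToGTFStub,
        gff3 = writeToGFF3Stub)
    writer(annotation, file)
}

writeAnnotationsSketch(NULL, "out.gtf")
writeAnnotationsSketch(NULL, "out.gff3", outputAnnFormat = "gff3")
```

Because match.arg() does partial matching, "gff" would resolve to "gff3" here without extra code, which may or may not be the behaviour wanted.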
    df <- left_join(df, geneIDs[, c("TXNAME", "GENEID")],
        by = c("group_name" = "TXNAME"))
The processing up to this step is duplicated between writeToGTF and writeToGFF3. Maybe consider combining the two functions?
to do in the future as part of deduplicating code
    dfTx$GENEID <- paste("Parent=", dfTx$GENEID, ";", sep = "")

    if(!is.null(mcols(annotation)$NDR)) {
This if block is duplicated (it appears 4 times)? Maybe it can be modularised as well?
The column assignment is slightly different from the one at line 287, i.e. lines 296-302

    df$NDR <- paste("NDR=", as.character(NDR), ";", sep = "")
    df$txScore <- paste("maxTxScore=", as.character(txScore), ";", sep = "")
    df$txScore.noFit <- paste("maxTxScore.noFit=", as.character(txScore.noFit), ";", sep = "")
    df$novelGene <- paste("novelGene=", as.character(novelGene), ";", sep = "")
    df$novelTranscript <- paste("novelTranscript=", as.character(novelTranscript), ";", sep = "")
    df$txClassDescription <- paste("txClassDescription=", as.character(txClassDescription), ";", sep = "")
    }

vs lines 338-344

    dfTx$NDR <- paste("NDR=", as.character(mcols(annotation)$NDR), ";", sep = "")
    dfTx$txScore <- paste("txScore=", as.character(mcols(annotation)$txScore), ";", sep = "")
    dfTx$txScore.noFit <- paste("txScore.noFit=", as.character(mcols(annotation)$txScore.noFit), ";", sep = "")
    dfTx$novelGene <- paste("novelGene=", as.character(mcols(annotation)$novelGene), ";", sep = "")
    dfTx$novelTranscript <- paste("novelTranscript=", as.character(mcols(annotation)$novelTranscript), ";", sep = "")
    dfTx$txClassDescription <- paste("txClassDescription=", as.character(mcols(annotation)$txClassDescription), ";", sep = "")
    }

Likewise, the same block of code is also 'duplicated' in writeToGTF.
Nevertheless, the blocks are definitely very similar, and can be rewritten and modularized in the future.
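A possible shape for that future modularization, offered as a sketch only. `formatAttributes()` is a hypothetical helper, and the two key maps below just mirror the difference between the `df` block (which prints `maxTxScore`) and the `dfTx` block (which prints `txScore`); the toy values are made up.

```r
# Hypothetical helper: build "key=value;" strings for a set of columns.
# `keys` maps column names (names) to the attribute keys to print (values).
formatAttributes <- function(values, keys) {
    out <- lapply(names(keys), function(col) {
        paste0(keys[[col]], "=", as.character(values[[col]]), ";")
    })
    names(out) <- names(keys)
    as.data.frame(out, stringsAsFactors = FALSE)
}

# Toy metadata standing in for mcols(annotation).
meta <- data.frame(NDR = c(0.1, 0.5), txScore = c(0.9, 0.4))

# Same columns, different printed keys for the two call sites.
dfKeys <- c(NDR = "NDR", txScore = "maxTxScore")
dfTxKeys <- c(NDR = "NDR", txScore = "txScore")

formatAttributes(meta, dfKeys)$txScore
formatAttributes(meta, dfTxKeys)$txScore
```

With the key map passed in as data, the four near-identical blocks would reduce to one call each.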
    # replace * with . and remove trailing ;
    gff3 <- mutate(gff3, strand = recode_factor(strand, `*` = "."),
        attributes = sub(";$", "", attributes))
Not sure where sorting should happen; I think the step before mutate would be fine.
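To make the ordering concrete, a small base R sketch of "sort first, then clean up". The toy table, its column names, and the sort key (sequence name, then start) are assumptions for illustration; the PR itself uses dplyr's mutate() and its own columns.

```r
# Toy table; values are made up for illustration.
gff3 <- data.frame(
    seqname = c("chr2", "chr1", "chr1"),
    start = c(50L, 300L, 100L),
    strand = c("*", "+", "-"),
    attributes = c("ID=tx3;", "ID=tx2;Parent=g1;", "ID=tx1;Parent=g1;"),
    stringsAsFactors = FALSE
)

# Sort first (by sequence name, then start), so later steps see the
# final row order.
gff3 <- gff3[order(gff3$seqname, gff3$start), ]

# Then the cleanup described in the PR: "*" becomes "." and the
# trailing ";" is dropped from the attributes column.
gff3$strand[gff3$strand == "*"] <- "."
gff3$attributes <- sub(";$", "", gff3$attributes)
```

Since the cleanup is row-wise, sorting before or after it gives the same table; doing it first simply keeps the last step before writing as the pure formatting step.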
Code review from Claude Code: my feeling is that issues 4 and 5 may also be good to address. The usage of mRNA might also need to be changed; maybe it can just be transcript?

1. Copy-paste "gtfs" in GFF3 message (score 75)

    message("The current NDR threshold already outputs all transcript models.
    This may result in reduced precision for th extendedAnnotations and
    supportedTranscriptModels gtfs")

Two problems: (a) "gtfs" should be "gff3s", a copy-paste leftover from

Fix location:

2.
Related Issue
Fixes # (issue number)
Type of Change
Description
Implemented functionality for writeBambuOutput to export gff3 files from SummarizedExperiment objects. Currently Bambu can export the more stringent gtf file format, but there is no functionality for gff3. An overview of the differences between the two file formats:
GTF differs from GFF3 mainly in the 9th column (attribute) and the syntax of the key:value pairs.
GTF:
GTF documentation (Ensembl)
GFF3:
GFF3 documentation (Ensembl)
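As a rough illustration of the attribute-column difference, the snippet below builds the same gene/transcript pair in both syntaxes. The IDs are made up, and the choice of `ID` and `Parent` keys on the GFF3 side is an assumption about how a transcript row would be tagged, not a quote from the PR.

```r
# Illustrative values only; the IDs are made up.
attrs <- c(gene_id = "gene1", transcript_id = "tx1")

# GTF: key, space, quoted value; every pair ends with ";".
gtfAttr <- paste(paste0(names(attrs), " \"", attrs, "\";"), collapse = " ")

# GFF3: key=value pairs joined by ";", with no trailing ";".
gff3Attr <- paste0("ID=", attrs[["transcript_id"]],
    ";Parent=", attrs[["gene_id"]])

gtfAttr
gff3Attr
```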
Also, GTF attributes have a trailing `;` whereas GFF3 does not.
Other differences include
Implementation Details
All code changes were implemented in readWrite.R.
- `outputAnnFormat` in writeBambuOutput: takes either `"gtf"` (default) or `"gff3"` as arguments
- `transcript_gtffn` renamed to `transcript_annfn`

Impact of Changes
In addition, this output has not been updated for compatibility with importBambuResults - providing functionality for this will be associated with future implementations for the prepareAnnotations function
Checklist