Skip to content

Make SQLDataFrame a concrete subclass of virtual class DataFrame #5

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 24, 2021
Merged

Conversation

hpages
Copy link
Contributor

@hpages hpages commented Nov 16, 2021

See issue #4 for a discussion of this change.

Starting with S4Vectors 0.33.3, DataFrame is a virtual class with no listData, nrows, or rownames slot, making it possible for SQLDataFrame to become a concrete subclass of DataFrame.

The ultimate goal is to be able to use an SQLDataFrame object in any place where a DataFrame derivative is expected. For example inside a Vector derivative to store the metadata columns, or inside a SummarizedExperiment derivative to store the colData. With SQLDataFrame 1.9.1, this is now possible but with a big gotcha:

library(SQLDataFrame)
library(DBI)
dbfile <- system.file("extdata/test.db", package = "SQLDataFrame")
conn <- DBI::dbConnect(DBI::dbDriver("SQLite"), dbname = dbfile)
obj <- SQLDataFrame(conn = conn, dbtable = "state", dbkey = "state")
dim(obj)
# [1] 50  4
is(obj, "DataFrame")
# [1] TRUE

library(GenomicRanges)
gr <- GRanges("chr1", IRanges(1:50, 60))
mcols(gr) <- obj
gr
# GRanges object with 50 ranges and 4 metadata columns:
#        seqnames    ranges strand |           division        region population        size
#           <Rle> <IRanges>  <Rle> |        <character>   <character>  <numeric> <character>
#    [1]     chr1      1-60      * | East South Central         South       3615      medium
#    [2]     chr1      2-60      * |            Pacific          West        365       small
#    [3]     chr1      3-60      * |           Mountain          West       2280      medium
#    [4]     chr1      4-60      * | West South Central         South       2110      medium
#    [5]     chr1      5-60      * |            Pacific          West      21198       large
#    ...      ...       ...    ... .                ...           ...        ...         ...
#   [46]     chr1     46-60      * |     South Atlantic         South       4981      medium
#   [47]     chr1     47-60      * |            Pacific          West       3559      medium
#   [48]     chr1     48-60      * |     South Atlantic         South       1799      medium
#   [49]     chr1     49-60      * | East North Central North Central       4589      medium
#   [50]     chr1     50-60      * |           Mountain          West        376       small
#   -------
#   seqinfo: 1 sequence from an unspecified genome; no seqlengths

Looks good... BUT:

mcols(gr)
# Error in `dimnames<-`(`*tmp*`, value = dn) : 
#   invalid to use dimnames()<- on an S4 object of class 'SQLDataFrame'

Basically, this fails because SQLDataFrame objects don't support rownames<-:

rownames(obj) <- NULL
# Error in `dimnames<-`(`*tmp*`, value = dn) : 
#   invalid to use dimnames()<- on an S4 object of class 'SQLDataFrame'

Supporting rownames<- would go a long way in making SQLDataFrame objects drop-in replacements for DFrame objects.

H.

@Liubuntu Liubuntu merged commit da7fdcb into Bioconductor:master Nov 24, 2021
@Liubuntu
Copy link
Collaborator

HI @hpages Thanks for all the changes! I'll go through the package to refresh my memories, and see if anything else (except for the rownames) needed to be added (I've had a reimplement branch earlier so maybe adding more changes based on that).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants