feat(r): Create and modify nanoarrow_schema objects #101

Merged
merged 17 commits into from
Feb 6, 2023

@paleolimbot paleolimbot commented Feb 1, 2023

This PR implements "create" and "modify" for nanoarrow_schema objects. Before this PR, all types, fields, and schemas originated in the Arrow package and were imported via C. This PR doesn't change that, but it does implement the infrastructure needed to do so in the next PR. That infrastructure is rather extensive, since it involves implementing schema$some_field <- some_object. In R the expectation is "copy on modify", so this implementation does a deep copy and then sets the field value. I checked to make sure this wouldn't be prohibitively slow and it's not: on my computer, modifying a 1-million column struct takes ~0.1s. For Arrays the approach will have to be slightly different because deep copying an Array is more problematic.
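As a rough illustration of the "copy on modify" behaviour described above, the sketch below modifies a second binding to a schema and checks that the first is left alone. It assumes a nanoarrow build that includes this PR; the variable names `original` and `modified` are only illustrative.

```r
library(nanoarrow)

# Build a small struct schema with one child
original <- na_struct(list(col1 = na_int32()))

# Bind a second name to the same schema, then modify it through that name
modified <- original
modified$children$col2 <- na_string()

# With copy-on-modify semantics, only `modified` should gain the new child
names(original$children)
names(modified$children)
```

Because the assignment deep copies the schema before setting the field, the cost grows with the size of the schema, which is why the timing check mentioned above matters.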

An interesting problem came up when trying to make sure that some_struct$children$some_name <- na_int32() did "the right thing" (I would expect that line to update the type of some_name but not the name of the column). One consequence is that some_struct$children$some_name$name <- "a new name" has no effect, which might be confusing (you'd have to do names(some_struct$children)[1] <- "a new name" instead).
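The distinction between retyping and renaming a child can be sketched as follows. This assumes a nanoarrow build that includes this PR; `some_struct` and the replacement name are placeholders.

```r
library(nanoarrow)

some_struct <- na_struct(list(some_name = na_int32()))

# Assigning a new schema to an existing child replaces its type;
# the column keeps the name it is stored under in the children list
some_struct$children$some_name <- na_string()

# To rename the column, change the names of the children list instead
names(some_struct$children)[1] <- "a_new_name"

names(some_struct$children)
format(some_struct)
```

In other words, the name of a child is taken from its position in the children list rather than from the name field of the schema being assigned.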

Anyway, after this PR you can create a struct ArrowSchema representing any Arrow type and modify all of the fields.

library(nanoarrow)

# create any type
schema <- na_struct(
  list(
    col1 = na_int32(),
    col2 = na_string()
  )
)

#...and modify it
schema$children$col3 <- na_timestamp(timezone = "UTC")
schema$metadata$some_key <- "some_value"

format(schema)
#> [1] "<nanoarrow_schema struct<col1: int32, col2: string, col3: timestamp('ms', 'UTC')>>"

@paleolimbot paleolimbot marked this pull request as ready for review February 6, 2023 01:45
@paleolimbot paleolimbot merged commit 060eed9 into apache:main Feb 6, 2023
@paleolimbot paleolimbot deleted the r-make-array-schema branch February 6, 2023 14:06