Docs: edits to copy and modify vignettes #1098
Changes from 12 commits
@@ -1,5 +1,5 @@
---
title: "Insert, update or remove rows in a database dm"
title: "Insert, update, or remove rows in a database"
date: "`r Sys.Date()`"
author: James Wondrasek
output: rmarkdown::html_vignette

@@ -19,25 +19,25 @@ source("setup/setup.R")
## Introduction {#intro}

This tutorial introduces the methods {dm} provides for modifying the data in the tables of a relational model.
There are 6 methods:
There are 5 methods:

* [`dm_rows_insert()`](#insert) - adds new rows

Should all instances of
We have both, but
Okay, in that case, I think we should only mention
The fewer functions the user needs to learn about the better, I think, especially if two functions are doing the same thing.
They are different, though --

* [`dm_rows_update()`](#update) - changes values in rows
* [`dm_rows_patch()`](#patch) - fills in missing values
* [`dm_rows_upsert()`](#upsert) - adds new rows or changes values if pre-existing
* [`dm_rows_delete()`](#delete) - deletes rows
* [`dm_rows_truncate()`](#truncate) - removes all rows, leaving table structure intact

## The dm_rows_* process

All six methods take the same arguments and using them follows the same process:

1. Create a temporary *changeset dm* that defines the intended changes on the RDBMS
1. Create a temporary *changeset dm* object that defines the intended changes on the RDBMS <!-- TODO The technical term "changeset" needs to be explained here -->
maelle marked this conversation as resolved.

1. If desired, simulate changes with `in_place = FALSE` to double-check
1. Apply changes with `in_place = TRUE`.

To start, a dm object is created containing the tables, and rows, that you want to change.
This changeset dm is then copied into the same source as the dm you want to modify.

To start, a `dm` object is created containing the tables and rows that you want to change.
This changeset `dm` is then copied into the same source as the dm you want to modify.
With the dm in the same RDBMS as the destination dm, you call the appropriate method, such as `dm_rows_insert()`, to make your planned changes, along with an argument of `in_place = FALSE` so you can confirm you achieve the changes that you want.

This verification can be done visually, looking at row counts and the like, or using {dm}'s constraint checking method, `dm_examine_constraints()`.

@@ -48,26 +48,26 @@ With the changes confirmed, you execute the method again, this time with the arg
Note that `in_place = FALSE` is the default: you must opt in to actually change data on the database.
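
A minimal sketch of the three steps above (changeset dm, dry run, commit) against an in-memory SQLite database. It assumes {dm} 1.0.0 or later, {DBI}, {RSQLite}, {tibble}, and {dplyr} are installed, and R >= 4.1 for the native pipe. The table contents and the names `changeset`, `dm_changes`, and `dm_preview` are illustrative. The sketch calls `dm_rows_append()` from the requirements table instead of `dm_rows_insert()`. This is an assumption: depending on the {dm} version, `dm_rows_insert()` may be the right call against a database instead.

```r
# Sketch of the changeset workflow: build, dry run, then commit.
# Assumes {dm} >= 1.0.0 (for dm_rows_append()), {DBI}, {RSQLite},
# {tibble}, {dplyr}, and R >= 4.1 (native pipe |>).
library(dm)
library(tibble)

# Destination dm: two tables linked by a foreign key, stored in SQLite
parent <- tibble(value = c("A", "B", "C"), pk = 1:3)
child <- tibble(value = c("a", "b", "c"), pk = 1:3, fk = c(1L, 1L, NA))
demo_dm <-
  dm(parent = parent, child = child) |>
  dm_add_pk(parent, pk) |>
  dm_add_pk(child, pk) |>
  dm_add_fk(child, fk, parent)

sqlite_db <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
demo_sql <- copy_dm_to(sqlite_db, demo_dm, temporary = FALSE)

# Step 1: the changeset dm, copied into the same database as demo_sql
changeset <- dm(
  parent = tibble(value = "D", pk = 4L),
  child = tibble(value = "d", pk = 4L, fk = 4L)
)
dm_changes <- copy_dm_to(sqlite_db, changeset, temporary = TRUE)

# Step 2: dry run with in_place = FALSE; the stored tables are untouched
dm_preview <- dm_rows_append(demo_sql, dm_changes, in_place = FALSE)
print(dm_examine_constraints(dm_preview))
preview_rows <- nrow(dplyr::collect(dm_preview$parent))
stored_rows <- nrow(dplyr::collect(demo_sql$parent))

# Step 3: apply the same changes to the database
dm_applied <- dm_rows_append(demo_sql, dm_changes, in_place = TRUE)
committed_rows <- nrow(dplyr::collect(demo_sql$parent))

DBI::dbDisconnect(sqlite_db)
```

The row counts are captured before disconnecting. This way the dry run (4 rows in the preview, 3 in the database) can be compared with the committed state (4 rows in the database).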

Each method has its own requirements in order to maintain database consistency.
These involve constraints on primary key values as they are how rows are identified.
These involve constraints on primary key values that uniquely identify rows.

| Method | Requirements |
|--------|--------------|
maelle marked this conversation as resolved.

| `dm_rows_insert()` | The primary keys must differ from existing records.|
| `dm_rows_insert()` | Primary keys must be present for all tables.|
| `dm_rows_append()` | Primary keys must differ from existing records.|
IndrajeetPatil marked this conversation as resolved.

| `dm_rows_update()` | Primary keys must match for all records to be updated.|
| `dm_rows_patch()` | Updates missing values in existing records. Primary keys must match for all records to be patched.|
| `dm_rows_upsert()` | Updates existing records and adds new records, based on the primary key.|
| `dm_rows_delete()` | Removes matching records based on the primary key.|
| `dm_rows_truncate()` | Removes all records, only for tables in the changeset dm.|
| `dm_rows_delete()` | Removes matching records based on the primary key. Primary keys must match for all records to be deleted.|
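
To make the difference between the update and patch requirements concrete, here is a minimal local sketch that uses no database. It assumes {dm}, {tibble}, and R >= 4.1 are installed. The tables and the names `local_dm`, `update_in`, and `patch_in` are illustrative. Both changesets reuse primary keys that already exist, as the table requires. Each changeset holds only the key column plus the columns being changed, which assumes {dm} takes the primary keys from the destination dm (as the changeset dms in this vignette also do).

```r
# Local sketch (no database) contrasting dm_rows_update() and
# dm_rows_patch(); assumes {dm}, {tibble}, and R >= 4.1.
library(dm)
library(tibble)

parent <- tibble(value = c("A", "B", "C"), pk = 1:3)
child <- tibble(value = c("a", "b", NA), pk = 1:3, fk = c(1L, 1L, NA))
local_dm <-
  dm(parent = parent, child = child) |>
  dm_add_pk(parent, pk) |>
  dm_add_pk(child, pk) |>
  dm_add_fk(child, fk, parent)

# Update: overwrite values of existing rows, matched on the primary key
update_in <- dm(parent = tibble(pk = 2L, value = "Z"))
dm_updated <- dm_rows_update(local_dm, update_in, in_place = FALSE)
dm_updated$parent

# Patch: only cells that are currently NA are filled in, so child pk 2
# keeps "b" while child pk 3 receives "c"
patch_in <- dm(child = tibble(pk = c(2L, 3L), value = c("x", "c")))
dm_patched <- dm_rows_patch(local_dm, patch_in, in_place = FALSE)
dm_patched$child
```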

To ensure the integrity of all relations during the process, all methods automatically determine the correct processing order for the tables involved.
For operations that create records, parent tables are processed before child tables.
For `dm_rows_delete()` and `dm_rows_truncate()`, child tables are processed before their parent tables.
For operations that create records, parent tables (which hold primary keys) are processed before child tables (which hold foreign keys).
For `dm_rows_delete()`, child tables are processed before their parent tables.
For more details on this, see `vignette("howto-dm-theory")` and `vignette("howto-dm-db")`.

IndrajeetPatil marked this conversation as resolved.

## Usage {#usage}

To demonstrate the use of these table modifying methods we will create a simple dm object with two tables linked by a foreign key.
Note the foreign key of `NA` in the `child` table.
To demonstrate the use of these table-modifying methods, we will create a simple `dm` object with two tables linked by a foreign key.
Note that one row in the `child` table has a missing foreign key (`NA`).

``````{r }
library(tidyverse)

@@ -88,9 +88,9 @@ demo_dm %>%

{dm} doesn't check your key values when you create a dm, so we add this check:[^null-fk]

[^null-fk]: Be aware that when using `dm_examine_constraints()` NULL (`NA`) foreign keys are allowed and will be counted as a match.
[^null-fk]: Be aware that when using `dm_examine_constraints()`, missing foreign keys (denoted by `NULL` in SQL and `NA` in R) are allowed and will be counted as a match.
In some cases, this doesn't make sense, and non-NULL columns should be enforced by the RDBMS.
Currently {dm} does not specify or check non-NULL constraints for columns.
Currently, {dm} does not specify or check non-NULL constraints for columns.

``````{r }
dm_examine_constraints(demo_dm)

@@ -109,12 +109,12 @@ demo_sql
``````

{dm}'s table modification methods can be piped together to create a repeatable sequence of operations that returns a dm incorporating all the changes required.
This is a common use case for {dm} -- building by hand a sequence of operations using temporary results until it is complete and correct, then committing the result.
This is a common use case for {dm} -- manually building a sequence of operations using temporary results until it is complete and correct, and then committing the result.

## `dm_rows_insert()` {#insert}

To demonstrate `dm_rows_insert()` we create a dm with tables containing the rows to insert and copy it to `sqlite_db`, the same source as `demo_sql`.
For all of the `dm_rows_*` methods the source and destination dm objects must be in the same RDBMS.
To demonstrate `dm_rows_insert()`, we create a dm with tables containing the rows to insert and copy it to `sqlite_db`, the same source as `demo_sql`.
For all of the `dm_rows_*` methods, the source and destination `dm` objects must be in the same RDBMS.
You will get an error message if this is not the case.

The code below adds `parent` and `child` table entries for the letter "D".

@@ -138,7 +138,7 @@ dm_insert_out <-
  dm_rows_insert(dm_insert_in)
``````

This gives us a warning that changes will not be persisted.
This gives us a message that changes will not persist (i.e., they are temporary).

Is this still true?

Looks like it:

```r
library(DBI)
library(dm)
library(tidyverse)
parent <- tibble(value = c("A", "B", "C"), pk = 1:3)
child <- tibble(value = c("a", "b", "c"), pk = 1:3, fk = c(1, 1, NA))
demo_dm <-
  dm(parent = parent, child = child) %>%
  dm_add_pk(parent, pk) %>%
  dm_add_pk(child, pk) %>%
  dm_add_fk(child, fk, parent)
sqlite_db <- dbConnect(RSQLite::SQLite())
demo_sql <- copy_dm_to(sqlite_db, demo_dm, temporary = FALSE)
new_parent <- tibble(value = "D", pk = 4)
new_child <- tibble(value = "d", pk = 4, fk = 4)
dm_insert_in <-
  dm(parent = new_parent, child = new_child) %>%
  copy_dm_to(sqlite_db, ., temporary = TRUE)
dm_insert_out <-
  demo_sql %>%
  dm_rows_insert(dm_insert_in)
#> Not persisting, use `in_place = FALSE` to turn off this message.
dbDisconnect(sqlite_db)
```

Created on 2022-06-19 by the reprex package (v2.0.1.9000)

Inspecting the `child` table of the resulting `dm_insert_out` and `demo_sql`, we can see that's exactly what happened.
{dm} returned to us a dm object with our inserted rows in place, but the underlying database has not changed.

@@ -181,8 +181,8 @@ demo_sql$child
## `dm_rows_delete()` {#delete}

`dm_rows_delete()` is not currently implemented to work with an RDBMS, so we will shift our demonstrations back to the local R environment.
We've made changes to `demo_sql` so we use `collect()` to copy the current tables out of SQLite.
Note that persistence is not a concern with local dm objects.
We've made changes to `demo_sql`, so we use `collect()` to copy the current tables out of SQLite.
Note that persistence is not a concern for *local* `dm` objects.
Every operation returns a new dm object containing the changes made.
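
A minimal local sketch of `dm_rows_delete()`. It assumes {dm}, {tibble}, and R >= 4.1 are installed, and it uses illustrative tables rather than the tables collected from `demo_sql`. The names `local_dm`, `delete_in`, and `dm_deleted` are hypothetical. The changeset lists only the primary keys of the rows to remove. One call removes a parent row together with the child row that references it.

```r
# Local sketch of dm_rows_delete(); assumes {dm}, {tibble}, R >= 4.1.
library(dm)
library(tibble)

parent <- tibble(value = c("A", "B", "C", "D"), pk = 1:4)
child <- tibble(value = c("a", "b", "c", "d"), pk = 1:4, fk = c(1L, 1L, NA, 4L))
local_dm <-
  dm(parent = parent, child = child) |>
  dm_add_pk(parent, pk) |>
  dm_add_pk(child, pk) |>
  dm_add_fk(child, fk, parent)

# The changeset lists only the primary keys of the rows to remove:
# parent "D" and the child row "d" that references it.
# For deletions, child tables are processed before their parents.
delete_in <- dm(parent = tibble(pk = 4L), child = tibble(pk = 4L))
dm_deleted <- dm_rows_delete(local_dm, delete_in, in_place = FALSE)

dm_deleted$parent
dm_deleted$child
# The remaining child rows reference only parents that still exist
dm_examine_constraints(dm_deleted)
```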

``````{r }

@@ -230,27 +230,6 @@ dm_upserted$parent
dm_upserted$child
``````

## `dm_rows_truncate()` {#truncate}

`dm_rows_truncate()` deletes all the rows in a table while leaving all other related information intact, including column names, column types, and key relations.
The function derives its name from the SQL `TRUNCATE TABLE` statement, so we will return to our SQLite database to demonstrate its use.
The example below truncates only the `child` table.
Note how a modified version of the destination dm is used as "changeset dm": the rows in the changeset dm do not matter here.

``````{r }
dm_trunc_in <-
  demo_sql %>%
  dm_select_tbl(child)
dm_trunc_in
dm_trunc_out <-
  demo_sql %>%
  dm_rows_truncate(dm_trunc_in, in_place = TRUE)

demo_sql$child
``````

When done, do not forget to disconnect:

``````{r disconnect}

@@ -260,7 +239,7 @@ DBI::dbDisconnect(sqlite_db)
## Conclusion {#conclusion}

The `dm_rows_*` methods give you row-level granularity over the modifications you need to make to your relational model.
By using the `in_place` argument they all share you can construct and verify your modifications before committing them.
Using the `in_place` argument they all share, you can construct and verify your modifications before committing them.
There are a few limitations, as mentioned in the tutorial, but these will be addressed in future updates to {dm}.

## Next Steps

unfeasible is gaining traction, but infeasible doesn't seem to be wrong: https://books.google.com/ngrams/graph?content=infeasible%2Cunfeasible&year_start=1800&year_end=2019&corpus=26&smoothing=3&direct_url=t1%3B%2Cinfeasible%3B%2Cc0%3B.t1%3B%2Cunfeasible%3B%2Cc0
Is this a UK/US thing?