Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spell ID creation fails on large datasets #3

Open
fiona-grimm opened this issue Jan 15, 2021 · 1 comment
Open

Spell ID creation fails on large datasets #3

fiona-grimm opened this issue Jan 15, 2021 · 1 comment
Assignees

Comments

@fiona-grimm
Copy link
Collaborator

SQL query to generate spell IDs requires too much temporary storage (?) when used on large datasets and fails with a disk I/O error

@fiona-grimm fiona-grimm self-assigned this Jan 15, 2021
@fiona-grimm
Copy link
Collaborator Author

fiona-grimm commented Jan 25, 2021

Somewhat improved by changes in commit 39c98c9.

If spell creating fails as described above, the spell creation algorithm can be split and run manually as below, in order to avoid the I/O error. Not efficient, but it works.

dbExecute(con, paste0("CREATE TABLE APC2 AS     
                       WITH APC_DERIVED AS    
                       (    
                         SELECT ", str_c(cols_to_keep, collapse = ", "), ",
                         LAG(ADMIDATE_FILLED, 1) OVER ", spell_grouping, " AS PREV_ADMIDATE_FILLED,
                         LAG(EPISTART, 1) OVER ", spell_grouping, " AS PREV_EPISTART,
                         LAG(DISMETH, 1) OVER ", spell_grouping, " AS PREV_DISMETH,
                         LAG(EPIEND, 1) OVER ", spell_grouping, " AS PREV_EPIEND,
                         ROW_NUMBER() OVER ", spell_grouping, " AS ROWNUMBER
                         FROM APC
                       ),
                       APC_NEWSPELL AS
                       (
                         SELECT ", str_c(cols_to_keep, collapse = ", "), ", ROWNUMBER, CASE
                         WHEN EPI_VALID IS 0 THEN NULL
                         WHEN ADMIDATE_FILLED = PREV_ADMIDATE_FILLED THEN 0
                         WHEN EPISTART = PREV_EPISTART THEN 0
                         WHEN PREV_DISMETH IN (8,9) AND EPISTART = PREV_EPIEND THEN 0
                         ELSE 1
                         END NEW_SPELL 
                         FROM APC_DERIVED
                       )                      
                         SELECT *, CASE
                          WHEN NEW_SPELL = 1 THEN ROWNUMBER
                          WHEN NEW_SPELL = 0 THEN 0
                          WHEN NEW_SPELL IS NULL THEN NULL
                         END SPELL_ID_TEMP
                         FROM APC_NEWSPELL"))      
                
dbRemoveTable(con, "APC")

dbExecute(con, paste0("CREATE TABLE APC3 AS 
                       SELECT ", str_c(cols_to_keep, collapse = ", "), ", ROWNUMBER, NEW_SPELL, CASE
                       WHEN NEW_SPELL = 1 THEN SPELL_ID_TEMP
                       WHEN NEW_SPELL IS NULL THEN NULL
                       WHEN NEW_SPELL = 0 THEN MAX(SPELL_ID_TEMP) OVER (PARTITION BY ENCRYPTED_HESID, 
                                                                        PROCODE3 ORDER BY ROWNUMBER ASC ROWS BETWEEN
                                                                        UNBOUNDED PRECEDING AND CURRENT ROW)
                       END SPELL_ID    
                       FROM APC2"))    
 
dbRemoveTable(con, "APC2")    

dbExecute(con, "ALTER TABLE APC3 RENAME TO APC")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant