# Records

::: {.content-hidden when-format="html"}

## Project Setup

Install and load the necessary packages

In [4]:
#| echo: false
#| output: false
import os
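# Use double quotes for the f-string so the inner single quotes parse on Python versions before 3.12
os.environ['R_HOME'] = f"C:/Users/{os.environ.get('USERNAME')}/Miniconda3/envs/r_python_jl/Lib/R"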

In [5]:
#| echo: false
#| output: false
%load_ext rpy2.ipython
# only have to run once to allow the R magic command



::: {.panel-tabset}

#### R

In [6]:
%%capture 
%%R

library("dplyr")
library("jsonlite")
library("tidyr")
library("REDCapR")
library("knitr")
library("remotes")
library("gt")

In [7]:
%%capture --no-display --no-stdout
%%R

version <- packageVersion("REDCapR")
version

[1] '1.1.9005'


In this project, we will use the bleeding-edge development version of REDCapR, available on GitHub.

In [8]:
%%capture --no-display --no-stdout
%%R

# If the installed version is not the expected development version,
# detach REDCapR and download the latest version from GitHub
if (version != '1.1.9005') {
    detach("package:REDCapR", unload = TRUE)
    remotes::install_github("OuhscBbmc/REDCapR")
    library("REDCapR")
    print(packageVersion("REDCapR"))
} else {
    print("REDCapR package up to date")
}

[1] "REDCapR package up to date"


#### Python

In [9]:
import redcap
import json
import pandas as pd

:::

Assign your project URL and API token.

::: {.panel-tabset}

#### R

In [10]:
%%R
path <- paste0("C:/Users/", Sys.getenv("USERNAME"), "/json_api_data.json")
token <- jsonlite::fromJSON(path)$dev_token$'308'
url <- "https://dev-redcap.doh.wa.gov/api/"

#### Python

In [11]:
path_to_json = f"C:/Users/{os.environ.get('USERNAME')}/json_api_data.json"
# Read the credentials file; the context manager closes it afterwards
with open(path_to_json) as f:
    api_key = json.load(f)
api_token = api_key['dev_token']['308']
api_url = api_key['dev_url']
project = redcap.Project(api_url, api_token)
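
The cell above implies a particular layout for `json_api_data.json`: a `dev_url` entry and a `dev_token` entry keyed by an identifier (`'308'` here). A minimal sketch of that layout follows, with a placeholder URL and token that are assumptions for illustration only. It writes the file to a temporary directory and reads it back the same way.

```python
import json
import os
import tempfile

# Hypothetical file contents: the URL and token below are placeholders,
# not real credentials. Only the key names come from the cell above.
example = {
    "dev_url": "https://redcap.example.org/api/",
    "dev_token": {"308": "PLACEHOLDER_TOKEN"},
}

with tempfile.TemporaryDirectory() as tmp:
    path_to_example = os.path.join(tmp, "json_api_data.json")
    # Write the example credentials file
    with open(path_to_example, "w") as f:
        json.dump(example, f)
    # Read it back exactly as the setup cell does
    with open(path_to_example) as f:
        api_key = json.load(f)

api_url = api_key["dev_url"]
api_token = api_key["dev_token"]["308"]
print(api_url)
```

Keeping the token in a file outside the project folder avoids committing it to version control.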

:::

:::

## Exporting Raw Data

::: {.panel-tabset}

#### R

`redcap_read_oneshot()`

In [12]:
%%capture 
%%R
records <- redcap_read_oneshot(
    redcap_uri = url, 
    token = token
)$data

::: {.content-hidden when-format="html"}

In [13]:
%%R
records_tbl <- gt(head(records))
gt::gtsave(records_tbl, filename = 'export_records1.html', path = "./files/export_files/")

:::

#### Python

`export_records()`

In [14]:
records = project.export_records(format_type='df') #all records with raw data values
records.head(10)

| record_id | redcap_event_name | redcap_repeat_instrument | redcap_repeat_instance | first_name | last_name | phone_num | zip_code | dob | age | ethnicity | race | ... | cc_phone | cc_email | close_contacts_complete | supervisor_name | supervisor_email | work_inperson_yesno | work_date | work_contagious | work_contagious_calc | work_information_complete |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | personal_info_arm_1 | | | John | Doe | (999) 999-9999 | 98105.0 | 2006-04-11 | 18.0 | 1.0 | 4.0 | ... | | | | | | | | | | |
| 1 | notifications_arm_1 | | | | | | | | | | | ... | | | | Boss | | 0.0 | | 0.0 | | 2.0 |
| 1 | case_intake_arm_1 | | 1.0 | | | | | | | | | ... | | | | | | | | | | |
| 1 | notifications_arm_1 | close_contacts | 1.0 | | | | | | | | | ... | (999) 999-9999 | fake_email@gmail.com | 2.0 | | | | | | | |
| 1 | notifications_arm_1 | close_contacts | 2.0 | | | | | | | | | ... | (999) 999-9999 | fake_email@gmail.com | 2.0 | | | | | | | |
| 2 | personal_info_arm_1 | | | Jane | Doe | (999) 999-9999 | 98105.0 | 1994-06-29 | 29.0 | 0.0 | 5.0 | ... | | | | | | | | | | |
| 2 | notifications_arm_1 | | | | | | | | | | | ... | | | | Boss | fake_email@gmail.com | 1.0 | 2023-10-10 | 1.0 | | 2.0 |
| 2 | case_intake_arm_1 | | 1.0 | | | | | | | | | ... | | | | | | | | | | |
| 2 | case_intake_arm_1 | | 2.0 | | | | | | | | | ... | | | | | | | | | | |
| 2 | notifications_arm_1 | close_contacts | 1.0 | | | | | | | | | ... | (999) 999-9999 | fake_email@gmail.com | 2.0 | | | | | | | |


When `format_type='df'`, a multi-index made up of `record_id` and `redcap_event_name` is assigned automatically.

Relying on this automatically assigned index is not recommended because it is not always correct. In this case, the unique key for the data frame is the combination of `record_id`, `redcap_event_name`, `redcap_repeat_instrument`, and `redcap_repeat_instance`, so the two-level index repeats for several rows. Assign your own index accordingly; best practice is to call the `reset_index()` method on the exported data so that the row number becomes the index.

Exporting as CSV or JSON creates the index using the row number.
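
To see why the two-level index is not a unique key, the sketch below checks the key columns of the ten rows displayed above. It uses plain Python rather than pandas, and the helper name `duplicated_keys` is made up for this illustration.

```python
from collections import Counter

# Key columns of the ten rows shown above, in order:
# (record_id, redcap_event_name, redcap_repeat_instrument, redcap_repeat_instance)
# An empty string stands for a missing value.
rows = [
    ("1", "personal_info_arm_1", "", ""),
    ("1", "notifications_arm_1", "", ""),
    ("1", "case_intake_arm_1", "", "1"),
    ("1", "notifications_arm_1", "close_contacts", "1"),
    ("1", "notifications_arm_1", "close_contacts", "2"),
    ("2", "personal_info_arm_1", "", ""),
    ("2", "notifications_arm_1", "", ""),
    ("2", "case_intake_arm_1", "", "1"),
    ("2", "case_intake_arm_1", "", "2"),
    ("2", "notifications_arm_1", "close_contacts", "1"),
]


def duplicated_keys(rows, n_cols):
    """Return the keys built from the first n_cols columns that occur more than once."""
    counts = Counter(row[:n_cols] for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# The automatic index (first two columns) repeats for several rows
two_col_dupes = duplicated_keys(rows, 2)
# All four columns together identify each row uniquely
four_col_dupes = duplicated_keys(rows, 4)

print(two_col_dupes)
print(four_col_dupes)
```

The first list is non-empty and the second is empty, which is the reason for resetting the index instead of relying on the default one.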

:::

## Exporting Labeled Data & Headers {#sec-labeled}

The `raw_or_label` parameter controls whether choice values are exported raw or labeled (e.g., ‘male’ instead of ‘1’), while the `raw_or_label_headers` parameter controls whether variable names are exported raw or labeled (i.e., the actual prompt or question instead of the raw variable name).

::: {.panel-tabset}

#### R

In [12]:
%%capture
%%R
data_labeled <- redcap_read_oneshot(
    redcap_uri = url, 
    token = token, 
    raw_or_label = "label", 
    raw_or_label_headers = "label")$data

::: {.content-hidden when-format="html"}

In [13]:
%%R
data_labeled_tbl <- gt(head(data_labeled)) %>% cols_width(everything() ~ px(150))
gt::gtsave(data_labeled_tbl, filename = 'export_records2.html', path = "./files/export_files/")

:::

#### Python

In [32]:
data_labeled = project.export_records(raw_or_label='label', format_type='df').reset_index()
data_labeled.head(10)

| | record_id | redcap_event_name | redcap_repeat_instrument | redcap_repeat_instance | first_name | last_name | phone_num | zip_code | dob | age | ... | cc_phone | cc_email | close_contacts_complete | supervisor_name | supervisor_email | work_inperson_yesno | work_date | work_contagious | work_contagious_calc | work_information_complete |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Personal Info | | | John | Doe | (999) 999-9999 | 98105.0 | 2006-04-11 | 18.0 | ... | | | | | | | | | | |
| 1 | 1 | Notifications | | | | | | | | | ... | | | | Boss | | No | | No | | Complete |
| 2 | 1 | Case Intake | | 1.0 | | | | | | | ... | | | | | | | | | | |
| 3 | 1 | Notifications | Close Contacts | 1.0 | | | | | | | ... | (999) 999-9999 | fake_email@gmail.com | Complete | | | | | | | |
| 4 | 1 | Notifications | Close Contacts | 2.0 | | | | | | | ... | (999) 999-9999 | fake_email@gmail.com | Complete | | | | | | | |
| 5 | 2 | Personal Info | | | Jane | Doe | (999) 999-9999 | 98105.0 | 1994-06-29 | 29.0 | ... | | | | | | | | | | |
| 6 | 2 | Notifications | | | | | | | | | ... | | | | Boss | fake_email@gmail.com | Yes | 2023-10-10 | Yes | | Complete |
| 7 | 2 | Case Intake | | 1.0 | | | | | | | ... | | | | | | | | | | |
| 8 | 2 | Case Intake | | 2.0 | | | | | | | ... | | | | | | | | | | |
| 9 | 2 | Notifications | Close Contacts | 1.0 | | | | | | | ... | (999) 999-9999 | fake_email@gmail.com | Complete | | | | | | | |


**Note:** Exporting labeled headers only works when `format_type='csv'`, which is why the example above labels the choice values but keeps the raw variable names.
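
A minimal sketch of a CSV export with labeled headers follows. It assumes that PyCap's `export_records()` accepts the `raw_or_label_headers` parameter described at the top of this section and returns the CSV as a string. The helper name `export_with_labeled_headers` is made up for this example, and the standard-library `csv` module is used here in place of pandas.

```python
import csv
import io


def export_with_labeled_headers(project):
    """Export labeled values and labeled headers as CSV and parse the rows.

    `project` is assumed to be a redcap.Project like the one created in
    Project Setup. Assumption: export_records() returns CSV text when
    format_type="csv".
    """
    csv_text = project.export_records(
        format_type="csv",
        raw_or_label="label",
        raw_or_label_headers="label",
    )
    # Each row becomes a dict keyed by the labeled column headers
    return list(csv.DictReader(io.StringIO(csv_text)))
```

Calling `export_with_labeled_headers(project)` would then return one dictionary per exported row.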

:::

## Export Data In Batches (REDCapR Only)

::: {.panel-tabset}

#### R

`redcap_read()` is almost the same as `redcap_read_oneshot()`. The only difference is that `redcap_read()` retrieves the data in batches of a fixed number of records and then combines the batches to return a single data set. This function may be more appropriate than `redcap_read_oneshot()` when exporting large datasets that could tie up the server. [(Source)](https://ouhscbbmc.github.io/REDCapR/reference/redcap_read.html)


In [13]:
%%capture
%%R
batched_export <- redcap_read(
    redcap_uri = url, 
    token = token, 
    batch_size = 50L
)$data

In this example, the batch size is set to 50 records; the default is 100 records. Data exported this way has exactly the same format as data exported using `redcap_read_oneshot()`.

:::
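
The batching idea itself is independent of the language. The sketch below illustrates it in plain Python: split the record IDs into batches, fetch each batch, and combine the results. It is not REDCapR's implementation; `read_in_batches`, `batched`, and the stand-in `fake_fetch` are hypothetical names, and a real version would replace `fake_fetch` with an API call.

```python
def batched(ids, batch_size):
    """Yield consecutive slices of `ids`, each at most `batch_size` long."""
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


def read_in_batches(record_ids, fetch_batch, batch_size=100):
    """Call `fetch_batch` once per batch and concatenate the returned rows."""
    rows = []
    for batch in batched(record_ids, batch_size):
        rows.extend(fetch_batch(batch))
    return rows


def fake_fetch(batch):
    """Stand-in for an API call: returns one row per requested record ID."""
    return [{"record_id": record_id} for record_id in batch]


record_ids = [str(i) for i in range(1, 121)]  # 120 made-up record IDs
batch_sizes = [len(batch) for batch in batched(record_ids, 50)]
rows = read_in_batches(record_ids, fake_fetch, batch_size=50)

print(batch_sizes)  # [50, 50, 20]
print(len(rows))    # 120
```

Smaller batches mean more requests but a lighter load per request, which is the trade-off behind the `batch_size` argument.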

## Exporting The Next Available Record ID {#sec-next_record}

When a project is set up in REDCap, **auto-numbering for records** is enabled by default, so a new, unique `record_id` is assigned automatically every time you enter a new record within REDCap. Before importing new records via the API, you may want to know the next available record ID so that you can assign unused IDs to the new records rather than overwriting existing ones.

This function matters more when using REDCapR, because PyCap has a way to auto-number records on import.
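
To make the use case concrete, here is a minimal sketch of numbering new records once the next available ID is known. It assumes plain numeric record IDs (no DAG prefix) and that nobody else creates records in the meantime. The helper `assign_record_ids` and the sample names are made up for illustration.

```python
def assign_record_ids(new_records, next_id):
    """Return copies of `new_records` with consecutive record IDs starting at `next_id`."""
    start = int(next_id)
    return [
        {**record, "record_id": str(start + offset)}
        for offset, record in enumerate(new_records)
    ]


# Made-up records that do not have a record_id yet
new_records = [{"first_name": "Alex"}, {"first_name": "Sam"}]

# "7" stands for the value returned by the next-record-name call
numbered = assign_record_ids(new_records, "7")
print([record["record_id"] for record in numbered])  # ['7', '8']
```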

::: {.panel-tabset}

#### R

In [15]:
%%capture --no-stdout
%%R
next_record <- redcap_next_free_record_name(
    redcap_uri = url, 
    token = token, 
    verbose = TRUE,
    config_options = NULL)

next_record

[1] "7"


#### Python

In [16]:
project.generate_next_record_name()

'7'

:::

**Note:** If Data Access Groups (DAGs) are used in the REDCap project, this method accounts for the special formatting of record names for users in DAGs, where the unique auto-assigned DAG number is a prefix to the actual record ID (i.e., `<DAG_ID>-<record_id>`). For example, a user assigned to a DAG with ID 1732 that already has 3 existing records will get '1732-4' as the next available record.
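
The naming convention in the note can be sketched in a few lines. This is an illustration of the format only, not REDCap's implementation. It follows the hyphenated '1732-4' example and assumes the next name is one more than the highest existing number. The function `next_record_name` is a hypothetical helper.

```python
def next_record_name(existing_names, dag_id=None):
    """Return the next record name, with a '<DAG_ID>-' prefix when a DAG is given."""
    prefix = f"{dag_id}-" if dag_id is not None else ""
    numbers = []
    for name in existing_names:
        if not name.startswith(prefix):
            continue  # record belongs to another DAG
        suffix = name[len(prefix):]
        if suffix.isdigit():
            numbers.append(int(suffix))
    # Start at 1 when no matching records exist yet
    return f"{prefix}{max(numbers, default=0) + 1}"


no_dag = next_record_name(["1", "2", "3", "4", "5", "6"])
with_dag = next_record_name(["1732-1", "1732-2", "1732-3"], dag_id=1732)

print(no_dag)    # 7
print(with_dag)  # 1732-4
```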