add length check <=200 bytes #194

kaz462 · 2023-11-28T00:12:34Z

Thank you for your Pull Request!

We have developed a Pull Request template to aid you and our reviewers. Completing the below tasks helps to ensure our reviewers can maximize their time on your code as well as making sure the xportr codebase remains robust and consistent.

The scope of `{xportr}`

{xportr}'s scope is to enable R users to write out submission compliant xpt files that can be delivered to a Health Authority or to downstream validation software programs. We see labels, lengths, types, ordering and formats from a dataset specification object (SDTM and ADaM) as being our primary focus. We also see messaging and warnings to users around applying information from the specification file as a primary focus. Please make sure your Pull Request meets this scope of {xportr}. If your Pull Request moves beyond this scope, please get in touch with the {xportr} team on slack or create an issue to discuss.

Please check off each task box as an acknowledgment that you completed the task. This checklist is part of the Github Action workflows and the Pull Request will not be merged into the devel branch until you have checked off each task.

Changes Description

(descriptions of changes)

Task List

codecov · 2023-11-28T00:30:01Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (b09805e) 100.00% compared to head (6a74578) 100.00%.
Report is 2 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff            @@
##              main      #194   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           12        12           
  Lines          588       595    +7     
=========================================
+ Hits           588       595    +7

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

bms63

Looks good to me!! Thanks for doing this one!!

bms63 · 2023-11-29T16:57:48Z

For #189 are you going to implement that in here or in another PR? I see it tagged on the side

kaz462 · 2023-11-29T17:24:52Z

@bms63 thanks for the review!

For #189 are you going to implement that in here or in another PR? I see it tagged on the side

we used nchar(type = chars) for labels and nchar(type = bytes) for length as suggested by @siye6. I think users don't need to choose between nchar(type = bytes) and nchar(type = chars). @siye6 @bms63 do you agree?

from SAS help doc:

Variable names can be up to 8 characters, and they are stored in their original case (upper or lower).
Character variables can have lengths up to 200 bytes.
Variable labels can be up to 40 characters.

cpiraux · 2023-11-29T20:46:24Z

For #189 are you going to implement that in here or in another PR? I see it tagged on the side

I see that the issue #91 is also tagged but there are other elements to consider regarding the length in this issue. We could have another PR for it or do it in this one; I am not sure of the best approach, but I prefer the issue not to be closed until the following points are handled:
There are two types of length:

Metadata length
Data length (maximum value length)

What to do in case these lengths are different:

Data length > metadata length: This will cause the truncation of the data. This case is handled by Haven. The length is the one from the data, and a message is given.
Data length < metadata length: In that case, to reduce the file size, the FDA requests trimming the variable length. In this case, we could give a message so the user can update the metadata length.
Metadata length is missing: For character variables, should we assign the data length or 200? In order to trim the variable, I suggest using the data length. Per define.xml specification, the --DTC variables should have the length missing in define.xml, so this scenario is probable.

Note:
In theory, the metadata length corresponds to the planned length and might be different from the data length. In practice, the metadata and the data length are usually similar.

The FDA requests trimming the variable across datasets. For example, if AVISIT data length = 40 in ADLB and AVISIT length = 30 in ADVS, the AVISIT length should be 40 for all datasets. In this case, the metadata length will be different from the data length. I don’t think we can check this scenario, but maybe we could mention this point in the documentation.

cpiraux

Thanks for the implementation; it looks great. I've just added a small comment.
Since the 200-bytes limitation is a technical constraint, I also believe that users don't need to choose between nchar (type = bytes) and nchar (type = chars).

R/utils-xportr.R

bms63 · 2023-12-03T15:20:23Z

Shall we mention the byte/character issue in the documentaion for xportr_length() would that suffice?

siye6 · 2023-12-05T03:05:19Z

@bms63 thanks for the review!

For #189 are you going to implement that in here or in another PR? I see it tagged on the side

we used nchar(type = chars) for labels and nchar(type = bytes) for length as suggested by @siye6. I think users don't need to choose between nchar(type = bytes) and nchar(type = chars). @siye6 @bms63 do you agree?

from SAS help doc:

Variable names can be up to 8 characters, and they are stored in their original case (upper or lower).
Character variables can have lengths up to 200 bytes.
Variable labels can be up to 40 characters.

Hi @kaz462 , thanks for the updates and sorry for late reply.
Regarding label, may also need to consider bytes and warning user if it is > 40, see below my test with Chinese Label

label <- "这是一段文字，用来测试在XPTversion5中作为数据集label是否会被截断"

label_len_char <- nchar(label, type = "chars")
label_len_char
label_len_bytes <- nchar(label, type = "bytes")
label_len_bytes

df <- data.frame(test = c(1, 2, 3))

haven::write_xpt(
  data = df,
  path = "test.xpt",
  version = 5,
  label = label
)

> label_len_char
[1] 40
> label_len_bytes
[1] 88

/* Use SAS convert XPT to SAS and show the label */

libname test xport "~/test.xpt";
libname sas "~";

proc copy in=test out=sas memtype=data;
	select test;
run;

proc contents data=sas.test;
run;

Label	这是一段文字，用来测试在XPTv
Encoding	utf-8 Unicode (UTF-8)

> nchar("这是一段文字，用来测试在XPTv", type="bytes")
[1] 40

kaz462 · 2023-12-07T02:59:35Z

Regarding label, may also need to consider bytes and warning user if it is > 40, see below my test with Chinese Label

Thanks @siye6 for providing a detailed example :) That's a good finding for haven::write_xpt - the label got truncated without any messaging, it seems haven used max 40 bytes for labels but checked against nchar(label) > 40.

For xportr, the following error message will be displayed if we use xportr_write. Do you think this message will be sufficient?

xportr_write(
    df, 
    path = "test.xpt",
    label = "这是一段文字，用来测试在XPTversion5中作为数据集label是否会被截断"
)
Error in `xportr_write()`:
! `label` cannot contain any non-ASCII, symbol or special characters.
Run `rlang::last_trace()` to see where the error occurred.

tests/testthat/test-utils-xportr.R

# Conflicts: # NEWS.md

R/utils-xportr.R

siye6 · 2023-12-08T13:03:31Z

non-ASCII

Hi @kaz462 , thanks for update the warning message, regarding the non-ascii I suggest we should allow utf8 characters (including Chinese, Japanese ...etc), For eSubmission in China, one of the request is translate the foreign language data package (e.g. English) to Chinese. (variable label, dataset label, meddra, whodrug terms, primary end point related code list ..etc need to be translated from English to Chinese).

Co-authored-by: André Veríssimo <211358+averissimo@users.noreply.github.com>

kaz462 · 2023-12-14T05:31:22Z

Hi @kaz462 , thanks for update the warning message, regarding the non-ascii I suggest we should allow utf8 characters (including Chinese, Japanese ...etc), For eSubmission in China, one of the request is translate the foreign language data package (e.g. English) to Chinese. (variable label, dataset label, meddra, whodrug terms, primary end point related code list ..etc need to be translated from English to Chinese).

@siye6 Thanks for the further information! I've created two issues below -

R/utils-xportr.R

Co-authored-by: André Veríssimo <211358+averissimo@users.noreply.github.com>

bms63 · 2023-12-15T16:35:58Z

@kaz462, @cpiraux and I met and think this is ready to merge!! yay!

kaz462 added 2 commits November 28, 2023 00:11

add length check <=200 bytes

f490414

update length check

bf1fc3e

This was linked to issues Nov 28, 2023

Feature Request: add arg to allow user choose nchar (type = 'bytes') #189

Closed

Allotted length: maximum length of the variable #91

Closed

kaz462 requested review from elimillera, bms63 and cpiraux November 28, 2023 23:13

bms63 approved these changes Nov 29, 2023

View reviewed changes

cpiraux reviewed Dec 1, 2023

View reviewed changes

R/utils-xportr.R Outdated Show resolved Hide resolved

elimillera reviewed Dec 7, 2023

View reviewed changes

tests/testthat/test-utils-xportr.R Show resolved Hide resolved

kaz462 added 2 commits December 7, 2023 21:22

Merge branch 'main' into 189_91_max_var_length

a355449

# Conflicts: # NEWS.md

add a test for non-acsii character >200 bytes, <200 characters

2851f70

kaz462 mentioned this pull request Dec 7, 2023

Feature Request: Add check for length to check for lengths that are > 200 #147

Closed

kaz462 requested review from elimillera, cpiraux and bms63 December 7, 2023 21:59

averissimo reviewed Dec 8, 2023

View reviewed changes

R/utils-xportr.R Outdated Show resolved Hide resolved

kaz462 mentioned this pull request Dec 14, 2023

Dataset label truncated after write_xpt tidyverse/haven#746

Open

Update R/utils-xportr.R

90a6914

Co-authored-by: André Veríssimo <211358+averissimo@users.noreply.github.com>

kaz462 mentioned this pull request Dec 14, 2023

allow utf8 characters #202

Open

kaz462 requested a review from averissimo December 14, 2023 05:33

averissimo reviewed Dec 14, 2023

View reviewed changes

R/utils-xportr.R Outdated Show resolved Hide resolved

kaz462 and others added 2 commits December 14, 2023 12:56

Update R/utils-xportr.R

5ff189a

Co-authored-by: André Veríssimo <211358+averissimo@users.noreply.github.com>

add across/where to import

6a74578

bms63 merged commit c84f815 into main Dec 15, 2023
12 of 13 checks passed

bms63 deleted the 189_91_max_var_length branch December 15, 2023 16:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add length check <=200 bytes #194

add length check <=200 bytes #194

kaz462 commented Nov 28, 2023

codecov bot commented Nov 28, 2023 •

edited

Loading

bms63 left a comment

bms63 commented Nov 29, 2023

kaz462 commented Nov 29, 2023

cpiraux commented Nov 29, 2023

cpiraux left a comment

bms63 commented Dec 3, 2023

siye6 commented Dec 5, 2023

kaz462 commented Dec 7, 2023

siye6 commented Dec 8, 2023

kaz462 commented Dec 14, 2023

bms63 commented Dec 15, 2023

add length check <=200 bytes #194

add length check <=200 bytes #194

Conversation

kaz462 commented Nov 28, 2023

Thank you for your Pull Request!

The scope of {xportr}

Changes Description

Task List

codecov bot commented Nov 28, 2023 • edited Loading

Codecov Report

bms63 left a comment

Choose a reason for hiding this comment

bms63 commented Nov 29, 2023

kaz462 commented Nov 29, 2023

cpiraux commented Nov 29, 2023

cpiraux left a comment

Choose a reason for hiding this comment

bms63 commented Dec 3, 2023

siye6 commented Dec 5, 2023

kaz462 commented Dec 7, 2023

siye6 commented Dec 8, 2023

kaz462 commented Dec 14, 2023

bms63 commented Dec 15, 2023

The scope of `{xportr}`

codecov bot commented Nov 28, 2023 •

edited

Loading