Large data set: A row in the file was not the expected length. #35

Closed
vinhdizzo opened this issue Oct 7, 2015 · 26 comments

@vinhdizzo

I'm getting the following error for a file of 9704906752 bytes (it errors out after a long while trying to load the data). I am using RRO 3.2.2 (64-bit) with the latest haven and ReadStat from this issue.

> d1 = read_sas('combined_all.sas7bdat')
Error: Failed to parse combined_all.sas7bdat: A row in the file was not the expected length.
@vinhdizzo
Author

Getting the same error after this fix. This is an actual data set, so it has string and text data, whereas the toy data set from the previous issue consisted of only numeric data.

@evanmiller
Contributor

If you update to the latest ReadStat code in master, it should give more information about the error (including the row #).

@vinhdizzo
Author

ReadStat: Row #975568 decompressed to 6056 bytes (expected 6057 bytes)

ReadStat: Error parsing page 33141, bytes 2171936768-2172002303

Error: Failed to parse combined_all.sas7bdat: A row in the file was not the expected length.

@evanmiller
Contributor

If you have other files that produce the same error, please post their error messages here (along with info about the file size).

@vinhdizzo
Author

Here's another (9754124288 bytes):

> system.time(d1 <- read_sas('combined_final.sas7bdat'))
ReadStat: Row #1901558 decompressed to 6071 bytes (expected 6072 bytes)

ReadStat: Error parsing page 62761, bytes 4113113088-4113178623

Error: Failed to parse combined_final.sas7bdat: A row in the file was not the expected length.
Timing stopped at: 1067.47 12.98 1159.34 

@evanmiller
Contributor

Great. Keep 'em coming if you have them.

@vinhdizzo
Author

9746063360 bytes:

> system.time(d1 <- read_sas('combined_all2.sas7bdat'))
ReadStat: Row #1901558 decompressed to 6056 bytes (expected 6057 bytes)

ReadStat: Error parsing page 62753, bytes 4112588800-4112654335

Error: Failed to parse combined_all2.sas7bdat: A row in the file was not the expected length.
Timing stopped at: 866.59 13.26 964.92 

@vinhdizzo
Author

2,734,039,040 bytes:

> system.time(d1 <- read_sas('xm_gain_wins_uic_all.sas7bdat')) / 60
ReadStat: Row #1689084 decompressed to 4855 bytes (expected 4856 bytes)

ReadStat: Error parsing page 29203, bytes 1913856000-1913921535

Error: Failed to parse xm_gain_wins_uic_all.sas7bdat: A row in the file was not the expected length.
Timing stopped at: 722.63 2.8 778.2 

@vinhdizzo
Author

I was able to successfully read a file of size 13,326,688,256 bytes. This along with the previous example suggests that this error does not pertain to file size, but rather, data elements. Right? Any thoughts on what's going on here or are we lost?

@evanmiller
Contributor

They're all off-by-one errors so there is certainly hope. The errors seem to occur near the 2/4 GB mark but that might be coincidence. I'll wait to see if we get any similar reports on smaller files (fewer columns), which will be easier to debug.
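
As a rough check on the offsets reported so far, here is a small sketch in R. The header and page sizes are an inference from the byte ranges in the messages above (an 8192-byte header followed by 65536-byte pages), not something taken from the ReadStat source. It confirms that the reported start offsets fit that layout and shows how far each failing page sits from the 2 GiB and 4 GiB boundaries.

```r
# Page numbers and start offsets copied from the ReadStat messages above.
pages  <- c(33141, 62761, 62753, 29203)
starts <- c(2171936768, 4113113088, 4112588800, 1913856000)

# Assumed layout, inferred only from the reported byte ranges:
# an 8192-byte header followed by 65536-byte pages.
header_size <- 8192
page_size   <- 65536

# Start offset each page would have under that assumption.
predicted <- header_size + pages * page_size
stopifnot(all(predicted == starts))

# Signed distance of each failing page from the 2 GiB and 4 GiB boundaries, in MiB.
mib <- 2^20
dist <- data.frame(
  page      = pages,
  from_2gib = round((starts - 2^31) / mib, 1),
  from_4gib = round((starts - 2^32) / mib, 1)
)
dist
```

The distances come out in the tens to hundreds of MiB, so "near" is loose here, which fits the caution that the pattern may be coincidence.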

@vinhdizzo
Author

Perhaps the 2 GB and 4 GB marks are not coincidental. I wanted to investigate whether the error was caused by the data elements, so I decided to zero in on the supposedly problematic rows reported in the ReadStat error log:

%let want_row=975568 ;
data libfoo1.combined_all_sub ;
    set libfoo.combined_all ;
    if _n_ >= (&want_row - 10) and _n_ <= (&want_row + 10) then output ;
run ;

%let want_row=1901558 ;
data libfoo1.combined_final_sub ;
    set libfoo.combined_final ;
    if _n_ >= (&want_row - 10) and _n_ <= (&want_row + 10) then output ;
run ;

These smaller data sets did not error out when importing into R using haven.

@evanmiller
Contributor

That is certainly an interesting finding. The rows in question are RLE-encoded (I think SAS calls this CHARACTER compression) -- it is also possible the compression happens differently depending on the expected file size.

@vinhdizzo
Author

Tried to replicate the errors on a smaller data set by keeping every row up to and just past the problematic rows:

%let want_row=975568 ;
data libfoo1.combined_all_sub2 ;
    set libfoo.combined_all ;
    if _n_ <= (&want_row + 10) then output ;
run ;

%let want_row=1901558 ;
data libfoo1.combined_final_sub2 ;
    set libfoo.combined_final ;
    if _n_ <= (&want_row + 10) then output ;
run ;

Results: no error in haven!

It's good that you mentioned character compression, specified in SAS via OPTIONS COMPRESS=YES. I never specify this, but these files were generated by my colleague, and I was able to find 'COMPRESS' in his code! Re-did the smaller subset,

options compress=yes ;
%let want_row=975568 ;
data libfoo1.combined_all_sub ;
    set libfoo.combined_all ;
    if _n_ >= (&want_row - 10) and _n_ <= (&want_row + 10) then output ;
run ;

%let want_row=1901558 ;
data libfoo1.combined_final_sub ;
    set libfoo.combined_final ;
    if _n_ >= (&want_row - 10) and _n_ <= (&want_row + 10) then output ;
run ;

and ReadStat errors out

> library(haven)
> system.time(d1 <- read_sas('../combined_all_sub.sas7bdat')) / 60
ReadStat: Row #11 decompressed to 6056 bytes (expected 6057 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error: Failed to parse combined_all_sub.sas7bdat: A row in the file was not the expected length.
Timing stopped at: 0.01 0.02 0.04 
> system.time(d1 <- read_sas('../combined_final_sub.sas7bdat')) / 60
ReadStat: Row #11 decompressed to 6071 bytes (expected 6072 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error: Failed to parse combined_final_sub.sas7bdat: A row in the file was not the expected length.
Timing stopped at: 0.02 0 0.02 

combined_all_sub.sas7bdat is 270,336 bytes. Looks like COMPRESS is the culprit!

@evanmiller
Contributor

If you can share the small file that's erroring out I'll try and have a look. It will especially help if you can reduce the # columns required to trigger the error -- but that might be difficult, I'm not sure.

@vinhdizzo
Author

Is there a way to change the ReadStat code to report the column # it is currently working on for each row?

@evanmiller
Contributor

In compressed files, each row of data is compressed separately -- ReadStat is reporting that when it attempted to decompress a row, it comes up 1 byte short. It's hard to know in advance which column the missing byte occurs in.
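
To make that failure mode concrete, here is a toy run-length decoder in R with the same kind of per-row length check. The (count, byte) pair encoding is invented for illustration and is not the sas7bdat compression format; `decode_rle` and `check_row` are hypothetical names, not ReadStat functions. It only shows how a single miscounted run leaves a row exactly one byte short without saying which column was affected.

```r
# Toy encoding (hypothetical): a raw vector of (count, byte) pairs.
decode_rle <- function(encoded) {
  counts <- as.integer(encoded[c(TRUE, FALSE)])  # odd positions: run lengths
  bytes  <- encoded[c(FALSE, TRUE)]              # even positions: byte values
  rep(bytes, times = counts)
}

# Compare the decoded length with the expected row length and
# report a mismatch in the same style as the error message.
check_row <- function(encoded, expected_len) {
  got <- length(decode_rle(encoded))
  if (got != expected_len) {
    return(sprintf("Row decompressed to %d bytes (expected %d bytes)",
                   got, expected_len))
  }
  "ok"
}

# A 50-byte row: 3 bytes of "A", 46 bytes of "_", 1 byte of "B".
good <- as.raw(c(3, utf8ToInt("A"), 46, utf8ToInt("_"), 1, utf8ToInt("B")))
# The same row with the middle run length off by one.
bad  <- as.raw(c(3, utf8ToInt("A"), 45, utf8ToInt("_"), 1, utf8ToInt("B")))

check_row(good, 50)
check_row(bad, 50)
```

The length check only sees the total, so the decoder can tell that a byte is missing but not which run, and therefore which column, lost it.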

@vinhdizzo
Author

I see.

The row counter from ReadStat seems to be off by 1. I couldn't get the error to reproduce when selecting the row reported by ReadStat (or that row - 1). The error was reproduced when I selected the row + 1:

> system.time(d1 <- read_sas('combined_all_sub.sas7bdat')) / 60
ReadStat: Row #0 decompressed to 6056 bytes (expected 6057 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error: Failed to parse combined_all_sub.sas7bdat: A row in the file was not the expected length.
Timing stopped at: 0.01 0 0.03 
> system.time(d1 <- read_sas('combined_final_sub.sas7bdat')) / 60
ReadStat: Row #0 decompressed to 6071 bytes (expected 6072 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error: Failed to parse combined_final_sub.sas7bdat: A row in the file was not the expected length.
Timing stopped at: 0 0 0.02 

I wrote code to generate SAS data sets with the first 1, 2, ... columns to figure out what's going on.

options compress=yes ;
proc sql noprint;
    select count(1)
    into :ncol
    from dictionary.columns
    where libname='LIBFOO' and
    memname='COMBINED_ALL_SUB'
  ;
quit;


%macro iterate_data_columns(dsn=COMBINED_ALL_SUB, max_col=&ncol) ;
%do i=1 %to &max_col ;
proc sql noprint;
    select name
    into :keepnames separated by ' '
    from dictionary.columns
    where libname='LIBFOO' and
    memname="&dsn"
    and varnum <= &i
  ;
quit;

data LIBFOO.&dsn._&i ;
    set LIBFOO.&dsn.(keep=&keepnames) ;
run ;
%end ;
%mend ;
%iterate_data_columns(dsn=COMBINED_ALL_SUB, max_col=&ncol) ;

Here is the R log:

> for (i in 1:585) {
+   system.time(try(d1 <- read_sas(paste0('combined_all_sub_', i, '.sas7bdat')))) / 60
+ }
ReadStat: Error parsing page 0, bytes 8192-16383

Error : Failed to parse combined_all_sub_2.sas7bdat: Invalid file, or file has unsupported features.
ReadStat: Row #0 decompressed to 5221 bytes (expected 5222 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_500.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5271 bytes (expected 5272 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_501.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5321 bytes (expected 5322 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_502.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5371 bytes (expected 5372 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_503.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5379 bytes (expected 5380 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_504.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5387 bytes (expected 5388 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_505.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5395 bytes (expected 5396 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_506.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5403 bytes (expected 5404 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_507.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5411 bytes (expected 5412 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_508.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5419 bytes (expected 5420 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_509.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5427 bytes (expected 5428 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_510.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5435 bytes (expected 5436 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_511.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5439 bytes (expected 5440 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_512.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5447 bytes (expected 5448 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_513.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5449 bytes (expected 5450 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_514.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5453 bytes (expected 5454 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_515.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5457 bytes (expected 5458 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_516.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5461 bytes (expected 5462 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_517.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5469 bytes (expected 5470 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_518.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5471 bytes (expected 5472 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_519.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5479 bytes (expected 5480 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_520.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5481 bytes (expected 5482 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_521.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5489 bytes (expected 5490 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_522.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5497 bytes (expected 5498 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_523.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5505 bytes (expected 5506 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_524.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5513 bytes (expected 5514 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_525.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5521 bytes (expected 5522 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_526.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5529 bytes (expected 5530 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_527.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5558 bytes (expected 5559 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_528.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5566 bytes (expected 5567 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_529.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5574 bytes (expected 5575 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_530.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5582 bytes (expected 5583 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_531.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5590 bytes (expected 5591 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_532.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5598 bytes (expected 5599 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_533.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5606 bytes (expected 5607 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_534.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5614 bytes (expected 5615 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_535.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5622 bytes (expected 5623 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_536.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5630 bytes (expected 5631 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_537.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5638 bytes (expected 5639 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_538.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5646 bytes (expected 5647 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_539.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5654 bytes (expected 5655 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_540.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5662 bytes (expected 5663 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_541.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5670 bytes (expected 5671 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_542.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5678 bytes (expected 5679 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_543.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5686 bytes (expected 5687 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_544.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5694 bytes (expected 5695 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_545.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5702 bytes (expected 5703 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_546.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5710 bytes (expected 5711 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_547.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5718 bytes (expected 5719 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_548.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5720 bytes (expected 5721 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_549.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5750 bytes (expected 5751 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_550.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5752 bytes (expected 5753 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_551.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5832 bytes (expected 5833 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_552.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5840 bytes (expected 5841 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_553.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5842 bytes (expected 5843 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_554.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5922 bytes (expected 5923 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_555.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5930 bytes (expected 5931 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_556.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5958 bytes (expected 5959 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_557.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5966 bytes (expected 5967 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_558.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5974 bytes (expected 5975 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_559.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5976 bytes (expected 5977 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_560.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5984 bytes (expected 5985 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_561.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5986 bytes (expected 5987 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_562.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5988 bytes (expected 5989 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_563.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5990 bytes (expected 5991 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_564.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5992 bytes (expected 5993 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_565.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5994 bytes (expected 5995 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_566.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5996 bytes (expected 5997 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_567.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 5998 bytes (expected 5999 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_568.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 6000 bytes (expected 6001 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_569.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 6002 bytes (expected 6003 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_570.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 6004 bytes (expected 6005 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_571.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 6006 bytes (expected 6007 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_572.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 6008 bytes (expected 6009 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_573.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 6010 bytes (expected 6011 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_574.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 6012 bytes (expected 6013 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_575.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 6014 bytes (expected 6015 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_576.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 6016 bytes (expected 6017 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_577.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 6018 bytes (expected 6019 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_578.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 6020 bytes (expected 6021 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_579.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 6022 bytes (expected 6023 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_580.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 6024 bytes (expected 6025 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_581.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 6032 bytes (expected 6033 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_582.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 6040 bytes (expected 6041 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_583.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 6048 bytes (expected 6049 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Error : Failed to parse combined_all_sub_584.sas7bdat: A row in the file was not the expected length.
ReadStat: Row #0 decompressed to 6056 bytes (expected 6057 bytes)

ReadStat: Error parsing page 1, bytes 73728-139263

Seems like there may be 2 issues: column 2 and column 500. Column 500 gives the error that we've been seeing in this thread.
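
A log this long is easier to summarise programmatically. The sketch below pulls the column count and the byte counts out of a few lines copied from the log above, using base R regular expressions. `log_lines` is only a hand-copied sample and `capture` is a hypothetical helper; a full run would read the captured output instead.

```r
# Sample lines copied from the log above (first two length failures).
log_lines <- c(
  "ReadStat: Row #0 decompressed to 5221 bytes (expected 5222 bytes)",
  "Error : Failed to parse combined_all_sub_500.sas7bdat: A row in the file was not the expected length.",
  "ReadStat: Row #0 decompressed to 5271 bytes (expected 5272 bytes)",
  "Error : Failed to parse combined_all_sub_501.sas7bdat: A row in the file was not the expected length."
)

# Return the first capture group of `pattern` for every line that matches.
capture <- function(pattern, x) {
  hits <- regmatches(x, regexec(pattern, x))
  as.numeric(vapply(Filter(length, hits), `[`, character(1), 2))
}

got      <- capture("decompressed to ([0-9]+) bytes", log_lines)
expected <- capture("expected ([0-9]+) bytes", log_lines)
# Only count files that failed with the row-length error.
columns  <- capture("_sub_([0-9]+)\\.sas7bdat: A row in the file", log_lines)

summary_tbl <- data.frame(columns, got, expected, short_by = expected - got)
summary_tbl
min(summary_tbl$columns)  # first column count at which the length error appears
```

Pairing the vectors this way assumes each "decompressed" line is followed by its matching "Failed to parse" line, which holds for the sample but should be checked on a complete log.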

Variable 500 has the following raw data:

__________________________________________________

If I just keep this single variable, I still replicate the error:

> system.time(try(d1 <- read_sas(paste0('combined_all_sub_', 0, '.sas7bdat')))) / 60
ReadStat: Row #0 decompressed to 49 bytes (expected 50 bytes)

ReadStat: Error parsing page 0, bytes 8192-16383

Error : Failed to parse combined_all_sub_0.sas7bdat: A row in the file was not the expected length.
   user  system elapsed 
      0       0       0 

@evanmiller
Contributor

Nice detective work -- I should have mentioned that ReadStat indexes the rows starting at 0, which is why you see the off-by-one problem.
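
In practice that means adding 1 to the row number in a ReadStat message to get the SAS observation number (`_n_`). A small sketch, where `readstat_row_to_sas_obs` is a hypothetical helper name and not part of haven or ReadStat:

```r
# Convert the 0-based row index printed by ReadStat into a 1-based SAS _n_.
readstat_row_to_sas_obs <- function(message) {
  m <- regmatches(message, regexec("Row #([0-9]+)", message))[[1]]
  if (length(m) < 2) stop("no row number found in message")
  as.numeric(m[2]) + 1
}

obs <- readstat_row_to_sas_obs(
  "ReadStat: Row #975568 decompressed to 6056 bytes (expected 6057 bytes)"
)
obs
```

This agrees with the subset experiment earlier in the thread: a window starting at `_n_` = 975558 reported Row #11, which is the twelfth row of the subset, or observation 975569.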

Can you attach the smallest sas7bdat file that demonstrates the problem? I'll open it up and have a look at it.

@vinhdizzo
Author

Sent it to your email already. Let me know if it got lost.

@evanmiller
Contributor

Got it, thanks.

@vinhdizzo
Author

As for the variable 2 error (which we did not see before), I looked at the data and it seems to be because the 2 variables had identical data. Played with it and got it to replicate when var 1 has 7 characters followed by 7 digits and var 2 has the same value. I'll also send you this problematic data set.

@evanmiller
Contributor

Thanks.

@evanmiller
Contributor

Try this: readstat_sas.c

@evanmiller
Contributor

The above patch should fix the "Column 500" issue. Will look into the "Column 2" issue next.

@vinhdizzo
Author

Confirming this fixed the column 500 issue but not the column 2 issue. Re-reading the full 9GB data file to confirm, but assume it works if I don't report back.

@evanmiller
Contributor

I'll open a separate issue for the "Column 2" issue.

Closing this one, please re-open if you see another "Row was not the expected length" error.
