fread - progress meter and slowdown fixed when nThread=1, closes #2092
mattdowle committed Apr 19, 2017
1 parent a1d8f8b commit 88b53ae
Showing 2 changed files with 5 additions and 3 deletions.
NEWS.md: 2 changes (1 addition, 1 deletion)
@@ -14,7 +14,7 @@
* Numeric data that has been quoted is now detected and read as numeric.
* The ability to position `autostart` anywhere inside one of multiple tables in a single file is removed with warning. It used to search upwards from that line to find the start of the table based on a consistent number of columns. People appear to be using `skip="string"` or `skip=nrow` to find the header row exactly, which is retained and simpler. It was too difficult to retain search-upwards-autostart together with skipping blank lines, filling incomplete rows and parallelization. Varying format and height messy header info above the column names is still auto detected and auto skipped.
* `dec=','` is now implemented directly so there is no dependency on locale. The options `datatable.fread.dec.experiment` and `datatable.fread.dec.locale` have been removed.
- * Many thanks to @yaakovfeldman, Guillermo Ponce, Arun Srinivasan, Hugh Parsonage, Mark Klik, Pasha Stetsenko and more to add for testing before release to CRAN: [#2070](https://github.com/Rdatatable/data.table/issues/2070), [#2073](https://github.com/Rdatatable/data.table/issues/2073), [#2087](https://github.com/Rdatatable/data.table/issues/2087), [#2091](https://github.com/Rdatatable/data.table/issues/2091), [#2107](https://github.com/Rdatatable/data.table/issues/2107), [fst#50](https://github.com/fstpackage/fst/issues/50#issuecomment-294287846), [#2118](https://github.com/Rdatatable/data.table/issues/2118)
+ * Many thanks to @yaakovfeldman, Guillermo Ponce, Arun Srinivasan, Hugh Parsonage, Mark Klik, Pasha Stetsenko and more to add for testing before release to CRAN: [#2070](https://github.com/Rdatatable/data.table/issues/2070), [#2073](https://github.com/Rdatatable/data.table/issues/2073), [#2087](https://github.com/Rdatatable/data.table/issues/2087), [#2091](https://github.com/Rdatatable/data.table/issues/2091), [#2107](https://github.com/Rdatatable/data.table/issues/2107), [fst#50](https://github.com/fstpackage/fst/issues/50#issuecomment-294287846), [#2118](https://github.com/Rdatatable/data.table/issues/2118), [#2092](https://github.com/Rdatatable/data.table/issues/2092)
* Now detects GB-18030 and UTF-16 encodings and in verbose mode prints a message about BOM detection.

#### BUG FIXES
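The NEWS entry does not describe how the encoding detection is done. As an illustration only, byte-order-mark sniffing can be sketched in C as below. `detect_bom` and `bom_t` are hypothetical names, not fread's internals, and the sketch assumes any BOM sits at the very start of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for the sketch. */
typedef enum { BOM_NONE, BOM_UTF8, BOM_UTF16LE, BOM_UTF16BE, BOM_GB18030 } bom_t;

/* Identify a byte-order mark at the start of p[0..n) and store its
   length in *bomLen so a caller could skip past it. */
static bom_t detect_bom(const unsigned char *p, size_t n, size_t *bomLen) {
    assert(bomLen != NULL);
    *bomLen = 0;
    if (n >= 3 && memcmp(p, "\xEF\xBB\xBF", 3) == 0)     { *bomLen = 3; return BOM_UTF8; }
    if (n >= 4 && memcmp(p, "\x84\x31\x95\x33", 4) == 0) { *bomLen = 4; return BOM_GB18030; }
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE)          { *bomLen = 2; return BOM_UTF16LE; }
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF)          { *bomLen = 2; return BOM_UTF16BE; }
    return BOM_NONE;  /* no recognised BOM: treat the bytes as-is */
}
```

This sketch reports a UTF-32LE BOM (FF FE 00 00) as UTF-16LE. A reader that supports UTF-32 would need to test the longer pattern first.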
src/fread.c: 6 changes (4 additions, 2 deletions)
@@ -1101,8 +1101,10 @@ int freadMain(freadMainArgs args) {
// For the 44GB file with 12875 columns, the max line len is 108,497. As each column has its own buffer per thread,
// that buffer allocation should be at least one page (4k). Hence 1000 rows of the smallest type (4 byte int) is just
// under 4096 to leave space for R's header + malloc's header. Around 50MB of buffer in this extreme case.
- if (nJumps/*from sampling*/>1 && args.nth>1) {
+ if (nJumps/*from sampling*/>1) {
// ensure data size is split into same sized chunks (no remainder in last chunk) and a multiple of nth
+ // when nth==1 we still split by chunk and go via buffers for consistency (testing) and code sanity, even though
+ // a single thread could write directly to the final DT and skip buffers.
nJumps = (int)((size_t)(lastRowEnd-pos)/chunkBytes); // (int) rounds down
if (nJumps==0) nJumps=1;
else if (nJumps>args.nth) nJumps = args.nth*(1+(nJumps-1)/args.nth);
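The buffer-sizing comment and the chunk rounding in this hunk can be checked with a small sketch. `chunk_count` and `col_buffer_bytes` are hypothetical helpers. `dataBytes` stands in for `lastRowEnd-pos`, and only the rounding arithmetic is copied from the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Number of chunks ("jumps") to split dataBytes into, using the rounding
   shown in the diff: floor division, at least 1, and rounded up to a
   multiple of nth once the count exceeds nth. */
static int chunk_count(size_t dataBytes, size_t chunkBytes, int nth) {
    assert(chunkBytes > 0 && nth >= 1);
    int nJumps = (int)(dataBytes / chunkBytes);  /* integer division rounds down */
    if (nJumps == 0) nJumps = 1;
    else if (nJumps > nth) nJumps = nth * (1 + (nJumps - 1) / nth);
    return nJumps;
}

/* Bytes needed for one column buffer of `rows` rows of a type of typeBytes bytes. */
static size_t col_buffer_bytes(size_t rows, size_t typeBytes) {
    return rows * typeBytes;
}
```

For example, with nth=4 a raw count of 10 chunks becomes 12, while 8 stays 8. With nth=1 the rounding returns the raw count unchanged, since n*(1+(n-1)/1) equals n, so removing the `args.nth>1` guard makes a single thread use the floor-division chunk count. For the buffer comment, 1000 rows of a 4-byte int is 4000 bytes, just under a 4096-byte page. With 12875 columns, all assumed 4-byte, that is 51,500,000 bytes, consistent with "around 50MB".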
@@ -1191,7 +1193,7 @@ int freadMain(freadMainArgs args) {
for (int j=0, resj=-1; !stopTeam && j<ncol; j++) {
if (type[j] == CT_DROP) continue;
resj++;
- if (type[j] < 0) continue;
+ if (type[j] < 0) continue; // this out-of-sample type exception column will be alloc'd new at the end for reread
size_t size = typeSize[type[j]];
if (!(mybuff[resj] = (void *)realloc(mybuff[resj], myBuffRows * size))) stopTeam=true;
}
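The second hunk only adds a comment, but the loop it annotates shows how each thread grows its per-column buffers. The following is a minimal sketch under stated assumptions. The numeric values of `CT_DROP`, `CT_INT32` and `CT_FLOAT64` and the function name `grow_thread_buffers` are hypothetical. As the new comment describes, a negative type marks an out-of-sample type-exception column.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical column-type codes; fread.c's real values may differ. */
enum { CT_DROP = 0, CT_INT32 = 1, CT_FLOAT64 = 2 };

/* Grow one thread's buffers so each kept column holds myBuffRows rows.
   Dropped columns get no result slot (resj is not advanced). Negative types
   keep their slot but are skipped, to be allocated later for the reread.
   Returns false if an allocation fails. */
static bool grow_thread_buffers(void **mybuff, const signed char *type, int ncol,
                                const size_t *typeSize, size_t myBuffRows) {
    assert(mybuff != NULL && type != NULL && typeSize != NULL);
    for (int j = 0, resj = -1; j < ncol; j++) {
        if (type[j] == CT_DROP) continue;
        resj++;
        if (type[j] < 0) continue;
        void *p = realloc(mybuff[resj], myBuffRows * typeSize[type[j]]);
        if (p == NULL) return false;  /* old buffer is still valid; caller frees it */
        mybuff[resj] = p;
    }
    return true;
}
```

Unlike the loop in the diff, the sketch stores `realloc`'s result in a temporary. If the allocation fails, the existing buffer therefore stays reachable and can still be freed.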
