Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow larger regions in r.terraflow #31

Closed
wants to merge 4 commits into from

Conversation

ninsbl
Copy link
Member

@ninsbl ninsbl commented Jun 12, 2019

see https://trac.osgeo.org/grass/ticket/1421
provided by dnewcomb

Tested with 40k x 50k region (though not fully on my limited box).

See e.g. http://osgeo-org.1560.x6.nabble.com/r-terraflow-on-massive-DEM-td5181062.html

@ninsbl ninsbl added the enhancement New feature or request label Jun 12, 2019
@ninsbl ninsbl requested review from glynnc and metzm June 12, 2019 19:53
@wenzeslaus
Copy link
Member

This is exactly the change where a test of computation correctness based on previous results would be appropriate and more than welcome. Would you or somebody else be willing to add one?

https://grass.osgeo.org/grass76/manuals/libpython/gunittest_testing.html

@ninsbl
Copy link
Member Author

ninsbl commented Jun 13, 2019

This is exactly the change where a test of computation correctness based on previous results would be appropriate and more than welcome. Would you or somebody else be willing to add one?

Very reasonable.

ToDo:

  • add a test with NC data.

A test of computational correctness would of course only check if the change affects behavior within current limits.
Running a test for the purpose of the introduced change would take several hours though. Would it in that case be sufficient to test if the process starts without issue (not waiting for a result)?
Something along the lines: "if no error after x seconds assume success and kill process" (something like https://stackoverflow.com/questions/1359383/run-a-process-and-kill-it-if-it-doesnt-end-within-one-hour).
In any case, r.terraflow would only start if the input DEM is cleanly aligned with the computational region. Thus a test of the new feature would require to write a artificial DEM with > 32k x 32k pixels to disk...

@neteler
Copy link
Member

neteler commented Jun 13, 2019

(At least) two options for a synthetic DEM:

@@ -123,7 +123,7 @@ scan3line(FUN &funobj,
a[2]=*tmp;
} else {
a[2] = nodata;
assert(ae == AMI_ERROR_END_OF_STREAM);
assert((off_t)ae == AMI_ERROR_END_OF_STREAM);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why (off_t)ae? ae and AMI_ERROR_END_OF_STREAM are of the same type AMI_err. Applies also to L140

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I just picked it up / copied it from https://trac.osgeo.org/grass/ticket/1421
cause I am no C/C++ programmer.
What should be done?
I can test if this change is not necessary....

Copy link
Member Author

@ninsbl ninsbl Jun 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing currently on a bigger box with 65kx50k pixels. Without (off_t) I get:
r.terraflow: 3scan.h:140: void scan3line(FUN&, AMI_STREAM<T>*, AMI_STREAM<T>*, AMI_STREAM<T>*, BASETYPE, dimension_type) [with T = float; BASETYPE = float; FUN = detectEdgeNodata; dimension_type = int]: Assertion ae == AMI_ERROR_END_OF_STREAM' failed.`
Changing back to with (off_t) did not help, unfortunately. So, I also changed back from int to long (original patch in 1421). But now I get:

Header of raster map <DTM_1M_full> compatible with region header
Elevation stored as FLOAT (4B)
Region size is 68385 x 56176
STREAM temporary files in </home/shared/tmp>. THESE INTERMEDIATE STREAMS
WILL NOT BE DELETED IN CASE OF ABNORMAL TERMINATION OF THE PROGRAM. TO SAVE
SPACE PLEASE DELETE THESE FILES MANUALLY!
Memory manager registering memory in MM_WARN_ON_MEMORY_EXCEEDED mode.
Reading data from <DTM_1M_full> to stream </home/shared/tmp/STREAM_gaaaaX>
Reading input data...
 100%
EMPQUEUEADAPTIVE: starting in-memory pqueue
EMPQUEUEADAPTIVE: available memory: 100000MB
EMPQUEUEADAPTIVE: desired memory: 100000MB
sz_stream: 270400 buf_arity: 200 mm_overhead: 8666496 mm_avail: 104857600000.
EMPQUEUEADAPTIVE: memory overhead set to 8.26501MB
EMPQUEUEADAPTIVE: pqsize set to 13106116688
EMPQUEUEADAPTIVE: starting in-memory pqueue
EMPQUEUEADAPTIVE: available memory: 100000MB
EMPQUEUEADAPTIVE: desired memory: 100000MB
sz_stream: 270400 buf_arity: 200 mm_overhead: 8666496 mm_avail: 104857600000.
EMPQUEUEADAPTIVE: memory overhead set to 8.26501MB
EMPQUEUEADAPTIVE: pqsize set to 13106116688
EMPQUEUEADAPTIVE: starting in-memory pqueue
EMPQUEUEADAPTIVE: available memory: 100000MB
EMPQUEUEADAPTIVE: desired memory: 100000MB
sz_stream: 270400 buf_arity: 200 mm_overhead: 8666496 mm_avail: 104857600000.
EMPQUEUEADAPTIVE: memory overhead set to 8.26501MB
EMPQUEUEADAPTIVE: pqsize set to 13106116688
EMPQUEUEADAPTIVE: starting in-memory pqueue
EMPQUEUEADAPTIVE: available memory: 100000MB
EMPQUEUEADAPTIVE: desired memory: 100000MB
sz_stream: 270424 buf_arity: 200 mm_overhead: 8705664 mm_avail: 104857600000.
EMPQUEUEADAPTIVE: memory overhead set to 8.30237MB
EMPQUEUEADAPTIVE: pqsize set to 3276527948
terminate called after throwing an instance of 'std::bad_array_new_length'
  what():  std::bad_array_new_length

Any idea?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Little update:
Now I double checked that changes are exactly as in the original ticket, ran make distclean and recompiled with debugging flags, requested debugging output (g.gisenv set="DEBUG=3") and ran
r.terraflow --overwrite --verbose elevation=DTM_1M_full@p_CulvertFragmentation_stefan.blumentrath filled=DEM_1M_flooded memory=30000 directory=/home/shared/tmp/ stats=/home/shared/tmp/terra_stats.txt &> /home/shared/tmp/dbg.txt
with less RAM assigned to the process (30GB instead of 100GB).

For the time being it seems to succeed beyond where it crashed in the earlier try...
Once I have a successful run, we can try to reintroduce the requested changes in this PR and see if it still works (meaning if the issue is the amount of memory and not long vs. int).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, the command above succeeded. Next step will be to check if the issue was caused by memory...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(off_t) in 3scan.h does not seem to be needed. However, int in types.h does not work on LINUX while long does. Would it be a solution to define dimaension_type platform dependent as long on LINUX and long long (?) on Windows?

Error message with INT in types.h

Reading input data...
 100%
D3/3: nrows=60012   ncols=45010    stream_len()=2701140120
D1/3: total elements=2701140120, nodata elements=90019
D1/3: largest temporary files: 
D1/3:            FILL: 150.94G (162068407200) [2701140120 elements, 60B each]
D1/3:            FLOW: -133889044380 [2701050101 elements, 84B each]
D1/3: Will need at least 301.88G (324136814400) space available in /data/grassdata/ETRS_33N/culverts/.tmp/localhost/93228.0
r.terraflow: 3scan.h:140: void scan3line(FUN&, AMI_STREAM<T>*, AMI_STREAM<T>*, AMI_STREAM<T>*, BASETYPE, dimension_type) [with T = float; BASETYPE = float; FUN = detectEdgeNodata; dimension_type = int]: Assertion `ae == AMI_ERROR_END_OF_STREAM' failed.
Aborted (core dumped)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the reason why int as dimension_type fails is integer overflow because of missing/incorrect type casts: at various places the number of cells is computed as nrows * ncols, but in this case nrows * ncols = 2701140120, larger than INT_MAX = 2147483647. E.g.

D1/3:            FLOW: -133889044380 [2701050101 elements, 84B each]

is a negative number because of integer overflow. The fix would be in main.cpp

-  long long  flowmaxsize = (long long)(nrows*ncols - nodata_count)*sizeof(sweepItem);
+  long long  flowmaxsize = ((long long)nrows*ncols - nodata_count)*sizeof(sweepItem);

long as dimension_type avoids these integer overflows, but GRASS uses int as type for the number of rows and columns. The real fix would thus be to fix all occurrences where nrows * ncols or equivalent is not properly converted to a 64 bit integer type. r.terraflow (and the iostream lib) currently use long as 64 bit integer type which is not guaranteed to be 64 bit, sometimes it is 32 bit and thus the code is not portable.

In contrast to other GRASS modules, r.terraflow was not designed to work with more than 2 billion grid cells.

I suggest to use int as dimension_type and fix any resultant integer overflows.

raster/r.terraflow/types.h Outdated Show resolved Hide resolved
raster/r.terraflow/types.h Outdated Show resolved Hide resolved
@metzm
Copy link
Contributor

metzm commented Jun 13, 2019

Regarding tests, a one-time test with a large region where the number of rows and columns exceed SHORT_MAX should be ok.
For the testsuite, a test based on one of the North Carolina DEMS should be sufficient. Expected results should be computed with a version using
typedef short dimension_type;
then univariate statistics can be computed on selected outputs. In the testsuite, univariate statistics must match these selected outputs.

@ninsbl
Copy link
Member Author

ninsbl commented Jun 13, 2019

Tests added based on original version of r.terraflow (without the changes in this PR). Those tests succeed also with the latest changes modified based on the review (@metzm , thanks for taking the time!)

@landam
Copy link
Member

landam commented Nov 8, 2019

Still changes requested?

@ninsbl
Copy link
Member Author

ninsbl commented Dec 19, 2019

replaced by #265

@ninsbl ninsbl closed this Dec 19, 2019
@ninsbl ninsbl deleted the r_terraflow_long branch December 19, 2019 09:43
@neteler neteler added this to the 7.8.3 milestone Dec 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants