coast: limit decimals when dumping GSHHG/DCW #8516

Closed
anbj opened this issue Jun 7, 2024 · 18 comments
Labels
feature request Request a new feature

Comments

@anbj
Contributor

anbj commented Jun 7, 2024

coast will use 10 decimals when dumping coastlines or country polygons.
With 5 decimals the resolution is already ~1.11 m, so the remaining decimals only resolve millimeter and sub-millimeter scales.

Would it make sense to hardcode the dumping of GSHHG/DCW to at most 5 decimals?
Disk space is ample these days, but no need to waste it.

PS: This is a minor issue; I just wanted to vent.
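A quick way to check the arithmetic above: the sketch below converts one unit in the Nth decimal place of a coordinate in degrees into metres. It assumes roughly 111,320 m per degree (a degree of longitude at the Equator; a degree of latitude is within about 1% of that), and decimal_step_m is a hypothetical helper name, not part of GMT.

```sh
#!/bin/sh
# decimal_step_m N: approximate ground distance (metres) of one unit in the
# Nth decimal place of a coordinate given in degrees.
# Assumption: ~111320 m per degree (longitude at the Equator); longitude
# steps shrink by cos(latitude) away from the Equator.
decimal_step_m() {
    awk -v d="$1" 'BEGIN { printf "%.6g\n", 111320 * 10 ^ (-d) }'
}

for n in 5 10; do
    printf '%2d decimals -> %s m\n' "$n" "$(decimal_step_m "$n")"
done
```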

@anbj anbj added the feature request Request a new feature label Jun 7, 2024
@seisman
Member

seisman commented Jun 7, 2024

This is controlled by the FORMAT_FLOAT_OUT parameter, e.g.:

gmt coast -M -W1p -R0/10/0/10 --FORMAT_FLOAT_OUT=%.5f
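FORMAT_FLOAT_OUT takes a C printf-style format string, so the effect of a given setting can be previewed with the shell's own printf before running coast. This is a minimal sketch: preview_float is a hypothetical helper, and %.12g is, as far as I know, GMT's default for FORMAT_FLOAT_OUT (it matches the 12 significant digits in the coast -ENO -M output further down this thread).

```sh
# preview_float FMT VALUE: print VALUE with a C printf-style format, the same
# kind of format string FORMAT_FLOAT_OUT takes (e.g. %.12g, %.5f).
# The format is passed as a variable on purpose.
preview_float() {
    printf "$1\n" "$2"
}

for fmt in %.12g %.5f; do
    printf '%-6s %s  %s\n' "$fmt" \
        "$(preview_float "$fmt" 5.12749572793)" \
        "$(preview_float "$fmt" 59.8240650838)"
done
```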

@seisman
Member

seisman commented Jun 7, 2024

It's not documented, so we should improve the documentation. PRs are welcome.

@anbj
Contributor Author

anbj commented Jun 7, 2024

Yes, it is, but I imagine most people don't think about this, which is why I suggested hardcoding it. Mentioning this in the docs is a good solution; I will probably make a PR.

@anbj
Contributor Author

anbj commented Jun 7, 2024

The source data for DCW has only 6 decimals, e.g.:

$ head orig/EU/NO.txt 
> norway 0
5.127303 59.824047
5.139871 59.816860
5.140088 59.813950
5.135608 59.813023
5.131952 59.814030
5.128451 59.819134
5.122212 59.821632
5.127303 59.824047
> norway 1

Still, dumping the polygon gives about 10 decimals, and the extra ones are not zeros:

$ gmt coast -ENO -M | head
>  Norway Segment 0
5.12749572793	59.8240650838
5.14002533806	59.8168021861
5.14002533806	59.8139777259
5.13557934737	59.8129689901
5.13194171862	59.8139777259
5.12830408988	59.8190214048
5.1222413753	59.8216441179
5.12749572793	59.8240650838
>  Norway Segment 1

How can this be?

@seisman
Member

seisman commented Jun 7, 2024

Single precision for floating-point numbers?

@joa-quim
Member

joa-quim commented Jun 7, 2024

Well, it is documented in the sense that FORMAT_FLOAT_OUT controls the format of all data written in ASCII, for all modules. When it doesn't, as happened not too long ago with pscoast (I think), that is a bug.

But I noticed something that is worse. Although these data do not have a high positional accuracy, we are degrading them by about 20 m. It is a consequence of the binning/scaling algorithm, but something to keep in mind for the future.
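A rough check of the size of this shift, using the Norway listings earlier in the thread: the sketch below computes the distance between a vertex from orig/EU/NO.txt and the corresponding vertex dumped by gmt coast -ENO -M. It assumes the two listings contain the same vertices in the same order (both segments have eight points), uses a simple equirectangular approximation on a 6371 km sphere, and offset_m is a hypothetical helper.

```sh
# offset_m LON1 LAT1 LON2 LAT2: approximate distance in metres between two
# nearby points (equirectangular approximation on a 6371 km sphere; an
# assumption that is adequate for offsets of tens of metres).
offset_m() {
    awk -v x1="$1" -v y1="$2" -v x2="$3" -v y2="$4" 'BEGIN {
        r = atan2(0, -1) / 180          # degrees -> radians
        m = 6371000 * r                 # metres per degree of arc
        dx = (x2 - x1) * m * cos((y1 + y2) / 2 * r)
        dy = (y2 - y1) * m
        printf "%.1f\n", sqrt(dx * dx + dy * dy)
    }'
}

# First and third vertices of "norway 0": original vs. dumped by coast
offset_m 5.127303 59.824047 5.12749572793 59.8240650838
offset_m 5.140088 59.813950 5.14002533806 59.8139777259
```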

@joa-quim
Member

joa-quim commented Jun 7, 2024

> How can this be?

If you look into the dcw-gmt.nc file with HDF Explorer, you will see that the data are stored as short integers (2 bytes). This was the scheme originally used to compress the GSHHG data to ~45 MB, which was still huge 30 years ago.

@anbj
Contributor Author

anbj commented Jun 7, 2024

OK, so based on your answer I assume this is an explainable artifact.

(I've read that numbers get complicated, with all kinds of strange rounding effects, once you go into the float/long/etc. world, so I won't go down that hole right now.)

@joa-quim
Member

joa-quim commented Jun 7, 2024

The point here is that we are saving floating-point data (4-byte floats are enough) in 2-byte ints, and with that we obviously lose precision. The situation is not that bad in GSHHG, because it uses a binning scheme: by knowing the bin we already know the integer part of the bin corner origin, so the 2 bytes (0-65535) can be used to store only the decimal part. The situation in DCW is different. Here we want to store the data as polygons, so we cannot use the binning, and as a consequence the 2 bytes can provide a precision of only ~0.003 degrees (1 / 65535 ≈ 1.5e-5; 1.5e-5 × 180 ≈ 0.0027).
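The loss of precision from squeezing a coordinate into 2 bytes can be reproduced with a plain linear min/range quantization. This is a sketch of the general idea only, not necessarily the encoding actually used in dcw-gmt.nc; quantize16 is a hypothetical helper, and the -90..90 span mirrors the 1/65535 × 180 estimate above.

```sh
# quantize16 LO HI X: store X (within [LO, HI]) as an unsigned 16-bit integer
# and decode it again. Prints: integer, decoded value, step size (degrees).
# Hypothetical helper: a simple linear quantization assumed for illustration.
quantize16() {
    awk -v lo="$1" -v hi="$2" -v x="$3" 'BEGIN {
        step = (hi - lo) / 65535            # degrees per integer unit
        q = int((x - lo) / step + 0.5)      # nearest integer in 0..65535
        printf "%d %.6f %.6g\n", q, lo + q * step, step
    }'
}

# A latitude stored over the full -90..90 span (step ~0.0027 degrees)
quantize16 -90 90 59.824047
```

Passing a narrower LO..HI span gives a proportionally smaller step, which is why the extent over which the 2 bytes are spread matters.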

@anbj
Contributor Author

anbj commented Jun 7, 2024

Thanks, interesting. So just 4 decimals are basically enough?

@joa-quim
Member

joa-quim commented Jun 7, 2024

Not sure I understand the question. We cannot choose the number of significant decimals. We have what we have, and if I'm right the precision decreases as we move away from Greenwich and the Equator.

@anbj
Contributor Author

anbj commented Jun 7, 2024

Alright, thanks. I might make a PR adding a note to the coast docs that one may consider setting FORMAT_FLOAT_OUT.

@anbj anbj self-assigned this Jun 7, 2024
@Esteban82
Member

This is the script that creates the DCW file.
There it says:

# Set enough decimals to avoid bad rounding
rm -f gmt.conf
gmt set FORMAT_FLOAT_OUT %.14g

@Esteban82
Member

And from what I understand by looking at lines 116 to 150 of the script, I think the 2-byte range is spread over the longitude and latitude extent of each polygon. In practice this means that larger polygons have lower accuracy.

I explored the file and extracted the scales used (dcw-scales.txt).
Here are some representative cases, from the most severe to the mildest (the prefix is the ISO country code):

Scale name     Scale value    1 / scale value (degrees)
AQ_lon:scale   182.150451     0.00549
RU_lon:scale   382.7710674    0.00261
US_lon:scale   543.3403805    0.00184
CA_lon:scale   741.4468027    0.00135
GE_lon:scale   9765.136073    0.00010
IS_lat:scale   19998.35216    0.00005
JM_lat:scale   79588.59491    0.00001

That is, the worst accuracies are for the longitudes of the polygons of Antarctica (AQ), Russia (RU), the United States (US) and Canada (CA).

For the longitude of Georgia (GE) we have a precision of 0.0001, and for the latitude of Iceland (IS) 0.00005.
For the latitude of Jamaica (JM) we get about 0.00001, one order of magnitude coarser than the 1e-6 of the original data.
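The last column of the table above is simply 1 / scale, i.e. the size in degrees of one integer step. The sketch below recomputes it and, under the assumption that each scale equals 65535 divided by the polygon's coordinate span (not verified against the build script), also prints the implied span in degrees. scale_info is a hypothetical helper.

```sh
# scale_info SCALE: degrees per integer unit (1/SCALE) and the coordinate span
# implied if SCALE = 65535 / span (an assumption, not read from the script).
scale_info() {
    awk -v s="$1" 'BEGIN { printf "%.5f %.1f\n", 1 / s, 65535 / s }'
}

while read -r name scale; do
    printf '%-14s %s\n' "$name" "$(scale_info "$scale")"
done <<'EOF'
AQ_lon:scale 182.150451
RU_lon:scale 382.7710674
US_lon:scale 543.3403805
CA_lon:scale 741.4468027
GE_lon:scale 9765.136073
IS_lat:scale 19998.35216
JM_lat:scale 79588.59491
EOF
```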

@Esteban82
Member

I made these two maps to compare the original data (from the orig dir) and the processed data (from DCW).
Ideally the lines should overlap. Differences are visible. But they don't look as bad as I expected.

[Figure: Antartida (Antarctica)]
[Figure: Russia]

Full script

# Directory with the original (unprocessed) DCW text files
origen=/home/federico/Github/GenericMappingTools/dcw-gmt/orig/

# Russia: processed DCW polygon (red) vs. original data (green)
gmt begin Russia png
	gmt coast -ERU+pred -R32/35/66/67 -Baf -JM25c
	gmt plot "$origen"/AS/RU.txt -Wfaint,green
	gmt basemap -L+w20k+o3c+f+u
gmt end


# Antarctica: same comparison
gmt begin Antartida png
	gmt coast -EAQ+pred -R-65.5/-63/-66/-65 -Baf -JM25c
	gmt plot "$origen"/AN/AQ.txt -Wfaint,green -l"Original Data"
	gmt basemap -L+w20k+o3c+f+u
gmt end

@Esteban82
Member

Now I zoomed in and added the GSHHG data set.
If I assume that GSHHG is the truth, then there is no point in improving the precision of the DCW data, since its accuracy is low anyway at this zoom level: "The Digital Chart of the World is a comprehensive 1:1,000,000 scale vector basemap of the world."

Conclusion: I think DCW should be left as it is.

[Figure: Russia2]

gmt begin Russia2 png
	gmt coast -ERU+pred -R33:20/34/66:15/66.5 -Baf -JM25c
	gmt plot "$origen"/AS/RU.txt -Wfaint,green
	# Add the GSHHG shoreline for comparison
	gmt coast -W
	gmt basemap -L+w5k+o3c+f+u
gmt end

@joa-quim
Member

joa-quim commented Jun 8, 2024

No, GSHHG cannot be assumed to be the truth. We constantly get complaints about how the coast coastlines do not align with other data or satellite images. GSHHG is very old and suffers from using a different datum than modern data.

All our effort on the coastlines/borders front should be concentrated on creating a new full++ coastline file, but the big problem is that we need to recreate a tool able to make such a file.

@anbj
Contributor Author

anbj commented Jun 19, 2024

Dealt with in #8524.

@anbj anbj closed this as completed Jun 19, 2024
4 participants