0 used as missing data placeholder #6

Open
fros1y opened this issue Mar 9, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@fros1y

fros1y commented Mar 9, 2024

I've noticed that, for many fields, 0 is being used to represent missing data. For latitude and longitude this is particularly problematic, since 0N 0E is a real place! (https://en.wikipedia.org/wiki/Null_Island). Testing for equality with zero is also troublesome, since exact equality comparisons are unreliable for floating-point values such as latitude and longitude.

Can the API return None or some other marker for these missing values instead of 0, or does the underlying data have the same ambiguity?
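To illustrate the floating-point concern, here is a short Python sketch: a value that is mathematically zero can come out as a tiny nonzero number after arithmetic, so an exact `== 0` test misses it, while a tolerance-based test catches it. The `is_near_zero` helper and its `1e-9` tolerance are illustrative assumptions, not part of the Mindat API.

```python
import math


def is_near_zero(x: float, tol: float = 1e-9) -> bool:
    """Tolerance-based zero test; `tol` is a hypothetical, illustrative choice."""
    return math.isclose(x, 0.0, abs_tol=tol)


# Mathematically 0, but binary floating point leaves a tiny residue (~5.55e-17).
residue = 0.1 + 0.2 - 0.3

print(residue == 0.0)         # False: the exact comparison misses it
print(is_near_zero(residue))  # True: the tolerance comparison catches it
```

A tolerance test only makes the comparison robust; it still cannot tell a placeholder apart from a genuine point at 0N 0E, which is why a distinct missing-value marker is needed.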

@ChuBL ChuBL added the bug Something isn't working label Mar 10, 2024
@ChuBL
Owner

ChuBL commented Mar 10, 2024

This is a good point. The 0 values are rooted in the databases, and I have passed this issue on to the development teams. Hopefully, we can eliminate these annoying 0s in future versions.

@ChromiteExabyte

ChromiteExabyte commented Apr 13, 2024

For placeholders in the MySQL database, it is appropriate to replace 0 with the special NULL value or with an empty string ''. The choice depends on what the value is meant to express:

  • If a value is known not to exist, an empty string '' is the most accurate field value.
  • If a value is not known but is presumed to exist, NULL is the most accurate field value.

Unknown lat/long values are most accurately NULL: every locality certainly has a spatial position, but it is not known for that record.
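The NULL-versus-'' distinction, and the main pitfall with NULL, can be seen in a small Python sketch. It uses the standard-library `sqlite3` module with an in-memory database as a stand-in for MySQL (an assumption: both follow SQL's rule that a comparison with NULL is never true). The `locality` table and its `alt_name` column are hypothetical.

```python
import sqlite3


def count(conn: sqlite3.Connection, where: str) -> int:
    """Return the number of rows in `locality` matching the WHERE clause."""
    return conn.execute(f"SELECT COUNT(*) FROM locality WHERE {where}").fetchone()[0]


# In-memory SQLite database standing in for MySQL (assumption, see above).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE locality (name TEXT, lat REAL, lon REAL, alt_name TEXT)")
conn.executemany(
    "INSERT INTO locality VALUES (?, ?, ?, ?)",
    [
        # Genuine point at 0N 0E; '' = known to have no alternative name.
        ("Null Island", 0.0, 0.0, ""),
        # Python None is stored as NULL: position and alt name are unknown.
        ("Unlocated site", None, None, None),
    ],
)

print(count(conn, "lat = 0"))           # 1: only the genuine 0N 0E row
print(count(conn, "lat = NULL"))        # 0: '= NULL' never matches, not even NULL
print(count(conn, "lat IS NULL"))       # 1: the unknown position
print(count(conn, "alt_name = ''"))     # 1: known to be absent
print(count(conn, "alt_name IS NULL"))  # 1: unknown
```

Once placeholders become NULL, queries for missing positions must use `IS NULL` / `IS NOT NULL`, and a `lat = 0` query then returns only real zero coordinates.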

My understanding is that Mindat's primary aim is to be a repository of mineral properties, attributes, etc. Under "Open Geoscience Data", there are tools such as "GeoCODES" and "DataONE" for locality data.

Data cleansing is a key part of data science and data handling; there are many memes on the subject. For now, users can script their own workaround, e.g. "if lat = 0, assign NULL to lat".
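A client-side workaround along these lines might look like the Python sketch below. It is a variant of the "if lat = 0, assign NULL to lat" rule that treats only the exact pair (0, 0) as missing, so a locality on the equator or the prime meridian keeps its real coordinate. The keys `latitude` and `longitude` are hypothetical; the actual Mindat API field names may differ.

```python
from typing import Any, Optional

# Hypothetical field names; the real API keys may differ.
LAT_KEY = "latitude"
LON_KEY = "longitude"


def clean_coordinates(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `record` with a (0, 0) placeholder replaced by None.

    Only the exact pair (0, 0) counts as missing; a single zero coordinate
    (equator or prime meridian) is left untouched.
    """
    cleaned = dict(record)
    lat: Optional[float] = cleaned.get(LAT_KEY)
    lon: Optional[float] = cleaned.get(LON_KEY)
    # Exact comparison is fine here: the placeholder is stored as an exact 0.
    if lat == 0 and lon == 0:
        cleaned[LAT_KEY] = None
        cleaned[LON_KEY] = None
    return cleaned


records = [
    {"name": "A", "latitude": 0, "longitude": 0},
    {"name": "B", "latitude": 0.0, "longitude": 32.5},
]
for r in records:
    print(clean_coordinates(r))
# {'name': 'A', 'latitude': None, 'longitude': None}
# {'name': 'B', 'latitude': 0.0, 'longitude': 32.5}
```

The trade-off is that a genuine record at exactly 0N 0E is also nulled; that ambiguity cannot be resolved on the client as long as the source data uses 0 as the placeholder.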

Source:

MySQL Reference Manual, Section B.3.4.3 Problems with NULL Values
https://dev.mysql.com/doc/refman/8.3/en/problems-with-null.html

@ChuBL
Owner

ChuBL commented Apr 13, 2024

Noted, thank you for the advice and reference. I will forward your message to the Mindat database administrators.
