Min/Max Values in CATALOG don't match the actual data #2

Open
markveillette opened this issue May 20, 2020 · 0 comments

There are rare cases in the dataset where the data_min and data_max columns in the catalog don't match the min/max measured from the actual (decoded) images.

For example, consider event R19011212048075 with img_type='ir069'. Its entry in CATALOG.csv is:

id                                                  R19011212048075
file_name         ir069/2019/SEVIR_IR069_RANDOMEVENTS_2019_0101_...
file_index                                                      821
img_type                                                      ir069
time_utc                                        2019-01-12 12:00:00
minute_offsets    -120:-115:-110:-105:-100:-95:-90:-85:-80:-75:-...
episode_id                                                      NaN
event_id                                                        NaN
event_type                                                      NaN
llcrnrlat                                                   38.9436
llcrnrlon                                                  -92.3178
urcrnrlat                                                   42.0725
urcrnrlon                                                  -87.3715
proj              +proj=laea +lat_0=38 +lon_0=-98 +units=m +a=63...
size_x                                                          192
size_y                                                          192
height_m                                                     384000
width_m                                                      384000
!data_min                                                   -23540.1
!data_max                                                     22.877
pct_missing                                                       0
Name: 39505, dtype: object

The reported minimum in this case is -23540.1 degrees C, which is not a physically plausible value. And if we look at the minimum in the image actually stored in SEVIR, we see a value of -18312, which decodes to -183.12. That differs from the value reported in the catalog.
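A minimal sketch of the comparison described above, for anyone who wants to flag such entries. It assumes a decode factor of 1/100 (inferred from -18312 decoding to -183.12); the function name `catalog_matches` and the tolerance are hypothetical and not part of SEVIR.

```python
import numpy as np

SCALE = 100.0  # assumption: stored int16 = value * 100, inferred from this issue


def catalog_matches(stored, catalog_min, catalog_max, tol=0.01):
    """Return True if the decoded min/max agree with the catalog values.

    `stored` is the raw int16 array for one event, however it was loaded.
    `tol` absorbs the rounding introduced by the int16 encoding.
    """
    decoded = np.asarray(stored, dtype=np.float64) / SCALE
    min_ok = abs(decoded.min() - catalog_min) <= tol
    max_ok = abs(decoded.max() - catalog_max) <= tol
    return bool(min_ok and max_ok)


# Synthetic stand-in for the event above: the stored minimum is -18312,
# while the catalog reports -23540.1.
stored_example = np.array([-18312, -6000, 2288], dtype=np.int16)
mismatch = not catalog_matches(stored_example, -23540.1, 22.877)
```

Running a check like this over the catalog rows would list the events whose data_min or data_max should not be trusted.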

Explanation

Looking at the data, this happens when there are a few bad pixels in the image, typically in very high and thick clouds:

[Image: bad_ir069]

Data is converted to int16 before being written to .h5, but the min/max values entered in the CATALOG are recorded before this cast is done. When there are bad pixels, these values can get very large in magnitude (as happened in this case), and the true minimum of the data causes an int16 overflow when scaled. So the pixel value stored for these bad pixels in SEVIR is garbage (as is the value stored in the CATALOG).
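To illustrate the overflow, here is a small sketch of what happens when a scaled value falls outside the int16 range. The scale factor of 100 is an assumption inferred from the decoding above, and `encode_int16_wrapping` is a hypothetical helper, not the code that produced SEVIR. The actual write path may cast differently, so the wrapped number here is not expected to reproduce the -18312 found in the file.

```python
import numpy as np

SCALE = 100.0  # assumption: stored int16 = value * 100


def encode_int16_wrapping(values):
    """Scale to integers, then narrow to int16 with wraparound.

    Going through int64 first makes the narrowing cast well defined:
    NumPy keeps only the low 16 bits, i.e. the value wraps modulo 2**16.
    """
    scaled = np.round(np.asarray(values, dtype=np.float64) * SCALE)
    return scaled.astype(np.int64).astype(np.int16)


# With this scale, int16 can only represent -327.68 .. 327.67.
valid = encode_int16_wrapping([-60.0, 22.877])  # in range, round-trips
bad = encode_int16_wrapping([-23540.1])         # far out of range, wraps

# Decoding the wrapped value yields a number unrelated to the original.
decoded_bad = bad.astype(np.float64) / SCALE
```

The in-range values survive the round trip to within the 0.01 resolution, while the out-of-range value comes back as an unrelated number, which is why neither the stored pixel nor the catalog entry is meaningful.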

Unfortunately, this cannot be fixed easily without recreating the whole dataset. A good practice is to clip pixels during preprocessing to a physically reasonable range, computed by filtering out outliers like this one.
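One possible way to do this in preprocessing is sketched below. The helper names, the percentile cutoffs, and the example bounds are all placeholders chosen for illustration; suitable limits for each img_type should come from the data itself. The decode factor of 1/100 is again an assumption based on the values in this issue.

```python
import numpy as np

SCALE = 100.0  # assumption: stored int16 = value * 100


def robust_range(decoded, lo_pct=0.1, hi_pct=99.9):
    """Estimate a clipping range from percentiles, ignoring extreme outliers."""
    lo, hi = np.percentile(decoded, [lo_pct, hi_pct])
    return float(lo), float(hi)


def decode_and_clip(stored, vmin, vmax):
    """Decode raw int16 pixels and clip them to [vmin, vmax]."""
    decoded = np.asarray(stored, dtype=np.float32) / SCALE
    return np.clip(decoded, vmin, vmax)


# Synthetic frame: mostly -60.0, with one bad pixel like the one in this issue.
frame = np.full((4, 4), -6000, dtype=np.int16)
frame[0, 0] = -18312

# Placeholder bounds, not recommended values for ir069.
clipped = decode_and_clip(frame, vmin=-100.0, vmax=30.0)
```

In practice `robust_range` would be run once over a large sample of decoded frames per image type, and the resulting bounds reused for every call to `decode_and_clip`.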
