
Add experimental FITS reader for CPU backend #4591

Merged: 62 commits into NVIDIA:main, Mar 24, 2023

Conversation

aderylo
Contributor

@aderylo commented on Jan 20, 2023

Category: New feature

Description: Implement a FITS reader for the CPU backend.

Additional information:

Affected modules and functionalities:

Key points relevant for the review:

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

Comment on lines 27 to 32
gds_data_root = '/scratch/'
if not os.path.isdir(gds_data_root):
    gds_data_root = os.getcwd() + "/scratch/"
    if not os.path.isdir(gds_data_root):
        os.mkdir(gds_data_root)
        assert os.path.isdir(gds_data_root)
Contributor

You don't need that for FITS. It is used for the GPU Direct Storage solution, which works only on a real file system. Our tests run inside a Docker environment whose tmp file system doesn't work with GDS, so we mount an external file system and use it for those tests.

Contributor Author

Sure, addressed by removing this fragment.

@klecki assigned klecki and unassigned mzient on Jan 20, 2023
Contributor

@klecki left a comment

Comments related to the operator API and some code-trimming suggestions.

dali/operators/reader/fits_reader_op.cc: 4 resolved review threads (outdated)
}
}

DALI_REGISTER_OPERATOR(readers__Fits, FitsReaderCPU, CPU);
Contributor

I think we should start by placing this in the experimental module, at least until we get some more functionality:

Suggested change
DALI_REGISTER_OPERATOR(readers__Fits, FitsReaderCPU, CPU);
DALI_REGISTER_OPERATOR(experimental__readers__Fits, FitsReaderCPU, CPU);

This will mean that you can access the operator as experimental.readers.fits, so you would need to adjust the tests.

Contributor Author

Addressed, followed through with the suggestion.

dali/operators/reader/fits_reader_op.cc: 2 resolved review threads (1 outdated)
fits_movabs_hdu(current_file, hdu_index_, NULL, &status);

if (hdu_index_ < hdunum) {
  hdu_index_++;
Contributor

The suggested approach would mean that we potentially read one sample consisting of multiple HDUs.

Comment on lines 68 to 90
try {
  fits::ParseHeader(header, current_file);
} catch (const std::runtime_error& e) {
  DALI_FAIL(e.what() + ". File: " + filename);
}
Contributor

It should be enough to call it directly and let the exception propagate. I think only the DALI_ENFORCE in this function may throw.

Suggested change
try {
  fits::ParseHeader(header, current_file);
} catch (const std::runtime_error& e) {
  DALI_FAIL(e.what() + ". File: " + filename);
}
fits::ParseHeader(header, current_file);


fits_get_img_param(src, MAX_DIMS, &img_type, &n_dims, dims, &status);
parsed_header.type_info = &TypeFromFitsImageType(img_type);
parsed_header.compressed = (fits_is_compressed_image(src, &status) == 1);
Contributor

Should we error out on the compressed files until we merge the option to decompress them?
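A minimal sketch of such a guard, assuming the parsed_header field filled in just above (not a final implementation):

// Hypothetical guard: reject compressed HDUs until decompression support is merged.
DALI_ENFORCE(!parsed_header.compressed,
             "Compressed FITS images are not supported yet; "
             "please decompress the file before reading.");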

@aderylo force-pushed the fits-reader branch 2 times, most recently from 0804f9a to 954de00 on January 26, 2023 09:08
Comment on lines 48 to 55
int batch_size = GetCurrBatchSize();
const auto& file_0 = GetSample(0);
DALIDataType output_type = file_0.get_type();
int ndim = file_0.get_shape().sample_dim();
TensorListShape<> sh(batch_size, ndim);

// TODO: implement checking that all images have same dimensions
// also all general calculations for all images such as roi
Contributor

In the initial version it might be OK to assume a uniform batch, but maybe there shouldn't be such a constraint in general, and we should support images of different sizes.

Also, you need to peek the shape of the sample and set it for all elements of sh so the executor can correctly resize the buffers. Right now it will probably have some random contents (we just resized the internal vector to batch_size * ndim elements), so this is probably a bug.
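A minimal sketch of what that could look like, assuming the variables from the snippet above and the uniform-batch assumption:

// Broadcast the first sample's shape to every element of sh,
// so the executor sees valid extents instead of uninitialized values.
TensorListShape<> sh(batch_size, ndim);
for (int i = 0; i < batch_size; i++) {
  sh.set_tensor_shape(i, file_0.get_shape());
}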

Contributor

@klecki left a comment

A bunch of comments, but I think we are close to finishing this.

dali/util/fits.cc: 2 resolved review threads (outdated)
}

if (status)
  fits_report_error(stderr, status);
Contributor

I think we should probably obtain the error message and report it with DALI_FAIL
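For example, a small helper along these lines (a sketch; fits_get_errstatus and FLEN_STATUS come from cfitsio, DALI_FAIL and make_string from DALI):

void HandleFitsError(int status) {
  if (status) {
    char err_text[FLEN_STATUS];            // cfitsio's short error description
    fits_get_errstatus(status, err_text);  // translate the status code into text
    DALI_FAIL(make_string("cfitsio error (", status, "): ", err_text));
  }
}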


void FitsLoader::ReadSample(FitsFileWrapper& target) {
  auto filename = files_[current_index_++];
  fitsfile* current_file = nullptr;
Contributor

It would probably be safer for resource management if we used UniqueHandle to manage the opened FITS file: include/dali/core/unique_handle.h

See for example how to implement it here: include/dali/core/cuda_event.h

we could implement it like:

class DLL_PUBLIC FitsHandle: public UniqueHandle<fitsfile*, FitsHandle> {
 public:
  DALI_INHERIT_UNIQUE_HANDLE(fitsfile*, FitsHandle)
  constexpr FitsHandle() = default;

  /** @brief Opens the FITS file with fits_open_file */
  static FitsHandle OpenFile(char *filename, int mode);

  /** @brief Calls fits_close_file on the file handle */
  static void DestroyHandle(fitsfile*);
};

It will automatically call close on destruction so we can safely throw exceptions around.
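A possible implementation of the two static members could look roughly like this (a sketch, assuming cfitsio's fits_open_file/fits_close_file signatures):

FitsHandle FitsHandle::OpenFile(char *filename, int mode) {
  fitsfile *ff = nullptr;
  int status = 0;
  fits_open_file(&ff, filename, mode, &status);
  DALI_ENFORCE(status == 0, make_string("Failed to open FITS file: ", filename));
  return FitsHandle(ff);  // ownership passes to the handle
}

void FitsHandle::DestroyHandle(fitsfile *handle) {
  int status = 0;
  fits_close_file(handle, &status);  // best effort; errors on close are ignored here
}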

dali/operators/reader/loader/fits_loader.cc: resolved review thread
using Operator<Backend>::spec_;

// rewrite this part given new approach with multiple outputs
// do i
Contributor

There was a question here :)

dali/operators/reader/fits_reader_op.cc: resolved review thread (outdated)
Comment on lines 58 to 50
all_numpy_types = set([
    np.bool_, np.byte, np.ubyte, np.short, np.ushort, np.intc, np.uintc, np.int_, np.uint,
    np.longlong, np.ulonglong, np.half, np.float16, np.single, np.double, np.longdouble,
    np.csingle, np.cdouble, np.clongdouble, np.int8, np.int16, np.int32, np.int64, np.uint8,
    np.uint16, np.uint32, np.uint64, np.intp, np.uintp, np.float32, np.float64, np.float_,
    np.complex64, np.complex128, np.complex_
])
Contributor

Let's limit ourselves to some subset of the image types supported by FITS:
u8, i16, i32, i64, u32, u64.

Comment on lines 42 to 55

def FitsReaderPipeline(path, batch_size, device="cpu", file_list=None, files=None,
                       file_filter="*.fits", num_threads=1, device_id=0,
                       hdu_indices=None, dtype=None):
    pipe = Pipeline(batch_size=batch_size, num_threads=num_threads, device_id=device_id)
    data = fn.readers.fits(device=device,
                           file_list=file_list,
                           files=files,
                           file_root=path,
                           file_filter=file_filter,
                           shard_id=0,
                           num_shards=1)
    pipe.set_outputs(data)
    return pipe
Contributor

Two nitpicks: I recommend using the @pipeline_def decorator, and you need to add the experimental module:

Suggested change
def FitsReaderPipeline(path, batch_size, device="cpu", file_list=None, files=None,
                       file_filter="*.fits", num_threads=1, device_id=0,
                       hdu_indices=None, dtype=None):
    pipe = Pipeline(batch_size=batch_size, num_threads=num_threads, device_id=device_id)
    data = fn.readers.fits(device=device,
                           file_list=file_list,
                           files=files,
                           file_root=path,
                           file_filter=file_filter,
                           shard_id=0,
                           num_shards=1)
    pipe.set_outputs(data)
    return pipe
@pipeline_def
def FitsReaderPipeline(path, device="cpu", file_list=None, files=None,
                       file_filter="*.fits", hdu_indices=None, dtype=None):
    data = fn.experimental.readers.fits(device=device,
                                        file_list=file_list,
                                        files=files,
                                        file_root=path,
                                        file_filter=file_filter,
                                        shard_id=0,
                                        num_shards=1)
    return data

This function would accept the Pipeline constructor arguments as kwargs, so you can still pass batch_size, num_threads, and device_id.

Comment on lines 38 to 40
def delete_fits_file(filename):
    if os.path.isfile(filename):
        os.remove(filename)
Contributor

This is unused; you could probably look into setting up unittest setUp and tearDown functions for this purpose.

dali/util/fits.cc: resolved review thread (outdated)
.AddOptionalArg("hdu_indices",
R"(HDU indexes to read. If not provided, first HDU after primary
will be yielded (i.e. will default to hdu_indices=[2]).)",
std::vector<int>{2})
Contributor

Try the GetRepeatedArgument method.
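For reference, a sketch of how that could be read in the operator (assuming the spec_ member shown elsewhere in this PR):

// Read the repeated "hdu_indices" argument as a vector instead of a single value.
std::vector<int> hdu_indices = spec_.GetRepeatedArgument<int>("hdu_indices");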

@klecki
Contributor

klecki commented Mar 17, 2023

!build

@dali-automaton
Collaborator

CI MESSAGE: [7619795]: BUILD STARTED

@review-notebook-app

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.



Signed-off-by: aderylo <a.m.derylo@gmail.com>
Comment on lines 79 to 83
DALI_ENFORCE(status == 0, make_string("Failed to read a file: ", filename));


// close the file handle
fits_close_file(current_file, &status);
Member

+1 to Krzysztof's comment about the UniqueHandle. This looks like a handle leak on an unsuccessful fits_read_img.

Contributor Author

Done!

fits_read_img(current_file, header.datatype_code, 1, nelem, &nulval,
              static_cast<uint8_t*>(target.data[output_idx].raw_mutable_data()), &anynul,
              &status);
DALI_ENFORCE(status == 0, make_string("Failed to read a file: ", filename));
Member

For consideration:

I wonder if it wouldn't be more user-friendly to forward a message from cfitsio itself here:

make_string("Failed to read a file: ", filename, ". ", GetFitsErrorMessage(status))

where GetFitsErrorMessage is like HandleFitsError but simply returns the string. (You could then use it inside HandleFitsError too.)
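A sketch of what that helper could look like (hypothetical name, reusing cfitsio's fits_get_errstatus), together with the enforce it would feed:

std::string GetFitsErrorMessage(int status) {
  char err_text[FLEN_STATUS];            // cfitsio's short error description
  fits_get_errstatus(status, err_text);
  return std::string(err_text);
}

// ...then in ReadSample:
DALI_ENFORCE(status == 0,
             make_string("Failed to read a file: ", filename, ". ", GetFitsErrorMessage(status)));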

Comment on lines 97 to 99
fits_get_img_dim(src, &n_dims, &status); /* get NAXIS value */
std::vector<int64_t> dims(n_dims, 0); /* create vector for storing img dims*/
fits_get_img_size(src, n_dims, &dims[0], &status); /* get NAXISn values */
Member

I wonder how paranoid we should be here, but I guess it wouldn't hurt to validate that ndim and the shape values are non-negative, if they can be arbitrary values in a malicious file.
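For instance, defensive checks along these lines (a sketch against the variables in the snippet above, assuming the MAX_DIMS constant used earlier in this file):

// Reject malformed or malicious headers before allocating anything.
DALI_ENFORCE(n_dims >= 0 && n_dims <= MAX_DIMS,
             make_string("Unexpected number of dimensions in FITS header: ", n_dims));
for (int64_t d : dims) {
  DALI_ENFORCE(d >= 0, make_string("Negative extent in FITS header: ", d));
}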

dali/util/fits.cc: resolved review thread (outdated)
auto& sample = GetSample(sample_idx);
for (int output_idx = 0; output_idx < num_outputs; output_idx++) {
  output_desc[output_idx].shape.set_tensor_shape(sample_idx, sample.data[output_idx].shape());
  output_desc[output_idx].type = sample.data[output_idx].type();
Member

Do you (or one of the base classes?) validate the dtypes against what we actually get in the header?
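For instance, a check along these lines in the shape-inference loop above (a sketch; the dtype_ member holding the declared dtype argument is hypothetical):

// Hypothetical check: compare the dtype declared via the operator argument
// with the dtype actually parsed from the FITS header for this output.
if (dtype_ != DALI_NO_TYPE) {
  DALI_ENFORCE(sample.data[output_idx].type() == dtype_,
               make_string("Declared dtype ", dtype_, " does not match the dtype ",
                           sample.data[output_idx].type(), " found in the FITS header."));
}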

const auto& sample_0 = GetSample(0);

for (int output_idx = 0; output_idx < num_outputs; output_idx++) {
  int ndim = sample_0.data[output_idx].shape().sample_dim();
Member

What happens if the ndims in the actual headers differ between files?

Contributor Author

@aderylo Mar 22, 2023

No idea. In any case, I've added a check for consistency of dimensions and types between files on the corresponding outputs.

Member

@stiepan Mar 22, 2023

I wonder how the reader acts if

  1. some dtypes in the header don't match the declared dtypes
  2. ndims differ between files
  3. there are no files to read (due to an empty dir, an empty list file, or a restrictive glob)

I believe the third point should be handled by the base class, so that one is nice to have, but for the first two I think we should have tests that assert we fail in an orderly manner (take a look at assert_raises).

Contributor Author

2 & 3 should raise errors, 1 is not handled yet.

Member

Nice! Nonetheless, it would be good to have simple test cases that validate that this is really the case, for now and for future changes.

Signed-off-by: aderylo <a.m.derylo@gmail.com>
…cfitsio api

Signed-off-by: aderylo <a.m.derylo@gmail.com>
…om provided arguments

Signed-off-by: aderylo <a.m.derylo@gmail.com>
@klecki
Contributor

klecki commented Mar 23, 2023

!build

@dali-automaton
Collaborator

CI MESSAGE: [7686640]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [7686640]: BUILD FAILED

Signed-off-by: aderylo <a.m.derylo@gmail.com>
@klecki
Contributor

klecki commented Mar 23, 2023

!build

@dali-automaton
Collaborator

CI MESSAGE: [7688979]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [7688979]: BUILD PASSED

Comment on lines +44 to +45
4. Number of outputs per sample corresponds to the length of ``hdu_indices`` argument. By default,
first HDU with data is read from each file, so the number of outputs defaults to 1.
Member

Nitpick, can be done as a follow-up

Suggested change
4. Number of outputs per sample corresponds to the length of ``hdu_indices`` argument. By default,
first HDU with data is read from each file, so the number of outputs defaults to 1.
The number of outputs per sample corresponds to the length of ``hdu_indices`` argument. By default,
first HDU with data is read from each file, so the number of outputs defaults to 1.

parsed_header.hdu_type = hdu_type;
parsed_header.datatype_code = ImgTypeToDatatypeCode(img_type);
parsed_header.type_info = &TypeFromFitsDatatypeCode(parsed_header.datatype_code);
parsed_header.compressed = (fits_is_compressed_image(src, &status) == 1);
Member

We don't validate this particular status ever, do we?
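If we want to, it could be folded into the same error handling as the other calls, for example (a sketch, assuming a HandleFitsError helper like the one discussed above):

parsed_header.compressed = (fits_is_compressed_image(src, &status) == 1);
HandleFitsError(status);  // surface a cfitsio error instead of silently ignoring it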

Comment on lines +65 to +80
sample.data[output_idx].shape().sample_dim() == ndims[output_idx],
make_string("Inconsistent data: All samples in the batch must have the same number of "
"dimensions for outputs with the same idx. "
"Got \"",
sample_0.filename, "\" with ", ndims[output_idx], " dimensions and \"",
sample.filename, "\" with ", sample.data[output_idx].shape().sample_dim(),
" dimensions on ouput with idx = ", output_idx));

DALI_ENFORCE(
sample.data[output_idx].type() == output_dtypes[output_idx],
make_string("Inconsistent data: All samples in the batch must have the same data type "
"for outputs with the same idx. "
"Got \"",
sample_0.filename, "\" with data type ", output_dtypes[output_idx],
" and \"", sample.filename, "\" with data type ",
sample.data[output_idx].type()));
Member

👍

@stiepan
Member

stiepan commented Mar 24, 2023

It looks good, it seems that it works, great job!

As a follow-up, please keep in mind:

  1. A few test cases on incorrect inputs would be helpful.
  2. Enforce the dtypes if we accept them.
  3. There is one status of a fits call that is never validated (it seems to be a special case where the status is not returned as an output; if the call cannot fail according to the cfitsio docs, please ignore this remark).

@klecki changed the title from "Reader and loader for fits files on CPU backend" to "Add experimental FITS reader for CPU backend" on Mar 24, 2023
@klecki merged commit a4f7d59 into NVIDIA:main on Mar 24, 2023