Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Root slashes in resource data prevents complete file handling #181

Open
dylanmcreynolds opened this issue Aug 11, 2020 · 9 comments
Open

Comments

@dylanmcreynolds
Copy link
Contributor

dylanmcreynolds commented Aug 11, 2020

This is a little bit of pilot error, but it was easy to do and took a long time to debug.

I'm writing an ingestor to import rsoxs scattering data into databroker. When I compose the resource, I accidentally placed a forward slash in front of my resource_path below:

        ccd_resource = run_bundle.compose_resource(
            spec='FITS',
            root=images_root,
            resource_path="/foo/bar.fits",
            resource_kwargs={})

This prevents the file from being found, but os.path.lib here ignores the first parameter if it sees the forward slash in the second parameter:

resource_path = os.path.join(root, resource_path)

Should this be documented? Should there be check for a forward slash in resource_path and remove it? I dunno. But even if we don't do anything this issue might help someone in the future.

@danielballan
Copy link
Member

I tripped over this as well at some point this year but failed to open an issue. Thanks for flagging this. I think this should be prevented in document validation, and fixed retroactively on old data via a migration.

@dylanmcreynolds
Copy link
Contributor Author

dylanmcreynolds commented Aug 12, 2020

I think this should be prevented in document validation, and fixed retroactively on old data via a migration.

Ah, yes, that's a better strategy.

@dylanmcreynolds
Copy link
Contributor Author

Thinking more on this, I see your to you notion that get_handler not change the intent of what was put into the resource document, especially since success or failure depends on configuration in the future. On the other hand, here's a case that we know will never work.... a root was configured and found in a root_map, and a forward slash means it will never work.

How about a strongly-worded log message? That alone could have saved me a lot of time.

@danielballan
Copy link
Member

danielballan commented Aug 13, 2020

In that case should we just raise a clear exception? The application can catch it and log it if it wants to, but at the library layer my impulse is to fail informatively rather than log and then fail cryptically.

@dylanmcreynolds
Copy link
Contributor Author

I would totally agree with that but I'm a little nervous. Will that cause innocent queries on the run that don't need to access the image to fail just because someone screwed up the file mapping?

@danielballan
Copy link
Member

Fair point. I guess the order of operations then should be:

  1. Add log message as you suggest.
  2. Make document-creation functions pickier to stop generated problematic Resources.
  3. Fix historical documents.
  4. Change that log message to an error message.

Sound right?

@danielballan
Copy link
Member

Actually, I might have a slightly better idea. When we attempt the create the handler

# Apply root_map.
resource_path = resource['resource_path']
original_root = resource.get('root', '')
root = self.root_map.get(original_root, original_root)
if root:
resource_path = os.path.join(root, resource_path)
msg = (f"Error instantiating handler "
f"class {handler_class} "
f"with Resource document {resource}. ")
if root != original_root:
msg += (f"Its 'root' field was "
f"mapped from {original_root} to {root} by root_map.")
else:
msg += (f"Its 'root' field {original_root} was "
f"*not* modified by root_map.")
error_to_raise = EventModelError(msg)
handler = _attempt_with_retries(
func=handler_class,
args=(resource_path,),
kwargs=resource['resource_kwargs'],
intervals=[0] + self.retry_intervals,
error_to_catch=IOError,
error_to_raise=error_to_raise)
return handler

if we find ourselves about to error out and die anyway, can we at that point check whether the original resource_path was absolute, and raise a specific error message?

That's a bit complex, but I don't like the situation where you get a non-specific exception raises and the useful information is logged earlier, somewhere else, so it maybe worth going to these lengths to avoid that.

@dylanmcreynolds
Copy link
Contributor Author

I just deleted a whole reply because now I guess I understand error_to_raise better. So, in my situation, I raised an error_to_raise, and you're proposing to add text to that? Sure, seems right.

@danielballan
Copy link
Member

Yeah, to be precise I might even add a custom exception type rather than adding text so it can be caught/handled specifically if need be---an AbsoluteResourcePath exception class or something.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants