Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

copy data from frontend to frontend #351

Merged
merged 7 commits into from Nov 20, 2020
Merged

copy data from frontend to frontend #351

merged 7 commits into from Nov 20, 2020

Conversation

JoranAngevaare
Copy link
Member

@JoranAngevaare JoranAngevaare commented Nov 17, 2020

What is the problem / what does the code in this PR do
Some frontends are harder to interface with than others. Using this new function, one is able to copy data from one frontend to another. This is especially useful in at least these two concrete examples:

  1. Querying the database is in Mongo.py is expensive. Better save it to disk as that is quicker and prevents people from overloading the database.
  2. The mongo backend is a nightmare to use outside of dali, it's much easier to transfer when backcasted to a normal DataDirectory frontend.

Can you briefly describe how it works?
Strax magic! I wrote it such that it can be used either trying to write to all the other frontends or to a specific one (

Can you give a minimal working example (or illustrate with a figure)?

for sf in st.storage[1:]:
    print(sf)
    print(st._is_stored_in_sf(run_id, target, sf))
st._copy_to_frontend(run_id, target, target_frontend_id=3)
for sf in st.storage[1:]:
    print(sf)
    print(st._is_stored_in_sf(run_id, target, sf))
## output
strax.storage.files.DataDirectory, path: /data/real_data/
True
strax.storage.files.DataDirectory, path: /data/_tests8/
False
strax.storage.files.DataDirectory, path: /data/_tests9/
False
# After copy
strax.storage.files.DataDirectory, path: /data/real_data/
True
strax.storage.files.DataDirectory, path: /data/_tests8/
False
strax.storage.files.DataDirectory, path: /data/_tests9/
True

Future requests
One might think that we may just as well then erase the original copy. This is is however not per se trivial in strax (yet), I might think about if we need this at some point.

strax/context.py Outdated
@@ -1361,6 +1358,106 @@ def _apply_function(self, data, targets):
data = function(data, targets)
return data

def _copy_to_frontend(self, run_id, target,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this should not be _-ed. If the reviewer has an opinion I'm interested

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the one hand I agree with you that this function should just be used by a very limited amount of people who know what they are doing and hence make the function not "visible". On the other hand I think it is very unlikely that anyone will use this function except for the experts. My/our concerns are very limited, and the underscore usually indicates that the function is a private member, I agree with your statement to remove it.

Copy link
Collaborator

@WenzDaniel WenzDaniel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look good. Thanks a lot.

strax/context.py Outdated
raise strax.DataNotAvailable(f'Cannot copy {run_id} {target} since it '
f'does not exist')
if len(strax.to_str_tuple(target)) > 1:
raise ValueError('_copy_to_frontend only works for ')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The message is missing something

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoop, thanks will add

target_sf = self.storage
elif len(self.storage) > target_frontend_id:
# only write to selected other frontend
target_sf = [self.storage[target_frontend_id]]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if it would not be a bit neater to add a fornt_end_id to the properties of every forntned. In that case one could precisely distinguish between frontends. However, this would also require some additional checks, e.g. that the same id is not specified twice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the gain? Now it's just the index in st.storage which is a list. Using a number is quite simple.

Could try to match the __repr__ or the input but that sounds more convoluted. Could do though because you than don't by accident fill the wrong sf.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I think it is not really needed and I am fine with the PR as it is.

Could do though because you than don't by accident fill the wrong sf.

This was also my concern, but I am not sure if the "risk" justifies the extra work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, I'm not that sure if it makes sense. the repr is quite a long string and just referring to the class name will also not be sufficient.

There is not that much real risk as we are in the end checking if the sf is susceptible at all anyways

strax/context.py Outdated
@@ -1361,6 +1358,106 @@ def _apply_function(self, data, targets):
data = function(data, targets)
return data

def _copy_to_frontend(self, run_id, target,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the one hand I agree with you that this function should just be used by a very limited amount of people who know what they are doing and hence make the function not "visible". On the other hand I think it is very unlikely that anyone will use this function except for the experts. My/our concerns are very limited, and the underscore usually indicates that the function is a private member, I agree with your statement to remove it.

Copy link
Contributor

@darrylmasson darrylmasson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Daniel beat me to a few comments

for sf in self.storage:
try:
sf.find(key, **self._find_options)
if self._is_stored_in_sf(run_id, target, sf):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A possible more compact alternate could be return any(map(partial(self._is_stored_in_sf, run_id, target), self.storage)), but this boolean-optimizes (as it were).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about something like this but I decided against it. I wrote it like this as it returns as soon as a single one is true, you get the True flag. If one writes a SF that takes long to see if something is stored, you may safe some time.

if source_sf is None and raise_error:
raise ValueError('This cannot happen, we just checked that this '
'run should be stored?!?')

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing return?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh my... thanks

strax/context.py Outdated
# Keep frontends that:
# 1. already have the data; and
# 2. take the data; and
# 3. are readonly
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"not"

@JoranAngevaare JoranAngevaare merged commit 9bbe2e1 into master Nov 20, 2020
@JoranAngevaare JoranAngevaare deleted the copy_to_sf branch November 20, 2020 20:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants