Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SVCS-133] Reject complicated self-overwrites #270

Conversation

AddisonSchiller
Copy link
Contributor

@AddisonSchiller AddisonSchiller commented Sep 22, 2017

refs: https://openscience.atlassian.net/browse/SVCS-133

Purpose

As per the instructions on the linked ticket, it is possible to silently cause files to delete themselves during intra_copy/moves. This ticket is to fix this issue on the providers it affects.
Effected providers:
*Box
*Dropbox
*GoogleDrive
*Owncloud

Summary of Changes

Added Code to check the full path, or identifiers of the source and destination. If they overlap (are the same) the move/copy is rejected with a 409 and an error message that "files cannot overwrite themselves"

Added tests to the affected providers.
Corrected tests that were improperly erring due to the addition.

Tests and QA Notes

There are testing notes on the ticket for how to properly recreate the issue.
For the 4 providers listed, copy/move with replace/keep both and rename should all be tested extensively. They should be tested with and without the setup for the bug in place

Light testing on other addons/providers should also be done.

@coveralls
Copy link

coveralls commented Sep 22, 2017

Coverage Status

Coverage increased (+0.09%) to 82.67% when pulling 6faec8d on AddisonSchiller:feature/reject-complicated-self-overwrites into 0621297 on CenterForOpenScience:develop.

@AddisonSchiller AddisonSchiller changed the title Reject complicated self-overwrites [SVCS-133] Reject complicated self-overwrites Oct 5, 2017
@coveralls
Copy link

coveralls commented Oct 31, 2017

Coverage Status

Coverage increased (+0.03%) to 89.229% when pulling 57d7577 on AddisonSchiller:feature/reject-complicated-self-overwrites into cc68aca on CenterForOpenScience:develop.

@coveralls
Copy link

coveralls commented Nov 17, 2017

Coverage Status

Coverage increased (+0.02%) to 89.965% when pulling 78d1974 on AddisonSchiller:feature/reject-complicated-self-overwrites into 26bf209 on CenterForOpenScience:develop.

Copy link
Member

@TomBaxter TomBaxter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran into an issue while testing that I think will require a significant change. If I do a move with a conflict=keep. Then the original file is deleted, contrary to the meaning of "keep". In order to catch this, the test for sameness would need to occur before the call to handle_naming in core.provider.move . With that in mind, can we make the if src_path.* == dest_path.*: tests in each provider be it's own method.. So that it can be called from core.provider? Or better yet, can it be generalized for all providers in a method inside of core.provider?

With that in hand, we can catch before getting to handle_naming

If same path and copy/keep: let it go, or error
If same path and copy/replace: error
If same path and move/keep: switch it to copy/keep, or error
If same path and move/replace: error

@@ -1,4 +1,5 @@
import aiohttp
from http import HTTPStatus
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think http goes in a section to itself at the top for "Standard Library" libs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Fixing an issue where when a child component is hooked up
To a project, and they share the same provider, it is
possible to silently delete files
@AddisonSchiller AddisonSchiller force-pushed the feature/reject-complicated-self-overwrites branch from 950f3d9 to 5db8f08 Compare November 22, 2017 16:11
@AddisonSchiller
Copy link
Contributor Author

CRR completed:
The whole code has basically been refactored for increased functionality. Old tests removed and new tests added.
This change can now possibly effect other providers/addons, and they should also be tested a bit as well. See new QA notes.

@coveralls
Copy link

coveralls commented Nov 22, 2017

Coverage Status

Coverage increased (+0.02%) to 89.961% when pulling 5db8f08 on AddisonSchiller:feature/reject-complicated-self-overwrites into 26bf209 on CenterForOpenScience:develop.

Copy link
Member

@TomBaxter TomBaxter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Three comments regarding the location of the override check. I think it needs to occur regardless of whether handle_naming is True.
Tried to address the confusion you mentioned in one of your comments.
Suggestion for adding to one of your doc strings.

# In handle_naming so we don't run it multiple times
# This does not mean the `dest_path` is a folder, at this point it could just be the parent
# of the actual dest_path, or have a blank name
if not dest_path.is_file:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if this shouldn't be checked regardless of whether or not handle_naming is True.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done for all 3 👍

# In handle_naming so we don't run it multiple times
# This does not mean the `dest_path` is a folder, at this point it could just be the parent
# of the actual dest_path, or have a blank name
if not dest_path.is_file:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if this shouldn't be checked regardless of whether or not handle_naming is True.

@@ -269,7 +281,29 @@ def request(self, *args, **kwargs):
'conflict': conflict,
'got_rename': rename is not None,
})

if not dest_path.is_file:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like the right spot to me, Does this need to be duplicated in if handle_naming:?

if src_path.is_dir and dest_path.is_file:
# Cant copy a directory to a file
raise ValueError('Destination must be a directory if the source is')

# This is confusing. `dest_path` at this point can refer to the parent or root of
# the file we want to move/copy. So even if moving/copying a file, this code will run
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can think of at least one case where dest_path.is_file will be True. In the case of overwriting an existing file with new content. Something that cannot be done through osf.io but can be directly through the API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This made me realize that I should do some more testing on that sort of situation. All looked good though.

""" Return wether a move or copy operation will result in a self-overwrite.

.. note::
Defaults to False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a comment, noting that this is meant to be overridden by providers that need to check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

complete

rename or src_path.name,
folder=src_path.is_dir
)
if self.will_self_overwrite(dest_provider, src_path, temp_path):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider if this isn't it's own method but just src_path.NAME == dest_provider.NAME and dest_path.full_path == src_path.full_path or some other general comparison like src_path.identifier == dest_path.identifier you avoid having to modify individual providers. Some combination of identifier, name and path comparisons should probably stop this in the handler before it gets to the provider.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that we don't know if there will be a conflict before it gets to the provider. dest_path is generic sometimes when it hits move or copy, and we do not have an idea of what the real path will be until we run revalidate_path on the dest_path with src_path.name or rename.

The reason it was done as a new function in the providers is that many providers do not suffer from this bug, and running a general check on those could return True and reject a perfectly valid operation.

This is mostly due to the nature of the bug, where often the file is
src_path = "/folder1/folder2/file.txt"
and
dest_path = "/folder2/file.txt"

While they refer to the same file, oftentimes they will not look like the same file due to which folder a provider is attached to.

@Johnetordoff
Copy link
Contributor

@AddisonSchiller commits messages should have some indication of the changes being made and should generally be written in the imperative.

@AddisonSchiller
Copy link
Contributor Author

@Johnetordoff Its going to get rebased in a little bit with a different message?

@Johnetordoff
Copy link
Contributor

Well remember that for the future, it doesn't have to be elaborate just vaguely searchable. People might pick this up before you can rebase it.

@AddisonSchiller
Copy link
Contributor Author

@Johnetordoff Next time a message like this is better sent over flow dock.

@coveralls
Copy link

coveralls commented Nov 29, 2017

Coverage Status

Coverage increased (+0.03%) to 89.97% when pulling ba7d01a on AddisonSchiller:feature/reject-complicated-self-overwrites into 26bf209 on CenterForOpenScience:develop.

@felliott
Copy link
Member

@TomBaxter is taking this ticket over. There seems to be a lot of overlap between this and #274 so they may end up getting merged into one super-PR. Leaving this open for now for reference until a final PR is merged.

@NyanHelsing
Copy link
Contributor

New PR #341

Copy link
Contributor

@cslzchen cslzchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR replaced and closed 🚫

@cslzchen cslzchen closed this May 3, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants