-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clear downloads #228
Clear downloads #228
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a sane pattern. Just curious about how far you are taking this. Do you plan to just make it a global "remove everything" or do you think you'd like to offer the option of just removing specific folders?
@@ -204,6 +212,9 @@ def organizations_etl(self, | |||
self.add_new_organizations() | |||
self.add_new_posts() | |||
|
|||
if clear_downloads: | |||
shutil.rmtree(self.organizations_folder) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@reginafcompton Are you planning to add this to the other download folders, too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to this question, otherwise looks regular to me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, i just saw your comment in the description! fire away with the other folders.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all right - the way it should work: by default, import_data
removes all files from all download directories. @evz - I think we should go folder-by-folder, rather than do a "global" remove-everything-everytime, in the event that an import fails and I need the downloads for debugging (or I simply do not want to wait for the script to download everything again).
@hancush - ready aim fire
I diverged slightly from my original solution. I implemented an optional The import for people expects downloaded data about organizations. (N.b., This has always been the case, but we did not have safeguards in place for handling the people endpoints in isolation). So, rather than deleting the downloads, folder-by-folder, as the import happens, the script just deletes everything at the end. If someone is running a big import and wants to keep everything, then they can do that with the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
This PR implements an option for clearing the contents of the downloads directory.
It handles several related issues, mainly: datamade/nyc-council-councilmatic#155 and #226
I emulated the pattern used for
import_only
anddownload_only
, because I wanted to preserve the option of NOT clearing the downloads. Why? Sometimes, one must run a big import, and preserving the contents of the downloads for some interval of time can be helpful.Does this make sense? If so, I can add the pattern to people, bills, and events (or think about refactoring for DRYer code).
@evz and @hancush - could you take a look at this before I go any further?