Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clear downloads #228

Merged
merged 5 commits into from
Jan 29, 2019
Merged

Clear downloads #228

merged 5 commits into from
Jan 29, 2019

Conversation

reginafcompton
Copy link
Contributor

@reginafcompton reginafcompton commented Jan 28, 2019

This PR implements an option for clearing the contents of the downloads directory.

It handles several related issues, mainly: datamade/nyc-council-councilmatic#155 and #226

I emulated the pattern used for import_only and download_only, because I wanted to preserve the option of NOT clearing the downloads. Why? Sometimes, one must run a big import, and preserving the contents of the downloads for some interval of time can be helpful.

Does this make sense? If so, I can add the pattern to people, bills, and events (or think about refactoring for DRYer code).

@evz and @hancush - could you take a look at this before I go any further?

@reginafcompton reginafcompton changed the title Implement pattern for deleting contents of downloads Clear downloads Jan 28, 2019
Copy link
Contributor

@evz evz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a sane pattern. Just curious about how far you are taking this. Do you plan to just make it a global "remove everything" or do you think you'd like to offer the option of just removing specific folders?

@@ -204,6 +212,9 @@ def organizations_etl(self,
self.add_new_organizations()
self.add_new_posts()

if clear_downloads:
shutil.rmtree(self.organizations_folder)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reginafcompton Are you planning to add this to the other download folders, too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to this question, otherwise looks regular to me!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, i just saw your comment in the description! fire away with the other folders.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all right - the way it should work: by default, import_data removes all files from all download directories. @evz - I think we should go folder-by-folder, rather than do a "global" remove-everything-everytime, in the event that an import fails and I need the downloads for debugging (or I simply do not want to wait for the script to download everything again).

@hancush - ready aim fire

@reginafcompton
Copy link
Contributor Author

reginafcompton commented Jan 29, 2019

I diverged slightly from my original solution. I implemented an optional keep_downloads argument.

The import for people expects downloaded data about organizations. (N.b., This has always been the case, but we did not have safeguards in place for handling the people endpoints in isolation). So, rather than deleting the downloads, folder-by-folder, as the import happens, the script just deletes everything at the end.

If someone is running a big import and wants to keep everything, then they can do that with the keep_downloads option.

Copy link
Contributor

@evz evz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants