-
Notifications
You must be signed in to change notification settings - Fork 3
First ever pull request #9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,16 +5,23 @@ | |
from urllib.request import urlopen | ||
|
||
|
||
def download(inventory, dest_dir, max_download=4): | ||
"""Given an inventory file for a particular day, and a destination | ||
directory, download all files in the inventory that are not | ||
def download(inventory, dest_dir = os.path.join(os.getcwd(),'Data'), max_download=4): | ||
"""Given an inventory file (or dataframe as returned by inventory) | ||
for a particular day, and a destination directory (default to current | ||
Comment on lines
+9
to
+10
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah... so you are trying to make this function polymorphic. That makes the type of inventory variable. The rationale for the old design was that we always wanted the inventory in a file because that allows one inventory fetch to cause many file downloads (for backfill). Are you saying that you want to merge the inventory and download steps? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I just wanted to have it be more flexible, I have been effectively merging the inventory and download steps while I was playing around with it. I didn't think what I added limited the ability for one inventory fetch to have many downloads to a file. |
||
directory\Data), download all files in the inventory that are not | ||
already in the destination directory with the same size. | ||
|
||
While at it, don't download more than `max_download` files in this | ||
call. | ||
""" | ||
|
||
try: | ||
inv_df = feather.read_feather(inventory) | ||
except: | ||
feather.write_feather(inventory, 'temp.feather') | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This line confuses me. I thought that inventory holds a file name and inv_df contains the (possibly empty) data structure. Here, you seem to be writing the file name to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. When I first ran the download I tried to pass some non-"feathered" data, This was to check if it was feathered and if not feather it. This just made for a smoother transition so don't have to make sure the feathering is in your script, it will do it for you. |
||
inv_df = feather.read_feather('temp.feather') | ||
os.remove('temp.feather') | ||
|
||
inv_df = feather.read_feather(inventory) | ||
downloads = 0 | ||
|
||
if not os.path.exists(dest_dir): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our data is already in a lower-case
data
directory. Should we change that name (and all current workflows) or should we change this?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change this, The object of this was to just add the data directory as the default