Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 11 additions & 4 deletions pipeline/mrms_download.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,16 +5,23 @@
from urllib.request import urlopen


def download(inventory, dest_dir, max_download=4):
"""Given an inventory file for a particular day, and a destination
directory, download all files in the inventory that are not
def download(inventory, dest_dir = os.path.join(os.getcwd(),'Data'), max_download=4):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our data is already in a lower-case data directory. Should we change that name (and all current workflows) or should we change this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change this, The object of this was to just add the data directory as the default

"""Given an inventory file (or dataframe as returned by inventory)
for a particular day, and a destination directory (default to current
Comment on lines +9 to +10
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah... so you are trying to make this function polymorphic.

That makes the type of inventory variable. The rationale for the old design was that we always wanted the inventory in a file because that allows one inventory fetch to cause many file downloads (for backfill).

Are you saying that you want to merge the inventory and download steps?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just wanted to have it be more flexible, I have been effectively merging the inventory and download steps while I was playing around with it. I didn't think what I added limited the ability for one inventory fetch to have many downloads to a file.

directory\Data), download all files in the inventory that are not
already in the destination directory with the same size.

While at it, don't download more than `max_download` files in this
call.
"""

try:
inv_df = feather.read_feather(inventory)
except:
feather.write_feather(inventory, 'temp.feather')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line confuses me. I thought that inventory holds a file name and inv_df contains the (possibly empty) data structure. Here, you seem to be writing the file name to temp.feather

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I first ran the download I tried to pass some non-"feathered" data, This was to check if it was feathered and if not feather it. This just made for a smoother transition so don't have to make sure the feathering is in your script, it will do it for you.

inv_df = feather.read_feather('temp.feather')
os.remove('temp.feather')

inv_df = feather.read_feather(inventory)
downloads = 0

if not os.path.exists(dest_dir):
Expand Down