Conversation
Two comments about the code snippet:
I believe it is raised in the… These comments are not related to your PR though; they just happen to show up in the code snippet.
@rouille I'm familiar with that warning; it's not very hard to track down and fix.
I think the warning can be fixed by changing L.106 of transform_profile.py from: … to: …
Nice, that works - no warning when I re-ran the same test. |
```python
filename = [os.path.basename(line.rstrip()) for line in stdout.readlines()]
version = [f[f.rfind("_") + 1 : -4] for f in filename]
return version
```

```python
resp = requests.get(f"{BASE_URL}/version.json")
```
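The slice `f[f.rfind("_") + 1 : -4]` in the snippet above pulls the version string out of a `{kind}_{version}.csv` filename. A minimal standalone illustration (the file paths are made up for demonstration):

```python
import os

# Hypothetical profile file paths following the {kind}_{version}.csv pattern
paths = ["/data/raw/usa_tamu/demand_v4.csv", "/data/raw/usa_tamu/wind_vJan2021.csv"]

filename = [os.path.basename(p) for p in paths]
# Keep everything between the last underscore and the ".csv" suffix
version = [f[f.rfind("_") + 1 : -4] for f in filename]
print(version)  # ['v4', 'vJan2021']
```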
Do we want users to be able to use their own profiles that only exist locally? I'm thinking that in the previous Docker implementation, users could just add a new profile (like a new demand profile to test) to the ScenarioData directory and be able to use it, whereas now they'll either need to add it to our blob storage or create their own.
I think we are just downloading a JSON file listing the profile versions here. Otherwise, the logic that we had before is still in place, i.e., we only download if there is no local copy. Also, in plug, I would say users can have their own profiles and use them in the various scripts via the set_base_profile method.
What would get added to the ScenarioList if the user brought their own profile? "untracked"?
Do we want users to be able to use their own profiles that only exist locally?
Good point, there isn't a natural way to use a custom profile if we are strictly checking the version.json in blob storage. I think we'd need to do something like cache that file locally so it can be edited, or check both a local copy and the one in blob storage, etc. The profiles can still be put in the ScenarioData folder locally, but they would need a version that is considered valid in order to actually be used.
What would get added to the ScenarioList if the user brought their own profile? "untracked"?
Yeah it would get added to the user's ScenarioList. So their environment would be consistent, but couldn't be easily "merged" with another one (e.g. our server environment). I was going to suggest something like adding full references to files in the scenario list, but can't do much if the sources aren't public, e.g. in some other blob storage. Not sure if this is an issue at this point?
Since DataAccess differentiates between configurations, we could add something like data_access.get_local_profiles(kind) that returns nothing for SSHDataAccess but returns any additional profiles for LocalDataAccess. We could call and append the results in InputData.
Depending on how fancy we want to get, we could also pass in the profiles already found in blob storage to both a) mark which profiles have been cached if a user wants to be careful of their internet usage and b) filter out only the extra profiles not in blob storage.
Cool, I ended up thinking the same thing (my other ideas didn't pan out). Not sure I understand the part about passing in the profiles though?
I was first thinking of passing in the profiles like data_access.get_local_profiles(kind, blob_profiles), but looking at it now, there's no reason we can't do that filtering/marking at the InputData level instead of at the DataAccess level.
Actually, sorry, I remembered what I was thinking. You would want to pass in the blob profiles so that in the SSHDataAccess instance, you could return the locally cached profiles but exclude anything not in the blob profiles list. In the LocalDataAccess instance, you would return both cached and local-only profiles.
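A sketch of that design, assuming hypothetical names throughout (only `get_local_profiles` comes from the discussion; the cache lookup is a stand-in, not the repository's actual code):

```python
def _cached_versions(kind):
    # Stand-in for whatever reads profile versions cached on local disk
    return ["v1", "v2", "local-only"]

class SSHDataAccess:
    def get_local_profiles(self, kind, blob_profiles):
        # Only report cached copies that blob storage also knows about
        return [v for v in _cached_versions(kind) if v in blob_profiles]

class LocalDataAccess:
    def get_local_profiles(self, kind, blob_profiles):
        # Report both cached copies and purely local profiles
        return _cached_versions(kind)

blob = ["v1", "v2", "v3"]
print(SSHDataAccess().get_local_profiles("demand", blob))    # ['v1', 'v2']
print(LocalDataAccess().get_local_profiles("demand", blob))  # ['v1', 'v2', 'local-only']
```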
Ah thanks, that makes sense. Let me push a commit with how I'm thinking of implementing that.
I think with 1f9291f, we have the best of both worlds. Did you have a chance to test it with both the client/server installation and the containers?
So far only a quick test in the container, before and after adding a local version.json. That worked as expected, and the client/server behavior should be a subset of that (only checks blob storage). Planning to do a bit more, and look into adding unit tests, which was the original reason for creating the…
```python
raise NotImplementedError
```

```python
def get_profile_version(self, grid_model, kind):
    return ProfileHelper.get_profile_version_cloud(grid_model, kind)
```
We implicitly return the versions stored on the cloud. I see how it is useful, but is that intuitive in the DataAccess class? Should we instead return an instance of ProfileHelper and call the functions in the child classes?
I was thinking since blob storage is the source of truth, it makes sense as the default. Another way would be to raise NotImplemented here and have the child classes contain their own implementations (SSHDataAccess is just inheriting this, while LocalDataAccess appends the local versions if there are any). Would that be more intuitive?
You are right. It makes sense it is the default.
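The "cloud as the default" pattern being agreed on above could be sketched like this; the helper body and the local-version lookup are placeholders, not the repository's code:

```python
class ProfileHelper:
    @staticmethod
    def get_profile_version_cloud(grid_model, kind):
        return ["v1", "v2"]  # placeholder: would really query blob storage

class DataAccess:
    def get_profile_version(self, grid_model, kind):
        # Blob storage is the source of truth, so it is the default
        return ProfileHelper.get_profile_version_cloud(grid_model, kind)

class SSHDataAccess(DataAccess):
    pass  # simply inherits the cloud behavior

class LocalDataAccess(DataAccess):
    def get_profile_version(self, grid_model, kind):
        cloud = super().get_profile_version(grid_model, kind)
        local = ["custom"]  # placeholder: would really read a local version.json
        return cloud + [v for v in local if v not in cloud]

print(LocalDataAccess().get_profile_version("usa_tamu", "demand"))  # ['v1', 'v2', 'custom']
```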
```python
version = scenario_info["base_" + field_name]
file_name = field_name + "_" + version + ".csv"
grid_model = scenario_info["grid_model"]
from_dir = f"raw/{grid_model}"
```
Does this forward slash cause issues for Windows users when saving to a local directory (like in line 25)?
Good call, I should probably do an os.path.join here
I should probably get this on my personal Windows machine to test, but I think we might run into problems calling blob storage then. We're using the same from_dir for both the URL and the local directory in the download_file method.
D'oh, will fix. We can look into automating windows tests, maybe in github actions or using windows containers.
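The hazard here is that `os.path.join` produces backslashes on Windows, which would corrupt the blob URL if the same joined string is reused for both purposes. One way to sketch a fix, with a placeholder `BASE_URL` and local root (both assumptions, not the project's constants), is to keep URL paths POSIX-style and only use `os.path.join` for the local copy:

```python
import os
import posixpath

BASE_URL = "https://example.blob.core.windows.net/profiles"  # placeholder URL
LOCAL_DIR = "ScenarioData"  # placeholder local root

def paths_for(grid_model, file_name):
    # URLs always use forward slashes, regardless of platform
    url_path = posixpath.join("raw", grid_model, file_name)
    # Local paths use the platform separator (backslash on Windows)
    local_path = os.path.join(LOCAL_DIR, "raw", grid_model, file_name)
    return f"{BASE_URL}/{url_path}", local_path

url, local = paths_for("usa_tamu", "demand_v4.csv")
print(url)  # https://example.blob.core.windows.net/profiles/raw/usa_tamu/demand_v4.csv
```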
```python
:return: (*list*) -- available profile version.
"""
version_file = os.path.join(server_setup.LOCAL_DIR, "version.json")
```
Would it be easier just to list the files in the directory and filter out the ones that match the {kind}_{version}.csv format? Then again, making a user add a new profile to version.json is more intentional, so they're less likely to accidentally add something without understanding the risks.
Yeah, this was kind of a trade-off - it's less code to reuse the JSON format, and it provides at least one way for a user to customize. Figured it's OK for now, but definitely open to future improvements.
Makes sense! We could also probably get some feedback from our users/collaborators to see what they think about usability. But agreed, this looks good for now.
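For contrast, the directory-listing alternative mentioned above could look like the following regex filter; the function name, pattern, and sample files are illustrative, not the project's code:

```python
import os
import re
import tempfile

def scan_local_versions(directory, kind):
    # Match files named {kind}_{version}.csv and capture the version part
    pattern = re.compile(rf"^{re.escape(kind)}_(.+)\.csv$")
    versions = []
    for name in sorted(os.listdir(directory)):
        m = pattern.match(name)
        if m:
            versions.append(m.group(1))
    return versions

# Demonstrate on a throwaway directory with fake profile files
with tempfile.TemporaryDirectory() as d:
    for f in ["demand_v4.csv", "demand_vJan2021.csv", "hydro_v4.csv"]:
        open(os.path.join(d, f), "w").close()
    result = scan_local_versions(d, "demand")
print(result)  # ['v4', 'vJan2021']
```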
rouille left a comment
Thanks. This is a great feature.
Purpose
Use blob storage as the source for profiles, downloading as needed.
What the code is doing
Added a version.json file to support listing available versions. The profiles are in the raw/usa_tamu folder in the blob container to mimic the structure elsewhere. Similar to before, we download if a profile doesn't exist locally.
Testing
Deleted local copies of the profiles and ran a simulation using plug, which shows the files being downloaded (during the call to prepare_simulation_input).
Usage Example/Visuals
Example query (no change in usage, other than switching to static method)
Snippet of files being downloaded
Time estimate
20 mins