Skip to content

Create an abstraction for the SSH connection to the data store#330

Merged
danielolsen merged 4 commits intodevelopfrom
daniel/abstract_ssh
Nov 13, 2020
Merged

Create an abstraction for the SSH connection to the data store#330
danielolsen merged 4 commits intodevelopfrom
daniel/abstract_ssh

Conversation

@danielolsen
Copy link
Copy Markdown
Contributor

@danielolsen danielolsen commented Nov 12, 2020

Purpose

Create an abstraction for data transfer between userspace and a central 'data store'. Previously, we had one data store on a remote server, which we interact with via a SSH connection, but we want to abstract this to be able to easily handle either a remote data store or a local one. See https://github.com/Breakthrough-Energy/RenewableEnergyProject/issues/318

What the code is doing

  • In powersimdata/utility/transfer_data.py: creating a new abstract DataAccess class and a SSHDataAccess class that implements the methods needed to interact with the server.
  • In powersimdata/scenario/scenario.py, refactoring the Scenario object to init with a SSHDataAccess object, rather than a direct SSH connection. This sets us up for a logic branch in the init to use either the SSHDataAccess object or the planned local equivalent, with the method of controlling which one gets created TBD (environment variable, configuration file, etc).
  • Refactoring our existing objects to accept a DataAccess object and use its methods rather than accepting an SSH connection object.
    • powersimdata/input/input_data.py
    • powersimdata/input/transform_profile.py
    • powersimdata/output/output_data.py
    • powersimdata/scenario/analyze.py
    • powersimdata/scenario/create.py
    • powersimdata/scenario/delete.py
    • powersimdata/scenario/execute.py
    • powersimdata/scenario/move.py
    • powersimdata/scenario/state.py
  • Updating our tests to use the new objects/methods:
    • powersimdata/input/tests/test_transform_profile.py
    • powersimdata/utility/tests/test_transfer_data.py

Testing

Existing unit tests pass when calling python -m pytest -m "not db" (this includes "integration"-marked tests). End-to-end tests will be performed once we merge/rebase the changes to the ScenarioListManager and ExecuteListManager into this branch.

Time estimate

Half an hour. The changes are mostly small, but they are in a lot of different places. We may also want to re-organize things, since much of what's currently in powersimdata/utility could maybe fit more logically in powersimdata/data_access.

Comment thread powersimdata/scenario/delete.py Outdated
Comment thread powersimdata/utility/transfer_data.py Outdated
Comment thread powersimdata/scenario/delete.py Outdated
Comment thread powersimdata/scenario/delete.py Outdated
Comment thread powersimdata/scenario/execute.py Outdated
Comment thread powersimdata/scenario/move.py Outdated
Comment thread powersimdata/scenario/move.py Outdated
Comment thread powersimdata/scenario/move.py Outdated
@danielolsen danielolsen force-pushed the daniel/abstract_ssh branch 4 times, most recently from 2cb6b63 to 75ebf14 Compare November 12, 2020 23:28
@danielolsen danielolsen marked this pull request as ready for review November 12, 2020 23:37
@danielolsen
Copy link
Copy Markdown
Contributor Author

I have tested this end-to-end. Scenarios can be created, prepared, launched, extracted, and deleted successfully.

Copy link
Copy Markdown
Collaborator

@rouille rouille left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks.

Copy link
Copy Markdown
Collaborator

@BainanXia BainanXia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An integration test as @danielolsen did on my end has been passed. One minor thing to mention: with the current implementation, every time we check status of a scenario, a copy_from of the file 'ExecuteList.csv' is ran. The logic behind this is here as pointed out by @danielolsen . One concern is if the file grows big in the future, we may allow the user to choose whether download a fresh copy of the most up-to-date version of the table or simply check the status of a specific entry from the table on the server that the user is interested in at that time. This is definitely out of the scope of this PR.

@danielolsen
Copy link
Copy Markdown
Contributor Author

danielolsen commented Nov 13, 2020

Another dream feature: a simple interface to clear the ScenarioData folder. When analyzing lots of scenarios, the size of that folder grows really fast. Much faster than a CSV file ;)

@rouille
Copy link
Copy Markdown
Collaborator

rouille commented Nov 13, 2020

Another dream feature: a simple interface to clear the ScenarioData folder.

We can have in your object a clean_data method that uses the os library to rm -rf in ScenarioData

Then: s.data_access.clean_data()

@dmuldrew
Copy link
Copy Markdown
Collaborator

dmuldrew commented Nov 13, 2020

Outside of a docker container, I'd be concerned about the rm -rf deleting unintended files due to a bug, user error, etc. Unless we deliberately hide our cache folder, outside users might add other files to the ScenarioData folder as well.

@danielolsen
Copy link
Copy Markdown
Contributor Author

@rouille @dmuldrew check out the latest commit. I think I implemented this in a pretty foolproof way.

from powersimdata.utility.transfer_data import DataAccess
da = DataAccess()
da.clear_local_cache()

@rouille
Copy link
Copy Markdown
Collaborator

rouille commented Nov 13, 2020

@rouille @dmuldrew check out the latest commit. I think I implemented this in a pretty foolproof way.

from powersimdata.utility.transfer_data import DataAccess

da = DataAccess()

da.clear_local_cache()

I was proposing something in that vein. I might have scared @dmuldrew with rm -rf

@BainanXia
Copy link
Copy Markdown
Collaborator

BainanXia commented Nov 13, 2020

@danielolsen Is the clear cache feature mandate or optional? Sometimes, we would like to keep those files for fast access. It would be great to have keyword (scenario number)options in the file names that we would like to remove (I'm just sitting here and dreaming ;)).

@rouille
Copy link
Copy Markdown
Collaborator

rouille commented Nov 13, 2020

@danielolsen Is the clear cache feature mandate or optional? Sometimes, we would like to keep those files for fast access, right?

You need to invoke it.

@danielolsen
Copy link
Copy Markdown
Contributor Author

@BainanXia It is a method that is never called in our codebase. Just a convenient helper function.

@dmuldrew
Copy link
Copy Markdown
Collaborator

If it's optional, I suppose that's alright. It'd be sad to hear from people that our code unexpectedly removed files

@danielolsen danielolsen merged commit 4c2fdbe into develop Nov 13, 2020
@danielolsen danielolsen deleted the daniel/abstract_ssh branch November 13, 2020 02:40
@ahurli ahurli mentioned this pull request Mar 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants