FileStore Backend Support #17
Comments
This would be awesome, especially for the local backends like tinydb or file-storage |
I think the first step would be to decide upon a web api. Ideally, I think this would somehow be codified in the form of an abstract python class which would then be subclassed by each database backend. Any thoughts? I'm happy to take a look through the current implementation and propose an interface, but I think it might be more appropriate for main contributors to do this kind of "deep design" thinking. |
My idea was to make an abstract class that PyMongoDataAccess would inherit from (if that makes any sense in a language like Python where we can do "duck-typing", but perhaps it does for the sake of intelligibility). Of course, any help would be appreciated. Basically, the important methods of the current interface are:
def get_runs(self, sort_by=None, sort_direction=None,
             start=0, limit=None, query={"type": "and", "filters": []}):
def get_run(self, run_id):
def connect(self):  # not as important for other implementations - only referenced in bootstrap.py
The approach Sacredboard currently uses is to initialize the data gateway and put it into the If anyone creates a proof-of-concept for other backends using the hints provided here, I'm happy to consult on it, but I cannot start developing it on my own now because of my other duties. I was also wondering whether we might try replacing (or extending?) the current pseudo-API with something nicer, such as GraphQL. But that would be just a nice-to-have. |
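A minimal sketch of the abstract-class idea discussed above, assuming the three method names quoted from the current interface. The class name `DataStorage` appears later in the thread, but its body here is a guess, and `InMemoryStorage` is a purely hypothetical toy backend. The `query` default is `None` rather than the dict literal shown above, only to avoid a shared mutable default in this illustration.

```python
from abc import ABC, abstractmethod


class DataStorage(ABC):
    """Hypothetical backend interface; method names follow the thread."""

    def connect(self):
        """Optional hook; backends without a connection keep this a no-op."""

    @abstractmethod
    def get_run(self, run_id):
        """Return a single run as a dict-like object."""

    @abstractmethod
    def get_runs(self, sort_by=None, sort_direction=None,
                 start=0, limit=None, query=None):
        """Return the matching runs."""


class InMemoryStorage(DataStorage):
    """Toy backend, only here to show what a subclass must provide."""

    def __init__(self, runs):
        self._runs = dict(runs)

    def get_run(self, run_id):
        return self._runs[run_id]

    def get_runs(self, sort_by=None, sort_direction=None,
                 start=0, limit=None, query=None):
        # Sorting and filtering are deliberately ignored in this sketch.
        runs = list(self._runs.values())
        end = None if limit is None else start + limit
        return runs[start:end]
```

Instantiating `DataStorage` directly raises `TypeError`, which is how the abstract base tells a backend author which methods are still missing.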
I've implemented a basic interface called DataStorage and PyMongoDataAccess inherits from it. I've also started implementing a FileStorage class which simply does 1 - Because the data format doesn't jibe with the MongoDB format, or To create a run I simply load the Any pointers at this point? Obviously it would be nice to get it to run. I'm also a noob when it comes to writing pythonic code so will happily take code style comments. Here's the commit on my fork: gideonite@8cfac39 |
Thanks! One of the solutions to achieve it is to add this to datastorage.py:
class DataIterator():
    def __init__(self, count, iterable):
        self.iterable = iterable
        self._count = count
    def count(self):
        "The number of items"
        return self._count
    def __iter__(self):
        return iter(self.iterable)
and to modify your get_runs method:
# requires `import os` at the top of the module
def get_runs(self, sort_by=None, sort_direction=None,
             start=0, limit=None, query={"type": "and", "filters": []}):
    all_run_ids = os.listdir(self.path_to_dir)
    blacklist = {"_sources"}
    def run_iterator():
        # Load each run lazily, skipping non-run entries.
        for run_id in all_run_ids:
            if run_id in blacklist:
                continue
            yield self.get_run(run_id)
    count = len([run_id for run_id in all_run_ids if run_id not in blacklist])
    return DataIterator(count, run_iterator())
But even then there is a problem, as the loaded JSON does not contain information about data types. Therefore, dates and times, such as the experiment start time and last heartbeat time, are loaded as strings and need to be converted to an appropriate Python representation using datetime.datetime.strptime('2017-05-28T12:18:33.091144', '%Y-%m-%dT%H:%M:%S.%f') I hope this should be enough for making it work (without sorting, pagination and filtering). Regarding coding style, if you have the tox package installed, you can run What I personally found useful is to write tests together with the main code. You can find inspiration in test_mongodb.py (it uses a library that mocks MongoDB, so the test behaves almost as if it was talking to a real database). |
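The date conversion described above could be sketched roughly as follows. The format string is the one quoted in the comment; the key names in `DATE_KEYS` and the helper `parse_run_dates` are assumptions made for illustration and should be checked against the actual run files.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"
# Assumed key names; verify against the JSON that is actually loaded.
DATE_KEYS = ("start_time", "heartbeat", "stop_time")


def parse_run_dates(run, keys=DATE_KEYS):
    """Return a copy of `run` with string timestamps turned into datetimes."""
    converted = dict(run)
    for key in keys:
        value = converted.get(key)
        # Only touch values that are present and still strings.
        if isinstance(value, str):
            converted[key] = datetime.strptime(value, DATE_FORMAT)
    return converted
```

Note that this format requires a fractional-seconds part, so a timestamp without microseconds would raise `ValueError` and need separate handling.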
Great feedback! Sorry for the delay in responding. I've made some changes that (I think) address these issues. Namely, I created a FileStoreCursor object. This brings up another point, which is that the object-oriented programmer in me says to create another abstraction to match "DataStorage" called "DataCursor." I don't know if this is pythonic, but I don't know how else to guarantee that others that want to extend this code for their own datastore know what to implement. I've had some troubles with Here's the latest: |
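One way the "DataCursor" abstraction mentioned above might look, as a sketch only. The names `DataCursor` and `FileStoreCursor` come from the comment, but the bodies below are hypothetical and simply mirror the `count()` and `__iter__` pair of the earlier DataIterator suggestion.

```python
from abc import ABC, abstractmethod


class DataCursor(ABC):
    """Hypothetical contract for whatever get_runs() returns."""

    @abstractmethod
    def count(self):
        """Return the total number of runs behind this cursor."""

    @abstractmethod
    def __iter__(self):
        """Yield the runs one by one."""


class FileStoreCursor(DataCursor):
    """Illustrative cursor over a known number of lazily loaded runs."""

    def __init__(self, count, iterable):
        self._count = count
        self._iterable = iterable

    def count(self):
        return self._count

    def __iter__(self):
        return iter(self._iterable)
```

If the iterable is a generator, the cursor can be consumed only once; a backend that needs repeated iteration would have to store a factory instead.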
Great job!
But I'm glad you asked, it should probably be one of my priorities now. |
I tried it now. Nice that the list actually shows something. Basically, the object you are building with |
Ah ok, now I understand a bit more about the context of the project. Makes sense. I see your point now. Perhaps the more pythonic way of doing this, instead of heavy-handedly forcing conformity via interfaces/abstract classes and the like, is to implement the required functionality in tests. My only motivation for using the term "cursor" is to match the return type of the MongoDB interface. Ok, I will make sure that the data type conforms to #40 and implement a test to make sure that the right keys are returned, etc. It will probably be some kind of integration test in which I create some experiments in |
It's OK to have abstract classes in Python where appropriate, such as the DataStorage class you created. It is good that it tells developers what methods they have to implement. When you are talking about integration tests, do you mean it would involve running Sacred too? That would be amazing, but if the test fails, it might not be easy to tell whether the problem was caused by Sacred or Sacredboard. So I would see this as an additional level of testing. |
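A test at the lower level, one that does not run Sacred at all, might look roughly like the sketch below. It writes a fake run directory by hand and checks the keys of the loaded run. The `<run_id>/run.json` layout, the `load_run` helper and the contents of `REQUIRED_KEYS` are all assumptions for illustration, not the schema referred to as #40.

```python
import json
import os
import tempfile

# Illustrative only; the real set of required keys is not given in the thread.
REQUIRED_KEYS = {"_id", "status", "start_time"}


def load_run(base_dir, run_id):
    """Read <base_dir>/<run_id>/run.json (assumed layout) and attach the id."""
    with open(os.path.join(base_dir, run_id, "run.json")) as handle:
        run = json.load(handle)
    run["_id"] = run_id
    return run


def test_run_has_required_keys():
    with tempfile.TemporaryDirectory() as base_dir:
        # Create a single hand-written run instead of invoking Sacred.
        os.makedirs(os.path.join(base_dir, "1"))
        with open(os.path.join(base_dir, "1", "run.json"), "w") as handle:
            json.dump({"status": "COMPLETED",
                       "start_time": "2017-05-28T12:18:33.091144"}, handle)
        run = load_run(base_dir, "1")
        assert REQUIRED_KEYS <= set(run)
```

Because the fixture is written by the test itself, a failure here points at the file backend rather than at Sacred, which keeps the two levels of testing separate.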
Currently only MongoDB is supported as a backend.
However, Sacred is able to use other databases as well. These could be supported in Sacredboard too.