
FileStore Backend Support #17

Closed

chovanecm opened this issue Jan 28, 2017 · 10 comments

Comments
@chovanecm
Owner

Currently, only MongoDB is supported as a backend.
However, Sacred is able to use other databases as well, and this could be supported in Sacredboard too.

@cicobalico

This would be awesome, especially for the local backends like TinyDB or file-storage.

@gideonite
Contributor

I think the first step would be to decide upon a web API. Ideally, this would somehow be codified in the form of an abstract Python class, which would then be subclassed by each database backend.

Any thoughts? I'm happy to look through the current implementation and propose an interface, but it might be more appropriate for the main contributors to do this kind of "deep design" thinking.

@chovanecm
Owner Author

My idea was to make an abstract class that PyMongoDataAccess would inherit from (if that makes sense in a language like Python, where we can do "duck typing" - but perhaps it does, for the sake of intelligibility).
The data format accepted and returned by PyMongoDataAccess would then be considered the reference for other implementations - which is basically what @gideonite suggested.
I've tried to document the format of the parameters requested by the MongoDB backend, and I should do the same with the return values.
Unfortunately, I've prioritised other features and aspects of the project (such as decoupling the HTML view from its model to make it testable), so I've "postponed" this - not to mention that new features like filtering (currently in the development branch) could be more difficult to implement with backends like file-storage.

Of course, any help would be appreciated. Basically, the important methods of the current interface are

def get_runs(self, sort_by=None, sort_direction=None,
             start=0, limit=None, query={"type": "and", "filters": []}):
def get_run(self, run_id):
def connect(self):  # not as important for other implementations - only referenced in bootstrap.py
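Codified as an abstract base class, the interface above might look like the following sketch (the class name DataStorage and the docstrings are assumptions for illustration, not Sacredboard's actual code):

```python
from abc import ABC, abstractmethod


class DataStorage(ABC):
    """Interface every Sacredboard data backend would implement.

    A sketch only - names and signatures are up for discussion.
    """

    @abstractmethod
    def connect(self):
        """Open the connection to the underlying store."""

    @abstractmethod
    def get_run(self, run_id):
        """Return a single run as a dict, or None if it does not exist."""

    @abstractmethod
    def get_runs(self, sort_by=None, sort_direction=None,
                 start=0, limit=None, query=None):
        """Return an iterable of runs that also exposes a count() method."""
```

A subclass that forgets to implement one of these methods cannot be instantiated, which is exactly the "tell developers what to implement" guarantee discussed here.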

The approach Sacredboard currently uses is to initialise the data gateway in bootstrap.py and put it into the current_app.config["data"] variable.
It's definitely not the best solution, but it might be enough for now.

If anyone creates a proof of concept for other backends using the hints provided here, I'm happy to consult on it, but I cannot start developing it on my own now because of my other duties.

I was also thinking about whether we might try replacing (or extending?) the current pseudo-API with something nicer, such as GraphQL. But that would just be a nice-to-have.

@gideonite
Contributor

I've implemented a basic interface called DataStorage, and PyMongoDataAccess inherits from it. I've also started implementing a FileStorage class which simply does get_run and get_runs. I'm getting some weird DataTables errors, though, when I start up the web app. I assume this is because either:

1 - the data format doesn't match the MongoDB format, or
2 - I haven't implemented any filtering.

To create a run, I simply load run.json and set run["config"] = load_config("config.json").
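That step could look roughly like this sketch (run.json and config.json follow Sacred's file observer layout; the helper name create_run is illustrative, not the fork's actual code):

```python
import json
import os


def create_run(run_dir):
    """Build a run dict from one FileStorage run directory.

    Loads run.json and attaches the parsed config.json under
    the "config" key, mirroring what MongoDB-backed runs contain.
    """
    with open(os.path.join(run_dir, "run.json")) as f:
        run = json.load(f)
    with open(os.path.join(run_dir, "config.json")) as f:
        run["config"] = json.load(f)
    return run
```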

Any pointers at this point? Obviously it would be nice to get it to run. I'm also a noob when it comes to writing Pythonic code, so I will happily take code style comments.

Here's the commit on my fork: gideonite@8cfac39

@chovanecm
Owner Author

Thanks!
You can run Sacredboard with debugging enabled (--debug) and navigate to 127.0.0.1:5000/api/runs.
There you can see that the returned object has to implement a "count" method exposing the total number of records (DataTables uses this for pagination etc.).

One of the solutions to achieve it is to add this to datastorage.py:

class DataIterator:
    """An iterable that also reports the total number of items."""

    def __init__(self, count, iterable):
        self.iterable = iterable
        self._count = count

    def count(self):
        """Return the number of items."""
        return self._count

    def __iter__(self):
        return iter(self.iterable)

and to modify your get_runs method:

def get_runs(self, sort_by=None, sort_direction=None,
             start=0, limit=None, query={"type": "and", "filters": []}):
    blacklist = {"_sources"}
    # Skip Sacred's helper directories; everything else is a run id.
    run_ids = [run_id for run_id in os.listdir(self.path_to_dir)
               if run_id not in blacklist]

    def run_iterator():
        for run_id in run_ids:
            yield self.get_run(run_id)

    return DataIterator(len(run_ids), run_iterator())

But even then there is a problem: the loaded JSON does not contain information about data types. Therefore, dates and times, such as the experiment start time and the last heartbeat time, are loaded as strings and need to be converted to the appropriate Python representation using

    datetime.datetime.strptime('2017-05-28T12:18:33.091144', '%Y-%m-%dT%H:%M:%S.%f')
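Wrapped in a small helper, the conversion might look like this (the key names in the default argument are assumptions based on Sacred's run format, and the helper itself is illustrative):

```python
from datetime import datetime

# ISO-8601 with microseconds, as written by Sacred's JSON output.
ISO_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def parse_dates(run, keys=("start_time", "stop_time", "heartbeat")):
    """Convert ISO-formatted date strings in a loaded run dict in place."""
    for key in keys:
        value = run.get(key)
        if isinstance(value, str):
            run[key] = datetime.strptime(value, ISO_FORMAT)
    return run
```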

I hope this is enough to make it work (without sorting, pagination and filtering).

Regarding coding style, if you have the tox package installed, you can run
tox -e flake8
It's quite pedantic, so there is still a lot of work to do even in my code. Once everything is fine (line breaks, spaces etc.), it starts checking the documentation strings.

What I personally found useful is to write tests together with the main code. You can find inspiration in test_mongodb.py (it uses a library that mocks MongoDB, so the test behaves almost as if it were talking to a real database).

@chovanecm chovanecm removed the draft label May 28, 2017
@gideonite
Contributor

Great feedback! Sorry for the delay in responding.

I've made some changes that (I think) address these issues. Namely, I created a FileStoreCursor object. This brings up another point: the object-oriented programmer in me says to create another abstraction to match "DataStorage", called "DataCursor". I don't know if this is Pythonic, but I don't know how else to guarantee that others who want to extend this code for their own datastore know what to implement.

I've had some trouble with tox because I use Anaconda instead of virtualenv. I couldn't find a developer wiki page, and at this point it might be a good idea to create one. We could put in some of the info that's been mentioned in this thread, such as tox, --debug, etc., so that future developers can quickly see how to get started. I'm interested to know your development setup.

Here's the latest:
develop...gideonite:filestore

@chovanecm
Owner Author

chovanecm commented May 31, 2017

Great job!
Don't worry about the delay - I'm currently busy with my university anyway.
I'll try to have a look at it tomorrow (I'm in Europe), so for now only in brief:

  1. The solution with FileStoreCursor basically corresponds to my DataIterator proposed earlier. If you look at the code, it does not depend on FileStorage - it is generic enough to be on the same level of abstraction as DataStorage, because it is also reusable by the MongoDB layer and possibly by any other (let's name it DataIterator). <<<< UPDATE: DataIterator would be a better name than DataCursor
  2. I had problems with Anaconda too. That's why I started using the Windows Subsystem for Linux (aka Bash for Windows) in Windows 10. But as it's a beta, it sometimes simply stops working, so perhaps a Linux virtual machine (e.g. in VirtualBox) is better.
  3. I am sorry about the missing documentation. In fact, Sacredboard was created as part of my master's thesis, so I spent more time writing the thesis than proper documentation :-( Chapters 6 and 7 briefly describe the development environment and commands (I just noticed I didn't mention the --debug option).
    In two weeks, I am going to defend the thesis and take the "final state examination", which is necessary to graduate here in the Czech Republic. I probably won't have much time until then, because I basically need to go through almost all the courses I took to prepare for the exam.
    But I was thinking about either creating a wiki page or generating Sphinx documentation (it is a kind of standard for Python projects). That's why I started completing the code documentation docstrings - they would become part of the generated docs.

But I'm glad you asked; it should probably be one of my priorities now.

@chovanecm
Owner Author

I tried it now. Nice that the list actually shows something.
Now I see I should definitely add tests for the interaction with the backend (#40) so that developers on the backend side always know whether the data they return are in the expected format.

Basically, the object you are building with _create_run should correspond to the object key in the "/api/run/id" example in #40.
So I think that for even the Run detail view to work, you need to set run["_id"] and run["info"].
Again, thanks for doing this. It is helping not only with the functionality but also with realising what's necessary to document and test.

@gideonite
Contributor

Ah, ok, now I understand a bit more about the context of the project. Makes sense.

I see your point now. Perhaps the more Pythonic way of doing this, instead of heavy-handedly forcing conformity via interfaces/abstract classes and the like, is to specify the required functionality in tests.

My only motivation for using the term "cursor" was to match the return type of the MongoDB interface.

Ok, I will make sure that the data type conforms to #40 and implement a test to make sure that the right keys are returned, etc. It will probably be some kind of integration test in which I create some experiments in /tmp and make sure the resulting JSON has the right keys.

@chovanecm
Owner Author

chovanecm commented Jun 2, 2017

It's OK to have abstract classes in Python where appropriate, such as the DataStorage class you created. It is good that it tells developers what methods they have to implement.
In the case of FileStoreCursor (aka DataIterator), I was just thinking the class could be put directly on the datastore level so that developers of other database backends can use it too.

When you talk about integration tests, do you mean they would involve running Sacred too? That would be amazing, but if a test fails, it might not be easy to tell whether the problem was caused by Sacred or by Sacredboard. So I would see this as an additional level of testing.
I think testing of your module could be solved, for instance, by generating a few runs with Sacred, copying the directory e.g. to tests/data/fixtures/filestore, and loading them from there.
You could perhaps also try to decouple loading the JSON files from building the Run objects, as it looks like the latter part could be reused e.g. for the TinyDB backend (maybe by extracting the _create_run method somewhere else - but once I have time again, I can try to design a better solution for it).
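A fixture-based test along those lines might look like this sketch (the expected keys and the load_run helper are illustrative; here setUp builds a tiny fixture in a temporary directory rather than shipping one in tests/data/fixtures/filestore):

```python
import json
import os
import tempfile
import unittest

# Keys the frontend expects per the /api/run/id example in #40 (illustrative subset).
EXPECTED_KEYS = {"_id", "experiment", "status", "info", "config"}


def load_run(run_dir):
    """Minimal stand-in for FileStorage.get_run: merge run.json and config.json."""
    with open(os.path.join(run_dir, "run.json")) as f:
        run = json.load(f)
    with open(os.path.join(run_dir, "config.json")) as f:
        run["config"] = json.load(f)
    return run


class TestFileStoreFormat(unittest.TestCase):
    def setUp(self):
        # Build a tiny fixture directory, as if copied from a real Sacred run.
        self.run_dir = tempfile.mkdtemp()
        with open(os.path.join(self.run_dir, "run.json"), "w") as f:
            json.dump({"_id": "1", "experiment": {"name": "demo"},
                       "status": "COMPLETED", "info": {}}, f)
        with open(os.path.join(self.run_dir, "config.json"), "w") as f:
            json.dump({"seed": 42}, f)

    def test_run_has_expected_keys(self):
        run = load_run(self.run_dir)
        self.assertTrue(EXPECTED_KEYS.issubset(run))
```

Run with `python -m unittest`. This keeps the backend test independent of Sacred itself, in line with the "additional level of testing" point above.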

@chovanecm chovanecm changed the title Different database backends FileStore Backend Support Jun 30, 2017
@chovanecm chovanecm added the Epic label Jun 30, 2017