Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core library can't discover derived treants by itself. #100

Closed
kain88-de opened this issue Aug 29, 2016 · 13 comments · Fixed by #145
Closed

core library can't discover derived treants by itself. #100

kain88-de opened this issue Aug 29, 2016 · 13 comments · Fixed by #145

Comments

@kain88-de
Copy link
Contributor

Objects inheriting from a Treant define their own state files. This leads to inconsistencies. For example
if I have a folder with a Sim object I can't discover it until I imported MDSynthesis. Then the following python session can occur.

import datreant.core as dtr

dtr.discover('.')  # empty bundle

import mdsynthesis

dtr.discover('.')  # finds a  sims now.

I can now treat the bundle like any other bundle just some mdsynthesis specific operations are missing.

It feels weird to me in the usage that the return value of dtr.discover depends on the global modules that I have loaded.

This is a problem in datreant.cli where I would have to import all packages that extend datreant.core to find their respective state file, this means it would only work with official datreant packages.

@dotsdl
Copy link
Member

dotsdl commented Aug 29, 2016

@kain88-de I don't see an easy alternative to the existing behavior, since in order for discover to know what class definition to use for, say, a Sim.<uuid>.json state file, that definition needs to exist, so mdsynthesis must get imported somewhere.

Under the current behavior Treant subclasses will automatically be visible to datreant.core.discover upon being defined, but not before. Do you have ideas for how this could be different and satisfy the needs of the CLI?

@kain88-de
Copy link
Contributor Author

For me the bigger problem besides the CLI tools is that the return of dtr.discover depends on what I have imported even though the underlying filesystem hasn't been changed. I have come to really like the idea that a function always returns the same output given the same input.

@kain88-de
Copy link
Contributor Author

When I understand correctly datreant was though of a package that can be used to
create a database on top of the filesystem. To create a entry in the database I
have to create a Treant. In that object I can add tags and categories to help me
sort in the entries. With limbs it is also possible data that is specially
processed. This is actually already everything that is needed to create a
database in my view. All of this also works fine if there is only a single
state-file and only datreant.core interacts with the state file.

For me this makes datreant similar to other database software like
SQL-implementations. Like with SQL I can now implement other applications on top
of that by using the database.

But datreant is designed to change the database implementation by introducing
new state-files. That design makes it hard to write general tools to interact
with the database. Only the specialized applications will ever be able to
interact with a database. So with the datreant.cli I can only interact with the
pure Treant objects, if someone wants to also interact with the database created
by an application he has to rewrite that by himself again.

Another issue is that it can now be possible to create several state files in
one folder. This is like creating several databases in the same file-system sub
folders. That might cause problems, as described here. Besides that I can still interact with the other state files when I expiicitely load the other state file. This is very confusing.

@kain88-de
Copy link
Contributor Author

A short description might be that datreant tries to be a tool and low-level library to create a tool at the same time. This is technically possible to implement but leads to confusing behavior in the end.

Is there a technical/practical reason why a Sim object creates it's own state file? Why couldn't it just create a Treant file as well and store special hidden attributes to define that it is a sim object?

@orbeckst
Copy link
Contributor

I don't know the internals of mds well enough but I do like @kain88-de 's reasoning: just use treant state files and throw special stuff into a "top level" "mdsynthesis" attribute in the json. dtr.core only accesses a subset and ignores everything else. Higher level tools like mds take the core stuff and their own things.

@orbeckst
Copy link
Contributor

orbeckst commented Aug 29, 2016

I don't see an easy alternative to the existing behavior, since in order for discover to know what class definition to use for, say, a Sim..json state file, that definition needs to exist, so mdsynthesis must get imported somewhere.

I think this would switch to duck-typing:

S = mds.Sim("name")

will throw an exception if the treant json file does not contain the stuff that it expects from a Sim. If directory "name" is empty, create a Sim. Maybe add force=True kw so that it adds Sim specific stuff to an existing state file instead of throwing an error.

Not sure how this would interact with current workflows.

In contrast,

T = datreant.core.Treant("name")

will always work; T will not make anything available that does not belong to a vanilla treant.

@dotsdl
Copy link
Member

dotsdl commented Sep 21, 2016

After a discussion with @orbeckst (a while ago), here's a possible solution. There may be issues with it, but I think it's a clean resolution to this and partially to #101. It has a few aspects:

  1. Instead of state files having the first part of their name corresponding to the type of Treant they were originally created to be, we could just make them all Treant.<uuid>.json. Every Treant subtype is literally a Treant, not just internally in the state file, but externally by way of its name.
  2. Using datreant.core.discover() always returns Treant objects. It doesn't matter what else has been imported.
  3. Using mdsynthesis.discover() always returns Sim objects, these being directories with a Treant.<uuid>.json file inside. Internally in the state file, mdsynthesis adds the entries in the JSON dict it needs to store Sim-related things, and this won't conflict with any other packages also used on the same Treant. Likewise, using mdsynthesis.Bundle only ever gives Sims.
  4. We get rid of datreant.core.Group, because it has no clear place under this scheme, and to my knowledge makes little sense today compared to just using Bundle.

Are there holes in this idea? One problem is that it means specialized packages can only implement one type of specialized Treant (e.g. Sim) if they want to simply have their own versions of datreant.core.Bundle and datreant.core.discover without additional backflips. Not sure if there are other negatives.

@kain88-de
Copy link
Contributor Author

Instead of state files having the first part of their name corresponding to the type of Treant they were originally created to be, we could just make them all Treant..json. Every Treant subtype is literally a Treant, not just internally in the state file, but externally by way of its name.

Like it

Using datreant.core.discover() always returns Treant objects. It doesn't matter what else has been imported.

Also like it.

Using mdsynthesis.discover() always returns Sim objects, these being directories with a Treant..json file inside. Internally in the state file, mdsynthesis adds the entries in the JSON dict it needs to store Sim-related things, and this won't conflict with any other packages also used on the same Treant. Likewise, using mdsynthesis.Bundle only ever gives Sims.

I don't understand this. So a mds.discover() will ONLY find Sim objects and not generic Treants?
Or does it add the mdsynthesis fields to every Treant it finds?

We get rid of datreant.core.Group, because it has no clear place under this scheme, and to my knowledge makes little sense today compared to just using Bundle.

Never used it.

Generally like the idea but I think we should have some discussions about the changes needed for the state file and check if we they match all our requirements how we plan Treants and specialized Treants to behave.

@kain88-de
Copy link
Contributor Author

Using mdsynthesis.discover() always returns Sim objects, these being directories with a Treant..json file inside. Internally in the state file, mdsynthesis adds the entries in the JSON dict it needs to store Sim-related things, and this won't conflict with any other packages also used on the same Treant. Likewise, using mdsynthesis.Bundle only ever gives Sims.

So this will automatically convert any treant found into a special sim treant? Not a bad idea. How does the state file ensure then that different special extensions don't interfere with one another? This would actually be nice so I can so assuming I have another specialization called Foo I could do a foo.discover() and mds.discover to get a view of the same Treants with the same categories and tags but slightly different behavior.

@orbeckst
Copy link
Contributor

orbeckst commented Oct 3, 2016

Oliver Beckstein
email: orbeckst@gmail.com

Am Oct 2, 2016 um 13:44 schrieb Max Linke notifications@github.com:

How does the state file ensure then that different special extensions don't interfere with one another?

Special treats have one top level entry that acts as their reserved namespace. For instance, MDSynthesis stores all its own data under 'mdsynthesis'. A package foo would claim top level 'foo'.

@dotsdl
Copy link
Member

dotsdl commented Feb 21, 2017

Working on this now. I'm going to cut a 0.7.1 release with the current state of develop, then make a PR removing Group and slimming down the library. This will include everything needed to solve this issue, short of changes that need to happen elsewhere (like in MDSynthesis).

@kain88-de
Copy link
Contributor Author

Btw do we have a solution when I want to find only existing Sims? For when I do a discover in a folder with mixed Treant and Sim objects in them. Now I only want to have the sim-object. I guess that would be a g.groupby('_hidden_node_types')['Sim'] operation or?

@dotsdl
Copy link
Member

dotsdl commented Feb 22, 2017

Under the new scheme described in datreant/MDSynthesis#65, mdsynthesis will have its own discover method that will only return a Bundle with Sim objects, and these would only be Treants with the mdsynthesis namespace already present in their state files. This will give you the result you want.

By contrast, datreant.core.discover will give you any and all Treants it finds, regardless of their state file contents.

Something we could add to the Treant class is something like a statespaces property that gives back all the top-level names present in the state file. Alternatively we could expose a property that returns the fully deserialized state file data structure so that one could use the information inside however one likes, but not sure if this could encourage users to do things by digging into the internals of the state files instead of relying on API. Not sure of the best approach here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Release 0.8.0
In discussion
Development

Successfully merging a pull request may close this issue.

3 participants