Folder Structure for Source and Munged under siegetank #6

Open
jhprinz opened this issue Oct 5, 2014 · 13 comments

Comments

@jhprinz
Collaborator

jhprinz commented Oct 5, 2014

I got a little lost in the typical FAH hierarchical structure/project organization.
So what is the default or preferred folder structure for FAH and siegetank projects?

In the siegetank API tutorial, the synced folder structure is:

target_folder/
    <stream0.id>_data/
        <part_0.frame>/
            frames.xtc
        <part_1.frame>/
        <part_2.frame>/
        ...
    <stream1.id>_data/
    <stream2.id>_data/
    ...
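
As a rough illustration of reading this layout back, here is a minimal sketch that walks a synced target folder and collects the `frames.xtc` files per stream. The function name `list_stream_parts` is hypothetical and not part of the siegetank API, and it assumes the part folders are named with plain integer frame numbers.

```python
from pathlib import Path


def list_stream_parts(target_folder):
    """Map each stream id to its frames.xtc paths, ordered by part frame number.

    Hypothetical helper, not part of siegetank. Assumes part folders are
    named with plain integers, as in the tutorial layout.
    """
    result = {}
    for stream_dir in sorted(Path(target_folder).glob("*_data")):
        if not stream_dir.is_dir():
            continue
        # Strip the "_data" suffix to recover the stream id.
        stream_id = stream_dir.name[: -len("_data")]
        part_dirs = [
            p for p in stream_dir.iterdir() if p.is_dir() and p.name.isdigit()
        ]
        # Sort numerically so that part 10 comes after part 2.
        part_dirs.sort(key=lambda p: int(p.name))
        result[stream_id] = [
            p / "frames.xtc" for p in part_dirs if (p / "frames.xtc").exists()
        ]
    return result
```

Sorting numerically rather than lexically matters here, since a plain string sort would place part `10` before part `2`.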

On top of this I found the idea of RUNS, which seems to make sense for FAH, but since siegetank is very flexible maybe the meaning has changed. I assume that either

  • A project is called a target in siegetank
  • A target has several simulations called streams
  • Streams are sorted into RUNS

or

  • A project has several targets and
  • Each target has streams which represent simulations of the exact same .pdb / topology
  • Streams are sorted into RUNS that represent ??? Different stages in the setup process until the simulation goes to production?

So, I propose the following for the siegetank-synced (unprocessed) folder structure:

<target.short_name>_<target.id>_/
    RUNS/
        RUN0_<run0.short_name>/
            STREAM0_<stream0.id>/
                <part_0.first_frame>/
                    frames.xtc
                    ...
                <part_1.first_frame>/
                <part_2.first_frame>/
                ...
            STREAM1_<stream1.id>/
            STREAM2_<stream2.id>/
    ...

where target.short_name represents the project name as in FAH projects. This way it is similar to the FAH order RUN##/CLONE##/ but contains useful extra information like the stream UUID.
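
To make the proposed layout concrete, here is a minimal sketch that builds the path to one part's `frames.xtc`. The function `synced_part_path` and all of its parameter names are hypothetical; a run `short_name` is part of this proposal, not something siegetank is known to provide. The trailing underscore on the target folder is kept exactly as written in the tree above.

```python
from pathlib import Path


def synced_part_path(root, target_short_name, target_id,
                     run_index, run_short_name,
                     stream_index, stream_id, first_frame):
    """Build the frames.xtc path for one part in the proposed synced layout.

    Hypothetical helper: every argument is supplied by the caller and
    nothing is looked up through siegetank.
    """
    # Target folder name, including the trailing underscore from the proposal.
    target_dir = f"{target_short_name}_{target_id}_"
    run_dir = f"RUN{run_index}_{run_short_name}"
    stream_dir = f"STREAM{stream_index}_{stream_id}"
    return (Path(root) / target_dir / "RUNS" / run_dir / stream_dir
            / str(first_frame) / "frames.xtc")
```

For example, `synced_part_path("/data", "hp35", "t1", 0, "native", 1, "s1", 500)` would point at `/data/hp35_t1_/RUNS/RUN0_native/STREAM1_s1/500/frames.xtc`, where all of the values are made up for illustration.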

Then for the munged folder in /data/choderalab/fah/munged/ I propose:

<target.id>_<target.short_name>/
    all-atoms/
        run0-stream0_<stream0.id>.h5
        run0-stream1_<stream1.id>.h5
        ...
        run1-stream0_<stream0.id>.h5
        ...
    no-solvent/
        run0-stream0_<stream0.id>.h5
        run0-stream1_<stream1.id>.h5
        ...
        run1-stream0_<stream0.id>.h5
        ...

We can also

  • remove all ids which is more compatible, but less human-readable.
  • exchange STREAM for CLONE to be more compatible
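
The munged naming scheme, including the option of dropping the ids, could be sketched like this. The function `munged_path` and the `include_ids` flag are hypothetical names used only for illustration, and the returned path is relative to the munged root.

```python
from pathlib import PurePosixPath


def munged_path(target_id, target_short_name, run_index, stream_index,
                stream_id, subset="all-atoms", include_ids=True):
    """Return the relative path of one munged .h5 file in the proposed layout.

    Hypothetical helper. With include_ids=False the target and stream ids
    are dropped, which matches the "remove all ids" variant.
    """
    if subset not in ("all-atoms", "no-solvent"):
        raise ValueError(f"unknown subset: {subset!r}")
    if include_ids:
        target_dir = f"{target_id}_{target_short_name}"
        file_name = f"run{run_index}-stream{stream_index}_{stream_id}.h5"
    else:
        target_dir = target_short_name
        file_name = f"run{run_index}-stream{stream_index}.h5"
    return PurePosixPath(target_dir) / subset / file_name
```

Keeping the ids behind a single flag would make it cheap to switch between the more compatible and the more human-readable variant later.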
@kyleabeauchamp
Collaborator

I think:

Project = Target

Each pair of (run, clone) is a single stream. I don't think there will be an automatic staging process on ST.

Some of these questions cannot be fully resolved until ST implements more features from FAH (E.g. points), as that will play a role in how things are set up and organized.

@jhprinz
Collaborator Author

jhprinz commented Oct 5, 2014

I agree:

Target should be a Project, and it already contains the basic information like a description. Streams (simulations) are then attached to it, and they are not organized in any way.
This means we can (for now) impose an organization without interference. The problem is that internally this might become a little messy, like having all files for RUNS/CLONES in one folder.

I see that this might change if the organization of ST changes.

So, for now I would keep the RUN / STREAM ordering.

What is the actual idea of RUNS in FAH? Were these meant for several iterations, or for the test phase, etc.?

@kyleabeauchamp
Collaborator

In FAH, RUNS correspond to different starting conformations. CLONES refer to different velocities.
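
Combined with the earlier point that each (run, clone) pair is a single stream, the mapping can be sketched as follows. This is only an illustration: `enumerate_streams` is a hypothetical name, and the labels follow the FAH RUN/CLONE folder order rather than anything siegetank produces.

```python
from itertools import product


def enumerate_streams(n_runs, n_clones):
    """List one entry per (run, clone) pair, i.e. one per stream.

    run   -> index of the starting conformation
    clone -> index of the velocity assignment for that conformation
    """
    return [
        {"run": run, "clone": clone, "label": f"RUN{run}/CLONE{clone}"}
        for run, clone in product(range(n_runs), range(n_clones))
    ]
```

With 2 starting conformations and 3 velocity assignments each, this gives 6 streams, from `RUN0/CLONE0` to `RUN1/CLONE2`.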

@kyleabeauchamp
Collaborator

Also, in FAH, one is generally supposed to ensure that the different RUNS have the same number of atoms / topology / etc. Otherwise, the points will vary between the RUNS.

@jhprinz
Collaborator Author

jhprinz commented Oct 5, 2014

Luckily we do not have these restrictions. All streams can be totally different, which means we have to be more careful about staying organized.

What are points in FAH?

@kyleabeauchamp
Collaborator

Whenever possible, we may still want to enforce these restrictions, because consistency with FAH is important.

FAH workunits award points to donors. It is the currency for doing our computations.

@jhprinz
Collaborator Author

jhprinz commented Oct 5, 2014

Okay, "points". I was thinking of points as in checkpoints...

Wasn't the idea of siegetank to be more flexible? That seemed quite useful, but we don't want to break compatibility. That would do more harm than good...

@kyleabeauchamp
Collaborator

My point is that eventually siegetank is going to be plugged into FAH, so we need to adopt procedures that will be compatible with FAH operation.

@jhprinz
Collaborator Author

jhprinz commented Oct 5, 2014

Okay, then we should really wait until there are more features in ST. For now I will start building something that we can use and adapt later. Changes should be easy to make.

@kyleabeauchamp
Collaborator

I agree. I was just saying that we should avoid creating excessive heterogeneity within different streams of a single target, as that's "allowed but undesirable" within the current ST API.

All I'm saying is don't use a single target to simulate both HP35 and src kinase, as that may cause issues down the road.
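
One way to catch that kind of mix-up early is a simple homogeneity check over the streams of a target. The sketch below assumes the atom count of each stream is already known and passed in as a plain dict; `find_outlier_streams` is a hypothetical helper, and nothing here queries siegetank.

```python
from collections import Counter


def find_outlier_streams(atom_counts):
    """Return the ids of streams whose atom count differs from the majority.

    atom_counts: dict mapping stream id -> number of atoms (caller-supplied).
    """
    if not atom_counts:
        return []
    # The most common atom count is treated as the reference for the target.
    majority, _ = Counter(atom_counts.values()).most_common(1)[0]
    return sorted(sid for sid, n in atom_counts.items() if n != majority)
```

For example, `find_outlier_streams({"a": 100, "b": 100, "c": 250})` flags stream `"c"`; the counts are made-up numbers, not real system sizes.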

@VijayPande

Siegetank is plugged into FAH via the latest client.

Thanks,

Vijay

Sent from my phone. Sorry for any brevity or unusual tone.


@jchodera
Member

jchodera commented Oct 5, 2014

Oh! Is the latest client being rolled out already?

I think everyone in the lab is excited about how much easier it is to
programmatically set up and manage ST jobs.

@VijayPande

PS: The latest client is still under testing. We can push Joe and Yutong to get it released.

Thanks,
Vijay

Sent from my Phone. Sorry for the brevity or unusual tone.

