how do you go from a bed file to a segmentation? #3

Open
nsheff opened this issue Nov 15, 2018 · 7 comments

@nsheff
Member

commented Nov 15, 2018

One thing to think about is coming up with a toolset to help people map bed files onto existing segmentations.

@nsheff nsheff transferred this issue from another repository Nov 27, 2018

@oddodaoddo

Collaborator

commented Nov 27, 2018

In my mind the provider should have a set of API calls that do this - the hub is just a conduit. In essence, some kind of arithmetic on segmentations that gets fed bed files. The result of such a call would be "coverage percentage" or some other metric of fit of bed file vs segmentation(s). In fact, there could be a separate API that "picks" the best segmentation for a bed file and returns the "coverage metric".
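To make the "coverage percentage" idea concrete, here is a minimal sketch of what such a provider-side calculation might look like. It is only an illustration: the function names (`coverage`, `best_segmentation`), the `(chrom, start, end)` tuple layout, and the choice of "fraction of bed bases covered by segments" as the metric are all assumptions, not an existing episb API.

```python
from collections import defaultdict


def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def by_chrom(regions):
    """Group (chrom, start, end) tuples by chromosome and merge each group."""
    grouped = defaultdict(list)
    for chrom, start, end in regions:
        grouped[chrom].append((start, end))
    return {chrom: merge(ivs) for chrom, ivs in grouped.items()}


def coverage(bed, segmentation):
    """Fraction of bed bases that fall inside some segment (hypothetical metric)."""
    bed_iv, seg_iv = by_chrom(bed), by_chrom(segmentation)
    total = covered = 0
    for chrom, intervals in bed_iv.items():
        segments = seg_iv.get(chrom, [])
        for start, end in intervals:
            total += end - start
            # Both sides are merged, so overlaps are never counted twice.
            for s, e in segments:
                covered += max(0, min(end, e) - max(start, s))
    return covered / total if total else 0.0


def best_segmentation(bed, segmentations):
    """Return (name, coverage) for the segmentation that covers the bed file best."""
    scores = {name: coverage(bed, seg) for name, seg in segmentations.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]
```

A call such as `best_segmentation(bed, {"seg_a": [...], "seg_b": [...]})` would then play the role of the separate "pick the best segmentation" API. The inner loop is quadratic per chromosome, which is fine for a sketch; a real provider would more likely use an interval index.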

@nsheff

Member Author

commented Nov 28, 2018

yeah, that's great.

The hub could let you do this across a set of distributed segmentation providers.

So, this specifies a query we need to implement that is specific to segmentation providers (not necessary for data providers that do not provide segmentations).

We could also consider (maybe via user option) allowing it to select mix-and-match segments, from various segmentations, or something like that.

@oddodaoddo

Collaborator

commented Nov 28, 2018

I think this ticket should have an equivalent in the episb-provider section/repo? We/I should sit down and come up with a set of APIs that could be callable on the provider to give it real functionality. The above is one example.

@nsheff

Member Author

commented Nov 29, 2018

Referred to as the match task in: databio/episb-provider#9

@nsheff

Member Author

commented Dec 11, 2018

from @oddodaoddo:

To me, a scientist could start with an experiment, which is basically a set of regions with annotation values attached to each region.

On the opposite side, our system has sets of regions (segmentation providers), but it should also store experiments already done, which are annotation values linked to regions (or linked to segmentation providers' regions?). At some level we need to allow retrieval of raw/real experiments, should someone want to do something with these experiments against their own experiment(s). We could "normalize" this by pretending that an experiment is annotation values linked to segmentation providers' regions, but that could distort/hide/change the actual "real" experiment that happened. Essentially, it is the difference between storing the real experiment, where annotation value A1 came with region R1 (e.g. 1000-2000), and storing the "normalized" experiment, where annotation value A1 is linked to the segmentation provider's accepted region RA (e.g. 1100-1800).
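The raw-versus-normalized distinction can be sketched in a few lines. The example below is hypothetical: the `normalize` function, the record layout, and the "largest overlap wins" rule are assumptions chosen for illustration, not how episb maps regions. It keeps the original coordinates next to the chosen segment purely so the two can be compared.

```python
def overlap(a, b):
    """Number of bases shared by two half-open (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def normalize(experiment, segments):
    """Link each (start, end, value) record to the segment it overlaps most.

    Records that overlap no segment get segment=None instead of being dropped.
    """
    normalized = []
    for start, end, value in experiment:
        region = (start, end)
        # Assumed rule: pick the segment with the largest overlap.
        best = max(segments, key=lambda seg: overlap(region, seg), default=None)
        if best is not None and overlap(region, best) == 0:
            best = None
        normalized.append({"original": region, "segment": best, "value": value})
    return normalized


# The example from the comment: A1 measured on R1 (1000-2000),
# linked to the provider's accepted region RA (1100-1800).
records = normalize([(1000, 2000, "A1")], [(1100, 1800), (5000, 6000)])
```

Storing only `segment` and `value` corresponds to the "normalized" experiment; storing `original` as well corresponds to keeping the raw one.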

@nsheff

Member Author

commented Dec 11, 2018

This brings up a new question: should we also store non-normalized regions? To me the answer is no.

The "real" concept isn't really a good way to think about it... all the experiments are approximations anyway. The mapping to the segmentation regions should be considered the real regions once it's done.

@nsheff nsheff reopened this Jan 29, 2019

@oddodaoddo

Collaborator

commented Jan 29, 2019

Wow, GitHub picks up on ANY #number reference and links automatically...
