Add support for functional dependencies within a variable table #471

Closed
alldefector opened this issue Jan 28, 2016 · 13 comments

@alldefector
Contributor

This is the front-end support for the multinomial variable type in the sampler.

This is critical for classification targets such as entity linking, where the table schema is <object, class> and for each value of 'object', only one 'class' can be true in any state. If we don't support this constraint, we'd have to use the pairwise exclusivity rule to emulate it -- which would result in dramatic blowups in the factor graph.
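To put a rough number on the blowup: emulating "at most one class per object" with pairwise exclusivity needs one factor for every pair of candidate classes of the same object, so an object with k candidates costs k(k-1)/2 factors, versus a single k-valued variable. A minimal Python sketch of that count follows; the sample rows and the function name are made up for illustration and are not part of DeepDive.

```python
from collections import Counter
from math import comb

def pairwise_exclusion_factor_count(rows):
    """Count the pairwise factors needed to emulate one-class-per-object.

    rows: iterable of (object, class) candidate tuples (hypothetical data).
    """
    candidates_per_object = Counter(obj for obj, _cls in rows)
    # One "not both true" factor per unordered pair of candidates of an object.
    return sum(comb(k, 2) for k in candidates_per_object.values())

rows = [
    ("m1", "e1"), ("m1", "e2"), ("m1", "e3"), ("m1", "e4"),
    ("m2", "e1"), ("m2", "e5"),
]
num_factors = pairwise_exclusion_factor_count(rows)   # comb(4, 2) + comb(2, 2)
num_categorical = len({obj for obj, _cls in rows})    # one variable per object
```

With 100 candidate classes for a single object, the same count is already 4950 pairwise factors.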

@chrismre
Contributor

This should have really been in the code a year ago @feiranwang. @zhangce Can you fix this?

@netj
Contributor

netj commented Jan 28, 2016

For functional dependencies, DeepDive can recognize the columns of a variable relation that lack @key annotations, provided @key is present on at least one column. However, it's not clear to me exactly how such dependencies should be grounded for the sampler. Should they be turned into categorical variables, or should oneIsTrue factors be attached to them? Can someone clarify a bit for me?

@alldefector
Contributor Author

The OneIsTrue factor type doesn't seem to be implemented as a constraint, so a categorical variable seems like the way to go. For grounding, I'd do a GROUP BY @key to get an array of tuple IDs for each group; each such array would then be a categorical variable in the sampler.
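The grouping step described here can be sketched in Python as follows. This is only an in-memory illustration of the idea, not DeepDive's grounding code: the row layout, the `id` field, and the function name are all assumptions.

```python
from collections import defaultdict

def group_tuples_by_key(rows, key_columns):
    """Emulate `GROUP BY @key`: map each key value to the tuple IDs in its group.

    rows: list of dicts, each with an "id" field plus the relation's columns
          (a hypothetical in-memory stand-in for the variable table).
    key_columns: names of the columns carrying the @key annotation.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        groups[key].append(row["id"])
    return dict(groups)

rows = [
    {"id": 0, "object": "m1", "class": "e1"},
    {"id": 1, "object": "m1", "class": "e2"},
    {"id": 2, "object": "m2", "class": "e1"},
]
# Each value is the array of tuple IDs forming one categorical variable.
categorical_vars = group_tuples_by_key(rows, ["object"])
```

Here `m1` yields one categorical variable over tuples 0 and 1, and `m2` yields one over tuple 2 alone.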

@netj
Contributor

netj commented Jan 28, 2016

Right, unless we turn it into categorical, there's probably no difference in the blowup. This also sounds doable at the DDlog level, by desugaring such cases. I think you'll want to correlate such categorical variables with other boolean ones, but I believe that is currently not possible. I think we'll also need to add a few more ways to mix them (implications, etc.) to make this actually useful. Please correct me if I'm wrong. With my limited exposure to such use cases, the current Categorical/Multinomial support doesn't really typecheck with the rest in my head, so I wish someone could clarify everything with a full-blown example.

@alldefector
Contributor Author

Very good point! Yes, looks like we do have to keep the identities of the underlying boolean variables in those groups so that they can correlate with other variables. I don't know how categorical variables are currently implemented in the sampler. For things to work, the underlying boolean variables must be referenceable in the factors.

Conceptually it'd be cleaner to say all variables are boolean and it's possible to enforce one-is-true constraints among a group of variables. As for implementation, I don't know if it's easier for the Variable class or the Factor class to handle such constraints...
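One way to picture "all variables are boolean, plus a one-is-true constraint" while keeping the members referenceable is sketched below in Python. The class, its fields, and the implication factor are hypothetical and say nothing about how the sampler's Variable or Factor classes are actually written.

```python
class OneIsTrueGroup:
    """A categorical variable viewed as a group of boolean variables.

    member_ids: IDs of the underlying boolean variables (hypothetical layout).
    Exactly one member is true at any time, tracked by `current`.
    """

    def __init__(self, member_ids, current=0):
        self.member_ids = list(member_ids)
        self.current = current  # index into member_ids of the true member

    def value_of(self, var_id):
        # Read a member as a boolean so other factors can reference it by ID.
        return self.member_ids[self.current] == var_id

    def set_true(self, var_id):
        # Making one member true implicitly makes all other members false.
        self.current = self.member_ids.index(var_id)

def implication_factor(group, member_id, other_value):
    """Return 1 if (member is true => other_value) holds, else 0."""
    return int((not group.value_of(member_id)) or other_value)

group = OneIsTrueGroup([10, 11, 12])
group.set_true(11)
```

Because the group stores a single index, the constraint holds by construction, and a factor such as `implication_factor` can still refer to an individual member by its ID.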

@feiranwang
Contributor

@alldefector Sorry, I'm not quite following here. It seems this can be done using multinomial variables or boolean variables with one-is-true constraints. Could you give a concrete example that requires more support in the system?

@alldefector
Contributor Author

Yes, it is the one-is-true constraint. However, the sampler doesn't seem to support this constraint currently. There is a factor type by the same name, but it's not a constraint.

Also, there is no front-end support in DeepDive / DDlog to take advantage of it once we do have such support in the sampler.

@ajratner
Contributor

Vanilla Gibbs sampling breaks if you add a "hard" constraint (i.e., infinite or near-infinite weight) like OneIsTrue, because ergodicity is lost (you can't reach certain states from certain other states). However, you could just do block Gibbs, where you would sample all the boolean variables connected by this OneIsTrue factor (conditioned on all the other vars' values at that point) in a single Gibbs step.

Of course, this problem goes away with actual categorical variables. However, Jaeho and I were briefly chatting about some of the challenges involved in integrating categorical variables, including dynamic scoping of individual categorical variables...

I actually think implementing this simple block Gibbs scheme might be easier?
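A rough Python sketch of such a block step is below. Single-variable flips cannot move between valid states, because flipping any one boolean in a state with exactly one true member leaves zero or two members true. The block step instead resamples which member is true, all at once. The function name, the `log_weight` callback, and the scores are assumptions for illustration, not the sampler's API.

```python
import math
import random

def block_gibbs_step(group, log_weight, rng=random):
    """Resample which member of a one-is-true group is true, in one step.

    group: list of boolean variable IDs tied by the constraint.
    log_weight: function var_id -> unnormalized log-probability of the state in
                which that member is true, with all variables outside the group
                held fixed (a stand-in for evaluating the attached factors).
    Returns the ID of the member sampled to be true.
    """
    logs = [log_weight(v) for v in group]
    shift = max(logs)  # subtract the max for numerical stability
    weights = [math.exp(x - shift) for x in logs]
    return rng.choices(group, weights=weights, k=1)[0]

rng = random.Random(0)
scores = {10: 0.0, 11: 2.0, 12: -1.0}  # hypothetical log-weights
draws = [block_gibbs_step([10, 11, 12], scores.get, rng) for _ in range(2000)]
```

Every draw is a valid state by construction, so the chain never has to pass through an assignment that violates the constraint. This is the same computation as sampling a single categorical variable over the group's members.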


@alldefector
Contributor Author

alldefector commented Jan 28, 2016

That'd be great! FWIW, these ancient systems probably had such blocking from the get-go:

http://research.microsoft.com/en-us/um/cambridge/projects/infernet/docs/Gibbs%20sampling.aspx
https://alchemy.cs.washington.edu/tutorial/tutorial.pdf (page 4)
http://i.stanford.edu/hazy/tuffy/doc/tuffy-manual.pdf (page 8)

@chrismre
Contributor

ancient systems... dastardly.

@chrismre
Contributor

chrismre commented Feb 5, 2016

Bump. Can someone explain what's going on here? @netj

@chrismre
Contributor

chrismre commented Feb 5, 2016

Also, @netj and @feiranwang, it should be super easy to declare categorical variables that are linked. Say, for our old entity linking design, they need to be able to express

LinksToOne(candidate, entity)

as a categorical random variable (each candidate maps to one entity).

This should be super easy (@key is fine). @zhangce, I know you're traveling but your thoughts are welcome when you get back on line :)

@ajratner I don't want to redo the full constraints--no one seems to use them, and they have lots of code complexity... maybe in the summer :)

@alldefector
Contributor Author

OK, we do have categorical vars now.
