New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: ipu not using all constraints #33

Closed
Eh2406 opened this Issue Jan 4, 2017 · 12 comments

Comments

Projects
None yet
3 participants
@Eh2406
Contributor

Eh2406 commented Jan 4, 2017

Some constraints given to ipu are ignored. I just noted that the number of columns in _FrequencyAndConstraints.ncols is not equal to len(person_freq.columns) + len(household_freq.columns) This means that some of the constraints are not being used.

Now trying the 5 whys method:

Why 1?

Because the keys of OrderedDict must be unique, and the column names of person_freq and household_freq are sequential numbers. So the person constraints over write the household constraints.

Why 2?

Why are they sequential numbers, they are not in the tests? In the unit test the columns have unique meaningful names.
Because we set them to sequential numbers. That is how it has been since @fscottfoti added it to git.

And that is as far as I go.

How do we fix?

Do we stop using a Dict as a backing for _FrequencyAndConstraints, what consequences does that have? Do we stop changing the index we send, what consequences does that have?

We noted this while we are finalizing our synthesized population so we could use a prompt fix.

@janowicz

This comment has been minimized.

Show comment
Hide comment
@janowicz

janowicz Jan 5, 2017

Contributor

Thanks for catching this @Eh2406! Since you guys are probably most recently immersed in this code, do you have a sense for which of the two solution routes you mention seems most promising? If you think you're in a position to issue a pull request for this issue, go for it. I'll probably have capacity to dig into this too on Friday or early next week

Contributor

janowicz commented Jan 5, 2017

Thanks for catching this @Eh2406! Since you guys are probably most recently immersed in this code, do you have a sense for which of the two solution routes you mention seems most promising? If you think you're in a position to issue a pull request for this issue, go for it. I'll probably have capacity to dig into this too on Friday or early next week

@Eh2406

This comment has been minimized.

Show comment
Hide comment
@Eh2406

Eh2406 Jan 5, 2017

Contributor

I'd lean toward removing h_constraint.index = h_jd.cat_id and p_constraint.index = p_jd.cat_id I could not figure out why it is necessary, but have not yet tried. @fscottfoti added it and probably had a good reason. alsw we need to check whether this is a user facing change.

I'd definitely add a assert self.ncols == len(person_freq.columns) + len(household_freq.columns) to and a test to ensure it does not sneak back.

Contributor

Eh2406 commented Jan 5, 2017

I'd lean toward removing h_constraint.index = h_jd.cat_id and p_constraint.index = p_jd.cat_id I could not figure out why it is necessary, but have not yet tried. @fscottfoti added it and probably had a good reason. alsw we need to check whether this is a user facing change.

I'd definitely add a assert self.ncols == len(person_freq.columns) + len(household_freq.columns) to and a test to ensure it does not sneak back.

@fscottfoti

This comment has been minimized.

Show comment
Hide comment
@fscottfoti

fscottfoti Jan 5, 2017

Member

I'd guess it is necessary but I sure can't remember at this point. Also, why do you say it requires the index to be sequential integers? Aren't these lines doing the opposite - i.e. using the index that's passed in so the index doesn't have to be sequential integers?

Member

fscottfoti commented Jan 5, 2017

I'd guess it is necessary but I sure can't remember at this point. Also, why do you say it requires the index to be sequential integers? Aren't these lines doing the opposite - i.e. using the index that's passed in so the index doesn't have to be sequential integers?

@Eh2406

This comment has been minimized.

Show comment
Hide comment
@Eh2406

Eh2406 Jan 5, 2017

Contributor

*_constraint.index is a descriptive name per row. *_jd.cat_id is sequential integers. So the code replaces names with integers. The bug is that names work, but integers don't. I will try removing those lines and see if it works this afternoon.

Contributor

Eh2406 commented Jan 5, 2017

*_constraint.index is a descriptive name per row. *_jd.cat_id is sequential integers. So the code replaces names with integers. The bug is that names work, but integers don't. I will try removing those lines and see if it works this afternoon.

@fscottfoti

This comment has been minimized.

Show comment
Hide comment
@fscottfoti

fscottfoti Jan 5, 2017

Member

Definitely seems like it's worth a try to remove and see if it still works for you.

Member

fscottfoti commented Jan 5, 2017

Definitely seems like it's worth a try to remove and see if it still works for you.

@Eh2406

This comment has been minimized.

Show comment
Hide comment
@Eh2406

Eh2406 Jan 5, 2017

Contributor

I have started work on the bug fix branch https://github.com/SEMCOG/synthpop/tree/bug_fix

Contributor

Eh2406 commented Jan 5, 2017

I have started work on the bug fix branch https://github.com/SEMCOG/synthpop/tree/bug_fix

@Eh2406

This comment has been minimized.

Show comment
Hide comment
@Eh2406

Eh2406 Jan 5, 2017

Contributor

So It is not so simple. Removing that h_constraint.index = h_jd.cat_id and p_constraint.index = p_jd.cat_id did not cause the the tests to pass, the sequential integers come from https://github.com/SEMCOG/synthpop/blob/semcog/synthpop/synthesizer.py#L85 i.e. https://github.com/SEMCOG/synthpop/blob/semcog/synthpop/categorizer.py#L88

Contributor

Eh2406 commented Jan 5, 2017

So It is not so simple. Removing that h_constraint.index = h_jd.cat_id and p_constraint.index = p_jd.cat_id did not cause the the tests to pass, the sequential integers come from https://github.com/SEMCOG/synthpop/blob/semcog/synthpop/synthesizer.py#L85 i.e. https://github.com/SEMCOG/synthpop/blob/semcog/synthpop/categorizer.py#L88

@Eh2406

This comment has been minimized.

Show comment
Hide comment
@Eh2406

Eh2406 Jan 10, 2017

Contributor

@janowicz How is it going? Have you made progres?

Contributor

Eh2406 commented Jan 10, 2017

@janowicz How is it going? Have you made progres?

@janowicz

This comment has been minimized.

Show comment
Hide comment
@janowicz

janowicz Jan 11, 2017

Contributor

I have not gotten to this yet. Wasn't sure if you were already taking it on with the bug_fix branch above. I have time tomorrow to do some work on this.

Contributor

janowicz commented Jan 11, 2017

I have not gotten to this yet. Wasn't sure if you were already taking it on with the bug_fix branch above. I have time tomorrow to do some work on this.

@Eh2406

This comment has been minimized.

Show comment
Hide comment
@Eh2406

Eh2406 Jan 11, 2017

Contributor

I added an assert to demonstrate the bug. But was unable to get the test to pass with the assert in place. and that is as far as I got.

Contributor

Eh2406 commented Jan 11, 2017

I added an assert to demonstrate the bug. But was unable to get the test to pass with the assert in place. and that is as far as I got.

@janowicz

This comment has been minimized.

Show comment
Hide comment
@janowicz

janowicz Jan 11, 2017

Contributor

Cool, thanks! I'll work on this today

Contributor

janowicz commented Jan 11, 2017

Cool, thanks! I'll work on this today

@janowicz

This comment has been minimized.

Show comment
Hide comment
@janowicz

janowicz Jan 13, 2017

Contributor

Closing this issue for now thanks to #34. We can re-open as needed.

Contributor

janowicz commented Jan 13, 2017

Closing this issue for now thanks to #34. We can re-open as needed.

@janowicz janowicz closed this Jan 13, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment