
[PULL REQUEST] New methodology for n-dimensional rounding #179

Closed
Eric-Liu-SANDAG wants to merge 17 commits into main from nd-rounding

Conversation

@Eric-Liu-SANDAG
Contributor

@Eric-Liu-SANDAG Eric-Liu-SANDAG commented Dec 13, 2025

Describe this pull request. What changes are being made?

A collection of changes related to n-dimensional rounding. Specifically, an attempt to solve it such that convergence is guaranteed and, ideally, runtime is reduced.

What issues does this pull request address?

Additional context

Some of the work done on this branch was shifted into a different branch and has already been merged into main via #176. Therefore, the main purpose of this PR is to fully document the work and to ensure that it is not lost when the branch is deleted.

Eric-Liu-SANDAG and others added 17 commits September 22, 2025 12:18
Includes:
* IPF implementation using numpy (much faster than IPFN!)
* Utilities for creating various random test data
* Stochastic (aka fuzzy) and PuLP methodologies for ND integerization, both minimally tested on at least 3-D data
`gq_other` went from ~54 seconds to ~14 seconds. At this point, solving accounts for about half of that time (~7 seconds), so there may not be much more to optimize
File is too large for GH, so it's in SQL Server as `[ws].[dbo].[Group_Quarters_Institutional_Correctional_Facilities_PULP_CBC_CMD]`
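For reference, the numpy-based IPF mentioned above can be sketched roughly as follows. This is a minimal illustration of 2-D iterative proportional fitting, not the actual `ipf.py` implementation; the function name, convergence test, and iteration cap are all illustrative.

```python
import numpy as np

def ipf_2d(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Iterative proportional fitting: alternately rescale rows and
    columns of `seed` until its marginals match the targets.
    Illustrative sketch only -- not the actual ipf.py code."""
    x = seed.astype(float).copy()
    row_targets = np.asarray(row_targets, dtype=float)
    col_targets = np.asarray(col_targets, dtype=float)
    for _ in range(max_iter):
        x *= (row_targets / x.sum(axis=1))[:, None]  # fit row sums
        x *= col_targets / x.sum(axis=0)             # fit column sums
        if np.allclose(x.sum(axis=1), row_targets, atol=tol):
            break  # both marginals now match within tolerance
    return x
```

Operating on whole arrays with broadcasting, rather than looping per cell, is what makes a numpy IPF much faster than a generic implementation like IPFN.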
@Eric-Liu-SANDAG
Contributor Author

This PR contains quite a few changes, which are fully summarized below. Hopefully, this makes the changes easier to find in the future in case the work is needed again.

@Eric-Liu-SANDAG
Contributor Author

ipf.py

The contents of this file have mostly been merged into main already via #176. Changes that were not merged are limited to testing code

@Eric-Liu-SANDAG
Contributor Author

random_data.py

This file contains helper functions to create deterministic random data of a specified shape, along with some marginals to experiment with. The three random generators are: uniform random; low skewed (80% of the values over 0.1 are randomly reassigned new values between 0 and 0.1); and sparse (values are randomly set to zero based on an input fraction, defaulting to 70%).
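The low-skewed and sparse generators described above could look roughly like this. Function names, the fixed seed, and the exact sampling details are illustrative assumptions, not the actual contents of `random_data.py`:

```python
import numpy as np

# Illustrative sketches of the generators described above; the real
# function names and sampling details in random_data.py may differ.
def sparse_random(shape, zero_frac=0.7, seed=42):
    """Uniform random data with roughly `zero_frac` of values zeroed."""
    rng = np.random.default_rng(seed)  # fixed seed -> deterministic output
    x = rng.random(shape)
    x[rng.random(shape) < zero_frac] = 0.0
    return x

def low_skewed(shape, seed=42):
    """Uniform random data where 80% of values over 0.1 are reassigned
    to new values between 0 and 0.1."""
    rng = np.random.default_rng(seed)
    x = rng.random(shape)
    mask = (x > 0.1) & (rng.random(shape) < 0.8)
    x[mask] = rng.random(int(mask.sum())) * 0.1
    return x
```

Seeding `default_rng` is what makes the "random" data deterministic, so test runs are reproducible.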

@Eric-Liu-SANDAG
Contributor Author

Eric-Liu-SANDAG commented Dec 16, 2025

nd_rounding.py

This file contains a bunch of different methods for solving ND rounding and some testing code. The methods include:

  1. A stochastic method (nd_controlling_fuzzy() and _nd_controlling_fuzzy_step()). In every iteration, the method selects a certain percentage (default 50%) of the current rounding error and probabilistically assigns corrections until all rounding error is zero. The correction weight for a coordinate is the product of the rounding errors along each dimension, so the coordinate $(x, y, z)$ has weight $RE_x \cdot RE_y \cdot RE_z$, where $RE_n$ is the rounding error summed along that axis at point $n$. This method worked perfectly for large, dense data, but tended to run into dead ends when working on sparse data. The next method was a small attempt to address this issue
  2. A third-party solver (nd_controlling_pulp_solver() and nd_controlling_pulp_solver_2d()). These functions are basically the same, except that one handles the 2D case only. They use the PuLP python library and various free third-party solvers to find ND rounding solutions. Most of the code in these functions sets up the problem, using pulp.LpVariable() for variables and pulp.LpAffineExpression() for equations. After setup, solve() is called and the outputs are coerced back into the original format. This method always works, but slows down significantly on larger datasets. Not only does the solving slow down, but the actual setup of creating variables/equations can also get really slow.
  3. A combination of 1 and 2 (nd_controlling_mixed() and nd_controlling_mixed_safe()). The idea here is to use the extreme speed of method 1 and the actual solving capabilities of method 2 to do things quickly with a guaranteed solve. In other words, run the stochastic method until some point, then use PuLP to actually solve. The first function uses a total rounding error threshold, with a default value of 1000. The second tries the stochastic method until failure, then undoes the steps one at a time, trying PuLP after each undo, until one finally solves. I found that neither of these solutions was very helpful in speeding things up. The first would still invariably fail on some input data no matter what threshold was chosen. With the second, in the worst case, we would end up setting up and attempting to solve many different PuLP systems of equations, which was extremely slow
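The per-coordinate weight from method 1 can be sketched as a product of axis marginals of the rounding error. This is an illustrative reconstruction from the description above (the function name is hypothetical, and taking absolute values of the marginals is my assumption, not stated in the original):

```python
import numpy as np

def fuzzy_weights(x: np.ndarray) -> np.ndarray:
    """Weight each coordinate by the product of the per-axis rounding
    errors, as described for the stochastic method. Illustrative sketch;
    the use of absolute values here is an assumption."""
    err = x - np.round(x)  # signed rounding error per cell
    w = np.ones(x.shape)
    for axis in range(x.ndim):
        # Rounding error summed along every OTHER axis gives the
        # marginal RE_n for this axis.
        other = tuple(a for a in range(x.ndim) if a != axis)
        marginal = np.abs(err.sum(axis=other))
        shape = [1] * x.ndim
        shape[axis] = x.shape[axis]
        w *= marginal.reshape(shape)  # broadcast RE_x * RE_y * ...
    return w
```

For a 2-D array this reduces to the outer product of the absolute row and column rounding-error sums, which is why cells on all-zero-error rows or columns get zero weight, the likely source of the dead ends on sparse data.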

Overall, it was determined that, barring some major speedup on larger datasets, solution 2 was workable, but only on group quarters data
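A minimal 2-D version of the PuLP setup described in method 2 might look like the following. This is a hedged sketch, not the actual nd_controlling_pulp_solver_2d(): the function name is hypothetical, it assumes the input marginals are already whole numbers, and it phrases the problem as one binary "round up" decision per cell rather than whatever encoding the real code uses.

```python
import numpy as np
import pulp

def round_2d_pulp(x: np.ndarray) -> np.ndarray:
    """Round each cell of `x` up or down so that the (assumed integer)
    row and column marginals of `x` are preserved. Illustrative sketch."""
    floor = np.floor(x)
    prob = pulp.LpProblem("nd_rounding", pulp.LpMinimize)
    prob += pulp.lpSum([])  # no objective: this is a feasibility problem
    # One binary "round up" decision per cell.
    up = [[pulp.LpVariable(f"up_{i}_{j}", cat="Binary")
           for j in range(x.shape[1])] for i in range(x.shape[0])]
    # Row and column sums must match the integer marginals of x.
    for i in range(x.shape[0]):
        prob += pulp.lpSum(up[i]) == round(x[i].sum() - floor[i].sum())
    for j in range(x.shape[1]):
        prob += (pulp.lpSum(up[i][j] for i in range(x.shape[0]))
                 == round(x[:, j].sum() - floor[:, j].sum()))
    prob.solve(pulp.PULP_CBC_CMD(msg=False))  # free bundled CBC solver
    choice = np.array([[pulp.value(v) for v in row] for row in up])
    return floor + choice
```

The setup cost noted above is visible even in this toy version: the number of variables and constraint terms grows with every cell, so building the problem alone becomes expensive on large arrays.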

@Eric-Liu-SANDAG
Contributor Author

All .csv files

A bunch of data pulled from midway through the ASE module of the Estimates Program. This data was used to test the IPF and ND rounding on actual ASE data

@Eric-Liu-SANDAG
Contributor Author

environment.yml

The file was updated to include both pulp==3.3.0, which provides the interface to various third-party solvers, and pulp[open_py]==3.3.0, which actually installs the free third-party solvers
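The relevant lines might look like this in a conda environment file; exact placement and the surrounding entries in the project's environment.yml are assumptions:

```yaml
dependencies:
  - pip
  - pip:
      - pulp==3.3.0
      - "pulp[open_py]==3.3.0"  # pulls in the free third-party solvers
```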

@Eric-Liu-SANDAG
Contributor Author

.gitignore

The file was updated to ignore all *.npy files, the binary serialization format for a numpy np.ndarray. These files were used by method 3 from nd_rounding.py in order to "roll back" from an unsolvable state to a previous state
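The snapshot-and-rollback pattern is just numpy's save/load round trip. A minimal sketch (the file path and checkpoint name are illustrative, not taken from the actual code):

```python
import os
import tempfile
import numpy as np

# Snapshot the current state to a .npy file before a risky step...
state = np.array([[1.2, 2.8], [3.5, 0.5]])
path = os.path.join(tempfile.mkdtemp(), "checkpoint.npy")
np.save(path, state)

# ...and "roll back" by reloading it if the step dead-ends.
restored = np.load(path)
```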

@Eric-Liu-SANDAG
Contributor Author

@GregorSchroeder any further thoughts or questions before I close the PR and delete the branch?

@Eric-Liu-SANDAG Eric-Liu-SANDAG deleted the nd-rounding branch December 17, 2025 19:36
@Eric-Liu-SANDAG
Contributor Author

Just to confirm: the branch has been deleted, so you cannot access it directly. However, the commits and changes still live on in this PR, so we can recover the code if necessary
