Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a method to fetch attributes for multiple columns #60

Open
simonbowly opened this issue Oct 11, 2023 · 6 comments
Open

Add a method to fetch attributes for multiple columns #60

simonbowly opened this issue Oct 11, 2023 · 6 comments
Assignees
Labels
enhancement New feature or request

Comments

@simonbowly
Copy link
Member

Proposed by @jacksimpsoncartesian. Say we have a dataframe including multiple series of variables:

>>> env = gp.Env()
>>> model = gp.Model()
>>> df = (
...     pd.DataFrame({
...         "obj": [4, 3, 2, 1],
...         "ub": [1, 2, 3, 4],
...     })
...     .gppd.add_vars(model, ub="ub", name="x")
...     .gppd.add_vars(model, obj="obj", name="y")
... )
>>> model.update()
>>> df
   obj  ub                  x                  y
0    4   1  <gurobi.Var x[0]>  <gurobi.Var y[0]>
1    3   2  <gurobi.Var x[1]>  <gurobi.Var y[1]>
2    2   3  <gurobi.Var x[2]>  <gurobi.Var y[2]>
3    1   4  <gurobi.Var x[3]>  <gurobi.Var y[3]>

we can already extract solution values series-wise, so the canonical way to append this new data as columns would be to use .assign():

>>> results = (
...     df.assign(x_X=lambda df: df['x'].gppd.X)
...     .assign(y_Y=lambda df: df['y'].gppd.X)
...     .assign(x_RC=lambda df: df['x'].gppd.RC)
... )
>>>
>>> results
   obj  ub                              x                              y  x_X  y_Y  x_RC
0    4   1  <gurobi.Var x[0] (value 0.0)>  <gurobi.Var y[0] (value 0.0)>  0.0  0.0   0.0
1    3   2  <gurobi.Var x[1] (value 0.0)>  <gurobi.Var y[1] (value 0.0)>  0.0  0.0   0.0
2    2   3  <gurobi.Var x[2] (value 0.0)>  <gurobi.Var y[2] (value 0.0)>  0.0  0.0   0.0
3    1   4  <gurobi.Var x[3] (value 0.0)>  <gurobi.Var y[3] (value 0.0)>  0.0  0.0   0.0

A useful feature could be a method or accessor to extract multiple attributes for multiple columns in a single call. For example:

  • df.gppd.X: extract the X attribute for all columns, and/or
  • df.gppd.get_attrs(["X", "RC"]) extract both attributes for all columns.

Some open questions for how to design this:

  • Should a missing attribute from one column cause a failure? Or should a mix of data and variables be gracefully handled? A pandonic option might be an errors=raise/ignore keyword.
  • Should new columns be appended, or should this method be a transform from a dataframe of variables to a dataframe of results? I think the latter makes more sense; if one method can extract all the necessary result data, then dropping the variables (which are no longer needed) seems like it would encourage a cleaner data-in data-out API for models.
  • How should new columns be named?
@simonbowly simonbowly added the enhancement New feature or request label Oct 11, 2023
@simonbowly simonbowly self-assigned this Oct 11, 2023
@jacksimpsoncartesian
Copy link

Thanks so much Simon, I haven't used the assign approach before so I'll test that out tomorrow.

I think you're right that returning the dataframe with the variable columns replaced with values would be the cleanest, however if you think there are users who may want to retain the variable columns, it isn't that big a deal if it created new columns with a name flagging that they're the values of the variables columns.

Would it be possible to do something similar when extracting the duals for constraint columns in in the dataframe? I'm often extracting a large number of different duals from these models and inserting them back into the dataframes.

Thanks again!

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Oct 11, 2023

IMHO, the design of the application using gurobipy-pandas should separate data, decision variables, constraints, and solutions. What they share in common are indices.

So if you had a DataFrame with columns x, y and z, each holding decision variables, and they shared the same index, then you'd put the solution in a different DataFrame - same index, and it could have the same names. For different attributes, you could postpend the name of the attribute.

So I'm not supportive of this example:

>>> env = gp.Env()
>>> model = gp.Model()
>>> df = (
...     pd.DataFrame({
...         "obj": [4, 3, 2, 1],
...         "ub": [1, 2, 3, 4],
...     })
...     .gppd.add_vars(model, ub="ub", name="x")
...     .gppd.add_vars(model, obj="obj", name="y")
... )
>>> model.update()
>>> df
   obj  ub                  x                  y
0    4   1  <gurobi.Var x[0]>  <gurobi.Var y[0]>
1    3   2  <gurobi.Var x[1]>  <gurobi.Var y[1]>
2    2   3  <gurobi.Var x[2]>  <gurobi.Var y[2]>
3    1   4  <gurobi.Var x[3]>  <gurobi.Var y[3]>

I prefer this style:

>>> import gurobipy as grb
>>> import gurobipy_pandas as gppd
>>> import pandas as pd
>>> data_df = pd.DataFrame({"obj": [4,3,2,1], "ub": [1,2,3,4]})
>>> data_df
   obj  ub
0    4   1
1    3   2
2    2   3
3    1   4
>>> m=grb.Model()
### REMOVED LICENSE STUFF
>>> vars_df = pd.DataFrame({"x": gppd.api.add_vars(m, data_df, name="x", ub="ub"), 
...      "y": gppd.api.add_vars(m, data_df, name="y", obj="obj")})
>>> m.optimize()
Gurobi Optimizer version 10.0.0 build v10.0.0rc2 (win64)

CPU model: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz, instruction set [SSE2|AVX|AVX2]
Thread count: 8 physical cores, 16 logical processors, using up to 16 threads

### REMOVED LICENSE INFO THAT HAS MY EMAIL ADDRESS
Optimize a model with 0 rows, 8 columns and 0 nonzeros
Model fingerprint: 0x42a99770
Coefficient statistics:
  Matrix range     [0e+00, 0e+00]
  Objective range  [1e+00, 4e+00]
  Bounds range     [1e+00, 4e+00]
  RHS range        [0e+00, 0e+00]
Presolve removed 0 rows and 8 columns
Presolve time: 0.00s
Presolve: All rows and columns removed
Iteration    Objective       Primal Inf.    Dual Inf.      Time
       0    0.0000000e+00   0.000000e+00   0.000000e+00      0s

Solved in 0 iterations and 0.01 seconds (0.00 work units)
Optimal objective  0.000000000e+00
>>> sol_df = pd.DataFrame({f"{v}_{attr}" : vars_df[f"{v}"].gppd.get_attr(attr) 
...    for v in vars_df.columns for attr in ["x", "RC"]})
>>> sol_df
   x_x  x_RC  y_x  y_RC
0  0.0   0.0  0.0   4.0
1  0.0   0.0  0.0   3.0
2  0.0   0.0  0.0   2.0
3  0.0   0.0  0.0   1.0
>>>

So there is now data_df, vars_df and sol_df that has data, variables and solution all separated.

@simonbowly
Copy link
Member Author

Would it be possible to do something similar when extracting the duals for constraint columns in in the dataframe? I'm often extracting a large number of different duals from these models and inserting them back into the dataframes.

Sure, this is also just an attribute, so it can be extracted the same way from a series of constraints. See for example the "slack" attribute in the Projects-Teams example.

@simonbowly
Copy link
Member Author

Sure, I know your opinion on that @Dr-Irv, but the question here is really whether to take the existing functionality on Series:

>>> pd.DataFrame(dict(
...     x_x=vars_df["x"].gppd.X,
...     x_RC=vars_df["x"].gppd.RC,
...     y_x=vars_df["y"].gppd.X,
...     y_RC=vars_df["y"].gppd.RC,
... ))
   x_x  x_RC  y_x  y_RC
0  0.0   0.0  0.0   4.0
1  0.0   0.0  0.0   3.0
2  0.0   0.0  0.0   2.0
3  0.0   0.0  0.0   1.0

and wrap it up in an accessor on the DataFrame which handles (potentially) mixed column types and attributes.

@simonbowly
Copy link
Member Author

I think the series accessors combined with existing pandas methods may be enough for now. Both are quite concise, and adding a new method which needs to handle various combinations of columns with different attributes could wind up with hard to explain behaviour. So, I will leave this one alone for now.

@jacksimpsoncartesian
Copy link

Thanks so much Simon, really appreciate you taking the time to review this - happy to use the existing methods to update my code - that should save me a lot of code which is what I was after!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants