Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interfaces for network models #47

Closed
simonbowly opened this issue Apr 21, 2023 · 3 comments
Closed

Interfaces for network models #47

simonbowly opened this issue Apr 21, 2023 · 3 comments

Comments

@simonbowly
Copy link
Member

simonbowly commented Apr 21, 2023

Meta issue to make sure we have consistent APIs between network-y mods #39 #36

Networkx is very clean for this, as we can have node and edge attributes and all the data for a mod is bundled into one python object. We probably want to avoid networkx/igraph/others as dependencies, but only if we can find just as clean an API for network models using the existing numpy/scipy/pandas dependencies.

For weighted or unweighted graphs we use scipy sparse arrays (e.g. mwis and matching):

# Graph adjacency matrix (upper triangular) as a sparse matrix.
g = nx.cubical_graph()
adjacency_matrix = sp.triu(nx.to_scipy_sparse_array(g))

# Vertex weights
weights = np.array([2**i for i in range(8)])

# Compute maximum weighted independent set.
mwis = maximum_weighted_independent_set(adjacency_matrix, weights)

however this doesn't quite suit networks which need multiple attributes per edge. We could pass multiple arrays, but this is unweildy as the user has to build two scipy objects with duplicated data for the edge pattern. The alignment between sparse indexes and positions in the demand array is not too nice either, as it's restricted to 0 ... n-1 node labels.

cost = ... # sparse array
capacity = ... # sparse array
demand = ... # dense array
min_cost_flow(cost, capacity, demand)

I also wrote a network flow function for use in bipartite matching, which uses dense arrays to pass all the edge source/target indices, costs & capacities. It's meant as a low-level utility for other mods to use though, it would be fast but not a nice API:

def solve_min_cost_flow(
env: gp.Env,
edge_source: np.ndarray,
edge_target: np.ndarray,
capacity: np.ndarray,
cost: np.ndarray,
demand: np.ndarray,
):

So I think pandas is the better alternative here, with input data as below. It's easy to do a direct formulation of a flow problem using gurobipy-pandas as well, see networkflow.md

edge_data = pd.DataFrame([
    {"source": 0, "target": 1, "capacity": 16, "cost": 0},
    {"source": 0, "target": 2, "capacity": 13, "cost": 0},
    {"source": 1, "target": 2, "capacity": 10, "cost": 0},
    {"source": 2, "target": 1, "capacity": 4, "cost": 0},
    {"source": 1, "target": 3, "capacity": 12, "cost": 0},
    {"source": 3, "target": 2, "capacity": 9, "cost": 0},
    {"source": 2, "target": 4, "capacity": 14, "cost": 0},
    {"source": 4, "target": 3, "capacity": 7, "cost": 0},
    {"source": 3, "target": 5, "capacity": 20, "cost": 0},
    {"source": 4, "target": 5, "capacity": 4, "cost": 0},
])

node_data = pd.DataFrame([
    {"node": 0, "demand": -23},
    {"node": 5, "demand": 23}
])

@torressa @stevedwards what do you think (for the final version, ok to continue hacking with networkx for the time being)? It's also fine to use networkx in the docs examples, either to generate networks or produce figures. @maliheha did this for MWIS; her examples generate a networkx graph and convert to scipy using networksx/numpy functions. It's similarly straightforward for pandas, see to_pandas_edgelist which gives the same structure as edge_data above from a graph with edge attributes.

@torressa
Copy link
Member

torressa commented Apr 21, 2023

I am happy with using pandas as our main external interface for problems with many edge attributes. It's cleaner than adding a new dep or using a function with many inputs.

To make it super easy for people to use I think we should aim to transform stuff for people with different preferences to use (we can add these as optional deps: i.e. only define the function when the import is available).

For graph-based mods, we could have, for example (thanks to Python's lack of overloading),

  • graph_mod(data: pd.DataFrame, ...)
  • graph_mod_sp(G: sp.sparray, ...) (This can be the default one when pandas one is not needed)
  • graph_mod_nx(G: nx.DiGraph, ...) (optional dep)

This will really help peeps in the graph community try these out. Of course, as @ruthmair pointed out, some of these libraries will already have algorithms for these problems in place, but sometimes they will not!

@simonbowly
Copy link
Member Author

Sounds good. You can overload the same function name if you want; just check the argument type. I've done that in bipartite matching. It looks ok but might start to look odd when e.g. networkx requires only one object as an argument but pandas/scipy require two.

@torressa torressa mentioned this issue Apr 25, 2023
7 tasks
@simonbowly simonbowly added this to the First public release milestone May 8, 2023
@simonbowly
Copy link
Member Author

We have a more consistent approach now: use scipy and pandas as much as possible, but also provide some documented way to pass a networkx graph. Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants