Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stable sort has major performance degradation #129

Open
deitch opened this issue May 25, 2023 · 12 comments · Fixed by #131
Open

stable sort has major performance degradation #129

deitch opened this issue May 25, 2023 · 12 comments · Fixed by #131
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@deitch
Copy link

deitch commented May 25, 2023

The new ToplogicalSort(), based on its StableTopologicalSort(), has really large performance degradation.

A program that used to take 42s, now takes 220-300s (sometimes more), with the vast majority of the time in StableToplogicalSort(). The current usage is the OSS wolfictl, but I will try to create a reproduction that is focused just on this.

The flame graphs below are from running that with the exact same inputs. The only difference is if I use v0.21.0 or v0.22.0

The first graph is calling StableToplogicalSort() with v0.22.0:

graph-stable-sort-v0220

The second is calling TopologicalSort() with v0.22.0 (unchanged, which isn't surprising, as it just wraps StableTopologicalSort():

graph-topological-v0220

The third is calling TopologocialSort() with v0.21.0:

graph-v0210

In the last, the sort doesn't even take enough time to be bothered into the list.

It might not be the sort itself, as other things might have changed between v0.21.0 and v0.22.0 🤷‍♂️

@deitch
Copy link
Author

deitch commented May 25, 2023

In the meantime, it should be easy to replicate with a graph that is [string, string], create a graph with maybe 600 nodes and a few edges among them, make sure it isn't cyclical, and then call TopologicalSort(). But I will try to recreate it.

@dominikbraun dominikbraun added bug Something isn't working help wanted Extra attention is needed labels May 27, 2023
@dominikbraun
Copy link
Owner

dominikbraun commented May 27, 2023

Thank you for filing this issue. I did some research and decided to proceed as follows:

I will create a patch release that reverts TopologicalSort to the old implementation so that it remains unaffected by the performance issues of StableTopolocialSort.

Then I'm going to take a look at the implementation of StableTopologicalSort and its performance penalties. I'm pretty sure the frequent calls to sort.Slice are the problem, and I will have to dig into its performance characteristics and time complexity first. Maybe there is a more time-efficient solution.

I've created a program that will create a disconnected directed graph consisting of simple vertex chains and run StableTopologicalSort on it.

package main

import (
	"github.com/dominikbraun/graph"
)

const (
	numberOfVertices = 12
	lengthOfChains   = 4
)

func main() {
	if numberOfVertices%lengthOfChains != 0 {
		panic("numberOfVertices must be divisible by lengthOfChains")
	}

	numberOfRows := numberOfVertices / lengthOfChains

	g := graph.New(graph.IntHash, graph.Directed())

	for i := 1; i <= numberOfVertices; i++ {
		_ = g.AddVertex(i)
	}

	vertex := 1

	for i := 1; i <= numberOfRows; i++ {
		for j := 1; j <= lengthOfChains; j++ {
			if j < lengthOfChains {
				if err := g.AddEdge(vertex, vertex+1); err != nil {
					panic(err)
				}
			}
			vertex++
		}
	}

	_, _ = graph.StableTopologicalSort(g, func(a, b int) bool {
		return a < b
	})
}

The number of vertices and the chain length can be set using numberOfVertices and lengthOfChains. Note that numberOfVertices has to be a multiple of lengthOfChains. Maybe this is enough for testing the performance.

Once this is fixed, I'll release another patch release containing the optimized implementation.

@deitch
Copy link
Author

deitch commented May 28, 2023

I created a slight variant that has a much larger graph, and includes generating a profile file:

package main

import (
	"fmt"
	"log"
	"os"
	"runtime/pprof"

	"github.com/dominikbraun/graph"
)

const (
	numberOfVertices = 500
	lengthOfChains   = 125
	profFile         = "/tmp/cpu.prof"
)

func main() {
	if numberOfVertices%lengthOfChains != 0 {
		panic("numberOfVertices must be divisible by lengthOfChains")
	}

	numberOfRows := numberOfVertices / lengthOfChains

	g := graph.New(graph.IntHash, graph.Directed())

	for i := 1; i <= numberOfVertices; i++ {
		_ = g.AddVertex(i)
	}

	vertex := 1

	for i := 1; i <= numberOfRows; i++ {
		for j := 1; j <= lengthOfChains; j++ {
			if j < lengthOfChains {
				if err := g.AddEdge(vertex, vertex+1); err != nil {
					panic(err)
				}
			}
			vertex++
		}
	}

	f, err := os.Create(profFile)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	pprof.StartCPUProfile(f)
	defer pprof.StopCPUProfile()

	_, _ = graph.StableTopologicalSort(g, func(a, b int) bool {
		return a < b
	})
	fmt.Printf("saved output profile to %s\n", profFile)
}

Even with a 500-node graph, it still is giving only 250ms to sort the graph. How strange that mine give orders of magnitude more. And I was careful here to profile only starting before the sort, not before the creation.

I wonder if it could have to do with the node type? I will try that next.

@deitch
Copy link
Author

deitch commented May 28, 2023

FYI, I tried adding graph.Acyclic() and graph.PreventCycles(), didn't have a material impact

@deitch
Copy link
Author

deitch commented May 28, 2023

I modified it to give different string sizes, and different chain sizes. Here are some interesting outputs.

number of vertices: 500
length of chains: 125
string 10 sort time was 1.972775917s
string 20 sort time was 1.175261542s
int sort time was 278.591834ms

number of vertices: 500
length of chains: 25
string 10 sort time was 1.518682292s
string 20 sort time was 1.608444458s
int sort time was 1.466619s

number of vertices: 500
length of chains: 25
string 10 sort time was 1.618567583s
string 20 sort time was 2.099522875s
int sort time was 1.544102917s

number of vertices: 500
length of chains: 10
string 10 sort time was 1.839373541s
string 20 sort time was 1.667635875s
int sort time was 3.910998084s

We can try and get that into a table, but basically:

  • long chains provide better performance that shorter chains in int
  • long vs short seems a bit similar for string

No idea what is going on. I can try and convert this into a general testing regimen program.

here is the basic one I am using:

package main

import (
	"fmt"
	"log"
	"math/rand"
	"os"
	"runtime/pprof"
	"time"

	"github.com/dominikbraun/graph"
)

const (
	numberOfVertices = 500
	lengthOfChains   = 10
	profFile         = "/tmp/cpu.prof"
)

var letters = []rune("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")

func randSeq(n int) string {
	b := make([]rune, n)
	for i := range b {
		b[i] = letters[rand.Intn(len(letters))]
	}
	return string(b)
}

func main() {
	if numberOfVertices%lengthOfChains != 0 {
		panic("numberOfVertices must be divisible by lengthOfChains")
	}

	stringg10, stringComp10 := stringGraph(10)
	stringg20, stringComp20 := stringGraph(20)
	intg, intComp := intGraph()

	f, err := os.Create(profFile)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	pprof.StartCPUProfile(f)
	defer pprof.StopCPUProfile()

	string10Start := time.Now()
	_, _ = graph.StableTopologicalSort(stringg10, stringComp10)
	string10Duration := time.Since(string10Start)
	string20Start := time.Now()
	_, _ = graph.StableTopologicalSort(stringg20, stringComp20)
	string20Duration := time.Since(string20Start)
	intStart := time.Now()
	_, _ = graph.StableTopologicalSort(intg, intComp)
	intDuration := time.Since(intStart)

	fmt.Printf("number of vertices: %d\n", numberOfVertices)
	fmt.Printf("length of chains: %d\n", lengthOfChains)
	fmt.Printf("string 10 sort time was %s\n", string10Duration)
	fmt.Printf("string 20 sort time was %s\n", string20Duration)
	fmt.Printf("int sort time was %s\n", intDuration)
	fmt.Printf("saved output profile to %s\n", profFile)
}

func stringGraph(size int) (graph.Graph[string, string], func(a, b string) bool) {
        rand.Seed(time.Now().UnixNano())

        numberOfRows := numberOfVertices / lengthOfChains

        g := graph.New(graph.StringHash, graph.Directed(), graph.Acyclic(), graph.PreventCycles())

        var vertices []string
        for i := 0; i < numberOfVertices; i++ {
                vertex := randSeq(size)
                vertices = append(vertices, vertex)
                _ = g.AddVertex(vertex)
        }

        vertex := 0

        for i := 0; i < numberOfRows; i++ {
                for j := 0; j < lengthOfChains; j++ {
                        if vertex < len(vertices) - 1 {
                                if err := g.AddEdge(vertices[vertex], vertices[vertex+1]); err != nil {
                                        panic(err)
                                }
                        }
                        vertex++
                }
        }
	return g, func(a, b string) bool {
		return a < b
	}
}


func intGraph() (graph.Graph[int, int], func(a, b int) bool) {
        numberOfRows := numberOfVertices / lengthOfChains

        g := graph.New(graph.IntHash, graph.Directed(), graph.Acyclic(), graph.PreventCycles())

        for i := 1; i <= numberOfVertices; i++ {
                _ = g.AddVertex(i)
        }

	vertex := 1

	for i := 1; i <= numberOfRows; i++ {
		for j := 1; j <= lengthOfChains; j++ {
			if j < lengthOfChains {
				if err := g.AddEdge(vertex, vertex+1); err != nil {
					panic(err)
				}
			}
			vertex++
		}
	}


	return g, func(a, b int) bool {
		return a < b
	}
}

@deitch
Copy link
Author

deitch commented May 29, 2023

FWIW, I tried replacing the less argument with a sort.Interface arg so I could call sort.Sort() instead of sort.Slice(). It seems to shave off about 15%, not insignificant, but not a ton either.

Here is part of the updated dag.go, in case it is helpful.

func TopologicalSort[K comparable, T any](g Graph[K, T]) ([]K, error) {
        return StableTopologicalSort(g, nil)
}

func StableTopologicalSort[K comparable, T any](g Graph[K, T], sorter func([]K)  sort.Interface) ([]K, error) {
        if !g.Traits().IsDirected {
                return nil, fmt.Errorf("topological sort cannot be computed on undirected graph")
        }

        predecessorMap, err := g.PredecessorMap()
        if err != nil {
                return nil, fmt.Errorf("failed to get predecessor map: %w", err)
        }

        queue := make([]K, 0)

        for vertex, predecessors := range predecessorMap {
                if len(predecessors) == 0 {
                        queue = append(queue, vertex)
                }
        }

        order := make([]K, 0, len(predecessorMap))
        visited := make(map[K]struct{})

        if sorter != nil {
                sort.Stable(sorter(queue))
        }

        for len(queue) > 0 {
                currentVertex := queue[0]
                queue = queue[1:]

                if _, ok := visited[currentVertex]; ok {
                        continue
                }

                order = append(order, currentVertex)
                visited[currentVertex] = struct{}{}

                for vertex, predecessors := range predecessorMap {
                        delete(predecessors, currentVertex)

                        if len(predecessors) == 0 {
                                queue = append(queue, vertex)
                                if sorter != nil {
                                        sort.Stable(sorter(queue))
                                }
                        }
                }
        }
       // skipping lots of lines
}

You can replace sort.Stable() with sort.Sort() for the other implementation.

In case the updated graphs are helpful, here they are:

sort.Stable:

graph-sort-stable

sort.Sort:

graph-sort-sort

sort.Slice (current implementation):

graph-sort-slice

Finally, I decided to try the experimental new generic sorting package at https://pkg.go.dev/golang.org/x/exp/slices#SortFunc

graph-slices-new

The signatures for the last one aren't changed at all, just the content of StableToplogicalSort():

import (
        "golang.org/x/exp/slices"
)

// ...
        if less != nil {
                slices.SortFunc(queue, less)
        }

instead of sort.Slice() in the 2 places is it called.

These all seem to help. It isn't too bad getting it from 208s to 173-179s, and then down to 158s. For a true performance test, I would need to run each one hundreds of times. I hope this helps.

@dominikbraun
Copy link
Owner

graph v0.22.1 reverts the implementation of TopologicalSort.

@deitch
Copy link
Author

deitch commented Jun 5, 2023

Do we want to pursue any of those Sort alternatives?

@dominikbraun
Copy link
Owner

@deitch Sorry for the late feedback, I've been busy in my new job the last couple of days. I've re-opened the issue because I'd like to keep track of the performance and maybe go for a ideal or close-to-ideal implementation using a binary heap for the internal queue, as stated in #131.

@deitch
Copy link
Author

deitch commented Jun 6, 2023

Congratulations. That's a good reason. Share your LinkedIn profile here for a minute (can delete later), and I'll connect.

@dominikbraun
Copy link
Owner

@deitch
Copy link
Author

deitch commented Jun 6, 2023

Connection sent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants