
mat: format complex dense matrices #1707

Status: Open · wants to merge 10 commits into master
Conversation

@wamuir commented Aug 12, 2021

Please take a look.

This PR adds a CFormat function in gonum/mat to format complex dense matrices; the function is intended to complement the current Format function. Issue #641 and PR #1414 are relevant, although that PR appears to have stalled. Further, PR #1414 takes the approach of wholesale duplicating format.go, which may not be preferred.
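
For context, a rough usage sketch is below; the exact CFormat signature and the options it accepts are assumptions modelled on the existing mat.Formatted helper, not necessarily the final API of this PR.

```go
package main

import (
	"fmt"

	"gonum.org/v1/gonum/mat"
)

func main() {
	c := mat.NewCDense(2, 2, []complex128{1 + 2i, 3, 0, -4i})

	// Assumption: CFormat mirrors Formatted, returning a fmt.Formatter
	// that wraps the complex matrix and honours the usual options.
	fc := mat.CFormat(c, mat.Prefix("    "), mat.Squeeze())
	fmt.Printf("c = %v\n", fc)
}
```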

Summary:

Following #1414 (comment), an objective was to avoid code duplication while implementing a formatter for complex matrices. To summarize the proposed approach: value formatters are added for formatting floats and complex numbers, and these are then used during matrix formatting. Since the value formatters satisfy the fmt.Formatter interface, fmt.State can be passed through directly, which simplifies several areas (e.g., re-construction of a format string is no longer necessary). As it would be inefficient to spin up a new formatter for each element in a matrix, a value formatter holds a buffer for reuse.
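
A minimal sketch of that idea follows; the names and details here are illustrative only, not necessarily the types in this PR.

```go
package mat

import (
	"fmt"
	"strconv"
)

// complexValue is an illustrative value formatter. It satisfies
// fmt.Formatter, so the matrix formatter can pass the fmt.State through
// directly instead of reconstructing a format string.
type complexValue struct {
	v   complex128
	buf []byte // reused across matrix elements to limit allocations
}

func (c *complexValue) Format(fs fmt.State, verb rune) {
	prec, ok := fs.Precision()
	if !ok {
		prec = -1
	}
	fchar := byte('g')
	switch verb {
	case 'e', 'E', 'f', 'g', 'G':
		fchar = byte(verb)
	}
	c.buf = c.buf[:0]
	c.buf = strconv.AppendFloat(c.buf, real(c.v), fchar, prec, 64)
	mark := len(c.buf)
	c.buf = strconv.AppendFloat(c.buf, imag(c.v), fchar, prec, 64)
	if c.buf[mark] != '-' {
		// Insert an explicit '+' before a non-negative imaginary part.
		c.buf = append(c.buf, 0)
		copy(c.buf[mark+1:], c.buf[mark:])
		c.buf[mark] = '+'
	}
	c.buf = append(c.buf, 'i')
	fs.Write(c.buf)
}
```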

To tackle widthing, a state type is proposed (satisfying fmt.State) that can be fitted to a matrix and that stores information on column widths, using the existing widther type. As before, this requires iterating through the matrix elements. After a call to At, the state responds to subsequent Width calls with the relevant column width. Recommendations toward improving the implementation are appreciated.
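
Roughly, the shape is something like the following (illustrative names only; the real type also needs the fitting pass and the remaining fmt.State methods):

```go
package mat

import "fmt"

// state is an illustrative fmt.State wrapper. It is first fitted to a
// matrix: one pass over the elements records the widest formatted value
// of each column in widths. During output, col tracks the column of the
// element most recently fetched with At, so Width reports that column's
// width and each element is padded to it.
type state struct {
	fmt.State       // delegate Precision, Flag and Write to the caller's state
	widths    []int // one recorded width per matrix column
	col       int   // column of the most recent At call
}

// Width overrides the embedded state's Width with the current column's width.
func (s *state) Width() (int, bool) {
	return s.widths[s.col], true
}
```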

Notwithstanding the need for additional internal types, this feels simpler to me on the whole, and I believe the approach should make it easier to extend to new syntaxes or data types, if desired. Thanks for reviewing; I will address feedback and comments.

Bench:

I've included a few benchmarks below, comparing the current and proposed implementations when formatting 10x10, 100x100 and 1000x1000 dense matrices (a sketch of the benchmark shape follows the table).

| implementation | benchmark | iterations | time/op | bytes/op | allocs/op |
|----------------|-----------|------------|---------|----------|-----------|
| current | BenchmarkFormat10 | 36134 | 33063 ns/op | 2693 B/op | 16 allocs/op |
| current | BenchmarkFormat100 | 358 | 3747403 ns/op | 1672025 B/op | 137 allocs/op |
| current | BenchmarkFormat1000 | 3 | 334034817 ns/op | 163557800 B/op | 1066 allocs/op |
| proposed | BenchmarkFormat10 | 22086 | 54554 ns/op | 2886 B/op | 19 allocs/op |
| proposed | BenchmarkFormat100 | 170 | 7072273 ns/op | 1670752 B/op | 49 allocs/op |
| proposed | BenchmarkFormat1000 | 2 | 697675398 ns/op | 163542896 B/op | 80 allocs/op |
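
For reference, a sketch of the shape of these benchmarks (assumed, not necessarily the exact benchmark code used here):

```go
package mat_test

import (
	"fmt"
	"io"
	"math/rand"
	"testing"

	"gonum.org/v1/gonum/mat"
)

// benchmarkFormat formats an n×n dense matrix once per iteration,
// discarding the output so only formatting cost is measured.
func benchmarkFormat(b *testing.B, n int) {
	data := make([]float64, n*n)
	for i := range data {
		data[i] = rand.NormFloat64()
	}
	m := mat.NewDense(n, n, data)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		fmt.Fprintf(io.Discard, "%v", mat.Formatted(m))
	}
}

func BenchmarkFormat10(b *testing.B)   { benchmarkFormat(b, 10) }
func BenchmarkFormat100(b *testing.B)  { benchmarkFormat(b, 100) }
func BenchmarkFormat1000(b *testing.B) { benchmarkFormat(b, 1000) }
```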

@codecov-commenter commented Aug 13, 2021

Codecov Report

Merging #1707 (5e086de) into master (85ca896) will increase coverage by 0.03%.
The diff coverage is 92.59%.

❗ Current head 5e086de differs from pull request most recent head 6ce02c8. Consider uploading reports for the commit 6ce02c8 to get more accurate results

@@            Coverage Diff             @@
##           master    #1707      +/-   ##
==========================================
+ Coverage   73.26%   73.29%   +0.03%     
==========================================
  Files         510      511       +1     
  Lines       60132    60152      +20     
==========================================
+ Hits        44056    44090      +34     
+ Misses      13584    13581       -3     
+ Partials     2492     2481      -11     
| Impacted Files | Coverage Δ |
|----------------|------------|
| mat/format_value.go | 84.41% <84.41%> (ø) |
| mat/format.go | 98.67% <98.21%> (+8.88%) ⬆️ |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 85ca896...6ce02c8. Read the comment docs.

@sbinet (Member) commented Aug 13, 2021

Just a drive-by comment: could you run the benchmarks before/after the proposed optimization a couple of times (say 10-20) and show the result of benchstat?
With the limited statistics you presented in the commit message, I get this result, which has rather poor statistical power:

$> benchstat ./ref.txt ./new.txt 
name        old time/op    new time/op    delta
Format10      33.1µs ± 0%    54.6µs ± 0%   ~     (p=1.000 n=1+1)
Format100     3.75ms ± 0%    7.07ms ± 0%   ~     (p=1.000 n=1+1)
Format1000     334ms ± 0%     698ms ± 0%   ~     (p=1.000 n=1+1)

name        old alloc/op   new alloc/op   delta
Format10      2.69kB ± 0%    2.89kB ± 0%   ~     (p=1.000 n=1+1)
Format100     1.67MB ± 0%    1.67MB ± 0%   ~     (p=1.000 n=1+1)
Format1000     164MB ± 0%     164MB ± 0%   ~     (p=1.000 n=1+1)

name        old allocs/op  new allocs/op  delta
Format10        16.0 ± 0%      19.0 ± 0%   ~     (p=1.000 n=1+1)
Format100        137 ± 0%        49 ± 0%   ~     (p=1.000 n=1+1)
Format1000     1.07k ± 0%     0.08k ± 0%   ~     (p=1.000 n=1+1)

i.e.:

$> go test -run=NONE -bench=Format -benchmem -count=20 |& tee ref.txt
## apply optimizations
$> go test -run=NONE -bench=Format -benchmem -count=20 |& tee new.txt
$> benchstat ./ref.txt ./new.txt

thanks.

PS: benchstat is available here: [golang.org/x/perf/benchstat](https://golang.org/x/perf/benchstat)
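
For example, it can be installed with (assuming a Go 1.16+ toolchain):

$> go install golang.org/x/perf/cmd/benchstat@latest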

@kortschak (Member) commented:

@wamuir Thank you for doing this. I've only taken a brief look, but it looks like nice work so far. I hope to be able to have a deeper look over the weekend.

[Resolved review threads (now outdated) on mat/format.go, mat/format_value.go, and mat/format_test.go.]
@wamuir (Author) commented Aug 13, 2021

> could you run the benchmarks before/after the proposed optimization a couple of times (say 10-20) and show the result of benchstat?

Thanks @sbinet, I wasn't aware of benchstat. Cool. Output is below. I added -benchtime=1x, because otherwise each sample appears to be a pooled mean over many iterations; if that's right, the reported dispersion of time/op (±) would be understated. I guess that's a nit on go test for not having a flag to output variance (or a nit on me for not being able to find it).

name        old time/op    new time/op    delta
Format10    44.2µs ±17%    73.9µs ±38%    +67.07%   (p=0.000 n=20+18)
Format100   3.28ms ± 3%    7.55ms ± 5%    +130.21%  (p=0.000 n=20+18)
Format1000   329ms ± 1%     764ms ± 2%    +132.17%  (p=0.000 n=20+20)

The increase in clock time is statistically significant. @kortschak provided review comments, which I'll work through over the weekend before rerunning the bench. One of these comments is to use micro-benchmarking, and I expect this will offer greater insight into the differences. Addressing his comments may also close the current gap (e.g., his comment regarding fmt.Fprint and reflect).
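
For reference, the invocation assumed for the samples above (only the -benchtime=1x flag differs from the commands quoted earlier):

$> go test -run=NONE -bench=Format -benchmem -benchtime=1x -count=20 |& tee new.txt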

@wamuir (Author) commented Aug 14, 2021

The most recent commit addresses the review comments, except for the switch to sub-benchmarks; I'll need to push a second commit to update the benchmarks. I attempted some further simplification of the format functions, which I think was productive, and I added comments along the way. Updated benchstat output is below.

name        old time/op    new time/op    delta
Format10    44.9µs ±10%    62.9µs ±31%    +40.08%  (p=0.000 n=18+20)
Format100   3.31ms ± 1%    4.27ms ± 1%    +29.20%  (p=0.000 n=18+20)
Format1000   326ms ± 1%     426ms ± 1%    +30.41%  (p=0.000 n=20+20)

@wamuir (Author) commented Aug 14, 2021

name                  old time/op    new time/op    delta
Format/10/General     52.6µs ±17%    68.8µs ±18%    +30.74%  (p=0.000 n=19+19)
Format/10/DotByte     49.5µs ±21%    67.9µs ±17%    +37.19%  (p=0.000 n=19+20)
Format/10/Excerpt     50.3µs ±11%    66.1µs ±21%    +31.40%  (p=0.000 n=19+20)
Format/10/Prefix      49.4µs ±10%    66.4µs ±17%    +34.54%  (p=0.000 n=17+20)
Format/10/Squeeze     53.0µs ±22%    69.7µs ±14%    +31.49%  (p=0.000 n=20+20)
Format/10/MATLAB      51.5µs ±12%    66.4µs ±12%    +28.79%  (p=0.000 n=20+19)
Format/10/MATLAB#     43.0µs ±18%    42.9µs ±23%       ~     (p=0.967 n=20+20)
Format/10/Python      50.4µs ±13%    64.0µs ± 9%    +26.99%  (p=0.000 n=20+19)
Format/10/Python#     42.4µs ±37%    43.1µs ±30%       ~     (p=0.757 n=20+20)
Format/100/General    3.13ms ± 7%    4.43ms ± 9%    +41.42%  (p=0.000 n=17+20)
Format/100/DotByte    3.94ms ± 1%    4.35ms ± 8%    +10.32%  (p=0.000 n=19+20)
Format/100/Excerpt    4.00ms ± 1%    4.54ms ± 6%    +13.45%  (p=0.000 n=20+20)
Format/100/Prefix     4.00ms ± 1%    5.61ms ± 1%    +40.23%  (p=0.000 n=20+16)
Format/100/Squeeze    4.42ms ± 1%    5.35ms ±18%    +21.01%  (p=0.000 n=14+20)
Format/100/MATLAB     4.68ms ±15%    6.46ms ±13%    +37.92%  (p=0.000 n=20+20)
Format/100/MATLAB#    2.91ms ± 3%    2.90ms ± 2%       ~     (p=0.247 n=20+20)
Format/100/Python     6.33ms ±16%    6.38ms ±35%       ~     (p=0.890 n=20+20)
Format/100/Python#    3.56ms ±13%    1.69ms ± 1%    -52.45%  (p=0.000 n=20+19)
Format/1000/General    306ms ± 0%     405ms ± 0%    +32.08%  (p=0.000 n=19+20)
Format/1000/DotByte    303ms ± 0%     392ms ± 0%    +29.54%  (p=0.000 n=20+20)
Format/1000/Excerpt    308ms ± 1%     405ms ± 0%    +31.76%  (p=0.000 n=20+20)
Format/1000/Prefix     307ms ± 0%     404ms ± 0%    +31.67%  (p=0.000 n=20+20)
Format/1000/Squeeze    307ms ± 0%     405ms ± 0%    +31.83%  (p=0.000 n=20+20)
Format/1000/MATLAB     307ms ± 0%     405ms ± 0%    +31.99%  (p=0.000 n=19+20)
Format/1000/MATLAB#    164ms ± 1%     164ms ± 1%       ~     (p=0.561 n=19+19)
Format/1000/Python     308ms ± 1%     406ms ± 1%    +31.86%  (p=0.000 n=20+18)
Format/1000/Python#    164ms ± 1%     164ms ± 1%       ~     (p=0.604 n=20+19)
[Geo mean]            3.74ms         4.43ms         +18.29%

@vladimir-ch (Member) commented:

This PR has gone a bit stale. @wamuir are you still interested in submitting this change?

@wamuir (Author) commented Aug 1, 2023

@vladimir-ch Yes. I think this needs another review. I can make revisions.
