Text interning #21

Open
wants to merge 2 commits into
base: laurent/optimizations
from

Conversation

laurenthuberdeau
Contributor

@laurenthuberdeau laurenthuberdeau commented Feb 24, 2022

This PR adds interning of text values using Kmett's intern library: https://hackage.haskell.org/package/intern-0.9.4.

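The benchmarks below show that the effect is marginal: the line, loop and tight benchmarks get roughly 4% faster at best, the clique benchmarks get up to about 6% slower, and parse stays within about 3% either way. That is not worth the complexity added by text interning, most of which concerns memory usage: the interning table has to be reset periodically in long-running processes.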
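To make the memory concern concrete, here is a minimal hand-rolled sketch of an intern table. It is not the intern package's implementation or API: `Interned`, `InternTable`, `internString`, `tableSize` and `resetTable` are hypothetical names used only for illustration, and it works on `String` rather than `Text` so that it needs nothing beyond base and containers. It shows why comparisons become cheap (only ids are compared) and why the table keeps growing until something resets it.

```haskell
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef, writeIORef)
import qualified Data.Map.Strict as Map

-- | An interned string: a small integer id plus the original value.
-- Equality and ordering look only at the id.
data Interned = Interned { internedId :: !Int, internedValue :: String }

instance Eq Interned where
  a == b = internedId a == internedId b

instance Ord Interned where
  compare a b = compare (internedId a) (internedId b)

-- | Hypothetical intern table: maps each distinct string to its id.
newtype InternTable = InternTable (IORef (Map.Map String Int))

newTable :: IO InternTable
newTable = InternTable <$> newIORef Map.empty

-- | Look the string up, allocating the next id if it has not been seen yet.
internString :: InternTable -> String -> IO Interned
internString (InternTable ref) s = atomicModifyIORef' ref $ \m ->
  case Map.lookup s m of
    Just i  -> (m, Interned i s)
    Nothing -> let i = Map.size m in (Map.insert s i m, Interned i s)

-- | Number of entries currently held; this only grows until the table is reset.
tableSize :: InternTable -> IO Int
tableSize (InternTable ref) = Map.size <$> readIORef ref

-- | Drop every entry. Values interned before a reset must not be compared
-- with values interned after it, because ids are handed out again from zero.
resetTable :: InternTable -> IO ()
resetTable (InternTable ref) = writeIORef ref Map.empty

main :: IO ()
main = do
  table <- newTable
  a <- internString table "foo"
  b <- internString table "foo"
  c <- internString table "bar"
  print (a == b, a == c)     -- (True,False)
  tableSize table >>= print  -- 2
  resetTable table
  tableSize table >>= print  -- 0
```

The comment on `resetTable` is where the extra complexity shows up in this sketch: a long-running process has to pick a point at which no previously interned values are still in use before it can clear the table.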

Before:

benchmarking clique/5  
time                 22.36 μs   (22.31 μs .. 22.41 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 22.42 μs   (22.37 μs .. 22.49 μs)
std dev              189.7 ns   (105.6 ns .. 338.8 ns)
                       
benchmarking clique/10 
time                 49.34 μs   (49.14 μs .. 49.56 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 49.37 μs   (49.26 μs .. 49.51 μs)
std dev              415.0 ns   (349.1 ns .. 520.3 ns)
                       
benchmarking clique/25 
time                 303.9 μs   (302.9 μs .. 304.7 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 303.7 μs   (302.9 μs .. 304.7 μs)
std dev              3.156 μs   (2.553 μs .. 4.061 μs)
                       
benchmarking clique/40 
time                 1.600 ms   (1.567 ms .. 1.643 ms)
                     0.996 R²   (0.995 R² .. 0.998 R²)
mean                 1.587 ms   (1.573 ms .. 1.608 ms)
std dev              56.38 μs   (47.58 μs .. 69.58 μs)
variance introduced by outliers: 23% (moderately inflated)
                       
benchmarking cliqueText/5
time                 22.57 μs   (22.54 μs .. 22.61 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 22.57 μs   (22.54 μs .. 22.61 μs)
std dev              112.7 ns   (88.75 ns .. 150.5 ns)
                       
benchmarking cliqueText/10
time                 51.02 μs   (50.89 μs .. 51.21 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 51.12 μs   (50.97 μs .. 51.30 μs)
std dev              543.4 ns   (447.1 ns .. 678.1 ns)
                       
benchmarking cliqueText/25
time                 326.8 μs   (325.6 μs .. 328.5 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 327.6 μs   (326.3 μs .. 329.1 μs)
std dev              4.839 μs   (3.978 μs .. 5.792 μs)
                       
benchmarking cliqueText/40
time                 1.656 ms   (1.650 ms .. 1.663 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.682 ms   (1.675 ms .. 1.689 ms)
std dev              25.50 μs   (21.87 μs .. 30.39 μs)
                       
benchmarking line/5    
time                 77.66 μs   (77.52 μs .. 77.84 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 77.70 μs   (77.64 μs .. 77.80 μs)
std dev              287.2 ns   (234.3 ns .. 390.2 ns)
                       
benchmarking line/10   
time                 264.7 μs   (264.3 μs .. 265.2 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 264.8 μs   (264.5 μs .. 265.0 μs)
std dev              857.7 ns   (695.0 ns .. 1.071 μs)
                       
benchmarking line/25   
time                 1.836 ms   (1.826 ms .. 1.853 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 1.827 ms   (1.819 ms .. 1.841 ms)
std dev              32.77 μs   (19.47 μs .. 52.54 μs)
                       
benchmarking line/40   
time                 5.703 ms   (5.683 ms .. 5.732 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 5.695 ms   (5.684 ms .. 5.710 ms)
std dev              37.23 μs   (26.37 μs .. 52.24 μs)
                       
benchmarking lineText/5
time                 78.06 μs   (77.94 μs .. 78.20 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 78.10 μs   (78.00 μs .. 78.27 μs)
std dev              431.1 ns   (336.9 ns .. 561.0 ns)
                       
benchmarking lineText/10
time                 265.9 μs   (265.4 μs .. 266.4 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 266.4 μs   (266.1 μs .. 266.9 μs)
std dev              1.177 μs   (889.8 ns .. 1.734 μs)
                       
benchmarking lineText/25
time                 1.843 ms   (1.839 ms .. 1.848 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.837 ms   (1.834 ms .. 1.843 ms)
std dev              15.03 μs   (9.223 μs .. 27.56 μs)
                       
benchmarking lineText/40
time                 5.780 ms   (5.744 ms .. 5.824 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 5.751 ms   (5.735 ms .. 5.777 ms)
std dev              60.19 μs   (39.69 μs .. 101.2 μs)
                       
benchmarking loop/50   
time                 465.2 μs   (463.8 μs .. 466.5 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 464.1 μs   (463.2 μs .. 465.0 μs)
std dev              3.211 μs   (2.612 μs .. 3.953 μs)
                       
benchmarking loop/100  
time                 1.361 ms   (1.358 ms .. 1.364 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.364 ms   (1.362 ms .. 1.367 ms)
std dev              8.221 μs   (6.735 μs .. 11.28 μs)
                       
benchmarking loop/500  
time                 21.00 ms   (20.94 ms .. 21.05 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 21.04 ms   (21.00 ms .. 21.07 ms)
std dev              84.67 μs   (63.56 μs .. 113.5 μs)
                       
benchmarking loop/1000 
time                 83.91 ms   (83.20 ms .. 84.55 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 83.88 ms   (83.56 ms .. 84.15 ms)
std dev              487.8 μs   (329.5 μs .. 733.2 μs)
                       
benchmarking tight/50  
time                 208.3 μs   (207.5 μs .. 209.2 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 209.1 μs   (208.4 μs .. 210.2 μs)
std dev              2.944 μs   (1.925 μs .. 5.091 μs)
                       
benchmarking tight/100 
time                 732.6 μs   (730.5 μs .. 735.1 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 737.7 μs   (735.4 μs .. 742.7 μs)
std dev              11.76 μs   (7.147 μs .. 20.15 μs)
                       
benchmarking tight/500 
time                 16.47 ms   (16.26 ms .. 16.62 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 17.22 ms   (16.98 ms .. 17.55 ms)
std dev              660.0 μs   (481.5 μs .. 902.5 μs)
variance introduced by outliers: 12% (moderately inflated)
                       
benchmarking tight/1000
time                 72.02 ms   (71.85 ms .. 72.18 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 71.84 ms   (71.72 ms .. 71.95 ms)
std dev              208.0 μs   (138.9 μs .. 329.0 μs)
                       
benchmarking parse/50  
time                 6.213 ms   (6.144 ms .. 6.282 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 6.196 ms   (6.163 ms .. 6.227 ms)
std dev              99.49 μs   (79.11 μs .. 134.6 μs)
                       
benchmarking parse/100 
time                 12.14 ms   (11.90 ms .. 12.42 ms)
                     0.998 R²   (0.996 R² .. 0.999 R²)
mean                 12.18 ms   (12.07 ms .. 12.30 ms)
std dev              294.8 μs   (249.4 μs .. 376.4 μs)
                       
benchmarking parse/500 
time                 66.92 ms   (66.43 ms .. 67.23 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 66.60 ms   (66.34 ms .. 66.80 ms)
std dev              416.1 μs   (244.2 μs .. 588.6 μs)
                       
benchmarking parse/1000
time                 135.6 ms   (133.9 ms .. 139.4 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 135.0 ms   (134.0 ms .. 136.1 ms)
std dev              1.732 ms   (1.234 ms .. 2.414 ms)
variance introduced by outliers: 11% (moderately inflated)

With interning:

benchmarking clique/5
time                 23.06 μs   (23.00 μs .. 23.11 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 23.33 μs   (23.25 μs .. 23.41 μs)
std dev              276.6 ns   (228.1 ns .. 342.8 ns)

benchmarking clique/10
time                 52.45 μs   (52.29 μs .. 52.57 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 52.70 μs   (52.54 μs .. 52.91 μs)
std dev              639.5 ns   (501.7 ns .. 809.0 ns)

benchmarking clique/25
time                 319.4 μs   (318.3 μs .. 320.6 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 319.7 μs   (318.6 μs .. 321.2 μs)
std dev              4.393 μs   (3.502 μs .. 5.486 μs)

benchmarking clique/40
time                 1.622 ms   (1.604 ms .. 1.645 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 1.605 ms   (1.598 ms .. 1.612 ms)
std dev              23.42 μs   (18.64 μs .. 32.66 μs)

benchmarking cliqueText/5
time                 23.52 μs   (23.45 μs .. 23.59 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 23.52 μs   (23.47 μs .. 23.59 μs)
std dev              192.6 ns   (147.9 ns .. 283.3 ns)

benchmarking cliqueText/10
time                 52.27 μs   (52.11 μs .. 52.41 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 52.23 μs   (52.11 μs .. 52.38 μs)
std dev              447.9 ns   (366.3 ns .. 533.3 ns)

benchmarking cliqueText/25
time                 334.5 μs   (332.5 μs .. 336.4 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 334.1 μs   (332.4 μs .. 335.6 μs)
std dev              5.565 μs   (4.758 μs .. 6.847 μs)

benchmarking cliqueText/40
time                 1.705 ms   (1.698 ms .. 1.712 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.691 ms   (1.684 ms .. 1.696 ms)
std dev              19.54 μs   (16.21 μs .. 24.18 μs)

benchmarking line/5
time                 78.87 μs   (78.75 μs .. 79.03 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 78.82 μs   (78.74 μs .. 78.94 μs)
std dev              317.5 ns   (265.7 ns .. 386.4 ns)

benchmarking line/10
time                 262.9 μs   (262.7 μs .. 263.2 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 263.4 μs   (263.2 μs .. 263.8 μs)
std dev              1.055 μs   (719.2 ns .. 1.753 μs)

benchmarking line/25
time                 1.769 ms   (1.764 ms .. 1.773 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.770 ms   (1.767 ms .. 1.774 ms)
std dev              10.84 μs   (9.263 μs .. 13.69 μs)

benchmarking line/40
time                 5.605 ms   (5.573 ms .. 5.641 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 5.563 ms   (5.548 ms .. 5.579 ms)
std dev              47.71 μs   (37.78 μs .. 61.05 μs)

benchmarking lineText/5
time                 79.11 μs   (78.95 μs .. 79.36 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 79.05 μs   (78.98 μs .. 79.18 μs)
std dev              335.1 ns   (209.2 ns .. 523.7 ns)

benchmarking lineText/10
time                 263.3 μs   (262.8 μs .. 263.7 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 263.5 μs   (263.2 μs .. 263.9 μs)
std dev              1.057 μs   (800.7 ns .. 1.504 μs)

benchmarking lineText/25
time                 1.769 ms   (1.766 ms .. 1.772 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.771 ms   (1.769 ms .. 1.774 ms)
std dev              10.31 μs   (7.899 μs .. 13.90 μs)

benchmarking lineText/40
time                 5.551 ms   (5.541 ms .. 5.561 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 5.545 ms   (5.535 ms .. 5.554 ms)
std dev              30.03 μs   (23.55 μs .. 37.81 μs)

benchmarking loop/50
time                 454.9 μs   (454.5 μs .. 455.4 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 456.5 μs   (455.8 μs .. 457.6 μs)
std dev              2.939 μs   (2.131 μs .. 4.595 μs)

benchmarking loop/100
time                 1.355 ms   (1.353 ms .. 1.358 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.352 ms   (1.349 ms .. 1.355 ms)
std dev              8.908 μs   (7.063 μs .. 12.98 μs)

benchmarking loop/500
time                 20.98 ms   (20.78 ms .. 21.16 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 20.70 ms   (20.63 ms .. 20.79 ms)
std dev              178.9 μs   (119.1 μs .. 262.4 μs)

benchmarking loop/1000
time                 81.85 ms   (81.63 ms .. 82.05 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 82.00 ms   (81.87 ms .. 82.18 ms)
std dev              279.7 μs   (161.3 μs .. 428.0 μs)

benchmarking tight/50
time                 199.8 μs   (199.7 μs .. 200.0 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 200.0 μs   (199.8 μs .. 200.2 μs)
std dev              728.8 ns   (573.4 ns .. 1.036 μs)

benchmarking tight/100
time                 706.2 μs   (704.7 μs .. 707.9 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 706.9 μs   (705.6 μs .. 708.3 μs)
std dev              4.614 μs   (3.753 μs .. 5.966 μs)

benchmarking tight/500
time                 16.36 ms   (16.32 ms .. 16.40 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 16.30 ms   (16.28 ms .. 16.33 ms)
std dev              69.30 μs   (57.14 μs .. 92.13 μs)

benchmarking tight/1000
time                 69.91 ms   (69.28 ms .. 70.25 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 70.41 ms   (70.09 ms .. 70.82 ms)
std dev              639.4 μs   (450.2 μs .. 823.7 μs)

benchmarking parse/50
time                 6.089 ms   (6.068 ms .. 6.108 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 6.065 ms   (6.047 ms .. 6.079 ms)
std dev              48.17 μs   (33.64 μs .. 76.37 μs)

benchmarking parse/100
time                 12.48 ms   (12.42 ms .. 12.54 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 12.49 ms   (12.44 ms .. 12.56 ms)
std dev              158.9 μs   (120.5 μs .. 236.7 μs)

benchmarking parse/500
time                 67.23 ms   (66.89 ms .. 67.55 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 67.20 ms   (66.98 ms .. 67.58 ms)
std dev              475.0 μs   (265.0 μs .. 754.7 μs)

benchmarking parse/1000
time                 135.2 ms   (134.8 ms .. 135.6 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 134.3 ms   (133.5 ms .. 134.6 ms)
std dev              793.0 μs   (284.7 μs .. 1.237 ms)
variance introduced by outliers: 11% (moderately inflated)
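
The two runs are easier to compare as relative changes. The sketch below computes them for a few of the "time" estimates quoted above; `relativeChange` and `samples` are names made up for this illustration, and the four rows are only a sample of the full results. Each row uses the same unit for both values.

```haskell
import Text.Printf (printf)

-- | Relative change of the run with interning versus the baseline, in percent.
-- Negative means the build with interning is faster.
relativeChange :: Double -> Double -> Double
relativeChange before after = (after - before) / before * 100

-- (benchmark name, time before, time with interning), copied from the
-- "time" lines of the two criterion runs in this PR.
samples :: [(String, Double, Double)]
samples =
  [ ("clique/10",   49.34, 52.45)
  , ("lineText/40", 5.780, 5.551)
  , ("tight/50",    208.3, 199.8)
  , ("parse/1000",  135.6, 135.2)
  ]

main :: IO ()
main = mapM_ report samples
  where
    report (name, before, after) =
      printf "%-12s %+.1f%%\n" name (relativeChange before after)
```

For these rows the change is about +6.3% for clique/10, -4.0% for lineText/40, -4.1% for tight/50 and -0.3% for parse/1000.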
