SHACL - Larger DCAT benchmark #1375

Closed
hmottestad opened this issue Apr 1, 2019 · 6 comments

@hmottestad
Contributor

Use the DCAT file that Bart mentions to create a new benchmark.

We are interested in:

  • how long it takes to validate that single file
  • how long it takes to validate an additive transaction
  • how long it takes to update in a transaction
  • how long it takes to delete something in a transaction
@hmottestad
Contributor Author

Current benchmarks for additive transactions are not that great:


Benchmark                                                          Mode  Cnt     Score    Error  Units
ComplexLargeBenchmark.shaclCachePreloaded                          avgt   10  2081.310 ± 24.014  ms/op
ComplexLargeBenchmark.shaclNothingToValidateTransactionsPreloaded  avgt   10    19.176 ±  2.021  ms/op
ComplexLargeBenchmark.shaclParallelCachePreloaded                  avgt   10  1049.082 ± 34.395  ms/op
ComplexLargeBenchmark.shaclParallelPreloaded                       avgt   10  1057.872 ± 21.679  ms/op
ComplexLargeBenchmark.shaclPreloaded                               avgt   10  2074.236 ± 17.923  ms/op

They should all be ~50 ms.
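
The check against the ~50 ms target can be automated. The following is a minimal sketch, not part of the rdf4j code base: the class `TargetCheck` and its methods are hypothetical names, and it assumes each result row has the column layout shown in the table above (name, mode, count, score, plus/minus sign, error, units).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TargetCheck {

    /** One parsed row of a JMH result table (times in ms/op). */
    static final class Result {
        final String name;
        final double score;
        final double error;

        Result(String name, double score, double error) {
            this.name = name;
            this.score = score;
            this.error = error;
        }
    }

    /** Parses a row laid out as: name, mode, count, score, plus/minus sign, error, units. */
    static Result parse(String line) {
        String[] cols = line.trim().split("\\s+");
        return new Result(cols[0], Double.parseDouble(cols[3]), Double.parseDouble(cols[5]));
    }

    /** Returns the rows whose average time is above the target. */
    static List<Result> overTarget(List<String> lines, double targetMs) {
        List<Result> slow = new ArrayList<>();
        for (String line : lines) {
            Result r = parse(line);
            if (r.score > targetMs) {
                slow.add(r);
            }
        }
        return slow;
    }

    public static void main(String[] args) {
        // Two rows copied from the table above; \u00b1 is the plus/minus sign.
        List<String> rows = Arrays.asList(
            "ComplexLargeBenchmark.shaclCachePreloaded  avgt  10  2081.310 \u00b1 24.014  ms/op",
            "ComplexLargeBenchmark.shaclNothingToValidateTransactionsPreloaded  avgt  10  19.176 \u00b1 2.021  ms/op");
        for (Result r : overTarget(rows, 50.0)) {
            System.out.printf("%s: %.3f ms/op exceeds the 50 ms target%n", r.name, r.score);
        }
    }
}
```

With the five rows above and a 50 ms target, every benchmark except `shaclNothingToValidateTransactionsPreloaded` is reported.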

hmottestad added a commit to eclipse/rdf4j-storage that referenced this issue Apr 1, 2019
…ex benchmark shacl rules

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>
@hmottestad
Contributor Author

hmottestad commented Apr 1, 2019

BulkedExternalXXXXXXXX needs to be optimised.

After some crude optimisation of the BulkedExternalInnerJoin code, I currently get these results:

Benchmark                                                          Mode  Cnt      Score     Error  Units
ComplexLargeBenchmark.noPreloading                                 avgt   10  14901.215 ± 202.276  ms/op
ComplexLargeBenchmark.shaclCachePreloaded                          avgt   10     58.958 ±   3.597  ms/op
ComplexLargeBenchmark.shaclNothingToValidateTransactionsPreloaded  avgt   10     18.150 ±   1.568  ms/op
ComplexLargeBenchmark.shaclParallelCachePreloaded                  avgt   10     47.101 ±   4.696  ms/op
ComplexLargeBenchmark.shaclParallelPreloaded                       avgt   10     52.670 ±  27.603  ms/op
ComplexLargeBenchmark.shaclPreloaded                               avgt   10     60.485 ±   6.346  ms/op
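
To put the two tables side by side, the sketch below computes the speedup of a benchmark between runs and the size of the error relative to the score. `BenchmarkDelta` and the 25% noise threshold are assumptions made for illustration only; the numbers are copied from the tables in this issue. For `shaclCachePreloaded` the ratio works out to roughly 35x, while the error of `shaclParallelPreloaded` is more than half of its score, so that single row should be read with care.

```java
class BenchmarkDelta {

    /** How many times faster the "after" run is than the "before" run. */
    static double speedup(double beforeMs, double afterMs) {
        return beforeMs / afterMs;
    }

    /** Error as a fraction of the score; large values indicate a noisy measurement. */
    static double relativeError(double score, double error) {
        return error / score;
    }

    public static void main(String[] args) {
        // shaclCachePreloaded before and after the BulkedExternalInnerJoin optimisation
        System.out.printf("speedup: %.1fx%n", speedup(2081.310, 58.958));

        // shaclParallelPreloaded after the optimisation: 52.670 plus/minus 27.603 ms/op
        double rel = relativeError(52.670, 27.603);
        if (rel > 0.25) { // hypothetical threshold, chosen only for this example
            System.out.printf("noisy result: error is %.0f%% of the score%n", rel * 100);
        }
    }
}
```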


@hmottestad
Contributor Author

For comparison, this is how the same benchmarks perform against an empty store:

Benchmark                                                        Mode  Cnt   Score   Error  Units
ComplexBenchmark.shaclCache                                      avgt   10  25.392 ± 0.569  ms/op
ComplexBenchmark.shaclNothingToValidateTransactions              avgt   10  12.722 ± 0.341  ms/op
ComplexBenchmark.shaclParallelCache                              avgt   10  21.101 ± 0.408  ms/op
ComplexBenchmark.shaclParallel                                   avgt   10  21.380 ± 0.192  ms/op
ComplexBenchmark.shacl                                           avgt   10  26.014 ± 0.690  ms/op

hmottestad added a commit to eclipse/rdf4j-storage that referenced this issue Apr 1, 2019
Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>
hmottestad added a commit to eclipse/rdf4j-storage that referenced this issue Apr 2, 2019
…ex benchmark shacl rules

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>
hmottestad added a commit to eclipse/rdf4j-storage that referenced this issue Apr 2, 2019
Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>
hmottestad added a commit to eclipse/rdf4j-storage that referenced this issue Apr 2, 2019
…ex benchmark shacl rules and optimisations

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>
@hmottestad
Contributor Author

hmottestad commented Apr 2, 2019

# Run complete. Total time: 00:39:52

Benchmark                                                          Mode  Cnt      Score     Error  Units
ComplexLargeBenchmark.noPreloading                                 avgt   10  14470.711 ±  92.773  ms/op
ComplexLargeBenchmark.noPreloadingRevalidate                       avgt   10  10439.221 ±  99.382  ms/op
ComplexLargeBenchmark.noPreloadingRevalidateNativeStore            avgt   10  33301.501 ± 966.897  ms/op
ComplexLargeBenchmark.shaclCachePreloaded                          avgt   10     61.326 ±   3.509  ms/op
ComplexLargeBenchmark.shaclNothingToValidateTransactionsPreloaded  avgt   10     19.108 ±   1.616  ms/op
ComplexLargeBenchmark.shaclParallelCacheDeletionPreloaded          avgt   10     22.147 ±   2.534  ms/op
ComplexLargeBenchmark.shaclParallelCachePreloaded                  avgt   10     49.944 ±   2.171  ms/op
ComplexLargeBenchmark.shaclParallelCacheUpdatePreloaded            avgt   10     21.414 ±   2.748  ms/op
ComplexLargeBenchmark.shaclParallelPreloaded                       avgt   10     50.138 ±   4.267  ms/op
ComplexLargeBenchmark.shaclPreloaded                               avgt   10     62.743 ±   3.999  ms/op

hmottestad added a commit to eclipse/rdf4j-storage that referenced this issue Apr 2, 2019
Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>
hmottestad added a commit to eclipse/rdf4j-storage that referenced this issue Apr 3, 2019
* develop:
  inner joing fix
  eclipse-rdf4j/rdf4j#1380 innerjoin
  eclipse-rdf4j/rdf4j#1380 improved maxCount performance
  eclipse-rdf4j/rdf4j#1378 simplified code
  eclipse-rdf4j/rdf4j#1378 simple improvements
  eclipse-rdf4j/rdf4j#1378 benchmarks
  eclipse-rdf4j/rdf4j#1375 updated benchmarks
  eclipse-rdf4j/rdf4j#1375 added large benchmark and adjusted the complex benchmark shacl rules and optimisations

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

# Conflicts:
#	shacl/src/main/java/org/eclipse/rdf4j/sail/shacl/planNodes/BulkedExternalInnerJoin.java
#	shacl/src/test/java/org/eclipse/rdf4j/sail/shacl/benchmark/ComplexBenchmark.java
@hmottestad hmottestad reopened this Apr 6, 2019
hmottestad added a commit to eclipse/rdf4j-storage that referenced this issue Apr 6, 2019
Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>
@hmottestad
Contributor Author

Updated results:

# Run complete. Total time: 01:05:10

Benchmark                                                          Mode  Cnt      Score      Error  Units
ComplexLargeBenchmark.noPreloading                                 avgt   10  14031.845 ±  345.734  ms/op
ComplexLargeBenchmark.noPreloadingRevalidate                       avgt   10  15821.461 ±  462.414  ms/op
ComplexLargeBenchmark.noPreloadingRevalidateNativeStore            avgt   10  41029.357 ± 1349.916  ms/op
ComplexLargeBenchmark.shaclCachePreloaded                          avgt   10     60.737 ±    4.200  ms/op
ComplexLargeBenchmark.shaclNothingToValidateTransactionsPreloaded  avgt   10     19.105 ±    0.846  ms/op
ComplexLargeBenchmark.shaclParallelCacheDeletionPreloaded          avgt   10     22.917 ±    2.300  ms/op
ComplexLargeBenchmark.shaclParallelCachePreloaded                  avgt   10     48.405 ±    3.458  ms/op
ComplexLargeBenchmark.shaclParallelCacheUpdatePreloaded            avgt   10     21.494 ±    6.385  ms/op
ComplexLargeBenchmark.shaclParallelPreloaded                       avgt   10     46.656 ±    2.268  ms/op
ComplexLargeBenchmark.shaclPreloaded                               avgt   10     59.353 ±    2.076  ms/op
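
When comparing these numbers with the run from Apr 2, one rough way to tell a real change from noise is to check whether the score ± error intervals of the two runs overlap. The sketch below assumes the Error column is the half-width of a confidence interval around the score, which is how JMH reports it as far as I know; `RunComparison` is a hypothetical helper, and interval overlap is only a heuristic, not a formal significance test.

```java
class RunComparison {

    /** True if [scoreA - errorA, scoreA + errorA] and [scoreB - errorB, scoreB + errorB] intersect. */
    static boolean intervalsOverlap(double scoreA, double errorA, double scoreB, double errorB) {
        double lowA = scoreA - errorA;
        double highA = scoreA + errorA;
        double lowB = scoreB - errorB;
        double highB = scoreB + errorB;
        return lowA <= highB && lowB <= highA;
    }

    public static void main(String[] args) {
        // noPreloadingRevalidate: Apr 2 run vs. the updated run
        boolean revalidate = intervalsOverlap(10439.221, 99.382, 15821.461, 462.414);
        // shaclPreloaded: Apr 2 run vs. the updated run
        boolean preloaded = intervalsOverlap(62.743, 3.999, 59.353, 2.076);

        System.out.println("noPreloadingRevalidate intervals overlap: " + revalidate);
        System.out.println("shaclPreloaded intervals overlap: " + preloaded);
    }
}
```

By this check the `noPreloadingRevalidate` intervals do not overlap, so the slowdown between the two runs is larger than the reported error, whereas the `shaclPreloaded` intervals do overlap.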

hmottestad added a commit to eclipse/rdf4j-storage that referenced this issue Apr 7, 2019
Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>
@hmottestad
Contributor Author

@barthanssens I'll bring the discussion over here instead of in the PR.

Do you have a list of the portals that you collect the data from?

hmottestad added a commit to eclipse/rdf4j-storage that referenced this issue Apr 12, 2019
Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>
hmottestad added a commit to eclipse/rdf4j-storage that referenced this issue Apr 13, 2019
Signed-off-by: Håvard Ottestad <hmottestad@gmail.com> (+4 squashed commits)
Squashed commits:
[74e7de9] small benchmark file

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>
[8d52fa1] eclipse-rdf4j/rdf4j#1375 parallel sort if list is big

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>
[d2d07bc] fixes for benchmark file that jena reported

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>
[8cd93c5] merge conflict fix

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>
hmottestad added a commit to eclipse/rdf4j-storage that referenced this issue Apr 13, 2019
…nd option for logging performance per shape during validation

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>
hmottestad added a commit to eclipse/rdf4j-storage that referenced this issue Apr 13, 2019
Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>
hmottestad added a commit to eclipse/rdf4j-storage that referenced this issue Apr 13, 2019
* develop: (40 commits)
  optimize imports and format
  eclipse-rdf4j/rdf4j#1375 bigger benchmark based on a generated file and option for logging performance per shape during validation
  potential bug fix and performance optimization
  formatter
  nicer query plans
  formatter
  eclipse-rdf4j/rdf4j#1388 configurable caching
  eclipse-rdf4j/rdf4j#1388 moved auto batched flushing
  moved added
  further reduce memory allowance in benchmark
  update benchmark
  eclipse-rdf4j/rdf4j#1388 clean up benchmark
  eclipse-rdf4j/rdf4j#1388 allow for cache less validation
  more utf8 and formatting
  eclipse-rdf4j/rdf4j#1388 periodically flush when loading big files in a single transaction without isolation
  formatting
  eclipse-rdf4j/rdf4j#1388 create benchmark
  clean up use of utf-8 and various other code cleanup
  eclipse-rdf4j/rdf4j#1384 bug fix
  eclipse-rdf4j/rdf4j#1384 optimize
  ...

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

# Conflicts:
#	shacl/pom.xml
#	shacl/src/main/java/org/eclipse/rdf4j/sail/shacl/AST/PropertyShape.java
#	shacl/src/main/java/org/eclipse/rdf4j/sail/shacl/ShaclSailConnection.java
#	shacl/src/test/java/org/eclipse/rdf4j/sail/shacl/benchmark/ComplexBenchmark.java
hmottestad added a commit to eclipse/rdf4j-storage that referenced this issue Apr 13, 2019
* develop: (57 commits)
  optimize imports and format
  eclipse-rdf4j/rdf4j#1375 bigger benchmark based on a generated file and option for logging performance per shape during validation
  potential bug fix and performance optimization
  formatter
  nicer query plans
  formatter
  eclipse-rdf4j/rdf4j#1388 configurable caching
  eclipse-rdf4j/rdf4j#1388 moved auto batched flushing
  moved added
  further reduce memory allowance in benchmark
  update benchmark
  eclipse-rdf4j/rdf4j#1388 clean up benchmark
  eclipse-rdf4j/rdf4j#1388 allow for cache less validation
  more utf8 and formatting
  eclipse-rdf4j/rdf4j#1388 periodically flush when loading big files in a single transaction without isolation
  formatting
  eclipse-rdf4j/rdf4j#1388 create benchmark
  clean up use of utf-8 and various other code cleanup
  eclipse-rdf4j/rdf4j#1384 bug fix
  eclipse-rdf4j/rdf4j#1384 optimize
  ...

Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>

# Conflicts:
#	shacl/src/test/java/org/eclipse/rdf4j/sail/shacl/Utils.java
hmottestad added a commit to eclipse/rdf4j-storage that referenced this issue Apr 15, 2019
Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>
hmottestad added a commit to eclipse/rdf4j-storage that referenced this issue Apr 15, 2019
Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>
hmottestad added a commit to eclipse/rdf4j-storage that referenced this issue Jul 4, 2019
Signed-off-by: Håvard Ottestad <hmottestad@gmail.com>