[WIP] More robust Updates verification by knewmanTE · Pull Request #2536 · TechEmpower/FrameworkBenchmarks

knewmanTE · 2017-01-31T19:27:24Z

Currently, our verification for the /updates endpoints is the same as the /queries endpoints, so it is checking for properly formatted JSON, but doesn't actually do any checks to confirm that the database itself is being updated.

This pull request, which currently works with MySQL, Postgres, and MongoDB, pulls a current copy of the relevant World table and compares the updated values with their previously stored values. It throws a FAIL if none of them have been updated, a PASS if all of them have been updated, and a WARN if more then 5% of them remain unchanged. The reason for this 5% is to allow for the 1 in 10,000 chance that an entry is updated to the same value it previously held.

…base

knewmanTE · 2017-01-31T20:28:18Z

Initial Travis test seems to indicate that Elixer/Phoenix isn't updating the Postgres database properly.

knewmanTE · 2017-01-31T20:57:34Z

Java/curacao fail seems unrelated, given that it doesn't have any database tests. Possibly related to the apt-get key error that we occasionally get on Travis?

knewmanTE · 2017-02-01T01:47:30Z

Okay, so of the tests failing on Travis, I'm seeing the following:

Failing because it isn't updating World table
- Elixer/Phoenix
- Python/aiohttp
- Python/api_hour
Failing because its only partially updating World table
- Javascript/express (only failing on 1 update test, so maybe a fluke)
- Javascript/nodejs-mongodb-raw (all updates failing, except for the 500 queries test, which is updating 17 of the 500 items)
- PHP/php-php5-raw and PHP/php-php7-raw (both are only failing on the empty string, 0, and "foo" parameters, which feels maybe indicative of a larger problem with the test implementation, since the likelihood of failing to update Specifically on those 3 tests feels like more than a coincidence)
- PHP/phreeze (failing on "foo" and 0 queries, so might be a coincidence or might be related to the issue with php-php5-raw and php-php7-raw)
Failing for unrelated reasons
- Lua/octopus
- Perl/dancer
- Perl/plack

One problem that does seem a bit hard to address is what we re seeing in Javascript/express. Because most of the queries are only for 1 or 2 items, even a single instance of a row being set to its same randomNumber will cause a FAIL, because my verification test will see it as 100% of the items for that endpoint were not properly updated. I'm open to any feedback or thoughts for how to handle this. I could prevent FAILs for requests that return only 1 or 2 items, and instead just throw a WARN. The downside of this is that it would mean the only test that really has any weight in determining whether a framework passes or fails is the 501 items, but maybe that's a good thing?

NateBrady23 · 2017-02-01T15:17:01Z

I think that's fine actually, to only do the check with the 5% margin of error on the 500 updates one. With the amount of testing we do with travis, we probably don't want to get the occasional error on the smaller updates. Thinking ahead to continuous benchmarking, we don't want false failing there either.

Ping @msmith-techempower

NateBrady23 · 2017-02-01T15:26:48Z

Correct me if I'm wrong @knewmanTE, it looks like you're verifying whether or not the updates were done by looking at the json response from the framework. Since you've already built the get_current_world_table function, maybe we should verify that the updates happened by checking the database and verifying that the json response is correct. By just looking at the json response, a framework might still not be persisting the update and just sending back a correct response. For instance, I think in the Elixir/phoenix case, they're actually persisting the update but sending back the wrong json response. I think we should validate both.

… on 501 query test

knewmanTE · 2017-02-01T19:13:07Z

Just pushed an update that does a few things differently:

directly reads the database before and after to compare, rather than relying on the response body
only run the verification on the 501 query test (or rather, on the test that pushes the MAX query size, in case we ever change that)
set the thresholds at the following:
- 0 items updated --> FAIL
- <95% items updated --> FAIL
- <99% items updated --> WARN

knewmanTE · 2017-02-01T21:52:51Z

Hmm, looks like a lot of MongoDB tests are failing now saying that the DB wasn't changed. I will investigate. Anyone know of any issues or delays that MongoDB might have with updating the DB immediately?

greenlaw110 · 2017-02-01T23:22:02Z

@knewmanTE I just checked the build log https://travis-ci.org/TechEmpower/FrameworkBenchmarks/jobs/197405309 and it looks like actframework's mongo test also failed. I will check on my local environment

greenlaw110 · 2017-02-01T23:51:35Z

@knewmanTE I did some test on my local environment and found the data has been changed, here is what I've done:

log the World instance before update it:
run a test with 1 record update:
check the log:
check mongodb:

So it proved that the mongodb data has been updated

I think it might be related to the testing procedure

knewmanTE · 2017-02-01T23:56:50Z

@greenlaw110 yeah, my guess is something got a bit goofed up when I switched from comparing the response body to comparing the database itself. Hopefully I'll have a concrete answer and fix to share soon.

knewmanTE · 2017-02-02T00:57:17Z

Okay, after a bit of testing, I'm noticing two potentially related, but maybe separate issues:

First, the after-the-update database that I am loading isn't updating properly. It happens mostly on MongoDB tests, but I am registering no changes between the before and after database. However,this only happening on some tests. For example, Java/dropwizard-mongodb is returning the exact same database as the before-the-update database, whereas Javascript/hapi is updating mostly fine.

The second issue is that Javascript/hapi is consistently only registering about 490-495 of the ~500 updates it should be making. The odds of consistently re-setting 5-10 items to the value they had before is fairly low, as each item has a 1/10,000 chance of that happening, so we should only be seeing a single re-set about 1 out of every 20 runs.

As a sanity check against a race condition occurring between the frameworks setting values in the database and my script reading the values, I set a manual 20 second sleep before fetching the after-the-update database and am still seeing both issues.

So that's where I'm at right now. I'll continue investigating to see if I can discern anything else.

knewmanTE · 2017-02-02T01:03:39Z

Aha! I believe I've at least figured out the Dropwizard issue.

For whatever reason, some of our MongoDB tests use lowercase table names (world, fortune) while others use uppercase table names (World, Fortune). My test is only checking against the lowercase World table, but it's my guess that all of the breaking frameworks are ones that expect an uppercase table name.

Switching my test to use World for dropwizard gives me the same result as the hapi test (only detecting 493/500 updates).

Unfortunately, as far as I know, there isn't an easy way to determine whether a framework is using the world or World table outside of the framework's implementation itself.

knewmanTE · 2017-02-02T01:37:46Z

Okay, I think I've gotten to the bottom of the other issue. I think it might just have to do with weak pseudo-random number generators. For example, in a recent test of dropwizard-mongodb, even though the /update?queries=501 response body contained 500 entries, that is because it is returning a list of JSON objects, which doesn't account for duplicates. If you convert this list into a single JSON object, the duplicates generated by the framework merge together and the resulting object only had 488 unique items.

If this is the case (and I believe it is), my other concern is how to handle WARN vs. PASS responses. If pretty much every test is going to fail to update 500 unique elements, no test will receive a PASS for update, given that I am only issuing a PASS when there are at or very close to 500 successful updates (<1% margin of error). That being said, as long as the results are within the 5% margin of error (which they all seem to be), they will receive a WARN, which counts as an overall Travis pass.

So we could handle this a few different ways:

go back to comparing the response body (allowing for checking duplicate changes)
adjust the thresholds to provide a PASS if you are within 5%, a WARN if you are in a more liberal band (say, 5-10%) and a FAIL otherwise
leave it as is and accept that all/most tests will have a WARN, unless their language has a good pseudo-random number generator.

@nbrady-techempower I'm curious to hear your thoughts on the matter. Also, if you have any ideas on a reasonably efficient way to test against both MongoDB tables that doesn't involve just fetching all 10,000 items from both the world and World tables, I'd love to hear them.

greenlaw110 · 2017-02-02T02:13:54Z

@knewmanTE Good spot on!

I think it might be worth to specify the name of the tables/collections, say we want to make all table name/collection name be lowercase or vice verse.

knewmanTE · 2017-02-02T15:46:56Z

@greenlaw110 I think we tried to remove one set of the tables with #2237, but ran into some sort of issue where a number of MongoDB ORMs were automatically changing a table name to lower or uppercase.

knewmanTE · 2017-02-02T19:26:07Z

Opened #2545 to hopefully address the MongoDB table issue.

NateBrady23 · 2017-02-03T18:49:14Z

@greenlaw110 I don't know if you were working on the dropwizard tests, but it also appears that it's using mongo's findAndModify instead of making the two trips specified in the update test rules.

Edit: Maybe I'm wrong. It looks like it's findById and then findAndModify. Is findAndModify more overhead than just update?

greenlaw110 · 2017-02-03T21:52:39Z

@nbrady-techempower I checked dropwizard mongo db impl code and confirmed it is using mongo's findAndModify feature.

The action handler code:

The dao updateQueries code:

So it will use one trip to complete both find and modify operation. This tech is also used in Act's code:

I can fix Act's impl into

However I am not the author of dropwizard test set. You need to contact them

NateBrady23 · 2017-02-03T22:00:31Z

@greenlaw110 Thanks for looking in to that!!

greenlaw110 · 2017-02-03T22:12:28Z

@nbrady-techempower I think the best way to enforce two trip updates is to have the server send response as:

{"id": 1234, "oldRandomNumber": 3333, "randomNumber": 2222}

It is not possible to get the oldRandomeNumber if app use findAndModify

NateBrady23 · 2017-02-06T18:47:05Z

@knewmanTE I've restarted the perl frameworks because it looked like their package manager resource was temporarily unavailable. Everything with those seem fine now.

knewmanTE · 2017-02-07T18:19:32Z

Starting up a new run now that #2545 has been merged into master.

… collections

knewmanTE · 2017-02-07T21:12:53Z

Alright, still getting a bunch of fails. I just pushed some commits that fix a few things:

Java/vertx-web had the database for its MySQL test listed as Postgres, causing the update verification script to try to load the wrong database.
tweaked the warn/fail thresholds to match my second suggestion here.
add some checks when loading the MongoDB database where we can pull from the id or _id field, since it seems that some frameworks are still doing that thing where they remove the id field when updating an entry.

knewmanTE · 2017-02-08T01:40:38Z

Pushed a fix that should at least address a couple JavaScript bugs.

knewmanTE · 2017-02-08T02:22:27Z

So, most of the fails we're seeing at the moment are with the Update test. The big challenge will be determining which of the following cases we're dealing with:

There is something wrong with the verification test
The verification test is working as intended and because it is in place, we are now seeing all of the frameworks that aren't updating properly
A combination of 1 and 2, where some incorrect implementations are being caught but there is also something wrong with the verification test

I might end up getting kind of busy over the next week or two, so any help checking on broken test implementations or looking through the verification test code is greatly appreciated!

greenlaw110 · 2017-02-14T03:10:20Z

@knewmanTE I have checked the only Java failed case: Java/permeagility and found:

the app comment out the update statement in the update test:
.
I always get 404 when requesting to /updates?queries= endpoint, comment/uncomment doesn't make difference
the app is using OrientDB as per https://sourceforge.net/projects/permeagility/

knewmanTE · 2017-02-14T21:38:58Z

@greenlaw110 thanks for the extra pair of eyes! I had started a discussion with the Permeagility maintainer about his framework, but it doesn't look like he's properly added OrientDB to our suite yet. Currently, it's building an OrientDB database on the same machine as the app server, rather than building it isolated on the DB machine, so I'm okay with that failing since its database implementation needs an overhaul in general.

My Update verification currently only works for Postgres, MySQL, and MongoDB, but it shouldn't be too hard to add additional databases as they are added to the suite.

msmith-techempower · 2018-01-09T19:26:24Z

This has been open over a year... let's close the PR for now and if someone has time, they can pick it up later.

knewmanTE · 2018-01-09T19:35:50Z

hopefully one day we can get this working.

* remove unused test directory * move toolset print statements to functions * include changes from #2536 * During verify updates, check for `world` or `World` table updates (postgres) * make sure postgres is checking both world tables

* remove unused test directory * move toolset print statements to functions * include changes from TechEmpower#2536 * During verify updates, check for `world` or `World` table updates (postgres) * make sure postgres is checking both world tables

knewmanTE added 8 commits January 30, 2017 12:33

added a test to confirm that an update test actually changed the data…

d7a8173

…base

initial test at postgres verification

65b58f5

update verification working for mysql and postgres

d833705

added MongoDB update verification

5792a54

specify DBHOST when connecting to MongoClient

5b512cd

fix MySQL-python package name

0d9a739

add python-dev dependency

06e0c26

don't load World table for Queries requests

0180da1

knewmanTE added the Enhance: Toolset label Jan 31, 2017

knewmanTE changed the title ~~More robust updates verification~~ More robust Updates verification Jan 31, 2017

NateBrady23 mentioned this pull request Feb 1, 2017

Fix updates test for Elixir/phoenix #2539

Merged

directly compare database before and after, and only run verification…

24cc678

… on 501 query test

knewmanTE mentioned this pull request Feb 2, 2017

Remove capitalized mongo tables #2545

Merged

knewmanTE changed the title ~~More robust Updates verification~~ [WIP] More robust Updates verification Feb 2, 2017

knewmanTE mentioned this pull request Feb 3, 2017

correcting returned value in flask-mysql-raw - updates #2551

Closed

Merge branch 'master' into more-robust-updates-verification

fce1678

knewmanTE added 4 commits February 7, 2017 12:39

only add item to mongodb JSON if it has id and randomNumber keys

8d61314

account for frameworks that remove the id key when updating the mongo…

8fb58d1

… collections

tweaked update verification threshold

25b9aef

fixed an error in DB name on vertx-web benchmark_config

6f87f91

knewmanTE added 2 commits February 7, 2017 14:49

fix express mongo test

35ab821

fix nodejs-mongodb-raw update

37c3fde

check for existence of dict keys before trying to access them

de0f0e3

NateBrady23 mentioned this pull request Feb 14, 2017

Act optimize update test #2569

Merged

NateBrady23 mentioned this pull request May 23, 2017

Verify the dart-start and dart-stream results #819

Closed

NateBrady23 mentioned this pull request May 30, 2017

Fix express & nodejs update tests #2833

Merged

msmith-techempower closed this Jan 9, 2018

NateBrady23 added a commit to NateBrady23/FrameworkBenchmarks that referenced this pull request Jan 26, 2018

include changes from TechEmpower#2536

aefe797

NateBrady23 mentioned this pull request Jan 26, 2018

General Toolset Improvements (WIP) #3210

Merged

11 tasks

Conversation

knewmanTE commented Jan 31, 2017

Uh oh!

knewmanTE commented Jan 31, 2017

Uh oh!

knewmanTE commented Jan 31, 2017

Uh oh!

knewmanTE commented Feb 1, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

NateBrady23 commented Feb 1, 2017

Uh oh!

NateBrady23 commented Feb 1, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

knewmanTE commented Feb 1, 2017

Uh oh!

knewmanTE commented Feb 1, 2017

Uh oh!

greenlaw110 commented Feb 1, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

greenlaw110 commented Feb 1, 2017

Uh oh!

knewmanTE commented Feb 1, 2017

Uh oh!

knewmanTE commented Feb 2, 2017

Uh oh!

knewmanTE commented Feb 2, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

knewmanTE commented Feb 2, 2017

Uh oh!

greenlaw110 commented Feb 2, 2017

Uh oh!

knewmanTE commented Feb 2, 2017

Uh oh!

knewmanTE commented Feb 2, 2017

Uh oh!

NateBrady23 commented Feb 3, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

greenlaw110 commented Feb 3, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

NateBrady23 commented Feb 3, 2017

Uh oh!

greenlaw110 commented Feb 3, 2017

Uh oh!

NateBrady23 commented Feb 6, 2017

Uh oh!

knewmanTE commented Feb 7, 2017

Uh oh!

knewmanTE commented Feb 7, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

knewmanTE commented Feb 8, 2017

Uh oh!

knewmanTE commented Feb 8, 2017

Uh oh!

greenlaw110 commented Feb 14, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

knewmanTE commented Feb 14, 2017

Uh oh!

msmith-techempower commented Jan 9, 2018

Uh oh!

knewmanTE commented Jan 9, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

knewmanTE commented Feb 1, 2017 •

edited

Loading

NateBrady23 commented Feb 1, 2017 •

edited

Loading

greenlaw110 commented Feb 1, 2017 •

edited

Loading

knewmanTE commented Feb 2, 2017 •

edited

Loading

NateBrady23 commented Feb 3, 2017 •

edited

Loading

greenlaw110 commented Feb 3, 2017 •

edited

Loading

knewmanTE commented Feb 7, 2017 •

edited

Loading

greenlaw110 commented Feb 14, 2017 •

edited

Loading