Skip to content
This repository was archived by the owner on Mar 24, 2026. It is now read-only.

[WIP] More robust Updates verification#2536

Closed
knewmanTE wants to merge 17 commits intoTechEmpower:masterfrom
knewmanTE:more-robust-updates-verification
Closed

[WIP] More robust Updates verification#2536
knewmanTE wants to merge 17 commits intoTechEmpower:masterfrom
knewmanTE:more-robust-updates-verification

Conversation

@knewmanTE
Copy link
Copy Markdown
Contributor

Currently, our verification for the /updates endpoints is the same as the /queries endpoints, so it is checking for properly formatted JSON, but doesn't actually do any checks to confirm that the database itself is being updated.

This pull request, which currently works with MySQL, Postgres, and MongoDB, pulls a current copy of the relevant World table and compares the updated values with their previously stored values. It throws a FAIL if none of them have been updated, a PASS if all of them have been updated, and a WARN if more then 5% of them remain unchanged. The reason for this 5% is to allow for the 1 in 10,000 chance that an entry is updated to the same value it previously held.

@knewmanTE knewmanTE changed the title More robust updates verification More robust Updates verification Jan 31, 2017
@knewmanTE
Copy link
Copy Markdown
Contributor Author

Initial Travis test seems to indicate that Elixer/Phoenix isn't updating the Postgres database properly.

@knewmanTE
Copy link
Copy Markdown
Contributor Author

Java/curacao fail seems unrelated, given that it doesn't have any database tests. Possibly related to the apt-get key error that we occasionally get on Travis?

@knewmanTE
Copy link
Copy Markdown
Contributor Author

knewmanTE commented Feb 1, 2017

Okay, so of the tests failing on Travis, I'm seeing the following:

  • Failing because it isn't updating World table
    • Elixer/Phoenix
    • Python/aiohttp
    • Python/api_hour
  • Failing because its only partially updating World table
    • Javascript/express (only failing on 1 update test, so maybe a fluke)
    • Javascript/nodejs-mongodb-raw (all updates failing, except for the 500 queries test, which is updating 17 of the 500 items)
    • PHP/php-php5-raw and PHP/php-php7-raw (both are only failing on the empty string, 0, and "foo" parameters, which feels maybe indicative of a larger problem with the test implementation, since the likelihood of failing to update Specifically on those 3 tests feels like more than a coincidence)
    • PHP/phreeze (failing on "foo" and 0 queries, so might be a coincidence or might be related to the issue with php-php5-raw and php-php7-raw)
  • Failing for unrelated reasons
    • Lua/octopus
    • Perl/dancer
    • Perl/plack

One problem that does seem a bit hard to address is what we re seeing in Javascript/express. Because most of the queries are only for 1 or 2 items, even a single instance of a row being set to its same randomNumber will cause a FAIL, because my verification test will see it as 100% of the items for that endpoint were not properly updated. I'm open to any feedback or thoughts for how to handle this. I could prevent FAILs for requests that return only 1 or 2 items, and instead just throw a WARN. The downside of this is that it would mean the only test that really has any weight in determining whether a framework passes or fails is the 501 items, but maybe that's a good thing?

@NateBrady23
Copy link
Copy Markdown
Member

I think that's fine actually, to only do the check with the 5% margin of error on the 500 updates one. With the amount of testing we do with travis, we probably don't want to get the occasional error on the smaller updates. Thinking ahead to continuous benchmarking, we don't want false failing there either.

Ping @msmith-techempower

@NateBrady23
Copy link
Copy Markdown
Member

NateBrady23 commented Feb 1, 2017

Correct me if I'm wrong @knewmanTE, it looks like you're verifying whether or not the updates were done by looking at the json response from the framework. Since you've already built the get_current_world_table function, maybe we should verify that the updates happened by checking the database and verifying that the json response is correct. By just looking at the json response, a framework might still not be persisting the update and just sending back a correct response. For instance, I think in the Elixir/phoenix case, they're actually persisting the update but sending back the wrong json response. I think we should validate both.

@knewmanTE
Copy link
Copy Markdown
Contributor Author

Just pushed an update that does a few things differently:

  • directly reads the database before and after to compare, rather than relying on the response body
  • only run the verification on the 501 query test (or rather, on the test that pushes the MAX query size, in case we ever change that)
  • set the thresholds at the following:
    • 0 items updated --> FAIL
    • <95% items updated --> FAIL
    • <99% items updated --> WARN

@knewmanTE
Copy link
Copy Markdown
Contributor Author

Hmm, looks like a lot of MongoDB tests are failing now saying that the DB wasn't changed. I will investigate. Anyone know of any issues or delays that MongoDB might have with updating the DB immediately?

@greenlaw110
Copy link
Copy Markdown
Contributor

greenlaw110 commented Feb 1, 2017

@knewmanTE I just checked the build log https://travis-ci.org/TechEmpower/FrameworkBenchmarks/jobs/197405309 and it looks like actframework's mongo test also failed. I will check on my local environment

@greenlaw110
Copy link
Copy Markdown
Contributor

@knewmanTE I did some test on my local environment and found the data has been changed, here is what I've done:

  1. log the World instance before update it:
    image

  2. run a test with 1 record update:
    image

  3. check the log:
    image

  4. check mongodb:
    image

So it proved that the mongodb data has been updated

I think it might be related to the testing procedure

@knewmanTE
Copy link
Copy Markdown
Contributor Author

@greenlaw110 yeah, my guess is something got a bit goofed up when I switched from comparing the response body to comparing the database itself. Hopefully I'll have a concrete answer and fix to share soon.

@knewmanTE
Copy link
Copy Markdown
Contributor Author

Okay, after a bit of testing, I'm noticing two potentially related, but maybe separate issues:

First, the after-the-update database that I am loading isn't updating properly. It happens mostly on MongoDB tests, but I am registering no changes between the before and after database. However,this only happening on some tests. For example, Java/dropwizard-mongodb is returning the exact same database as the before-the-update database, whereas Javascript/hapi is updating mostly fine.

The second issue is that Javascript/hapi is consistently only registering about 490-495 of the ~500 updates it should be making. The odds of consistently re-setting 5-10 items to the value they had before is fairly low, as each item has a 1/10,000 chance of that happening, so we should only be seeing a single re-set about 1 out of every 20 runs.

As a sanity check against a race condition occurring between the frameworks setting values in the database and my script reading the values, I set a manual 20 second sleep before fetching the after-the-update database and am still seeing both issues.

So that's where I'm at right now. I'll continue investigating to see if I can discern anything else.

@knewmanTE
Copy link
Copy Markdown
Contributor Author

knewmanTE commented Feb 2, 2017

Aha! I believe I've at least figured out the Dropwizard issue.

For whatever reason, some of our MongoDB tests use lowercase table names (world, fortune) while others use uppercase table names (World, Fortune). My test is only checking against the lowercase World table, but it's my guess that all of the breaking frameworks are ones that expect an uppercase table name.

Switching my test to use World for dropwizard gives me the same result as the hapi test (only detecting 493/500 updates).

Unfortunately, as far as I know, there isn't an easy way to determine whether a framework is using the world or World table outside of the framework's implementation itself.

@knewmanTE
Copy link
Copy Markdown
Contributor Author

Okay, I think I've gotten to the bottom of the other issue. I think it might just have to do with weak pseudo-random number generators. For example, in a recent test of dropwizard-mongodb, even though the /update?queries=501 response body contained 500 entries, that is because it is returning a list of JSON objects, which doesn't account for duplicates. If you convert this list into a single JSON object, the duplicates generated by the framework merge together and the resulting object only had 488 unique items.

If this is the case (and I believe it is), my other concern is how to handle WARN vs. PASS responses. If pretty much every test is going to fail to update 500 unique elements, no test will receive a PASS for update, given that I am only issuing a PASS when there are at or very close to 500 successful updates (<1% margin of error). That being said, as long as the results are within the 5% margin of error (which they all seem to be), they will receive a WARN, which counts as an overall Travis pass.

So we could handle this a few different ways:

  1. go back to comparing the response body (allowing for checking duplicate changes)
  2. adjust the thresholds to provide a PASS if you are within 5%, a WARN if you are in a more liberal band (say, 5-10%) and a FAIL otherwise
  3. leave it as is and accept that all/most tests will have a WARN, unless their language has a good pseudo-random number generator.

@nbrady-techempower I'm curious to hear your thoughts on the matter. Also, if you have any ideas on a reasonably efficient way to test against both MongoDB tables that doesn't involve just fetching all 10,000 items from both the world and World tables, I'd love to hear them.

@greenlaw110
Copy link
Copy Markdown
Contributor

@knewmanTE Good spot on!

I think it might be worth to specify the name of the tables/collections, say we want to make all table name/collection name be lowercase or vice verse.

@knewmanTE
Copy link
Copy Markdown
Contributor Author

@greenlaw110 I think we tried to remove one set of the tables with #2237, but ran into some sort of issue where a number of MongoDB ORMs were automatically changing a table name to lower or uppercase.

@knewmanTE
Copy link
Copy Markdown
Contributor Author

Opened #2545 to hopefully address the MongoDB table issue.

@knewmanTE knewmanTE changed the title More robust Updates verification [WIP] More robust Updates verification Feb 2, 2017
@NateBrady23
Copy link
Copy Markdown
Member

NateBrady23 commented Feb 3, 2017

@greenlaw110 I don't know if you were working on the dropwizard tests, but it also appears that it's using mongo's findAndModify instead of making the two trips specified in the update test rules.

Edit: Maybe I'm wrong. It looks like it's findById and then findAndModify. Is findAndModify more overhead than just update?

@greenlaw110
Copy link
Copy Markdown
Contributor

greenlaw110 commented Feb 3, 2017

@nbrady-techempower I checked dropwizard mongo db impl code and confirmed it is using mongo's findAndModify feature.

The action handler code:

image

The dao updateQueries code:
image

So it will use one trip to complete both find and modify operation. This tech is also used in Act's code:

image

I can fix Act's impl into
image

However I am not the author of dropwizard test set. You need to contact them

@NateBrady23
Copy link
Copy Markdown
Member

@greenlaw110 Thanks for looking in to that!!

@greenlaw110
Copy link
Copy Markdown
Contributor

@nbrady-techempower I think the best way to enforce two trip updates is to have the server send response as:

{"id": 1234, "oldRandomNumber": 3333, "randomNumber": 2222}

It is not possible to get the oldRandomeNumber if app use findAndModify

@NateBrady23
Copy link
Copy Markdown
Member

@knewmanTE I've restarted the perl frameworks because it looked like their package manager resource was temporarily unavailable. Everything with those seem fine now.

@knewmanTE
Copy link
Copy Markdown
Contributor Author

Starting up a new run now that #2545 has been merged into master.

@knewmanTE
Copy link
Copy Markdown
Contributor Author

knewmanTE commented Feb 7, 2017

Alright, still getting a bunch of fails. I just pushed some commits that fix a few things:

  • Java/vertx-web had the database for its MySQL test listed as Postgres, causing the update verification script to try to load the wrong database.
  • tweaked the warn/fail thresholds to match my second suggestion here.
  • add some checks when loading the MongoDB database where we can pull from the id or _id field, since it seems that some frameworks are still doing that thing where they remove the id field when updating an entry.

@knewmanTE
Copy link
Copy Markdown
Contributor Author

Pushed a fix that should at least address a couple JavaScript bugs.

@knewmanTE
Copy link
Copy Markdown
Contributor Author

So, most of the fails we're seeing at the moment are with the Update test. The big challenge will be determining which of the following cases we're dealing with:

  1. There is something wrong with the verification test
  2. The verification test is working as intended and because it is in place, we are now seeing all of the frameworks that aren't updating properly
  3. A combination of 1 and 2, where some incorrect implementations are being caught but there is also something wrong with the verification test

I might end up getting kind of busy over the next week or two, so any help checking on broken test implementations or looking through the verification test code is greatly appreciated!

@greenlaw110
Copy link
Copy Markdown
Contributor

greenlaw110 commented Feb 14, 2017

@knewmanTE I have checked the only Java failed case: Java/permeagility and found:

  1. the app comment out the update statement in the update test:
    image.

  2. I always get 404 when requesting to /updates?queries= endpoint, comment/uncomment doesn't make difference

  3. the app is using OrientDB as per https://sourceforge.net/projects/permeagility/

@knewmanTE
Copy link
Copy Markdown
Contributor Author

@greenlaw110 thanks for the extra pair of eyes! I had started a discussion with the Permeagility maintainer about his framework, but it doesn't look like he's properly added OrientDB to our suite yet. Currently, it's building an OrientDB database on the same machine as the app server, rather than building it isolated on the DB machine, so I'm okay with that failing since its database implementation needs an overhaul in general.

My Update verification currently only works for Postgres, MySQL, and MongoDB, but it shouldn't be too hard to add additional databases as they are added to the suite.

@msmith-techempower
Copy link
Copy Markdown
Member

This has been open over a year... let's close the PR for now and if someone has time, they can pick it up later.

@knewmanTE
Copy link
Copy Markdown
Contributor Author

hopefully one day we can get this working.

NateBrady23 added a commit to NateBrady23/FrameworkBenchmarks that referenced this pull request Jan 26, 2018
NateBrady23 added a commit that referenced this pull request Feb 6, 2018
* remove unused test directory

* move toolset print statements to functions

* include changes from #2536

* During verify updates, check for `world` or `World` table updates (postgres)

* make sure postgres is checking both world tables
zloster pushed a commit to zloster/FrameworkBenchmarks that referenced this pull request Feb 7, 2018
* remove unused test directory

* move toolset print statements to functions

* include changes from TechEmpower#2536

* During verify updates, check for `world` or `World` table updates (postgres)

* make sure postgres is checking both world tables
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants