Optimized various tests. #13496

Merged: 2 commits merged from ngnpope:speed-up-tests into django:master on Oct 16, 2020

Conversation

@ngnpope (Member) commented Oct 5, 2020

See commit messages for explanations.

@django django deleted a comment from ngnpope Oct 16, 2020
@felixxm felixxm changed the title Various minor changes to speed up tests. Optimized various tests. Oct 16, 2020
@felixxm (Member) commented Oct 16, 2020

Thanks 🚀 On Oracle:

| test | before | after |
| --- | --- | --- |
| test_include_materialized_views | 148.0s | 12.717s |
| test_include_views | 150.0s | 9.971s |

There are 8 cache backends to test and each test of touch() takes ~7s, which adds up to ~56s. This seems excessive and it feels like we should be able to reduce it significantly. time.sleep() accepts floating point values and, as of Python 3.5, is guaranteed to sleep for at least the number of seconds specified.

We do run the risk that there may be spurious test failures from this, but that already seemed to be the case anyway.
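
As a minimal illustration of that guarantee (a sketch, not Django code): since Python 3.5, time.sleep() suspends for at least the requested duration, even with fractional values, unless a signal handler raises an exception.

    # Sketch only, not part of the test suite.
    import time

    start = time.monotonic()
    time.sleep(1.5)
    elapsed = time.monotonic() - start
    # time.sleep() may overshoot slightly, but never undershoots.
    assert elapsed >= 1.5
    print(f"slept for {elapsed:.3f}s")
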
@ngnpope (Member, Author) commented Oct 16, 2020

That's exactly what I was hoping for!

@felixxm felixxm merged commit 177a49e into django:master Oct 16, 2020
@ngnpope ngnpope deleted the speed-up-tests branch October 16, 2020 12:23
@claudep (Member) commented Oct 17, 2020

That's great; however, it's not the final word. I've already got a failing test on a PR which could be related (cache.tests.PyLibMCCacheTests.test_touch, https://djangoci.com/job/pull-requests-bionic/database=postgis,label=bionic-pr,python=python3.6/8623/).

@ngnpope (Member, Author) commented Oct 17, 2020

@claudep Indeed, but these failures did crop up on occasion before. I don't have figures for how often, but these things are hard to measure reliably anyway.

I thought the saving made it at least worth trying, on the basis that we can review over the coming days/weeks and tweak or revert if we perceive it to be worse.

So thank you for the first report! 😄

In case you missed it, I discussed this above: #13496 (comment)

@timgraham (Member) commented:

I've just seen the same failure of test_touch on my PR.

@ngnpope (Member, Author) commented Oct 17, 2020

Thanks for the heads-up @timgraham.

So here is the relevant bit of the test:

        # cache.touch() updates the timeout.
        cache.set('expire1', 'very quickly', timeout=1)
        self.assertIs(cache.touch('expire1', timeout=2), True)
        time.sleep(1.0)
        self.assertIs(cache.has_key('expire1'), True)
        time.sleep(1.5)
        self.assertIs(cache.has_key('expire1'), False)

So the flow is:

  1. Set with a short timeout.
  2. Update to a longer timeout while verifying the key is still set.
  3. Wait long enough for the original timeout to have expired.
  4. Then check the key is still set.
  5. Wait long enough for the longer timeout to expire.
  6. Then check the key is not set.

We're failing at step 4. For that to happen we must be losing a whole second somewhere… ☹️
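
To put rough numbers on that (a back-of-the-envelope sketch, not part of the test; the constants mirror the snippet above):

    # Timing-budget sketch using the values from the test snippet above.
    TOUCH_TIMEOUT = 2    # extended timeout passed to cache.touch()
    FIRST_SLEEP = 1.0    # sleep before the step 4 assertion

    # The key expires TOUCH_TIMEOUT seconds after touch(); the step 4
    # check runs at least FIRST_SLEEP seconds after that call, so the
    # slack left for scheduling delay, network latency to memcached and
    # sleep() overshoot is roughly:
    slack = TOUCH_TIMEOUT - FIRST_SLEEP
    print(f"slack before the step 4 check: ~{slack:.1f}s")  # ~1.0s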

I note that there is some guidance on troubleshooting timeouts. I wonder if anything crops up in the memcached stats related to maximum connections @felixxm?
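
I'm not sure exactly how the CI boxes are set up, but something along these lines could dump the relevant counters (a sketch assuming pylibmc and a memcached instance on the default local port):

    # Sketch only: print memcached counters for each server. The ones of
    # interest here are curr_connections/max_connections (connection
    # pressure) and listen_disabled_num (times the server hit its
    # connection limit and stopped accepting new connections).
    import pylibmc

    client = pylibmc.Client(['127.0.0.1:11211'])
    for server, stats in client.get_stats():
        print(server)
        for key in sorted(stats):
            print('   ', key, stats[key])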

@ngnpope (Member, Author) commented Oct 19, 2020

Hmm. Just seen a lot more touch failures.

It is rather strange that we would need to wait for multiple seconds. Are all parallel test runs executed against the same memcached instance on CI? If so, I presume that there is some sort of key prefixing; otherwise this is bound to fail. Or are too many parallel runs causing too many connections or too much load on memcached, per the link in my previous comment?

Perhaps it is just easier to revert 177a49e, but it feels like there is some underlying problem…

@felixxm (Member) commented Oct 19, 2020

> If so, I presume that there is some sort of key prefixing; otherwise this is bound to fail.

Yes, based on the Python version, database, and build name, so it's properly isolated.
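
For illustration, that kind of isolation can be achieved with the KEY_PREFIX cache setting; the backend and prefix value below are an assumed example, not the actual CI configuration:

    # Assumed example settings, not the real CI configuration.
    # KEY_PREFIX namespaces every cache key, so parallel builds that share
    # one memcached instance cannot collide with each other.
    CACHES = {
        'default': {
            'BACKEND': 'django.core.cache.backends.memcached.PyLibMCCache',
            'LOCATION': '127.0.0.1:11211',
            'KEY_PREFIX': 'python3.6-postgis-pull-requests-bionic',
        }
    }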

> Perhaps it is just easier to revert 177a49e, but it feels like there is some underlying problem…

Agreed, let's revert it (#13565); there is no doubt that it caused more frequent failures. We need to investigate this more closely 🕵️
