METRON-1004: Travis CI - Job Exceeded Maximum Time Limit #624
If you succeed here, you'll be my best friend. We're drowning in integration test drama over here. :) |
I moved work from a mostly unrelated branch over here. Specifically, it attempts to get the InMemoryComponents to be reused across a given test class. This is particularly helpful for ProfilerIntegrationTest and MaasIntegrationTest. It's not extended to everything (in particular single test only classes). It does make the build run faster, but not necessarily fast enough to solve this problem. There's also some cleanup that needs to be taken care of. Skadoo appears to be mostly resolved on this branch for some reason. It also performs an actual thread.stop(), because it needs to actually kill. |
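A minimal sketch of the reuse idea, not the code on this branch: components are started once and the same instances are shared by every test case in a class, instead of being spun up and torn down per test. `Component`, `CountingComponent`, and `SharedRunner` are hypothetical names invented for illustration; the real InMemoryComponents and ComponentRunner APIs may look quite different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a heavyweight in-memory component (e.g. Kafka or ZooKeeper).
interface Component {
    void start();
    void stop();
}

// Counts starts and stops so the cost of each strategy is visible.
class CountingComponent implements Component {
    int starts = 0;
    int stops = 0;
    boolean running = false;

    @Override public void start() { starts++; running = true; }
    @Override public void stop() { stops++; running = false; }
}

// Hypothetical runner: starts components once, then every test case reuses them.
class SharedRunner {
    private final List<Component> components = new ArrayList<>();
    private boolean started = false;

    SharedRunner withComponent(Component c) {
        components.add(c);
        return this;
    }

    // Idempotent: repeated calls are no-ops, so per-test setup can call it safely.
    synchronized void startOnce() {
        if (started) {
            return;
        }
        for (Component c : components) {
            c.start();
        }
        started = true;
    }

    // Stops in reverse order so dependents shut down before their dependencies.
    synchronized void stopAll() {
        if (!started) {
            return;
        }
        for (int i = components.size() - 1; i >= 0; i--) {
            components.get(i).stop();
        }
        started = false;
    }
}
```

With this shape, a class-level setup would call `startOnce()` and a class-level teardown would call `stopAll()`, so a class with many test cases pays the startup cost once rather than once per case.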
This looks really good. After trying to find a better 'use' or 'sequencing' to resolve some problems - and failing - I am really glad to see this. I do have a couple of things I have tried that I might think are worth considering with this. |
This does look good. A couple of observations in no particular order of importance; just wanted to get this out there for discussion.
Considering the overhead
I want to consider, for a moment, the overhead that is not in our tests. In the last run, I count the following timings:
We should factor that in.
Where to Focus
Build Time
The natural conclusion is to focus on the long pole, those integration tests, but we may be well served to also consider the build time. Our build takes a long time, and we depend upon parallelization to make the build return in a sensible time (the user time for the build is 26 minutes!). Furthermore, our build is extremely IO heavy due to the shading that we (necessarily) do. While we are on a shared system with the rest of the Apache projects, I think we should reduce the IO burden of our build. While I think that shading is important, we have a very ham-fisted way of doing it. We shade for two reasons:
One issue is that, if we consider the tree of projects induced by their dependencies, we shade non-leaf projects for the purpose of relocation. I propose we stop doing that. Let's take, for instance,
I propose fixing this one of two ways:
I think the first may be the easiest to achieve and most surgical. Ultimately, it may even be advantageous to have a single jar created as the deployable output of our process (or maybe a small handful representing the independent subcomponents:
Integration Tests
Obviously the integration tests are the long pole in the tent. A couple of thoughts on these:
|
I submitted PRs against this branch to incorporate the suggested changes above for:
I submit them without credit. |
As a note, what I have is currently the first steps towards reusing infra. It's not perfect, and it's not reused across classes. There was an attempt to use the build matrix to split fast and slow tests, but it resulted in inconsistent failures. It seems like Maven gets tangled up between the builds, which could merit further investigation. Splitting will increase processing time (because both the unit and integration test jobs have to actually build), but should avoid having either portion of the build time out. |
@@ -90,23 +90,6 @@ public boolean accept(File dir, String name) {
}
I think until this is completely deprecated, we should keep the test, but disable it.
Test is added back in, but with an @Ignore and a comment. Let me know if you want anything else there.
.travis.yml
@@ -17,7 +17,7 @@ before_install:
- export PATH=$M2_HOME/bin:$PATH
script:
- |
because it makes it slow right?
Can we document with the commits, as you go, the rationale behind the changes, so we can look back and understand a little bit?
"why did we get rid of FOO?"
Let me check the commit log
" Remove foo. It is seen to cause an increase of X in Y and do z. it is also pretty snarky and fresh"
"Oh, that makes sense"
Sorry, my bad. Usually I tend to consider the commits less important because it's usually a full feature, and it's just minor changes / fixes afterwards.
I'll try to make sure the messages are easier to follow, since this is pretty ongoing until it's consistent.
As @ottobackwards points out, there should be some more explanation for a recent commit. 6562246 Specifically it turns off the jacoco:prepare-agent from the Travis build and allows us to just run the tests directly without it. We don't actually produce the reports here, so it's pretty extraneous. Locally this resulted in:
|
I added a fix to actually clear out the correct directory of Maven artifacts before caching. In a separate, experimental branch, there's an attempt to cache the artifacts resulting from npm. See: https://github.com/justinleet/metron/tree/caching and https://travis-ci.org/justinleet/metron. This required a run with the integration tests turned off in order to get as far as populating the cache successfully, then reenabling them in the next commit. At this point we do have intermittent successful builds on my Travis, although I doubt it's consistent. |
Most recent commits attempt to fix what appears to be a preexisting intermittent test issue regarding Kafka. |
@@ -61,6 +62,7 @@
private static final int KAFKA_RETRY = 10;
@Autowired
private KafkaComponent kafkaWithZKComponent;
private ComponentRunner runner;
Can we make this and setup() static and make the setup/teardown be @BeforeClass/@AfterClass?
I don't think so. setup() uses kafkaWithZKComponent, which is @Autowired. I don't think that can be static, so there's probably more refactoring involved. I can look at it, if you want.
I don't think this is that critical to getting us back to a working travis build. Just my opinion.
I looked at it a bit this morning and @justinleet is right, it's not easily done. I'm not super happy about spinning up and down components per test-case considering the bulk of this PR has been around removing exactly that. That being said, the test is ~20s, so I'm not going to lose any sleep over it in that regard.
I think the part that I don't like most is the mixture of semantics between Spring's wiring of components and JUnit's setting up of components that we can't mix and match. It leaves a muddle, frankly:
- If someone needs more heavy-weight components in other tests in this project, they will be forced into spinning up and down infrastructure per test-case, which has proven costly and was a prominent thing that we targeted in this PR for removal.
- Our approach leaves us doing component reuse in a way that is neither standard based on the other components nor standard based on spring, which is confusing and will inevitably lead to inefficient and confusing tests.
There is evidently a fix from spring here and I'm creating METRON-1009 to capture the reversion of this unit test and uptaking the fix provided by spring or a port of the fix by spring to our components.
@merrimanr is right, it's not a requirement to get this perfect as long as we've removed the intermittent nature of the test failure (and I feel we have). It's not so bad that I am -1, but it's ugly, confusing and it will lead to errors in the future IMO.
I'm going to add a comment specifically pointing to that ticket, and noting that this test is an outlier in terms of how we actually test.
@cestella Is there anything else in particular that you'd like to see in order to at least mitigate risk of propagating test issues like this until we upgrade spring?
no, I think we've done enough. We, honestly, need a test guidance document similar to the parser testing document that I created as part of this ticket. It's become evident from this exercise that we can provide some over-arching guidance here to limit the probability of us arriving in this situation again. But that's a follow-on, IMO.
I agree with everything that's been said
try {
runner.start();
} catch (UnableToStartException e) {
e.printStackTrace();
Also, we should throw a runtime exception here, probably.
Why not just kill the try/catch entirely? If it fails, it might as well blow up the tests anyways
I'd be ok with that. This will just mask an error until it fails later less explicably.
I forgot this was in a run(). I changed it to a runtime exception, like you originally suggested.
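For reference, a minimal sketch of what that change could look like, assuming the start call sits inside a `Runnable`. The `UnableToStartException` defined here is a local stand-in for the one in the diff, and `Startable` and `StartTask` are hypothetical names used only for illustration.

```java
// Local stand-in for the checked UnableToStartException seen in the diff.
class UnableToStartException extends Exception {
    UnableToStartException(String message) {
        super(message);
    }
}

// Hypothetical interface for anything whose start() may fail with a checked exception.
interface Startable {
    void start() throws UnableToStartException;
}

// Hypothetical task wrapping the start call in a Runnable.
class StartTask implements Runnable {
    private final Startable runner;

    StartTask(Startable runner) {
        this.runner = runner;
    }

    @Override
    public void run() {
        try {
            runner.start();
        } catch (UnableToStartException e) {
            // Runnable.run() cannot declare checked exceptions, so wrap and rethrow
            // rather than printing the stack trace and continuing in a broken state.
            throw new RuntimeException("Unable to start components", e);
        }
    }
}
```

Keeping the original exception as the cause means the failure surfaces at the point of startup with its full stack trace, instead of being masked until a later and less explicable failure.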
NC |
Sigh, accidentally submitted that while kicking travis |
+1 by inspection. |
I am +1 on these changes. |
I'm gonna do one more to hit a round ten successes, then merge assuming no issues. Shouldn't be too long. |
+1. Really nice job guys. Looking forward to working builds again! |
Contributor Comments
This PR cleans up and speeds up various issues we've seen on Travis.
Pull Request Checklist
Thank you for submitting a contribution to Apache Metron.
Please refer to our Development Guidelines for the complete guide to follow for contributions.
Please refer also to our Build Verification Guidelines for complete smoke testing guides.
In order to streamline the review of the contribution we ask you follow these guidelines and ask you to double check the following:
For all changes:
For code changes:
Have you included steps to reproduce the behavior or problem that is being changed or addressed?
Have you included steps or a guide to how the change may be verified and tested manually?
Have you ensured that the full suite of tests and checks have been executed in the root incubating-metron folder via:
Have you written or updated unit tests and or integration tests to verify your changes?
If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?
For documentation related changes:
Have you ensured that format looks appropriate for the output in which it is rendered by building and verifying the site-book? If not then run the following commands and then verify changes via site-book/target/site/index.html:
Note:
Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
It is also recommended that travis-ci is set up for your personal repository such that your branches are built there before submitting a pull request.