Add performance reports #107

Closed · wants to merge 24 commits into from

Conversation

jrbourbeau
Member

This uploads performance reports for tests

conftest.py Outdated
with Client(small_cluster) as client:
small_cluster.scale(10)
client.wait_for_workers(10)
client.restart()
yield client
with coiled.performance_report(f"{request.node.originalname}-{UNIQUE_ID}.html"):
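For reference, a minimal sketch of what the full fixture might look like with this change applied; the small_cluster fixture and the UNIQUE_ID constant are assumed to be defined elsewhere in conftest.py, and the fixture name small_client is a guess.

    import uuid

    import coiled
    import pytest
    from distributed import Client

    UNIQUE_ID = uuid.uuid4().hex[:8]  # assumption: some per-test-run identifier

    @pytest.fixture
    def small_client(small_cluster, request):  # hypothetical fixture name
        with Client(small_cluster) as client:
            small_cluster.scale(10)
            client.wait_for_workers(10)
            client.restart()
            # Wrap the test body in a hosted Coiled performance report so each
            # test produces an HTML report named after the test function.
            with coiled.performance_report(
                f"{request.node.originalname}-{UNIQUE_ID}.html"
            ):
                yield client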
Contributor

With this, we are storing the performance reports on Coiled, but I can't find which bucket or storage backend contains them. We should also think about how to clean these up.

I see that in the Coiled account we have some performance reports, but not all of them; I'm not sure exactly what's happening.

Member Author

I can't seem to find which is the bucket or storage that contains them

I chatted with @shughes-uk offline and she mentioned that, today, all performance reports are stored in a centralized S3 bucket that only Coiled staff can access. I don't think this is a blocker for us, but it is good to know.

We should also maybe think about how to clean this up.

@shughes-uk mentioned this shouldn't become a problem unless we create millions of performance reports. I'm inclined to not worry about this for now -- I think it will take us a while before we've created millions of these : )

Member Author

I see that in the coiled account we have some performance reports but not all of them, not sure exactly what's happening

@koverholt pointed out that there's currently a limit of 5 hosted performance reports per user. Per the coiled.performance_report API docs:

Note each user is limited to 5 hosted reports with each a maximum file size of 10mb.

So if we want to keep performance reports around, we should upload them to our own S3 bucket.
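A rough sketch of what uploading to our own bucket could look like, using dask.distributed's local performance_report plus s3fs; the bucket and prefix names here are placeholders, not an existing bucket.

    import s3fs
    from distributed import performance_report

    # s3fs picks up AWS credentials from the environment / CI secrets
    s3 = s3fs.S3FileSystem()

    local_file = "performance-report.html"
    with performance_report(filename=local_file):
        ...  # run the workload under test

    # Placeholder bucket/prefix; this would point at a bucket we control
    s3.put(local_file, "s3://our-ci-bucket/performance-reports/performance-report.html")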

@jrbourbeau jrbourbeau self-assigned this May 3, 2022
@ian-r-rose
Contributor

Depending on what you are looking for, the analytics page might get you this information for free

@jrbourbeau
Member Author

Depending on what you are looking for, the analytics page might get you this information for free

That's a good point. With both #102 and #79, it would have been nice to see a performance report when we noticed that something was taking longer than expected. I think this is definitely in the category of "nice to have" but not required.

conftest.py Outdated
with Client(small_cluster) as client:
small_cluster.scale(10)
client.wait_for_workers(10)
client.restart()
yield client
local_file = str(tmp_path / "preformance-report.html")
with performance_report(local_file):
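Roughly, the assembled fixture this diff is building toward might look like the following; the small_cluster, s3, and s3_report_url fixtures are assumed to be defined elsewhere in conftest.py.

    import pytest
    from distributed import Client, performance_report

    @pytest.fixture
    def small_client(small_cluster, s3, s3_report_url, request, tmp_path):
        with Client(small_cluster) as client:
            small_cluster.scale(10)
            client.wait_for_workers(10)
            client.restart()
            local_file = str(tmp_path / "performance-report.html")
            # Record a local performance report around the test body ...
            with performance_report(filename=local_file):
                yield client
            # ... then push it to S3 so it outlives the CI runner
            s3.put(local_file, s3_report_url + f"/{request.node.originalname}.html")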
Contributor

If we use the coiled version of performance reports, we should get hosting for free

Member Author

That's what I initially wanted to do, but I believe there's a limit of 5 hosted reports today (xref #107 (comment)).

Contributor

Oof, good point, I had forgotten about that

@ncclementi
Contributor

ncclementi commented Jun 9, 2022

@jrbourbeau I added a lifecycle rule to the S3 prefix (test-scratch/performance-reports) so that files in this "directory" are removed 7 days after being added. I haven't found a way to remove them based on last access (that doesn't seem possible), but I think 7 days is probably enough time to download a file locally if needed.

Would you be OK with that?

EDIT: Bumped this period to 30 days.
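For reference, a sketch of the kind of lifecycle rule described above, expressed with boto3; the bucket name is a placeholder and the actual rule may well have been created through the AWS console instead.

    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket="our-ci-bucket",  # placeholder bucket name
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "expire-performance-reports",
                    "Filter": {"Prefix": "test-scratch/performance-reports/"},
                    "Status": "Enabled",
                    # Objects under the prefix are deleted 30 days after creation
                    "Expiration": {"Days": 30},
                }
            ]
        },
    )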

@ncclementi
Contributor

I'm not seeing any new performance reports being uploaded to S3; the latest date is 05/27.
I think some of the changes might have affected this, even though nothing is failing...

conftest.py Outdated
local_file = str(tmp_path / "preformance-report.html")
with performance_report(local_file):
yield client
s3.put(local_file, s3_report_url + f"/{request.node.originalname}.html")
Contributor

Do we need to pass any credentials here? I can't find the relevant docs, but my guess is that we are not seeing uploads because we are not passing credentials.

Contributor

Never mind, we are providing s3, which already has the credentials.

Member Author

We pass credentials when we create the s3 instance
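For context, a sketch of how the s3 fixture might construct the filesystem with explicit credentials from CI secrets; the fixture name and the environment variable handling are assumptions.

    import os

    import pytest
    import s3fs

    @pytest.fixture(scope="session")
    def s3():  # hypothetical fixture; credentials come from CI secrets
        return s3fs.S3FileSystem(
            key=os.environ["AWS_ACCESS_KEY_ID"],
            secret=os.environ["AWS_SECRET_ACCESS_KEY"],
        )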

Contributor

Then there must be another problem, because the files are not being uploaded.

Member Author

Yeah, I see what I did wrong. I'll have to push up a commit with a fix, but probably not today

conftest.py Outdated (resolved)
conftest.py Outdated (resolved)
@ncclementi
Contributor

CI hasn't finished yet, but things seem to be working; I'm seeing performance reports uploaded. I wonder if we should log the link the same way we do with cluster dumps; at the moment I can't tell from CI where the performance reports are.

[Screenshot of CI output, 2022-06-10]

This is unrelated, but I'm seeing mismatch warnings. Should we be worried about these?
[Screenshot of the warnings, 2022-06-10]

@ncclementi ncclementi self-assigned this Jun 23, 2022
@ncclementi
Contributor

ncclementi commented Jun 25, 2022

Current status and problems (shown in this CI run):

  • Live logging of the performance report link as a warning doesn't work on CI, even though it works locally. Hence, we can't get a link to a performance report unless there is a failure. The performance report gets uploaded to S3, but no link is logged, and without a link it is effectively impossible to find the report on S3 (a sketch of the logging approach is at the end of this comment).

  • When tests that rely on the small client fixture fail, the performance report link is logged, but only on failure.

  • For tests/stability/test_deadlock, which uses its own cluster/client, I can't get the performance report upload to work: the file isn't even uploaded, let alone a link logged.

In conclusion,

  • It looks like CI is not giving us all the logs, and I haven't found a way to get them when tests are passing.

  • For tests/stability/test_deadlock, I'm not too worried about getting performance reports to work, although it would be nice if they did.

cc: @ian-r-rose what do you think we should do here?
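A sketch of the kind of warning-level logging being attempted for the report link; whether pytest's live logging surfaces it on CI for passing tests is exactly the open question above, and the logger name is an assumption.

    import logging

    logger = logging.getLogger("coiled-runtime")  # hypothetical logger name

    def log_performance_report_link(s3_url: str) -> None:
        # Logged at WARNING so pytest live logging (log_cli) prints it even for
        # passing tests, at least when running locally
        logger.warning("Performance report uploaded to %s", s3_url)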

@ian-r-rose ian-r-rose self-requested a review June 28, 2022 15:47
@jrbourbeau
Member Author

@ian-r-rose @ncclementi and I chatted about this offline and decided to put this on hold. This is a nice-to-have but also somewhat complicated. If this becomes nicer-to-have in the future, we can revive this PR.
