QGreenland is very slow to open in QGIS #60

Closed
MattF-NSIDC opened this issue May 16, 2023 · 15 comments

@MattF-NSIDC

MattF-NSIDC commented May 16, 2023

It takes something like 5 minutes to open QGreenland (can be found at public/QGreenland) in QGIS in CryoCloud. I tested this with a full 32GB node share and saw more or less the same results as a 4GB share. On my local machine (i7-1255U), the project opens in less than 5 seconds within the locally-built repo2docker container. Any idea what the cause might be? Is it just the type of virtual machine we're deploying onto?

@yuvipanda
Contributor

Is there anything useful in the logs (~/.jupyter-server.txt I think)? Also, how big is the QGreenland file? Do smaller files start faster?

@MattF-NSIDC
Author

MattF-NSIDC commented May 17, 2023

QGreenland as a whole contains 3GB of data that need to be loaded as layers in QGIS (the project file itself is negligible in size, it's just XML that points to the real data). If we eliminate layers, the project does start faster. Will re-test and check the logs!

Thanks :)

@yuvipanda
Contributor

could also check CPU usage with top open in a terminal while this is happening!

@MattF-NSIDC
Author

Unfortunately, I can't find such a log file, and top/htop are showing CPU/memory usage for the whole node, not just the share.

CPU and memory usage stay consistently low (2-4% CPU and 2% memory) the whole time the project is opening! Is the read from disk possibly the bottleneck?

(notebook) jovyan@jupyter-mattf-2dnsidc:~$ find . -name "*.txt" -not -path "./shared*"
./.config/chromium/chrome_shutdown_ms.txt
./.pki/nssdb/pkcs11.txt
./.mozilla/firefox/x0rpjn03.default-release/AlternateServices.txt
./.mozilla/firefox/x0rpjn03.default-release/pkcs11.txt
./.mozilla/firefox/x0rpjn03.default-release/SiteSecurityServiceState.txt
./.conda/environments.txt
./test-repo/foo.txt
./test-repo/.ipynb_checkpoints/foo-checkpoint.txt

@MattF-NSIDC
Author

MattF-NSIDC commented May 18, 2023

I took some timings. Rsyncing all of QGreenland from shared/ -> my homedir took 1:14:

sent 4.34G bytes  received 22.94K bytes  57.50M bytes/sec
total size is 4.34G  speedup is 1.00

real    1m14.539s
user    0m16.188s
sys     0m5.771s

EDIT: Realized the above is NFS -> NFS, also tested shared/ -> /tmp:

sent 4.34G bytes  received 22.93K bytes  321.55M bytes/sec
total size is 4.34G  speedup is 1.00

real    0m13.563s
user    0m17.329s
sys     0m4.424s

Opening QGreenland from shared/ with QGIS took 2:11, just short of twice as long as it took to read and write all the data in the project! That was unexpected.

Opening QGreenland from /tmp took under 5 seconds!
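The comparison above can be repeated for any pair of locations with a small helper. The following is a minimal sketch, not the commands actually used in this thread: `time_copy` is a hypothetical function name, it uses `cp` rather than `rsync`, and it only has one-second timing resolution.

```shell
# time_copy SRC DEST: copy the contents of SRC into DEST and report
# the elapsed wall-clock time and an approximate throughput.
# Hypothetical helper for illustration only.
time_copy() {
    local src=$1 dest=$2
    local start end elapsed kib
    mkdir -p "$dest"
    start=$(date +%s)
    # "SRC/." copies the directory contents, including dotfiles.
    cp -R "$src"/. "$dest"/
    end=$(date +%s)
    elapsed=$(( end - start ))
    # Avoid dividing by zero for copies that finish within one second.
    [ "$elapsed" -gt 0 ] || elapsed=1
    # du -sk prints "<size in KiB><TAB><path>"; keep the size only.
    kib=$(du -sk "$dest" | cut -f1)
    echo "copied ${kib} KiB in ~${elapsed}s ($(( kib / 1024 / elapsed )) MiB/s)"
}
```

Running it once with an NFS-backed destination and once with a local one (for example a fresh directory under `/tmp`) would separate the cost of reading the shared data from the cost of writing it back to NFS.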

@MattF-NSIDC
Author

Given that our workshop is next week, do you have ideas for working around this? We considered writing a script and having attendees run it to copy the data they'll need into the container's "local disk", but (a) lack of persistence and (b) the possibility of filling up the shared node's disk are potential dealbreakers.
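Concern (b) could be reduced by having such a script refuse to copy when the destination is short on space. A minimal sketch, assuming a POSIX `df` and `du`; `has_room` is a hypothetical name and the check ignores other users writing to the same disk at the same time:

```shell
# has_room SRC DEST: succeed only if the filesystem holding DEST has
# more free space than SRC currently occupies.
# Hypothetical helper for illustration only.
has_room() {
    local src=$1 dest=$2
    local need_kib free_kib
    # Size of the source tree in KiB.
    need_kib=$(du -sk "$src" | cut -f1)
    # df -Pk prints a header plus one line; column 4 is available KiB.
    free_kib=$(df -Pk "$dest" | awk 'NR == 2 { print $4 }')
    [ "$free_kib" -gt "$need_kib" ]
}
```

A copy script could then start with something like `has_room "$src" "$dest" || { echo "not enough space" >&2; exit 1; }`.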

@yuvipanda
Contributor

@MattF-NSIDC aaah, I had thought the data was coming from the outside world via HTTP or something of that sort. Makes total sense it's NFS being the bottleneck. When there are many users reading it, they might all be much slower too.

In the larger scheme of things, I think finding a way to get QGIS to load this via HTTP might be the way to go. However, I don't understand enough about QGIS to even know if this is possible.

In the meantime, I've opened 2i2c-org/infrastructure#2562 to give each user a dedicated, 10Gi non-NFS disk at ~/qgis-data. I tested copying the greenland files here, took about 13s. And opening took about 15s. I think this is acceptable? If it is, we will deploy this when the week comes.

For events specifically, I would like you to consider support@2i2c.org instead of opening issues on this repo. I think issues on this repo are more for self-organizing work, and the 2i2c infrastructure team doesn't actively pay attention to issues here! I currently do, but my job duties are going to change in a little bit and that might change too :) more advance notice would also be really helpful, as I'm not sure I can find anyone to review my infrastructure change before the event starts (I've also opened 2i2c-org/features#27 for us to solidify 2i2c's event policy better). Thanks for working with us as we fine-tune the process :)

Finally, as someone involved with jupyter-desktop-server from the start, I JUST ABSOLUTELY LOVE THAT QGIS IS GOING TO BE USED IN A WORKSHOP VIA A HUB! It makes me so incredibly happy! Would you be interested in writing a blog post for the Jupyter blog on how it went after? :)

@yuvipanda
Contributor

We should probably clean up these extra disks after your workshop is over, and remove this feature to reduce costs. Otherwise 10Gi is paid for by all users regardless of whether it is used or not.

@MattF-NSIDC
Author

> In the meantime, I've opened 2i2c-org/infrastructure#2562 to give each user a dedicated, 10Gi non-NFS disk at ~/qgis-data. I tested copying the greenland files here, took about 13s. And opening took about 15s. I think this is acceptable? If it is, we will deploy this when the week comes.

Wonderful, thank you! This will be very helpful :)

> For events specifically, I would like you to consider support@2i2c.org instead of opening issues on this repo. I think issues on this repo are more for self-organizing work, and the 2i2c infrastructure team doesn't actively pay attention to issues here! I currently do, but my job duties are going to change in a little bit and that might change too :) more advance notice would also be really helpful, as I'm not sure I can find anyone to review my infrastructure change before the event starts (I've also opened 2i2c-org/features#27 for us to solidify 2i2c's event policy better). Thanks for working with us as we fine-tune the process :)

Absolutely can do. Can you give an example of an issue that is appropriate and one that's better suited for emailing support? I've been fairly fuzzy on that. For example, should the man pages issue also have been an email? I apologize for the late notice on this issue, there's been a lot to focus on getting ready for the workshop and QGIS load-time just didn't register as a concern for us until recently.

> Finally, as someone involved with jupyter-desktop-server from the start, I JUST ABSOLUTELY LOVE THAT QGIS IS GOING TO BE USED IN A WORKSHOP VIA A HUB! It makes me so incredibly happy! Would you be interested in writing a blog post for the Jupyter blog on how it went after? :)

It makes me incredibly happy that we have the ability to do this, so thank you :) I'd love to write a blog post! Thanks for the invitation to do so :)

@MattF-NSIDC
Author

> Can you give an example of an issue that is appropriate and one that's better suited for emailing support? I've been fairly fuzzy on that. For example, should the man pages issue also have been an email?

@yuvipanda I'd like to take your answer to the above question and add it to an issues template for this repo so when someone goes to open a new issue they'll be prompted to reconsider if their issue should be an email. What do you think?

@yuvipanda
Contributor

@MattF-NSIDC will try to provide answers soon :) A short version is that 'anything requiring changes outside the image' definitely needs to go to support@2i2c.org.

In the meantime though, we've deployed the change. Test it out and let us know? Also when is your event?

@MattF-NSIDC
Author

Thank you, Yuvi! Our event is tomorrow at 9AM MT through the end of the week. We'll be active ~3 hours per day, maybe using the hub for half that time.

@MattF-NSIDC
Author

MattF-NSIDC commented May 23, 2023

@yuvipanda Thank you again for hooking us up with the EBS volumes so quickly last week ❤️ Sorry for the last-minute nature of this issue!

We set up a script at public/QGreenland/sync.sh that we had all the attendees run as part of our first exercise to plop the needed data into the EBS volume, and that worked well. Things are going extremely smoothly so far. If I was going to do another workshop, I'd want to think about how we can get this script in the user's $PATH to enable execution with a short command.
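The contents of `sync.sh` are not shown in this thread. A script of that kind might look roughly like the sketch below; `sync_data` is a hypothetical name, and the `cp` fallback is an addition for environments without `rsync`.

```shell
# sync_data SRC DEST: mirror the contents of SRC into DEST.
# Illustrative sketch only; this is not the workshop's actual sync.sh.
sync_data() {
    local src=$1 dest=$2
    if [ ! -d "$src" ]; then
        echo "source directory not found: $src" >&2
        return 1
    fi
    mkdir -p "$dest"
    if command -v rsync >/dev/null 2>&1; then
        # -a preserves attributes; the trailing slashes copy the
        # contents of SRC into DEST. Unchanged files are skipped on re-runs.
        rsync -a "$src"/ "$dest"/
    else
        # Fallback when rsync is unavailable: plain recursive copy.
        cp -R "$src"/. "$dest"/
    fi
}
```

On the hub this would be called with the shared QGreenland directory as the source and `~/qgis-data` as the destination. Because re-running it is harmless, attendees can safely repeat the step. Making it available as a short command would mean placing it in a directory that is already on `$PATH`; which directories those are on the hub is not stated here.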

I can't begin to describe the stress relief from not having to deal with weird problems with end-user hardware/software installations. Excited to do a write-up after this week! :)

@yuvipanda
Contributor

@MattF-NSIDC thanks! I opened 2i2c-org/infrastructure#2575 to track the blog post!

Can you open an issue here to track what should be here vs sent to support@?

@MattF-NSIDC
Author

#68 :)
