better default memory allocation and communication to user #559

Open
tarheel opened this issue Apr 12, 2021 · 13 comments

@tarheel
Contributor

tarheel commented Apr 12, 2021

This is a companion to #552. @moldover's suggestions:

  • detect the user's system memory size
  • tell the user what size of election they can expect to tabulate based on their system
  • always use 80% of that memory (or whatever fraction seems appropriate) when tabulating
  • warn the user during CVR parsing if they approach / exceed the limits for their system

Note: I'm not 100% sure, but I don't think that last idea is possible. The JVM prefers to make use of whatever memory you allocate to it, since garbage collection has a non-zero cost, so it will end up consuming most or all of the available heap during normal operation even when it's actually well within the limits we've defined.
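A minimal sketch of what the first and third bullets could look like, assuming a JDK 14+ runtime where com.sun.management.OperatingSystemMXBean exposes getTotalMemorySize(); the class name and the 80% constant here are illustrative, not RCTab code:

```java
import java.lang.management.ManagementFactory;

// Illustrative only, not part of RCTab: detect physical RAM and suggest a heap cap.
public final class MemoryAdvisor {

  // Fraction of physical RAM the heap would be allowed to use (the "80%" idea above).
  private static final double HEAP_FRACTION = 0.80;

  static long totalSystemMemoryBytes() {
    // On HotSpot/OpenJDK the platform MXBean implements the com.sun.management
    // subinterface, which exposes the physical memory size.
    com.sun.management.OperatingSystemMXBean os =
        (com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
    return os.getTotalMemorySize();
  }

  static long suggestedHeapBytes() {
    return (long) (totalSystemMemoryBytes() * HEAP_FRACTION);
  }

  public static void main(String[] args) {
    System.out.printf("Detected RAM: %d MB, suggested -Xmx: %dm%n",
        totalSystemMemoryBytes() / (1024 * 1024),
        suggestedHeapBytes() / (1024 * 1024));
  }
}
```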

@chughes297
Collaborator

Didn't 1.3 cover some of this? I could be wrong, but I thought 1.3 did some memory optimization.

@tarheel
Contributor Author

tarheel commented Feb 22, 2023

No, v1.3 did not include anything related to memory use.

@tarheel
Contributor Author

tarheel commented Feb 22, 2023

I'd recommend adding #552 to v1.4 also.

@artoonie
Collaborator

artoonie commented Jun 8, 2023

Given the major reduction in memory footprint, is this still a P1 task? I expect users won't be hitting memory ceilings anymore, though I guess I'm not sure what hardware elections are being run on.

My gut is that we've reduced memory usage enough that this isn't as important anymore -- though I defer to those of you with more experience.

@chughes297
Collaborator

I've learned to assume a pretty small amount of RAM on the computers users might run RCTab on. Assume we have a computer with 4 GB of RAM: do you have a sense of the vote ceiling there?

@artoonie
Collaborator

Just ran a basic test -- tabulating 300,000 CVRs out of 1,500,000 records takes 500 MB of additional memory (on top of the 350 MB used just to launch RCTab). That means we'd be able to support roughly 2.4 million CVRs on a 4 GB machine, plus or minus a bit, since the rest of the machine will use some memory, but the JVM would also garbage-collect more often.

However, I have noticed a memory leak -- each time you run a tabulation, more memory stays in use. That could become a real problem: if people re-run elections multiple times, each subsequent run has less memory to work with, and it will eventually hit a ceiling.

I think if we solve this leak, we'd solve the overall problem. I'm going to look into that.
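A minimal sketch of the arithmetic behind that ceiling estimate (the numbers come from the measurements above; the exact figure depends on OS overhead and how aggressively the JVM garbage-collects):

```java
// Back-of-the-envelope estimate using the numbers from the comment above.
// Not a measurement of RCTab itself, just the arithmetic behind the estimate.
public final class CvrCeilingEstimate {

  // ~500 MB of extra heap for 300,000 tabulated CVRs (measured above).
  private static final double BYTES_PER_CVR = 500.0 * 1024 * 1024 / 300_000;

  // ~350 MB used just to launch RCTab (measured above).
  private static final double BASELINE_MB = 350;

  // Rough CVR ceiling for a machine with the given amount of RAM, ignoring
  // OS overhead and assuming the JVM can use nearly all remaining memory.
  static double roughCeiling(double machineRamGb) {
    double usableMb = machineRamGb * 1024 - BASELINE_MB;
    return usableMb * 1024 * 1024 / BYTES_PER_CVR;
  }

  public static void main(String[] args) {
    // Prints roughly 2.2 million for 4 GB (same ballpark as the 2.4M figure above)
    // and roughly 24 million for 40 GB, i.e. the scaling is essentially linear.
    System.out.printf("4 GB machine:  ~%.1fM CVRs%n", roughCeiling(4) / 1_000_000);
    System.out.printf("40 GB machine: ~%.1fM CVRs%n", roughCeiling(40) / 1_000_000);
  }
}
```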

@chughes297
Collaborator

Two questions:

  1. Could we expect this to scale linearly? Like, if I had a computer with 40 GB of RAM, could I reasonably expect to tabulate ~24 million records?
  2. Is there any difference in capacity across CVR formats? Are some more resource-intensive than others?

@artoonie
Collaborator

artoonie commented Jun 12, 2023

  1. Yes, I expect so, but I'm testing to validate that.
  2. I haven't thoroughly tested all formats, but from the handful I've tested, reading CVRs isn't the memory-intensive part: it's doing the tabulation (which is format-agnostic). So, best I can tell, the bottleneck will be format-agnostic.

I'm still working to get a better understanding here.

@chughes297
Collaborator

chughes297 commented Jun 12, 2023

  1. Cool!
  2. OK, got it. I guess that was the point of "Make CastVoteRecord use 50% less memory" (#640): to resolve the read-in bottleneck we ran into in the past.

@artoonie
Collaborator

Alright, I've spent a lot of cycles trying to hunt down the memory leak, and I think I'm chasing a phantom. Something somewhere is being cached, and I'm not familiar enough with Java debugging tools to pinpoint what's happening -- but if I wait 15 minutes, memory usage drops back down, so I think Java is just using the memory available on my machine, and it wouldn't happen in the "real world".

If somebody with a virtual machine can test this, that would be helpful (@HEdingfield?) -- see what happens on a machine with 4 GB of RAM. My gut tells me that we should be fine now, and that we realistically won't be hitting limits anymore, but I'm not 100% confident in that.
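One low-effort alternative if no 4 GB VM is handy (an assumption on this editor's part, not something the thread has tried): launch with a capped heap via the standard `-Xmx` flag, e.g. `java -Xmx1g -jar <the RCTab jar>`, which is roughly the default heap a 4 GB machine would get, and confirm the cap from inside the JVM:

```java
public final class HeapCheck {
  public static void main(String[] args) {
    // Report the heap ceiling the JVM is actually running with (set by -Xmx or the default).
    long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
    System.out.println("Max heap available to this JVM: " + maxHeapMb + " MB");
  }
}
```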

@HEdingfield
Contributor

No easy way to test on my end either. I suggest we open another issue outside of this release to check in on this again in the future (linking back to this one), and close this one a bit later if nobody else has complained about it.

@artoonie
Collaborator

Almost a year later -- do you think we can close this, @yezr?

@yezr
Collaborator

yezr commented Apr 12, 2024

I'm looking through all the GitHub issues related to memory footprint. I see that PR #640 fixed #552 by improving the memory footprint of CVRs, so all tabulation now uses less memory. That PR raises the ballot-count ceiling we can successfully tabulate with the same amount of memory.

I created #824 to revise the ballot-to-memory estimates we currently have in Section 3 of the TDP. With new ballot-to-memory estimates, we can decide whether this issue is still necessary. For example, if a machine with 4 GB of memory can now reliably handle millions of ballots, we can drop the priority of this one.

@yezr yezr assigned yezr and unassigned artoonie May 24, 2024