@williamjallen
Member

What is the current behavior?

When improperly configured, Lichen can currently use excessive amounts of memory and run for extended periods of time.

What is the new behavior?

This PR adds reasonable limits to every potentially large aspect of a Lichen run. Files that exceed a limit are truncated rather than causing the run to fail altogether. Future work is still needed to add hard limits on the amount of memory available and on the time any given run is allowed to take. For now, the limits introduced here should prevent excessive memory usage by capping all of the potentially large files at reasonable sizes. All of the limits have been consolidated into a single lichen_config.json file containing:

  • concat_max_total_bytes: the maximum total number of bytes that may be concatenated
    • This is largely an attempt to prevent excessively large tar/zip archives from being moved around and to keep the duration of Lichen runs reasonable.
  • max_sequences_per_file: the maximum number of hashes any given submission may contain
    • This is by far the most crucial and most restrictive limit. It bounds the size of the matches.json for any given submission. Raising this limit too far may result in excessive memory usage while compare_hashes.cpp is building a large data structure.
  • max_matching_positions: the maximum number of duplicate sequences allowed between any two submissions
    • This provides some protection against the Lichen equivalent of a zip bomb, whether accidental or deliberate. Raising this limit too far may result in excessively large matches.json files, especially if two or more files contain significant amounts of repetition between them. This also helps bound Lichen's maximum memory usage.
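
For illustration only, here is a minimal Python sketch of how limits like these could be loaded and applied by truncating rather than failing. It is not the code from this PR: the numeric values in EXAMPLE_CONFIG are made-up placeholders (the actual defaults are not stated here), and load_limits, truncate_to_limit, and concat_within_budget are hypothetical helper names.

```python
import json

# Placeholder values for illustration; the real defaults are not given in this PR text.
EXAMPLE_CONFIG = """
{
    "concat_max_total_bytes": 1000000,
    "max_sequences_per_file": 5000,
    "max_matching_positions": 50
}
"""

# The three keys named in the description above.
LIMIT_KEYS = (
    "concat_max_total_bytes",
    "max_sequences_per_file",
    "max_matching_positions",
)


def load_limits(text):
    """Parse a lichen_config.json-style string and validate each limit (hypothetical helper)."""
    config = json.loads(text)
    limits = {}
    for key in LIMIT_KEYS:
        value = config.get(key)
        # Reject missing, non-integer, boolean, or non-positive values.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{key} must be a positive integer, got {value!r}")
        limits[key] = value
    return limits


def truncate_to_limit(items, limit):
    """Keep at most `limit` items; also report whether anything was dropped."""
    return items[:limit], len(items) > limit


def concat_within_budget(chunks, max_total_bytes):
    """Concatenate byte chunks, cutting off once the byte budget is used up."""
    out = bytearray()
    for chunk in chunks:
        remaining = max_total_bytes - len(out)
        if remaining <= 0:
            break
        out += chunk[:remaining]
    return bytes(out)


if __name__ == "__main__":
    limits = load_limits(EXAMPLE_CONFIG)
    hashes = [f"hash{i}" for i in range(6000)]
    kept, truncated = truncate_to_limit(hashes, limits["max_sequences_per_file"])
    print(len(kept), truncated)
```

The point of the sketch is the design choice described above: an oversized input is cut down to the configured bound and the run continues, instead of aborting.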

@bmcutler bmcutler merged commit 0f2e364 into main Aug 13, 2021
@bmcutler bmcutler deleted the print-heuristics-when-killed branch August 13, 2021 02:24