Full refactor of files storage and fetching to avoid querying S3 when not necessary #4583

Closed
SamuelHassine opened this issue Oct 14, 2023 · 2 comments · Fixed by #5131
Labels
feature use for describing a new feature to develop solved use to identify issue that has been solved (must be linked to the solving PR)

Comments

@SamuelHassine
Member

SamuelHassine commented Oct 14, 2023

Use case

Today, many components in the OpenCTI frontend (UI, Python library) request the files of an entity every time the entity is read. This causes extra costs on S3 buckets for many organizations and comes with several limitations:

  • No real sorting / pagination
  • No search on files
  • Performance and latency issues
  • etc.

To be clear: none of the features implemented in the ElasticSearch engine are available when it comes to handling files (including mass deletion, etc.). This is a high priority for a lot of OpenCTI users and customers.

Solution discussed with @Kedae and @richard-julien: implement a representation of files in ElasticSearch, keep S3 and ElasticSearch synchronized, and fetch S3 content only when needed (file read or download).
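
To make the idea concrete, here is a minimal sketch of that split, not the actual OpenCTI implementation. Everything in it is hypothetical: `ObjectStore` is an in-memory stand-in for the S3 bucket, `FileIndex` is an in-memory stand-in for an ElasticSearch index of file metadata, and `FileService` and its method names are invented for illustration. Listing, sorting, pagination and search only touch the index; the object store is read only on download.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectStore:
    """Hypothetical in-memory stand-in for an S3 bucket; counts every read."""
    blobs: dict = field(default_factory=dict)
    reads: int = 0

    def put(self, key: str, data: bytes) -> None:
        self.blobs[key] = data

    def get(self, key: str) -> bytes:
        self.reads += 1  # each call here would be a billed S3 request
        return self.blobs[key]

    def delete(self, key: str) -> None:
        self.blobs.pop(key, None)


@dataclass
class FileIndex:
    """Hypothetical in-memory stand-in for an ElasticSearch file index."""
    docs: dict = field(default_factory=dict)

    def upsert(self, key: str, doc: dict) -> None:
        self.docs[key] = doc

    def remove(self, key: str) -> None:
        self.docs.pop(key, None)

    def search(self, entity_id, name_contains="", sort_by="name",
               offset=0, limit=10):
        # Filter by owning entity and a case-insensitive name match.
        hits = [
            d for d in self.docs.values()
            if d["entity_id"] == entity_id
            and name_contains.lower() in d["name"].lower()
        ]
        hits.sort(key=lambda d: d[sort_by])
        return hits[offset:offset + limit]


class FileService:
    """Keeps the store and the index in sync on every write."""

    def __init__(self, store: ObjectStore, index: FileIndex):
        self.store = store
        self.index = index

    def upload(self, entity_id: str, name: str, data: bytes) -> str:
        key = f"{entity_id}/{name}"
        self.store.put(key, data)
        self.index.upsert(key, {"key": key, "entity_id": entity_id,
                                "name": name, "size": len(data)})
        return key

    def list_files(self, entity_id: str, **query):
        # Served from the index only: no object store request.
        return self.index.search(entity_id, **query)

    def download(self, key: str) -> bytes:
        # The only code path that reads from the object store.
        return self.store.get(key)

    def delete(self, key: str) -> None:
        self.store.delete(key)
        self.index.remove(key)
```

In this sketch, reading an entity and listing its files any number of times leaves `store.reads` at zero; the counter only moves when `download` is called.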

@SamuelHassine SamuelHassine added feature use for describing a new feature to develop needs triage use to identify issue needing triage from Filigran Product team and removed needs triage use to identify issue needing triage from Filigran Product team labels Oct 14, 2023
@SamuelHassine SamuelHassine added this to the Release 5.12.0 milestone Oct 14, 2023
@ckane
Contributor

ckane commented Oct 14, 2023

I'll also add that another downside of the present design is that the extra S3 querying introduces latency, on top of the cost overhead mentioned. When I've refactored some of the connectors to address this (most recently, the yara connector), it sped up the processing rate of the connector, too.

@SamuelHassine
Member Author

Definitely @ckane!

@Kedae Kedae modified the milestones: Release 5.12.0, Release 5.13.0 Oct 17, 2023
@Jipegien Jipegien modified the milestones: Release 5.13.0, Short-term candidates Nov 24, 2023
@SamuelHassine SamuelHassine modified the milestones: Short-term candidates, Release 5.12.5 Dec 6, 2023
@SamuelHassine SamuelHassine modified the milestones: Release 5.12.6, Release 5.12.5 Dec 9, 2023
@SamuelHassine SamuelHassine added the solved use to identify issue that has been solved (must be linked to the solving PR) label Dec 9, 2023