Full refactor of files storage and fetching to avoid querying S3 when not necessary #4583

SamuelHassine · 2023-10-14T21:00:27Z

Use case

Today, many components of the OpenCTI frontend (UI, Python library) request an entity's files every time the entity is read. For many organizations this means extra costs on their S3 buckets, and the approach has several limitations:

No real sorting or pagination
No search on files
Performance and latency issues
etc.

To be clear: none of the features implemented in the ElasticSearch engine are available when it comes to handling files (including mass deletion, etc.). This is a high priority for many OpenCTI users and customers.

Solution discussed with @Kedae and @richard-julien: implement a representation of files in ElasticSearch, keep it synchronized with S3, and fetch content from S3 only when needed (file read or download).
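To make the proposed split concrete, here is a minimal TypeScript sketch. It assumes a hypothetical `FileService` in which an in-memory `Map` stands in for the ElasticSearch index and a hypothetical `ObjectStore` class stands in for S3; neither is OpenCTI's actual code or API. Listing, sorting, pagination and search are answered from the index alone, and the store is only touched on upload, delete and download.

```typescript
// Hypothetical sketch, not OpenCTI's implementation: file metadata lives in a
// search index (a Map standing in for ElasticSearch) and file content lives in
// an object store (a class standing in for S3).

interface FileMetadata {
  id: string;           // object key in the store
  entityId: string;     // entity the file is attached to
  name: string;
  size: number;         // bytes
  lastModified: number; // epoch milliseconds
}

// Minimal object-store stand-in; counts calls so the savings are visible.
class ObjectStore {
  private objects = new Map<string, Uint8Array>();
  calls = 0;

  put(key: string, data: Uint8Array): void {
    this.calls += 1;
    this.objects.set(key, data);
  }

  get(key: string): Uint8Array | undefined {
    this.calls += 1;
    return this.objects.get(key);
  }

  delete(key: string): void {
    this.calls += 1;
    this.objects.delete(key);
  }
}

class FileService {
  // Stand-in for the search-engine index of file documents.
  private index = new Map<string, FileMetadata>();
  private store: ObjectStore;

  constructor(store: ObjectStore) {
    this.store = store;
  }

  // Write path: store the content, then index its metadata (both sides updated).
  upload(entityId: string, name: string, data: Uint8Array, now: number = Date.now()): FileMetadata {
    const id = `${entityId}/${name}`;
    this.store.put(id, data);
    const meta: FileMetadata = { id, entityId, name, size: data.length, lastModified: now };
    this.index.set(id, meta);
    return meta;
  }

  // Listing with sorting and pagination: index only, no store call.
  list(entityId: string, offset = 0, limit = 10): FileMetadata[] {
    return Array.from(this.index.values())
      .filter((f) => f.entityId === entityId)
      .sort((a, b) => b.lastModified - a.lastModified) // newest first
      .slice(offset, offset + limit);
  }

  // Case-insensitive search on file names: index only.
  search(term: string): FileMetadata[] {
    const t = term.toLowerCase();
    return Array.from(this.index.values()).filter((f) => f.name.toLowerCase().includes(t));
  }

  // Content is fetched from the store only when it is actually needed.
  download(id: string): Uint8Array | undefined {
    return this.index.has(id) ? this.store.get(id) : undefined;
  }

  // Delete on both sides to keep them synchronized.
  delete(id: string): void {
    this.store.delete(id);
    this.index.delete(id);
  }
}
```

Uploads and deletes write to both sides to keep them in sync; a real implementation would also have to handle a failure between the two writes and backfill the index for files that already exist in the bucket, which this sketch ignores.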

ckane · 2023-10-14T21:31:48Z

I'll also add that another downside of the present design is that the extra S3 queries introduce latency on top of the cost overhead mentioned. When I've refactored some of the connectors to avoid them (most recently, the YARA connector), it has sped up the connector's processing rate, too.

SamuelHassine · 2023-10-14T21:45:32Z

Definitely @ckane!


SamuelHassine added the feature and needs triage labels and removed the needs triage label Oct 14, 2023

SamuelHassine added this to the Release 5.12.0 milestone Oct 14, 2023

SamuelHassine mentioned this issue Oct 14, 2023

OpenCTI performing mass S3 List API requests #4550

Closed

Kedae modified the milestones: Release 5.12.0, Release 5.13.0 Oct 17, 2023

Jipegien modified the milestones: Release 5.13.0, Short-term candidates Nov 24, 2023

SamuelHassine modified the milestones: Short-term candidates, Release 5.12.5 Dec 6, 2023

richard-julien mentioned this issue Dec 6, 2023

[backend/frontend] Refactor files management to rely on internal engine instead of S3 (#4583) #5131

Merged

richard-julien linked a pull request Dec 6, 2023 that will close this issue

[backend/frontend] Refactor files management to rely on internal engine instead of S3 (#4583) #5131

Merged

richard-julien added a commit that referenced this issue Dec 7, 2023

[backend] Improve typing and engine schema mappings (#4583)

a6d225e

richard-julien closed this as completed in #5131 Dec 9, 2023

richard-julien added a commit that referenced this issue Dec 9, 2023

[backend/frontend] Refactor files management to rely on internal engine instead of S3 (#4583)

bb0387e

SamuelHassine modified the milestones: Release 5.12.6, Release 5.12.5 Dec 9, 2023

SamuelHassine added the solved label Dec 9, 2023

richard-julien added a commit that referenced this issue Dec 9, 2023

[backend] Improve migration ordering and file listing (#4583)

63fcbb4
