[backend/frontend] Implement full text search for documents (#1483) #4275

Merged
SouadHadjiat merged 35 commits into master from issue/1483 on Nov 9, 2023

Conversation

SouadHadjiat
Member

@SouadHadjiat SouadHadjiat commented Sep 7, 2023

Proposed changes

First version of imported file indexing and full text search for the Enterprise Edition:

  • Creation of files index (opencti_files) and configuration of attachment processor
  • Adding FileIndexManager module to index all imported files (global and in entities)
  • In global search, adding button to extend the search to indexed files
  • Handling changes to entity markings / organization restrictions: indexed files are updated to carry the same data restrictions
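
For readers unfamiliar with the attachment processor mentioned in the first bullet, here is a minimal sketch of what an ingest pipeline definition using it can look like. It is not the code from this PR: the function name, the `file_data` source field and the character limit are assumptions chosen for illustration. Only the processor options `field`, `target_field` and `indexed_chars` come from the Elasticsearch / OpenSearch attachment processor itself.

```typescript
// Hypothetical types and names, for illustration only (not taken from the PR).
interface AttachmentProcessor {
  attachment: {
    field: string; // document field holding the base64-encoded file
    target_field?: string; // where the extracted text and metadata are written
    indexed_chars?: number; // maximum number of characters extracted per file
  };
}

interface IngestPipeline {
  description: string;
  processors: AttachmentProcessor[];
}

// Build the body of an ingest pipeline that extracts text from a binary field.
function buildAttachmentPipeline(sourceField: string, maxChars: number): IngestPipeline {
  return {
    description: 'Extract text content from uploaded files',
    processors: [
      {
        attachment: {
          field: sourceField,
          target_field: 'attachment',
          indexed_chars: maxChars,
        },
      },
    ],
  };
}

// The resulting object is what would be sent as the pipeline definition.
const pipeline = buildAttachmentPipeline('file_data', 100000);
console.log(JSON.stringify(pipeline, null, 2));
```

Documents indexed through such a pipeline get their extracted text under the target field, which is what a full text query would then match against.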

First part of the file index management panel:

  • Adding "File indexing" menu in settings top bar
  • Display enterprise edition info if not enabled
  • Display requirements for file indexing (elasticsearch / opensearch)
  • Display number and size of files that will be indexed
  • Manage file indexing: start / pause, and display the number and size of files indexed

Indexing is paused by default.

About file indexing:
We support these MIME types: text/plain, text/csv, application/pdf, application/vnd.ms-excel and application/vnd.openxmlformats-officedocument.spreadsheetml.sheet (Excel sheets).
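
As a rough illustration of how such an allow-list can be applied before a file is sent for indexing, here is a minimal sketch. The constant and function names are hypothetical, and the list is hard-coded here only to keep the example self-contained; according to the discussion in this PR, the actual list is configured in default.json.

```typescript
// MIME types listed in the PR description. Hard-coded for the example only.
const SUPPORTED_MIME_TYPES: ReadonlySet<string> = new Set([
  'text/plain',
  'text/csv',
  'application/pdf',
  'application/vnd.ms-excel',
  'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
]);

// Hypothetical helper: decide whether a file should be picked up for indexing.
function isIndexable(mimeType: string): boolean {
  // Drop parameters such as "; charset=utf-8" and normalise the case.
  const base = (mimeType.split(';')[0] ?? '').trim().toLowerCase();
  return SUPPORTED_MIME_TYPES.has(base);
}

console.log(isIndexable('text/plain; charset=utf-8'));
console.log(isIndexable('image/png'));
```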

Related issues

Checklist

  • I consider the submitted work as finished
  • I tested the code for its functionality
  • I wrote test cases for the relevant use cases
  • I added/updated the relevant documentation (either on github or on notion)
  • Where necessary I refactored code to improve the overall quality

Further comments

https://www.notion.so/filigran/Management-panel-for-File-indexing-de5a1879c62f41af99943c58a30f4aa6

https://www.notion.so/filigran/Full-text-search-for-documents-11a67658fc37410cbbbdde2db78da73a#3281f177046749678fdbb8539b5cbaab

@SouadHadjiat SouadHadjiat added the "filigran team" label (use to identify PR from the Filigran team) Sep 7, 2023
@SouadHadjiat SouadHadjiat force-pushed the issue/1483 branch 2 times, most recently from 6841080 to 2e867d8 Compare October 6, 2023 11:03
@SouadHadjiat SouadHadjiat marked this pull request as ready for review October 9, 2023 15:05
@RomuDeuxfois RomuDeuxfois self-requested a review October 11, 2023 05:51
@Archidoit
Member

Is it useful to keep the 'line view' button on the right, since no other view is available?
image

entityLink = entityLink.concat('/files');
}
return (
<ListItem

Better to use ListItemButton

@Archidoit
Member

If we click again on 'Extends this search to indexed files', it would be great if the panel closed.
image

@Archidoit
Member

Archidoit commented Oct 11, 2023

Sorting file results (by name or by 'attached entity type') is not working:
image

Note: It would be great to also be able to sort by the number of occurrences
@Jipegien

@Archidoit
Member

Archidoit commented Oct 11, 2023

Which file extensions does this feature work with?
I uploaded a txt file containing the word 'filter' and it was not found... But another, simpler txt file was found...

@SouadHadjiat
Member Author

Which file extensions does this feature work with?

@Archidoit We support these MIME types: text/plain, text/csv, application/pdf, application/vnd.ms-excel and application/vnd.openxmlformats-officedocument.spreadsheetml.sheet (Excel sheets). You can find them in default.json.

@SouadHadjiat
Member Author

Sorting file results (by name or by 'attached entity type') is not working:

Note: It would be great to also be able to sort by the number of occurrences

That's a mistake on our side: we don't support custom sorting for now (it will come in V2), and we forgot to remove the column sorting. Sorting by number of occurrences won't be possible, because that value is something we compute, not something stored that we could sort on.
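
To make the distinction concrete, here is a small sketch of a derived occurrence count. It is not how this PR computes occurrences: the `FileHit` shape and both function names are assumptions for illustration. Because the count only exists after a result has been fetched, any ordering on it can only apply to the results already in hand, not to the whole index.

```typescript
// Hypothetical shape of a search result with its extracted text.
interface FileHit {
  name: string;
  content: string;
}

// Count non-overlapping, case-insensitive occurrences of a term in a text.
function countOccurrences(text: string, term: string): number {
  if (term.length === 0) return 0;
  const haystack = text.toLowerCase();
  const needle = term.toLowerCase();
  let count = 0;
  let from = haystack.indexOf(needle);
  while (from !== -1) {
    count += 1;
    from = haystack.indexOf(needle, from + needle.length);
  }
  return count;
}

// Order the hits already fetched by their computed count, highest first.
// This cannot replace a sort done by the search engine across all documents.
function sortByOccurrences(hits: FileHit[], term: string): FileHit[] {
  return hits
    .map((hit) => ({ hit, count: countOccurrences(hit.content, term) }))
    .sort((a, b) => b.count - a.count)
    .map(({ hit }) => hit);
}

const page: FileHit[] = [
  { name: 'notes.txt', content: 'one filter' },
  { name: 'report.txt', content: 'Filter after filter after FILTER' },
];
console.log(sortByOccurrences(page, 'filter').map((hit) => hit.name));
```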

@SouadHadjiat SouadHadjiat merged commit 184383d into master Nov 9, 2023
3 checks passed
@SouadHadjiat SouadHadjiat deleted the issue/1483 branch November 9, 2023 15:49
6 participants