[ENG-1987] indexer shenanigans #779
Looking good! Just a few comments.
share/util/osf.py
@@ -10,3 +14,33 @@ def osf_sources():
).exclude(
user__username=settings.APPLICATION_USERNAME,
)


OSF_GUID_RE = re.compile(r'^https?://(?:[^.]+.)?osf.io/(?P<guid>[^/]+)/?$')
Nitpick 2: This may be intentional, but `(?P<guid>[^/]+)` will match an arbitrary number of non-`/` characters. If you want to limit this to exactly 5, no more, no less, then you can replace the `+` with `{5}`: `(?P<guid>[^/]{5})`
Yeah, that was kind of intentional. I don't know if "exactly 5" will always be set in stone (do we have a way to tell how much of the valid GUID space is used or left?), and it seemed safe enough to allow any length, as long as it's not a nested path.
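To make the trade-off in this thread concrete, here is a small sketch comparing the pattern from the diff with the reviewer's `{5}` variant. `OSF_GUID_5_RE` and `extract_guid` are hypothetical names for illustration only, and the example URLs are made up.

```python
import re

# The pattern as written in the diff under review.
OSF_GUID_RE = re.compile(r'^https?://(?:[^.]+.)?osf.io/(?P<guid>[^/]+)/?$')

# Hypothetical variant following the review suggestion: exactly five
# non-slash characters in the guid position.
OSF_GUID_5_RE = re.compile(r'^https?://(?:[^.]+.)?osf.io/(?P<guid>[^/]{5})/?$')


def extract_guid(pattern, url):
    """Return the captured guid, or None if the whole URL does not match."""
    match = pattern.match(url)
    return match.group('guid') if match else None


example_urls = [
    'https://osf.io/abc12/',        # five-character guid, trailing slash
    'https://staging.osf.io/abc12', # optional subdomain
    'https://osf.io/abcdefg/',      # longer guid: only the `+` version accepts it
    'https://osf.io/abc12/files/',  # nested path: both versions reject it
]

for url in example_urls:
    print(url, extract_guid(OSF_GUID_RE, url), extract_guid(OSF_GUID_5_RE, url))
```

Both versions reject nested paths, which is the property the reply cares about; the only difference is whether guids of other lengths are accepted. Separately, the dots in `osf.io` are unescaped, so `.` matches any character (e.g. `https://osfXio/abc12` also matches); writing `osf\.io` would be stricter if that matters.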
c1082e9 to 2c6763c
2bd6ae6 to b0dde85
- add `indexer` service
- move common environment variables to `.docker-compose.env`
- tell `celery` its app
- update quay.io image names
- cleaner indexer daemon threading, messaging
- remove unhelpfully redundant code
- move elastic mappings and fetching logic to IndexSetup classes
- available IndexSetups added to entry_points in setup.py
- allow configuring a different IndexSetup for each index
- currently only one `share_classic` IndexSetup that just wraps existing behavior
- move index creation/deletion to ElasticManager
- tidy up `settings.ELASTICSEARCH`
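A rough sketch of "a different IndexSetup for each index": each setup object owns an index's mappings and document building, and per-index settings name which setup to use. Everything here is hypothetical (`INDEX_SETUPS`, `INDEX_SETTINGS`, the `share_v1`/`share_v2` index names, the `INDEX_SETUP` key, the method names); in the PR the available setups come from `entry_points` in `setup.py`, and a plain dict stands in for that registry.

```python
from abc import ABC, abstractmethod


class IndexSetup(ABC):
    """Hypothetical interface: owns one index's mappings and document building."""

    @abstractmethod
    def index_mappings(self) -> dict:
        """Elasticsearch mappings for any index using this setup."""

    @abstractmethod
    def build_documents(self, records: list) -> list:
        """Turn stored records into documents ready to be indexed."""


class ShareClassicIndexSetup(IndexSetup):
    # Stand-in for "just wraps existing behavior"; the fields are illustrative.
    def index_mappings(self):
        return {'properties': {'title': {'type': 'text'}}}

    def build_documents(self, records):
        return [{'_id': r['id'], 'title': r['title']} for r in records]


# Stand-in for the entry-point registry of available setups.
INDEX_SETUPS = {'share_classic': ShareClassicIndexSetup}

# Hypothetical per-index settings: each index names the setup it uses.
INDEX_SETTINGS = {
    'share_v1': {'INDEX_SETUP': 'share_classic'},
    'share_v2': {'INDEX_SETUP': 'share_classic'},
}


def get_index_setup(index_name: str) -> IndexSetup:
    """Look up the configured setup for an index and instantiate it."""
    setup_name = INDEX_SETTINGS[index_name]['INDEX_SETUP']
    try:
        setup_class = INDEX_SETUPS[setup_name]
    except KeyError:
        raise ValueError(
            f'unknown IndexSetup {setup_name!r} for index {index_name!r}'
        ) from None
    return setup_class()
```

Keeping the lookup indirect means a new setup can be registered and pointed at one index without touching the others.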
- require suids for pushed data
- a `MutableGraph` can have a "central" node that the graph is "about"
- for compatibility, infer suids for data pushed from OSF
- `populate_osf_suids` command to populate suids for existing OSF data
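One way a "central" node could support inferring suids for OSF pushes: look at the identifiers on the node the graph is about and pull the guid out of an OSF URL. This is a sketch under assumptions: `MiniGraph` is a hypothetical stand-in for `MutableGraph`, `infer_osf_suid` is a hypothetical helper, and the suid is reduced to just the guid string rather than whatever fields the real model has.

```python
import re
from typing import Optional

# Same pattern as in the diff for share/util/osf.py.
OSF_GUID_RE = re.compile(r'^https?://(?:[^.]+.)?osf.io/(?P<guid>[^/]+)/?$')


class MiniGraph:
    """Hypothetical stand-in for MutableGraph: nodes plus an optional central node."""

    def __init__(self, central_node_id=None):
        self.nodes = {}  # node id -> dict of attributes
        self.central_node_id = central_node_id

    def add_node(self, node_id, **attrs):
        self.nodes[node_id] = attrs

    @property
    def central_node(self):
        """The node this graph is "about", if one was designated."""
        if self.central_node_id is None:
            return None
        return self.nodes.get(self.central_node_id)


def infer_osf_suid(graph) -> Optional[str]:
    """Guess a suid identifier from the central node's OSF URL, if it has one."""
    central = graph.central_node
    if central is None:
        return None
    for uri in central.get('identifiers', []):
        match = OSF_GUID_RE.match(uri)
        if match:
            return match.group('guid')
    return None
```

Without a central node there is no principled way to pick which node's identifiers to trust, which is presumably why the two changes travel together.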
- preserve full type lineage in `ShareV2SchemaType.type_lineage`
- require specifying through-fields for many-to-many relations
- allows traversing m2m relations in MutableGraph
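"Full type lineage" can be pictured as a type plus every ancestor, most specific first. The hierarchy below (`preprint` under `publication` under `creativework`) and the `PARENT_TYPE` mapping are illustrative assumptions, not the actual schema definition behind `ShareV2SchemaType`.

```python
from typing import Dict, Optional, Tuple

# Illustrative child -> parent links; None marks a root type.
PARENT_TYPE: Dict[str, Optional[str]] = {
    'creativework': None,
    'publication': 'creativework',
    'preprint': 'publication',
}


def type_lineage(type_name: str) -> Tuple[str, ...]:
    """Return the type followed by every ancestor, most specific first."""
    lineage = []
    current = type_name
    while current is not None:
        if current in lineage:
            raise ValueError(f'cycle in type hierarchy at {current!r}')
        lineage.append(current)
        current = PARENT_TYPE[current]
    return tuple(lineage)
```

Keeping the whole chain means a check like `'publication' in type_lineage('preprint')` answers "is this any kind of publication?" without walking the hierarchy again.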
- add `share.metadata_formats` entry points
- start out with just `sharev2_elastic`, for back-compatible elasticsearch documents
- allow getting a list of all entry points of a given namespace in `share.util.extensions`, so it's easy to loop through all formats
- add `FormattedMetadataRecord` model
- given a suid, easy to store its most recent normalized datum in all available metadata formats
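A sketch of how the entry-point group and the formatted-record table could fit together. It uses the standard library's `importlib.metadata` (Python 3.8+) to list a group's entry points, which may differ from what `share.util.extensions` actually does. `FormattedRecordStore` is a hypothetical in-memory stand-in for the `FormattedMetadataRecord` table, and `sharev2_elastic_sketch` is a made-up formatter, not the real `sharev2_elastic` format.

```python
import json
from importlib import metadata


def iter_entry_points(group):
    """List every installed entry point in `group` (Python 3.8+)."""
    eps = metadata.entry_points()
    if hasattr(eps, 'select'):  # Python 3.10+
        return list(eps.select(group=group))
    return list(eps.get(group, []))  # Python 3.8/3.9 return a plain dict


def load_formatters(group='share.metadata_formats'):
    """Map each format name to its loaded formatter callable."""
    return {ep.name: ep.load() for ep in iter_entry_points(group)}


def sharev2_elastic_sketch(normalized_datum):
    """Hypothetical formatter: serialize a normalized datum as a JSON document."""
    return json.dumps({'title': normalized_datum.get('title')})


class FormattedRecordStore:
    """In-memory stand-in for a FormattedMetadataRecord table."""

    def __init__(self, formatters):
        self.formatters = formatters
        self.records = {}  # (suid, format name) -> formatted metadata

    def save_all_formats(self, suid, normalized_datum):
        # One row per (suid, format); saving again overwrites, so only the
        # most recent normalized datum is kept for each format.
        for format_name, formatter in self.formatters.items():
            self.records[(suid, format_name)] = formatter(normalized_datum)
```

Keying on `(suid, format)` is what makes "most recent datum in all available formats" a simple overwrite rather than a query for the latest row.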
an alternate path to `share_classic` that will allow rending `ShareObject` while retaining back-compatibility for our discover pages
- python 3.6+, not 3.5
- other miscellany
- only run `ingest` on IngestJobs, not HarvestJobs
- make status names in English match status names in Python
9f9502a to 9c00ff0
Ok, so the point of this is to let the elastic index be populated directly from `NormalizedData` instead of `AbstractCreativeWork`, to make way for deleting `ShareObject` and all its kin (including `AbstractCreativeWork`), all without breaking the apps we've built that use the index directly (the preprints and registries discover pages, which will be updated to use a new search API once that API exists).

changes
- add `indexer` service
- move common environment variables to `.docker-compose.env`
- tell `celery` its app
- `share.search` tidy-up
- `share_classic` IndexSetup that just wraps existing behavior
- `populate_osf_suids` command to populate suids for existing OSF data
- `FormattedMetadataRecord` model (metadata formats registered in `setup.py` under `entry_points => share.metadata_formats`)
- save `FormattedMetadataRecord`s during `ingest`, and the indexer daemon just pulls from that table (makes re-indexing much smoother)

todo
- `postrend_backcompat` IndexSetup that builds elastic documents from `NormalizedData`
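The description says the indexer daemon just pulls from the formatted-record table, which is what makes re-indexing smoother. A minimal sketch of that shape, assuming a single worker thread fed through a `queue.Queue`: `IndexerDaemon`, `record_table` (a dict standing in for the table), and `send_batch` (standing in for an Elasticsearch bulk call) are all hypothetical names, not the PR's actual daemon.

```python
import queue
import threading

STOP = object()  # sentinel telling the worker thread to exit


class IndexerDaemon:
    """Hypothetical sketch: a worker thread that indexes pre-formatted records."""

    def __init__(self, record_table, send_batch, batch_size=100):
        self.record_table = record_table  # suid -> formatted document
        self.send_batch = send_batch      # e.g. a wrapper around a bulk-index call
        self.batch_size = batch_size
        self.messages = queue.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self.thread.start()

    def stop(self):
        self.messages.put(STOP)
        self.thread.join()

    def enqueue(self, suid):
        self.messages.put(suid)

    def reindex_all(self):
        # Re-indexing is just enqueueing every suid that has a stored record;
        # nothing needs to be re-normalized or re-formatted.
        for suid in list(self.record_table):
            self.enqueue(suid)

    def _flush(self, batch):
        if batch:
            self.send_batch(list(batch))
            batch.clear()

    def _run(self):
        batch = []
        while True:
            message = self.messages.get()
            if message is STOP:
                self._flush(batch)
                return
            doc = self.record_table.get(message)
            if doc is not None:  # unknown suids are skipped
                batch.append(doc)
            # Send when the batch is full or nothing else is waiting right now.
            if len(batch) >= self.batch_size or self.messages.empty():
                self._flush(batch)
```

Because the documents are already formatted, the worker only reads and ships them, so the same loop serves both incremental updates and a full re-index.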