-
Notifications
You must be signed in to change notification settings - Fork 685
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reindex in a background thread on demand. #105
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good - a couple of style nits, mostly around how we handle Nones.
tag_exists = Tag(name=tag) | ||
db_session.add(tag_exists) | ||
db_session.commit() | ||
Tag(name=tag) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[question] can you just do this since duplicate tags won't be added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way I crafted the model for Tag
objects means that tags are deduplicated as a matter of course. There can never be duplicate tags when done this way.
|
||
def seconds_since_index(): | ||
last_update = IndexMetadata.get_last_update('lock', 'index') | ||
if last_update is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if last_update:
return (datetime.datetime.utcnow() - last_update).total_seconds()
return None
return 'Currently indexing' | ||
seconds = seconds_since_index() | ||
if seconds is None: | ||
return "Never" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same None comment from above (None is inherently falsey)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seconds is unlikely to be (but could be) zero; so not changing here.
@@ -104,7 +104,8 @@ | |||
</div> | |||
|
|||
<div class="footer"> | |||
Served with ♥ by <a href="https://github.com/airbnb/knowledge-repo">Knowledge Repo</a> <a title='{{version_revision}}' href="https://github.com/airbnb/knowledge-repo/releases/tag/v{{ version }}">{{ version }}</a> | |||
Served with ♥ by <a href="https://github.com/airbnb/knowledge-repo">Knowledge Repo</a> <a title='{{version_revision}}' href="https://github.com/airbnb/knowledge-repo/releases/tag/v{{ version }}">{{ version }}</a><br /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haha in theory we should make this an actual glyphicon
@classmethod | ||
def set(cls, type, name, value): | ||
m = db_session.query(IndexMetadata).filter(IndexMetadata.type == type).filter(IndexMetadata.name == name).first() | ||
if m is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same None comment
@classmethod | ||
def get_last_update(cls, type, name): | ||
m = db_session.query(IndexMetadata).filter(IndexMetadata.type == type).filter(IndexMetadata.name == name).first() | ||
if m is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same None comment
7c01300
to
823d1d8
Compare
0d4b797
to
edc7ead
Compare
edc7ead
to
6050d2c
Compare
__tablename__ = 'index_metadata' | ||
|
||
id = db.Column(db.Integer, nullable=False, primary_key=True) | ||
type = db.Column(db.String(255), nullable=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you imagining that type and name will be jointly unique?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if so, we should endcode this table that way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also do we need type and name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes; tuples of type
and name
should be unique. I'll encode this into the table as you suggest. We need type and name for things like:
type='repository_revision', name='<repository uri>
', value='<revision>
'
It would be possible to cram these into one field, but I felt like that was a bit clumsy. Agreed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup sounds good
seconds_check = seconds_since_index_check() | ||
if is_indexing() or (seconds is not None) and (seconds < 5 * 60) and (seconds_check < 5 * 60): | ||
return False | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this reads a little strangely without an except block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is actually kosher Python code. It's the neatest way to guarantee that code block always runs even if there are exceptions (which are not caught).
if not current_app.config.get('REPOSITORY_INDEXING_ENABLED', True): | ||
return False | ||
|
||
seconds = seconds_since_index() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small efficiency gains if you bail early here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. I don't think the early bail is worth it though. We can revisit this if we ever find the performance suffering.
6050d2c
to
61f0dde
Compare
61f0dde
to
2be5e6f
Compare
@matthewwardrop when you say "so background threading for the index update is disabled for sqlite databases", does this mean that, when using the default database for the index, web server restarts are still required to pick up repository changes? |
Hi @sryza! No... It just means that when the git repository is updated, there will be a synchronous updating of the repository index. Depending on the size of the repository, this could be slow... But I strongly advise against using sqlite databases in production. For testing or for small servers it probably is not terrible. |
Great, thanks for the clarification! |
Currently, the index is only updated once immediately after being launched. Reindexing the repository requires restarting the app, which can take some time depending on the size of the repository. This patch changes this behaviour to reindex on demand. That is, every time a request is made, knowledge repo checks whether it should update the index. Knowledge repo will never update the index if another index is already underway, or if it is less than five minutes since the last index. It will otherwise re-index if the KnowledgeRepository(-ies) report a newer version than what is recorded in the index. When an index is being performed, it is done in a background thread. The status of the index is noted in the footer of the page, which indicated how long ago the index was performed (or if the index is currently being performed). An example is shown below:
This PR would currently have some race conditions when using SQLite databases due to the frequent use of
db_session.flush()
in_update_index
, and so background threading for the index update is disabled for sqlite databases.