Reindex in a background thread on demand. #105

matthewwardrop · 2016-11-02T21:27:07Z

Currently, the index is only updated once, immediately after the app is launched. Reindexing the repository requires restarting the app, which can take some time depending on the size of the repository. This patch changes this behaviour to reindex on demand: every time a request is made, Knowledge Repo checks whether it should update the index. It will never update the index if another index is already underway, or if less than five minutes have passed since the last index. Otherwise, it will reindex if the KnowledgeRepository(-ies) report a newer version than the one recorded in the index. Indexing is performed in a background thread. The status of the index is shown in the footer of the page, which indicates how long ago the index was performed (or that an index is currently underway). An example is shown below:

As written, this PR would be subject to race conditions when using SQLite databases, due to the frequent use of db_session.flush() in _update_index. Background threading for the index update is therefore disabled for SQLite databases.
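The behaviour described in the two paragraphs above can be sketched in plain Python. This is a minimal, hypothetical model: the `Indexer` class, the `latest_revision` callable, the in-memory timestamps and the `"sqlite"` URI-prefix check are assumptions for illustration, not the PR's actual `index.py`, which tracks this state in the database through `IndexMetadata`. "Newer version" is simplified here to "different revision".

```python
import threading
import time

INDEX_INTERVAL = 5 * 60  # minimum seconds between index runs (five minutes)


class Indexer:
    """Hypothetical in-memory model of the on-demand reindex check."""

    def __init__(self, latest_revision, clock=time.time):
        self._latest_revision = latest_revision  # callable returning the repository's current revision
        self._clock = clock
        self._lock = threading.Lock()  # held while an index run is underway
        self.indexed_revision = None
        self.last_index = None  # time of the last completed index, or None if never indexed

    def should_index(self):
        if self._lock.locked():
            return False  # another index is already underway
        now = self._clock()
        if self.last_index is not None and now - self.last_index < INDEX_INTERVAL:
            return False  # indexed less than five minutes ago
        # Simplification (assumption): any revision change counts as "newer".
        return self._latest_revision() != self.indexed_revision

    def update_index(self):
        # Non-blocking acquire: if another thread is already indexing, do nothing.
        if not self._lock.acquire(blocking=False):
            return False
        try:
            self.indexed_revision = self._latest_revision()
            # ... walk the repository and refresh the index rows here ...
            self.last_index = self._clock()
            return True
        finally:
            self._lock.release()

    def check_on_request(self, database_uri):
        """Called on every request: start an index run if one is due.

        SQLite databases are indexed synchronously in the calling thread;
        other databases are indexed in a background thread.
        Returns the started thread, or None if no thread was started.
        """
        if not self.should_index():
            return None
        if database_uri.startswith("sqlite"):
            self.update_index()
            return None
        thread = threading.Thread(target=self.update_index, daemon=True)
        thread.start()
        return thread
```

The non-blocking `acquire` in `update_index` matters because two requests can both pass `should_index` before either has taken the lock; only one of them will actually run the index.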

NiharikaRay

Looks good - a couple of style nits, mostly around how we handle Nones.

NiharikaRay · 2016-11-02T21:46:03Z

knowledge_repo/app/app.py

-                    tag_exists = Tag(name=tag)
-                    db_session.add(tag_exists)
-                    db_session.commit()
+                Tag(name=tag)


[question] can you just do this since duplicate tags won't be added?

The way I crafted the model for Tag objects means that tags are deduplicated as a matter of course. There can never be duplicate tags when done this way.
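One way to get this "deduplicated as a matter of course" behaviour is to have the constructor hand back an existing object for a name it has already seen. The sketch below is hypothetical and uses an in-memory dict; a real model would need to look tags up in the database rather than in a class-level registry, and the PR's actual mechanism may differ.

```python
class Tag:
    """Hypothetical tag class whose constructor returns an existing instance for a known name."""

    _registry = {}  # stands in for a database lookup of existing tags

    def __new__(cls, name):
        existing = cls._registry.get(name)
        if existing is not None:
            return existing  # duplicate name: reuse the existing tag
        tag = super().__new__(cls)
        tag.name = name
        cls._registry[name] = tag
        return tag
```

With this pattern, calling `Tag(name=tag)` twice with the same name yields the same object, so no explicit existence check or extra add is needed at the call site.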

NiharikaRay · 2016-11-02T21:47:05Z

knowledge_repo/app/index.py

+
+def seconds_since_index():
+    last_update = IndexMetadata.get_last_update('lock', 'index')
+    if last_update is None:


if last_update:
    return (datetime.datetime.utcnow() - last_update).total_seconds()
return None
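A self-contained version of the suggested helper, with the `IndexMetadata` lookup replaced by a `last_update` parameter (a naive UTC `datetime`, or `None`) so it can run in isolation. The name `seconds_since` and the optional `now` parameter are assumptions for illustration.

```python
import datetime


def seconds_since(last_update, now=None):
    """Seconds elapsed since `last_update`, or None if there has been no update.

    `last_update` is a naive UTC datetime (or None); `now` defaults to the current UTC time.
    """
    if last_update is None:
        return None
    if now is None:
        # Naive UTC "now", equivalent to datetime.utcnow() without its 3.12 deprecation warning.
        now = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)
    return (now - last_update).total_seconds()
```

Since a `datetime` instance is always truthy, `if last_update:` and `if last_update is not None:` behave identically here; the distinction matters only for values that can be falsy, such as a `seconds` count of `0`.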

NiharikaRay · 2016-11-02T21:47:27Z

knowledge_repo/app/index.py

+        return 'Currently indexing'
+    seconds = seconds_since_index()
+    if seconds is None:
+        return "Never"


same None comment from above (None is inherently falsey)

seconds is unlikely to be zero, but it could be (and 0 is also falsy), so I'm not changing it here.
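To see why the explicit `is None` check is kept here: `0` is falsy, so a truthiness test would report an index performed zero seconds ago as "Never". The hypothetical helpers below reuse the "Currently indexing" and "Never" strings from the diff; the function names and the "N seconds ago" format are assumptions, and the PR's footer wording may differ.

```python
def index_status(is_indexing, seconds):
    """Footer text for the index status; `seconds` is None if the index has never run."""
    if is_indexing:
        return "Currently indexing"
    if seconds is None:  # explicit check: `if not seconds:` would also catch 0
        return "Never"
    return "{:.0f} seconds ago".format(seconds)


def index_status_truthy(is_indexing, seconds):
    """Same logic using a truthiness test, shown for contrast; it mishandles 0."""
    if is_indexing:
        return "Currently indexing"
    if not seconds:
        return "Never"
    return "{:.0f} seconds ago".format(seconds)
```

`index_status(False, 0)` returns `"0 seconds ago"`, whereas `index_status_truthy(False, 0)` returns `"Never"`.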

NiharikaRay · 2016-11-02T21:48:19Z

knowledge_repo/app/templates/base.html

@@ -104,7 +104,8 @@
        </div>

        <div class="footer">
-            Served with ♥ by <a href="https://github.com/airbnb/knowledge-repo">Knowledge Repo</a> <a title='{{version_revision}}' href="https://github.com/airbnb/knowledge-repo/releases/tag/v{{ version }}">{{ version }}</a>
+            Served with ♥ by <a href="https://github.com/airbnb/knowledge-repo">Knowledge Repo</a> <a title='{{version_revision}}' href="https://github.com/airbnb/knowledge-repo/releases/tag/v{{ version }}">{{ version }}</a><br />


haha in theory we should make this an actual glyphicon

NiharikaRay · 2016-11-02T21:48:36Z

knowledge_repo/app/models.py

+    @classmethod
+    def set(cls, type, name, value):
+        m = db_session.query(IndexMetadata).filter(IndexMetadata.type == type).filter(IndexMetadata.name == name).first()
+        if m is not None:


same None comment

NiharikaRay · 2016-11-02T21:48:51Z

knowledge_repo/app/models.py

+    @classmethod
+    def get_last_update(cls, type, name):
+        m = db_session.query(IndexMetadata).filter(IndexMetadata.type == type).filter(IndexMetadata.name == name).first()
+        if m is not None:


same None comment

danfrankj · 2016-11-08T08:24:19Z

knowledge_repo/app/models.py

+    __tablename__ = 'index_metadata'
+
+    id = db.Column(db.Integer, nullable=False, primary_key=True)
+    type = db.Column(db.String(255), nullable=False)


are you imagining that type and name will be jointly unique?

if so, we should encode this table that way

also do we need type and name?

Yes; tuples of type and name should be unique. I'll encode this into the table as you suggest. We need type and name for things like:
type='repository_revision', name='<repository uri>', value='<revision>'

It would be possible to cram these into one field, but I felt like that was a bit clumsy. Agreed?

yup sounds good
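In the SQLAlchemy model, a jointly unique pair of columns is typically declared with `__table_args__ = (db.UniqueConstraint('type', 'name'),)`. The sketch below shows the equivalent constraint in plain SQL using Python's built-in `sqlite3` module, together with `set`/`get` helpers that mirror the query-then-update pattern of `IndexMetadata.set`. The column names follow the model and the example above; the `value` column type and the helper names are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE index_metadata (
    id INTEGER PRIMARY KEY,
    type VARCHAR(255) NOT NULL,
    name VARCHAR(255) NOT NULL,
    value TEXT,
    UNIQUE (type, name)  -- each (type, name) pair may appear only once
)
"""


def set_metadata(conn, type_, name, value):
    """Update the row for (type_, name) if it exists, otherwise insert it."""
    row = conn.execute(
        "SELECT id FROM index_metadata WHERE type = ? AND name = ?", (type_, name)
    ).fetchone()
    if row is not None:
        conn.execute("UPDATE index_metadata SET value = ? WHERE id = ?", (value, row[0]))
    else:
        conn.execute(
            "INSERT INTO index_metadata (type, name, value) VALUES (?, ?, ?)",
            (type_, name, value),
        )
    conn.commit()


def get_metadata(conn, type_, name):
    """Return the stored value for (type_, name), or None if absent."""
    row = conn.execute(
        "SELECT value FROM index_metadata WHERE type = ? AND name = ?", (type_, name)
    ).fetchone()
    return row[0] if row is not None else None
```

For example, `set_metadata(conn, 'repository_revision', '<repository uri>', '<revision>')` stores one row per repository; calling it again for the same URI updates that row, and a direct second `INSERT` for the same pair raises `sqlite3.IntegrityError`.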

danfrankj · 2016-11-08T08:34:44Z

knowledge_repo/app/index.py

+    seconds_check = seconds_since_index_check()
+    if is_indexing() or (seconds is not None) and (seconds < 5 * 60) and (seconds_check < 5 * 60):
+        return False
+    try:


nit: this reads a little strangely without an except block.

I think this is actually kosher Python: try/finally is the neatest way to guarantee that a block of code always runs, even if an exception is raised (and not caught).
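A minimal illustration of the `try`/`finally` pattern under discussion: the `finally` block runs whether the body returns normally or raises, and an uncaught exception still propagates to the caller afterwards. The `index_with_flag` function and the `state` dict are hypothetical stand-ins for the PR's indexing lock.

```python
def index_with_flag(state, do_index):
    """Set an 'indexing' flag, run `do_index`, and always clear the flag afterwards."""
    state["indexing"] = True
    try:
        return do_index()
    finally:
        # Runs on normal return and when do_index raises; the exception is not swallowed.
        state["indexing"] = False
```

Without an `except` clause, errors from the indexing code still surface, but the flag can never be left stuck at `True`.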

danfrankj · 2016-11-08T08:39:02Z

knowledge_repo/app/index.py

+    if not current_app.config.get('REPOSITORY_INDEXING_ENABLED', True):
+        return False
+
+    seconds = seconds_since_index()


small efficiency gains if you bail early here

Yeah. I don't think the early bail is worth it though. We can revisit this if we ever find the performance suffering.

sryza · 2016-11-08T16:40:41Z

@matthewwardrop when you say "so background threading for the index update is disabled for sqlite databases", does this mean that, when using the default database for the index, web server restarts are still required to pick up repository changes?

matthewwardrop · 2016-11-08T16:54:32Z

Hi @sryza! No. It just means that when the git repository is updated, the repository index will be updated synchronously, during a request. Depending on the size of the repository, this could be slow. That said, I strongly advise against using SQLite databases in production; for testing or for small servers it is probably not terrible.

sryza · 2016-11-08T16:56:19Z

Great, thanks for the clarification!

matthewwardrop mentioned this pull request Nov 2, 2016

knowledge repo content refresh #48

Closed

NiharikaRay reviewed Nov 2, 2016

matthewwardrop force-pushed the mw_reindex_ondemand branch 3 times, most recently from 7c01300 to 823d1d8 on November 4, 2016 05:17

matthewwardrop added the WIP label Nov 4, 2016

matthewwardrop force-pushed the mw_reindex_ondemand branch 2 times, most recently from 0d4b797 to edc7ead on November 8, 2016 08:13

matthewwardrop removed the WIP label Nov 8, 2016

matthewwardrop force-pushed the mw_reindex_ondemand branch from edc7ead to 6050d2c on November 8, 2016 08:26

matthewwardrop mentioned this pull request Nov 8, 2016

[Feature] Allow update endpoint #49

Closed

danfrankj approved these changes Nov 8, 2016

danfrankj reviewed Nov 8, 2016

matthewwardrop force-pushed the mw_reindex_ondemand branch from 6050d2c to 61f0dde on November 8, 2016 08:57

Reindex in a background thread on demand.

2be5e6f

matthewwardrop force-pushed the mw_reindex_ondemand branch from 61f0dde to 2be5e6f on November 8, 2016 09:03

matthewwardrop merged commit 7300433 into master Nov 8, 2016

matthewwardrop deleted the mw_reindex_ondemand branch November 8, 2016 09:07

matthewwardrop mentioned this pull request Nov 8, 2016

Improve documentation around the web editor #126

Open
