New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated docs search to use PostgreSQL full-text search #797

Merged
merged 1 commit into from Nov 22, 2017

Conversation

Projects
None yet
4 participants
@pauloxnet
Contributor

pauloxnet commented Nov 12, 2017

All Full-Text Search features based on Elasticsearch replaced with PostgreSQL FTS
https://groups.google.com/d/topic/django-developers/kxH56zaAeZY/discussion

@apollo13

This comment has been minimized.

Member

apollo13 commented Nov 12, 2017

You can remove the "search by" advertisement. We have it there currently because a company is sponsoring it, we do not advertise the open source technologies we use. Will also make the footer shorter again.

path__startswith='ref/class-based-views/flattened-index'
)
for document in documents:
document.json['breadcrumbs'] = list(

This comment has been minimized.

@apollo13

apollo13 Nov 12, 2017

Member

Is there any reason for a JSON field here instead of adding an array field for breadcrumbs and slugfield for slug etc…

This comment has been minimized.

@pauloxnet

pauloxnet Nov 13, 2017

Contributor

As in the ES "document" we need 'breadcrumbs' and 'slug' only as "metadata" for search indexing or representation and not for query filter or other query operations, we have 'path' for this.

Besides with 'breadcrumbs' and 'slug' in the Json field we don't need to modify a lot Document class and templates.

This comment has been minimized.

@apollo13

apollo13 Nov 13, 2017

Member

Well, not having to modify stuff is not really a good argument, I'd at least rename json to metadata then. Cause json alone is not really descriptive at all.

This comment has been minimized.

@pauloxnet

pauloxnet Nov 13, 2017

Contributor

Yes, metadata will be a better name, and I'm going to rename it.

I still think having breadcrumbs and slug in the JSONField is a good solution, also because we don't need them for filtering operation and because is more descriptive than an ArrayField and flexible.

]
operations = [
TrigramExtension(),

This comment has been minimized.

@apollo13

apollo13 Nov 12, 2017

Member

This can be folded into the previous migration, no?

This comment has been minimized.

@pauloxnet

pauloxnet Nov 13, 2017

Contributor

Probably yes. I'm going to "merge" the two migrations.

But I saw there's a permission error in travis ci Must be superuser to create this extension.

How do you think we can fix this ?

DOCUMENT_SEARCH_VECTOR = (
SearchVector('title', weight='A') +
SearchVector(Coalesce(KeyTextTransform('slug', 'json'), Value('')), weight='A') +
SearchVector(Coalesce(KeyTextTransform('toc', 'json'), Value('')), weight='B') +

This comment has been minimized.

@apollo13

apollo13 Nov 12, 2017

Member

Where is toc coming from?

This comment has been minimized.

@pauloxnet

pauloxnet Nov 13, 2017

Contributor

toc is generated from sphinx, like body or title, and contains html code with all paragraph titles.

Using toc in the search vector is useful because paragraph titles are more important than body text and we can assign more relevancy, but minus than title or slug.

pauloxnet added a commit to pauloxnet/djangoproject.com that referenced this pull request Nov 13, 2017

@pauloxnet

This comment has been minimized.

Contributor

pauloxnet commented Nov 13, 2017

I've renamed the json field to metadata and I've merged migrations as requested.

@timgraham

This comment has been minimized.

Member

timgraham commented Nov 14, 2017

Can you look into how we can use a superuser on Travis to fix the build error?

django.db.utils.ProgrammingError: permission denied to create extension "pg_trgm"
HINT:  Must be superuser to create this extension.
@@ -0,0 +1 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 2326.301 444.75"><path fill="#f3f8f5" fill-opacity=".408" d="M411.558 272.53c-53.28 10.99-56.942-7.05-56.942-7.05 56.255-83.473 79.77-189.43 59.477-215.362C358.73-20.622 262.89 12.834 261.29 13.702l-.514.093c-10.526-2.186-22.307-3.487-35.547-3.703-24.11-.396-42.4 6.32-56.28 16.843 0 0-170.97-70.432-163.02 88.58 1.69 33.83 48.48 255.964 104.3 188.87 20.4-24.535 40.11-45.28 40.11-45.28 9.79 6.504 21.51 9.822 33.8 8.63l.95-.81c-.3 3.046-.16 6.025.38 9.553-14.38 16.064-10.16 18.885-38.9 24.802-29.09 5.994-12 16.665-.85 19.454 13.52 3.382 44.81 8.173 65.95-21.42l-.85 3.376c5.63 4.512 9.59 29.35 8.92 51.868-.67 22.517-1.11 37.976 3.33 50.05 4.44 12.076 8.86 39.245 46.65 31.148 31.57-6.766 47.93-24.3 50.21-53.548 1.61-20.793 5.27-17.72 5.5-36.31l2.93-8.8c3.38-28.187.53-37.28 19.99-33.05l4.72.415c14.31.65 33.05-2.303 44.05-7.415 23.68-10.99 37.72-29.34 14.37-24.518z"/><path fill="none" stroke="#004b38" stroke-width="11.122" d="M357.04 117.154c.915 16.94-3.648 28.48-4.223 46.513-.85 26.213 12.498 56.216-7.616 86.256m-8-117.823c.3 2.16-3.95 7.923-9.5 8.693-5.54.773-10.29-3.73-10.59-5.888-.29-2.16 3.96-4.54 9.51-5.312 5.55-.773 10.3.353 10.6 2.508zm-168.86 4.403c-.3 2.16 3.96 7.923 9.51 8.693 5.55.773 10.29-3.73 10.59-5.888.3-2.16-3.95-4.54-9.51-5.312-5.54-.774-10.29.353-10.59 2.507zm16.1 140.205c-1.45-9.444 3.11-20.684 7.98-33.833 7.33-19.728 24.25-39.46 10.72-102.042-10.08-46.636-77.74-9.705-77.78-3.382-.04 6.33 3.06 32.06-1.13 62.02-5.46 39.1 24.88 72.17 59.83 68.79m1.34 8.74c-14.38 16.07-10.15 18.89-38.9 24.81-29.08 6-11.99 16.67-.84 19.46 13.53 3.39 44.81 8.18 65.95-21.42 6.44-9.01-.04-23.39-8.88-27.06-4.27-1.77-9.99-3.99-17.33 4.23zm169.13-11s3.67 18.05 56.95 7.05c23.35-4.82 9.3 13.53-14.38 24.53-19.43 9.02-62.99 11.33-63.71-1.13-1.83-32.15 22.93-22.38 21.14-30.44-1.61-7.25-12.69-14.37-20.01-32.13-6.39-15.5-87.69-134.36 22.55-116.7 4.04-.83-28.75-104.86-131.92-106.55C122 8.94 125.38 137.48 125.38 137.48m136-123.3c-5.92 1.86 95.17-36.95 152.62 36.46 20.29 25.937-3.227 131.89-59.48 215.37M168.96 26.938S-2.126-42.99 5.827 116.02c1.692 33.83 48.49 255.97 104.303 188.874 20.396-24.538 38.842-43.782 38.842-43.782m65.278 28.744c-1.467 52.44.37 105.26 5.5 118.09 5.136 12.83 16.125 37.8 53.914 29.7 31.57-6.77 43.058-19.87 48.042-48.78 3.672-21.27 10.75-80.35 11.657-92.45" stroke-linecap="round" stroke-linejoin="round"/><g fill="#f3faf5" fill-opacity=".408" font-family="Strait" letter-spacing="0" word-spacing="0"><path d="M742.615 195.84c0 20.096-5.962 35.837-17.885 47.225-11.924 11.387-28.27 17.08-49.035 17.08H592.9l.213 95.056-22.507.36V132.09l111.332-.388c19.157 0 34.03 5.694 44.613 17.08 10.72 11.39 16.08 27.134 16.08 47.23m-22.71.81c0-31.35-15-47.027-45.01-47.027l-81.99-.352v93.047h77.77c32.83 0 49.24-15.272 49.24-45.818m185.66 85.383c0 23.847-6.23 42.472-18.69 55.87-13.93 15.004-34.23 22.506-60.89 22.506-12.73 0-23.85-1.72-33.36-5.22-9.38-3.61-17.62-9.24-24.72-16.88-12.86-14.33-19.29-33.09-19.29-56.27 0-23.31 6.23-41.8 18.69-55.46 13.8-15.27 33.36-22.91 58.68-22.91 26.8 0 47.03 7.44 60.69 22.31 12.6 13.4 18.89 32.09 18.89 56.07m-19.49 0c0-41.66-20.03-62.49-60.08-62.49-18.09 0-32.29 5.56-42.6 16.68-4.95 5.36-8.77 11.93-11.45 19.7-2.55 7.64-3.82 16.35-3.82 26.13 0 19.16 4.96 34.43 14.87 45.82 10.05 11.26 24.38 16.88 43.01 16.88 20.63 0 35.97-5.83 46.02-17.48 4.82-5.22 8.37-11.59 10.65-19.09 2.28-7.5 3.42-16.21 3.42-26.12m176.71 32.76c0 30.15-22.58 45.22-67.72 45.22-13.8 0-25.46-1.06-34.97-3.21-9.38-2.15-17.01-5.63-22.91-10.45-4.55-3.62-8.17-8.37-10.85-14.27-2.68-6.03-4.42-13.47-5.22-22.31l20.9-.35c0 11.79 3.69 20.5 11.06 26.13 7.91 6.7 21.91 10.05 42 10.05 31.62 0 47.43-9.44 47.43-28.34 0-10.98-5.29-18.55-15.87-22.7-2.14-.95-14.94-3.07-38.38-6.43-23.31-3.49-37.85-6.56-43.61-9.24-12.46-5.89-18.69-16.54-18.69-31.95 0-15 5.56-26.19 16.68-33.56 5.5-3.07 12.33-5.42 20.5-7.03 8.31-1.73 17.89-2.61 28.74-2.61 43.41 0 65.11 16.28 65.11 48.83h-19.89c0-22.91-15.27-34.36-45.82-34.36-15.54 0-27.13 2.36-34.77 7.03-7.63 4.56-11.45 11.66-11.45 21.3 0 8.98 5.16 15.41 15.48 19.3 6.03 2.15 19.9 4.556 41.6 7.236 19.43 2.15 32.76 5.09 39.99 8.84 13.8 6.43 20.7 17.35 20.7 32.755m91.16 41.14c-3.21.67-6.09 1.13-8.64 1.41-2.41.35-4.15.39-5.22.39-13.664 0-23.51-2.364-29.54-7.035-6.03-4.825-9.043-13.2-9.043-25.12V224h-24.92v-15.1h24.72l.215-48.432h19.09V209.1l33.36-.353v15.272l-33.36-.36v98.66c0 7.37 1.61 12.46 4.83 15.27 3.35 2.82 9.24 4.22 17.68 4.22 1.21 0 2.546 0 4.02-.35 1.474-.36 3.684-.74 6.63-1.41l.215 15.27m156.23-4.56c0 44.48-24.98 66.71-74.96 66.71-26.12 0-44.68-4.29-55.66-12.86-9.51-7.37-15.34-19.22-17.486-35.57l19.69-.35c2.28 11.92 6.43 20.23 12.46 24.92 8.04 6.29 21.503 9.44 40.393 9.44 20.1 0 34.5-4.56 43.21-13.67 8.84-9.11 13.265-22.04 13.265-38.79l-.21-26.13c-12.06 18.89-32.49 28.33-61.29 28.33-23.98 0-42.54-6.57-55.666-19.7-6.03-6.43-10.65-14.21-13.87-23.32-3.08-9.25-4.62-19.76-4.62-31.55 0-23.58 6.23-41.6 18.69-54.06 13.13-13.67 32.15-20.5 57.07-20.5 17.016 0 30.88 3.55 41.6 10.65 3.617 2.68 9.646 9.17 18.087 19.49l.217-25.32h19.09v142.1m-19.09-73.16c0-17.55-4.89-31.55-14.67-42-9.65-10.588-23.85-15.88-42.6-15.88-18.623 0-33.09 5.23-43.41 15.68-10.18 10.32-15.27 24.65-15.27 43.01 0 39.52 20.03 59.29 60.09 59.29 8.706 0 16.41-1.34 23.11-4.02 6.697-2.68 12.66-6.63 17.884-11.86 9.92-10.31 14.87-25.05 14.87-44.21m130.26-53.7c-5.76-1.06-11.12-1.62-16.07-1.62-31.48 0-47.22 20.57-47.22 61.7l.217 71.346h-19.09l-.21-146.1 19.09-.354v26.324h1.01c3.21-10.313 8.3-18.083 15.27-23.31 7.63-5.09 19.02-7.64 34.16-7.64h7.64c2.14 0 3.88 0 5.22.35v19.3m157.33 62.31h-136.5c0 16.75 4.62 30.41 13.863 41 10.72 12.33 26.06 18.49 46.02 18.49 28.27 0 45.954-12.66 53.055-37.98h20.7c-3.08 14.47-10.38 26.66-21.9 36.58-6.568 5.22-14.27 9.24-23.11 12.05-8.71 2.96-18.22 4.42-28.54 4.42-25.858 0-45.887-7.77-60.09-23.31-13.13-13.93-19.69-32.76-19.69-56.47 0-23.58 6.83-42.2 20.5-55.866 13.662-13.8 32.89-20.7 57.672-20.7 13.26 0 24.85 1.94 34.763 5.824 9.91 3.75 18.22 9.58 24.92 17.48 12.19 14.2 18.29 33.7 18.29 58.48m-20.1-13.25c0-14.07-5.43-26.66-16.28-37.78-10.72-11.12-24.59-16.68-41.6-16.68-8.71 0-16.55 1.34-23.51 4.02-6.97 2.54-13.2 6.49-18.69 11.85-8.44 9.11-13.87 21.97-16.28 38.59h116.35" style="line-height:125%;-inkscape-font-specification:Strait" font-size="40"/><path d="M1828.62 296.015c0 52.463-35.324 78.693-105.97 78.693-42.325 0-73.223-9.01-92.695-27.034-15.93-14.807-23.898-37.335-23.898-67.59h28.485c0 48.76 29.37 73.143 88.107 73.143 28.16 0 48.51-5.07 61.07-15.21 10.62-8.69 15.93-22.45 15.93-41.276 0-18.66-7.24-31.78-21.73-39.34-8.21-4.02-28.25-9.09-60.11-15.2-40.07-7.24-66.06-14.56-77.97-21.97-17.22-10.78-25.83-28.96-25.83-54.55 0-24.14 7.32-42 21.96-53.59 15.77-12.87 42.8-19.31 81.1-19.31 69.2 0 103.8 27.44 103.8 82.32h-28.73c0-41.03-25.67-61.55-77.01-61.55-25.75 0-44.58 4.11-56.49 12.31-11.27 8.85-16.9 21.81-16.9 38.87 0 17.86 7.24 30.5 21.72 37.9 7.72 4.18 30.41 9.98 68.07 17.38 35.4 6.43 59.3 13.84 71.69 22.21 16.9 10.46 25.35 28.41 25.35 53.83m282.08 77.97l-17.38 18.83-42-35c-8.85 5.63-19.64 9.9-32.35 12.79-12.72 2.89-27.6 4.35-44.66 4.35-43.61 0-78.05-12.95-103.32-38.86-25.11-26.07-37.66-60.26-37.66-102.59 0-43.29 12.47-77.64 37.41-103.07 24.94-25.43 59.3-38.14 103.07-38.14 22.53 0 42.56 3.39 60.1 10.14 17.54 6.76 32.5 17.06 44.9 30.9 23.17 25.1 34.76 58.5 34.76 100.18 0 23.34-3.54 43.93-10.63 61.8-7.08 17.7-18.03 32.75-32.83 45.14l40.55 33.55m-24.38-140.49c0-18.34-2.498-34.68-7.48-49-4.99-14.48-12.718-27.12-23.178-37.9-19.796-22.05-46.99-33.07-81.59-33.07-34.118 0-61.556 10.54-82.316 31.62-20.6 21.08-30.9 50.53-30.9 88.35 0 36.37 10.38 65.5 31.14 87.39 20.92 21.73 48.28 32.59 82.08 32.59 16.57 0 31.54-2.57 44.9-7.72 13.35-5.15 25.34-12.79 35.96-22.93 20.92-21.88 31.38-51.66 31.38-89.31M2326 367.75l-194.32.357V99.64l26.796-.49v246.22h167.77l-.25 22.45" style="line-height:125%;-inkscape-font-specification:Strait" font-size="28"/></g></svg>

This comment has been minimized.

@timgraham

timgraham Nov 15, 2017

Member

Remove these images.

This comment has been minimized.

@pauloxnet

pauloxnet Nov 15, 2017

Contributor

I'm going to remove both png and svg images.

@timgraham

Is it possible to add some tests now that elasticsearch isn't used?

@@ -157,6 +163,7 @@ def sync_to_db(self, decoded_documents):
release=self,
path=_clean_document_path(document['current_page_name']),
title=unescape_entities(strip_tags(document['title'])),
metadata=document

This comment has been minimized.

@timgraham

timgraham Nov 15, 2017

Member

Include a trailing comma in cases like this so that if more items are added later, this line doesn't have to be modified again.

This comment has been minimized.

@pauloxnet

pauloxnet Nov 15, 2017

Contributor

ok

from django.core.cache import cache
from django.db import models, transaction
from django.db.models import F

This comment has been minimized.

@timgraham

timgraham Nov 15, 2017

Member

I'd use models.F and skip this import.

This comment has been minimized.

@pauloxnet

pauloxnet Nov 15, 2017

Contributor

ok

objects = DocumentManager()
class Meta:
indexes = [

This comment has been minimized.

@timgraham

timgraham Nov 15, 2017

Member

A single line looks better.

This comment has been minimized.

@pauloxnet

pauloxnet Nov 15, 2017

Contributor

ok

# Alias the main 'docs' index to the new one.
DocumentDocType.alias_to_main_index(new_index_name)
Document.objects.update(search=None)
# don't index the module pages since source code is hard to

This comment has been minimized.

@timgraham

timgraham Nov 15, 2017

Member

Please use sentences with capitals and periods.

This comment has been minimized.

@pauloxnet

pauloxnet Nov 15, 2017

Contributor

I only copied this comment from:
https://github.com/django/djangoproject.com/blob/master/docs/search.py#L184

I tried to fix it, but English is not my first language ;-)

Document.objects.update(search=None)
# don't index the module pages since source code is hard to
# combine with full text search
# not the crazy big flattened index of the CBVs

This comment has been minimized.

@timgraham

timgraham Nov 15, 2017

Member

This could be inlined above the path__startswith='ref/class-based-views/flattened-index' line.

This comment has been minimized.

@pauloxnet

pauloxnet Nov 15, 2017

Contributor

I moved the comment above related code

@pauloxnet

This comment has been minimized.

Contributor

pauloxnet commented Nov 15, 2017

I'm going to look into how we can use a superuser on Travis to fix the build error.

pauloxnet added a commit to pauloxnet/djangoproject.com that referenced this pull request Nov 15, 2017

@pauloxnet

This comment has been minimized.

Contributor

pauloxnet commented Nov 15, 2017

I will add some tests too asap

@pauloxnet

This comment has been minimized.

Contributor

pauloxnet commented Nov 16, 2017

I fixed the travis error and all test passed.

I'm going to add some tests.

@pauloxnet

This comment has been minimized.

Contributor

pauloxnet commented Nov 16, 2017

@timgraham I've added a first test for document full-text search and all test passes

@apollo13 can you review my commits related to your requested changes ?

@timgraham

Could you please squash your commits when updating? We prefer rebasing to merging master into a branch (you'll notice that some changes from master are appearing as changes in this PR).

.source(includes=['title']))
results = Document.objects.filter(
release__lang=release.lang

This comment has been minimized.

@timgraham

timgraham Nov 20, 2017

Member

Add trailing comma as mentioned before.

# combine with full text search.
Document.objects.exclude(path__startswith='_modules')
# Not the crazy big flattened index of the CBVs.
.exclude(path__startswith='ref/class-based-views/flattened-index'))

This comment has been minimized.

@timgraham

timgraham Nov 20, 2017

Member

Put the closing ) on the next line.

.exclude(path__startswith='ref/class-based-views/flattened-index'))
for document in documents:
document.metadata['breadcrumbs'] = list(
Document.objects.breadcrumbs(document).values('title', 'path'))

This comment has been minimized.

@timgraham

timgraham Nov 20, 2017

Member

Put the closing ) on the next line.

class DocumentManagerTestCase(TestCase):
def setUp(self):

This comment has been minimized.

@timgraham

timgraham Nov 20, 2017

Member

Use setUpTestData().

class DocumentManagerTestCase(TestCase):
def setUp(self):
# We need to create some Document to search.

This comment has been minimized.

@timgraham

timgraham Nov 20, 2017

Member

Comment can be omitted.

},
'path': 'releases/1.9.4',
'release': dev,
'title': 'Django 1.9.4 release notes'}

This comment has been minimized.

@timgraham

timgraham Nov 20, 2017

Member

Put the closing } on the next line.

@@ -321,3 +323,77 @@ def test_sitemap_404(self):
response.context['exception'],
"No sitemap available for section: 'xx'"
)
class DocumentManagerTestCase(TestCase):

This comment has been minimized.

@timgraham

timgraham Nov 20, 2017

Member

TestCase -> Tests
(TestCase implies something that's meant to be subclassed that doesn't contain any tests itself.)

self.log('Failed indexing item %s' % id_)
# Alias the main 'docs' index to the new one.
DocumentDocType.alias_to_main_index(new_index_name)
Document.objects.update(search=None)

This comment has been minimized.

@timgraham

timgraham Nov 20, 2017

Member

How long does this management command take to run? Are search results unavailable while it's updating?

This comment has been minimized.

@pauloxnet

pauloxnet Nov 20, 2017

Contributor

Locally it takes ~15 seconds mostly because bredcrumbs() method execution, but I think we can optimize it, the search vector update is very fast instead.
During updated index execution the search works but with no results.

This comment has been minimized.

@timgraham

timgraham Nov 22, 2017

Member

I don't think we can have seconds of downtime every time the documentation updates. Seems like there should be a way to avoid that.

This comment has been minimized.

@pauloxnet

pauloxnet Nov 22, 2017

Contributor

I'm trying to think of a way to minimize idle times during the indexing phase.

I realize that with elastic you could have multiple indexes at the same time, but for curiosity I would be interested in knowing the indexing times with it.

This comment has been minimized.

@pauloxnet

pauloxnet Nov 22, 2017

Contributor

I've sent a commit just now with a first try to avoid downtime during index updating.

I've moved all document related update into update_docs command.

I've used transaction in update_index where we now only refresh the indexes (now it takes less than 3 seconds)

This comment has been minimized.

@pauloxnet

pauloxnet Nov 22, 2017

Contributor

Yes, search field contain the index for each Document and we need to delete this index from all documents before update it using DOCUMENT_SEARCH_VECTOR so we are sure no old/dead/removed/deprecated/etc... document will shown in the search results.

This comment has been minimized.

@timgraham

timgraham Nov 22, 2017

Member

Aren't obsolete documents removed when the update_docs command calls sync_to_db()?

@transaction.atomic
def sync_to_db(self, decoded_documents):
"""
Sync the given list of documents (decoded fjson files from sphinx) to
the database. Deletes all the release's documents first then
reinserts them as needed.
"""
self.documents.all().delete()
# Read excluded paths from robots.docs.txt.
robots_path = settings.BASE_DIR.joinpath('djangoproject', 'static', 'robots.docs.txt')
with open(str(robots_path), 'r') as fh:
excluded_paths = [
line.strip().split('/')[-1] for line in fh
if line.startswith("Disallow: /%s/%s/" % (self.lang, self.release_id))
]
for document in decoded_documents:
if ('body' not in document or 'title' not in document or
document['current_page_name'].split('/')[0] in excluded_paths):
# We don't care about indexing documents with no body or title, or partially translated
continue
Document.objects.create(
release=self,
path=_clean_document_path(document['current_page_name']),
title=unescape_entities(strip_tags(document['title'])),
)

This comment has been minimized.

@pauloxnet

pauloxnet Nov 22, 2017

Contributor

But you can use update_index without update_docs.

Anyway, the deletion of all the indexes takes a few milliseconds within a transaction and I can't see any side effects, but I think it gives us the certainty that we always have clean indexes.

Do you think erasing search field before update it will be a problem ?

This comment has been minimized.

@timgraham

timgraham Nov 22, 2017

Member

I don't see the point of the extra work. It seems to me the only way it could have an effect is if one of the excluded documents (exclude(path__startswith='_modules')..exclude(path__startswith='ref/class-based-views/flattened-index')) were indexed at some point, or if a new excluded document were added. I guess it's fine to keep for that latter reason, although it would be less expensive to construct a queryset of the excluded documents and instead set only those to None.

This comment has been minimized.

@pauloxnet

pauloxnet Nov 22, 2017

Contributor

I thinks the reason you've written is enough to leave the index deletion before the update.

If you want I can construct a queryset excluded_documents and set only those to None.

'release__release'
).filter(
release_id=release.id,
search=search_query

This comment has been minimized.

@timgraham

timgraham Nov 20, 2017

Member

Add trailing comma

@@ -200,6 +206,26 @@ def breadcrumbs(self, document):
else:
return self.none()
def search(self, text, release):
"""
Full Text-Search module for searching given text into search fileds.

This comment has been minimized.

@timgraham

timgraham Nov 20, 2017

Member

I don't think "Full Text-search" should be title-cased. I'm not sure if "module" is the correct word?

@pauloxnet

This comment has been minimized.

Contributor

pauloxnet commented Nov 20, 2017

@timgraham I committed all requested changes, but what are the right steps to squash all my commits and to permit you to merge ? Do I have rebase it first ?

@timgraham

This comment has been minimized.

Member

timgraham commented Nov 20, 2017

I think the easiest thing to do is merge in master and then run git diff master > ~/patch.diff and then apply that patch in a new branch.

changes addressed

@timgraham timgraham force-pushed the pauloxnet:pg_fts branch from 52de3a0 to 1be9a48 Nov 22, 2017

@timgraham timgraham changed the title from PostgreSQL Full-Text Search version to Updated docs search to use PostgreSQL full-text search Nov 22, 2017

@timgraham timgraham merged commit 1be9a48 into django:master Nov 22, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@timgraham

This comment has been minimized.

Member

timgraham commented on .travis.yml in 1be9a48 Nov 22, 2017

(This patch is by @pauloxnet -- I forgot to use the correct author when amending it.)

@timgraham

This comment has been minimized.

Member

timgraham commented Nov 22, 2017

This is deployed now. I should have tested it a bit more, I guess. For example, searching for get_fieldsets doesn't return any results. Do you have time to work on that?

@pauloxnet

This comment has been minimized.

Contributor

pauloxnet commented Nov 22, 2017

@timgraham I thought that there would be a test phase before deploy on the official site.

Anyway on my system searching for get_fiedlsets works: 10 results for get_fieldsets in version 1.11

I think something went wrong with your update because I can't see breadcrumbs at all and only 6 result for field search 6 results for field in version 1.11 locally this is the result 198 results for field in version 1.11

The code is the same, maybe there's some logs to inspect ?

@timgraham

This comment has been minimized.

Member

timgraham commented Nov 22, 2017

Oh, now that I think about it, I guess a complete rebuild of the documentation is required, not just update_index.

@pauloxnet

This comment has been minimized.

Contributor

pauloxnet commented Nov 22, 2017

If you ping me after update_docs completion I'll try to check the documentation search.

@timgraham

This comment has been minimized.

Member

timgraham commented Nov 22, 2017

The build has completed and the search results are looking better.

@charettes

This comment has been minimized.

Member

charettes commented Nov 22, 2017

I'm not sure what's broken but it looks like it's still having issues dealing why simple search requests

e.g. the following doesn't return any result.

https://docs.djangoproject.com/en/1.11/search/?q=override_settings

@charettes

This comment has been minimized.

Member

charettes commented Nov 22, 2017

@timgraham

This comment has been minimized.

Member

timgraham commented Nov 22, 2017

It might be that the 1.11 docs didn't update -- the same queries seem to work with 2.0. I'll try to retrigger that build.

@timgraham

This comment has been minimized.

Member

timgraham commented Nov 22, 2017

I found this:

>>> Document.objects.filter(release__release='1.11').filter(search__isnull=True).count()
1972
>>> Document.objects.filter(release__release='1.11').exclude(search__isnull=True).count()
387
>>> Document.objects.filter(release__release='2.0').filter(search__isnull=True).count()
149
>>> Document.objects.filter(release__release='2.0').exclude(search__isnull=True).count()
391
>>> Document.objects.filter(release__release='1.10').exclude(search__isnull=True).count()
1012
>>> Document.objects.filter(release__release='1.10').filter(search__isnull=True).count()
1266

And then ran this:

>>> Document.objects.filter(release__release='1.11', search__isnull=True).exclude(path__startswith='_modules').exclude(path__startswith='ref/class-based-views/flattened-index').update(search=DOCUMENT_SEARCH_VECTOR)
681

Now results for 1.11 are appearing. Is it possible that trying to update all documents at once (as update_index does) fails silently?

@pauloxnet

This comment has been minimized.

Contributor

pauloxnet commented Nov 23, 2017

I don't understand when these numbers were detected.
However, the update_index is decorated with a @transaction.atomic maybe this could cause the behavior you have hypothesized.
Anyway, now the search function seems to work well for me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment