
## PostgreSQL specific lookups

### [Trigram similarity]

New in Django 1.10.

The `trigram_similar` lookup allows you to perform trigram lookups, measuring the number of trigrams (three consecutive characters) shared, using a dedicated PostgreSQL extension. A trigram lookup is given an expression and returns results that have a similarity measurement greater than the current similarity threshold.
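As a rough illustration of the measurement, here is a plain-Python sketch of how `pg_trgm` scores two strings: each word is lowercased, padded with two leading spaces and one trailing space, and split into three-character windows; the score is the number of shared trigrams divided by the number of distinct trigrams in either string. The helper names `trigrams` and `similarity` are chosen for this sketch, and it only handles ASCII letters and digits, so treat it as an approximation of the extension rather than a replacement for it.

```python
import re


def trigrams(text):
    """Return the set of trigrams for `text`, roughly as pg_trgm extracts them."""
    grams = set()
    # Lowercase the input and treat anything that is not a letter or digit
    # as a word separator (ASCII only in this sketch).
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "  # two spaces before the word, one after
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams, from 0.0 to 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


score = similarity("Middlesborough", "Middlesbrough")
print(round(score, 3))  # 0.706 (12 shared trigrams out of 17 distinct)
```

The default value of `pg_trgm.similarity_threshold` is 0.3, so a pair scoring about 0.71 would be returned by the lookup.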

To use it, add `'django.contrib.postgres'` in your `INSTALLED_APPS` and activate the `pg_trgm` extension on PostgreSQL. You can install the extension using the `TrigramExtension` migration operation.
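A minimal sketch of such a migration is shown below. The file would live in one of your own apps' `migrations` directories; the empty `dependencies` list is an assumption and should point at that app's previous migration if there is one. Running it typically requires a database user that is allowed to create extensions.

```python
from django.contrib.postgres.operations import TrigramExtension
from django.db import migrations


class Migration(migrations.Migration):
    # Assumption: no earlier migrations need to run first.
    dependencies = []

    operations = [
        # Enables the pg_trgm extension in the target database.
        TrigramExtension(),
    ]
```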

The `trigram_similar` lookup can be used on `CharField` and `TextField`:

    >>> City.objects.filter(name__trigram_similar="Middlesborough")
    ['<City: Middlesbrough>']

[Trigram similarity]: https://docs.djangoproject.com/en/1.11/ref/contrib/postgres/lookups/#trigram-similarity


In [1]:
ImageFile.objects.filter(_imagehash__trigram_similar='949675aa79790994')

<ImageFileQuerySet [<ImageFile: 36797861831-c19e166530-o-9jwwS4b.jpg>, <ImageFile: 19-MAG-ungdomspolitikk-OA-18-SPLI2Zr.jpg>]>

In [2]:
from pathlib import Path
import imagehash
import PIL.Image  # a bare "import PIL" does not load the Image submodule
import requests
from IPython.display import HTML
from io import BytesIO
import base64

In [3]:
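def img2html(image_file):
    """Return an <img> tag with the image's `small` version inlined as a base64 PNG."""
    img = PIL.Image.open(image_file.small)
    # Re-encode as PNG in memory; the buffer is closed when the block exits.
    with BytesIO() as blob:
        img.save(blob, 'png')
        data = base64.b64encode(blob.getvalue()).decode('ascii')
    return '<img src="data:image/png;base64,{data}" />'.format(data=data)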

In [4]:
def show_duplicates(images):
    """Render a group of images side by side, using the first one as the reference."""
    master = images[0]
    html = f'<h2>{master}</h2>'
    for image in images:
        data = {
            'src':      image.original,
            'hash':     image.imagehash,
            'md5':      image.md5,
            'filesize': image.size,
            'size':     (image.full_width, image.full_height),
            # Difference between this image's hash and the reference image's hash.
            'diff':     abs(image.imagehash - master.imagehash),
        }
        title = '\n'.join(f'{k:<10}: {v}' for k, v in data.items())
        html += '<div style="width: calc(50% - 10px); display: inline-block; padding: 5px">'
        html += img2html(image)
        html += f'<pre style="position: relative; background: rgba(255, 255, 255, 0.8)">{title}</pre>'
        html += '</div>'
    html += '<hr/>'
    return html
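
The `diff` entry in `show_duplicates` depends on what `image.imagehash` holds. Assuming it is an `imagehash.ImageHash` object, subtracting two hashes gives their Hamming distance, that is, the number of bits that differ. For hashes stored as hex strings, like the one queried in the first cell, the same measure can be sketched in plain Python. The function name `hamming_distance` and the second hash value are made up for illustration.

```python
def hamming_distance(hex_a, hex_b):
    """Count the bits that differ between two equal-length hex strings."""
    if len(hex_a) != len(hex_b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 wherever the two hashes disagree.
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# The second value is a hypothetical hash that differs in the last hex digit.
print(hamming_distance("949675aa79790994", "949675aa79790995"))  # 1
```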

In [None]:
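def show_near_duplicates_trigram(field='imagehash'):
    """Group images whose `field` values are similar and render each group once."""
    qs = ImageFile.objects.order_by('pk')
    processed = set()  # primary keys already shown in an earlier group
    output = ''
    for im in qs:
        # Keep only matches at or after this image that no earlier group has claimed.
        items = im.similar(field) & qs.exclude(pk__lt=im.pk).exclude(pk__in=processed)
        if items:
            processed.update(item.pk for item in items)
            output += show_duplicates(qs.filter(pk=im.pk) | items)
    return HTML(output)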

show_near_duplicates_trigram('md5')