Multimodal search tutorial #202

Merged 24 commits on Nov 12, 2020
0231eb4
Started notebook
alexklibisz Nov 8, 2020
a84b149
Showing some promising results
alexklibisz Nov 8, 2020
704ddf8
using docker-compose and watch example is decent
alexklibisz Nov 8, 2020
86c34fe
More progress
alexklibisz Nov 9, 2020
60e0dc5
Some more progress. Probably need to implement score function.
alexklibisz Nov 9, 2020
e8c9368
Start your engines
alexklibisz Nov 9, 2020
85e7fb4
KnnScoreFunction passing tests locally
alexklibisz Nov 9, 2020
06f11a4
Merge branch 'score-function' into multimodal-search-tutorial
alexklibisz Nov 9, 2020
598b5db
Fixed cross-node serialization. clustered tests should work now.
alexklibisz Nov 9, 2020
fde3862
Merge branch 'master' into multimodal-search-tutorial
alexklibisz Nov 10, 2020
f2b9e79
More progress but hit a bug that needs to be fixed
alexklibisz Nov 11, 2020
41c1cbb
Check if docID is exhausted before getting score
alexklibisz Nov 11, 2020
7f856bd
changelog
alexklibisz Nov 11, 2020
e83b222
Merge branch 'check-no-more-docs' into multimodal-search-tutorial
alexklibisz Nov 11, 2020
b949eab
Persisting ES indices across runs. First pass at ElastiknnQuery trait…
alexklibisz Nov 12, 2020
19dfa9b
Hashing query works with new setup
alexklibisz Nov 12, 2020
2680cd6
All but sparse indexed passing. Need to put that back.
alexklibisz Nov 12, 2020
1755ec1
NearestNeighborsQuerySpec passes
alexklibisz Nov 12, 2020
17d6682
Remove tutorial
alexklibisz Nov 12, 2020
24b151a
Update docs and changelog
alexklibisz Nov 12, 2020
d352300
Add back tutorial-notebooks dir
alexklibisz Nov 12, 2020
ad010e3
Merge branch 'master' into multimodal-search-tutorial
alexklibisz Nov 12, 2020
2c883f6
Reformatted. Will probably just link to the notebook.
alexklibisz Nov 12, 2020
0660a61
Good enough
alexklibisz Nov 12, 2020
2,401 changes: 2,401 additions & 0 deletions docs/pages/tutorials/multimodal-search-raw.html

Large diffs are not rendered by default.

11 changes: 11 additions & 0 deletions docs/pages/tutorials/multimodal-search.md
@@ -0,0 +1,11 @@
---
layout: default
title: Multimodal Search
parent: Tutorials
---

<!--
jupyter nbconvert --to html --template basic amazon-products-multi-modal-search.ipynb --stdout > ../../docs/pages/tutorials/multimodal-search-raw.html
-->

{% include_relative multimodal-search-raw.html %}
10 changes: 10 additions & 0 deletions docs/pages/tutorials/tutorials.md
@@ -0,0 +1,10 @@
---
layout: default
title: Tutorials
nav_order: 6
has_children: true
permalink: tutorials
---

# Tutorials
{: .no_toc }
134 changes: 134 additions & 0 deletions examples/tutorial-notebooks/.gitignore
@@ -0,0 +1,134 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# Data
*.json
*.json.gz
*.b
2 changes: 2 additions & 0 deletions examples/tutorial-notebooks/Dockerfile
@@ -0,0 +1,2 @@
FROM docker.elastic.co/elasticsearch/elasticsearch:7.9.2-amd64
RUN elasticsearch-plugin install -b https://github.com/alexklibisz/elastiknn/releases/download/0.1.0-PRE49/elastiknn-0.1.0-PRE49_es7.9.2.zip
1,960 changes: 1,960 additions & 0 deletions examples/tutorial-notebooks/amazon-products-multi-modal-search.ipynb

Large diffs are not rendered by default.

49 changes: 49 additions & 0 deletions examples/tutorial-notebooks/amazonutils.py
@@ -0,0 +1,49 @@
from IPython.display import display, HTML, Image
from itertools import islice
import numpy as np
import json
import gzip
import array

def iter_products(fname):
    # eval parses each line as a Python expression, so only use this on trusted files.
    with gzip.open(fname, 'r') as g:
        for l in g:
            yield eval(l)

def iter_vectors(fname):
    # Each record is 10 bytes of ASIN followed by 4096 floats (array typecode 'f').
    with open(fname, 'rb') as f:
        while True:
            try:
                asin = f.read(10)
                a = array.array('f')
                a.fromfile(f, 4096)
                yield (asin.decode(), a.tolist())
            except EOFError:
                break

def iter_vectors_reduced(fname, dims=1024, samples=10000):
    # Accumulate the negated per-dimension sum over a sample of vectors, so that
    # an ascending argsort returns the dimensions with the largest sums first.
    sumarr = np.zeros(4096) * 1.0
    for (_, v) in islice(iter_vectors(fname), samples):
        sumarr -= np.array(v)
    ii = np.argsort(sumarr)[:dims]

    # Return a generator function that yields vectors restricted to those dimensions.
    def f(fname):
        for (asin, vec) in iter_vectors(fname):
            yield (asin, np.array(vec)[ii].tolist())

    return f

def display_hits(res):
    print(f"Found {res['hits']['total']['value']} hits in {res['took']} ms. Showing top {len(res['hits']['hits'])}.")
    print("")
    for hit in res['hits']['hits']:
        s = hit['_source']
        print(f"Title   {s.get('title', None)}"[:80] + "...")
        if 'description' in s:
            print(f"Desc    {s['description']}"[:80] + "...")
        if 'price' in s:
            print(f"Price   {s['price']}")
        print(f"ID      {s.get('asin', None)}")
        print(f"Score   {hit.get('_score', None)}")
        display(Image(s.get("imUrl"), width=128))
        print("")
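
Reviewer note: the binary layout that `iter_vectors` reads (a 10-byte ASIN followed by a fixed number of float values per record) can be checked with a small round trip. The following is a minimal sketch and not part of this PR: `write_records` and `read_records` are hypothetical helpers, and the records use 4 dimensions instead of 4096 to keep the example small.

```python
import array
import os
import tempfile


def write_records(fname, records):
    """Write (asin, vector) pairs as a 10-byte ASCII id followed by float values."""
    with open(fname, "wb") as f:
        for asin, vec in records:
            assert len(asin) == 10, "the reader expects exactly 10 id bytes"
            f.write(asin.encode("ascii"))
            array.array("f", vec).tofile(f)


def read_records(fname, dims):
    """Read records back, mirroring the loop in iter_vectors (dims is 4096 there)."""
    with open(fname, "rb") as f:
        while True:
            asin = f.read(10)
            a = array.array("f")
            try:
                a.fromfile(f, dims)
            except EOFError:
                # fromfile raises EOFError when fewer than dims items remain.
                break
            yield asin.decode(), a.tolist()


# Values are chosen to be exactly representable as 32-bit floats.
records = [
    ("B000000001", [0.0, 1.5, 2.0, 0.25]),
    ("B000000002", [4.0, 0.0, 0.5, 8.0]),
]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vectors.b")
    write_records(path, records)
    loaded = list(read_records(path, dims=4))
```

The loop ends on `EOFError` rather than on an empty `read`, which matches how `iter_vectors` terminates: a truncated trailing record is dropped silently instead of raising.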
31 changes: 31 additions & 0 deletions examples/tutorial-notebooks/docker-compose.yml
@@ -0,0 +1,31 @@
version: "3.8"

services:
  elasticsearch_master:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: elasticsearch_master
    environment:
      - node.name=elasticsearch_master
      - cluster.name=docker-cluster
      - cluster.initial_master_nodes=elasticsearch_master
      - node.master=true
      - bootstrap.memory_lock=true
      - http.cors.enabled=true
      - http.cors.allow-origin=*
      - ES_JAVA_OPTS=-Xms4G -Xmx4G
    ports:
      - "9200:9200"
    volumes:
      - data01:/usr/share/elasticsearch/data
    ulimits:
      nofile:
        soft: 65536
        hard: 65536
      memlock:
        soft: -1
        hard: -1
volumes:
  data01:
    driver: local
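
Reviewer note: the compose file publishes Elasticsearch on port 9200, and the node can take a while to start before the notebook can connect. The following is a minimal sketch and not part of this PR. It assumes the node is reachable at `http://localhost:9200` and polls the standard `_cluster/health` endpoint. The names `http_get_json`, `wait_for_cluster`, and the `fetch` parameter are hypothetical.

```python
import json
import time
import urllib.request


def http_get_json(url):
    """Fetch a URL and parse the response body as JSON."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_cluster(base_url="http://localhost:9200", attempts=30, delay=2.0,
                     fetch=http_get_json):
    """Poll the cluster health endpoint until the status is yellow or green."""
    for _ in range(attempts):
        try:
            health = fetch(f"{base_url}/_cluster/health")
            if health.get("status") in ("yellow", "green"):
                return health
        except OSError:
            # Connection errors (including urllib's URLError) are expected
            # while the node is still starting.
            pass
        time.sleep(delay)
    raise TimeoutError(f"Elasticsearch at {base_url} did not become ready")
```

A single-node cluster commonly reports `yellow` once indices with replicas exist, so the sketch accepts both `yellow` and `green`. The `fetch` parameter is injectable only so the polling logic can be exercised without a running container.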
4 changes: 4 additions & 0 deletions examples/tutorial-notebooks/requirements.txt
@@ -0,0 +1,4 @@
matplotlib
jupyter
tqdm
elasticsearch