ThirdAILabs/EmbeddingViz

Search and neighbourhood visualizations using thirdai
EmbeddingViz

This repository demonstrates using a cold-start recommendation model, trained with ThirdAI's offering, in a product search application. We pick up after training the model and saving product embeddings, following the example in the cold-start recommendation demo notebook.

Source organization

The source is organized as follows.

cold_start/ # Using `thirdai` for inference and datagen (Python)
galaxy/     # Visualization frontend (mostly JS)
layout/     # layout algorithm for graph (C++)

Development

For demonstration purposes, we wire product search over an available catalog into the search bar. The frontend is a single-page app, adapted from anvaka/pm. The backend is built with FastAPI and runs inference using models built with thirdai.

Backend Setup involves installing fastapi and uvicorn; using a virtual environment is recommended.

python3 -m venv env
. env/bin/activate
python3 -m pip install fastapi uvicorn

Following installation, a development server can be spawned by running:

cd cold_start 
MODEL_PATH=/path/to/model CATALOG_PATH=/path/to/catalog uvicorn cold_start_fastapi:app 
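At its core, the backend endpoint ranks catalog products against a query embedding. A minimal sketch of that ranking step (a hypothetical helper using cosine similarity; the actual logic in cold_start_fastapi.py may differ):

```python
import numpy as np

def top_k_products(query_emb, product_embs, labels, k=5):
    """Rank products by cosine similarity to a query embedding.

    query_emb:    1-D array, the embedded query
    product_embs: 2-D array, one row per product
    labels:       product identifiers, aligned with the rows
    """
    q = query_emb / np.linalg.norm(query_emb)
    p = product_embs / np.linalg.norm(product_embs, axis=1, keepdims=True)
    scores = p @ q                       # cosine similarity per product
    order = np.argsort(-scores)[:k]      # best-first
    return [(labels[i], float(scores[i])) for i in order]
```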

After launch, use the server's host and port to update apiUrl in the frontend's config.js.

Frontend Development follows anvaka/pm in using the node ecosystem. Fetch the development dependencies from package.json with a clean install via npm:

cd galaxy
npm ci # clean-install
node dev-server.js # Spawns a development server

Generating data

Data can be generated from an existing embedding save using generate_graph.py.

# Tweak neighbours and threshold to get a good force-layout.
python3 cold_start/generate_graph.py --catalog /path-to-catalog \
    --embed-path /path/to/embed                                 \
    --output-dir /path/to/output-dir                            \
    --neighbours 20 --threshold 20

This will create links.bin and labels.json in /path/to/output-dir/, which are consumed by the visualization. More details on how to organize these files are given below.
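Conceptually, generate_graph.py links each product to its nearest neighbours in embedding space. A simplified sketch of that step, assuming cosine similarity (the real script's metric and its use of --threshold may differ):

```python
import numpy as np

def knn_links(embs, neighbours=20):
    """For each row of `embs`, return the indices of its `neighbours`
    most similar rows by cosine similarity (excluding itself)."""
    p = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    sims = p @ p.T
    np.fill_diagonal(sims, -np.inf)      # never link a node to itself
    order = np.argsort(-sims, axis=1)    # most similar first
    return order[:, :neighbours]
```

Sparser graphs (fewer neighbours, stricter threshold) generally settle into cleaner force layouts.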

Serving data

Data is expected to be served independently from a static source; configure its location in config.js as dataUrl. A visualization requires the following files, taking amazon-kaggle as an example:

amazon-kaggle
├── manifest.json
└── v1
    ├── labels.json    # holds labels (string IDs)
    ├── links.bin      # binary file holding link information
    ├── meta.json      # some metadata
    └── positions.bin  # positions computed by `layout` using `links.bin`

For more information see anvaka/pm#data-format.
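As a quick sanity check, links.bin can be parsed in Python. This sketch assumes the anvaka/pm convention of a flat little-endian int32 stream in which a negative value marks a new source node (1-based, negated) and positive values are its 1-based targets; verify against the data-format docs linked above:

```python
import struct

def read_links(path):
    """Parse links.bin into {source_index: [target_indices]},
    assuming the anvaka/pm int32 stream convention."""
    links, src = {}, None
    with open(path, "rb") as f:
        data = f.read()
    for (v,) in struct.iter_unpack("<i", data):
        if v < 0:
            src = -v - 1          # negative int32 starts a new source node
            links[src] = []
        else:
            links[src].append(v - 1)  # positive int32 is a 1-based target
    return links
```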

The data can be served from the above root folder using a local HTTP server. Using the node ecosystem, one way is:

npm install -g http-server
http-server --cors -p 8081

Use the host and port here to update dataUrl in config.js.
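If the node ecosystem is unavailable, a CORS-enabled static server can also be sketched with Python's standard library (equivalent in spirit to http-server --cors):

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler

class CORSRequestHandler(SimpleHTTPRequestHandler):
    """Static-file handler that adds the CORS header which
    `http-server --cors` would otherwise provide."""
    def end_headers(self):
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()

# To serve the current directory on port 8081:
#   HTTPServer(("0.0.0.0", 8081), CORSRequestHandler).serve_forever()
```

Run it from the data root folder so the visualization can fetch manifest.json and friends.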

Layout generation

The force layout required to render the embedding visualization is created using layout, which is written in C++ and requires OpenMP. It is adapted from anvaka/ngraph.native.

To build the layout executable, which consumes the links.bin generated by the Python script and produces positions.bin, compile from source using cmake:

cd layout;
mkdir build; cd build;
cmake ..
make -j3

For amazon-kaggle with the directory structure described above, run the following. This process can take a while, depending on the size of your graph.

./layout amazon-kaggle/v1/links.bin
mv positions.bin amazon-kaggle/v1/positions.bin
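To sanity-check the output, positions.bin can be read back in Python. This assumes the ngraph.native convention of consecutive little-endian int32 (x, y, z) triples, one per node; adjust if your build writes a different format:

```python
import struct

def read_positions(path):
    """Read positions.bin as consecutive little-endian int32
    (x, y, z) triples, one per node (assumed layout output format)."""
    with open(path, "rb") as f:
        data = f.read()
    return list(struct.iter_unpack("<3i", data))
```

The number of triples should match the number of labels in labels.json.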