ThirdAILabs/EmbeddingViz

Search and neighbourhood visualizations using thirdai
EmbeddingViz

This repository demonstrates using a cold-start recommendation model, trained with ThirdAI's offering, in a product search application. We pick up after training the model and saving product embeddings, following the example in the cold-start recommendation demo notebook.

Source organization

The source is organized as follows.

cold_start/ # Using `thirdai` for inference and datagen (Python)
galaxy/     # Visualization frontend (mostly JS)
layout/     # layout algorithm for graph (C++)

Development

For demonstration purposes, we wire product search over an available catalog into the search bar. The frontend is a single-page app, adapted from anvaka/pm. The backend is built with FastAPI and runs inference using models built with thirdai.

Backend Setup involves installing fastapi and uvicorn; using a virtual environment is recommended.

python3 -m venv env
. env/bin/activate
python3 -m pip install fastapi uvicorn

Following installation, a development server can be spawned by running:

cd cold_start 
MODEL_PATH=/path/to/model CATALOG_PATH=/path/to/catalog uvicorn cold_start_fastapi:app 
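At its core, the backend endpoint ranks catalog products against a query embedding. A minimal sketch of that ranking step (a hypothetical helper using cosine similarity; the actual logic in cold_start_fastapi.py may differ):

```python
import numpy as np

def top_k_products(query_emb, product_embs, labels, k=5):
    """Rank products by cosine similarity to a query embedding.

    query_emb:    1-D array, the embedded query
    product_embs: 2-D array, one row per product
    labels:       product identifiers, aligned with the rows
    """
    q = query_emb / np.linalg.norm(query_emb)
    p = product_embs / np.linalg.norm(product_embs, axis=1, keepdims=True)
    scores = p @ q                       # cosine similarity per product
    order = np.argsort(-scores)[:k]      # best-first
    return [(labels[i], float(scores[i])) for i in order]
```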

After launch, use the server's host and port to update apiUrl in the frontend's config.js.

Frontend Development follows anvaka/pm in using the node ecosystem. Fetch the development dependencies from package.json with a clean install via npm:

cd galaxy
npm ci # clean-install
node dev-server.js # Spawns a development server

Generating data

Data can be generated from an existing embedding save using generate_graph.py.

# Tweak neighbours and threshold to get a good force-layout.
python3 cold_start/generate_graph.py --catalog /path-to-catalog \
    --embed-path /path/to/embed                                 \
    --output-dir /path/to/output-dir                            \
    --neighbours 20 --threshold 20

This will create links.bin and labels.json in /path/to/output-dir/, which are consumed by the visualization. More details on how to organize these files are given below.
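Conceptually, generate_graph.py links each product to its nearest neighbours in embedding space. A simplified sketch of that step, assuming cosine similarity (the real script's metric and its use of --threshold may differ):

```python
import numpy as np

def knn_links(embs, neighbours=20):
    """For each row of `embs`, return the indices of its `neighbours`
    most similar rows by cosine similarity (excluding itself)."""
    p = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    sims = p @ p.T
    np.fill_diagonal(sims, -np.inf)      # never link a node to itself
    order = np.argsort(-sims, axis=1)    # most similar first
    return order[:, :neighbours]
```

Sparser graphs (fewer neighbours, stricter threshold) generally settle into cleaner force layouts.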

Serving data

Data is expected to be served independently from a static source; configure its location in config.js as dataUrl. A visualization requires the following files, taking amazon-kaggle as an example:

amazon-kaggle
├── manifest.json
└── v1
    ├── labels.json    # holds labels (string IDs)
    ├── links.bin      # binary file holding link information
    ├── meta.json      # some metadata
    └── positions.bin  # positions computed by `layout` using `links.bin`

For more information see anvaka/pm#data-format.
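As a quick sanity check, links.bin can be parsed in Python. This sketch assumes the anvaka/pm convention of a flat little-endian int32 stream in which a negative value marks a new source node (1-based, negated) and positive values are its 1-based targets; verify against the data-format docs linked above:

```python
import struct

def read_links(path):
    """Parse links.bin into {source_index: [target_indices]},
    assuming the anvaka/pm int32 stream convention."""
    links, src = {}, None
    with open(path, "rb") as f:
        data = f.read()
    for (v,) in struct.iter_unpack("<i", data):
        if v < 0:
            src = -v - 1          # negative int32 starts a new source node
            links[src] = []
        else:
            links[src].append(v - 1)  # positive int32 is a 1-based target
    return links
```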

The data can be served from the above root folder using a local HTTP server. Using the node ecosystem, one way is:

npm install -g http-server
http-server --cors -p 8081

Use the host and port here to update dataUrl in config.js.
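If the node ecosystem is unavailable, a CORS-enabled static server can also be sketched with Python's standard library (equivalent in spirit to http-server --cors):

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler

class CORSRequestHandler(SimpleHTTPRequestHandler):
    """Static-file handler that adds the CORS header which
    `http-server --cors` would otherwise provide."""
    def end_headers(self):
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()

# To serve the current directory on port 8081:
#   HTTPServer(("0.0.0.0", 8081), CORSRequestHandler).serve_forever()
```

Run it from the data root folder so the visualization can fetch manifest.json and friends.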

Layout generation

The force layout required to render the embedding visualization is created using layout, which is written in C++ and requires OpenMP. It is adapted from anvaka/ngraph.native.

To build the layout executable, which consumes the links.bin generated by the Python script and produces positions.bin, compile from source using cmake:

cd layout;
mkdir build; cd build;
cmake ..
make -j3

For amazon-kaggle with the directory structure described above, run the following. This process can take a while, depending on the size of your graph.

./layout amazon-kaggle/v1/links.bin
mv positions.bin amazon-kaggle/v1/positions.bin
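To sanity-check the output, positions.bin can be read back in Python. This assumes the ngraph.native convention of consecutive little-endian int32 (x, y, z) triples, one per node; adjust if your build writes a different format:

```python
import struct

def read_positions(path):
    """Read positions.bin as consecutive little-endian int32
    (x, y, z) triples, one per node (assumed layout output format)."""
    with open(path, "rb") as f:
        data = f.read()
    return list(struct.iter_unpack("<3i", data))
```

The number of triples should match the number of labels in labels.json.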