This repository demonstrates a product search application built on a cold-start recommendation model trained with ThirdAI's offering. We pick up here after training the model and saving product embeddings, following the example in the cold-start recommendation demo notebook.
The source is organized as follows.
```
cold_start/   # Using `thirdai` for inference and datagen (Python)
galaxy/       # Visualization frontend (mostly JS)
layout/       # Layout algorithm for the graph (C++)
```
For demonstration purposes, we wire product search over an available catalog into the search bar. The frontend is a single-page app adapted from anvaka/pm. The backend is built with FastAPI and runs inference using models built with `thirdai`.
Backend setup involves installing `fastapi` and `uvicorn`; using a virtual environment is recommended:

```bash
python3 -m venv env
. env/bin/activate
python3 -m pip install fastapi uvicorn
```
Following installation, a development server can be spawned by running:

```bash
cd cold_start
MODEL_PATH=/path/to/model CATALOG_PATH=/path/to/catalog uvicorn cold_start_fastapi:app
```
After launch, use the server's host and port to update `apiUrl` in the frontend's config.js.
Frontend development follows anvaka/pm in using the Node ecosystem. For the development requirements, do a clean install via npm, which fetches dependencies from package.json:

```bash
cd galaxy
npm ci              # clean install
node dev-server.js  # spawns a development server
```
Data can be generated from a saved embedding using generate_graph.py:

```bash
# Tweak neighbours and threshold to get a good force-layout.
python3 cold_start/generate_graph.py --catalog /path/to/catalog \
    --embed-path /path/to/embed \
    --output-dir /path/to/output-dir \
    --neighbours 20 --threshold 20
```
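The core of such a step is building a nearest-neighbour graph over the product embeddings. The sketch below shows one plausible version: for each embedding, keep up to `neighbours` most-similar items whose cosine similarity clears `threshold`. Function names and the exact similarity/threshold semantics are assumptions; see the repository's generate_graph.py for the real logic.

```python
import numpy as np


def knn_graph(embeddings: np.ndarray, neighbours: int = 20,
              threshold: float = 0.5) -> dict[int, list[int]]:
    # Cosine similarity between all pairs of rows.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    np.fill_diagonal(sims, -np.inf)  # exclude self-links

    links: dict[int, list[int]] = {}
    for i, row in enumerate(sims):
        # Highest-similarity candidates first, capped at `neighbours`.
        top = np.argsort(row)[::-1][:neighbours]
        links[i] = [int(j) for j in top if row[j] >= threshold]
    return links


rng = np.random.default_rng(0)
emb = rng.normal(size=(6, 8)).astype(np.float32)
graph = knn_graph(emb, neighbours=3, threshold=-1.0)
```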
This will create `links.bin` and `labels.json` in `/path/to/output-dir/`, which are consumed by the visualization. More details on how to organize these files are given below.
Data is expected to be served independently from a static source, which can be configured in config.js as `dataUrl`. A visualization requires the following files, taking `amazon-kaggle` as an example:
```
amazon-kaggle
├── manifest.json
└── v1
    ├── labels.json    # holds labels (string IDs)
    ├── links.bin      # binary file holding link information
    ├── meta.json      # some metadata
    └── positions.bin  # positions computed by `layout` using `links.bin`
```
For more information see anvaka/pm#data-format.
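For reference, here is a hedged sketch of the `links.bin` encoding as we understand it from anvaka/pm's data format: a flat little-endian int32 stream in which a negative value `-(i + 1)` starts the adjacency list of node `i`, and the positive values `j + 1` that follow are that node's outgoing links. Treat this encoding as an assumption and verify it against anvaka/pm#data-format before relying on it.

```python
import struct


def encode_links(links: dict[int, list[int]]) -> bytes:
    # Emit -(src + 1) as a section marker, then dst + 1 for each link.
    out: list[int] = []
    for src in sorted(links):
        if not links[src]:
            continue
        out.append(-(src + 1))
        out.extend(dst + 1 for dst in links[src])
    return struct.pack(f"<{len(out)}i", *out)


def decode_links(blob: bytes) -> dict[int, list[int]]:
    values = struct.unpack(f"<{len(blob) // 4}i", blob)
    links: dict[int, list[int]] = {}
    src = None
    for v in values:
        if v < 0:
            src = -v - 1       # new source node
            links[src] = []
        else:
            links[src].append(v - 1)  # link target
    return links
```

A round-trip through these two functions should recover the original adjacency lists, which makes the assumed layout easy to sanity-check against a real `links.bin`.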
The data can be served from the above root folder using a local HTTP server. Using the Node ecosystem, one way of doing this is as follows:

```bash
npm install -g http-server
http-server --cors -p 8081
```
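If you would rather not install `http-server`, a stdlib-only Python equivalent (serving a directory with a permissive CORS header) can be sketched as follows. This is not part of the repository.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class CORSRequestHandler(SimpleHTTPRequestHandler):
    """Static file handler that adds a wildcard CORS header to every response."""

    def end_headers(self):
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()


def serve(directory: str = ".", port: int = 8081):
    # Serve `directory` on all interfaces, analogous to `http-server --cors`.
    handler = partial(CORSRequestHandler, directory=directory)
    with ThreadingHTTPServer(("", port), handler) as httpd:
        httpd.serve_forever()


if __name__ == "__main__":
    serve()
```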
Use the host and port here to update config.js with the appropriate `dataUrl`.
The force-layout required to render the embedding visualization is created using `layout`, which is written in C++ and requires OpenMP. This is adapted from anvaka/ngraph.native.
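To illustrate what a force-layout computes, here is one step of a naive force-directed scheme in NumPy: every pair of nodes repels, and every linked pair attracts like a spring. The constants are arbitrary illustration values; the real `layout` tool uses a far faster, OpenMP-parallel C++ implementation.

```python
import numpy as np


def layout_step(pos: np.ndarray, edges: list[tuple[int, int]],
                repulsion: float = 0.1, spring: float = 0.05) -> np.ndarray:
    # Pairwise repulsion: each node pushes every other node away.
    delta = pos[:, None, :] - pos[None, :, :]
    dist2 = (delta ** 2).sum(-1) + 1e-9          # avoid division by zero
    force = (repulsion * delta / dist2[..., None]).sum(axis=1)

    # Spring attraction along each edge pulls linked nodes together.
    for i, j in edges:
        pull = spring * (pos[j] - pos[i])
        force[i] += pull
        force[j] -= pull
    return pos + force


rng = np.random.default_rng(1)
points = rng.normal(size=(5, 2))
points = layout_step(points, [(0, 1), (1, 2), (3, 4)])
```

Iterating such a step until positions settle is what yields the `positions.bin` consumed by the frontend.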
To get the `layout` executable, which consumes the `links.bin` generated by the Python script and produces `positions.bin`, compile from source using `cmake`:

```bash
cd layout
mkdir build && cd build
cmake ..
make -j3
```
For `amazon-kaggle` with the directory structure described above, run the following. This process can take a while, depending on the size of your graph.

```bash
./layout amazon-kaggle/v1/links.bin
mv positions.bin amazon-kaggle/v1/positions.bin
```