# Demo 2

In this demo, we start with a dataset, train a new DeezyMatch model and use it for the task of candidate ranking. The steps are as follows:

1. Train a new DeezyMatch model using **a toy dataset with 1K rows**. We chose such a small dataset so that training finishes within a few seconds. To try this demo on a more **realistic** dataset, the repo also provides:

```python
dataset_path="../../dataset/BL_IAMS_geonames.tsv"
```

2. Fine-tune the model trained in step 1.
3. Model inference.
4. Generate query and candidate vector representations and assemble them so that they can be used in the next steps.
5. Candidate ranking using a set of "static" queries.
6. Candidate ranking on the fly.

## Train a new model

In [None]:
from DeezyMatch import train as dm_train

# train a new model
dm_train(input_file_path="./inputs/input_dfm_demo2.yaml", 
         dataset_path="../../dataset/dummy_trainset.txt",
         model_name="demo2_model")
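
Before training on your own data, it can help to check how the pair file is laid out. The sketch below is not part of DeezyMatch: it assumes a tab-separated file with two strings and a TRUE/FALSE label per row (please verify this against your copy of the dataset and the separator set in the input YAML). `read_pairs` and the sample rows are hypothetical names and data introduced for illustration only.

```python
import csv
import io

# Hypothetical sample rows standing in for a real training file.
# Assumed layout: string1 <TAB> string2 <TAB> TRUE/FALSE
SAMPLE = "lincoln\tlincon\tTRUE\nwarwick\tdurham\tFALSE\n"


def read_pairs(handle, sep="\t"):
    """Yield (string1, string2, is_match) tuples from a delimited pair file."""
    for row in csv.reader(handle, delimiter=sep, quoting=csv.QUOTE_NONE):
        if len(row) < 3:
            continue  # skip blank or malformed rows
        yield row[0], row[1], row[2].strip().lower() == "true"


pairs = list(read_pairs(io.StringIO(SAMPLE)))

# Count positive and negative pairs to spot a badly unbalanced dataset.
label_counts = {True: 0, False: 0}
for _, _, is_match in pairs:
    label_counts[is_match] += 1

print(pairs[0], label_counts)
```

To run the same check on a real file, pass `open(path, encoding="utf-8", newline="")` instead of the `io.StringIO` sample.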

## Fine-tune a pretrained model

In [None]:
from DeezyMatch import finetune as dm_finetune

# fine-tune a pretrained model stored at pretrained_model_path and pretrained_vocab_path 
dm_finetune(input_file_path="./inputs/input_dfm_demo2.yaml", 
            dataset_path="../../dataset/dummy_trainset.txt", 
            model_name="ft_demo2_model",
            pretrained_model_path="./models/demo2_model/demo2_model.model", 
            pretrained_vocab_path="./models/demo2_model/demo2_model.vocab")

## Model inference

In [None]:
from DeezyMatch import inference as dm_inference

# model inference using a model stored at pretrained_model_path and pretrained_vocab_path 
dm_inference(input_file_path="./inputs/input_dfm_demo2.yaml",
             dataset_path="../../dataset/dummy_trainset.txt", 
             pretrained_model_path="./models/ft_demo2_model/ft_demo2_model.model", 
             pretrained_vocab_path="./models/ft_demo2_model/ft_demo2_model.vocab")

## Generate query vectors

In [None]:
from DeezyMatch import inference as dm_inference

# generate vectors for queries (specified in dataset_path) 
# using a model stored at pretrained_model_path and pretrained_vocab_path 
dm_inference(input_file_path="./inputs/input_dfm_demo2.yaml",
             dataset_path="../../query_scenarios/ukcounties_queries.txt", 
             pretrained_model_path="./models/ft_demo2_model/ft_demo2_model.model", 
             pretrained_vocab_path="./models/ft_demo2_model/ft_demo2_model.vocab",
             inference_mode="vect",
             scenario="queries/demo2")

## Generate candidate vectors

In [None]:
from DeezyMatch import inference as dm_inference

# generate vectors for candidates (specified in dataset_path) 
# using a model stored at pretrained_model_path and pretrained_vocab_path 
dm_inference(input_file_path="./inputs/input_dfm_demo2.yaml",
             dataset_path="../../candidate_scenarios/ukcounties_candidates.txt", 
             pretrained_model_path="./models/ft_demo2_model/ft_demo2_model.model", 
             pretrained_vocab_path="./models/ft_demo2_model/ft_demo2_model.vocab",
             inference_mode="vect",
             scenario="candidates/demo2")

## Assembling query vector representations

In [None]:
from DeezyMatch import combine_vecs

# combine vectors stored in queries/demo2 and save them in combined/queries_demo2
combine_vecs(rnn_passes=['fwd', 'bwd'], 
             input_scenario='queries/demo2', 
             output_scenario='combined/queries_demo2', 
             print_every=10)

## Assembling candidate vector representations

In [None]:
from DeezyMatch import combine_vecs

# combine vectors stored in candidates/demo2 and save them in combined/candidates_demo2
combine_vecs(rnn_passes=['fwd', 'bwd'], 
             input_scenario='candidates/demo2', 
             output_scenario='combined/candidates_demo2', 
             print_every=10)

## Candidate Ranker

In [None]:
from DeezyMatch import candidate_ranker

# Select candidates based on L2-norm distance (aka faiss distance):
# find candidates from candidate_scenario 
# for queries specified in query_scenario
candidates_pd = \
    candidate_ranker(query_scenario="./combined/queries_demo2",
                     candidate_scenario="./combined/candidates_demo2", 
                     ranking_metric="faiss", 
                     selection_threshold=100., 
                     num_candidates=5, 
                     search_size=5, 
                     output_path="ranker_results/candidates_deezymatch_demo2", 
                     pretrained_model_path="./models/ft_demo2_model/ft_demo2_model.model", 
                     pretrained_vocab_path="./models/ft_demo2_model/ft_demo2_model.vocab")

In [None]:
candidates_pd
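
To build intuition for ranking by L2 distance, here is a minimal brute-force sketch in plain Python. It is not how `candidate_ranker` is implemented: it swaps the faiss index for an exhaustive search, uses made-up two-dimensional vectors, and assumes that the threshold is inclusive and applied to the plain (not squared) L2 distance. `l2_distance`, `rank_by_l2` and `toy_candidates` are hypothetical names; the parameters `num_candidates` and `selection_threshold` only mirror the arguments used above.

```python
import math


def l2_distance(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank_by_l2(query_vec, candidates, num_candidates=5, selection_threshold=100.0):
    """Return up to num_candidates (name, distance) pairs, nearest first.

    candidates maps a candidate name to its vector. Candidates farther
    away than selection_threshold are dropped (inclusive threshold, which
    is an assumption of this sketch).
    """
    scored = [(name, l2_distance(query_vec, vec)) for name, vec in candidates.items()]
    scored = [item for item in scored if item[1] <= selection_threshold]
    scored.sort(key=lambda item: item[1])
    return scored[:num_candidates]


# Toy vectors; real DeezyMatch vectors come from the trained model.
toy_candidates = {
    "lincolnshire": [1.0, 0.0],
    "warwickshire": [0.0, 1.0],
    "kent": [5.0, 5.0],
}

ranked = rank_by_l2([0.9, 0.1], toy_candidates,
                    num_candidates=2, selection_threshold=2.0)
print(ranked)
```

In this toy run, "kent" is excluded by the threshold, and the two remaining candidates are returned in order of increasing distance.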

## Candidate ranking on-the-fly

In [None]:
from DeezyMatch import candidate_ranker

# Ranking on-the-fly
# find candidates from candidate_scenario 
# for queries specified by the `query` argument
candidates_pd = \
    candidate_ranker(query=["lincoln", "warwick"],
                     candidate_scenario="./combined/candidates_demo2", 
                     ranking_metric="faiss", 
                     selection_threshold=100., 
                     num_candidates=5, 
                     search_size=5, 
                     output_path="ranker_results/candidates_deezymatch_demo2", 
                     pretrained_model_path="./models/ft_demo2_model/ft_demo2_model.model", 
                     pretrained_vocab_path="./models/ft_demo2_model/ft_demo2_model.vocab")

In [None]:
candidates_pd