
# Embeddings Task Examples

### Installation and Setup

In this example Jupyter notebook, we'll use caikit-nlp to run the available embeddings tasks.

### Installing `caikit` and `caikit-nlp`

Next, we'll install the caikit and caikit-nlp libraries directly from GitHub. The project is still in beta and breaking changes can happen, so note the versions reported in the output below; they are the ones this notebook was run with.

In [2]:
!pip install git+https://github.com/caikit/caikit | tail -n 1
!pip install git+https://github.com/caikit/caikit-nlp | tail -n 1

  Running command git clone --filter=blob:none --quiet https://github.com/caikit/caikit /private/var/folders/5x/cztshy892cbf92p2fdgqlxhc0000gn/T/pip-req-build-1dc7bgl5
Successfully installed caikit-0.26.24.dev2+g2d02e00
  Running command git clone --filter=blob:none --quiet https://github.com/caikit/caikit-nlp /private/var/folders/5x/cztshy892cbf92p2fdgqlxhc0000gn/T/pip-req-build-4zjb5wzg
Successfully installed caikit-nlp-0.4.11 grpcio-1.63.0 grpcio-health-checking-1.62.2 grpcio-reflection-1.62.2


### Import the `EmbeddingModule`
Then we import the caikit module that contains the embeddings tasks we want to run.

In [3]:
from caikit_nlp.modules.text_embedding import EmbeddingModule

<function register_backend_type at 0x105d622a0> is still in the BETA phase and subject to change!


### Loading the model

When running the module's code directly, without the caikit runtime, we need to load the model ourselves by passing the path to its directory, which contains the bootstrapped `config.yaml` and the `artifacts` folder.

> Make sure you get the correct path from the model downloaded at the [Models](./README.md#models) section of the documentation.

In [4]:
embeddings_module = EmbeddingModule.load('caikit-nlp/examples/embeddings/models/all-minilm-l6-v2/')
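
Because the load call expects a directory holding `config.yaml` and an `artifacts` folder, a quick pre-flight check can give a clearer message than a failed load. The following is only a minimal sketch: the helper name `check_model_dir` is hypothetical and not part of caikit-nlp, and it checks nothing beyond the layout described above.

```python
from pathlib import Path


def check_model_dir(model_dir):
    """Return the expected entries that are missing from model_dir.

    Hypothetical helper, not part of caikit-nlp. It only verifies the
    layout described above: a `config.yaml` file and an `artifacts` folder.
    """
    root = Path(model_dir)
    missing = []
    if not (root / "config.yaml").is_file():
        missing.append("config.yaml")
    if not (root / "artifacts").is_dir():
        missing.append("artifacts/")
    return missing


model_dir = "caikit-nlp/examples/embeddings/models/all-minilm-l6-v2/"
missing = check_model_dir(model_dir)
if missing:
    print(f"{model_dir} is missing: {', '.join(missing)}")
else:
    print(f"{model_dir} has the expected layout")
```

An empty list means the directory has the expected layout; it does not guarantee that the files inside are valid.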

In [5]:
seq = "Generate a summary of the context that answers the question. Explain the answer in multiple steps if possible. Answer style should match the context. Ideal Answer Length 2-3 sentences. To start a huddle: In Slack, open a channel or DM. Huddles work in Slack Connect, including Slack Connect DMs. On the bottom-left of your Slack sidebar, Open mini window icon. For more details, refer to Available Features. You can also start a huddle in a channel or DM. In the upper-right corner of your message window, click the headphones toggle." 

#### Code to retrieve embeddings

In [9]:
embeddings_response = embeddings_module.run_embeddings(texts=[seq], truncate_input_tokens=0)
embeddings_response

EmbeddingResults(results={
  "vectors": [
    {
      "data": {
        "values": [
          -0.021235033869743347,
          0.005407060030847788,
          -0.05031250789761543,
          0.02103376016020775,
          -0.01952761597931385,
          0.04022081568837166,
          0.05267741531133652,
          0.07997128367424011,
          -0.05027708038687706,
          0.0029696396086364985,
          0.02008694037795067,
          -0.022737698629498482,
          -0.03342238441109657,
          -0.02924107201397419,
          0.07941676676273346,
          0.08066023886203766,
          0.012848478741943836,
          -0.08107789605855942,
          0.013536344282329082,
          0.004463522229343653,
          0.07578443735837936,
          -0.024243105202913284,
          0.009474135003983974,
          0.001784364227205515,
          0.011258895508944988,
          -0.010532456450164318,
          0.004065200220793486,
          0.03179578483104706,
          -0.00573662575

#### Sentence Similarity task

In [6]:
ss_response = embeddings_module.run_sentence_similarity(
    source_sentence="This is an apple", 
    sentences=["This is another apple", "This is a banana"],
    truncate_input_tokens=0)
ss_response

{
  "result": {
    "scores": [
      0.8578777313232422,
      0.5489557981491089
    ]
  },
  "producer_id": {
    "name": "EmbeddingModule",
    "version": "0.0.1"
  },
  "input_token_count": 18
}
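
Similarity scores like the ones above are commonly cosine similarities between the embedding of the source sentence and the embedding of each candidate sentence. Assuming that is what the module computes (an assumption, not something this notebook confirms), the calculation can be sketched in plain Python. The vectors below are made up for illustration and are not model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors, made up for illustration (not model output).
source = [1.0, 0.0, 1.0]
candidates = [[1.0, 0.1, 0.9], [0.0, 1.0, 0.2]]

# One score per candidate, in the same order as the candidates.
scores = [cosine_similarity(source, c) for c in candidates]
print(scores)
```

The first toy candidate points in nearly the same direction as the source, so it scores higher than the second, just as "This is another apple" scores higher than "This is a banana" above.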

### Reranker 

In [7]:
rr_results = embeddings_module.run_rerank_query(
    documents=[
        {
            "text": "first sentence",
            "additionalProp1": 0,
            "additionalProp2": 0,
            "additionalProp3": 0,
        },
        {
            "text": "second sentence",
            "additionalProp1": 0,
            "additionalProp2": 0,
            "additionalProp3": 0,
        },
    ],
    query="second",
)
rr_results

{
  "result": {
    "query": "second",
    "scores": [
      {
        "document": {
          "text": "second sentence",
          "additionalProp1": 0,
          "additionalProp2": 0,
          "additionalProp3": 0
        },
        "index": 1,
        "score": 0.5184812545776367,
        "text": "second sentence"
      },
      {
        "document": {
          "text": "first sentence",
          "additionalProp1": 0,
          "additionalProp2": 0,
          "additionalProp3": 0
        },
        "index": 0,
        "score": 0.4005824625492096,
        "text": "first sentence"
      }
    ]
  },
  "producer_id": {
    "name": "EmbeddingModule",
    "version": "0.0.1"
  },
  "input_token_count": 11
}
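
In the response above, the documents come back ordered by descending score, and each entry keeps the document's original position in `index`. A minimal sketch of that ordering step follows. It assumes the score is a cosine similarity between query and document embeddings, which this notebook does not confirm. The `rerank` function and the toy vectors are hypothetical and only illustrate the idea.

```python
import math


def _cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rerank(query_vec, documents, doc_vecs):
    """Score each document against the query and sort best-first.

    Hypothetical helper, not the caikit-nlp implementation. Each entry
    keeps the document's original position in `index`.
    """
    scored = [
        {
            "document": doc,
            "index": i,
            "score": _cosine(query_vec, vec),
            "text": doc["text"],
        }
        for i, (doc, vec) in enumerate(zip(documents, doc_vecs))
    ]
    return sorted(scored, key=lambda entry: entry["score"], reverse=True)


# Toy 2-dimensional vectors, made up for illustration (not model output).
documents = [{"text": "first sentence"}, {"text": "second sentence"}]
doc_vecs = [[1.0, 0.2], [0.3, 1.0]]
query_vec = [0.1, 1.0]

ranked = rerank(query_vec, documents, doc_vecs)
print([(entry["index"], entry["text"]) for entry in ranked])
```

Keeping `index` in each entry lets a caller map a ranked result back to its position in the original `documents` list.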

In [12]:
# More than one query
rr_results_multi = embeddings_module.run_rerank_queries(
    documents=[
        {
            "text": "first sentence is this is a banana",
            "additionalProp1": 0,
            "additionalProp2": 0,
            "additionalProp3": 0,
        },
        {
            "text": "second sentence is this is an apple",
            "additionalProp1": 0,
            "additionalProp2": 0,
            "additionalProp3": 0,
        },
    ],
    queries=[
        "banana",
        "is an apple",
    ],
)
rr_results_multi

{
  "results": [
    {
      "query": "banana",
      "scores": [
        {
          "document": {
            "text": "first sentence is this is a banana",
            "additionalProp1": 0,
            "additionalProp2": 0,
            "additionalProp3": 0
          },
          "index": 0,
          "score": 0.723056972026825,
          "text": "first sentence is this is a banana"
        },
        {
          "document": {
            "text": "second sentence is this is an apple",
            "additionalProp1": 0,
            "additionalProp2": 0,
            "additionalProp3": 0
          },
          "index": 1,
          "score": 0.28278833627700806,
          "text": "second sentence is this is an apple"
        }
      ]
    },
    {
      "query": "is an apple",
      "scores": [
        {
          "document": {
            "text": "second sentence is this is an apple",
            "additionalProp1": 0,
            "additionalProp2": 0,
            "additionalProp3": 0
    