# Vectorizers

This notebook shows how to create embeddings with the text vectorizers built into RedisVL. RedisVL currently supports:
1. OpenAI
2. HuggingFace
3. Vertex AI


In [1]:
import os

# set redis address
username = "default"
host = "<enter your redis host here>"
port = "<enter your redis port here>"
password = "<enter your redis password here>"


REDIS_URL = f"redis://{username}:{password}@{host}:{port}"
os.environ["REDIS_URL"] = REDIS_URL
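If the password contains characters that are reserved in URLs (such as `@`, `:` or `/`), interpolating it directly into the f-string above can produce a URL that does not parse as intended. A minimal sketch of one way to guard against that, using only the standard library. `build_redis_url` is a hypothetical helper, not part of RedisVL, and it assumes the client decodes percent-encoded credentials when it parses the URL:

```python
from urllib.parse import quote


def build_redis_url(username: str, password: str, host: str, port: int) -> str:
    """Assemble a redis:// URL, percent-encoding the credentials.

    Hypothetical helper for illustration; not part of RedisVL.
    """
    # safe="" makes quote() encode every reserved character, including "/"
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"redis://{user}:{pwd}@{host}:{port}"


# example values only; substitute your own host, port, and credentials
print(build_redis_url("default", "p@ss:word/1", "localhost", 6379))
```

The returned string can be assigned to `REDIS_URL` in place of the plain f-string.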

## Creating Text Embeddings

This example shows how to create embeddings for three simple sentences with several of the text vectorizers in RedisVL.

- "That is a happy dog"
- "That is a happy person"
- "Today is a sunny day"


### OpenAI

The ``OpenAITextVectorizer`` makes it simple to use RedisVL with OpenAI's embedding models. To use it, install the ``openai`` package.

```bash
pip install openai
```


In [2]:
import getpass

# setup the API Key
api_key = os.environ.get("OPENAI_API_KEY") or getpass.getpass("Enter your OpenAI API key: ")

In [3]:
from redisvl.vectorize.text import OpenAITextVectorizer

# create a vectorizer
oai = OpenAITextVectorizer(
    model="text-embedding-ada-002",
    api_config={"api_key": api_key},
)

test = oai.embed("This is a test sentence.")
print("Vector dimensions: ", len(test))
test[:10]

Vector dimensions:  1536


[-0.001046799123287201,
 -0.0031105349771678448,
 0.0024228920228779316,
 -0.004480978474020958,
 -0.010343699716031551,
 0.012758520431816578,
 -0.00535263866186142,
 -0.003002384677529335,
 -0.007115328684449196,
 -0.03378167003393173]

In [4]:
# Create many embeddings at once
sentences = [
    "That is a happy dog",
    "That is a happy person",
    "Today is a sunny day"
]

embeddings = oai.embed_many(sentences)
embeddings[0][:10]

[-0.017399806529283524,
 -2.3427608653037169e-07,
 0.0014656063867732882,
 -0.02562308870255947,
 -0.019890939816832542,
 0.016027139499783516,
 -0.0036763285752385855,
 0.0008253469131886959,
 0.006609130185097456,
 -0.025165533646941185]

In [5]:
# OpenAI also supports asynchronous requests, which can speed up vectorization.
embeddings = await oai.aembed_many(sentences)
print("Number of Embeddings:", len(embeddings))


Number of Embeddings: 3
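Asynchronous requests help because each embedding call spends most of its time waiting on the network, and those waits can overlap. The sketch below illustrates the idea with `asyncio.gather`. It does not show how `aembed_many` is implemented, and `fake_embed` is a hypothetical stand-in that sleeps instead of calling any API:

```python
import asyncio


async def fake_embed(text: str) -> list[float]:
    """Hypothetical stand-in for one network-bound embedding request."""
    await asyncio.sleep(0.1)  # simulate request latency
    return [float(len(text))]  # placeholder value, not a real embedding


async def embed_concurrently(texts: list[str]) -> list[list[float]]:
    # gather() runs the coroutines concurrently and preserves input order
    return await asyncio.gather(*(fake_embed(t) for t in texts))


sentences = [
    "That is a happy dog",
    "That is a happy person",
    "Today is a sunny day",
]

# In a plain script, asyncio.run() starts the event loop.
embeddings = asyncio.run(embed_concurrently(sentences))
print("Number of Embeddings:", len(embeddings))
```

Inside a notebook an event loop is already running, so there you would write `await embed_concurrently(sentences)` instead of calling `asyncio.run`.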


### HuggingFace

[HuggingFace](https://huggingface.co/models) is a popular NLP platform that hosts many pre-trained models you can use off the shelf. RedisVL supports HuggingFace "Sentence Transformers" for creating embeddings from text. To use them, install the ``sentence-transformers`` library.

```bash
pip install sentence-transformers
```

In [6]:
os.environ["TOKENIZERS_PARALLELISM"] = "false"
from redisvl.vectorize.text import HFTextVectorizer


# create a vectorizer
# choose your model from the huggingface website
hf = HFTextVectorizer(model="sentence-transformers/all-mpnet-base-v2")

# embed a sentence
test = hf.embed("This is a test sentence.")
test[:10]


[0.00037813105154782534,
 -0.05080341547727585,
 -0.03514720872044563,
 -0.023251093924045563,
 -0.04415826499462128,
 0.020487893372774124,
 0.0014619074063375592,
 0.03126181662082672,
 0.056051574647426605,
 0.0188154224306345]

In [7]:
# You can also create many embeddings at once
embeddings = hf.embed_many(sentences, as_buffer=True)
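With `as_buffer=True`, each embedding comes back as raw bytes rather than a list of floats, which is the form needed when storing vectors in Redis. As a rough illustration of what such a buffer contains, the sketch below packs floats with the standard library. The 32-bit little-endian layout is an assumption made for this example, and both helper functions are hypothetical rather than RedisVL APIs:

```python
import struct


def to_float32_buffer(vector: list[float]) -> bytes:
    """Pack floats as consecutive little-endian float32 values (assumed layout)."""
    return struct.pack(f"<{len(vector)}f", *vector)


def from_float32_buffer(buffer: bytes) -> list[float]:
    """Inverse of to_float32_buffer; each float32 occupies 4 bytes."""
    count = len(buffer) // 4
    return list(struct.unpack(f"<{count}f", buffer))


vec = [0.5, -0.25, 1.0]  # values chosen to be exactly representable in float32
buf = to_float32_buffer(vec)
print(len(buf))  # 3 values * 4 bytes each
print(from_float32_buffer(buf))
```

Under that assumption, a 768-dimensional embedding would occupy 3072 bytes.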


## Search with Provider Embeddings

Now that we have embeddings, we can index them and search for the sentences most similar to a query. We will index the same three sentences from above.

First, we need to create the schema for our index.

Here is the schema for this example, written for the HuggingFace vectorizer and saved as `schema.yaml`:

```yaml
index:
    name: providers
    prefix: rvl

fields:
    text:
        - name: text
    vector:
        - name: embedding
          dims: 768
          algorithm: flat
          distance_metric: cosine
```
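The `dims: 768` value matches the length of the vectors produced by `all-mpnet-base-v2`. A schema for the OpenAI embeddings shown earlier would need `dims: 1536` instead. A small sanity check before loading data can catch a mismatch early. `check_dims` below is a hypothetical helper written for this example, not a RedisVL function:

```python
def check_dims(vectors: list[list[float]], expected_dims: int) -> None:
    """Raise ValueError if any vector's length differs from expected_dims."""
    for i, vec in enumerate(vectors):
        if len(vec) != expected_dims:
            raise ValueError(
                f"vector {i} has {len(vec)} dimensions, expected {expected_dims}"
            )


# placeholder vectors standing in for real embeddings
check_dims([[0.0] * 768, [0.0] * 768], expected_dims=768)  # passes silently
```

The check applies to embeddings held as lists of floats, so run it before converting them to byte buffers.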

In [8]:
from redisvl.index import SearchIndex

# construct a search index from the schema
index = SearchIndex.from_yaml("./schema.yaml")

# connect to the Redis instance at REDIS_URL
index.connect(REDIS_URL)

# create the index (no data yet)
index.create(overwrite=True)

In [9]:
# use the CLI to see the created index
!rvl index listall

01:29:52 [RedisVL] INFO   Using Redis address from environment variable, REDIS_URL
01:29:53 [RedisVL] INFO   Indices:
01:29:53 [RedisVL] INFO   1. user_index
01:29:53 [RedisVL] INFO   2. providers


In [10]:
# load expects an iterable of dictionaries where
# the vector is stored as a bytes buffer

data = [{"text": t,
         "embedding": v}
        for t, v in zip(sentences, embeddings)]

index.load(data)

In [11]:
from redisvl.query import VectorQuery

# use the HuggingFace vectorizer again to create a query embedding
query_embedding = hf.embed("That is a happy cat")

query = VectorQuery(
    vector=query_embedding,
    vector_field_name="embedding",
    return_fields=["text"],
    num_results=3
)

results = index.search(query.query, query_params=query.params)
for doc in results.docs:
    print(doc.text)
    print(doc.vector_distance)

That is a happy dog
0.160862088203
That is a happy person
0.273597955704
Today is a sunny day
0.744559645653
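The `vector_distance` values follow from the `cosine` distance metric in the schema: the distance is one minus the cosine similarity, so smaller numbers mean closer matches. A minimal pure-Python sketch of that formula, for illustration only (Redis computes the distance server-side):

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # same direction
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # orthogonal
```

Vectors pointing the same way have distance 0 and orthogonal vectors have distance 1, which is consistent with "That is a happy dog" ranking closest to the query and "Today is a sunny day" ranking farthest.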
