Octoai embeddings (run-llama#12857)
ptorru authored and chrisalexiuk-nvidia committed Apr 25, 2024
1 parent d215ab4 commit b4de677
Showing 13 changed files with 525 additions and 0 deletions.
138 changes: 138 additions & 0 deletions docs/docs/examples/embeddings/octoai.ipynb
@@ -0,0 +1,138 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/docs/examples/embeddings/octoai.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# OctoAI Embeddings\n",
"\n",
"This guide shows you how to use [OctoAI's Embeddings](https://octo.ai/docs/text-gen-solution/getting-started) through LlamaIndex."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, let's install LlamaIndex and OctoAI's dependencies."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install llama-index-embeddings-octoai"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install llama-index"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Include your OctoAI API key below. You can get yours at [OctoAI](https://octo.ai). \n",
"\n",
"[Here](https://octo.ai/docs/getting-started/how-to-create-an-octoai-access-token) are some instructions in case you need more guidance."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"OCTOAI_API_KEY = \"\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can then generate embeddings with OctoAI."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from llama_index.embeddings.octoai import OctoAIEmbedding\n",
"\n",
"embed_model = OctoAIEmbedding(api_key=OCTOAI_API_KEY)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Basic embedding example\n",
"embeddings = embed_model.get_text_embedding(\"How do I sail to the moon?\")\n",
"print(len(embeddings), embeddings[:10])\n",
"assert len(embeddings) == 1024"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using Batched Embeddings"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"texts = [\n",
" \"How do I sail to the moon?\",\n",
" \"What is the best way to cook a steak?\",\n",
" \"How do I apply for a job?\",\n",
"]\n",
"\n",
"embeddings = embed_model.get_text_embedding_batch(texts)\n",
"print(len(embeddings))\n",
"assert len(embeddings) == 3\n",
"assert len(embeddings[0]) == 1024"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
@@ -0,0 +1,155 @@
poetry.lock

llama_index/_static
.DS_Store
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
bin/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
etc/
include/
lib/
lib64/
parts/
sdist/
share/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
.ruff_cache

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints
notebooks/

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
pyvenv.cfg

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# Jetbrains
.idea
modules/
*.swp

# VsCode
.vscode

# pipenv
Pipfile
Pipfile.lock

# pyright
pyrightconfig.json
@@ -0,0 +1,3 @@
poetry_requirements(
name="poetry",
)
@@ -0,0 +1,17 @@
GIT_ROOT ?= $(shell git rev-parse --show-toplevel)

help: ## Show all Makefile targets.
@grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | awk 'BEGIN {FS = ":.*?## "}; {printf "\033[33m%-30s\033[0m %s\n", $$1, $$2}'

format: ## Run code autoformatters (black).
pre-commit install
git ls-files | xargs pre-commit run black --files

lint: ## Run linters: pre-commit (black, ruff, codespell) and mypy
pre-commit install && git ls-files | xargs pre-commit run --show-diff-on-failure --files

test: ## Run tests via pytest.
pytest tests

watch-docs: ## Build and watch documentation.
sphinx-autobuild docs/ docs/_build/html --open-browser --watch $(GIT_ROOT)/llama_index/
@@ -0,0 +1,36 @@
# LlamaIndex Embeddings Integration: Octoai

Using the [OctoAI](https://octo.ai) Embeddings Integration is as simple as:

```python
from llama_index.embeddings.octoai import OctoAIEmbedding
from os import environ

OCTOAI_API_KEY = environ["OCTOAI_API_KEY"]
embed_model = OctoAIEmbedding(api_key=OCTOAI_API_KEY)
embeddings = embed_model.get_text_embedding("How do I sail to the moon?")
assert len(embeddings) == 1024
```

One can also request a batch of embeddings via:

```python
texts = [
"How do I sail to the moon?",
"What is the best way to cook a steak?",
"How do I apply for a job?",
]

embeddings = embed_model.get_text_embedding_batch(texts)
assert len(embeddings) == 3
```
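A common next step with batched embeddings is ranking texts by similarity to a query. The integration returns plain vectors (lists of floats), so you can score them with a small cosine-similarity helper. This is a sketch using only the standard library; `cosine_similarity` is our own illustrative name, not part of the OctoAI integration:

```python
# Hypothetical helper (not part of the OctoAI integration): compare two
# embedding vectors by cosine similarity using only the standard library.
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Identical directions score 1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

Applied to the batch above, you would compute `cosine_similarity(query_embedding, e)` for each `e` in `embeddings` and sort descending.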

## API Access

[Here](https://octo.ai/docs/getting-started/how-to-create-an-octoai-access-token) are some instructions on how to get your OctoAI API key.
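Once you have a token, a common pattern is to export it as an environment variable so that the snippet above can read it via `os.environ` (the token value here is a placeholder):

```shell
# Placeholder value -- replace with your real OctoAI access token.
export OCTOAI_API_KEY="your-octoai-token"

# Sanity-check that the variable is visible to child processes.
python -c 'import os; print("key set:", bool(os.environ.get("OCTOAI_API_KEY")))'
```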

## Contributing

Follow the standard practices of Poetry-based projects.

In VS Code, you may need to manually select the Python interpreter, especially to run the example IPython notebook. To do so, press `Ctrl+Shift+P`, then type or select `Python: Select Interpreter`.
@@ -0,0 +1 @@
python_sources()
@@ -0,0 +1,3 @@
from llama_index.embeddings.octoai.base import OctoAIEmbedding

__all__ = ["OctoAIEmbedding"]
