docs: coming from langchain (#1660)
Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
jupyterjazz committed Jun 19, 2023
1 parent deb892f commit 4e6bf49
Showing 2 changed files with 87 additions and 2 deletions.
85 changes: 84 additions & 1 deletion README.md
@@ -36,8 +36,9 @@ DocArray handles your data while integrating seamlessly with the rest of your **
> - [Coming from Pydantic](#coming-from-pydantic)
> - [Coming from FastAPI](#coming-from-fastapi)
> - [Coming from a vector database](#coming-from-vector-database)
> - [Coming from LangChain](#coming-from-langchain)
DocArray was released under the open-source [Apache License 2.0](https://github.com/docarray/docarray/blob/main/LICENSE) in January 2022. It is currently a sandbox project under [LF AI & Data Foundation](https://lfaidata.foundation/).
DocArray has been distributed under the open-source [Apache License 2.0](https://github.com/docarray/docarray/blob/main/LICENSE) since January 2022. It is currently a sandbox project under [LF AI & Data Foundation](https://lfaidata.foundation/).

## Represent

@@ -776,6 +777,88 @@ Of course this is only one of the things that DocArray can do, so we encourage y
</details>


## Coming from LangChain

<details markdown="1">
<summary>Click to expand</summary>

With DocArray, you can connect external data to LLMs through LangChain. DocArray lets you define
flexible document schemas and choose among several backends for document storage.
After creating your document index, connect it to your LangChain app using [DocArrayRetriever](https://python.langchain.com/docs/modules/data_connection/retrievers/integrations/docarray_retriever).

Install LangChain via:
```shell
pip install langchain
```

1. Define a schema and create documents:
```python
from docarray import BaseDoc, DocList
from docarray.typing import NdArray
from langchain.embeddings.openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

# Define a document schema
class MovieDoc(BaseDoc):
    title: str
    description: str
    year: int
    embedding: NdArray[1536]


movies = [
    {"title": "#1 title", "description": "#1 description", "year": 1999},
    {"title": "#2 title", "description": "#2 description", "year": 2001},
]

# Embed `description` and create documents
docs = DocList[MovieDoc](
    MovieDoc(embedding=embeddings.embed_query(movie["description"]), **movie)
    for movie in movies
)
```

2. Initialize a document index using any supported backend:
```python
from docarray.index import (
    InMemoryExactNNIndex,
    HnswDocumentIndex,
    WeaviateDocumentIndex,
    QdrantDocumentIndex,
    ElasticDocIndex,
)

# Select a suitable backend and initialize it with data
db = InMemoryExactNNIndex[MovieDoc](docs)
```

3. Finally, initialize a retriever and integrate it into your chain!
```python

from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain
from langchain.retrievers import DocArrayRetriever

# Create a retriever
retriever = DocArrayRetriever(
    index=db,
    embeddings=embeddings,
    search_field="embedding",
    content_field="description",
)

# Use the retriever in your chain
model = ChatOpenAI()
qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)
```
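
To make the roles of `search_field` and `content_field` more concrete, here is a minimal, dependency-free sketch of the idea behind a retriever. It is not LangChain's or DocArray's implementation: `ToyDocument`, `cosine`, and `toy_retrieve` are hypothetical names, the two-dimensional vectors stand in for real 1536-dimensional embeddings, and passing the remaining fields through as metadata is an assumption made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ToyDocument:
    """Hypothetical stand-in for the document object a retriever returns."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def toy_retrieve(records, query_vector, search_field, content_field, top_k=1):
    """Rank records by similarity on `search_field`; expose `content_field` as text."""
    ranked = sorted(
        records,
        key=lambda record: cosine(record[search_field], query_vector),
        reverse=True,
    )
    results = []
    for record in ranked[:top_k]:
        # Assumption: every field other than the vector and the content becomes metadata.
        metadata = {
            key: value
            for key, value in record.items()
            if key not in (search_field, content_field)
        }
        results.append(
            ToyDocument(page_content=record[content_field], metadata=metadata)
        )
    return results


# Toy records mirroring the MovieDoc schema, with tiny made-up embeddings.
records = [
    {"title": "#1 title", "description": "#1 description", "year": 1999, "embedding": [1.0, 0.0]},
    {"title": "#2 title", "description": "#2 description", "year": 2001, "embedding": [0.0, 1.0]},
]

hits = toy_retrieve(
    records, [0.1, 0.9], search_field="embedding", content_field="description"
)
print(hits[0].page_content)  # "#2 description"
```

In this sketch the query vector is closest to the second record, so its `description` comes back as the text that would be handed to the LLM, while `title` and `year` travel alongside it as metadata.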

Alternatively, you can use the built-in vector stores. LangChain provides two DocArray-backed vector stores: [DocArrayInMemorySearch](https://python.langchain.com/docs/modules/data_connection/vectorstores/integrations/docarray_in_memory) and [DocArrayHnswSearch](https://python.langchain.com/docs/modules/data_connection/vectorstores/integrations/docarray_hnsw).
Both are easy to set up and are best suited to small and medium-sized datasets.

</details>

## Installation

To install DocArray from the CLI, run the following command:
4 changes: 3 additions & 1 deletion tests/documentation/test_docs.py
@@ -70,5 +70,7 @@ def test_files_good(fpath):

def test_readme():
    check_md_file(
        fpath='README.md', memory=True, keyword_ignore=['tensorflow', 'fastapi', 'push']
        fpath='README.md',
        memory=True,
        keyword_ignore=['tensorflow', 'fastapi', 'push', 'langchain', 'MovieDoc'],
    )
