docs: coming from langchain (#1660)

Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
docarray · Jun 19, 2023 · 4e6bf49
1 parent deb892f
commit 4e6bf49

Showing 2 changed files with 87 additions and 2 deletions.
diff --git a/README.md b/README.md
@@ -36,8 +36,9 @@ DocArray handles your data while integrating seamlessly with the rest of your **
 > - [Coming from Pydantic](#coming-from-pydantic)
 > - [Coming from FastAPI](#coming-from-fastapi)
 > - [Coming from a vector database](#coming-from-vector-database)
+> - [Coming from Langchain](#coming-from-langchain)
 
-DocArray was released under the open-source [Apache License 2.0](https://github.com/docarray/docarray/blob/main/LICENSE) in January 2022. It is currently a sandbox project under [LF AI & Data Foundation](https://lfaidata.foundation/).
+DocArray has been distributed under the open-source [Apache License 2.0](https://github.com/docarray/docarray/blob/main/LICENSE) since January 2022. It is currently a sandbox project under [LF AI & Data Foundation](https://lfaidata.foundation/).
 
 ## Represent
 
@@ -776,6 +777,88 @@ Of course this is only one of the things that DocArray can do, so we encourage y
 </details>
 
 
+## Coming from Langchain
+
+<details markdown="1">
+  <summary>Click to expand</summary>
+
+With DocArray, you can connect external data to LLMs through Langchain. DocArray gives you the freedom to establish 
+flexible document schemas and choose from different backends for document storage.
+After creating your document index, you can connect it to your Langchain app using [DocArrayRetriever](https://python.langchain.com/docs/modules/data_connection/retrievers/integrations/docarray_retriever).
+
+Install Langchain via:
+```shell
+pip install langchain
+```
+
+1. Define a schema and create documents:
+```python
+from docarray import BaseDoc, DocList
+from docarray.typing import NdArray
+from langchain.embeddings.openai import OpenAIEmbeddings
+
+embeddings = OpenAIEmbeddings()
+
+# Define a document schema
+class MovieDoc(BaseDoc):
+    title: str
+    description: str
+    year: int
+    embedding: NdArray[1536]
+
+
+movies = [
+    {"title": "#1 title", "description": "#1 description", "year": 1999},
+    {"title": "#2 title", "description": "#2 description", "year": 2001},
+]
+
+# Embed `description` and create documents
+docs = DocList[MovieDoc](
+    MovieDoc(embedding=embeddings.embed_query(movie["description"]), **movie)
+    for movie in movies
+)
+```
+
+2. Initialize a document index using any supported backend:
+```python
+from docarray.index import (
+    InMemoryExactNNIndex,
+    HnswDocumentIndex,
+    WeaviateDocumentIndex,
+    QdrantDocumentIndex,
+    ElasticDocIndex,
+)
+
+# Select a suitable backend and initialize it with data
+db = InMemoryExactNNIndex[MovieDoc](docs)
+```
+
+3. Finally, initialize a retriever and integrate it into your chain!
+```python
+
+from langchain.chat_models import ChatOpenAI
+from langchain.chains import ConversationalRetrievalChain
+from langchain.retrievers import DocArrayRetriever
+
+
+# Create a retriever
+retriever = DocArrayRetriever(
+    index=db,
+    embeddings=embeddings,
+    search_field="embedding",
+    content_field="description",
+)
+
+# Use the retriever in your chain
+model = ChatOpenAI()
+qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)
+```
+
+Alternatively, you can use the vector stores built into Langchain. It ships two DocArray-backed vector stores: [DocArrayInMemorySearch](https://python.langchain.com/docs/modules/data_connection/vectorstores/integrations/docarray_in_memory) and [DocArrayHnswSearch](https://python.langchain.com/docs/modules/data_connection/vectorstores/integrations/docarray_hnsw).
+Both are easy to set up and best suited to small and medium-sized datasets.
+
+</details>
+
 ## Installation
 
 To install DocArray from the CLI, run the following command:

diff --git a/tests/documentation/test_docs.py b/tests/documentation/test_docs.py
@@ -70,5 +70,7 @@ def test_files_good(fpath):
 
 def test_readme():
     check_md_file(
-        fpath='README.md', memory=True, keyword_ignore=['tensorflow', 'fastapi', 'push']
+        fpath='README.md',
+        memory=True,
+        keyword_ignore=['tensorflow', 'fastapi', 'push', 'langchain', 'MovieDoc'],
     )
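
The `keyword_ignore` change in the test hunk above is easier to follow with a small sketch. The diff does not show how `check_md_file` is implemented, so the following assumes it skips any fenced Python block that contains one of the listed substrings. The helper names `grab_code_blocks`, `runnable_blocks`, and `make_block` are hypothetical and exist only for illustration.

```python
import re

# Built programmatically so this example can itself sit inside a fenced block.
FENCE = "`" * 3


def grab_code_blocks(markdown, lang="python"):
    """Return the bodies of all fenced blocks tagged with `lang`."""
    pattern = re.compile(
        rf"^{FENCE}{re.escape(lang)}[ \t]*\n(.*?)^{FENCE}[ \t]*$",
        re.DOTALL | re.MULTILINE,
    )
    return [match.group(1) for match in pattern.finditer(markdown)]


def runnable_blocks(markdown, keyword_ignore=(), lang="python"):
    """Keep only the blocks that contain none of the ignored keywords."""
    return [
        block
        for block in grab_code_blocks(markdown, lang)
        if not any(keyword in block for keyword in keyword_ignore)
    ]


def make_block(body):
    """Wrap a snippet in a fenced Python block."""
    return f"{FENCE}python\n{body}\n{FENCE}\n"


# A stand-in for the README: two blocks from the new section, one unrelated.
sample = "\n".join(
    [
        make_block("from langchain.retrievers import DocArrayRetriever"),
        make_block("db = InMemoryExactNNIndex[MovieDoc](docs)"),
        make_block("print('hello')"),
    ]
)

# Ignoring only 'langchain' still leaves the index block, which would
# fail on its own because MovieDoc and docs are defined in a skipped block.
kept = runnable_blocks(sample, keyword_ignore=["langchain"])

# Adding 'MovieDoc' filters that dependent block out as well.
kept_strict = runnable_blocks(sample, keyword_ignore=["langchain", "MovieDoc"])
```

Under that assumption, the second keyword is needed because the index-initialization snippet never mentions `langchain` but still depends on names defined in a snippet that does.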