Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: make document indices self-contained #1678

Merged
merged 35 commits into from
Aug 1, 2023
Merged

Conversation

jupyterjazz
Copy link
Contributor

@jupyterjazz jupyterjazz commented Jun 28, 2023

As stated in #1603, our documentation for doc indices is not consistent and lacks some of the functionalities.

Definition of Done:

  • come up with a rough template, i.e. what should each page include, and rewrite doc indices accordingly
  • modify the doc index introduction page
  • add docs for redis
  • add docs for milvus
  • add docs for qdrant
  • validate code examples
  • refine text

High-level Template:

  • Intro
  • Basic Usage
  • Initialize
  • Index
  • Vector Search
  • Filter
  • Text Search
  • Hybrid Search
  • Access Documents
  • Delete Documents
  • Update Documents
  • Configuration
  • Nested Data

Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
@jupyterjazz jupyterjazz linked an issue Jun 28, 2023 that may be closed by this pull request
@JoanFM JoanFM marked this pull request as draft July 6, 2023 07:13
Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
jupyterjazz and others added 3 commits July 6, 2023 17:07
Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
@github-actions github-actions bot added size/xl and removed size/m labels Jul 6, 2023
@github-actions
Copy link

github-actions bot commented Jul 6, 2023

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
@github-actions
Copy link

github-actions bot commented Jul 6, 2023

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
@github-actions
Copy link

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@codecov
Copy link

codecov bot commented Jul 17, 2023

Codecov Report

Patch coverage has no change and project coverage change: -54.40% ⚠️

Comparison is base (31c2bb9) 84.46% compared to head (b402802) 30.06%.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1678       +/-   ##
===========================================
- Coverage   84.46%   30.06%   -54.40%     
===========================================
  Files         134      133        -1     
  Lines        8838     8731      -107     
===========================================
- Hits         7465     2625     -4840     
- Misses       1373     6106     +4733     
Flag Coverage Δ
docarray 30.06% <ø> (-54.40%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

see 110 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
@github-actions
Copy link

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
@github-actions
Copy link

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
@github-actions
Copy link

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
@github-actions
Copy link

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
@github-actions
Copy link

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
Copy link
Contributor

@alexcg1 alexcg1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ensure:

  • Consistent use of document/Document
  • H1 headings are title case, H2 and lower are sentence case

docs/user_guide/storing/nested_data.md Outdated Show resolved Hide resolved
docs/user_guide/storing/index_elastic.md Outdated Show resolved Hide resolved
docs/user_guide/storing/index_elastic.md Outdated Show resolved Hide resolved
docs/user_guide/storing/index_elastic.md Outdated Show resolved Hide resolved
docs/user_guide/storing/index_elastic.md Outdated Show resolved Hide resolved
docs/user_guide/storing/index_hnswlib.md Outdated Show resolved Hide resolved
docs/user_guide/storing/index_in_memory.md Outdated Show resolved Hide resolved
docs/user_guide/storing/index_milvus.md Outdated Show resolved Hide resolved
docs/user_guide/storing/index_qdrant.md Outdated Show resolved Hide resolved

### 1.2.1. WCS (managed instance)
**WCS (managed instance)**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these be h4 headings?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes but the table of contets got too verbose

Copy link
Member

@JohannesMessner JohannesMessner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a small issue with the way things are currently set up.
The pages on HNSWLib and InMemory seem to be fully self-contained, giving general information like the following:
image

But this (and some other backend agnostic information) seems to be missing from the pages describing other backends (weavaite, qdrant, etc).

If we want thins to be truly self-contained, then we should just copy that sort of information to all the backends imo.

But I like the direction this is going into!!

To combine these operations into a single, hybrid search query, you can use the query builder that is accessible
through [build_query()][docarray.index.abstract.BaseDocIndex.build_query]:

### Initialize the Document Index and add data
```python
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add 1-2 sentences of explanation to these code snippets? i think they are very self-explanatory, but personally as a user i don't like code snippets without words around them :)

)
retrieved_docs, scores = doc_index.execute_query(query)
```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should here again add a big fat link to all the backend documentation pages and tell people that they can get more detailed information there

Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
@github-actions
Copy link

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
@github-actions
Copy link

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@github-actions
Copy link

github-actions bot commented Aug 1, 2023

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
@github-actions
Copy link

github-actions bot commented Aug 1, 2023

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

Copy link
Contributor

@alexcg1 alexcg1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Key points:

  • Change all "top 10 most similar" to "ten most similar" throughout
  • Wrap all methods in backticks. Some still lack them
  • H2 headings and lower should be sentence case. Some are still title case

There are a few other items throughout the review too

docs/user_guide/storing/docindex.md Outdated Show resolved Hide resolved
docs/user_guide/storing/docindex.md Outdated Show resolved Hide resolved
docs/user_guide/storing/docindex.md Outdated Show resolved Hide resolved
docs/user_guide/storing/docindex.md Outdated Show resolved Hide resolved
docs/user_guide/storing/docindex.md Outdated Show resolved Hide resolved
docs/user_guide/storing/index_elastic.md Outdated Show resolved Hide resolved
docs/user_guide/storing/index_hnswlib.md Outdated Show resolved Hide resolved
docs/user_guide/storing/index_milvus.md Outdated Show resolved Hide resolved
docs/user_guide/storing/index_qdrant.md Outdated Show resolved Hide resolved
docs/user_guide/storing/index_weaviate.md Outdated Show resolved Hide resolved
@JohannesMessner
Copy link
Member

Key points:

* Change all "top 10 most similar" to "ten most similar" throughout

* Wrap all methods in backticks. Some still lack them

* H2 headings and lower should be sentence case. Some are still title case

There are a few other items throughout the review too

Even better than backticks would be to link the the corresponding docstring / API docs :)

Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
@github-actions
Copy link

github-actions bot commented Aug 1, 2023

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@jupyterjazz
Copy link
Contributor Author

Even better than backticks would be to link the the corresponding docstring / API docs :)

already linked, they just lacked the backticks

@alexcg1
Copy link
Contributor

alexcg1 commented Aug 1, 2023

Even better than backticks would be to link the the corresponding docstring / API docs :)
Many method names do link to that stuff, just not backticked

Copy link
Contributor

@alexcg1 alexcg1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Final nitpicks

docs/user_guide/storing/docindex.md Outdated Show resolved Hide resolved
docs/user_guide/storing/index_hnswlib.md Outdated Show resolved Hide resolved
docs/user_guide/storing/index_in_memory.md Outdated Show resolved Hide resolved
Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
@github-actions
Copy link

github-actions bot commented Aug 1, 2023

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@github-actions
Copy link

github-actions bot commented Aug 1, 2023

📝 Docs are deployed on https://ft-docs-self-contained-indices--jina-docs.netlify.app 🎉

Copy link
Member

@JohannesMessner JohannesMessner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

big win for people that are interested in a specific backend!

@JoanFM JoanFM merged commit 7c10295 into main Aug 1, 2023
25 of 26 checks passed
@JoanFM JoanFM deleted the docs-self-contained-indices branch August 1, 2023 12:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Docs: make each doc index documentation self-contained
4 participants