Merge pull request #4 from chatnoir-eu/new-api
Support staging API
heinrichreimer committed Jan 11, 2023
2 parents b5c4c18 + 9292163 commit faf15d2
Showing 29 changed files with 1,955 additions and 553 deletions.
5 changes: 3 additions & 2 deletions .github/workflows/ci.yml
@@ -10,8 +10,8 @@ jobs:
matrix:
os:
- ubuntu-latest
-- macos-latest
-- windows-latest
+# - macos-latest
+# - windows-latest
python:
- 3.7
- 3.9
@@ -118,6 +118,7 @@ jobs:
- name: "🧪 Test Python code"
env:
CHATNOIR_API_KEY: ${{ secrets.CHATNOIR_API_KEY }}
+CHATNOIR_API_KEY_STAGING: ${{ secrets.CHATNOIR_API_KEY_STAGING }}
run: pytest chatnoir_api examples tests --cov --cov-report=term --cov-report=xml
- name: "📤 Upload test coverage"
uses: actions/upload-artifact@v2
12 changes: 12 additions & 0 deletions .idea/php.xml


81 changes: 71 additions & 10 deletions README.md
@@ -18,8 +18,13 @@ pip install chatnoir-api
```

## Usage
The ChatNoir API offers two main features: [search](#search) with BM25F and [retrieving document contents](#retrieve-document-content).

### Search
To search with the ChatNoir API you need to request an [API key](https://chatnoir.eu/apikey/).
Then you can use our Python client to search for documents.
The `results` object is an iterable wrapper of the search results which handles pagination for you.
List-style indexing is supported to access individual results or sub-lists of results:

```python
from chatnoir_api.v1 import search
@@ -34,36 +39,92 @@ result_1234 = results[1234]
print(result_1234)
```
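
How a wrapper can hide pagination behind list-style indexing is sketched below. This is not the `chatnoir_api` implementation: `LazyPagedResults` and `fake_fetch_page` are hypothetical names, and the fake fetcher stands in for the HTTP requests a real client would send. The sketch only illustrates the idea of fetching a page the first time one of its items is accessed.

```python
from typing import Callable, Dict, List, Sequence


class LazyPagedResults(Sequence[str]):
    """Hypothetical paginated result wrapper (illustration only)."""

    def __init__(
            self,
            fetch_page: Callable[[int], List[str]],
            total: int,
            page_size: int,
    ) -> None:
        self._fetch_page = fetch_page
        self._total = total
        self._page_size = page_size
        self._pages: Dict[int, List[str]] = {}  # Cache of fetched pages.

    def __len__(self) -> int:
        return self._total

    def _item(self, index: int) -> str:
        # Map the global index to a page number and an offset in that page.
        page_number, offset = divmod(index, self._page_size)
        if page_number not in self._pages:
            self._pages[page_number] = self._fetch_page(page_number)
        return self._pages[page_number][offset]

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Sub-lists are resolved item by item, reusing cached pages.
            return [self._item(i) for i in range(*index.indices(self._total))]
        if index < 0:
            index += self._total
        if not 0 <= index < self._total:
            raise IndexError(index)
        return self._item(index)


requested_pages: List[int] = []


def fake_fetch_page(page_number: int) -> List[str]:
    # Stand-in for a search request; records which pages were requested.
    requested_pages.append(page_number)
    start = page_number * 10
    return [f"doc-{i}" for i in range(start, min(start + 10, 35))]


results = LazyPagedResults(fake_fetch_page, total=35, page_size=10)
print(results[12])      # Triggers a fetch of page 1 only.
print(results[8:12])    # Needs page 0 as well; page 1 comes from the cache.
print(requested_pages)
```

Because the class derives from `Sequence`, iteration and `len()` come for free, and a page is requested at most once.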

### Retrieve Document by ID
#### Search the new ChatNoir
There's a [new](https://chatnoir.web.webis.de/) ChatNoir version with the same API interface. To run your search requests against the new API (e.g., if you want to search the ClueWeb22), set `staging=True` like this:

```python
from chatnoir_api.v1 import search

api_key: str = "<API_KEY>"
results = search(api_key, "python library", staging=True)
```

#### Phrase Search
To search for phrases, use the `search_phrases` method in the same way as normal `search`:

```python
from chatnoir_api.v1 import search_phrases

api_key: str = "<API_KEY>"
results = search_phrases(api_key, "python library", staging=True)
```

### Retrieve Document Content
Often the title and ID of a document are not enough to effectively re-rank a list of search results.
To retrieve the full content or plain text for a given document, you can use the `cache_contents` helper function.
The `cache_contents` function expects a ChatNoir-internal UUID, short UUID, or TREC ID,
and the index from which to retrieve the document.

#### Retrieve by TREC ID
You can retrieve a document by its TREC ID like this:

```python
from chatnoir_api import cache_contents, Index

contents = cache_contents(
"clueweb09-en0051-90-00849",
Index.ClueWeb09,
)
print(contents)

plain_contents = cache_contents(
"clueweb09-en0051-90-00849",
Index.ClueWeb09,
plain=True,
)
print(plain_contents)
```

#### Retrieve by ChatNoir-internal UUID
You can also retrieve a document by its ChatNoir-internal UUID like this:

```python
from uuid import UUID

from chatnoir_api import html_contents, Index
from chatnoir_api import cache_contents, Index

contents = html_contents(
contents = cache_contents(
UUID("e635baa8-7341-596a-b3cf-b33c05954361"),
Index.CommonCrawl1511,
)
print(contents)

plain_contents = html_contents(
plain_contents = cache_contents(
UUID("e635baa8-7341-596a-b3cf-b33c05954361"),
Index.CommonCrawl1511,
plain=True,
)
print(plain_contents)
```

contents = html_contents(
"clueweb09-en0051-90-00849",
Index.ClueWeb09,
#### Retrieve by ChatNoir-internal short UUID
For newer ChatNoir versions, you can also retrieve a document by its ChatNoir-internal _short_ UUID like this:

```python
from chatnoir_api import cache_contents, Index, ShortUUID

contents = cache_contents(
ShortUUID("6svePe3PXteDeGPk1XqTLA"),
Index.ClueWeb22,
base_url="https://chatnoir.web.webis.de/"
)
print(contents)

plain_contents = html_contents(
"clueweb09-en0051-90-00849",
Index.ClueWeb09,
plain_contents = cache_contents(
ShortUUID("6svePe3PXteDeGPk1XqTLA"),
Index.ClueWeb22,
plain=True,
base_url="https://chatnoir.web.webis.de/"
)
print(plain_contents)
```
22 changes: 15 additions & 7 deletions chatnoir_api/__init__.py
@@ -1,17 +1,25 @@
__version__ = "1.0.0"
__version__ = "2.0.0"

from chatnoir_api import html, model
from chatnoir_api import cache, model
from chatnoir_api.model import highlight, result

# Re-export child modules.
Index = model.Index
Slop = model.Slop
ShortUUID = model.ShortUUID
Highlight = highlight.Highlight
HighlightedText = highlight.HighlightedText
MinimalResult = result.MinimalResult
Explanation = result.Explanation
ExplainedMinimalResult = result.ExplainedMinimalResult
Result = result.Result
ExplainedResult = result.ExplainedResult
MinimalResultStaging = result.MinimalResultStaging
ExplainedMinimalResultStaging = result.ExplainedMinimalResultStaging
ResultStaging = result.ResultStaging
ExplainedResultStaging = result.ExplainedResultStaging
Meta = result.Meta
MetaIndex = result.MetaIndex
ExtendedMeta = result.ExtendedMeta
Results = result.Results
ResultsMeta = result.ResultsMeta
SearchResult = result.SearchResult
PhraseSearchResult = result.PhraseSearchResult
MinimalPhraseSearchResult = result.MinimalPhraseSearchResult
html_contents = html.html_contents
cache_contents = cache.cache_contents
7 changes: 5 additions & 2 deletions chatnoir_api/html.py → chatnoir_api/cache.py
@@ -1,4 +1,5 @@
from typing import Union
from urllib.parse import urljoin
from uuid import UUID, uuid5, NAMESPACE_URL

from requests import get, Response
@@ -7,10 +8,11 @@
from chatnoir_api.model import Index


def html_contents(
def cache_contents(
uuid_or_document_id: Union[UUID, str],
index: Index,
plain: bool = False,
base_url: str = BASE_URL,
) -> str:
uuid: UUID
if isinstance(uuid_or_document_id, str):
@@ -19,12 +21,13 @@ def html_contents(
uuid = uuid_or_document_id

response: Response = get(
f"{BASE_URL}/cache",
urljoin(base_url, "cache"),
params={
"uuid": str(uuid),
"index": index.value,
"raw": True,
"plain": plain,
}
)
response.raise_for_status()
return response.text
5 changes: 2 additions & 3 deletions chatnoir_api/constants.py
@@ -1,3 +1,2 @@
-BASE_URL = "https://chatnoir.eu"
-API_BASE_URL = f"{BASE_URL}/api"
-API_V1_URL = f"{API_BASE_URL}/v1"
+BASE_URL = "https://chatnoir.eu/"
+BASE_URL_STAGING = "https://chatnoir.web.webis.de/"
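
Since `cache_contents` now builds its request URL with `urljoin(base_url, "cache")`, it may help to see how `urllib.parse.urljoin` treats trailing slashes. The sketch below uses only the standard library and the two constants from this commit. The `https://example.org/api` base is a made-up value, included only to show what could happen if a caller passed a custom `base_url` that contains a path.

```python
from urllib.parse import urljoin

BASE_URL = "https://chatnoir.eu/"
BASE_URL_STAGING = "https://chatnoir.web.webis.de/"

# Host-only bases: the endpoint is appended directly below the host.
default_url = urljoin(BASE_URL, "cache")
staging_url = urljoin(BASE_URL_STAGING, "cache")

# For a host-only base, omitting the trailing slash gives the same result.
no_slash_url = urljoin("https://chatnoir.eu", "cache")

# Hypothetical bases with a path: here the trailing slash decides whether
# the last path segment is kept or replaced.
replaced_url = urljoin("https://example.org/api", "cache")
appended_url = urljoin("https://example.org/api/", "cache")

print(default_url)   # https://chatnoir.eu/cache
print(staging_url)   # https://chatnoir.web.webis.de/cache
print(no_slash_url)  # https://chatnoir.eu/cache
print(replaced_url)  # https://example.org/cache
print(appended_url)  # https://example.org/api/cache
```

For the two constants the result is the same with or without the trailing slash. It only starts to matter for bases that carry a path, so a custom `base_url` is safest when it ends in `/`.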
