Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Extract V0 API #60

Merged
merged 19 commits into from May 18, 2022
Merged

Add Extract V0 API #60

merged 19 commits into from May 18, 2022

Conversation

rodrigue-h
Copy link
Contributor

@rodrigue-h rodrigue-h commented Mar 19, 2022

This PR adds Extract to the Python SDK.

Usage:

import cohere
from cohere.extract import Entity, Example

co = cohere.Client('API_KEY')

result = co.extract(
    model="large",
    examples=[
        Example(
            text="hello my name is John, and I like to play ping pong",
            entities=[
                Entity(type="Name", value="John"),
                Entity(type="Game", value="ping pong")
            ]
        ),
        Example(
            text="Jon Ke from work is amazing at chess!",
            entities=[
                Entity(type="Name", value="Jon Ke"),
                Entity(type="Game", value="chess")
            ]
        ),
        Example(
            text="David won't stop talking about playing tennis!",
            entities=[
                Entity(type="Name", value="David"),
                Entity(type="Game", value="tennis")
            ]
        ),
    ],
    texts=[
        "David will be playing soccer?"
    ]
)

print(result)
'''
cohere.Extractions {
        extractions: [
          {'id': 'caf0d948-57e8-4d48-849c-007a0c13b4cb',
          'text': 'David will be playing soccer?',
          'entities': [{'type': 'Name', 'value': 'David'}, {'type': 'Game', 'value': 'soccer'}]}]
}
'''

cohere/extract.py Outdated Show resolved Hide resolved
cohere/extract.py Outdated Show resolved Hide resolved
cohere/client.py Outdated Show resolved Hide resolved
tests/tests.py Outdated Show resolved Hide resolved
@elaineg
Copy link

elaineg commented Apr 4, 2022

Can we get this reviewed? Hoping to get documentation written for this as soon as merged.

@OrenLeung
Copy link
Contributor

OrenLeung commented Apr 4, 2022

Can we get this reviewed? Hoping to get documentation written for this as soon as merged.

@elaineg it has been reviewed . Waiting for the suggestions to be addressed

@rodrigue-h

@mkozakov mkozakov self-assigned this May 5, 2022
@mkozakov mkozakov changed the title Extract endpoint Add Extract V0 API May 17, 2022
Comment on lines +87 to +94
self.assertEqual(len(extractions), 2)
self.assertIsInstance(extractions, Extractions)
self.assertIsInstance(extractions[0].text, str)
self.assertIsInstance(extractions[1].text, str)
self.assertIsInstance(extractions[0].entities, list)
self.assertIsInstance(extractions[1].entities, list)
self.assertEqual(len(extractions[0].entities), 2)
self.assertEqual(len(extractions[1].entities), 2)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious: should we (can we) make assertions on the extracted string values? e.g.

self.assertEqual(extractions[0].text, "Charlie")

Do these tests run using the live Cohere API, or some mocked data?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are run against the Prod API, so they are more like end-to-end tests. I think in this case we can definitely do what you suggested without too much flakiness! I'll make the change

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM.
Not sure if we would like to add docblocks on the methods.
For me, the methods are self-explanatory, but not sure for others.

@@ -23,7 +23,7 @@ def has_ext_modules(foo) -> bool:

setuptools.setup(
name='cohere',
version='1.3.8',
version='1.3.9',

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we follow semantic versioning here, i.e. should this change (adding the extract API) be considered a minor update (new feature) rather than a patch?

I wonder if we should consider a more structured approach to SDK version updates. e.g. https://python-semantic-release.readthedocs.io/en/latest/ compute the version automatically based on commit messages, and generate the changelog appropriately too. Something for the future.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I proposed semantic versioning too! @adrian-cohere is looking at a bunch of improvements we can make to our SDKs, we're going to start a list soon.

For this case, this is a V0 API release, mostly for us to play with internally, so I think a patch is maybe applicable here?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh nice.

Interesting. Since it's more of an internal V0, do you think we should do something like an experimental_ or unstable_ prefix? Like co.unstable_extract().

The risk with settling on a public co.extract() method today is that it could require a breaking version update if V1 has a sufficiently different signature.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a good idea. In practice, the risk here is low, unless some external party starts using an undocumented function, but there is no reason to add a non-prefixed version right now.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very good point by @elliottsj.
I'm curious to know if we could also adopt the idea of alpha & beta versions as we might want to start testing internally, then with a few power/strategic users and then fully to the public.
Let's discuss this as part of our Api design + documentation framework.

@mkozakov mkozakov merged commit 01d1c24 into main May 18, 2022
@mkozakov mkozakov deleted the feature/extract-endpoint branch May 18, 2022 22:31
vshmyhlo pushed a commit that referenced this pull request Feb 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants