REST API Design #28

Imran31 · 2016-03-19T22:34:38Z

This issue is to brainstorm the design of the API endpoints and responses. I'll start with a couple of points on shorter URLs, HATEOAS and the folder structure.

Shorter URLs

I propose maintaining numeric IDs for each author, corpus, text, etc. and using those to construct the REST endpoints.

So, for example, endpoint GET /lang/latin/corpus/perseus/author/tacitus/text/germania becomes GET /lang/latin/corpus/1/author/6/text/8.

This keeps the URLs short, however long the names behind the IDs are.

A problem with this (assuming an external API consumer) is figuring out the ID of a specific author/corpus/text.
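A minimal sketch of the name-to-ID rewrite described above, in plain Python. The `IDS` table and the `shorten` helper are hypothetical; the only values used are the ones from the example URL (perseus = 1, tacitus = 6, germania = 8), and a real service would presumably keep this mapping in a database.

```python
# Hypothetical name-to-ID tables, filled in from the example URL above.
IDS = {
    "corpus": {"perseus": 1},
    "author": {"tacitus": 6},
    "text": {"germania": 8},
}


def shorten(path):
    """Rewrite a name-based path into its ID-based form where an ID is known."""
    parts = path.strip("/").split("/")
    out = []
    # Path segments alternate between a resource type and its value.
    for kind, value in zip(parts[0::2], parts[1::2]):
        table = IDS.get(kind, {})
        # Segments without a known ID (e.g. the language) are kept as-is.
        out += [kind, str(table.get(value, value))]
    # Keep a trailing resource type that has no value, e.g. ".../corpus".
    if len(parts) % 2:
        out.append(parts[-1])
    return "/" + "/".join(out)


print(shorten("/lang/latin/corpus/perseus/author/tacitus/text/germania"))
```

The reverse lookup (ID back to name) is the part an external consumer cannot do without help from the API.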

API Discoverability

The formal term for this is HATEOAS (Hypermedia as the Engine of Application State): a user should be able to browse and discover all the endpoints of the REST API using the REST API itself.

To that end, we should define endpoints like GET /lang/latin/corpus/ that return a response such as:

{"corpora": [ {"name": "perseus", "id": "1"}, ... ]}

This way, the user will be able to query for all the available corpora and figure out the ID.
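As a rough sketch of that lookup, assuming the response shape shown above: the `CORPORA` registry and both function names are hypothetical, and only the perseus entry comes from this thread.

```python
import json

# Hypothetical in-memory registry; only the "perseus" entry is taken from the example above.
CORPORA = {
    "latin": [
        {"name": "perseus", "id": "1"},
    ],
}


def list_corpora(lang):
    """Build the body that GET /lang/<lang>/corpus/ could return."""
    return {"corpora": CORPORA.get(lang, [])}


def corpus_id(lang, name):
    """Resolve a corpus name to its ID using only the listing response."""
    for corpus in list_corpora(lang)["corpora"]:
        if corpus["name"] == name:
            return corpus["id"]
    return None  # unknown corpus


print(json.dumps(list_corpora("latin")))
print(corpus_id("latin", "perseus"))
```

A consumer would call the listing endpoint once, pick the ID by name, and then build the ID-based URL from it.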

Another example of this is from my POS tagger implementation. It is possible to view the list of languages and POS tagging methods they support via GET /core/pos, and perform the actual POS tagging for a string via POST /core/pos.

In general, adding a GET request handler to endpoints like /lang, /lang/<int:lang_id>/corpus, etc. should make the API discoverable.
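To illustrate the GET/POST split on a single endpoint, here is a framework-free sketch for /core/pos. Everything in it is an assumption made for illustration: the `POS_METHODS` table, the request fields (`lang`, `method`, `string`), the response keys, and the stand-in tagger, which is not the tagger from #27.

```python
# Hypothetical table of languages and the tagging methods each supports.
POS_METHODS = {"latin": ["unigram"], "greek": ["unigram"]}


def fake_tag(text):
    """Stand-in tagger: labels every whitespace-separated token as 'UNK'."""
    return [[token, "UNK"] for token in text.split()]


def handle_core_pos(method, body=None):
    """Return (status_code, response_body) for a request to /core/pos."""
    if method == "GET":
        # Discovery: describe what the endpoint supports.
        return 200, {"languages": POS_METHODS}
    if method == "POST":
        body = body or {}
        if body.get("method") not in POS_METHODS.get(body.get("lang"), []):
            return 400, {"error": "unsupported language or method"}
        # Action: tag the submitted string.
        return 200, {"tags": fake_tag(body.get("string", ""))}
    return 405, {"error": "method not allowed"}


print(handle_core_pos("GET"))
print(handle_core_pos("POST", {"lang": "latin", "method": "unigram", "string": "arma virumque"}))
```

The same pattern (GET describes, POST acts) would carry over to the other endpoints if the API goes this way.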

Folder Structure

Right now all the resources are defined in a single file (api_json.py), and so are the tests (tests.py). There is also no distinction between files containing utility functions and files containing actual REST resources.

I briefly mentioned this in my #20 (comment).

An example of my proposed organisation is in #27. Inside the folder for a specific function (/pos), the resources will be in views.py, the database stuff (if any) in models.py, utility functions in utils.py and parameters in constants.py.

(It may be better to keep constants.py at the root of the API folder structure, so that it is easy to find and change.)


kylepjohnson · 2016-03-20T02:45:51Z

@Imran31 Thanks for sharing your thoughts. Here are a few initial responses:

So, for example, endpoint GET /lang/latin/corpus/perseus/author/tacitus/text/germania becomes GET /lang/latin/corpus/1/author/6/text/8.

The API is intentionally explicit. I prefer this because the URL is instantly recognizable. "Author 6, text 8" means nothing, but "Tacitus, Germania" is universally recognizable.

There is something to be said for keeping URLs short, but we are very, very far from what I would consider long.

a user should be able to browse and discover all the endpoints of the REST API using the REST API itself.

We have this already, though I think it could be made more intuitive. For example:

I'm open to hearing other ways of doing this.

About your POS addition to API, I'll need to look into this further. I will probably want to see an API which accounts for all "core" processing, not just individual parts.

Imran31 · 2016-03-20T13:56:55Z

There is something to be said for keeping URLs short, but we are very, very far from what I would consider long.

Yeah, IDs won't be more helpful then; I had thought the growing URL length was a problem. I agree that the existing URLs are much easier to recognise.

About your POS addition to API, I'll need to look into this further. I will probably want to see an API which accounts for all "core" processing, not just individual parts.

Sure! Does it make sense to list out all the /core/* endpoints and how they will respond to different HTTP methods?

I'll start with a list of endpoints and their associated classes:

/core/jvreplacer: JVReplacer
/core/stem: Stemmer
/core/lemmatize: LemmaReplacer
/core/syllabify: Syllabifier
/core/ner: ner
/core/tokenize: PunktLanguageVars, TokenizeSentence, word_tokenize
/core/distance: TextReuse, Levenshtein
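One possible way to keep this list in code, sketched under assumptions: the table below copies the endpoints and class names from the list above (paths written with a leading slash), while `describe_core` and the idea of a GET /core listing are hypothetical. How each endpoint responds to the different HTTP methods is deliberately left out, since that is still open.

```python
# Endpoint-to-class table copied from the list above; class names are kept as strings.
CORE_ENDPOINTS = {
    "/core/jvreplacer": ["JVReplacer"],
    "/core/stem": ["Stemmer"],
    "/core/lemmatize": ["LemmaReplacer"],
    "/core/syllabify": ["Syllabifier"],
    "/core/ner": ["ner"],
    "/core/tokenize": ["PunktLanguageVars", "TokenizeSentence", "word_tokenize"],
    "/core/distance": ["TextReuse", "Levenshtein"],
}


def describe_core():
    """Build a listing that a hypothetical GET /core could return."""
    return {
        "endpoints": [
            {"path": path, "classes": classes}
            for path, classes in sorted(CORE_ENDPOINTS.items())
        ]
    }


for entry in describe_core()["endpoints"]:
    print(entry["path"], ", ".join(entry["classes"]))
```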

lukehollis · 2016-03-20T16:53:42Z

I think this (#28 (comment)) looks good for the first iteration of the project, and we can revise it in the future as it makes sense for the more complex tasks.

kylepjohnson · 2016-03-21T05:51:16Z

@lukehollis If you're comfortable with this API, then let's go for it, so long as everyone knows that the specifics will be subject to revision at some point.

Thanks to all on this.

Imran31 · 2016-03-24T21:25:08Z

Thanks @lukehollis, I have extended this discussion into my proposal (I just shared it with the organisation via the GSoC website). I look forward to your comments there.

kylepjohnson added the question label Mar 20, 2016

modassir mentioned this issue Mar 20, 2016

Add route for accessing CLTK stemmer #20

Closed

lukehollis assigned Imran31 Mar 20, 2016

This was referenced Mar 20, 2016

Add route for accessing CLTK tokenizer #23

Open

Expose POS tagging from CLTK core #27

Merged
