Skip to content

Synopsis is a REST API to a document summarisation service

License

Notifications You must be signed in to change notification settings

iaindillingham/synopsis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Synopsis

synopsis | noun (synopses)
A brief survey or general summary of something. Lexico

Synopsis is a REST API to a document summarisation service. It's not intended for production use; it's intended to show some nice people that I can make a REST API in under four hours.

Installation

The easiest way to install Synopsis is by first installing Poetry. Next, clone this repository and run poetry install. That's it!

Usage

First, run Synopsis with Flask's builtin server.

poetry run python synopsis/api.py

The documents endpoint provides the usual CRUD functions.

# Create a document
curl http://localhost:5000/documents -d 'text=Hello, World!'

# Get the list of documents
curl http://localhost:5000/documents

# Get the document with the given document_id
curl http://localhost:5000/documents/<document_id>

# Update the document with the given document_id
curl http://localhost:5000/documents/<document_id> -d 'text=Hello, Sailor!' -X PUT

# Delete the document with the given document_id
curl http://localhost:5000/documents/<document_id> -X DELETE

There are several plain-text Wikipedia pages in the documents directory. These are for testing the summaries endpoint.

# Create a document
curl http://localhost:5000/documents -d "text=$(cat documents/Squirrel.txt)"

# Get a summary of the document with the given document_id
curl http://localhost:5000/summaries/<document_id>

QA

poetry run flake8

poetry run pytest

Next steps

The document store. If you restart Flask's builtin server, then you clear the document store. Ouch! We should consider an alternative document store, such as a relational database. Thankfully, this is easy with SQLAlchemy and Flask-SQLAlchemy.

English-language only. Synopsis uses Gensim for document summarisation, but Gensim's summarize function is English-language only. We could modify Synopsis' summarize function to use an alternative implementation. Indeed, we could even write our own: several are described in Text Summarization Techniques: A Brief Survey by Allahyari et al. However, we should probably ask the client for the language - or have Synopsis determine the language - and warn the client when the language isn't English.

How much text is enough text? Gensim should summarize a text of 20,000 characters in about two seconds. (See the performance section of the documentation.) Is this much text enough text? How should we trade this against Synopsis' response time?

Static type checking and docstrings. My future-self is always grateful when my past-self remembered to use static type checking and write docstrings, especially for helper functions such as get_next_document_id.

More tests. We should test the methods that call abort_if_document_does_not_exist and abort_if_text_is_missing under failure, as well as success, conditions.

Request parsing. Synopsis uses Flask-RESTful to encourage best practices with minimal setup, but Flask-RESTful's request parser is "slated for removal". We should consider using an alternative request parser, such as marshmallow.

Curl. Why does curl http://localhost:5000/documents -d "text=@documents/Squirrel.txt" assign the string @documents/Squirrel.txt to text? It should assign the contents of the file to text!

Credits

Much of api.py is based on the full example from Flask-RESTful's documentation.

The client fixture in test_api.py is based on the testing skeleton in Flask's documentation.

About

Synopsis is a REST API to a document summarisation service

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages