This is an attempt at using Weboob as a
Cozy Konnector.
It wraps around Weboob, receiving a JSON description of the modules to fetch
on stdin
and returning a JSON of the fetched results on stdout
.
Although the primary goal is to wrap around Weboob to use it in Cozy, this script might be of interest for anyone willing to wrap around Weboob and communicate with JSON pipes.
First, you need to have Weboob installed on your system.
Typical command-line usage for this script is:
cat konnectors.json | python -m cozyweboob.__main__
where konnectors.json
is a valid JSON file defining konnectors to be used.
Typical command-line usage for this script is:
./server.py
This script spawns a Bottle webserver, listening on localhost:8080
(by
default).
It exposes a couple of routes:
-
the
/fetch
route, which supportsPOST
method to send a valid JSON string defining konnectors to be used as the request body. Typical example to send it some content is:curl -X POST --data "$(cat konnectors.json)" "http://localhost:8080/"
where
konnectors.json
is a valid JSON file defining konnectors to be used. Downloaded files will be stored in a temporary directory, and their file URI will be passed back in the output JSON. If you do not have a direct access to the filesystem, you can use the/retrieve
endpoint below to retrieve such downloaded files through the network. -
the
/list
route, which will provide you a JSON dump of all the available modules, their descriptions and the configuration options you should provide them. -
the
/retrieve
route, which supportsPOST
method and a singlepath
POST
parameter which is the path to the previously downloaded file to retrieve. Note that this route will not delete the temporary file whose content has been retrieved, and you should delete it manually. -
the
/clean
route (POST
method), which will delete all temporary downloaded files. This route will return a JSON list of deleted folders.
IMPORTANT: Note this small webserver is not production ready and only
here as a proof of concept and to be used in a controlled development
environment. The /retrieve
route will basically provide anyone to access any
file from your temp directory, which is a real security concern in production.
Note: You can specify the host and port to listen on using the
COZYWEBOOB_HOST
and COZYWEBOOB_PORT
environment variables.
There is another command-line script available if you would rather communicate
with it in a conversation manner, using stdin
and stdout
(typically to
integrate it with Node modules using
Python-shell). To run it, use:
./stdin_conversation.py
Then, you can write on stdin
and fetch the responses from stdout
.
Available commands are:
GET /list
to list all available modules.POST /fetch JSON_PARAMS
whereJSON_PARAMS
is an input JSON for module parameters. Downloaded files will be stored in a temporary directory, and their file URI will be passed back in the output JSON.POST /clean
to clean temporary downloaded files.exit
to quit the script and end the conversation.
JSON responses are the same one as from the HTTP server script. It is basically the same script without HTTP encapsulation.
Note: To simplify the script, note that it only supports single line
commands. Then, your JSON_PARAMS
should be the same single stdin
line as
the GET /fetch
part.
Using COZYWEBOOB_ENV=debug
, you can enable debug features for all of these
scripts, which might be useful for development. These features are:
- Logging
- If you pass a blank field in a JSON konnector description
(typically
password: ""
), the script will ask you its value at runtime, usinggetpass
.
The JSON file read on stdin
should have a specific structure. A typical
example is given in konnectors.json.sample
.
Basically, it consists of a list of maps. Each map corresponds to a given Weboob module to run, with a given set of parameters (then allowing the script to run multiple times the same module with different configurations). Each map should have at the following three keys:
name
is the name of the Weboob module to run (same name as used in Weboob).parameters
is a map of parameters to use for this particular module, as required by the associated Weboob backend.id
should be a unique string of your choice, to uniquely identify this run of the specified module with the specified set of parameters.actions
is an optional list of actions to perform. It should contains two keys,fetch
anddownload
. For each key, you can either passtrue
to completely handle the actions, or a map of capabilities associated to list of contents to fetch. Typically, you can pass"fetch": { "CapDocument": ["bills"]}
to fetch only bills from theCapDocuments
capability. You can also pass"download": { "CapDocument": ["someID"] }
to download a specific document, identified by its ID. If not provided, the default is to fetch only, and do not download anything.
The resulting JSON file, on stdout
is a map associating the id
fields as
provided in input JSON file to a map of fetched data by this module.
Each module map has a cookies
entry containing the cookies used to fetch the
data, so that any program running afterwards can download documents.
Important note: Most of such websites have very short lived sessions,
meaning in most cases these cookies
will be useless for extra download as
the session will most likely be destroyed on the server side.
The other entries in these maps depend on the module capabilities as defined
by Weboob. Detailed informations about these other entires can be found in the
doc/capabilities
folder.
All contributions are welcome. Feel free to make a PR :)
Python code is currently Python 2, but should be Python 3 compatible as Weboob is moving towards Python 3. All Python code should be PEP8 compliant. I use some extra rules, taken from PyLint.
The content of this repository is licensed under an MIT license, unless explicitly mentionned otherwise.