Skip to content

UCREL/pymusas

Repository files navigation

PyMUSAS

Python Multilingual Ucrel Semantic Analysis System, is a rule based token and Multi Word Expression semantic tagger. The tagger can support any semantic tagset, however the tagset we have concentrated on and released pre-configured spaCy components for is the Ucrel Semantic Analysis System (USAS).


CI License

PyPI Version Supported Python Versions

Number of PyMUSAS PyPI downloads for the last month Launch Binder

Documentation

  • 📚 Usage Guides - What the package is, tutorials, how to guides, and explanations.
  • 🔎 API Reference - The docstrings of the library, with minimum working examples.
  • 🚀 Roadmap

Language support

PyMUSAS currently support 10 different languages with pre-configured spaCy components that can be downloaded, each language has it's own guide on how to tag text using PyMUSAS. Below we show the languages supported, if the model for that language supports Multi Word Expression (MWE) identification and tagging (all languages support token level tagging by default), and size of the model:

Language (BCP 47 language code) MWE Support Size
Mandarin Chinese (cmn) ✔️ 1.28MB
Welsh (cy) ✔️ 1.09MB
Spanish, Castilian (es) ✔️ 0.20MB
Finnish (fi) 0.63MB
French (fr) 0.08MB
Indonesian (id) 0.24MB
Italian (it) ✔️ 0.50MB
Dutch, Flemish (nl) 0.15MB
Portuguese (pt) ✔️ 0.27MB
English (en) ✔️ 0.88MB

Install PyMUSAS

Can be installed on all operating systems and supports Python version >= 3.10 < 3.14, to install run:

pip install pymusas

If using uv:

uv add pymusas

Development

Setup

You can either use the dev container with your favourite editor, e.g. VSCode. Or you can create your setup locally below we demonstrate both.

In both cases they share the same tools, of which these tools are:

  • uv for Python packaging and development
  • node for building and serving the documentation.
  • make (OPTIONAL) for automation of tasks, not strictly required but makes life easier.

Dev Container

A dev container uses a docker container to create the required development environment, the Dockerfile we use for this dev container can be found at ./.devcontainer/Dockerfile. To run it locally it requires docker to be installed, you can also run it in a cloud based code editor, for a list of supported editors/cloud editors see the following webpage.

To run for the first time on a local VSCode editor (a slightly more detailed and better guide on the VSCode website):

  1. Ensure docker is running.
  2. Ensure the VSCode Dev Containers extension is installed in your VSCode editor.
  3. Open the command pallete CMD + SHIFT + P and then select Dev Containers: Rebuild and Reopen in Container

You should now have everything you need to develop, uv, node, npx, yarn, make, for VSCode various extensions like Pylance, etc.

If you have any trouble see the VSCode website..

Local

To run locally first ensure you have the following tools installted locally:

  • uv for Python packaging and development. (version 0.9.6)
  • node using nvm with yarn. (version 24). After installing this run the following cd docs && corepack use yarn it install the correct version of yarn.
  • make (OPTIONAL) for automation of tasks, not strictly required but makes life easier.
    • Ubuntu: apt-get install make
    • Mac: Xcode command line tools includes make else you can use brew.
    • Windows: Various solutions proposed in this blog post on how to install on Windows, inclduing Cygwin, and Windows Subsystem for Linux.

When developing on the project you will want to install the Python package locally in editable format with all the extra requirements, this can be done like so:

uv sync

Running linters and tests

This code base uses isort, flake8 and mypy to ensure that the format of the code is consistent and contain type hints. ISort and mypy settings can be found within ./pyproject.toml and the flake8 settings can be found in ./.flake8. To run these linters:

make lint

To run the tests with code coverage (NOTE these are the code coverage tests that the Continuos Integration (CI) reports at the top of this README, the doc tests are not part of this report):

make tests

To run the doc tests, these are tests to ensure that examples within the documentation run as expected:

make doc-tests

Setting a different default python version

The default or recommended Python version is shown in [.python-version](./.python-version, currently 3.12, this can be changed using the uv command:

uv python pin
# uv python pin 3.13

Team

PyMUSAS is an open-source project that has been created and funded by the University Centre for Computer Corpus Research on Language (UCREL) at Lancaster University. For more information on who has contributed to this code base see the contributions page.

Packages

No packages published

Contributors 6

Languages