Swedish Information Extraction Tool

A command-line information extraction tool for Swedish text. The tool performs natural language processing tasks such as parsing, named entity recognition and information extraction. Input can be either custom files placed in the input_data directory or sample data generated from the SUC 3.0 corpus in training_data.

This tool was written alongside my undergraduate dissertation, which explored natural language processing with Swedish text.

Installation

The tool depends on several packages, which must be installed before use. Installing them with pip, the Python package installer, is recommended:

$ pip3 install -r requirements.txt

Once the packages are installed, the tool can be run using the command:

$ python3 sv_information_extraction mode source

mode can be one of: parse for the parser module, ner for the named entity recognition (NER) module, or ie for the information extraction module.

source can be either a filename, such as sample_swedish.txt, which is provided out of the box, or --sample which uses excerpts from SUC 3.0. In order to use the SUC 3.0 corpus, it must first be downloaded from here and placed in the training_data folder with the filename "suc3.xml".
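The mode and source arguments described above can be sketched as follows. This is only an illustration of the documented command-line interface, not the tool's actual implementation: parse_cli and resolve_source are hypothetical helpers, and the assumption that a plain filename is looked up inside input_data is inferred from the description at the top of this README.

```python
import argparse
from pathlib import Path

MODES = ("parse", "ner", "ie")  # the three modes named in this README


def parse_cli(argv):
    """Parse `mode` plus either a filename or --sample (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="sv_information_extraction")
    parser.add_argument("mode", choices=MODES)
    parser.add_argument("source", nargs="?", help="name of a file in input_data")
    parser.add_argument("--sample", action="store_true",
                        help="use excerpts from SUC 3.0")
    args = parser.parse_args(argv)
    # Exactly one of a filename or --sample must be supplied.
    if args.sample == (args.source is not None):
        parser.error("give either a filename or --sample")
    return args


def resolve_source(args, base_dir="."):
    """Return the path the chosen source refers to, or raise if it is missing."""
    base = Path(base_dir)
    if args.sample:
        # --sample needs the corpus saved as training_data/suc3.xml.
        path = base / "training_data" / "suc3.xml"
    else:
        # Assumption: custom files are read from the input_data directory.
        path = base / "input_data" / args.source
    if not path.is_file():
        raise FileNotFoundError(f"expected input at {path}")
    return path
```

With these helpers, parse_cli(["ner", "--sample"]) selects the ner mode with SUC 3.0 excerpts, and resolve_source raises FileNotFoundError if suc3.xml has not yet been placed in training_data.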

Requirements

Python 3 is recommended for this tool, as Python 2 has been officially deprecated.

The tool requires a model to operate. Models can be created through custom spaCy pipelines or downloaded from the internet. The model used during development was the sv_model_xpos model available here. Both UPOS-tagged and XPOS-tagged models are available: the XPOS model uses Swedish-specific tags, while the UPOS model uses universal tags. The two differ very little in performance, and either is sufficient.
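To make the difference between the two tag sets concrete, here is a minimal sketch that maps a few Swedish-specific (SUC-style) XPOS tags onto their closest universal (UPOS) tags. The table is partial and approximate (for example, VB covers verbs that Universal Dependencies may tag as either VERB or AUX), and both the to_upos helper and the "|" feature separator are assumptions made for illustration rather than part of the models themselves.

```python
# Partial, illustrative mapping from SUC-style XPOS tags to Universal POS tags.
XPOS_TO_UPOS = {
    "NN": "NOUN",   # noun
    "PM": "PROPN",  # proper noun
    "VB": "VERB",   # verb (approximate: some verbs are AUX in UPOS)
    "JJ": "ADJ",    # adjective
    "AB": "ADV",    # adverb
    "PP": "ADP",    # preposition
    "DT": "DET",    # determiner
    "PN": "PRON",   # pronoun
    "KN": "CCONJ",  # coordinating conjunction
    "SN": "SCONJ",  # subordinating conjunction
    "RG": "NUM",    # cardinal number
}


def to_upos(xpos_tag):
    """Map an XPOS tag to a UPOS tag, returning "X" for tags not in the table."""
    # Assumption: morphological features follow the tag after a "|",
    # e.g. "NN|UTR|SIN|IND|NOM"; only the leading part-of-speech code is used.
    coarse = xpos_tag.split("|", 1)[0]
    return XPOS_TO_UPOS.get(coarse, "X")


# "Anna bor i Stockholm" tagged with XPOS-style tags: PM VB PP PM
print([to_upos(tag) for tag in ["PM", "VB", "PP", "PM"]])
```

The Swedish-specific tags carry finer distinctions than the universal ones, which is why the mapping only goes cleanly in one direction.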

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

Note that this repository is a legacy-style archive and is unlikely to be updated.

danstoakes/2021-swedish-information-extraction-cli