Unitex/GramLab Language Resources

Unitex/GramLab is the open source, cross-platform, multilingual, lexicon- and grammar-based corpus processing suite.

This repository contains the Language Resources which are distributed within Unitex/GramLab.

Languages

| Language name | Native name | Language Family | IETF | ISO 639-2 | ISO 639-1 |
|---|---|---|---|---|---|
| Arabic | العربية | Afro-Asiatic | ar | ara | ar |
| Chinese | 汉语/漢語 | Sino-Tibetan | zh | chi/zho | zh |
| English | English | Indo-European | en | eng | en |
| Finnish | Suomi | Uralic | fi | fin | fi |
| French | Français | Indo-European | fr | fra | fr |
| Georgian (ancient) | ქართული | South Caucasian | oge | | |
| German | Deutsch | Indo-European | de | deu | de |
| Greek (ancient) | Αρχαία Ελληνικά | Indo-European | grc | grc | |
| Greek (modern) | Ελληνικά | Indo-European | el | ell | el |
| Italian | Italiano | Indo-European | it | ita | it |
| Korean | 한국어 | Koreanic | ko | kor | ko |
| Latin | Latine | Indo-European | la | lat | la |
| Malagasy | Malagasy | Austronesian | mg | mlg | mg |
| Norwegian Bokmål | Norsk bokmål | Indo-European | no | nob | nb |
| Norwegian Nynorsk | Norsk nynorsk | Indo-European | nn | nno | nn |
| Polish | Polski | Indo-European | pl | pol | pl |
| Portuguese (Brazil) | Português (Brasil) | Indo-European | pt-BR | | |
| Portuguese (Portugal) | Português (Portugal) | Indo-European | pt-PT | | |
| Russian | Русский | Indo-European | ru | rus | ru |
| Serbian-Cyrillic | Српски | Indo-European | sr-Cyrl | sro | sr |
| Serbian-Latin | Serbian (Latin) | Indo-European | sr-Latn | srm | |
| Spanish | Español | Indo-European | es | spa | es |
| Thai | ไทย | Tai–Kadai | th | tha | th |

Contributing

Everyone is welcome to help improve the Unitex/GramLab Language Resources by forking this repository and sending a pull request with their changes.

How to add support for a new language in Unitex

To add a new language to Unitex:

  • Copy the template folder zxx-t-Skel and rename the copy to the ISO 639-1 code of the new language.
  • If no ISO 639-1 code is available for your language, use its IETF language tag instead.
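
The copy step can be sketched in Python as follows. This is only an illustration, not a tool shipped with Unitex/GramLab: the function name `create_language_folder` and its parameters are hypothetical, and the sketch assumes it runs against a local clone of this repository that contains the zxx-t-Skel folder.

```python
import shutil
from pathlib import Path


def create_language_folder(repo_root, code, template="zxx-t-Skel"):
    """Copy the skeleton folder to a new folder named after the language code.

    `code` is the ISO 639-1 code of the new language, or its IETF
    language tag when no ISO 639-1 code exists (hypothetical helper).
    """
    root = Path(repo_root)
    source = root / template
    target = root / code
    if not source.is_dir():
        raise FileNotFoundError(f"template folder not found: {source}")
    if target.exists():
        # Refuse to overwrite a language folder that is already present.
        raise FileExistsError(f"language folder already exists: {target}")
    shutil.copytree(source, target)
    return target
```

For example, `create_language_folder(".", "sv")` would create a `sv` folder next to the template, ready to be filled with the resources listed below.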

Your new language must provide at least:

  • An alphabet file (Alphabet.txt) and optionally a sorted alphabet (Alphabet_sort.txt)
  • A sample corpus (Corpus/Corpus.txt). Make sure you have the rights to share this resource, and provide the author information in Corpus/Corpus.info
  • A sample dictionary (Dela/lang-CODE.dic) containing at least the words of the sample text
  • A sentence delimitation graph (Graphs/Preprocessing/Sentence/Sentence.grf)
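
A quick way to confirm that a language folder provides these minimum resources is a small script like the one below. It is a minimal sketch under stated assumptions: the helper name `missing_resources` is hypothetical, and because the dictionary name Dela/lang-CODE.dic depends on the language, the sketch simply accepts any .dic file directly under Dela/.

```python
from pathlib import Path

# Paths taken from the list above, relative to the language folder.
REQUIRED_FILES = [
    "Alphabet.txt",
    "Corpus/Corpus.txt",
    "Graphs/Preprocessing/Sentence/Sentence.grf",
]


def missing_resources(lang_dir):
    """Return the required resources that are absent from lang_dir."""
    lang = Path(lang_dir)
    missing = [rel for rel in REQUIRED_FILES if not (lang / rel).is_file()]

    # Assumption: any .dic file under Dela/ counts as the sample dictionary.
    dela = lang / "Dela"
    if not dela.is_dir() or next(dela.glob("*.dic"), None) is None:
        missing.append("Dela/*.dic")
    return missing
```

An empty list means all four mandatory resources were found. The optional Alphabet_sort.txt is deliberately not checked.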

Before sharing your contribution, make sure that:

  • File names only use 7-bit ASCII characters.
  • For each compiled graph (.fst2) you are also providing the .grf version.
  • For each dictionary .dic you are also providing a .info file describing the dictionary content (codes used in it, number of entries, authors, etc).
  • You accept the LGPLLR license.
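
The first three checks in this list are mechanical and can be automated before opening a pull request. The following is a minimal sketch, not an official validation tool: the function name `check_contribution` is hypothetical, and it assumes that a .grf source and a .info description share the base name and folder of the .fst2 or .dic file they accompany.

```python
from pathlib import Path


def check_contribution(lang_dir):
    """Return a list of human-readable problems found under lang_dir."""
    problems = []
    for path in sorted(Path(lang_dir).rglob("*")):
        # File and folder names must be plain 7-bit ASCII.
        if not path.name.isascii():
            problems.append(f"non-ASCII name: {path}")
        if not path.is_file():
            continue
        # Assumption: the .grf source sits next to the compiled .fst2.
        if path.suffix == ".fst2" and not path.with_suffix(".grf").is_file():
            problems.append(f"missing .grf for: {path}")
        # Assumption: the .info description sits next to the .dic file.
        if path.suffix == ".dic" and not path.with_suffix(".info").is_file():
            problems.append(f"missing .info for: {path}")
    return problems
```

An empty result means none of these three problems were detected. Accepting the LGPLLR license remains a decision for the contributor and cannot be checked by a script.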

RELEX network

Language Resources are mainly built and maintained by the members of the RELEX network, an international network of laboratories specialized in Computational Linguistics that was created by Maurice Gross and his LADL (Laboratoire d'Automatique Documentaire et Linguistique) team.

| Country | Partner |
|---|---|
| Belgium | Catholic University of Leuven |
| Belgium | CENTAL |
| Brazil | Federal University of Goias |
| Brazil | NILC |
| Brazil | Projeto Relex |
| Brazil | PUC RIO |
| Canada | University of Montréal |
| Denmark | University of Copenhagen |
| England | Research and Development Unit for English Studies |
| France | CRISCO |
| France | EHESS |
| France | LDI |
| France | LIGM |
| France | LIMSI |
| France | LIP6 |
| France | LORIA |
| France | UFRL |
| France | Université de Tours |
| France | University Bordeaux 3 |
| France | University Grenoble 3 |
| France | University of Franche-Comté |
| France | University of Paris-Est Marne-la-Vallée |
| France | University of Rouen |
| France | University of Strasbourg |
| France | University Paris 8 |
| France | University Paris-Sorbonne |
| Germany | CIS, University of Munich |
| Germany | University of Heidelberg |
| Greece | ILSP |
| Greece | University of Thessaloniki |
| Hong Kong | City University of Hong Kong |
| Hungary | Research Institute for Linguistics |
| Israel | University of Tel Aviv |
| Italy | University of Bari |
| Italy | University of Salerno |
| Japan | Information Science Research Center |
| Korea | Hankuk University of Foreign Studies |
| Madagascar | University of Antananarivo |
| Norway | University of Bergen |
| Poland | Adam Mickiewicz University |
| Portugal | LabEL |
| Portugal | University of Algarve |
| Serbia | University of Belgrade |
| Slovakia | The Faculty of Economics |
| Spain | Autonomous University of Barcelona |
| Spain | University of Alicante |
| Switzerland | University of Genève |
| Switzerland | University of Zürich |
| United States | Florida International University |
| United States | New York University |
| United States | University of California San Diego |
| United States | University of North Texas |

Documentation

The User's Manual (in PDF format) is available in English and French (more translations are welcome). You can view and print it with Evince, downloadable here. The latest version of the User's Manual is accessible here.

Support

Support questions can be posted in the community support forum. Please feel free to submit any suggestions or requests for new features too. Some general advice about asking technical support questions can be found here.

Reporting Bugs

See the Bug Reporting Guide for information on how to report bugs.

Governance Model

Unitex/GramLab project decision-making is based on a community meritocratic process: anyone with an interest in the project can join the community, contribute to the project design, and participate in decisions. The Unitex/GramLab Governance Model describes how this participation takes place and how to set about earning merit within the project community.

Spelling

Unitex/GramLab is spelled with capitals "U", "G" and "L", and with everything else in lower case. Apart from the forward slash, do not put a space or any other character between the words. Only where the forward slash is not allowed may you write "UnitexGramLab" instead.

It's common to refer to the Unitex/GramLab Core as "Unitex", and to the Unitex Project-oriented IDE as "GramLab". If you are referring to the distribution suite (Core, IDE, Linguistic Resources and other bundled tools), always use "Unitex/GramLab".

License

Language Resources are distributed under the terms of the Lesser General Public License For Linguistic Resources (LGPLLR). Contact unitex-devel@univ-mlv.fr for further inquiries.


Copyright (C) 2019 Université Paris-Est Marne-la-Vallée
