Iykeln/IgboNLP

IgboNLP

This project aims to develop NLP resources to support NLP-based research in Igbo, a low-resource African language spoken mainly in the eastern part of Nigeria by the Igbo people. It has about 45 million speakers and over 20 dialects.

IgboTaggedCorpus

IgboTaggedCorpus is a folder in the IgboNLP repo that contains 10 files of part-of-speech (POS) tagged Igbo text. The corpus covers several genres: novel, religion, story, news, poem and essay.

Files whose names start with fiction come from a novel and files whose names start with ITC are the religious corpus; the remaining files cover story, news, poem and essay.

The files are POS annotated in three categories:

CoarseTagged.corpus: coarse-grained, with common grammatical tags.

finegrained.corpus: fine-grained, with tags extended from those in CoarseTagged.corpus.

stripXS.corpus: coarse-grained, obtained by collapsing the _XS tags in finegrained.corpus.
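As a rough illustration of how a stripXS-style file relates to a fine-grained one, the sketch below removes a trailing _XS marker from each tag. It rests on assumptions not stated in this README: that each line holds whitespace-separated token/tag pairs joined by "/", and that collapsing means dropping the _XS suffix. The function names, the separator, the placeholder tokens and the example tags are hypothetical, so check the corpus files and the papers listed below before relying on it.

```python
def strip_xs(tag: str, marker: str = "_XS") -> str:
    """Return the tag without a trailing marker, if it has one."""
    return tag[: -len(marker)] if tag.endswith(marker) else tag


def collapse_line(line: str, sep: str = "/") -> str:
    """Collapse _XS tags in one line of token/tag pairs.

    Assumption: pairs are whitespace-separated and joined by `sep`;
    the real corpus layout may differ.
    """
    out = []
    for pair in line.split():
        # Split on the last separator so tokens containing `sep` survive.
        token, found, tag = pair.rpartition(sep)
        # Items without a separator are passed through unchanged.
        out.append(f"{token}{sep}{strip_xs(tag)}" if found else pair)
    return " ".join(out)


if __name__ == "__main__":
    # Placeholder tokens and hypothetical tags, for illustration only.
    print(collapse_line("w1/PRN w2/VrV_XS w3/NNC"))
```

Splitting on the last separator keeps tokens that themselves contain "/" intact, and untagged items are left alone rather than dropped.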

For quick reference, kindly consult these papers:

https://www.aclweb.org/anthology/W14-4914.pdf

http://etheses.whiterose.ac.uk/17043/1/Ikechukwu.E.Onyenwe-PHD-ThesisComplete_version.pdf

https://www.researchgate.net/publication/333333916_Toward_an_Effective_Igbo_Part-of-Speech_Tagger

https://www.researchgate.net/publication/322407750_A_Basic_Language_Resource_Kit_Implementation_for_the_Igbo_NLP_Project

https://dl.acm.org/doi/pdf/10.1145/3146387

https://dl.acm.org/doi/10.1145/3314942

Reference

Kindly cite these papers if you use our IgboTaggedCorpus:

Ikechukwu E. Onyenwe, Mark Hepple, Uchechukwu Chinedu, and Ignatius Ezeani. 2018. A Basic Language Resource Kit Implementation for the IgboNLP Project. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 17, 2, Article 10 (February 2018), 23 pages. DOI:https://doi.org/10.1145/3146387

Ikechukwu E. Onyenwe, Mark Hepple, Uchechukwu Chinedu, and Ignatius Ezeani. 2019. Toward an Effective Igbo Part-of-Speech Tagger. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 18, 4, Article 42 (August 2019), 26 pages. DOI:https://doi.org/10.1145/3314942

Publications

Ikechukwu E. Onyenwe, Mark Hepple, Uchechukwu Chinedu, and Ignatius Ezeani. Toward an Effective Igbo Part-of-Speech Tagger. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 18, 4, Article 42 (May 2019), 26 pages. DOI:https://doi.org/10.1145/3314942.

Onyenwe Ikechukwu, E., Onyedinma Ebele, G., & Aniegwu Godwin, E. Bootstrapping Method For Developing Part-Of-Speech Tagged Corpus In Low Resource Languages Tagset - A Focus On An African Igbo. https://arxiv.org/abs/1903.05225. Published in the International Journal on Natural Language Computing (IJNLC), 8(1), 13.

Ignatius Ezeani, Mark Hepple, Ikechukwu Onyenwe, Chioma Enemuo. 2018. Multi-task Projected Embedding for Igbo. Text, Speech, and Dialogue (TSD), Faculty of Informatics, Masaryk University. Springer International Publishing.

Ignatius Ezeani, Ikechukwu Onyenwe and Mark Hepple. Transferred Embeddings for Igbo Similarity, Analogy, and Diacritic Restoration. Association for Computational Linguistics (ACL) Anthology on Semantic Deep Learning, 2018, Pages 30-38. https://www.aclweb.org/anthology/W18-4004/.

Ikechukwu E. Onyenwe, Mark Hepple, Uchechukwu Chinedu, and Ignatius Ezeani. 2018. A Basic Language Resource Kit Implementation for the IgboNLP Project. Association for Computing Machinery (ACM) Trans. Asian Low-Resour. Lang. Inf. Process. 17, 2, Article 10 (January 2018), 23 pages. DOI:https://doi.org/10.1145/3146387.

Ezeani, Ignatius, Mark Hepple, Ikechukwu Onyenwe, and Enemouh Chioma. Igbo Diacritic Restoration using Embedding Models. Association for Computational Linguistics (ACL) Anthology of North American Chapter of the Association for Computational Linguistics, Pages 54–60, June 2018. DOI: http://dx.doi.org/10.18653/v1/N18-4008.

David Wright and Ikechukwu E. Onyenwe. Predatory discourses and the incitement of violence against women in an online discussion forum. 2018. The 9th Inter-Varietal Applied Corpus Studies (IVACS) Conference Language, Communities & Mobility, University of Malta June 13 – 15 2018.

Ignatius Ezeani, Mark Hepple and Ikechukwu Onyenwe. 2017. Lexical Disambiguation of Igbo using Diacritic Restoration. Association for Computational Linguistics (ACL) Anthology on Sense, Concept and Entity Representations and their Applications, 2017, Pages 53–60, April 2017. DOI: http://dx.doi.org/10.18653/v1/W17-1907

Chioma Enemouh, Mark Hepple, Ignatius Ezeani, and Ikechukwu Onyenwe. 2017. Morph-Inflected Word Detection in Igbo via Bitext. Widening Natural Language Processing (WiNLP), Association for Computational Linguistics (ACL) 2017.

Onyenwe Ikechukwu E. and Mark Hepple. 2016. Predicting Morphologically-Complex Unknown Words in Igbo. Text, Speech, and Dialogue (TSD), Springer International-Verlag in Artificial Intelligence, Volume 9924 1. DOI: https://doi.org/10.1007/978-3-319-45510-5_24.

Ignatius Ezeani, Mark Hepple, Ikechukwu Onyenwe. 2016. Automatic Restoration of Diacritics for Igbo Language. Text, Speech, and Dialogue (TSD), Springer International-Verlag. Volume 9924 1. DOI: https://doi.org/10.1007/978-3-319-45510-5_23.

Onyenwe Ikechukwu, Mark Hepple and Chinedu Uchechukwu. 2016. Improving Accuracy of Igbo Corpus Annotation Using Morphological Reconstruction and Transformation-Based Learning. Proceedings of TALAf 2016: Traitement Automatique des Langues Africaines (TALAf 2016: African Language Processing), 23rd French Conference on Natural Language Processing, JEP-TALN-RECITAL 2016 at Inalco, Paris, 2016. Publisher: ATALA/AFCP, pages 1-10. https://talaf.imag.fr/2016/Diapos/OnyenweIK.pdf.

Onyenwe Ikechukwu, Mark Hepple, Chinedu Uchechukwu, and Ignatius Ezeani. 2015. Use of Transformation-Based Learning in Annotation Pipeline of Igbo, an African Language. Association for Computational Linguistics (ACL), Pages: 24–33, September 2015. https://www.aclweb.org/anthology/W15-5405/.

Onyenwe Ikechukwu E., Chinedu Uchechukwu, and Mark Hepple. 2014. Part-of-speech Tagset and Corpus Development for Igbo, an African Language. Association for Computational Linguistics (ACL), Pages: 93–98, August 2014. DOI: http://dx.doi.org/10.3115/v1/W14-4914.
