Sangkak-Challenge-IA
Let's challenge the NLP issues of the old continent in a different way...
Other languages | Status |
---|---|
-> Français | OK |
-> ... | ... |
SANGKAK-CHALLENGE-IA is a challenge for data scientists and Natural Language Processing (NLP) researchers and engineers, aimed at building concrete artificial intelligence solutions on open-source datasets in African languages.
SANGKAK can be translated as "to calculate while playing" in Yémba (a language spoken in the Menoua department in the West of Cameroon).
Africa has an unprecedented cultural and linguistic heritage. Its 3,000 languages are still among the most under-resourced in the world, despite all the initiatives launched on the continent in recent years. The challenges are considerable, but we now have a major asset for radically changing things: the technologies and applications of data science.
Working groups have sprung up on the continent in recent years and have produced significant amounts of structured and unstructured resources for African languages. In addition to the lexicographic resources of the NTeALan Social Network association, we can also cite those of INALCO, the Masakhane collective, Google Research, Meta, and many other organizations and universities around the world.
Resources exist, and even though a good part of them are private, they should now be exploited to create value within the linguistic communities concerned. This also involves identifying local issues and finding possible links between these issues and the available resources. This is one of the main reasons for this challenge project.
...
Item | Details | Status |
---|---|---|
Official website | https://sangkak-challenge-ia.ntealan.net | OK |
Working languages | FR, EN | OK |
Slack Community | OK | |
Edition | February 2023 | OK |
Topic | Named Entity Recognition (NER) | OK |
Source data | MasakhaNER 2.0 | OK |
African languages | bbj, bam, ewe, fon, hau, ibo, kin, lug, mos, nya, pcm, sna, swa, tsn, twi, wol, xho, yor, zul | OK |
Workshop session | 04 September 2023 | OK |
Publication submitted | ... | In progress |
Step | Description | Status |
---|---|---|
1 | Defining the topic of the session | OK |
2 | Writing session specifications | OK |
3 | Creation of the GitHub repository of the session | OK |
4 | Creation of the challenge website (1st edition) | OK |
5 | Opening of applications for the session | OK |
6 | Selection of participants for the session | OK |
7 | Beginning of the challenge among the participants | OK |
8 | End of the challenge for the participants | OK |
9 | Scheduling of the workshop | OK |
10 | Beginning of the workshop (debate on the proposed solutions) | OK |
11 | End of the workshop (debate on the proposed solutions) | OK |
12 | Drafting of the work report | OK |
13 | Publication of work | In progress |
14 | End of the challenge session | OK |
...
...
...
To participate in this session and challenge other participants:
- Each participant or group of participants must first obtain the African POS Datasets corpus by cloning this git repository.
- You must then create a repository in your own GitHub space following this structure:
  - /data_source (reference to the Masakhane NER corpus and any optional corpora)
  - /evaluation
  - /training
  - methodology.md
  - license.md
- You must then submit your solution following this structure. You are free to add other folders or files of your choice.
- Rename your folder with the initials of the challenge followed by those of your project (Example: SCIA-ENR, ENR being the initials of your project), then create a branch indicating a version number (Example: 0.1) and push it to your personal GitHub repository. You can also fork this sample repository, which gives you a preview of this structure. (We will use this link as a git submodule of the proposals folder in this official challenge repository.)
- Go back to this repository and enter your proposal in the PARTICIPANTS file according to the fields provided. Then open a pull request so that the organizing committee can validate your proposal and link your repository to this official repository.
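The folder layout requested above can be sketched in Python as follows. This is only an illustration and not part of the challenge tooling: the helper names `scaffold_proposal` and `missing_entries` are hypothetical, and the `.gitkeep` placeholder files are an assumption added so that Git tracks otherwise empty folders.

```python
from pathlib import Path

# Layout taken from the participation steps; file contents are left empty.
FOLDERS = ("data_source", "evaluation", "training")
FILES = ("methodology.md", "license.md")


def scaffold_proposal(root: Path) -> list[str]:
    """Create the required folders and files under `root` (hypothetical helper).

    Returns the relative paths that exist afterwards, sorted, so the
    result can be reviewed before the first commit.
    """
    root.mkdir(parents=True, exist_ok=True)
    for folder in FOLDERS:
        folder_path = root / folder
        folder_path.mkdir(exist_ok=True)
        # Assumption: Git does not track empty directories, so a
        # placeholder file is added to each folder.
        (folder_path / ".gitkeep").touch()
    for name in FILES:
        (root / name).touch()
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


def missing_entries(root: Path) -> list[str]:
    """List required folders or files that are absent from `root`."""
    required = [*FOLDERS, *FILES]
    return [name for name in required if not (root / name).exists()]
```

For example, calling `scaffold_proposal(Path("SCIA-ENR"))` would create the skeleton for the example project name used above, and `missing_entries(Path("SCIA-ENR"))` can be run before opening the pull request to check that nothing required has been deleted. Creating the version branch and pushing it remain ordinary git operations and are not covered by this sketch.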
Please follow this procedure carefully so that the organizing committee can integrate your work into the official repository more easily.
This challenge is organized by NTeALan Research and Development in collaboration with NTeALan Cameroon and NTeALan France.
- Elvis MBONING (Lead Data scientist NLP/NLU/Chatbot)
- Jean-Marc Bassahak (Lead Motion Design and web developer)
- Jules Assoumou (Vice-Rector of the University of Ngaoundéré)
- Tatiana Moteu (Data Scientist / PhD Student)
- The entire research team of NTeALan Research and Development...
For any additional questions, do not hesitate to contact the challenge's organizing committee by email or on the Slack platform.
...
...
This challenge is currently sponsored by:
- The ERTIM research team of INALCO
- NTeALan Social Network association