Sangkak-Challenge-IA

Let's challenge the NLP issues of the old continent in a different way...

creativecommons

Kaggle MIT scikit-learn TensorFlow Pandas Matplotlib PyTorch Linux Git


| Other languages | Status |
| --- | --- |
| -> Français | OK |
| -> ... | ... |

SANGKAK-CHALLENGE-IA is a challenge among data scientists and Natural Language Processing (NLP) researchers and engineers, aimed at building concrete artificial intelligence solutions on open-source datasets in African languages.

SANGKAK can be translated as "to calculate while playing" in Yémba (a language spoken in the Menoua department in the West of Cameroon).

Why create this challenge?

Africa has an unparalleled cultural and linguistic heritage. Its 3,000 languages are still among the most under-resourced in the world, despite all the initiatives launched on the continent in recent years. The challenges are considerable, but today we have a major asset for radically changing things: the technologies and applications of data science.

Working groups have sprung up on the continent in recent years and have produced significant amounts of structured and unstructured resources for African languages. In addition to the lexicographic resources of the NTeALan Social Network association, we can also cite those of INALCO, the Masakhane collective, Google Research, Meta and many other organizations and universities around the world.

Resources therefore exist and, even though a good part of them are private, they should now be used to create value within the linguistic communities concerned. This also means identifying local issues and finding possible links between those issues and the available resources. This is one of the main reasons for this challenge.

February 2023 edition

...

Organizational information

| | | Status |
| --- | --- | --- |
| Official website | https://sangkak-challenge-ia.ntealan.net | OK |
| Working languages | FR, EN | OK |
| Slack community | sangkak-challenge-ia | OK |
| Edition | February 2023 | OK |
| Topic | Named Entity Recognition (NER) | OK |
| Source data | MasakhaNER 2.0 | OK |
| African languages | bbj, bam, ewe, fon, hau, ibo, kin, lug, mos, nya, pcm, sna, swa, tsn, twi, wol, xho, yor, zul | OK |
| Workshop session | 04 September 2023 | OK |
| Publication submitted | ... | In progress |
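For readers new to the topic, here is a minimal sketch of what NER data typically looks like and how entities can be read back from it. It assumes a CoNLL-style layout (one `token tag` pair per line, a blank line between sentences, BIO tags such as `B-PER` and `I-PER`). The sample sentences, the function names `parse_conll` and `extract_entities`, and the tag set shown are illustrative assumptions, not taken from the challenge data.

```python
from typing import List, Tuple

Sentence = List[Tuple[str, str]]  # list of (token, BIO tag) pairs

# Illustrative sample only (assumption): not taken from the challenge corpus.
SAMPLE = """\
Amina B-PER
Diallo I-PER
visited O
Dschang B-LOC

NTeALan B-ORG
met O
in O
February B-DATE
2023 I-DATE
"""


def parse_conll(text: str) -> List[Sentence]:
    """Split CoNLL-style text into sentences of (token, tag) pairs."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in text.splitlines():
        line = line.strip()
        if not line:  # a blank line closes the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        token, tag = line.rsplit(maxsplit=1)  # the tag is the last column
        current.append((token, tag))
    if current:
        sentences.append(current)
    return sentences


def extract_entities(sentence: Sentence) -> List[Tuple[str, str]]:
    """Group BIO tags into (entity text, entity type) spans."""
    entities: List[Tuple[str, str]] = []
    tokens: List[str] = []
    label = None
    for token, tag in sentence:
        if tag.startswith("B-"):  # a new entity starts here
            if tokens:
                entities.append((" ".join(tokens), label))
            tokens, label = [token], tag[2:]
        elif tag.startswith("I-") and label == tag[2:]:
            tokens.append(token)  # continuation of the open entity
        else:  # "O", or an I- tag that does not continue the open entity
            if tokens:
                entities.append((" ".join(tokens), label))
            tokens, label = [], None
    if tokens:
        entities.append((" ".join(tokens), label))
    return entities


if __name__ == "__main__":
    for sent in parse_conll(SAMPLE):
        print(extract_entities(sent))
```

Comparing the spans extracted from gold tags with those extracted from predicted tags is one common basis for entity-level evaluation; the exact metric used in the challenge is not specified here.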

Important steps to remember

| | Steps | Status |
| --- | --- | --- |
| 1 | Defining the topic of the session | OK |
| 2 | Writing the session specifications | OK |
| 3 | Creation of the GitHub directory of the session | OK |
| 4 | Creation of the challenge website (1st edition) | OK |
| 5 | Opening of applications for the session | OK |
| 6 | Selection of participants for the session | OK |
| 7 | Beginning of the challenge among the participants | OK |
| 8 | End of the challenge for the participants | OK |
| 9 | Scheduling of the workshop | OK |
| 10 | Beginning of the workshop (debate on the proposed solutions) | OK |
| 11 | End of the workshop (debate on the proposed solutions) | OK |
| 12 | Drafting of the work report | OK |
| 13 | Publication of work | In progress |
| 14 | End of the challenge session | OK |

How to participate in this February 2023 session?

...

Context

...

Goals

...

Participate in the February 2023 session

To participate in this session and challenge other participants:

  • Each participant or group of participants must first obtain the corpus African POS Datasets by cloning this git repository.

  • You must then create a repository in your own GitHub space following this structure:

    • /data_source (reference to the Masakhane NER corpus and any optional corpora)
    • /evaluation
    • /training
    • methodology.md
    • license.md
  • You must then submit your solution following this structure. You are free to add other folders or files of your choice.

  • Name your folder with the initials of the challenge followed by those of your project (example: SCIA-ENR, where ENR stands for the initials of your project), then create a branch named after a version number (example: 0.1) and push it to your personal GitHub repository. You can also fork this sample directory, which gives you a preview of this structure. (We will use this link as a git submodule in the proposals folder of this official challenge repository.)

  • Go back to this repository and enter your proposal in the PARTICIPANTS file, filling in the fields provided. Then open a pull request so that the organizing committee can validate your proposal and link your repository to this official repository.

Please follow this procedure carefully so that the organizing committee can more easily integrate your work into the official repository.
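The folder structure described above can be sketched as a small script. This is only an illustration under stated assumptions: the function names `scaffold` and `missing_entries` are hypothetical, and the `.gitkeep` placeholder files are a common convention for letting git track empty folders, not a requirement of the challenge.

```python
import tempfile
from pathlib import Path
from typing import List

CHALLENGE_PREFIX = "SCIA"  # initials of the challenge, as in the SCIA-ENR example
REQUIRED_DIRS = ("data_source", "evaluation", "training")
REQUIRED_FILES = ("methodology.md", "license.md")


def scaffold(parent: Path, project_initials: str) -> Path:
    """Create the expected submission layout under `parent` (hypothetical helper)."""
    root = parent / f"{CHALLENGE_PREFIX}-{project_initials}"
    for name in REQUIRED_DIRS:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        # Assumption: a placeholder file so that git tracks the empty folder.
        (folder / ".gitkeep").touch()
    for name in REQUIRED_FILES:
        (root / name).touch()  # empty files, to be filled in by the participant
    return root


def missing_entries(root: Path) -> List[str]:
    """List required folders and files that are absent from `root`."""
    missing = [d for d in REQUIRED_DIRS if not (root / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    # Demonstration in a throwaway directory; use your own path in practice.
    with tempfile.TemporaryDirectory() as tmp:
        project_root = scaffold(Path(tmp), "ENR")
        print(project_root.name)
        print(missing_entries(project_root))
```

Once the folder exists, the versioned branch from the procedure could be created with standard git commands such as `git checkout -b 0.1` followed by `git push -u origin 0.1`, assuming your personal GitHub repository is configured as the remote named `origin`.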

Organizing committee

This challenge is organized by NTeALan Research and Development in collaboration with NTeALan Cameroon and NTeALan France.

  • Elvis MBONING (Lead Data scientist NLP/NLU/Chatbot)
  • Jean-Marc Bassahak (Lead Motion Design and web developer)
  • Jules Assoumou (Vice-Rector of the University of Ngaoundéré)
  • Tatiana Moteu (Data Scientist / PhD Student)
  • The entire research team of NTeALan Research and Development...

For any additional questions, do not hesitate to contact the challenge's organizing committee by email or on the Slack platform.

Participants of the session

...

Winner of the session

...

Sponsors

This challenge is currently sponsored by:

  • The ERTIM research team of INALCO
  • NTeALan Social Network association