MoroccoAI/AMLD-Africa-Workshop-2021

Organizers

Khalil Mrini, Imane Khaouja, Ihsane Gryech, Anass Sedrati and Abdelhak Mahmoudi

Title

Moroccan Darija Wikipedia: Basics of Natural Language Processing for a Low-Resource Language

Description

NLP is in high demand, and research in the field advances quickly. Whereas language technology for languages like English and French is highly developed, low-resource languages (like most African indigenous languages) have been left behind and marginalized. There are many opportunities to create new tools for languages with few resources. In this tutorial, we take the example of Moroccan Darija, the national vernacular of Morocco. Our use-case dataset will be the Moroccan Darija Wikipedia.

In the tutorial, participants will first learn statistical tools for analyzing language. The tutorial covers NLP notions including text pre-processing and tokenization, n-gram language modeling, n-gram frequency, topic modeling, and word embeddings, combining theoretical definitions with concrete examples in Python. Participants then move to the practice part of the workshop, in teams of 1 to 5 people. Each team will be given the Moroccan Darija Wikipedia dataset and will analyze it from an angle of their choice. At the end of the workshop, the teams will be invited to present their findings in a short talk.
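As a rough illustration of two of the notions listed above (tokenization and n-gram frequency), here is a minimal sketch using only the Python standard library. It is not taken from the workshop labs: the function names `tokenize`, `ngrams`, and `ngram_counts` are hypothetical, the regex-based tokenizer is a simplifying assumption, and the sample sentence is a placeholder standing in for article text from the dataset.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of word characters as tokens.

    Assumption: a plain regex split is enough for a first look. In Python 3,
    \\w matches Unicode letters, so Arabic-script words are kept whole, but
    combining marks such as Arabic diacritics would split a word.
    """
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(texts, n):
    """Count n-grams over several documents without crossing document boundaries."""
    counts = Counter()
    for text in texts:
        counts.update(ngrams(tokenize(text), n))
    return counts


if __name__ == "__main__":
    # Placeholder text; in practice each string would be one article.
    sample = ["The cat sat. The cat ran!"]
    print(ngram_counts(sample, 2).most_common(3))
```

Counting per document, rather than over one concatenated string, avoids creating spurious n-grams that span the end of one article and the start of the next.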

Agenda

  • 3:00-3:10, Introduction
  • 3:10-3:40, Anass Sedrati: Darija Wikipedia
  • 3:40-3:50, Explore the Moroccan Darija Wikipedia
  • 3:50-4:10, Ihsane Gryech: Intro to NLP
  • 4:10-4:40, Imane Khaouja: Lab1: Wikipedia Darija Cleaning
  • 4:40-5:00, 20 min Break
  • 5:00-5:30, Khalil Mrini: Lab2: Wikipedia Darija Topic Detection
  • 5:30-6:00, Abdelhak Mahmoudi: Lab3: NLP Tasks and Tools
  • 6:00-6:20, 20 min Break
  • 6:20-8:20, Explore on your own in breakout rooms
  • 8:20-8:50, Discussion of your results
  • 8:50-9:00, Conclusion

Slides

click here
