

![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/Ner_masakhaner.ipynb)






# ***`NER Model for African Languages`***


## 1. Colab Setup

In [None]:
# Install PySpark and Spark NLP
! pip install -q pyspark==3.3.0 spark-nlp==4.2.8

# Install Spark NLP Display lib
! pip install --upgrade -q spark-nlp-display

In [None]:
import pandas as pd
import numpy as np
import json
import os

from pyspark.ml import Pipeline
from pyspark.sql.types import StringType, IntegerType
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

import sparknlp
from sparknlp.annotator import *
from sparknlp.base import *
from sparknlp.pretrained import PretrainedPipeline
from sparknlp_display import NerVisualizer

## 2. Start Spark Session

In [None]:
spark = sparknlp.start()
print ("Spark NLP Version :", sparknlp.version())
spark

Spark NLP Version : 4.2.8


### <font color='green'> 📍***xlm_roberta_large_token_classifier_masakhaner***</font>

*An `xlm_roberta_large` model fine-tuned for NER on African languages (**Amharic, Hausa, Igbo, Kinyarwanda, Luganda, Luo, Nigerian Pidgin, Swahili, Wolof, and Yorùbá**).*


### <font color='green'> 📍***distilbert_base_token_classifier_masakhaner***</font>

*A DistilBERT model fine-tuned on the MasakhaNER dataset for African languages (**Hausa, Igbo, Kinyarwanda, Luganda, Luo, Nigerian Pidgin, Swahili, Wolof, and Yorùbá**), using DistilBert embeddings and `DistilBertForTokenClassification` for NER.*
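Both models are token classifiers: they emit one label per token, and a later stage (the `NerConverter` in the pipeline below) merges those labels into entity chunks. The following is a minimal sketch of that merging idea, assuming IOB-style tags such as `B-PER` / `I-PER` / `O`. The function `merge_iob` is a hypothetical helper written for illustration, not part of Spark NLP, and the tags in the example are hand-written rather than real model output.

```python
def merge_iob(tokens, tags):
    """Merge per-token IOB tags into (chunk_text, label) pairs.

    Illustrative only: assumes tags look like "B-PER", "I-PER" or "O".
    """
    chunks = []
    current_tokens, current_label = [], None

    def flush():
        # Close the chunk that is currently open, if any.
        if current_tokens:
            chunks.append((" ".join(current_tokens), current_label))

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_label):
            # A new entity starts here (an orphan "I-" tag also opens one).
            flush()
            current_tokens, current_label = [token], label
        elif prefix == "I":
            # Continuation of the open entity.
            current_tokens.append(token)
        else:
            # "O": outside any entity.
            flush()
            current_tokens, current_label = [], None
    flush()
    return chunks


# Hand-written tags for one of the Wolof sample sentences.
tokens = ["Usmaan", "Sonkoo", "ngi", "juddoo", "Cees", "ci", "atum", "1974", "."]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "O", "B-DATE", "I-DATE", "O"]
chunks = merge_iob(tokens, tags)
print(chunks)
```

With these hand-written tags the sketch yields the chunks `Usmaan Sonkoo` (PER), `Cees` (LOC) and `atum 1974` (DATE).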


## 3. Sample Texts for Each Language

In [None]:
text_list_amharic = ["""አህመድ ቫንዳ ከ3-10-2000 ጀምሮ በአዲስ አበባ ኖሯል።""","""ሰማያዊ ፓርቲ ዛሬ በወቅታዊ የሀገሪቱ የፖለቲካ ጉዳዮች ላይ በመኢአድ ጽህፈት ቤት የሰጠው ጋዜጣዊ መግለጫ ይከተላል ፡፡""","""የ ዓመቱ አዲሱ የዚምባብዌ ፕሬዚደንት ኤመርሰን ምናንጋግዋ በሁለቱ ቻዎቻቸው አንፃር በዕድሜ ትንሹ ናቸው ።""","""ዶይቸ ቬለ ያነጋገራቸው የመብት ተሟጋቿ ሊንዳ ማዜሪሬ በዕድሙ በተዳከሙ መሪዎች ነው የምንተዳደረው በማለት የርሳቸውን እና የአህጉሩን ወጣት ትውልድ ቅሬታ ገልጸዋል ።""","""ሪታ ፓንክኸርስት የኢትዮጵያ ባለውለታ አዲስ አበባ ላይ ትዳር የመሰረቱት የኢትዮጵያ ታሪክ ተመራማሪዎች በትዳር ከ ዓመታት በላይ ዘልቀዋል ።""","""በሳልስቱ እስራኤል ዉስጥ በተደረገዉ አጠቃላይ ምርጫ አክራሪዉ የጠቅላይ ሚንስትር ቤንያሚን ኔትንያሁ ፓርቲ ሊኩድ አሸነፈ ።"""]

In [None]:
text_list_hausa = ["""A saurari cikakken rahoton wakilin Muryar Amurka Ibrahim Abdul'aziz""","""Najeriya : Kungiyar Ma'aikatan Jami'o'i Ta Shiga Yajin Aikin Gargadi""","""A ranar Juma’a mai zuwa ne wa’adin yajin aikin na gargadi zai kammala , kuma a hirar su da wakilin Muaryar Amurka , Komared Mohammed Jaji ya yi tsokaci game da mataki na gaba .""","""Kan haka Majalisar Dinkin Duniya ta zabi Aliko Dangote , da shugaban bankin raya Afirka , da wassu mutane 25 a fadin duniya su jagoranci magance matsalar tamowa , kafin shekara 2030 .""","""Temitope Olatoye Sugar shine mai wakiltar mazabar Lagelu da Akinyele daga jihar Oyo , a majalisar wakilan tarayyar Najeriya .""","""Tsohon mataimakin shugaban Najeriya , kuma dan takarar shugaban kasa a zaben 2019 karkashin jam’iyyar adawa ta PDP , Atiku Abubakar , ya yi Allah wadai da yunkurin da wasu sojoji suka yi na kifar da “ zababbiyar gwamnatin Habasha ."""]

In [None]:
text_list_igbo = ["""Osote onye - isi ndị ome - iwu Naịjirịa bụ Ike Ekweremadu ekwuola na ike agwụla ndị Sịnatị iji otu nkeji darajụụ akwanyere ndị egburu n'ime oke ọgbaghara dị na Naịjirịa oge ọ bula .""","""Okwu a Buhari kwuru na isi ndọrọndọrọ ọchịchị na 2015 bu ịhe eji kpụrụ ya na ọnụ ugbua , ọkachasị ka ụlọ ọrụ na - ahụ maka ọnụ ọgụgụ a na - akpọ National Bureau of Statistics ( NBS ) nwepụtara ozi n'akọwa na mmadụ ruru nde asaa na nari ise so na ndị enweghi ọrụ kemgbe afọ 2016 .""","""Google Africa kwuru n'igwe okwu Twitter sị : Taa , anyị na - akwanyere onye egwuregwu bọọlụ a ma ama , Stephen Keshi ugwu .""","""Keshi chịrị ndị otu egwuregwu Super Eagles kemgbe afọ 2011 ma durukwa ha gaa asọmpi dị iche iche nke gụnyere ; Iko Mba Afrika na 2013 ( nke ha bulatara Naịjirịa ) , iko mpaghara Afrịka dị iche iche na 2013 , ma nye aka wetara Naijiria ọnọdụ n'asọmpi Iko Mbaụwa niile na 2014 .""","""N' akwụkwọ ozi , ngalaba 'US Department' tinyere na websait ha , ha kwuru sị : Yunaited Steeti na - enwe obi mwute n' iyi ọrụ nke onye ndu ndị na - ama gọọmenti Kenya aka n'ihu bụ Raila Odinga duru onwe ya ka ọnwa Jenuwari gbara ịrị atọ .""","""Cheta na Gọọmenti etiti mechiburu ụlọọrụ ngosi TV atọ maka na ha gbasara ozi gosịrị Raila Odinga ebe ọ na - edu onwe ya iyi ọrụ ma kpọkwa onwe ya onyeisiala mba Kenya , ebe ulọikpe Kenya akwụsịrị mmechi ahụ ụbọchị Wenesde .""","""Taa , otu n'ime ndị kewapụtara n'otu ndọrọndọrọ ọchịchị APC kpọrọ ndị ntaakụkọ n'isi ụlọọrụ ha maka ị kọwa echiche ha n'esomokwu nke di n'etiti ndị APC nke Imo steeti . N'ọnụ okwu TOE Ekechi bụ onụ na - ekwuchitere otu a , ha na - ebo gọvanọ Okorocha ebubo na o nupuru iwu ji patu ha isi ọtụtụ""","""Otu kporo onweha 'The Coalition of Northern Groups' na bekee gwara onyeisiala Naịjirịa bụ Muhammadu Buhuri na onye chiburu dịka osote onyeisiala n'oge garaaga bụ Atiku Abubakar na ọ ga - adị mma maọbụrụ na ha abụọ wepuru aka n'ime ọsọ ị banye n'ọkwa ọchịchị dịka onyeisiala n'afọ 2019 ."""]

In [None]:
text_list_kinyarwanda = ["""Ambasaderi w’Umuryango w’Ubumwe bw’u Burayi mu Rwanda , Nicola Bellomo , aherutse gushima uko u Rwanda rurimo guhangana n’icyorezo cya Coronavirus , yizeza ko uyu muryango uzakomeza gufatanya na rwo muri uru rugamba no mu zindi gahunda z’iterambere .""","""Imibare ya Banki y’Isi yo kuwa 9 Mata igaragaza ko ubukungu bwo muri Afurika yo-munsi y’Ubutayu bwa Sahara , bwagizweho ingaruka na Coronavirus ndetse ko buzamanuka ku kigero cya - 2 .""","""Mu butumwa yanditse kuri Twitter kuri uyu kuwa Kane , Mateke yahishuye ko kuva kera na kare atemeraga amasezerano basinyanye n’u Rwanda agamije guhosha umwuka mubi uri hagati y’ibihugu byombi , mu gihe ari umwe mu bagombaga kuba bakurikirana uko ashyirwa mu bikorwa .""","""Amagambo ya Mateke anahura n’ay’umudepite Ruth Nankabirwa , kuri uyu kuwa Gatatu wabwiye bagenzi be mu Nteko Ishinga Amategeko ko Guverinoma ya Uganda ikwiye gukemura bwangu ikibazo ifitanye n’u Rwanda , ariko asa n’uca amarenga ku buryo bwakoreshwa .""","""Ubwo bari ku ngingo zijyanye n’uko Uganda ifasha imitwe yitwaje intwaro , Nduhungirehe yatanze urugero rw’igitero cyabaye mu ijoro rishyira ku itariki ya Kane Ukwakira aho abarwanyi b’umutwe wa RUD Urunana bateye mu Kinigi ."""]

In [None]:
text_list_luganda = ["""Phillip Wokorach , Justin Kimono ne Adrian Kisito be bamu ku baayambye Uganda , eyawangula empala zino omwaka oguwedde , okuva emabeganefuna obuwanguzi .""","""Oluvannyuma yaddukira mu Zimbabwe ngakozesa Paasipooti eyali mu mannya ga David Mubiru , kyokka aboobuyinza baamuyigga ne bamukomyawo mu Uganda , mu November 2016 , okumalayo ekibonerezo ekyemyaka ena nemyaka emirala ebiri , egyamwongerwako olwokutoloka mu kkomera .""","""DPC wa Rakai , Patience Baganzi yategeezezza nti bagenda kumukwasa poliisi ye Katwe mu Kampala gye yaddiza omusango avunaanibwe .""","""OMWAMI wa Ssabasajja owessaza lya Mawokota afudde kibwatukira nalekabanna Mawokota mu kiyongobero . Kayima David Ssekyeru afudde mu ngeri yentiisa bwaseredde nagwa mu kinaabiro nga egenze okunaaba bagenze okuyita ambulensi okumuddusa mu ddwaliro e Mmengo nafiira mu kkubo nga tebanatuuka mu ddwaliro . Ssekyeru abadde amaze wiiki emu nga mugonvugonvu kyokka abadde azeemu endasi kwekwewaliriza agende mu kinaabiro""","""Omwogezi wa poliisi mu Greater Masaka , ASP Paul Kangave yategeezezza Bukedde nti poliisi yatandikiddewo okunoonyereza oluvannyuma lwokufuna amawulire gokutemulwa kwomusuubuzi ono .""","""Mugisha baamukwatira Ndeeba mu Nsiike Zooni ku ntandikwa ya wiiki ewedde era yasooka kutegeeza poliisi nti munne Mulo yattibwa ekibinja kyaba bodaboda abaabalondoola nga baakamala okubba pikipiki e Mityana ne babataayiriza e Makindye ."""]

In [None]:
text_list_Nigerian = ["""Jii 2 go mane gin ja apiko moro ma ja higni 20 mane oyang nyinge kaka Kevin Omondi kod achiel kuom jowuoth mage mane oting' o mane iluongo ni Shopie Anyango ma ja higni 23 ne jotho mana kanyo gi kanyo e masirano mane ojuko lori moro mar kambi jo China kod apiko yoo Ringa""", """Japuonjreno ma wuoi ma jahigni 15 ochopo e nyim jayal bura Joseph Karanja kama odonjne kod ketho mar nego Noel Adhiambo midenyo ma jahigni 11 ; mane en japuonjre e skul ma Kosele Community Christian Center e kar chung' od bura ma Kasipul dwee mokalo .""","""Magi oyangi gi jawach eloo State House nyadendi Kanze Dena mane owacho ni jogo nyocha opim ne tuono e pimo manyocha otim chieng tich 4""","""Kanomedochiwo ler ewii wachno Kanze nowacho ni jotich duto mag State House ipimoga moting' o e kinde ka kinde moting' o nyaka jatend piny Kenya migosi Uhuru Kenyata gi familia mare mar ng' eyo chal margi ne tuo mar Covid - 19no kowacho ni jii 4 mane oyudi ni kod tuono sani jonie kar thieth ma Kenyatta University Teaching , Referal and Research Hospital ma gidhiyoe nyime gi yudo thieth""","""MCAsgo mane otelnegi kod Julius Nyambok ma Homa Bay Central ward ne jogolo rang' isi mag chenro buora mag dongruok moting' o kama ikanoe remo e kar thieth ma Homa Bay County Referal Hospital , kambi mogo ma chiro Kigoti kod kar pidho jamni ma Arujo"""]

In [None]:
text_list_Pidgin = ["""Popular cable satellite broadcaster DsTV , no get right to Bundesliga live matches for di 2019 / 2020 season so na pipo wey get StarTimes dey in luck because na dem get broadcast rights for Sub - Saharan Africa .""","""Whichever way wey you watch just know say you dey part of one billion pipo wey Bayern CEO Karl - Heinz Rummenigge don gauge say go watch dis weekend live matches See Saturday games .""","""Conditions Spain top league and working place of Lionel Messi dey torchlight June 12 as di date when dem go resume di season .""","""LA Lakers legend Kobe Bryant and im daughter Gianna plus seven oda die for helicopter crash for di city of Calabasa , California on Sunday 26 January .""","""Ighalo move go Chinese Super League for 2017 , first with Changchun Yatai .""","""Senegal and Liverpool forward Mane beat both Egypt player Mohammed Salah and Algeria winger Riyad Mahrez to win di award wey dem do for Egypt on Tuesday ."""]

In [None]:
text_list_Swahilu = ["""Wanamgambo wa ADF Mauaji ya Alhamisi katika mkoa wa Mbau kaskazini mwa Beni yanashukiwa kufanya na kundi la waasi la Allied Democratic Force , ADF , ambalo linahusika na mfululizo wa mauaji tangu kuanza kwa ghasia mwezi November .""","""Jeshi la Congo limegundua ‘kiwanda cha kutengeneza mabomu ya kienyeji’ katika kambi moja ya ADF waliyoiteka , msemaji wa jeshi jenerali Leon Richard Kasonga amesema Jumatano .""","""Wajumbe wa kikosi kazi cha virusi vya corona cha White House wamepangiwa kutoa ushuhuda mbele ya kamati ya Nishati na Biashara ya Baraza la Wawakilishi Jumanne , na Spika wa Baraza la Wawakilishi Nancy Pelosi amesema , “ Wananchi wa Marekani wanahitaji majibu kwa nini Rais Trump anataka upimaji upunguzwe kasi wakati wataalam wanasema upimaji zaidi unahitajika .""","""Siku Jumatano maafisa wawili wa Umoja wa Mataifa watawasilisha ripoti inayoeleza kwamba kuna ushahidi wa kutosha unaodhihirisha kwamba Saudi Arabia ilidukua simu ya Bezos .""","""Mahakama ya Juu ya Korea Kusini imeamrisha mahakama ya chini ifikirie tena moja ya mashtaka ya jinai dhidi ya Rais wa zamani Park Geun - hye ambaye alilazimishwa kuondoka madarakani mwaka 2017 kutokana na kashfa ya ufisadi .""","""Waziri Mkuu wa Uingereza Boris Johnson amesema ataheshimu utaratibu wa sheria lakini Uingereza itajiondowa kutoka Umoja wa Ulaya ( EU ) ifikapo Oktoba 31 ."""]

In [None]:
text_list_Wolof = ["""Dafa di sax , ni mu ame woon noonu fit moo taxoon ñu dàq ko , moom ak benn doomu Farãs bu daan wuyoo ci turu Daniel Cohn - Bendit , ca daara ju mag jooju , ci atum 1969 .""","""Usmaan Sonkoo ngi juddoo Cees ci atum 1974 .""","""Waaw , Isaa Sàll nekkoon na fi Njiitu ndajem diiwaanu Fatig ci njeexitalu atiy 1990 .""","""IRÃ NDAW : Komisaariya bu Ndaakaaru woolu na waaraatekatu Sentv bi .""","""Ciy ati 60 , bokkoon na ci ” Groupe de Grenoble ” kurél gu doon jéem a suqali làmmiñi réew mi mook ñoomin Asan Silla , Masàmba Sare ak Saaliyu Kànji ak it Ablaay Wàdd mi fi doonoon njiitu réewum Senegaal ."""]

In [None]:
text_list_Yorùbá = ["""Ẹgbẹ́ Ohùn Àgbáyé dúró ṣinṣin pẹ̀lú Luis Carlos , ẹbíi rẹ̀ , àti oníròyìn aládàáṣiṣẹ́ gbogbo àwọn tí ó ń mú ìjọba ṣe bí ó ti yẹ ní Venezuela .""","""Ilé - iṣẹ́ẹ Mohammed Sani Musa , Activate Technologies Limited , ni ó kó ẹ̀rọ Ìwé - pélébé Ìdìbò Alálòpẹ́ ( PVCs ) tí a lò fún ọdún - un 2019 , nígbà tí ó jẹ́ òǹdíjedupò lábẹ́ ẹgbẹ́ olóṣèlúu tí ó ń tukọ̀ ètò ìṣèlú lọ́wọ́ All Progressives Congress ( APC ) fún Aṣojú Ìlà - Oòrùn Niger , ìyẹn gẹ́gẹ́ bí ilé iṣẹ́ aṣèwádìí , Premium Times ṣe tẹ̀ ẹ́""","""Ishaku Elisha Abbo ti ẹgbẹ́ alátakò People’s Democratic Party ( PDP ) jẹ́ aṣojú tí ó ń ṣojú Ẹkùn - un Àríwá Adamawa ní Ìpínlẹ̀ Adamawa , ní ìlà - oòrùn àríwá orílẹ̀ èdèe Nàìjíríà .""","""Nínú oṣù Agẹmọ 2019 , ní ìṣojú ọlọ́pàá , Abbo ṣe àṣemáṣe pẹ̀lú òṣìṣẹ́bìnrin kan nínú ìsọ̀ ohun ìbálòpọ̀ ní olú - ìlú Nàìjíríà ní Abuja .""","""Abba Moro , tí í ṣe ọmọ ẹgbẹ́ẹ PDP , ni aṣojú fún ẹ̀ka Gúúsù Benue , àárín gbùngbùn àríwá Nàìjíríà .""","""Ní ọjọ́ 15 , oṣù Ẹrẹ́nà , ọdún - un 2014 , Moro , tí í ṣe ọ̀gá pátápátá ètò abélé , ló wà nídìí ìṣẹ̀lẹ̀ abanilọ́kànjẹ́ Ìgbanisíṣẹ́ Ẹ̀ṣọ̀ aṣọ́bodè Nàìjíríà tí àwọn ọ̀dọ́langba tí ó tó bíi ẹgbẹẹgbẹ̀rún 6 tó fẹ́ àyè iṣẹ́ ẹgbẹ̀rún 4 tí ó ṣí sílẹ̀ nínú iléeṣẹ́ Ẹ̀ṣọ̀ Aṣọ́bodè Nàìjíríà tí"""]

In [None]:
model_names = ["xlm_roberta_large_token_classifier_masakhaner", 
               "distilbert_base_token_classifier_masakhaner"]

In [None]:
xlm_roberta_text_list = [text_list_amharic, 
                         text_list_hausa, 
                         text_list_igbo , 
                         text_list_kinyarwanda, 
                         text_list_luganda, 
                         text_list_Nigerian, 
                         text_list_Pidgin, 
                         text_list_Swahilu, 
                         text_list_Wolof, 
                         text_list_Yorùbá]


In [None]:
distilbert_text_list = [text_list_hausa, 
                        text_list_igbo , 
                        text_list_kinyarwanda, 
                        text_list_luganda, 
                        text_list_Nigerian, 
                        text_list_Pidgin, 
                        text_list_Swahilu, 
                        text_list_Wolof, 
                        text_list_Yorùbá]
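The two lists above carry no language names, so the pipeline function later recovers each name by searching `globals()`. One possible alternative, sketched here under assumptions, is to keep an explicit mapping from a label to its sentences. `build_registry` is a hypothetical helper, and the two short lists are stand-ins that reuse one sentence each from the samples above.

```python
from typing import Dict, List


def build_registry(**named_lists: List[str]) -> Dict[str, List[str]]:
    """Return a label -> sentences mapping, preserving insertion order."""
    registry: Dict[str, List[str]] = {}
    for name, texts in named_lists.items():
        if not texts:
            # An empty list would produce an empty DataFrame downstream.
            raise ValueError(f"no sample sentences for {name!r}")
        registry[name] = list(texts)
    return registry


# Stand-ins for the notebook's full sample lists.
registry = build_registry(
    hausa=["Najeriya : Kungiyar Ma'aikatan Jami'o'i Ta Shiga Yajin Aikin Gargadi"],
    wolof=["Usmaan Sonkoo ngi juddoo Cees ci atum 1974 ."],
)

for name, texts in registry.items():
    print(name, len(texts))
```

Iterating over `registry.items()` gives both the label for the banner and the sentences for the DataFrame, without depending on variable names.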


## 4. Define Spark NLP pipeline

In [None]:
def ner_masakhaner(model_name, language_text):

    documentAssembler = DocumentAssembler()\
          .setInputCol("text")\
          .setOutputCol("document")

    sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
          .setInputCols(["document"])\
          .setOutputCol("sentence")

    tokenizer = Tokenizer()\
          .setInputCols(["sentence"])\
          .setOutputCol("token")

    ner_converter = NerConverter()\
          .setInputCols(["sentence", "token", "ner"])\
          .setOutputCol("ner_chunk")


    if model_name == 'xlm_roberta_large_token_classifier_masakhaner':
      tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_large_token_classifier_masakhaner", "xx")\
          .setInputCols(["sentence",'token'])\
          .setOutputCol("ner")

    else:
      tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_base_token_classifier_masakhaner", "xx")\
          .setInputCols(["sentence",'token'])\
          .setOutputCol("ner")

    nlpPipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier, ner_converter])

    empty_data = spark.createDataFrame([[""]]).toDF("text")
    model = nlpPipeline.fit(empty_data)

    print("")
    print("\u001b[32m*************************  MODEL NAME :  " + model_name + "   ***********************\u001b[0m", end ='\n\n')
    

    for text_name in language_text:
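      # Look up the variable name bound to this list (e.g. "text_list_hausa") for the banner;
      # identity (`is`) avoids element-wise comparisons against unrelated globals
      x = [name for name, value in globals().items() if value is text_name][0]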
      df = spark.createDataFrame(text_name, StringType()).toDF("text")
      print("")
      print("\u001b[31m*************************  LANGUAGE_TEXT :  " + x + "   ***********************\u001b[0m", end ='\n\n')
      
      
      #result dataframe
      result = model.transform(df)
      result.select(F.explode(F.arrays_zip(result.ner_chunk.result, 
                                           result.ner_chunk.metadata)).alias("cols")) \
            .select(F.expr("cols['0']").alias("chunk"),
                    F.expr("cols['1']['entity']").alias("ner_label"))\
            .show(truncate=False)

      #visualization: renders only the fourth sample sentence (row index 3) of each language
      NerVisualizer().display(
          result = result.collect()[3],
          label_col = 'ner_chunk',
          document_col = 'document')
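The `select` near the end of `ner_masakhaner` pairs each chunk with its metadata through `arrays_zip`, then emits one row per pair through `explode`. A plain-Python analogue may make that flattening easier to follow. This is a rough sketch: `explode_chunks` is a hypothetical helper, the dictionaries stand in for the `ner_chunk.result` and `ner_chunk.metadata` arrays of each DataFrame row, and the sample values are copied from the Wolof output further down.

```python
def explode_chunks(rows):
    """Flatten per-sentence chunk arrays into (chunk, ner_label) rows.

    Assumes 'result' and 'metadata' have the same length in every row.
    """
    flat = []
    for row in rows:
        # zip pairs position i of both arrays, like arrays_zip.
        for chunk, meta in zip(row["result"], row["metadata"]):
            # One output row per pair, like explode.
            flat.append((chunk, meta["entity"]))
    return flat


rows = [
    {
        "result": ["Usmaan Sonkoo", "Cees", "atum 1974"],
        "metadata": [{"entity": "PER"}, {"entity": "LOC"}, {"entity": "DATE"}],
    },
    # A sentence without entities contributes no rows.
    {"result": [], "metadata": []},
]

pairs = explode_chunks(rows)
for chunk, label in pairs:
    print(f"{chunk:<15}{label}")
```

As with `explode`, a sentence whose chunk array is empty simply disappears from the flattened table, which is why the number of output rows differs from the number of input sentences.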

In [18]:
# with "xlm_roberta_large_token_classifier_masakhaner" model
ner_masakhaner('xlm_roberta_large_token_classifier_masakhaner', xlm_roberta_text_list)

sentence_detector_dl download started this may take some time.
Approximate size to download 514.9 KB
[OK!]
xlm_roberta_large_token_classifier_masakhaner download started this may take some time.
Approximate size to download 1.7 GB
[OK!]

*************************  MODEL NAME :  xlm_roberta_large_token_classifier_masakhaner   ***********************


*************************  LANGUAGE_TEXT :  text_list_amharic   ***********************

+--------------+---------+
|chunk         |ner_label|
+--------------+---------+
|አህመድ ቫንዳ      |PER      |
|ከ3-10-2000 ጀምሮ|DATE     |
|በአዲስ አበባ      |LOC      |
|ሰማያዊ ፓርቲ      |ORG      |
|ዛሬ            |DATE     |
|በመኢአድ ጽህፈት ቤት |ORG      |
|የ ዓመቱ         |DATE     |
|የዚምባብዌ        |LOC      |
|ኤመርሰን ምናንጋግዋ  |PER      |
|ዶይቸ ቬለ        |ORG      |
|ሊንዳ ማዜሪሬ      |PER      |
|የአህጉሩን        |LOC      |
|ሪታ ፓንክኸርስት    |PER      |
|የኢትዮጵያ        |LOC      |
|አዲስ አበባ       |LOC      |
|የኢትዮጵያ        |LOC      |
|ከ ዓመታት በላይ    |DATE     |


*************************  LANGUAGE_TEXT :  text_list_hausa   ***********************

+-----------------------------------+---------+
|chunk                              |ner_label|
+-----------------------------------+---------+
|Muryar Amurka                      |ORG      |
|Ibrahim Abdul'aziz                 |PER      |
|Najeriya                           |LOC      |
|Kungiyar Ma'aikatan Jami'o'i       |ORG      |
|Juma’a mai zuwa                    |DATE     |
|Muaryar Amurka                     |ORG      |
|Mohammed Jaji                      |PER      |
|Majalisar Dinkin Duniya            |ORG      |
|Aliko Dangote                      |PER      |
|bankin raya Afirka                 |ORG      |
|shekara 2030                       |DATE     |
|Temitope Olatoye Sugar             |PER      |
|Lagelu                             |LOC      |
|Akinyele                           |LOC      |
|Oyo                                |LOC      |
|majalisar wakilan tarayyar Najeriya|OR


*************************  LANGUAGE_TEXT :  text_list_igbo   ***********************

+-----------------------------+---------+
|chunk                        |ner_label|
+-----------------------------+---------+
|Naịjirịa                     |LOC      |
|Ike Ekweremadu               |PER      |
|otu nkeji                    |DATE     |
|Naịjirịa                     |LOC      |
|Buhari                       |PER      |
|2015                         |DATE     |
|National Bureau of Statistics|ORG      |
|NBS                          |ORG      |
|afọ 2016                     |DATE     |
|Google                       |ORG      |
|Africa                       |LOC      |
|Twitter                      |ORG      |
|Taa                          |DATE     |
|Stephen Keshi                |PER      |
|Keshi                        |PER      |
|Super Eagles                 |ORG      |
|afọ 2011                     |DATE     |
|Afrika                       |LOC      |
|2013                 


*************************  LANGUAGE_TEXT :  text_list_kinyarwanda   ***********************

+-------------------------------------+---------+
|chunk                                |ner_label|
+-------------------------------------+---------+
|w’Umuryango w’Ubumwe bw’u Burayi     |ORG      |
|Rwanda                               |LOC      |
|Nicola Bellomo                       |PER      |
|u Rwanda                             |LOC      |
|Banki y’Isi                          |ORG      |
|kuwa 9 Mata                          |DATE     |
|Afurika yo-munsi y’Ubutayu bwa Sahara|LOC      |
|Twitter                              |ORG      |
|kuwa Kane                            |DATE     |
|Mateke                               |PER      |
|Rwanda                               |LOC      |
|Mateke                               |PER      |
|Ruth Nankabirwa                      |PER      |
|kuwa Gatatu                          |DATE     |
|Nteko Ishinga Amategeko              |ORG     


*************************  LANGUAGE_TEXT :  text_list_luganda   ***********************

+-----------------------------------+---------+
|chunk                              |ner_label|
+-----------------------------------+---------+
|Phillip Wokorach                   |PER      |
|Justin Kimono                      |PER      |
|Adrian Kisito                      |PER      |
|Uganda                             |LOC      |
|omwaka oguwedde                    |DATE     |
|Zimbabwe                           |LOC      |
|David Mubiru                       |PER      |
|Uganda                             |LOC      |
|November 2016                      |DATE     |
|ekyemyaka ena nemyaka emirala ebiri|DATE     |
|Rakai                              |LOC      |
|Patience Baganzi                   |PER      |
|poliisi ye Katwe                   |ORG      |
|Kampala                            |LOC      |
|Mawokota                           |LOC      |
|Mawokota                           |


*************************  LANGUAGE_TEXT :  text_list_Nigerian   ***********************

+-----------------------------------------+---------+
|chunk                                    |ner_label|
+-----------------------------------------+---------+
|higni 20                                 |DATE     |
|Kevin Omondi                             |PER      |
|Shopie Anyango                           |PER      |
|higni 23                                 |DATE     |
|China                                    |LOC      |
|Ringa                                    |LOC      |
|jahigni 15                               |DATE     |
|Joseph Karanja                           |PER      |
|Noel Adhiambo                            |PER      |
|jahigni 11                               |DATE     |
|skul ma Kosele Community Christian Center|ORG      |
|od bura ma Kasipul                       |ORG      |
|dwee mokalo                              |DATE     |
|State House                        


*************************  LANGUAGE_TEXT :  text_list_Pidgin   ***********************

+-----------------------+---------+
|chunk                  |ner_label|
+-----------------------+---------+
|DsTV                   |ORG      |
|2019 / 2020            |DATE     |
|StarTimes              |ORG      |
|Sub - Saharan Africa   |LOC      |
|Bayern                 |ORG      |
|Karl - Heinz Rummenigge|PER      |
|weekend                |DATE     |
|Saturday               |DATE     |
|Spain                  |LOC      |
|Lionel Messi           |PER      |
|June 12                |DATE     |
|LA Lakers              |ORG      |
|Kobe Bryant            |PER      |
|Gianna                 |PER      |
|city of Calabasa       |LOC      |
|California             |LOC      |
|Sunday 26 January      |DATE     |
|Ighalo                 |PER      |
|Chinese Super League   |ORG      |
|2017                   |DATE     |
+-----------------------+---------+
only showing top 20 rows




*************************  LANGUAGE_TEXT :  text_list_Swahilu   ***********************

+-----------------------+---------+
|chunk                  |ner_label|
+-----------------------+---------+
|ADF                    |ORG      |
|Alhamisi               |DATE     |
|Mbau kaskazini         |LOC      |
|Beni                   |LOC      |
|Allied Democratic Force|ORG      |
|ADF                    |ORG      |
|November               |DATE     |
|Congo                  |LOC      |
|ADF                    |ORG      |
|Leon Richard Kasonga   |PER      |
|Jumatano               |DATE     |
|White House            |ORG      |
|Jumanne                |DATE     |
|Nancy Pelosi           |PER      |
|Marekani               |LOC      |
|Trump                  |PER      |
|Jumatano               |DATE     |
|Umoja wa Mataifa       |ORG      |
|Saudi Arabia           |LOC      |
|Bezos                  |PER      |
+-----------------------+---------+
only showing top 20 rows




*************************  LANGUAGE_TEXT :  text_list_Wolof   ***********************

+------------------+---------+
|chunk             |ner_label|
+------------------+---------+
|Farãs             |LOC      |
|Daniel Cohn       |PER      |
|Bendit            |PER      |
|atum 1969         |DATE     |
|Usmaan Sonkoo     |PER      |
|Cees              |LOC      |
|atum 1974         |DATE     |
|Isaa Sàll         |PER      |
|Fatig             |LOC      |
|atiy 1990         |DATE     |
|IRÃ NDAW          |PER      |
|Ndaakaaru         |LOC      |
|Sentv             |ORG      |
|ati 60            |DATE     |
|Groupe de Grenoble|ORG      |
|Asan Silla        |PER      |
|Masàmba Sare      |PER      |
|Saaliyu Kànji     |PER      |
|Ablaay Wàdd       |PER      |
|Senegaal          |LOC      |
+------------------+---------+




*************************  LANGUAGE_TEXT :  text_list_Yorùbá   ***********************

+-----------------------------------------+---------+
|chunk                                    |ner_label|
+-----------------------------------------+---------+
|Ohùn Àgbáyé                              |ORG      |
|Luis Carlos                              |PER      |
|Venezuela                                |LOC      |
|Mohammed Sani Musa                       |PER      |
|Activate Technologies Limited            |ORG      |
|ọdún - un 2019                           |DATE     |
|All Progressives Congress                |ORG      |
|APC                                      |ORG      |
|Ìlà - Oòrùn Niger                        |LOC      |
|Premium Times                            |ORG      |
|Ishaku Elisha Abbo                       |PER      |
|People’s Democratic Party                |ORG      |
|PDP                                      |ORG      |
|Àríwá Adamawa                        

In [19]:
# with "distilbert_base_token_classifier_masakhaner" model
ner_masakhaner('distilbert_base_token_classifier_masakhaner', distilbert_text_list)

sentence_detector_dl download started this may take some time.
Approximate size to download 514.9 KB
[OK!]
distilbert_base_token_classifier_masakhaner download started this may take some time.
Approximate size to download 482.3 MB
[OK!]

*************************  MODEL NAME :  distilbert_base_token_classifier_masakhaner   ***********************


*************************  LANGUAGE_TEXT :  text_list_hausa   ***********************

+-----------------------------------+---------+
|chunk                              |ner_label|
+-----------------------------------+---------+
|Muryar Amurka                      |ORG      |
|Ibrahim Abdul'aziz                 |PER      |
|Najeriya                           |LOC      |
|Kungiyar Ma'aikatan Jami'o'i       |ORG      |
|Muaryar Amurka                     |ORG      |
|Mohammed Jaji                      |PER      |
|Majalisar Dinkin Duniya            |ORG      |
|Aliko Dangote                      |PER      |
|bankin raya Af


*************************  LANGUAGE_TEXT :  text_list_igbo   ***********************

+-----------------------------+---------+
|chunk                        |ner_label|
+-----------------------------+---------+
|Naịjirịa                     |LOC      |
|Ike Ekweremadu               |PER      |
|Sịnatị                       |PER      |
|otu nkeji                    |DATE     |
|Naịjirịa                     |LOC      |
|Buhari                       |PER      |
|2015                         |DATE     |
|National Bureau of Statistics|ORG      |
|NBS                          |ORG      |
|afọ                          |DATE     |
|2016                         |DATE     |
|Google                       |ORG      |
|Africa                       |LOC      |
|Twitter                      |ORG      |
|Taa                          |DATE     |
|Stephen Keshi                |PER      |
|Keshi                        |PER      |
|Super Eagles                 |ORG      |
|2011                 


*************************  LANGUAGE_TEXT :  text_list_kinyarwanda   ***********************

+-------------------------------------+---------+
|chunk                                |ner_label|
+-------------------------------------+---------+
|Rwanda                               |LOC      |
|Nicola Bellomo                       |PER      |
|u Rwanda                             |LOC      |
|kuwa 9 Mata                          |DATE     |
|Afurika yo-munsi y’Ubutayu bwa Sahara|LOC      |
|Twitter                              |ORG      |
|uyu kuwa Kane                        |DATE     |
|Mateke                               |PER      |
|Rwanda                               |LOC      |
|Mateke                               |PER      |
|Ruth Nankabirwa                      |PER      |
|uyu kuwa Gatatu                      |DATE     |
|Nteko Ishinga Amategeko              |ORG      |
|Uganda                               |LOC      |
|Rwanda                               |LOC     


*************************  LANGUAGE_TEXT :  text_list_luganda   ***********************

+-----------------------------------+---------+
|chunk                              |ner_label|
+-----------------------------------+---------+
|Phillip Wokorach                   |PER      |
|Justin Kimono                      |PER      |
|Adrian Kisito                      |PER      |
|Uganda                             |LOC      |
|omwaka oguwedde                    |DATE     |
|Zimbabwe                           |LOC      |
|David Mubiru                       |PER      |
|Uganda                             |LOC      |
|November 2016                      |DATE     |
|ekyemyaka ena nemyaka emirala ebiri|DATE     |
|Rakai                              |LOC      |
|Patience Baganzi                   |PER      |
|poliisi ye Katwe                   |ORG      |
|Kampala                            |LOC      |
|Mawokota                           |LOC      |
|Mawokota                           |


*************************  LANGUAGE_TEXT :  text_list_Nigerian   ***********************

+-----------------------------------------+---------+
|chunk                                    |ner_label|
+-----------------------------------------+---------+
|higni 20                                 |DATE     |
|Kevin Omondi                             |PER      |
|Shopie Anyango                           |PER      |
|higni 23                                 |DATE     |
|China                                    |LOC      |
|Ringa                                    |LOC      |
|jahigni 15                               |DATE     |
|Joseph Karanja                           |PER      |
|Noel Adhiambo                            |PER      |
|jahigni 11                               |DATE     |
|skul ma Kosele Community Christian Center|ORG      |
|od bura ma Kasipul                       |ORG      |
|dwee mokalo                              |DATE     |
|State House                        


*************************  LANGUAGE_TEXT :  text_list_Pidgin   ***********************

+-----------------------+---------+
|chunk                  |ner_label|
+-----------------------+---------+
|DsTV                   |ORG      |
|2019 / 2020            |DATE     |
|StarTimes              |ORG      |
|Sub - Saharan Africa   |LOC      |
|Bayern                 |ORG      |
|Karl - Heinz Rummenigge|PER      |
|weekend                |DATE     |
|Saturday               |DATE     |
|Spain                  |LOC      |
|Lionel Messi           |PER      |
|June 12                |DATE     |
|LA Lakers              |ORG      |
|Kobe Bryant            |PER      |
|Gianna                 |PER      |
|city of Calabasa       |LOC      |
|California             |LOC      |
|Sunday 26 January      |DATE     |
|Ighalo                 |PER      |
|Chinese Super League   |ORG      |
|2017                   |DATE     |
+-----------------------+---------+
only showing top 20 rows




*************************  LANGUAGE_TEXT :  text_list_Swahilu   ***********************

+-----------------------+---------+
|chunk                  |ner_label|
+-----------------------+---------+
|ADF                    |ORG      |
|Alhamisi               |DATE     |
|Mbau kaskazini         |LOC      |
|Beni                   |LOC      |
|Allied Democratic Force|ORG      |
|ADF                    |ORG      |
|November               |DATE     |
|Congo                  |LOC      |
|ADF                    |ORG      |
|Leon Richard Kasonga   |PER      |
|Jumatano               |DATE     |
|White House            |ORG      |
|Jumanne                |DATE     |
|Nancy Pelosi           |PER      |
|Marekani               |LOC      |
|Trump                  |PER      |
|Jumatano               |DATE     |
|Umoja wa Mataifa       |ORG      |
|Saudi Arabia           |LOC      |
|Bezos                  |PER      |
+-----------------------+---------+
only showing top 20 rows




*************************  LANGUAGE_TEXT :  text_list_Wolof   ***********************

+------------------+---------+
|chunk             |ner_label|
+------------------+---------+
|Farãs             |LOC      |
|Daniel Cohn       |PER      |
|Bendit            |PER      |
|atum 1969         |DATE     |
|Usmaan Sonkoo     |PER      |
|Cees              |LOC      |
|atum 1974         |DATE     |
|Isaa Sàll         |PER      |
|Fatig             |LOC      |
|atiy 1990         |DATE     |
|IRÃ NDAW          |ORG      |
|Ndaakaaru         |LOC      |
|Sentv             |ORG      |
|ati 60            |DATE     |
|Groupe de Grenoble|ORG      |
|Asan Silla        |PER      |
|Masàmba Sare      |PER      |
|Saaliyu Kànji     |PER      |
|Ablaay Wàdd       |PER      |
|Senegaal          |LOC      |
+------------------+---------+




*************************  LANGUAGE_TEXT :  text_list_Yorùbá   ***********************

+-----------------------------+---------+
|chunk                        |ner_label|
+-----------------------------+---------+
|Ohùn Àgbáyé                  |ORG      |
|Luis Carlos                  |PER      |
|Venezuela                    |LOC      |
|Mohammed Sani Musa           |PER      |
|Activate Technologies Limited|ORG      |
|Ìdìbò                        |ORG      |
|ọdún - un 2019               |DATE     |
|All Progressives Congress    |ORG      |
|APC                          |ORG      |
|Aṣojú Ìlà - Oòrùn Niger      |LOC      |
|Premium Times                |ORG      |
|Ishaku Elisha Abbo           |PER      |
|People’s Democratic Party    |ORG      |
|PDP                          |ORG      |
|Àríwá Adamawa                |PER      |
|Adamawa                      |LOC      |
|Nàìjíríà                     |LOC      |
|2019                         |DATE     |
|Abbo               
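
The two runs mostly agree, but a few chunks receive different labels. In the Wolof tables, for example, `IRÃ NDAW` is tagged PER by the XLM-RoBERTa model and ORG by the DistilBERT model. A small sketch of how such disagreements could be listed once the chunk tables are collected into Python lists. `label_disagreements` is a hypothetical helper, and the inputs are a three-row excerpt of the Wolof tables above, not the full output.

```python
def label_disagreements(run_a, run_b):
    """Return {chunk: (label_a, label_b)} for chunks both runs found
    but labelled differently.

    Simplification: a chunk that appears several times in one run keeps
    only its last label.
    """
    labels_a = dict(run_a)
    labels_b = dict(run_b)
    return {
        chunk: (labels_a[chunk], labels_b[chunk])
        for chunk in labels_a
        if chunk in labels_b and labels_a[chunk] != labels_b[chunk]
    }


# Excerpt of the Wolof results printed above.
xlm_roberta_wolof = [("Farãs", "LOC"), ("IRÃ NDAW", "PER"), ("Sentv", "ORG")]
distilbert_wolof = [("Farãs", "LOC"), ("IRÃ NDAW", "ORG"), ("Sentv", "ORG")]

diff = label_disagreements(xlm_roberta_wolof, distilbert_wolof)
print(diff)
```

Chunks found by only one model (such as `Sịnatị` in the DistilBERT Igbo table) are not reported by this sketch; a set difference over the dictionary keys would cover that case.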