# Using stanza for Named Entity Recognition (continued)

## Installation

Run the code cell below to install stanza:

In [None]:
!pip install stanza

Collecting stanza
  Downloading stanza-1.10.1-py3-none-any.whl.metadata (13 kB)
Collecting emoji (from stanza)
  Downloading emoji-2.14.1-py3-none-any.whl.metadata (5.7 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch>=1.3.0->stanza)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch>=1.3.0->stanza)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch>=1.3.0->stanza)
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch>=1.3.0->stanza)
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.4.5.8 (from torch>=1.3.0->stanza)
  Downloading nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl.metadata 

## Import library and download language model

After installing it, we import stanza into our notebook.

In [None]:
import stanza

## Creating the pipeline

Download the English language model and build the pipeline (we specify that it should only tokenize the text, separate multiword tokens and perform Named Entity Recognition):


In [None]:
# Download the language model:
stanza.download("en")

# Create the pipeline, specifying the language:
nlp = stanza.Pipeline(lang="en", processors='tokenize,mwt,ner')

Downloading https://raw.githubusercontent.com/stanfordnlp/stanza-resources/main/resources_1.10.0.json:   0%|  …

INFO:stanza:Downloaded file to /root/stanza_resources/resources.json
INFO:stanza:Downloading default packages for language: en (English) ...


Downloading https://huggingface.co/stanfordnlp/stanza-en/resolve/v1.10.0/models/default.zip:   0%|          | …

INFO:stanza:Downloaded file to /root/stanza_resources/en/default.zip
INFO:stanza:Finished downloading models and saved to /root/stanza_resources
INFO:stanza:Checking for updates to resources.json in case models have been updated.  Note: this behavior can be turned off with download_method=None or download_method=DownloadMethod.REUSE_RESOURCES


Downloading https://raw.githubusercontent.com/stanfordnlp/stanza-resources/main/resources_1.10.0.json:   0%|  …

INFO:stanza:Downloaded file to /root/stanza_resources/resources.json
INFO:stanza:Loading these models for language: en (English):
| Processor | Package                   |
-----------------------------------------
| tokenize  | combined                  |
| mwt       | combined                  |
| ner       | ontonotes-ww-multi_charlm |

INFO:stanza:Using device: cuda
INFO:stanza:Loading: tokenize
INFO:stanza:Loading: mwt
INFO:stanza:Loading: ner
INFO:stanza:Done loading processors!


## Named entities

Let's take the text of one of our Gaza articles, and see how well Stanza manages to find the Named Entities in it.

First, run the code cell below to load the article into memory:

In [None]:
article = """Displacement, death, hunger as Israel’s war on Gaza enters third month

-----

Fighting has escalated in Gaza’s second-largest city of Khan Younis as Israeli air strikes rain down throughout the enclave, forcing Palestinians to flee to increasingly crammed pockets of the territory’s southern edge where there is no promised security, as the war enters its third month.
“We are talking about a carpet bombardment of entire neighbourhoods and residential blocks,” Al Jazeera’s Hani Mahmoud, reporting from Rafah in southern Gaza, said on Thursday, following heavy overnight shelling there.
The Israeli army “ordered with a threatening tone to move to Rafah because it is safe”, he said, but residential homes “were destroyed”.
“[These strikes] are not concentrated in one area of Rafah … multiple locations were targeted, just sending waves of fear and concern that confirm what people have talked about and expressed before – there is literally no safe place in the Gaza Strip, including the areas Israel designated as safe.”
After more than two months of war, starting on October 7, Mahmoud said that “the mood of these more than 60 days has been death, destruction and displacement”.
“We’re talking about more than 60 days of constant movement and running for their lives from one place to another, from the extreme northern part of the Gazan city of Beit Hanoon to the extreme south by Rafah, where many people are being packed and squeezed.”

The United Nations World Food Programme (WFP) said that households in northern Gaza are “experiencing alarming levels of hunger”.
At least 97 percent of households in northern Gaza have “inadequate food consumption”, with nine out of 10 people going one full day and night without food.
In the southern governorates, a third of the households have reported high levels of severe or very severe hunger, with 53 percent experiencing moderate hunger.
“Palestinians lack everything they need to survive,” Mahmoud said.
While pursuing its offensive in the south, Israeli armed forces have attacked several refugee camps, among them the Jabalia camp in the north and the al-Maghazi camp in the centre. The attack in Jabalia killed 22 relatives of Al Jazeera journalist Momin Alshrafi, including his father, mother, three siblings, and children.
According to the Palestinian Red Crescent Society, 60 percent of the wounded require urgent medical treatment abroad, pointing to the collapse of the health sector in Gaza.
“The occupation forces are deliberately arresting and abusing the sick and wounded, including paramedics from our crews, and we are on the cusp of a health and environmental catastrophe in the Strip,” a statement said.



As the death toll mounts amid the humanitarian catastrophe, US Secretary of State Antony Blinken told officials in Israel’s war cabinet last week that the administration of US President Joe Biden believed the war should end in weeks – not months, according to The Wall Street Journal,
Israeli officials, in turn, expressed an interest in a return to normalcy, especially in the interest of economic stability, but did not make any guarantees, the report said.
However, Israeli Prime Minister Benjamin Netanyahu has said Israel could indefinitely occupy part of the Gaza Strip to create a “buffer zone”, a move that would put him on a collision course with regional allies and the United States.
Conflicting reports have also emerged on whether Israeli troops have surrounded the house of Hamas’s leader in Gaza, Yahya Sinwar, in Khan Younis.
Late on Wednesday, Netanyahu said it was “just a matter of time until we get him” and that Israeli soldiers had encircled his house.
Yet, military spokesperson Daniel Hagari later said Sinwar’s home is the entire “Khan Younis area”, giving no indication that a specific location had been surrounded.
Three names top Israel’s most-wanted men, namely Mohammed Deif, the head of Hamas’s military wing, the Qassam Brigades; his second-in-command, Marwan Issa; and Sinwar.
"""

Create a new stanza document by feeding the `article` variable to our `nlp` pipeline object. Then print each entity (let the code cell above the previous one inspire you):

In [None]:
# feed the `article` variable into our `nlp` analyzer to create a stanza document:
doc = nlp(article)

### Place names
Use a loop to print only the named entities that are place names.

If you don't remember how to do that, look back at last week's notebook!

In [None]:
for e in doc.entities:
  # print the entity only if it is a place name (GPE or LOC):
  if e.type in ["GPE", "LOC"]:
    print(e)

{
  "text": "Gaza",
  "type": "GPE",
  "start_char": 47,
  "end_char": 51
}
{
  "text": "Gaza’s",
  "type": "GPE",
  "start_char": 105,
  "end_char": 111
}
{
  "text": "Rafah",
  "type": "GPE",
  "start_char": 505,
  "end_char": 510
}
{
  "text": "Gaza",
  "type": "GPE",
  "start_char": 523,
  "end_char": 527
}
{
  "text": "Rafah",
  "type": "GPE",
  "start_char": 650,
  "end_char": 655
}
{
  "text": "Rafah",
  "type": "GPE",
  "start_char": 779,
  "end_char": 784
}
{
  "text": "the Gaza Strip",
  "type": "GPE",
  "start_ch

### Counting place names

We can now use a dictionary to count how many times each place is mentioned in the text, as we did with regular expressions:

In [None]:
# create an empty dictionary
places = {}

# loop through the entities:
for e in doc.entities:
  # add a condition so that only place names are processed:
  if e.type in ["GPE", "LOC"]:
    # add 1 to the count for this place name
    # (starting from 0 if the place is not in the dictionary yet):
    places[e.text] = places.get(e.text, 0) + 1

print(places)


{'Gaza': 6, 'Gaza’s': 1, 'Rafah': 4, 'the Gaza Strip': 2, 'Israel': 2, 'Beit Hanoon': 1, 'Jabalia': 2, 'Strip': 1, 'US': 2, 'Israel’s': 1, 'the United States': 1, 'Khan Younis': 1}
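The counts above treat `Gaza` and `Gaza’s`, or `Israel` and `Israel’s`, as different places. One possible way to merge such variants is to normalize each name before counting. The sketch below is only an illustration: `sample_entities` and `normalize_place` are hypothetical names, and the sample data is a hand-made list modelled on the output above, not a stanza document.

```python
from collections import Counter

# hypothetical sample of (text, type) pairs, modelled on the entities above
sample_entities = [
    ("Gaza", "GPE"), ("Gaza’s", "GPE"), ("the Gaza Strip", "GPE"),
    ("Gaza Strip", "GPE"), ("Israel’s", "GPE"), ("Israel", "GPE"),
    ("Hani Mahmoud", "PERSON"),
]

def normalize_place(name):
    """Strip a trailing possessive and a leading 'the ' from a place name."""
    for suffix in ("’s", "'s"):
        if name.endswith(suffix):
            name = name[:-len(suffix)]
    if name.lower().startswith("the "):
        name = name[4:]
    return name

# count the normalized names, keeping only the place name types
place_counts = Counter(
    normalize_place(text)
    for text, ent_type in sample_entities
    if ent_type in ["GPE", "LOC"]
)
print(place_counts)
```

With a real stanza document, the same function could be applied to `e.text` inside the loop over `doc.entities`. Simple rules like these will not catch every variant, so the result still needs a manual check.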


### Evaluation

To check how well Stanza performed, we can put the tags into the text.

We can use the `start_char` property to insert the tag codes into the text; this way we can easily see which places Stanza missed.

We're going to loop over all the character indexes in the string;
and whenever we reach a character index where an entity starts,
we add the entity's tag to the text.

In [None]:
# first, we build a dictionary
# (keys: the offset of the start character of the entity, values: entity type)
tag_start_chars = {}
for e in doc.entities:
  tag_start_chars[e.start_char] = e.type

# then we create a new variable that will contain the tagged text:
tagged_text = ""

# finally, we loop through the text, character by character
for i in range(len(doc.text)):
  # if an entity starts on this location: put its tag in the tagged_text
  if i in tag_start_chars:
    tagged_text += tag_start_chars[i] + " "
  # in any case, put the character itself into the tagged_text
  tagged_text += doc.text[i]

print(tagged_text)


Displacement, death, hunger as NORP Israel’s war on GPE Gaza enters DATE third month

-----

Fighting has escalated in GPE Gaza’s ORDINAL second-largest city of PERSON Khan Younis as NORP Israeli air strikes rain down throughout the enclave, forcing NORP Palestinians to flee to increasingly crammed pockets of the territory’s southern edge where there is no promised security, as the war enters DATE its third month.
“We are talking about a carpet bombardment of entire neighbourhoods and residential blocks,” PERSON Al Jazeera’s Hani Mahmoud, reporting from GPE Rafah in southern GPE Gaza, said on DATE Thursday, following heavy overnight shelling there.
The NORP Israeli army “ordered with a threatening tone to move to GPE Rafah because it is safe”, he said, but residential homes “were destroyed”.
“[These strikes] are not concentrated in CARDINAL one area of GPE Rafah … multiple locations were targeted, just sending waves of fear and concern that confirm what people have talked about and exp

We can improve the readability by adding XML-style opening and closing tags (e.g., `<GPE>Rafah</GPE>`) instead of only a tag at the beginning of the entity. Adapt the code below so that it adds XML-style start and end tags:

In [None]:
# first, we build another dictionary that holds the end positions of each entity
# (keys: the offset of the end character, values: entity type)
tag_end_chars = {}
for e in doc.entities:
  tag_end_chars[e.end_char] = e.type

# now we repeat the same procedure as above, but add a tag at the end of each entity as well:
tagged_text = ""

# finally, we loop through the text, character by character
for i in range(len(doc.text)):
  # if an entity ends on this location: put its (end) tag in the tagged_text
  if i in tag_end_chars:
    tagged_text += "</" + tag_end_chars[i] + ">"
  # if an entity starts on this location: put its (start) tag in the tagged_text
  if i in tag_start_chars:
    tagged_text += "<" + tag_start_chars[i] + ">"
  # in any case, put the character itself into the tagged_text
  tagged_text += doc.text[i]

print(tagged_text)

Displacement, death, hunger as <NORP>Israel’s</NORP> war on <GPE>Gaza</GPE> enters <DATE>third month</DATE>

-----

Fighting has escalated in <GPE>Gaza’s</GPE> <ORDINAL>second</ORDINAL>-largest city of <PERSON>Khan Younis</PERSON> as <NORP>Israeli</NORP> air strikes rain down throughout the enclave, forcing <NORP>Palestinians</NORP> to flee to increasingly crammed pockets of the territory’s southern edge where there is no promised security, as the war enters <DATE>its third month</DATE>.
“We are talking about a carpet bombardment of entire neighbourhoods and residential blocks,” <PERSON>Al Jazeera’s Hani Mahmoud</PERSON>, reporting from <GPE>Rafah</GPE> in southern <GPE>Gaza</GPE>, said on <DATE>Thursday</DATE>, following heavy overnight shelling there.
The <NORP>Israeli</NORP> army “ordered with a threatening tone to move to <GPE>Rafah</GPE> because it is safe”, he said, but residential homes “were destroyed”.
“[These strikes] are not concentrated in <CARDINAL>one</CARDINAL> area of <
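
As an alternative to looping over every character, the tagged text can also be assembled from slices of the text between entity offsets. The sketch below assumes that the entity spans do not overlap; `sample_text`, `sample_spans` and `tag_text` are hypothetical names used for illustration only.

```python
# hypothetical sample: a short text and its entity spans as (start_char, end_char, type)
sample_text = "Strikes hit Rafah in southern Gaza on Thursday."
sample_spans = [(12, 17, "GPE"), (30, 34, "GPE"), (38, 46, "DATE")]

def tag_text(text, spans):
    """Wrap each (start, end, type) span of the text in XML-style tags."""
    pieces = []
    position = 0
    for start, end, ent_type in sorted(spans):
        # copy the untagged text that precedes this entity:
        pieces.append(text[position:start])
        # then add the entity itself, wrapped in its tags:
        pieces.append(f"<{ent_type}>{text[start:end]}</{ent_type}>")
        position = end
    # copy whatever text is left after the last entity:
    pieces.append(text[position:])
    return "".join(pieces)

print(tag_text(sample_text, sample_spans))
```

For a stanza document, the spans could be built with `[(e.start_char, e.end_char, e.type) for e in doc.entities]`. Unlike the character loop, this version also closes the tag of an entity that ends on the very last character of the text.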

Take a look at the tagged text and check whether stanza made mistakes (focus on the place names). Did it miss any place names? Did it tag some place names as a different entity type?

Write down the mistakes you see here:

Khan Younis => tagged as PERSON, should be GPE

We could come up with ways to fix errors like these.

One option would be to create a dictionary of known errors,
so that when we loop through the entities, we can fix them:

In [None]:
# manually create a dictionary of known errors
# (keys: entity name, values: corrected entity type)
known_errors = {
    "Khan Younis": "GPE",
    "the al-Maghazi": "GPE",
}

for e in doc.entities:
  # fix the entity type if the entity's name is in the known errors dictionary:
  if e.text in known_errors:
    e.type = known_errors[e.text]
  # then print the relevant entities:
  if e.type in ["GPE", "LOC"]:
    print(e)
    print("------")

{
  "text": "Gaza",
  "type": "GPE",
  "start_char": 47,
  "end_char": 51
}
------
{
  "text": "Gaza’s",
  "type": "GPE",
  "start_char": 105,
  "end_char": 111
}
------
{
  "text": "Khan Younis",
  "type": "GPE",
  "start_char": 135,
  "end_char": 146
}
------
{
  "text": "Rafah",
  "type": "GPE",
  "start_char": 505,
  "end_char": 510
}
------
{
  "text": "Gaza",
  "type": "GPE",
  "start_char": 523,
  "end_char": 527
}
------
{
  "text": "Rafah",
  "type": "GPE",
  "start_char": 650,
  "end_char": 655
}
------
{
  "text": "Rafah",
  "type": "GPE",
  "start_char": 779,
  "end_char": 784
}
------
{
  "text": "the Gaza Strip",
  "type": "GPE",
  "start_char": 962,
  "end_char": 976
}
------
{
  "text": "Israel",
  "type": "GPE",
  "start_char": 998,
  "end_char": 1004
}
------
{
  "text": "Beit Hanoon",
  "type": "GPE",
  "start_char": 1353,
  "end_char": 1364
}
------
{
  "text": "Rafah",
  "type": "GPE",
  "start_char": 1389,
  "end_char": 1394
}
------
{
  "text": "Gaza",
  "type": 

### Multiple files

Since we can do this for one file, we can also do it for a large number of files!

Let's download our FASDH25 git repository here. Because we don't use Python to clone a git repository, we add an exclamation mark before the command `git` in Colab (as we did with `pip`). Complete the command below and run it:



In [None]:
# clone our FASDH25 folder here:
!git clone https://github.com/OpenITI/FASDH25.git

Cloning into 'FASDH25'...
remote: Enumerating objects: 6719, done.
remote: Counting objects: 100% (357/357), done.
remote: Compressing objects: 100% (239/239), done.
remote: Total 6719 (delta 259), reused 204 (delta 117), pack-reused 6362 (from 2)
Receiving objects: 100% (6719/6719), 25.97 MiB | 15.74 MiB/s, done.
Resolving deltas: 100% (4455/4455), done.


We can now loop through the articles in the folder as we did when we were using regex to find filenames:

In [None]:
import os

# create an empty dictionary that will contain our places with their frequencies:
places = {}
# loop through all the files in the folder:
folder = "/content/FASDH25/python_exercises/lesson_11.1/aljazeera_articles"
for filename in os.listdir(folder):
  # create a path to the file:
  path = f"{folder}/{filename}"
  # open and read the file:
  with open(path, encoding="utf-8") as file:
    text = file.read()
    # use the nlp pipeline to analyse the text:
    doc = nlp(text)
    # select only the entities that are place names:
    for e in doc.entities:
      if e.type in ["GPE", "LOC"]:
        # add 1 to the count of the place in our dictionary
        # (and/or add the place to the dictionary if it was not there yet):
        places[e.text] = places.get(e.text, 0) +1
print(places)



{'Israel': 27, 'Gaza': 48, 'the Gaza Strip': 3, 'Maghazi': 8, 'Gaza Strip': 1, 'Bureij': 4, 'Rafah': 11, 'Bethlehem': 2, 'West Bank': 2, 'Southern Gaza': 1, 'East Jerusalem': 2, 'Beit Lahia': 1, 'Khan Younis': 4, 'Egypt': 7, 'Khan Younis city': 1, 'Hamad': 1, 'Gaza’s Jabalia': 1, 'Nuseirat': 4, 'US': 4, 'the United States': 1, 'Ashkelon': 1, 'Sderot': 1, 'Beersheba': 1, 'the West Bank': 1, 'Qatar': 3, 'Dubai': 1, 'Central Gaza Strip': 1, 'Gaza’s': 2, 'the Near East': 1, 'Jabalia': 3, 'Jenin': 1, 'Dheisheh': 1, 'Tall az-Zaatar': 1, 'al-Shifa': 1, 'al-Mawasi': 1, 'Twam': 1, 'Gaza City': 1, 'Tal al-Hawa': 1, 'Sinai': 2}
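
A dictionary keeps its items in the order in which they were added, so the most frequent places are scattered through the output. A minimal sketch of sorting by frequency follows; `sample_places` is a hypothetical excerpt of the counts above, used so that the example runs on its own.

```python
# hypothetical excerpt of the counts printed above
sample_places = {"Israel": 27, "Gaza": 48, "Rafah": 11, "Egypt": 7, "Khan Younis": 4}

# sort the (name, frequency) pairs by frequency, highest first:
sorted_places = sorted(sample_places.items(), key=lambda item: item[1], reverse=True)

# print the three most frequent places:
for name, frequency in sorted_places[:3]:
    print(name, frequency)
```

The same call works on the full `places` dictionary.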


### Storing data in a tsv file

We can now store the counts in a tsv file, so we can reuse it in a different script.

Let's create a tsv file with two columns: "name" and "frequency".
We'll create the tsv file in two steps:

1. we create the header: that is, the column names, separated by tabs
2. we loop through all the place names, and we create a new row in the table for each place. Each row will contain the place name and its frequency, separated by a tab. Each row will have to start on a new line, so we'll also have to add a newline character (`\n`) to the row; should we add it at the beginning or end of the line, or both?

Fill in the blanks:

In [None]:
filename = "ner_counts.tsv"
# open the file in writing mode and with unicode UTF-8 encoding:
with open(filename, mode= 'w', encoding= 'utf-8') as file:
  # create the header of the tsv file, which consists of the column names separated by a tab:
  header = "name\tfrequency\n"
  # write the header to the file:
  file.write(header)
  # Now, loop through the places dictionary and create a new row for each item in the dictionary
  for name, frequency in places.items():
    row = f"{name}\t{frequency}\n"
    # finally, write the row to the file:
    file.write(row)

The file is now stored in our Colab session environment. You can see it by clicking the folder icon in the left-hand toolbar in Colab. Double-click it to view it in Colab. Right-click it and choose "Download" to download the file.

To access it in your script, use the path `/content/ner_counts.tsv`:

In [None]:
with open("/content/ner_counts.tsv", encoding="utf-8") as file:
  print(file.read())

name	frequency
Israel	27
Gaza	48
the Gaza Strip	3
Maghazi	8
Gaza Strip	1
Bureij	4
Rafah	11
Bethlehem	2
West Bank	2
Southern Gaza	1
East Jerusalem	2
Beit Lahia	1
Khan Younis	4
Egypt	7
Khan Younis city	1
Hamad	1
Gaza’s Jabalia	1
Nuseirat	4
US	4
the United States	1
Ashkelon	1
Sderot	1
Beersheba	1
the West Bank	1
Qatar	3
Dubai	1
Central Gaza Strip	1
Gaza’s	2
the Near East	1
Jabalia	3
Jenin	1
Dheisheh	1
Tall az-Zaatar	1
al-Shifa	1
al-Mawasi	1
Twam	1
Gaza City	1
Tal al-Hawa	1
Sinai	2
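
To reuse the counts in another script, the file has to be turned back into a dictionary. One possible way to do that is with Python's built-in `csv` module, as sketched below. The sketch writes its own small sample file (`sample_counts.tsv`, a hypothetical name) in a temporary folder so that it runs anywhere; in Colab you would open `/content/ner_counts.tsv` instead.

```python
import csv
import os
import tempfile

# write a small sample file so that the sketch is self-contained:
sample_path = os.path.join(tempfile.mkdtemp(), "sample_counts.tsv")
with open(sample_path, mode="w", encoding="utf-8") as file:
    file.write("name\tfrequency\nIsrael\t27\nGaza\t48\nthe Gaza Strip\t3\n")

# read the file back into a dictionary:
counts = {}
with open(sample_path, encoding="utf-8", newline="") as file:
    # DictReader uses the header row as keys for each row:
    reader = csv.DictReader(file, delimiter="\t")
    for row in reader:
        # the frequency is read as a string, so convert it to an integer:
        counts[row["name"]] = int(row["frequency"])

print(counts)
```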



# Geocoding

Geocoding is the process of finding coordinates for a place.

The process uses APIs, Application Programming Interfaces,
which are internet services that are designed not for human reading
but for being called by applications.

There are many APIs that provide geocoding services. They typically have a database of place names and their coordinates. If you send a geocoding API a place name, it will return its coordinates (and perhaps some other data). Many of them are not free. In our case, we'll use the free GeoNames API to find the coordinates of our place names.

First, try it out by pasting the following URL into your browser (make sure to replace `<your_user_name>` with your GeoNames user name):

`http://api.geonames.org/searchJSON?q=Gaza&maxRows=5&username=<your_user_name>`
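
A URL like this one can also be assembled in Python. The sketch below uses `urlencode` from the standard library, which takes care of characters such as spaces that are not allowed in a URL. It only builds the URL and does not call the API; `your_user_name` is a placeholder, not a working account.

```python
from urllib.parse import urlencode

base_url = "http://api.geonames.org/searchJSON"
# the query parameters (the user name is a placeholder):
params = {"q": "Khan Younis", "maxRows": 5, "username": "your_user_name"}

# urlencode joins the parameters with & and escapes special characters:
url = base_url + "?" + urlencode(params)
print(url)
```

Note how the space in the place name is encoded as a `+` sign in the resulting URL.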

Paste the response here:

{
  "totalResultsCount": 5276,
  "geonames": [
    {
      "adminCode1": "GZ",
      "lng": "34.46672",
      "geonameId": 281133,
      "toponymName": "Gaza",
      "countryId": "6254930",
      "fcl": "P",
      "population": 410000,
      "countryCode": "PS",
      "name": "Gaza",
      "fclName": "city, village,...",
      "adminCodes1": {

      },
      "countryName": "Palestine",
      "fcodeName": "seat of a first-order administrative division",
      "adminName1": "Gaza Strip",
      "lat": "31.50161",
      "fcode": "PPLA"
    },
    {
      "adminCode1": "GZ",
      "lng": "34.48347",
      "geonameId": 281129,
      "toponymName": "Jabālyā",
      "countryId": "6254930",
      "fcl": "P",
      "population": 168568,
      "countryCode": "PS",
      "name": "Jabalia",
      "fclName": "city, village,...",
      "adminCodes1": {

      },
      "countryName": "Palestine",
      "fcodeName": "populated place",
      "adminName1": "Gaza Strip",
      "lat": "31.5272",
      "fcode": "PPL"
    },
    {
      "adminCode1": "GZ",
      "lng": "34.30627",
      "geonameId": 281124,
      "toponymName": "Khān Yūnis",
      "countryId": "6254930",
      "fcl": "P",
      "population": 173183,
      "countryCode": "PS",
      "name": "Khan Yunis",
      "fclName": "city, village,...",
      "adminCodes1": {

      },
      "countryName": "Palestine",
      "fcodeName": "seat of a second-order administrative division",
      "adminName1": "Gaza Strip",
      "lat": "31.34018",
      "fcode": "PPLA2"
    },
    {
      "adminCode1": "02",
      "lng": "33",
      "geonameId": 1046058,
      "toponymName": "Gaza Province",
      "countryId": "1036973",
      "fcl": "A",
      "population": 1422460,
      "countryCode": "MZ",
      "name": "Gaza Province",
      "fclName": "country, state, region,...",
      "adminCodes1": {
        "ISO3166_2": "G"
      },
      "countryName": "Mozambique",
      "fcodeName": "first-order administrative division",
      "adminName1": "Gaza Province",
      "lat": "-23.5",
      "fcode": "ADM1"
    },
    {
      "adminCode1": "GZ",
      "lng": "34.24357",
      "geonameId": 281102,
      "toponymName": "Rafaḩ",
      "countryId": "6254930",
      "fcl": "P",
      "population": 126305,
      "countryCode": "PS",
      "name": "Rafah",
      "fclName": "city, village,...",
      "adminCodes1": {

      },
      "countryName": "Palestine",
      "fcodeName": "seat of a second-order administrative division",
      "adminName1": "Gaza Strip",
      "lat": "31.29722",
      "fcode": "PPLA2"
    }
  ]
}
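
The response is JSON, which Python can convert into dictionaries and lists. The sketch below parses a shortened, hand-trimmed copy of the response above (only a few keys of three results are kept) and selects the results located in Palestine by their `countryCode`. This is one way to discard unwanted matches such as Gaza Province in Mozambique; `sample_response` and `matches` are hypothetical names.

```python
import json

# a hand-trimmed copy of the response above (three results, five keys each):
sample_response = """
{"totalResultsCount": 5276, "geonames": [
  {"name": "Gaza", "countryCode": "PS", "lat": "31.50161", "lng": "34.46672", "fcode": "PPLA"},
  {"name": "Gaza Province", "countryCode": "MZ", "lat": "-23.5", "lng": "33", "fcode": "ADM1"},
  {"name": "Rafah", "countryCode": "PS", "lat": "31.29722", "lng": "34.24357", "fcode": "PPLA2"}
]}
"""

# convert the JSON string into a Python dictionary:
results = json.loads(sample_response)

# keep only the results with country code PS;
# the coordinates are strings in the response, so convert them to floats:
matches = []
for place in results["geonames"]:
    if place["countryCode"] == "PS":
        matches.append((place["name"], float(place["lat"]), float(place["lng"])))

print(matches)
```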



I have created a function, `get_coordinates`, that takes a place name and your GeoNames user name as arguments and returns the coordinates. Please fill in your user name and run the code cell to make the function available:

In [None]:
import requests
import time

geonames_username = "_saharmubeen_"

def get_coordinates(place, username=geonames_username, fuzzy=0, timeout=1):
  """This function gets a single set of coordinates from the geonames API.

  Args:
    place (str): the place name
    username (str): your geonames user name
    fuzzy (int): 0 = exact matching, 1 = fuzzy matching (allow similar but not exact matches)
    timeout (int): number of seconds to wait before a call to the geonames API
      (to avoid being blocked for overloading the server)

  Returns:
    dictionary: keys: latitude, longitude (or None if no result was found)
  """
  # wait a short while, so that we don't overload the server:
  time.sleep(timeout)
  # make the API call:
  url = "http://api.geonames.org/searchJSON?"
  params = {"q": place, "username": username, "fuzzy": fuzzy, "maxRows": 1, "isNameRequired": True}
  response = requests.get(url, params=params)
  # convert the response into a dictionary:
  results = response.json()
  print(results)
  # get the first result:
  try:
    result = results["geonames"][0]
    return {"latitude": result["lat"], "longitude": result["lng"]}
  except (IndexError, KeyError):
    print("No results found for your API call", response.request.url)

Now, we can test this function on a list of place names:

In [None]:
test_names = ["Khan Younis", "Khān Yūnis", "United States", "blabla"]

# loop through the list of names and call the function:
for name in test_names:
  # call the function and assign the outcome to a variable called `coordinates`:
  coordinates = get_coordinates(name)
  # print the coordinates, only if they were found:
  if coordinates:
    print("=>", coordinates)

{'totalResultsCount': 287, 'geonames': [{'adminCode1': 'GZ', 'lng': '34.30627', 'geonameId': 281124, 'toponymName': 'Khān Yūnis', 'countryId': '6254930', 'fcl': 'P', 'population': 173183, 'countryCode': 'PS', 'name': 'Khan Yunis', 'fclName': 'city, village,...', 'adminCodes1': {}, 'countryName': 'Palestine', 'fcodeName': 'seat of a second-order administrative division', 'adminName1': 'Gaza Strip', 'lat': '31.34018', 'fcode': 'PPLA2'}]}
=> {'latitude': '31.34018', 'longitude': '34.30627'}
{'totalResultsCount': 318, 'geonames': [{'adminCode1': 'GZ', 'lng': '34.30627', 'geonameId': 281124, 'toponymName': 'Khān Yūnis', 'countryId': '6254930', 'fcl': 'P', 'population': 173183, 'countryCode': 'PS', 'name': 'Khan Yunis', 'fclName': 'city, village,...', 'adminCodes1': {}, 'countryName': 'Palestine', 'fcodeName': 'seat of a second-order administrative division', 'adminName1': 'Gaza Strip', 'lat': '31.34018', 'fcode': 'PPLA2'}]}
=> {'latitude': '31.34018', 'longitude': '34.30627'}
{'totalResults

Now, reuse the code above to get the coordinates for the place names we stored in the `ner_counts.tsv` file.

Write a new tsv file, `ner_gazetteer.tsv`, which contains three columns: name, latitude, longitude.

In [None]:
# get the place names from the tsv file (skipping the header row):
place_names = []
with open("ner_counts.tsv", encoding="utf-8") as file:
  rows = file.read().splitlines()
for row in rows[1:]:
  if row:
    # the place name is in the first column:
    place_names.append(row.split("\t")[0])

# write the coordinates to a new tsv file:
filename = "ner_gazetteer.tsv"
with open(filename, mode="w", encoding="utf-8") as file:
  header = "name\tlatitude\tlongitude\n"
  file.write(header)
  # get the coordinates for each place:
  for name in place_names:
    coordinates = get_coordinates(name)
    # write a row only if coordinates were found:
    if coordinates:
      row = f"{name}\t{coordinates['latitude']}\t{coordinates['longitude']}\n"
      file.write(row)