# Using stanza for Named Entity Recognition (continued)

## Installation

Run the code cell below to install stanza:

In [19]:
!pip install stanza



## Import library and download language model

After installing it, we import stanza into our notebook.

In [20]:
import stanza
import os

## Creating the pipeline

Download the English language model and build the pipeline (we specify that it should only tokenize the text, separate multiword tokens and perform Named Entity Recognition):


In [21]:
# Download the language model:
stanza.download("en")

# Create the pipeline, specifying the language:
nlp = stanza.Pipeline(lang="en", processors='tokenize,mwt,ner')

Downloading https://raw.githubusercontent.com/stanfordnlp/stanza-resources/main/resources_1.10.0.json:   0%|  …

INFO:stanza:Downloaded file to /root/stanza_resources/resources.json
INFO:stanza:Downloading default packages for language: en (English) ...
INFO:stanza:File exists: /root/stanza_resources/en/default.zip
INFO:stanza:Finished downloading models and saved to /root/stanza_resources
INFO:stanza:Checking for updates to resources.json in case models have been updated.  Note: this behavior can be turned off with download_method=None or download_method=DownloadMethod.REUSE_RESOURCES


Downloading https://raw.githubusercontent.com/stanfordnlp/stanza-resources/main/resources_1.10.0.json:   0%|  …

INFO:stanza:Downloaded file to /root/stanza_resources/resources.json
INFO:stanza:Loading these models for language: en (English):
| Processor | Package                   |
-----------------------------------------
| tokenize  | combined                  |
| mwt       | combined                  |
| ner       | ontonotes-ww-multi_charlm |

INFO:stanza:Using device: cpu
INFO:stanza:Loading: tokenize
INFO:stanza:Loading: mwt
INFO:stanza:Loading: ner
INFO:stanza:Done loading processors!


Create a new stanza document by passing the `article` variable to our `nlp` pipeline object. Then print each entity (the code cell two cells up can serve as a model):
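
A minimal sketch of this step, assuming `article` holds the article text loaded earlier in the notebook and `nlp` is the pipeline built above. The helper name `list_entities` is hypothetical; it relies only on the `ents` list of a stanza document, where each entity span carries a `text` and a `type` attribute.

```python
def list_entities(doc):
    """Return (text, type) pairs for every named entity in a stanza document."""
    return [(ent.text, ent.type) for ent in doc.ents]


# Usage inside the notebook (assumes `nlp` and `article` are already defined):
# doc = nlp(article)
# for text, label in list_entities(doc):
#     print(text, label)
```

Returning a list instead of printing directly makes it easier to filter the entities afterwards, for example to keep only the place-like types.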

# Geocoding

Geocoding is the process of finding coordinates for a place.

The process uses APIs (Application Programming Interfaces): internet services that are designed not for human reading but for being called by applications.

There are many APIs that provide geocoding services. They typically have a database of place names and their coordinates. If you send a geocoding API a place name, it will return its coordinates (and perhaps some other data). Many of them are not free. In our case, we'll use the free GeoNames API to find the coordinates of our place names.

First, try it out by pasting the following URL into your browser (make sure to replace `<your_user_name>` with your GeoNames user name):

`http://api.geonames.org/searchJSON?q=Gaza&maxRows=5&username=<your_user_name>`
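
The URL above is just an endpoint followed by query parameters. As a rough sketch, the same URL can be assembled in Python with the standard library. The function name `build_search_url` and the value `your_user_name` are placeholders for illustration, not part of the GeoNames API.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.geonames.org/searchJSON"


def build_search_url(place, username, max_rows=5):
    """Assemble a GeoNames search URL from its query parameters."""
    params = {"q": place, "maxRows": max_rows, "username": username}
    # urlencode escapes spaces and special characters in the values
    return BASE_URL + "?" + urlencode(params)


print(build_search_url("Gaza", "your_user_name"))
```

Building the query string with `urlencode` rather than by hand matters for names that contain spaces or diacritics, such as "Khan Yunis".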

Paste the response here:


{
  "totalResultsCount": 5276,
  "geonames": [
    {
      "adminCode1": "GZ",
      "lng": "34.46672",
      "geonameId": 281133,
      "toponymName": "Gaza",
      "countryId": "6254930",
      "fcl": "P",
      "population": 410000,
      "countryCode": "PS",
      "name": "Gaza",
      "fclName": "city, village,...",
      "adminCodes1": {

      },
      "countryName": "Palestine",
      "fcodeName": "seat of a first-order administrative division",
      "adminName1": "Gaza Strip",
      "lat": "31.50161",
      "fcode": "PPLA"
    },
    {
      "adminCode1": "GZ",
      "lng": "34.48347",
      "geonameId": 281129,
      "toponymName": "Jabālyā",
      "countryId": "6254930",
      "fcl": "P",
      "population": 168568,
      "countryCode": "PS",
      "name": "Jabalia",
      "fclName": "city, village,...",
      "adminCodes1": {

      },
      "countryName": "Palestine",
      "fcodeName": "populated place",
      "adminName1": "Gaza Strip",
      "lat": "31.5272",
      "fcode": "PPL"
    },
    {
      "adminCode1": "GZ",
      "lng": "34.30627",
      "geonameId": 281124,
      "toponymName": "Khān Yūnis",
      "countryId": "6254930",
      "fcl": "P",
      "population": 173183,
      "countryCode": "PS",
      "name": "Khan Yunis",
      "fclName": "city, village,...",
      "adminCodes1": {

      },
      "countryName": "Palestine",
      "fcodeName": "seat of a second-order administrative division",
      "adminName1": "Gaza Strip",
      "lat": "31.34018",
      "fcode": "PPLA2"
    },
    {
      "adminCode1": "02",
      "lng": "33",
      "geonameId": 1046058,
      "toponymName": "Gaza Province",
      "countryId": "1036973",
      "fcl": "A",
      "population": 1422460,
      "countryCode": "MZ",
      "name": "Gaza Province",
      "fclName": "country, state, region,...",
      "adminCodes1": {
        "ISO3166_2": "G"
      },
      "countryName": "Mozambique",
      "fcodeName": "first-order administrative division",
      "adminName1": "Gaza Province",
      "lat": "-23.5",
      "fcode": "ADM1"
    },
    {
      "adminCode1": "GZ",
      "lng": "34.24357",
      "geonameId": 281102,
      "toponymName": "Rafaḩ",
      "countryId": "6254930",
      "fcl": "P",
      "population": 126305,
      "countryCode": "PS",
      "name": "Rafah",
      "fclName": "city, village,...",
      "adminCodes1": {

      },
      "countryName": "Palestine",
      "fcodeName": "seat of a second-order administrative division",
      "adminName1": "Gaza Strip",
      "lat": "31.29722",
      "fcode": "PPLA2"
    }
  ]
}
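
To see how such a response can be read by a program, here is a small sketch that parses a trimmed copy of the JSON above (two results, four fields each). The helper name `extract_places` is hypothetical. Note that GeoNames returns `lat` and `lng` as strings, so the sketch converts them to floats.

```python
import json

# Trimmed copy of the response above: two of the five results, four fields each.
sample_response = """
{
  "totalResultsCount": 5276,
  "geonames": [
    {"name": "Gaza", "countryCode": "PS", "lat": "31.50161", "lng": "34.46672"},
    {"name": "Gaza Province", "countryCode": "MZ", "lat": "-23.5", "lng": "33"}
  ]
}
"""


def extract_places(response_text):
    """Return (name, country code, latitude, longitude) for each result."""
    data = json.loads(response_text)
    return [
        (r["name"], r["countryCode"], float(r["lat"]), float(r["lng"]))
        for r in data.get("geonames", [])
    ]


for row in extract_places(sample_response):
    print(row)
```

The two rows also show why geocoding results need checking: the same query matches a city in Palestine and a province in Mozambique.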

I have created a function, `get_coordinates`, that takes a place name and your GeoNames user name as arguments and returns the coordinates. Please fill in your user name and run the code cell to make the function available:

In [22]:
import requests
import time

geonames_username = "shah_zaib"

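def get_coordinates(place, username=geonames_username, fuzzy=0, timeout=1):
  """Return the coordinates of the top GeoNames search result for `place`.

  `timeout` is the pause (in seconds) before each request, not a request timeout.
  If nothing is found, both coordinates are returned as "NA".
  """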
  time.sleep(timeout)  # avoid overloading the API
  url = "http://api.geonames.org/searchJSON?"
  params = {"q": place, "username": username, "fuzzy": fuzzy, "maxRows": 1, "isNameRequired": True}
  response = requests.get(url, params=params)
  results = response.json()
  try:
    result = results["geonames"][0]
    return {"latitude": result["lat"], "longitude": result["lng"]}
  except (IndexError, KeyError):
    print("No results found for:", place)
    return {"latitude": "NA", "longitude": "NA"}
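
Because the function pauses before every request, it can help to look up each distinct place name only once. The following is a sketch of that idea, not part of the original function: `geocode_unique` and `fake_geocoder` are hypothetical names, and the stand-in geocoder exists only so the example runs without network access.

```python
def geocode_unique(names, geocoder):
    """Call `geocoder` once per distinct name and return {name: coordinates}."""
    cache = {}
    for name in names:
        if name not in cache:
            cache[name] = geocoder(name)
    return cache


# Stand-in geocoder for demonstration only; in the notebook you would pass
# get_coordinates instead.
def fake_geocoder(place):
    fake_geocoder.calls += 1
    return {"latitude": "NA", "longitude": "NA"}


fake_geocoder.calls = 0

result = geocode_unique(["Gaza", "Israel", "Gaza"], fake_geocoder)
print(len(result), fake_geocoder.calls)  # prints: 2 2
```

In the notebook, `geocode_unique(place_names, get_coordinates)` would issue one request per distinct name, where `place_names` is whatever list of names you have collected.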

Now, reuse the function above to get the coordinates for the place names we stored in the `ner_counts.tsv` file.

Write a new TSV file, `ner_gazetteer.tsv`, which contains three columns: name, latitude, longitude.

In [None]:
# Read the place names from the ner_counts.tsv file
places = []

with open("ner_counts.tsv", 'r', encoding="utf-8") as file:
    lines = file.readlines()

# Locate the 'Place' column in the header row
header = lines[0].strip().split('\t')
place_index = header.index('Place')

for line in lines[1:]:
    columns = line.strip().split('\t')
    if len(columns) > place_index:
        places.append(columns[place_index])

# Geocode each place and store the results
coordinates_data = []
for place_name in places:
    coordinates = get_coordinates(place_name)
    coordinates_data.append({'Place': place_name, 'Latitude': coordinates['latitude'], 'Longitude': coordinates['longitude']})
    print(f"{place_name}: {coordinates['latitude']}, {coordinates['longitude']}")

# Write the results to a new TSV file (file name as given in the instructions above)
filename = "ner_gazetteer.tsv"
with open(filename, 'w', encoding="utf-8") as file:
    file.write('Place\tLatitude\tLongitude\n')
    for row in coordinates_data:
        file.write(f"{row['Place']}\t{row['Latitude']}\t{row['Longitude']}\n")








Israel: 31.5, 34.75
Gaza: 31.50161, 34.46672
Palestine: 31.92157, 35.20329
the United States: 34.31336, -79.63717
No results found for: Welch’s
Welch’s: NA, NA
US: 49.1, 1.96667
Iraq: 33, 44
United States: 39.76, -98.5
West: 42.77268, -116.9579
the Global South: 40.04866, -90.20842
Qatar: 25.5, 51.25
Gulf: -7.7246, 145.08545
Egypt: 27, 30
East Jerusalem: 31.78336, 35.23388
No results found for: Netanyahu’s
Netanyahu’s: NA, NA
Gaza Strip: 31.50161, 34.46672
the Gaza Strip: -26.25, 28
South Africa: 26.5, -81
Russia: 60, 100
Ukraine: 49, 32
China: 35, 105
South Africa’s: 26.5, -81
Malaysia: 2.5, 112.5
Turkey: 39, 35
Jordan: 31, 36
Bolivia: -17, -65
Maldives: 3.2, 73
Namibia: -22, 17
Pakistan: 30, 70
Columbia: 4, -73.25
Khan Younis: 31.34018, 34.30627
Middle East: 33.13906, 35.85869
The Hague: 52.07667, 4.29861
Bangladesh: 24, 90
Comoros: -12.23333, 44.44553
Djibouti: 11.83333, 42.5
Netherlands: 52.25, 5.75
The United States: 34.31336, -79.63717
The United Kingdom: 57.27694, -133.65278
Mya