# Introduction

Named Entity Recognition (NER) is a natural language processing (NLP) technique used to identify and classify key information (entities) in text into predefined categories such as names of people, organizations, locations, dates, and more. In our project, we apply NER for spatial tagging—extracting spatially relevant terms from policy documents to support geographic analysis and planning.

We selected NER as our primary approach because many existing NER frameworks already provide built-in support for recognizing geographical entities such as cities, countries, and locations. From this base, we can extend the models by adding domain-specific entities relevant to planning and zoning, making NER a natural fit for our spatial text analysis needs.


For a comprehensive overview of NER, see [A Comprehensive Guide to Named Entity Recognition](https://www.turing.com/kb/a-comprehensive-guide-to-named-entity-recognition).

We experimented with multiple NLP libraries to implement NER:

# spaCy

## *Built-In Model*

We first tested spaCy’s built-in NER model to evaluate its baseline performance on our documents.

spaCy’s built-in English NER model recognizes the following entity types:

`CARDINAL`, `DATE`, `EVENT`, `FAC`, `GPE`, `LANGUAGE`, `LAW`, `LOC`, `MONEY`, `NORP`, `ORDINAL`, `ORG`, `PERCENT`, `PERSON`, `PRODUCT`, `QUANTITY`, `TIME`, `WORK_OF_ART`

[More detail in the spaCy model documentation](https://spacy.io/models/en)

From spaCy’s recognized entity types, we focused on those relevant to spatial tagging:
* `GPE` (Geo-Political Entity): Cities, countries, regions
* `LOC` (Location): Non-political locations such as mountains or bodies of water
* `ORG` (Organization): Named businesses, agencies, and institutions
* `FAC` (Facility): Physical structures, buildings, and infrastructure
* `MISC` (Miscellaneous): A placeholder of our own for entities that do not fit the categories above; it is not among the spaCy labels listed earlier


### Implementation Steps

1. Import the necessary libraries
2. Load the extracted policy documents and spaCy’s pretrained NER model
3. Define the entity extraction function
4. Apply the function to each policy
5. Save the results to CSV


#### Step 1: Import Libraries

In [None]:
import pandas as pd
import spacy
import csv

We import the necessary libraries:

* `pandas` for handling tabular data
* `spacy` for performing Named Entity Recognition (NER)
* `csv` for writing output data to a CSV file

#### Step 2: Load Input Data and Pre-trained NER Model

In [None]:
policies_df = pd.read_csv('napa_policies_cleaned.csv') # load in csv of napa gp policies
ner = spacy.load('en_core_web_sm') # load pre-trained NER model

* Load the Napa policy data from a cleaned CSV file into a pandas DataFrame.

* Load spaCy’s **pre-trained English NER model** (`en_core_web_sm`) for entity extraction.

#### Step 3: Define NER Extraction Function

In [None]:
def extract_ner(policies):
    """Extract spatial entities from one policy text, grouped by label."""

    doc = ner(policies) # run the spaCy pipeline on the policy text

    # entity dictionary
    entities = {
        "GPE": [], # geo-political
        "LOC": [], # locations
        "ORG": [], # organizations
        "FAC": [], # facilities
        "MISC": [] # placeholder; not populated below
    }

    # loop through entities and keep the spatial ones
    for ent in doc.ents:
        if ent.label_ == "GPE":
            entities["GPE"].append(ent.text)
        elif ent.label_ == "LOC":
            entities["LOC"].append(ent.text)
        elif ent.label_ == "ORG":
            entities["ORG"].append(ent.text)
        elif ent.label_ == "FAC":
            entities["FAC"].append(ent.text)

    # return the entities grouped by label
    return entities

This function:

* Takes a policy (string) as input.

* Uses spaCy to extract named entities from the text.

* Filters for entity types we care about:

  * `GPE`: Geopolitical entities (cities, countries, states)
  * `LOC`: Locations (non-political, like "the river")
  * `ORG`: Organizations (agencies, companies)
  * `FAC`: Facilities (buildings, infrastructure)
  * `MISC`: Placeholder for unexpected entities; the function as written never populates it

* Returns a dictionary of extracted entities for each category.
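The same filtering can be written without the `if`/`elif` chain. This is a minimal sketch, assuming the entities have already been reduced to `(text, label)` pairs (in the notebook these would come from `ent.text` and `ent.label_`). The name `group_entities` and the sample pairs are illustrative, not part of the original code.

```python
# Labels kept for spatial tagging; MISC stays as an empty placeholder,
# mirroring the dictionary built in extract_ner.
SPATIAL_LABELS = ("GPE", "LOC", "ORG", "FAC")


def group_entities(pairs):
    """Group (text, label) pairs by label, keeping only spatial labels."""
    grouped = {label: [] for label in SPATIAL_LABELS}
    grouped["MISC"] = []
    for text, label in pairs:
        if label in SPATIAL_LABELS:
            grouped[label].append(text)
    return grouped


# Hypothetical pairs standing in for (ent.text, ent.label_) from doc.ents
sample = [("Napa County", "GPE"), ("Lake Berryessa", "LOC"), ("1990", "DATE")]
result = group_entities(sample)
print(result)
```

Keeping the label set in one tuple means a new entity type can be added in a single place.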

#### Step 4: Apply NER Function to Each Policy

In [None]:
data = policies_df['Policy'].apply(extract_ner) # apply function to each policy

#### Step 5: Save Results to CSV

In [None]:
# Write the extracted entities to a CSV file

# Prepare the CSV file name
policies_csv = 'napa_policies_entities.csv'

# Open the file for writing (UTF-8, since policy text may contain non-ASCII characters)
with open(policies_csv, mode='w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)

    # Write the header row: Column names for entity types
    header = ['Policy', 'GPE', 'LOC', 'ORG', 'FAC', 'MISC']
    writer.writerow(header)

    # Write each row of data
    for policy, entities in zip(policies_df['Policy'], data):
        row = [
            policy,
            ', '.join(entities['GPE']),  # Join entity lists into a single string
            ', '.join(entities['LOC']),
            ', '.join(entities['ORG']),
            ', '.join(entities['FAC']),
            ', '.join(entities['MISC'])
        ]
        writer.writerow(row)

The output file `napa_policies_entities.csv` will contain one row per policy, with separate columns showing the extracted named entities under each category.
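Because each entity list is joined with `', '`, reading the file back means splitting on the same separator. This is a minimal sketch of that round trip, using an in-memory buffer and made-up rows in place of the real file. The split is only approximate if an entity itself contains a comma.

```python
import csv
import io

COLUMNS = ["GPE", "LOC", "ORG", "FAC", "MISC"]


def split_entities(cell):
    """Turn a ', '-joined cell back into a list (empty cell -> empty list)."""
    return cell.split(", ") if cell else []


# Hypothetical content in the same layout as the output CSV
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["Policy"] + COLUMNS)
writer.writerow(["Example policy text", "Napa, Napa County", "", "LAFCO", "", ""])

# Read the rows back and rebuild the per-label lists
buffer.seek(0)
rows = []
for record in csv.DictReader(buffer):
    rows.append({col: split_entities(record[col]) for col in COLUMNS})

print(rows[0]["GPE"])
```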

The built-in spaCy model identified many relevant spatial entities but missed domain-specific spatial terms like zoning classifications, land use types, area measurements, and housing categories, which are critical for our analysis.

## *Custom Model*

To better capture domain-specific spatial terms, we created a custom NER model by extending the entity types and manually labeling a dataset tailored to our documents.

We added the following entity types:
* `AGRICULTURE`: Agricultural areas, practices, or terms
* `RESIDENTIAL`: Residential designations, zones, or related terms
* `OPEN_SPACE`: Parks, undeveloped land, or designated open space
* `ZONING`: Zoning classifications or codes
* `LANDUSE`: General land use categories (e.g., agricultural, residential, industrial)
* `FACILITY_STATUS`: Operational or functional status of facilities
* `HOUSING`: Terms related to housing types, development, or density
* `MAP_SOURCE`: References to maps, planning diagrams, or cartographic sources
* `PLANNING_AREA`: Named districts, planning zones, or defined geographic units
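With this many custom labels, it helps to confirm that an annotation file only uses labels from the intended set. This is a minimal sketch, assuming annotations follow a `{"text": ..., "entities": [{"label": ...}]}` layout. The sample records and the helper `count_labels` are hypothetical.

```python
from collections import Counter

INTENDED_LABELS = {
    "AGRICULTURE", "RESIDENTIAL", "OPEN_SPACE", "ZONING", "LANDUSE",
    "FACILITY_STATUS", "HOUSING", "MAP_SOURCE", "PLANNING_AREA",
}


def count_labels(records):
    """Count how often each label appears across all annotated records."""
    return Counter(ent["label"] for rec in records for ent in rec["entities"])


# Hypothetical records; real ones would be loaded from the annotation file
records = [
    {"text": "...", "entities": [{"label": "AGRICULTURE"}, {"label": "ZONING"}]},
    {"text": "...", "entities": [{"label": "AGRICULTURE"}, {"label": "GEO"}]},
]

counts = count_labels(records)
unexpected = sorted(set(counts) - INTENDED_LABELS)
print(counts)
print("Labels outside the intended set:", unexpected)
```

A check like this also shows which labels have very few examples, which is useful before training.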

### Dataset Preparation

We built a custom labeled dataset using:
* Primary source: We used extracted scorecard policies, many of which already included underlined terms that pointed to potentially relevant spatial references.
* Limitations: While these underlined elements provided a helpful starting point, the markup was inconsistent and not always reliable, requiring further refinement.
* Manual annotation: To address this, we manually labeled additional entities across the dataset to ensure coverage of the new spatial categories. This step significantly improved entity recognition quality and consistency.

The initial version of the dataset focuses on agricultural-related content, allowing us to evaluate model performance in a specific and controlled domain before expanding to broader planning topics.


Here is the dataset as a JSON file:

In [None]:
{
    "training_data": [
      {
        "text": "Policy AG/LU-1: Agriculture and related activities are the primary land uses in Napa County",
        "entities": [
          { "start": 16, "end": 50, "label": "AGRICULTURE" },
          { "start": 77, "end": 91, "label": "GEO" }
        ]
      },
      {
        "text": "Policy AG/LU-3: The County’s planning concepts and zoning standards shall be designed to minimize conflicts arising from encroachment of urban uses into agricultural areas. Land in proximity to existing urbanized areas currently in mixed agricultural and rural residential uses will be treated as buffer areas and further parcelization of these areas will be discouraged.",
        "entities": [
          { "start": 119, "end": 131, "label": "RESIDENTIAL" },
          { "start": 137, "end": 156, "label": "AGRICULTURE" },
          { "start": 173, "end": 204, "label": "RESIDENTIAL" },
          { "start": 218, "end": 242, "label": "AGRICULTURE" },
          { "start": 247, "end": 271, "label": "RESIDENTIAL" }
        ]
      },
      {
        "text": "Policy AG/LU-4: The County will reserve agricultural lands for agricultural use including lands used for grazing and watershed/open space, except for those lands which are shown on the Land Use Map as planned for urban development.",
        "entities": [
          { "start": 33, "end": 52, "label": "AGRICULTURE" },
          { "start": 57, "end": 77, "label": "AGRICULTURE" },
          { "start": 96, "end": 111, "label": "AGRICULTURE" },
          { "start": 116, "end": 138, "label": "OPEN_SPACE" },
          { "start": 199, "end": 217, "label": "RESIDENTIAL" }
        ]
      },
      {
        "text": "Policy AG/LU-7: The County will research, evaluate, and pursue new approaches to ensure ever stronger protections for the County’s finite and irreplaceable agricultural resources. Approaches to be evaluated shall include implementation of a “Super Williamson Act” program, a conservation easement program or other permanent protections, and programs promoting the economic viability of agriculture.",
        "entities": [
          { "start": 118, "end": 178, "label": "AGRICULTURE" }
        ]
      },
      {
        "text": "Policy AG/LU-8: The County’s minimum agricultural parcel sizes shall ensure that agricultural areas can be maintained as economic units.",
        "entities": [
          { "start": 32, "end": 54, "label": "AGRICULTURE" },
          { "start": 81, "end": 99, "label": "AGRICULTURE" }
        ]
      },
      {
        "text": "Policy AG/LU-9: The County shall evaluate discretionary development projects, re-zonings, and public projects to determine their potential for impacts on farmlands mapped by the State Farmland Mapping and Monitoring Program, while recognizing that the state’s farmland terminology and definitions are not always the most relevant to Napa County, and shall avoid converting farmland where feasible. Where conversion of farmlands mapped by the state cannot be avoided, the County shall require long-term preservation of one acre of existing farm land of equal or higher quality for each acre of state-designated farmland that would be converted to nonagricultural uses. This protection may consist of establishment of farmland easements or other similar mechanism, and the farmland to be preserved shall be located within the County and preserved prior to the proposed conversion. The County shall recommend this measure for implementation by the cities and town and LAFCO as part of annexations involving state-designated farmlands.",
        "entities": [
          { "start": 17, "end": 56, "label": "AGRICULTURE" },
          { "start": 83, "end": 112, "label": "AGRICULTURE" },
          { "start": 116, "end": 134, "label": "AGRICULTURE" },
          { "start": 154, "end": 223, "label": "AGRICULTURE" }
        ]
      },
      {
        "text": "Policy AG/LU-10: New wineries and other agricultural processing facilities as well as expansions of existing wineries and facilities in agricultural areas should be designed to convey their permanence and attractiveness.",
        "entities": [
          { "start": 17, "end": 154, "label": "LANDUSE" }
        ]
      },
      {
        "text": "Policy AG/LU-11: Agricultural employee housing shall be permitted in agricultural zoning districts in conformance with state law. Seasonal farm labor housing may be provided in agricultural areas without regard to the location of farm employment in Napa County when the housing is under local public agency ownership or control.",
        "entities": [
          { "start": 69, "end": 88, "label": "ZONING" }
        ]
      },
      {
        "text": "Policy AG/LU-13: The 1990 Winery Definition Ordinance, recognized certain pre-existing wineries and winery uses as well as new wineries. For wineries approved after the effective date of that ordinance, agricultural processing includes tours and tastings by appointment only, retail sales of wine produced by or for the winery partially or totally from Napa County grapes, retail sale of wine-related items, activities for the education and development of consumers and members of the wine trade with respect to wine produced by or at the winery, and limited non-commercial food service. The later activity may include winefood pairings. All tours and tastings, retail sales, marketing activities, and noncommercial food service must be accessory to the principal use of the facility as an agricultural processing facility. Nothing in this policy shall alter the definition of “agriculture” set forth in Policy AG/LU-2",
        "entities": [
          { "start": 141, "end": 201, "label": "FACILITY_STATUS" }
        ]
      },
      {
        "text": "Policy AG/LU-14: The same location, design, and other considerations applied to wineries shall apply to all other food processing businesses or industrial uses located in agricultural areas.",
        "entities": [
          { "start": 104, "end": 189, "label": "LANDUSE" }
        ]
      },
      {
        "text": "Policy AG/LU-17: The County encourages active, sustainable forest management practices, including timely harvesting to preserve existing forests, retaining their health, product, and value. The County also encourages timber plantations for fuel wood and lumber production. (For more policies related to the managed production of resources and forest management practices, please see the Conservation Element.",
        "entities": [
          { "start": 128, "end": 144, "label": "LANDUSE" }
        ]
      },
      {
        "text": "Policy AG/LU-18: Timber production areas in the County shall be considered to be those defined in the most recent adopted mapping available from CAL FIRE unless local areas are defined through a public planning process.",
        "entities": [
          { "start": 102, "end": 148, "label": "MAP_SOURCE" },
          { "start": 161, "end": 184, "label": "PLANNING_AREA" }
        ]
      },
      {
        "text": "Policy AG/LU-20: The following standards shall apply to lands designated as Agriculture, Watershed, and Open Space on the Land Use Map of this General Plan. Intent: To provide areas where the predominant use is agriculturally oriented; where watersheds are protected and enhanced; where reservoirs, floodplain tributaries, geologic hazards, soil conditions, and other constraints make the land relatively unsuitable for urban development; where urban development would adversely impact all such uses; and where the protection of agriculture, watersheds, and floodplain tributaries from fire, pollution, and erosion is essential to the general health, safety, and welfare. General Uses: Agriculture, processing of agricultural products, single-family dwellings. Minimum Parcel Size: 160 acres, except that parcels with a minimum size of 2 acres may be created for the sole purpose of developing farm labor camps by a local government agency authorized to own or operate farm labor camps, so long as the division is accomplished by securing the written consent of a local government agency authorized to own or operate farm labor camps that it will accept a conveyance of the fee interest of the parcel to be created and thereafter conveying the fee interest of such parcel directly to said local government agency, or entering into a long-term lease of such parcels directly with said local government agency, or entering into a long-term lease of such parcels directly with said local government agency. Every lease or deed creating such parcels must contain language ensuring that if the parcel is not used as a farm labor camp within three years of the conveyance or lease being executed or permanently ceases to be used as a farm labor camp by a local government agency authorized to develop farm labor camps, the parcel will automatically revert to, and merge into, the original parent parcel. Maximum Building Intensity: One dwelling per parcel (except as specified in the Housing Element). Nonresidential building intensity is non-applicable. Pursuant to Measure Z (1996), the sale to the public of agricultural produce, fruits, vegetables, and Christmas trees, grown on or off premises, and items related thereto, as well as the recreation and educational uses by children of animals, such as children’s pony rides and petting zoos, and construction of buildings to accommodate such sales and animals shall be permitted on any parcel designated as agricultural produce stand combination district. (See Policy AG/LU-132.)",
        "entities": [
          { "start": 56, "end": 155, "label": "LANDUSE" }
        ]
      },
      {
        "text": "Policy AG/LU-20.5: New public safety facilities shall be located within existing urbanized (i.e. nonagricultural) areas of the County and the County shall require site-specific analysis of new public safety facilities prior to their construction.",
        "entities": [
          { "start": 18, "end": 112, "label": "LANDUSE" }
        ]
      },
      {
        "text": "Policy AG/LU-21: The following standards shall apply to lands designated as Agricultural Resource on the Land Use Map of this General Plan. Intent: To identify areas in the fertile valley and foothill areas of the county in which agriculture is and should continue to be the predominant land use, where uses incompatible with agriculture should be precluded, and where the development of urban type uses would be detrimental to the continuance of agriculture and the maintenance of open space which are economic and aesthetic attributes and assets of the County of Napa. General Uses: Agriculture, processing of agricultural products, single-family dwellings. Minimum Parcel Size: 40 acres, except that parcels with a minimum size of 2 acres may be created for the sole purpose of developing farm labor camps by a local government agency authorized to own or operate farm labor camps, so long as the division is accomplished by securing the written consent of a local government agency authorized to own or operate farm labor camps that it will accept a conveyance of the fee interest of the parcel to be created and thereafter conveying the fee interest of such parcel directly to said local government agency, or entering into a long-term lease of such parcels directly with said local government agency, or entering into a long-term lease of such parcels directly with said local government agency. Every lease or deed creating such parcels must contain language ensuring that if the parcel is not used as a farm labor camp within three years of the conveyance or lease being executed or permanently ceases to be used as a farm labor camp by a local government agency authorized to develop farm labor camps, the parcel will automatically revert to, and merge into, the original parent parcel. Maximum Building Intensity: One dwelling per parcel (except as specified in the Housing Element). Nonresidential building intensity is non-applicable. Pursuant to Measure Z (1996), the sale to the public of agricultural produce, fruits, vegetables, and Christmas trees, grown on or off premises, and items related thereto, as well as the recreation and educational uses by children of animals, such as children’s pony rides and petting zoos, and construction of buildings to accommodate such sales and animals shall be permitted on any parcel designated as agricultural produce stand combination district.",
        "entities": [
          { "start": 56, "end": 137, "label": "LANDUSE" }
        ]
      },
      {
        "text": "Policy AG/LU-22: Urban uses shall be concentrated in the incorporated cities and town and designated urbanized areas of the unincorporated County in order to preserve agriculture and open space, encourage transit-oriented development, conserve energy, and provide for healthy, “walkable” communities.",
        "entities": [
            { "start": 50, "end": 145, "label": "LANDUSE" }
        ]
      },
      {
        "text": "Policy AG/LU-23: Consistent with longstanding practice and “smart growth” principles, the County will enact and enforce regulations that will encourage the concentration of residential growth within the County’s existing cities and town and urbanized areas designated on the Land Use Map",
        "entities": [
            { "start": 192, "end": 287, "label": "LANDUSE" }
        ]
      },
      {
        "text": "Policy AG/LU-24: Commercial uses will be grouped in areas outside of those designated for agricultural uses in the General Plan (subject to exceptions contained in Policies AG/LU-43 through 45 of this General Plan).",
        "entities": [
            { "start": 52, "end": 127, "label": "LANDUSE" }
        ]
      },
      {
        "text": "Policy AG/LU-25: The County opposes the creation of new special districts planned to accommodate new residential developments outside existing urbanized areas, except as specified in the Housing Element or as permitted within the Napa Pipe Mixed Use designation.",
        "entities": [
            { "start": 126, "end": 158, "label": "LANDUSE" },
            { "start": 160, "end": 202, "label": "HOUSING" },
            { "start": 219, "end": 261, "label": "LANDUSE" }
        ]
      },
      {
        "text": "Policy AG/LU-26: The County will discourage proposed urban developments which require urban services outside of existing urbanized areas. However, nothing in this Agricultural Preservation and Land Use Element is intended to preclude the construction of a single-family residence, on an existing, vacant, legal parcel of land in compliance with adopted County ordinances and other applicable regulations, except on designated park land. Pursuant to State law, small child care centers are considered residential uses. Where maximum dwelling unit densities are specified in this General Plan, the population density is determined by multiplying the allowable number of dwelling units times the average persons per household in the unincorporated County as determined by the most recent U.S. Census",
        "entities": [
            { "start": 101, "end": 136, "label": "LANDUSE" }
        ]
      },
      {
        "text": "Policy AG/LU-43: Lands along the west bank of the Napa River south of the City of Napa and specific urban areas within four miles of the high water mark of Lake Berryessa are appropriate areas for marine commercial zoning and development. Action Item AG/LU 43.1: Consider amendments to the Zoning Code to allow additional commercial, residential, and mixed uses in the areas currently zoned for commercial use in the Spanish Flat, Moskowite Corners, and southern Pope Creek areas in order to complement recreation activities at Lake Berryessa.",
        "entities": [
          { "start": 30, "end": 39, "label": "GEO_FEATURE" },
          { "start": 47, "end": 57, "label": "GEO_FEATURE" },
          { "start": 63, "end": 85, "label": "PLACE_SPECIFIC" },
          { "start": 110, "end": 124, "label": "MEASURE" },
          { "start": 132, "end": 149, "label": "MEASURE" },
          { "start": 153, "end": 168, "label": "PLACE_SPECIFIC" },
          { "start": 207, "end": 234, "label": "ZONING" },
          { "start": 286, "end": 297, "label": "ZONING" },
          { "start": 309, "end": 350, "label": "LANDUSE" },
          { "start": 397, "end": 410, "label": "PLACE_SPECIFIC" },
          { "start": 412, "end": 431, "label": "PLACE_SPECIFIC" },
          { "start": 441, "end": 461, "label": "PLACE_SPECIFIC" },
          { "start": 540, "end": 555, "label": "PLACE_SPECIFIC" }
        ]
      }


    ]
  }

Each record pairs a policy's text with a list of entities. Every entity gives the character offsets (`start`, `end`) of an underlined part of the policy and the label assigned to it.
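Since the offsets are entered by hand, it is worth printing the text each span actually covers before training. This is a minimal sketch using the first record of the dataset; `spans_for` is a hypothetical helper name.

```python
def spans_for(record):
    """Return (label, covered_text) for each entity, checking offset bounds."""
    text = record["text"]
    covered = []
    for ent in record["entities"]:
        start, end = ent["start"], ent["end"]
        if not 0 <= start < end <= len(text):
            raise ValueError(f"Offsets out of range: {start}-{end}")
        covered.append((ent["label"], text[start:end]))
    return covered


# First record of the dataset, copied as-is
record = {
    "text": "Policy AG/LU-1: Agriculture and related activities are the "
            "primary land uses in Napa County",
    "entities": [
        {"start": 16, "end": 50, "label": "AGRICULTURE"},
        {"start": 77, "end": 91, "label": "GEO"},
    ],
}

checked = spans_for(record)
for label, covered_text in checked:
    print(f"{label}: {covered_text!r}")
```

Spans that begin or end in the middle of a word, or that include extra words, are worth correcting at this stage.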

### Implementation Steps

First, load the JSON dataset and convert it to spaCy's binary training format, saved as `training_data.spacy`.

In [None]:
import json
import spacy
from spacy.tokens import DocBin
from tqdm import tqdm
from spacy.util import filter_spans

# Load your JSON data
with open('napagp_training.json', 'r') as f:
    data = json.load(f)

# Prepare the structure for training data
training_data = {'annotations': []}

for example in data['training_data']:
    temp_dict = {}
    temp_dict['text'] = example['text']
    temp_dict['entities'] = []

    for entity in example['entities']:
        start = entity['start']
        end = entity['end']
        label = entity['label'].upper()
        temp_dict['entities'].append((start, end, label))

    training_data['annotations'].append(temp_dict)

# Load pretrained model
nlp = spacy.load("en_core_web_sm")

# Add custom labels to the NER component
if "ner" not in nlp.pipe_names:
    ner = nlp.add_pipe("ner")
else:
    ner = nlp.get_pipe("ner")

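# Custom labels, including every label used in the JSON annotations
custom_labels = [
    'LANDUSE', 'AGRICULTURE', 'GEO', 'RESIDENTIAL', 'OPEN_SPACE',
    'PLANNING_AREA', 'ZONING', 'FACILITY_STATUS', 'MAP_SOURCE',
    'HOUSING', 'DIRECTION', 'MEASURE', 'GEO_FEATURE', 'PLACE_SPECIFIC'
]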

for label in custom_labels:
    ner.add_label(label)

# Convert to spaCy DocBin format
doc_bin = DocBin()

for training_example in tqdm(training_data['annotations']):
    text = training_example['text']
    labels = training_example['entities']
    doc = nlp.make_doc(text)
    ents = []

    for start, end, label in labels:
        span = doc.char_span(start, end, label=label, alignment_mode="contract")
        if span is None:
            print(f"Skipping invalid span for text: '{text[start:end]}' (start: {start}, end: {end})")
        else:
            ents.append(span)

    doc.ents = filter_spans(ents)
    doc_bin.add(doc)

# Save the training data
doc_bin.to_disk("training_data.spacy")
print("Training data saved as training_data.spacy")
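The training command in the next step expects separate training and development files, while the conversion code writes a single file. One way to prepare both is to split the annotated examples before building a `DocBin` for each part. This is a minimal sketch, assuming an 80/20 split and a fixed seed. Both are illustrative choices, and `split_examples` is a hypothetical helper.

```python
import random


def split_examples(examples, dev_fraction=0.2, seed=42):
    """Shuffle a copy of the examples and split into (train, dev) lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_dev = max(1, int(len(shuffled) * dev_fraction))
    return shuffled[n_dev:], shuffled[:n_dev]


# Stand-in for training_data['annotations']
examples = [{"text": f"policy {i}", "entities": []} for i in range(10)]
train_examples, dev_examples = split_examples(examples)
print(len(train_examples), "train /", len(dev_examples), "dev")
```

Each list can then go through the same conversion loop and be saved under its own file name.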


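Second, train an NER model on the new spaCy file from the command line. The training path must point to the file saved in the previous step, and `dev.spacy` is a held-out development set that has to be prepared the same way.

In [None]:
!python -m spacy init config config.cfg --lang en --pipeline ner
!python -m spacy train config.cfg --output ./output --paths.train ./training_data.spacy --paths.dev ./dev.spacy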

Third, run the trained model on test data: a subset of policies the model has not seen, used to check how well it generalizes.

In [None]:
import spacy
from spacy import displacy

# Load the trained model
nlp = spacy.load('output/model-best')  # trained model's directory

# List of sample policies to test
policies = [
    "Policy AG/LU-120: The County shall work with the school districts serving students in the County to coordinate the provision of school facilities in conjunction with demographic changes and student populations. The County shall also encourage incorporated jurisdictions to reserve school sites within their boundaries.",
    "Policy AG/LU-121: The County shall coordinate an exchange of information with the school districts regarding school needs and new residential developments in the unincorporated area.",
    "Policy AG/LU-122: The County shall consider school districts’ proposed school sites in relation to: a) General Plan designations. b) Geology and seismic considerations, topography, drainage, soils. c) Location and general utility of land; population distribution. d) Access, transportation facilities, utilities. e) Conflicting or hazardous conditions (e.g., noise, traffic). f) Protection of agricultural lands. The results of the review are to be forwarded to the appropriate school district board within 30 days from the receipt of the referral.",
    "Policy AG/LU-123: The County shall establish general school site location criteria such as: a) New school facilities shall not be located within two miles of an airport unless approved by the State Department of Education. b) School facilities shall, whenever practical, be located in areas designated in the appropriate general plan for urban development. c) Coordinate County plans and ordinances to be supportive of school use and to minimize the need for busing students. d) Ensure that proposals for multi-family housing or multiple-lot subdivisions within the unincorporated area are evaluated to determine their impact on schools and are modified to address potential impacts, including the need for new facilities, if any.",
    "Policy AG/LU-124: New churches or institutions providing religious instruction shall not be located within proximity to an airport, unless they are located in an area where residential uses would be compatible under the applicable Airport Land Use Compatibility Plan. June 23, 2009 Napa County General Plan AG/LU–77",
    "Policy AG/LU-125: New churches or other religious institutions should generally be located within or adjacent to urbanized areas, minimizing the transportation needs of parishioners/members and the potential for loss of agricultural lands. Action Item AG/LU-125.1: Consider amendments to the Zoning Code that would reduce the number of zoning districts in which new churches and religious institutions may be located and provide siting criteria as part of the use permit process. REGIONAL PLANNING ISSUES",
    "Policy AG/LU-126: State law charges LAFCO with planning the orderly development of local government agencies to advantageously provide for the present and future needs of the community while protecting against the inappropriate conversion of agricultural and open space lands. A principal planning responsibility of LAFCO is to determine a sphere of influence for each city and special district under its jurisdiction. State law defines a sphere of influence as 'a plan for the probably physical boundaries and service area of a local agency, as determined by' LAFCO. LAFCO is required to review and update, as necessary, each agency’s sphere of influence every five years, and the County will work collaboratively with LAFCO in its reviews of spheres to encourage orderly, city-centered growth and development in Napa County and the preservation of agricultural land. Policy AG/LU-126.5: The County seeks to engage incorporated jurisdictions and other agencies in collaborative planning efforts, particularly efforts aimed at ensuring adequate infrastructure capacity, vibrant city-centers, sufficient housing and agricultural lands and natural resource protection.",
    "Policy AG/LU-127: The County will coordinate with the cities and town to establish land use policies for unincorporated lands located within their respective spheres of influence and will do likewise for unincorporated lands within any locally-adopted urban growth boundaries.",
    "Policy AG/LU-128: The County recognizes the urban limit line or Rural Urban Limit (RUL) established for the City of Napa (See Figure LU-4), and agrees that unincorporated land located within the RUL will not be further urbanized without annexation to the City. For purposes of this policy only, engaging in uses that are permitted in the applicable zoning district without the issuance of a use permit shall not be considered urbanizing. In all cases, subdividing property shall be deemed urbanizing for purposes of this policy."
]

# Process and visualize each policy
for policy in policies:
    print(f"Visualizing: {policy[:60]}...")  # Show part of the policy text for reference
    doc = nlp(policy)

    # Visualize the named entities
    displacy.render(doc, style='ent', page=True)
    print("-" * 50)  # Separator between policies

The rendered output highlights each entity the model captured in the policy text, together with its label.
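Visual inspection can be complemented with a simple score when gold annotations exist for the test policies. This is a minimal sketch of exact-match precision, recall, and F1 over `(start, end, label)` spans. It is one common way to score NER output, not necessarily how this project evaluated its model, and the spans are made up.

```python
def span_scores(gold, predicted):
    """Exact-match precision, recall, and F1 over (start, end, label) spans."""
    gold_set, pred_set = set(gold), set(predicted)
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical gold and predicted spans for one policy
gold = [(16, 50, "AGRICULTURE"), (80, 91, "GEO")]
predicted = [(16, 50, "AGRICULTURE"), (59, 75, "LANDUSE")]
precision, recall, f1 = span_scores(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Exact matching is strict: a prediction that overlaps a gold span but differs by one character counts as an error.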

# Flair

The code below installs Flair, the NER package we will be working with. It might take a while to run; this is normal.

In [None]:
!pip install transformers
!pip install flair



Next, import the modules we will be working with.

In [None]:
from flair.models import SequenceTagger
from flair.data import Sentence

AttributeError: module 'numpy' has no attribute 'dtypes'

In [None]:
ner_model = SequenceTagger.load('ner')

ImportError: cannot import name 'requires_backends' from 'transformers' (/usr/local/lib/python3.11/dist-packages/transformers/__init__.py)
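Errors like the two shown here often point to mismatched package versions. A quick way to see what is installed is to query the package metadata. This is a minimal sketch using only the standard library; `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


versions = installed_versions(["numpy", "pandas", "transformers", "flair"])
for name, version in versions.items():
    print(f"{name}: {version or 'not installed'}")
```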

Assuming we already have a CSV file of extracted policies, we next iterate through the rows and identify the spatial terms in each one. The function below takes the text of each policy, wraps it in a Flair `Sentence` (which tokenizes the text), runs the NER model on it, and stores the recognized entity spans in a new column.

In [None]:
def add_to_csv(filename, nlp_column, new_column_name, output_filename=None):
    dataframe = pd.read_csv(filename)

    results = []

    for _, row in dataframe.iterrows():
        text = row[nlp_column]
        sentence = Sentence(str(text))
        ner_model.predict(sentence)
        entities = sentence.get_spans('ner')
        results.append(entities)

    dataframe[new_column_name] = results

    if output_filename is None:
        output_filename = filename
    dataframe.to_csv(output_filename, index=False)
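One caveat with the function above: `results` holds span objects, so the CSV column ends up containing their printed representation. Below is a minimal sketch of flattening spans into plain `text (LABEL)` strings before saving. It assumes each span exposes `.text` and `.tag` attributes, which we believe flair spans do in recent releases; `FakeSpan` and `format_entities` are hypothetical names used only for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for a flair span; assumed to mirror its `.text` and `.tag` attributes.
FakeSpan = namedtuple("FakeSpan", ["text", "tag"])


def format_entities(spans):
    """Join span-like objects into a single 'text (LABEL)' string, separated by '; '."""
    return "; ".join(f"{span.text} ({span.tag})" for span in spans)


example = [FakeSpan("City of Napa", "LOC"), FakeSpan("County", "ORG")]
print(format_entities(example))
```

If the assumption holds, calling `results.append(format_entities(entities))` inside `add_to_csv` would store a readable string per policy instead of the raw objects.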

Next, we call the function defined in the previous section. First, upload the spreadsheet you are working with: click the file upload button on the left of your screen and select the file from your local drive. Once the upload is done, right-click the file and select "Copy Path." Paste this path in place of the one currently stored in the `hazard_plan_csv` variable, and rename the variable to match your file.

In [None]:
hazard_plan_csv = "/content/Napa Hazard Plan NER.csv"
file = "napa_hazard_plan_tables_with_flair.csv"
csv_filename = '/content/drive/MyDrive/' + file
add_to_csv(hazard_plan_csv, "Description", "Entities", csv_filename)
print("New column added.")

NameError: name 'ner_model' is not defined

In [None]:
!pip uninstall -y transformers flair pandas numpy

!pip install numpy==1.24.4

!pip install transformers==4.26.1
!pip install flair==0.12.2
!pip install pandas==2.0.3

import pandas as pd
from flair.models import SequenceTagger
from flair.data import Sentence

ner_model = SequenceTagger.load('ner')

def add_to_csv(filename, nlp_column, new_column_name, output_filename=None):
    dataframe = pd.read_csv(filename)

    results = []

    for _, row in dataframe.iterrows():
        text = row[nlp_column]
        sentence = Sentence(str(text))
        ner_model.predict(sentence)
        entities = sentence.get_spans('ner')
        results.append(entities)

    dataframe[new_column_name] = results

    if output_filename is None:
        output_filename = filename
    dataframe.to_csv(output_filename, index=False)


hazard_plan_csv = "/content/Napa Hazard Plan NER.csv"
file = "napa_hazard_plan_tables_with_flair.csv"
csv_filename = '/content/drive/MyDrive/' + file
add_to_csv(hazard_plan_csv, "Description", "Entities", csv_filename)
print("New column added.")

Found existing installation: numpy 1.24.4
Uninstalling numpy-1.24.4:
  Successfully uninstalled numpy-1.24.4
Collecting numpy==1.24.4
  Using cached numpy-1.24.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.6 kB)
Using cached numpy-1.24.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
Installing collected packages: numpy
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
db-dtypes 1.4.3 requires pandas>=1.5.3, which is not installed.
shap 0.47.2 requires pandas, which is not installed.
mlxtend 0.23.4 requires pandas>=0.24.2, which is not installed.
cudf-cu12 25.2.1 requires pandas<2.2.4dev0,>=2.0, which is not installed.
sklearn-pandas 2.2.0 requires pandas>=1.1.4, which is not installed.
statsmodels 0.14.4 requires pandas!=2.1.0,>=1.4, which is not installed.
dask-cudf-cu12 25.2.2 requires pandas<2.2.4dev0,

Collecting transformers==4.26.1
  Using cached transformers-4.26.1-py3-none-any.whl.metadata (100 kB)
Using cached transformers-4.26.1-py3-none-any.whl (6.3 MB)
Installing collected packages: transformers
ERROR: Operation cancelled by user
Exception ignored in: <function _get_module_lock.<locals>.cb at 0x7b51ab8cac00>
Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 207, in cb
KeyboardInterrupt: 
Traceback (most recent call last):
  File "/usr/local/bin/pip3", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pip/_internal/cli/main.py", line 78, in main
    command = create_command(cmd_name, isolated=("--isolated" in cmd_args))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pip/_internal/commands/__init__.py", line 114, in create_command
    module = importlib.import_module(module_path)
             ^^^

ModuleNotFoundError: No module named 'flair'
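The errors above look like they stem from package versions in the runtime not matching what flair expects, and from the reinstall being interrupted. Below is a small sketch, using only the standard library, for comparing the versions pinned in the install cell against what is actually installed. The `PINS` dictionary mirrors that cell; `installed_version` and `find_mismatches` are hypothetical helper names.

```python
from importlib import metadata

# Versions pinned in the install cell above.
PINS = {"numpy": "1.24.4", "transformers": "4.26.1", "flair": "0.12.2", "pandas": "2.0.3"}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {package: (wanted, found)} for every pin that is not satisfied."""
    mismatches = {}
    for package, wanted in pins.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


for package, (wanted, found) in find_mismatches(PINS).items():
    print(f"{package}: wanted {wanted}, found {found or 'not installed'}")
```

Running this after the install cell (and after restarting the runtime, if needed) gives a quick view of which pins did not take effect before importing flair.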

# Integration & Future Work

We have started working with the Gemini AI model, which has shown promising results: it has been able to identify nuanced or context-dependent spatial terms. The model performs well out of the box, and we are currently exploring how its capabilities can complement our custom NER system.


**Note:** For how to create and run the Gemini AI model, refer to the LLM Documentation.

Replace the prompt with a new one that asks the model to extract specific types of spatial terms:

In [None]:
prompt = f"""
    Extract the following types of spatial (mappable) terms from the policy text:
    1. Place names (either general, like "downtown", or specific, like "Angwin").
    2. Land use or zoning classifications.
    3. Geographical features (e.g., creeks, mountains, rivers).
    4. Structures, including facilities, buildings, and infrastructure.
    5. Mappable units of measure (e.g., distances, areas such as acres, hectares, square miles, parcels, buffers).
    6. Geospatial terms (e.g., raster, point, polygon, line, or file types such as GeoJSON, .shp, etc.).
    The response should be a comma-separated list of these terms from the following policy text:\n\n{policy_text}
    """

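Because the prompt asks for a comma-separated list, the model's reply needs to be split into individual terms before it can be stored or compared. Below is a minimal sketch of that step; `parse_spatial_terms` and the sample reply are hypothetical, and real responses may need extra cleanup (for example, if the model adds explanatory text).

```python
def parse_spatial_terms(response_text):
    """Split a comma-separated reply into a de-duplicated list of terms, keeping order."""
    terms = []
    seen = set()
    for raw in response_text.split(","):
        # Trim whitespace and any stray quotation marks around the term.
        term = raw.strip().strip("\"'").strip()
        key = term.lower()
        if term and key not in seen:
            seen.add(key)
            terms.append(term)
    return terms


# Illustrative reply text, not actual model output.
sample_response = "City of Napa, Rural Urban Limit, unincorporated land, city of napa, "
print(parse_spatial_terms(sample_response))
```

De-duplication here is case-insensitive and keeps the first spelling seen, which is a design choice rather than a requirement.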
### Future Steps

**Model Ensembling:** Combine the strengths of our custom NER and Gemini through ensembling techniques. This hybrid approach aims to boost accuracy and handle more complex or ambiguous spatial terms.
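As a rough illustration of what a simple ensemble could look like, the sketch below merges the terms found by two sources and records which source found each one. This is a plain union-with-agreement scheme chosen for illustration, not the method we have settled on; `ensemble_terms` and the sample lists are hypothetical.

```python
def ensemble_terms(ner_terms, llm_terms):
    """Merge two term lists, recording which source(s) found each term."""
    merged = {}
    for source, terms in (("ner", ner_terms), ("llm", llm_terms)):
        for term in terms:
            key = term.strip().lower()  # case-insensitive matching between sources
            if not key:
                continue
            entry = merged.setdefault(key, {"term": term.strip(), "sources": set()})
            entry["sources"].add(source)
    return merged


# Illustrative inputs only.
ner_terms = ["City of Napa", "Rural Urban Limit"]
llm_terms = ["city of napa", "unincorporated land"]

combined = ensemble_terms(ner_terms, llm_terms)
agreed = sorted(key for key, entry in combined.items() if len(entry["sources"]) == 2)
print(agreed)
```

Terms found by both sources could be treated as higher confidence, while terms found by only one could be flagged for review.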

**Expanded Entity Set & Dataset:** Continue expanding the entity taxonomy to cover additional planning domains, such as fire, land use, and the safety element. At the same time, grow and diversify the labeled policy dataset, possibly by including the remaining policies from other plans.

**Text-to-GIS Mapping:** Develop a pipeline to connect extracted spatial entities directly to geographic features. This involves:
* Mapping NER outputs to GIS layers (e.g., zoning districts, land use boundaries)
* Implementing geocoding for named places and address-based entities
* Exploring spatial ontologies or standardized vocabularies to support reliable mapping.
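The first step of such a pipeline could be a lookup that routes each extracted entity to the GIS resource able to resolve it. The sketch below shows the idea only: the category names, layer names, and the `route_entities` helper are all hypothetical placeholders, not existing layers or an agreed taxonomy.

```python
# Hypothetical lookup from entity category to a GIS layer or service name.
LAYER_BY_CATEGORY = {
    "zoning": "zoning_districts",
    "land_use": "land_use_boundaries",
    "place": "geocoder",
}


def route_entities(entities):
    """Group (term, category) pairs by the GIS resource that could resolve them."""
    routed = {}
    for term, category in entities:
        layer = LAYER_BY_CATEGORY.get(category, "unmapped")
        routed.setdefault(layer, []).append(term)
    return routed


# Illustrative entities only.
extracted = [
    ("Rural Urban Limit", "land_use"),
    ("City of Napa", "place"),
    ("sphere of influence", "boundary"),
]
print(route_entities(extracted))
```

Entities that fall into `unmapped` show where the lookup, or a spatial ontology behind it, would need to be extended.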

Ultimately, the goal is to enable workflows that move from policy text → structured entities → mapped outputs, unlocking powerful new capabilities for planning and analysis.