## Collect Metadata from Michigan PLSS Plats Survey Maps

This Jupyter Notebook is intended to collect useful metadata from separated `OPEX` files of Michigan PLSS Plats. The contents for these `OPEX` files are formatted as `XML`. Thus, in this documentation, we mainly use the **BeautifulSoup** python library to help pull data out of it.

### file structure
- **harvest.ipynb**
- **RG_87-155_GLO_Survey_Maps_1816-1860** contains different subdirectories for all Michigan townships. There might be one or more than one `.pax.zip.opex` file indicating the plat of each township.
- **bbox of MI townships.csv** offers the bounding box for each Michigan township.
- **metadata.csv** is the output CSV file with all plats' metadata

> Original created on May 27 2021
> @author: Gene/Ziying Cheng @Ziiiiing


In [9]:
import os
from bs4 import BeautifulSoup
import time
import csv
import geocoder

In [10]:
subdirs =  os.listdir('RG_87-155_GLO_Survey_Maps_1816-1860') 

fieldnames = ['Title', 'Alternative Title', 'Description', 'Language', 'Creator', 'Publisher',
              'Resource Type', 'Date Issued', 'Temporal Coverage', 'Date Range',
              'Spatial Coverage', 'Bounding Box', 'Information', 'Download', 'Image', 
              'Identifier', 'ID', 'Access Rights', 'Provider', 'Code', 'Member Of', 'Status', 
              'Accrual Method', 'Date Accessioned', 'Rights', 'Resource Class', 'Format',
              'Suppressed', 'Child Record'] 
              

In [11]:
bboxDict = {}
with open('bbox of MI townships.csv') as fr:
    reader = csv.reader(fr)
    fields = next(reader)
    for row in reader:
        bboxDict[row[11]] = row[12]

In [22]:
def collectMetadata(data):
    metadata = []
    soup = BeautifulSoup(data,'xml')
    
    alternativeTitle = soup.find('title').get_text()
    description = soup.find('abstract').get_text()
    language = soup.find('languageTerm').get_text()
    if language == 'English':
        language = 'eng'

    creator = ''
    creatorElem = soup.find_all(displayLabel="Surveyor")
    for elem in creatorElem:
        if creator:
            creator += '|' + elem.get_text()
        else:
            creator = elem.get_text()

    publisher = soup.find(displayLabel="Agency").get_text()
        
    try:
        dateIssued = soup.find('dateCreated').get_text()
    except:
        dateIssued = ''        
        
    township = soup.find('township').get_text()
    county = soup.find('county').get_text()
    spatialCoverage = '{0}, {1}, Michigan|{1}, Michigan|Michigan'.format(township, county)
        
    TWNSHPLAB =  subdir.split('_')[0].strip('0') + ' ' + subdir.split('_')[1].strip('0')
    try:
        bbox = bboxDict[TWNSHPLAB]
    except:
        bbox = ''
    
    identifier = soup.find('Identifier').get_text()
    ID = soup.find('SourceID').get_text()
    information = 'https://michiganology.org/uncategorized/IO_' + ID
    download = 'https://michiganology.org/download/file/IO_' + ID
    image = 'https://michiganology.org/download/thumbnail/IO_' + ID
    
    rights = soup.find(type="restriction on access").get_text() + ' ' + soup.find(type="use").get_text()
    
    title = ''
    temporalCoverage = ''
    dateRange = ''
    resourceType = 'Cadastral maps'
    accessRights = 'Public'
    provider = 'Michigan State University'
    code = '06a-03'
    memberOf = '06a-03'
    status = 'Active'
    accrualMethod = 'OPEX'
    dateAccessioned = time.strftime('%Y-%m-%d')
    resourceClass = 'Maps'
    format = 'JPEG2000'
    suppressed = 'FALSE'
    childRecord = 'FALSE'
    
    
    metadata = [title, alternativeTitle, description, language, creator, publisher, resourceType,
                dateIssued, temporalCoverage, dateRange,spatialCoverage, bbox, information,
                download, image, identifier, ID, accessRights, provider, code, memberOf, status,
                accrualMethod, dateAccessioned, rights, resourceClass, format, suppressed, childRecord]

    All_Metadata.append(metadata)

In [23]:
All_Metadata = []

for subdir in subdirs:
    # skip hidden files
    if '.' in subdir:
        continue
    
    print('SEARCHING {}'.format(subdir))
    dirpath = 'RG_87-155_GLO_Survey_Maps_1816-1860/{}'.format(subdir)
    filepaths = ['{}/{}'.format(dirpath, f) for f in os.listdir(dirpath) if f.endswith('.pax.zip.opex')]
    for filepath in filepaths:
        print('>>> reading {}'.format(filepath.split('/')[-1]))
        with open(filepath) as fr:
            contents = fr.read()
            collectMetadata(contents)

SEARCHING 45N_26W_-_Survey_Map_of_Forsyth_Township_Marquette_County
>>> reading 2208.pax.zip.opex
SEARCHING 34N_27W_-_Survey_Map_of_Mellen_Township_Menominee_County
>>> reading 2746.pax.zip.opex
SEARCHING 08N_08E_-_Survey_Map_or_Richfield_Township_Genesee_County
>>> reading 1836.pax.zip.opex
SEARCHING 47N_42W_-_Survey_Map_of_Mareninsco_Township_Gogebic_County
>>> reading 2456.pax.zip.opex
SEARCHING 43N_32W_-_Survey_Map_of_Crystal_Falls_Township_Iron_County
>>> reading 2948.pax.zip.opex
>>> reading 2949.pax.zip.opex
SEARCHING 48N_38W_-_Survey_Map_of_Stannard_Township_and_Interior_Township_
>>> reading 2562.pax.zip.opex
SEARCHING 12N_03W_-_Survey_Map_of_Pine_River_Township_Gratiot_County
>>> reading 719.pax.zip.opex
SEARCHING 34N_02E_-_Survey_Map_of_Allis_Township_Presque_Isle_County
>>> reading 1131.pax.zip.opex
SEARCHING 32N_03W_-_Survey_Map_of_Corwith_Township_Otsego_County
>>> reading 1072.pax.zip.opex
SEARCHING 02N_07W_-_Survey_Map_of_Maple_Grove_Township_Barry_County
>>> reading 34

>>> reading 1255.pax.zip.opex
SEARCHING 30N_08W_-_Survey_Map_of_Forest_Home_Township_Antrim_County
>>> reading 845.pax.zip.opex
SEARCHING 04S_06W_-_Survey_Map_of_Tekonsha_Township_Calhoun_County
>>> reading 188.pax.zip.opex
>>> reading 187.pax.zip.opex
SEARCHING 35N_01E_-_Survey_Map_of_Waverly_Township_Cheboygan_County
>>> reading 1108.pax.zip.opex
SEARCHING 47N_43W_-_Survey_Map_of_Mareninsco_Township_Gogebic_County
>>> reading 2448.pax.zip.opex
SEARCHING 43N_33W_-_Survey_Map_of_Crystal_Falls_Township_Iron_County
>>> reading 2952.pax.zip.opex
>>> reading 2951.pax.zip.opex
SEARCHING 08S_08W_-_Survey_Map_of_Nobel_Township_Branch_County
>>> reading 234.pax.zip.opex
SEARCHING 31N_08W_-_Survey_Map_of_Central_Lake_Township_Antrim_County
>>> reading 1077.pax.zip.opex
SEARCHING 31N_08E_-_Survey_Map_of_Alpena_Township_Alpena_County
>>> reading 1147.pax.zip.opex
SEARCHING 34N_25W_-_Survey_Map_of_Ingallston_Township_Menominee_County
>>> reading 2745.pax.zip.opex
SEARCHING 02N_01E_-_Survey_Map_of_

>>> reading 1393.pax.zip.opex
>>> reading 1392.pax.zip.opex
>>> reading 1395.pax.zip.opex
>>> reading 1391.pax.zip.opex
SEARCHING 48N_47W_-_Survey_Map_of_Ironwood_Township_Gogebic_County
>>> reading 2520.pax.zip.opex
SEARCHING 11N_11W_-_Survey_Map_of_Ensley_Township_Newaygo_County
>>> reading 609.pax.zip.opex
SEARCHING 37N_20W_-_Survey_Map_of_Island_in_Fairbanks_Township_Delta_Count
>>> reading 2793.pax.zip.opex
SEARCHING 14N_02E_-_Survey_Map_of_Midland_Township_Midland_County
>>> reading 1280.pax.zip.opex
>>> reading 1279.pax.zip.opex
SEARCHING 42N_02E_-_Survey_Map_of_Raber_Township_Chippewa_County
>>> reading 1891.pax.zip.opex
SEARCHING 43N_34W_-_Survey_Map_of_Bates_Township_Iron_County
>>> reading 2881.pax.zip.opex
SEARCHING 51N_32W_-_Survey_Map_of_LAnse_and_Arvon_Townships_Baraga_County
>>> reading 2271.pax.zip.opex
SEARCHING 11N_13E_-_Survey_Map_of_Elmer_Township_Sanilac_County
>>> reading 1524.pax.zip.opex
SEARCHING 40N_23W_-_Survey_Map_of_Wells_Township_Delta_County
>>> reading 

>>> reading 1887.pax.zip.opex
SEARCHING 42N_16W_-_Survey_Map_of_Thompson_Township_and_Hiawatha_Township_
>>> reading 2230.pax.zip.opex
SEARCHING 07S_03E_-_Survey_Map_of_Madison_Township_Lenawee_County
>>> reading 1788.pax.zip.opex
SEARCHING 16N_03W_-_Survey_Map_of_Wise_Township_Isabella_County
>>> reading 707.pax.zip.opex
>>> reading 706.pax.zip.opex
SEARCHING 05N_03E_-_Survey_Map_of_Antrim_Township_Shiawassee_County
>>> reading 1581.pax.zip.opex
SEARCHING 41N_01W_-_Survey_Map_of_Clark_Township_and_Marquette_Island_Mack
>>> reading 1995.pax.zip.opex
>>> reading 1994.pax.zip.opex
>>> reading 1993.pax.zip.opex
SEARCHING 13N_18W_-_Survey_Map_of_Claybanks_Township_Oceana_County
>>> reading 510.pax.zip.opex
SEARCHING 45N_31W_-_Survey_Map_of_Mansfield_Township_Iron_County
>>> reading 2995.pax.zip.opex
SEARCHING 28N_11W_-_Survey_Map_of_Elmwood_Township_Leelanau_County
>>> reading 889.pax.zip.opex
SEARCHING 35N_27W_-_Survey_Map_of_Lake_Township_and_Stephenson_Township_Me
>>> reading 2741.pax.z

IndexError: list index out of range

In [24]:
with open('metadata.csv', 'w') as f:
    write = csv.writer(f)
      
    write.writerow(fieldnames)
    write.writerows(All_Metadata)