# Using the pymdwizard core functionality in scripts

### An FGDC metadata record can be loaded as an XMLRecord object for inspection and editing

### Advanced users sometimes face complex metadata tasks that are easier to handle with a script.
Examples could include:

* Batch updating the contact info in a folder full of records.
* Creating a frequency list of all of the keywords used in a set of records.
* Creating a report of the number of schema errors in each of a list of files.

The underlying Python components of the MD Wizard can be used to accomplish this.

In [1]:
from pymdwizard.core.xml_utils import XMLRecord, XMLNode

In [2]:
fname =  r"..\tests\data\USGS_ASC_PolarBears_FGDC.xml"
original_md = XMLRecord(fname)

### Drilling down into the structure is easy, as each child node is a child object of the record

Simply chain the FGDC 'shortnames' to reach the desired element.
Code completion (Tab in a Jupyter notebook) makes this relatively easy.

In [3]:
print(original_md.metadata.idinfo.spdom.bounding)

<bounding>
  <westbc>178.2167</westbc>
  <eastbc>-178.9167</eastbc>
  <northbc>83.921</northbc>
  <southbc>63.3667</southbc>
</bounding>


### Editing can be done directly on the 'text' of a node

In [4]:
original_md.metadata.idinfo.spdom.bounding.eastbc.text = '-99'
print(original_md.metadata.idinfo.spdom.bounding)

<bounding>
  <westbc>178.2167</westbc>
  <eastbc>-99</eastbc>
  <northbc>83.921</northbc>
  <southbc>63.3667</southbc>
</bounding>
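
Because `text` is a plain string, an edit like the one above is accepted whatever value is written. The sketch below is one way a script might sanity-check the four coordinates before saving. The helper name `bounding_problems` is hypothetical and not part of pymdwizard; it works only on the strings passed to it.

```python
def bounding_problems(west, east, north, south):
    """Return a list of problems found in four bounding-coordinate strings."""
    problems = []
    values = {}
    checks = (
        ("westbc", west, 180.0),
        ("eastbc", east, 180.0),
        ("northbc", north, 90.0),
        ("southbc", south, 90.0),
    )
    for name, raw, limit in checks:
        try:
            values[name] = float(raw)
        except (TypeError, ValueError):
            problems.append(f"{name}: {raw!r} is not a number")
            continue
        if abs(values[name]) > limit:
            problems.append(f"{name}: {values[name]} is outside +/-{limit}")
    # West may legitimately exceed east when the box crosses the antimeridian,
    # so only the two latitudes are compared with each other.
    if "northbc" in values and "southbc" in values:
        if values["southbc"] > values["northbc"]:
            problems.append("southbc is greater than northbc")
    return problems
```

With `b = original_md.metadata.idinfo.spdom.bounding`, it could be called as `bounding_problems(b.westbc.text, b.eastbc.text, b.northbc.text, b.southbc.text)`.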


### Copying elements from one record to another is easy as well

In [5]:
template_fname = r"..\tests\data\Onshore_Industrial_Wind_Turbine_Locations_for_the_United_States_through_July2013.xml"
template_md = XMLRecord(template_fname)

In [6]:
template_bounding = template_md.metadata.idinfo.spdom.bounding
print(template_bounding)

<bounding>
  <westbc>-180.000000</westbc>
  <eastbc>180.000000</eastbc>
  <northbc>79.972399</northbc>
  <southbc>10.273857</southbc>
</bounding>


In [7]:
original_md.metadata.idinfo.spdom.bounding = template_bounding

In [8]:
print(original_md.metadata.idinfo.spdom.bounding)

<bounding>
  <westbc>-180.000000</westbc>
  <eastbc>180.000000</eastbc>
  <northbc>79.972399</northbc>
  <southbc>10.273857</southbc>
</bounding>


### Note that elements that occur more than once are returned as a list, which can make things tricky but is also handy

In [9]:
print(original_md.metadata.idinfo.keywords.theme,'\n')
print(type(original_md.metadata.idinfo.keywords.theme))

<theme>
  <themekt>None</themekt>
  <themekey>Polar Bear</themekey>
  <themekey>Ursus maritimum</themekey>
  <themekey>maternal denning</themekey>
</theme> 

<class 'pymdwizard.core.xml_utils.XMLNode'>


In [10]:
print(original_md.metadata.idinfo.keywords.theme.themekey,'\n')
print(type(original_md.metadata.idinfo.keywords.theme.themekey))

[<themekey>Polar Bear</themekey>, <themekey>Ursus maritimum</themekey>, <themekey>maternal denning</themekey>] 

<class 'list'>


### Individual items in this list can be accessed with standard Python list indexing

In [11]:
print(original_md.metadata.idinfo.keywords.theme.themekey[2])

<themekey>maternal denning</themekey>
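
The tricky part is that the same attribute gives a single node in one record and a list in another, so a loop written for one case can fail on the other. A small normalising helper is one way around this. The name `as_list` is hypothetical and not part of pymdwizard.

```python
def as_list(value):
    """Return a list unchanged; wrap any other single value in a list."""
    if isinstance(value, list):
        return value
    return [value]
```

A loop such as `for kw in as_list(theme.themekey): print(kw.text)` then behaves the same whether a record has one keyword or many.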


### You can manipulate the contents of an item with the clear_children and add_child functions, or manipulate its child elements directly through its children attribute

In [12]:
theme_kws = original_md.metadata.idinfo.keywords.theme
theme_kws

<theme>
  <themekt>None</themekt>
  <themekey>Polar Bear</themekey>
  <themekey>Ursus maritimum</themekey>
  <themekey>maternal denning</themekey>
</theme>

In [13]:
theme_kws.clear_children('themekey')
theme_kws

<theme>
  <themekt>None</themekt>
</theme>

In [14]:
for template_kw in template_md.metadata.idinfo.keywords.theme[1].themekey:
    print(template_kw.text)
    theme_kws.add_child(template_kw)
    
theme_kws

turbine
wind
shapefile
dataset
wind farm
windfarm
wind facility
geospatial datasets
energy
GIS
renewable


<theme>
  <themekt>None</themekt>
  <themekey>turbine</themekey>
  <themekey>wind</themekey>
  <themekey>shapefile</themekey>
  <themekey>dataset</themekey>
  <themekey>wind farm</themekey>
  <themekey>windfarm</themekey>
  <themekey>wind facility</themekey>
  <themekey>geospatial datasets</themekey>
  <themekey>energy</themekey>
  <themekey>GIS</themekey>
  <themekey>renewable</themekey>
</theme>
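
With keyword text this easy to reach, the frequency list mentioned in the introduction needs little more than a counter. The sketch below assumes the keyword text has already been pulled out of each record as plain strings, for example with `[kw.text for kw in theme.themekey]`. The function name `keyword_frequencies` is hypothetical.

```python
from collections import Counter


def keyword_frequencies(keyword_lists):
    """Count keywords across records; each record is a list of strings."""
    counts = Counter()
    for keywords in keyword_lists:
        # Normalise case and whitespace so 'GIS' and ' gis' count together,
        # and skip empty strings.
        counts.update(kw.strip().lower() for kw in keywords if kw and kw.strip())
    return counts
```

`keyword_frequencies(all_lists).most_common(10)` would then give the ten most used keywords.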

### When you're finished you might need to check for schema errors

This record doesn't have any, so let's add some first.

In [15]:
original_md.metadata.idinfo.citation.citeinfo.title.text = ''
original_md.metadata.idinfo.timeperd.timeinfo.rngdates.begdate.text = 'bad date'
original_md.metadata.clear_children('metainfo') # this removes the entire metadata info section!


In [16]:
# Note: the schema can be either 'fgdc' or 'bdp'; the errors can also be returned as a list instead of a data frame
original_md.validate(schema='bdp')

| | xpath | message | line number |
|---|---|---|---|
| 0 | metadata/idinfo/citation/citeinfo/title | Element 'title': [facet 'pattern'] The value '... | 7 |
| 1 | metadata/idinfo/citation/citeinfo/title | Element 'title': '\n\n ' is not a valid... | 7 |
| 2 | metadata/idinfo/timeperd/timeinfo/rngdates/beg... | Element 'begdate': 'bad date' is not a valid v... | 24 |
| 3 | metadata/idinfo/ptcontac/cntinfo/cntorgp/cntorg | Element 'cntorg': [facet 'pattern'] The value ... | 119 |
| 4 | metadata/idinfo/ptcontac/cntinfo/cntorgp/cntorg | Element 'cntorg': '\n\n ' is not a va... | 119 |
| 5 | metadata | The 'metadata' is missing the expected element... | 1 |
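
Several rows above point at the same element, so a per-element count can be easier to read in a batch report. The sketch below assumes the errors are available as `(xpath, message, line number)` tuples. That shape mirrors the columns shown here but is an assumption about the list form, not something confirmed by the library. The name `errors_per_xpath` is hypothetical.

```python
from collections import Counter


def errors_per_xpath(errors):
    """Count validation errors per xpath, most frequent first.

    `errors` is assumed to be an iterable of (xpath, message, line) tuples.
    """
    return Counter(xpath for xpath, _message, _line in errors).most_common()
```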


### It's possible to create nodes from scratch in the following manner

In [17]:
new_metainfo  = XMLNode(tag='metainfo')
new_metd = XMLNode(tag='metd', text='201704')
new_metainfo.add_child(new_metd)

new_metc = XMLNode(tag='metc')
new_metc.add_child(original_md.metadata.idinfo.ptcontac.cntinfo)
new_metainfo.add_child(new_metc)

new_metstdn = XMLNode(tag='metstdn', text='FGDC Content Standard for Digital Geospatial Metadata')
new_metainfo.add_child(new_metstdn)
new_metstdv = XMLNode(tag='metstdv', text='FGDC-STD-001-1998')
new_metainfo.add_child(new_metstdv)

original_md.metadata.add_child(new_metainfo)
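
The `metd` date above is typed by hand. In a batch script it may be more convenient to generate it. A minimal sketch, assuming the full `YYYYMMDD` form is wanted; the helper name `fgdc_date` is hypothetical.

```python
from datetime import date


def fgdc_date(day=None):
    """Format a date as YYYYMMDD; defaults to today's date."""
    if day is None:
        day = date.today()
    return day.strftime("%Y%m%d")
```

The result can be passed straight to the node, as in `XMLNode(tag='metd', text=fgdc_date())`.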

### But usually it will be easier to either use or modify an existing compound element, or to create it from a chunk of XML represented as a string

In [18]:
metainfo_str = """<metainfo>
  <metd>20140609</metd>
  <metc>
    <cntinfo>
      <cntperp>
        <cntper>John Doe</cntper>
        <cntorg>U.S. Geological Survey, Core Science Systems</cntorg>
      </cntperp>
      <cntpos>Biologist</cntpos>
      <cntaddr>
        <addrtype>mailing address</addrtype>
        <address>Mail Stop 306, West 6th Ave. &amp; Kipling St., DFC Bldg. 810</address>
        <city>Lakewood</city>
        <state>CO</state>
        <postal>80225-0046</postal>
        <country>US</country>
      </cntaddr>
      <cntvoice>123-456-7890</cntvoice>
      <cntemail>jdoe@usgs.gov</cntemail>
    </cntinfo>
  </metc>
  <metstdn>FGDC Content Standard for Digital Geospatial Metadata</metstdn>
  <metstdv>FGDC-STD-001-1998</metstdv>
</metainfo>"""

new_metainfo = XMLNode(metainfo_str)

original_md.metadata.clear_children('metainfo')
original_md.metadata.add_child(new_metainfo)
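
Note the `&amp;` in the address above: text placed inside an XML string has to be escaped. When the values come from a spreadsheet or user input, the standard library's `xml.sax.saxutils.escape` can do this before the string is built. The template and the function name `cntperp_xml` below are illustrative only.

```python
from xml.sax.saxutils import escape

CNTPERP_TEMPLATE = """<cntperp>
  <cntper>{person}</cntper>
  <cntorg>{org}</cntorg>
</cntperp>"""


def cntperp_xml(person, org):
    """Fill the template, escaping &, < and > in the supplied text."""
    return CNTPERP_TEMPLATE.format(person=escape(person), org=escape(org))
```

The resulting string can then be handed to `XMLNode` in the same way as `metainfo_str`.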

### Once you're all done you can save back out to the original file or save as a new file

In [19]:
# original_md.save()  #this would overwrite the original file
original_md.save(fname=r'c:\temp\mdwiz_example_final_md.xml')
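
For a batch job it is usually safer to write edited copies to a separate folder than to overwrite the originals. A minimal sketch of deriving the output name with `pathlib`; the function name `edited_copy_path` and the `_edited` suffix are arbitrary choices for illustration.

```python
from pathlib import Path


def edited_copy_path(source, out_dir, suffix="_edited"):
    """Return out_dir/<original stem><suffix><original extension>."""
    source = Path(source)
    return Path(out_dir) / f"{source.stem}{suffix}{source.suffix}"
```

It could be used as `original_md.save(fname=str(edited_copy_path(fname, out_dir)))`, where `out_dir` is whatever folder the script writes to. `pathlib` interprets path separators according to the operating system the script runs on.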