## SquamataAssemballyAMT - Jupyter notebook for batch releasing Audio Magnetotellic (AMT) data to ScienceBase

This module performs the following operations:
- Create list of data directories.
- Identify files accompanying data release.
- Create file listing for metadata XML markup.
- Identify and load MT EDI file.
- Collect and harvest release parameters common to ALL metadata childs.
- Create entity and attribute XML markup.
- Poplulate metadata template
- Validate metadata; create error log; create HTML and FGDC Text versions of the metadata.
- Create all child metadata files from first example created in previous steps. (In development)
- Perhaps upload files to ScienceBase (In development)
- Change ScienceBase parameters such as citation information, add orcid ids, add USGS CMS tags, etc. (In development)

### To execute a function/command select a cell and Hold-Shift + Press-Enter

**The 'r' signifies a string literal. Use for paths.**

Metadata wizard:  Advanced, Open In a jupyter Notebook?
Metadata Wizard 2.o from ScienceBase

In [6]:
# Phil Brown (pbrown@usgs.gov) 2018
# Working Python 3 Notebook used to facilitate the release of Audio Magnetotelluric (AMT) Data to ScienceBase.

In [1]:
# Test Cell
print ("Jupyter is working.") #To run this cell, hold down Shift and press Enter.

Jupyter is working.


In [2]:
# Load required Libraries
import sys
import os
import zipfile
import csv
import pysb
import requests
import shutil
from shutil import copyfile
import zipfile
import datetime
import glob
from lxml import etree
import json
import pickle
import shutil
import fileinput
import json
import pandas as pd
import numpy as np
from IPython.core.display import display
from IPython.core.display import HTML
from lxml import etree
##from pymdwizard.core.xml_utils import XMLRecord
##from pymdwizard.core.xml_utils import XMLNode
import re
from ipywidgets import *
from IPython.display import display
from IPython.html.widgets import widgets



# 1) Step One - Set Directory Paths
## Please set directory paths below
### Directory paths include
- Data Path
    This is the path to the data, data structure should have a directory for each station
- Template Path
    The path to the XML metadata template being used for the data.  This template should already include all information common to all child metadata files e.g. originators, larger work citation, etc.

In [3]:
#Set Data Paths - perhaps we'll get a user form to do this some day?
mtDataPath = r"C:\CurrentWork\DataReleases\SquamataMT_TEST" #The 'r' signifies a string literal. Use for paths.
mtMataDataTemplatePath = r"C:\CurrentWork\DataManagement\SquamataMT"
mtMataDataTemplateName = "MT-MetaData_TEMPLATE.xml"

In [4]:
#Check Paths for the fun of it
print ('The MT Data Path is: ' + '"' + mtDataPath + '"')
mtMataDataTemplatePath + mtMataDataTemplateName

The MT Data Path is: "C:\CurrentWork\DataReleases\SquamataMT_TEST"


'C:\\CurrentWork\\DataManagement\\SquamataMTMT-MetaData_TEMPLATE.xml'

# 2) Step Two - Collect Common Parameters
## The first step is collect the information common to all child metadata sets
### Values Include:
- Data Release Title
    - Title may need to include station number in child item, need to come up with the best way to address this
- Data Release Originator(s)
- Larger Work Title
- Larger Work Originator(s)
- Larger Work URL
- Theme Keywords
- Location Keyword
- etc. etc

**Note that much of this can be obtained from the EDI file - this file can be viewed and values imported below...**



## Now, let's explore our data. 
- What files do we have? 
- What files do we import values from?

In [5]:
#Review content in file explorer

In [6]:
#Produce directory listing of station (SB Object Children)
#Either set up the root directory with station subdirectories only or delete non-station directories from the list array
mtDataDirList = os.listdir(mtDataPath)
mtDataDirList

['AMT01', 'AMT02', 'AMT03', 'AMT04', 'AMT05']

In [7]:
#Let's start with the first staion and check the result - we can then loop through the process 
#for the remaining stations in the list.
mtStationPath = mtDataPath + '\\' + mtDataDirList[1]
mtStationPath

'C:\\CurrentWork\\DataReleases\\SquamataMT_TEST\\AMT02'

In [8]:
#Look for EDI file to load
ediList = glob.glob(os.path.join(mtStationPath, '**/*MT*.edi'),  recursive=True)
ediPath = ediList[0]
print ('EDI File List:\n')
ediList
print ('EDI File Path:\n' + ediPath)       

EDI File List:

EDI File Path:
C:\CurrentWork\DataReleases\SquamataMT_TEST\AMT02\USA-New_Mexico-Rio_Grande_Rift-San_Luis_Basin-2009-AMT02.edi


## Enter the information unique to this data set but common to all metadata files
### These include:
- Data Release Title
- Data Release Authors
- Theme Keywords
- Location Keywords

## After this step, information will be harvested from the MT EDI file. 
### These include:
- ProductId=USA-New_Mexico-Rio_Grande_Rift-San_Luis_Basin-2009-AMT01.edi
- ExternalUrl Url=https://doi.org/10.5066/F72F7MQ7
- Attachment Filename=https://pubs.usgs.gov/of/2011/1264/report/OF11-1264.pdf
- Survey Purpose Description: 
- Data Description:
- Citation Title=Audiomagnetotelluric data, Taos Plateau Volcanic Field, New Mexico
- Citation Authors=Chad E. Ailes, Brian D. Rodriguez
- Citation Year=2011
- YearCollected=2009
- Country=USA                                  
- Ellipsoid=Clarke 1866                                                          
- Location datum=NAD27 CONUS                                                     
- SITE LATITUDE=36.752985000                                                     
- SITE LONGITUDE=-105.560966167                                                  
- Elevation units="meters"=2608.00                                                                     
- Start=2009-07-21T19:52:03 UTC/GMT
- End=2009-07-21T20:34:20 UTC/GMT
- ProcessingTimeSeriesUsed:
         wp01A1.bp1                                                                     
         wp01A2.bp1                                                                     
         wp01A1.sd6                                                                     
         wp01A2.sd6                                                                     
         wp01A1.sd7                                                                     
         wp01A2.sd8                                                                     
         wp01A2_3.sd9 
- Entities and Attributes:
    - FREQUENCIES
    - IMPEDANCE ROTATION ANGLES
    - IMPEDANCES
    - TIPPER PARAMETERS
    - COMPUTED PARAMETERS


In [9]:
## Create editable keywords example.  
## Example text is created after running this cell
## This text is displayed by running "display(keywords) below
keywords = widgets.Textarea(
    value='\t\t<keywords>\n\t\t\t<theme>\n\t\t\t\t<themekt>ISO 19115 Topic Category</themekt>\n\t\t\t\t<themekey>biota</themekey>\n\t\t\t</theme>\n\t\t\t<theme>\n\t\t\t\t<themekt>None</themekt>\n\t\t\t\t<themekey>impedance</themekey>\n\t\t\t\t<themekey>tipper</themekey>\n\t\t\t\t<themekey>apparent resistivity</themekey>\n\t\t\t\t<themekey>impedance phase</themekey>\n\t\t\t\t<themekey>impedance strike</themekey>\n\t\t\t\t<themekey>MT</themekey>\n\t\t\t\t<themekey>audiomagnetotelluric</themekey>\n\t\t\t\t<themekey>magnetotelluric</themekey>\n\t\t\t\t<themekey>AMT</themekey>\n\t\t\t\t<themekey>sounding</themekey>\n\t\t\t\t<themekey>Geology, Geophysics, and Geochemistry Science Center</themekey>\n\t\t\t\t<themekey>GGGSC</themekey>\n\t\t\t\t<themekey>Mineral Resources Program</themekey>\n\t\t\t\t<themekey>MRP</themekey>\n\t\t\t</theme>\n\t\t\t<theme>\n\t\t\t\t<themekt>USGS Thesaurus</themekt>\n\t\t\t\t<themekey>Magnetic field (earth)</themekey>\n\t\t\t\t<themekey>Geophysics</themekey>\n\t\t\t\t<themekey>GPS measurement</themekey>\n\t\t\t\t<themekey>Electromagnetic surveying</themekey>\n\t\t\t\t<themekey>Magnetic surveying</themekey>\n\t\t\t</theme>\n\t\t\t<place>\n\t\t\t\t<placekt>USGS Geographic Names Information System (GNIS)</placekt>\n\t\t\t\t<placekey>New Mexico</placekey>\n\t\t\t\t<placekey>Rio Grande del Norte National Monument</placekey>\n\t\t\t\t<placekey>{county}</placekey>\n\t\t\t\t<placekey>Rio Grande</placekey>\n\t\t\t</place>\n\t\t</keywords>\n',
    placeholder='Type something',
    #description='String:',
    layout=Layout(width='100%', height='666px'),
    disabled=False
)
print ('Keywords list created.')

Keywords list created.


### Change the text in the textbox below to relflect what should be included as the key words for all child items
**Please leave the {county} tag as is.  This value will be filled in from the edi file later**

Note that changing the text below at any time creates a keywords section of the metadata seen EXACTLY as it is shown below

In [10]:
# Run this cell for key word text to edit.  
# Edit the text in place.  
# When complete move on to the next step

display(keywords)

## Lets now import and index values from the EDI Files
- We need these values for the metadata template.  
- We also want to run stats on some of these values for the entity and attributes section

In [11]:
    #Load EDI File and Read It
    ediFile = open(ediPath, 'r')
    ediContent = ediFile.read()
    ediFile.close()
    print(ediContent)


>HEAD                                                                           
                                                                                
  DATAID="Wheeler Peak"                                                         
  ACQBY=USGS                                                                    
  ACQDATE=2009-07-24
  STATE="New Mexico"                                                            
  COUNTY=Taos                                                                   
  UNITS=M                                                                       
  STDVERS=1.0                                                                   
  PROGVERS=GEOTOOLS_2.3                                                         
  PROGDATE=09/16/94                                                             
                                                                                
>INFO   MAXLINES=1000                                                           
       

In [12]:
#Now assign values to the SB MetaDataWizard Template unknowns
list_ = ediContent.splitlines()
list_length = len (list_)

# there are probally easier ways to loop through the below but I like having it all hard coded upfront
# it's easire to track an change for me
# use the example below to extract additional parameters
# Not that all variables being collected are not necessarily used in populating the template.
# Values can be hardcoded into the metadata xml template and/or harvested from the edi file

for X in list_:
  if "ProductId" in X:
    productArray = X.split('=')
    productIdArray = productArray[1].split('.')
    productId = productIdArray[0]
    # We may want to reformat this are parse out this name further for use with a root name based on the Data Release Title?
    productId = productId.replace("-", " ")
    productId = productId.replace("_", " ")
    drTitle = productId
    print ('Child Title: ' + productId)
  if "ExternalUrl Url" in X:
    externalURLArray = X.split('=')
    externalURL = externalURLArray[1]
    print ('<onlink>: ' + externalURL)
  if "STATE" in X:
    stateArray = X.split('=')
    state = stateArray[1].replace('"', "") #remove quotes around state
    print ('State: ' + state)
  if "COUNTY" in X:
    countyArray = X.split('=')
    county = countyArray[1]
    print ('County: ' + county)
  if "Attachment Filename" in X and "http" in X:
    lgwrklinkArray = X.split('=')
    lgwrklink = lgwrklinkArray[1]
    print ('Attachment Filename Link: ' + lgwrklink)
  if "Citation Title" in X:
    citTitArray = X.split('=')
    citTit = citTitArray[1]
    print ('Citation Title: ' + citTit)
  if "Citation Authors" in X:
    citNamesArray = X.split('=')
    citAuthorsArray = citNamesArray[1].split(',')
    for author in citAuthorsArray:
     author = author.strip()
     print ('Author: '+ author)
  if "Citation Year" in X:
    citYearArray = X.split('=')
    citYear = citYearArray[1]
    print ('Citation Year: ' + citYear)
  if "YearCollected" in X:
    yearColArray = X.split('=')
    yearCol = yearColArray[1]
    print ('Year Collected: ' + yearCol)
  if "Ellipsoid" in X:
    ellipsoidArray = X.split('=')
    ellipsoid = ellipsoidArray[1]
    print ('Ellipsoid: ' + ellipsoid)
  if "Location datum" in X:
    locDatumArray = X.split('=')
    locDatum = locDatumArray[1]
    print ('Local datum: ' + locDatum)
  if "SITE LATITUDE" in X:
    sitLatArray = X.split('=')
    sitLat = sitLatArray[1] # !!! probally need to reformat this to have only 6 significant digits !!!
    print ('Site latitude: ' + sitLat)
  if "SITE LONGITUDE" in X:
    sitLonArray = X.split('=')
    sitLon = sitLonArray[1] # !!! probally need to reformat this to have only 6 significant digits !!!
    print ('Site longitude: ' + sitLon)
  if "Elevation units" in X:
    elevationStringArray = X.split('=')
    siteElevation = elevationStringArray[2] 
    print ('Site Elevation: ' + siteElevation)
    elevationUnits = elevationStringArray[1].replace('"', "")
    print ('Elevation Units: ' + elevationUnits)
    
# Code below returns values that occupy more than one line
    
for i in range(list_length):
 value = list_[i] 
 if value.replace(" ", "") == 'SurveyPurposeDescription:':
   startIndPurpose = i + 1
   #print ('startIndPurpose: ' + str(startIndPurpose))
 if value.replace(" ", "") == 'DataDescription:':
   endIndPurpose = i - 1
   #print ('endIndPurpose: ' + str(endIndPurpose))
purpose = list_[startIndPurpose]
for j in range(startIndPurpose + 1,endIndPurpose): 
    purpose = purpose + list_[j]
    purposeClean = re.sub(' +', ' ',purpose)
print ('\nAbstract:\n\t' + purposeClean)

for k in range(list_length):
 value = list_[k] 
 if value.replace(" ", "") == 'DataDescription:':
   startIndDescription = k + 1
   #print ('startIndDescription: ' + str(startIndDescription))
 if value.replace(" ", "") == 'FILECREATOR:':
   endIndDescription = k - 9
   #print ('endIndDescription: ' + str(endIndDescription))
description = list_[startIndDescription]
for l in range(startIndDescription + 1,endIndDescription): 
    description = description + list_[l]
    descriptionClean = re.sub(' +', ' ',description)
print ('\nPurpose:\n\t' + descriptionClean)
    

State: New Mexico                                                            
County: Taos                                                                   
Child Title: USA New Mexico Rio Grande Rift San Luis Basin 2009 AMT02
<onlink>: https://doi.org/10.5066/F72F7MQ7 
Attachment Filename Link: https://pubs.usgs.gov/of/2011/1264/report/OF11-1264.pdf    
Citation Title: Audiomagnetotelluric data, Taos Plateau Volcanic Field, New Mexico
Author: Chad E. Ailes
Author: Brian D. Rodriguez
Citation Year: 2011                                                             
Year Collected: 2009
Ellipsoid: Clarke 1866                                                          
Local datum: NAD27 CONUS                                                     
Site latitude: 36.719101667                                                     
Site longitude: -105.627109500                                                  
Site Elevation: 2325.00                                               
Elevation Units:

Entity and Attribute Values for the EDI file.  List !****FREQUENCIES****!,!****IMPEDANCE ROTATION ANGLES****!,!****IMPEDANCES****!,!****COMPUTED PARAMETERS****!

Here we load the frequencies
>!****FREQUENCIES****!

In [13]:
# Import entity and attributes - !****FREQUENCIES****! plan to break some of these individual chunks into objects/functions

# Get Range of Frequency Values in EDI File
for k in range(list_length):

 value = list_[k] 
 if value.replace(" ", "") == '>!****FREQUENCIES****!':
   startIndFrequencies = k + 3
   print ('startIndFrequencies: ' + str(startIndFrequencies))
 
 if value.replace(" ", "") ==  '>!****IMPEDANCEROTATIONANGLES****!':
   endIndFrequencies = k - 1
   print ('endIndFrequencies: ' + str(endIndFrequencies))

frequencyData = []
fdata = []
fdataTemp = []
#Constuct a library of Headers and Values using Pandas, https://pandas.pydata.org/
frequencyDF = pd.DataFrame(fdata)
for j in range(startIndFrequencies,endIndFrequencies):
    fdataTemp = list_[j]
    fdataTemp = re.sub(' +', ' ',fdataTemp)
    fdataTemp = fdataTemp.split(" ")
    del fdataTemp[0]
    fdata = fdata + fdataTemp
    
print (fdata)  
fdata = np.array(fdata).astype(np.float) #convert String to floats
frequencyDF = pd.DataFrame(fdata,columns=['Frequencies'])
frequencyDF


startIndFrequencies: 290
endIndFrequencies: 295
['2.20000000E+03', '1.87000000E+03', '1.50000000E+03', '1.17000000E+03', '8.85000000E+02', '7.20000000E+02', '5.80000000E+02', '4.60000000E+02', '3.40000000E+02', '2.70000000E+02', '2.10000000E+02', '1.72399994E+02', '1.50000000E+02', '1.22099998E+02', '1.00000000E+02', '8.59400024E+01', '7.90000000E+01', '6.00600014E+01', '4.15000000E+01', '2.83199997E+01', '1.90400009E+01', '1.22100000E+01', '7.32399988E+00', '4.39400005E+00']


Unnamed: 0,Frequencies
0,2200.0
1,1870.0
2,1500.0
3,1170.0
4,885.0
5,720.0
6,580.0
7,460.0
8,340.0
9,270.0


In [14]:
# Now lets get the stats of the frequency data
#Make Array of Max Vallues
frequencyMax = frequencyDF[('Frequencies')].max()
print ('Max. Frequency: ' + str(frequencyMax))
frequencyMin = frequencyDF[('Frequencies')].min()
print ('Min. Frequency: ' + str(frequencyMin))

Max. Frequency: 2200.0
Min. Frequency: 4.39400005


Here we load the Impedance Rotation Angles
>!****IMPEDANCE ROTATION ANGLES****!

In [15]:
# Import entity and attributes - !****IMPEDANCE ROTATION ANGLES****! plan to break some of these individual chunks into objects/functions

# Get Range of Frequency Values in EDI File
for k in range(list_length):
 value = list_[k] 
 if value.replace(" ", "") == '>!****IMPEDANCEROTATIONANGLES****!':
   startIndROT = k + 3
   print ('startIndROT: ' + str(startIndROT))
 
 if value.replace(" ", "") ==  '>!****IMPEDANCES****!':
   endIndROT = k - 1
   print ('endIndROT: ' + str(endIndROT))

rdata = []
rdataTemp = []
#Constuct a library of Headers and Values using Pandas, https://pandas.pydata.org/
rotationDF = pd.DataFrame(rdata)
for j in range(startIndROT,endIndROT):
    rdataTemp = list_[j]
    rdataTemp = re.sub(' +', ' ',rdataTemp)
    rdataTemp = rdataTemp.split(" ")
    del rdataTemp[0]
    rdata = rdata + rdataTemp
    
print (rdata)  
rdata = np.array(rdata).astype(np.float) #convert String to floats
rotationDF = pd.DataFrame(rdata,columns=['ZROT'])
rotationDF

startIndROT: 299
endIndROT: 304
['0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00']


Unnamed: 0,ZROT
0,0.0
1,0.0
2,0.0
3,0.0
4,0.0
5,0.0
6,0.0
7,0.0
8,0.0
9,0.0


In [16]:
# Now lets get the stats of the rotation data
#Make Array of Max Vallues
rotationMax = rotationDF[('ZROT')].max()
print ('Max. ZROT: ' + str(frequencyMax))
rotationMin = rotationDF[('ZROT')].min()
print ('Min. ZROT: ' + str(frequencyMin))

Max. ZROT: 2200.0
Min. ZROT: 4.39400005


Here we load the impedances
>!****IMPEDANCES****!

In [17]:
# Import entity and attributes - !****IMPEDANCES****! plan to break some of these individual chunks into objects/functions

# Get Range of Impedance Values in EDI File
for k in range(list_length):
 value = list_[k] 
 if value.replace(" ", "") == '>!****IMPEDANCES****!':
   startIndImpedances = k + 1
   print ('startIndImpedances: ' + str(startIndImpedances))
 
 if value.replace(" ", "") ==  '>!****TIPPERPARAMETERS****!':
   endIndImpedances = k - 1
   print ('endIndImpedances: ' + str(endIndImpedances))

#Construct Array of Channel Headers   
count = 0
impedanceLabel = []
impedanceData = []
data = []
#Constuct a library of Headers and Values using Pandas, https://pandas.pydata.org/
impedanceDF = pd.DataFrame(data)
for l in range(startIndImpedances,endIndImpedances): 
    if list_[l][0] == '>':
     temp = list_[l].split(" ", 1)
     #print (temp)
     impedanceLabel.append((temp[0].split(">"))[1])
     dataTemp = list_[l+1]
     for j in range(l+2,l+8):
      dataTemp = dataTemp + list_[j]
      dataTemp = re.sub(' +', ' ',dataTemp)
     data = dataTemp.split(" ")
     del data[0] # need to check for empty strings and delete these from the array of the string can't be converted to a float
     del data[len(data)-1] # need to check for empty strings and delete
     #print (data)
     data = np.array(data).astype(np.float) #convert String to floats
     se = pd.Series(data)
     print ((temp[0].split(">"))[1])   
     impedanceDF[((temp[0].split(">"))[1])] = se.values
    
    count = count + 1

#impedanceDF = pd.DataFrame(data, columns=(impedanceLabel))
impedanceDF
#data
#se 

startIndImpedances: 306
endIndImpedances: 401
ZXXR
ZXXI
ZXX.VAR
ZXYR
ZXYI
ZXY.VAR
ZYXR
ZYXI
ZYX.VAR
ZYYR
ZYYI
ZYY.VAR


Unnamed: 0,ZXXR,ZXXI,ZXX.VAR,ZXYR,ZXYI,ZXY.VAR,ZYXR,ZYXI,ZYX.VAR,ZYYR,ZYYI,ZYY.VAR
0,3924.94263,-2015.11523,2144984.0,3797.97852,330.921692,2851095.0,1239.04382,-874.722595,2372985.0,701.730225,1045.36267,3154153.0
1,2687.30688,-1647.26038,201653.3,4333.03271,204.971909,136748.5,-183.585144,-1718.01868,209940.2,681.476807,294.824493,142368.2
2,35.448586,1089.61707,1228831.0,1339.37964,2639.52856,1461333.0,-1018.27338,1666.60852,2204710.0,56.535198,3195.12622,2621854.0
3,376.676208,408.044647,738545.9,1880.63391,1603.45276,780102.5,483.134064,644.262634,770453.1,1923.92896,1905.83447,813805.0
4,-1783.88586,-175.149857,747063.5,-301.868927,574.8302,868995.4,-1967.04712,-334.691895,1401709.0,-570.220154,388.645142,1630489.0
5,-93.253197,-95.225113,751183.4,1949.94836,687.834412,834806.8,-440.217834,41.43185,1561262.0,1600.9856,865.562622,1735065.0
6,1126.8894,-402.942719,383099.6,3019.24805,444.371063,481927.3,1582.92761,-354.733459,650304.0,3430.46509,642.432922,818062.2
7,172.338776,-1180.65698,333517.5,1893.68384,-381.168182,361048.6,411.245697,-1849.56995,706594.9,2000.53577,-867.789978,764922.9
8,5.998157,-1209.58179,33376.41,1478.19751,-320.28772,40169.94,126.281784,-1469.62256,56881.98,1444.94653,-309.629913,68459.88
9,-153.26561,258.357727,18736.24,1390.42505,1367.97034,21755.57,-596.810547,515.632263,54594.13,730.452881,2020.32983,63391.92


In [18]:
# Now lets get the stats of the impedance data
#Make Array of Max Vallues
impedanceMax = []
for i in range (0,len(impedanceLabel)):
    impedanceMax.append(impedanceDF[(impedanceLabel[i])].max())
    
impedanceMin = []
for i in range (0,len(impedanceLabel)):
    impedanceMin.append(impedanceDF[(impedanceLabel[i])].min())

impedanceMin

[-1820.34741,
 -2226.51807,
 25.2085571,
 -557.081299,
 -1440.76245,
 35.7908249,
 -1967.04712,
 -1849.56995,
 16.3143215,
 -570.220154,
 -867.789978,
 23.1628914]

Here we load the tipper parameters
>!****TIPPER PARAMETERS****!

In [19]:
# Import entity and attributes - !****TIPPER PARAMETERS****! plan to break some of these individual chunks into objects/functions
# Probably will need two functions for this - one for a single list and one for the long lists with more than one column
# Get Range of Impedance Values in EDI File
for k in range(list_length):
 value = list_[k] 
 if value.replace(" ", "") == '>!****TIPPERPARAMETERS****!':
   startIndTipper = k + 1
   print ('startIndTipper: ' + str(startIndTipper))
 
 if value.replace(" ", "") ==  '>!****COMPUTEDPARAMETERS****!':
   endIndTipper = k - 1
   print ('endIndTipper: ' + str(endIndTipper))

#Construct Array of Channel Headers   
count = 0
tipperLabel = []
tipperData = []
tdata = []
#Constuct a library of Headers and Values using Pandas, https://pandas.pydata.org/
tipperDF = pd.DataFrame(tdata)
for l in range(startIndTipper,endIndTipper): 
    if list_[l][0] == '>':
     ttemp = list_[l].split(" ", 1)
     #print (ttemp)
     tipperLabel.append((ttemp[0].split(">"))[1])
     tdataTemp = list_[l+1]
     for j in range(l+2,l+8):
      tdataTemp = tdataTemp + list_[j]
      tdataTemp = re.sub(' +', ' ',tdataTemp)
      tdata = tdataTemp.split(" ")
     #print (tdata)
     del tdata[0]
     del tdata[len(tdata)-1] # need to check for empty strings and delete
     tdata = np.array(tdata).astype(np.float) #convert String to floats
     te = pd.Series(tdata)
     print ((ttemp[0].split(">"))[1])   
     tipperDF[((ttemp[0].split(">"))[1])] = te.values
    
    count = count + 1

#tipperDF = pd.DataFrame(tdata, columns=(tipperLabel))
tipperDF
#tdata
#te 

startIndTipper: 403
endIndTipper: 450
TXR.EXP
TXI.EXP
TXVAR.EXP
TYR.EXP
TYI.EXP
TYVAR.EXP


Unnamed: 0,TXR.EXP,TXI.EXP,TXVAR.EXP,TYR.EXP,TYI.EXP,TYVAR.EXP
0,-3.082384,1.498865,21.147022,0.351321,1.903971,28.108458
1,-2.485983,-1.116059,3.062269,-0.315124,-1.25413,2.076637
2,-5.961558,-0.230998,18.169838,-3.502284,-0.187459,21.607678
3,-7.512102,0.221302,4.194789,-5.84971,-0.082489,4.430822
4,1.749586,-2.551613,104.971214,5.081409,-2.831406,122.104088
5,-1.404703,-0.189633,8.057308,-0.176374,-0.116849,8.954266
6,-7.635828,0.96064,7.174944,-5.640436,0.270558,9.025857
7,-3.439769,4.283657,192.277191,-0.434035,3.223024,208.149246
8,-1.974325,1.870983,0.354525,1.197152,-0.176371,0.426685
9,2.386546,3.267124,3.34926,6.530264,0.547468,3.88899


In [20]:
# Now lets get the stats of the tipper data

# Make Array of Max Values
tipperMax = []
for i in range (0,len(tipperLabel)):
    tipperMax.append(tipperDF[(tipperLabel[i])].max())
print ('Tipper Max: ' + str(tipperMax))    

# Make Array of Min Values
tipperMin = []
for i in range (0,len(tipperLabel)):
    tipperMin.append(tipperDF[(tipperLabel[i])].min())
print ('Tipper Min: ' + str(tipperMin))

Tipper Max: [2.3865459, 4.2836566, 192.277191, 6.53026438, 3.22302389, 208.149246]
Tipper Min: [-7.6358285, -2.55161262, 0.00134994881, -5.84971046, -4.28522635, 0.00201180577]


Here we load the computed parameters
>!****COMPUTED PARAMETERS****!

In [21]:
# Import entity and attributes - !****COMPUTED PARAMETERS****! plan to break some of these individual chunks into objects/functions
# Probably will need two functions for this - one for a single list and one for the long lists with more than one column
# Get Range of Impedance Values in EDI File
for k in range(list_length):
 value = list_[k] 
 if value.replace(" ", "") == '>!****COMPUTEDPARAMETERS****!':
   startIndPar = k + 1
   print ('startIndPar: ' + str(startIndPar))
 
 if value.replace(" ", "") ==  '>END':
   endIndPar = k - 1
   print ('endIndPar: ' + str(endIndPar))

#Construct Array of Channel Headers   
count = 0
parLabel = []
parData = []
pdata = []
#Constuct a library of Headers and Values using Pandas, https://pandas.pydata.org/
parDF = pd.DataFrame(pdata)
for l in range(startIndPar,endIndPar): 
    if list_[l][0] == '>':
     ptemp = list_[l].split(" ", 1)
     #print (ptemp)
     parLabel.append((ptemp[0].split(">"))[1])
     pdataTemp = list_[l+1]
     for j in range(l+2,l+8):
      pdataTemp = pdataTemp + list_[j]
      pdataTemp = re.sub(' +', ' ',pdataTemp)
      pdata = pdataTemp.split(" ")
     #print (pdata)
     del pdata[0]
     del pdata[len(pdata)-1] # need to check for empty strings and delete
     pdata = np.array(pdata).astype(np.float) #convert String to floats
     pe = pd.Series(pdata)
     print ((ptemp[0].split(">"))[1])   
     parDF[((ptemp[0].split(">"))[1])] = te.values
    
    count = count + 1

parDF
#pdata
#pe 

startIndPar: 452
endIndPar: 771
RHOROT
RHOXX
RHOXX.ERR
RHOXY
RHOXY.ERR
RHOYX
RHOYX.ERR
RHOYY
RHOYY.ERR
PHSXX
PHSXX.ERR
PHSXY
PHSXY.ERR
PHSYX
PHSYX.ERR
PHSYY
PHSYY.ERR
TIPMAG
TIPMAG.ERR
TIPPHS
TIPPHS.ERR
ZSTRIKE
ZSKEW
TSTRIKE
COH
COH
COH
COH
EPREDCOH
EPREDCOH
SIGAMP
SIGAMP
SIGAMP
SIGAMP
SIGAMP
SIGNOISE
SIGNOISE
SIGNOISE
SIGNOISE
SIGNOISE


Unnamed: 0,RHOROT,RHOXX,RHOXX.ERR,RHOXY,RHOXY.ERR,RHOYX,RHOYX.ERR,RHOYY,RHOYY.ERR,PHSXX,...,TIPMAG.ERR,TIPPHS,TIPPHS.ERR,ZSTRIKE,ZSKEW,TSTRIKE,COH,EPREDCOH,SIGAMP,SIGNOISE
0,28.108458,28.108458,28.108458,28.108458,28.108458,28.108458,28.108458,28.108458,28.108458,28.108458,...,28.108458,28.108458,28.108458,28.108458,28.108458,28.108458,28.108458,28.108458,28.108458,28.108458
1,2.076637,2.076637,2.076637,2.076637,2.076637,2.076637,2.076637,2.076637,2.076637,2.076637,...,2.076637,2.076637,2.076637,2.076637,2.076637,2.076637,2.076637,2.076637,2.076637,2.076637
2,21.607678,21.607678,21.607678,21.607678,21.607678,21.607678,21.607678,21.607678,21.607678,21.607678,...,21.607678,21.607678,21.607678,21.607678,21.607678,21.607678,21.607678,21.607678,21.607678,21.607678
3,4.430822,4.430822,4.430822,4.430822,4.430822,4.430822,4.430822,4.430822,4.430822,4.430822,...,4.430822,4.430822,4.430822,4.430822,4.430822,4.430822,4.430822,4.430822,4.430822,4.430822
4,122.104088,122.104088,122.104088,122.104088,122.104088,122.104088,122.104088,122.104088,122.104088,122.104088,...,122.104088,122.104088,122.104088,122.104088,122.104088,122.104088,122.104088,122.104088,122.104088,122.104088
5,8.954266,8.954266,8.954266,8.954266,8.954266,8.954266,8.954266,8.954266,8.954266,8.954266,...,8.954266,8.954266,8.954266,8.954266,8.954266,8.954266,8.954266,8.954266,8.954266,8.954266
6,9.025857,9.025857,9.025857,9.025857,9.025857,9.025857,9.025857,9.025857,9.025857,9.025857,...,9.025857,9.025857,9.025857,9.025857,9.025857,9.025857,9.025857,9.025857,9.025857,9.025857
7,208.149246,208.149246,208.149246,208.149246,208.149246,208.149246,208.149246,208.149246,208.149246,208.149246,...,208.149246,208.149246,208.149246,208.149246,208.149246,208.149246,208.149246,208.149246,208.149246,208.149246
8,0.426685,0.426685,0.426685,0.426685,0.426685,0.426685,0.426685,0.426685,0.426685,0.426685,...,0.426685,0.426685,0.426685,0.426685,0.426685,0.426685,0.426685,0.426685,0.426685,0.426685
9,3.88899,3.88899,3.88899,3.88899,3.88899,3.88899,3.88899,3.88899,3.88899,3.88899,...,3.88899,3.88899,3.88899,3.88899,3.88899,3.88899,3.88899,3.88899,3.88899,3.88899


In [22]:
# Now lets get the stats of the computed parameters

# Make Array of Max Values
parMax = []
for i in range (0,len(parLabel)):
    parMax.append(parDF[(parLabel[i])].max())
print ('Computed Pararmeters Max: ' + str(parMax))    

# Make Array of Min Values
parMin = []
for i in range (0,len(parLabel)):
    parMin.append(parDF[(parLabel[i])].min())
print ('Computed Pararmeters Min: ' + str(parMin))

Computed Pararmeters Max: [208.149246, 208.149246, 208.149246, 208.149246, 208.149246, 208.149246, 208.149246, 208.149246, 208.149246, 208.149246, 208.149246, 208.149246, 208.149246, 208.149246, 208.149246, 208.149246, 208.149246, 208.149246, 208.149246, 208.149246, 208.149246, 208.149246, 208.149246, 208.149246, 208.149246, 208.149246, 208.149246, 208.149246, 208.149246, 208.149246, 208.149246, 208.149246, 208.149246, 208.149246, 208.149246, 208.149246, 208.149246, 208.149246, 208.149246, 208.149246]
Computed Pararmeters Min: [0.00201180577, 0.00201180577, 0.00201180577, 0.00201180577, 0.00201180577, 0.00201180577, 0.00201180577, 0.00201180577, 0.00201180577, 0.00201180577, 0.00201180577, 0.00201180577, 0.00201180577, 0.00201180577, 0.00201180577, 0.00201180577, 0.00201180577, 0.00201180577, 0.00201180577, 0.00201180577, 0.00201180577, 0.00201180577, 0.00201180577, 0.00201180577, 0.00201180577, 0.00201180577, 0.00201180577, 0.00201180577, 0.00201180577, 0.00201180577, 0.00201180577, 0

## Now lets get the range of values from the RSP values

In [23]:
#First Get the list of RSP files
rspList = glob.glob(os.path.join(mtStationPath, '*.RSP'),  recursive=True)
#rspList
fileListing = ''
for i in range(len(rspList)):
    splitRspList = rspList[i].split('\\')
    fileListing = fileListing + splitRspList[len(splitRspList) - 1] + '\n'
print (fileListing)

BF6-9621.RSP
BF6-9624.RSP
BF6-9625.RSP
EF-9608X.RSP
EF-9608Y.RSP



## Now the raw Binary File Listing - this can be T files or W files
We will need to figure out the best way of filtering on thise - may need to build array and then delete AVG, dmp and edi file.

These are listed in the edi file as well but they are not all there.  

    ProcessingTimeSeriesUsed:
         wp01A1.bp1                                                                     
         wp01A2.bp1                                                                     
         wp01A1.sd6                                                                     
         wp01A2.sd6                                                                     
         wp01A1.sd7                                                                     
         wp01A2.sd8                                                                     
         wp01A2_3.sd9 

- Which files need to be included in the data release?
- What is the best way to get this listing?

In [24]:
#Add the raw frequency files to the list except the AVG, dmp and edi file
binList = glob.glob(os.path.join(mtStationPath, 'WP*.*'),  recursive=True)
#binList
for i in range(len(binList)):
  splitBinList = binList[i].split('\\')
  if splitBinList[len(splitBinList) - 1].find('AVG') == -1 and splitBinList[len(splitBinList) - 1].find('dmp') == -1 and splitBinList[len(splitBinList) - 1].find('edi') == -1:
   fileListing = fileListing + splitBinList[len(splitBinList) - 1] + '\n'

print ('File ListingfileListing:\n' + fileListing)

File ListingfileListing:
BF6-9621.RSP
BF6-9624.RSP
BF6-9625.RSP
EF-9608X.RSP
EF-9608Y.RSP
WP02A1.BP1
WP02A1.FC6
WP02A1.FC7
WP02A1.FC8
WP02A1.FC9
WP02A1.SD6
WP02A1.SD7
WP02A1.SD8
WP02A1.SD9
WP02A1.TS1
WP02A2.BP1
WP02A2.FC6
WP02A2.FC7
WP02A2.FC8
WP02A2.FC9
WP02A2.SD6
WP02A2.SD7
WP02A2.SD8
WP02A2.SD9
WP02A2.TS1



In [25]:
#Now finally add the processed ASCII text files to the list
txtList = glob.glob(os.path.join(mtStationPath, '*.txt'),  recursive=True)
#txtList
for i in range(len(txtList)):
  splitTxtList = txtList[i].split('\\')
  fileListing = fileListing + splitTxtList[len(splitTxtList) - 1] + '\n'

print ('File ListingfileListing:\n' + fileListing)

File ListingfileListing:
BF6-9621.RSP
BF6-9624.RSP
BF6-9625.RSP
EF-9608X.RSP
EF-9608Y.RSP
WP02A1.BP1
WP02A1.FC6
WP02A1.FC7
WP02A1.FC8
WP02A1.FC9
WP02A1.SD6
WP02A1.SD7
WP02A1.SD8
WP02A1.SD9
WP02A1.TS1
WP02A2.BP1
WP02A2.FC6
WP02A2.FC7
WP02A2.FC8
WP02A2.FC9
WP02A2.SD6
WP02A2.SD7
WP02A2.SD8
WP02A2.SD9
WP02A2.TS1
readme.txt
USA-New_Mexico-Rio_Grande_Rift-San_Luis_Basin-2009-AMT02-FC6_01.txt
USA-New_Mexico-Rio_Grande_Rift-San_Luis_Basin-2009-AMT02-FC6_02.txt
USA-New_Mexico-Rio_Grande_Rift-San_Luis_Basin-2009-AMT02-FC7_01.txt
USA-New_Mexico-Rio_Grande_Rift-San_Luis_Basin-2009-AMT02-FC7_02.txt
USA-New_Mexico-Rio_Grande_Rift-San_Luis_Basin-2009-AMT02-FC8_01.txt
USA-New_Mexico-Rio_Grande_Rift-San_Luis_Basin-2009-AMT02-FC8_02.txt
USA-New_Mexico-Rio_Grande_Rift-San_Luis_Basin-2009-AMT02-FC9_01.txt
USA-New_Mexico-Rio_Grande_Rift-San_Luis_Basin-2009-AMT02-FC9_02.txt
USA-New_Mexico-Rio_Grande_Rift-San_Luis_Basin-2009-AMT02-SD6_01.txt
USA-New_Mexico-Rio_Grande_Rift-San_Luis_Basin-2009-AMT02-SD6_02.txt

# Now get values and stats on the .RSP files that are listed to add to the Ent. and Att. information

## !! So the below is interesting - we will need to speak with Brian to figure out how this file should be formated.  I don't think any of the .RSP files are formated correctly as intended. !!

In [75]:
# Load RSP files into pandas

#for i in range(len(rspList)):
    #print(rspList[i])
    
dfrsp = pd.read_csv(rspList[1],sep='\t',skiprows=(0), header=(4))

dfrsp.rename(index=str, columns={"Freq  Amp   Phz": "test"}, inplace=True)
#names = dfrsp.columns.values
#names[0]
#count = dfrsp.test.str.count(".")
#count

df2 = dfrsp.join(dfrsp['test'].str.split('  ', -1, expand=True).rename(columns={0:'Amp', 1:'Phz'}))

df2

Unnamed: 0,test,Amp,Phz,2,3,4
0,31 1,31,,1.0,,
1,.10.00183 89.3,,.10.00183,89.3,,
2,.15.00275 88.9,,.15.00275,88.9,,
3,.20.00366 88.6,,.20.00366,88.6,,
4,.30.00549 87.9,,.30.00549,87.9,,
5,.40.00732 87.2,,.40.00732,87.2,,
6,.60.01090 85.8,,.60.01090,85.8,,
7,.80.01450 84.4,,.80.01450,84.4,,
8,1.0.01810 83.0,,1.0.01810,83.0,,
9,1.5.02700 79.6,,1.5.02700,79.6,,


# Populate Metadata Template

In [26]:
#Load EDI File and Read It
metaData = os.path.join(mtMataDataTemplatePath, mtMataDataTemplateName)
xmlTemplateFile = open(metaData, 'r')
metaDataContent = xmlTemplateFile.readlines()
print(metaDataContent)
xmlTemplateFile.close()


['<?xml version="1.0" encoding="UTF-8"?>\n', '<metadata>\n', '\t<idinfo>\n', '\t\t<citation>\n', '\t\t\t<citeinfo>\n', '\t\t\t\t<origin>Rodriguez, B. D.</origin>\n', '\t\t\t\t<origin>Brown, P. J.</origin>\n', '\t\t\t\t<pubdate>2018</pubdate>\n', '\t\t\t\t<title>{title}</title>\n', '\t\t\t\t<edition>{edition}</edition>\n', '\t\t\t\t<geoform>ASCII and Binary Digital Data</geoform>\n', '\t\t\t\t<pubinfo>\n', '\t\t\t\t\t<pubplace>Denver, CO</pubplace>\n', '\t\t\t\t\t<publish>U.S. Geological Survey</publish>\n', '\t\t\t\t</pubinfo>\n', '\t\t\t\t<othercit>Additional information about Originators:Rodriguez, B.D., http://orcid.org/0000-0002-2263-611X; Brown, P.J., http://orcid.org/0000-0002-2415-7462</othercit>\n', '\t\t\t\t<onlink>{onlink}</onlink>\n', '\t\t\t\t<lworkcit>\n', '\t\t\t\t\t<citeinfo>\n', '\t\t\t\t\t\t<origin>Ailes, C. E.</origin>\n', '\t\t\t\t\t\t<origin>Rodriguez, B. D.</origin>\n', '\t\t\t\t\t\t<pubdate>2011</pubdate>\n', '\t\t\t\t\t\t<title>Audiomagnetotelluric data, Taos Pla

In [28]:
# Replace values of current metadata template with the appropriate values.  
# All of this input should have been defined when going through the steps outlined above.
lineString = ''
newMetaDataContent = metaDataContent
splitFileName = ediList[0].split('.')
myfilename = splitFileName[0] + '.xml'
xmlFile = open(myfilename,"w+")
print(myfilename)
#print(keywords.value)
for i in range(len(metaDataContent)):
    lineString = metaDataContent[i]
    if lineString.find('{title}'):
     lineString = lineString.replace('{title}', drTitle)
    
    if lineString.find('{abstract}'):
     lineString = lineString.replace('{abstract}', purposeClean)
    
    if lineString.find('{purpose}'):
     lineString = lineString.replace('{purpose}', descriptionClean)
    
    if lineString.find('{BeginFileListingHere}'):
     lineString = lineString.replace('{BeginFileListingHere}', descriptionClean)
    
    if lineString.find('{keywords}'):
     lineString = lineString.replace('{keywords}', keywords.value)
    
    else:
     lineString = lineString
    xmlFile.write(lineString)
    #print (lineString)
     
    
    
#for r in (metaDataContent):
    #newMetaDataContent = metaDataContent.replace('{title}', drTitle)
    #newMetaDataContent = metaDataContent.replace('{keywords}', keywords.value)
xmlFile.close()

print ('Creation of new metadata file is complete\n\n') 
#Load EDI File and Read It
##checkFile = open(open(myfilename, 'r')
##checkFileContent = checkFile.read()
##checkFile.close()
##print(checkFileContent)

C:\CurrentWork\DataReleases\SquamataMT_TEST\AMT02\USA-New_Mexico-Rio_Grande_Rift-San_Luis_Basin-2009-AMT02.xml
Creation of new metadata file is complete




### At this point the new metadata file should be created.  Check the result below.  If this is ok save this file and run the loop for the rest of the children...

In [65]:
# Show the resulting child xml metadata file example 
#for i in range(len(newMetaDataContent)):
print (newMetaDataContent)

['<?xml version="1.0" encoding="UTF-8"?>\n', '<metadata>\n', '\t<idinfo>\n', '\t\t<citation>\n', '\t\t\t<citeinfo>\n', '\t\t\t\t<origin>{origin}</origin>\n', '\t\t\t\t<pubdate>{pubdate}</pubdate>\n', '\t\t\t\t<title>{title}</title>\n', '\t\t\t\t<edition>{edition}</edition>\n', '\t\t\t\t<geoform>ASCII and Binary Digital Data</geoform>\n', '\t\t\t\t<pubinfo>\n', '\t\t\t\t\t<pubplace>Denver, CO</pubplace>\n', '\t\t\t\t\t<publish>U.S. Geological Survey</publish>\n', '\t\t\t\t</pubinfo>\n', '\t\t\t\t<othercit>{othercit}</othercit><!--Please add an Orcid ID here e.g., "Additional information about Originator: Rodriguez, B.D, http://orcid.org/0000-0002-2263-611X"-->\n', '\t\t\t\t<onlink>{onlink}</onlink>\n', '\t\t\t\t<lworkcit>\n', '\t\t\t\t\t<citeinfo>\n', '\t\t\t\t\t\t{BeginOriginLoop}<!--Place to print larger work originators here. Example is:\n', '\t\t\t\t\t\t<origin>Originating Author Name</origin> /carrage return (CR is &#13; and not &#10; which is LF)\n', '\t\t\t\t\t\t-->\n', '\t\t\t\t