## SquamataMT - Jupyter notebook for releasing MT data to ScienceBase

This module performs the following operations:
- Create list of data directories.
- Identify files accompanying data release.
- Create file listing for metadata XML markup.
- Identify and load MT EDI file.
- Collect and harvest release parameters common to ALL child metadata records.
- Create entity and attribute XML markup.
- Populate metadata template.
- Validate metadata; create error log; create HTML and FGDC Text versions of the metadata.
- Create all child metadata files from first example created in previous steps. (In development)
- Perhaps upload files to ScienceBase (In development)
- Change ScienceBase parameters such as citation information, add orcid ids, add USGS CMS tags, etc. (In development)

### To execute a function/command, select a cell, then hold Shift and press Enter

**The 'r' prefix marks a raw string literal, in which backslashes are not treated as escape characters. Use it for Windows paths.**
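The difference is easiest to see side by side. The short sketch below uses a made-up path (`C:\temp\new`, not one of the notebook's paths) whose backslashes happen to form the escape sequences `\t` and `\n`:

```python
# A regular string literal interprets backslash escapes such as \t and \n.
plain = "C:\temp\new"
# The r prefix keeps every backslash as a literal character.
raw = r"C:\temp\new"

# In the regular string, the tab and newline each count as one character.
print(len(plain), len(raw))
print("\t" in plain, "\t" in raw)
```

Only the raw version still contains the two backslashes, which is why the path assignments in this notebook carry the `r` prefix.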

Metadata Wizard: Advanced, open in a Jupyter Notebook?
Metadata Wizard 2.0 from ScienceBase

In [1]:
# Phil Brown (pbrown@usgs.gov) 2018
# Working Python 3 Notebook used to facilitate the release of Magnetotelluric (MT) Data to ScienceBase.

In [2]:
# Test Cell
print ("Jupyter is working.") #To run this cell, hold down Shift and press Enter.

Jupyter is working.


In [76]:
# Load required libraries (duplicate imports removed)
import sys
import os
import zipfile
import csv
import pysb
import requests
import shutil
from shutil import copyfile
import datetime
import glob
import json
import pickle
import fileinput
import re
import pandas as pd
import numpy as np
from lxml import etree
##from pymdwizard.core.xml_utils import XMLRecord
##from pymdwizard.core.xml_utils import XMLNode
from IPython.display import display, HTML
from ipywidgets import *
# IPython.html.widgets was moved into the separate ipywidgets package
import ipywidgets as widgets

# 1) Step One - Set Directory Paths
## Please set directory paths below
### Directory paths include
- Data Path
    - This is the path to the data; the data structure should have a directory for each station
- Template Path

In [4]:
#Set Data Paths - perhaps we'll get a user form to do this some day?
mtDataPath = r"C:\CurrentWork\DataReleases\SquamataMT_TEST" #The 'r' prefix makes this a raw string literal. Use for paths.
mtMataDataTemplatePath = r"C:\CurrentWork\DataManagement\SquamataMT"
mtMataDataTemplateName = "MT-MetaData_TEMPLATE.xml"

In [5]:
#Check Paths for the fun of it
print ('The MT Data Path is: ' + '"' + mtDataPath + '"')
os.path.join(mtMataDataTemplatePath, mtMataDataTemplateName) #join with a path separator; plain '+' fuses the directory and file names

The MT Data Path is: "C:\CurrentWork\DataReleases\SquamataMT_TEST"


'C:\\CurrentWork\\DataManagement\\SquamataMT\\MT-MetaData_TEMPLATE.xml'

# 2) Step Two - Collect Common Parameters
## The first step is to collect the information common to all child metadata sets
### Values Include:
- Data Release Title
    - The title may need to include the station number in each child item; the best way to address this is still to be worked out
- Data Release Originator(s)
- Larger Work Title
- Larger Work Originator(s)
- Larger Work URL
- Theme Keywords
- Location Keyword
- etc. etc

**Note that much of this can be obtained from the EDI file - this file can be viewed and values imported below...**



## Now, let's explore our data. 
- What files do we have? 
- What files do we import values from?

In [6]:
#Review content in file explorer

In [7]:
mtDataDirList = os.listdir(mtDataPath)
mtDataDirList

['AMT01', 'AMT02', 'AMT03', 'AMT04', 'AMT05']
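`os.listdir` returns every entry in the folder, files included, so a stray file next to the station directories would end up in `mtDataDirList`. A minimal sketch of a stricter listing follows; `list_station_dirs` is a hypothetical helper name, and the demonstration runs on a temporary directory rather than the real data path:

```python
import os
import tempfile

def list_station_dirs(data_path):
    """Return the sorted names of the immediate subdirectories of data_path."""
    with os.scandir(data_path) as entries:
        return sorted(entry.name for entry in entries if entry.is_dir())

# Demonstration on a throwaway directory tree
with tempfile.TemporaryDirectory() as demo_root:
    for name in ("AMT02", "AMT01"):
        os.mkdir(os.path.join(demo_root, name))
    # A stray file that os.listdir would also return
    open(os.path.join(demo_root, "notes.txt"), "w").close()
    demo_stations = list_station_dirs(demo_root)

print(demo_stations)
```

Sorting also makes the station order independent of the file system, which matters because the next cell picks the first entry.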

In [8]:
mtStationPath = os.path.join(mtDataPath, mtDataDirList[0])
mtStationPath

'C:\\CurrentWork\\DataReleases\\SquamataMT_TEST\\AMT01'

In [9]:
#Look for EDI file to load
ediList = glob.glob(os.path.join(mtStationPath, '**/*MT*.edi'),  recursive=True)
ediPath = ediList[0]
ediPath

'C:\\CurrentWork\\DataReleases\\SquamataMT_TEST\\AMT01\\USA-New_Mexico-Rio_Grande_Rift-San_Luis_Basin-2009-AMT01.edi'
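Indexing `ediList[0]` raises an `IndexError` when a station folder has no matching EDI file. The sketch below wraps the same `glob` pattern in a hypothetical helper, `find_first_edi`, that fails with a clearer message; the file name used in the demonstration is made up:

```python
import glob
import os
import tempfile

def find_first_edi(station_path, pattern="**/*MT*.edi"):
    """Return the first matching EDI path (sorted), or raise if none is found."""
    matches = sorted(glob.glob(os.path.join(station_path, pattern), recursive=True))
    if not matches:
        raise FileNotFoundError(f"No EDI file matching {pattern!r} under {station_path}")
    return matches[0]

with tempfile.TemporaryDirectory() as demo_station:
    edi_file = os.path.join(demo_station, "demo-AMT01.edi")
    open(edi_file, "w").close()
    found_name = os.path.basename(find_first_edi(demo_station))
    # Remove the file to show the failure path
    os.remove(edi_file)
    try:
        find_first_edi(demo_station)
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True

print(found_name, missing_raised)
```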

## Enter the information unique to this data set but common to all metadata files
### These include:
- Data Release Title
- Data Release Authors
- Theme Keywords
- Location Keywords

## After this step, information will be harvested from the MT EDI file. 
### These include:
- ProductId=USA-New_Mexico-Rio_Grande_Rift-San_Luis_Basin-2009-AMT01.edi
- ExternalUrl Url=https://doi.org/10.5066/F72F7MQ7
- Attachment Filename=https://pubs.usgs.gov/of/2011/1264/report/OF11-1264.pdf
- Survey Purpose Description: 
- Data Description:
- Citation Title=Audiomagnetotelluric data, Taos Plateau Volcanic Field, New Mexico
- Citation Authors=Chad E. Ailes, Brian D. Rodriguez
- Citation Year=2011
- YearCollected=2009
- Country=USA                                  
- Ellipsoid=Clarke 1866                                                          
- Location datum=NAD27 CONUS                                                     
- SITE LATITUDE=36.752985000                                                     
- SITE LONGITUDE=-105.560966167                                                  
- Elevation units="meters"=2608.00                                                                     
- Start=2009-07-21T19:52:03 UTC/GMT
- End=2009-07-21T20:34:20 UTC/GMT
- ProcessingTimeSeriesUsed:
         wp01A1.bp1                                                                     
         wp01A2.bp1                                                                     
         wp01A1.sd6                                                                     
         wp01A2.sd6                                                                     
         wp01A1.sd7                                                                     
         wp01A2.sd8                                                                     
         wp01A2_3.sd9 
- Entities and Attributes:
    - FREQUENCIES
    - IMPEDANCE ROTATION ANGLES
    - IMPEDANCES
    - TIPPER PARAMETERS
    - COMPUTED PARAMETERS
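Most of the single-line values listed above follow a `key=value` layout, so they can be harvested with one small function instead of one `if` branch per key. This is only a sketch under that assumption: `parse_key_value` is a hypothetical helper, and the sample lines are copied from values shown in this notebook.

```python
def parse_key_value(line):
    """Split 'key=value' at the first '='; return (key, value) or None."""
    key, sep, value = line.partition("=")
    if not sep:
        return None
    # Drop padding and any double quotes wrapped around the value
    return key.strip(), value.strip().strip('"')

sample_lines = [
    '  STATE="New Mexico"      ',
    '  COUNTY=Taos             ',
    '  ExternalUrl Url=https://doi.org/10.5066/F72F7MQ7 ',
    '>HEAD',
]
harvested = dict(pair for pair in map(parse_key_value, sample_lines) if pair)
print(harvested)
```

Splitting at the first `=` only, via `str.partition`, keeps values intact when they contain an `=` themselves, as URLs with query strings do.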


In [11]:
## Test of creating a jupyter GUI to get this info
#We may want to loop through these and create an array instead, but for now I think it's easier to track user input this way.
#We may want to use a widgets library or get input from a Google form. 
## Visit https://ipywidgets.readthedocs.io/en/stable/examples/Widget%20List.html
print('<title> Please enter the data release title')
drTitle = input()
print('<originator> Please enter the data release author(s) separated by a comma')
drOriginators = input()
#Create originators array
drOriginatorsArray=drOriginators.split(',')

<title> Please enter the data release title
Title Test
<originator> Please enter the data release author(s) separated by a comma
Phil B., John F., Frank J.
Data Release Larger Work Title
Larger Work Title


In [77]:
## Check the user input parameters
print('<citeinfo>')
print('   <title>' + drTitle + '</title>')
# Be sure to strip leading and trailing spaces from user entered originator values
for originator in drOriginatorsArray:
    originator = originator.strip()
    print ('   <origin>'+ originator + '</origin>')

<citeinfo>
   <title>Title Test</title>
   <origin>Phil B.</origin>
   <origin>John F.</origin>
   <origin>Frank J.</origin>
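Building the markup by string concatenation works for a visual check, but a title containing `&` or `<` would produce invalid XML. A sketch of the same fragment built with an XML library is shown below. It uses the standard-library `xml.etree.ElementTree` rather than `lxml` so that it runs without extra packages; `build_citeinfo` and the sample title are hypothetical, and the element order (origin before title) follows the template shown later in this notebook.

```python
import xml.etree.ElementTree as ET

def build_citeinfo(title, originators):
    """Return a <citeinfo> fragment with one <origin> per originator."""
    citeinfo = ET.Element("citeinfo")
    for name in originators:
        # Strip the spaces left over from splitting on commas
        ET.SubElement(citeinfo, "origin").text = name.strip()
    ET.SubElement(citeinfo, "title").text = title
    return ET.tostring(citeinfo, encoding="unicode")

demo_xml = build_citeinfo("Title Test & Notes", "Phil B., John F.".split(","))
print(demo_xml)
```

The library escapes the ampersand in the title automatically, which the `print` statements above would not do.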


In [78]:
## Create an editable keywords example.
## The example text is created by running this cell.
## The text is displayed by running "display(keywords)" below.
keywords = widgets.Textarea(
    value='          <keywords>\n                <theme>\n                  <themekt>ISO 19115 Topic Category</themekt>\n                  <themekey>biota</themekey>\n                </theme>\n      <theme>\n                  <themekt>None</themekt>\n                  <themekey>impedance</themekey>\n                   <themekey>tipper</themekey>\n                   <themekey>apparent resistivity</themekey>\n                   <themekey>impedance phase</themekey>\n                  <themekey>impedance strike</themekey>\n                  <themekey>MT</themekey>\n                  <themekey>audiomagnetotelluric</themekey>\n                  <themekey>magnetotelluric</themekey>\n                  <themekey>AMT</themekey>\n                  <themekey>sounding</themekey>\n                  <themekey>Geology, Geophysics, and Geochemistry Science Center</themekey>\n                  <themekey>GGGSC</themekey>\n                  <themekey>Mineral Resources Program</themekey>\n                  <themekey>MRP</themekey>\n                        </theme>\n                <theme>\n                  <themekt>USGS Thesaurus</themekt>\n                   <themekey>Magnetic field (earth)</themekey>\n                   <themekey>Geophysics</themekey>\n                  <themekey>GPS measurement</themekey>\n                  <themekey>Electromagnetic surveying</themekey>\n                  <themekey>Magnetic surveying</themekey>\n                </theme>\n                <place>\n                  <placekt>USGS Geographic Names Information System (GNIS)</placekt>\n        <placekey>New Mexico</placekey>\n        <placekey>Rio Grande del Norte National Monument</placekey>\n        <placekey>{county}</placekey>\n        <placekey>Rio Grande</placekey>\n      </place>\n    </keywords>\n',
    placeholder='Type something',
    #description='String:',
    layout=Layout(width='100%', height='666px'),
    disabled=False
)
print ('Keywords list created.')

Keywords list created.


### Change the text in the textbox below to reflect what should be included as the keywords for all child items
**Please leave the {county} tag as is.  This value will be filled in from the edi file later**
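One way the `{county}` tag could be filled in later is plain string replacement. The sketch below is an assumption about that step, not the notebook's actual code; `fill_placeholders` is a hypothetical helper:

```python
def fill_placeholders(template, values):
    """Replace each {key} tag in template with its stripped value."""
    for key, value in values.items():
        template = template.replace("{" + key + "}", value.strip())
    return template

demo_fragment = "<placekey>{county}</placekey>"
# EDI values are padded with trailing spaces, hence the strip() above
filled = fill_placeholders(demo_fragment, {"county": "Taos   "})
print(filled)
```

`str.replace` is used instead of `str.format` so that tags without a value yet (for example `{title}`) are left untouched rather than raising a `KeyError`.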

In [79]:
# Run this cell for key word text to edit.  
# Edit the text in place.  
# When complete move on to the next step

display(keywords)

## Let's now import and index values from the EDI files
- We need these values for the metadata template.  
- We also want to run stats on some of these values for the entity and attributes section

In [91]:
#Load EDI File and Read It
with open(ediPath, 'r') as ediFile:
    ediContent = ediFile.read()
print(ediContent)


>HEAD                                                                           
                                                                                
  DATAID="Wheeler Peak"                                                         
  ACQBY=USGS                                                                    
  ACQDATE=2009-07-21
  STATE="New Mexico"                                                            
  COUNTY=Taos                                                                   
  UNITS=M                                                                       
  STDVERS=1.0                                                                   
  PROGVERS=GEOTOOLS_2.3                                                         
  PROGDATE=09/16/94                                                             
                                                                                
>INFO   MAXLINES=1000                                                           
       

In [95]:
#Now assign values to the SB MetaDataWizard Template unknowns
list_ = ediContent.splitlines()
list_length = len (list_)

# There are probably easier ways to loop through the values below, but I like having it all hard-coded up front;
# it's easier for me to track and change.
for X in list_:
  if "ProductId" in X:
    productArray = X.split('=')
    productIdArray = productArray[1].split('.')
    productId = productIdArray[0]
    # We may want to reformat this or parse this name further for use with a root name based on the Data Release Title?
    productId = productId.replace("-", " ")
    productId = productId.replace("_", " ")
    print ('Child Title: ' + productId)
  if "ExternalUrl Url" in X:
    externalURLArray = X.split('=')
    externalURL = externalURLArray[1]
    print ('<onlink>: ' + externalURL)
  if "STATE" in X:
    stateArray = X.split('=')
    state = stateArray[1].replace('"', "") #remove quotes around state
    print ('State: ' + state)
  if "COUNTY" in X:
    countyArray = X.split('=')
    county = countyArray[1]
    print ('County: ' + county)
  if "Attachment Filename" in X and "http" in X:
    lgwrklinkArray = X.split('=')
    lgwrklink = lgwrklinkArray[1]
    print ('Attachment Filename Link: ' + lgwrklink)
  if "Citation Title" in X:
    citTitArray = X.split('=')
    citTit = citTitArray[1]
    print ('Citation Title: ' + citTit)
  if "Citation Authors" in X:
    citNamesArray = X.split('=')
    citAuthorsArray = citNamesArray[1].split(',')
    for author in citAuthorsArray:
     author = author.strip()
     print ('Author: '+ author)
  if "Citation Year" in X:
    citYearArray = X.split('=')
    citYear = citYearArray[1]
    print ('Citation Year: ' + citYear)
  if "YearCollected" in X:
    yearColArray = X.split('=')
    yearCol = yearColArray[1]
    print ('Year Collected: ' + yearCol)
  if "Ellipsoid" in X:
    ellipsoidArray = X.split('=')
    ellipsoid = ellipsoidArray[1]
    print ('Ellipsoid: ' + ellipsoid)
  if "Location datum" in X:
    locDatumArray = X.split('=')
    locDatum = locDatumArray[1]
    print ('Local datum: ' + locDatum)
  if "SITE LATITUDE" in X:
    sitLatArray = X.split('=')
    sitLat = sitLatArray[1] # !!! probably need to reformat this to have only 6 significant digits !!!
    print ('Site latitude: ' + sitLat)
  if "SITE LONGITUDE" in X:
    sitLonArray = X.split('=')
    sitLon = sitLonArray[1] # !!! probably need to reformat this to have only 6 significant digits !!!
    print ('Site longitude: ' + sitLon)
  if "Elevation units" in X:
    elevationStringArray = X.split('=')
    siteElevation = elevationStringArray[2] 
    print ('Site Elevation: ' + siteElevation)
    elevationUnits = elevationStringArray[1].replace('"', "")
    print ('Elevation Units: ' + elevationUnits)
    
# Code below returns values that occupy more than one line
    
for i in range(list_length):
 value = list_[i] 
 if value.replace(" ", "") == 'SurveyPurposeDescription:':
   startIndPurpose = i + 1
   #print ('startIndPurpose: ' + str(startIndPurpose))
 if value.replace(" ", "") == 'DataDescription:':
   endIndPurpose = i - 1
   #print ('endIndPurpose: ' + str(endIndPurpose))
purpose = list_[startIndPurpose]
for j in range(startIndPurpose + 1,endIndPurpose): 
    purpose = purpose + list_[j]
# Collapse runs of spaces once, after all lines are joined (also works for a one-line purpose)
purposeClean = re.sub(' +', ' ',purpose)
print ('Purpose: ' + purposeClean)

for k in range(list_length):
 value = list_[k] 
 if value.replace(" ", "") == 'DataDescription:':
   startIndDescription = k + 1
   #print ('startIndDescription: ' + str(startIndDescription))
 if value.replace(" ", "") == 'FILECREATOR:':
   endIndDescription = k - 9
   #print ('endIndDescription: ' + str(endIndDescription))
description = list_[startIndDescription]
for l in range(startIndDescription + 1,endIndDescription): 
    description = description + list_[l]
# Collapse runs of spaces once, after all lines are joined (also works for a one-line description)
descriptionClean = re.sub(' +', ' ',description)
print ('Description: ' + descriptionClean)
    

State: New Mexico                                                            
County: Taos                                                                   
Child Title: USA New Mexico Rio Grande Rift San Luis Basin 2009 AMT01
<onlink>: https://doi.org/10.5066/F72F7MQ7 
Attachment Filename Link: https://pubs.usgs.gov/of/2011/1264/report/OF11-1264.pdf    
Citation Title: Audiomagnetotelluric data, Taos Plateau Volcanic Field, New Mexico
Author: Chad E. Ailes
Author: Brian D. Rodriguez
Citation Year: 2011                                                             
Year Collected: 2009
Ellipsoid: Clarke 1866                                                          
Local datum: NAD27 CONUS                                                     
Site latitude: 36.752985000                                                     
Site longitude: -105.560966167                                                  
Site Elevation: 2608.00                                               
Elevation Units:
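The two multi-line loops above share one pattern: find a start marker, find an end marker, and join what lies between. A sketch of that pattern as a reusable function follows. `extract_block` is a hypothetical name and the sample text is invented. Unlike the cell above, it joins lines with a space and keeps every line between the markers, so the offsets (`i - 1`, `k - 9`) would still need to be handled by choosing a suitable end marker.

```python
import re

def extract_block(lines, start_marker, end_marker):
    """Join the lines between two marker lines, collapsing whitespace."""
    start = end = None
    for i, line in enumerate(lines):
        compact = line.replace(" ", "")
        if start is None and compact == start_marker:
            start = i + 1
        elif start is not None and compact == end_marker:
            end = i
            break
    if start is None or end is None:
        raise ValueError("marker not found")
    return re.sub(r"\s+", " ", " ".join(lines[start:end])).strip()

demo_lines = [
    "  Survey Purpose Description:   ",
    "    First line of    the purpose  ",
    "    second line.",
    "  Data Description:",
    "    Not part of the purpose.",
]
purpose_demo = extract_block(demo_lines, "SurveyPurposeDescription:", "DataDescription:")
print(purpose_demo)
```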

Entity and attribute values are read from the following sections of the EDI file: !****FREQUENCIES****!, !****IMPEDANCE ROTATION ANGLES****!, !****IMPEDANCES****!, !****TIPPER PARAMETERS****!, !****COMPUTED PARAMETERS****!

Here we load the frequencies
>!****FREQUENCIES****!

In [151]:
# Import entity and attributes - !****FREQUENCIES****! plan to break some of these individual chunks into objects/functions

# Get Range of Frequency Values in EDI File
for k in range(list_length):

 value = list_[k] 
 if value.replace(" ", "") == '>!****FREQUENCIES****!':
   startIndFrequencies = k + 3
   print ('startIndFrequencies: ' + str(startIndFrequencies))
 
 if value.replace(" ", "") ==  '>!****IMPEDANCEROTATIONANGLES****!':
   endIndFrequencies = k - 1
   print ('endIndFrequencies: ' + str(endIndFrequencies))

frequencyData = []
fdata = []
fdataTemp = []
#Construct a table of headers and values using pandas, https://pandas.pydata.org/
frequencyDF = pd.DataFrame(fdata)
for j in range(startIndFrequencies,endIndFrequencies):
    fdataTemp = list_[j]
    fdataTemp = re.sub(' +', ' ',fdataTemp)
    fdataTemp = fdataTemp.split(" ")
    del fdataTemp[0]
    fdata = fdata + fdataTemp
    
print (fdata)  
fdata = np.array(fdata).astype(float) #convert strings to floats (the np.float alias was removed in NumPy 1.24)
frequencyDF = pd.DataFrame(fdata,columns=['Frequencies'])
frequencyDF


startIndFrequencies: 292
endIndFrequencies: 298
['6.50000000E+03', '4.90000000E+03', '3.55000000E+03', '2.73000000E+03', '2.20000000E+03', '1.87000000E+03', '1.50000000E+03', '1.17000000E+03', '8.85000000E+02', '7.20000000E+02', '5.80000000E+02', '4.60000000E+02', '3.40000000E+02', '2.70000000E+02', '2.10000000E+02', '1.72399994E+02', '1.50000000E+02', '1.22099998E+02', '1.00000000E+02', '8.59400024E+01', '7.90000000E+01', '6.00600014E+01', '4.15000000E+01', '2.83199997E+01', '1.90400009E+01', '1.22100000E+01', '7.32399988E+00', '4.39400005E+00']


Unnamed: 0,Frequencies
0,6500.0
1,4900.0
2,3550.0
3,2730.0
4,2200.0
5,1870.0
6,1500.0
7,1170.0
8,885.0
9,720.0


In [148]:
# Now let's get the stats of the frequency data
#Get the max and min values
frequencyMax = frequencyDF[('Frequencies')].max()
print ('Max. Frequency: ' + str(frequencyMax))
frequencyMin = frequencyDF[('Frequencies')].min()
print ('Min. Frequency: ' + str(frequencyMin))

Max. Frequency: 6500.0
Min. Frequency: 4.39400005
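The comments in these cells mention a plan to break the section readers into functions. A sketch of what a single-column reader might look like is given below, using only the standard library. `read_float_section` is a hypothetical name, and the `>FREQ //4` header line in the sample is an assumed layout; the numeric values are taken from the frequency list printed above.

```python
def read_float_section(lines, start_marker, end_marker):
    """Collect the floats found between two section banner lines."""
    values = []
    inside = False
    for line in lines:
        compact = line.replace(" ", "")
        if compact == start_marker:
            inside = True
            continue
        if compact == end_marker:
            break
        # Skip blank lines and '>' header lines instead of counting offsets
        if inside and line.strip() and not line.lstrip().startswith(">"):
            values.extend(float(token) for token in line.split())
    return values

demo_lines = [
    ">!****FREQUENCIES****!",
    "",
    ">FREQ //4",
    "  6.50000000E+03  4.90000000E+03",
    "  7.32399988E+00  4.39400005E+00",
    ">!****IMPEDANCE ROTATION ANGLES****!",
]
freqs = read_float_section(
    demo_lines, ">!****FREQUENCIES****!", ">!****IMPEDANCEROTATIONANGLES****!"
)
print(max(freqs), min(freqs))
```

Skipping header lines by their leading `>` avoids the fixed `k + 3` offset, which only holds while the header layout stays exactly the same.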


Here we load the Impedance Rotation Angles
>!****IMPEDANCE ROTATION ANGLES****!

In [153]:
# Import entity and attributes - !****IMPEDANCE ROTATION ANGLES****! plan to break some of these individual chunks into objects/functions

# Get Range of Rotation Angle Values in EDI File
for k in range(list_length):
 value = list_[k] 
 if value.replace(" ", "") == '>!****IMPEDANCEROTATIONANGLES****!':
   startIndROT = k + 3
   print ('startIndROT: ' + str(startIndROT))
 
 if value.replace(" ", "") ==  '>!****IMPEDANCES****!':
   endIndROT = k - 1
   print ('endIndROT: ' + str(endIndROT))

rdata = []
rdataTemp = []
#Construct a table of headers and values using pandas, https://pandas.pydata.org/
rotationDF = pd.DataFrame(rdata)
for j in range(startIndROT,endIndROT):
    rdataTemp = list_[j]
    rdataTemp = re.sub(' +', ' ',rdataTemp)
    rdataTemp = rdataTemp.split(" ")
    del rdataTemp[0]
    rdata = rdata + rdataTemp
    
print (rdata)  
rdata = np.array(rdata).astype(float) #convert strings to floats (the np.float alias was removed in NumPy 1.24)
rotationDF = pd.DataFrame(rdata,columns=['ZROT'])
rotationDF

startIndROT: 302
endIndROT: 308
['0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00', '0.00000000E+00']


Unnamed: 0,ZROT
0,0.0
1,0.0
2,0.0
3,0.0
4,0.0
5,0.0
6,0.0
7,0.0
8,0.0
9,0.0


In [155]:
# Now let's get the stats of the rotation data
#Get the max and min values
rotationMax = rotationDF[('ZROT')].max()
print ('Max. ZROT: ' + str(rotationMax))
rotationMin = rotationDF[('ZROT')].min()
print ('Min. ZROT: ' + str(rotationMin))

Max. ZROT: 0.0
Min. ZROT: 0.0


Here we load the impedances
>!****IMPEDANCES****!

In [28]:
# Import entity and attributes - !****IMPEDANCES****! plan to break some of these individual chunks into objects/functions

# Get Range of Impedance Values in EDI File
for k in range(list_length):
 value = list_[k] 
 if value.replace(" ", "") == '>!****IMPEDANCES****!':
   startIndImpedances = k + 1
   print ('startIndImpedances: ' + str(startIndImpedances))
 
 if value.replace(" ", "") ==  '>!****TIPPERPARAMETERS****!':
   endIndImpedances = k - 1
   print ('endIndImpedances: ' + str(endIndImpedances))

#Construct Array of Channel Headers   
count = 0
impedanceLabel = []
impedanceData = []
data = []
#Construct a table of headers and values using pandas, https://pandas.pydata.org/
impedanceDF = pd.DataFrame(data)
for l in range(startIndImpedances,endIndImpedances): 
    if list_[l][0] == '>':
     temp = list_[l].split(" ", 1)
     #print (temp)
     impedanceLabel.append((temp[0].split(">"))[1])
     dataTemp = list_[l+1]
     for j in range(l+2,l+8):
      dataTemp = dataTemp + list_[j]
      dataTemp = re.sub(' +', ' ',dataTemp)
     data = dataTemp.split(" ")
     del data[0]
     data = np.array(data).astype(float) #convert strings to floats (the np.float alias was removed in NumPy 1.24)
     se = pd.Series(data)
     print ((temp[0].split(">"))[1])   
     impedanceDF[((temp[0].split(">"))[1])] = se.values
    
    count = count + 1

#impedanceDF = pd.DataFrame(data, columns=(impedanceLabel))
impedanceDF
#data
#se 

startIndImpedances: 310
endIndImpedances: 417
ZXXR
ZXXI
ZXX.VAR
ZXYR
ZXYI
ZXY.VAR
ZYXR
ZYXI
ZYX.VAR
ZYYR
ZYYI
ZYY.VAR


Unnamed: 0,ZXXR,ZXXI,ZXX.VAR,ZXYR,ZXYI,ZXY.VAR,ZYXR,ZYXI,ZYX.VAR,ZYYR,ZYYI,ZYY.VAR
0,-1034.64185,-470.741638,77675.1172,1310.89575,1485.89343,124129.258,-1041.802,124.148865,58523.8711,1096.18286,741.179688,93524.4766
1,-186.487137,-1.843341,95383.2188,-538.065002,358.139923,166608.844,232.887787,-225.325912,40724.8711,-148.396179,-412.495636,71135.4063
2,705.366882,-568.493835,52776.5898,2607.10547,725.450928,23843.0176,-1038.81592,-533.378906,42088.9336,56.658295,692.460388,19014.625
3,39.237244,-300.534912,58721.4453,499.906677,-102.019913,53571.4922,-101.110085,17.370205,39880.7617,-142.460007,208.576767,36383.1602
4,-173.708206,-108.13633,4034.93579,992.3797,-208.713364,13069.5557,154.865479,704.395508,11099.8057,-266.053741,1111.18359,35953.3633
5,-366.549561,-457.271545,5517.53516,1521.89392,423.235443,5077.17383,-504.352966,246.435242,2973.76416,-624.017212,65.458847,2736.42432
6,-642.188599,-1023.43903,1788.85681,1479.15466,981.460632,2718.04932,-255.760269,397.993866,3355.17944,143.025375,-414.32193,5097.97266
7,-270.525299,-465.956268,5237.50586,183.664383,-154.660202,3472.99194,-133.022064,91.446121,8720.07129,167.38649,251.904846,5782.28271
8,-152.127914,-132.579636,31455.8555,571.73468,387.82074,25928.2871,-110.850121,-0.469128,8548.54492,-81.162216,-112.738808,7046.35547
9,-676.03363,250.074554,62496.2813,1083.2146,636.968933,12231.9814,-260.402252,-769.099731,19806.7383,-164.157669,37.810867,3876.6416


In [16]:
# Now let's get the stats of the impedance data
#Make arrays of max and min values
impedanceMax = []
for i in range (0,len(impedanceLabel)):
    impedanceMax.append(impedanceDF[(impedanceLabel[i])].max())
    
impedanceMin = []
for i in range (0,len(impedanceLabel)):
    impedanceMin.append(impedanceDF[(impedanceLabel[i])].min())

impedanceMin

[-1034.64185,
 -1023.43903,
 25.4506073,
 -538.065002,
 -208.713364,
 32.026619,
 -1041.802,
 -769.099731,
 41.5653725,
 -624.017212,
 -414.32193,
 41.1588516]
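The multi-column sections could be read by a second helper that starts a new column at every `>LABEL` header line, rather than assuming seven data lines per block. The sketch below uses only the standard library. `read_labeled_blocks` is a hypothetical name, the header text after each label (`ROT=ZROT //3`) is an assumed layout, and the sample values are taken from the first rows of the impedance table above.

```python
def read_labeled_blocks(lines):
    """Map each '>LABEL ...' header to the floats on the lines that follow it."""
    blocks = {}
    current = None
    for line in lines:
        stripped = line.strip()
        if stripped.startswith(">"):
            tokens = stripped[1:].split()
            # '>!****...' banners and bare '>' lines do not start a data block
            if stripped.startswith(">!") or not tokens:
                current = None
            else:
                current = blocks.setdefault(tokens[0], [])
        elif stripped and current is not None:
            current.extend(float(token) for token in stripped.split())
    return blocks

demo_lines = [
    ">ZXXR ROT=ZROT //3",
    " -1.03464185E+03 -1.86487137E+02",
    "  7.05366882E+02",
    ">ZXXI ROT=ZROT //3",
    " -4.70741638E+02 -1.84334100E+00 -5.68493835E+02",
]
blocks = read_labeled_blocks(demo_lines)
# Min and max per label, as needed for the entity and attribute section
stats = {label: (min(vals), max(vals)) for label, vals in blocks.items()}
print(stats)
```

Note that repeated labels would be merged into one list by `setdefault`, so sections with duplicate header names need the labels made unique first.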

Here we load the tipper parameters
>!****TIPPER PARAMETERS****!

In [168]:
# Import entity and attributes - !****TIPPER PARAMETERS****! plan to break some of these individual chunks into objects/functions
# Probably will need two functions for this - one for a single list and one for the long lists with more than one column
# Get Range of Tipper Values in EDI File
for k in range(list_length):
 value = list_[k] 
 if value.replace(" ", "") == '>!****TIPPERPARAMETERS****!':
   startIndTipper = k + 1
   print ('startIndTipper: ' + str(startIndTipper))
 
 if value.replace(" ", "") ==  '>!****COMPUTEDPARAMETERS****!':
   endIndTipper = k - 1
   print ('endIndTipper: ' + str(endIndTipper))

#Construct Array of Channel Headers   
count = 0
tipperLabel = []
tipperData = []
tdata = []
#Construct a table of headers and values using pandas, https://pandas.pydata.org/
tipperDF = pd.DataFrame(tdata)
for l in range(startIndTipper,endIndTipper): 
    if list_[l][0] == '>':
     ttemp = list_[l].split(" ", 1)
     #print (ttemp)
     tipperLabel.append((ttemp[0].split(">"))[1])
     tdataTemp = list_[l+1]
     for j in range(l+2,l+8):
      tdataTemp = tdataTemp + list_[j]
      tdataTemp = re.sub(' +', ' ',tdataTemp)
      tdata = tdataTemp.split(" ")
     #print (tdata)
     del tdata[0]
     tdata = np.array(tdata).astype(float) #convert strings to floats (the np.float alias was removed in NumPy 1.24)
     te = pd.Series(tdata)
     print ((ttemp[0].split(">"))[1])   
     tipperDF[((ttemp[0].split(">"))[1])] = te.values
    
    count = count + 1

#tipperDF = pd.DataFrame(tdata, columns=(tipperLabel))
tipperDF
#tdata
#te 

startIndTipper: 419
endIndTipper: 472
TXR.EXP
TXI.EXP
TXVAR.EXP
TYR.EXP
TYI.EXP
TYVAR.EXP


Unnamed: 0,TXR.EXP,TXI.EXP,TXVAR.EXP,TYR.EXP,TYI.EXP,TYVAR.EXP
0,-0.073564,0.054538,0.000458,-0.180288,-0.053186,0.000731
1,-0.074559,-0.019775,0.001973,-0.345296,-0.068103,0.003446
2,0.017722,-0.085146,0.000991,-0.120415,-0.212934,0.000448
3,-0.069584,-0.172033,0.009644,0.056846,-0.415709,0.008798
4,-0.010868,-0.180419,0.001572,0.290241,-0.216499,0.005093
5,0.158014,-0.06294,0.00035,0.346909,0.081858,0.000322
6,0.070844,0.141715,0.058045,0.368168,0.298037,0.088196
7,0.113809,0.088858,0.003341,-0.279236,0.003966,0.002216
8,-0.000929,-0.178596,0.006435,-0.142478,0.094842,0.005304
9,-0.167724,-0.040422,0.011998,-0.099379,0.225684,0.002348


In [170]:
# Now lets get the stats of the tipper data

# Make Array of Max Values
tipperMax = []
for i in range (0,len(tipperLabel)):
    tipperMax.append(tipperDF[(tipperLabel[i])].max())
print ('Tipper Max: ' + str(tipperMax))    

# Make Array of Min Values
tipperMin = []
for i in range (0,len(tipperLabel)):
    tipperMin.append(tipperDF[(tipperLabel[i])].min())
print ('Tipper Min: ' + str(tipperMin))

Tipper Max: [0.186667979, 0.141714558, 0.0596302822, 0.368167609, 0.298036546, 0.127178863]
Tipper Min: [-0.175109223, -0.180419073, 5.88947296e-05, -0.494497716, -0.415709376, 5.79650114e-05]


Here we load the computed parameters
>!****COMPUTED PARAMETERS****!

In [173]:
# Import entity and attributes - !****COMPUTED PARAMETERS****! plan to break some of these individual chunks into objects/functions
# Probably will need two functions for this - one for a single list and one for the long lists with more than one column
# Get Range of Computed Parameter Values in EDI File
for k in range(list_length):
 value = list_[k] 
 if value.replace(" ", "") == '>!****COMPUTEDPARAMETERS****!':
   startIndPar = k + 1
   print ('startIndPar: ' + str(startIndPar))
 
 if value.replace(" ", "") ==  '>END':
   endIndPar = k - 1
   print ('endIndPar: ' + str(endIndPar))

#Construct Array of Channel Headers   
count = 0
parLabel = []
parData = []
pdata = []
#Construct a table of headers and values using pandas, https://pandas.pydata.org/
parDF = pd.DataFrame(pdata)
for l in range(startIndPar,endIndPar): 
    if list_[l][0] == '>':
     ptemp = list_[l].split(" ", 1)
     #print (ptemp)
     parLabel.append((ptemp[0].split(">"))[1])
     pdataTemp = list_[l+1]
     for j in range(l+2,l+8):
      pdataTemp = pdataTemp + list_[j]
      pdataTemp = re.sub(' +', ' ',pdataTemp)
      pdata = pdataTemp.split(" ")
     #print (pdata)
     del pdata[0]
     pdata = np.array(pdata).astype(float) #convert strings to floats (the np.float alias was removed in NumPy 1.24)
     pe = pd.Series(pdata)
     print ((ptemp[0].split(">"))[1])   
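     # Use this block's own Series (pe). te is left over from the tipper cell; assigning te.values
     # fills every column with the TYVAR.EXP values, as in the table recorded below.
     parDF[((ptemp[0].split(">"))[1])] = pe.values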
    
    count = count + 1

parDF
#pdata
#pe 

startIndPar: 474
endIndPar: 833
RHOROT
RHOXX
RHOXX.ERR
RHOXY
RHOXY.ERR
RHOYX
RHOYX.ERR
RHOYY
RHOYY.ERR
PHSXX
PHSXX.ERR
PHSXY
PHSXY.ERR
PHSYX
PHSYX.ERR
PHSYY
PHSYY.ERR
TIPMAG
TIPMAG.ERR
TIPPHS
TIPPHS.ERR
ZSTRIKE
ZSKEW
TSTRIKE
COH
COH
COH
COH
EPREDCOH
EPREDCOH
SIGAMP
SIGAMP
SIGAMP
SIGAMP
SIGAMP
SIGNOISE
SIGNOISE
SIGNOISE
SIGNOISE
SIGNOISE


Unnamed: 0,RHOROT,RHOXX,RHOXX.ERR,RHOXY,RHOXY.ERR,RHOYX,RHOYX.ERR,RHOYY,RHOYY.ERR,PHSXX,...,TIPMAG.ERR,TIPPHS,TIPPHS.ERR,ZSTRIKE,ZSKEW,TSTRIKE,COH,EPREDCOH,SIGAMP,SIGNOISE
0,0.000731,0.000731,0.000731,0.000731,0.000731,0.000731,0.000731,0.000731,0.000731,0.000731,...,0.000731,0.000731,0.000731,0.000731,0.000731,0.000731,0.000731,0.000731,0.000731,0.000731
1,0.003446,0.003446,0.003446,0.003446,0.003446,0.003446,0.003446,0.003446,0.003446,0.003446,...,0.003446,0.003446,0.003446,0.003446,0.003446,0.003446,0.003446,0.003446,0.003446,0.003446
2,0.000448,0.000448,0.000448,0.000448,0.000448,0.000448,0.000448,0.000448,0.000448,0.000448,...,0.000448,0.000448,0.000448,0.000448,0.000448,0.000448,0.000448,0.000448,0.000448,0.000448
3,0.008798,0.008798,0.008798,0.008798,0.008798,0.008798,0.008798,0.008798,0.008798,0.008798,...,0.008798,0.008798,0.008798,0.008798,0.008798,0.008798,0.008798,0.008798,0.008798,0.008798
4,0.005093,0.005093,0.005093,0.005093,0.005093,0.005093,0.005093,0.005093,0.005093,0.005093,...,0.005093,0.005093,0.005093,0.005093,0.005093,0.005093,0.005093,0.005093,0.005093,0.005093
5,0.000322,0.000322,0.000322,0.000322,0.000322,0.000322,0.000322,0.000322,0.000322,0.000322,...,0.000322,0.000322,0.000322,0.000322,0.000322,0.000322,0.000322,0.000322,0.000322,0.000322
6,0.088196,0.088196,0.088196,0.088196,0.088196,0.088196,0.088196,0.088196,0.088196,0.088196,...,0.088196,0.088196,0.088196,0.088196,0.088196,0.088196,0.088196,0.088196,0.088196,0.088196
7,0.002216,0.002216,0.002216,0.002216,0.002216,0.002216,0.002216,0.002216,0.002216,0.002216,...,0.002216,0.002216,0.002216,0.002216,0.002216,0.002216,0.002216,0.002216,0.002216,0.002216
8,0.005304,0.005304,0.005304,0.005304,0.005304,0.005304,0.005304,0.005304,0.005304,0.005304,...,0.005304,0.005304,0.005304,0.005304,0.005304,0.005304,0.005304,0.005304,0.005304,0.005304
9,0.002348,0.002348,0.002348,0.002348,0.002348,0.002348,0.002348,0.002348,0.002348,0.002348,...,0.002348,0.002348,0.002348,0.002348,0.002348,0.002348,0.002348,0.002348,0.002348,0.002348
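The label list printed above repeats several names (COH, EPREDCOH, SIGAMP, SIGNOISE), while the table has only one column for each, because assigning a DataFrame column under an existing name overwrites it. One possible fix is to make the labels unique before using them as column names. The sketch below shows the idea; `make_unique` and the `_2`, `_3` suffix scheme are assumptions, not something defined by the EDI format.

```python
from collections import Counter

def make_unique(labels):
    """Append _2, _3, ... to repeated labels so none overwrites an earlier one."""
    seen = Counter()
    unique = []
    for label in labels:
        seen[label] += 1
        unique.append(label if seen[label] == 1 else f"{label}_{seen[label]}")
    return unique

unique_labels = make_unique(["ZSKEW", "COH", "COH", "EPREDCOH", "COH"])
print(unique_labels)
```

A more descriptive alternative would be to build the suffix from the rest of each header line, if it identifies the channels involved.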


## Now let's get the range of values from the RSP values

In [17]:
#First Get the list of RSP files
rspList = glob.glob(os.path.join(mtStationPath, '*.RSP'),  recursive=True)
rspList

['C:\\CurrentWork\\DataReleases\\SquamataMT_TEST\\AMT01\\BF6-9621.RSP',
 'C:\\CurrentWork\\DataReleases\\SquamataMT_TEST\\AMT01\\BF6-9624.RSP',
 'C:\\CurrentWork\\DataReleases\\SquamataMT_TEST\\AMT01\\BF6-9625.RSP',
 'C:\\CurrentWork\\DataReleases\\SquamataMT_TEST\\AMT01\\EF-9608X.RSP',
 'C:\\CurrentWork\\DataReleases\\SquamataMT_TEST\\AMT01\\EF-9608Y.RSP']

## Now the raw Binary File Listing - this can be T files or W files
We will need to figure out the best way of filtering on these - may need to build an array and then delete the AVG, dmp and edi files.

These are listed in the edi file as well but they are not all there.  

    ProcessingTimeSeriesUsed:
         wp01A1.bp1                                                                     
         wp01A2.bp1                                                                     
         wp01A1.sd6                                                                     
         wp01A2.sd6                                                                     
         wp01A1.sd7                                                                     
         wp01A2.sd8                                                                     
         wp01A2_3.sd9 

- Which files need to be included in the data release?
- What is the best way to get this listing?
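One possible approach to the second question is to compare the names listed in the EDI file with the files on disk, ignoring case, since the EDI file lists lower-case names while the files on disk are upper-case. The sketch below is an illustration only: `match_listed_files` is a hypothetical helper, and the demonstration uses a small hand-picked subset of paths, so its "missing" result says nothing about the real station folder.

```python
import ntpath  # parses Windows-style paths on any operating system

def match_listed_files(listed_names, disk_paths):
    """Split listed names into (paths found on disk, names not found)."""
    on_disk = {ntpath.basename(path).lower(): path for path in disk_paths}
    found = [on_disk[name.lower()] for name in listed_names if name.lower() in on_disk]
    missing = [name for name in listed_names if name.lower() not in on_disk]
    return found, missing

station = r"C:\CurrentWork\DataReleases\SquamataMT_TEST\AMT01"
demo_disk = [
    station + r"\WP01A1.BP1",
    station + r"\WP01A1.FC6",
    station + r"\WP01A1.SD6",
]
demo_listed = ["wp01A1.bp1", "wp01A1.sd6", "wp01A2_3.sd9"]
found, missing = match_listed_files(demo_listed, demo_disk)
print(found)
print(missing)
```

Files on disk that the EDI file does not list (here the `.FC6` file) are simply left out of `found`, which leaves the first question, which files belong in the release, as a separate decision.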

In [18]:
# Get the list of raw binary files (names starting with WP)
binList = glob.glob(os.path.join(mtStationPath, 'WP*.*'))
binList

['C:\\CurrentWork\\DataReleases\\SquamataMT_TEST\\AMT01\\WP01A1.BP1',
 'C:\\CurrentWork\\DataReleases\\SquamataMT_TEST\\AMT01\\WP01A1.FC6',
 'C:\\CurrentWork\\DataReleases\\SquamataMT_TEST\\AMT01\\WP01A1.FC7',
 'C:\\CurrentWork\\DataReleases\\SquamataMT_TEST\\AMT01\\WP01A1.FC8',
 'C:\\CurrentWork\\DataReleases\\SquamataMT_TEST\\AMT01\\WP01A1.FC9',
 'C:\\CurrentWork\\DataReleases\\SquamataMT_TEST\\AMT01\\WP01A1.SD6',
 'C:\\CurrentWork\\DataReleases\\SquamataMT_TEST\\AMT01\\WP01A1.SD7',
 'C:\\CurrentWork\\DataReleases\\SquamataMT_TEST\\AMT01\\WP01A1.SD8',
 'C:\\CurrentWork\\DataReleases\\SquamataMT_TEST\\AMT01\\WP01A1.SD9',
 'C:\\CurrentWork\\DataReleases\\SquamataMT_TEST\\AMT01\\WP01A1.TS1',
 'C:\\CurrentWork\\DataReleases\\SquamataMT_TEST\\AMT01\\WP01A2.BP1',
 'C:\\CurrentWork\\DataReleases\\SquamataMT_TEST\\AMT01\\WP01A2.FC6',
 'C:\\CurrentWork\\DataReleases\\SquamataMT_TEST\\AMT01\\WP01A2.FC8',
 'C:\\CurrentWork\\DataReleases\\SquamataMT_TEST\\AMT01\\WP01A2.FC9',
 'C:\\CurrentWork\\D

# Populate Metadata Template

In [58]:
# Load the metadata XML template and read it
metaData = os.path.join(mtMataDataTemplatePath, mtMataDataTemplateName)
# The with-statement closes the file even if reading fails
with open(metaData, 'r') as xmlTemplateFile:
    metaDataContent = xmlTemplateFile.read()
print(metaDataContent)


<?xml version="1.0" encoding="UTF-8"?>
<metadata>
	<idinfo>
		<citation>
			<citeinfo>
				<origin>{origin}</origin>
				<pubdate>{pubdate}</pubdate>
				<title>{title}</title>
				<edition>{edition}</edition>
				<geoform>ASCII and Binary Digital Data</geoform>
				<pubinfo>
					<pubplace>Denver, CO</pubplace>
					<publish>U.S. Geological Survey</publish>
				</pubinfo>
				<othercit>{othercit}</othercit><!--Please add an Orcid ID here e.g., "Additional information about Originator: Rodriguez, B.D, http://orcid.org/0000-0002-2263-611X"-->
				<onlink>{onlink}</onlink>
				<lworkcit>
					<citeinfo>
						{BeginOriginLoop}<!--Place to print larger work originators here. Example is:
						<origin>Originating Author Name</origin> /carrage return (CR is &#13; and not &#10; which is LF)
						-->
						<pubdate>{lworkcit-pubdate}</pubdate>
						<title>{lworkcit-title}</title>
						<geoform>PDF</geoform>
						<serinfo>
							<sername>{lworkcit-sername}</sername>
							<issue>{lworkci

In [64]:
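# Replace values of current metadata template with the appropriate values.  
# All of this input should have been defined when going through the steps outlined above.
# Start from the template and apply each replacement to the running result;
# replacing on metaDataContent every time would keep only the last substitution.
newMetaDataContent = metaDataContent
for r in (('{title}', drTitle), ('{keywords}', keywords.value)):
    newMetaDataContent = newMetaDataContent.replace(*r)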

print ('Creation of new metadata file is complete')   

Creation of new metadata file is complete
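
As a rough sketch, several placeholder substitutions can be driven from one dictionary, followed by a check for placeholders that are still unfilled. The function names `fill_template` and `unfilled_placeholders` are hypothetical. Escaping values with `xml.sax.saxutils.escape` is an assumption about what the template needs; it is not something the notebook currently does.

```python
import re
from xml.sax.saxutils import escape


def fill_template(template, values):
    """Replace each '{name}' placeholder that has a value; leave the rest untouched."""
    filled = template
    for placeholder, value in values.items():
        # escape() converts &, < and > so the value cannot break the XML
        filled = filled.replace(placeholder, escape(value))
    return filled


def unfilled_placeholders(text):
    """Return the sorted, de-duplicated '{name}' placeholders remaining in `text`."""
    return sorted(set(re.findall(r'\{[A-Za-z][\w-]*\}', text)))
```

Calling `unfilled_placeholders(newMetaDataContent)` after the replacement step would list anything like `{origin}` or `{pubdate}` that still needs a value before the file is saved.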


### At this point the new metadata file should be created.  Check the result below.  If it looks OK, save this file and run the loop for the rest of the children.

In [65]:
# Show the resulting child xml metadata file example 
print (newMetaDataContent)

<?xml version="1.0" encoding="UTF-8"?>
<metadata>
	<idinfo>
		<citation>
			<citeinfo>
				<origin>{origin}</origin>
				<pubdate>{pubdate}</pubdate>
				<title>{title}</title>
				<edition>{edition}</edition>
				<geoform>ASCII and Binary Digital Data</geoform>
				<pubinfo>
					<pubplace>Denver, CO</pubplace>
					<publish>U.S. Geological Survey</publish>
				</pubinfo>
				<othercit>{othercit}</othercit><!--Please add an Orcid ID here e.g., "Additional information about Originator: Rodriguez, B.D, http://orcid.org/0000-0002-2263-611X"-->
				<onlink>{onlink}</onlink>
				<lworkcit>
					<citeinfo>
						{BeginOriginLoop}<!--Place to print larger work originators here. Example is:
						<origin>Originating Author Name</origin> /carrage return (CR is &#13; and not &#10; which is LF)
						-->
						<pubdate>{lworkcit-pubdate}</pubdate>
						<title>{lworkcit-title}</title>
						<geoform>PDF</geoform>
						<serinfo>
							<sername>{lworkcit-sername}</sername>
							<issue>{lworkci

In [None]:
# Write new xml file to appropriate directory
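
A minimal sketch of this step, assuming the filled-in metadata is held in a string and that a well-formedness check before saving is wanted. The function name `write_metadata` and its arguments are hypothetical. The check uses the standard-library parser rather than the FGDC validation mentioned in the module overview.

```python
import os
import xml.etree.ElementTree as ET


def write_metadata(xml_text, out_dir, file_name):
    """Check that `xml_text` is well-formed XML, then save it as UTF-8.

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed,
    in which case nothing is written.
    """
    ET.fromstring(xml_text.encode('utf-8'))
    os.makedirs(out_dir, exist_ok=True)
    out_path = os.path.join(out_dir, file_name)
    with open(out_path, 'w', encoding='utf-8') as xml_file:
        xml_file.write(xml_text)
    return out_path
```

Well-formedness does not imply a valid FGDC record, so the output would still need to go through the validation step.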

# Additional stuff perhaps relevant in the future:
### There seem to be many opportunities to integrate this work with Google Apps (Sheets, Docs, Forms, etc.).  For starters, we can embed any Google app and cut and paste content into Jupyter text boxes.  An example of an HTML frame embedding a Google app can be seen below:

In [75]:
## Test of getting this info from a Google Sheet
## Note that this text can be cut and pasted into a text box and saved as a text string value
#  Pretty Cool :)
widgets.HTML(
    value='<iframe src="https://docs.google.com/a/usgs.gov/document/d/e/2PACX-1vRYhK2g3AX5UPtiHdWaHDD9QgV4eLb1FWbAgHGnGfBz16mw3U9Ss08z-ziKPGJP_4SA289TZZ6bcCxl/pub?embedded=true" width="100%" height="333"></iframe>',
    placeholder='Some HTML',
    #description='Some HTML',
)

### Process Logging
At some point we will probably want to trap errors and collect them in an array (or similar) so they can be listed in a file after processing is complete.

In [None]:
# Process Logging

pl = os.path.join(training_materials_path, "Scratch_Workspace", 'ProcessingLog.txt')

# 'w' overwrites an existing log; use 'a' to append instead.
# The with-statement closes the file even if a write fails.
with open(pl, 'w') as process_log:
    process_log.write(str(datetime.datetime.now()))
    process_log.write("\nSomething was performed.")
    process_log.write("\nSomething else was done.")
    process_log.write("\nWe can record information about what a script was doing in a notes/processing file.")

print ("Process log saved at:", pl)
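
The error-trapping idea described under Process Logging could be sketched roughly as follows: each processing step runs inside a `try` block, failures are collected in a list, and the list is written to a file at the end. The names `run_steps` and `write_error_log` are hypothetical, and the steps passed in stand in for whatever functions the notebook eventually defines.

```python
import datetime


def run_steps(steps):
    """Run each (name, callable) pair; return a list of (name, message) for failures."""
    errors = []
    for name, func in steps:
        try:
            func()
        except Exception as exc:
            # Record the failure and carry on with the remaining steps
            errors.append((name, '{}: {}'.format(type(exc).__name__, exc)))
    return errors


def write_error_log(errors, log_path):
    """Write a timestamp and one line per collected error to `log_path`."""
    with open(log_path, 'w') as log:
        log.write(str(datetime.datetime.now()) + '\n')
        if not errors:
            log.write('No errors recorded.\n')
        for name, message in errors:
            log.write('{} failed - {}\n'.format(name, message))
```

The standard-library `logging` module would be an alternative if timestamps per message or multiple log levels are needed.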