## Notes

### XML

More verbose than JSON. Namespaces allow mixing domains (vocabularies) in one document. Elements can carry attributes as well as content.

JSON can represent anything XML can represent, but XML is "cleaner" in complex cases. Still verbose, though.
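To make "more verbose" concrete, here is a small sketch that serializes the same record both ways with the Python standard library. The field names and values are borrowed from the GLENDA query result later in these notes; the flat layout is a simplification.

```python
import json
import xml.etree.ElementTree as ET

# One measurement (values from the GLENDA result below) as a plain dict
record = {"AnalyteCode": "Cond", "MeasureValue": "2.095", "MeasureUnitCode": "umho/cm"}

as_json = json.dumps(record)

# The same record as XML: every field needs both an opening and a closing tag
result = ET.Element("Result")
for name, value in record.items():
    ET.SubElement(result, name).text = value
as_xml = ET.tostring(result, encoding="unicode")

print(len(as_json), as_json)
print(len(as_xml), as_xml)
```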

XML also has a well-standardized mini-language, XPath, for extracting data from the hierarchy; the JSON alternatives (JSONPath, JMESPath, jq) are more varied.
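Here is a minimal sketch of namespace-aware, XPath-style extraction using Python's built-in `xml.etree.ElementTree`. It supports only a subset of XPath; the `lxml` `.xpath()` calls used later support full XPath 1.0. The XML is a trimmed, hypothetical fragment shaped like the GLENDA results below. The namespace URI is what ties the query to the document. The prefix does not need to match.

```python
import xml.etree.ElementTree as ET

# Trimmed fragment shaped like the GLENDA query result further down
doc = """
<node2:GLENDA xmlns:node2="http://www.exchangenetwork.net/schema/glenda/1">
  <node2:Result>
    <node2:GreatLakeCode>Erie</node2:GreatLakeCode>
    <node2:ResultAnalyteCode>Cond</node2:ResultAnalyteCode>
    <node2:ResultMeasure>
      <node2:MeasureValue>2.095</node2:MeasureValue>
      <node2:MeasureUnitCode>umho/cm</node2:MeasureUnitCode>
    </node2:ResultMeasure>
  </node2:Result>
</node2:GLENDA>
"""

# Prefix -> namespace URI; "n2" only has to be consistent within these queries
NS = {"n2": "http://www.exchangenetwork.net/schema/glenda/1"}

root = ET.fromstring(doc.strip())
# ".//" searches all descendants, "/" steps to direct children
values = [e.text for e in root.findall(".//n2:ResultMeasure/n2:MeasureValue", NS)]
lakes = [e.text for e in root.findall("n2:Result/n2:GreatLakeCode", NS)]
print(values, lakes)
```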

### SOAP

Simple Object Access Protocol: "simple" is in the name, i.e. it is not simple. Part of an ecosystem:

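- WSDL, Web Services Description Language: describes a service's operations, messages, and types
- RPC, Remote Procedure Call: calling a function on the remote system
- XSD, XML Schema Definition: defines types, e.g. simple types like "positive integer" and complex types like "ParameterType" (string value plus name plus optional type and encoding)
- XSLT, Extensible Stylesheet Language Transformations: XML document transformation language (not seen here)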

Mostly replaced by REST-style HTTP / JSON APIs.

### Issues

No R SOAP client? Could use the XML request templates captured from the Python version, with:
- an R XML library to build / disassemble XML docs
- RCurl or similar to send / receive
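The "build with an XML library, send with an HTTP client" approach is sketched here in Python's standard library for illustration; in R the equivalents would be the XML library and RCurl. The request mirrors zeep's Authenticate request shown further down. zeep also sends an empty `domain` element, which is omitted here, and whether the service accepts its absence is untested. The real endpoint address is declared in the WSDL's service section, so the URL in the usage line is a placeholder. The request is only prepared, not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET

SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
NODE_NS = "http://www.exchangenetwork.net/schema/node/2"

def authenticate_envelope(user_id: str, password: str) -> bytes:
    """Build the Authenticate request body with an XML library."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    auth = ET.SubElement(body, f"{{{NODE_NS}}}Authenticate")
    ET.SubElement(auth, f"{{{NODE_NS}}}userId").text = user_id
    # The library escapes &, < and > in the password when serializing
    ET.SubElement(auth, f"{{{NODE_NS}}}credential").text = password
    ET.SubElement(auth, f"{{{NODE_NS}}}authenticationMethod").text = "password"
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)

def soap_request(endpoint: str, envelope: bytes) -> urllib.request.Request:
    """Prepare (but do not send) a SOAP 1.2 POST; urlopen(req) would send it."""
    return urllib.request.Request(
        endpoint,
        data=envelope,
        headers={"Content-Type": 'application/soap+xml; charset=utf-8; action=""'},
        method="POST",
    )

# Placeholder endpoint, not the real service address
req = soap_request("https://example.invalid/node", authenticate_envelope("someone", "p&ss<word"))
```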

One year / analyte per query; given the XML overhead, performance is fairly low.
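Since each query covers one year and one analyte, a full download is a loop over every combination. Here is a sketch that builds one `parameters` list per pair. It uses the plain-dict form noted as an alternative in the Query cell below, and the function name is hypothetical. Each list would be passed as `parameters=` to `client.service.Query(...)`.

```python
from itertools import product

def query_parameter_sets(years, analytes):
    """One Query parameter list per (year, analyte) pair."""
    return [
        [
            {"_value_1": str(year), "parameterName": "Year"},
            {"_value_1": analyte, "parameterName": "AnalyteCode"},
        ]
        for year, analyte in product(years, analytes)
    ]

sets = query_parameter_sets(range(2010, 2013), ["Cond", "Temp"])
print(len(sets))  # 3 years x 2 analytes
```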

Tokens last ~1 hr, so something else would be needed for distribution.

Also: is 1 hr long enough for all analytes / years? Probably.
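If a full download does outrun the token, one option is to re-authenticate on a timer. This minimal sketch assumes the ~1 hr lifetime noted above. The 50-minute refresh margin is an arbitrary choice, and the `TokenCache` name is hypothetical. The injectable clock is only there to make it testable.

```python
import time

class TokenCache:
    """Hypothetical helper: re-authenticate once the cached token is max_age seconds old.

    `authenticate` is any zero-argument callable returning a fresh token, e.g.
    lambda: client.service.Authenticate(userId=userId, credential=cred,
                                        authenticationMethod="password")
    """

    def __init__(self, authenticate, max_age=50 * 60, clock=time.monotonic):
        self._authenticate = authenticate
        self._max_age = max_age  # seconds; kept under the ~1 hr lifetime as a margin
        self._clock = clock
        self._token = None
        self._issued = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now - self._issued >= self._max_age:
            self._token = self._authenticate()
            self._issued = now
        return self._token
```

Each query would then use `securityToken=cache.get()` instead of a fixed `token`.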

In [None]:
# Python SOAP library
%pip install zeep

In [29]:
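import re

def redact(text):
    """Hide all but the first few characters of the token, and any password."""
    text = re.sub(r"(?<=csm:.....)[-_A-Za-z0-9.\s]+", "...", text)
    return re.sub(r"(<(?:\w+:)?credential>)[^<]*", r"\1...", text)

def show_response(response):
    """For debugging, shows request / response with redacted token and password.
    Needs a raw requests.Response, i.e. a call made under client.settings(raw_response=True)."""
    print("\n".join(f"{k}: {v}" for k, v in response.request.headers.items()))
    print()
    print(redact(response.request.body.decode("utf8")))
    print(redact(response.content.decode("utf8")))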

In [30]:
from zeep import Client

# older Node 1.0 endpoint, not used:
# client = Client('https://cdxnodengn.epa.gov/cdx-enws10/services/NetworkNodePortType_V10?wsdl')
client = Client("https://cdxnodengn.epa.gov/ngn-enws20/services/NetworkNode2Service?wsdl")

In [31]:
# show functions / attributes of client
[i for i in dir(client) if not i.startswith("_")]

['bind',
 'create_message',
 'create_service',
 'get_element',
 'get_type',
 'namespaces',
 'plugins',
 'service',
 'set_default_soapheaders',
 'set_ns_prefix',
 'settings',
 'transport',
 'type_factory',
 'wsdl',
 'wsse']

In [32]:
client.namespaces

{'xsd': 'http://www.w3.org/2001/XMLSchema',
 'ns0': 'http://www.exchangenetwork.net/schema/node/2',
 'ns1': 'http://www.w3.org/2005/05/xmlmime'}

In [33]:
# show functions / attributes of client.service
[i for i in dir(client.service) if not i.startswith("_")]

['Authenticate',
 'Download',
 'Execute',
 'GetServices',
 'GetStatus',
 'NodePing',
 'Notify',
 'Query',
 'Solicit',
 'Submit']

In [34]:
# read password without displaying it
import getpass
cred = getpass.getpass()

 ········


In [42]:
# Valid authenticationMethod values are: [password, digest, certificate, xkms, hmac]
userId = "TERRYNBROWN"
# Wrapping the call below in this block returns the raw HTTP response instead
# (used with show_response() to extract the XML templates):
# with client.settings(raw_response=True):
#     ...
# This is a Remote Procedure Call (RPC): it calls a function on the remote system
token = client.service.Authenticate(userId=userId, credential=cred, authenticationMethod='password')
token

'csm:eyJjdHkiOiJKV1QiLCJlbmMiOiJBMTI4Q0JDLUhTMjU2IiwiYWxnIjoiZGlyIn0..Oig7zPfECixFCgwRco_kxw.6hbXfmPieXWYzco0lvw-PkdL9BAtcEFdD5DwT_0Gdmk5IsY9l3CuVxirwdSQyY3Quo83cAcH9nIJjSCJPJE8iWqMP6p7s3xFnEarrsa5VVIJpvtnM4aAXw_Gq8I_-61OG-cDkQmHzeJsWnEoMK4cbib6Sabtw7q2RGvV3SNXOAif05B2m1hi3Kfm8SgjW8TT8iJ4it_SNoytebPD2h2EC98zkXd9nh6vtTDhbuepfDWu6KECDZNCv8Yh8QXPDWV5lc9bf_pnCGlXgGYq13stdlCjGZyu_6p9eFFL3fv6RPngOeD2pfbgePFc-1TY5TJjayLKRFxEMoV1TKvcwHIM0YTF3rkKM1iToZ4Jivm9DcQizJWmBcKLO_lGRIW48N4e.i4LqPZMY71HCm0OCQvYAcg'

In [None]:
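# only works if `token` came from a call made under client.settings(raw_response=True);
# a plain token string has no .request attribute
show_response(token)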

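Request; note the `SOAPAction` header, which may be required. (`SOAPAction` is the SOAP 1.1 header; SOAP 1.2, indicated here by the `http://www.w3.org/2003/05/soap-envelope` namespace, carries the action in the `action` parameter of `Content-Type` instead.)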

```xml
User-Agent: Zeep/4.2.1 (www.python-zeep.org)
Accept-Encoding: gzip, deflate, br, zstd
Accept: */*
Connection: keep-alive
SOAPAction: ""
Content-Type: application/soap+xml; charset=utf-8; action=""
Cookie: JSESSIONID=63751690DC5CDF49F835014AE28C004E; cdx-prod-coreservices=1707921987.09.32.283202|aabbe373016d0cf673fd1f826041c98d
Content-Length: 495

<?xml version='1.0' encoding='utf-8'?>
<soap-env:Envelope xmlns:soap-env="http://www.w3.org/2003/05/soap-envelope">
  <soap-env:Body>
    <ns0:Authenticate xmlns:ns0="http://www.exchangenetwork.net/schema/node/2">
      <ns0:userId>{{USERNAME}}</ns0:userId>
      <ns0:credential>{{PASSWORD}}</ns0:credential>
      <ns0:domain xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>
      <ns0:authenticationMethod>password</ns0:authenticationMethod>
    </ns0:Authenticate>
  </soap-env:Body>
</soap-env:Envelope>
```
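The template route suggested for R amounts to string substitution into the body above. Here is a minimal Python sketch of that, with `{{USERNAME}}` / `{{PASSWORD}}` turned into `string.Template` placeholders. The function name is hypothetical. The important detail is escaping, so that a password containing `&` or `<` cannot break the XML.

```python
from string import Template
from xml.sax.saxutils import escape

# The request body above, with the credentials as $-placeholders
AUTH_TEMPLATE = Template("""<?xml version='1.0' encoding='utf-8'?>
<soap-env:Envelope xmlns:soap-env="http://www.w3.org/2003/05/soap-envelope">
  <soap-env:Body>
    <ns0:Authenticate xmlns:ns0="http://www.exchangenetwork.net/schema/node/2">
      <ns0:userId>$username</ns0:userId>
      <ns0:credential>$password</ns0:credential>
      <ns0:domain xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>
      <ns0:authenticationMethod>password</ns0:authenticationMethod>
    </ns0:Authenticate>
  </soap-env:Body>
</soap-env:Envelope>""")

def fill_auth_template(username: str, password: str) -> bytes:
    """Fill in the Authenticate template; escape() turns &, < and > into entities."""
    body = AUTH_TEMPLATE.substitute(username=escape(username), password=escape(password))
    return body.encode("utf-8")
```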

Response: a multipart MIME (Multipurpose Internet Mail Extensions) message. The SOAP envelope is the first part, sent as `application/xop+xml` (XOP / MTOM packaging).

```xml
--uuid:8ce7a3c2-f555-4561-aa2f-7d9d439a4a5c
Content-Type: application/xop+xml; charset=UTF-8; type="application/soap+xml"
Content-Transfer-Encoding: binary
Content-ID: <root.message@cxf.apache.org>

<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
  <soap:Body>
    <AuthenticateResponse xmlns="http://www.exchangenetwork.net/schema/node/2" xmlns:xmime="http://www.w3.org/2005/05/xmlmime">
      <securityToken>      csm:eyJjdHkiOiJKV1QiLCJlbmMiOiJBMTI4Q0JDLUhTMjU2IiwiYWxnIjoiZGlyIn0..DqRnXQdpyK2rmHCDhDtzOw.8Gunm5UbfHaKmsEONbfLxAKwuqwuaPRSya-D-QFPXpVJAsxBGn4tDJUu4wpvsp37nrpCtthfBaZBMZS7m_OQBbYGty_sZlE5RFzg7QGWb6XjwpW6u0ZShYudpZ9SzEQNv5ATyMtOppqu486y3vh3AS4ZGvGP0jUZxfYN6
g58rusxSJyKIVUFH0IP9AlbjBmQnHYN34fqxg7_7miEDFgJb-PwN3RFW4-u7ckexdvWG7XXQrP8e0Od72NHnZ8NIalf-zeIxF-Kt-D-GasxYeAt04nhW3j29WLatb6wX
AoetFtoVMcbnHpwKtv-3SlkBAlwO2WqAcQFdNxjK0o03CvDvhLvhEZvBn6P39OPAjWuXEE3bBbQqs8BzUY1p-562FWY.UbxsuqUs82mgTds7IRXAkw
      </securityToken>
    </AuthenticateResponse>
  </soap:Body>
</soap:Envelope>
--uuid:8ce7a3c2-f555-4561-aa2f-7d9d439a4a5c--
```
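Without a SOAP client, the "disassemble" step has to unwrap this MIME package first. The following is a sketch using Python's `email` package, which needs the HTTP `Content-Type` header (it carries the MIME boundary) in front of the body. It then pulls out `securityToken`. The function names are hypothetical, and the test input is a shortened version of the response above. All whitespace is stripped from the token as a precaution, because the token text above appears padded and wrapped.

```python
import xml.etree.ElementTree as ET
from email.parser import BytesParser
from email.policy import default

NODE_NS = "http://www.exchangenetwork.net/schema/node/2"

def soap_part(body: bytes, content_type: str) -> bytes:
    """Return the first MIME part (the SOAP envelope) of a multipart HTTP body."""
    # The email parser expects headers, so prepend the HTTP Content-Type (with boundary)
    msg = BytesParser(policy=default).parsebytes(
        b"Content-Type: " + content_type.encode() + b"\r\n\r\n" + body
    )
    first = next(msg.iter_parts())
    return first.get_payload(decode=True)

def security_token(envelope: bytes) -> str:
    """Extract the securityToken text from an AuthenticateResponse envelope."""
    token = ET.fromstring(envelope).find(f".//{{{NODE_NS}}}securityToken")
    return "".join(token.text.split())  # drop padding / line breaks
```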

In [36]:
client.service.Query.__doc__

'Query(securityToken: xsd:string, dataflow: xsd:NCName, request: xsd:string, rowId: xsd:integer, maxRows: xsd:integer, parameters: ns0:ParameterType[]) -> rowId: xsd:integer, rowCount: xsd:integer, lastSet: xsd:boolean, results: ns0:GenericXmlType'
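The signature takes `rowId` / `maxRows` and returns `rowId`, `rowCount`, `lastSet`, which looks like a paging interface. In the run below the service appears to ignore them: `rowCount` is 0 and `lastSet` is true, and about 1,600 rows come back despite `maxRows=500`. So this sketch only matters if paging turns out to be honoured for some requests. `query` is a hypothetical wrapper around `client.service.Query` that returns something indexable like the zeep response.

```python
def fetch_all(query, page_size=500):
    """Collect result pages until the service reports lastSet (hypothetical paging loop)."""
    pages, row_id = [], 0
    while True:
        reply = query(row_id, page_size)
        pages.append(reply["results"])
        # Stop on lastSet, or on an empty page to avoid looping forever
        if reply["lastSet"] or reply["rowCount"] == 0:
            return pages
        row_id += reply["rowCount"]
```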

In [37]:
ParamType = client.get_type("ns0:ParameterType")
ParamType(2010, "Year", "integer", "utf8")  # parameterType / parameterEncoding shown here, not used in the Query below

{
    '_value_1': 2010,
    'parameterName': 'Year',
    'parameterType': 'integer',
    'parameterEncoding': 'utf8'
}

In [54]:
# with client.settings(raw_response=True):  # raw HTTP response, for show_response()
#     ...

response = client.service.Query(
    securityToken=token,
    request='GetWaterQualityResults_v1',
    rowId=0,  # not used
    maxRows=500,  # not used
    parameters=[
        ParamType(2010, "Year"),
        # Only one analyte seems to be honoured per query:
        # the data frame rows shown below are all Temp
        ParamType("Cond", "AnalyteCode"),
        ParamType("Temp", "AnalyteCode"),
        # alternative:
        # {'_value_1': "2010", 'parameterName': 'Year', },
        # {'_value_1': "Cond", 'parameterName': 'AnalyteCode', },
    ],
    dataflow="GLENDA",
)

In [None]:
# show_response(response)

```xml
User-Agent: Zeep/4.2.1 (www.python-zeep.org)
Accept-Encoding: gzip, deflate, br, zstd
Accept: */*
Connection: keep-alive
SOAPAction: ""
Content-Type: application/soap+xml; charset=utf-8; action=""
Cookie: JSESSIONID=63751690DC5CDF49F835014AE28C004E; cdx-prod-coreservices=1707921987.09.32.283202|aabbe373016d0cf673fd1f826041c98d
Content-Length: 1098

<?xml version='1.0' encoding='utf-8'?>
<soap-env:Envelope xmlns:soap-env="http://www.w3.org/2003/05/soap-envelope">
  <soap-env:Body>
    <ns0:Query xmlns:ns0="http://www.exchangenetwork.net/schema/node/2">
      <ns0:securityToken>csm:eyJjd...</ns0:securityToken>
      <ns0:dataflow>GLENDA</ns0:dataflow>
      <ns0:request>GetWaterQualityResults_v1</ns0:request>
      <ns0:rowId>0</ns0:rowId>
      <ns0:maxRows>500</ns0:maxRows>
      <ns0:parameters parameterName="Year" parameterEncoding="None">2010</ns0:parameters>
      <ns0:parameters parameterName="AnalyteCode" parameterEncoding="None">Cond</ns0:parameters>
    </ns0:Query>
  </soap-env:Body>
</soap-env:Envelope>

--uuid:d7b64491-ce50-4d34-a0d7-03086c14ff82
Content-Type: application/xop+xml; charset=UTF-8; type="application/soap+xml"
Content-Transfer-Encoding: binary
Content-ID: <root.message@cxf.apache.org>

<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
  <soap:Body>
    <QueryResponse xmlns="http://www.exchangenetwork.net/schema/node/2" xmlns:xmime="http://www.w3.org/2005/05/xmlmime">
      <rowId>0</rowId>
      <rowCount>0</rowCount>
      <lastSet>true</lastSet>
      <results format="XML">
        <node2:GLENDA xmlns:node2="http://www.exchangenetwork.net/schema/glenda/1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.exchangenetwork.net/schema/glenda/1 http://www.exchangenetwork.net/schema/glenda/1/">
          <node2:Result>
            <node2:ProgramTypeCode>WQSF</node2:ProgramTypeCode>
            <node2:GreatLakeCode>Erie</node2:GreatLakeCode>
            <node2:CruiseIdentifier>ER1011</node2:CruiseIdentifier>
            <node2:VisitIdentifier>E009M10</node2:VisitIdentifier>
            <node2:StationIdentifier>ER09</node2:StationIdentifier>
            <node2:ResultLocation>
              <node2:ResultLatitudeMeasure>42.538283</node2:ResultLatitudeMeasure>
              <node2:ResultLongitudeMeasure>-79.616800</node2:ResultLongitudeMeasure>
            </node2:ResultLocation><node2:ResultSamplingDate>2010-04-10</node2:ResultSamplingDate>
            <node2:ResultStationDepthMeasure>
              <node2:MeasureValue>50.1000</node2:MeasureValue>
              <node2:MeasureUnitCode>meters</node2:MeasureUnitCode>
            </node2:ResultStationDepthMeasure><node2:ResultSampleDepthMeasure>
              <node2:MeasureValue/>
              <node2:MeasureUnitCode>meters</node2:MeasureUnitCode>
            </node2:ResultSampleDepthMeasure>
            <node2:ResultDepthCode>Synthetic Sample</node2:ResultDepthCode>
            <node2:ResultMediumName>surface water</node2:ResultMediumName>
            <node2:ResultSampleTypeName>Individual QC</node2:ResultSampleTypeName>
            <node2:ResultQualityControlTypeName>field blank</node2:ResultQualityControlTypeName>
            <node2:ResultSampleIdentifier>10GC20R80</node2:ResultSampleIdentifier>
            <node2:ResultAnalyteCode>Cond</node2:ResultAnalyteCode>
            <node2:ResultAnalyteText>Conductivity</node2:ResultAnalyteText>
            <node2:ResultMeasure>
              <node2:MeasureValue>2.095</node2:MeasureValue>
              <node2:MeasureUnitCode>umho/cm</node2:MeasureUnitCode>
            </node2:ResultMeasure>
            <node2:ResultSampleFractionText>Total/Bulk</node2:ResultSampleFractionText>
            <node2:ResultMethodIdentifier>LG500</node2:ResultMethodIdentifier>
            <node2:ResultRemarkText>Field Reagent Blank, failed</node2:ResultRemarkText>
          </node2:Result>
          <node2:Result>
             ...
```

In [55]:
root = response['results']['_value_1']  # an lxml element (XML node)

In [None]:
# from lxml import etree; etree.dump(root)  # print the whole XML tree

In [56]:
# Make data frame
import pandas as pd
from lxml import etree

NS = {"n2": "http://www.exchangenetwork.net/schema/glenda/1"}
rows = root.xpath("//n2:Result", namespaces=NS)

# Flatten nested elements; column names come from the first Result,
# assuming every Result has the same layout
fields = []
for elem in rows[0]:
    if len(elem) > 0:  # has children
        fields.extend(
            etree.QName(elem).localname + "_" + etree.QName(child).localname
            for child in elem
        )
    else:
        fields.append(etree.QName(elem).localname)

print("\n".join(fields))

# .//* recurses into nested lat/lon and value/unit elements; len(elem) == 0 selects the leaf nodes
df = pd.DataFrame(
    [
        [elem.text for elem in row.xpath(".//*") if len(elem) == 0]
        for row in rows
    ],
    columns=fields,
)
df

ProgramTypeCode
GreatLakeCode
CruiseIdentifier
VisitIdentifier
StationIdentifier
ResultLocation_ResultLatitudeMeasure
ResultLocation_ResultLongitudeMeasure
ResultSamplingDate
ResultStationDepthMeasure_MeasureValue
ResultStationDepthMeasure_MeasureUnitCode
ResultSampleDepthMeasure_MeasureValue
ResultSampleDepthMeasure_MeasureUnitCode
ResultDepthCode
ResultMediumName
ResultSampleTypeName
ResultQualityControlTypeName
ResultSampleIdentifier
ResultAnalyteCode
ResultAnalyteText
ResultMeasure_MeasureValue
ResultMeasure_MeasureUnitCode
ResultSampleFractionText
ResultMethodIdentifier
ResultRemarkText


,ProgramTypeCode,GreatLakeCode,CruiseIdentifier,VisitIdentifier,StationIdentifier,ResultLocation_ResultLatitudeMeasure,ResultLocation_ResultLongitudeMeasure,ResultSamplingDate,ResultStationDepthMeasure_MeasureValue,ResultStationDepthMeasure_MeasureUnitCode,...,ResultSampleTypeName,ResultQualityControlTypeName,ResultSampleIdentifier,ResultAnalyteCode,ResultAnalyteText,ResultMeasure_MeasureValue,ResultMeasure_MeasureUnitCode,ResultSampleFractionText,ResultMethodIdentifier,ResultRemarkText
0,WQSF,Erie,ER1011,E009M10,ER09,42.538283,-79.616800,2010-04-10,50.1000,meters,...,INSITU_MEAS,,999994404823,Temp,Temperature,4.8,C,Not applicable,,
1,WQSF,Erie,ER1011,E009M10,ER09,42.538283,-79.616800,2010-04-10,50.1000,meters,...,Individual,routine field sample,10GC20S81,Temp,Temperature,1.7,C,Not applicable,,
2,WQSF,Erie,ER1011,E009M10,ER09,42.538283,-79.616800,2010-04-10,50.1000,meters,...,Individual,routine field sample,10GC20S8A,Temp,Temperature,1.7,C,Not applicable,,
3,WQSF,Erie,ER1011,E009M10,ER09,42.538283,-79.616800,2010-04-10,50.1000,meters,...,Individual,routine field sample,10GC20S8B,Temp,Temperature,1.7,C,Not applicable,,
4,WQSF,Erie,ER1011,E009M10,ER09,42.538283,-79.616800,2010-04-10,50.1000,meters,...,Individual,routine field sample,10GC20S8C,Temp,Temperature,1.7,C,Not applicable,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1632,WQSF,Superior,SU1021,S0FEG10,SUFE,46.916667,-90.416617,2010-08-23,13.2000,meters,...,Individual,routine field sample,10GS51S61,Temp,Temperature,18.8,C,Not applicable,,
1633,WQSF,Superior,SU1021,S0FEG10,SUFE,46.916667,-90.416617,2010-08-23,13.2000,meters,...,Individual,routine field sample,10GS51S62,Temp,Temperature,18.3,C,Not applicable,,
1634,WQSF,Superior,SU1021,S0FEG10,SUFE,46.916667,-90.416617,2010-08-23,13.2000,meters,...,Individual,routine field sample,10GS51S71,Temp,Temperature,15.5,C,Not applicable,,
1635,WQSF,Superior,SU1021,S0FEG10,SUFE,46.916667,-90.416617,2010-08-23,13.2000,meters,...,Composite,routine field sample,10GS51I72,Temp,Temperature,18.8,C,Not applicable,,
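The column list above is taken from `rows[0]` only, and each row's values are matched to it by position. A row missing an optional element would therefore shift its later values into the wrong columns. A hedged alternative builds one dict per row, keyed by the flattened name, and lets pandas align the columns, with gaps becoming NaN: `df = pd.DataFrame([flatten_result(r) for r in rows])`. The function names are hypothetical. Splitting on `}` works for both lxml and stdlib ElementTree tags. Like the original, it handles only one level of nesting.

```python
import xml.etree.ElementTree as ET

def localname(tag):
    """'{namespace}Name' -> 'Name'."""
    return tag.rsplit("}", 1)[-1]

def flatten_result(result):
    """One <Result> element -> {field: text}, naming nested leaves Parent_Child."""
    row = {}
    for elem in result:
        if len(elem):  # has children, e.g. ResultLocation, ResultMeasure
            for child in elem:
                row[localname(elem.tag) + "_" + localname(child.tag)] = child.text
        else:
            row[localname(elem.tag)] = elem.text
    return row

# Small stand-in for one GLENDA <Result> (namespace is a placeholder)
sample = ET.fromstring(
    '<Result xmlns="urn:example"><GreatLakeCode>Erie</GreatLakeCode>'
    "<ResultMeasure><MeasureValue>2.095</MeasureValue>"
    "<MeasureUnitCode>umho/cm</MeasureUnitCode></ResultMeasure></Result>"
)
print(flatten_result(sample))
```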

