# Calling CN.listObjects

This script demonstrates how to call the [CN.listObjects](https://purl.dataone.org/architecture/apis/CN_APIs.html#CNRead.listObjects) method using the DataONE Python client (`d1_client`).

In [6]:
# include some utility data and methods
%run utilities.ipynb

# Import the library and create a client instance
from d1_client import baseclient_2_0

cn_base_url = "https://cn.dataone.org/cn"
client = baseclient_2_0.DataONEBaseClient_2_0(cn_base_url)

Request only five results (`count=5`), starting from the first entry (`start=0`), by calling the `listObjects` method.
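
For orientation, `count` and `start` travel to the server as query parameters of an HTTP GET request. The sketch below composes such a URL with the standard library only. The `/v2/object` path is an assumption based on the DataONE REST conventions for version 2 clients, and `build_list_objects_url` is a hypothetical helper, not part of `d1_client`.

```python
from urllib.parse import urlencode

CN_BASE_URL = "https://cn.dataone.org/cn"


def build_list_objects_url(base_url, count, start):
    """Hypothetical helper: compose a listObjects-style request URL.

    The "/v2/object" path is an assumption, not taken from the client code.
    """
    query = urlencode({"count": count, "start": start})
    return "{0}/v2/object?{1}".format(base_url, query)


url = build_list_objects_url(CN_BASE_URL, count=5, start=0)
print(url)
```

The client library takes care of this step itself, so the helper is only useful for seeing what a request might look like, for example when debugging with a browser or `curl`.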

In [7]:
response = client.listObjects( count=5, start=0 )

print("XML Response:")
print(asXml(response, max_lines=25))

XML Response:
<?xml version="1.0" encoding="utf-8"?>
<ns1:objectList count="5" start="0" total="2896097" xmlns:ns1="http://ns.dataone.org/service/types/v1">
  <objectInfo>
    <identifier>0000120ce277dbb2e140d74b50ca23e5</identifier>
    <formatId>http://www.isotc211.org/2005/gmd-pangaea</formatId>
    <checksum algorithm="MD5">0000120ce277dbb2e140d74b50ca23e5</checksum>
    <dateSysMetadataModified>2018-04-20T07:59:45.497Z</dateSysMetadataModified>
    <size>19541</size>
  </objectInfo>
  <objectInfo>
    <identifier>000026213216f47287f0d3027f3c4be3</identifier>
    <formatId>http://www.isotc211.org/2005/gmd-pangaea</formatId>
    <checksum algorithm="MD5">000026213216f47287f0d3027f3c4be3</checksum>
    <dateSysMetadataModified>2018-04-20T05:09:36.311Z</dateSysMetadataModified>
    <size>26256</size>
  </objectInfo>
  <objectInfo>
    <identifier>0000aa6924377b6a7e5ab59bcec5d4f3</identifier>
    <formatId>http://www.isotc211.org/2005/gmd-pangaea</formatId>
    <checksum algorithm="MD5

Print a summary of the response, followed by the details of each entry.
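
One detail of the date format used when printing entries: `%S` emits whole seconds only, and the trailing `Z` is a literal character rather than a value derived from the time zone of the datetime. The sketch below, which assumes the timestamp is already in UTC, applies the format to the first `dateSysMetadataModified` value from the XML response shown earlier.

```python
from datetime import datetime, timezone

DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

# Timestamp copied from the first entry of the XML response (07:59:45.497Z)
modified = datetime(2018, 4, 20, 7, 59, 45, 497000, tzinfo=timezone.utc)

# Fractional seconds are dropped; "Z" is appended verbatim
formatted = modified.strftime(DATE_FORMAT)
print(formatted)  # 2018-04-20T07:59:45Z
```

If a datetime in another time zone were formatted this way, the `Z` suffix would be misleading, so converting to UTC first would be the safer choice.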

In [8]:
from datetime import datetime as dt

# Output format for timestamps; the trailing "Z" is a literal character
DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

def printResults(response):
    """Print the paging summary of an object list, then one block per entry."""
    print("Total objects: {0} Start: {1}  Page size: {2}\n".format(response.total, response.start, response.count))
    # Number entries by their absolute position in the full result set
    counter = response.start
    for entry in response.objectInfo:
        print(u"{:08d}: ".format(counter))
        print(u"            PID: {0}".format(entry.identifier.value()))
        print(u"       formatId: {0}".format(entry.formatId))
        print(u"           size: {0}".format(entry.size))
        print(u"  date_modified: {0}".format(entry.dateSysMetadataModified.strftime(DATE_FORMAT)))
        print("")
        counter += 1

printResults(response)

Total objects: 2896097 Start: 0  Page size: 5

00000000: 
            PID: 0000120ce277dbb2e140d74b50ca23e5
       formatId: http://www.isotc211.org/2005/gmd-pangaea
           size: 19541
  date_modified: 2018-04-20T07:59:45Z

00000001: 
            PID: 000026213216f47287f0d3027f3c4be3
       formatId: http://www.isotc211.org/2005/gmd-pangaea
           size: 26256
  date_modified: 2018-04-20T05:09:36Z

00000002: 
            PID: 0000aa6924377b6a7e5ab59bcec5d4f3
       formatId: http://www.isotc211.org/2005/gmd-pangaea
           size: 35084
  date_modified: 2018-02-17T03:01:16Z

00000003: 
            PID: 0000d11ff42b22915fcce5cfa6027040
       formatId: http://www.isotc211.org/2005/gmd-pangaea
           size: 35257
  date_modified: 2018-01-06T10:43:32Z

00000004: 
            PID: 0000eb4ff1fc59ae6c33a4981e00eabf
       formatId: http://www.isotc211.org/2005/gmd-pangaea
           size: 49904
  date_modified: 2018-01-08T11:18:27Z



## Add a date filter

Add a `fromDate` parameter so that `listObjects` returns only the entries that were modified between one day ago and now.
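
The value passed as `fromDate` is a timezone-aware datetime. As a minimal sketch, assuming "one day ago" means exactly 24 hours before the current UTC time, such a value can also be built with the standard library alone:

```python
from datetime import datetime, timedelta, timezone

# Timezone-aware "now" in UTC, and the moment exactly one day earlier
now = datetime.now(timezone.utc)
start_date = now - timedelta(days=1)

# ISO 8601 rendering, including the +00:00 UTC offset
print(start_date.isoformat())
```

A natural-language parser is more convenient for expressions such as "two weeks ago", while the arithmetic above avoids an extra dependency.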

In [9]:
import dateparser

start_date = dateparser.parse('yesterday UTC', 
                              settings={'RETURN_AS_TIMEZONE_AWARE': True})

response = client.listObjects( 
    count=5, 
    start=0,
    fromDate=start_date
)

printResults( response )


Total objects: 1117 Start: 0  Page size: 5

00000000: 
            PID: 10.24431_rw1k46c_2020_8_6_191414
       formatId: http://www.isotc211.org/2005/gmd
           size: 45544
  date_modified: 2020-08-06T19:14:15Z

00000001: 
            PID: 10.24431_rw1k46d_2020_8_6_191523
       formatId: http://www.isotc211.org/2005/gmd
           size: 41501
  date_modified: 2020-08-06T19:15:23Z

00000002: 
            PID: 10.24431_rw1k46e_2020_8_6_191641
       formatId: http://www.isotc211.org/2005/gmd
           size: 41086
  date_modified: 2020-08-06T19:16:41Z

00000003: 
            PID: 10.24431_rw1k46f_2020_8_6_191834
       formatId: http://www.isotc211.org/2005/gmd
           size: 41501
  date_modified: 2020-08-06T19:18:34Z

00000004: 
            PID: 10.24431_rw1k46g_2020_8_6_191949
       formatId: http://www.isotc211.org/2005/gmd
           size: 41603
  date_modified: 2020-08-06T19:19:49Z



## Paging the response

The server limits the number of records returned in a single response. When requesting a large set of entries, compare the `total`, `start`, and `count` values of each response to determine whether additional pages of results need to be requested.
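
The paging logic can be tried out without a network connection. The following sketch uses a stub in place of the server: `Page`, `FAKE_CATALOG`, `fake_list_objects`, and `iter_entries` are all hypothetical names invented for illustration, and only the `start`, `count`, and `total` fields mirror the response attributes used in this document.

```python
from collections import namedtuple

# Stand-in for a listObjects response; only the fields needed for paging
Page = namedtuple("Page", ["start", "count", "total", "items"])

# Invented data: ten fake identifiers
FAKE_CATALOG = ["pid-{0:03d}".format(i) for i in range(10)]


def fake_list_objects(start, count):
    """Hypothetical stub that mimics a paged server response."""
    items = FAKE_CATALOG[start:start + count]
    return Page(start=start, count=len(items), total=len(FAKE_CATALOG), items=items)


def iter_entries(fetch_page, page_size, limit):
    """Yield at most `limit` entries, requesting one page at a time."""
    start = 0
    yielded = 0
    while yielded < limit:
        page = fetch_page(start=start, count=page_size)
        if page.count == 0:
            return  # empty page: nothing more to fetch
        for item in page.items:
            if yielded >= limit:
                return
            yield item
            yielded += 1
        start += page.count
        if start >= page.total:
            return  # the last page has been consumed


entries = list(iter_entries(fake_list_objects, page_size=3, limit=7))
print(entries)
```

Wrapping the loop in a generator keeps the stopping conditions (limit reached, empty page, end of the result set) in one place, and the caller can simply iterate over entries.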

In [10]:
start_date = dateparser.parse('two weeks ago UTC', 
                              settings={'RETURN_AS_TIMEZONE_AWARE': True})
end_date = dateparser.parse('one week ago UTC', 
                              settings={'RETURN_AS_TIMEZONE_AWARE': True})
max_to_retrieve = 25  # limit the total number of entries to download

params = {
    "count": 3,  # specify a small page size
    "start": 0,
    "fromDate": start_date,
    "toDate": end_date,
}
response = client.listObjects( **params )

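# Never try to retrieve more entries than the server reports as available
if max_to_retrieve > response.total:
    max_to_retrieve = response.total

printResults( response )

num_retrieved = response.count
# Note: the final page may push num_retrieved slightly past max_to_retrieve
while num_retrieved < max_to_retrieve:
    # Advance the offset by the size of the page just received
    params['start'] += response.count
    response = client.listObjects( **params )
    if response.count == 0:
        # Stop on an empty page so the loop cannot run forever
        break
    num_retrieved += response.count
    printResults( response )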
    

Total objects: 1372 Start: 0  Page size: 3

00000000: 
            PID: 001d01a92d70ea24b4ab7e81c29858b9
       formatId: http://www.isotc211.org/2005/gmd
           size: 15674
  date_modified: 2020-07-29T19:51:44Z

00000001: 
            PID: 006b3a853bbd9c0acba54eed805cbe9c
       formatId: http://www.isotc211.org/2005/gmd
           size: 15024
  date_modified: 2020-07-30T13:24:02Z

00000002: 
            PID: 025399092cabfc7f09edd05ff7dacda9
       formatId: http://www.isotc211.org/2005/gmd
           size: 15444
  date_modified: 2020-07-30T13:27:34Z

Total objects: 1372 Start: 3  Page size: 3

00000003: 
            PID: 02663620-e06d-470a-90bf-83be66e5db41
       formatId: audio/mpeg
           size: 3610644
  date_modified: 2020-07-30T18:52:21Z

00000004: 
            PID: 0379b85f-dfe1-439a-91cd-67efd3384276
       formatId: audio/mpeg
           size: 4247614
  date_modified: 2020-07-30T18:46:25Z

00000005: 
            PID: 03a33d30fc0e8763f26ae3ac657931f2
       formatId: h