# USE CASE 001

- Get distinct values for the following fields in all* collections where they exist:
    - Repository
    - Repository Collection
    - Repository Collection Guide
- Write these values to a file for review


## Step 1: Get a list of collections with some needed info about each
Use [dmGetCollectionList](https://help.oclc.org/Metadata_Services/CONTENTdm/Advanced_website_customization/API_Reference/CONTENTdm_API/CONTENTdm_Server_API_Functions_-_dmwebservices?sl=en#dmGetCollectionList)  

**Signature**
```
http://yourCONTENTdmURL/digital/bl/dmwebservices/index.php?q=dmGetCollectionList/format
```
- Replace `yourCONTENTdmURL` with your institution's CONTENTdm website URL.
- `format` is either `xml` or `json`.

**Example**
```
http://yourCONTENTdmURL/digital/bl/dmwebservices/index.php?q=dmGetCollectionList/xml
```

```python
# I'll need these libraries
import requests
import json
```

```python
# make the API call
response = requests.get("http://digitalcollections.lib.washington.edu/digital/bl/dmwebservices/index.php?q=dmGetCollectionList/json")
```

I'm going to write this to a file so I can pick back up at this point if I want.

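```python
import os
# make sure the output folder exists, then write the raw response text to a file
os.makedirs("uwlibs", exist_ok=True)
with open("uwlibs/allcolls.json", "w") as jsonFile01:
    jsonFile01.write(response.text)
```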

I'll need the info I retrieved in a variable for [Step 2](#Step-2:-Get-field-info-per-collection) below. I also want to take a look at it.

```python
# read the saved response back in and parse it into Python objects
with open('uwlibs/allcolls.json', 'r') as file:
    data = file.read()
    collectionsjson = json.loads(data)
```

### ☝️ SIDEBAR: json.loads() and json.dumps()
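These methods do important work here: `json.loads()` parses a JSON string (the text read from a file, just above) into Python objects, and `json.dumps()` serializes Python objects into a JSON string (which is then written to a file, below).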

**REFERENCE**
- json: [JSON encoder and decoder](https://docs.python.org/3.8/library/json.html#module-json)
- json module > [json.dumps](https://docs.python.org/3.8/library/json.html#json.dumps)
- json module > [json.loads](https://docs.python.org/3.8/library/json.html#json.loads)
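A small round trip shows the difference: `json.dumps()` turns Python objects into a JSON string, and `json.loads()` turns a JSON string back into Python objects. `json.dump()` and `json.load()` are the file-object counterparts, which would let the cells in this notebook skip the separate `read()`/`write()` step. The sample record and the temporary file below are made up for illustration.

```python
import json
import os
import tempfile

# a made-up record shaped like one entry from dmGetCollectionList
colls = [{"name": "Example Collection", "secondary_alias": "example"}]

# dumps: Python list -> JSON string; loads: JSON string -> Python list
as_text = json.dumps(colls)
print(type(as_text))  # <class 'str'>
print(json.loads(as_text) == colls)  # True

# dump/load read and write open file objects directly
path = os.path.join(tempfile.mkdtemp(), "allcolls.json")
with open(path, "w") as f:
    json.dump(colls, f)
with open(path, "r") as f:
    reloaded = json.load(f)
print(reloaded == colls)  # True
```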

```python
# confirm that this is a list of dictionaries using the built-in function type()
print(type(collectionsjson))
print(len(collectionsjson))
print(type(collectionsjson[0]))

# list each collection's name and alias
for coll in collectionsjson:
    print(f"Name: {coll['name']} / Alias: {coll['secondary_alias']}")
```

## Step 2: Get field info per collection
Use [dmGetCollectionFieldInfo](https://help.oclc.org/Metadata_Services/CONTENTdm/Advanced_website_customization/API_Reference/CONTENTdm_API/CONTENTdm_Server_API_Functions_-_dmwebservices?sl=en#dmGetCollectionFieldInfo) 

**Signature**
```
http://yourCONTENTdmURL.com/digital/bl/dmwebservices/index.php?q=dmGetCollectionFieldInfo/alias/format
```
- Replace `yourCONTENTdmURL.com`\* with your institution's CONTENTdm website URL
- `alias` is a collection alias
- `format` is either `xml` or `json`
- In JSON, Unicode characters in the field's name are converted to hexadecimal `\uXXXX` escape sequences. E.g., 題名 is converted to `\u984c\u540d`

**Example**
```
http://yourCONTENTdmURL.com/digital/bl/dmwebservices/index.php?q=dmGetCollectionFieldInfo/ctimes/xml
```

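\*The `.com` in the placeholder should *not* be appended to your institution's CONTENTdm website URL: use, for example, `digitalcollections.lib.washington.edu`, not `digitalcollections.lib.washington.edu.com`.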
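The Unicode note in the signature bullets needs no special handling on our side: `json.loads()` turns each `\uXXXX` escape back into the character it stands for. A tiny check with a made-up field record:

```python
import json

# raw JSON text with \uXXXX escapes in a field name (made-up record)
raw = '[{"name": "\\u984c\\u540d"}]'

fields = json.loads(raw)
# json.loads decodes each \uXXXX escape into the character it names
print(fields[0]["name"])                # 題名
print(hex(ord(fields[0]["name"][0])))   # 0x984c
```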

**For each collection**
- Does the 'Repository' field exist? If yes, what is its nickname?
- Does the 'Repository Collection' field exist? If yes, what is its nickname?
- Does the 'Repository Collection Guide' field exist? If yes, what is its nickname?
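The three questions above can be answered per collection with a small lookup over each collection's `field_data` list. This is a sketch assuming each field dictionary has a `name` key (as the loop further down relies on) and a `nick` key holding the nickname; the `nick` key is an assumption to check against your own response. The function name `nicknames_for` and the sample data are hypothetical.

```python
FIELDS_OF_INTEREST = ["Repository", "Repository Collection", "Repository Collection Guide"]

def nicknames_for(field_data, wanted=FIELDS_OF_INTEREST):
    """Map each wanted field name to its nickname, or None if the collection lacks that field."""
    # assumes each field dict has 'name' and (probably) 'nick' keys
    by_name = {field["name"]: field.get("nick") for field in field_data}
    return {name: by_name.get(name) for name in wanted}

# hypothetical field info for one collection
sample = [
    {"name": "Title", "nick": "title"},
    {"name": "Repository", "nick": "reposa"},
]
print(nicknames_for(sample))
# {'Repository': 'reposa', 'Repository Collection': None, 'Repository Collection Guide': None}
```

Once the field info is in hand, something like `for coll in allcolls_allfields: print(coll['alias'], nicknames_for(coll['field_data']))` would answer the questions for every collection retrieved so far.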

```python
# requests and json were already imported in Step 1

allcolls_allfields = []

# base URL and response format, shared by every collection's API call
uwcdmurl = "digitalcollections.lib.washington.edu"
fmt = "json"

for coll in collectionsjson:
    url = f"http://{uwcdmurl}/digital/bl/dmwebservices/index.php?q=dmGetCollectionFieldInfo/{coll['secondary_alias']}/{fmt}"
    response = requests.get(url)
    fielddata = json.loads(response.text)  # raises JSONDecodeError if the body isn't valid JSON
    dct = {'alias': coll['secondary_alias'], 'field_data': fielddata}
    allcolls_allfields.append(dct)
```

### SIDEBAR - 💥 `JSONDecodeError`
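***I don't understand it yet***  
`json.loads()` raises `JSONDecodeError` when the text it is given isn't valid JSON, so at least one collection's response can't be parsed. That stops the loop partway through, before field information has been retrieved for all collections.  
Why does the error occur for that particular collection? I wonder...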
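One way to investigate is to catch `json.JSONDecodeError` per collection, keep going, and save the failing alias along with the start of the response body for inspection. This is a sketch, not a diagnosis: the function name `collect_field_info` and the `get_text` parameter are hypothetical. `get_text` stands in for whatever does the HTTP request (for example `lambda url: requests.get(url).text`), which keeps the sketch testable without a network connection.

```python
import json

# same host and endpoint as the Step 2 loop
BASE = "http://digitalcollections.lib.washington.edu/digital/bl/dmwebservices/index.php"

def collect_field_info(collections, get_text, fmt="json"):
    """Fetch field info for every collection, recording parse failures instead of stopping.

    get_text: a callable that takes a URL and returns the response body as a str.
    """
    results, failures = [], []
    for coll in collections:
        alias = coll["secondary_alias"]
        url = f"{BASE}?q=dmGetCollectionFieldInfo/{alias}/{fmt}"
        text = get_text(url)
        try:
            results.append({"alias": alias, "field_data": json.loads(text)})
        except json.JSONDecodeError as err:
            # keep the alias, the parser's message, and the start of the body for a closer look
            failures.append({"alias": alias, "error": str(err), "snippet": text[:200]})
    return results, failures
```

Usage, assuming `collectionsjson` from Step 1: `allcolls_allfields, failed = collect_field_info(collectionsjson, lambda url: requests.get(url).text)`, then `for f in failed: print(f['alias'], f['error'], f['snippet'])`. The snippets should hint at whether the problem bodies are HTML error pages, truncated JSON, or something else.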

```python
print(len(collectionsjson))     # 163 collections
print(len(allcolls_allfields))  # but I was only able to get the fields for 23 of these
```

As above, I'll write the collection-field info to a file so I can pick back up here later.

```python
# let's write allcolls_allfields to a file
acafstring = json.dumps(allcolls_allfields)
with open("uwlibs/allcolls_allfields.json", "w") as jsonFile:
    jsonFile.write(acafstring)
```

## Step 3: Get field values for fields of interest
Use [dmGetCollectionWords](https://help.oclc.org/Metadata_Services/CONTENTdm/Advanced_website_customization/API_Reference/CONTENTdm_API/CONTENTdm_Server_API_Functions_-_dmwebservices?sl=en#dmGetCollectionWords)  

**Signature**
```
http://yourCONTENTdmURL.com/digital/bl/dmwebservices/index.php?q=dmGetCollectionWords/alias/fields/format
```

- Replace `yourCONTENTdmURL.com` with your institution's CONTENTdm Website URL.
- `alias` is a collection alias
- `fields` is a !-delimited list of the nicknames of the fields whose words should be returned, or "all" for all fields.
- `format` is either xml or json
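Building the request URL mostly comes down to joining the nicknames with `!`. A minimal sketch; the helper name `collection_words_url`, the alias `example`, and the nicknames are hypothetical, and passing `["all"]` yields the "all fields" form.

```python
def collection_words_url(host, alias, nicknames, fmt="json"):
    """Build a dmGetCollectionWords URL; nicknames are joined with '!' as the signature requires."""
    fields = "!".join(nicknames)
    return f"http://{host}/digital/bl/dmwebservices/index.php?q=dmGetCollectionWords/{alias}/{fields}/{fmt}"

# hypothetical alias and nicknames, for illustration only
print(collection_words_url("digitalcollections.lib.washington.edu", "example", ["reposa", "reposb"]))
# http://digitalcollections.lib.washington.edu/digital/bl/dmwebservices/index.php?q=dmGetCollectionWords/example/reposa!reposb/json
```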

I haven't gotten here yet. Actions could include:

- Get values from each collection for each field of interest
- Add values to combined list
- Process combined list to retain only a list of distinct values
- Compare values across collections, among other follow-ups
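The combine, keep-distinct, and write-for-review actions could look roughly like this. It assumes each collection's values have already been pulled into a plain list of strings keyed by alias; the actual dmGetCollectionWords response hasn't been examined yet, so that input shape, the function names, and the output filename `distinct_values.txt` are hypothetical.

```python
import os
import tempfile

def distinct_values(values_by_alias):
    """Combine every collection's values and keep each distinct value once, sorted."""
    combined = []
    for values in values_by_alias.values():
        combined.extend(values)
    return sorted(set(combined))

def write_for_review(values, path):
    """Write one value per line so the list is easy to eyeball."""
    with open(path, "w", encoding="utf-8") as f:
        for value in values:
            f.write(value + "\n")

# hypothetical values for two collections
sample = {"coll1": ["Special Collections", "University Archives"],
          "coll2": ["Special Collections"]}
distinct = distinct_values(sample)
print(distinct)  # ['Special Collections', 'University Archives']

out_path = os.path.join(tempfile.mkdtemp(), "distinct_values.txt")
write_for_review(distinct, out_path)
```

A `set` drops duplicates but not near-duplicates (different case, trailing spaces), which is part of what the review file is for.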

```python
# picking back up, loading json from a file
import json
with open("uwlibs/allcolls_allfields.json", "r") as file:
    data = file.read()
    allfieldsjson = json.loads(data)
```

```python
# testing 02: print each collection's alias followed by its field names
for coll in allfieldsjson:
    print(f"\n{coll['alias']}\n==============\n")
    for field in coll['field_data']:
        print(field['name'])
```

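- fields_of_interest = ['Repository', 'Repository Collection', 'Repository Collection Guide']
- allcolls_somefields = a copy of allcolls_allfields (a plain `=` only creates a second name for the same list, so removing items would change both)
- for each coll in allcolls_somefields:
    - go through the 'field_data' list
    - for each dct in the 'field_data' list:
        - if the 'name' of the field is in fields_of_interest, keep it
        - if the 'name' of the field is not in fields_of_interest, [remove()](https://www.programiz.com/python-programming/methods/list/remove) it; calling remove() on a list while looping over that same list skips items, so building a new, filtered list is safer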
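The plan above could be sketched with a list comprehension that builds a new, filtered list instead of removing items in place, which sidesteps both the shared-list and the skipped-item problems. The function name `keep_fields_of_interest` and the sample data are hypothetical; the input shape matches the `{'alias': ..., 'field_data': [...]}` dictionaries saved in Step 2.

```python
FIELDS_OF_INTEREST = ["Repository", "Repository Collection", "Repository Collection Guide"]

def keep_fields_of_interest(allcolls, wanted=FIELDS_OF_INTEREST):
    """Return a new list; each collection keeps only the field dicts whose 'name' is wanted."""
    return [
        {"alias": coll["alias"],
         "field_data": [f for f in coll["field_data"] if f["name"] in wanted]}
        for coll in allcolls
    ]

# hypothetical input shaped like allcolls_allfields
sample = [{"alias": "example",
           "field_data": [{"name": "Title"}, {"name": "Repository"}, {"name": "Date"}]}]
somefields = keep_fields_of_interest(sample)
print(somefields)                    # [{'alias': 'example', 'field_data': [{'name': 'Repository'}]}]
print(len(sample[0]["field_data"]))  # 3, the original list is untouched
```

With the saved data, `allcolls_somefields = keep_fields_of_interest(allfieldsjson)` would give the filtered list.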

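```python
# testing 03: membership check with `in`
breakfast = ['egg', 'toast', 'coffee']  # avoid naming a variable `list`, which hides the built-in
if 'egg' in breakfast:
    print("we have eggs")
else:
    print("no eggs")
```

```
we have eggs
```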
