# USE CASE 001

- Get distinct values for the following fields in all* collections where they exist:
    - Repository
    - Repository Collection
    - Repository Collection Guide
- Write these values to a file for review


## Step 1: Get a list of collections with some needed info about each
Use [dmGetCollectionList](https://help.oclc.org/Metadata_Services/CONTENTdm/Advanced_website_customization/API_Reference/CONTENTdm_API/CONTENTdm_Server_API_Functions_-_dmwebservices?sl=en#dmGetCollectionList)  

**Signature**
```
http://yourCONTENTdmURL/digital/bl/dmwebservices/index.php?q=dmGetCollectionList/format
```
- Replace yourCONTENTdmURL with your institution's CONTENTdm Website URL.
- format is either xml or json

**Example**
```
http://yourCONTENTdmURL/digital/bl/dmwebservices/index.php?q=dmGetCollectionList/xml
```

In [None]:
import requests
import json

response = requests.get("http://digitalcollections.lib.washington.edu/digital/bl/dmwebservices/index.php?q=dmGetCollectionList/json")
json_response = json.loads(response.text)
for collection in json_response:  # renamed from `object` to avoid shadowing the built-in
    print(collection)
# note that this doesn't write to file, just shows (lengthy) results
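
Since the cell above only prints the results, here is a minimal sketch of how a parsed collection list could be saved to disk for later use. The helper name `save_collection_list` and the sample records are hypothetical stand-ins, not part of the CONTENTdm API, and the output path is only an example.

```python
import json
from pathlib import Path


def save_collection_list(collections, path):
    """Write a list of collection dictionaries to `path` as pretty-printed JSON."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    with open(path, "w", encoding="utf-8") as file:
        # ensure_ascii=False keeps non-ASCII collection names readable in the file
        json.dump(collections, file, indent=4, ensure_ascii=False)
    return path


# Hypothetical stand-in for the parsed dmGetCollectionList response
sample_collections = [
    {"alias": "/alaskawcanada", "secondary_alias": "alaskawcanada"},
    {"alias": "/lctext", "secondary_alias": "lctext"},
]
```

Calling `save_collection_list(json_response, 'uwlibs/getCollectionList.json')` after the request would produce a file like the one loaded in the next cell.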

- I already have a collection list as a `.json` file
- I'll put this in a var (as a list of dictionaries) for use

In [1]:
import json
with open('uwlibs/getCollectionList.json', 'r') as file:
    collectionsjson = json.load(file)  # parse the file straight into a list of dictionaries
# and okay I'm done with the file now so I can dedent

In [18]:
# quick check - I believe this is in fact a list of dictionaries...
print(f"""
Alias: {collectionsjson[3]['secondary_alias']}
Name: {collectionsjson[3]['name']}
""")

# that worked and gave me some info, but why not just use the built-in function type()!?

print(type(collectionsjson))
print(type(collectionsjson[0]))
# okay, yes, a list (of dictionaries)


Alias: lctext
Name: American Indians of the Pacific Northwest -- Textual Portion

<class 'list'>
<class 'dict'>


## Step 2: Get field info per collection
Use [dmGetCollectionFieldInfo](https://help.oclc.org/Metadata_Services/CONTENTdm/Advanced_website_customization/API_Reference/CONTENTdm_API/CONTENTdm_Server_API_Functions_-_dmwebservices?sl=en#dmGetCollectionFieldInfo) 

**Signature**
```
http://yourCONTENTdmURL.com/digital/bl/dmwebservices/index.php?q=dmGetCollectionFieldInfo/alias/format
```
- Replace yourCONTENTdmURL with your institution's CONTENTdm Website URL
- alias is a collection alias
- format is either xml or json
- In JSON, Unicode characters in the field's name are converted to `\uXXXX` (hexadecimal) Unicode escape sequences. E.g., 題名 is converted to `\u984c\u540d`

**Example**
```
http://yourCONTENTdmURL.com/digital/bl/dmwebservices/index.php?q=dmGetCollectionFieldInfo/ctimes/xml
```

\*Note that the `.com` shown in the signature should *not* be appended to your institution's CONTENTdm website URL

**For each collection**
- Does a 'Repository' field exist? If yes, what is its nickname?
- Does a 'Repository Collection' field exist? If yes, what is its nickname?
- Does a 'Repository Collection Guide' field exist? If yes, what is its nickname?
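
Once field info for a collection is in hand, the three questions above could be answered with something like the sketch below. It assumes each field is a dictionary with `name` and `nick` keys; the function name `find_nicknames` and the nicknames in the sample data are made up for illustration.

```python
FIELDS_OF_INTEREST = (
    "Repository",
    "Repository Collection",
    "Repository Collection Guide",
)


def find_nicknames(field_data, names=FIELDS_OF_INTEREST):
    """Return {field name: nickname} for each field of interest found in field_data."""
    wanted = set(names)
    return {f["name"]: f["nick"] for f in field_data if f.get("name") in wanted}


# Hypothetical field info for one collection (these nicknames are invented)
sample_field_data = [
    {"name": "Title", "nick": "title"},
    {"name": "Repository", "nick": "reposi"},
    {"name": "Repository Collection", "nick": "reposa"},
]

nicknames = find_nicknames(sample_field_data)
print(nicknames)
```

A field that is missing from the collection simply does not appear in the returned dictionary, so `'Repository Collection Guide' in nicknames` answers the "does it exist?" question.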

In [9]:
import requests
import json

all_coll_fields = []

# these are the same for each collection
uwcdmurl = "digitalcollections.lib.washington.edu"
fmt = "json"

for coll in collectionsjson:
    dct = {}
    url = f"http://{uwcdmurl}/digital/bl/dmwebservices/index.php?q=dmGetCollectionFieldInfo/{coll['secondary_alias']}/{fmt}"
    response = requests.get(url)
    fielddata = json.loads(response.text)
    dct = {'alias': coll['secondary_alias'], 'field_data': fielddata}
    all_coll_fields.append(dct)

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

### SIDEBAR - 💥 `JSONDecodeError`
***I don't understand it***  
...but it is stopping me from iterating through all collections to retrieve field information...

In [19]:
print(collectionsjson[0]['alias'])
print(all_coll_fields[0]['alias'])
print(len(collectionsjson)) # 163 collections
print(len(all_coll_fields)) # but I was only able to get the fields for 23 of these

/alaskawcanada
alaskawcanada
163
23
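
On the error itself: `Expecting value: line 1 column 1 (char 0)` means `json.loads` found nothing it could parse at the very start of the text, which usually points to an empty response body or a body that is not JSON at all (an HTML error page, for example). Which collection triggers it here, and why, is not known from the output above. A minimal sketch of a loop that records failures and keeps going, rather than stopping at the first bad response, might look like this. The function name `collect_field_info`, the `fetch_text` parameter, and the canned responses are all hypothetical.

```python
import json


def collect_field_info(aliases, fetch_text):
    """Fetch field info for each alias, tracking the aliases whose response is not JSON.

    `fetch_text` is any callable that takes an alias and returns the response body as a string.
    """
    succeeded, failed = [], []
    for alias in aliases:
        body = fetch_text(alias)
        try:
            fielddata = json.loads(body)
        except json.JSONDecodeError:
            # Keep the start of the body so the cause can be inspected afterwards
            failed.append({"alias": alias, "body_start": body[:200]})
            continue
        succeeded.append({"alias": alias, "field_data": fielddata})
    return succeeded, failed


# Hypothetical canned responses standing in for real HTTP calls
fake_responses = {
    "good": '[{"name": "Repository", "nick": "reposi"}]',
    "empty": "",
    "html": "<html>Error</html>",
}

ok, bad = collect_field_info(fake_responses, fake_responses.get)
print(len(ok), len(bad))
```

With the real service, `fetch_text` could be a small function that builds the `dmGetCollectionFieldInfo` URL for the alias and returns `requests.get(url).text`; looking at `body_start` for the failed aliases should show what the server actually sent back.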


In [None]:
#    LEFTOVER SCRATCH FROM BEFORE--TRYING TO DO STUFF

"""
for dct in lst:
    if dct['name'] == 'Repository':
        repo = {dct['name']: dct['nick']}
        entry.update(repo)
    else:
        pass
    if dct['name'] == 'Repository Collection':
        repocoll = {dct['name']: dct['nick']}
        entry.update(repocoll)
    else:
        pass
    if dct['name'] == 'Repository Collection Guide':
        repocollguide = {dct['name']: dct['nick']}
        entry.update(repocollguide)
    else:
        pass
    if len(fields) > 2:
        fields_info.append(entry)
    else:
        pass

print(json.dumps(fields_info, indent=4))

"""

### ☝️ sidebar: json.dumps
- I don't really understand this method
- see json module > [json.dumps](https://docs.python.org/3.8/library/json.html#json.dumps)
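
For what it's worth, `json.dumps` takes a Python object (a list, a dictionary, ...) and returns it as a JSON-formatted string; passing `indent` makes that string multi-line and easier to read when printed. A small sketch, using a made-up `fields_info` value:

```python
import json

# Hypothetical data; the nickname is invented for illustration
fields_info = [{"Repository": "reposi"}]

compact = json.dumps(fields_info)           # a single-line string
pretty = json.dumps(fields_info, indent=4)  # the same data, indented for reading

print(compact)
print(pretty)
```

Both results are plain strings, and `json.loads` turns either one back into the original list.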

## Step 3: Get field values for fields of interest
Use [dmGetCollectionWords](https://help.oclc.org/Metadata_Services/CONTENTdm/Advanced_website_customization/API_Reference/CONTENTdm_API/CONTENTdm_Server_API_Functions_-_dmwebservices?sl=en#dmGetCollectionWords)  

**Signature**
```
http://yourCONTENTdmURL.com/digital/bl/dmwebservices/index.php?q=dmGetCollectionWords/alias/fields/format
```

- Replace `yourCONTENTdmURL.com` with your institution's CONTENTdm Website URL.
- `alias` is a collection alias
- `fields` is a `!`-delimited list of the nicknames of the fields whose words should be returned. It can also be "all" to return words for all fields.
- `format` is either xml or json
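
To make the signature concrete, here is a sketch of building the request URL from a list of field nicknames. The function name `build_words_url` is hypothetical, and the nicknames in the example are invented; real nicknames would come from `dmGetCollectionFieldInfo`.

```python
def build_words_url(base_url, alias, nicknames, fmt="json"):
    """Build a dmGetCollectionWords URL for one collection.

    An empty list of nicknames falls back to "all" fields.
    """
    fields = "!".join(nicknames) if nicknames else "all"
    return (
        f"http://{base_url}/digital/bl/dmwebservices/index.php"
        f"?q=dmGetCollectionWords/{alias}/{fields}/{fmt}"
    )


# Invented nicknames, for illustration only
url = build_words_url(
    "digitalcollections.lib.washington.edu",
    "alaskawcanada",
    ["reposi", "reposa"],
)
print(url)
```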

...still haven't gotten here yet--actions could include:

- Get values from each collection for each field of interest
- Add values to combined list
- Process combined list to retain only a list of distinct values
- Do other stuff, compare values across collections, ...
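
The combining and de-duplicating steps listed above could be sketched as follows. This deliberately skips parsing the `dmGetCollectionWords` response (its exact shape is not covered here) and starts from a hypothetical dictionary of values per collection; the function names and the sample values are assumptions.

```python
from pathlib import Path


def distinct_values(values_by_collection):
    """Merge per-collection value lists into one sorted list of distinct values."""
    distinct = set()
    for values in values_by_collection.values():
        # Strip surrounding whitespace and skip empty entries before comparing
        distinct.update(v.strip() for v in values if v and v.strip())
    return sorted(distinct)


def write_values(values, path):
    """Write one value per line to a text file for review."""
    Path(path).write_text("\n".join(values) + "\n", encoding="utf-8")


# Hypothetical values, keyed by collection alias
sample_values = {
    "alaskawcanada": ["Repository A", "Repository B"],
    "lctext": ["Repository A ", ""],
}

combined = distinct_values(sample_values)
print(combined)
```

`write_values(combined, 'repository_values.txt')` would then cover the "write these values to a file for review" goal; the file name is only an example.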