# USE CASE 001

- Get distinct values\* for fields of interest (one or more) in all collections where they exist
- \[...and then do more stuff...\]

\*`dmGetCollectionWords` (details below) provides distinct values, not all values


## Step 1: Get a list of collections with some needed info about each
Use [dmGetCollectionList](https://help.oclc.org/Metadata_Services/CONTENTdm/Advanced_website_customization/API_Reference/CONTENTdm_API/CONTENTdm_Server_API_Functions_-_dmwebservices?sl=en#dmGetCollectionList) (below is taken from OCLC documentation online)

**Signature**
```
http://yourCONTENTdmURL/digital/bl/dmwebservices/index.php?q=dmGetCollectionList/format
```
- Replace yourCONTENTdmURL with your institution's CONTENTdm Website URL.
- format is either xml or json

**Example**
```
http://yourCONTENTdmURL/digital/bl/dmwebservices/index.php?q=dmGetCollectionList/xml
```

In [1]:
# I'll need these libraries
import requests
import json

In [2]:
# make the API call
response = requests.get("http://digitalcollections.lib.washington.edu/digital/bl/dmwebservices/index.php?q=dmGetCollectionList/json")

I'm going to write this to a file so I can pick back up at this point later if I want.

In [3]:
# write response to a file
with open("uwlibs/allcolls.json", "w") as jsonFile01:
    jsonFile01.write(response.text)

I'll need the info I retrieved in a variable for **Step 2** below. I also want to take a look at it.

In [4]:
with open('uwlibs/allcolls.json', 'r') as file:
    data = file.read()
    collectionsjson = json.loads(data)

### ☝️ SIDEBAR: json.loads() and json.dumps()
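These functions do important work here: `json.loads()` parses the JSON text read from a file (just above) into Python objects, and `json.dumps()` serializes Python objects into a JSON string that can be written to a file (below).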

**REFERENCE**
- json: [JSON encoder and decoder](https://docs.python.org/3.8/library/json.html#module-json)
- json module > [json.dumps](https://docs.python.org/3.8/library/json.html#json.dumps)
- json module > [json.loads](https://docs.python.org/3.8/library/json.html#json.loads)
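
A minimal sketch of the round trip these two functions make. The sample data is made up for illustration and is not real API output.

```python
import json

# hypothetical sample data, shaped like a list of collection records
sample = [{"name": "Example Collection", "secondary_alias": "example"}]

as_text = json.dumps(sample)      # Python objects -> JSON string
print(type(as_text))              # a str, ready to be written to a file

round_trip = json.loads(as_text)  # JSON string -> Python objects
print(type(round_trip))           # a list again
print(round_trip == sample)       # the data survives the round trip
```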

In [5]:
# confirm that this is a list of dictionaries using the built-in function type()
print(type(collectionsjson)) # is it a list?
print(len(collectionsjson)) # how many items (collections)?
print(type(collectionsjson[0])) # are the items in fact dictionaries? (test the first one)

for coll in collectionsjson:
    print(f"Name: {coll['name']} / Alias: {coll['secondary_alias']}") # look at name and alias for each

<class 'list'>
163
<class 'dict'>
Name: Alaska, Western Canada and United States / Alias: alaskawcanada
Name: Alaska Yukon Pacific Exposition Photographs / Alias: ayp
Name: American Indians of the Pacific Northwest -- Image Portion / Alias: loc
Name: American Indians of the Pacific Northwest -- Textual Portion / Alias: lctext
Name: Ancient Near East Photograph Collection / Alias: neareast
Name: Architecture of the Pacific Northwest / Alias: ac
Name: Asian Architecture Collection - Photographs by Patricia Young / Alias: p16786coll17
Name: Barnes (Albert H.) Photographs of Western Washington, 1895-1920 / Alias: barnes
Name: Black Heritage Society (KCS) / Alias: imlsblackhs
Name: Boyd and Braas Photographs of Seattle and Washington State / Alias: boydBraas
Name: Broadcast Media Collection / Alias: bcmedia
Name: Brumfield (William Craft) Russian Architecture Digital Collection / Alias: p16786coll1
Name: Central Eurasia Image Database / Alias: eurasia
Name: Centralia Tragedy and the Industr

## Step 2: Get *all* field info per collection
Use [dmGetCollectionFieldInfo](https://help.oclc.org/Metadata_Services/CONTENTdm/Advanced_website_customization/API_Reference/CONTENTdm_API/CONTENTdm_Server_API_Functions_-_dmwebservices?sl=en#dmGetCollectionFieldInfo) (below is taken from OCLC documentation online)

**Signature**
```
http://yourCONTENTdmURL.com/digital/bl/dmwebservices/index.php?q=dmGetCollectionFieldInfo/alias/format
```
- Replace yourCONTENTdmURL with your institution's CONTENTdm Website URL
- alias is a collection alias
- format is either xml or json
- In JSON, Unicode characters in the field's name are converted to `\uXXXX` escape sequences (hexadecimal code points). E.g., 題名 is converted to \u984c\u540d

**Example**
```
http://yourCONTENTdmURL.com/digital/bl/dmwebservices/index.php?q=dmGetCollectionFieldInfo/ctimes/xml
```

\*Note that the `.com` should *not* be included following your institution's CONTENTdm website URL
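
The Unicode escaping mentioned in the bullets above is reversible, so it should not need special handling on the Python side. Here is a small sketch using only the standard `json` module. It does not call the CONTENTdm API.

```python
import json

field_name = "題名"

escaped = json.dumps(field_name)  # non-ASCII characters are escaped by default
print(escaped)                    # "\u984c\u540d"

print(json.loads(escaped) == field_name)  # True
```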

**For each collection**
- Does 'Repository' field exist? If yes, what is nickname?
- Does 'Repository Collection' field exist? If yes, what is nickname?
- Does 'Repository Collection Guide' field exist? If yes, what is nickname?
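
The questions above could be answered with a small lookup, sketched here. It assumes each field dictionary stores its nickname under a `'nick'` key, which I have not verified against the responses. `find_nickname` and the sample data are made up for illustration.

```python
def find_nickname(field_data, field_name):
    """Return the nickname of the field called field_name, or None if absent."""
    for field in field_data:
        if field.get("name") == field_name:
            return field.get("nick")  # assumption: nickname is stored under 'nick'
    return None

# hypothetical field info for one collection, not real API output
sample_fields = [
    {"name": "Title", "nick": "title"},
    {"name": "Repository", "nick": "reposi"},
]

for wanted in ["Repository", "Repository Collection", "Repository Collection Guide"]:
    print(f"{wanted}: {find_nickname(sample_fields, wanted)}")
```

A result of `None` means the field does not exist in that collection.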

In [12]:
# I'll need the requests and json libs again

allcolls_allfields = []

# the base URL and format are the same for every collection's API call
uwcdmurl = "digitalcollections.lib.washington.edu"
fmt = "json"

for coll in collectionsjson: # allcolls.json data stored as var
    dct = {}
    url = f"http://{uwcdmurl}/digital/bl/dmwebservices/index.php?q=dmGetCollectionFieldInfo/{coll['secondary_alias']}/{fmt}"
    response = requests.get(url)
    fielddata = json.loads(response.text)
    dct = {'alias': coll['secondary_alias'], 'field_data': fielddata}
    allcolls_allfields.append(dct)

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

In [13]:
print(len(collectionsjson)) # N collections
print(len(allcolls_allfields)) # but I was only able to get the fields for some

163
23


### SIDEBAR - 💥 `JSONDecodeError`
***I don't understand it...***  
...but it is stopping me from iterating through all collections to retrieve field information...  
...why is the error occurring where it is occurring, I wonder...

See [note_JSONDecodeError.ipynb](note_JSONDecodeError.ipynb)
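
One thing that is certain: `json.loads()` raises exactly this message when the text it receives is empty or does not begin with valid JSON (an HTML error page, for example). Whether that is what the server returns for these collections is an assumption I have not confirmed. Below is a hedged sketch of a helper that reports the problem instead of stopping the loop. `parse_or_none` is a made-up name, and the strings stand in for `response.text`.

```python
import json

def parse_or_none(text):
    """Return the parsed JSON, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        print(f"could not parse response: {err}")
        return None

# stand-ins for response.text; these are hypothetical, not real API output
print(parse_or_none('[{"name": "Title"}]'))   # parses fine
print(parse_or_none(""))                       # empty body -> None
print(parse_or_none("<html>Error</html>"))     # non-JSON body -> None
```

In the loop over collections, a `None` result could be used to note the alias and move on with `continue`.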

In [15]:
# I'll use a workaround for the time being, and simply not get info for some collections :(
numberaliases = ['3', '1', '11', '4', '7', '2']
allcolls_allfields = []

uwcdmurl = "digitalcollections.lib.washington.edu"
fmt = "json"

for coll in collectionsjson: # allcolls.json data stored as var
    if coll['secondary_alias'] in numberaliases:
        continue # skip the aliases listed in numberaliases
    url = f"http://{uwcdmurl}/digital/bl/dmwebservices/index.php?q=dmGetCollectionFieldInfo/{coll['secondary_alias']}/{fmt}"
    response = requests.get(url)
    fielddata = json.loads(response.text)
    dct = {'alias': coll['secondary_alias'], 'field_data': fielddata}
    allcolls_allfields.append(dct)

In [20]:
# how many collections did I get?
print(len(allcolls_allfields))
# how many fields per collection?
for coll in allcolls_allfields:
    print(f"{coll['alias']} has {len(coll['field_data'])} fields.")

157
alaskawcanada has 39 fields.
ayp has 37 fields.
loc has 34 fields.
lctext has 30 fields.
neareast has 33 fields.
ac has 45 fields.
p16786coll17 has 40 fields.
barnes has 37 fields.
imlsblackhs has 33 fields.
boydBraas has 38 fields.
bcmedia has 40 fields.
p16786coll1 has 62 fields.
eurasia has 35 fields.
iww has 32 fields.
chandless has 37 fields.
chernobyl has 29 fields.
buildings has 38 fields.
civilwar has 39 fields.
civilworks has 35 fields.
cchs has 38 fields.
CMPindiv has 36 fields.
cobb has 34 fields.
curtis has 35 fields.
dearmassar has 37 fields.
dp has 39 fields.
desmo has 35 fields.
donaldson has 43 fields.
advert has 37 fields.
imlseastside has 33 fields.
ethnomusic has 40 fields.
pnwlabor has 39 fields.
costumehist has 37 fields.
fera has 36 fields.
ftm has 35 fields.
fishimages has 29 fields.
epic has 47 fields.
gar has 43 fields.
grandcoulee has 35 fields.
harriman has 37 fields.
hegg has 37 fields.
hester has 37 fields.
historicalbookarts has 43 fields.
childrens ha

*As above, I'll write the collection-field info to a file so I can pick back up here later...*

In [17]:
# let's write allcolls_allfields to a file
acafstring = json.dumps(allcolls_allfields)
with open("uwlibs/allcolls_allfields.json", "w") as jsonFile:
    jsonFile.write(acafstring)

## 🚧 Step 4: Get field values for fields of interest 🚧
*in progress*

Use [dmGetCollectionWords](https://help.oclc.org/Metadata_Services/CONTENTdm/Advanced_website_customization/API_Reference/CONTENTdm_API/CONTENTdm_Server_API_Functions_-_dmwebservices?sl=en#dmGetCollectionWords)  

**Signature**
```
http://yourCONTENTdmURL.com/digital/bl/dmwebservices/index.php?q=dmGetCollectionWords/alias/fields/format
```

- Replace `yourCONTENTdmURL.com` with your institution's CONTENTdm Website URL.
- `alias` is a collection alias
- `fields` is a !-delimited list of field nicknames listing the fields for which the words should be returned. Can also be "all" for all fields.
- `format` is either xml or json

...still haven't gotten here yet--actions could include:

- Get values from each collection for each field of interest
- Add values to combined list
- Process combined list to retain only a list of distinct values
- Do other stuff, compare values across collections, ...
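
Two of the pieces listed above can be sketched without touching the API: building the URL from the signature, and reducing several lists of values to distinct values. The function names, alias, nicknames and values are all hypothetical. The sketch deliberately stops short of parsing the `dmGetCollectionWords` response, since its shape is not shown here.

```python
def collection_words_url(base, alias, nicknames, fmt="json"):
    """Build a dmGetCollectionWords URL following the signature above."""
    fields = "!".join(nicknames)  # fields are !-delimited
    return (f"http://{base}/digital/bl/dmwebservices/index.php"
            f"?q=dmGetCollectionWords/{alias}/{fields}/{fmt}")

def distinct_values(lists_of_values):
    """Combine several lists of values and keep each value once, sorted."""
    combined = set()
    for values in lists_of_values:
        combined.update(values)
    return sorted(combined)

# hypothetical alias, nicknames and values, for illustration only
print(collection_words_url("yourCONTENTdmURL", "example", ["reposi", "title"]))
print(distinct_values([["a", "b"], ["b", "c"]]))  # ['a', 'b', 'c']
```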

In [1]:
# picking back up, loading json from a file
import json
with open("uwlibs/allcolls_allfields.json", "r") as file:
    data = file.read()
    allfieldsjson = json.loads(data)

In [None]:
# testing 02
for coll in allfieldsjson:
    print(f"\n{coll['alias']}\n==============\n")
    for field in coll['field_data']:
        print(field['name'])

## brainstorm
- fields_of_interest = []
- allcolls_somefields = allcolls_allfields
- for each coll in allcolls_somefields:
    - go through the 'field_data' list
    - for each dct in the 'field_data' list:
        - if the 'name' of the field is in fields_of_interest, pass
        - if the 'name' of the field is not in fields_of_interest, [remove()](https://www.programiz.com/python-programming/methods/list/remove)
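
A possible sketch of the brainstorm above, with two adjustments. First, `allcolls_somefields = allcolls_allfields` only gives the same list a second name, so changes would also hit the original; the sketch makes a deep copy instead. Second, calling `remove()` on a list while looping over it can skip items, so the sketch builds a new filtered list. `keep_fields` and the sample data are made up for illustration.

```python
import copy

def keep_fields(allcolls, fields_of_interest):
    """Return a copy of allcolls with only the fields named in fields_of_interest."""
    result = copy.deepcopy(allcolls)  # leave the original data untouched
    for coll in result:
        coll["field_data"] = [
            field for field in coll["field_data"]
            if field["name"] in fields_of_interest
        ]
    return result

# hypothetical data in the same shape as allcolls_allfields
sample = [{"alias": "example",
           "field_data": [{"name": "Title"}, {"name": "Repository"}]}]

filtered = keep_fields(sample, ["Repository"])
print(filtered)
print(len(sample[0]["field_data"]))  # original still has both fields
```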