# The basics

In [26]:
import requests
import pandas as pd
import altair as alt

## A basic query that gets everything!

The API endpoint is `https://api.prov.vic.gov.au/search/query`, though `search/select` also seems to work. I'm not sure if there's any difference between the two. I'll use `search/query` for now.

There's only one compulsory parameter, `q`. To get everything (ie run a blank query), set `q` to `*:*`. So the simplest possible query is:

In [192]:
# /search/select also works, I'm not sure of the difference
api_url = "https://api.prov.vic.gov.au/search/query"
params = {
    "q": "*:*",
}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()

print(f"There are {data['response']['numFound']:,} results.")

https://api.prov.vic.gov.au/search/query?q=%2A%3A%2A
There are 10,137,514 results.
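
If you want to see the URL a set of parameters will produce without actually sending a request, you can build it yourself. This is a minimal sketch using the standard library's `urllib.parse.urlencode`, which (as far as I can tell) gives the same encoding that `requests` shows in the outputs here. `build_query_url` is just a hypothetical helper name.

```python
from urllib.parse import urlencode

# The search endpoint used throughout this notebook
API_URL = "https://api.prov.vic.gov.au/search/query"


def build_query_url(params):
    """Hypothetical helper: return the full query URL for a dict of parameters, without sending a request."""
    # urlencode percent-encodes reserved characters such as * and :, and turns spaces into +
    return f"{API_URL}?{urlencode(params)}"


print(build_query_url({"q": "*:*", "rows": 100}))
```

This is handy for checking how quotes, wildcards and `~` end up in the URL before you run a query.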


The search results can be found in `response` -> `docs`. Let's try listing the titles of the records that were returned.

In [193]:
for result in data["response"]["docs"]:
    print(result["title"])

Admission and Discharge Register of Patients (Tuberculosis Sanatorium)
Admission and Discharge Register of Patients (Hobson Park Hospital & Traralgon Psychiatric Hospital)
Admission and Discharge Register of Patients (Traralgon Mental Hospital)
Departmental  Cabinet-in-Confidence Working Records, Bracks-Brumby Government [Department of Primary Industries]
Index to Admission and Discharge Register of Patients (Drug and Alcohol Rehabilitation)
Case History Record Book
Departmental Cabinet-in-Confidence Working Records, Bracks-Brumby Government [Department of Innovation, Industry and Regional Development]
Admission and Discharge Register of Patients (Heatherton Hospital)
Departmental Cabinet-in-Confidence Working Records, Bracks-Brumby Government [Department of Human Services]
Infant Life Protection Register- Applications to Board Out Infants - State Ward and Non-Ward Infants


By default the number of results returned by a query is 10. You can change this using the `rows` parameter.

In [194]:
# Default number of rows
len(data["response"]["docs"])

10

In [195]:
params = {
    "q": "*:*",
    "rows": 100
}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()

len(data["response"]["docs"])

https://api.prov.vic.gov.au/search/query?q=%2A%3A%2A&rows=100


100
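
To get more than one page of results, Solr normally uses a `start` parameter alongside `rows` to set the offset of the first result. The API responses include a `start` value, so I'm assuming the API accepts it as a parameter, though I haven't checked how deep you can page. Here's a minimal sketch of a paging loop. `iter_results` is a hypothetical helper, and it takes a `fetch_page` function so the paging logic can be tried out without hitting the API.

```python
def iter_results(fetch_page, q, rows=100, max_results=None):
    """Hypothetical helper: yield result docs page by page.

    Assumption: the API accepts Solr's standard `start` parameter for paging.
    fetch_page(params) should return the decoded JSON response, eg:
    lambda p: requests.get(api_url, params=p).json()
    """
    start = 0
    yielded = 0
    while True:
        data = fetch_page({"q": q, "rows": rows, "start": start})
        docs = data["response"]["docs"]
        total = data["response"]["numFound"]
        for doc in docs:
            # Stop early if we've reached the requested maximum
            if max_results is not None and yielded >= max_results:
                return
            yield doc
            yielded += 1
        start += rows
        # Stop when we've run past the total, or got an empty page
        if not docs or start >= total:
            return
```

For example, `iter_results(lambda p: requests.get(api_url, params=p).json(), "category:Function")` should step through all the function records, 100 at a time. With millions of results you'd want to set `max_results`!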

## Text searches

To search for words or phrases across multiple fields, just add them to the `q` parameter. If you include multiple keywords, they'll be treated as if they were connected by an `OR` operator. So a `q` value of `murray river` is the same as `murray OR river`.

In [196]:
params = {
    "q": 'murray river',
}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()

print(f"There are {data['response']['numFound']:,} results.")

https://api.prov.vic.gov.au/search/query?q=murray+river
There are 64,716 results.


In [197]:
params = {
    "q": 'murray OR river',
}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()

print(f"There are {data['response']['numFound']:,} results.")

https://api.prov.vic.gov.au/search/query?q=murray+OR+river
There are 64,716 results.


If you want only records containing both keywords, use the `AND` operator.

In [198]:
params = {
    "q": 'murray AND river',
}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()

print(f"There are {data['response']['numFound']:,} results.")

https://api.prov.vic.gov.au/search/query?q=murray+AND+river
There are 2,584 results.


To treat the keywords as a phrase, enclose them in quotes, eg `"murray river"`.

In [199]:
params = {
    "q": '"murray river"',
}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()

print(f"There are {data['response']['numFound']:,} results.")

https://api.prov.vic.gov.au/search/query?q=%22murray+river%22
There are 1,746 results.
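
If you're building queries from a list of keywords, it can help to wrap the three options above (any keyword, all keywords, or an exact phrase) in a function. A minimal sketch; `keyword_query` and its `mode` values are made up for this example.

```python
def keyword_query(keywords, mode="any"):
    """Hypothetical helper: build a q value from a list of keywords.

    mode="any"    -> keywords joined with OR (the API's default behaviour)
    mode="all"    -> keywords joined with AND
    mode="phrase" -> keywords wrapped in quotes as a single phrase
    """
    if mode == "any":
        return " OR ".join(keywords)
    if mode == "all":
        return " AND ".join(keywords)
    if mode == "phrase":
        return '"' + " ".join(keywords) + '"'
    raise ValueError(f"Unknown mode: {mode}")


print(keyword_query(["murray", "river"], mode="all"))
```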


Text fields are stemmed, so you generally don't need to worry about plurals and other word forms. For example a search for `box` will be the same as a search for `boxes`, `mine` will match `mining`, and `engine` will match `engineer`. There doesn't seem to be any way to search for an *exact* string, so there'll always be a bit of fuzziness.

You can also use wildcards, fuzzy matches, and proximity searches. See the [Solr documentation for more information](https://solr.apache.org/guide/solr/latest/query-guide/standard-query-parser.html#specifying-terms-for-the-standard-query-parser). For example, a search for `"gold mining"` will return records that include the phrase "gold mining" (or "gold mine" because of word stemming). A search for `"gold mining"~10` will find records where `gold` and `mining` (or `mine`) occur within 10 words of each other.

In [200]:
params = {
    "q": '"gold mining"',
}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()

print(f"There are {data['response']['numFound']:,} results.")

https://api.prov.vic.gov.au/search/query?q=%22gold+mining%22
There are 3,908 results.


In [201]:
params = {
    "q": '"gold mining"~10',
}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()

print(f"There are {data['response']['numFound']:,} results.")

https://api.prov.vic.gov.au/search/query?q=%22gold+mining%22~10
There are 4,554 results.
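
Because characters like `"`, `~`, `*`, `?`, `:` and `/` have special meanings in the query syntax, searching for a string that happens to contain them (such as an item number like `211/374`) can give unexpected results or errors. The Solr documentation linked above lists the special characters and says they can be escaped with a backslash. Here's a minimal sketch, assuming the API uses Solr's standard query parser; `solr_escape` is a hypothetical helper.

```python
# Characters with special meaning in Solr's standard query parser (per the Solr docs).
# Assumption: the PROV API uses that parser. && and || are covered by escaping & and |.
SOLR_SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&/')


def solr_escape(text):
    """Hypothetical helper: backslash-escape Solr special characters so they're treated literally."""
    return "".join("\\" + ch if ch in SOLR_SPECIAL_CHARS else ch for ch in text)


print(solr_escape("211/374"))
```

Note that this doesn't turn the text into a phrase; spaces still separate keywords, so wrap the escaped text in quotes if you want a phrase search.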


## Categories

API results mix up different types of things – items, images, series, consignments, agencies, functions, and related entities. You can find out more by using facets to get a count of the values in the `category` field. To retrieve the facet counts, set the `facet` parameter to `true`, and `facet.field` to `category`. Setting `rows` to `0` means only the counts are returned, without any records.

In [202]:
params = {
    "q": "*:*",
    "facet": "true",
    "facet.field": "category",
    "rows": 0
}
response = requests.get(api_url, params=params)
print(response.url)

data = response.json()
values = data["facet_counts"]["facet_fields"]["category"]
facets = [{"facet": values[i], "count": values[i+1]} for i in range(0, len(values), 2)]
pd.DataFrame(facets).style.format(thousands=",").bar()

https://api.prov.vic.gov.au/search/query?q=%2A%3A%2A&facet=true&facet.field=category&rows=0


| | facet | count |
|---|---|---|
| 0 | Item | 6,329,699 |
| 1 | Image | 3,613,751 |
| 2 | relatedEntity | 150,066 |
| 3 | Consignment | 23,610 |
| 4 | Series | 16,930 |
| 5 | Agency | 3,137 |
| 6 | Function | 321 |
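
Solr returns the facet values as a single flat list that alternates between values and counts, which is why the code above steps through it two at a time. Another way of unpacking it is to zip together the even and odd positions. A small sketch using a shortened copy of the category counts above; `facet_pairs_to_dict` is a hypothetical helper.

```python
def facet_pairs_to_dict(values):
    """Hypothetical helper: convert Solr's flat [value, count, value, count, ...] list into a dict."""
    # values[::2] are the facet values, values[1::2] the matching counts
    return dict(zip(values[::2], values[1::2]))


category_values = ["Item", 6329699, "Image", 3613751, "Series", 16930]
counts = facet_pairs_to_dict(category_values)
print(counts["Series"])
```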


From https://prov.vic.gov.au/recordkeeping-government/a-z-topics/archival-control-model

> Record Series: a group of records which are recorded or maintained by the same agency (or agencies) and which:
> - are in the same numerical, alphabetical, chronological or other identifiable sequence;
> - or result from the same accumulation or filing process.

> Record Item: a discrete element [of] records managed within a ‘Series’. An Item represents a part of a recordkeeping system or a logical or convenient grouping of records. It may represent one record or multiple records such as a group of folios fastened together to form a file, a group of electronic files aggregated in a folder, or a single volume.

> The function entity in PROV’s ACM represents the major responsibilities of Victorian Government that may be managed by one or more agencies over time. Applying this entity helps to:
> - group together various records with the same administrative record context
> - links records to their provenance and complementary information.

> The agent entity in PROV’s ACM represents a Victorian Government agency—an administrative unit which has or had responsibility for the provision of at least one aspect of government administration. This entity helps to provide a description of a record’s context, namely who created the records and for what purpose.

From https://prov.vic.gov.au/recordkeeping-government/transferring-records/archival-description-records-transfer-projects
> The entire contents of a series may not necessarily be transferred at the same time. A series may also be transferred to PROV in portions, known as consignments, over a number of years. A consignment comprises of record items belonging to the one series which are accessioned into the custody of PROV as part of the one transfer. A consignment may consist of the entirety of a series or only part of a series.

Relationships between these entities are described in PROV's Archival Control Model (ACM) policy: https://prov.vic.gov.au/sites/default/files/files/Govt%20Services%20General/PROV_Archival_Control_Model_Policy.pdf

To limit your results to a specific category, set the `category` field to one of these values in your query. For example, if you only wanted agencies, you'd add `category:Agency` to the `q` query string. To include multiple categories you can use the `OR` operator:

- just items: `category:Item`
- items and images: `category:Item OR category:Image`

In [203]:
params = {
    "q": "category:Item"
}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()

print(f"\nThere are {data['response']['numFound']:,} results.\n")

for result in data["response"]["docs"]:
    print(result["title"])

https://api.prov.vic.gov.au/search/query?q=category%3AItem

There are 6,329,699 results.

211/374 Leslie A Lamb: Will; Grant of probate
215/936 Ellen Cahill: Will; Grant of probate
215/981 Florence M Lovegrove: Will; Grant of probate
211/107 Amelia Hawking: Will; Grant of probate
215/980 William F Finchett: Will; Grant of probate
215/979 George Wilson: Will; Grant of probate
211/102 Bernard F Cragen: Will; Grant of probate
211/221 Jonathan Coulson: Will; Grant of probate
215/978 William E S Ockenden: Will; Grant of probate
215/959 Otto Holst: Will; Grant of probate


In [204]:
params = {
    "q": "category:Item OR category:Image"
}
response = requests.get(api_url, params=params)
print(response.url)
data = response.json()

print(f"\nThere are {data['response']['numFound']:,} results.\n")

https://api.prov.vic.gov.au/search/query?q=category%3AItem+OR+category%3AImage

There are 9,943,450 results.
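
The category filter can also be combined with a keyword search. Wrapping each part in brackets keeps the `OR` between the categories from getting mixed up with the rest of the query. A minimal sketch; `in_categories` is a hypothetical helper name.

```python
def in_categories(q, categories):
    """Hypothetical helper: restrict a query string to one or more values of the category field."""
    category_clause = " OR ".join(f"category:{c}" for c in categories)
    # Brackets keep the keyword query and the category OR-list as separate groups
    return f"({q}) AND ({category_clause})"


print(in_categories('"murray river"', ["Item", "Image"]))
```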



## Using other fields

### Items and images in a particular series
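
One way to get the items and images in a particular series is to combine the `series_id` field (see the list of indexes below) with a category filter. This is a sketch, assuming `series_id` takes the plain number of the series as in the `(series_id:(283))` example below; `series_query` is a hypothetical helper.

```python
def series_query(series_id, categories=("Item", "Image")):
    """Hypothetical helper: build a q value for records of the given categories in one series.

    Assumption: series_id holds the plain series number, as in (series_id:(283)).
    """
    category_clause = " OR ".join(f"category:{c}" for c in categories)
    return f"series_id:{series_id} AND ({category_clause})"


params = {"q": series_query(283), "rows": 100}
print(params["q"])
```

You'd then pass `params` to `requests.get(api_url, params=params)` as in the earlier examples.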



## Indexes

These are some of the fields you can search on. For example, to find records in a series by its id: `(series_id:(283))`

- `series_id`
- `record_form` – possible values:
  - Card
  - Data
  - Document
  - File
  - Map, Plan, or Drawing
  - Moving Image
  - Object
  - Photograph or Image
  - Sound Recording
  - Volume
  - Website
- `start_dt` – date range, eg: `[2025-03-03 TO *]`
- `end_dt` – date range, eg: `[* TO 2025-03-03]`
- `iiif-manifest` – set to `(*)` to find records that have a IIIF manifest
- `category` – possible values:
  - Agency
  - Consignment
  - Function
  - Image
  - Item
  - Series
  - relatedEntity
- `location` – possible values:
  - Ballarat
  - Beechworth
  - Bendigo
  - Geelong
  - North Melbourne
  - Online
- `format` – possible values:
  - Digital
  - Physical
- `rights_status` – possible values:
  - Closed
  - Closed Record and Open Metadata
  - Not set
  - Open
- `medium` – possible values:
  - 16mm
  - 8 mm
  - Album
  - Blueprint
  - Bolted
  - Cardboard
  - Cassette Tape
  - Compact Disc (CD)
  - Drafting Cloth (Linen)
  - Floppy Disk
  - Glass Plate Negative
  - Kalamazoo
  - Lantern Slide
  - Loose
  - Magnetic Media
  - Microfiche
  - Microform
  - Motion Picture Film
  - Mounted
  - Negative, Slide or Transparency
  - Not Set
  - Object
  - Paper
  - Photographic Print
  - Plastic Film
  - Polyester Negative
  - Sleeved
  - Synthetic
  - Tracing/Offset
  - VHS
  - Vellum or Parchment
  - Vinyl Disc
  - Volume
  - Wood
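
The date fields take Solr-style ranges, with `*` for an open end. Here's a sketch that builds these ranges from optional start and end values, assuming the `YYYY-MM-DD` format used in the examples above. The agency record further down shows `start_dt` stored as a plain year, so it's worth checking how dates are stored for the records you're interested in. `date_range_clause` and `date_filter` are hypothetical helpers.

```python
def date_range_clause(field, start=None, end=None):
    """Hypothetical helper: build a range clause like start_dt:[1850-01-01 TO *]; None means open-ended."""
    return f"{field}:[{start or '*'} TO {end or '*'}]"


def date_filter(start=None, end=None):
    """Hypothetical helper: records starting on or after `start` and ending on or before `end`.

    Assumption: dates are given in the YYYY-MM-DD form shown in the field list.
    """
    clauses = []
    if start:
        clauses.append(date_range_clause("start_dt", start=start))
    if end:
        clauses.append(date_range_clause("end_dt", end=end))
    # With no dates at all, fall back to matching everything
    return " AND ".join(clauses) if clauses else "*:*"


print(date_filter("1850-01-01", "1900-12-31"))
```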

## Facets

- record_form
- entity
- category
- format
- location
- rights_status
- medium
- parents.titles
- parents.id
- is_part_of_series.title
- is_part_of_series.id
- agencies.titles
- agencies.id
- possibly others

Note that text fields (like `description`) are often tokenised and stemmed, so they're not very useful as facets.

To get facet counts, you need to set `facet=true` and specify the field using `facet.field`. For example:

https://api.prov.vic.gov.au/search/query?wt=json&q=*:*&facet=true&facet.field=location&rows=0

https://api.prov.vic.gov.au/search/query?wt=json&q=*:*&facet=true&facet.field=rights_status&rows=0
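
Solr normally lets you repeat `facet.field` to get counts for several fields in a single request. I haven't confirmed that the PROV API passes repeated parameters through, so treat this as an assumption. With `requests` you can pass a list as a parameter value and it's sent as repeated parameters; the sketch below shows the same encoding with the standard library's `urlencode(..., doseq=True)`. `multi_facet_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

API_URL = "https://api.prov.vic.gov.au/search/query"


def multi_facet_url(fields):
    """Hypothetical helper: build a facet query URL that repeats facet.field once per field.

    Assumption: the API accepts repeated facet.field parameters, as Solr normally does.
    """
    params = {"q": "*:*", "facet": "true", "facet.field": fields, "rows": 0}
    # doseq=True expands the list into facet.field=a&facet.field=b
    return f"{API_URL}?{urlencode(params, doseq=True)}"


print(multi_facet_url(["location", "format"]))
```

If it works, each field should appear as its own key under `facet_counts` -> `facet_fields` in the response.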

In [24]:
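def get_facets(field):
    """Return a list of {"facet": value, "count": n} dicts for a facet field."""
    params = {"wt": "json", "q": "*:*", "facet": "true", "facet.field": field, "rows": 0}
    response = requests.get(api_url, params=params)
    data = response.json()
    # Solr returns facet counts as a flat list: [value1, count1, value2, count2, ...]
    values = data["facet_counts"]["facet_fields"][field]
    return [{"facet": values[i], "count": values[i+1]} for i in range(0, len(values), 2)]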
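
One thing to watch: Solr only returns the top 100 values for a facet by default (the `facet.limit` parameter), so fields with lots of values, like `agencies.titles`, will be cut short. Setting `facet.limit` to `-1` asks Solr for all values, assuming the API passes that parameter through. Here's a variant of `get_facets` with that change. It takes a `fetch` function so it can be tried out without the API, and `get_all_facets` is a hypothetical name.

```python
def get_all_facets(fetch, field, q="*:*"):
    """Hypothetical helper: return every value of a facet field with its count.

    Assumption: the API passes Solr's facet.limit and facet.mincount parameters through.
    fetch(params) should return the decoded JSON response, eg:
    lambda p: requests.get(api_url, params=p).json()
    """
    params = {
        "q": q,
        "facet": "true",
        "facet.field": field,
        "facet.limit": -1,    # -1 = no limit (Solr's default is 100)
        "facet.mincount": 1,  # skip values with no matching records
        "rows": 0,
    }
    data = fetch(params)
    # Facet counts come back as a flat list: [value1, count1, value2, count2, ...]
    values = data["facet_counts"]["facet_fields"][field]
    return [{"facet": values[i], "count": values[i + 1]} for i in range(0, len(values), 2)]
```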

In [97]:
print("\n".join(sorted([f["facet"] for f in get_facets("medium")])))

16mm
8 mm
Album
Blueprint
Bolted
Cardboard
Cassette Tape
Compact Disc (CD)
Drafting Cloth (Linen)
Floppy Disk
Glass Plate Negative
Kalamazoo
Lantern Slide
Loose
Magnetic Media
Microfiche
Microform
Motion Picture Film
Mounted
Negative, Slide or Transparency
Not Set
Object
Paper
Photographic Print
Plastic Film
Polyester Negative
Sleeved
Synthetic
Tracing/Offset
VHS
Vellum or Parchment
Vinyl Disc
Volume
Wood


In [80]:
def make_facet_chart(field):
    # Bar chart of the facet counts for a single field
    df = pd.DataFrame(get_facets(field))
    chart = alt.Chart(df).mark_bar().encode(
        x=alt.X("facet:N", title=None),
        y=alt.Y("count:Q"),
        color=alt.Color("facet:N", title=field, legend=None),
        tooltip=[alt.Tooltip("facet:N", title=field), alt.Tooltip("count:Q", format=",")],
    ).properties(width=200, height=200, title=field)
    return chart

In [86]:
fields = ["category", "entity", "record_form", "format", "location", "rights_status"]
charts = [make_facet_chart(field) for field in fields]
display(alt.concat(*charts, columns=3).properties(padding=20))

In [100]:
response = requests.get("https://api.prov.vic.gov.au/search/query?wt=json&q=category:Agency")

In [101]:
response.json()

{'responseHeader': {'status': 0,
  'QTime': 0,
  'params': {'q': 'category:Agency', 'wt': 'json'}},
 'response': {'numFound': 3136,
  'start': 0,
  'docs': [{'category': 'Agency',
    'entity': 'Agent',
    '_id': 'F6E388C2-F1A7-11E9-AE98-B368FE166A06',
    'timestamp': 1614747820,
    'identifier.PROV_ACM.id': 'VA 421',
    'citation': 'VA 421',
    'identifier.PID.id': 'F6E388C2-F1A7-11E9-AE98-B368FE166A06',
    'title': 'Ministry for Police and Emergency Services',
    'name.official_title.date_range': '[1979 TO 1992]',
    'name.official_title.start_dt': '1979',
    'name.official_title.end_dt': '1992',
    'name.official_title.title': 'Ministry for Police and Emergency Services',
    'date_range': ['[1979 TO 1992]'],
    'start_dt': '1979',
    'end_dt': '1992',
    'jurisdictional_coverage': ['Victoria'],
    'description': "The Ministry for Police and Emergency Services was established in July 1979 and assumed a co-ordinating and policy development role in relation to the police