In [2]:
import pandas as pd
records_data = pd.read_csv("records-sub-stats-all.csv")

# Record Subcollection (Overview)

This is an attempt to highlight particular subcollections at an institution, to give an indication of where cataloguing may be more complete in one area than another. Note that a subcollection *may* correspond to a different cataloguing type for records (museum, library, archive), but it does not always. Even if an institution has examples of all three types, it does not follow that it has three subcollections, unless each is treated as a distinct thing. Equally, an institution might have three subcollections without these being a museum, a library and an archive; they might be three distinct museum collections that together make up the institution.

This line, as with all things in these statistics, is blurry and might be revised. A better title might be "Named subcollections", to indicate that a subcollection is listed where it is referred to as a particular subset, not just because of how it is catalogued.

It would be entirely possible to also have sub-sub-collections and so on, as there is a huge variety of ways artefacts can be grouped into what is called a collection (a donated collection, a collection acquired as a whole, a collection from another institution merged into the current institution, etc.). This is beyond the needs of this reporting, and it is for each institution to decide how it wants to structure its collection statistics.
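
The distinction drawn above between named subcollections and cataloguing types can be checked with a small aggregation. The following is a minimal sketch, not part of the original analysis: the rows are invented, and it assumes that the `type` column holds the cataloguing type (museum, library, archive), which this notebook does not confirm. The column names `institution`, `subcollection`, `type` and `record_count` are the ones used in the charts.

```python
import pandas as pd

# Hypothetical sample rows (invented values). Assumption: `type` holds the
# cataloguing type; the real CSV may use this column differently.
sample = pd.DataFrame(
    {
        "institution": ["A", "A", "A", "B", "B", "B"],
        "subcollection": ["Main", "Main", "Main", "Art", "Science", "Social History"],
        "type": ["museum", "library", "archive", "museum", "museum", "museum"],
        "record_count": [1200, 300, 150, 800, 450, 600],
    }
)

# For each institution, count distinct subcollections and distinct
# cataloguing types, and total the records.
summary = sample.groupby("institution").agg(
    subcollections=("subcollection", "nunique"),
    cataloguing_types=("type", "nunique"),
    records=("record_count", "sum"),
)
print(summary)
```

In this invented data, institution A has three cataloguing types in a single subcollection, while institution B has three subcollections that are all museum collections, so the two counts vary independently.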

In [4]:
import altair as alt

title = alt.TitleParams('Records per institution by sub-collections', anchor='middle')
alt.Chart(records_data, title=title).mark_bar().encode(
    alt.Color('subcollection:N').legend(orient="bottom", columns=4, titleOrient="left"),
    column='precision:O',
    x='record_count:Q',
    y='institution:N'
).properties(width=250).resolve_scale(x='independent').configure(numberFormat='.2s')

In [5]:

alt.Chart(records_data, width=60, height=alt.Step(8)).mark_bar().encode(
    alt.Y("type:N"),
    alt.X("record_count:Q"),
    alt.Color("type:N").title("settings").legend(orient="bottom", titleOrient="left"),
    alt.Row("institution:N").title("Institution").header(labelAngle=0, labelAlign='left'),
    alt.Column("subcollection:N").title("Sub Collection"),
).configure(numberFormat='.2s')

In [6]:
title = alt.TitleParams('Collection Size (Records) by topic', anchor='middle')
alt.Chart(records_data, title=title).mark_bar().encode(
    x='record_count:Q',
    y='topic:N',
    color='institution:N',
    column='sector'
).properties(width=220).resolve_scale(x='independent')

In [7]:
title = alt.TitleParams('Collection Size (Records) by precision', anchor='middle')
alt.Chart(records_data, title=title).mark_bar().encode(
    column='precision:O',
    x='record_count:Q',
    y='topic:N',
    color=alt.Color('institution:N', sort='descending')
).properties(width=220).resolve_scale(x='independent')