In [14]:
import pandas as pd
import numpy as np
import csv
from helper_functions import df_to_qs_v1, query_factgrid, parse_date, process_date_parsing_results, DateType

# Step 4 - Connecting monasteries to their orders and assigning the "instance of" statement

This notebook implements the last step of the Klosterdatenbank-to-FactGrid workflow, which connects the newly created items for religious communities to their religious orders.

In the original database, the information about which order a religious community belongs to is stored in the table `gs_monastery_order`. This is a 1:n relationship, meaning that a religious community can belong to different orders over the course of its existence. As of 2025, the monastery database contains 108 order designations, although the list does not claim to be comprehensive. The designations were published as controlled vocabularies in earlier research projects and can be accessed through [Dante](https://dante.gbv.de/search?p=407) or [FactGrid](https://tinyurl.com/27slckal). Besides these standardized order designations, the table `gs_monastery_order` also contains non-standardized designations that describe the order affiliation of a community. Examples include collective terms such as "Other lay convents/Semi-religious communities (m)", which groups together male lay convents without a more specific designation, or "evangelical monastery/stift (w)". Evangelical (Protestant) monasteries are outside the scope of the monastery database and are therefore not described in detail. However, if it is known that a monastery became an evangelical monastery, this is still recorded through the order designation "evangelical monastery/stift".

Designations outside the controlled vocabulary must be handled separately in the workflow. In some cases, a designation can be mapped to an existing FactGrid item: the term "Kanonissen" from the database corresponds to the "Community of Canonesses" in FactGrid ([Q1480654](https://database.factgrid.de/wiki/Item:Q1480654)). For canonesses and canons, it is also necessary to express that the community is a collegiate chapter or a women's chapter, so the value of [P2](https://database.factgrid.de/wiki/Property:P2) must be adjusted as well.

## Preparation

The following cell loads all required tables into a dictionary of DataFrames and queries FactGrid to retrieve the existing Q-numbers of the religious communities. The table `gs_monastery_order` contains the mapping between religious communities and orders; `gs_orders` contains information about the orders themselves, including their Q-numbers in FactGrid. The list of newly created communities from the previous steps (`new_monasteries_ids.csv`) is used to restrict the order relations to these communities.

In [15]:
# Load required data into dataframes
dataframes = {}
dataframes["gs_monastery_order"] = pd.read_excel("data/exports_monasteryDB/gs_monastery_order.xlsx")
dataframes["monasteries_in_factgrid"] = query_factgrid("monasteries")
dataframes["monasteries_in_factgrid"]["item"] = dataframes["monasteries_in_factgrid"]["item"].str.split("/").str[-1]
dataframes["monasteries_in_factgrid"]["KlosterdatenbankID"] = dataframes["monasteries_in_factgrid"]["KlosterdatenbankID"].astype(int)
dataframes["gs_orders"] = pd.read_excel("data/exports_monasteryDB/gs_orders.xlsx")

dataframes["new_monasteries"] = pd.read_csv("data/intermediate_results/new_monasteries_ids.csv")
dataframes["gs_monastery_order"] = dataframes["gs_monastery_order"][dataframes["gs_monastery_order"]["gsn_id"].isin(dataframes["new_monasteries"]["id_gsn"])]

# Merge information on orders, monasteries and FactGrid IDs of newly created items
df = pd.merge(dataframes["gs_monastery_order"], dataframes["monasteries_in_factgrid"], how="left", left_on="gsn_id", right_on="KlosterdatenbankID").dropna(subset="item")
df = pd.merge(df, dataframes["gs_orders"], how="left", left_on="order_id", right_on="id_order")
df

Unnamed: 0,Unnamed: 0_x,id_monastery_order,gsn_id,monastery_status,order_id,order_begin_tpq,order_begin_taq,order_end_tpq,order_end_taq,order_begin_note,...,order_name,order_abbreviation,Symbol,gender,imagefile,comment_order,ID_GSReligiousOrder,lthk,RG_Abkuerzung,FactGridID
0,289,18,68,Kloster,32,1437,,1647,,,...,Augustinereremitinnen,OSA (w),u,Frauenkloster,Augustinerinnen,,32,,o. fr. herem. s. Aug.,Q1195145
1,2018,2714,20004,Kloster,82,1357,,1534,,,...,Franziskanerinnen der Dritten Regel (Terziarin...,TOF,t,Frauenkloster,Franziskaner_Terziarinnen,,82,,,Q1195157
2,2210,2919,20694,Kloster,82,1345,,1493,,vor 1345,...,Franziskanerinnen der Dritten Regel (Terziarin...,TOF,t,Frauenkloster,Franziskaner_Terziarinnen,,82,,,Q1195157
3,2365,3076,20636,Kommende,22,1254,,1805,,1254/58,...,Deutscher Orden,OT,G,Männerkloster,Deutscher_Orden,,22,,theutonicorum,Q147168
4,2692,6620,30016,Kloster,36,1250,1260.0,1527,,um 1255,...,Augustinereremiten,OSA,U,Männerkloster,Augustiner,,36,,o. fr. herem. s. Aug.,Q143826
5,2922,6875,30279,Stift,31,1163,1185.0,1568,,1163/1185,...,Prämonstratenserinnen,OPraem (w),p,Frauenkloster,Praemonstratenserinnen,,31,,o. Prem.,Q1195163
6,3649,8781,50211,Schwesternhaus,112,1245,1295.0,1455,1460.0,vor 1295,...,Sonstige Laienkonvente/Semireligiöse Gemeinsch...,,,Frauenkloster,default_monastery,,179,,,
7,3682,8834,50264,Kloster,28,1496,,1575,,1496 erstmals belegt,...,Zisterzienserinnen,OCist (w),z,Frauenkloster,Zisterzienserinnen,,28,,o. Cist.,Q640839
8,3963,9585,60419,Kloster,36,1291,,1525,,,...,Augustinereremiten,OSA,U,Männerkloster,Augustiner,,36,,o. fr. herem. s. Aug.,Q143826
9,4132,9770,60464,Kloster,82,1471,,1802,,,...,Franziskanerinnen der Dritten Regel (Terziarin...,TOF,t,Frauenkloster,Franziskaner_Terziarinnen,,82,,,Q1195157
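
The merge above keeps only order relations of communities that already have a FactGrid item: the left merge leaves `item` empty where no match exists, and `dropna(subset="item")` discards those rows. A minimal sketch of this filtering with toy data (the IDs are illustrative; 99999 stands for a community without a FactGrid item):

```python
import pandas as pd

# Toy stand-ins for gs_monastery_order and the FactGrid query result (illustrative values)
monastery_order = pd.DataFrame({
    "gsn_id": [68, 20004, 99999],
    "order_id": [32, 82, 36],
})
monasteries_in_factgrid = pd.DataFrame({
    "item": ["Q1758548", "Q1758549"],
    "KlosterdatenbankID": [68, 20004],
})

# Left merge: every order relation is kept, unmatched ones get NaN in "item"
merged = pd.merge(monastery_order, monasteries_in_factgrid, how="left",
                  left_on="gsn_id", right_on="KlosterdatenbankID")
# Drop relations whose community has no FactGrid item
merged = merged.dropna(subset="item")
print(merged[["gsn_id", "item", "order_id"]])
```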


### Special Cases in order and instance assignment
This cell contains the mappings for the special cases described above. The dictionary `orden_mapping` assigns Q-numbers to order IDs for which a direct mapping is possible. `status_mapping` maps the status of a monastery within a given order relation to the Q-number used in its P2 statement; this is explained below. `orden_p2_special_cases` collects the order IDs for which the order itself determines the P2 statement. `double_monasteries` maps the designations of double monasteries to the Q-numbers of the two orders involved, so that two P746 statements are created for such a community.

In [16]:
orden_mapping = {
    35:"Q1480654",
    34:"Q1480653",
    7:"Q164255",
    112:"Q640846",
    79: "Q640842"
}
status_mapping = {
    "Stift":"Q523002",
    "Kloster":"Q141472",
    "Bruderhaus":"Q640846",
    "Domstift":"Q164867",
    "Kommende":"Q395378",
    "Konvent":"Q399022",
    "Schwesternhaus":"Q640846",
    "evangelisches Kloster/Stift":np.nan
}
orden_p2_special_cases = {
    34:"Q160437",
    35:"Q898116"
}
double_monasteries = {
    48:["Q164266", "Q1195146"], # Benediktiner-Doppelkloster
    56: ["Q1195144", "Q172927"], # Augustiner-Doppelstift
    59: ["Q1480653", "Q1480654"], # Doppelstift
    117: ["Q1352536", "Q1352539"], # Humiliaten-Doppelkloster
    58: ["Q143840", "Q1195163"], # Prämonstratenser-Doppelstift
    57: ["Q142148", "Q640839"] #Zisterzienser-Doppelkloster
}

## Orders

In this step, the religious order is added to each monastery using the property "Religious order" [P746](https://database.factgrid.de/wiki/Property:P746). The value is the FactGrid Q-number of the order taken from `gs_orders`. Where no Q-number is available there, the cell is filled using `orden_mapping` from above. Double monasteries are first split into two rows, one for each order.
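
The fallback logic can be illustrated with a small sketch on toy data: the Q-number from `gs_orders` takes precedence, and `Series.map(orden_mapping)` supplies a value only where it is missing. `Series.map` with a dict is equivalent to the `apply`/lambda lookup used in the notebook; unknown order IDs (999 here, a made-up value) stay empty.

```python
import numpy as np
import pandas as pd

# Subset of the manual mapping defined above (order_id -> FactGrid Q-number)
orden_mapping = {35: "Q1480654", 112: "Q640846"}

# Toy rows: the first has a Q-number from gs_orders, the other two do not
df = pd.DataFrame({
    "order_id": [32, 112, 999],
    "FactGridID": ["Q1195145", np.nan, np.nan],
})

# Series.map looks up each order_id in the dict; missing keys become NaN
fallback = df["order_id"].map(orden_mapping)
p746 = df["FactGridID"].fillna(fallback)
print(p746.tolist())
```

An order ID without any mapping remains NaN and is caught by the check for empty `P746` cells further below.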

In [17]:
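# Handle double monasteries: a double monastery is linked to two orders (one per branch).
# The existing row receives the first order; a copy appended at the end of df receives the second.
# Appending via df.loc[len(df)] relies on the default RangeIndex created by pd.merge above.
for index, row in df.iterrows():
    order_id = row["order_id"]
    if order_id in double_monasteries:
        df.loc[index, "FactGridID"] = double_monasteries[order_id][0]
        new_row = df.loc[index].copy()
        new_row["FactGridID"] = double_monasteries[order_id][1]
        df.loc[len(df)] = new_row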

# Create new DataFrame for export data
export_df = pd.DataFrame()
export_df["qid"] = df["item"]

# Fill column P746 (religious order) with the FactGrid IDs from gs_orders. Fill remaining empty cells using orden_mapping from above
export_df["P746"] = df["FactGridID"]
export_df["P746"] = export_df["P746"].fillna(df["order_id"].map(orden_mapping))

export_df

Unnamed: 0,qid,P746
0,Q1758548,Q1195145
1,Q1758549,Q1195157
2,Q1758545,Q1195157
3,Q1758541,Q147168
4,Q1758547,Q143826
5,Q1758542,Q1195163
6,Q1758537,Q640846
7,Q1758539,Q640839
8,Q1758540,Q143826
9,Q1758538,Q1195157
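
The double-monastery handling in the cell above appends rows inside an `iterrows` loop. As an alternative (a different technique, not the notebook's implementation), the same split can be expressed without a loop by replacing the order Q-number with a list of both Q-numbers and calling `DataFrame.explode`. A sketch with toy data; the item IDs are placeholders:

```python
import pandas as pd

# Subset of the double_monasteries mapping defined above (order_id -> two order Q-numbers)
double_monasteries = {57: ["Q142148", "Q640839"]}

# Toy data: one ordinary community and one Cistercian double monastery (order_id 57)
df = pd.DataFrame({
    "item": ["item_1", "item_2"],
    "order_id": [28, 57],
    "FactGridID": ["Q640839", None],
})

# Double monasteries get a list with both order Q-numbers, all other rows keep their value
df["FactGridID"] = [
    double_monasteries[oid] if oid in double_monasteries else qid
    for oid, qid in zip(df["order_id"], df["FactGridID"])
]
# explode turns each list element into its own row; ignore_index renumbers the rows
df = df.explode("FactGridID", ignore_index=True)
print(df)
```

Unlike the loop, `explode` places the second row directly after the first one instead of appending it at the end of the DataFrame.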


If there are still empty cells in column `P746`, the following cell lists them together with the order name. This makes it possible to check whether a FactGrid mapping is missing (e.g. because new orders were added to the database). An empty value is expected in one case only: evangelical monasteries, for which no further action is required.
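
To separate the expected gaps from real missing mappings, the check could exclude evangelical monasteries. A minimal sketch on toy data, assuming that evangelical monasteries can be recognized by an order designation starting with "evangelisch" (an assumption; adjust the test to the actual designation in `gs_orders`). "Neuer Orden" is a made-up name standing for an order without a mapping:

```python
import numpy as np
import pandas as pd

# Toy data; order names are illustrative
na_check = pd.DataFrame({
    "qid": ["item_1", "item_2", "item_3"],
    "P746": ["Q1195157", np.nan, np.nan],
    "order_name": ["Franziskanerinnen der Dritten Regel", "evangelisches Kloster/Stift", "Neuer Orden"],
})

missing = na_check[na_check["P746"].isna()]
# Assumption: evangelical monasteries are recognized by their order designation
expected = missing["order_name"].str.startswith("evangelisch", na=False)
unexpected = missing[~expected]
print(unexpected)
```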


In [18]:
na_orders = export_df[export_df["P746"].isna()]
na_orders = pd.merge(na_orders, df[["item","order_name"]], how="left", left_on="qid", right_on="item")
na_orders

Unnamed: 0,qid,P746,item,order_name


### Dates

Just like for the building complexes, the relationship between religious communities and orders has a temporal component. The information is stored in `gs_monastery_order`, in the columns `order_begin_tpq`, `order_begin_taq`, `order_end_tpq`, `order_end_taq`, `order_begin_note` and `order_end_note`. The processing is the same as for the building complexes; a detailed explanation can be found in Notebook 3. In addition, the original free-text notes are kept as quoted strings in the qualifier columns `qal787` (begin) and `qal788` (end).
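
The quoting of the free-text notes can be sketched as a small helper. `quote_note` is a hypothetical name used only for illustration. It shows that an end note of "heute" ("today") has to be filtered out before the value is wrapped in quotes, because a quoted value never equals the bare string:

```python
import numpy as np
import pandas as pd

def quote_note(note, exclude=("heute",)):
    """Wrap a free-text date note in double quotes for a QuickStatements string value.

    Hypothetical helper: missing notes and notes listed in `exclude` become NaN,
    so no qualifier is written for them.
    """
    if pd.isna(note) or note in exclude:
        return np.nan
    return f'"{note}"'

notes = pd.Series(["um 1534", np.nan, "heute"])
quoted = notes.apply(quote_note)
print(quoted.tolist())
```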

In [19]:

# Get relevant data for date parsing
export_df["order_begin_tpq"] = df["order_begin_tpq"]
export_df["order_end_tpq"] = df["order_end_tpq"]
export_df["order_begin_note"] = df["order_begin_note"]
export_df["order_end_note"] = df["order_end_note"]

# Date Parsing
export_df["begin_date_parse_result"] = df["order_begin_note"].apply(lambda x: parse_date(str(x), DateType.BEGIN_DATE))
export_df["end_date_parse_result"] = df["order_end_note"].apply(lambda x: parse_date(str(x), DateType.END_DATE))
export_df['qal787'] = df["order_begin_note"].apply(lambda x: f'"{x}"' if not pd.isna(x) else np.nan)
# "heute" ("today") is not written as an end note; the check must happen before the value is quoted
export_df['qal788'] = df["order_end_note"].apply(lambda x: f'"{x}"' if not pd.isna(x) and x != "heute" else np.nan)
process_date_parsing_results(export_df, "order")

# Cleanup
export_df = export_df.drop(columns=["order_begin_tpq", "order_end_tpq", "begin_date_parse_result", "end_date_parse_result"])
# Source reference S471: the Klosterdatenbank ID (gsn_id) as a quoted string
export_df["S471"] = df["gsn_id"].apply(lambda x: f'"{x}"')
export_df.drop(columns=["order_begin_note", "order_end_note"], inplace=True)
export_df

Unnamed: 0,qid,P746,qal787,qal788,qal49,qal50,qal786,qal1124,qal785,S471
0,Q1758548,Q1195145,,,+1437-01-01T00:00:00Z/9/J,+1647-01-01T00:00:00Z/9,,,,"""68"""
1,Q1758549,Q1195157,,"""um 1534""",+1357-01-01T00:00:00Z/9/J,+1534-00-00T00:00:00Z/9/J,Q10,,,"""20004"""
2,Q1758545,Q1195157,"""vor 1345""",,,+1493-01-01T00:00:00Z/9/J,,+1345-00-00T00:00:00Z/9/J,,"""20694"""
3,Q1758541,Q147168,"""1254/58""","""1805/06""",+1254-00-00T00:00:00Z/9/J,+1805-00-00T00:00:00Z/9,,,,"""20636"""
4,Q1758547,Q143826,"""um 1255""",,+1255-00-00T00:00:00Z/9/J,+1527-01-01T00:00:00Z/9/J,,,Q10,"""30016"""
5,Q1758542,Q1195163,"""1163/1185""",,+1163-00-00T00:00:00Z/9/J,+1568-01-01T00:00:00Z/9/J,,,,"""30279"""
6,Q1758537,Q640846,"""vor 1295""","""1455/1460""",,+1455-00-00T00:00:00Z/9/J,,+1295-00-00T00:00:00Z/9/J,,"""50211"""
7,Q1758539,Q640839,"""1496 erstmals belegt""",,+1496-00-00T00:00:00Z/9/J,+1575-01-01T00:00:00Z/9/J,,,,"""50264"""
8,Q1758540,Q143826,,,+1291-01-01T00:00:00Z/9/J,+1525-01-01T00:00:00Z/9/J,,,,"""60419"""
9,Q1758538,Q1195157,,,+1471-01-01T00:00:00Z/9/J,+1802-01-01T00:00:00Z/9,,,,"""60464"""


## Instance Of Statement

One of the most important properties in FactGrid is [P2](https://database.factgrid.de/wiki/Property:P2) "instance of". It links an item to another item that serves as its class. Strictly speaking, FactGrid contains only items and no ontologically defined classes; the class structure is modeled by linking items with the properties "instance of" and "subclass of". For the monastery database, the information needed for this classification is partially stored in the field `monastery_status` in the table `gs_monastery_order`. The class of a religious community depends on its order affiliation, but this connection is not fully represented in the monastery database, which only distinguishes between brotherhouse (Bruderhaus), sisterhouse (Schwesternhaus), collegiate foundation (Stift), cathedral chapter (Domstift), monastery (Kloster), commandery (Kommende), convent (Konvent) and evangelical monastery/Stift. We use this distinction to integrate the records into the FactGrid classification system, while being aware that this omits historical complexity. For canons and canonesses, the class is determined by the order instead (`orden_p2_special_cases`). Where no classification is possible, the records are classified as "Religious Community" (Q704192), the most general class.
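
The order of precedence for P2 can be sketched on toy data: an order-specific class (looked up by order ID) comes first, then the class derived from the monastery status, and finally the default class Q704192. Each (item, P2) pair is emitted only once, so the second row of a double monastery does not repeat the statement. The mappings are subsets of those defined above; the status "unmapped" is a placeholder for any status not in `status_mapping`:

```python
import numpy as np
import pandas as pd

# Subsets of the mappings defined earlier in this notebook
orden_p2_special_cases = {35: "Q898116"}
status_mapping = {"Kloster": "Q141472", "Stift": "Q523002", "evangelisches Kloster/Stift": np.nan}
DEFAULT_P2 = "Q704192"  # "Religious Community", the most general class

df = pd.DataFrame({
    "qid": ["item_1", "item_2", "item_3", "item_3"],
    "order_id": [35, 28, 82, 82],
    "monastery_status": ["Stift", "Kloster", "unmapped", "unmapped"],
})

# 1. Order-specific class (e.g. canonesses) takes precedence
special = df["order_id"].map(orden_p2_special_cases)
# 2. Otherwise the monastery status is used; unknown statuses fall back to the default class
by_status = df["monastery_status"].map(lambda s: status_mapping.get(s, DEFAULT_P2))
df["P2"] = special.fillna(by_status)
# 3. Emit each (item, P2) pair only once
df.loc[df.duplicated(subset=["qid", "P2"]), "P2"] = np.nan
print(df)
```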

In [20]:
# Assign P2 statement: order-specific special cases (keyed by order_id in orden_p2_special_cases) take precedence.
# Otherwise the monastery status is used; unknown statuses fall back to Q704192 ("Religious Community").
special_p2 = df["order_id"].map(orden_p2_special_cases)
status_p2 = df["monastery_status"].map(lambda x: status_mapping.get(x, "Q704192"))
export_df["P2"] = special_p2.fillna(status_p2)

# Prevent duplicate P2 assignments
p2_duplicates = export_df.duplicated(subset=["qid", "P2"])
export_df.loc[p2_duplicates, "P2"] = np.nan

export_df



Unnamed: 0,qid,P746,qal787,qal788,qal49,qal50,qal786,qal1124,qal785,S471,P2
0,Q1758548,Q1195145,,,+1437-01-01T00:00:00Z/9/J,+1647-01-01T00:00:00Z/9,,,,"""68""",Q141472
1,Q1758549,Q1195157,,"""um 1534""",+1357-01-01T00:00:00Z/9/J,+1534-00-00T00:00:00Z/9/J,Q10,,,"""20004""",Q141472
2,Q1758545,Q1195157,"""vor 1345""",,,+1493-01-01T00:00:00Z/9/J,,+1345-00-00T00:00:00Z/9/J,,"""20694""",Q141472
3,Q1758541,Q147168,"""1254/58""","""1805/06""",+1254-00-00T00:00:00Z/9/J,+1805-00-00T00:00:00Z/9,,,,"""20636""",Q395378
4,Q1758547,Q143826,"""um 1255""",,+1255-00-00T00:00:00Z/9/J,+1527-01-01T00:00:00Z/9/J,,,Q10,"""30016""",Q141472
5,Q1758542,Q1195163,"""1163/1185""",,+1163-00-00T00:00:00Z/9/J,+1568-01-01T00:00:00Z/9/J,,,,"""30279""",Q523002
6,Q1758537,Q640846,"""vor 1295""","""1455/1460""",,+1455-00-00T00:00:00Z/9/J,,+1295-00-00T00:00:00Z/9/J,,"""50211""",Q640846
7,Q1758539,Q640839,"""1496 erstmals belegt""",,+1496-00-00T00:00:00Z/9/J,+1575-01-01T00:00:00Z/9/J,,,,"""50264""",Q141472
8,Q1758540,Q143826,,,+1291-01-01T00:00:00Z/9/J,+1525-01-01T00:00:00Z/9/J,,,,"""60419""",Q141472
9,Q1758538,Q1195157,,,+1471-01-01T00:00:00Z/9/J,+1802-01-01T00:00:00Z/9,,,,"""60464""",Q141472


Finally, the data is exported to the same formats as in the previous steps: Excel, CSV and a tab-separated QuickStatements V1 file.
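
How the export columns could translate into QuickStatements V1 commands is sketched below, assuming the usual QuickStatements conventions: `qalNNN` columns become qualifiers `PNNN`, `SNNN` columns become source references, and empty cells are skipped. `statement_to_qs_v1` is a hypothetical helper for illustration and not the implementation of `df_to_qs_v1`; writing P2 without qualifiers and sources is likewise an assumption of this sketch.

```python
import pandas as pd

def statement_to_qs_v1(row, prop, qualifier_cols=(), source_cols=()):
    """Return one QuickStatements V1 command (tab-separated) for the statement `prop` of `row`.

    Hypothetical helper for illustration only.
    """
    if pd.isna(row[prop]):
        return None  # nothing to write, e.g. a suppressed duplicate P2
    parts = [row["qid"], prop, str(row[prop])]
    for col in qualifier_cols:
        if not pd.isna(row[col]):
            parts += ["P" + col[len("qal"):], str(row[col])]
    for col in source_cols:
        if not pd.isna(row[col]):
            parts += [col, str(row[col])]
    return "\t".join(parts)


# Some columns of the first row of the export table above
row = pd.Series({
    "qid": "Q1758548", "P746": "Q1195145",
    "qal49": "+1437-01-01T00:00:00Z/9/J", "qal50": "+1647-01-01T00:00:00Z/9",
    "qal787": float("nan"), "S471": '"68"', "P2": "Q141472",
})
p746_line = statement_to_qs_v1(row, "P746", ["qal49", "qal50", "qal787"], ["S471"])
p2_line = statement_to_qs_v1(row, "P2")
print(p746_line)
print(p2_line)
```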

In [21]:
export_df.to_excel("data/results/monastery_order_connection/monastery_to_order.xlsx", index=False)
export_df.to_csv("data/results/monastery_order_connection/monastery_to_order.csv", index=False, doublequote=False, quoting=csv.QUOTE_NONE, escapechar="§")
with open("data/results/monastery_order_connection/monastery_to_order.tsv", "w", encoding="utf-8") as file:
    file.write(df_to_qs_v1(export_df))

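## Last Step

The last cell updates the local copies of the FactGrid data: it queries FactGrid for all monasteries and building complexes, reduces the item URLs to Q-numbers, renames the columns (e.g. to `gsn_id` and `id_monastery_location`) and saves the results as CSV and Excel files in `data/factgrid_data/`.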

In [22]:
from helper_functions import query_factgrid

# Get FactGrid Data
monasteries_in_factgrid = query_factgrid("monasteries")
building_complexes_in_factgrid = query_factgrid("building_complexes")

# Cleanup data
monasteries_in_factgrid["item"] = monasteries_in_factgrid["item"].str.split("/").str[-1]
monasteries_in_factgrid.rename(columns={"item":"url_value", "KlosterdatenbankID":"gsn_id"}, inplace=True)
monasteries_in_factgrid["url_type_id"] = 42

building_complexes_in_factgrid["item"] = building_complexes_in_factgrid["item"].str.split("/").str[-1]
building_complexes_in_factgrid["GSVocabTerm"] = building_complexes_in_factgrid["GSVocabTerm"].str.split("Location").str[-1]
building_complexes_in_factgrid.rename(columns={"item":"factgrid_id", "GSVocabTerm":"id_monastery_location"}, inplace=True)

# Save data
monasteries_in_factgrid.to_csv("data/factgrid_data/monasteries_in_factgrid.csv")
monasteries_in_factgrid.to_excel("data/factgrid_data/monasteries_in_factgrid.xlsx")
building_complexes_in_factgrid.to_csv("data/factgrid_data/building_complexes_in_factgrid.csv")
building_complexes_in_factgrid.to_excel("data/factgrid_data/building_complexes_in_factgrid.xlsx")
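
The cleanup above keeps the last path segment of each item URI. A sketch of an alternative using a regular expression, on toy data, assuming the query returns FactGrid entity URIs of the form `https://database.factgrid.de/entity/Q…`. Unlike `str.split("/").str[-1]`, `str.extract` yields NaN instead of a wrong value if a URI does not end with a Q-number:

```python
import pandas as pd

# Toy query result with full entity URIs (illustrative values)
result = pd.DataFrame({
    "item": ["https://database.factgrid.de/entity/Q1758548",
             "https://database.factgrid.de/entity/Q1758549"],
    "KlosterdatenbankID": [68, 20004],
})

# Keep only the trailing Q-number of each URI
result["item"] = result["item"].str.extract(r"(Q\d+)$", expand=False)
result = result.rename(columns={"item": "url_value", "KlosterdatenbankID": "gsn_id"})
result["url_type_id"] = 42  # value taken from the cell above
print(result)
```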