In [72]:
import pandas as pd
import numpy as np
import csv
from helper_functions import df_to_qs_v1, query_factgrid, parse_date, process_date_parsing_results, DateType

# Step 4 - Connecting monasteries to their orders and assigning the instance of Statement

This notebook implements the last step of the Klosterdatenbank-to-FactGrid workflow: it connects the information on religious orders to the newly created items for religious communities and assigns their "instance of" statements.

In the original database, the information about which order a religious community belongs to is stored in the table `gs_monastery_order`. This is a 1:n relationship, meaning that a religious community can belong to different orders over the course of its existence. As of 2025, the monastery database contains 108 order designations, although this list does not claim to be comprehensive. The designations were previously published as controlled vocabularies as part of earlier research projects and can be accessed through [Dante](https://dante.gbv.de/search?p=407) or [FactGrid](https://tinyurl.com/27slckal). The table `gs_monastery_order` contains not only these standardized order designations, but also non-standardized designations that describe the order affiliation of a community. Examples include collective terms such as "Other lay convents/Semi-religious communities (m)", which groups together male lay convents without a more specific designation. Another example is the "evangelical monastery/stift (w)". Evangelical monasteries are outside the scope of the monastery database and are therefore not described in more detail. However, if it is known that a monastery became an evangelical monastery, this is still recorded and reflected in the order designation "evangelical monastery/stift".

The designations outside of the controlled vocabulary must be handled separately in the workflow. In some cases, the designation can be mapped to an existing FactGrid item. For example, the term "Kanonissen" from the database corresponds to the "Community of Canonesses" in FactGrid, [Q1480654](https://database.factgrid.de/wiki/Item:Q1480654). In the case of the canonesses and canons, it is also necessary to express that the community is a collegiate chapter or a women's chapter. Therefore, the value associated with [P2](https://database.factgrid.de/wiki/Property:P2) must also be modified.

## Preparation

The following cell loads all required tables into a dictionary of DataFrames and queries FactGrid to retrieve the existing Q-numbers for the religious communities. The table `gs_monastery_order` contains the mapping between religious communities and orders. `gs_orders` contains information about the orders themselves, including their Q-numbers in FactGrid.

In [73]:
# Load required data into dataframes
dataframes = {}
dataframes["gs_monastery_order"] = pd.read_excel("data/exports_monasteryDB/gs_monastery_order.xlsx")
dataframes["monasteries_in_factgrid"] = query_factgrid("monasteries")
dataframes["monasteries_in_factgrid"]["item"] = dataframes["monasteries_in_factgrid"]["item"].str.split("/").str[-1]
dataframes["monasteries_in_factgrid"]["KlosterdatenbankID"] = dataframes["monasteries_in_factgrid"]["KlosterdatenbankID"].astype(int)
dataframes["gs_orders"] = pd.read_excel("data/exports_monasteryDB/gs_orders.xlsx")

# Merge information on orders, monasteries and FactGrid IDs of newly created items
df = pd.merge(dataframes["gs_monastery_order"], dataframes["monasteries_in_factgrid"], how="left", left_on="gsn_id", right_on="KlosterdatenbankID").dropna(subset="item")
df = pd.merge(df, dataframes["gs_orders"], how="left", left_on="order_id", right_on="id_order")
df

Unnamed: 0,Unnamed: 0_x,id_monastery_order,gsn_id,monastery_status,order_id,order_begin_tpq,order_begin_taq,order_end_tpq,order_end_taq,order_begin_note,...,order_name,order_abbreviation,Symbol,gender,imagefile,comment_order,ID_GSReligiousOrder,lthk,RG_Abkuerzung,FactGridID
0,312,44,93,Kloster,16,1005,1022.0,1803,,ca. 1010/1022,...,Benediktiner,OSB,B,Männerkloster,Benediktiner,,16,,o. s. Ben.,Q164266
1,685,754,814,Stift,34,1220,,1567,,,...,Kanoniker,Kan.,K,Männerkloster,Kanoniker_Kollegiatsstift,,34,,eccl.,
2,831,931,972,Kloster,8,1654,,1804,,,...,Kapuziner,OFMCap,J,Männerkloster,Kapuziner,,8,,,Q143828
3,1076,1442,2055,Kloster,28,1267,,1555,,,...,Zisterzienserinnen,OCist (w),z,Frauenkloster,Zisterzienserinnen,,28,,o. Cist.,Q640839
4,1077,1443,2055,Kloster,33,1205,1215.0,1219,,um 1210 (?),...,Benediktinerinnen,OSB (w),b,Frauenkloster,Benediktinerinnen,,33,,o. s. Ben.,Q1195146
5,1168,1591,3017,Stift,34,975,1000.0,1672,,Ende 10. Jahrhundert,...,Kanoniker,Kan.,K,Männerkloster,Kanoniker_Kollegiatsstift,,34,,eccl.,
6,1800,2456,814,evangelisches Kloster/Stift,71,1567,,1649,,,...,evangelisches Kloster/Stift (m),ev.,,Männerkloster,default_monastery,,71,,,
7,3133,7511,40177,Kloster,33,1401,1500.0,1604,,15. Jahrhundert,...,Benediktinerinnen,OSB (w),b,Frauenkloster,Benediktinerinnen,,33,,o. s. Ben.,Q1195146
8,3264,7663,40329,Kloster,8,1657,,1679,,,...,Kapuziner,OFMCap,J,Männerkloster,Kapuziner,,8,,,Q143828
9,4194,10036,3790,Kloster,28,1234,1266.0,1553,,Mitte 13. Jahrhundert,...,Zisterzienserinnen,OCist (w),z,Frauenkloster,Zisterzienserinnen,,28,,o. Cist.,Q640839
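
The left merge followed by `dropna` silently discards every order relation whose community has no FactGrid item yet. The following is a minimal sketch, using toy data instead of the real exports, of how the `indicator` option of `pd.merge` could make those discarded rows visible. The names `relations`, `in_factgrid` and `unmatched` as well as the ID 9999 are illustrative assumptions and not part of the workflow.

```python
import pandas as pd

# Toy stand-ins for gs_monastery_order and the FactGrid query result (illustrative values)
relations = pd.DataFrame({"gsn_id": [93, 814, 9999], "order_id": [16, 34, 8]})
in_factgrid = pd.DataFrame({"KlosterdatenbankID": [93, 814], "item": ["Q1752920", "Q1752917"]})

# indicator=True adds a "_merge" column recording where each row was found
merged = pd.merge(relations, in_factgrid, how="left",
                  left_on="gsn_id", right_on="KlosterdatenbankID", indicator=True)

# Rows found only on the left side have no FactGrid item and would be dropped by dropna
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched[["gsn_id", "order_id"]])
```

Inspecting `unmatched` before dropping rows shows whether communities are missing from FactGrid or whether an ID simply failed to match.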


### Special Cases in order and instance assignment
This cell contains the aforementioned mapping of special cases in the relation to orders. The dictionary `orden_mapping` assigns Q-numbers to orders in cases where a direct mapping is possible. The dictionary `status_mapping` maps the status of a monastery in a given order relation to a Q-number that will be linked in a P2 statement. This is explained below. The dictionary `orden_p2_special_cases` collects the cases in which a specific order requires a specific P2 statement.

In [74]:
orden_mapping = {
    "Kanonissen":"Q1480654",
    "Kanoniker":"Q1480653",
    "Franziskaner (Minoriten/Konventualen/Observanten/Rekollekten)":"Q164255",
    "Sonstige Laienkonvente/Semireligiöse Gemeinschaften (w)":"Q640846"
}
status_mapping = {
    "Stift":"Q523002",
    "Kloster":"Q141472",
    "Bruderhaus":"Q640846",
    "Domstift":"Q164867",
    "Kommende":"Q395378",
    "Konvent":"Q399022",
    "Schwesternhaus":"Q640846",
    "evangelisches Kloster/Stift":np.nan
}
orden_p2_special_cases = {
    "Kanoniker":"Q160437",
    "Kanonissen":"Q898116"
}
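
Because these dictionaries are maintained by hand, a typo in a Q-number would only surface during the upload. A small sanity check along the following lines could catch malformed values early. This is a sketch on a shortened copy of `status_mapping`; the names `qid_pattern` and `invalid` are assumptions introduced for illustration.

```python
import re
import numpy as np

# Shortened copy of the mapping from the cell above
status_mapping = {
    "Stift": "Q523002",
    "Kloster": "Q141472",
    "evangelisches Kloster/Stift": np.nan,  # intentionally left without a class
}

# A FactGrid item ID is the letter Q followed by digits
qid_pattern = re.compile(r"Q\d+")

# Collect every string value that does not look like a Q-number; NaN entries are skipped
invalid = {key: value for key, value in status_mapping.items()
           if isinstance(value, str) and not qid_pattern.fullmatch(value)}
print(invalid)
```

An empty dictionary means that all string values are well-formed; the check does not verify that the items actually exist in FactGrid.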

## Orders

In this step, the religious order is added to the monasteries using the property "Religious order" [P746](https://database.factgrid.de/wiki/Property:P746). The column is filled with the corresponding FactGrid Q-number from `gs_orders`. Where no Q-number is given, the cell is filled using the mapping from above.

In [75]:
# Create new DataFrame for export data
export_df = pd.DataFrame()
export_df["qid"] = df["item"]

# Fill column P746 (religious order) with data from Factgrid. Fill remaining empty cells using the mapping from the cell above
export_df["P746"] = df["FactGridID"]
export_df["P746"] = export_df["P746"].fillna(df["order_name"].apply(lambda x: orden_mapping[x] if x in orden_mapping else np.nan))
export_df

Unnamed: 0,qid,P746
0,Q1752920,Q164266
1,Q1752917,Q1480653
2,Q1752915,Q143828
3,Q469450,Q640839
4,Q469450,Q1195146
5,Q400529,Q1480653
6,Q1752917,
7,Q1752922,Q1195146
8,Q1752912,Q143828
9,Q1752916,Q640839
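
The fallback logic can also be expressed with `Series.map`, which returns NaN for every key that is missing from the dictionary. The following sketch uses toy data; `toy` and `p746` are illustrative names, and the snippet is meant as an equivalent formulation, not as a replacement for the cell above.

```python
import numpy as np
import pandas as pd

orden_mapping = {"Kanoniker": "Q1480653"}

# Toy data: one order with a Q-number, one covered by the mapping, one without any mapping
toy = pd.DataFrame({
    "FactGridID": ["Q164266", np.nan, np.nan],
    "order_name": ["Benediktiner", "Kanoniker", "evangelisches Kloster/Stift (m)"],
})

# map() looks up each order name in the dictionary; unknown names become NaN
p746 = toy["FactGridID"].fillna(toy["order_name"].map(orden_mapping))
print(p746.tolist())
```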


Any cells in column `P746` that are still empty are listed here together with further information about the case. This makes it possible to check whether a FactGrid mapping is missing (e.g. if new orders have been added to the database). Empty values are expected in one special case, the evangelical monasteries, for which no further action is required.

In [76]:
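# Select the rows that still lack an order Q-number
na_orders = export_df[export_df["P746"].isna()]
# Note: merging on qid pulls in every order of the affected item, so an order that
# was mapped successfully (here: Kanoniker) can appear next to the unmapped one
na_orders = pd.merge(na_orders, df[["item","order_name"]], how="left", left_on="qid", right_on="item")
na_orders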

Unnamed: 0,qid,P746,item,order_name
0,Q1752917,,Q1752917,Kanoniker
1,Q1752917,,Q1752917,evangelisches Kloster/Stift (m)
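
In the output above, the row for "Kanoniker" appears although that relation did receive a Q-number. The reason is that the merge on `qid` returns every order of the affected item. Since `export_df` shares its index with `df`, an index-aligned join could restrict the listing to the rows that are actually unmapped. The following is a sketch on toy data; `toy_df`, `toy_export` and `missing` are illustrative names.

```python
import numpy as np
import pandas as pd

# Toy data: one item with two order relations, only the second one is unmapped
toy_df = pd.DataFrame({
    "item": ["Q1752917", "Q1752917"],
    "order_name": ["Kanoniker", "evangelisches Kloster/Stift (m)"],
})
toy_export = pd.DataFrame({"qid": toy_df["item"], "P746": ["Q1480653", np.nan]})

# join() aligns on the shared row index, so each row keeps its own order name
missing = toy_export[toy_export["P746"].isna()].join(toy_df[["order_name"]])
print(missing)
```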


### Dates

Just like the building complexes, the relationship between religious communities and orders has a temporal component. The information can be found in `gs_monastery_order`, in the columns `order_begin_tpq`, `order_begin_taq`, `order_end_tpq`, `order_end_taq`, `order_begin_note` and `order_end_note`. The dates are processed in the same way as for the building complexes. You can find a detailed explanation in Notebook 3.

In [77]:

# Get relevant data for date parsing
export_df["order_begin_tpq"] = df["order_begin_tpq"]
export_df["order_end_tpq"] = df["order_end_tpq"]
export_df["order_begin_note"] = df["order_begin_note"]
export_df["order_end_note"] = df["order_end_note"]

# Date Parsing
export_df["begin_date_parse_result"] = df["order_begin_note"].apply(lambda x: parse_date(str(x), DateType.BEGIN_DATE))
export_df["end_date_parse_result"] = df["order_end_note"].apply(lambda x: parse_date(str(x), DateType.END_DATE))
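# Keep the original notes as quoted strings; the end note "heute" (today) is not exported.
# The check must run before quoting, otherwise the quoted value never equals "heute".
export_df['qal787'] = df["order_begin_note"].apply(lambda x: f'\"{x}\"' if not pd.isna(x) else np.nan)
export_df['qal788'] = df["order_end_note"].apply(lambda x: f'\"{x}\"' if not pd.isna(x) and x != "heute" else np.nan)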
process_date_parsing_results(export_df, "order")

# Cleanup: drop all helper columns in one step, then add gsn_id as a quoted string in column S471
export_df = export_df.drop(columns=["order_begin_tpq", "order_end_tpq", "begin_date_parse_result", "end_date_parse_result", "order_begin_note", "order_end_note"])
export_df["S471"] = df["gsn_id"].apply(lambda x: f'\"{x}\"')
export_df

Unnamed: 0,qid,P746,qal787,qal788,qal49,qal785,qal50,qal1124,qal1126,S471
0,Q1752920,Q164266,"""ca. 1010/1022""",,+1010-00-00T00:00:00Z/9/J,Q10,+1803-01-01T00:00:00Z/9,,,"""93"""
1,Q1752917,Q1480653,,,+1220-01-01T00:00:00Z/9/J,,+1567-01-01T00:00:00Z/9/J,,,"""814"""
2,Q1752915,Q143828,,,+1654-01-01T00:00:00Z/9,,+1804-01-01T00:00:00Z/9,,,"""972"""
3,Q469450,Q640839,,,+1267-01-01T00:00:00Z/9/J,,+1555-01-01T00:00:00Z/9/J,,,"""2055"""
4,Q469450,Q1195146,"""um 1210 (?)""",,+1210-00-00T00:00:00Z/9/J,Q10,+1219-01-01T00:00:00Z/9/J,,,"""2055"""
5,Q400529,Q1480653,"""Ende 10. Jahrhundert""",,+0983-00-00T00:00:00Z/7/J,,+1672-01-01T00:00:00Z/9,,,"""3017"""
6,Q1752917,,,,+1567-01-01T00:00:00Z/9/J,,+1649-01-01T00:00:00Z/9,,,"""814"""
7,Q1752922,Q1195146,"""15. Jahrhundert""",,+1500-00-00T00:00:00Z/7/J,,+1604-01-01T00:00:00Z/9,,,"""40177"""
8,Q1752912,Q143828,,,+1657-01-01T00:00:00Z/9,,+1679-01-01T00:00:00Z/9,,,"""40329"""
9,Q1752916,Q640839,"""Mitte 13. Jahrhundert""","""letzte Erwähnung 1553""",+1250-00-00T00:00:00Z/7/J,,+1553-01-01T00:00:00Z/9/J,,,"""3790"""
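
The date columns in the table above use the QuickStatements time syntax: the suffix `/9` stands for year precision, `/7` for century precision, and a trailing `/J` marks a date in the Julian calendar. The helper functions `parse_date` and `process_date_parsing_results` are not shown in this notebook, so the following is only a sketch of how a plain year could be turned into such a value. The function `qs_year` and the cutoff year 1582 are assumptions inferred from the output above, not the actual implementation.

```python
def qs_year(year: int, julian_before: int = 1582) -> str:
    """Format a year as a QuickStatements time value with year precision (/9)."""
    value = f"+{year:04d}-01-01T00:00:00Z/9"
    # Assumption: years before the Gregorian reform of 1582 are flagged as Julian
    if year < julian_before:
        value += "/J"
    return value

print(qs_year(1220))  # +1220-01-01T00:00:00Z/9/J
print(qs_year(1654))  # +1654-01-01T00:00:00Z/9
```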


## Instance Of Statement

One of the most important properties in FactGrid is [P2](https://database.factgrid.de/wiki/Property:P2) "instance of". This property links an item to a class that serves as a category for that item. Strictly speaking, FactGrid contains only items and no ontologically defined classes; the class structure is modeled by linking items with the properties "instance of" and "subclass of". In the monastery database, the information for this class affiliation is partially represented in the field `monastery_status` of the table `gs_monastery_order`. The class a religious community belongs to depends on its order affiliation, but this connection is not fully represented in the monastery database. It only distinguishes between brother house, sister house, collegiate chapter (Stift), cathedral chapter (Domstift), monastery, commandery, convent, and evangelical monastery/stift. We use this distinction to integrate the data sets into the classification system of FactGrid, while being aware that we are omitting historical complexity. In cases where no classification is possible, the data sets are classified as "Religious Community", the most general class.

In [78]:
# Assign P2 statement according to the special-case dictionary
export_df["P2"] = df["order_name"].apply(lambda x: orden_p2_special_cases[x] if x in orden_p2_special_cases else np.nan)
# Fill the remaining cells from the status mapping; assign the result back instead of
# calling fillna(inplace=True) on a column selection, which does not reliably modify export_df
export_df["P2"] = export_df["P2"].fillna(df["monastery_status"].apply(lambda x: status_mapping[x] if x in status_mapping else "Q704192"))

# Prevent duplicate P2 assignments
p2_duplicates = export_df.duplicated(subset=["qid", "P2"])
export_df.loc[p2_duplicates, "P2"] = np.nan

export_df


Unnamed: 0,qid,P746,qal787,qal788,qal49,qal785,qal50,qal1124,qal1126,S471,P2
0,Q1752920,Q164266,"""ca. 1010/1022""",,+1010-00-00T00:00:00Z/9/J,Q10,+1803-01-01T00:00:00Z/9,,,"""93""",Q141472
1,Q1752917,Q1480653,,,+1220-01-01T00:00:00Z/9/J,,+1567-01-01T00:00:00Z/9/J,,,"""814""",Q160437
2,Q1752915,Q143828,,,+1654-01-01T00:00:00Z/9,,+1804-01-01T00:00:00Z/9,,,"""972""",Q141472
3,Q469450,Q640839,,,+1267-01-01T00:00:00Z/9/J,,+1555-01-01T00:00:00Z/9/J,,,"""2055""",Q141472
4,Q469450,Q1195146,"""um 1210 (?)""",,+1210-00-00T00:00:00Z/9/J,Q10,+1219-01-01T00:00:00Z/9/J,,,"""2055""",
5,Q400529,Q1480653,"""Ende 10. Jahrhundert""",,+0983-00-00T00:00:00Z/7/J,,+1672-01-01T00:00:00Z/9,,,"""3017""",Q160437
6,Q1752917,,,,+1567-01-01T00:00:00Z/9/J,,+1649-01-01T00:00:00Z/9,,,"""814""",
7,Q1752922,Q1195146,"""15. Jahrhundert""",,+1500-00-00T00:00:00Z/7/J,,+1604-01-01T00:00:00Z/9,,,"""40177""",Q141472
8,Q1752912,Q143828,,,+1657-01-01T00:00:00Z/9,,+1679-01-01T00:00:00Z/9,,,"""40329""",Q141472
9,Q1752916,Q640839,"""Mitte 13. Jahrhundert""","""letzte Erwähnung 1553""",+1250-00-00T00:00:00Z/7/J,,+1553-01-01T00:00:00Z/9/J,,,"""3790""",Q141472
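
Row 4 in the table above has an empty `P2` cell because the same item already receives the class Q141472 in row 3. The following sketch on toy data shows how `DataFrame.duplicated` flags only the repeated occurrences of a (qid, P2) pair; `toy` and `repeats` are illustrative names.

```python
import numpy as np
import pandas as pd

# Toy data: the first item has two order relations that lead to the same class
toy = pd.DataFrame({
    "qid": ["Q469450", "Q469450", "Q1752917"],
    "P2": ["Q141472", "Q141472", "Q160437"],
})

# duplicated() marks every repeat of a (qid, P2) pair after its first occurrence
repeats = toy.duplicated(subset=["qid", "P2"])

# Blank the repeats so that each class is stated only once per item
toy.loc[repeats, "P2"] = np.nan
print(toy)
```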


Finally, the data is exported again as Excel, CSV, and TSV files. The TSV file contains the output of `df_to_qs_v1`.

In [79]:
export_df.to_excel("data/results/monastery_order_connection/monastery_to_order.xlsx", index=False)
export_df.to_csv("data/results/monastery_order_connection/monastery_to_order.csv", index=False, doublequote=False, quoting=csv.QUOTE_NONE, escapechar="§")
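# Write with an explicit encoding so that umlauts in the notes are stored consistently across platforms
with open("data/results/monastery_order_connection/monastery_to_order.tsv", "w", encoding="utf-8") as file: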
    file.write(df_to_qs_v1(export_df))