# Notebook 2: Creating Items for Religious Communities
This notebook implements the second step of the Klosterdatenbank-to-FactGrid-Workflow, which is to create Items for religious communities.

As the name suggests, the "Monastery" is the central unit of the monastery database. In the table `gs_monastery`, each row represents a religious community. This is also reflected in the query options of the web interface of the monastery database. Applying the various filter functions always results in a list of religious communities. In the detail view, all relevant information from linked tables is then displayed. Here, the religious community is always at the center. All further information is displayed in connection with the religious community. By integrating with FactGrid, the query options are expanded. For example, it is now possible to query only building complexes. At the same time, the structure of the data model must be taken into account when querying information, such as the geographical location of a religious community at a specific point in time.

To import the religious communities into FactGrid, the following workflow creates labels based on the monastery name and its translation from Notebook 1a. Unlike the monastery locations/building complexes, the religious communities are connected to a series of external identifiers, which will partly be transferred to FactGrid.

## Preparation
The notebook requires the following libraries to run. If an error occurs, make sure the libraries are installed on your system.

In [1]:
import pandas as pd
import numpy as np

First, the export files are loaded into [Dataframes](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html). The dataframes are stored in a dictionary with the keys as the filenames, for easier access.

In [2]:
# Load Access exports
from helper_functions import load_files_from_folder

export_files = load_files_from_folder("data/exports_monasteryDB", "xlsx")

# Create dataframes for each table
dataframes = {key: pd.read_excel(value) for key, value in export_files.items()}

# Add dataframe for monasteries in factGrid (stored in a different directory)
dataframes["building_complexes_in_factgrid"] = pd.read_csv("data/factgrid_data/building_complexes_in_factgrid.csv")
dataframes["monasteries_in_factgrid"] = pd.read_csv("data/factgrid_data/monasteries_in_factgrid.csv")
# Add translation data
dataframes["translated"] = pd.read_csv("data/translation/translated.csv")

The next cell prepares the datasets for the workflow. First, the table `gs_monastery` is filtered for those religious communities that have the status "Online". This means that the datasets are considered finished and are no longer being actively worked on. Afterwards, only the columns `id_gsn` and `monastery_name` are selected. Finally, the column `id_gsn` is filtered against the list of monasteries that already have a monastery database identifier in FactGrid (`data/factgrid_data/monasteries_in_factgrid.csv`) to make sure that no duplicates are produced.

In [3]:
# Filter for monasteries online
monasteries_online = dataframes["gs_monastery"][dataframes["gs_monastery"]["status"] == "Online"]
# Keep only the relevant columns; .copy() avoids a SettingWithCopyWarning when columns are added later
prepared_df = monasteries_online[["id_gsn", "monastery_name"]].copy()
# Exclude religious communities that already exist in FactGrid
prepared_df = prepared_df[~prepared_df["id_gsn"].isin(dataframes["monasteries_in_factgrid"]["KlosterdatenbankID"])]
prepared_df

Unnamed: 0,id_gsn,monastery_name
0,20498,"Kollegiatstift St. Nikolaus, Überlingen"
1,286,Zisterzienserinnenkloster Nicolaurieth
2,60324,Augustinerchorherrenstift Rebdorf
3,30208,"Klarissenkloster Heiligenberg, Jugenheim"
4,4617,"Ursulinenkloster Luzern, Schweiz"
5,11587,Franziskanerterziarinnenkloster Kaltern (Calda...
6,60014,Benediktinerkloster Attel
7,50149,"Deutschordenshaus Judenrode, Gürath"
8,10278,"Benediktinerkloster Brüttelen, Schweiz"
9,211,"Kollegiatstift St. Georg, Wernigerode"
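
The duplicate check can also be written as a left join with `indicator=True`, which additionally shows which IDs were skipped because they already exist in FactGrid. The following is only a sketch on made-up toy data; the IDs, names, and the status value "Draft" are hypothetical, while the column names follow the cell above.

```python
import pandas as pd

# Toy stand-ins for gs_monastery and monasteries_in_factgrid (all values are made up)
gs_monastery = pd.DataFrame({
    "id_gsn": [1, 2, 3, 4],
    "monastery_name": ["Kloster A", "Kloster B", "Kloster C", "Kloster D"],
    "status": ["Online", "Online", "Draft", "Online"],
})
in_factgrid = pd.DataFrame({"KlosterdatenbankID": [2]})

# Keep finished datasets and the two relevant columns
online = gs_monastery.loc[gs_monastery["status"] == "Online", ["id_gsn", "monastery_name"]]

# indicator=True adds a "_merge" column: "left_only" means "not yet in FactGrid"
checked = online.merge(
    in_factgrid, how="left", left_on="id_gsn", right_on="KlosterdatenbankID", indicator=True
)
to_create = checked.loc[checked["_merge"] == "left_only", ["id_gsn", "monastery_name"]]
already_there = checked.loc[checked["_merge"] == "both", "id_gsn"].tolist()

print(f"{len(to_create)} to create, already in FactGrid: {already_there}")
```

Note that a join would duplicate rows if an ID appeared more than once in the FactGrid list, so the `isin` filter used in the cell above is the more robust choice for the actual import.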


## Labels

The label in FactGrid will be the `monastery_name`.

In [4]:
prepared_df["Lde"] = prepared_df["monastery_name"]
prepared_df

Unnamed: 0,id_gsn,monastery_name,Lde
0,20498,"Kollegiatstift St. Nikolaus, Überlingen","Kollegiatstift St. Nikolaus, Überlingen"
1,286,Zisterzienserinnenkloster Nicolaurieth,Zisterzienserinnenkloster Nicolaurieth
2,60324,Augustinerchorherrenstift Rebdorf,Augustinerchorherrenstift Rebdorf
3,30208,"Klarissenkloster Heiligenberg, Jugenheim","Klarissenkloster Heiligenberg, Jugenheim"
4,4617,"Ursulinenkloster Luzern, Schweiz","Ursulinenkloster Luzern, Schweiz"
5,11587,Franziskanerterziarinnenkloster Kaltern (Calda...,Franziskanerterziarinnenkloster Kaltern (Calda...
6,60014,Benediktinerkloster Attel,Benediktinerkloster Attel
7,50149,"Deutschordenshaus Judenrode, Gürath","Deutschordenshaus Judenrode, Gürath"
8,10278,"Benediktinerkloster Brüttelen, Schweiz","Benediktinerkloster Brüttelen, Schweiz"
9,211,"Kollegiatstift St. Georg, Wernigerode","Kollegiatstift St. Georg, Wernigerode"


As the religious communities should also have an English label, the monastery name is automatically translated. The file `translated.csv` that has been created by Notebook 1a - Translation can be reused for this. This means that if you have already run Notebooks 1 - Building Complexes and 1a - Translation completely, you don't have to do anything except run this notebook to create the monasteries.

In [5]:
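# Attach the English labels from Notebook 1a (the left join keeps rows without a translation)
prepared_df = pd.merge(prepared_df, dataframes["translated"], how="left", left_on="Lde", right_on="monastery_Lde")[["id_gsn", "monastery_name", "Lde", "monastery_Len"]].drop_duplicates().rename(columns={"monastery_Len":"Len"})
# Wrap both labels in double quotes, as QuickStatements expects for string values
prepared_df["Lde"] = prepared_df["Lde"].apply(lambda x: f'\"{x}\"')
prepared_df["Len"] = prepared_df["Len"].apply(lambda x: f'\"{x}\"')
# Keep exactly one row per religious community
prepared_df.drop_duplicates(subset="id_gsn", inplace=True)
prepared_df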

Unnamed: 0,id_gsn,monastery_name,Lde,Len
0,20498,"Kollegiatstift St. Nikolaus, Überlingen","""Kollegiatstift St. Nikolaus, Überlingen""","""Collegiate Church of St. Nikolaus, Überlingen"""
1,286,Zisterzienserinnenkloster Nicolaurieth,"""Zisterzienserinnenkloster Nicolaurieth""","""Cistercian nunnery Nicolaurieth"""
2,60324,Augustinerchorherrenstift Rebdorf,"""Augustinerchorherrenstift Rebdorf""","""Canons Regular of St Augustine of Rebdorf"""
3,30208,"Klarissenkloster Heiligenberg, Jugenheim","""Klarissenkloster Heiligenberg, Jugenheim""","""Poor Clare monastery of Heiligenberg, Jugenheim"""
4,4617,"Ursulinenkloster Luzern, Schweiz","""Ursulinenkloster Luzern, Schweiz""","""Ursuline monastery of Lucerne, Switzerland"""
5,11587,Franziskanerterziarinnenkloster Kaltern (Calda...,"""Franziskanerterziarinnenkloster Kaltern (Cald...","""Tertiaries of Kaltern (Caldaro), Italy"""
6,60014,Benediktinerkloster Attel,"""Benediktinerkloster Attel""","""Benedictine monastery Attel"""
7,50149,"Deutschordenshaus Judenrode, Gürath","""Deutschordenshaus Judenrode, Gürath""","""Teutonic Order of Judenrode, Gürath"""
8,10278,"Benediktinerkloster Brüttelen, Schweiz","""Benediktinerkloster Brüttelen, Schweiz""","""Benedictine monastery Brüttelen, Switzerland"""
9,211,"Kollegiatstift St. Georg, Wernigerode","""Kollegiatstift St. Georg, Wernigerode""","""Collegiate Church of St. George, Wernigerode"""
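
Because the translations are attached with a left join, a monastery name that is missing from `translated.csv` ends up with `NaN` in `Len`, and formatting `NaN` in an f-string produces the literal text `"nan"`. The sketch below, on made-up toy data, shows one way to list such rows and to quote only the values that are present. The helper `quote` is hypothetical and not part of `helper_functions.py`.

```python
import pandas as pd

# Toy data; "monastery_Lde" and "monastery_Len" follow the column names used above
labels = pd.DataFrame({"id_gsn": [1, 2], "Lde": ["Kloster A", "Kloster B"]})
translated = pd.DataFrame({"monastery_Lde": ["Kloster A"], "monastery_Len": ["Monastery A"]})

merged = labels.merge(translated, how="left", left_on="Lde", right_on="monastery_Lde")
merged = merged.rename(columns={"monastery_Len": "Len"})[["id_gsn", "Lde", "Len"]]

# Rows without a translation carry NaN in "Len"
missing = merged.loc[merged["Len"].isna(), "id_gsn"].tolist()
print("No English label for:", missing)


def quote(value):
    """Wrap a value in double quotes, leaving missing values untouched."""
    return value if pd.isna(value) else f'"{value}"'


merged["Len"] = merged["Len"].map(quote)
```

Leaving missing labels as `NaN` keeps the cell empty in the export instead of writing a label that reads "nan".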


## Link to Germania Sacra and monastery database

Every religious community is linked to the corresponding ID from the monastery database using the existing Property [P471](https://database.factgrid.de/wiki/Property:P471). Also, to state which research project contributed to the dataset, the project-item "Germania Sacra in FactGrid" ([Q153178](https://database.factgrid.de/wiki/Item:Q153178)) is linked using property [P131](https://database.factgrid.de/wiki/Property:P131).

In [6]:
prepared_df["P471"] = prepared_df["id_gsn"].apply(lambda x: f'\"{x}\"')
prepared_df["P131"] = "Q153178"
prepared_df

Unnamed: 0,id_gsn,monastery_name,Lde,Len,P471,P131
0,20498,"Kollegiatstift St. Nikolaus, Überlingen","""Kollegiatstift St. Nikolaus, Überlingen""","""Collegiate Church of St. Nikolaus, Überlingen""","""20498""",Q153178
1,286,Zisterzienserinnenkloster Nicolaurieth,"""Zisterzienserinnenkloster Nicolaurieth""","""Cistercian nunnery Nicolaurieth""","""286""",Q153178
2,60324,Augustinerchorherrenstift Rebdorf,"""Augustinerchorherrenstift Rebdorf""","""Canons Regular of St Augustine of Rebdorf""","""60324""",Q153178
3,30208,"Klarissenkloster Heiligenberg, Jugenheim","""Klarissenkloster Heiligenberg, Jugenheim""","""Poor Clare monastery of Heiligenberg, Jugenheim""","""30208""",Q153178
4,4617,"Ursulinenkloster Luzern, Schweiz","""Ursulinenkloster Luzern, Schweiz""","""Ursuline monastery of Lucerne, Switzerland""","""4617""",Q153178
5,11587,Franziskanerterziarinnenkloster Kaltern (Calda...,"""Franziskanerterziarinnenkloster Kaltern (Cald...","""Tertiaries of Kaltern (Caldaro), Italy""","""11587""",Q153178
6,60014,Benediktinerkloster Attel,"""Benediktinerkloster Attel""","""Benedictine monastery Attel""","""60014""",Q153178
7,50149,"Deutschordenshaus Judenrode, Gürath","""Deutschordenshaus Judenrode, Gürath""","""Teutonic Order of Judenrode, Gürath""","""50149""",Q153178
8,10278,"Benediktinerkloster Brüttelen, Schweiz","""Benediktinerkloster Brüttelen, Schweiz""","""Benedictine monastery Brüttelen, Switzerland""","""10278""",Q153178
9,211,"Kollegiatstift St. Georg, Wernigerode","""Kollegiatstift St. Georg, Wernigerode""","""Collegiate Church of St. George, Wernigerode""","""211""",Q153178


## External Identifiers

Unlike the building complexes, the religious communities come with a range of external URLs in the monastery database that connect them to existing databases and online resources, such as Wikipedia or Wikidata. Some of these identifiers should also be included in FactGrid. The table `gs_external_urls_monastery` contains a mapping between religious communities, types of external identifiers, and the specific identifiers that can be used to find the resource in the corresponding system. The table `gs_external_url_type_with_factgrid` contains information on how to resolve the identifiers using base URLs and also maps URL types to existing FactGrid identifiers. There are three different cases to consider. First, a URL type can be linked in FactGrid using a property that was introduced by the community, such as the GND-ID ([P76](https://database.factgrid.de/wiki/Property:P76)). Second, links to other projects within the Wiki infrastructure are handled using sitelinks. In QuickStatements, they are referenced by the letter "S" followed by the short name of the wiki project. For example, to state the Q-number of an item in Wikidata, the command is `Swikidatawiki`, with `S` standing for "sitelink" and `wikidatawiki` being the short name for Wikidata. Third, there are cases in which an identifier has no corresponding property in FactGrid. In these cases, the information is omitted. However, it can always be looked up in the monastery database's original interface, which is linked in the references of each statement and via the corresponding identifier property in FactGrid.
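
The three cases can be told apart from the mapping value alone. The following sketch assumes that the column `factgrid_property` holds either a P-number, an `S`-prefixed sitelink key, or an empty cell; the function name `classify_mapping` is hypothetical and used only for illustration.

```python
import pandas as pd


def classify_mapping(value):
    """Classify a mapping value as 'property', 'sitelink', 'omitted' or 'unknown'."""
    if pd.isna(value):
        return "omitted"  # no corresponding property in FactGrid
    text = str(value)
    if text.startswith("S"):
        return "sitelink"  # e.g. Swikidatawiki, Sdewiki
    if text.startswith("P") and text[1:].isdigit():
        return "property"  # e.g. P76 (GND-ID)
    return "unknown"


for example in ["P76", "Swikidatawiki", float("nan")]:
    print(example, "->", classify_mapping(example))
```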

The following cell processes the information on external URLs and adds it to the table. Please note that, for the sitelinks to be imported correctly, the final table has to be processed with the function `df_to_qs_v1` from `helper_functions.py`, as done in the cells below.

In [7]:
gs_external_url_type_with_factgrid = dataframes["gs_external_url_type_with_factgrid"].dropna(subset="factgrid_property")
url_factgrid = pd.merge(dataframes["gs_external_urls_monastery"], gs_external_url_type_with_factgrid, how="left", left_on="url_type_id", right_on="id_url_type")[["gsn_id", "url_value", "factgrid_property"]].dropna(subset="factgrid_property")
for index, row in url_factgrid.iterrows():
    if row["gsn_id"] in prepared_df["id_gsn"].values:
        prepared_df.loc[prepared_df["id_gsn"] == row["gsn_id"], row["factgrid_property"]] = f'\"{row["url_value"]}\"'
prepared_df

Unnamed: 0,id_gsn,monastery_name,Lde,Len,P471,P131,P76,Swikidatawiki,Sdewiki,P500,Sfrwiki
0,20498,"Kollegiatstift St. Nikolaus, Überlingen","""Kollegiatstift St. Nikolaus, Überlingen""","""Collegiate Church of St. Nikolaus, Überlingen""","""20498""",Q153178,,"""Q28977181""",,,
1,286,Zisterzienserinnenkloster Nicolaurieth,"""Zisterzienserinnenkloster Nicolaurieth""","""Cistercian nunnery Nicolaurieth""","""286""",Q153178,,"""Q28977495""",,,
2,60324,Augustinerchorherrenstift Rebdorf,"""Augustinerchorherrenstift Rebdorf""","""Canons Regular of St Augustine of Rebdorf""","""60324""",Q153178,"""4380431-7""","""Q1775841""","""Kloster_Rebdorf""",,
3,30208,"Klarissenkloster Heiligenberg, Jugenheim","""Klarissenkloster Heiligenberg, Jugenheim""","""Poor Clare monastery of Heiligenberg, Jugenheim""","""30208""",Q153178,,"""Q19826763""","""Klosterruine_Heiligenberg""",,
4,4617,"Ursulinenkloster Luzern, Schweiz","""Ursulinenkloster Luzern, Schweiz""","""Ursuline monastery of Lucerne, Switzerland""","""4617""",Q153178,"""108614306X""","""Q99023408""",,,
5,11587,Franziskanerterziarinnenkloster Kaltern (Calda...,"""Franziskanerterziarinnenkloster Kaltern (Cald...","""Tertiaries of Kaltern (Caldaro), Italy""","""11587""",Q153178,,,,,
6,60014,Benediktinerkloster Attel,"""Benediktinerkloster Attel""","""Benedictine monastery Attel""","""60014""",Q153178,"""4249098-4""","""Q204075""","""Kloster_Attel""",,
7,50149,"Deutschordenshaus Judenrode, Gürath","""Deutschordenshaus Judenrode, Gürath""","""Teutonic Order of Judenrode, Gürath""","""50149""",Q153178,,"""Q97071272""",,,
8,10278,"Benediktinerkloster Brüttelen, Schweiz","""Benediktinerkloster Brüttelen, Schweiz""","""Benedictine monastery Brüttelen, Switzerland""","""10278""",Q153178,,,,,
9,211,"Kollegiatstift St. Georg, Wernigerode","""Kollegiatstift St. Georg, Wernigerode""","""Collegiate Church of St. George, Wernigerode""","""211""",Q153178,,"""Q28977458""",,,
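
For larger exports, the row-by-row loop can be replaced by a pivot followed by a merge. This is only a sketch on toy data: the `gsn_id` values are made up, and `keep="last"` is an assumption chosen to mirror the loop, in which a later row overwrites an earlier one for the same community and property.

```python
import pandas as pd

# Toy stand-ins for url_factgrid and prepared_df
url_factgrid = pd.DataFrame({
    "gsn_id": [1, 1, 2],
    "url_value": ["4380431-7", "Q1775841", "Q204075"],
    "factgrid_property": ["P76", "Swikidatawiki", "Swikidatawiki"],
})
prepared = pd.DataFrame({"id_gsn": [1, 2, 3]})

wide = (
    url_factgrid
    # Quote the identifiers as in the cell above
    .assign(url_value=lambda d: '"' + d["url_value"].astype(str) + '"')
    # pivot() requires unique (gsn_id, property) pairs
    .drop_duplicates(subset=["gsn_id", "factgrid_property"], keep="last")
    .pivot(index="gsn_id", columns="factgrid_property", values="url_value")
    .reset_index()
)

# The left join keeps communities without any external identifier
result = prepared.merge(wide, how="left", left_on="id_gsn", right_on="gsn_id").drop(columns="gsn_id")
print(result)
```

Communities without a mapped identifier keep empty cells, just as in the table above.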


## Finalizing
Before linking the monasteries to their orders and building complexes, they need to have a Q-number. Load the table that is created below into Quickstatements and save the created monasteries as specified.



In [8]:
from helper_functions import df_to_qs_v1
monastery_upload = prepared_df.drop(columns=["id_gsn", "monastery_name"])
monastery_upload.insert(0, "qid", np.nan)
monastery_upload.to_excel("data/results/monasteries/import_monasteries.xlsx")
monastery_upload.to_csv("data/results/monasteries/import_monasteries.csv")
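# Write the QuickStatements commands; UTF-8 keeps umlauts intact regardless of the platform default
with open("data/results/monasteries/import_monasteries.tsv", "w", encoding="utf-8") as file:
    file.write(df_to_qs_v1(monastery_upload))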
monastery_upload

Lde "Kollegiatstift St. Nikolaus, Überlingen"
Len "Collegiate Church of St. Nikolaus, Überlingen"
P471 "20498"
P131 Q153178
Swikidatawiki "Q28977181"
Lde "Zisterzienserinnenkloster Nicolaurieth"
Len "Cistercian nunnery Nicolaurieth"
P471 "286"
P131 Q153178
Swikidatawiki "Q28977495"
Lde "Augustinerchorherrenstift Rebdorf"
Len "Canons Regular of St Augustine of Rebdorf"
P471 "60324"
P131 Q153178
P76 "4380431-7"
Swikidatawiki "Q1775841"
Sdewiki "Kloster_Rebdorf"
Lde "Klarissenkloster Heiligenberg, Jugenheim"
Len "Poor Clare monastery of Heiligenberg, Jugenheim"
P471 "30208"
P131 Q153178
Swikidatawiki "Q19826763"
Sdewiki "Klosterruine_Heiligenberg"
Lde "Ursulinenkloster Luzern, Schweiz"
Len "Ursuline monastery of Lucerne, Switzerland"
P471 "4617"
P131 Q153178
P76 "108614306X"
Swikidatawiki "Q99023408"
Lde "Franziskanerterziarinnenkloster Kaltern (Caldaro), Italien"
Len "Tertiaries of Kaltern (Caldaro), Italy"
P471 "11587"
P131 Q153178
Lde "Benediktinerkloster Attel"
Len "Benedictine monast

Unnamed: 0,qid,Lde,Len,P471,P131,P76,Swikidatawiki,Sdewiki,P500,Sfrwiki
0,,"""Kollegiatstift St. Nikolaus, Überlingen""","""Collegiate Church of St. Nikolaus, Überlingen""","""20498""",Q153178,,"""Q28977181""",,,
1,,"""Zisterzienserinnenkloster Nicolaurieth""","""Cistercian nunnery Nicolaurieth""","""286""",Q153178,,"""Q28977495""",,,
2,,"""Augustinerchorherrenstift Rebdorf""","""Canons Regular of St Augustine of Rebdorf""","""60324""",Q153178,"""4380431-7""","""Q1775841""","""Kloster_Rebdorf""",,
3,,"""Klarissenkloster Heiligenberg, Jugenheim""","""Poor Clare monastery of Heiligenberg, Jugenheim""","""30208""",Q153178,,"""Q19826763""","""Klosterruine_Heiligenberg""",,
4,,"""Ursulinenkloster Luzern, Schweiz""","""Ursuline monastery of Lucerne, Switzerland""","""4617""",Q153178,"""108614306X""","""Q99023408""",,,
5,,"""Franziskanerterziarinnenkloster Kaltern (Cald...","""Tertiaries of Kaltern (Caldaro), Italy""","""11587""",Q153178,,,,,
6,,"""Benediktinerkloster Attel""","""Benedictine monastery Attel""","""60014""",Q153178,"""4249098-4""","""Q204075""","""Kloster_Attel""",,
7,,"""Deutschordenshaus Judenrode, Gürath""","""Teutonic Order of Judenrode, Gürath""","""50149""",Q153178,,"""Q97071272""",,,
8,,"""Benediktinerkloster Brüttelen, Schweiz""","""Benedictine monastery Brüttelen, Switzerland""","""10278""",Q153178,,,,,
9,,"""Kollegiatstift St. Georg, Wernigerode""","""Collegiate Church of St. George, Wernigerode""","""211""",Q153178,,"""Q28977458""",,,
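
The implementation of `df_to_qs_v1` is not shown in this notebook. As a rough illustration of what such a conversion might look like, the sketch below turns each row into a `CREATE` command followed by tab-separated `LAST` lines in QuickStatements V1 syntax, skipping empty cells. The function `to_qs_v1_sketch` is hypothetical, and the actual helper may behave differently.

```python
import pandas as pd


def to_qs_v1_sketch(df):
    """Hypothetical converter: one CREATE block per row, empty cells are skipped."""
    lines = []
    # The empty "qid" column is dropped because new items are addressed via LAST
    for _, row in df.drop(columns=["qid"], errors="ignore").iterrows():
        lines.append("CREATE")
        for column, value in row.items():
            if pd.notna(value):
                lines.append(f"LAST\t{column}\t{value}")
    return "\n".join(lines)


# One row taken from the table above, with an additional empty P76 cell
upload = pd.DataFrame({
    "qid": [float("nan")],
    "Lde": ['"Benediktinerkloster Attel"'],
    "Len": ['"Benedictine monastery Attel"'],
    "P471": ['"60014"'],
    "P131": ["Q153178"],
    "P76": [None],
})
print(to_qs_v1_sketch(upload))
```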
