#Metacatalogue

Notebook by Melinee Her

Cleans the megacatalogue and harmonizes ORACC data with the CDLI catalogue.


#Merging the cdli_cat and megacat_mini

1. Read in the [cdli_cat_short csv](https://docs.google.com/spreadsheets/d/1wiViebAL3xGGwV75cKIr41beRX-Lxuj4u8y1ET8I27Y/edit?usp=sharing)
  * This can be replaced by the workflow in this notebook: https://drive.google.com/file/d/1yhM_8fgF6p89E3qiH1LRK4EgEau7HLuT/view?usp=sharing
  * This notebook also provides the headers / format for LOD triples in FactGrid.
2. Match the CDLI `id_text` with the ORACC `id_text` in the ORACC [Megacatalog_short](https://docs.google.com/spreadsheets/d/1iyvVpt5DrkF22Cd_p_oWHZfHsd6NzcvUlowS2_7Cm2U/edit?usp=sharing)

3. After the preliminary matching of the fields, we select the final subset of columns to add to our Wikibase in FactGrid. To do this, we also need to check whether the two catalogues give different values for the same field.
* Is there a way to highlight different values which we expect to be the same?

4. Lastly, using this notebook, we will obtain the proper Wikibase formatting for FactGrid header fields: https://drive.google.com/file/d/1yhM_8fgF6p89E3qiH1LRK4EgEau7HLuT/view?usp=sharing
* The subset of 8 fields we selected is only the beginning, but it is a good start.
* If the values are identical we can use the Q-items and build the final CSV for a [QuickStatements]() import.
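
One possible answer to the question in step 3 is to compare each CDLI/ORACC column pair and flag the rows where both sources have a value and the values differ. The sketch below runs on a tiny hypothetical DataFrame. The `_x`/`_y` suffixes are the ones `pd.merge` adds by default, and `flag_differences` is an illustrative helper, not part of this notebook's utilities.

```python
import pandas as pd

# Toy stand-in for a merged catalogue (values are illustrative only).
toy = pd.DataFrame({
    "id_text": ["P000001", "P000002", "P000003"],
    "material_x": ["clay", "clay", "stone"],
    "material_y": ["clay", None, "clay"],
    "period_x": ["Uruk III (ca. 3200-3000 BC)", "Uruk IV", "Ur III"],
    "period_y": ["Uruk III", "Uruk IV", None],
})

def flag_differences(df, fields):
    """Return a boolean frame: True where both sources have a value and the values differ."""
    flags = pd.DataFrame(index=df.index)
    for field in fields:
        left, right = df[f"{field}_x"], df[f"{field}_y"]
        both_present = left.notna() & right.notna()
        flags[field] = both_present & (left != right)
    return flags

diff_flags = flag_differences(toy, ["material", "period"])
print(diff_flags)

# Rows with at least one conflicting field.
conflicts = toy[diff_flags.any(axis=1)]
print(conflicts["id_text"].tolist())
```

Exact string comparison is strict: a CDLI period such as `Uruk III (ca. 3200-3000 BC)` and an ORACC period `Uruk III` are flagged as different, so some normalization of the values would probably be needed before the flags are meaningful.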

# Mount Google Drive folder + imports

In [17]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [18]:
#any necessary imports
import pandas as pd
import zipfile
from zipfile import ZipFile
import json
import requests
from tqdm import tqdm
import os
import errno
import re
import random
import numpy as np
import sys
import copy
import networkx as nx
from pathlib import Path
import matplotlib.pyplot as plt
import seaborn as sns
import collections

#Set folder for remote drive
#folder = '/content/drive/My Drive/FactGrid Cuneiform (AWCA)/people/Melinee/'
folder = '/content/drive/MyDrive/Melinee/'

#importing utils for the method which downloads the current text json files
os.chdir(folder + 'network/utils/')
from utils import oracc_download

# This is a user defined module that searches through the texts to find the entities in the text that
# are people and places, to be imported as nodes into the network
os.chdir(folder + 'network/')
import rank_parser4 as rp

pd.set_option('display.max_columns', None)

#Shortening the ORACC Megacatalogue

In this section of the notebook, we create a smaller version of the megacatalogue by selecting a subset of columns to compare with the CDLI data when forming the metacatalogue.

Retrieving the megacatalogue

In [19]:
#path = '/content/drive/MyDrive/FactGrid Cuneiform (AWCA)/people/Melinee/ORACC_DFS/megacatalogue.csv'
path = folder + 'ORACC_DFS/megacatalogue.csv'
megacatalogue = pd.read_csv(path, low_memory=False, index_col=0)

In [20]:
megacatalogue.head(3)

Unnamed: 0,id_text,langs,project,id_text.1,primary_publication,provenience,pleiades_id,pleiades_coord,excavation_no,archive,atae_attribution,cite_as,collection,composite_witness,credits,date_of_origin,designation,dialect,genre,has_date,language,last_mod_by,last_modified,list_name,material,object_type,period,script,script_remarks,script_type,sealed_marked,subgenre,date,long_date,ancient_creditor,ancient_debtor,date_gen,day,long_date_gen,month,year,eponym,regnal_dates,ruler,ancient_buyer,ancient_seller,seal_mark_type,seal_owner,ancient_author,museum-nos,copies,photos,editions,translations,secondary-literatures,mus_no,publication_history,recipient,title,supergenre,xproject,uri,public,images,trans,q_number,translations-moran-1992s,copy,photo,museum_no,text_comments,accession_no,ancient_year,date_bce,months_recorded,tablet_comments,date_comments,bibilography,author,publication_date,atae_lists,cdli_id,cdli_museum_no,ch_name,ch_no,ch_num_name,comp_wit,display_name,edited_twice,lka_no,oracc_header,other_pub,please_cite,proposed_date,rework,saa_attribution,saa_cdli_id,saa_cdli_museum,saao_lists,secondary_record,short_title,vol_title,volume,pkt_no,cdli_excavation_no,editorial_comments,kar_no,saa_cdli_excavation,kav_no,eponym_title,dossier_list,kah_no,nargd_no,pkta_no,findspot_square,gpa_no,external_id,temp_id,tim_11_no,title_in_date,abl_no,ancient_recipient,cdli_accession_no,pleiades_sender_loc,saa_cdli_accession,sender_loc,sender_loc_coordinates,ct_54_no,astron_date,rma_no,prt_no,dossier,ags_no,add_no,ct_53_no,las_no,adb_no,ex1,ex1id,ex2,ex2id,ex3,ex3id,PRN,BM_ID,Reg_year,Reg_collection,Reg_no,ACQ_method,ACQ_name,ACQ_year,BibXref,Combined_no,DIM_H,PROV_area,PROV_site,OBJ_type,Period_culture,Script_type_1,Genre,Subgenre,S_s_genre,Language,Full_no,DIM_W,S_s_s_genre,Library_colophon,Bib_comment,BibSpec,PROV_building,Ruler,Composition,Tablet_number,Recension,NonLibrary_colophon,Reg_part,DIM_T,PROV_room,Mus_no,ACQ_comment,Historical_ID,Q_no,Day,Month,Year,Emesal,Script_type_2,uni
que_ID,museum_number,accession_number,sort_order,nme_chapter,object,findspot,museum,textname,bibliography,Original_collection,DIM_D,note,Mus_no_part,goal_year,seals_number,citation,subgenre_remarks,published_collation,period_remarks,fingernails_number,ctn_no,nl_no,BAK,id_composite,person,project name,dynastic_seat,popular_name,text_manu,related_comp_id,manu_number,related_manu,text_equals_manu,text_remarks,has-sources,atf_source,date_entered,date_updated,db_source,photo_up,translation_source,ark_number,id,id_text_int,stt_no,seal_id,object_preservation,object_remarks,exemplars,keywords,last_modified_by,place,series,status,rime_no,height,width,cdli_composite_id,created_by,created_on,last_modified_on,other_names,series_section,primary_edition,session,pleaides_id,date_remarks,editor,findspot_remarks,funder,language_remarks,lemmed,owner,principal,record_id,repository,translit_ed,uploaded,user,bibliography__id_biblio,bibliography__journal_title,bibliography__shortref,bibliography__volume_number,checked,photographed,photographer,proof-read,proof-reader,thickness,bibliography__book_title,pr_joins__pages,notes,provenience_remarks,stratigraphic_level,join_information,year_name_eponym,distribution,sources,provdist,CDLI_problems,bibliography__unpublished_title,ark,atf_up,dates_referenced,surface_preservation,composite,lineart_up,seal_information,google_earth_collection,collection_copyright,author_remarks,cdli_comments,acquisition_history,publication,ancient_date,has-score,last_edited_by,last_edited_on,bdtns_id,reference,Non_Sign_List_Series,series_2,cdli_collation,condition_description,tablet_number_2,number,tradition,corpus,attested,electronic_publication,buy_book,composition_designation,object_ref,lemcount,lemcount_total,lemcount_ave,text_total,lemount_sd,group,handcopy,pq_joins__external_id,qcat_2__id_composite,qcat_2__other_names,attribution,new_subgenre,primary_publication2,catchline,colophon_describing_source,colophon_disclosing_author,P_number_problems,royal_colophon,ser
ies_tablet_no,textual_colophon,Additional_P_numbers,exemplar_info,cdli_id_numbers,composite_id,measurements,elevation,Q_designation,exemplar_number,findspots,btto_attribution,century,outline_sort_name,subproject,modern_converted_date,category_oracc,place_of_writing_oracc,prosobab,duplicates,copy_period,geers_no,photo_no,ta_photo_no,c_name,c_item,c_both,c_num_name,c_group,owner_remarks,special_features,hand,columns,headings,orientation,axes,obta_no,type,Delnero_no,Delnero_subgenre_no,deity,museum_URL,Delnero_remarks,Cohen_balag,sub_genre,external_URL_name,external_URL,additional_P_numbers,ancient_day,ancient_month,google_earth_provenience,alternative_years,old_composition_id,accounting_period,Q_objects,Q_places,parallels,new_q,subseries,subseries_section,description,oracc_id,sec1,chap,sec2
0,P522592,0x08000000,tilbarsip,P522592,Til-Barsip 01,Tell Ahmar (Til Barsip),658410.0,"[38.1191944, 36.6749623]",T 01,001 - Hanni Archive (House C1),"Adapted from Stephanie Dalley, “Neo-Assyrian T...",Please cite this page as http://oracc.org/atae...,"National Museum of Syria, Aleppo, Syria",witness,"Adapted from Stephanie Dalley, “Neo-Assyrian T...",00.000.00.00,Til-Barsip 01,Neo-Assyrian,Administrative Record,no,Akkadian,ARMEP,27.01.2022,atae/tilbarsip:P522592,clay,tablet,Neo-Assyrian,Neo-Assyrian,inscribed,Cuneiform,no,list (rations),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,adsd,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,P522593,0x08000000,tilbarsip,P522593,Til-Barsip 02,Tell Ahmar (Til Barsip),658410.0,"[38.1191944, 36.6749623]",T 02,001 - Hanni Archive (House C1),"Adapted from Stephanie Dalley, “Neo-Assyrian T...",Please cite this page as http://oracc.org/atae...,"National Museum of Syria, Aleppo, Syria",witness,"Adapted from Stephanie Dalley, “Neo-Assyrian T...",00.000.07.00,Til-Barsip 02,Neo-Assyrian,Legal Transaction,"yes, but date partially preserved",Akkadian,ARMEP,27.01.2022,atae/tilbarsip:P522593,clay,tablet,Neo-Assyrian,Neo-Assyrian,inscribed,Cuneiform,no,debt note,[...]-VII-[...],"Tašrītu [...th], eponymy of Ašš[ur?-...]",Hanni,Nabû-kin-[...],[...]-VII-[...],[...],"Tašrītu ...th, [eponymy of ...]",VII,[...],,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,adsd,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,P522594,0x08000000,tilbarsip,P522594,Til-Barsip 03,Tell Ahmar (Til Barsip),658410.0,"[38.1191944, 36.6749623]",T 03,001 - Hanni Archive (House C1),"Adapted from Stephanie Dalley, “Neo-Assyrian T...",Please cite this page as http://oracc.org/atae...,"National Museum of Syria, Aleppo, Syria",witness,"Adapted from Stephanie Dalley, “Neo-Assyrian T...",Assurbanipal.limu Bel-Harran-shaddu’a.07.01,Til-Barsip 03,Neo-Assyrian,Legal Transaction,yes,Akkadian,ARMEP,27.01.2022,atae/tilbarsip:P522594,clay,envelope,Neo-Assyrian,Neo-Assyrian,inscribed,Cuneiform,unknown,,650-VII-01,"Tašrītu 1[st], eponymy of Bēl-Harrān-šadd[û’a]",,,650-VII-01,01,"Tašrītu 1st, eponymy of Bēl-Harrān-šaddû’a",VII,650,Bēl-Harrān-šaddû’a,668–ca. 631 BC,Ashurbanipal,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,adsd,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


In [31]:
#list(megacatalogue.columns)

Selecting a subset of the columns listed above:

In [33]:
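# Select the candidate columns; dropna(how='all', axis=1) then removes columns that are empty for every row.
megacatalogue_short = megacatalogue[[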
    'composite_id', 'id_text', 'language', 'material', 'object_type', 'period','provenience','excavation_no','findspot_square','dates_referenced', 'genre', 'subgenre', 'translation_source',
    'archive','collection','museum-nos','mus_no','museum_no','cdli_museum_no','saa_cdli_museum','Mus_no','museum_number','museum','museum_URL',
    'date_of_origin', 'dialect', 'date', 'supergenre', 'xproject','q_number', 'ancient_year', 'date_bce', 'cdli_id', 'OBJ_type',
    'Period_culture', 'Script_type_1', 'Genre', 'Subgenre', 'S_s_genre','Language', 'Full_no', 'Tablet_number', 'Q_no', 'object',
    'id_composite', 'project name', 'id_text_int', 'seal_id','cdli_composite_id', 'ancient_date', 'bdtns_id', 'designation']].dropna(how='all',axis=1)

# Rename 'id_composite' so the composite identifier has the same name as in the CDLI catalogue.
megacatalogue_short = megacatalogue_short.rename(columns={'id_composite':'composite_id'})
megacatalogue_short

Unnamed: 0,id_text,language,material,object_type,period,provenience,excavation_no,findspot_square,dates_referenced,genre,subgenre,translation_source,archive,collection,museum-nos,mus_no,museum_no,cdli_museum_no,saa_cdli_museum,Mus_no,museum_number,museum,museum_URL,date_of_origin,dialect,date,supergenre,xproject,q_number,ancient_year,date_bce,cdli_id,OBJ_type,Period_culture,Script_type_1,Genre,Subgenre,S_s_genre,Language,Full_no,Tablet_number,Q_no,object,composite_id,project name,id_text_int,seal_id,cdli_composite_id,ancient_date,bdtns_id,designation
0,P522592,Akkadian,clay,tablet,Neo-Assyrian,Tell Ahmar (Til Barsip),T 01,,,Administrative Record,list (rations),,001 - Hanni Archive (House C1),"National Museum of Syria, Aleppo, Syria",,,,,,,,,,00.000.00.00,Neo-Assyrian,,,,,,,,,,,,,,,,,,,,adsd,,,,,,Til-Barsip 01
1,P522593,Akkadian,clay,tablet,Neo-Assyrian,Tell Ahmar (Til Barsip),T 02,,,Legal Transaction,debt note,,001 - Hanni Archive (House C1),"National Museum of Syria, Aleppo, Syria",,,,,,,,,,00.000.07.00,Neo-Assyrian,[...]-VII-[...],,,,,,,,,,,,,,,,,,,adsd,,,,,,Til-Barsip 02
2,P522594,Akkadian,clay,envelope,Neo-Assyrian,Tell Ahmar (Til Barsip),T 03,,,Legal Transaction,,,001 - Hanni Archive (House C1),"National Museum of Syria, Aleppo, Syria",,,,,,,,,,Assurbanipal.limu Bel-Harran-shaddu’a.07.01,Neo-Assyrian,650-VII-01,,,,,,,,,,,,,,,,,,,adsd,,,,,,Til-Barsip 03
3,P522595,Akkadian,clay,tablet,Neo-Assyrian,Tell Ahmar (Til Barsip),T 04,,,Legal Transaction,debt note (silver),,001 - Hanni Archive (House C1),"National Museum of Syria, Aleppo, Syria",,,,,,,,,,Assurbanipal.limu Bel-Harran-shaddu’a.07.01,Neo-Assyrian,650-VII-01,,,,,,,,,,,,,,,,,,,adsd,,,,,,Til-Barsip 04
4,P522596,Akkadian,clay,tablet,Neo-Assyrian,Tell Ahmar (Til Barsip),T 05,,,Legal Transaction,sales document,,001 - Hanni Archive (House C1),"National Museum of Syria, Aleppo, Syria",,,,,,,,,,00.000.00.00,Neo-Assyrian,,,,,,,,,,,,,,,,,,,,adsd,,,,,,Til-Barsip 05
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
171140,P511531,Sumerian,,seal (not impression),Old Babylonian,Girsu,T 1483,,,Administrative,physical cylinder seal,,,,,,AO 16821,,,,,,,,,,ELA,CDLI,,,,,,,,,,,,,,,,,epsd2/admin/oldbab,,,,,,"AO 16821 = Parrot, Glyptique 228"
171141,X201001,,,,Unknown,Unknown,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,epsd2/admin/oldbab,,,,,,Iraq 82 129
171142,X201002,,,,Unknown,Unknown,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,epsd2/admin/oldbab,,,,,,Iraq 82 133
171143,X225104,,,,Unknown,Unknown,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,epsd2/admin/oldbab,,,,,,"OB Contracts, pl. D4 no. 24"
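
The selection above lists both `composite_id` and `id_composite` and then renames the latter to `composite_id`. In the output only one `composite_id` column remains, which suggests the original `composite_id` was dropped as entirely empty. If it ever held values, the rename would leave two columns with the same name. A minimal sketch of a more defensive variant, on hypothetical toy data, coalesces the two columns with `combine_first` and skips requested columns that are missing from the file:

```python
import pandas as pd

# Toy stand-in for the megacatalogue (values are illustrative only).
toy_mega = pd.DataFrame({
    "id_text": ["P000001", "P000002", "P000003"],
    "composite_id": [None, "Q000002", None],
    "id_composite": ["Q000001", None, None],
    "genre": ["Lexical", "Lexical", None],
    "empty_field": [None, None, None],
})

wanted = ["id_text", "composite_id", "id_composite", "genre", "empty_field", "not_in_file"]

# Keep only the requested columns that actually exist, in the requested order.
present = [col for col in wanted if col in toy_mega.columns]
toy_short = toy_mega[present].copy()

# Merge the two composite-id columns into one, preferring 'composite_id' where it has a value.
toy_short["composite_id"] = toy_short["composite_id"].combine_first(toy_short["id_composite"])
toy_short = toy_short.drop(columns="id_composite")

# Drop columns that are empty for every row, as in the cell above.
toy_short = toy_short.dropna(how="all", axis=1)
print(toy_short)
```
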


Exports megacatalogue_short to the folder ORACC_DFS

In [34]:
megacatalogue_short.to_csv(folder + 'ORACC_DFS/megacatalogue_short.csv')

#Working with the CDLI catalogue

This section is loosely based on this [notebook](https://drive.google.com/file/d/1yhM_8fgF6p89E3qiH1LRK4EgEau7HLuT/view?usp=sharing).

In this section we reduce the [cdli_cat.csv](https://media.githubusercontent.com/media/cdli-gh/data/master/cdli_cat.csv) to a smaller dataframe called cdli_cat_short that keeps the identifier columns plus the columns covering the following fields:

    * Language
    * Material
    * Museum
    * Provenience
    * Object type
    * Genre
    * Period
    * Date

Using this limited version of the CDLI catalogue, we can draw comparisons with the ORACC data.

Reading in the data:

In [35]:
url = 'https://media.githubusercontent.com/media/cdli-gh/data/master/cdli_cat.csv'

data = pd.read_csv(url, sep=',', low_memory=False)
data.head(3)

Unnamed: 0,accession_no,accounting_period,acquisition_history,alternative_years,ark_number,atf_source,atf_up,author,author_remarks,cdli_collation,cdli_comments,citation,collection,composite_id,condition_description,date_entered,date_of_origin,date_remarks,date_updated,dates_referenced,db_source,designation,dumb,dumb2,electronic_publication,elevation,excavation_no,external_id,findspot_remarks,findspot_square,genre,google_earth_collection,google_earth_provenience,height,id,id_text2,id_text,join_information,language,lineart_up,material,museum_no,object_preservation,object_type,period,period_remarks,photo_up,primary_publication,provenience,provenience_remarks,publication_date,publication_history,published_collation,seal_id,seal_information,stratigraphic_level,subgenre,subgenre_remarks,surface_preservation,text_remarks,thickness,translation_source,width,object_remarks
0,,,,,21198/zz001q0dtm,"Englund, Robert K.",,CDLI,"31x61x18; Lú A 14-16.30-32.48-50; M XVIII, auf...",,,,"Vorderasiatisches Museum, Berlin, Germany",Q000002,,12/4/2001,00.00.00.00,,2020-03-14,00.00.00.00,20011204 protocuneiform_catalogue,"CDLI Lexical 000002, ex. 065",,,,,"W 06435,a",,auf Hügeloberfläche in der Nähe des Südbaues,"M XVIII,?",Lexical,,,31,1,0,1,,undetermined,150ppi 20160630,clay,VAT 01533,,tablet,Uruk III (ca. 3200-3000 BC),,,"CDLI Lexical 000002, ex. 065",Uruk (mod. Warka),,2015ff.,"Englund, Robert K. & Nissen, Hans J., ATU 3 (1...",,,,,Archaic Lu2 A (witness),,,,18,no translation,61,
1,,,,,21198/zz001q0dv4,"Englund, Robert K.",,CDLI,30x48x13; Lú A 13-15.23-25.?; Fundstelle wie W...,,,,"Vorderasiatisches Museum, Berlin, Germany",Q000002,,12/4/2001,00.00.00.00,,2018-10-20,00.00.00.00,20011204 protocuneiform_catalogue,"CDLI Lexical 000002, ex. 066",,,,,"W 06435,b",,auf der Hügeloberfläche in der Nähe des Südbaues,"M XVIII,?",Lexical,,,30,2,0,2,,undetermined,150ppi 20160630,clay,VAT 15263,,tablet,Uruk III (ca. 3200-3000 BC),,,"CDLI Lexical 000002, ex. 066",Uruk (mod. Warka),,2015ff.,"Englund, Robert K. & Nissen, Hans J., ATU 3 (1...",,,,,Archaic Lu2 A (witness),,,,13,no translation,48,
2,,,,,21198/zz001q0dwn,"Englund, Robert K.",,"Englund, Robert K. & Nissen, Hans J.","42x53x19; Vocabulary 9; Qa XVI,2, unter der Ab...",,,,"Vorderasiatisches Museum, Berlin, Germany",,,12/4/2001,,,2020-01-26,,20011204 protocuneiform_catalogue,"ATU 3, pl. 081, W 9123,d",,,,,"W 09123,d",,"unter der Abgleichung der Schicht III, 1,5 m ü...","Qa XVI,2",Lexical,,,42,3,0,3,,undetermined,150ppi 20160630,clay,VAT 15253,,tablet,Uruk IV (ca. 3350-3200 BC),,,"ATU 3, pl. 081, W 9123,d",Uruk (mod. Warka),,1993,"ATU 1, 539",,,,,Archaic Vocabulary (witness),Text category: 15-09; Foreign ID: LVO 9,,,19,no translation,53,


Create a subset of the CDLI catalogue (loaded above as 'data'), named 'cdli_cat_short', with the columns of interest

In [36]:
#list(data.columns)

In [39]:
cdli_cat_short = data[['composite_id', 'id_text', 'language', 'object_type', 'period', 'material', 'collection', 'museum_no','provenience', 'excavation_no','findspot_square','date_of_origin',
       'dates_referenced', 'genre', 'subgenre', 'translation_source','designation']].dropna(how='all',axis=1).fillna('')

cdli_cat_short

Unnamed: 0,composite_id,id_text,language,object_type,period,material,collection,museum_no,provenience,excavation_no,findspot_square,date_of_origin,dates_referenced,genre,subgenre,translation_source,designation
0,Q000002,1,undetermined,tablet,Uruk III (ca. 3200-3000 BC),clay,"Vorderasiatisches Museum, Berlin, Germany",VAT 01533,Uruk (mod. Warka),"W 06435,a","M XVIII,?",00.00.00.00,00.00.00.00,Lexical,Archaic Lu2 A (witness),no translation,"CDLI Lexical 000002, ex. 065"
1,Q000002,2,undetermined,tablet,Uruk III (ca. 3200-3000 BC),clay,"Vorderasiatisches Museum, Berlin, Germany",VAT 15263,Uruk (mod. Warka),"W 06435,b","M XVIII,?",00.00.00.00,00.00.00.00,Lexical,Archaic Lu2 A (witness),no translation,"CDLI Lexical 000002, ex. 066"
2,,3,undetermined,tablet,Uruk IV (ca. 3350-3200 BC),clay,"Vorderasiatisches Museum, Berlin, Germany",VAT 15253,Uruk (mod. Warka),"W 09123,d","Qa XVI,2",,,Lexical,Archaic Vocabulary (witness),no translation,"ATU 3, pl. 081, W 9123,d"
3,Q000002,4,undetermined,tablet,Uruk IV (ca. 3350-3200 BC),clay,"Vorderasiatisches Museum, Berlin, Germany",VAT 15168,Uruk (mod. Warka),"W 09169,d","Qa XVI,2",00.00.00.00,00.00.00.00,Lexical,Archaic Lu2 A (witness),no translation,"CDLI Lexical 000002, ex. 051"
4,Q000002,5,undetermined,tablet,Uruk IV (ca. 3350-3200 BC),clay,"Vorderasiatisches Museum, Berlin, Germany",VAT 15153,Uruk (mod. Warka),"W 09206,k","Qa XVI,2",00.00.00.00,00.00.00.00,Lexical,Archaic Lu2 A (witness),no translation,"CDLI Lexical 000002, ex. 172"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
353278,,532443,Sumerian,tablet,Old Babylonian (ca. 1900-1600 BC),clay,"National Museum of Iraq, Baghdad, Iraq",IM —,Marad (mod. Wanna-wa-Sadum),Marad 047,,,,Legal,,no translation,"Adab Al-Rafidayn 63, 83-92 no. 2"
353279,,532444,Akkadian,tablet,Old Babylonian (ca. 1900-1600 BC),clay,"National Museum of Iraq, Baghdad, Iraq",IM —,,TA 2100,,,,Administrative,,no translation,"Iraq 35, 173-175 (pl. 71-72) TA 2100"
353280,,532445,Akkadian,tablet,Old Babylonian (ca. 1900-1600 BC),clay,"National Museum of Iraq, Baghdad, Iraq",IM —,,TA 2101,,,,Administrative,,no translation,"Iraq 35, 173-175 (pl. 71-72) TA 2101"
353281,,532446,Akkadian,tablet & envelope,Old Babylonian (ca. 1900-1600 BC),clay,"private: William T. Grant Jr., Pelham Manor, N...",Grant 17,Larsa (mod. Tell as-Senkereh),,,,,Legal,,no translation,"AJSL 34, 199-204"
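
A possible alternative to loading every column and subsetting afterwards is to pass `usecols` to `pd.read_csv`, so that only the columns of interest are parsed. The sketch below uses a small in-memory CSV in place of the real file so that it runs without network access. The sample rows are abbreviated from the output above, and the shortened column list is an assumption made for illustration.

```python
import io
import pandas as pd

# Small in-memory CSV standing in for cdli_cat.csv (rows abbreviated from the output above).
sample_csv = io.StringIO(
    "id_text,language,material,period,genre,width\n"
    "1,undetermined,clay,Uruk III (ca. 3200-3000 BC),Lexical,61\n"
    "2,undetermined,clay,Uruk III (ca. 3200-3000 BC),Lexical,48\n"
    "3,undetermined,clay,Uruk IV (ca. 3350-3200 BC),Lexical,53\n"
)

columns_of_interest = ["id_text", "language", "material", "period", "genre"]

# usecols restricts parsing to the listed columns; dtype=str keeps every value as text.
sample_short = pd.read_csv(sample_csv, usecols=columns_of_interest, dtype=str).fillna("")
print(sample_short.shape)
print(sample_short.columns.tolist())
```

With the real catalogue, the same `usecols` and `dtype` arguments could be passed alongside the URL used above.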


Exports cdli_cat_short to the folder ORACC_DFS

In [40]:
cdli_cat_short.to_csv(folder + 'ORACC_DFS/cdli_cat_short.csv')

#Creating the Metacatalogue

Using the shortened CDLI catalogue and the shortened ORACC catalogue, we can create a Metacatalogue: a combination of both catalogues.

The following code cell reloads megacatalogue_short and cdli_cat_short from disk, so that the cells below can be run independently of the first half of this notebook.

In [44]:
#cshortpath = '/content/drive/MyDrive/FactGrid Cuneiform (AWCA)/people/Melinee/ORACC_DFS/cdli_cat_short.csv'
#mshortpath = '/content/drive/MyDrive/FactGrid Cuneiform (AWCA)/people/Melinee/ORACC_DFS/megacatalogue_short.csv'
cshortpath = folder + 'ORACC_DFS/cdli_cat_short.csv'
mshortpath = folder + 'ORACC_DFS/megacatalogue_short.csv'
cdli_cat_short = pd.read_csv(cshortpath, low_memory=False, index_col=0)
megacatalogue_short = pd.read_csv(mshortpath, low_memory=False, index_col=0)

In [45]:
print(cdli_cat_short.shape)
cdli_cat_short.head(3)

(353283, 17)


Unnamed: 0,composite_id,id_text,language,object_type,period,material,collection,museum_no,provenience,excavation_no,findspot_square,date_of_origin,dates_referenced,genre,subgenre,translation_source,designation
0,Q000002,1,undetermined,tablet,Uruk III (ca. 3200-3000 BC),clay,"Vorderasiatisches Museum, Berlin, Germany",VAT 01533,Uruk (mod. Warka),"W 06435,a","M XVIII,?",00.00.00.00,00.00.00.00,Lexical,Archaic Lu2 A (witness),no translation,"CDLI Lexical 000002, ex. 065"
1,Q000002,2,undetermined,tablet,Uruk III (ca. 3200-3000 BC),clay,"Vorderasiatisches Museum, Berlin, Germany",VAT 15263,Uruk (mod. Warka),"W 06435,b","M XVIII,?",00.00.00.00,00.00.00.00,Lexical,Archaic Lu2 A (witness),no translation,"CDLI Lexical 000002, ex. 066"
2,,3,undetermined,tablet,Uruk IV (ca. 3350-3200 BC),clay,"Vorderasiatisches Museum, Berlin, Germany",VAT 15253,Uruk (mod. Warka),"W 09123,d","Qa XVI,2",,,Lexical,Archaic Vocabulary (witness),no translation,"ATU 3, pl. 081, W 9123,d"


Change the column 'id_text' to the form P followed by six digits (zero-padded) so that it matches the ORACC ids

In [46]:
# Prefix each CDLI id with 'P' and zero-pad it to six digits, e.g. 1 -> P000001.
cdli_cat_short['id_text'] = ['P'+str(text_id).zfill(6) for text_id in cdli_cat_short['id_text']]
cdli_cat_short.head(3)

Unnamed: 0,composite_id,id_text,language,object_type,period,material,collection,museum_no,provenience,excavation_no,findspot_square,date_of_origin,dates_referenced,genre,subgenre,translation_source,designation
0,Q000002,P000001,undetermined,tablet,Uruk III (ca. 3200-3000 BC),clay,"Vorderasiatisches Museum, Berlin, Germany",VAT 01533,Uruk (mod. Warka),"W 06435,a","M XVIII,?",00.00.00.00,00.00.00.00,Lexical,Archaic Lu2 A (witness),no translation,"CDLI Lexical 000002, ex. 065"
1,Q000002,P000002,undetermined,tablet,Uruk III (ca. 3200-3000 BC),clay,"Vorderasiatisches Museum, Berlin, Germany",VAT 15263,Uruk (mod. Warka),"W 06435,b","M XVIII,?",00.00.00.00,00.00.00.00,Lexical,Archaic Lu2 A (witness),no translation,"CDLI Lexical 000002, ex. 066"
2,,P000003,undetermined,tablet,Uruk IV (ca. 3350-3200 BC),clay,"Vorderasiatisches Museum, Berlin, Germany",VAT 15253,Uruk (mod. Warka),"W 09123,d","Qa XVI,2",,,Lexical,Archaic Vocabulary (witness),no translation,"ATU 3, pl. 081, W 9123,d"
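
After the conversion, a quick format check can catch malformed identifiers before the merge. The sketch below is a minimal example on hypothetical values. It applies the same zero-padding with pandas string methods and then tests each result against the pattern `P` plus six digits. The pattern is an assumption about the CDLI side only, since the ORACC data shown in this notebook also contains ids that start with `X`.

```python
import pandas as pd

# Toy identifiers: integers as in the raw CDLI catalogue (values are illustrative).
raw_ids = pd.Series([1, 2, 532446])

# Same zero-padding as in the cell above, written with pandas string methods.
p_numbers = "P" + raw_ids.astype(str).str.zfill(6)

# Check the shape of each identifier: 'P' followed by exactly six digits.
is_valid = p_numbers.str.fullmatch(r"P\d{6}")
print(p_numbers.tolist())
print(is_valid.all())
```
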


In [47]:
print(megacatalogue_short.shape)
megacatalogue_short.head(3)

(171145, 51)


Unnamed: 0,id_text,language,material,object_type,period,provenience,excavation_no,findspot_square,dates_referenced,genre,subgenre,translation_source,archive,collection,museum-nos,mus_no,museum_no,cdli_museum_no,saa_cdli_museum,Mus_no,museum_number,museum,museum_URL,date_of_origin,dialect,date,supergenre,xproject,q_number,ancient_year,date_bce,cdli_id,OBJ_type,Period_culture,Script_type_1,Genre,Subgenre,S_s_genre,Language,Full_no,Tablet_number,Q_no,object,composite_id,project name,id_text_int,seal_id,cdli_composite_id,ancient_date,bdtns_id,designation
0,P522592,Akkadian,clay,tablet,Neo-Assyrian,Tell Ahmar (Til Barsip),T 01,,,Administrative Record,list (rations),,001 - Hanni Archive (House C1),"National Museum of Syria, Aleppo, Syria",,,,,,,,,,00.000.00.00,Neo-Assyrian,,,,,,,,,,,,,,,,,,,,adsd,,,,,,Til-Barsip 01
1,P522593,Akkadian,clay,tablet,Neo-Assyrian,Tell Ahmar (Til Barsip),T 02,,,Legal Transaction,debt note,,001 - Hanni Archive (House C1),"National Museum of Syria, Aleppo, Syria",,,,,,,,,,00.000.07.00,Neo-Assyrian,[...]-VII-[...],,,,,,,,,,,,,,,,,,,adsd,,,,,,Til-Barsip 02
2,P522594,Akkadian,clay,envelope,Neo-Assyrian,Tell Ahmar (Til Barsip),T 03,,,Legal Transaction,,,001 - Hanni Archive (House C1),"National Museum of Syria, Aleppo, Syria",,,,,,,,,,Assurbanipal.limu Bel-Harran-shaddu’a.07.01,Neo-Assyrian,650-VII-01,,,,,,,,,,,,,,,,,,,adsd,,,,,,Til-Barsip 03


Creation of the Metacatalogue

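This dataframe merges the cdli_cat_short and megacatalogue_short DFs on the "id_text" column. Because it is an outer merge, it contains the rows whose id_text matches in both catalogues as well as the rows found in only one of them. For columns that exist in both catalogues, the column marked column_x belongs to the CDLI catalogue and the column marked column_y belongs to the ORACC catalogue.

Note: columns that exist in only one catalogue keep their original names.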

In [48]:
metacatalogue = pd.merge(cdli_cat_short, megacatalogue_short, on="id_text", how = 'outer')
metacatalogue

Unnamed: 0,composite_id_x,id_text,language_x,object_type_x,period_x,material_x,collection_x,museum_no_x,provenience_x,excavation_no_x,findspot_square_x,date_of_origin_x,dates_referenced_x,genre_x,subgenre_x,translation_source_x,designation_x,language_y,material_y,object_type_y,period_y,provenience_y,excavation_no_y,findspot_square_y,dates_referenced_y,genre_y,subgenre_y,translation_source_y,archive,collection_y,museum-nos,mus_no,museum_no_y,cdli_museum_no,saa_cdli_museum,Mus_no,museum_number,museum,museum_URL,date_of_origin_y,dialect,date,supergenre,xproject,q_number,ancient_year,date_bce,cdli_id,OBJ_type,Period_culture,Script_type_1,Genre,Subgenre,S_s_genre,Language,Full_no,Tablet_number,Q_no,object,composite_id_y,project name,id_text_int,seal_id,cdli_composite_id,ancient_date,bdtns_id,designation_y
0,Q000002,P000001,undetermined,tablet,Uruk III (ca. 3200-3000 BC),clay,"Vorderasiatisches Museum, Berlin, Germany",VAT 01533,Uruk (mod. Warka),"W 06435,a","M XVIII,?",00.00.00.00,00.00.00.00,Lexical,Archaic Lu2 A (witness),no translation,"CDLI Lexical 000002, ex. 065",undetermined,clay,tablet,Uruk III,Warka (Uruk),,,,Lexical,,,,"Vorderasiatisches Museum, Berlin, Germany",,,,,,,,,,,,,LEX,CDLI,,,,P000001,,,,,,,,,,,,,armep,,,,,,"W 06435,a"
1,Q000002,P000002,undetermined,tablet,Uruk III (ca. 3200-3000 BC),clay,"Vorderasiatisches Museum, Berlin, Germany",VAT 15263,Uruk (mod. Warka),"W 06435,b","M XVIII,?",00.00.00.00,00.00.00.00,Lexical,Archaic Lu2 A (witness),no translation,"CDLI Lexical 000002, ex. 066",undetermined,clay,tablet,Uruk III,Warka (Uruk),,,,Lexical,,,,"Vorderasiatisches Museum, Berlin, Germany",,,,,,,,,,,,,LEX,CDLI,,,,P000002,,,,,,,,,,,,,armep,,,,,,"W 06435,b"
2,,P000003,undetermined,tablet,Uruk IV (ca. 3350-3200 BC),clay,"Vorderasiatisches Museum, Berlin, Germany",VAT 15253,Uruk (mod. Warka),"W 09123,d","Qa XVI,2",,,Lexical,Archaic Vocabulary (witness),no translation,"ATU 3, pl. 081, W 9123,d",undetermined,clay,tablet,Uruk IV,Warka (Uruk),,,,Lexical,,,,"Vorderasiatisches Museum, Berlin, Germany",,,,,,,,,,,,,LEX,CDLI,,,,P000003,,,,,,,,,,,,,armep,,,,,,"W 09123,d"
3,Q000002,P000004,undetermined,tablet,Uruk IV (ca. 3350-3200 BC),clay,"Vorderasiatisches Museum, Berlin, Germany",VAT 15168,Uruk (mod. Warka),"W 09169,d","Qa XVI,2",00.00.00.00,00.00.00.00,Lexical,Archaic Lu2 A (witness),no translation,"CDLI Lexical 000002, ex. 051",undetermined,clay,tablet,Uruk IV,Warka (Uruk),,,,Lexical,,,,"Vorderasiatisches Museum, Berlin, Germany",,,,,,,,,,,,,LEX,CDLI,,,,P000004,,,,,,,,,,,,,armep,,,,,,"W 09169,d"
4,Q000002,P000005,undetermined,tablet,Uruk IV (ca. 3350-3200 BC),clay,"Vorderasiatisches Museum, Berlin, Germany",VAT 15153,Uruk (mod. Warka),"W 09206,k","Qa XVI,2",00.00.00.00,00.00.00.00,Lexical,Archaic Lu2 A (witness),no translation,"CDLI Lexical 000002, ex. 172",undetermined,clay,tablet,Uruk IV,Warka (Uruk),,,,Lexical,,,,"Vorderasiatisches Museum, Berlin, Germany",,,,,,,,,,,,,LEX,CDLI,,,,P000005,,,,,,,,,,,,,armep,,,,,,"W 09206,k"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
366911,,X096677,,,,,,,,,,,,,,,,,,,Unknown,Unknown,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,epsd2/earlylit,,,,,,BM 096677
366912,,X201001,,,,,,,,,,,,,,,,,,,Unknown,Unknown,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,epsd2/admin/oldbab,,,,,,Iraq 82 129
366913,,X201002,,,,,,,,,,,,,,,,,,,Unknown,Unknown,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,epsd2/admin/oldbab,,,,,,Iraq 82 133
366914,,X225104,,,,,,,,,,,,,,,,,,,Unknown,Unknown,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,epsd2/admin/oldbab,,,,,,"OB Contracts, pl. D4 no. 24"
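
To see how many rows matched in both catalogues and how many came from only one, `pd.merge` accepts `indicator=True`, which adds a `_merge` column recording each row's origin. The sketch below shows this on two hypothetical toy frames. The real merge above does not use this option, so treat it as a possible addition rather than part of the workflow.

```python
import pandas as pd

# Toy stand-ins for the two shortened catalogues (values are illustrative only).
toy_cdli = pd.DataFrame({"id_text": ["P000001", "P000002"], "genre": ["Lexical", "Legal"]})
toy_oracc = pd.DataFrame({"id_text": ["P000002", "X201001"], "genre": ["Legal Transaction", None]})

# indicator=True adds a '_merge' column saying where each row came from.
toy_meta = pd.merge(toy_cdli, toy_oracc, on="id_text", how="outer", indicator=True)

# 'both' = id_text found in both catalogues; 'left_only' = CDLI only; 'right_only' = ORACC only.
match_counts = toy_meta["_merge"].value_counts()
print(match_counts)

# Keep only the rows matched in both catalogues, e.g. for field-by-field comparison.
matched = toy_meta[toy_meta["_merge"] == "both"].drop(columns="_merge")
print(matched)
```
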


Export the metacatalogue

In [None]:
metacatalogue.to_csv(folder + 'ORACC_DFS/metacatalogue.csv')

##Comparisons between matching CDLI and ORACC headers

One thing we can do is place like columns from each dataframe side by side. Reordering the metacatalogue in this way makes paired columns easier to compare at a glance.


In [50]:
metacatalogue_sorted = metacatalogue[['id_text','composite_id_x', 'composite_id_y', 'language_x', 'language_y', 'material_x', 'material_y', 'object_type_x', 'object_type_y',
    'period_x', 'period_y', 'dates_referenced_x', 'dates_referenced_y', 'genre_x', 'genre_y', 'subgenre_x', 'subgenre_y', 'translation_source_x', 'translation_source_y', 'provenience_x','provenience_y','excavation_no_x','excavation_no_y','findspot_square_x','findspot_square_y',
    'date_of_origin_x', 'date_of_origin_y','museum_no_x','museum_no_y','collection_x','collection_y','dialect', 'date', 'supergenre', 'xproject','q_number', 'ancient_year', 'date_bce', 'cdli_id', 'OBJ_type',
    'designation_x','designation_y','Period_culture', 'Script_type_1', 'Genre', 'Subgenre', 'S_s_genre','Language', 'Full_no', 'Tablet_number', 'Q_no', 'object',
    'project name', 'id_text_int', 'seal_id','cdli_composite_id', 'ancient_date', 'bdtns_id']]

#print(metacatalogue_sorted.shape)
metacatalogue_sorted

Unnamed: 0,id_text,composite_id_x,composite_id_y,language_x,language_y,material_x,material_y,object_type_x,object_type_y,period_x,period_y,dates_referenced_x,dates_referenced_y,genre_x,genre_y,subgenre_x,subgenre_y,translation_source_x,translation_source_y,provenience_x,provenience_y,excavation_no_x,excavation_no_y,findspot_square_x,findspot_square_y,date_of_origin_x,date_of_origin_y,museum_no_x,museum_no_y,collection_x,collection_y,dialect,date,supergenre,xproject,q_number,ancient_year,date_bce,cdli_id,OBJ_type,designation_x,designation_y,Period_culture,Script_type_1,Genre,Subgenre,S_s_genre,Language,Full_no,Tablet_number,Q_no,object,project name,id_text_int,seal_id,cdli_composite_id,ancient_date,bdtns_id
0,P000001,Q000002,,undetermined,undetermined,clay,clay,tablet,tablet,Uruk III (ca. 3200-3000 BC),Uruk III,00.00.00.00,,Lexical,Lexical,Archaic Lu2 A (witness),,no translation,,Uruk (mod. Warka),Warka (Uruk),"W 06435,a",,"M XVIII,?",,00.00.00.00,,VAT 01533,,"Vorderasiatisches Museum, Berlin, Germany","Vorderasiatisches Museum, Berlin, Germany",,,LEX,CDLI,,,,P000001,,"CDLI Lexical 000002, ex. 065","W 06435,a",,,,,,,,,,,armep,,,,,
1,P000002,Q000002,,undetermined,undetermined,clay,clay,tablet,tablet,Uruk III (ca. 3200-3000 BC),Uruk III,00.00.00.00,,Lexical,Lexical,Archaic Lu2 A (witness),,no translation,,Uruk (mod. Warka),Warka (Uruk),"W 06435,b",,"M XVIII,?",,00.00.00.00,,VAT 15263,,"Vorderasiatisches Museum, Berlin, Germany","Vorderasiatisches Museum, Berlin, Germany",,,LEX,CDLI,,,,P000002,,"CDLI Lexical 000002, ex. 066","W 06435,b",,,,,,,,,,,armep,,,,,
2,P000003,,,undetermined,undetermined,clay,clay,tablet,tablet,Uruk IV (ca. 3350-3200 BC),Uruk IV,,,Lexical,Lexical,Archaic Vocabulary (witness),,no translation,,Uruk (mod. Warka),Warka (Uruk),"W 09123,d",,"Qa XVI,2",,,,VAT 15253,,"Vorderasiatisches Museum, Berlin, Germany","Vorderasiatisches Museum, Berlin, Germany",,,LEX,CDLI,,,,P000003,,"ATU 3, pl. 081, W 9123,d","W 09123,d",,,,,,,,,,,armep,,,,,
3,P000004,Q000002,,undetermined,undetermined,clay,clay,tablet,tablet,Uruk IV (ca. 3350-3200 BC),Uruk IV,00.00.00.00,,Lexical,Lexical,Archaic Lu2 A (witness),,no translation,,Uruk (mod. Warka),Warka (Uruk),"W 09169,d",,"Qa XVI,2",,00.00.00.00,,VAT 15168,,"Vorderasiatisches Museum, Berlin, Germany","Vorderasiatisches Museum, Berlin, Germany",,,LEX,CDLI,,,,P000004,,"CDLI Lexical 000002, ex. 051","W 09169,d",,,,,,,,,,,armep,,,,,
4,P000005,Q000002,,undetermined,undetermined,clay,clay,tablet,tablet,Uruk IV (ca. 3350-3200 BC),Uruk IV,00.00.00.00,,Lexical,Lexical,Archaic Lu2 A (witness),,no translation,,Uruk (mod. Warka),Warka (Uruk),"W 09206,k",,"Qa XVI,2",,00.00.00.00,,VAT 15153,,"Vorderasiatisches Museum, Berlin, Germany","Vorderasiatisches Museum, Berlin, Germany",,,LEX,CDLI,,,,P000005,,"CDLI Lexical 000002, ex. 172","W 09206,k",,,,,,,,,,,armep,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
366911,X096677,,,,,,,,,,Unknown,,,,,,,,,,Unknown,,,,,,,,,,,,,,,,,,,,,BM 096677,,,,,,,,,,,epsd2/earlylit,,,,,
366912,X201001,,,,,,,,,,Unknown,,,,,,,,,,Unknown,,,,,,,,,,,,,,,,,,,,,Iraq 82 129,,,,,,,,,,,epsd2/admin/oldbab,,,,,
366913,X201002,,,,,,,,,,Unknown,,,,,,,,,,Unknown,,,,,,,,,,,,,,,,,,,,,Iraq 82 133,,,,,,,,,,,epsd2/admin/oldbab,,,,,
366914,X225104,,,,,,,,,,Unknown,,,,,,,,,,Unknown,,,,,,,,,,,,,,,,,,,,,"OB Contracts, pl. D4 no. 24",,,,,,,,,,,epsd2/admin/oldbab,,,,,
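
The hand-written column list in the cell above could also be derived from the column names. The sketch below is one possible way to do that, assuming paired columns always carry the `_x` / `_y` suffixes; the helper name `side_by_side_columns` and the toy dataframe are illustrative and not part of the original notebook. Unlike the manual list, it keeps every column rather than a chosen subset.

```python
import pandas as pd

def side_by_side_columns(df, key="id_text", suffixes=("_x", "_y")):
    """Return column names with each _x/_y pair adjacent, then the rest."""
    left, right = suffixes
    ordered = [key]
    for col in df.columns:
        # A column is "paired" when both its _x and _y variants exist.
        if col.endswith(left) and col[: -len(left)] + right in df.columns:
            ordered += [col, col[: -len(left)] + right]
    # Unpaired columns keep their original relative order at the end.
    ordered += [c for c in df.columns if c not in ordered]
    return ordered

# Toy frame standing in for the merged metacatalogue (values are illustrative).
toy = pd.DataFrame({
    "language_x": ["Sumerian"], "id_text": ["P000001"], "period_x": ["Ur III"],
    "language_y": ["Sumerian"], "dialect": [None], "period_y": ["Ur III"],
})
toy_sorted = toy[side_by_side_columns(toy)]
print(list(toy_sorted.columns))
# ['id_text', 'language_x', 'language_y', 'period_x', 'period_y', 'dialect']
```

On the real data this would be called as `metacatalogue[side_by_side_columns(metacatalogue)]`.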


In [52]:
# Count rows where the CDLI (_x) and ORACC (_y) values are identical; rows with NaN on either side never count as a match.
for i in ['composite_id', 'language', 'material', 'object_type', 'period','dates_referenced', 'genre', 'subgenre', 'translation_source', 'provenience','excavation_no','findspot_square','date_of_origin','museum_no','collection','designation']:
   print(f"{i} has {(metacatalogue[i+'_x'] == metacatalogue[i+'_y']).sum()} matches.")

composite_id has 0 matches.
language has 118158 matches.
material has 38616 matches.
object_type has 55576 matches.
period has 27 matches.
dates_referenced has 822 matches.
genre has 103568 matches.
subgenre has 13825 matches.
translation_source has 356 matches.
provenience has 1032 matches.
excavation_no has 8008 matches.
findspot_square has 651 matches.
date_of_origin has 8299 matches.
museum_no has 59600 matches.
collection has 38805 matches.
designation has 40564 matches.


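As the counts show, no pair of CDLI and ORACC columns agrees on every row. Because pandas treats a comparison involving a missing value as unequal, rows where either catalogue leaves the field empty are never counted as matches. Differences in formatting also lower the counts: for `period`, CDLI records 'Uruk III (ca. 3200-3000 BC)' where ORACC records 'Uruk III', so only 27 rows match exactly.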
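
To separate "different values" from "value missing on one side", the comparison can be restricted to rows where both catalogues have an entry. The following is a minimal sketch under that assumption; `compare_pair` and its `normalize` option (lowercasing and trimming whitespace) are hypothetical helpers, not part of the notebook, and the toy values only imitate the `period` columns shown above.

```python
import pandas as pd

def compare_pair(df, field, normalize=False):
    """Count agreement between df[field + '_x'] and df[field + '_y']."""
    x, y = df[field + "_x"], df[field + "_y"]
    both = x.notna() & y.notna()  # rows where both catalogues have a value
    if normalize:
        # Assumed normalization: ignore case and surrounding whitespace.
        x = x.astype("string").str.strip().str.lower()
        y = y.astype("string").str.strip().str.lower()
    equal = ((x == y) & both).fillna(False).astype(bool)
    return {
        "field": field,
        "both_present": int(both.sum()),
        "equal": int(equal.sum()),
        "different": int((both & ~equal).sum()),
    }

# Toy data standing in for the metacatalogue.
toy = pd.DataFrame({
    "period_x": ["Uruk III (ca. 3200-3000 BC)", "Ur III", None],
    "period_y": ["Uruk III", "ur iii ", None],
})
raw = compare_pair(toy, "period")
norm = compare_pair(toy, "period", normalize=True)
print(raw)
print(norm)
```

The `different` count points at the rows worth inspecting by hand before deciding which value goes into FactGrid.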

## Subsets based on Object Type

This section will have a heavier emphasis on tablet, envelope and cone objects.

Note that there are good reasons to keep the CDLI and ORACC object type information in separate columns (`object_type_x`, `object_type_y`): doing so preserves how each catalogue described a given text.

**Understanding the Code Below:**

Although there are reasons to keep the object type columns separate, a combined column (`object_type_both`) can be made by joining the string content of the two columns, which makes querying the metacatalogue easier. *The same can be done for other pairs of columns that hold similar information (e.g. period, language, ...).*
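
One caveat when joining string columns with `+` in pandas: if either side is missing (NaN), the result is NaN, so a row with an object type in only one catalogue ends up with no combined value. The sketch below shows one possible NaN-safe variant; the helper name `combine_columns` and the toy rows are assumptions for illustration, not the notebook's own code.

```python
import pandas as pd

def combine_columns(df, field, sep=", "):
    """Join field_x and field_y into one string, tolerating missing values."""
    x = df[field + "_x"].fillna("").astype(str)
    y = df[field + "_y"].fillna("").astype(str)
    # str.strip(sep) removes leftover separator characters (',' and ' ')
    # from the ends when one or both sides were empty.
    return (x + sep + y).str.strip(sep)

# Toy rows: both present, only CDLI, only ORACC, neither.
toy = pd.DataFrame({
    "object_type_x": ["tablet", "tablet", None, None],
    "object_type_y": ["tablet", None, "Envelope - Closed", None],
})
combined = combine_columns(toy, "object_type")
print(combined.tolist())
# ['tablet, tablet', 'tablet', 'Envelope - Closed', '']
```

Using this variant on the real data would change which rows the later subsets include, so the counts reported in this notebook would need to be recomputed.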

To get the subset of the dataframe, we check whether the object type column contains the partial strings 'ablet', 'nvelope', or 'one'. (Dropping the first letter is a shortcut that matches the words tablet, envelope, and cone whether or not they are capitalized.)

As we'll see, the ORACC descriptions are either more specific (e.g. 'Envelope - Closed') or harder to interpret (e.g. 'brick, stone block, tablet, door socket, cone').

One can also query for just tablets, just envelopes, or just cones, as shown in a later code cell.
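
The partial-string shortcut has a side effect worth keeping in mind: 'one' also occurs inside unrelated words such as 'stone'. A stricter alternative, sketched here on toy values rather than the real column, is a case-insensitive match on whole words. The pattern and variable names are illustrative assumptions.

```python
import pandas as pd

# Whole-word pattern; the non-capturing group (?:...) avoids the pandas
# warning about match groups in str.contains.
PATTERN = r"\b(?:tablet|envelope|cone)\b"

# Toy values imitating entries of object_type_both.
toy = pd.Series([
    "tablet, tablet",
    "Envelope - Closed",
    "brick, stone block",   # contains "one" inside "stone"
    "Cone",
    None,
])

loose = toy.str.contains("ablet|nvelope|one", na=False)
strict = toy.str.contains(PATTERN, case=False, na=False)
print(loose.tolist())   # [True, True, True, True, False]
print(strict.tolist())  # [True, True, False, True, False]
```

The whole-word pattern would miss plural forms such as 'tablets'; whether such spellings occur in the catalogues would need to be checked against the output of `unique()` on the object type columns.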

In [None]:
#Run this cell to look at all types of objects presented from each catalogue in the metacatalogue
#print(metacatalogue['object_type_x'].unique())
#print(metacatalogue['object_type_y'].unique())

In [None]:
metacatalogue['object_type_both'] = metacatalogue['object_type_x'] + ', ' + metacatalogue['object_type_y']

In [None]:
subset = metacatalogue.loc[metacatalogue['object_type_both'].str.contains("ablet|nvelope|one", na=False)]
subset

Unnamed: 0,composite_id_x,id_text2,id_text,language_x,object_type_x,period_x,material_x,collection_x,museum_no_x,provenience_x,excavation_no_x,findspot_square_x,date_of_origin_x,dates_referenced_x,genre_x,subgenre_x,translation_source_x,id,language_y,material_y,object_type_y,period_y,provenience_y,excavation_no_y,findspot_square_y,dates_referenced_y,genre_y,subgenre_y,translation_source_y,archive,collection_y,museum-nos,mus_no,museum_no_y,cdli_museum_no,saa_cdli_museum,Mus_no,museum_number,museum,museum_URL,date_of_origin_y,dialect,date,supergenre,xproject,q_number,ancient_year,date_bce,cdli_id,OBJ_type,Period_culture,Script_type_1,Genre,Subgenre,S_s_genre,Language,Full_no,Tablet_number,Q_no,object,composite_id_y,project name,id_text_int,seal_id,cdli_composite_id,ancient_date,bdtns_id,object_type_both
0,Q000002,0.0,P000001,undetermined,tablet,Uruk III (ca. 3200-3000 BC),clay,"Vorderasiatisches Museum, Berlin, Germany",VAT 01533,Uruk (mod. Warka),"W 06435,a","M XVIII,?",00.00.00.00,00.00.00.00,Lexical,Archaic Lu2 A (witness),no translation,,undetermined,clay,tablet,Uruk III,Warka (Uruk),,,,Lexical,,,,"Vorderasiatisches Museum, Berlin, Germany",,,,,,,,,,,,,LEX,CDLI,,,,P000001,,,,,,,,,,,,,armep,,,,,,"tablet, tablet"
1,Q000002,0.0,P000002,undetermined,tablet,Uruk III (ca. 3200-3000 BC),clay,"Vorderasiatisches Museum, Berlin, Germany",VAT 15263,Uruk (mod. Warka),"W 06435,b","M XVIII,?",00.00.00.00,00.00.00.00,Lexical,Archaic Lu2 A (witness),no translation,,undetermined,clay,tablet,Uruk III,Warka (Uruk),,,,Lexical,,,,"Vorderasiatisches Museum, Berlin, Germany",,,,,,,,,,,,,LEX,CDLI,,,,P000002,,,,,,,,,,,,,armep,,,,,,"tablet, tablet"
2,,0.0,P000003,undetermined,tablet,Uruk IV (ca. 3350-3200 BC),clay,"Vorderasiatisches Museum, Berlin, Germany",VAT 15253,Uruk (mod. Warka),"W 09123,d","Qa XVI,2",,,Lexical,Archaic Vocabulary (witness),no translation,,undetermined,clay,tablet,Uruk IV,Warka (Uruk),,,,Lexical,,,,"Vorderasiatisches Museum, Berlin, Germany",,,,,,,,,,,,,LEX,CDLI,,,,P000003,,,,,,,,,,,,,armep,,,,,,"tablet, tablet"
3,Q000002,0.0,P000004,undetermined,tablet,Uruk IV (ca. 3350-3200 BC),clay,"Vorderasiatisches Museum, Berlin, Germany",VAT 15168,Uruk (mod. Warka),"W 09169,d","Qa XVI,2",00.00.00.00,00.00.00.00,Lexical,Archaic Lu2 A (witness),no translation,,undetermined,clay,tablet,Uruk IV,Warka (Uruk),,,,Lexical,,,,"Vorderasiatisches Museum, Berlin, Germany",,,,,,,,,,,,,LEX,CDLI,,,,P000004,,,,,,,,,,,,,armep,,,,,,"tablet, tablet"
4,Q000002,0.0,P000005,undetermined,tablet,Uruk IV (ca. 3350-3200 BC),clay,"Vorderasiatisches Museum, Berlin, Germany",VAT 15153,Uruk (mod. Warka),"W 09206,k","Qa XVI,2",00.00.00.00,00.00.00.00,Lexical,Archaic Lu2 A (witness),no translation,,undetermined,clay,tablet,Uruk IV,Warka (Uruk),,,,Lexical,,,,"Vorderasiatisches Museum, Berlin, Germany",,,,,,,,,,,,,LEX,CDLI,,,,P000005,,,,,,,,,,,,,armep,,,,,,"tablet, tablet"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
352907,,0.0,P532071,Akkadian,tablet,Middle Assyrian (ca. 1400-1000 BC),clay,"National Museum of Syria, Damascus, Syria ?",,Kahat (mod. Tell Barri),K21.E.3386,,,,Administrative,,,,Middle Assyrian,,tablet,Middle Assyrian,Kahat (Mod. Tell Barri),,,,,Note,,,,,,K21.E.3386,,,,,,,,,,,CDLI,,,,,,,,,,,,,,,,,tcma,,,,,,"tablet, tablet"
352908,,0.0,P532072,Akkadian,tablet,Middle Assyrian (ca. 1400-1000 BC),clay,"National Museum of Syria, Damascus, Syria ?",,Kahat (mod. Tell Barri),K9.T1,,,,Letter,,,,Middle Assyrian,,tablet,Middle Assyrian,Kahat (Mod. Tell Barri),,,,,Letter order,,,,,,K9.T1,,,,,,,,,,,CDLI,,,,,,,,,,,,,,,,,tcma,,,,,,"tablet, tablet"
352909,,0.0,P532073,Akkadian,tablet,Middle Assyrian (ca. 1400-1000 BC),clay,"National Museum of Syria, Damascus, Syria ?",,Kahat (mod. Tell Barri),K9.T2,,,,Letter,,,,Middle Assyrian,,tablet,Middle Assyrian,Kahat (Mod. Tell Barri),,,,,Lexical List,,,,,,K9.T2,,,,,,,,,,,CDLI,,,,,,,,,,,,,,,,,tcma,,,,,,"tablet, tablet"
352910,,0.0,P532074,Akkadian,tablet,Middle Assyrian (ca. 1400-1000 BC),clay,"National Museum of Syria, Damascus, Syria ?",,Kahat (mod. Tell Barri),K9.T3,,,,Administrative,,,,Middle Assyrian,,tablet,Middle Assyrian,Kahat (Mod. Tell Barri),,,,,List,,,,,,K9.T3,,,,,,,,,,,CDLI,,,,,,,,,,,,,,,,,tcma,,,,,,"tablet, tablet"


To query the metacatalogue for a specific object type:

```
metacatalogue.loc[metacatalogue['object_type_both'].str.contains('object_of_interest', na=False)]
```



For example, here is a subset of the metacatalogue for objects that are marked as a type of seal. Note that the pattern also matches sealings (e.g. 'sealing', 'Clay sealing').

In [None]:
subset_seal = metacatalogue.loc[metacatalogue['object_type_both'].str.contains('seal|Seal', na=False)]
subset_seal

Unnamed: 0,composite_id_x,id_text2,id_text,language_x,object_type_x,period_x,material_x,collection_x,museum_no_x,provenience_x,excavation_no_x,findspot_square_x,date_of_origin_x,dates_referenced_x,genre_x,subgenre_x,translation_source_x,id,language_y,material_y,object_type_y,period_y,provenience_y,excavation_no_y,findspot_square_y,dates_referenced_y,genre_y,subgenre_y,translation_source_y,archive,collection_y,museum-nos,mus_no,museum_no_y,cdli_museum_no,saa_cdli_museum,Mus_no,museum_number,museum,museum_URL,date_of_origin_y,dialect,date,supergenre,xproject,q_number,ancient_year,date_bce,cdli_id,OBJ_type,Period_culture,Script_type_1,Genre,Subgenre,S_s_genre,Language,Full_no,Tablet_number,Q_no,object,composite_id_y,project name,id_text_int,seal_id,cdli_composite_id,ancient_date,bdtns_id,object_type_both
9024,,0.0,P100295,Sumerian,tablet,Ur III (ca. 2100-2000 BC),clay,"private: Böllinger, Endorf, Germany",Böllinger 6,uncertain (mod. uncertain),,,--.--.00.00,--.--.00.00,Administrative,,no translation,,Sumerian,,Cylinder Seal,Ur III,unknown,,,,Administrative,,,,,,,Boellinger 6,,,,,,,0000 - 00 - 00,,,ELA,CDLI,,,,,,,,,,,,,,,,,babcity,,,,,18056.0,"tablet, Cylinder Seal"
24607,,0.0,P115924,Sumerian,sealing,Ur III (ca. 2100-2000 BC),clay,"Bibliothèque de Versailles, Versailles, France",BV 19,Umma (mod. Tell Jokha),,,00.00.00.00,00.00.00.00,Administrative,,no translation,,Sumerian,,Label,Ur III,Umma,,,,Administrative,,,,,,,BV 19,,,,,,,0000 - 00 - 00,,,ELA,CDLI,,,,,,,,,,,,,,,,,babcity,,,,,23252.0,"sealing, Label"
32007,,0.0,P123204,Sumerian,tablet,Ur III (ca. 2100-2000 BC),clay,"Oriental Institute, University of Chicago, Chi...",OIM A—,Ešnunna (mod. Tell Asmar),TA 1930 0277,,Šulgi.--.00.00,Šulgi.--.00.00,Administrative,,no translation,,Sumerian,,Clay sealing,Ur III,Ešnunna,TA 1930 0277,,,Administrative,,,,,,,,,,,,,,0000 - 00 - 00,,,ELA,CDLI,,,,,,,,,,,,,,,,,babcity,,,,,12191.0,"tablet, Clay sealing"
32008,,0.0,P123205,Sumerian,tablet,Ur III (ca. 2100-2000 BC),clay,"Oriental Institute, University of Chicago, Chi...",OIM A—,Ešnunna (mod. Tell Asmar),TA 1931 0320,,Amar-Sin.--.00.00,Amar-Sin.--.00.00,Administrative,,no translation,,Sumerian,,Clay sealing,Ur III,Ešnunna,TA 1931 0320,,,Royal Inscription,,,,,,,,,,,,,,0000 - 00 - 00,,,LIT,CDLI,,,,,,,,,,,,,,,,,babcity,,,,,12192.0,"tablet, Clay sealing"
32010,,0.0,P123207,Sumerian,tablet,Ur III (ca. 2100-2000 BC),clay,"Oriental Institute, University of Chicago, Chi...",OIM A—,Ešnunna (mod. Tell Asmar),TA 1931 0379,,--.--.00.00,--.--.00.00,Administrative,,no translation,,Sumerian,,Clay sealing,Ur III,Ešnunna,TA 1931 0379,,,Administrative,,,,,,,,,,,,,,0000 - 00 - 00,,,ELA,CDLI,,,,,,,,,,,,,,,,,babcity,,,,,12194.0,"tablet, Clay sealing"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
343013,,0.0,P522007,Sumerian,seal (not impression),Ur III (ca. 2100-2000 BC),stone ?,,,Girsu (mod. Tello),,,,,Administrative,,,,Sumerian,,seal (not impression),Ur III,Girsu,,,,Administrative,,,,,,,,,,,,,,,,,ELA,CDLI,,,,,,,,,,,,,,,,,babcity,,,,,,"seal (not impression), seal (not impression)"
343014,,0.0,P522008,Sumerian,seal (not impression),Ur III (ca. 2100-2000 BC),stone ?,,,Girsu (mod. Tello),,,,,Administrative,,,,Sumerian,,seal (not impression),Ur III,Girsu,,,,Administrative,,,,,,,,,,,,,,,,,ELA,CDLI,,,,,,,,,,,,,,,,,babcity,,,,,,"seal (not impression), seal (not impression)"
343015,,0.0,P522009,Sumerian,seal (not impression),Ur III (ca. 2100-2000 BC),stone ?,,,Girsu (mod. Tello),,,,,Administrative,,,,Sumerian,,seal (not impression),Ur III,Girsu,,,,Administrative,,,,,,,,,,,,,,,,,ELA,CDLI,,,,,,,,,,,,,,,,,babcity,,,,,,"seal (not impression), seal (not impression)"
343016,,0.0,P522010,Sumerian,seal (not impression),Ur III (ca. 2100-2000 BC),stone ?,,,Umma (mod. Tell Jokha),,,,,Administrative,,,,Sumerian,,seal (not impression),Ur III,Umma,,,,Administrative,,,,,,,,,,,,,,,,,ELA,CDLI,,,,,,,,,,,,,,,,,babcity,,,,,,"seal (not impression), seal (not impression)"


#Important Counts & Exporting the Metacatalogue:

Metacatalogue : 366916 rows × 46 columns

Matching IDs : 157512 texts

Tablet Texts : 102035

Envelope Texts : 3798

Cone Texts : 5667

Seal Texts : 6472

In [53]:
#exports metacatalogue to the folder ORACC_DFS
metacatalogue.to_csv(folder + 'ORACC_DFS/metacatalogue.csv')
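
By default `to_csv` also writes the dataframe index, which reappears as an `Unnamed: 0` column when the file is read back with `pd.read_csv`. If that extra column is not wanted, passing `index=False` is one option. The sketch below demonstrates the difference on a toy dataframe written to in-memory buffers instead of the Drive folder.

```python
import io
import pandas as pd

# Toy frame standing in for the metacatalogue.
toy = pd.DataFrame({
    "id_text": ["P000001", "P000002"],
    "object_type_x": ["tablet", "tablet"],
})

with_index = io.StringIO()
toy.to_csv(with_index)                  # default: index written as first column
without_index = io.StringIO()
toy.to_csv(without_index, index=False)  # index omitted

with_index.seek(0)
without_index.seek(0)
cols_with = list(pd.read_csv(with_index).columns)
cols_without = list(pd.read_csv(without_index).columns)
print(cols_with)     # ['Unnamed: 0', 'id_text', 'object_type_x']
print(cols_without)  # ['id_text', 'object_type_x']
```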