### Script purpose: Ming office title coding

1. General principles:
    - A comprehensive ontological structure of office title includes four parts: `Classification + Administrative Unit (optional) + Function (optional) + Title`
    - Each part corresponds to a table.
    - Separate `coding_value` and `raw_value`.
        - `raw_value`: the string appeared in original book text.
        - `coding_value`: the revised string that can be successfully coded.

2. Notes:
    - `Office title by LENGTH` table merges CBDB Ming office title with UCI table. Duplicates in CBDB table are removed in this table, i.e., this is the clean table we are going to use.

In [1]:
% matplotlib inline
import sqlite3
import pandas as pd
import networkx as nx
import xlrd
import matplotlib.pyplot as plt
import math
import warnings
from tqdm import tqdm
import re
warnings.filterwarnings('ignore')
plt.style.use('ggplot')

### `c_office_chn` from UCI.

In [2]:
df_uci_office_ming=pd.read_csv('https://docs.google.com/spreadsheets/d/e/2PACX-1vSCmhbCk1B-9jjINMhy_VwikM6_Sn7bjdO7b_vaZJkVcYCCYlWVlhYVCFtAs0fPX-UEO62GWxaX1qAS/pub?gid=630627340&single=true&output=tsv',
                                    sep='\t')
df_uci_office_ming=df_uci_office_ming[['c_office_id（Dictionary Ser#)','Institution 1', 'Institution 2', 'Institution 3', 'c_office_chn']].rename(columns={'c_office_id（Dictionary Ser#)':'c_office_id'})
df_uci_office_ming['c_office_chn']=[s.replace('/', '') for s in df_uci_office_ming['c_office_chn']]
df_uci_office_ming.sample(3)

Unnamed: 0,c_office_id,Institution 1,Institution 2,Institution 3,c_office_chn
1967,1870,司法監察機構類 Legislation and Censorship,監察門 Censorate,總督巡撫官 Supreme Commanders and Grand Coordinators,陝西巡撫
1242,2007,京衛京營與中央軍事官署類 Central and Capital Militaries,五城兵馬指揮司門 Wardens' Offices of the Five Wards,南城兵馬指揮司 The South Warden's Office,吏目
3653,70706,皇族宮廷類 Imperial Family and Royal Court,宦官門 Eunuch Offices,內官監 The Directorate of Palace Eunuchs,掌印太監


In [3]:
df_uci_office_ming['inst_1_chn']=[str(s).split()[0].replace('nan', '') for s in df_uci_office_ming['Institution 1']]
df_uci_office_ming['inst_2_chn']=[str(s).split()[0].replace('nan', '') for s in df_uci_office_ming['Institution 2']]
df_uci_office_ming['inst_3_chn']=[str(s).split()[0].replace('nan', '') for s in df_uci_office_ming['Institution 3']]
df_uci_office_ming['uci_value']=df_uci_office_ming['inst_1_chn']+df_uci_office_ming['inst_2_chn']+df_uci_office_ming['inst_3_chn']+df_uci_office_ming['c_office_chn']
df_uci_office_ming['c_office_id']=pd.to_numeric(df_uci_office_ming['c_office_id'], errors='coerce')
df_uci_office_ming.drop(['inst_1_chn', 'inst_2_chn', 'inst_3_chn', 'Institution 1', 'Institution 2', 'Institution 3', 'c_office_chn'], axis=1, inplace=True)

In [4]:
df_uci_office_ming[df_uci_office_ming['c_office_id'].duplicated()]

Unnamed: 0,c_office_id,uci_value
1130,71508.0,中央輔佐官署類秘書門翰林院直文淵閣侍講學士
1195,71503.0,中央輔佐官署類考官門會試官知貢舉官
1219,72165.0,中央輔佐官署類考官門鄉試官順天同考官
1282,,京衛京營與中央軍事官署類京營門京營京營總兵官
2314,71504.0,地方官署類省官門行中書省理問所知事
2718,71274.0,地方軍事與治安機構類招討經略安撫使門宣撫司宣撫司經歷
2821,,文武散階勛爵類勛爵門伯平涼伯
2842,,文武散階勛爵類勛爵門伯新城伯
2862,,文武散階勛爵類勛爵門伯永定伯
2882,,文武散階勛爵類勛爵門伯鎮遠伯


In [5]:
df_uci_office_ming.drop(df_uci_office_ming[df_uci_office_ming['c_office_id'].duplicated()].index, inplace=True)
df_uci_office_ming.set_index('c_office_id', inplace=True)
df_uci_office_ming.sample(3)

Unnamed: 0_level_0,uci_value
c_office_id,Unnamed: 1_level_1
541.0,皇族宮廷類宦官門外差宦官御藥房醫官
70211.0,京衛京營與中央軍事官署類大都督府門前軍都督府經歷司經歷
2656.0,地方官署類省官門行中書省右司照磨


### `c_office_chn` from CBDB uncleaned, and merge with UCI.

In [6]:
conn = sqlite3.connect('../../SQL/20170424CBDBauUserSqlite.db')
df_cbdb_office_ming=pd.read_sql_query("SELECT * FROM OFFICE_CODES", conn)[pd.read_sql_query("SELECT * FROM OFFICE_CODES", conn).c_dy==19].set_index('c_office_id')
df_cbdb_office_ming.sample(3)

Unnamed: 0_level_0,tts_sysno,c_dy,c_office_pinyin,c_office_chn,c_office_pinyin_alt,c_office_chn_alt,c_office_trans,c_office_trans_alt,c_source,c_pages,c_notes,c_category_1,c_category_2,c_category_3,c_category_4,c_office_id_old
c_office_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1
71697,18210,19.0,an lu hou,安陸侯,,,[Not Yet Translated],爵,,,,[Not Yet Translated],,,,0
70247,16760,19.0,du li shan chang,督理山廠,du li * shan chang;,督理*山廠;,Supervisory Manager of Coal and Firewood Range,,,,,Supervisory Manager of Coal and Firewood Range,,,,0
71134,17647,19.0,xing bu shan dong si lang zhong,刑部山東司郎中,,,Director of the Shandong Bureau of the Ministr...,,,,,Director of the Shandong Bureau of the Ministr...,,,,0


In [7]:
for index in tqdm(df_uci_office_ming.index):
    if index in df_cbdb_office_ming.index:
        df_uci_office_ming.loc[index, 'cbdb_value']=df_cbdb_office_ming.loc[index, 'c_office_chn']
        df_uci_office_ming.loc[index, 'tts_sysno']=df_cbdb_office_ming.loc[index, 'tts_sysno']
        df_uci_office_ming.loc[index, 'c_office_pinyin']=df_cbdb_office_ming.loc[index, 'c_office_pinyin']
        df_uci_office_ming.loc[index, 'c_office_pinyin_alt']=df_cbdb_office_ming.loc[index, 'c_office_pinyin_alt']
        df_uci_office_ming.loc[index, 'c_office_chn_alt']=df_cbdb_office_ming.loc[index, 'c_office_chn_alt']
        df_uci_office_ming.loc[index, 'c_office_trans']=df_cbdb_office_ming.loc[index, 'c_office_trans']
        df_uci_office_ming.loc[index, 'c_office_trans_alt']=df_cbdb_office_ming.loc[index, 'c_office_trans_alt']
        df_uci_office_ming.loc[index, 'c_source']=df_cbdb_office_ming.loc[index, 'c_source']
        df_uci_office_ming.loc[index, 'c_pages']=df_cbdb_office_ming.loc[index, 'c_pages']
        df_uci_office_ming.loc[index, 'c_notes']=df_cbdb_office_ming.loc[index, 'c_notes']
        df_uci_office_ming.loc[index, 'c_category_1']=df_cbdb_office_ming.loc[index, 'c_category_1']
        df_uci_office_ming.loc[index, 'c_category_2']=df_cbdb_office_ming.loc[index, 'c_category_2']
        df_uci_office_ming.loc[index, 'c_category_3']=df_cbdb_office_ming.loc[index, 'c_category_3']
        df_uci_office_ming.loc[index, 'c_category_4']=df_cbdb_office_ming.loc[index, 'c_category_4']
        df_uci_office_ming.loc[index, 'c_office_id_old']=df_cbdb_office_ming.loc[index, 'c_office_id_old']
df_uci_office_ming.loc[index, 'c_dy']=19

100%|██████████| 4304/4304 [00:30<00:00, 139.91it/s]


In [8]:
df_office_ming_merged=df_uci_office_ming
df_office_ming_merged.sample(3)

Unnamed: 0_level_0,uci_value,cbdb_value,tts_sysno,c_office_pinyin,c_office_pinyin_alt,c_office_chn_alt,c_office_trans,c_office_trans_alt,c_source,c_pages,c_notes,c_category_1,c_category_2,c_category_3,c_category_4,c_office_id_old,c_dy
c_office_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1
71215.0,中央輔佐官署類六科門工科行在工科掌科給事中,行在工科掌科給事中,17728.0,xing zai gong ke zhang ke ji shi zhong,,,Auxiliary Supervising Secretary of the Office ...,,,,,Auxiliary Supervising Secretary of the Office ...,,,,0.0,
294.0,皇族宮廷類宦官門尚膳監僉書太監,,,,,,,,,,,,,,,,
256.0,皇族宮廷類宦官門司設監右少監,,,,,,,,,,,,,,,,


### Coding `c_office_chn`.

### TODO:
    - [×] Subtract titles from right.
    - [×] Add appointment type.
    - [×] Use online revised CLS table.

In [9]:
df_adm=pd.read_csv('../data_output/C_OT_ADM.tsv', sep='\t').set_index('c_ot_adm_id')
df_cls=pd.read_csv('../data_output/C_OT_CLS.tsv', sep='\t').set_index('c_ot_cls_id')
df_tit=pd.read_csv('../data_output/C_OT_TIT.tsv', sep='\t').set_index('c_ot_tit_id')
df_func=pd.read_csv('../data_output/C_OT_FUNC.tsv', sep='\t').set_index('c_ot_func_id')
df_app_ty=pd.read_csv('../data_output/APPOINTMENT_TYPE_CODES.tsv', sep='\t').set_index('c_appt_type_code')

In [10]:
df_tit.sample(3)

Unnamed: 0_level_0,c_ot_tit_chinm,value_to_run,c_ot_tit_desc,c_ot_tit_start,c_ot_tit_end,length
c_ot_tit_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
1120,瀋王,2.0,,,,2
1411,貼刑,1.0,,,,2
98,右宗正,1.0,,,,3


In [11]:
df_office_ming_merged['c_ot_coding']=df_office_ming_merged['uci_value']

In [12]:
# Replace titles (only one title in an office title string).
for ming_ot_index in tqdm(df_office_ming_merged.index):
    ming_ot = df_office_ming_merged.loc[ming_ot_index, 'c_ot_coding']
    ming_ot_done=[]
    for tit_index in df_tit.index:
        tit=df_tit.loc[tit_index, 'c_ot_tit_chinm']
        if ming_ot.endswith(tit) and ming_ot not in ming_ot_done:
            df_office_ming_merged.loc[ming_ot_index, 'c_ot_tit_chinm']=tit
            df_office_ming_merged.loc[ming_ot_index, 'c_ot_coding']=ming_ot.split(tit)[0]+'T'+str(tit_index)
            ming_ot_done.append(ming_ot)
df_office_ming_merged.sample(3)

100%|██████████| 4304/4304 [02:18<00:00, 30.97it/s]


Unnamed: 0_level_0,uci_value,cbdb_value,tts_sysno,c_office_pinyin,c_office_pinyin_alt,c_office_chn_alt,c_office_trans,c_office_trans_alt,c_source,c_pages,c_notes,c_category_1,c_category_2,c_category_3,c_category_4,c_office_id_old,c_dy,c_ot_coding,c_ot_tit_chinm
c_office_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1
71232.0,中央中樞官署類六部門禮部行在禮部尚書,行在禮部尚書,17745.0,xing zai li bu shang shu,,,Auxiliary Minister of Rites,,,,,Auxiliary Minister of Rites,,,,0.0,,中央中樞官署類六部門禮部行在禮部T1090,尚書
70212.0,南京官署類南京軍事官署門南京五軍都督府右軍都督府都督僉事,都督府僉事,16725.0,du du fu qian shi,,,Assistant in a Chief Military Commission,,,,,Assistant in a Chief Military Commission,,,,0.0,,南京官署類南京軍事官署門南京五軍都督府右軍都督府都督T994,僉事
1060.0,中央中樞官署類六部門戶部在京行用庫典史,,,,,,,,,,,,,,,,,中央中樞官署類六部門戶部在京行用庫T798,典史


In [13]:
# Replace Classifications (can be multiple units in an office title string).
for ming_ot_index in tqdm(df_office_ming_merged.index):
    cls_list=[]
    for cls_index in df_cls.index:
        cls=df_cls.loc[cls_index, 'c_ot_cls_chinm']
        c_ot_coding = df_office_ming_merged.loc[ming_ot_index, 'c_ot_coding']
        if cls in c_ot_coding:
            cls_list.append(cls)
            df_office_ming_merged.loc[ming_ot_index, 'c_ot_coding']=c_ot_coding.replace(cls, 'C'+str(cls_index))
    if cls_list!=[]:
        df_office_ming_merged.loc[ming_ot_index, 'c_ot_cls_chinm']='#'.join(cls_list)
df_office_ming_merged.sample(3)

100%|██████████| 4304/4304 [00:32<00:00, 133.40it/s]


Unnamed: 0_level_0,uci_value,cbdb_value,tts_sysno,c_office_pinyin,c_office_pinyin_alt,c_office_chn_alt,c_office_trans,c_office_trans_alt,c_source,c_pages,c_notes,c_category_1,c_category_2,c_category_3,c_category_4,c_office_id_old,c_dy,c_ot_coding,c_ot_tit_chinm,c_ot_cls_chinm
c_office_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1
71417.0,中央輔佐官署類秘書門典籍實錄修纂玉牒館副總裁,玉牒館副總裁,17930.0,yu die guan fu zong cai,,,Vice Director-general of the Imperial Genealog...,,,,,Vice Director-general of the Imperial Genealog...,,,,0.0,,C8C48典籍實錄修纂玉牒館T197,副總裁,中央輔佐官署類#秘書門
410.0,皇族宮廷類宦官門兵仗局大使,,,,,,,,,,,,,,,,,C23C47兵仗局T1243,大使,皇族宮廷類#宦官門
1567.0,中央輔佐官署類寺監門光祿司卿,,,,,,,,,,,,,,,,,C8C44光祿司卿,,中央輔佐官署類#寺監門


In [14]:
# Replace admin units (can be multiple units in an office title string).
for ming_ot_index in tqdm(df_office_ming_merged.index):
    adm_list=[]
    for adm_index in df_adm.index:
        adm=df_adm.loc[adm_index, 'c_ot_adm_chinm']
        c_ot_coding = df_office_ming_merged.loc[ming_ot_index, 'c_ot_coding']
        if adm in c_ot_coding:
            adm_list.append(adm)
            df_office_ming_merged.loc[ming_ot_index, 'c_ot_coding']=c_ot_coding.replace(adm, 'A'+str(adm_index))
    if adm_list!=[]:
        df_office_ming_merged.loc[ming_ot_index, 'c_ot_adm_chinm']='#'.join(adm_list)
df_office_ming_merged.sample(3)

100%|██████████| 4304/4304 [01:54<00:00, 37.71it/s]


Unnamed: 0_level_0,uci_value,cbdb_value,tts_sysno,c_office_pinyin,c_office_pinyin_alt,c_office_chn_alt,c_office_trans,c_office_trans_alt,c_source,c_pages,...,c_category_1,c_category_2,c_category_3,c_category_4,c_office_id_old,c_dy,c_ot_coding,c_ot_tit_chinm,c_ot_cls_chinm,c_ot_adm_chinm
c_office_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
3062.0,地方軍事與治安機構類番夷都指揮使司門番夷衞指揮使司小旗,,,,,,,,,,...,,,,,,,C1C5番夷衞指揮使司小旗,,地方軍事與治安機構類#番夷都指揮使司門,
71851.0,文武散階勛爵類勛爵門侯吉安侯,吉安侯,18364.0,ji an hou,,,[Not Yet Translated],爵,,,...,[Not Yet Translated],,,,0.0,,C6C41C71T383,吉安侯,文武散階勛爵類#勛爵門#侯,
1030.0,中央中樞官署類六部門戶部御馬倉副使,,,,,,,,,,...,,,,,,,C11C45A694A542T1047,副使,中央中樞官署類#六部門,御馬倉#戶部


In [15]:
# Replace functional units (can be multiple units in an office title string).
for ming_ot_index in tqdm(df_office_ming_merged.index):
    func_list=[]
    for func_index in df_func.index:
        func=df_func.loc[func_index, 'c_ot_func_chinm']
        c_ot_coding = df_office_ming_merged.loc[ming_ot_index, 'c_ot_coding']
        if func in c_ot_coding:
            func_list.append(func)
            df_office_ming_merged.loc[ming_ot_index, 'c_ot_coding']=c_ot_coding.replace(func, 'F'+str(func_index))
    if func_list!=[]:
        df_office_ming_merged.loc[ming_ot_index, 'c_ot_func_chinm']='#'.join(func_list)
df_office_ming_merged.sample(3)

100%|██████████| 4304/4304 [00:13<00:00, 308.24it/s]


Unnamed: 0_level_0,uci_value,cbdb_value,tts_sysno,c_office_pinyin,c_office_pinyin_alt,c_office_chn_alt,c_office_trans,c_office_trans_alt,c_source,c_pages,...,c_category_2,c_category_3,c_category_4,c_office_id_old,c_dy,c_ot_coding,c_ot_tit_chinm,c_ot_cls_chinm,c_ot_adm_chinm,c_ot_func_chinm
c_office_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2855.0,地方軍事與治安機構類地區軍官門保定軍官鎮守總兵官,,,,,,,,,,...,,,,,,C1C14C84鎮守T258,總兵官,地方軍事與治安機構類#地區軍官門#保定軍官,,
1013.0,中央中樞官署類六部門戶部丙字庫大使,,,,,,,,,,...,,,,,,C11C45A694A305T1243,大使,中央中樞官署類#六部門,丙字庫#戶部,
71914.0,文武散階勛爵類勛爵門王寧河王,寧河王,18427.0,ning he wang,,,[Not Yet Translated],爵,,,...,,,,0.0,,C6C41王T717,寧河王,文武散階勛爵類#勛爵門,,


In [16]:
# Replace appointment type.
for ming_ot_index in tqdm(df_office_ming_merged.index):
    app_ty_list=[]
    for app_ty_index in df_app_ty.index:
        app_ty=df_app_ty.loc[app_ty_index, 'c_appt_type_desc_chn']
        c_ot_coding = df_office_ming_merged.loc[ming_ot_index, 'c_ot_coding']
        if app_ty in c_ot_coding:
            app_ty_list.append(app_ty)
            df_office_ming_merged.loc[ming_ot_index, 'c_ot_coding']=c_ot_coding.replace(app_ty, 'P'+str(app_ty_index))
    if app_ty_list!=[]:
        df_office_ming_merged.loc[ming_ot_index, 'c_ot_app_chinm']='#'.join(app_ty_list)
df_office_ming_merged.sample(3)

100%|██████████| 4304/4304 [00:16<00:00, 257.65it/s]


Unnamed: 0_level_0,uci_value,cbdb_value,tts_sysno,c_office_pinyin,c_office_pinyin_alt,c_office_chn_alt,c_office_trans,c_office_trans_alt,c_source,c_pages,...,c_category_3,c_category_4,c_office_id_old,c_dy,c_ot_coding,c_ot_tit_chinm,c_ot_cls_chinm,c_ot_adm_chinm,c_ot_func_chinm,c_ot_app_chinm
c_office_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1931.0,京衛京營與中央軍事官署類大都督府門中軍都督府/左斷事,,,,,,,,,,...,,,,,C0C20A1097/T133,左斷事,京衛京營與中央軍事官署類#大都督府門,中軍都督府,,
71982.0,文武散階勛爵類勛爵門侯太平侯,太平侯,18495.0,tai ping hou,,,[Not Yet Translated],爵,,,...,,,0.0,,C6C41C71T459,太平侯,文武散階勛爵類#勛爵門#侯,,,
70959.0,中央輔佐官署類秘書門通政使司通政使,通政使,17472.0,tong zheng shi,,,Commissioner of the Office of Transmission,,,,...,,,0.0,,C8C48T479,通政使,中央輔佐官署類#秘書門,,,


In [17]:
for index in tqdm(df_office_ming_merged.index):
    c_ot_coding=df_office_ming_merged.loc[index, 'c_ot_coding']
    if re.sub(r'A|C|T|F|\d', '', string=c_ot_coding)!='':
        df_office_ming_merged.loc[index, 'pass']='F'
    else:
        df_office_ming_merged.loc[index, 'pass']='T'

100%|██████████| 4304/4304 [00:04<00:00, 1054.08it/s]


In [18]:
df_office_ming_merged.to_excel('../data_output/ming_office_title_merged_coding.xlsx', encoding='utf8')