# Introduction
This folder works towards creating a 'correct' CSV file of the Fusus al-hikam. It originates from a TSV file created by Dirk Roorda (accessed February 11), which has since been slightly modified. That TSV file is in turn based on the Lakhnawi edition of Ibn Arabi's Fusus al-hikam.

### Aim
My aim is to create a clean, ready-to-use file `fusus.csv` that strictly encapsulates the text of the Fusus al-hikam in machine-readable form. As the text is groomed and cleaned, `fusus.csv` keeps changing, and the code behind each change is not preserved. This is especially true for the columns to the right of `word`. The column `short` is meant to be the cleanest: it has been stripped of diacritics, shaddas, tatwīl, spaces, and punctuation. The columns to its right are annotations on it.
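A minimal sketch of how a `short`-style form could be derived from `word` with only the Python standard library. This is not the code that produced the file (that code is not preserved). It assumes that "diacritics and shaddas" correspond to Unicode non-spacing marks (category Mn), that tatwīl is U+0640, and that punctuation is any Unicode P* character. The helper name `make_short` is hypothetical; it keeps the combining hamza signs (U+0654, U+0655), since forms such as `مشيىٔته` in the file retain them.

```python
import unicodedata

TATWEEL = "\u0640"                  # ARABIC TATWEEL (tatwīl)
KEEP_MARKS = {"\u0654", "\u0655"}   # combining hamza above/below (assumed to be kept)

def make_short(word: str) -> str:
    """Hypothetical helper: strip vowel signs, shadda, tatwīl, spaces and punctuation."""
    kept = []
    for ch in word:
        if ch == TATWEEL or ch.isspace():
            continue
        category = unicodedata.category(ch)
        # Mn = non-spacing marks: harakat, tanwin, shadda, sukun, superscript alef
        if category == "Mn" and ch not in KEEP_MARKS:
            continue
        # P* = all punctuation classes, incl. ، « » [ ] and the ornate parentheses
        if category.startswith("P"):
            continue
        kept.append(ch)
    return "".join(kept)

print(make_short("الحَمْدُ"))  # الحمد
```

Arabic-Indic digits (category Nd) are kept, so a token such as `[٧٢]﴿فَصُّ` would become `٧٢فص`.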

### Credits
Text by Ibn Arabi, copy by Sadr al-Din Qunawi (finished in the year 1233). Edition by Nizam al-Din Ahmad al-Husayni al-Lakhnawi (Beirut: 2013). First conversion by Dirk Roorda (January 2021). Finalization by Cornelis van Lit (February 2021). The work relies mostly on pandas, with grateful use of a list of Arabic characters by Lakhdar Benzahia and Talha Javed Mukhtar for CLTK.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import arabicABC as abc

In [50]:
fusus = pd.read_csv('fusus.csv')
fusus = fusus.fillna('')

In [3]:
fusus.shape

(41532, 17)

Exploring a specific punctuation mark

In [4]:
punctMark = fusus[fusus['haspunct']=='٤١‐']
punctMark

index,page,line,column,span,direction,left,top,right,bottom,word,short,haspunct,punctAfter,punctBefore,QunawiMS,poetryMeter,poetryVerse


Move the punctuation mark from `short` to `punctAfter` and remove it from `haspunct` (the aim is to empty `haspunct` entirely and then delete the column).

In [5]:
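def SetPunctuation(row):
    """Move the mark in `haspunct` from the end of `short` into `punctAfter`.

    `row` is a position within punctMark; the matching row in fusus is
    addressed by its index label.
    """
    position = punctMark.iloc[row]
    punct = position.haspunct
    short = position.short
    # endswith works for marks of any length (short[-2:] only matched 2-character marks)
    if punct and short.endswith(punct):
        label = position.name
        fusus.loc[label, 'punctAfter'] = punct
        # strip only the trailing occurrence; str.replace would remove every occurrence
        fusus.loc[label, 'short'] = short[:-len(punct)]
        fusus.loc[label, 'haspunct'] = ""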

In [6]:
# Uncomment to set function in motion for all cases of the particular punctuation
# for i in range(0,len(punctMark)):
#     SetPunctuation(i)
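As an alternative to looping over `SetPunctuation`, the same move can be expressed with pandas boolean masks, handling every row with a given mark at once. A sketch, assuming `short`, `haspunct` and `punctAfter` hold strings (as they do after `fillna('')`); `move_trailing_punct` is a hypothetical name.

```python
import pandas as pd

def move_trailing_punct(df: pd.DataFrame, mark: str) -> pd.DataFrame:
    """Move `mark` from the end of `short` into `punctAfter` for every row
    whose `haspunct` equals `mark`, then clear `haspunct` for those rows."""
    if not mark:
        return df
    mask = df["haspunct"].eq(mark) & df["short"].str.endswith(mark)
    df.loc[mask, "punctAfter"] = mark
    # slice off exactly the trailing mark, whatever its length
    df.loc[mask, "short"] = df.loc[mask, "short"].str[: -len(mark)]
    df.loc[mask, "haspunct"] = ""
    return df

demo = pd.DataFrame({
    "short":      ["فص:", "حكمة", "كلمة:"],
    "haspunct":   [":",   "",     "«"],
    "punctAfter": ["",    "",     ""],
})
move_trailing_punct(demo, ":")
print(demo)
```

Only the first demo row changes: the third also ends in `:` but its `haspunct` records a different mark, so it is left for manual inspection. Applied to the notebook it would read `move_trailing_punct(fusus, '٤١‐')`.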

Check results and possibly adjust details

In [None]:
# look at one
# look at line
# fusus[(fusus['page']==273) & (fusus.line==8)]
# change one value
# fusus.iloc[8327, fusus.columns.get_loc("poetryVerse")] = "13"
# fusus.iloc[8326, fusus.columns.get_loc("haspunct")] = ""
# fusus.iloc[8326, fusus.columns.get_loc("word")] = ""
# fusus.iloc[8326, fusus.columns.get_loc("short")] = ""

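This code changes title lines into tagged titles: the fass number (e.g. `[٧٢]`) is merged into the title word that follows it. Note that In [121] below was run before In [122]: it defines `p`, `l` and `y` and shows the line before the merge, while the output of In [122] shows it after.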

In [122]:
# Prepend the token at y-1 (the fass number, e.g. "[٧٢]") to the title word at y
for col in ["short", "word"]:
    c = fusus.columns.get_loc(col)
    fusus.iloc[y, c] = fusus.iloc[y-1, c] + fusus.iloc[y, c]
# Blank the token that has now been merged
for col in ["short", "word", "haspunct"]:
    fusus.iloc[y-1, fusus.columns.get_loc(col)] = ""
fusus[(fusus['page']==p) & (fusus.line==l)]

index,page,line,column,span,direction,left,top,right,bottom,word,short,haspunct,punctAfter,punctBefore,QunawiMS,poetryMeter,poetryVerse,fass
38383,377,1,1,1,r,354.0,108.0,376.0,136.0,,,,,,72b,,,
38384,377,1,1,1,r,308.0,108.0,354.0,136.0,[٧٢]﴿فَصُّ,[٧٢]﴿فص,﴿,,,72b,,,
38385,377,1,1,1,r,268.0,108.0,308.0,136.0,حِكْمَةٍ,حكمة,,,,72b,,,
38386,377,1,1,1,r,232.0,108.0,268.0,136.0,فَـرْدِيَّةٍ,فردية,,,,72b,,,
38387,377,1,1,1,r,204.0,108.0,220.0,136.0,فِي,في,,,,72b,,,
38388,377,1,1,1,r,165.0,108.0,204.0,136.0,كَلِمَةٍ,كلمة,,,,72b,,,
38389,377,1,1,1,r,105.0,108.0,165.0,136.0,مُحَمَّدِيَّةٍ﴾,محمدية﴾,﴾,,,72b,,,


In [121]:
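p = 377  # page
l = 1    # line
# label of the second token on that line; the token before it (y-1) is merged into it by In [122]
y = fusus[(fusus['page']==p) & (fusus.line==l)].iloc[0].name+1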
d = 5
fusus[(fusus['page']==p) & (fusus.line==l)]

index,page,line,column,span,direction,left,top,right,bottom,word,short,haspunct,punctAfter,punctBefore,QunawiMS,poetryMeter,poetryVerse,fass
38383,377,1,1,1,r,354.0,108.0,376.0,136.0,[٧٢],[٧٢],[٧٢],,,72b,,,
38384,377,1,1,1,r,308.0,108.0,354.0,136.0,﴿فَصُّ,﴿فص,﴿,,,72b,,,
38385,377,1,1,1,r,268.0,108.0,308.0,136.0,حِكْمَةٍ,حكمة,,,,72b,,,
38386,377,1,1,1,r,232.0,108.0,268.0,136.0,فَـرْدِيَّةٍ,فردية,,,,72b,,,
38387,377,1,1,1,r,204.0,108.0,220.0,136.0,فِي,في,,,,72b,,,
38388,377,1,1,1,r,165.0,108.0,204.0,136.0,كَلِمَةٍ,كلمة,,,,72b,,,
38389,377,1,1,1,r,105.0,108.0,165.0,136.0,مُحَمَّدِيَّةٍ﴾,محمدية﴾,﴾,,,72b,,,


In [128]:
fusus.to_csv('fusus.csv',index=False)

Which punctuation remains? The counts below show it.

In [24]:
fusus.haspunct.value_counts().to_frame().head(60)

index,haspunct
,38970
﴿,487
﴾,372
[:,317
―,300
[,89
«»,72
﴾[:,67
«»،,46
»،,46
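Many of these values are combinations of several marks (for example `«»،` or `﴾[:`). To see which individual characters remain, the combinations can be split and counted character by character. A small standard-library sketch; `char_counts` is a hypothetical helper, and `fusus.haspunct` would be passed in as the iterable.

```python
from collections import Counter
from typing import Iterable

def char_counts(values: Iterable[str]) -> Counter:
    """Count single characters over all (possibly combined) punctuation values."""
    counts: Counter = Counter()
    for value in values:
        counts.update(value)  # a str is iterated character by character
    return counts

print(char_counts(["«»،", "»،", "", "﴾"]).most_common())
```

In the notebook this would be `char_counts(fusus.haspunct).most_common(20)`.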


In [5]:
puncs = fusus[fusus['haspunct']!=''].haspunct.unique().tolist()

In [25]:
fusus[fusus['haspunct']=='١١]']

index,page,line,column,span,direction,left,top,right,bottom,word,short,haspunct,punctAfter,punctBefore,QunawiMS,poetryMeter,poetryVerse
5133,56,8,1,1,r,143.0,289.0,155.0,304.0,١١],١١],١١],,,11b,,
5138,57,1,1,1,r,255.0,91.0,266.0,106.0,١١],١١],١١],,,11b,,
5146,57,2,1,1,r,325.0,117.0,337.0,132.0,١١],١١],١١],,,11b,,
5153,57,2,1,1,r,79.0,117.0,93.0,132.0,١١],١١],١١],,,11b,,
5276,58,4,1,1,r,395.0,169.0,402.0,184.0,١١],١١],١١],,,12a,,
5331,58,10,1,1,r,79.0,325.0,93.0,340.0,١١],١١],١١],,,12a,,
5378,58,16,1,1,r,197.0,481.0,208.0,496.0,١١],١١],١١],,,12a,,
5418,59,3,1,1,r,357.0,143.0,369.0,158.0,١١],١١],١١],,,12a,,
14262,159,4,1,1,r,395.0,169.0,402.0,184.0,١١],١١],١١],,,29a,,
14303,160,1,1,1,r,162.0,91.0,174.0,106.0,١١],١١],١١],,,29a,,


In [6]:
for punc in puncs:
    times = fusus[fusus['haspunct']==punc].shape[0]
    print(punc + " " + str(times))

[ 89
] 12
[٧٢٦ 1
[١]﴿ 1
﴾ 372
:. 8
― 300
﴿ 487
[: 317
٣٢١] 4
﴾، 31
٠٣]. 1
«»، 46
«». 23
«» 72
٥٧] 2
»، 46
﴾.[: 7
١] 4
﴾؛[: 3
[١] 1
[٢] 2
[٣] 1
[٤] 1
[٥] 1
[٦] 1
[٧] 1
[٨] 1
[٩] 1
[٠١] 2
[١١] 2
[٢١] 2
[٣١] 2
[٤١] 2
[٥١] 2
[٦١] 2
[٧١] 2
[٨١] 2
[٩١] 2
[٠٢] 2
[١٢] 2
[٢٢] 2
[٣٢] 2
[٤٢] 2
[٥٢] 2
[٦٢] 2
[٧٢] 2
٠٦] 1
١٣] 2
؟! 1
١٢] 2
:٠٥] 1
– 2
[٣]﴿ 1
٣٥] 5
﴾. 3
٢]: 1
﴾[: 67
١١] 12
﴿﴾ 29
٨]، 1
٩]، 3
٠١] 2
:﴿ 6
٥–٦]. 1
٠١١] 3
٥] 5
﴿﴾[: 7
٧] 1
﴾[ 15
٢١] 2
١٢]. 2
٦١] 3
٧]. 4
٢] 3
٢٢] 1
٨٠١] 2
٥٨] 3
٣٢] 4
٣٣] 4
»؟ 2
٣] 3
٤٣] 2
٤٢] 9
٠٢]. 1
٥٢] 4
٦] 2
«»؛ 3
٩٢] 6
٦٢] 7
﴾.() 1
:٥٥] 1
﴿﴾، 4
٧٢] 8
٨٢] 9
١٩]. 1
﴾:[: 1
٨٨] 1
[٤]﴿ 1
٧٥]. 3
٥٣] 1
٥٣]، 1
١]، 2
:٥]، 1
٨٨]، 1
٣٢١]، 3
٠٦]. 1
٧٥]، 1
٠٣]، 5
٥٧]، 2
؟﴿ 1
٢٠١]. 1
٧٠١]. 1
[٥]﴿ 1
[] 8
]، 1
٧٦] 1
٩٤١]، 1
؟» 2
٢٤] 3
٩٤١]. 3
﴾؛ 1
﴿﴾: 1
»: 1
٤٦١]. 1
.: 2
٤]. 4
[٦]﴿ 1
٢٠١]، 1
٤٠١‐٥٠١]، 1
٣٤]. 1
٦٠١] 1
٨٣]. 1
٩٢]. 3
[٧]﴿ 1
:٠٥]، 5
٨٢–٩٢] 1
[]] 1
٨] 1
٧٤]. 1
٤٥]. 1
[٨]﴿ 1
٢٣١]، 1
٩١]، 1
٧٢]. 2
٧٢]، 1
٩١]؛ 1
٦١]، 3
٩٤١] 1
٢١١]، 1
[٩]﴿ 1
:« 1
٤] 2
٥]. 3
٥]، 

In [7]:
puncsLess = []
for punc in puncs:
    if '[' not in punc and ']' not in punc:
        puncsLess.append(punc)
print(puncsLess)

['﴾', ':.', '―', '﴿', '﴾،', '«»،', '«».', '«»', '»،', '؟!', '–', '﴾.', '﴿﴾', ':﴿', '»؟', '«»؛', '﴾.()', '﴿﴾،', '؟﴿', '؟»', '﴾؛', '﴿﴾:', '»:', '.:', ':«', '«»؟', '٣٥؛', '..', '؛:', '؟».', '٨٨؛', '«»:', '﴿ی', ':﴿﴾', '……………', '٢٢١؛', '—', '٣،', '٩٢،', '٧١،', '٠٣،', '«،', '!»', '﴾؟', '٩٢؛', '…»', '٦٢،', '٨٢،', '٢،']


In [8]:
for punc in puncsLess:
    times = fusus[fusus['haspunct']==punc].shape[0]
    print(punc + " " + str(times))

﴾ 372
:. 8
― 300
﴿ 487
﴾، 31
«»، 46
«». 23
«» 72
»، 46
؟! 1
– 2
﴾. 3
﴿﴾ 29
:﴿ 6
»؟ 2
«»؛ 3
﴾.() 1
﴿﴾، 4
؟﴿ 1
؟» 2
﴾؛ 1
﴿﴾: 1
»: 1
.: 2
:« 1
«»؟ 1
٣٥؛ 4
.. 1
؛: 1
؟». 3
٨٨؛ 1
«»: 1
﴿ی 1
:﴿﴾ 1
…………… 2
٢٢١؛ 2
— 6
٣، 1
٩٢، 1
٧١، 1
٠٣، 1
«، 1
!» 1
﴾؟ 1
٩٢؛ 1
…» 1
٦٢، 1
٨٢، 1
٢، 1
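Some entries in `puncsLess` still contain Arabic-Indic digits (for example `٣٥؛` or `٢،`), so they are presumably numbers whose bracket was not captured rather than punctuation proper. A sketch that separates them with a regular expression over the Arabic-Indic digit range U+0660 to U+0669; `split_marks` is a hypothetical name.

```python
import re
from typing import Iterable, List, Tuple

ARABIC_INDIC_DIGIT = re.compile(r"[\u0660-\u0669]")

def split_marks(marks: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split values into those containing Arabic-Indic digits and the rest."""
    with_digits, pure = [], []
    for mark in marks:
        if ARABIC_INDIC_DIGIT.search(mark):
            with_digits.append(mark)
        else:
            pure.append(mark)
    return with_digits, pure

with_digits, pure = split_marks(["﴾", "٣٥؛", "«»،", "٢،"])
print(with_digits)
print(pure)
```

Called as `split_marks(puncsLess)`, the second list would hold only the marks that are punctuation in the strict sense.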


A few more checks to see what is going on:

In [19]:
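# Note: every string contains the empty pattern, so this is always True.
# To count empty vs. filled cells instead: (fusus.haspunct == '').value_counts()
fusus.haspunct.str.contains(pat='').value_counts()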

True    41532
Name: haspunct, dtype: int64

In [30]:
fusus[fusus.word.str.contains(pat='فَصُّ')]

index,page,line,column,span,direction,left,top,right,bottom,word,short,haspunct,punctAfter,punctBefore,QunawiMS,poetryMeter,poetryVerse
283,14,1,1,1,r,299.0,134.0,357.0,162.0,[١]﴿فَصُّ,[١]﴿فص,[١]﴿,,,2a,,
2319,31,8,1,1,r,358.0,308.0,383.0,329.0,وَفَصُّ,وفص,,,,6a,,
4564,50,5,1,1,r,102.0,191.0,142.0,212.0,التفَصُّيل,التفصيل,,,,10b,,
4653,51,1,1,1,r,293.0,134.0,354.0,162.0,[٣]﴿فَصُّ,[٣]﴿فص,[٣]﴿,,,10b,,
4881,52,18,1,1,r,275.0,529.0,326.0,550.0,التفَصُّيلِ,التفصيل,,,,11a,,
4896,53,1,1,1,r,235.0,87.0,284.0,108.0,التفَصُّيلِ,التفصيل,,,,11a,,
6228,70,1,1,1,r,301.0,134.0,360.0,162.0,[٤]﴿فَصُّ,[٤]﴿فص,[٤]﴿,,,13b,,
7389,83,1,1,1,r,317.0,143.0,377.0,171.0,[٥]﴿فَصُّ,[٥]﴿فص,[٥]﴿,,,16a,,
8199,94,1,1,1,r,319.0,141.0,379.0,169.0,[٦]﴿فَصُّ,[٦]﴿فص,[٦]﴿,,,17b,,
9462,110,1,1,1,r,299.0,134.0,360.0,162.0,[٧]﴿فَصُّ,[٧]﴿فص,[٧]﴿,,,20a,,


In [86]:
lijst = punctMark.index.tolist()

In [64]:
helelijst = []
for l in lijst:
    helelijst.append(l-1)
    helelijst.append(l)

First, the `[zahr -#]` markers, which refer to the folios of MS Qunawi, need to be cleaned up; only then the poetry meter. A meter label can consist of two words (with majzūʾ in front). 29817/304-9 needs to be split. Folio 53 is counted twice (in the MS).
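The folio markers in `word` appear to give the folio number with its digits in reversed order, followed by وجه (recto, `a`) or ظهر (verso, `b`): in the outputs of this notebook, `[٢وجه]` sits on a row tagged `2a` and `[٤٧وجه]` on a row tagged `74a`. Under that reading, a marker can be turned into a `QunawiMS`-style label. A sketch with the hypothetical helper `parse_folio`; irregular markers such as `[٣٥‐أظهر]` (the doubly counted folio 53) return `None` and need manual treatment.

```python
import re
from typing import Optional

# "[" + Arabic-Indic digits + وجه (recto) or ظهر (verso) + "]"
FOLIO_MARKER = re.compile(r"\[([\u0660-\u0669]+)(وجه|ظهر)\]")
SIDE = {"وجه": "a", "ظهر": "b"}

def parse_folio(marker: str) -> Optional[str]:
    """Turn a marker like "[٠١ظهر]" into a label like "10b" (None if irregular)."""
    match = FOLIO_MARKER.fullmatch(marker)
    if match is None:
        return None
    digits, side = match.groups()
    # assumption: the digits were reversed in conversion ("٠١" stands for 10);
    # int() accepts Arabic-Indic digits directly
    number = int(digits[::-1])
    return f"{number}{SIDE[side]}"

print(parse_folio("[٠١ظهر]"))  # 10b
```

The verso markers are split over two tokens in `word` (e.g. `[٠١` and `ظهر]`), so they would have to be joined before parsing.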

In [87]:
for l in lijst:
    fusus.iloc[l, fusus.columns.get_loc("poetryVerse")] = "9"
#     fusus.iloc[l, fusus.columns.get_loc("poetryMeter")] = fusus.iloc[l-1, fusus.columns.get_loc("word")]
    fusus.iloc[l, fusus.columns.get_loc("word")] = fusus.iloc[l, fusus.columns.get_loc("word")][2:]
    fusus.iloc[l, fusus.columns.get_loc("short")] = fusus.iloc[l, fusus.columns.get_loc("short")][2:]
    fusus.iloc[l, fusus.columns.get_loc("haspunct")] = ""
#     fusus.iloc[l-1, fusus.columns.get_loc("word")] = ""
#     fusus.iloc[l-1, fusus.columns.get_loc("short")] = ""
#     fusus.iloc[l-1, fusus.columns.get_loc("haspunct")] = ""

In [65]:
# These are the words right before poetry starts: indicate meter.
fusus.iloc[helelijst].sort_index()

index,page,line,column,span,direction,left,top,right,bottom,word,short,haspunct,punctAfter,punctBefore,QunawiMS,poetryMeter,poetryVerse
226,12,4,2,1,r,154.0,165.0,187.0,186.0,فَعُـــوْا,فعوا,,,,2a,,
227,12,5,1,1,r,345.0,191.0,374.0,212.0,ثُمَّ,ثم,,,,2a,,3.0
1951,28,2,2,1,r,89.0,113.0,113.0,134.0,نَعْــنِي,نعني,,,,5b,,
1952,28,3,1,1,r,353.0,139.0,402.0,160.0,فَالـكُلُّ,فالكل,,,,5b,,3.0
5099,56,4,2,1,r,101.0,174.0,124.0,195.0,سَيِّدًا,سيدا,,,,11b,,
5100,56,5,1,1,r,364.0,200.0,402.0,221.0,فَمَنْ,فمن,,,,11b,,3.0
7132,79,11,2,1,r,85.0,356.0,109.0,377.0,بَصَـرُ,بصر,,,,15b,,
7133,79,12,1,1,r,362.0,382.0,402.0,403.0,جَمِّعْ,جمع,,,,15b,,3.0
8078,91,4,2,1,r,95.0,165.0,129.0,186.0,أَجْحَدُهُ,أجحده,,,,17a,,
8079,91,5,1,1,r,299.0,191.0,366.0,212.0,فَـيَعْـرِفُنِي,فيعرفني,,,,17a,,3.0


In [27]:
# Meter labels such as "[مجـزوء الخفيف]" span two tokens: append the token that follows
# "[مَجْـزُوءُ" to it (word and short with a space, haspunct directly) and blank that next token
majzu = fusus[fusus.word=='[مَجْـزُوءُ'].index.tolist()
for i in majzu:
    fusus.iloc[i, fusus.columns.get_loc("word")] += " " + fusus.iloc[i+1, fusus.columns.get_loc("word")]
    fusus.iloc[i, fusus.columns.get_loc("short")] += " " + fusus.iloc[i+1, fusus.columns.get_loc("short")]
    fusus.iloc[i, fusus.columns.get_loc("haspunct")] += fusus.iloc[i+1, fusus.columns.get_loc("haspunct")]
    fusus.iloc[i+1, fusus.columns.get_loc("word")] = ""
    fusus.iloc[i+1, fusus.columns.get_loc("short")] = ""
    fusus.iloc[i+1, fusus.columns.get_loc("haspunct")] = ""
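This cell and the title merge in In [122] follow the same pattern: join two adjacent tokens and blank one of them. A hypothetical helper `merge_pair` generalizing that pattern with label-based `.loc` access; it assumes the two tokens have consecutive integer labels. Unlike In [122], which left the kept row's `haspunct` as it was, it concatenates `haspunct` the way this cell does. The demo rows are made up for illustration.

```python
import pandas as pd

def merge_pair(df: pd.DataFrame, first: int, keep_first: bool = True,
               sep: str = " ") -> pd.DataFrame:
    """Hypothetical helper: join the tokens at labels `first` and `first + 1`.

    `word` and `short` are concatenated in reading order with `sep`,
    `haspunct` without a separator. The result is stored in the first
    row (keep_first=True) or the second row, and the other row is blanked.
    """
    second = first + 1
    keep, blank = (first, second) if keep_first else (second, first)
    for col in ("word", "short"):
        df.loc[keep, col] = df.loc[first, col] + sep + df.loc[second, col]
    df.loc[keep, "haspunct"] = df.loc[first, "haspunct"] + df.loc[second, "haspunct"]
    for col in ("word", "short", "haspunct"):
        df.loc[blank, col] = ""
    return df

demo = pd.DataFrame({
    "word":     ["[مَجْـزُوءُ", "الخَفِيفِ]", "فَمِنَ"],
    "short":    ["[مجزوء", "الخفيف]", "فمن"],
    "haspunct": ["[", "]", ""],
})
merge_pair(demo, 0)
print(demo)
```

The majzūʾ case corresponds to `merge_pair(fusus, i)`; a title merge would use `keep_first=False, sep=""`.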

In [197]:
allwahj = fusus[(fusus.word.str.contains(pat='ظهر')) & (fusus.word.str.contains(pat=r'\[')) & (fusus.word.str.contains(pat=r'\]'))].index.tolist()

In [211]:
for i in allwahj:
    fusus.iloc[i, fusus.columns.get_loc("haspunct")] = ""

In [61]:
fusus[(fusus.word.str.contains(pat='وجه')) & (fusus.word.str.contains(pat=r'\[')) & (fusus.word.str.contains(pat=r'\]'))]

index,page,line,column,span,direction,left,top,right,bottom,word,short,haspunct,punctAfter,punctBefore,QunawiMS
188,11,4,1,1,r,79.0,165.0,119.0,186.0,[٢وجه],,,,,2a
644,18,11,1,1,r,160.0,347.0,197.0,368.0,[٣وجه],,,,,3a
1149,22,14,1,1,r,311.0,425.0,350.0,446.0,[٤وجه],,,,,4a
1613,25,11,1,1,r,149.0,347.0,186.0,368.0,[٥وجه],,,,,5a
2116,29,6,1,1,r,206.0,217.0,242.0,238.0,[٦وجه],,,,,6a
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
39086,384,6,1,1,r,79.0,217.0,100.0,238.0,[٤٧وجه],,,,,74a
39631,389,9,1,1,r,334.0,302.0,352.0,323.0,[٥٧وجه],,,,,75a
40123,395,4,1,1,r,305.0,165.0,319.0,186.0,[٦٧وجه],,,,,76a
40711,400,7,1,1,r,79.0,243.0,110.0,264.0,[٧٧وجه],,,,,77a


In [31]:
fusus.head()

index,page,line,column,span,direction,left,top,right,bottom,word,short,haspunct,punctAfter,punctBefore,QunawiMS
0,8,2,1,1,r,356.0,197.0,384.0,218.0,الحَمْدُ,الحمد,,,,1b
1,8,2,1,1,r,341.0,197.0,356.0,218.0,لِلهِ,لله,,,,1b
2,8,2,1,1,r,312.0,197.0,341.0,218.0,مُـنَـزِّلِ,منزل,,,,1b
3,8,2,1,1,r,274.0,197.0,312.0,218.0,الحِكَمِ,الحكم,,,,1b
4,8,2,1,1,r,260.0,197.0,274.0,218.0,عَلَىٰ,على,,,,1b


The QunawiMS folio still has to be annotated on the first word following each folio marker, and carried forward from there.

In [9]:
fusus[(fusus.poetryVerse != "")]

index,page,line,column,span,direction,left,top,right,bottom,word,short,haspunct,punctAfter,punctBefore,QunawiMS,poetryMeter,poetryVerse
215,12,3,1,1,r,339.0,139.0,374.0,160.0,فَمِنَ,فمن,,,,2a,[مجـزوء الخفيف],1.0
221,12,4,1,1,r,343.0,165.0,374.0,186.0,فَإِذَا,فإذا,,,,2a,,2.0
227,12,5,1,1,r,345.0,191.0,374.0,212.0,ثُمَّ,ثم,,,,2a,,3.0
233,12,6,1,1,r,346.0,217.0,374.0,238.0,ثُمَّ,ثم,,,,2a,,4.0
240,12,7,1,1,r,338.0,243.0,374.0,264.0,هٰذِهِ,هذه,,,,2a,,5.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
32088,325,5,1,1,r,336.0,209.0,384.0,230.0,مَشِيْىَٔـتُهُ,مشيىٔته,,,,61a,,3.0
32096,326,1,1,1,r,346.0,87.0,384.0,108.0,يُـرِيدُ,يريد,,,,61a,,4.0
32105,326,2,1,1,r,350.0,113.0,384.0,134.0,فَهٰذَا,فهذا,,,,61a,,5.0
38791,382,9,1,1,r,358.0,304.0,397.0,325.0,يَحِـنُّ,يحن,,,,73b,[المُتَقَارِبُ],1.0


In [29]:
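# ffill() only fills NaN, and fillna('') at load time turned every NaN into '';
# so treat empty strings as missing before carrying each folio forward.
fusus.QunawiMS = fusus.QunawiMS.replace('', np.nan).ffill().fillna('')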

In [176]:
wahj = fusus[fusus.word=='ظهر]'].index.tolist()
for i in wahj:
    # each verso marker is split over two tokens, e.g. "[٠١" + "ظهر]"
    print(fusus.word.iloc[i-1] + fusus.word.iloc[i] + " at " + str(fusus.index[i-1]))

[٠١ظهر] at 4446
[١١ظهر] at 4951
[٢١ظهر] at 5462
٧][٣١ظهر] at 6041
[٤١ظهر] at 6565
[٥١ظهر] at 7111
[٦١ظهر] at 7621
[٧١ظهر] at 8117
[٨١ظهر] at 8595
[٩١ظهر] at 9114
[٠٢ظهر] at 9628
[١٢ظهر] at 10156
[٢٢ظهر] at 10610
[٣٢ظهر] at 11164
[٤٢ظهر] at 11702
[٥٢ظهر] at 12239
[٦٢ظهر] at 12826
[٧٢ظهر] at 13333
[٨٢ظهر] at 13923
[٩٢ظهر] at 14440
[٠٣ظهر] at 14967
[١٣ظهر] at 15533
[٢٣ظهر] at 16082
[٣٣ظهر] at 16628
[٤٣ظهر] at 17172
[٥٣ظهر] at 17710
[٦٣ظهر] at 18246
[٧٣ظهر] at 18832
[٨٣ظهر] at 19370
[٩٣ظهر] at 19950
[٠٤ظهر] at 20485
[١٤ظهر] at 21073
[٢٤ظهر] at 21594
[٣٤ظهر] at 22039
[٤٤ظهر] at 22586
[٥٤ظهر] at 23189
[٦٤ظهر] at 23747
[٧٤ظهر] at 24279
[٨٤ظهر] at 24856
[٩٤ظهر] at 25431
[١٥ظهر] at 26554
[٢٥ظهر] at 27067
[٣٥ظهر] at 27564
[٣٥‐أظهر] at 28099
[٤٥ظهر] at 28659
[٥٥ظهر] at 29188
[٦٥ظهر] at 29752
[٧٥ظهر] at 30252
[٨٥ظهر] at 30753
[٩٥ظهر] at 31278
[٠٦ظهر] at 31816
[١٦ظهر] at 32327
[٢٦ظهر] at 32882
[٣٦ظهر] at 33419
[٤٦ظهر] at 33962
[٥٦ظهر] at 34506
[٦٦ظهر] at 35018
[٧٦ظهر] at 35587
[٨٦ظهر] at 36085
[٩٦ظ

In [83]:
fusus[(fusus['page']==47) & (fusus.line==7)]

index,page,line,column,span,direction,left,top,right,bottom,word,short,haspunct,punctAfter,punctBefore,QunawiMS
4189,47,7,1,1,r,388.0,243.0,402.0,264.0,مِنَ,من,,,,9b
4190,47,7,1,1,r,368.0,243.0,388.0,264.0,اللهِ،,الله,,،,,9b
4191,47,7,1,1,r,357.0,243.0,368.0,264.0,لَا,لا,,,,9b
4192,47,7,1,1,r,335.0,243.0,357.0,264.0,مِنْ,من,,,,9b
4193,47,7,1,1,r,312.0,243.0,335.0,264.0,رُوحٍ,روح,,,,9b
4194,47,7,1,1,r,295.0,243.0,312.0,264.0,مِنَ,من,,,,9b
4195,47,7,1,1,r,257.0,243.0,295.0,264.0,الأَرْوَاحِ،,الأرواح,,،,,9b
4196,47,7,1,1,r,238.0,243.0,257.0,264.0,بَلْ,بل,,,,9b
4197,47,7,1,1,r,221.0,243.0,235.0,264.0,مِنْ,من,,,,9b
4198,47,7,1,1,r,193.0,243.0,221.0,264.0,رُوْحِهِ,روحه,,,,9b


In [124]:
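# Caution: this matches any word containing the letters وجه without vowel signs in between,
# not only the [٢وجه]-style folio markers; compare with the In [61] check before dropping.
wajh = fusus[(fusus.word.str.contains(pat='وجه'))].index.tolist()

In [127]:
# drop by label; then reset the index so later iloc look-ups that use index labels stay aligned
fusus = fusus.drop(index=wajh).reset_index(drop=True)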

In [128]:
fusus.shape

(41454, 17)
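Selecting rows by a substring such as `وجه` relies on no other entry in `word` containing those letters without vowel signs in between. A stricter selection, sketched here, requires the whole token to look like a folio marker. The pattern is an assumption based on the markers listed above, `folio_marker_rows` is a hypothetical name, and `Series.str.fullmatch` needs pandas 1.1 or later.

```python
import pandas as pd

# assumed marker shape: "[" + Arabic-Indic digits + وجه or ظهر + "]"
FOLIO_MARKER = r"\[[\u0660-\u0669]+(?:وجه|ظهر)\]"

def folio_marker_rows(df: pd.DataFrame) -> list:
    """Index labels of rows whose whole `word` is a folio marker."""
    is_marker = df["word"].str.fullmatch(FOLIO_MARKER, na=False)
    return df[is_marker].index.tolist()

demo = pd.DataFrame({"word": ["[٢وجه]", "وَجْهُ", "الوجه", "[٠١ظهر]", "[٠١"]})
print(folio_marker_rows(demo))  # [0, 3]
```

Vocalized or unbracketed words are never selected, and neither are the halves of a marker that was split over two tokens (the last demo row), so those would still need the manual joining done earlier.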