# Introduction
This folder works towards creating a 'right' CSV file of the Fusus al-hikam. It originates in a TSV file created by Dirk Roorda (accessed February 11), which has since been slightly modified. That TSV file is in turn based on the Lakhnawi edition of Ibn Arabi's Fusus al-hikam.

### Aim
My aim is to create a clean, ready-to-use file `fusus.csv` that strictly encapsulates the text of the Fusus al-hikam in machine-readable form. As the text is groomed and cleaned, `fusus.csv` keeps changing, and the code used to make those changes is not preserved. This is especially true for the columns to the right of 'word'. The column 'short' is meant to be the cleanest: it has been stripped of diacritics, shaddas, tatwīl, spaces, and punctuation. The columns to its right are annotations on it.
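The aim describes `short` as `word` stripped of diacritics, shaddas, tatwīl, spaces, and punctuation. A minimal sketch of that normalisation follows, assuming the standard Unicode code points for the Arabic harakat and tanwīn (U+064B to U+0652, which include the shadda), the superscript alif (U+0670) and the tatwīl (U+0640), plus a hand-picked punctuation set. `make_short` is a hypothetical helper, not the code that produced the current column, which (as later outputs show) still keeps square brackets and some other marks.

```python
import re
import string

# Assumed character classes: harakat and tanwin U+064B-U+0652 (shadda is U+0651),
# superscript alif U+0670 and tatwil U+0640.
DIACRITICS_AND_TATWIL = re.compile('[\u064B-\u0652\u0670\u0640]')

# Hypothetical punctuation set: ASCII punctuation plus the Arabic comma, semicolon and
# question mark, guillemets, Quranic brackets, ellipsis, hyphen and dashes.
ARABIC_PUNCT = '\u060C\u061B\u061F\u00AB\u00BB\uFD3E\uFD3F\u2026\u2010\u2013\u2014\u2015'
PUNCT_TABLE = str.maketrans('', '', string.punctuation + ARABIC_PUNCT)


def make_short(word: str) -> str:
    """Strip diacritics, tatwil, punctuation and whitespace from a vocalised word."""
    bare = DIACRITICS_AND_TATWIL.sub('', word)
    bare = bare.translate(PUNCT_TABLE)
    return ''.join(bare.split())  # drop any remaining whitespace
```

For example, `make_short('الحَمْدُ')` yields `'الحمد'` and `make_short('اللهِ،')` yields `'الله'`. Letters such as the alif waṣla (ٱ) are left untouched.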

### Credits
Text by Ibn Arabi (finished in the year 1233). Edition by Nizam al-Din Ahmad al-Husayni al-Lakhnawi (Beirut: 2013). First conversion by Dirk Roorda (January 2021). Finalization by Cornelis van Lit (February 2021). Mostly relying on Pandas, with grateful use of a list of Arabic characters by Lakhdar Benzahia and Talha Javed Mukhtar for CLTK.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import arabicABC as abc

In [35]:
fusus = pd.read_csv('fusus.csv')
# Turn missing values into empty strings so that string comparisons work on every cell
fusus = fusus.fillna('')

In [36]:
fusus.head()

Unnamed: 0,page,line,column,span,direction,left,top,right,bottom,word,short,haspunct,punctAfter,punctBefore,QunawiMS
0,8,2,1,1,r,356.0,197.0,384.0,218.0,الحَمْدُ,الحمد,,,,1b
1,8,2,1,1,r,341.0,197.0,356.0,218.0,لِلهِ,لله,,,,1b
2,8,2,1,1,r,312.0,197.0,341.0,218.0,مُـنَـزِّلِ,منزل,,,,1b
3,8,2,1,1,r,274.0,197.0,312.0,218.0,الحِكَمِ,الحكم,,,,1b
4,8,2,1,1,r,260.0,197.0,274.0,218.0,عَلَىٰ,على,,,,1b


Exploring a specific punctuation mark: `١‐`, which marks the first verse of a poem

In [63]:
punctMark = fusus[fusus['haspunct']=='١‐']
punctMark

Unnamed: 0,page,line,column,span,direction,left,top,right,bottom,word,short,haspunct,punctAfter,punctBefore,QunawiMS
216,12,3,1,1,r,339.0,139.0,374.0,160.0,١‐فَمِنَ,١‐فمن,١‐,,,2a
1930,28,1,1,1,r,358.0,87.0,402.0,108.0,١‐فَالكُلُّ,١‐فالكل,١‐,,,5b
5085,56,3,1,1,r,367.0,148.0,402.0,169.0,١‐فَـإِنْ,١‐فإن,١‐,,,11b
7123,79,10,1,1,r,358.0,330.0,402.0,351.0,١‐فَالحَقُّ,١‐فالحق,١‐,,,15b
8086,91,3,1,1,r,295.0,139.0,366.0,160.0,١‐فَـيَحْمَدُنِي,١‐فيحمدني,١‐,,,17a
8178,92,6,1,1,r,306.0,222.0,386.0,243.0,١‐فَنَحْنُ,١‐فنحن,١‐,,,17b
8225,94,3,1,1,r,330.0,206.0,366.0,227.0,١‐فِدَاءُ,١‐فداء,١‐,,,17b
8932,104,8,1,1,r,315.0,285.0,366.0,306.0,١‐فَلِلْوَاحِدِ,١‐فللواحد,١‐,,,19a
9061,106,5,1,1,r,319.0,191.0,366.0,212.0,١‐يَاخَالِقَ,١‐ياخالق,١‐,,,19a
9420,108,16,1,1,r,326.0,477.0,366.0,498.0,١‐فَـوَقْـتًا,١‐فوقتا,١‐,,,20a


Moving punctuation from `short` to `punctAfter` and deleting it from `haspunct` (the aim is to get `haspunct` empty and then delete the column)

In [5]:
def SetPunctuation(row):
    # Take the row-th occurrence of the punctuation mark under study
    position = punctMark.iloc[row]
    punct = position.haspunct
    short = position.short
    label = position.name  # index label of this word in `fusus`
    if punct and short.endswith(punct):
        # Move the trailing mark to `punctAfter`, stripping only that trailing copy
        fusus.loc[label, 'punctAfter'] = punct
        fusus.loc[label, 'short'] = short[:-len(punct)]
        fusus.loc[label, 'haspunct'] = ""

In [6]:
# Uncomment to apply the function to every occurrence of the selected punctuation mark
# for i in range(len(punctMark)):
#     SetPunctuation(i)
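`SetPunctuation` works one row at a time and only on marks at the end of `short`, whereas verse numbers such as `١‐` sit at the start of the word. A vectorised sketch that sends leading marks to `punctBefore` and trailing marks to `punctAfter` could look like the following; `move_punctuation` is a hypothetical helper that returns a modified copy instead of editing `fusus` in place.

```python
import pandas as pd


def move_punctuation(df: pd.DataFrame, punct: str) -> pd.DataFrame:
    """Move `punct` out of `short` into `punctBefore`/`punctAfter` for rows whose
    `haspunct` equals it, then clear `haspunct` for those rows."""
    df = df.copy()
    rows = df['haspunct'] == punct
    leading = rows & df['short'].str.startswith(punct)
    trailing = rows & ~leading & df['short'].str.endswith(punct)
    # Leading marks (for example verse numbers) go to punctBefore
    df.loc[leading, 'punctBefore'] = punct
    df.loc[leading, 'short'] = df.loc[leading, 'short'].str[len(punct):]
    # Trailing marks go to punctAfter
    df.loc[trailing, 'punctAfter'] = punct
    df.loc[trailing, 'short'] = df.loc[trailing, 'short'].str[:-len(punct)]
    # Only rows that were actually cleaned lose their haspunct entry
    df.loc[leading | trailing, 'haspunct'] = ''
    return df
```

Usage would be along the lines of `fusus = move_punctuation(fusus, '١‐')`. Because `startswith` is tested first, a word consisting of nothing but the mark is treated as leading.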

Check results and possibly adjust details

In [76]:
# look at one
# fusus.iloc[26816]
# look at line
# fusus[(fusus['page']==273) & (fusus.line==8)]
# change one value
# fusus.iloc[6041, fusus.columns.get_loc("word")] = '[٣١ظهر]'
fusus[(fusus['page']==47) & (fusus.line==7)]

Unnamed: 0,page,line,column,span,direction,left,top,right,bottom,word,short,haspunct,punctAfter,punctBefore,QunawiMS
4189,47,7,1,1,r,388.0,243.0,402.0,264.0,مِنَ,من,,,,9b
4190,47,7,1,1,r,368.0,243.0,388.0,264.0,اللهِ،,الله,,،,,9b
4191,47,7,1,1,r,357.0,243.0,368.0,264.0,لَا,لا,,,,9b
4192,47,7,1,1,r,335.0,243.0,357.0,264.0,مِنْ,من,,,,9b
4193,47,7,1,1,r,312.0,243.0,335.0,264.0,رُوحٍ,روح,,,,9b
4194,47,7,1,1,r,295.0,243.0,312.0,264.0,مِنَ,من,,,,9b
4195,47,7,1,1,r,257.0,243.0,295.0,264.0,الأَرْوَاحِ،,الأرواح,,،,,9b
4196,47,7,1,1,r,238.0,243.0,257.0,264.0,بَلْ,بل,,,,9b
4197,47,7,1,1,r,221.0,243.0,235.0,264.0,مِنْ,من,,,,9b
4198,47,7,1,1,r,193.0,243.0,221.0,264.0,رُوْحِهِ,روحه,,,,9b


In [55]:
fusus.iloc[6041].word[2:]
# fusus.iloc[6040, fusus.columns.get_loc("word")]

'[٣١ظهر]'

Which punctuation marks remain? The cells below list and count them, leaving out those that contain square brackets.

In [8]:
puncs = fusus[fusus['haspunct']!=''].haspunct.unique().tolist()

In [9]:
puncsLess = []
for punc in puncs:
    if '[' not in punc and ']' not in punc:
        puncsLess.append(punc)
print(puncsLess)

['١‐', '٢‐', '٣‐', '٤‐', '٥‐', '﴾', ':.', '―', '﴿', '﴾،', '«»،', '«».', '«»', '»،', '؟!', '–', '﴾.', '﴿﴾', ':﴿', '»؟', '«»؛', '﴾.()', '﴿﴾،', '؟﴿', '؟»', '﴾؛', '﴿﴾:', '»:', '.:', '٦‐', '٧‐', '٨‐', '٩‐', '٠١‐', '١١‐', '٢١‐', '٣١‐', ':«', '«»؟', '٣٥؛', '..', '؛:', '؟».', '٨٨؛', '«»:', '﴿ی', ':﴿﴾', '……………', '٢٢١؛', '—', '٣،', '٩٢،', '٧١،', '٠٣،', '«،', '!»', '﴾؟', '٩٢؛', '…»', '٦٢،', '٨٢،', '٢،']


In [10]:
for punc in puncsLess:
    times = fusus[fusus['haspunct']==punc].shape[0]
    print(punc + " " + str(times))

١‐ 30
٢‐ 30
٣‐ 23
٤‐ 17
٥‐ 13
﴾ 372
:. 8
― 300
﴿ 487
﴾، 31
«»، 46
«». 23
«» 72
»، 46
؟! 1
– 2
﴾. 3
﴿﴾ 29
:﴿ 6
»؟ 2
«»؛ 3
﴾.() 1
﴿﴾، 4
؟﴿ 1
؟» 2
﴾؛ 1
﴿﴾: 1
»: 1
.: 2
٦‐ 8
٧‐ 4
٨‐ 3
٩‐ 2
٠١‐ 2
١١‐ 2
٢١‐ 1
٣١‐ 1
:« 1
«»؟ 1
٣٥؛ 4
.. 1
؛: 1
؟». 3
٨٨؛ 1
«»: 1
﴿ی 1
:﴿﴾ 1
…………… 2
٢٢١؛ 2
— 6
٣، 1
٩٢، 1
٧١، 1
٠٣، 1
«، 1
!» 1
﴾؟ 1
٩٢؛ 1
…» 1
٦٢، 1
٨٢، 1
٢، 1
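The listing and counting above can also be done in one pass with `value_counts`. The sketch below (`remaining_punctuation` is a hypothetical helper) applies the same filter on square brackets; note that `value_counts` orders the marks by frequency rather than by first appearance.

```python
import pandas as pd


def remaining_punctuation(haspunct: pd.Series) -> pd.Series:
    """Count each non-empty `haspunct` value that contains no square bracket."""
    marks = haspunct[haspunct != '']
    # regex=False treats '[' and ']' as literal characters
    no_brackets = (~marks.str.contains('[', regex=False)
                   & ~marks.str.contains(']', regex=False))
    return marks[no_brackets].value_counts()
```

Applied to the notebook this would be `remaining_punctuation(fusus.haspunct)`.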


Some further checks on what is going on in `haspunct`

In [20]:
# Every string contains the empty pattern, so this counts all rows, not only rows with punctuation
fusus.haspunct.str.contains(pat='').value_counts()

True    41683
Name: haspunct, dtype: int64

In [33]:
fusus[fusus.haspunct.str.contains(pat='١١')]

Unnamed: 0,page,line,column,span,direction,left,top,right,bottom,word,short,haspunct,punctAfter,punctBefore
2200,30,6,1,1,r,360.0,242.0,374.0,263.0,[١١],[١١],[١١],,
4199,47,7,1,1,r,180.0,243.0,193.0,264.0,[١١,[١١,[١١,,
4684,51,4,1,1,r,294.0,235.0,307.0,256.0,[١١,[١١,[١١,,
4951,54,3,1,1,r,79.0,139.0,99.0,160.0,[١١,[١١,[١١,,
5138,56,8,1,1,r,143.0,289.0,155.0,304.0,١١],١١],١١],,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
28155,289,4,1,1,r,82.0,169.0,115.0,184.0,٨١و١٦].,٨١و١٦].,٨١١٦].,,
29325,298,5,1,1,r,275.0,195.0,289.0,210.0,١١].,١١].,١١].,,
30752,314,10,1,1,r,130.0,325.0,142.0,340.0,١١],١١],١١],,
30760,314,11,1,1,r,159.0,351.0,174.0,366.0,١١]،,١١]،,١١]،,,


In [64]:
# Index of the word right before each occurrence of the mark
lijst = punctMark.index.tolist()
lijst = [x-1 for x in lijst]
lijst

[215,
 1929,
 5084,
 7122,
 8085,
 8177,
 8224,
 8931,
 9060,
 9419,
 9959,
 10033,
 10157,
 10273,
 13125,
 13230,
 14434,
 14719,
 15155,
 16671,
 18850,
 18880,
 20420,
 21589,
 22056,
 25417,
 27088,
 29817,
 32186,
 38932]

First I need to clean up the `[zahr -#]` markers, which refer to folios of MS Qunawi, and only then the poetry metre. A metre label can consist of two words (with majzūʾ in front). The entry 29817/304-9 needs to be split. Folio 53 is counted twice in the manuscript.
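As a sketch of that clean-up, a folio marker could be converted into the `QunawiMS` notation. The mapping is inferred from outputs further down in this notebook (for instance `[٤٧وجه]` sits at a word annotated `74a`), so treat it as an assumption: the digits appear in reversed order, وجه (wajh, recto) corresponds to `a` and ظهر (zahr, verso) to `b`. `folio_label` is a hypothetical helper.

```python
import re

# Assumed marker shape: '[' + Arabic-Indic digits (stored reversed) + side + ']'
MARKER = re.compile(r'^\[([\u0660-\u0669]+)(وجه|ظهر)\]$')
SIDE = {'وجه': 'a', 'ظهر': 'b'}  # assumed: recto -> a, verso -> b


def folio_label(marker: str):
    """Turn a marker such as '[٤٧وجه]' into '74a'; return None for anything else."""
    m = MARKER.match(marker)
    if m is None:
        return None  # e.g. the duplicated folio '[٣٥‐أظهر]' or a split marker
    digits, side = m.groups()
    # Reverse the digit order; int() accepts Arabic-Indic digits directly
    return f'{int(digits[::-1])}{SIDE[side]}'
```

The duplicated folio 53 and markers split over two tokens fall outside this pattern and would need separate handling.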

In [65]:
# These are the words right before each poem starts: they indicate the metre.
fusus.loc[lijst]

Unnamed: 0,page,line,column,span,direction,left,top,right,bottom,word,short,haspunct,punctAfter,punctBefore,QunawiMS
215,12,2,1,1,r,323.0,113.0,371.0,134.0,الخفيف],الخفيف],],,,2a
1929,27,14,1,1,r,364.0,425.0,402.0,446.0,[البسيط],[البسيط],[],,,5b
5084,56,2,1,1,r,364.0,122.0,402.0,143.0,[الطويل],[الطويل],[],,,11b
7122,79,9,1,1,r,364.0,295.0,402.0,316.0,[البسيط],[البسيط],[],,,15b
8085,91,2,1,1,r,325.0,113.0,366.0,134.0,الــوَافِــرِ],الوافر],],,,17a
8177,92,5,1,1,r,325.0,196.0,366.0,217.0,الــوَافِــرِ],الوافر],],,,17b
8224,94,2,1,1,r,364.0,175.0,402.0,196.0,[الطويل],[الطويل],[],,,17b
8931,104,7,1,1,r,364.0,250.0,402.0,271.0,[الطويل],[الطويل],[],,,19a
9060,106,4,1,1,r,355.0,165.0,402.0,186.0,[الســريــع],[السريع],[],,,19a
9419,108,15,1,1,r,364.0,451.0,402.0,472.0,[الطويل],[الطويل],[],,,20a
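`lijst` is built by subtracting 1 from index labels, which only points at the preceding word while the labels run without gaps. A sketch that avoids label arithmetic shifts the `word` column by one position instead. `words_before` is a hypothetical helper, and for a two-word metre label (majzūʾ in front) it returns only the second word.

```python
import pandas as pd


def words_before(df: pd.DataFrame, mark: str) -> pd.Series:
    """Return the word immediately preceding each row whose `haspunct` equals `mark`."""
    # shift(1) moves values by position, so this also works after rows are dropped
    previous = df['word'].shift(1)
    return previous[df['haspunct'] == mark]
```

While the index is unbroken, `words_before(fusus, '١‐')` returns the same words as the `word` column of the table above.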


In [197]:
# Verso folio markers (ظهر, zahr): words containing 'ظهر' together with both square brackets
allwahj = fusus[fusus.word.str.contains('ظهر')
                & fusus.word.str.contains('[', regex=False)
                & fusus.word.str.contains(']', regex=False)].index.tolist()

In [211]:
# Clear `haspunct` for all verso folio markers at once
fusus.loc[allwahj, 'haspunct'] = ""

In [61]:
# Recto folio markers (وجه, wajh)
fusus[fusus.word.str.contains('وجه')
      & fusus.word.str.contains('[', regex=False)
      & fusus.word.str.contains(']', regex=False)]

Unnamed: 0,page,line,column,span,direction,left,top,right,bottom,word,short,haspunct,punctAfter,punctBefore,QunawiMS
188,11,4,1,1,r,79.0,165.0,119.0,186.0,[٢وجه],,,,,2a
644,18,11,1,1,r,160.0,347.0,197.0,368.0,[٣وجه],,,,,3a
1149,22,14,1,1,r,311.0,425.0,350.0,446.0,[٤وجه],,,,,4a
1613,25,11,1,1,r,149.0,347.0,186.0,368.0,[٥وجه],,,,,5a
2116,29,6,1,1,r,206.0,217.0,242.0,238.0,[٦وجه],,,,,6a
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
39086,384,6,1,1,r,79.0,217.0,100.0,238.0,[٤٧وجه],,,,,74a
39631,389,9,1,1,r,334.0,302.0,352.0,323.0,[٥٧وجه],,,,,75a
40123,395,4,1,1,r,305.0,165.0,319.0,186.0,[٦٧وجه],,,,,76a
40711,400,7,1,1,r,79.0,243.0,110.0,264.0,[٧٧وجه],,,,,77a


In [31]:
fusus.head()

Unnamed: 0,page,line,column,span,direction,left,top,right,bottom,word,short,haspunct,punctAfter,punctBefore,QunawiMS
0,8,2,1,1,r,356.0,197.0,384.0,218.0,الحَمْدُ,الحمد,,,,1b
1,8,2,1,1,r,341.0,197.0,356.0,218.0,لِلهِ,لله,,,,1b
2,8,2,1,1,r,312.0,197.0,341.0,218.0,مُـنَـزِّلِ,منزل,,,,1b
3,8,2,1,1,r,274.0,197.0,312.0,218.0,الحِكَمِ,الحكم,,,,1b
4,8,2,1,1,r,260.0,197.0,274.0,218.0,عَلَىٰ,على,,,,1b


Still have to get the QunawiMS folio annotated from the first word after each marker onwards

In [30]:
fusus[(fusus.QunawiMS.notna())]

Unnamed: 0,page,line,column,span,direction,left,top,right,bottom,word,short,haspunct,punctAfter,punctBefore,QunawiMS
0,8,2,1,1,r,356.0,197.0,384.0,218.0,الحَمْدُ,الحمد,,,,1b
1,8,2,1,1,r,341.0,197.0,356.0,218.0,لِلهِ,لله,,,,1b
2,8,2,1,1,r,312.0,197.0,341.0,218.0,مُـنَـزِّلِ,منزل,,,,1b
3,8,2,1,1,r,274.0,197.0,312.0,218.0,الحِكَمِ,الحكم,,,,1b
4,8,2,1,1,r,260.0,197.0,274.0,218.0,عَلَىٰ,على,,,,1b
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
41678,410,9,1,1,r,278.0,297.0,294.0,315.0,[٠٣٦,[٠٣٦,[٠٣٦,,,78a
41679,410,9,1,1,r,261.0,297.0,278.0,315.0,هـ],ه],],,,78a
41680,410,9,1,1,r,227.0,297.0,261.0,315.0,وَالحَمْدُ,والحمد,,,,78a
41681,410,9,1,1,r,215.0,297.0,227.0,315.0,لِلهِ,لله,,,,78a


In [29]:
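# ffill only fills NaN; after fillna('') the gaps are '', so turn them back into NaN first
fusus.QunawiMS = fusus.QunawiMS.where(fusus.QunawiMS != '').ffill().fillna('')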
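Because the notebook calls `fillna('')` right after loading, gaps in `QunawiMS` may be empty strings rather than NaN, and `ffill` only fills NaN. The note above about annotating the word that follows each marker also suggests a folio should begin one word after its marker. A sketch covering both points, with `folio_from_next_word` as a hypothetical helper and the one-row shift as an assumption about the intended behaviour:

```python
import pandas as pd


def folio_from_next_word(markers: pd.Series) -> pd.Series:
    """Carry folio labels forward, starting each new folio on the row after its marker.

    `markers` holds a folio label on marker rows and '' everywhere else.
    """
    labels = markers.where(markers != '')  # '' -> NaN so ffill can see the gaps
    # shift(1) moves each label onto the next row; ffill then carries it forward
    return labels.shift(1).ffill().fillna('')
```

With this choice the marker row itself keeps the previous folio, and rows before the first marker stay empty.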

In [176]:
wahj = fusus[fusus.word == 'ظهر]'].index.tolist()
for i in wahj:
    # Rejoin each split verso marker with the token before it and report that token's index
    print(f"{fusus.loc[i-1, 'word']}{fusus.loc[i, 'word']} at {i-1}")

[٠١ظهر] at 4446
[١١ظهر] at 4951
[٢١ظهر] at 5462
٧][٣١ظهر] at 6041
[٤١ظهر] at 6565
[٥١ظهر] at 7111
[٦١ظهر] at 7621
[٧١ظهر] at 8117
[٨١ظهر] at 8595
[٩١ظهر] at 9114
[٠٢ظهر] at 9628
[١٢ظهر] at 10156
[٢٢ظهر] at 10610
[٣٢ظهر] at 11164
[٤٢ظهر] at 11702
[٥٢ظهر] at 12239
[٦٢ظهر] at 12826
[٧٢ظهر] at 13333
[٨٢ظهر] at 13923
[٩٢ظهر] at 14440
[٠٣ظهر] at 14967
[١٣ظهر] at 15533
[٢٣ظهر] at 16082
[٣٣ظهر] at 16628
[٤٣ظهر] at 17172
[٥٣ظهر] at 17710
[٦٣ظهر] at 18246
[٧٣ظهر] at 18832
[٨٣ظهر] at 19370
[٩٣ظهر] at 19950
[٠٤ظهر] at 20485
[١٤ظهر] at 21073
[٢٤ظهر] at 21594
[٣٤ظهر] at 22039
[٤٤ظهر] at 22586
[٥٤ظهر] at 23189
[٦٤ظهر] at 23747
[٧٤ظهر] at 24279
[٨٤ظهر] at 24856
[٩٤ظهر] at 25431
[١٥ظهر] at 26554
[٢٥ظهر] at 27067
[٣٥ظهر] at 27564
[٣٥‐أظهر] at 28099
[٤٥ظهر] at 28659
[٥٥ظهر] at 29188
[٦٥ظهر] at 29752
[٧٥ظهر] at 30252
[٨٥ظهر] at 30753
[٩٥ظهر] at 31278
[٠٦ظهر] at 31816
[١٦ظهر] at 32327
[٢٦ظهر] at 32882
[٣٦ظهر] at 33419
[٤٦ظهر] at 33962
[٥٦ظهر] at 34506
[٦٦ظهر] at 35018
[٧٦ظهر] at 35587
[٨٦ظهر] at 36085
[٩٦ظ
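The loop above prints verso markers that were split over two tokens, a head such as `[٠١` followed by the token `ظهر]`. A sketch that appends such a tail to the preceding token and drops the tail row is given below; `merge_split_markers` is a hypothetical helper that works on a copy and only updates `word`, so `short`, the bounding box and the other columns of the merged row would need similar care. A blind merge would also carry along stray characters, such as the `٧]` at the start of the head at 6041, so such cases still need checking by hand.

```python
import numpy as np
import pandas as pd


def merge_split_markers(df: pd.DataFrame, tail: str = 'ظهر]') -> pd.DataFrame:
    """Append each row whose word equals `tail` to the preceding row, then drop it."""
    df = df.copy()
    positions = np.flatnonzero((df['word'] == tail).to_numpy())
    positions = positions[positions > 0]  # a tail in the first row has nothing to join
    heads = df.index[positions - 1]  # labels of the preceding rows
    tails = df.index[positions]
    df.loc[heads, 'word'] = df.loc[heads, 'word'] + tail
    return df.drop(index=tails)
```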

In [83]:
fusus[(fusus['page']==47) & (fusus.line==7)]

Unnamed: 0,page,line,column,span,direction,left,top,right,bottom,word,short,haspunct,punctAfter,punctBefore,QunawiMS
4189,47,7,1,1,r,388.0,243.0,402.0,264.0,مِنَ,من,,,,9b
4190,47,7,1,1,r,368.0,243.0,388.0,264.0,اللهِ،,الله,,،,,9b
4191,47,7,1,1,r,357.0,243.0,368.0,264.0,لَا,لا,,,,9b
4192,47,7,1,1,r,335.0,243.0,357.0,264.0,مِنْ,من,,,,9b
4193,47,7,1,1,r,312.0,243.0,335.0,264.0,رُوحٍ,روح,,,,9b
4194,47,7,1,1,r,295.0,243.0,312.0,264.0,مِنَ,من,,,,9b
4195,47,7,1,1,r,257.0,243.0,295.0,264.0,الأَرْوَاحِ،,الأرواح,,،,,9b
4196,47,7,1,1,r,238.0,243.0,257.0,264.0,بَلْ,بل,,,,9b
4197,47,7,1,1,r,221.0,243.0,235.0,264.0,مِنْ,من,,,,9b
4198,47,7,1,1,r,193.0,243.0,221.0,264.0,رُوْحِهِ,روحه,,,,9b


In [81]:
fusus[fusus.haspunct.str.contains('[', regex=False)]

Unnamed: 0,page,line,column,span,direction,left,top,right,bottom,word,short,haspunct,punctAfter,punctBefore,QunawiMS
43,9,4,1,1,r,206.0,165.0,219.0,186.0,[وَ,[و,[,,,1b
58,9,5,1,1,r,104.0,191.0,124.0,212.0,[٧٢٦,[٧٢٦,[٧٢٦,,,1b
66,9,6,1,1,r,223.0,217.0,236.0,238.0,[وَ,[و,[,,,1b
214,12,2,1,1,r,371.0,113.0,402.0,134.0,[مجـزوء,[مجزوء,[,,,2a
276,13,5,1,1,r,311.0,207.0,326.0,225.0,[بقية,[بقية,[,,,2a
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
41338,407,8,1,1,r,193.0,269.0,295.0,290.0,﴿بِحَمْدِهِ﴾[سورةالإسراء:,﴿بحمده﴾[سورةالإسراء:,﴿﴾[:,,,78a
41514,409,5,1,1,r,165.0,191.0,266.0,212.0,ٱلسَّبِيْلَ﴾[سورةالأحزاب:,ٱلسبيل﴾[سورةالأحزاب:,﴾[:,,,78a
41562,409,11,1,1,r,292.0,349.0,311.0,367.0,[…],[…],[…],,,78a
41621,410,2,1,1,r,134.0,115.0,190.0,133.0,[؟]صَاحِبُ,[؟]صاحب,[؟],,,78a


In [77]:
empty = fusus[fusus.word==""].index.tolist()

In [79]:
fusus.drop(index=empty, inplace=True)

In [82]:
fusus.shape

(41545, 15)
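After the drop, index labels no longer coincide with row positions, so positional lookups such as `fusus.iloc[6041]` or `fusus.iloc[26816]` that were written with labels in mind may land on a different word. One option, sketched under the assumption that the old labels are not needed afterwards, is to filter with a boolean mask and renumber; `drop_empty_words` is a hypothetical helper.

```python
import pandas as pd


def drop_empty_words(df: pd.DataFrame) -> pd.DataFrame:
    """Remove rows with an empty `word` and renumber the index from 0."""
    # reset_index(drop=True) makes label and position coincide again
    return df[df['word'] != ''].reset_index(drop=True)
```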