# Introduction
This folder works towards a 'right' CSV file of the Fusus al-hikam. Its origin is a TSV file created by Dirk Roorda (accessed February 11), which has since seen slight changes. That TSV file is in turn based on the Lakhnawi edition of Ibn Arabi's Fusus al-hikam.

### Aim
My aim is to create a clean and ready file `fusus.csv`. It is to strictly encapsulate the text of Fusus al-hikam in machine-readable form. As the text is groomed and cleaned, `fusus.csv` keeps changing, and the code behind each change is not preserved. This is especially true for the columns to the right of `word`. The column `short` is meant to be the cleanest: it has been stripped of diacritics, shaddas, tatwīl, spaces, and punctuation. The columns to its right are annotations on it.
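As a rough illustration of how a `short` form could be derived from `word`, the sketch below drops combining marks, tatwīl, whitespace, and punctuation by looking at Unicode categories. It is not the code that produced `fusus.csv`; `to_short` and `TATWEEL` are hypothetical names introduced here.

```python
import unicodedata

TATWEEL = "\u0640"  # Arabic tatwil (kashida); category Lm, so it needs its own check


def to_short(word: str) -> str:
    """Hypothetical helper: reduce a vocalised token to its bare letters."""
    kept = []
    for ch in word:
        category = unicodedata.category(ch)
        if category.startswith("M"):   # combining marks: harakat, shadda, dagger alif
            continue
        if category.startswith("P"):   # punctuation, Arabic and Latin alike
            continue
        if ch.isspace() or ch == TATWEEL:
            continue
        kept.append(ch)
    return "".join(kept)


# The first word of the text, written with escapes to keep the code points explicit
sample = "\u0627\u0644\u062d\u064e\u0645\u0652\u062f\u064f"
print(to_short(sample))
```

Working from Unicode categories keeps the rule short, but it removes every punctuation character, so it only suits a column that should hold bare letters.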

### Credits
Text by Ibn Arabi, copy by Sadr al-Din Qunawi (finished in the year 1233). Edition by Nizam al-Din Ahmad al-Husayni al-Lakhnawi (Beirut: 2013). First conversion by Dirk Roorda (January 2021). Finalization by Cornelis van Lit (February 2021). Mostly relying on Pandas, with grateful use of a list of Arabic characters by Lakhdar Benzahia and Talha Javed Mukhtar for CLTK.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import arabicABC as abc

In [30]:
fusus = pd.read_csv('fusus.csv')
fusus = fusus.fillna('')

In [31]:
fusus.shape

(41532, 15)

Exploring a specific punctuation mark

In [108]:
punctMark = fusus[fusus['haspunct']=='٤١‐']
punctMark

Unnamed: 0,page,line,column,span,direction,left,top,right,bottom,word,short,haspunct,punctAfter,punctBefore,QunawiMS,poetryMeter,poetryVerse


Moving punctuation from `short` to `punctAfter` and deleting it from `haspunct` (the aim is to get `haspunct` empty and then delete the column)

In [5]:
def SetPunctuation(row):
    # Look up one row of the current selection; .name is its index label in fusus
    position = punctMark.iloc[row]
    punct = position.haspunct
    short = position.short
    # Only act when the mark sits at the very end of `short`, whatever its length
    if short.endswith(punct):
        fusus.loc[position.name, 'punctAfter'] = punct
        fusus.loc[position.name, 'short'] = short[:-len(punct)]
        fusus.loc[position.name, 'haspunct'] = ""

In [6]:
# Uncomment to set function in motion for all cases of the particular punctuation
# for i in range(0,len(punctMark)):
#     SetPunctuation(i)
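Before running the loop on the full file, the same move can be tried on a toy frame. The sketch below is a vectorised variant under the assumption that only a mark at the very end of `short` should move; `move_trailing_punct` and the toy rows are hypothetical and not part of the notebook.

```python
import pandas as pd

COMMA = "\u060c"  # Arabic comma


def move_trailing_punct(df: pd.DataFrame, punct: str) -> pd.DataFrame:
    """Hypothetical helper: move `punct` from the end of `short` into `punctAfter`."""
    out = df.copy()
    # Rows flagged with exactly this mark whose `short` really ends with it
    mask = (out["haspunct"] == punct) & out["short"].str.endswith(punct)
    out.loc[mask, "short"] = out.loc[mask, "short"].str[: -len(punct)]
    out.loc[mask, "punctAfter"] = punct
    out.loc[mask, "haspunct"] = ""
    return out


toy = pd.DataFrame({
    "short": ["\u0627\u0644\u0644\u0647" + COMMA,   # mark at the end: should move
              "\u0631\u0648\u062d",                 # no mark: untouched
              COMMA + "\u0628\u0644"],              # mark at the start: left for manual review
    "haspunct": [COMMA, "", COMMA],
    "punctAfter": ["", "", ""],
})
result = move_trailing_punct(toy, COMMA)
print(result)
```

Rows where the mark is not at the end keep their `haspunct` value, so they remain visible in the later counts of what is left to clean.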

Check results and possibly adjust details

In [133]:
# look at one
# fusus.iloc[26816]
# look at line
# fusus[(fusus['page']==273) & (fusus.line==8)]
# change one value

fusus.iloc[6032, fusus.columns.get_loc("word")] += fusus.iloc[6033, fusus.columns.get_loc("word")][:2]
fusus.iloc[6033, fusus.columns.get_loc("word")] = fusus.iloc[6033, fusus.columns.get_loc("word")][2:]

# fusus.iloc[8327, fusus.columns.get_loc("poetryVerse")] = "13"
# fusus.iloc[8326, fusus.columns.get_loc("haspunct")] = ""
# fusus.iloc[8326, fusus.columns.get_loc("word")] = ""
# fusus.iloc[8326, fusus.columns.get_loc("short")] = ""
fusus[(fusus['page']==66) & (fusus.line==7)]

Unnamed: 0,page,line,column,span,direction,left,top,right,bottom,word,short,haspunct,punctAfter,punctBefore,QunawiMS,poetryMeter,poetryVerse
6029,66,7,1,1,r,345.0,243.0,385.0,264.0,وَ﴿جَعَلُوا,و﴿جعلوا,﴿,,,13a,,
6030,66,7,1,1,r,296.0,243.0,345.0,264.0,أَصَـٰبِعَهُمْ,أصبعهم,,,,13a,,
6031,66,7,1,1,r,278.0,243.0,296.0,264.0,فِىٓ,فى,,,,13a,,
6032,66,7,1,1,r,188.0,243.0,278.0,264.0,ءَاذَانِهِمْ﴾.[سورةنوح:,ءاذانهم﴾.[سورةنوح:,﴾.[:,,,13a,,
6033,66,7,1,1,r,142.0,243.0,170.0,264.0,٧][٣١ظهر],,,,,13b,,
6034,66,7,1,1,r,79.0,243.0,113.0,264.0,طَـلَـبًا,طلبا,,,,13b,,


In [134]:
fusus.iloc[6033, fusus.columns.get_loc("word")][2:]

'ا،'

Which punctuation remains? We find out below

In [8]:
puncs = fusus[fusus['haspunct']!=''].haspunct.unique().tolist()

In [9]:
# Keep only the marks that contain no square brackets
puncsLess = [punc for punc in puncs if '[' not in punc and ']' not in punc]
print(puncsLess)

['١‐', '٢‐', '٣‐', '٤‐', '٥‐', '﴾', ':.', '―', '﴿', '﴾،', '«»،', '«».', '«»', '»،', '؟!', '–', '﴾.', '﴿﴾', ':﴿', '»؟', '«»؛', '﴾.()', '﴿﴾،', '؟﴿', '؟»', '﴾؛', '﴿﴾:', '»:', '.:', '٦‐', '٧‐', '٨‐', '٩‐', '٠١‐', '١١‐', '٢١‐', '٣١‐', ':«', '«»؟', '٣٥؛', '..', '؛:', '؟».', '٨٨؛', '«»:', '﴿ی', ':﴿﴾', '……………', '٢٢١؛', '—', '٣،', '٩٢،', '٧١،', '٠٣،', '«،', '!»', '﴾؟', '٩٢؛', '…»', '٦٢،', '٨٢،', '٢،']


In [10]:
for punc in puncsLess:
    times = fusus[fusus['haspunct']==punc].shape[0]
    print(punc + " " + str(times))

١‐ 30
٢‐ 30
٣‐ 23
٤‐ 17
٥‐ 13
﴾ 372
:. 8
― 300
﴿ 487
﴾، 31
«»، 46
«». 23
«» 72
»، 46
؟! 1
– 2
﴾. 3
﴿﴾ 29
:﴿ 6
»؟ 2
«»؛ 3
﴾.() 1
﴿﴾، 4
؟﴿ 1
؟» 2
﴾؛ 1
﴿﴾: 1
»: 1
.: 2
٦‐ 8
٧‐ 4
٨‐ 3
٩‐ 2
٠١‐ 2
١١‐ 2
٢١‐ 1
٣١‐ 1
:« 1
«»؟ 1
٣٥؛ 4
.. 1
؛: 1
؟». 3
٨٨؛ 1
«»: 1
﴿ی 1
:﴿﴾ 1
…………… 2
٢٢١؛ 2
— 6
٣، 1
٩٢، 1
٧١، 1
٠٣، 1
«، 1
!» 1
﴾؟ 1
٩٢؛ 1
…» 1
٦٢، 1
٨٢، 1
٢، 1
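The tally above can also be produced in one pass with `value_counts`. The sketch below shows this on a toy column; the sample values are made up for illustration, and the bracket filter mirrors the `'[' not in punc and ']' not in punc` test used earlier.

```python
import pandas as pd

# Toy stand-in for fusus['haspunct']; values are illustrative only
toy = pd.DataFrame({"haspunct": ["", "\ufd3e", "\ufd3f", "\ufd3e",
                                 "[\u0661\u0661]", "", "\ufd3e\u060c"]})

# Drop empty cells, then drop anything containing a square bracket
nonempty = toy.loc[toy["haspunct"] != "", "haspunct"]
no_brackets = nonempty[~nonempty.str.contains(r"[\[\]]")]

# One row per distinct mark, most frequent first
counts = no_brackets.value_counts()
print(counts)
```

Because `value_counts` sorts by frequency, the marks that need a systematic rule end up at the top and the one-off cases at the bottom.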


Some more functionality to check what is going on

In [20]:
fusus.haspunct.str.contains(pat='').value_counts()

True    41683
Name: haspunct, dtype: int64

In [33]:
fusus[fusus.haspunct.str.contains(pat='١١')]

Unnamed: 0,page,line,column,span,direction,left,top,right,bottom,word,short,haspunct,punctAfter,punctBefore
2200,30,6,1,1,r,360.0,242.0,374.0,263.0,[١١],[١١],[١١],,
4199,47,7,1,1,r,180.0,243.0,193.0,264.0,[١١,[١١,[١١,,
4684,51,4,1,1,r,294.0,235.0,307.0,256.0,[١١,[١١,[١١,,
4951,54,3,1,1,r,79.0,139.0,99.0,160.0,[١١,[١١,[١١,,
5138,56,8,1,1,r,143.0,289.0,155.0,304.0,١١],١١],١١],,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
28155,289,4,1,1,r,82.0,169.0,115.0,184.0,٨١و١٦].,٨١و١٦].,٨١١٦].,,
29325,298,5,1,1,r,275.0,195.0,289.0,210.0,١١].,١١].,١١].,,
30752,314,10,1,1,r,130.0,325.0,142.0,340.0,١١],١١],١١],,
30760,314,11,1,1,r,159.0,351.0,174.0,366.0,١١]،,١١]،,١١]،,,


In [86]:
lijst = punctMark.index.tolist()

In [64]:
helelijst = []
for l in lijst:
    helelijst.append(l-1)
    helelijst.append(l)

First the `[zahr -#]` markers, which refer to MS Qunawi, need to be cleaned up, and only then the poetry metre. The metre label can consist of two words (with majzūʾ in front). 29817/304-9 needs to be split. Folio 53 is counted twice (in the MS).
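A folio marker could be turned into a label such as `10b` with a small parser. The sketch below rests on assumptions read off the rows shown in this notebook: digits appear in reversed order (`[٠١ظهر]` sits next to `10b`), وجه corresponds to side `a`, and ظهر to side `b`. `folio_label`, `FOLIO_RE`, and `SIDES` are hypothetical names.

```python
import re
from typing import Optional

# Assumed marker shape: "[" + Arabic-Indic digits + side word + "]"
FOLIO_RE = re.compile(r"^\[([\u0660-\u0669]+)(\u0638\u0647\u0631|\u0648\u062c\u0647)\]$")

SIDES = {
    "\u0648\u062c\u0647": "a",  # wajh, assumed to be the "a" side
    "\u0638\u0647\u0631": "b",  # zahr, assumed to be the "b" side
}


def folio_label(token: str) -> Optional[str]:
    """Hypothetical helper: '[<digits><side>]' -> e.g. '10b', or None if no match."""
    match = FOLIO_RE.match(token)
    if match is None:
        return None
    digits, side = match.groups()
    # Digits are assumed to be stored reversed; int() accepts Arabic-Indic digits
    number = int(digits[::-1])
    return f"{number}{SIDES[side]}"


print(folio_label("[\u0660\u0661\u0638\u0647\u0631]"))
print(folio_label("[\u0662\u0648\u062c\u0647]"))
```

Tokens that do not fit the assumed shape, such as a marker fused with a verse reference, return `None` and still need to be handled by hand.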

In [87]:
for l in lijst:
    fusus.iloc[l, fusus.columns.get_loc("poetryVerse")] = "9"
#     fusus.iloc[l, fusus.columns.get_loc("poetryMeter")] = fusus.iloc[l-1, fusus.columns.get_loc("word")]
    fusus.iloc[l, fusus.columns.get_loc("word")] = fusus.iloc[l, fusus.columns.get_loc("word")][2:]
    fusus.iloc[l, fusus.columns.get_loc("short")] = fusus.iloc[l, fusus.columns.get_loc("short")][2:]
    fusus.iloc[l, fusus.columns.get_loc("haspunct")] = ""
#     fusus.iloc[l-1, fusus.columns.get_loc("word")] = ""
#     fusus.iloc[l-1, fusus.columns.get_loc("short")] = ""
#     fusus.iloc[l-1, fusus.columns.get_loc("haspunct")] = ""

In [65]:
# These are the words right before poetry starts: indicate meter.
fusus.iloc[helelijst].sort_index()

Unnamed: 0,page,line,column,span,direction,left,top,right,bottom,word,short,haspunct,punctAfter,punctBefore,QunawiMS,poetryMeter,poetryVerse
226,12,4,2,1,r,154.0,165.0,187.0,186.0,فَعُـــوْا,فعوا,,,,2a,,
227,12,5,1,1,r,345.0,191.0,374.0,212.0,ثُمَّ,ثم,,,,2a,,3.0
1951,28,2,2,1,r,89.0,113.0,113.0,134.0,نَعْــنِي,نعني,,,,5b,,
1952,28,3,1,1,r,353.0,139.0,402.0,160.0,فَالـكُلُّ,فالكل,,,,5b,,3.0
5099,56,4,2,1,r,101.0,174.0,124.0,195.0,سَيِّدًا,سيدا,,,,11b,,
5100,56,5,1,1,r,364.0,200.0,402.0,221.0,فَمَنْ,فمن,,,,11b,,3.0
7132,79,11,2,1,r,85.0,356.0,109.0,377.0,بَصَـرُ,بصر,,,,15b,,
7133,79,12,1,1,r,362.0,382.0,402.0,403.0,جَمِّعْ,جمع,,,,15b,,3.0
8078,91,4,2,1,r,95.0,165.0,129.0,186.0,أَجْحَدُهُ,أجحده,,,,17a,,
8079,91,5,1,1,r,299.0,191.0,366.0,212.0,فَـيَعْـرِفُنِي,فيعرفني,,,,17a,,3.0


In [27]:
majzu = fusus[fusus.word=='[مَجْـزُوءُ'].index.tolist()
for i in majzu:
    fusus.iloc[i, fusus.columns.get_loc("word")] += " " + fusus.iloc[i+1, fusus.columns.get_loc("word")]
    fusus.iloc[i, fusus.columns.get_loc("short")] += " " + fusus.iloc[i+1, fusus.columns.get_loc("short")]
    fusus.iloc[i, fusus.columns.get_loc("haspunct")] += fusus.iloc[i+1, fusus.columns.get_loc("haspunct")]
    fusus.iloc[i+1, fusus.columns.get_loc("word")] = ""
    fusus.iloc[i+1, fusus.columns.get_loc("short")] = ""
    fusus.iloc[i+1, fusus.columns.get_loc("haspunct")] = ""

In [197]:
allwahj = fusus[(fusus.word.str.contains(pat='ظهر')) & (fusus.word.str.contains(pat=r'\[')) & (fusus.word.str.contains(pat=r'\]'))].index.tolist()

In [211]:
for i in allwahj:
    fusus.iloc[i, fusus.columns.get_loc("haspunct")] = ""

In [61]:
fusus[(fusus.word.str.contains(pat='وجه')) & (fusus.word.str.contains(pat=r'\[')) & (fusus.word.str.contains(pat=r'\]'))]

Unnamed: 0,page,line,column,span,direction,left,top,right,bottom,word,short,haspunct,punctAfter,punctBefore,QunawiMS
188,11,4,1,1,r,79.0,165.0,119.0,186.0,[٢وجه],,,,,2a
644,18,11,1,1,r,160.0,347.0,197.0,368.0,[٣وجه],,,,,3a
1149,22,14,1,1,r,311.0,425.0,350.0,446.0,[٤وجه],,,,,4a
1613,25,11,1,1,r,149.0,347.0,186.0,368.0,[٥وجه],,,,,5a
2116,29,6,1,1,r,206.0,217.0,242.0,238.0,[٦وجه],,,,,6a
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
39086,384,6,1,1,r,79.0,217.0,100.0,238.0,[٤٧وجه],,,,,74a
39631,389,9,1,1,r,334.0,302.0,352.0,323.0,[٥٧وجه],,,,,75a
40123,395,4,1,1,r,305.0,165.0,319.0,186.0,[٦٧وجه],,,,,76a
40711,400,7,1,1,r,79.0,243.0,110.0,264.0,[٧٧وجه],,,,,77a


In [31]:
fusus.head()

Unnamed: 0,page,line,column,span,direction,left,top,right,bottom,word,short,haspunct,punctAfter,punctBefore,QunawiMS
0,8,2,1,1,r,356.0,197.0,384.0,218.0,الحَمْدُ,الحمد,,,,1b
1,8,2,1,1,r,341.0,197.0,356.0,218.0,لِلهِ,لله,,,,1b
2,8,2,1,1,r,312.0,197.0,341.0,218.0,مُـنَـزِّلِ,منزل,,,,1b
3,8,2,1,1,r,274.0,197.0,312.0,218.0,الحِكَمِ,الحكم,,,,1b
4,8,2,1,1,r,260.0,197.0,274.0,218.0,عَلَىٰ,على,,,,1b


The QunawiMS folio reference still has to be attached to the first word after each marker

In [30]:
fusus[(fusus.QunawiMS.notna())]

Unnamed: 0,page,line,column,span,direction,left,top,right,bottom,word,short,haspunct,punctAfter,punctBefore,QunawiMS
0,8,2,1,1,r,356.0,197.0,384.0,218.0,الحَمْدُ,الحمد,,,,1b
1,8,2,1,1,r,341.0,197.0,356.0,218.0,لِلهِ,لله,,,,1b
2,8,2,1,1,r,312.0,197.0,341.0,218.0,مُـنَـزِّلِ,منزل,,,,1b
3,8,2,1,1,r,274.0,197.0,312.0,218.0,الحِكَمِ,الحكم,,,,1b
4,8,2,1,1,r,260.0,197.0,274.0,218.0,عَلَىٰ,على,,,,1b
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
41678,410,9,1,1,r,278.0,297.0,294.0,315.0,[٠٣٦,[٠٣٦,[٠٣٦,,,78a
41679,410,9,1,1,r,261.0,297.0,278.0,315.0,هـ],ه],],,,78a
41680,410,9,1,1,r,227.0,297.0,261.0,315.0,وَالحَمْدُ,والحمد,,,,78a
41681,410,9,1,1,r,215.0,297.0,227.0,315.0,لِلهِ,لله,,,,78a


In [29]:
fusus.QunawiMS = fusus.QunawiMS.ffill()
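One point to watch with `ffill` is that it only fills missing values, not empty strings. If the column has been through `fillna('')`, as at the top of this notebook, the empty strings would first have to become missing again. A minimal sketch on a toy frame with made-up values:

```python
import pandas as pd

# Toy frame: a folio reference is present only on the row where the folio starts
toy = pd.DataFrame({
    "short": ["w1", "w2", "w3", "w4", "w5"],
    "QunawiMS": ["1b", "", "", "2a", ""],
})

# mask() turns the empty strings into missing values, which ffill() can then fill
toy["QunawiMS"] = toy["QunawiMS"].mask(toy["QunawiMS"] == "").ffill()
print(toy)
```

After this step every word carries the folio of the most recent marker above it.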

In [176]:
wahj = fusus[fusus.word=='ظهر]'].index.tolist()
for i in wahj:
    # Join the two halves of the marker (previous row and this row) and report where the pair starts
    pair = fusus.iloc[[i-1, i]]
    print("".join(pair.word.astype(str)) + " at " + str(pair.index[0]))

[٠١ظهر] at 4446
[١١ظهر] at 4951
[٢١ظهر] at 5462
٧][٣١ظهر] at 6041
[٤١ظهر] at 6565
[٥١ظهر] at 7111
[٦١ظهر] at 7621
[٧١ظهر] at 8117
[٨١ظهر] at 8595
[٩١ظهر] at 9114
[٠٢ظهر] at 9628
[١٢ظهر] at 10156
[٢٢ظهر] at 10610
[٣٢ظهر] at 11164
[٤٢ظهر] at 11702
[٥٢ظهر] at 12239
[٦٢ظهر] at 12826
[٧٢ظهر] at 13333
[٨٢ظهر] at 13923
[٩٢ظهر] at 14440
[٠٣ظهر] at 14967
[١٣ظهر] at 15533
[٢٣ظهر] at 16082
[٣٣ظهر] at 16628
[٤٣ظهر] at 17172
[٥٣ظهر] at 17710
[٦٣ظهر] at 18246
[٧٣ظهر] at 18832
[٨٣ظهر] at 19370
[٩٣ظهر] at 19950
[٠٤ظهر] at 20485
[١٤ظهر] at 21073
[٢٤ظهر] at 21594
[٣٤ظهر] at 22039
[٤٤ظهر] at 22586
[٥٤ظهر] at 23189
[٦٤ظهر] at 23747
[٧٤ظهر] at 24279
[٨٤ظهر] at 24856
[٩٤ظهر] at 25431
[١٥ظهر] at 26554
[٢٥ظهر] at 27067
[٣٥ظهر] at 27564
[٣٥‐أظهر] at 28099
[٤٥ظهر] at 28659
[٥٥ظهر] at 29188
[٦٥ظهر] at 29752
[٧٥ظهر] at 30252
[٨٥ظهر] at 30753
[٩٥ظهر] at 31278
[٠٦ظهر] at 31816
[١٦ظهر] at 32327
[٢٦ظهر] at 32882
[٣٦ظهر] at 33419
[٤٦ظهر] at 33962
[٥٦ظهر] at 34506
[٦٦ظهر] at 35018
[٧٦ظهر] at 35587
[٨٦ظهر] at 36085
[٩٦ظ

In [83]:
fusus[(fusus['page']==47) & (fusus.line==7)]

Unnamed: 0,page,line,column,span,direction,left,top,right,bottom,word,short,haspunct,punctAfter,punctBefore,QunawiMS
4189,47,7,1,1,r,388.0,243.0,402.0,264.0,مِنَ,من,,,,9b
4190,47,7,1,1,r,368.0,243.0,388.0,264.0,اللهِ،,الله,,،,,9b
4191,47,7,1,1,r,357.0,243.0,368.0,264.0,لَا,لا,,,,9b
4192,47,7,1,1,r,335.0,243.0,357.0,264.0,مِنْ,من,,,,9b
4193,47,7,1,1,r,312.0,243.0,335.0,264.0,رُوحٍ,روح,,,,9b
4194,47,7,1,1,r,295.0,243.0,312.0,264.0,مِنَ,من,,,,9b
4195,47,7,1,1,r,257.0,243.0,295.0,264.0,الأَرْوَاحِ،,الأرواح,,،,,9b
4196,47,7,1,1,r,238.0,243.0,257.0,264.0,بَلْ,بل,,,,9b
4197,47,7,1,1,r,221.0,243.0,235.0,264.0,مِنْ,من,,,,9b
4198,47,7,1,1,r,193.0,243.0,221.0,264.0,رُوْحِهِ,روحه,,,,9b


In [124]:
wajh = fusus[(fusus.word.str.contains(pat='وجه'))].index.tolist()

In [127]:
fusus = fusus.drop(fusus.index[wajh])

In [128]:
fusus.shape

(41454, 17)

In [130]:
fusus[(fusus.word.str.contains(pat='ظهر'))].head(60)

Unnamed: 0,page,line,column,span,direction,left,top,right,bottom,word,short,haspunct,punctAfter,punctBefore,QunawiMS,poetryMeter,poetryVerse
402,16,3,1,1,r,207.0,139.0,242.0,160.0,[٢ظهر],,,,,2b,,
888,20,12,1,1,r,100.0,373.0,140.0,394.0,[٣ظهر],,,,,3b,,
1374,24,3,1,1,r,236.0,139.0,271.0,160.0,[٤ظهر],,,,,4b,,
1858,27,4,1,1,r,315.0,165.0,352.0,186.0,[٥ظهر],,,,,5b,,
2368,31,12,1,1,r,349.0,412.0,383.0,433.0,[٦ظهر],,,,,6b,,
2908,36,13,1,1,r,132.0,399.0,173.0,420.0,[٧ظهر],,,,,7b,,
3421,41,7,1,1,r,349.0,243.0,384.0,264.0,[٨ظهر],,,,,8b,,
3922,45,11,1,1,r,185.0,347.0,222.0,368.0,[٩ظهر],,,,,9b,,
4444,49,5,1,1,r,346.0,191.0,358.0,212.0,[٠١ظهر],,,,,10b,,
4947,54,3,1,1,r,79.0,139.0,99.0,160.0,[١١ظهر],,,,,11b,,
