# Introduction
This folder works towards a 'right' CSV file of the Fusus al-hikam. It starts from a TSV file created by Dirk Roorda (accessed 11 February), which has seen slight changes since then. That TSV file is in turn based on the Lakhnawi edition of Ibn Arabi's Fusus al-hikam.

### Aim
My aim is to create a clean, ready-to-use file `fusus.csv`. It should strictly encapsulate the text of the Fusus al-hikam in machine-readable form. As the text is groomed and cleaned, `fusus.csv` keeps changing, and the code behind each change is not preserved. This is especially true for the columns to the right of `word`. The column `short` is meant to be the cleanest: it has been stripped of diacritics, shaddas, tatwīl, spaces, and punctuation. The columns to its right are annotations on it.
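Since the code that produced `short` is not preserved, the following is only a sketch of how such a stripping step could look, not the code that was actually used. It assumes that "diacritics" and "shaddas" correspond to Unicode combining marks and that "punctuation" corresponds to the Unicode punctuation categories. The names `to_short` and `TATWEEL` are hypothetical.

```python
import unicodedata

TATWEEL = "\u0640"  # the elongation character (tatwīl); its category is Lm, not a mark

def to_short(word: str) -> str:
    """Hypothetical helper: reduce a vocalised word to its bare letters."""
    kept = []
    for ch in word:
        category = unicodedata.category(ch)
        if category.startswith("M"):   # combining marks: vowels, sukun, shadda, dagger alif
            continue
        if category.startswith("P"):   # Unicode punctuation (assumed definition)
            continue
        if ch == TATWEEL or ch.isspace():
            continue
        kept.append(ch)
    return "".join(kept)

# First sample word of the table, written with escapes to avoid display ambiguity
vocalised = "\u0627\u0644\u062d\u064e\u0645\u0652\u062f\u064f"
bare = to_short(vocalised)
```

Digits are left untouched by this sketch, so reference numbers such as those discussed further down would survive it.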

### Credits
Text by Ibn Arabi (finished in the year 1233). Edition by Nizam al-Din Ahmad al-Husayni al-Lakhnawi (Beirut: 2013). First conversion by Dirk Roorda (January 2021). Finalization by Cornelis van Lit (February 2021). The work relies mostly on Pandas, with grateful use of a list of Arabic characters by Lakhdar Benzahia and Talha Javed Mukhtar for CLTK.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import arabicABC as abc

In [2]:
fusus = pd.read_csv('fusus.csv')
fusus = fusus.fillna('')

In [3]:
fusus.head()

Unnamed: 0,page,line,column,span,direction,left,top,right,bottom,word,short,haspunct,punctAfter,punctBefore
0,8,2,1,1,r,356.0,197.0,384.0,218.0,الحَمْدُ,الحمد,,,
1,8,2,1,1,r,341.0,197.0,356.0,218.0,لِلهِ,لله,,,
2,8,2,1,1,r,312.0,197.0,341.0,218.0,مُـنَـزِّلِ,منزل,,,
3,8,2,1,1,r,274.0,197.0,312.0,218.0,الحِكَمِ,الحكم,,,
4,8,2,1,1,r,260.0,197.0,274.0,218.0,عَلَىٰ,على,,,


Exploring a specific punctuation mark

In [4]:
punctMark = fusus[fusus['haspunct']==abc.MADDA_ABOVE]
punctMark.head(60)

Unnamed: 0,page,line,column,span,direction,left,top,right,bottom,word,short,haspunct,punctAfter,punctBefore


Moving punctuation from `short` to `punctAfter` and deleting it from `haspunct` (the aim is to get `haspunct` empty and then delete the column)

In [5]:
def SetPunctuation(row):
    # Take the row-th match from punctMark; position.name is its index label in fusus
    position = punctMark.iloc[row]
    punct = position.haspunct
    short = position.short
    # Only act when the mark sits at the very end of the cleaned word
    if punct and short.endswith(punct):
        fusus.loc[position.name, 'punctAfter'] = punct
        fusus.loc[position.name, 'short'] = short[:-len(punct)]
        fusus.loc[position.name, 'haspunct'] = ""
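
The same move can also be written without a row-by-row loop. The sketch below is an alternative, not the notebook's own method: it works on a copy, uses a boolean mask, and runs on a three-row toy frame rather than on `fusus.csv`. The function name `move_trailing_punct` and the toy data are made up for illustration.

```python
import pandas as pd

def move_trailing_punct(df: pd.DataFrame, punct: str) -> pd.DataFrame:
    """Hypothetical helper: move a trailing mark from `short` to `punctAfter`."""
    out = df.copy()
    if not punct:                      # nothing to move for an empty mark
        return out
    # Rows whose haspunct is exactly this mark and whose short ends with it
    mask = (out["haspunct"] == punct) & out["short"].str.endswith(punct)
    out.loc[mask, "punctAfter"] = punct
    out.loc[mask, "short"] = out.loc[mask, "short"].str[: -len(punct)]
    out.loc[mask, "haspunct"] = ""
    return out

WORD = "\u0627\u0644\u062d\u0645\u062f"   # toy word
COMMA = "\u060c"                           # Arabic comma

toy = pd.DataFrame({
    "short": [WORD + COMMA, WORD, COMMA + WORD],
    "haspunct": [COMMA, "", COMMA],
    "punctAfter": ["", "", ""],
})
result = move_trailing_punct(toy, COMMA)
```

Only the first toy row changes. The third row keeps its mark in `haspunct` because the comma precedes the word, which is the kind of case that would belong in `punctBefore` and still needs a manual look.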

In [6]:
# Uncomment to set function in motion for all cases of the particular punctuation
# for i in range(0,len(punctMark)):
#     SetPunctuation(i)

Check results and possibly adjust details

In [7]:
# look at one
# fusus.iloc[26801]
# look at line
# fusus[(fusus['page']==74) & (fusus.line==11)]
# change one value
# fusus.iloc[32120, fusus.columns.get_loc("punctAfter")] = ""

Which punctuation remains? We find out below

In [8]:
puncs = fusus[fusus['haspunct']!=''].haspunct.unique().tolist()

In [9]:
puncsLess = []
for punc in puncs:
    if '[' not in punc and ']' not in punc:
        puncsLess.append(punc)
print(puncsLess)

['١‐', '٢‐', '٣‐', '٤‐', '٥‐', '﴾', ':.', '―', '﴿', '﴾،', '«»،', '«».', '«»', '»،', '؟!', '–', '﴾.', '﴿﴾', ':﴿', '»؟', '«»؛', '﴾.()', '﴿﴾،', '؟﴿', '؟»', '﴾؛', '﴿﴾:', '»:', '.:', '٦‐', '٧‐', '٨‐', '٩‐', '٠١‐', '١١‐', '٢١‐', '٣١‐', ':«', '«»؟', '٣٥؛', '..', '؛:', '؟».', '٨٨؛', '«»:', '﴿ی', ':﴿﴾', '……………', '٢٢١؛', '—', '٣،', '٩٢،', '٧١،', '٠٣،', '«،', '!»', '﴾؟', '٩٢؛', '…»', '٦٢،', '٨٢،', '٢،']


In [10]:
for punc in puncsLess:
    times = fusus[fusus['haspunct']==punc].shape[0]
    print(punc + " " + str(times))

١‐ 30
٢‐ 30
٣‐ 23
٤‐ 17
٥‐ 13
﴾ 372
:. 8
― 300
﴿ 487
﴾، 31
«»، 46
«». 23
«» 72
»، 46
؟! 1
– 2
﴾. 3
﴿﴾ 29
:﴿ 6
»؟ 2
«»؛ 3
﴾.() 1
﴿﴾، 4
؟﴿ 1
؟» 2
﴾؛ 1
﴿﴾: 1
»: 1
.: 2
٦‐ 8
٧‐ 4
٨‐ 3
٩‐ 2
٠١‐ 2
١١‐ 2
٢١‐ 1
٣١‐ 1
:« 1
«»؟ 1
٣٥؛ 4
.. 1
؛: 1
؟». 3
٨٨؛ 1
«»: 1
﴿ی 1
:﴿﴾ 1
…………… 2
٢٢١؛ 2
— 6
٣، 1
٩٢، 1
٧١، 1
٠٣، 1
«، 1
!» 1
﴾؟ 1
٩٢؛ 1
…» 1
٦٢، 1
٨٢، 1
٢، 1
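
The two cells above (collect the unique marks, then count each one) can be folded into a single `value_counts` call. This is a sketch on a toy series, and the helper name `count_unbracketed` is hypothetical. Unlike the loop above, `value_counts` orders the result by frequency rather than by first appearance.

```python
import pandas as pd

def count_unbracketed(haspunct: pd.Series) -> pd.Series:
    """Hypothetical helper: frequency of every non-empty mark without [ or ]."""
    nonempty = haspunct[haspunct != ""]
    # The character class matches a literal opening or closing square bracket
    has_bracket = nonempty.str.contains(r"[\[\]]", regex=True)
    return nonempty[~has_bracket].value_counts()

# Toy stand-in for fusus['haspunct']
toy = pd.Series(["", "\u00bb", "\u00bb", "[1]", ":", "11]"])
counts = count_unbracketed(toy)
```

On the real data the call would be `count_unbracketed(fusus['haspunct'])`, assuming missing values have already been replaced by empty strings as in cell 2.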


Some more functionality to check what is going on

In [20]:
# An empty pattern matches every string, so this simply counts all rows
fusus.haspunct.str.contains(pat='').value_counts()

True    41683
Name: haspunct, dtype: int64

In [33]:
fusus[fusus.haspunct.str.contains(pat='١١')]

Unnamed: 0,page,line,column,span,direction,left,top,right,bottom,word,short,haspunct,punctAfter,punctBefore
2200,30,6,1,1,r,360.0,242.0,374.0,263.0,[١١],[١١],[١١],,
4199,47,7,1,1,r,180.0,243.0,193.0,264.0,[١١,[١١,[١١,,
4684,51,4,1,1,r,294.0,235.0,307.0,256.0,[١١,[١١,[١١,,
4951,54,3,1,1,r,79.0,139.0,99.0,160.0,[١١,[١١,[١١,,
5138,56,8,1,1,r,143.0,289.0,155.0,304.0,١١],١١],١١],,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
28155,289,4,1,1,r,82.0,169.0,115.0,184.0,٨١و١٦].,٨١و١٦].,٨١١٦].,,
29325,298,5,1,1,r,275.0,195.0,289.0,210.0,١١].,١١].,١١].,,
30752,314,10,1,1,r,130.0,325.0,142.0,340.0,١١],١١],١١],,
30760,314,11,1,1,r,159.0,351.0,174.0,366.0,١١]،,١١]،,١١]،,,
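
Many of the remaining values involve square brackets. `str.contains` treats its pattern as a regular expression by default, so a bare `[` raises an error; passing `regex=False` searches for the literal character instead. The sketch below shows this on toy data (the frame and variable names are made up) and separates values with an opening bracket from those that only have a closing one, as happens when a reference is split across two words.

```python
import pandas as pd

# Toy stand-in for the haspunct column
toy = pd.DataFrame({"haspunct": ["[11]", "[11", "11].", "", "\u00bb"]})

# regex=False makes the pattern a plain substring, so "[" needs no escaping
has_open = toy["haspunct"].str.contains("[", regex=False)
has_close = toy["haspunct"].str.contains("]", regex=False)

opening = toy[has_open]                    # values that start a bracketed reference
closing_only = toy[has_close & ~has_open]  # the tail of a reference split over words
```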
