# A Quick Guide to Using the Quran Detector-Annotator 

<b>The paper describing the approach followed in carrying out this work is available from: </b> https://www.sciencedirect.com/science/article/pii/S1877050921012321 <br>
<b> If using this tool in your work, please use the following citation: </b><br>
<code>Samhaa R El-Beltagy and Ahmed Rafea, "QDetect: An Intelligent Tool for Detecting Quranic Verses in any Text", Procedia Computer Science, Volume 189, Pages 374-381, 2021.</code>

## First step
### pip install QDetect

In [1]:
import QDetect.qdetect as qdetect
import codecs

## Creating a matcher-annotator object

In [2]:
qAn = qdetect.qMatcherAnnotater()

Matcher/Annotator created! 


## Detecting Verses in text
<b>To detect all Qur’anic verses in any piece of text, use the following method: <br></b>
```Python
  <qMatcherAnnotaterOBj>.matchAll(inputText, findErr, findMissing, allowedErrPers, minMatch, return_json, delimeterList)
```
The method returns a list of dictionaries, where each dictionary represents an independent entry, as shown in the examples below. Here is a brief description of each of the method's input parameters:
```Python
      'inputText': the text in which you would like to detect verses. This is a mandatory parameter,
      'findErr': a flag that indicates whether the code should detect simple typos. The default is True,
      'findMissing': a flag that indicates whether the code should detect a missing word. The default is True,
      'allowedErrPers': a real number representing the allowed error percentage when findErr is True. The default is 0.25,
      'minMatch': the minimum number of words required to return a result. The default is 3. Lower values will not affect results, but higher values will,
      'return_json': a flag that indicates whether a list of json objects should be returned instead of a list of dictionaries. The default is False,
      'd': a custom list of delimiters to use. The default is: '#|،|\\.|{|}|\n|؟|!|\\(|\\)|﴿|﴾|۞|\u06dd|\\*|-|\\+|\\:|…'
```
<b>Since all parameters except <code>inputText</code> have default values, you can simply call the method like this:</b>

```Python
    <qMatcherAnnotaterOBj>.matchAll(inText)
```
as in the example below. 
Whether the method returns a list of dictionaries or a list of json objects, each list entry contains the following fields:
```Python
      'aya_name': the name of the sura (chapter) in which the detected verses appear,
      'verses': a list of the detected verses belonging to that sura,
      'errors': a list of lists, one per detected verse. List 1 contains the errors that occurred in verse 1 (or is empty if no errors occurred), list 2 contains the errors for verse 2, and so on,
      'startInText': the index of the word at which the detected verses start (in the input text),
      'endInText': the index of the word at which the detected verses end, plus 1 (in the input text),
      'aya_start': the number of the first verse appearing in the text,
      'aya_end': the number of the last verse appearing in the text
```

In [3]:
#Calling matchAll without changing any of the default parameter values
txt = "RT @user: كرامة المؤمن عند الله تعالى؛ حيث سخر له الملائكة يستغفرون له ﴿الذِين يحملونَ العرشَ ومَن حَولهُ يُسبحو بِحمدِ ربهِم واذكر ربك إذا نسيت…"  
vs = qAn.matchAll(txt)
print(vs)
print(len(vs), 'entrie(s) returned.' )

[{'aya_name': 'غافر', 'verses': ['الذين يحملون العرش ومن حوله يسبحون بحمد ربهم'], 'errors': [[('يسبحو', 'يسبحون', 18)]], 'startInText': 13, 'endInText': 21, 'aya_start': 7, 'aya_end': 7}, {'aya_name': 'الكهف', 'verses': ['واذكر ربك اذا نسيت'], 'errors': [[]], 'startInText': 21, 'endInText': 25, 'aya_start': 24, 'aya_end': 24}]
2 entrie(s) returned.
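
The word positions reported in `startInText` and `endInText` can be mapped back to the input text. The sketch below assumes, based only on the indices in the output above and not on the library's source, that positions count whitespace-separated tokens. `matched_tokens` is a hypothetical helper, and the index values are copied from the two entries printed above.

```python
# Hypothetical helper: not part of QDetect.
# Assumption: startInText/endInText index whitespace-separated tokens,
# which is consistent with the example output above.

def matched_tokens(text, match):
    """Return the tokens of `text` covered by one matchAll entry."""
    tokens = text.split()
    # endInText is the end position + 1, so it works directly as a slice bound.
    return tokens[match['startInText']:match['endInText']]

txt = "RT @user: كرامة المؤمن عند الله تعالى؛ حيث سخر له الملائكة يستغفرون له ﴿الذِين يحملونَ العرشَ ومَن حَولهُ يُسبحو بِحمدِ ربهِم واذكر ربك إذا نسيت…"

# Index fields copied from the two entries in the output above.
entries = [
    {'startInText': 13, 'endInText': 21},
    {'startInText': 21, 'endInText': 25},
]

for entry in entries:
    print(' '.join(matched_tokens(txt, entry)))
```

Because the slice returns the tokens exactly as they were typed, any misspelling or attached punctuation in the input is preserved, which can be useful when showing users what was matched.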


In [4]:
#Another example
txt = 'RT @user: بسْمِ اللهِ الرَّحْمَنِ الرَّحِيمِ  قُلْ هُوَ اللَّهُ أَحَدٌ ۞ اللَّهُ الصَّمَدُ ۞ لَمْ يَلِدْ وَلَمْ يُولَدْ ۞ وَلَمْ يَ…'
vs = qAn.matchAll(txt)
print(vs)
print(len(vs), 'entrie(s) returned.' )

[{'aya_name': 'الإخلاص', 'verses': ['قل هو الله احد', 'الله الصمد', 'لم يلد ولم يولد'], 'errors': [[], [], []], 'startInText': 6, 'endInText': 18, 'aya_start': 1, 'aya_end': 3}]
1 entrie(s) returned.
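
If the results need to be stored or passed to another program, the returned dictionaries can be serialized with Python's standard `json` module. This is only a sketch of doing the conversion yourself; it does not show what `return_json=True` produces. The entry below is copied from the output above.

```python
import json

# Entry copied from the output above.
match = {
    'aya_name': 'الإخلاص',
    'verses': ['قل هو الله احد', 'الله الصمد', 'لم يلد ولم يولد'],
    'errors': [[], [], []],
    'startInText': 6,
    'endInText': 18,
    'aya_start': 1,
    'aya_end': 3,
}

# ensure_ascii=False keeps the Arabic text readable instead of \uXXXX escapes.
encoded = json.dumps(match, ensure_ascii=False)
print(encoded)

# Decoding gives back an equal dictionary here because the error lists are empty.
# When errors are present, their tuples come back as lists after decoding.
decoded = json.loads(encoded)
```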


In [5]:
#an example where a missing word is detected
txt = 'الم ذلك الكتاب لا ريب هدي للمتقين'
vs = qAn.matchAll(txt)
print(vs)
print(len(vs), 'entrie(s) returned.' )

[{'aya_name': 'البقرة', 'verses': ['الم', 'ذلك الكتاب لا ريب فيه هدي للمتقين'], 'errors': [[], [('هدي', 'فيه هدي', 5)]], 'startInText': 0, 'endInText': 7, 'aya_start': 1, 'aya_end': 2}]
1 entrie(s) returned.


In [6]:
txt = 'RT @HolyQraan: من قرأها ثلاث مرات فكأنما قرأ القرآن كاملا ..   ﴿قُلْ هُوَ اللَّهُ أَحَدٌ ۝ اللَّهُ الصَّمَدُ ۝ لَمْ يَلِدْ وَلَمْ يُولَدْ…'
vs = qAn.matchAll(txt)
print(vs)
print(len(vs), 'entrie(s) returned.' )

[{'aya_name': 'الإخلاص', 'verses': ['قل هو الله احد', 'الله الصمد', 'لم يلد ولم يولد'], 'errors': [[], [], []], 'startInText': 11, 'endInText': 23, 'aya_start': 1, 'aya_end': 3}]
1 entrie(s) returned.


#### Using the error detection flags
The error detection options are enabled by default, but they slow down the matching process significantly. You should <b>only</b> use them when you want to detect spelling mistakes or missing words in a verse. Misspelling a word or omitting a word are the most common mistakes that people make when writing Qur’an from memory. Below is a list of the flags used for error detection and their default values:
```Python
 findErr=True            #set this flag to True if you want to detect spelling mistakes and to False otherwise

 allowedErrPers=0.25     #if findErr is True, this value controls the percentage of words that can be considered wrong when returning a match. If the errors exceed this percentage, a match will not be returned. A 4-word verse, for example, can have only 1 misspelled word. Higher values lead to higher recall but lower precision, and vice versa.

 findMissing=True        #set this flag to True if you want to detect a missing word in a verse and to False otherwise

```
Below are some examples:

In [7]:
# here we will mis-spell هو
# the mis-spelled word appears in the error list of the first verse. 
txt = 'RT @HolyQraan: من قرأها ثلاث مرات فكأنما قرأ القرآن كاملا ..   ﴿قُلْ هُوَا اللَّهُ أَحَدٌ ۝ اللَّهُ الصَّمَدُ ۝ لَمْ يَلِدْ وَلَمْ يُولَدْ…'
vs = qAn.matchAll(txt)
print(vs)
print(len(vs), 'entrie(s) returned.' )


[{'aya_name': 'الإخلاص', 'verses': ['قل هو الله احد', 'الله الصمد', 'لم يلد ولم يولد'], 'errors': [[('هوا', 'هو', 12)], [], []], 'startInText': 11, 'endInText': 23, 'aya_start': 1, 'aya_end': 3}]
1 entrie(s) returned.


In [8]:
#here we turn off the error detection option. Notice how the verse with the error is no longer detected. 
txt = 'RT @HolyQraan: من قرأها ثلاث مرات فكأنما قرأ القرآن كاملا ..   ﴿قُلْ هُوَا اللَّهُ أَحَدٌ ۝ اللَّهُ الصَّمَدُ ۝ لَمْ يَلِدْ وَلَمْ يُولَدْ…'
vs = qAn.matchAll(txt, findErr= False)
print(vs)


[{'aya_name': 'الإخلاص', 'verses': ['الله الصمد', 'لم يلد ولم يولد'], 'errors': [[], []], 'startInText': 16, 'endInText': 23, 'aya_start': 2, 'aya_end': 3}]


In [9]:
# here we remove الله from the first verse
# the missing word appears in the error list of the first verse. 
txt = 'RT @HolyQraan: من قرأها ثلاث مرات فكأنما قرأ القرآن كاملا ..   ﴿قُلْ هُوَ أَحَدٌ ۝ اللَّهُ الصَّمَدُ ۝ لَمْ يَلِدْ وَلَمْ يُولَدْ…'
vs = qAn.matchAll(txt)
print(vs)
print(len(vs), 'entrie(s) returned.' )


[{'aya_name': 'الإخلاص', 'verses': ['قل هو الله احد', 'الله الصمد', 'لم يلد ولم يولد'], 'errors': [[('احد', 'الله احد', 13)], [], []], 'startInText': 11, 'endInText': 22, 'aya_start': 1, 'aya_end': 3}]
1 entrie(s) returned.
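
In the outputs of In [7] and In [9] above, each error is a tuple holding the text as written, the expected text, and a word position. A misspelling pairs one word with one word, while a missing word shows up as an expected string with more words than were written. The sketch below uses that observation to label errors. `describe_errors` is a hypothetical helper, and the word-count rule is an assumption drawn from these two examples, not from the library.

```python
# Hypothetical helper: not part of QDetect.
def describe_errors(match):
    """Flatten the per-verse error lists of one matchAll entry."""
    report = []
    # errors[i] belongs to the i-th detected verse, numbered from aya_start.
    for verse_no, verse_errors in enumerate(match['errors'], start=match['aya_start']):
        for written, expected, position in verse_errors:
            # Assumption: more expected words than written words means a word was omitted.
            if len(expected.split()) > len(written.split()):
                kind = 'missing word'
            else:
                kind = 'misspelling'
            report.append({
                'verse': verse_no,
                'kind': kind,
                'written': written,
                'expected': expected,
                'position': position,
            })
    return report

# Error lists copied from the outputs of In [7] and In [9] above.
misspelled = {'aya_start': 1, 'errors': [[('هوا', 'هو', 12)], [], []]}
missing = {'aya_start': 1, 'errors': [[('احد', 'الله احد', 13)], [], []]}

for entry in (misspelled, missing):
    for item in describe_errors(entry):
        print(item['verse'], item['kind'], item['position'])
```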


In [10]:
#now we disable the detection of missing words. Again, the verse with the missing word is no longer detected.
txt = 'RT @HolyQraan: من قرأها ثلاث مرات فكأنما قرأ القرآن كاملا ..   ﴿قُلْ هُوَ أَحَدٌ ۝ اللَّهُ الصَّمَدُ ۝ لَمْ يَلِدْ وَلَمْ يُولَدْ…'
vs = qAn.matchAll(txt, findMissing=False)
print(vs)

[{'aya_name': 'الإخلاص', 'verses': ['الله الصمد', 'لم يلد ولم يولد'], 'errors': [[], []], 'startInText': 15, 'endInText': 22, 'aya_start': 2, 'aya_end': 3}]


In [11]:
#In this example we increase the error tolerance. This is not advised because you might end up with matches that are not accurate

#With the default error tolerance of 25% or 0.25
txt = 'RT @HolyQraan: من قرأها ثلاث مرات فكأنما قرأ القرآن كاملا ..   ﴿قُلْ هُوَا اللَّهُ أَحَ…'
print('With the default error tolerance of 25% or 0.25 (no matches returned):')
vs = qAn.matchAll(txt)
print(vs)
#With the increased error tolerance
vs = qAn.matchAll(txt, allowedErrPers=0.5)
print('With the increased error tolerance:')
print(vs)


With the default error tolerance of 25% or 0.25 (no matches returned):
[]
With the increased error tolerance:
[{'aya_name': 'الإخلاص', 'verses': ['قل هو الله احد'], 'errors': [[('هوا', 'هو', 12), ('اح', 'احد', 14)]], 'startInText': 11, 'endInText': 15, 'aya_start': 1, 'aya_end': 1}]
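
The two results above are consistent with a simple ratio check: the verse قل هو الله احد has four words and the input contains two errors, so the error ratio is 0.5, which exceeds 0.25 but not 0.5. The sketch below illustrates that arithmetic only. The exact rule inside the library (for example, how it rounds) is an assumption here, and `within_tolerance` is a hypothetical name.

```python
# Hypothetical illustration of the allowedErrPers arithmetic; not QDetect code.
def within_tolerance(num_errors, num_words, allowed_err_pers=0.25):
    """Assumed rule: accept when the fraction of erroneous words does not exceed the limit."""
    return num_errors / num_words <= allowed_err_pers

print(within_tolerance(1, 4))        # True: one error in a 4-word verse, default tolerance
print(within_tolerance(2, 4))        # False: two errors in a 4-word verse, default tolerance
print(within_tolerance(2, 4, 0.5))   # True: same input with the tolerance raised to 0.5
```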


## Annotating Text
In the context of this work, annotating text means detecting Quranic verses in the text, replacing them with their correct diacritized forms, and adding a reference to the original verses in the Quran. To detect Quranic verses and annotate them, use the following method: <br>
```Python
    <qMatcherAnnotaterOBj>.annotateTxt(inputText, findErr, findMissing, allowedErrPers, minMatch, delimeterList)
```
In most cases, you can simply use
```Python
    <qMatcherAnnotaterOBj>.annotateTxt(inText)
```
as in the example below. The defaults for <b>findErr</b> and <b>findMissing</b> are <b>True</b>. An internal delimiter list is used when one is not specified. Dots before a verse mean that the original verse has some text preceding the detected text. Dots after a verse mean that the original verse continues beyond the detected text. Please note that the error detection flags can be used in exactly the same way as in the matching examples shown in the section above.

In [12]:
txt = "RT @user:... كرامة المؤمن عند الله تعالى؛ حيث سخر له الملائكة يستغفرون له ﴿الذِين يحملونَ العرشَ ومَن حَولهُ يُسبحونَ بِحمدِ ربهِم…"
t = qAn.annotateTxt(txt)
print('')
print(t)


RT @user:... كرامة المؤمن عند الله تعالى؛ حيث سخر له الملائكة يستغفرون له"الَّذِينَ يَحْمِلُونَ الْعَرْشَ وَمَنْ حَوْلَهُ يُسَبِّحُونَ بِحَمْدِ رَبِّهِمْ..."(غافر:7)


In [13]:
#note how the last word has been automatically corrected 
txt = ' واستعينوا بالصبر والصلاه وانها لكبيره الا علي الخشعين'
qAn.annotateTxt(txt)

'"وَاسْتَعِينُوا بِالصَّبْرِ وَالصَّلَاةِ ۚ وَإِنَّهَا لَكَبِيرَةٌ إِلَّا عَلَى الْخَاشِعِينَ"(البقرة:45)'

In [17]:
print("This is the list of default delimiters:")
qdetect.globalDelimeters

This is the list of default delimiters:


'#|،|\\.|{|}|\n|؟|!|\\(|\\)|﴿|﴾|۞|\u06dd|\\*|-|\\+|\\:|…'
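
The default delimiter string is written as a regular-expression alternation. As a rough illustration of what it separates, the sketch below feeds the same string to Python's `re.split`. How the library applies the pattern internally is not shown here, so treat `split_on_delimiters` as a hypothetical helper.

```python
import re

# Copied from the default delimiter string shown above.
DEFAULT_DELIMITERS = '#|،|\\.|{|}|\n|؟|!|\\(|\\)|﴿|﴾|۞|\u06dd|\\*|-|\\+|\\:|…'

def split_on_delimiters(text, delimiters=DEFAULT_DELIMITERS):
    """Split text on any delimiter and drop empty segments."""
    return [part.strip() for part in re.split(delimiters, text) if part.strip()]

# Three verses separated by the ۞ marker become three segments.
print(split_on_delimiters('قل هو الله احد ۞ الله الصمد ۞ لم يلد ولم يولد'))
```

A custom delimiter string in the same alternation style can be passed as the second argument to see how a different set of separators would segment the text.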

In [18]:
txt = 'RT @7Life4ever: ﷽  قل هو ﷲ أحد۝ ﷲ الصمد۝لم يلد ولم يولد۝ولم يكن له كفوا أحد  ﷽  قل أعوذ برب الفلق۝من شر ما خلق ۝ومن شر غاسق إذا وقب۝ومن شر ا…'
qAn.annotateTxt(txt)

'RT @7Life4ever: ﷽"قُلْ هُوَ اللَّهُ أَحَدٌ، اللَّهُ الصَّمَدُ، لَمْ يَلِدْ وَلَمْ يُولَدْ، وَلَمْ يَكُنْ لَهُ كُفُوًا أَحَدٌ"(الإخلاص:1-4) ﷽"قُلْ أَعُوذُ بِرَبِّ الْفَلَقِ، مِنْ شَرِّ مَا خَلَقَ، وَمِنْ شَرِّ غَاسِقٍ إِذَا وَقَبَ"(الفلق:1-3)\u06dd ومن شر ا…'
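
The annotated outputs above end each detected span with a reference such as `(غافر:7)` or `(الإخلاص:1-4)`. If you only have `matchAll` results and want a similar label, a sketch like the following could build it from the `aya_name`, `aya_start`, and `aya_end` fields. `format_reference` is a hypothetical helper that imitates the format seen in the output; it is not the library's own code.

```python
# Hypothetical helper: imitates the reference format seen in the annotated output.
def format_reference(match):
    """Build a '(name:verse)' or '(name:first-last)' label from one matchAll entry."""
    name = match['aya_name']
    start, end = match['aya_start'], match['aya_end']
    # A single verse is shown as one number, several verses as a range.
    verses = str(start) if start == end else f'{start}-{end}'
    return f'({name}:{verses})'

# Field values copied from matchAll outputs shown earlier in this guide.
print(format_reference({'aya_name': 'غافر', 'aya_start': 7, 'aya_end': 7}))
print(format_reference({'aya_name': 'الإخلاص', 'aya_start': 1, 'aya_end': 3}))
```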