## **1. Normalization functions**

## Table of Contents
1. Installation
2. Importing Normalization Module
3. Normalization Functions
   - Character Normalization
   - Number Normalization
   - Punctuation Normalization
   - Diacritics Removal
   - Spacing Correction
   - Normalize Text

## **Installation**
<p align="left">
  <a href="https://pypi.org/project/SaraikiNLP/"><img src="https://img.shields.io/pypi/v/SaraikiNLP.svg"/></a>
  <a href="https://huggingface.co/SaraikiNLP"><img src="https://img.shields.io/badge/Hugging%20Face-SaraikiNLP-yellow?logo=huggingface"/></a>
<a href="https://github.com/SaraikiNLP"><img src="https://img.shields.io/badge/Github-SaraikiNLP-blue?logo=github"/></a>

</p>

In [1]:
!pip install SaraikiNLP

Collecting SaraikiNLP
  Downloading saraikinlp-0.1.0rc2-py3-none-any.whl.metadata (8.2 kB)
Downloading saraikinlp-0.1.0rc2-py3-none-any.whl (9.6 kB)
Installing collected packages: SaraikiNLP
Successfully installed SaraikiNLP-0.1.0rc2


## **Import Normalization Module**

In [2]:
from SaraikiNLP import normalization

### **Character Normalization**
Normalizes variant Arabic- and Persian-script letters to their Urdu/Saraiki equivalents.

In [3]:
example_text ="""
ڈیزل دی فی لیٹر قیمت 251 روپے 29 پیسے تھی ڳئی۔
"""

In [4]:
print(normalization.normalize_characters(example_text))


ڈیزل دی فی لیٹر قیمت 251 روپے 29 پیسے تھی ڳئی۔
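
The sample sentence above already uses Urdu/Saraiki letters, so it comes back unchanged. To illustrate the idea on text that does contain variants, here is a minimal standard-library sketch. The mapping table and the name `map_variant_letters` are assumptions made for illustration; they are not the table or function that SaraikiNLP uses internally.

```python
# Hypothetical mapping for illustration only; not SaraikiNLP's internal table.
VARIANT_TO_URDU = str.maketrans({
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
})


def map_variant_letters(text: str) -> str:
    """Replace each variant code point with its Urdu/Saraiki counterpart."""
    return text.translate(VARIANT_TO_URDU)


# A word typed with the Arabic kaf (U+0643) instead of the Urdu one (U+06A9).
sample = "\u0643\u062A\u0627\u0628"
print(map_variant_letters(sample))
```

Letters that are not in the table pass through untouched, which is why already-normalized text is returned as-is.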



### **Number Normalization**
📝 By default, **convert_native** is **True**.

👉 If True, converts Saraiki and Arabic numbers to Western (1,2,3,...)

👉 If False, converts Western and Arabic numbers to Saraiki (١,٢,٣,...)



In [5]:
print(normalization.normalize_numbers(example_text)) # with default case


ڈیزل دی فی لیٹر قیمت 251 روپے 29 پیسے تھی ڳئی۔



In [6]:
print(normalization.normalize_numbers(example_text, convert_native = False))


ڈیزل دی فی لیٹر قیمت ٢٥١ روپے ٢٩ پیسے تھی ڳئی۔
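
The two directions can be pictured as a pair of translation tables. The sketch below uses only the standard library; `convert_digits` and its `to_western` flag are hypothetical names chosen for this example and only mirror the behaviour described above, not the library's implementation.

```python
WESTERN = "0123456789"
# Arabic-Indic digits U+0660..U+0669 (the form shown in the output above).
ARABIC_INDIC = "".join(chr(0x0660 + i) for i in range(10))
# Extended Arabic-Indic digits U+06F0..U+06F9, common in Urdu/Persian text.
EXTENDED_ARABIC_INDIC = "".join(chr(0x06F0 + i) for i in range(10))

# Both native digit sets map onto Western digits ...
TO_WESTERN = str.maketrans(ARABIC_INDIC + EXTENDED_ARABIC_INDIC, WESTERN * 2)
# ... and Western plus extended digits map onto the Arabic-Indic set.
TO_ARABIC_INDIC = str.maketrans(WESTERN + EXTENDED_ARABIC_INDIC, ARABIC_INDIC * 2)


def convert_digits(text: str, to_western: bool = True) -> str:
    """Translate digits in one direction; all other characters are kept."""
    return text.translate(TO_WESTERN if to_western else TO_ARABIC_INDIC)


print(convert_digits("\u0662\u0665\u0661"))        # native digits to Western
print(convert_digits("251", to_western=False))     # Western digits to native
```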



### **Punctuation Normalization**

📝 By default, **convert_native** is **False**.

👉 If True, converts native punctuation to Western equivalents.

👉 If False, preserves native punctuation except for minor removals.

In [7]:
print(normalization.normalize_punctuation('،گھر۔۔۔؟ بار ٹھیک ہِن سارے')) # default case

،گھر۔۔۔؟ بار ٹھیک ہِن سارے


In [8]:
print(normalization.normalize_punctuation('،گھر۔۔۔؟ بار ٹھیک ہِن سارے', convert_native= True))

,گھر...? بار ٹھیک ہِن سارے
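
The conversion shown in the last cell amounts to a character-for-character substitution. A minimal sketch of that idea follows; the table is an assumption based on the output above (the semicolon entry is added for illustration and is not confirmed by the source), and `to_western_punctuation` is a hypothetical helper, not part of SaraikiNLP.

```python
# Hypothetical table for illustration only.
NATIVE_TO_WESTERN = str.maketrans({
    "\u060C": ",",  # ARABIC COMMA
    "\u06D4": ".",  # ARABIC FULL STOP
    "\u061F": "?",  # ARABIC QUESTION MARK
    "\u061B": ";",  # ARABIC SEMICOLON (assumed, not shown in the source)
})


def to_western_punctuation(text: str) -> str:
    """Swap native punctuation marks for their Western equivalents."""
    return text.translate(NATIVE_TO_WESTERN)


# Comma, three full stops, and a question mark in native form.
print(to_western_punctuation("\u060C\u06D4\u06D4\u06D4\u061F"))
```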


### **Diacritics Removal**

👉 Removes all kinds of Saraiki/Urdu diacritics except hamza, which is kept because it is essential in forms such as ئ and ۓ.


In [9]:
print(normalization.remove_diacritics('يَٰٓأَيُّهَا ٱلۡمُزَّمِّلُ'))

یأیہا المزمل
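
One common way to strip diacritics while keeping hamza is to drop Unicode non-spacing marks (category `Mn`) except the combining hamza code points. The sketch below follows that approach as an assumption; it is not SaraikiNLP's implementation, it does not perform the letter mapping visible in the output above, and `strip_marks` is a hypothetical name.

```python
import unicodedata

# Combining hamza marks that should survive the clean-up.
KEEP = {"\u0654", "\u0655"}  # ARABIC HAMZA ABOVE, ARABIC HAMZA BELOW


def strip_marks(text: str) -> str:
    """Drop non-spacing marks (category Mn) except the combining hamzas.

    Precomposed letters such as U+0626 (yeh with hamza above) are ordinary
    letters (category Lo), so they are kept without special handling.
    """
    return "".join(
        ch for ch in text
        if ch in KEEP or unicodedata.category(ch) != "Mn"
    )


# Beh followed by a fatha: the fatha is removed, the letter stays.
print(strip_marks("\u0628\u064E"))
```

The text is deliberately not decomposed (no NFD normalization) before filtering, since decomposition would split precomposed hamza letters into a base letter plus a combining mark.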


### **Spacing Correction**
Sometimes a space is needed after punctuation marks. This function inserts one where it is missing.

In [10]:
print(normalization.insert_space_after_punctuation(
    """
     2  سرائیکی۔۔،وسیب
    """
))


     2  سرائیکی۔ ۔ ، وسیب
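
The behaviour in the output above (a space added after every mark that is directly followed by another character) can be approximated with a single regular expression. This is a sketch under assumptions: the set of punctuation marks and the name `space_after_punctuation` are chosen for illustration and may differ from what the library handles.

```python
import re

# Native comma, full stop, question mark, and semicolon (assumed set),
# matched only when the next character is not whitespace.
_PUNCT_BEFORE_TEXT = re.compile(r"([\u060C\u06D4\u061F\u061B])(?=\S)")


def space_after_punctuation(text: str) -> str:
    """Insert one space after each mark that is followed by a non-space."""
    return _PUNCT_BEFORE_TEXT.sub(r"\1 ", text)


# Two full stops and a comma squeezed between two letters.
print(space_after_punctuation("\u0627\u06D4\u06D4\u060C\u0628"))
```

The lookahead `(?=\S)` means a mark at the end of the string, or one already followed by whitespace, is left alone, so no trailing or doubled spaces are introduced.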
    


### **Normalize Text**
Normalizes text by applying **normalize_characters**, followed by **normalize_numbers** and **normalize_punctuation**.

📝 By default, **remove_diacritic** is **False**.

👉 If True, removes diacritics as the last step of the function.

👉 If False, preserves diacritics.

In [13]:
print(normalization.normalize_text("""
ڈیزل دی فی لیٹر قیمت 251 روپے 29 پیسے تھی ڳئی۔
"""
))


ڈیزل دی فی لیٹر قیمت 251 روپے 29 پیسے تھی ڳئی۔
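
The "apply the steps in order, then optionally strip diacritics last" structure described above can be sketched as a small pipeline. Everything here is hypothetical: `apply_in_order` and the three stand-in steps are simplified placeholders that only show the ordering, not the library's functions.

```python
from typing import Callable, Optional, Sequence

Step = Callable[[str], str]


def apply_in_order(text: str, steps: Sequence[Step],
                   final_step: Optional[Step] = None) -> str:
    """Run each step on the result of the previous one, then the optional last step."""
    for step in steps:
        text = step(text)
    if final_step is not None:
        text = final_step(text)
    return text


# Simplified stand-in steps (assumptions, for illustration only).
def collapse_spaces(text: str) -> str:
    return " ".join(text.split())


def digits_to_western(text: str) -> str:
    # Handles only the two digits used in the sample below.
    return text.translate(str.maketrans("\u0662\u0669", "29"))


def drop_fatha(text: str) -> str:
    return text.replace("\u064E", "")


raw = "\u0628\u064E  \u0662\u0669"  # beh + fatha, two spaces, native "29"
steps = [collapse_spaces, digits_to_western]
print(apply_in_order(raw, steps))                          # diacritic kept
print(apply_in_order(raw, steps, final_step=drop_fatha))   # diacritic removed last
```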

