# **Basic Data Science Projects using Python, NumPy, Pandas, Matplotlib, Regular Expressions, and SQL**

<center>

*By: Prof. James Abello, Haoyang Zhang*

*Computer Science Department*

*Rutgers University*

*Nov. 21, 2024.*

</center>

## Topic 7: Leet Speak (regular expressions)

#### **Objective:** Translate English text to Leet Speak and vice versa using regular expressions.

#### **Estimated Completion Time: 5 hours**

Leet (or "1337") speak is an informal way of writing that replaces Latin letters with look-alike digits, symbols, or combinations of characters. For example, the word "leet" is written as "1337" in leet speak.

In this project, you will write a Python program to:

- Encode a given string into leet speak by replacing certain letters with their corresponding leet speak characters.
- Decode a leet speak string back to the original string by reversing the substitutions.

A good reference is:

> L33t sp34k ch34t sh33t by Roald Craenen
> 
> https://www.gamehouse.com/blog/leet-speak-cheat-sheet/

#### Level 1

In this level, we will focus on a basic one-to-one mapping of characters without ambiguity. The mapping is as follows:

| Latin | Leet |
|-------|------|
| A     | 4    |
| B     | 8    |
| C     | (    |
| D     | )    |
| E     | 3    |
| F     | ƒ    |
| G     | 6    |
| H     | #    |
| I     | !    |
| J     | ]    |
| K     | \|   |
| L     | 1    |
| M     | м    |
| N     | и    |
| O     | Ø    |
| P     | 9    |
| Q     | 2    |
| R     | Я    |
| S     | 5    |
| T     | 7    |
| U     | µ    |
| V     | √    |
| W     | ω    |
| X     | Ж    |
| Y     | ¥    |
| Z     | %    |

Note: in this version, no leet speak symbol is itself a Latin letter, so a Latin letter never appears as the result of an encoding.

##### Task 1.1

Write a Python function `encode_all_1()` that takes a string as input and encodes all Latin letters to leet speak using the mapping above.

You can assume that the input text contains only uppercase Latin letters, lowercase Latin letters, and spaces.

In [53]:
import re
def encode_all_1(text):
    """
    Encode all Latin letters to leet speak
    IN: text, str, input text
    OUT: str, leet speak text
    """

    text = re.sub('[Aa]', '4', text)
    text = re.sub('[Bb]', '8', text)
    text = re.sub('[Cc]', '(', text)
    text = re.sub('[Dd]', ')', text)
    text = re.sub('[Ee]', '3', text)
    text = re.sub('[Ff]', 'f', text)
    text = re.sub('[Gg]', '6', text)
    text = re.sub('[Hh]', '#', text)
    text = re.sub('[Ii]', '!', text)
    text = re.sub('[Jj]', 'j', text)
    text = re.sub('[Kk]', '|', text)
    text = re.sub('[Ll]', '1', text)
    text = re.sub('[Mm]', 'M', text)
    text = re.sub('[Nn]', 'n', text)
    text = re.sub('[Oo]', '0', text)
    text = re.sub('[Pp]', '9', text)
    text = re.sub('[Qq]', '2', text)
    text = re.sub('[Rr]', 'R', text)
    text = re.sub('[Ss]', '5', text)
    text = re.sub('[Tt]', '7', text)
    text = re.sub('[Uu]', 'µ', text)
    text = re.sub('[Vv]', '√', text)
    text = re.sub('[Ww]', 'ω', text)
    text = re.sub('[Xx]', 'X', text)
    text = re.sub('[Yy]', '¥', text)
    text = re.sub('[Zz]', '%', text)
    return text

In [54]:
if __name__ == "__main__":
    print(encode_all_1("aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ"))

4488(())33ff66##!!jj||11MMnn009922RR5577µµ√√ωωXX¥¥%%
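
The 26 substitutions can also be driven by a lookup table and a single pattern. The following is a minimal sketch rather than the reference solution: it assumes the symbols from the Level 1 table, and the names `LEET_TABLE` and `encode_all_table` are hypothetical.

```python
import re

# Level 1 table: uppercase Latin letter -> leet symbol (assumed from the table above).
LEET_TABLE = {
    "A": "4", "B": "8", "C": "(", "D": ")", "E": "3", "F": "ƒ", "G": "6",
    "H": "#", "I": "!", "J": "]", "K": "|", "L": "1", "M": "м", "N": "и",
    "O": "Ø", "P": "9", "Q": "2", "R": "Я", "S": "5", "T": "7", "U": "µ",
    "V": "√", "W": "ω", "X": "Ж", "Y": "¥", "Z": "%",
}


def encode_all_table(text):
    """Encode every Latin letter (either case) in one pass over the text."""
    # The replacement function receives each matched letter and looks it up.
    return re.sub(r"[A-Za-z]", lambda m: LEET_TABLE[m.group(0).upper()], text)


print(encode_all_table("Leet Speak"))  # 1337 5934|
```

Because the text is scanned only once, a symbol produced by one substitution can never be rewritten by a later one.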


##### Task 1.2

Write a Python function `decode_all_1()` to reverse the operation of `encode_all_1()`.

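You may assume that the decoded text contains only Latin letters and spaces. Because `encode_all_1()` maps the uppercase and lowercase forms of a letter to the same symbol, the original case cannot be recovered, so decoding to uppercase letters is sufficient.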

In [55]:
import re
def decode_all_1(text):
    """
    Decode all leet speak to Latin letters
    IN: text, str, leet speak text
    OUT: str, Latin letters text
    """
    # Raw strings are used where a regex metacharacter has to be escaped
    text = re.sub('4', 'A', text)
    text = re.sub('8', 'B', text)
    text = re.sub(r'\(', 'C', text)
    text = re.sub(r'\)', 'D', text)
    text = re.sub('3', 'E', text)
    text = re.sub('f', 'F', text)
    text = re.sub('6', 'G', text)
    text = re.sub('#', 'H', text)
    text = re.sub('!', 'I', text)
    text = re.sub('j', 'J', text)
    text = re.sub(r'\|', 'K', text)
    text = re.sub('1', 'L', text)
    text = re.sub('M', 'M', text)
    text = re.sub('n', 'N', text)
    text = re.sub('0', 'O', text)
    text = re.sub('9', 'P', text)
    text = re.sub('2', 'Q', text)
    text = re.sub('R', 'R', text)
    text = re.sub('5', 'S', text)
    text = re.sub('7', 'T', text)
    text = re.sub('µ', 'U', text)
    text = re.sub('√', 'V', text)
    text = re.sub('ω', 'W', text)
    text = re.sub('X', 'X', text)
    text = re.sub('¥', 'Y', text)
    text = re.sub('%', 'Z', text)
    return text

In [56]:
if __name__ == "__main__":
    print(decode_all_1("4488(())33ff66##!!jj||11MMnn009922RR5577µµ√√ωωXX¥¥%%"))

AABBCCDDEEFFGGHHIIJJKKLLMMNNOOPPQQRRSSTTUUVVWWXXYYZZ


##### Task 1.3

Write a Python function `encode_partially_1()` that takes a string and a number `p` between 0 and 1 as input, and encodes each Latin letter to leet speak with probability `p`.

For example, if `p = 0.5`, then each Latin letter has a 50% chance of being encoded to leet speak.
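
The role of `p` can be checked empirically. The sketch below is illustrative only: `encode_with_probability`, the three-letter `TABLE` excerpt, and the seed are assumptions. It takes a seeded `random.Random` so that runs are repeatable, and it relies on `random()` returning values in [0, 1), so `p = 0` never encodes and `p = 1` always does.

```python
import random
import re

TABLE = {"E": "3", "L": "1", "T": "7"}  # small excerpt of the Level 1 table


def encode_with_probability(text, p, rng):
    """Encode each letter found in TABLE independently with probability p."""
    def maybe_encode(match):
        letter = match.group(0)
        # rng.random() is uniform on [0, 1), so this branch is taken with probability p.
        if rng.random() < p:
            return TABLE.get(letter.upper(), letter)
        return letter

    return re.sub(r"[A-Za-z]", maybe_encode, text)


rng = random.Random(2024)  # fixed seed, chosen arbitrarily
text = "leet" * 2500       # 10,000 letters, all present in TABLE
encoded = encode_with_probability(text, 0.3, rng)
# Every code here is one character long, so positions line up.
fraction = sum(a != b for a, b in zip(text, encoded)) / len(text)
print(round(fraction, 2))  # close to 0.3
```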

In [57]:
import random
import re
def encode_partially_1(text, p):
    """
    Encode each Latin letter to leet speak with probability p
    IN: text, str, input text
        p, float, probability of encoding
    OUT: str, partially encoded text
    """
    def replace_prob(match):
        if random.random() < p:
            char = match.group(0)
            return {
                'A': '4', 'a': '4', 'B': '8', 'b': '8',
                'C': '(', 'c': '(', 'D': ')', 'd': ')',
                'E': '3', 'e': '3', 'F': 'f', 'f': 'f',
                'G': '6', 'g': '6', 'H': '#', 'h': '#',
                'I': '!', 'i': '!', 'J': 'j', 'j': 'j',
                'K': '|', 'k': '|', 'L': '1', 'l': '1',
                'M': 'M', 'm': 'M', 'N': 'n', 'n': 'n',
                'O': '0', 'o': '0', 'P': '9', 'p': '9',
                'Q': '2', 'q': '2', 'R': 'R', 'r': 'R',
                'S': '5', 's': '5', 'T': '7', 't': '7',
                'U': 'µ', 'u': 'µ', 'V': '√', 'v': '√',
                'W': 'ω', 'w': 'ω', 'X': 'X', 'x': 'X',
                'Y': '¥', 'y': '¥', 'Z': '%', 'z': '%'
            }[char]
        return match.group(0)

    return re.sub('[a-zA-Z]', replace_prob, text)

In [58]:
if __name__ == "__main__":
    print(encode_partially_1("AABBCCDDEEFFGGHHIIJJKKLLMMNNOOPPQQRRSSTTUUVVWWXXYYZZ", 0.5))

4A8BC())3EfF66##!Ijj||11MMnnO0PPQ2RR55TTµU√VWωXX¥¥Z%


##### Task 1.4

Write a Python function `decode_partially_1()` to reverse the operation of `encode_partially_1()`. 

Observation: Any Latin letter can be "decoded" as it is, since this version of leet speak does not map any character back to a Latin letter.

In [59]:
import re
def decode_partially_1(text):
    """
    Decode each Latin letter from leet speak
    IN: text, str, partially encoded text
    OUT: str, partially decoded text
    """
    text = re.sub('4', 'A', text)
    text = re.sub('8', 'B', text)
    text = re.sub(r'\(', 'C', text)
    text = re.sub(r'\)', 'D', text)
    text = re.sub('3', 'E', text)
    text = re.sub('f', 'F', text)
    text = re.sub('6', 'G', text)
    text = re.sub('#', 'H', text)
    text = re.sub('!', 'I', text)
    text = re.sub('j', 'J', text)
    text = re.sub(r'\|', 'K', text)
    text = re.sub('1', 'L', text)
    text = re.sub('n', 'N', text)
    text = re.sub('0', 'O', text)
    text = re.sub('9', 'P', text)
    text = re.sub('2', 'Q', text)
    text = re.sub('5', 'S', text)
    text = re.sub('7', 'T', text)
    text = re.sub('µ', 'U', text)
    text = re.sub('√', 'V', text)
    text = re.sub('ω', 'W', text)
    text = re.sub('¥', 'Y', text)
    text = re.sub('%', 'Z', text)
    return text

In [60]:
if __name__ == "__main__":
    print(decode_partially_1("448BCC)DE3fFG6##!!jJKKLLMMnn00PPQQRR55TTUU√√WWXX¥Y%%"))

AABBCCDDEEFFGGHHIIJJKKLLMMNNOOPPQQRRSSTTUUVVWWXXYYZZ
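
The observation in Task 1.4 implies a round-trip property: decoding a partially encoded text gives back the original letters, up to case. A minimal sketch of that check, assuming the symbols from the Level 1 table (the helper names and the seed are hypothetical):

```python
import random
import re

LATIN = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LEET = "48()3ƒ6#!]|1миØ92Я57µ√ωЖ¥%"  # same order as LATIN
ENCODE = dict(zip(LATIN, LEET))
DECODE = dict(zip(LEET, LATIN))
# One character class holding every leet symbol; re.escape protects ( ) ] | #.
LEET_PATTERN = re.compile("[" + re.escape(LEET) + "]")


def encode_partially(text, p, rng):
    """Encode each Latin letter with probability p."""
    def maybe_encode(match):
        letter = match.group(0)
        return ENCODE[letter.upper()] if rng.random() < p else letter
    return re.sub(r"[A-Za-z]", maybe_encode, text)


def decode_partially(text):
    """Map leet symbols back to uppercase letters; Latin letters pass through."""
    return LEET_PATTERN.sub(lambda m: DECODE[m.group(0)], text)


rng = random.Random(7)  # arbitrary seed
original = "Leet Speak"
for p in (0.0, 0.5, 1.0):
    decoded = decode_partially(encode_partially(original, p, rng))
    # Encoding discards case, so compare case-insensitively.
    assert decoded.upper() == original.upper()
print(decode_partially("1337 Sp34|"))  # LEET SpEAK
```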


#### Level 2

In this level, users can define their own mapping of characters to leet speak as a JSON dictionary, ensuring no ambiguity (using a prefix-free code).

A prefix-free code is a type of coding system in which **no code is the prefix of another code**. For example, if `C` is encoded as `(`, then no other character can be encoded as `(` or as a sequence of characters starting with `(`.

It can be proved that **if a code is prefix-free, every encoded text can be decoded in exactly one way**. Reading the text from left to right, at most one code can match at each position, so there is no ambiguity when decoding.
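
To see why the prefix-free condition matters, the sketch below enumerates every way an encoded string can be split into codes. It is an illustration under assumptions: `all_decodings` and the two three-letter code tables are hypothetical, and every code is assumed to be non-empty.

```python
def all_decodings(encoded, code_to_letter):
    """Return every decoding obtained by splitting `encoded` into codes."""
    if not encoded:
        return [""]
    results = []
    for code, letter in code_to_letter.items():
        if encoded.startswith(code):
            rest = encoded[len(code):]
            # Decode the remainder recursively and prepend this letter.
            results.extend(letter + tail for tail in all_decodings(rest, code_to_letter))
    return results


ambiguous = {"|": "K", "|_": "L", "_|": "J"}     # "|" is a prefix of "|_"
prefix_free = {"|<": "K", "|_": "L", "_|": "J"}  # no code starts another

print(all_decodings("|_|", ambiguous))     # ['KJ', 'LK']
print(all_decodings("|<_|", prefix_free))  # ['KJ']
```

With the ambiguous table, the same three characters decode to two different words; with the prefix-free table, only one split exists.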

For example, the JSON dictionary for the basic mapping in Level 1 is:

```json
{
    "A": ["4"],
    "B": ["8"],
    "C": ["("],
    "D": [")"],
    "E": ["3"],
    "F": ["ƒ"],
    "G": ["6"],
    "H": ["#"],
    "I": ["!"],
    "J": ["]"],
    "K": ["|"],
    "L": ["1"],
    "M": ["м"],
    "N": ["и"],
    "O": ["Ø"],
    "P": ["9"],
    "Q": ["2"],
    "R": ["Я"],
    "S": ["5"],
    "T": ["7"],
    "U": ["µ"],
    "V": ["√"],
    "W": ["ω"],
    "X": ["Ж"],
    "Y": ["¥"],
    "Z": ["%"]
}
```

A more complex version, in which one Latin letter can map to several leet speak characters or character sequences, is:

```json
{
    "A": ["4", "@", "Д"],
    "B": ["8", "ß"],
    "C": ["(", "<", "©", "¢"],
    "D": [")", ">"],
    "E": ["3", "£"],
    "F": ["ƒ"],
    "G": ["6", "&"],
    "H": ["#", "|-|"],
    "I": ["!"],
    "J": ["]", "_|"],
    "K": ["|<"],
    "L": ["1", "|_"],
    "M": ["м", "|\\/|"],
    "N": ["и", "|\\|"],
    "O": ["Ø"],
    "P": ["9", "|°"],
    "Q": ["2"],
    "R": ["Я", "|~"],
    "S": ["5", "$", "§"],
    "T": ["7", "-|-"],
    "U": ["µ"],
    "V": ["√"],
    "W": ["ω", "\\^/"],
    "X": ["Ж", "×"],
    "Y": ["¥", "γ"],
    "Z": ["%"]
}
```
Notice that the character `K` is now encoded as `|<` instead of `|`, so that other letters can be mapped to sequences starting with `|` without breaking the prefix-free property.
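
Some codes in such a mapping contain backslashes. In JSON source, a literal backslash has to be written as `\\`, while `\/` is an escape for a plain slash. The short sketch below shows how Python's `json` module reads these cases. The two-entry mapping is a made-up excerpt used only for illustration.

```python
import json

# The raw string keeps Python from interpreting the backslashes first,
# so `source` holds exactly the characters a JSON file would contain.
source = r'{"N": ["и", "|\\|"], "W": ["ω", "\\^/"]}'
mapping = json.loads(source)
print(mapping["N"][1])  # |\|
print(mapping["W"][1])  # \^/

# \/ is a valid JSON escape, but it stands for a plain slash.
print(json.loads(r'"|\/|"'))  # |/|

# A single backslash before a character such as ^ is not a valid JSON escape.
try:
    json.loads(r'{"W": ["\^/"]}')
except json.JSONDecodeError as error:
    print("invalid JSON:", error.msg)
```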

##### Task 2.1

Write a Python function `check_prefix_free_2()` that takes a JSON file name as input and checks if the JSON file specifies a prefix-free code.

In [61]:
import json
def check_prefix_free_2(json_file):
    """
    Check if the JSON file specifies a prefix-free code
    IN: json_file, str, JSON file name
    OUT: dict or None, dictionary of characters mapping to leet speak or None if not prefix-free
    """
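    # The mapping may contain non-ASCII symbols, so read the file as UTF-8
    with open(json_file, 'r', encoding='utf-8') as f:
        mapping = json.load(f)
    
    # Get all possible leet codes
    all_codes = []
    for codes in mapping.values():
        all_codes.extend(codes)
    
    # Check if any code is a prefix of another. Comparing by position rather
    # than by value also rejects a code that is assigned twice.
    for i, code1 in enumerate(all_codes):
        for j, code2 in enumerate(all_codes):
            if i != j and code2.startswith(code1):
                return None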
    
    return mapping

In [62]:
if __name__ == "__main__":
    #check json file 1
    json_file = 'examples.json'
    result = check_prefix_free_2(json_file)
    if result is not None:
        print(result)
    else:
        print('This Code is NOT prefix free')

{'A': ['4'], 'B': ['8'], 'C': ['('], 'D': [')'], 'E': ['3'], 'F': ['f'], 'G': ['6'], 'H': ['#'], 'I': ['!'], 'J': ['j'], 'K': ['|'], 'L': ['1'], 'M': ['M'], 'N': ['n'], 'O': ['0'], 'P': ['9'], 'Q': ['2'], 'R': ['R'], 'S': ['5'], 'T': ['7'], 'U': ['u'], 'V': ['v'], 'W': ['w'], 'X': ['x'], 'Y': ['y'], 'Z': ['%']}


In [64]:
if __name__ == "__main__":
    #check json file 2
    json_file = 'examples2.json'
    result = check_prefix_free_2(json_file)
    if result is not None:
        print(result)
    else:
        print('This Code is NOT prefix free')

This Code is NOT prefix free
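
Comparing every pair of codes takes quadratic time. For larger mappings, one possible alternative is to sort the codes first: in sorted order, a code that is a prefix of some other code is also a prefix of its immediate successor, so only adjacent pairs need to be compared. A minimal sketch, with the hypothetical name `is_prefix_free` and a plain list of codes as input:

```python
def is_prefix_free(codes):
    """Return True if no code is a prefix of (or equal to) another code."""
    ordered = sorted(codes)
    # After sorting, checking each code against its successor is enough.
    return not any(b.startswith(a) for a, b in zip(ordered, ordered[1:]))


print(is_prefix_free(["|<", "|_", "_|", "4"]))  # True
print(is_prefix_free(["|", "|<", "4"]))         # False: "|" starts "|<"
print(is_prefix_free(["4", "4"]))               # False: duplicate code
```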


##### Task 2.2

Write a Python function `encode_partially_2()` that takes a string and a number `p` between `0` and `1` as input and encodes each Latin letter to leet speak with probability `p` using the user-defined mapping. When a letter has several possible encodings, choose one of them at random.

```python
def encode_partially_2(text, p, mapping):
    """
    Encode each Latin letter to leet speak with probability p using the user-defined mapping
    IN: text, str, input text
        p, float, probability of encoding
        mapping, dict, user-defined mapping
    OUT: str, partially encoded text
    """
    def replace_prob(match):
        if random.random() < p:
            char = match.group(0).upper()
            if char in mapping:
                # Randomly choose one of the possible encodings
                return random.choice(mapping[char])
        return match.group(0)
    
    # Create pattern to match any mappable character
    pattern = f"[{''.join(mapping.keys())}]"
    # Apply substitution with probability
    return re.sub(pattern, replace_prob, text, flags=re.IGNORECASE)
```

Or, you can create a constructor for `encode_partially_2()` that takes the mapping as an argument:

```python
def encode_partially_2(mapping):
    """
    Constructor for encoding each Latin letter to leet speak with probability p using the user-defined mapping
    IN: mapping, dict, user-defined mapping
    OUT: function, (text: str, p: float) -> str, encode latin letters to leet speak with probability p
    """
    # define the encoding function using the mapping
    def encode_partially_2(text, p):
        """
        Encode each Latin letter to leet speak with probability p using the user-defined mapping
        IN: text, str, input text
            p, float, probability of encoding
        OUT: str, partially encoded text
        """
        # specify the encoding logic using the mapping
        pass

    # return the encoding function
    return encode_partially_2
```
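
One possible way to fill in the constructor form is sketched below. It is not the reference solution: the name `make_encoder` is hypothetical, and the mapping is assumed to be non-empty with uppercase letters as keys. The pattern is compiled once when the encoder is built, and the returned function closes over both the pattern and the mapping.

```python
import random
import re


def make_encoder(mapping):
    """Build an encoder function for a fixed letter-to-codes mapping."""
    # Escape each key so that unusual keys cannot change the pattern's meaning.
    pattern = re.compile("|".join(re.escape(letter) for letter in mapping), re.IGNORECASE)

    def encode(text, p):
        def maybe_encode(match):
            if random.random() < p:
                # Pick one of the available codes for this letter at random.
                return random.choice(mapping[match.group(0).upper()])
            return match.group(0)

        return pattern.sub(maybe_encode, text)

    return encode


encode = make_encoder({"A": ["4", "@"], "E": ["3"]})
print(encode("Leet", 1.0))  # L33t
print(encode("Leet", 0.0))  # Leet
```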

In [65]:
import json
import random
import re
def encode_partially_2(text, p, mapping):
    """
    Encode each Latin letter to leet speak with probability p using the user-defined mapping
    IN: text, str, input text
        p, float, probability of encoding
        mapping, dict, user-defined mapping
    OUT: str, partially encoded text
    """
    def replace_prob(match):
        if random.random() < p:
            char = match.group(0).upper()
            if char in mapping:
                # Randomly choose one of the possible encodings
                return random.choice(mapping[char])
        return match.group(0)
    
    # Create pattern to match any mappable character
    pattern = f"[{''.join(mapping.keys())}]"
    # Apply substitution with probability
    return re.sub(pattern, replace_prob, text, flags=re.IGNORECASE)

In [69]:
if __name__ == "__main__":
    json_file = 'examples.json'
    result = check_prefix_free_2(json_file)
    print(encode_partially_2("WAXING", 0.5, result))

WAXInG


In [70]:
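if __name__ == "__main__":
    json_file = 'examples2.json'
    result = check_prefix_free_2(json_file)
    # check_prefix_free_2() returns None for a code that is not prefix-free
    if result is not None:
        print(encode_partially_2("WAXING", 0.5, result))
    else:
        print('This Code is NOT prefix free')

This Code is NOT prefix free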

##### Task 2.3

Write a Python function `decode_partially_2()` to reverse the operation of `encode_partially_2()` using the user-defined mapping.

```python
def decode_partially_2(text, mapping):
    """
    Decode each Latin letter from leet speak using the user-defined mapping
    IN: text, str, partially encoded text
        mapping, dict, user-defined mapping
    OUT: str, partially decoded text
    """
    # Create reverse mapping
    reverse_mapping = {}
    for letter, codes in mapping.items():
        for code in codes:
            reverse_mapping[re.escape(code)] = letter
    
    # Sort codes by length (longest first) to handle multi-character codes correctly
    sorted_codes = sorted(reverse_mapping.keys(), key=len, reverse=True)
    
    # Create pattern for all possible codes
    pattern = '|'.join(sorted_codes)
    
    def replace_with_letter(match):
        return reverse_mapping[re.escape(match.group(0))]
    
    # Apply substitution
    return re.sub(pattern, replace_with_letter, text)
```

Or, you can create a constructor for `decode_partially_2()` that takes the mapping as an argument:

```python
def decode_partially_2(mapping):
    """
    Constructor for decoding each Latin letter from leet speak using the user-defined mapping
    IN: mapping, dict, user-defined mapping
    OUT: function, (text: str) -> str, decode latin letters from leet speak
    """
    # define the decoding function using the mapping
    def decode_partially_2(text):
        """
        Decode each Latin letter from leet speak using the user-defined mapping
        IN: text, str, partially encoded text
        OUT: str, partially decoded text
        """
        # specify the decoding logic using the mapping
        pass

    # return the decoding function
    return decode_partially_2
```
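
The constructor form of the decoder can be sketched in the same spirit. The name `make_decoder` and the four-letter mapping are assumptions made for illustration. Because the code is prefix-free, at most one alternative of the pattern can match at any position, so the order of the alternatives does not affect the result.

```python
import re


def make_decoder(mapping):
    """Build a decoder function for a fixed, prefix-free mapping."""
    # Invert the mapping: each code points back to its Latin letter.
    code_to_letter = {code: letter for letter, codes in mapping.items() for code in codes}
    # Escape the codes only when building the pattern; the keys stay unescaped.
    pattern = re.compile("|".join(re.escape(code) for code in code_to_letter))

    def decode(text):
        return pattern.sub(lambda m: code_to_letter[m.group(0)], text)

    return decode


decode = make_decoder({"K": ["|<"], "L": ["1", "|_"], "E": ["3", "£"], "T": ["7"]})
print(decode("|_337"))   # LEET
print(decode("|<3£|_"))  # KEEL
```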

In [71]:
import json
import random
import re
def decode_partially_2(text, mapping):
    """
    Decode each Latin letter from leet speak using the user-defined mapping
    IN: text, str, partially encoded text
        mapping, dict, user-defined mapping
    OUT: str, partially decoded text
    """
    # Create reverse mapping
    reverse_mapping = {}
    for letter, codes in mapping.items():
        for code in codes:
            reverse_mapping[re.escape(code)] = letter
    
    # Sort codes by length (longest first) to handle multi-character codes correctly
    sorted_codes = sorted(reverse_mapping.keys(), key=len, reverse=True)
    
    # Create pattern for all possible codes
    pattern = '|'.join(sorted_codes)
    
    def replace_with_letter(match):
        return reverse_mapping[re.escape(match.group(0))]
    
    # Apply substitution
    return re.sub(pattern, replace_with_letter, text)

In [74]:
if __name__ == "__main__":
    json_file = 'examples.json'
    result = check_prefix_free_2(json_file)
    print(decode_partially_2(encode_partially_2("WAXING", 0.5, result), result))

WAXING


In [75]:
if __name__ == "__main__":
    json_file = 'examples2.json'
    result = check_prefix_free_2(json_file)
    # Only encode and decode when the mapping is prefix-free
    if result is not None:
        encoded = encode_partially_2("WAXING", 0.5, result)
        print(encoded)
        print(decode_partially_2(encoded, result))
    else:
        print('This Code is NOT prefix free')

This Code is NOT prefix free

#### Level 3

In this level:
- Words can be emphasized by adding the suffix "-zorz". For example, "leet" can be emphasized as "leetzorz" and further encoded to "1337%or|~z".
- Users can define their own mapping, not only for single Latin letters but also for words, to leet speak without ambiguity (using a prefix-free code).

For example: 

```json
{
    "words": {
        "real": ["٢٤٨١"],
        "eye": ["٤٢٤"],
        "age": ["٨٩٤"],
        "euro": ["٤٧٢٥"],
        "total": ["٢٥٢٨١"]
    },
    "letters": {
        "A": ["4", "@", "Д"],
        "B": ["8", "ß"],
        "C": ["(", "<", "©", "¢"],
        "D": [")", ">"],
        "E": ["3", "£"],
        "F": ["ƒ"],
        "G": ["6", "&"],
        "H": ["#", "|-|"],
        "I": ["!"],
        "J": ["]", "_|"],
        "K": ["|<"],
        "L": ["1", "|_"],
        "M": ["м", "|\\/|"],
        "N": ["и", "|\\|"],
        "O": ["Ø"],
        "P": ["9", "|°"],
        "Q": ["2"],
        "R": ["Я", "|~"],
        "S": ["5", "$", "§"],
        "T": ["7", "-|-"],
        "U": ["µ"],
        "V": ["√"],
        "W": ["ω", "\\^/"],
        "X": ["Ж", "×"],
        "Y": ["¥", "γ"],
        "Z": ["%"]
    }
}
```

The encoding and decoding sequence is as follows:

```mermaid
graph TB
    Original(["Original text"])
    AddSuffix["Add suffix '-zorz' to emphasize"]
    EncodeWord["Encode words to leet speak"]
    EncodeLetter["Encode letters to leet speak"]
    LeetSpeak(["Leet speak text"])
    DecodeLetter["Decode letters from leet speak"]
    DecodeWord["Decode words from leet speak"]
    RemoveSuffix["Remove suffix '-zorz'"]
    Original2(["Original text"])

    Original --> AddSuffix
    AddSuffix --> EncodeWord
    EncodeWord --> EncodeLetter
    EncodeLetter --> LeetSpeak
    LeetSpeak --> DecodeLetter
    DecodeLetter --> DecodeWord
    DecodeWord --> RemoveSuffix
    RemoveSuffix --> Original2
```
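
The ordering in the diagram can be expressed as function composition: decoding applies the inverse stages in the opposite order. The sketch below uses toy, deterministic stand-ins for the six stages. Every function in it is hypothetical and far simpler than the functions asked for in the tasks.

```python
from functools import reduce


def pipeline(*stages):
    """Return a function that applies the stages from left to right."""
    return lambda text: reduce(lambda acc, stage: stage(acc), stages, text)


# Toy stand-ins for the real stages (always encode, one fixed code each).
def add_suffix(text):
    return text + "zorz"

def remove_suffix(text):
    return text[:-len("zorz")] if text.endswith("zorz") else text

def encode_words(text):
    return text.replace("real", "r34l")

def decode_words(text):
    return text.replace("r34l", "real")

def encode_letters(text):
    return text.replace("z", "%")

def decode_letters(text):
    return text.replace("%", "z")


to_leet = pipeline(add_suffix, encode_words, encode_letters)
# Decoding undoes the stages in reverse: letters first, suffix last.
from_leet = pipeline(decode_letters, decode_words, remove_suffix)

leet = to_leet("real")
print(leet)             # r34l%or%
print(from_leet(leet))  # real
```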

##### Task 3.1

Write a Python function `check_prefix_free_3()` similar to Task 2.1 that checks if the JSON file specifies a prefix-free code for both words and letters.

In [76]:
def check_prefix_free_3(json_file):
    """
    Check if the JSON file specifies a prefix-free code for both words and letters
    IN: json_file, str, JSON file name
    OUT: dict or None, dictionary of characters mapping to leet speak or None if not prefix-free
    """
    # The mapping may contain non-ASCII symbols, so read the file as UTF-8
    with open(json_file, 'r', encoding='utf-8') as f:
        mapping = json.load(f)
    
    # Get all codes from both words and letters
    all_codes = []
    if 'words' in mapping:
        for codes in mapping['words'].values():
            all_codes.extend(codes)
    if 'letters' in mapping:
        for codes in mapping['letters'].values():
            all_codes.extend(codes)
    
    # Check for prefix relationships. Comparing by position rather than by
    # value also rejects a code that is assigned twice.
    for i, code1 in enumerate(all_codes):
        for j, code2 in enumerate(all_codes):
            if i != j and code2.startswith(code1):
                return None
    
    return mapping

In [77]:
if __name__ == "__main__":
    json_file = 'examples3.json'
    result = check_prefix_free_3(json_file)
    if result is not None:
        print(check_prefix_free_3(json_file))
    else:
        print("The Code is NOT Prefix Free")

{'words': {'hello': ['h3ll0x'], 'world': ['w0rld1'], 'computer': ['c0mpu74r'], 'python': ['py7h0n2'], 'code': ['c0d3x'], 'programming': ['pr06x'], 'testing': ['73s7ing'], 'debug': ['d3bu6x'], 'syntax': ['syn7x1'], 'function': ['funk7x']}, 'letters': {'A': ['4x'], 'B': ['8x'], 'C': ['(x'], 'D': [')x'], 'E': ['3x'], 'F': ['fx'], 'G': ['6x'], 'H': ['#x'], 'I': ['!x'], 'J': ['jx'], 'K': ['kx'], 'L': ['1x'], 'M': ['mx'], 'N': ['nx'], 'O': ['0x'], 'P': ['9x'], 'Q': ['qx'], 'R': ['rx'], 'S': ['5x'], 'T': ['7x'], 'U': ['ux'], 'V': ['vx'], 'W': ['wx'], 'X': ['xx'], 'Y': ['yx'], 'Z': ['zx']}}


##### Task 3.2

Write a Python function `add_emphasis_3()` that takes a string and a list of important words as input and emphasizes the important words in the string by adding the suffix "-zorz".

In [32]:
import json
import random
import re

def add_emphasis_3(text, important_words):
    """
    Add emphasis to important words by adding the suffix '-zorz'
    IN: text, str, input text
        important_words, list[str], list of important words
    OUT: str, text with emphasized words
    """
    # Compare case-insensitively against the important words
    important = {w.lower() for w in important_words}

    def replace_word(match):
        word = match.group(0)
        if word.lower() in important:
            # The suffix is appended directly, e.g. "leet" -> "leetzorz"
            return f"{word}zorz"
        return word
    
    # Create pattern to match whole words
    pattern = r'\b(' + '|'.join(map(re.escape, important_words)) + r')\b'
    return re.sub(pattern, replace_word, text, flags=re.IGNORECASE)

In [None]:
if __name__ == "__main__":
    important_words = ['pancakes', 'donuts']
    test_text = "hello there pancakes and donuts"
    result = add_emphasis_3(test_text, important_words)
    print(result)
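
The decoding sequence ends by removing the suffix again. A minimal sketch of that inverse step follows, with assumptions stated: the name `remove_emphasis` is hypothetical, the pattern accepts the suffix with or without a hyphen in front of it, and no ordinary word in the text is assumed to end in "zorz".

```python
import re

# Matches the suffix at the end of a word, optionally preceded by a hyphen.
SUFFIX = re.compile(r"(?<=\w)-?zorz\b", flags=re.IGNORECASE)


def remove_emphasis(text):
    """Strip the emphasis suffix from every word that carries it."""
    return SUFFIX.sub("", text)


print(remove_emphasis("hello there pancakeszorz and donutszorz"))
# hello there pancakes and donuts
```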

##### Task 3.3

Write a Python function `encode_partially_words_3()` that takes a string and a number `p` between `0` and `1` as input and encodes words to leet speak with probability `p` using the user-defined mapping.

```python
import random
import json
import re
def encode_partially_words_3(text, p, mapping):
    """
    Encode words to leet speak with probability p using the user-defined mapping
    IN: text, str, input text
        p, float, probability of encoding
        mapping, dict, user-defined mapping
    OUT: str, partially encoded text
    """
    def replace_word(match):
        word = match.group(0)
        if random.random() < p and word.lower() in mapping['words']:
            return random.choice(mapping['words'][word.lower()])
        return word
    
    # Create pattern to match whole words from mapping
    pattern = r'\b(' + '|'.join(map(re.escape, mapping['words'].keys())) + r')\b'
    # Return the encoded text itself, as stated in the docstring
    return re.sub(pattern, replace_word, text, flags=re.IGNORECASE)
```

Or, you can create a constructor for `encode_partially_words_3()` that takes the mapping as an argument:

```python
def encode_partially_words_3(mapping):
    """
    Constructor for encoding words to leet speak with probability p using the user-defined mapping
    IN: mapping, dict, user-defined mapping
    OUT: function, (text: str, p: float) -> str, encode words to leet speak with probability p
    """
    # define the encoding function using the mapping
    def encode_partially_words_3(text, p):
        """
        Encode words to leet speak with probability p using the user-defined mapping
        IN: text, str, input text
            p, float, probability of encoding
        OUT: str, partially encoded text
        """
        # specify the encoding logic using the mapping
        pass

    # return the encoding function
    return encode_partially_words_3
```

In [34]:
import random
import json
import re
def encode_partially_words_3(text, p, mapping):
    """
    Encode words to leet speak with probability p using the user-defined mapping
    IN: text, str, input text
        p, float, probability of encoding
        mapping, dict, user-defined mapping
    OUT: str, partially encoded text
    """
    def replace_word(match):
        word = match.group(0)
        if random.random() < p and word.lower() in mapping['words']:
            return random.choice(mapping['words'][word.lower()])
        return word
    
    # Create pattern to match whole words from mapping
    pattern = r'\b(' + '|'.join(map(re.escape, mapping['words'].keys())) + r')\b'
    return re.sub(pattern, replace_word, text, flags=re.IGNORECASE)

In [None]:
if __name__ == "__main__":
    json_file = 'examples3.json'
    result = check_prefix_free_3(json_file)
    print(encode_partially_words_3('code code code just stick to the code', 0.5, result))

##### Task 3.4

Write a Python function `encode_partially_letters_3()`, similar to Task 2.2, that encodes each Latin letter to leet speak with probability `p` using the user-defined mapping.

```python
import json
import re
import random
def encode_partially_letters_3(text, p, mapping):
    """
    Encode each Latin letter to leet speak with probability p using the user-defined mapping
    IN: text, str, input text
        p, float, probability of encoding
        mapping, dict, user-defined mapping
    OUT: str, partially encoded text
    """
    def replace_letter(match):
        letter = match.group(0)
        if random.random() < p:
            upper_letter = letter.upper()
            if upper_letter in mapping['letters']:
                return random.choice(mapping['letters'][upper_letter])
        return letter
    
    # Create pattern to match any letter from mapping
    pattern = f"[{''.join(mapping['letters'].keys())}]"
    # Return the encoded text itself, as stated in the docstring
    return re.sub(pattern, replace_letter, text, flags=re.IGNORECASE)
```

Or, you can create a constructor for `encode_partially_letters_3()` that takes the mapping as an argument:

```python
def encode_partially_letters_3(mapping):
    """
    Constructor for encoding each Latin letter to leet speak with probability p using the user-defined mapping
    IN: mapping, dict, user-defined mapping
    OUT: function, (text: str, p: float) -> str, encode latin letters to leet speak with probability p
    """
    # define the encoding function using the mapping
    def encode_partially_letters_3(text, p):
        """
        Encode each Latin letter to leet speak with probability p using the user-defined mapping
        IN: text, str, input text
            p, float, probability of encoding
        OUT: str, partially encoded text
        """
        # specify the encoding logic using the mapping
        pass

    # return the encoding function
    return encode_partially_letters_3
```

In [36]:
import json
import random
import re
def encode_partially_letters_3(text, p, mapping):
    """
    Encode each Latin letter to leet speak with probability p using the user-defined mapping
    IN: text, str, input text
        p, float, probability of encoding
        mapping, dict, user-defined mapping
    OUT: str, partially encoded text
    """
    def replace_letter(match):
        letter = match.group(0)
        if random.random() < p:
            upper_letter = letter.upper()
            if upper_letter in mapping['letters']:
                return random.choice(mapping['letters'][upper_letter])
        return letter
    
    # Create pattern to match any letter from mapping
    pattern = f"[{''.join(mapping['letters'].keys())}]"
    return re.sub(pattern, replace_letter, text, flags=re.IGNORECASE)

In [None]:
if __name__ == "__main__":
    json_file = 'examples3.json'
    result = check_prefix_free_3(json_file)
    print(encode_partially_letters_3("Hello World", 0.5, result))

##### Task 3.5

Write a Python function `decode_partially_words_3()` to reverse the operation of `encode_partially_words_3()` using the user-defined mapping.

```python
import random
import json
import re
def decode_partially_words_3(text, mapping):
    """
    Decode words from leet speak using the user-defined mapping
    IN: text, str, partially encoded text
        mapping, dict, user-defined mapping
    OUT: str, partially decoded text
    """
    reverse_mapping = {}
    for word, codes in mapping['words'].items():
        for code in codes:
            reverse_mapping[re.escape(code)] = word
    
    # Sort codes by length (longest first) to handle potential overlaps
    sorted_codes = sorted(reverse_mapping.keys(), key=len, reverse=True)
    
    # Create pattern and replace all matching codes with their original words
    pattern = '|'.join(sorted_codes)
    return re.sub(pattern, lambda m: reverse_mapping[re.escape(m.group(0))], text)
```

Or, you can create a constructor for `decode_partially_words_3()` that takes the mapping as an argument:

```python
import random
import json
import re
def decode_partially_words_3(mapping):
    """
    Constructor for decoding words from leet speak using the user-defined mapping
    IN: mapping, dict, user-defined mapping
    OUT: function, (text: str) -> str, decode words from leet speak
    """
    # define the decoding function using the mapping
    # Create reverse mapping for words
    reverse_mapping = {}
    for word, codes in mapping['words'].items():
        for code in codes:
            reverse_mapping[re.escape(code)] = word
    
    # Sort codes by length (longest first) to handle potential overlaps
    sorted_codes = sorted(reverse_mapping.keys(), key=len, reverse=True)
    
    def decoder(text):
        """
        Decode words from leet speak using the user-defined mapping
        IN: text, str, partially encoded text
        OUT: str, partially decoded text
        """
        pattern = '|'.join(sorted_codes)
        return re.sub(pattern, lambda m: reverse_mapping[re.escape(m.group(0))], text)
    
    return decoder
```

In [42]:
import random
import json
import re
def decode_partially_words_3(text, mapping):
    """
    Decode words from leet speak using the user-defined mapping
    IN: text, str, partially encoded text
        mapping, dict, user-defined mapping
    OUT: str, partially decoded text
    """
    reverse_mapping = {}
    for word, codes in mapping['words'].items():
        for code in codes:
            reverse_mapping[re.escape(code)] = word
    
    # Sort codes by length (longest first) to handle potential overlaps
    sorted_codes = sorted(reverse_mapping.keys(), key=len, reverse=True)
    
    # Create pattern and replace all matching codes with their original words
    pattern = '|'.join(sorted_codes)
    return re.sub(pattern, lambda m: reverse_mapping[re.escape(m.group(0))], text)

In [None]:
if __name__ == "__main__":
    json_file = 'examples3.json'
    result = check_prefix_free_3(json_file)
    print(decode_partially_words_3('5yn7x', result))


##### Task 3.6

Write a Python function `decode_partially_letters_3()` to reverse the operation of `encode_partially_letters_3()` using the user-defined mapping.

```python
import re
import json
import random
def decode_partially_letters_3(text, mapping):
    """
    Decode each Latin letter from leet speak using the user-defined mapping
    IN: text, str, partially encoded text
        mapping, dict, user-defined mapping
    OUT: str, partially decoded text
    """
    reverse_mapping = {}
    for letter, codes in mapping['letters'].items():
        for code in codes:
            reverse_mapping[re.escape(code)] = letter
    
    # Sort codes by length (longest first) to handle potential overlaps
    sorted_codes = sorted(reverse_mapping.keys(), key=len, reverse=True)
    
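    # An empty mapping would yield an empty pattern that matches everywhere
    if not sorted_codes:
        return text
    
    # Create pattern and replace all matching codes with their original letters
    pattern = '|'.join(sorted_codes)
    return re.sub(pattern, lambda m: reverse_mapping[re.escape(m.group(0))], text)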
```

Alternatively, you can write `decode_partially_letters_3()` as a constructor that takes the mapping as an argument and returns the decoding function:

```python
def decode_partially_letters_3(mapping):
    """
    Constructor for decoding each Latin letter from leet speak using the user-defined mapping
    IN: mapping, dict, user-defined mapping
    OUT: function, (text: str) -> str, decode Latin letters from leet speak
    """
    # define the decoding function using the mapping
    # Create reverse mapping for letters
    reverse_mapping = {}
    for letter, codes in mapping['letters'].items():
        for code in codes:
            reverse_mapping[re.escape(code)] = letter
    
    # Sort codes by length (longest first) to handle potential overlaps
    sorted_codes = sorted(reverse_mapping.keys(), key=len, reverse=True)
    # Compile the pattern once so every call to decoder() reuses it
    pattern = re.compile('|'.join(sorted_codes))
    
    def decoder(text):
        """
        Decode each Latin letter from leet speak using the user-defined mapping
        IN: text, str, partially encoded text
        OUT: str, partially decoded text
        """
        # An empty mapping would yield an empty pattern that matches everywhere
        if not sorted_codes:
            return text
        return pattern.sub(lambda m: reverse_mapping[re.escape(m.group(0))], text)
    
    return decoder
```
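
As a rough usage sketch, the constructor style lets you build the decoder once and then call it on many strings. The function name `make_letter_decoder` and the tiny `sample_mapping` below are hypothetical stand-ins; the real mapping comes from your own JSON file and will differ.

```python
import re

def make_letter_decoder(mapping):
    """Compact stand-in for the constructor version above (hypothetical name)."""
    reverse = {code: letter
               for letter, codes in mapping["letters"].items()
               for code in codes}
    # Compile once; every call to the returned function reuses the pattern.
    pattern = re.compile("|".join(
        re.escape(code) for code in sorted(reverse, key=len, reverse=True)))
    return lambda text: pattern.sub(lambda m: reverse[m.group(0)], text)

# Hypothetical mapping with the same shape as the user-defined one:
# each letter maps to a list of codes.
sample_mapping = {"letters": {"t": ["7"], "o": ["0"], "e": ["3"]}}

decode = make_letter_decoder(sample_mapping)
print(decode("70"))    # to
print(decode("n073"))  # note
```

The mapping is reversed and the pattern is compiled only when the decoder is built, which is the main practical difference from the two-argument version.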

In [44]:
import re
import json
import random
def decode_partially_letters_3(text, mapping):
    """
    Decode each Latin letter from leet speak using the user-defined mapping
    IN: text, str, partially encoded text
        mapping, dict, user-defined mapping
    OUT: str, partially decoded text
    """
    reverse_mapping = {}
    for letter, codes in mapping['letters'].items():
        for code in codes:
            reverse_mapping[re.escape(code)] = letter
    
    # Sort codes by length (longest first) to handle potential overlaps
    sorted_codes = sorted(reverse_mapping.keys(), key=len, reverse=True)
    
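    # An empty mapping would yield an empty pattern that matches everywhere
    if not sorted_codes:
        return text
    
    # Create pattern and replace all matching codes with their original letters
    pattern = '|'.join(sorted_codes)
    return re.sub(pattern, lambda m: reverse_mapping[re.escape(m.group(0))], text)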

In [None]:
if __name__ == "__main__":
    json_file = 'examples3.json'
    result = check_prefix_free_3(json_file)
    print(decode_partially_letters_3('70', result))

##### Task 3.7

Write a Python function `remove_emphasis_3()` that takes a string and removes the suffix "-zorz" from words.

```python
import re
def remove_emphasis_3(text):
    """
    Remove the suffix '-zorz' from words
    IN: text, str, input text
    OUT: str, text with emphasized suffix `-zorz` removed
    """
    return re.sub(r'-zorz\b', '', text)
```


In [3]:
import re
def remove_emphasis_3(text):
    """
    Remove the suffix '-zorz' from words
    IN: text, str, input text
    OUT: str, text with emphasized suffix `-zorz` removed
    """
    return re.sub(r'-zorz\b', '', text)

In [None]:
if __name__ == "__main__":
    print(remove_emphasis_3('hello-zorz'))
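
To see what the `\b` in the pattern contributes, the sketch below runs the same one-line substitution on a few hand-picked strings. The function `remove_emphasis` is a copy of the solution above under a shorter, hypothetical name, and the sample strings are made up for illustration.

```python
import re

def remove_emphasis(text):
    """Same substitution as remove_emphasis_3() above."""
    return re.sub(r"-zorz\b", "", text)

samples = ["hello-zorz", "pwn-zorz the n00b-zorz!", "-zorzed"]
for sample in samples:
    print(repr(sample), "->", repr(remove_emphasis(sample)))
# 'hello-zorz' -> 'hello'
# 'pwn-zorz the n00b-zorz!' -> 'pwn the n00b!'
# '-zorzed' -> '-zorzed'
```

The last sample is left unchanged because `\b` requires a word boundary right after `-zorz`, and the following `e` is still part of the same word.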
