
## Q4 Encrypting DNA Codes

### **Question Description**

In this problem, you are tasked with designing a function that **encrypts base pairs** into single, lowercase letters of the English alphabet. The DNA alphabet consists of both standard and non-standard bases.

- Standard Bases:
  - Adenine (**A**)
  - Thymine (**T**)
  - Guanine (**G**)
  - Cytosine (**C**)

- Non-Standard Bases:
  - 5-hydroxycytosine (**H**)
  - Inosine (**I**)
  - 5-methylcytosine (**M**)
  - 2'-O-methylated base (**O**)
  - Pseudouridine (**P**)
  - Queuosine (**Q**)

Your goal is to write a function **`encrypt_bases(input_bases)`** that takes a string of bases and returns an encrypted string in which each base or base pair is replaced by the corresponding lowercase letter of the English alphabet, according to the mapping rules below.

### **Input Format**
- A sequence (**str**) of bases (e.g., `'AACGT'`, `'MIOP'`, `'ATGC'`, `'H'`)

### **Output Format**
- An encrypted sequence (**str**) where each base or base pair is replaced by the corresponding letter, based on the alphabetical mapping rules as follows:
  - **Pairs of standard bases**: All possible pairs of standard bases (e.g., `'AA'`, `'AT'`, `'AG'`, etc.) should be mapped to letters **`'a'`** to **`'p'`**, sorted alphabetically, i.e., `'AA'` → `'a'`, `'AC'` → `'b'`, ..., `'TG'` → `'o'`, `'TT'` → `'p'`
  - **Single standard bases**: Standard bases that **cannot appear as part of a pair** (for example, because the next base is non-standard or the sequence has ended) should be mapped to letters **`'q'`** to **`'t'`**, sorted alphabetically, i.e., `'A'` → `'q'`, `'C'` → `'r'`, `'G'` → `'s'`, and `'T'` → `'t'`
  - **Non-standard bases**: There are six non-standard bases that should be mapped to letters **`'u'`** to **`'z'`**, sorted alphabetically, i.e., `'H'` → `'u'`, `'I'` → `'v'`, ..., `'Q'` → `'z'`
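
The three rules together cover all 26 lowercase letters (16 + 4 + 6). As a minimal sketch, the whole table can be generated rather than typed by hand. The names `STANDARD`, `NON_STANDARD`, and `build_mapping` are illustrative and not part of the problem statement.

```python
from itertools import product
from string import ascii_lowercase

STANDARD = "ACGT"        # already in alphabetical order
NON_STANDARD = "HIMOPQ"  # already in alphabetical order

def build_mapping():
    """Return one dict covering pairs, single standard bases, and non-standard bases."""
    # product() over a sorted alphabet yields the 16 pairs in alphabetical order.
    pairs = ["".join(p) for p in product(STANDARD, repeat=2)]
    keys = pairs + list(STANDARD) + list(NON_STANDARD)  # 16 + 4 + 6 = 26 keys
    return dict(zip(keys, ascii_lowercase))

mapping = build_mapping()
print(mapping["AA"], mapping["AC"], mapping["TG"], mapping["TT"])  # a b o p
print(mapping["A"], mapping["T"], mapping["H"], mapping["Q"])      # q t u z
```

A single dictionary is enough here because pair keys have length 2 and single-base keys have length 1, so the two groups cannot collide.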

### **Constraints**
- `len(input_bases)` > 0
- `input_bases` must contain only the bases listed above
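
The exercise guarantees that these constraints hold, so a solution does not need to check them. If you want to guard against bad input anyway, one possible sketch is shown here. The helper name `validate_bases` and the choice to raise `ValueError` are assumptions, not requirements of the problem.

```python
VALID_BASES = set("ACGT") | set("HIMOPQ")

def validate_bases(input_bases):
    """Return the input unchanged, or raise ValueError if a constraint is violated."""
    if not input_bases:
        raise ValueError("input_bases must not be empty")
    invalid = sorted(set(input_bases) - VALID_BASES)
    if invalid:
        raise ValueError(f"unsupported bases: {invalid}")
    return input_bases

print(validate_bases("ATGCHIMOPQA"))  # ATGCHIMOPQA
```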

### **Example Cases**

**Example Case 1**
```
Input
ATGC

Output
dj
```

**Example Case 2**
```
Input
ATGCHIMOPQA

Output
djuvwxyzq
```
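
To see how an input is split before the letters are looked up, the sketch below tokenizes a sequence from left to right, preferring a pair of standard bases whenever one is available. The regular-expression approach and the name `tokenize` are illustrative assumptions. The pairing order is the one implied by the example cases.

```python
import re

# Alternation order matters: try a pair of standard bases first,
# then fall back to a single standard or non-standard base.
TOKEN = re.compile(r"[ACGT]{2}|[ACGT]|[HIMOPQ]")

def tokenize(input_bases):
    """Split a sequence into the pairs and single bases that get encrypted."""
    return TOKEN.findall(input_bases)

print(tokenize("ATGC"))         # ['AT', 'GC']
print(tokenize("ATGCHIMOPQA"))  # ['AT', 'GC', 'H', 'I', 'M', 'O', 'P', 'Q', 'A']
print(tokenize("AHA"))          # ['A', 'H', 'A']
print(tokenize("AAA"))          # ['AA', 'A']
```

In Example Case 2 the final `'A'` stays single because the sequence ends, and in `'AHA'` neither `'A'` can pair because its only unconsumed neighbor is non-standard.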

### **Code Stub**
```python
def encrypt_bases(input_bases):
    # Your code here

# Input and output processing (do not edit)
print(encrypt_bases(input()))
```

Here is a step-by-step algorithm for the `encrypt_bases` function.

### Algorithm

1.  **Define Mappings:**
    * Create a dictionary or a similar data structure for **pairs of standard bases**:
        * The standard bases are 'A', 'C', 'G', 'T'.
        * Generate all 16 possible pairs by combining these bases. For example, `'AA'`, `'AC'`, `'AG'`, etc.
        * Sort these pairs alphabetically.
        * Map the sorted pairs to the first 16 lowercase letters of the alphabet, from 'a' to 'p'. For example, `{'AA': 'a', 'AC': 'b', ..., 'TT': 'p'}`.
    * Create a separate dictionary for **single standard bases**:
        * The single standard bases are 'A', 'C', 'G', 'T'.
        * Sort these bases alphabetically.
        * Map them to the letters 'q' to 't'. For example, `{'A': 'q', 'C': 'r', 'G': 's', 'T': 't'}`.
    * Create another dictionary for **non-standard bases**:
        * The non-standard bases are 'H', 'I', 'M', 'O', 'P', 'Q'.
        * Sort these bases alphabetically.
        * Map them to the letters 'u' to 'z'. For example, `{'H': 'u', 'I': 'v', ..., 'Q': 'z'}`.

2.  **Initialize Variables:**
    * Initialize an empty string, `encrypted_string`, to store the result.
    * Initialize an index variable, `i`, to 0 to iterate through the input string.

3.  **Iterate and Encrypt:**
    * Use a `while` loop that continues as long as `i` is less than the length of the `input_bases` string.
    * Inside the loop, if at least two characters remain (`i + 1` is less than the length of `input_bases`), create a temporary string `pair` containing the characters at index `i` and `i + 1`.
    * Check if this `pair` exists as a key in the **pairs of standard bases** mapping.
    * If it exists:
        * Append the corresponding encrypted letter to `encrypted_string`.
        * Increment `i` by 2 (to move to the next potential pair).
    * If no pair is available, or the pair does **not** exist in the mapping:
        * Look up the single character at index `i` in the **single standard bases** mapping.
        * If it is found:
            * Append the corresponding encrypted letter to `encrypted_string`.
            * Increment `i` by 1.
        * Otherwise, look up the character in the **non-standard bases** mapping.
        * If it is found there:
            * Append the corresponding encrypted letter to `encrypted_string`.
            * Increment `i` by 1.

4.  **Return Result:**
    * After the loop finishes, return the `encrypted_string`.

This algorithm prioritizes two-character pairs over single characters and handles all three types of base-to-letter mapping in a single left-to-right pass over the input.

```python
def encrypt_bases(input_bases):
    standard_bases = ['A', 'C', 'G', 'T']
    non_standard_bases = ['H', 'I', 'M', 'O', 'P', 'Q']

    # Map the 16 pairs of standard bases to 'a'..'p'. standard_bases is
    # already sorted, so the nested loops yield pairs in alphabetical order.
    pair_mapping = {}
    letter_index = 0
    for base1 in standard_bases:
        for base2 in standard_bases:
            pair_mapping[base1 + base2] = chr(ord('a') + letter_index)
            letter_index += 1

    # Map single standard bases to 'q'..'t'.
    single_standard_mapping = {
        base: chr(ord('q') + offset)
        for offset, base in enumerate(standard_bases)
    }

    # Map non-standard bases to 'u'..'z'.
    non_standard_mapping = {
        base: chr(ord('u') + offset)
        for offset, base in enumerate(non_standard_bases)
    }

    encrypted_string = ""
    i = 0
    while i < len(input_bases):
        # Prefer a pair of standard bases when two characters remain.
        if i + 1 < len(input_bases):
            pair = input_bases[i:i + 2]
            if pair in pair_mapping:
                encrypted_string += pair_mapping[pair]
                i += 2
                continue

        # Otherwise encode the current character on its own.
        base = input_bases[i]
        if base in single_standard_mapping:
            encrypted_string += single_standard_mapping[base]
        elif base in non_standard_mapping:
            encrypted_string += non_standard_mapping[base]
        # Any other character violates the constraints and is skipped;
        # this branch is never taken for valid input.
        i += 1

    return encrypted_string

# Input and output processing (do not edit)
print(encrypt_bases(input()))
```