# Assignment 4
## By: **Nils Dunlop, e-mail: gusdunlni@student.gu.se**

## Problem 1

In [1]:
def encode(input_string):
    # An empty input yields nothing: the generator simply ends
    if not input_string:
        return
    current_char, count = input_string[0], 1
    for char in input_string[1:]:
        if char == current_char:
            count += 1
        else:
            # Yield the encoded substring
            yield f"{current_char}{count}"
            # Reset for the next character and count
            current_char, count = char, 1
    # Yield the encoded substring for the last character group
    yield f"{current_char}{count}"

def decode(encoded_string):
    decoded_string = []
    character, number = '', ''
    for char in encoded_string:
        # If the character is a digit, append it to the number string
        if char.isdigit():
            number += char
        else:
            # If character and number strings are not empty, append the decoded substring
            if character and number:
                decoded_string.append(character * int(number))
            # Reset the character and number variables for next iteration
            character, number = char, ''
    # Append the decoded substring for the last character group
    if character and number:
        decoded_string.append(character * int(number))
    # Join and return the list of decoded substrings to get the final decoded string
    return ''.join(decoded_string)

In [2]:
# Example of usage 1:
s = "It's __sooooo__ wowww!!!"
encoded = ''.join(encode(s))
print(encoded)

decoded = decode(encoded)  # decode() already returns a string
print(decoded)

I1t1'1s1 1_2s1o5_2 1w1o1w3!3
It's __sooooo__ wowww!!!


In [3]:
# Example of usage 2
s = "‘‘‘‘‘´´´´´´*******^^^^^"
encoded = ''.join(encode(s))
print(encoded)
decoded = decode(encoded)  # decode() already returns a string
print(decoded)

‘5´6*7^5
‘‘‘‘‘´´´´´´*******^^^^^
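
The two examples above round-trip cleanly, but the `<character><count>` format cannot represent input that itself contains digits. The following is a minimal sketch of the same scheme, assuming compact stand-ins built on `itertools.groupby` and `re.findall`; the names `encode_runs` and `decode_runs` are hypothetical and are not the functions defined in the cells above.

```python
from itertools import groupby
import re

def encode_runs(text):
    # Each run of identical characters becomes "<character><run length>"
    return ''.join(f"{ch}{len(list(group))}" for ch, group in groupby(text))

def decode_runs(encoded):
    # Read back (non-digit character, count) pairs and expand each one
    pairs = re.findall(r"(\D)(\d+)", encoded)
    return ''.join(ch * int(count) for ch, count in pairs)

s = "It's __sooooo__ wowww!!!"
print(decode_runs(encode_runs(s)) == s)          # True: no digits in the input

t = "aa1"
print(encode_runs(t))                            # a211
print(decode_runs(encode_runs(t)) == t)          # False: "a211" is read as 'a' * 211
```

A digit in the input is indistinguishable from part of a count, so a digit-safe variant would need a separator or an escape character.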


## Problem 2

Analysing the complexity of the algorithms below

In [4]:
def merge(left, right):
    if len(left) == 0:
        return right
    elif len(right) == 0:
        return left

    result = []
    i = j = 0

    while len(result) < len(left) + len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
            if i == len(left):
                result += right[j:]
                break
        else:
            result.append(right[j])
            j += 1
            if j == len(right):
                result += left[i:]
                break

    return result

def sort1(L, start=0, end=None):
    if end is None:
        end = len(L) - 1

    for i in range(start + 1, end + 1):
        key = L[i]
        j = i - 1
        while j >= start and L[j] > key:
            L[j + 1] = L[j]
            j -= 1
        L[j + 1] = key
    return L

def sort2(L):
    n = len(L)
    slice_size = 32

    for i in range(0, n, slice_size):
        sort1(L, i, min((i + slice_size - 1), n - 1))

    size = slice_size
    while size < n:
        for start in range(0, n, size * 2):
            middle_point = start + size - 1
            end = min((start + size * 2 - 1), (n - 1))
            merged_list = merge(
                left=L[start: middle_point + 1],
                right=L[middle_point + 1: end + 1])
            L[start: start + len(merged_list)] = merged_list
        size *= 2
    return L

### A: What is the connection between the two algorithms?

**Algorithm descriptions:**

* `sort1()` is a simple implementation of Insertion Sort.
* `sort2()` is a simplified implementation of Timsort, the algorithm behind Python's built-in `list.sort()` and `sorted()`, which combines Insertion Sort and Merge Sort.

The connection is that `sort2()` uses `sort1()` as a building block. In its first phase, `sort2()` calls `sort1()` on consecutive slices of 32 elements, exploiting Insertion Sort's efficiency on small inputs. (Real Timsort detects naturally ordered 'runs'; this version uses fixed-size slices instead.) In its second phase, it merges the sorted slices pairwise with `merge()`, the merging step of Merge Sort, doubling the slice size on each pass until the whole list is sorted. Combining the two avoids the quadratic cost of running Insertion Sort alone on a large, unsorted list.
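
The two-phase structure can be sketched in a few lines. This is an illustrative sketch only, not the assignment's code: `insertion_sort` and `hybrid_sort` are hypothetical names, the run size of 4 is chosen just to keep the example small, and the standard library's `heapq.merge` stands in for the `merge()` function above.

```python
from heapq import merge as merge_sorted  # merges already-sorted iterables

def insertion_sort(chunk):
    """Return a sorted copy of a small list using insertion sort."""
    out = []
    for value in chunk:
        pos = len(out)
        # Walk left past every element larger than the new value
        while pos > 0 and out[pos - 1] > value:
            pos -= 1
        out.insert(pos, value)
    return out

def hybrid_sort(values, run_size=4):
    # Phase 1: sort fixed-size runs with insertion sort
    runs = [insertion_sort(values[i:i + run_size])
            for i in range(0, len(values), run_size)]
    # Phase 2: merge neighbouring runs pairwise until one run remains
    while len(runs) > 1:
        runs = [list(merge_sorted(*runs[k:k + 2]))
                for k in range(0, len(runs), 2)]
    return runs[0] if runs else []

print(hybrid_sort([5, 2, 9, 1, 7, 3, 8, 6, 4, 0]))
```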

### B: What is the complexity of `sort1` in the best case, in the worst case, and on average? Give details on your analysis to get to the answer.

`sort1()`, which implements Insertion Sort, has a time complexity that depends on how the input is ordered. Here is a breakdown:

1. **Best Case Complexity**: O(n)

  The best case occurs when the input list is already sorted. In this case, each element is compared only once, with the element immediately before it. Since there are n elements, the best case time complexity is O(n).
  
  **Best Case Scenario:**

  In each iteration of the main loop, the condition of the inner while loop is checked once and fails immediately, so its body never executes. The outer loop runs n - 1 times, resulting in a linear time complexity.

2. **Worst Case Complexity**: O(n^2)

  The worst case occurs when the input list is in reverse order. Each element must be compared with, and moved past, every element before it, leading to a time complexity of O(n^2).

  **Worst Case Scenario:**

  In the worst case, the element at index i requires i comparisons and i shifts, resulting in 1 + 2 + 3 + ... + (n-1) = n(n-1)/2 operations, which is O(n^2).

3. **Average Case Complexity**: O(n^2)
  
  On average, the time complexity of Insertion Sort is also O(n^2), since each element is expected to be compared with about half of the elements before it.

  **Average Case Scenario:**

  For random input, about half of the i elements before index i are larger than the element at index i, so inserting it takes roughly i/2 comparisons and shifts. Summing over all elements gives about n(n-1)/4, or roughly n^2/4, operations. This is half the worst case but still an average time complexity of O(n^2).

#### **Conclusion**

`sort1()`, implementing Insertion Sort, works well on small or nearly sorted inputs, where it approaches linear time, but its quadratic worst-case and average-case complexity makes it inefficient on larger inputs.
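
The best-case and worst-case counts can be checked empirically. Below is a minimal sketch, assuming an instrumented copy of the insertion sort loop; the name `insertion_sort_comparisons` is hypothetical and the function is not part of the assignment code. It sorts a copy of the input and returns how many key comparisons were made.

```python
def insertion_sort_comparisons(values):
    """Insertion-sort a copy of `values` and return the number of comparisons."""
    data = list(values)
    comparisons = 0
    for i in range(1, len(data)):
        key = data[i]
        j = i - 1
        while j >= 0:
            comparisons += 1          # one comparison of data[j] against key
            if data[j] <= key:
                break                 # key is already in position
            data[j + 1] = data[j]     # shift the larger element right
            j -= 1
        data[j + 1] = key
    return comparisons

n = 100
print(insertion_sort_comparisons(range(n)))         # 99   = n - 1 (already sorted)
print(insertion_sort_comparisons(range(n, 0, -1)))  # 4950 = n(n-1)/2 (reversed)
```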

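### C: What is the complexity of `sort2` in the best case, in the worst case, and on average? Give details on your analysis to get to the answer.

`sort2()`, the simplified Timsort, has the same time complexity regardless of how the input is ordered. It works in two phases. First, it runs `sort1()` on about n/32 slices of at most 32 elements each; since the slice size is a constant, each slice costs constant time and the whole phase is O(n). Second, it merges slices pairwise while doubling the slice size from 32 until it reaches n, which takes about log2(n/32) passes; each pass slices and merges all n elements, costing O(n). The total is therefore O(n log n). Here is a breakdown of the cases:

1. **Best Case Complexity**: O(n log n)

  The best case is observed when the input list is already sorted. Even then, this implementation performs every merge pass, and each pass copies all n elements through slicing, so the time complexity stays O(n log n). This differs from CPython's actual Timsort, which detects existing runs and sorts already-ordered input in O(n).

2. **Worst Case Complexity**: O(n log n)

  The worst case arises when the input list is in reverse order or is random. The number of merge passes and the linear cost of each pass do not depend on the input order, so the time complexity remains O(n log n).

3. **Average Case Complexity**: O(n log n)
  
  For random input lists, the same argument applies: about log2(n/32) merge passes of O(n) each, giving a time complexity of O(n log n).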

#### **Conclusion**

`sort2()`, implementing a simplified Timsort, has a time complexity of O(n log n) in the best, worst, and average cases, because its hybrid design limits Insertion Sort to constant-size slices and leaves the rest of the work to a logarithmic number of linear-time merge passes.
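
The logarithmic factor comes from the doubling loop in `sort2()`. As a small sketch, the hypothetical helper `merge_passes` below (not part of the assignment code) replays only that loop's control flow and counts how many merge passes a list of length `n` would need.

```python
def merge_passes(n, slice_size=32):
    """Count the iterations of the `while size < n` loop used by sort2()."""
    passes = 0
    size = slice_size
    while size < n:
        passes += 1
        size *= 2      # each pass doubles the length of the sorted slices
    return passes

for n in (32, 33, 1024, 1_000_000):
    print(n, merge_passes(n))
# 32 0
# 33 1
# 1024 5
# 1000000 15
```

Each pass touches all n elements, so the total work grows as n times the number of passes, which is the O(n log n) bound.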

### D: Compare the two algorithms and summarise the advantages and disadvantages of using each of the algorithms.

**`sort1()` - Insertion Sort:**

**Advantages:**
* **Simplicity**: Easy to understand and implement.
* **In-Place Sorting**: Requires a constant amount of extra memory space, O(1), making it memory efficient.
* **Efficient on Small Lists**: Performs well on small or partially sorted lists thanks to its linear best-case time complexity.

**Disadvantages:**
* **Scalability**: Inefficient on large lists, with a worst-case time complexity of O(n^2).
* **Time Complexity**: Its quadratic average time complexity means it is generally outperformed by more advanced algorithms such as Merge Sort and Quick Sort on random lists.

**`sort2()` - Timsort:**

**Advantages:**
* **Efficient Merging**: Handles large datasets by merging sorted subarrays, maintaining a time complexity of O(n log n) in all cases.
* **Hybrid Design**: Uses Insertion Sort where it is strongest, on small subarrays, and Merge Sort's merging step to combine them.

**Disadvantages:**
* **Complexity**: Harder to understand and implement than simpler sorting algorithms like Insertion Sort.
* **Overhead**: Slicing and merging allocate temporary lists, so it needs O(n) extra memory, and this overhead can make it slower than simpler algorithms on small arrays.

### **Conclusion:**
`sort1()` is a good choice for small or partially sorted datasets due to its simplicity and in-place sorting, while `sort2()` is better suited to large and diverse datasets, where its O(n log n) time complexity and stability pay off.