# Course 22 - Improvisation

This course will be a mix of topics and TDs (tutorial exercises).

## Reminder on Recursion
Give me the two ideas behind a recursive function.

### TD - Binary search

Binary search is an algorithm for finding a specific `target` value in a sorted list or array. It works by repeatedly dividing the search space in half:
* Start with two boundaries, `low` and `high`, delimiting the range of indices still to search.
* Calculate the `middle` index; use the `//` operator for integer division.
* Compare the middle element with the target value:
  * If the `middle` element equals the target, the search is successful and its index is returned.
  * If the `middle` element is greater than the target, set `high = middle - 1` to search the left half.
  * If the `middle` element is less than the target, set `low = middle + 1` to search the right half.
* Repeat until the target is found, or until `low > high`, which means the target is not in the list.

> Note: Binary search has a time complexity of O(log n), making it much more efficient than linear search for large datasets.

```python
def binary_search(arr, target, low, high):
    """Recursively search for target in the sorted list arr between indices low and high."""
    if low > high:
        return -1  # Target not found
    middle = (low + high) // 2
    if arr[middle] == target:
        return middle  # Target found at index middle
    elif arr[middle] > target:
        return binary_search(arr, target, low, middle - 1)  # Search left half
    else:
        return binary_search(arr, target, middle + 1, high)  # Search right half

sorted_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
target_value = 7
result = binary_search(sorted_list, target_value, 0, len(sorted_list) - 1)
print(f"Index of {target_value} in the list: {result}")
```
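The "repeat until" step can also be written as a `while` loop instead of recursion, which avoids Python's recursion limit and lets the function compute `low` and `high` itself. This is a minimal sketch; the name `binary_search_iterative` is our own, not part of the exercise.

```python
def binary_search_iterative(arr, target):
    """Return the index of target in the sorted list arr, or -1 if absent."""
    low, high = 0, len(arr) - 1
    while low <= high:                 # the search space is not empty yet
        middle = (low + high) // 2     # integer division
        if arr[middle] == target:
            return middle              # target found
        elif arr[middle] > target:
            high = middle - 1          # keep the left half
        else:
            low = middle + 1           # keep the right half
    return -1                          # low > high: target is not in arr


sorted_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(binary_search_iterative(sorted_list, 7))   # 6
print(binary_search_iterative(sorted_list, 11))  # -1
```

Python's standard library also provides the `bisect` module (for example `bisect.bisect_left`), which performs the same halving on sorted lists.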

# Introduction to Bits, Bytes, and Hexadecimal

## Bits
Bits are the basic units of digital information: binary digits, each either 0 or 1.

These two possible states map onto the electrical states off and on in electronic systems.

Computers process and store data in the form of bits. Memory addresses, CPU instructions, and data are all ultimately represented using bits.
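To see the bits behind a number, the built-in `format()` with the `"b"` specifier prints its binary digits, and `int.bit_length()` counts how many bits are needed to store it. The value 13 is an arbitrary illustration.

```python
number = 13

bits = format(number, "b")       # binary digits without the "0b" prefix
print(bits)                      # 1101
print(number.bit_length())       # 4 bits are needed to store 13

# Each bit is a power of two: 13 = 8 + 4 + 0 + 1
total = sum(int(bit) * 2 ** position for position, bit in enumerate(reversed(bits)))
print(total)                     # 13

# n bits can represent 2 ** n different values
print(2 ** 4)                    # 16 values (0 to 15) with 4 bits
```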

## Bytes
Bytes are groups of 8 bits and form the fundamental unit of storage in computers.

This grouping allows for a more convenient representation of data. Many computer architectures and programming languages are designed to handle data in byte-sized chunks.

Bytes are commonly used to represent characters in character encoding systems like ASCII and UTF-8.
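In Python, `str.encode()` turns text into a `bytes` object; iterating over it yields one integer per byte, always between 0 and 255, since 8 bits give 2 ** 8 = 256 possible values. A short sketch with an arbitrary ASCII string:

```python
text = "Hi!"
data = text.encode("ascii")      # one byte per ASCII character
print(data)                      # b'Hi!'
print(list(data))                # [72, 105, 33]
print(len(data))                 # 3 bytes

# A byte holds 8 bits, so each value is in range(256)
print(2 ** 8)                    # 256
print(bytes([72, 105, 33]).decode("ascii"))  # Hi!
```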

## Hexadecimal Notation
Hexadecimal is a base-16 system (digits 0-9 and A-F) used to represent binary data more compactly: each hex digit stands for exactly 4 bits.

Hexadecimal is often used to represent memory addresses. It is also used to represent colors in web development and graphic design.
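Web colors illustrate the byte/hex correspondence: `#RRGGBB` packs three bytes (red, green, blue), two hex digits each. The helper `hex_color_to_rgb` below is a hypothetical illustration, not a standard function.

```python
def hex_color_to_rgb(color):
    """Convert a '#RRGGBB' string into a (red, green, blue) tuple of ints."""
    color = color.lstrip("#")
    # Each pair of hex digits is one byte, parsed in base 16
    return tuple(int(color[i:i + 2], 16) for i in range(0, 6, 2))


print(hex_color_to_rgb("#FF8800"))   # (255, 136, 0)
print(int("F", 16), int("FF", 16))   # 15 255: one hex digit = 4 bits, two = 8 bits
```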

> Note: Python has built-in functions to convert between these representations, such as `bin()`, `hex()`, and `int()` with a base argument.

```python
decimal_number = 1024

print(f"Binary: {bin(decimal_number)}")
print(f"Decimal: {decimal_number}")
print(f"Hexadecimal: {hex(decimal_number)}")
```
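Going the other way, `int()` accepts a base argument, and `format()` produces the same digits without the `0b`/`0x` prefixes. A small round-trip check on the same value:

```python
decimal_number = 1024

binary_text = bin(decimal_number)      # '0b10000000000'
hex_text = hex(decimal_number)         # '0x400'

# int() parses a string in a given base; the 0b/0x prefix is allowed when it matches the base
print(int(binary_text, 2))             # 1024
print(int(hex_text, 16))               # 1024

# format() gives the digits without prefix
print(format(decimal_number, "b"))     # 10000000000
print(format(decimal_number, "x"))     # 400
```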

# ASCII (American Standard Code for Information Interchange)
ASCII was developed in the early 1960s to establish a standardized way to represent characters and control characters in computers, ensuring compatibility and interchangeability of data between different systems.

In the early days of computing, each computer manufacturer had its own character set, leading to compatibility issues when sharing data between systems.

ASCII originally defined a **7-bit** character set, with 128 possible characters. These characters include uppercase and lowercase letters, numerals, punctuation marks, and control characters.

Even though more advanced character encodings like UTF-8 are widely used today, ASCII remains relevant in many contexts, especially in basic text processing and communication protocols.
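Since ASCII is a 7-bit code, every ASCII character has a value below 2 ** 7 = 128. In Python this can be checked with `str.isascii()` (available since Python 3.7), and encoding non-ASCII text with the `"ascii"` codec raises an error. The sample strings are arbitrary.

```python
samples = ["Hello", "café"]

for text in samples:
    codes = [ord(c) for c in text]
    # isascii() and the explicit "< 128" test agree
    print(text, codes, text.isascii(), all(code < 128 for code in codes))

try:
    "café".encode("ascii")
except UnicodeEncodeError as error:
    print("Not ASCII:", error.reason)
```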

## Exercises
### 1. Alphabet

Using the functions above, write a function that displays the letters of the ASCII table in the following format:
```
Character: A 	 Decimal: 65 	 Binary: 0b1000001 	 Hex: 0x41
```
>Hint: the [`chr()` function](https://docs.python.org/3/library/functions.html#chr) converts an integer code into its character; `ord()` does the opposite.
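A quick sketch of the two built-ins from the hint, which are inverses of each other:

```python
print(chr(65))             # A
print(ord("A"))            # 65
print(chr(ord("a") + 1))   # b: arithmetic works on the integer codes
print(hex(ord("A")))       # 0x41
```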

### 2. Full ASCII table
Do the same starting from 0. What do you notice?


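```python
def display_ascii_table(start=65, end=91):
    """Print each character in [start, end) with its decimal, binary and hex values."""
    for i in range(start, end):
        print(f"Character: {chr(i)} \t Decimal: {i} \t Binary: {bin(i)} \t Hex: {hex(i)}")

display_ascii_table()         # Exercise 1: uppercase letters A to Z
display_ascii_table(0, 128)   # Exercise 2: the full 7-bit ASCII table (0 to 127)
```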


# UTF-8 - Unicode Transformation Format - 8-bit

Unicode is a standardized character set that assigns a unique code point (an integer value) to every character, symbol, and emoji from every script and language in the world.

The Unicode Consortium is the organization responsible for deciding and assigning the codes for Unicode.

UTF-8 was designed by [Ken Thompson](https://en.wikipedia.org/wiki/Ken_Thompson) and [Rob Pike](https://en.wikipedia.org/wiki/Rob_Pike) at Bell Labs in 1992. The specification was later published by the Internet Engineering Task Force (IETF) as RFC 3629 in 2003.

Thompson and Pike later worked together at Google, where they co-created the [Go language](https://go.dev/).

UTF-8 was created to address the limitations of existing character encodings, such as ASCII. It aimed to provide a flexible and efficient way to represent the vast array of characters from different writing systems and to maintain compatibility with ASCII.



Unicode is a character set, and UTF-8 is one of the encoding schemes used to represent Unicode characters in binary form.
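The distinction shows up directly in Python: `ord()` returns a character's Unicode code point, while `str.encode("utf-8")` returns the bytes UTF-8 uses to store it. Using `"é"` as an example:

```python
char = "é"

code_point = ord(char)            # the Unicode code point (character set)
encoded = char.encode("utf-8")    # the UTF-8 bytes (encoding scheme)

print(f"U+{code_point:04X}")      # U+00E9
print(encoded)                    # b'\xc3\xa9'
print(encoded.decode("utf-8"))    # é
```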


## Key features
* Variable-Length Encoding: UTF-8 uses a variable-length encoding scheme, where different characters may be represented using 1 to 4 bytes. Characters with lower code points, including all ASCII characters, use fewer bytes, which is what keeps UTF-8 backward compatible with ASCII.

* Compatibility with ASCII: UTF-8 is designed to be fully compatible with ASCII. The first 128 characters (0 to 127) in UTF-8 are identical to ASCII, making it easy to represent and process ASCII text using UTF-8.

* Unicode Support: UTF-8 can represent all Unicode characters, making it a comprehensive and widely adopted character encoding standard.
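Both the variable length and the ASCII compatibility can be observed by encoding a few characters. The sample characters are arbitrary picks from different code point ranges:

```python
for char in ["A", "é", "€", "😄"]:
    encoded = char.encode("utf-8")
    # code point, number of bytes used, and the bytes in hex
    print(f"{char}: U+{ord(char):04X} -> {len(encoded)} byte(s): {encoded.hex()}")

# ASCII text produces exactly the same bytes in ASCII and in UTF-8
print("Hello".encode("ascii") == "Hello".encode("utf-8"))  # True
```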

## Encoding

UTF-8 encodes code points in one to four bytes, depending on the value of the code point. In the following table, the x characters are replaced by the bits of the code point:


| First code point | Last code point    | Byte 1    | Byte 2    | Byte 3    | Byte 4    |
|------------------|--------------------|-----------|-----------|-----------|-----------|
| U+0000           | U+007F             | 0xxxxxxx  |           |           |           |
| U+0080           | U+07FF             | 110xxxxx  | 10xxxxxx  |           |           |
| U+0800           | U+FFFF             | 1110xxxx  | 10xxxxxx  | 10xxxxxx  |           |
| U+10000          | U+10FFFF           | 11110xxx  | 10xxxxxx  | 10xxxxxx  | 10xxxxxx  |
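The table can be turned into code almost row by row, using shifts (`>>`) to pick bit groups and masks (`& 0b111111`) to keep the 6 payload bits of each continuation byte. The function `encode_utf8` is a hypothetical teaching sketch: unlike real encoders, it does not reject the surrogate range U+D800 to U+DFFF. Its output is compared with Python's built-in `str.encode`.

```python
def encode_utf8(code_point):
    """Encode one code point into UTF-8 bytes, following the table row by row."""
    if code_point <= 0x7F:                       # 0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:                      # 110xxxxx 10xxxxxx
        return bytes([0b11000000 | (code_point >> 6),
                      0b10000000 | (code_point & 0b111111)])
    if code_point <= 0xFFFF:                     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0b11100000 | (code_point >> 12),
                      0b10000000 | ((code_point >> 6) & 0b111111),
                      0b10000000 | (code_point & 0b111111)])
    if code_point <= 0x10FFFF:                   # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0b11110000 | (code_point >> 18),
                      0b10000000 | ((code_point >> 12) & 0b111111),
                      0b10000000 | ((code_point >> 6) & 0b111111),
                      0b10000000 | (code_point & 0b111111)])
    raise ValueError(f"U+{code_point:X} is outside the Unicode range")


for char in ["A", "é", "€", "😄"]:
    manual = encode_utf8(ord(char))
    # hex bytes, and whether they match Python's own encoder
    print(char, manual.hex(), manual == char.encode("utf-8"))
```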


## Exercises
### 1. Possible characters
Looking at the table above, what is the theoretical number of possible character values in UTF-8?

> Hint: How many bits can change ?

### 2. From `U+` notation to printed character

Given `unicode_value = "U+1F604"`, write a function that extracts the hexadecimal value and prints the corresponding character.

> Hint: Python has no hexadecimal type to cast a string into; parse the string into an `int` using the corresponding base.

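```python
unicode_value = "U+1F604"

# Option 1: split on "+" and parse the hexadecimal part in base 16
print(chr(int(unicode_value.split("+")[1], 16)))

# Option 2: slice off the "U+" prefix
print(chr(int(unicode_value[2:], 16)))
```

Output:
```
😄
😄
```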
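The exercise asks for a function, so wrapping the one-liner gives a reusable helper. The names `unicode_to_char` and `char_to_unicode` are our own choices for this sketch; the reverse direction uses the `X` format specifier to write the code point in uppercase hexadecimal.

```python
def unicode_to_char(unicode_value):
    """Turn a 'U+XXXX' string into the character it designates."""
    hex_digits = unicode_value.split("+")[-1]   # keep what follows "U+"
    return chr(int(hex_digits, 16))             # parse in base 16, map code point to character


def char_to_unicode(char):
    """Turn a character back into 'U+XXXX' notation (at least 4 hex digits)."""
    return f"U+{ord(char):04X}"


print(unicode_to_char("U+1F604"))   # 😄
print(char_to_unicode("😄"))        # U+1F604
print(char_to_unicode("A"))         # U+0041
```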


### 3. Print all available emojis
From the [official Unicode page](https://unicode.org/emoji/charts/full-emoji-list.html), get the code points of the first and last emojis.

Using the previous function or piece of code, print all the emojis between them.

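```python
first = int("1F600", 16)  # code point of the first emoji
last = int("1F64D", 16)   # code point of the last emoji
for i in range(first, last + 1):  # + 1 so that the last emoji is included
    print(chr(i))
```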
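To see which emoji each code point is, the standard `unicodedata` module can look up its official Unicode name. This sketch reuses the same range as above; note that emojis are spread over several Unicode blocks, so a single contiguous range only covers part of them.

```python
import unicodedata

first = int("1F600", 16)
last = int("1F64D", 16)

for code_point in range(first, last + 1):     # + 1 so the last emoji is included
    char = chr(code_point)
    # unicodedata.name() returns the official name, or the default if there is none
    print(f"U+{code_point:X} {char} {unicodedata.name(char, 'UNKNOWN')}")
```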
    