# Lesson 03 — Terminal, Libraries & text encoding

In this lesson you will **build** a tiny Python library to practice managing
a project separated into different folders and files. You will then learn about text encoding and binary, and finish with a coding challenge!

## 0) A bit about Python naming conventions...
To make the coming segment a little less confusing, we will first learn a little more about how Python works.
In the last lesson you might have noticed the strange function name `__init__`. Function names are usually just lowercase letters with underscores between words, like `def example_function(input)`.
`__init__` is a special function (called a dunder method, for "double underscore", or a magic method because it seems like magic). These are functions that other Python functionality uses. Suppose you create a Python class. Consider these questions:<br>
- How does Python know how to create a new object of the class? It is full of functions, so which one should it run?
- If you wanted to use `print(your_object)`, what should Python put into the output?

To solve these kinds of problems, Python has the special functions. Let's look at an example. Run the code below as it is (only `__init__` is active), then remove the `""" """` around one function at a time and re-run the code (first `__str__`, then `__eq__`):



In [None]:
class Ball:
    
    def __init__(self, colour,size):
        self.col = colour
        self.radius = size/2
        self.diameter = size
    

    """
    def __str__(self):
        return f"{str(self.col)} ball, {self.diameter}cm diameter"
    """
    
    """
    def __eq__(self, other):
        return (self.diameter == other.diameter) and (self.col == other.col )
    """
    
my_ball = Ball("red",3) # without __init__ python doesn't know how to use the class to create an object
my_second_ball = Ball("red",3)
print(my_ball) # without __str__ python can only print where the object is stored, not anything about it.
print(my_ball == my_second_ball) #without __eq__ python can only check if it is the same object, not if they are similar.


*Note: what is a class, in a technical sense? A class is a bunch of functions and variables bundled together. Even though classes are mostly used as a "template for objects", technically there isn't much more to them than functions and variables. When we start with packages, you will therefore see a lot of similarity in the syntax for using packages (bundles of scripts and functionality) and classes (other bundles of functions and variables).*
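
To make the similarity in syntax concrete, here is a small sketch. The `describe` method is a hypothetical addition for this comparison only, and `math` is just a convenient module from the standard library:

```python
import math


class Ball:
    """Same idea as the Ball class above, reduced to what this comparison needs."""

    def __init__(self, colour, size):
        self.col = colour
        self.diameter = size

    def describe(self):  # hypothetical method, not part of the lesson's class
        return f"{self.col} ball, {self.diameter}cm diameter"


my_ball = Ball("red", 3)

# An object bundles variables and functions, and we reach them with a dot...
print(my_ball.diameter)    # a variable that lives on the object
print(my_ball.describe())  # a function that lives on the object

# ...and a module is used in the same way.
print(math.pi)             # a variable that lives in the module
print(math.sqrt(9))        # a function that lives in the module
```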

## 1) Terminal movements in the file system

You did a bit of this when we set up the workspace. Let's practice it a bit more.

Key ideas:
- A **filesystem** is a tree of folders and files.
- Your process (Jupyter kernel or a terminal) has a **current working directory** (CWD).  
- When a path is *relative*, it's interpreted starting from the CWD. *(e.g. `subdirectory` or `../other_folder`)*
- When a path is *absolute*, it's interpreted from the top point in the file system, `/`. *(e.g. `/home/user/siargao-training/subdirectory`, which Windows shows as `\\wsl.localhost\Ubuntu-22.04\home\user\siargao-training\subdirectory`)*
<br>(This top `/` is called the root directory, just like the administrator account is called the root user (top of permissions), although they are different things.)
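
A small sketch of the difference between relative and absolute paths. It uses `PurePosixPath` so it behaves the same on every operating system; the file name `notes.txt` is a made-up example:

```python
from pathlib import Path, PurePosixPath

relative = PurePosixPath("subdirectory/notes.txt")  # hypothetical file name
absolute = PurePosixPath("/home/user/siargao-training/subdirectory")

print(relative.is_absolute())  # does it start from the root "/"?
print(absolute.is_absolute())

# A relative path only gets its full meaning once it is joined onto a starting folder.
start = PurePosixPath("/home/user/siargao-training")
print(start / relative)

# For real paths on your machine, resolve() joins a relative path onto the CWD.
print(Path("subdirectory").resolve())
```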

Let’s inspect the current working directory using Python.


In [None]:
from pathlib import Path

cwd = Path.cwd()
print("Current working directory:", cwd)
print("\nHere are the entries in this folder:")
for p in sorted(cwd.iterdir()):
    print(" •", ("[DIR]" if p.is_dir() else "[FILE]"), p.name)


### ... and in the Terminal

In a **terminal**, you would run commands like:
```bash
pwd          # print working directory
ls           # list files
cd myfolder  # change directory
mkdir newfoldername    # make a folder
```
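
For comparison, the same four commands have rough equivalents in Python. This is only a sketch: it works inside a temporary scratch folder (an assumption made here so that nothing in your project is touched) and moves back to where it started at the end.

```python
import os
import tempfile
from pathlib import Path

original_cwd = Path.cwd()                  # pwd

with tempfile.TemporaryDirectory() as scratch:
    os.chdir(scratch)                      # cd <scratch folder>
    Path("newfoldername").mkdir()          # mkdir newfoldername
    entries = sorted(os.listdir("."))      # ls
    print(entries)

    os.chdir("newfoldername")              # cd newfoldername
    inside = Path.cwd().name
    print(inside)

    os.chdir(original_cwd)                 # go back before the scratch folder is deleted

print(Path.cwd() == original_cwd)
```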

**Task:**
1) Move around a bit in your file system using the terminal. You can go upwards in the file system with `cd ..`. It even works like this: `cd ../some_other_folder`, which is equal to `cd ..` followed by `cd some_other_folder`. You can view the content of your current folder with `ls`, or the content of another folder with `ls ..` or `ls some_folder/some_other_folder` etc., just like you would change into that folder with `cd`. If you start an address with `/`, you are giving an absolute path, and if you start with a `folder_name` or `..`, you are giving a relative path.
2) In your training environment, create a new directory (folder) named `binary_text`.



## 2) Turning files and folders into packages

We have created a folder named `binary_text` (what an oddly specific name...) and will turn this folder into a package!

Note:
> A **module** is a single `.py` file you can import.
> A **package** is a folder that contains modules.

1) First we will use one of the special names to let Python know the folder should be considered a Python package. We do this by creating a Python file in the folder named `__init__.py` (one of the special names). The file can be empty (although we might put in some nice-to-haves later).
There are many ways to create a new file, like using your file explorer, the file browser to the left in Jupyter, or different ways in the terminal. For example, you can `cd` into `binary_text` and run the command <br>`touch __init__.py`

2) Then we should also create a file with some code. Let's create one called `converter.py`: `touch converter.py` *(make sure you're in the directory `binary_text`)*

3) Open `converter.py`. You can do it here in Jupyter: look for the folder and the file in the file browser on the left. You can also open it in your IDE *(Integrated Development Environment, like Visual Studio; a text editor such as Notepad++ also works)* <br>**Paste the following code into the file:**
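
The import cell further down only needs `converter.py` to define a function called `test` that returns something printable. If you do not have the lesson's code at hand, a minimal stand-in could look like the sketch below. Only the name `test` comes from this lesson; the body and the message are assumptions.

```python
# binary_text/converter.py
# Hypothetical placeholder: only the function name `test` is taken from the lesson.

def test():
    """Return a short message so we can see that the import worked."""
    return "converter.py was imported successfully!"
```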


Let's try it and see if you succeeded: <br>*(Note that Python only loads a library the first time it is imported. If you rerun the code without restarting the kernel, Python looks among its already loaded libraries, says "already there", and therefore does not pick up changes you have made in the library. So you need to restart the kernel if you have made changes.)*

In [None]:
from binary_text.converter import test # note that we import like this: from folder.file import function. We can make this easier later.
from pathlib import Path

cwd = Path.cwd()

print(f"remember that Jupyter runs from your project folder: {cwd}, \n so python will look for the folder in your current directory (and a few other places)\n")
response = test()
print(response)
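
Restarting the kernel is the simplest way to pick up changes in a library. As an alternative, the standard library offers `importlib.reload`. The sketch below shows the "already there" behaviour and the reload on a throwaway module; the module name `reload_demo` and its contents are made up for this demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as scratch:
    module_file = Path(scratch) / "reload_demo.py"  # hypothetical throwaway module
    module_file.write_text("def test():\n    return 'first version'\n")

    sys.path.insert(0, scratch)  # let Python find the throwaway module
    importlib.invalidate_caches()
    import reload_demo
    before = reload_demo.test()

    # Edit the file on disk, as if we had changed our library.
    module_file.write_text("def test():\n    return 'second version, after editing'\n")

    import reload_demo           # nothing happens: the module is already loaded
    still_old = reload_demo.test()

    importlib.reload(reload_demo)  # re-reads the file without a kernel restart
    after = reload_demo.test()

    # Clean up so the demo leaves no traces behind.
    sys.path.remove(scratch)
    del sys.modules["reload_demo"]

print(before)
print(still_old)
print(after)
```

Note that a name imported with `from module import name` keeps pointing at the old function after a reload, so that import line has to be run again as well.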

## 3) Coding challenge!
Let's practice coding! ...almost. First we have to learn a few things.

### 3.1 the converter.py file - sidetrack learning
We will later create two functions: `def text_to_binary(text):` and `def binary_to_text(binary_str):`  
One takes a string, e.g. 'Hello', and returns a string of binary, e.g. '0100100001100101011011000110110001101111'. The other one does the opposite. 
You will make them yourself! But first let's learn a little about how text is represented in binary.

**How strings are represented in the computer**<br>
In mathematical terms, binary is just counting, but with a base of 2 instead of 10 (how we usually count). Mathematically it is possible to count with any base.
Some manual examples of counting with different bases:<br>*(bases above 10 need extra digits: a=10, b=11)*
<br>**10 as base with 4 digits (span 0000-9999, total of 10^4 = 10000 different numbers ):**
<br>0000, 0001, 0002, 0003, 0004, 0005, 0006, 0007, 0008, 0009, 0010, 0011, etc

**11 as a base with 4 digits (span 0000-aaaa, total of 11^4 = 14641 different numbers):**
<br>0000, 0001, 0002, 0003, 0004, 0005, 0006, 0007, 0008, 0009, 000a, 0010, etc 

**5 as a base with 4 digits (span 0000-4444, total of 5^4 = 625 different numbers):**
<br>0000, 0001, 0002, 0003, 0004, 0010, 0011, 0012, 0013, 0014, 0020, 0021, etc

**2 as a base with 4 digits (span 0000-1111, total of 2^4 = 16 different numbers):**
<br>0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111
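
Each digit in these examples is worth its value times the base raised to its position, counted from the right. A small sketch of that rule follows; `value_of` is a hypothetical helper written for this explanation, and Python's built-in `int()` does the same job when given a base.

```python
def value_of(digits, base):
    """Hypothetical helper: add up digit * base**position, reading from the right."""
    symbols = "0123456789abcdefghijklmnopqrstuvwxyz"
    total = 0
    for position, symbol in enumerate(reversed(digits)):
        total += symbols.index(symbol) * base ** position
    return total


print(value_of("1011", 2))    # 1*8 + 0*4 + 1*2 + 1*1
print(value_of("0014", 5))    # 1*5 + 4*1
print(value_of("000a", 11))   # a stands for 10
print(value_of("0011", 10))   # 1*10 + 1*1

# The built-in int() gives the same answers when told which base to use.
print(int("1011", 2), int("0014", 5), int("000a", 11))
```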

*BTW: this means you could count from 0 to 31 on one hand (2^5 = 32 different values) if you use your fingers as binary digits.*
<br>Just for fun, let's build a Python function that can count with different bases:

In [None]:
digit_anotations = ["0","1","2","3","4","5","6","7","8","9",
                    "a","b","c","d","e","f","g","h","i","j",
                    "k","l","m","n","o","p","q","r","s","t",
                    "u","v","w","x","y","z"]
    

def digits_in_base(n, base=10, width=8):
    """Return the digits of n in the given base, as a string."""
    if n == 0:
        return "0".rjust(width, "0")
    digits = []
    while n > 0:
        digits.append(digit_anotations[(n % base)])
        n //= base
    return "".join(digits[::-1]).rjust(width, "0")  # reverse for normal order

def count_in_base(base=10,max_n=100,min_digits=1):
    n = 0
    count_string = ""
    break_line=1
    while n < max_n:
        count_string = count_string + digits_in_base(n,base,min_digits)+" "

        # this is just to break lines
        if break_line == base*3:
            count_string = count_string + str("\n")
            break_line = 0
        
        n += 1
        break_line += 1
        
    return count_string

print("Short example with 3 as base:\n"+( count_in_base(3, 20, 3) ))

print( "\nFull example of counting binary with 8 digits.\n"+count_in_base(2, 256, 8) )
print("Note the final number, this was the entire span!")



### Okay, so depending on how many digits we have, we get a different number of combinations. 
With 8 digits we get 256 different combinations. Why 8? Different sizes were tried over time, but: 
- It's a power of the base 2 (2^3=8), so it is easy to make memory allocations and designs.
- 256 was enough to cover the common letters.
<br>So we now map these 256 base-2 numbers to our human standard of counting (the decimal system): 0 to 255. We can arbitrarily decide which byte (8 bits) should represent each letter or symbol we want. **Great!**
<br> ...but there are a lot more symbols we use...
<br> ...and different languages use different symbols...
<br> ...and everyone started creating their own standard...
<br> To solve this, Unicode was created: one shared numbering of symbols, stored using the UTF encodings. There are two approaches to fitting in more symbols.<br>
- One is to store the most common symbols in a single byte, and use the first bits of a byte to say "the next 1/2/3 bytes belong together" to handle less common symbols. That means different symbols can have different data lengths, but you need to spend some space telling the computer how to interpret the bytes. This is what UTF-8 does.
- The second one is to just use longer sequences: instead of using 8 bits you can use 16 or 32. Then almost all symbols can have their own number (with 32 bits, as in UTF-32, all of them can), but no symbol takes less than that number of bits (e.g. the number 1 in a 32-bit system is 00000000 00000000 00000000 00000001 instead of 00000001).
<br> You can check out the UTF-8 mapping here: https://www.charset.org/utf-8
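
To see the two approaches side by side, the sketch below encodes a few sample symbols (chosen here only as examples) with UTF-8 and with UTF-32, and prints how many bytes each one needs. The codec name `utf-32-be` is used so that no extra marker bytes are added in front.

```python
samples = ["a", "é", "€", "😀"]  # example symbols, from common to less common

for symbol in samples:
    utf8 = symbol.encode("utf-8")       # variable length: 1 to 4 bytes
    utf32 = symbol.encode("utf-32-be")  # fixed length: always 4 bytes
    bits = " ".join(format(byte, "08b") for byte in utf8)
    print(f"U+{ord(symbol):04X}: utf-8 = {len(utf8)} byte(s), "
          f"utf-32 = {len(utf32)} bytes, utf-8 bits = {bits}")

utf8_lengths = [len(s.encode("utf-8")) for s in samples]
utf32_lengths = [len(s.encode("utf-32-be")) for s in samples]
print(utf8_lengths)
print(utf32_lengths)
```

In the printed UTF-8 bits, a first byte starting with `0` stands alone, while one starting with `110`, `1110` or `11110` announces a group of two, three or four bytes.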


### 3.2 the converter.py file - back to business

<br>We want to create the functions that convert text to strings representing bits, and vice versa. <br>
For this you should create two functions in `converter.py`:<br> `def text_to_binary(text):` and <br>`def binary_to_text(binary_str):`
<br>

That means we need to go from a string --> encode it so symbols are mapped to some chosen integers --> format each integer to represent it in binary. Here are some helpful methods and functions:

In [None]:
text = "hello"
encoded_text = text.encode("utf-8")

print (encoded_text) #note that python makes it easier to read while marking that it is bytes with b''
print (type(encoded_text))

print("the integers: ")
for integer in encoded_text:
    print (integer)

format(108,"b") #format with input "b" means binary formatting
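
One detail worth checking before you start: `format(108, "b")` only gives as many digits as the number needs, while the example output for 'Hello' earlier uses 8 digits per letter. A small sketch of how the format specification can pad with zeros:

```python
number = ord("l")               # the integer behind the letter "l"

unpadded = format(number, "b")   # as many digits as needed
padded = format(number, "08b")   # always 8 digits, filled with zeros on the left

print(number)
print(unpadded, len(unpadded))
print(padded, len(padded))
```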



Now it should be reasonably doable to create the `text_to_binary` function yourself! The `binary_to_text` function will be more challenging, as you don't have the steps laid out for you. But you understand the concepts now, so it will be a good opportunity to practice finding good functions yourself (or creating them).

**Happy Coding!**