# Elements Of Data Processing - Python Revision

### Getting Started with Jupyter Notebook
Jupyter Notebook is an extremely useful tool for developing and presenting projects (particularly in Python).  You can include code segments and view their output directly in your browser.  You can also add rich text, visualisations, equations and more.

Unlike Grok (from COMP10001), a notebook lets you run your code one cell at a time, so you don't have to run all of your code at once to see an output.

### Cells
Jupyter notebook contains two main types of cells:
- Markdown cells: These can be used to contain text, equations and other non-code items.  The cell that you're reading right now is a markdown cell.  You can use [Markdown](https://www.markdownguide.org/) to format your text.  If you prefer, you can also format your text using <b>HTML</b>.  Clicking the **Run** button will format and display your text.
- Code cells: These contain code segments that can be executed individually.  When executed, the output of the code will be displayed below the code cell.

### General Tips for Jupyter Notebook
Cell shortcuts:
- `shift + enter` : Run the current cell and move to the next one
- `ctrl + enter` : Run the selected cells and stay on them

Command mode:
- Press `esc` to enter command mode (blue highlight)
- `a` to create a cell **above**
- `b` to create a cell **below**
- `dd` (double d) to **delete** a cell
- `m` to make the cell render in **markdown**
- `r` to make the cell render in **raw** text
- `y` to make the cell a Python **code** cell
- `enter` to "edit" the cell

Code Shortcuts:
- `shift + tab` : brings function/method arguments up

Magic Cells:
```bash
%%time # times the cell execution
%%bash # allows bash commands (or cmd) to be run
%%html # renders html syntax
%%writefile script.py # outputs the lines of code into a script of choice
%run script.py # runs a script of choice
%run -i script.py # runs a script inside the notebook's namespace, so it can use variables already defined there
```

# Revision

## Variables and Printing
Try running the code segments below and verify that the output is correct.

In [2]:
message = "Hello World!"
print(message)

Hello World!


Variables are retained between code segments.  You can, for example, refer to the `message` variable created in the code segment above.

In [4]:
print("The COMP20008 team wishes to say: " + message)

The COMP20008 team wishes to say: Hello World!


By default, Jupyter Notebook will *display* the value of the last expression in a cell if nothing else follows it. If you separate several expressions with commas, it will output a *tuple* of the results.

In [3]:
message, message + " Wow this is cool :o"

('Hello World!', 'Hello World! Wow this is cool :o')

Try adding your own code cell below using the keyboard shortcut (`esc + b` or `esc + a`) and use it to print a different message.  

## Data Types
You should be familiar with `int()`, `float()`, `bool()`, and `str()`.

In [None]:
# empty data structures and 0 = False
bool([]), bool(""), bool(0)

In [None]:
# non-empty data structures and any non-zero number = True
bool(["value 1"]), bool("non empty string"), bool(1), bool(-1)
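
The same four functions also convert values from one type to another. The cell below is a small sketch of the most common conversions; the values are made up for illustration.

```python
# Converting between the basic types (illustrative values only)
whole = int("42")          # str -> int
decimal = float("3.5")     # str -> float
truncated = int(3.9)       # float -> int drops the fractional part (no rounding)
text = str(42) + "!"       # int -> str, so it can be concatenated with another string
non_empty = bool("False")  # any non-empty string is True, even the text "False"

print(whole, decimal, truncated, text, non_empty)
```

Note that `bool("False")` follows the same emptiness rule shown above: only the empty string converts to `False`.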

## Data Structures
You should be familiar with `list()`, `set()`, `dict()`, and `tuple()` (immutable).

Additionally, you should be familiar with indexing / slicing them.

In [None]:
list("abcd")

In [None]:
set("aaaab")

In [None]:
{"key1": "value1", "key2": "value2"}

In [None]:
tuple("abcd")

In [None]:
s = "this is a string"
s[0], s[-1], s[:5], s[5:], s[::2]
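
Indexing and slicing behave slightly differently across the four structures. The sketch below uses hypothetical data (the names and marks are invented) to show key lookup in a `dict`, in-place changes to a `list`, the immutability of a `tuple`, and membership testing in a `set`.

```python
# Hypothetical example data, not taken from the subject materials
marks = {"alice": 85, "bob": 72}
marks["carol"] = 90            # dicts are mutable: add a new key
alice_mark = marks["alice"]    # look up a value by its key

letters = list("abcd")
letters[0] = "z"               # lists are mutable: replace an element
middle = letters[1:3]          # slicing works the same way as for strings

point = (1, 2)
tuple_is_immutable = False
try:
    point[0] = 5               # tuples are immutable, so this raises TypeError
except TypeError:
    tuple_is_immutable = True

unique = set("aaaab")          # duplicates are removed
has_b = "b" in unique          # sets are unordered, so test membership instead of indexing

print(alice_mark, letters, middle, tuple_is_immutable, has_b)
```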

## Loops
- `for` and `while` loops
- List comprehension

In [None]:
for i in range(5):
    print(i)

In [None]:
lst = list("abcd")
for idx, val in enumerate(lst):
    print(f"index {idx} = {val}")

In [None]:
# no if statement
print([(i, j) for i, j in enumerate(lst)])

# if statement
MIN = 1
print([(i, j) for i, j in enumerate(lst) if i > MIN])

# if else statement
print([(i, j) if i > MIN else (-1, "INVALID") for i, j in enumerate(lst)])
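
If the if/else comprehension is hard to read, it may help to compare it with an ordinary `for` loop. The sketch below redefines `lst` and `MIN` so that it runs on its own, and checks that both versions build the same list.

```python
lst = list("abcd")
MIN = 1

# Comprehension with a conditional expression (same as the last example above)
with_comprehension = [(i, j) if i > MIN else (-1, "INVALID") for i, j in enumerate(lst)]

# Equivalent explicit loop
with_loop = []
for i, j in enumerate(lst):
    if i > MIN:
        with_loop.append((i, j))
    else:
        with_loop.append((-1, "INVALID"))

print(with_comprehension == with_loop)
```

A filtering `if` goes at the end of the comprehension and drops items, whereas an `if ... else` goes at the front and keeps every item but changes its value.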

In [None]:
i = 0
while i < 5:
    print(i)
    i += 1

In [None]:
i = 1
while True:
    print(i)
    if i % 10 == 0:
        break
    i += 1

## Functions 
- You should be familiar with writing functions, as well as advanced functions with multiple returns.
- It's good to also be able to use `lambda` functions as they will be useful for the project

In [None]:
def f(x):
    return x + 1

f(5), f(7), f(True) # recall bool can be treated as an int

In [None]:
def g(x=1, y=1):
    return x * y

g(5), g(y=5), g(10, 10), g() # default arguments
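
"Multiple returns" can mean either several `return` statements or several returned values. The sketch below shows both in one function; `min_max` is a hypothetical helper written for this example, not part of the subject code.

```python
def min_max(values):
    """Return the smallest and largest item as a tuple (hypothetical helper)."""
    if not values:              # early return for empty input
        return None, None
    return min(values), max(values)

lowest, highest = min_max([3, 1, 4, 1, 5])   # unpack the returned tuple
empty_result = min_max([])

print(lowest, highest)
print(empty_result)
```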

Lambda functions are essentially "one line" versions of your normal functions. Syntax:  

`function_name = lambda input_arguments: expression`

In [None]:
# lambda function of f(x)
f_lambda = lambda x: x + 1

f_lambda(5), f_lambda(7), f_lambda(True)

In [None]:
# lambda version of g(x, y)
g_lambda = lambda x=1, y=1: x * y

g_lambda(5), g_lambda(y=5), g_lambda(10, 10), g_lambda() 

In [None]:
# advanced lambda function which returns based on conditions
advanced = lambda x: float('inf') if x > 0 else 0 if x == 0 else float('-inf')

advanced(100), advanced(0), advanced(-1)
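
Lambdas are most often passed directly to another function rather than stored in a variable. As a sketch, assuming some hypothetical `(name, mark)` records, the built-in `sorted()` and `filter()` both accept a lambda:

```python
# Hypothetical (name, mark) records
records = [("bob", 72), ("alice", 85), ("carol", 90)]

# key= tells sorted() which part of each record to compare
by_mark = sorted(records, key=lambda r: r[1], reverse=True)
by_name = sorted(records, key=lambda r: r[0])

# filter() keeps only the records for which the lambda returns True
passed = list(filter(lambda r: r[1] >= 80, records))

print(by_mark)
print(by_name)
print(passed)
```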

## File I/O
- In COMP10001, you would have learned about File I/O (notably using `open()` and the `csv` library)
- For COMP20008, you will need to know how to read file names in (and be a bit more familiar with file paths)
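
As a quick refresher, the sketch below writes and reads a small CSV file with `open()` and the `csv` library, and builds the file path with `os.path`. The file name `marks.csv` and its contents are invented for illustration, and a temporary directory is used so that no real files are touched.

```python
import csv
import os
import tempfile

# Work in a temporary directory so no real files are touched
with tempfile.TemporaryDirectory() as tmp_dir:
    # os.path.join builds a path that works on any operating system
    csv_path = os.path.join(tmp_dir, "marks.csv")   # hypothetical file name

    # newline="" is recommended by the csv module documentation
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "mark"])
        writer.writerow(["alice", 85])

    # DictReader uses the header row as the keys of each row
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))

    file_name = os.path.basename(csv_path)       # strip the directory part
    extension = os.path.splitext(file_name)[1]   # keep only the file extension

print(rows, file_name, extension)
```

Values read back from a CSV file are always strings, so the mark comes back as `'85'` and needs `int()` before any arithmetic.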

There are two main ways of opening/running/saving Python scripts in this subject.

In [None]:
# os is the python library for interacting with the operating system
import os

# os.remove("super_cool.py")

# current working directory (note I am using WSL2 - a linux environment)
# print(os.getcwd())

# list files in current directory
for i in os.listdir():
    print(i)


Let's write a script and save it using magic cells.

In [None]:
%%writefile super_cool.py
variable = "This is super cool"
print(variable)

We can see it has now appeared in the directory.

In [None]:
print(os.listdir())

Let's run the file! Depending on your installation, the interpreter command will be either `python` or `python3`.

In [None]:
# method 1a (with python)
print(os.popen('python super_cool.py').read())

# method 1b (with python3)
print(os.popen('python3 super_cool.py').read())

In [None]:
# method 2 (preferred)
%run super_cool.py

Say we are looking to get the `hidden_test` variable from a script in the same directory...

In [None]:
hidden_test

In [None]:
# -i runs the script inside the notebook's namespace; its variables are available afterwards
%run -i saved_variables.py
hidden_test

Finally, there are some bash (Linux) commands you should get familiar with (especially for the server).
- `ls` to list the files in the current directory
- `pwd` to print the present working directory
- `cd` to change directory (use `\` to escape spaces, e.g. `cd USER/My\ Documents` for "My Documents")

There are two main ways of running bash commands straight from the notebook:
- `%%bash` for one or more commands in succession
- `!` for a single command

In [None]:
%%bash
pwd
ls

In [None]:
!ls

In [None]:
%%bash
cd .. # go to parent folder
pwd
ls

# Exercises

### Exercise 1
Write a Python program that will print the first $n$ numbers ($n > 0$) of the Fibonacci sequence in **reverse** order.  Verify it works for $n=10$.

### Exercise 2
The following two exercises are about getting used to running Python through a shell. The program for these exercises is a simple hello world program.
1. Open a new terminal using the plus button above the file browser or File → New Launcher.
2. Use `ls` to show the contents of the folder you're in.
3. Use `cd workshops` to change into the workshops folder, followed by `cd revision` to change to the current folder.
4. Run `python exercise2.py` to execute the script `exercise2.py` in Python. Your output should be the same as the output of the next cell. If this does not display anything, try `python3` (as you may have different installations of Python on your machine).

### Exercise 3
Take your code from Exercise 1 and store it in a new file, `exercise3.py`. Run this similarly to Exercise 2. You can create a new file through the launcher by clicking on the New Text File button, by running `touch exercise3.py` in the terminal, or by using the `%%writefile` cell magic.

Your output should be the same as Exercise 1. Remember that file names are case sensitive, so `Exercise3.py` is not the same as `exercise3.py`.

## Recommended Readings
[This article on Dataquest](https://www.dataquest.io/blog/jupyter-notebook-tutorial/) is an excellent introduction to Jupyter Notebook.  If you haven't used Jupyter Notebook before, I recommend familiarising yourself with it.