# Scientific Python
Using Python scientifically requires a good understanding of how to handle data. We will be using two libraries to do this: NumPy and Matplotlib. NumPy holds numeric arrays and is largely written in a language called C, which is faster than Python but (arguably) harder to learn, so Python acts as a front end to the efficiently stored arrays. The library comes with many features for quick mathematical operations across large arrays.

This sheet is to help you understand loading in a dataset and working with numbers.

First of all, let's import the libraries:


In [None]:
!pip install matplotlib numpy pandas rich

In [1]:
import matplotlib.pyplot as plt #so we can visualise easily
import numpy as np #the python maths library 
import pandas as pd
import time
import traceback
import sys
from rich.progress import Progress
from rich import print
from rich.panel import Panel

## Understanding the difference

### Arrays
Run the following code and take a look at the outputs. Look at the difference between ```boring_array```, which is a normal Python list, and ```cool_array```, which is a NumPy array. 
If we want to add to the whole array, we must loop through a normal list, which can take time. A NumPy array lets you do this in a single line, and it does so efficiently.
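
As a rough illustration of the speed difference, the sketch below times both approaches with `time.perf_counter`. The size `n` is an arbitrary choice made for this example, and the exact timings will vary from machine to machine.

```python
import time

import numpy as np

n = 1_000_000  # hypothetical size, chosen only to make the difference visible

# Plain Python list: the interpreter updates every element one at a time
boring_array = [0] * n
start = time.perf_counter()
for i in range(len(boring_array)):
    boring_array[i] += 1
list_seconds = time.perf_counter() - start

# NumPy array: a single call, with the loop running inside NumPy's compiled code
cool_array = np.zeros(n)
start = time.perf_counter()
cool_array += 1
numpy_seconds = time.perf_counter() - start

print(f"list loop: {list_seconds:.4f} s, numpy: {numpy_seconds:.4f} s")
```

Both versions end with every element equal to 1; only the time taken differs.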

In [None]:
#let's make an array of zeros
boring_array=[0,0,0,0,0,0,0,0,0]
cool_array=np.zeros((9,))

print(boring_array,cool_array)

#now we want to change the first element to 1
boring_array[0]=1
cool_array[0]=1
print("Add one to index:",boring_array,cool_array)

#simple, right? What if we want to add one to all the items in the array?
for i in range(len(boring_array)):
    boring_array[i]+=1
cool_array+=1
print("Add one to all:",boring_array,cool_array)

#now you are starting to see the ease of using numpy... let's look at a better example: what if we want to make sure all items above one are capped to one?
for i in range(len(boring_array)):
    if boring_array[i]>1:
        boring_array[i]=1
cool_array[cool_array>1]=1
print("Cap values:",boring_array,cool_array)

### Matrices
Arrays are one thing, but what about matrices?
Look at the difference again: NumPy is much simpler.
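
To make "simpler" concrete, here is a small sketch that sums each column of a matrix, first with nested lists and then with NumPy. The names `boring_matrix` and `cool_matrix` and their values are made up for this illustration.

```python
import numpy as np

boring_matrix = [[1, 2, 3], [4, 5, 6]]
cool_matrix = np.array(boring_matrix)

# Nested lists: walk down each column by hand
list_column_sums = [sum(row[c] for row in boring_matrix) for c in range(3)]

# NumPy: axis=0 collapses the rows, leaving one sum per column
numpy_column_sums = cool_matrix.sum(axis=0)

# Transposing (swapping rows and columns) is a single attribute in NumPy
transposed = cool_matrix.T

print(list_column_sums, numpy_column_sums, transposed.shape)
```

The two column sums agree, but the NumPy version states the intent (sum along an axis) rather than the looping mechanics.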

In [None]:
#we have already established that numpy is more efficient at doing tasks, so let's look at matrices
boring_array=[[1 for i in range(5)] for i in range(5)]
cool_array=np.ones((5,5))
print(boring_array,"\n\n",cool_array)

#gathering elements is exactly the same
print("Items:",boring_array[0][0],cool_array[0][0])
#or gathering rows
print("Rows:",boring_array[0],cool_array[0])
#what if we gather columns?
print("Columns:", [boring_array[i][0] for i in range(5)],cool_array[:,0])

#the more complex the operation, the more efficient it is to use numpy
#We can also generate random arrays

random_matrix=np.random.random((2,2)) #random decimals
print("Random:",random_matrix)

mean=0
std=0.3
gaussian_matrix=np.random.normal(mean,std,(2,2)) #gaussian (normal distribution)
print("Gaussian:",gaussian_matrix)

## Showing the data

Matplotlib is the plotting library for making graphs and showing images, and it is widely used in the field. The graph you use will depend on your data. Here are some examples:

In [None]:
coordinates=np.random.random((50,2)) #stored in format [(x,y),(x,y)]
plt.scatter(coordinates[:,0],coordinates[:,1]) #grab x column and y column
plt.title("This is the title")
plt.xlabel("X coordinate")
plt.ylabel("Y coordinate")
plt.show()

#we can also show images
image=np.random.random((50,50)) #random image of 50 x 50 pixels
plt.imshow(image)
plt.title("This is the title")
plt.show()

#we can also label each line and show the labels in a legend
data=np.random.random((100,3)) #3 classes, 100 data points
plt.plot(data,label=["class 1","class 2","class 3"])
plt.legend(loc="upper right")
plt.show()

## Task 1:

### a:
    Make a numpy array of ones of size 5 by 6

### b:
    Add 2 to all positions

In [18]:
array = ...


In [None]:
#Run this to check your code
data = {
    "array":np.ones((5,6))+2
}

def success_panel(msg: str, title: str) -> None:
    print(Panel(msg, title=title, border_style="green"))

def problem_panel(msg: str, title: str) -> None:
    print(Panel(msg, title=title, border_style="red"))


with Progress() as progress:
    assert_task = progress.add_task("[green]Checking solution...", total=len(data.keys()))
    failed = 0
    for key in data.keys():
        value = data[key]
        try:
            # compare the shape and every element; np.asarray lets an unfinished placeholder fail cleanly
            assert np.array_equal(np.asarray(globals()[key]), value)
            success_panel(f"Congratulations! \"{key}\" is equal to the expected value of {value}", title="Data verified")
        except KeyError as e:
            failed += 1
            problem_panel(f"\"{key}\" isn't defined...", "Undefined variable")
        except AssertionError as e:
            failed += 1
            problem_panel(f"Oops! It looks like \"{key}\" isn't quite right. Double check your logic, make some changes, and re-run!", "Data error")
        progress.update(assert_task, advance=1)
    
    if failed > 0:
        problem_panel(f"Oops! {failed} of {len(data.keys())} things aren't quite right\nDouble check your logic, make some changes, and re-run!", "")
    else:
        success_panel("Congratulations! Everything seems all good :) Feel free to move onto the next exercise!", "Exercise complete!")

## Datatypes
Another thing to be aware of with NumPy is datatypes: you need to make sure that your array has the correct datatype and memory size. 

In computer science it is important to be aware of memory size. In binary, 8 bits (also known as a byte) can hold any whole number from 0 to 255. If you are working with images then that's perfect, as pixel values are commonly stored as whole numbers from 0 to 255. If you are working with sensor measurements such as [0.5,0.334,0.345,0.7676,...] then this is not very good, because an integer type cannot store the fractional part. We will need the data to be a float. 

NumPy allows you to specify this, but remember: the more bits each value uses, the more memory the array takes.
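
As a small sketch of that trade-off, the loop below creates the same-shaped array with several dtypes and reports its size through the `itemsize` and `nbytes` attributes. The shape and the variable names (`example`, `sizes`) are arbitrary choices for illustration.

```python
import numpy as np

shape = (100, 100)  # 10,000 elements
sizes = {}

for dtype in (np.uint8, np.uint16, np.float32, np.float64):
    example = np.zeros(shape, dtype=dtype)
    sizes[example.dtype.name] = example.nbytes
    print(example.dtype, ":", example.itemsize, "byte(s) per element,", example.nbytes, "bytes in total")
```

The same 10,000 values take eight times more memory as `float64` than as `uint8`.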

Another thing to be aware of is signed and unsigned binary. Signed means that one of the 8 bits is used to record whether the number is negative, so a signed 8-bit integer holds -128 to 127 instead of 0 to 255. If you are only working with positive numbers, you will want an array that is unsigned.

```
signed=np.array([1,2,3],dtype=np.int8)
unsigned=np.array([1,2,3],dtype=np.uint8)
```
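
To see what signed and unsigned mean in practice, the sketch below asks NumPy for the range of a few integer types with `np.iinfo`, then reads the same 8 bits both ways using `view`. The value 200 is just an example that fits in `uint8` but not in `int8`.

```python
import numpy as np

# Smallest and largest value each integer type can hold
for dtype in (np.int8, np.uint8, np.int16, np.uint16):
    info = np.iinfo(dtype)
    print(np.dtype(dtype).name, "holds", info.min, "to", info.max)

# The same 8 bits interpreted as unsigned and then as signed
as_unsigned = np.array([200], dtype=np.uint8)
as_signed = as_unsigned.view(np.int8)
print(as_unsigned, as_signed)
```

The bits are identical in both arrays; only the interpretation changes, which is why a wrongly signed image can look strange.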

Here are some examples:

In [36]:
bit_8 = np.zeros((1,1),dtype=np.uint8)
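# 1444 does not fit in 8 bits; adding it as a 16-bit value promotes the result to uint16
# (NumPy 2 raises an OverflowError if a plain Python int that is too large is added to a uint8 array)
print("8 bit number",bit_8,":",bit_8.dtype,"\n8 bit number",bit_8+np.uint16(1444),":",(bit_8+np.uint16(1444)).dtype)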


# but if we start from a 16-bit array, 1444 already fits, so the dtype stays uint16
bit_16 = np.zeros((1,1),dtype=np.uint16)
print("16 bit number",bit_16,":",bit_16.dtype,"\n16 bit number",bit_16+1444,":",(bit_16+1444).dtype)


# and now let's make it negative
bit_16 = np.zeros((1,1),dtype=np.uint16)
print("16 bit number",bit_16,":",bit_16.dtype,"\n16 bit number",bit_16-1000,":",(bit_16-1000).dtype)

# as you can see the number is really large, not negative. This is because an unsigned type cannot store negative values, so we need to set a signed type ourselves
bit_16 = np.zeros((1,1),dtype=np.int16)
print("16 bit number",bit_16,":",bit_16.dtype,"\n16 bit number",bit_16-1000,":",(bit_16-1000).dtype)

#finally lets look at floats
float32 = np.zeros((1,1),dtype=np.float32)
# adding a whole number to a float32 array keeps the dtype as float32
print("float32 bit number",float32,":",float32.dtype,"\nfloat32 bit number",float32+1444,":",(float32+1444).dtype)

It is important to make sure that our array has a dtype that will hold our data. As you can see, NumPy will sometimes move to a larger type for you, but this is not always the case. You may run into problems where your dataset takes up too much space because you are loading images as uint64 (64 bits) when you only need uint8 (8 bits). Another common problem is an image that looks wrong because it is stored as a signed int instead of a uint. This is worth being aware of when debugging. 
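
One possible way to fix the oversized-image case is `astype`, sketched below. The image here is random data standing in for a real one, and the range check is a precaution added for this example, because values outside 0 to 255 would wrap around when converted to uint8.

```python
import numpy as np

# Hypothetical image whose pixel values arrived as 64-bit integers
image = np.random.randint(0, 256, size=(50, 50)).astype(np.uint64)
print(image.dtype, image.nbytes, "bytes")

# Confirm every value fits in uint8 before converting
limits = np.iinfo(np.uint8)
if image.min() >= limits.min and image.max() <= limits.max:
    small_image = image.astype(np.uint8)
    print(small_image.dtype, small_image.nbytes, "bytes")
```

The converted array holds exactly the same pixel values in one eighth of the memory.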