# Pandas (and a bit of NumPy) Introduction Workshop

DISCLAIMER: I am normally a C# developer, and my coding style might not always follow the "pythonic" way. I am no fan of snake case and prefer camelCase for functions and variables and PascalCase for classes. Even if this might shine through in this document, you can write the code however you like. If you want to learn Python the "pythonic" way, use snake case for variables and functions.

This workbook is created for Python 3.6.x and above. Some of the language basics shown here are not available in earlier versions of Python.

 - Python documentation: [docs.python.org](https://docs.python.org)
 - Jupyter documentation: [jupyter-notebook.readthedocs.io](https://jupyter-notebook.readthedocs.io/en/stable/notebook.html)
 - NumPy documentation: [docs.scipy.org](https://docs.scipy.org/doc/numpy/user/index.html)
 - Pandas documentation: [pandas.pydata.org](https://pandas.pydata.org/pandas-docs/stable/)


## Jupyter Notebooks crash course

This is a Jupyter notebook. Opening this notebook is proof that you have read the initial `README` file for this workshop. Congratulations!

A Jupyter Notebook is an interactive REPL (read–eval–print loop) tool. You can think of it as an "advanced" markdown editor with the ability to run code built in.
There are two terms that might be new if you have never used or seen a Jupyter Notebook before.

 - `Kernel` - A "process" that executes the code that is written in a notebook. You can think of it as a runtime. There are kernels for multiple languages; Python, R, Julia, C# and Java are some of them.
 - `Cell` - A notebook is split into multiple cells. A cell is a container for either text or code.
 
 <img src="img/Notebook-interface.png" alt="Drawing" style="width: 600px;"/>
 


 


### Cells

Cells are the main components of a notebook. There are three cell types, but we will only cover two of them. The `Raw` cell type will not be used.

The two cell types we will be using are:

 - `Code cell` - This cell contains the code that will be executed
 - `Markdown cell` - This cell contains markdown that will be rendered when executed.

When you start a notebook there will always be one cell ready. This cell is a `code cell`. 
All new cells that are created will be a `code cell` by default. 
You have to change it to a `markdown cell` if you want to write markdown instead.

![Notebook menu](img/Notebook-menu.png)

To change a cell to a markdown cell, press `m` while the cell is selected (not while the cursor is inside the cell).
To change it back to a code cell, press `y` instead.
You can also use the dropdown at the top of the notebook that indicates the cell type (marked in orange in the image).

To execute a cell you can press `Shift-Enter`. This executes the cell and advances to the next one, creating a new code cell if there is no other cell to advance to.
If you press `Ctrl-Enter` instead, the cell is executed without advancing to the next cell or creating a new one.
You can also execute or stop the execution of a cell by using the start and stop buttons highlighted in yellow and green in the image above.

The results of a cell execution will be printed beneath the cell if the cell is a code cell. If it is a markdown cell, the rendered HTML of the markdown will be shown in the cell instead.

To quickly edit a markdown cell after it has been rendered, you can just press `Enter`.



### Exercise 1 - Testing out Jupyter

So now that you have received a crash course in the use of Jupyter Notebooks, let us give it a try!

The first exercise is to write a Python snippet that prints hello world. The expected result is the words "Hello world" printed beneath the cell.




In [3]:
# Exercise 1 - Hello jupyter world!
#
# Write the code below




### Scope

Great! You have now run your first cell in a Jupyter Notebook!

The code in each code cell belongs to the same scope.
This means that variables, functions or classes declared in one cell can be used in another cell. (The same goes for imports.)
Just remember to run the cell first. If a cell has not been run, the code it contains has not been executed and its variables, functions and classes are not declared.

So let us test this out as well.

**NOTE:** If a variable, function or class has been declared in a cell and the cell has been executed, then it will exist for as long as the notebook's kernel is running,
even if you clear out the cell and run the cell again.



### Exercise 2 - Multiple cells

Create a string variable containing "Hello world" in one cell and print the content of that string in another cell.

In [8]:
# Exercise 2 - Create your variable here




In [7]:
# Exercise 2 - Print your variable to the console here




## NumPy

NumPy is a Python package that is mainly used for scientific computing. It has near C-like performance. [[ref](https://www.ibm.com/developerworks/community/blogs/jfp/entry/A_Comparison_Of_C_Julia_Python_Numba_Cython_Scipy_and_BLAS_on_LU_Factorization?lang=en)] It is used to process arrays and multidimensional arrays and to perform mathematical operations on them.

NumPy’s array class is called `ndarray`. It is also known by the alias `array`. Note that `numpy.array` is not the same as the Standard Python Library class `array.array`, which we covered in the previous workshop and which is not really recommended. (It is very limited in what it can do.)
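
To make the distinction above concrete, here is a small sketch that creates both kinds of array and compares how they behave. The variable names and the `stdlibArray` alias are made up for this example.

```python
import array as stdlibArray  # the Standard Library module, aliased to avoid confusion

import numpy as np

# np.array() returns an instance of np.ndarray
numpyArray = np.array([1, 2, 3, 4])
print(type(numpyArray))

# array.array is a separate, one-dimensional class from the Standard Library
plainArray = stdlibArray.array("i", [1, 2, 3, 4])
print(type(plainArray))

# On an ndarray, * 2 multiplies every element by 2
print(numpyArray * 2)

# On an array.array, * 2 repeats the content instead, just like a list
print(plainArray * 2)
```
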

So let us see some of what NumPy can do:

In [30]:
import numpy as np

# Create an array with the range from 0 to 14. (Very much the same as range(15), just an ndarray instead of a sequence)
array: np.ndarray = np.arange(15)
print(array)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]


In [31]:
# Reshape the array into a multidimensional array
multidim: np.ndarray = array.reshape(3,5)
print(multidim)

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]


In [32]:
# You can also create an array by using the np.array() function
array: np.ndarray = np.array([1,2,3,4])
print(array)

[1 2 3 4]


In [33]:
# Create arrays filled with zeros using np.zeros()
array: np.ndarray = np.zeros((3,4))
print(array)

[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]


In [34]:
# Arithmetic operators on arrays apply elementwise. A new array is created and filled with the result.
a: np.ndarray = np.array([20,30,40,50])
b: np.ndarray = np.array([0,1,2,3])
print(a-b)
print(a*b)

[20 29 38 47]
[  0  30  80 150]


In [35]:
# NOTE: The product operator * operates elementwise in NumPy arrays. 
#       The matrix product can be performed using the @ operator (in python >=3.5) 
#       or the dot function or method

print(a @ b)
print(a.dot(b))

260
260
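
For the one-dimensional arrays above, `@` returns a single number (the dot product). With two-dimensional arrays the difference between `*` and `@` is easier to see. A minimal sketch, where `m1` and `m2` are made-up example matrices:

```python
import numpy as np

# Two small 2x2 matrices, chosen only for illustration
m1 = np.array([[1, 2], [3, 4]])
m2 = np.array([[5, 6], [7, 8]])

elementwise = m1 * m2    # multiplies the elements at matching positions
matrixProduct = m1 @ m2  # combines the rows of m1 with the columns of m2

print(elementwise)
print(matrixProduct)
```
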


In [36]:
# Flatten a multidimensional array back to a linear one
flatarray: np.ndarray = array.ravel()
print(flatarray)

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]


In [39]:
# Basic NumPy array indexing works like Python's indexing of nested lists
print(multidim[0])
print(multidim[0][2])

[0 1 2 3 4]
2
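
In addition to the list-style indexing shown above, NumPy accepts all indexes inside one pair of brackets, and negative indexes count from the end. A short sketch that rebuilds `multidim` so it can run on its own:

```python
import numpy as np

multidim = np.arange(15).reshape(3, 5)

# multidim[0][2] first selects row 0 and then element 2 of that row.
# NumPy also accepts both indexes in one pair of brackets.
print(multidim[0, 2])

# Negative indexes count from the end, as in a Python list
print(multidim[-1])      # last row
print(multidim[-1, -1])  # last element of the last row
```
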


In [55]:
# Python's concept of list slicing is extended to NumPy.

print(multidim[::-1]) # Reverse the rows
print(multidim[::2])  # Every second row
print(multidim[::,::2]) # Every second column
print(multidim[0:2, 0:2]) # First two rows and the first two columns

[[10 11 12 13 14]
 [ 5  6  7  8  9]
 [ 0  1  2  3  4]]
[[ 0  1  2  3  4]
 [10 11 12 13 14]]
[[ 0  2  4]
 [ 5  7  9]
 [10 12 14]]
[[0 1]
 [5 6]]
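
One difference from list slicing is worth knowing about: a slice of a list is a copy, while a basic slice of an `ndarray` is a view on the same data. The sketch below illustrates this with made-up variable names.

```python
import numpy as np

multidim = np.arange(15).reshape(3, 5)

# Slicing a Python list creates a copy
numbers = [0, 1, 2, 3]
listSlice = numbers[0:2]
listSlice[0] = 99
print(numbers)  # the original list is unchanged

# Slicing an ndarray returns a view on the same data
firstRow = multidim[0, :]
firstRow[0] = 99
print(multidim[0])  # the change is visible in the original array
```

If you need an independent copy of a slice, you can call `.copy()` on it.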


In [56]:
# Useful ndarray attributes

print(multidim.shape) # Tuple with the size of each dimension
print(multidim.ndim) # Number of array dimensions
print(multidim.itemsize) # Size of each array element in bytes

(3, 5)
2
8
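
The value `8` shown above depends on the element type (`dtype`) of the array, and the default integer type can differ between platforms and NumPy versions. A small sketch that sets the type explicitly, so the result is predictable:

```python
import numpy as np

# itemsize depends on the element type (dtype) of the array
asInt64 = np.arange(15, dtype=np.int64).reshape(3, 5)
asInt32 = asInt64.astype(np.int32)

print(asInt64.dtype, asInt64.itemsize)  # 8 bytes per element
print(asInt32.dtype, asInt32.itemsize)  # 4 bytes per element
print(asInt64.size)                     # total number of elements: 15
```
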


### Exercise 3 - TODO:

Create an exercise for NumPy!


In [9]:
# RGB to grayscale image using NumPy
# Formula: grayscale = (0.3 * R) + (0.59 * G) + (0.11 * B) (weighted method, also called the luminosity method)
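
One possible way to apply the weighted formula above is sketched here. It assumes the image is stored as an array of shape `(height, width, 3)` with the channels in R, G, B order. The tiny 2x2 `image` is made up for illustration and is not loaded from a file.

```python
import numpy as np

# A made-up 2x2 RGB image: shape is (height, width, 3) with values from 0 to 255
image = np.array([
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
], dtype=np.uint8)

# Weights for the R, G and B channels, taken from the formula above
weights = np.array([0.3, 0.59, 0.11])

# The matrix product combines the last axis (the three channels) with the weights
grayscale = image @ weights

print(grayscale.shape)
print(grayscale)
```

The result has one value per pixel instead of three. It is a floating point array, so it may need rounding or conversion back to an integer type before it is saved or displayed as an image.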