# Python Introduction

If you are new to Python, it will be useful to go through this notebook before you start 2.3 (*pandas: Python Data Analysis Library*), so that you understand some basic concepts of the Python programming language. In this chapter we will mainly use pandas and some plotting libraries. These libraries are explained in greater detail in the book and in the other notebooks, but this notebook covers some basic definitions that will be used throughout the chapter.

To run a cell (Markdown or code), press <kbd>Shift</kbd> + <kbd>Return</kbd> or click the Run button above.

## Finding help in Jupyter Notebook

Even if you are experienced with Python, you will sometimes need a reminder of what a function does or which parameters you can pass to it. Jupyter Notebook gives you a few options for this. For example, imagine you want to compute a mean using NumPy, but you have forgotten which parameters this function accepts and need a quick reminder:

[ 1 ] Whenever you do not know what a function does, or want to know more about the parameters you can use, write ``np.mean?`` in a cell and run it.

[ 2 ] Another option is to use ``help(np.mean)`` 

[ 3 ] Type ``np.mean()``, place the cursor inside the parentheses and press <kbd>Shift</kbd> + <kbd>Tab</kbd>: a pop-up will show information about the function.
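
All three options display the function's docstring. The minimal sketch below, which only assumes NumPy is installed, reads the same docstring as a plain string, which can be handy outside Jupyter (for example in a script) where ``?`` and Shift + Tab are not available. The name ``summary`` is just illustrative.

```python
import numpy as np

# `np.mean?`, `help(np.mean)` and Shift + Tab all show this docstring
doc = np.mean.__doc__

# Keep only the first few non-empty lines as a quick reminder
# (`summary` is an illustrative name)
summary = [line.strip() for line in doc.splitlines() if line.strip()][:3]
print("\n".join(summary))
```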

## Learn Markdown

As you may have already noticed, cells can contain either code or Markdown. Markdown is a lightweight markup language that lets you write formatted text, rendered as HTML, much more easily than writing HTML directly. It also lets you include mathematical equations written in LaTeX, for example: $y = \sum_{i=0}^{n} e^{x_i}$

If you are interested in learning Markdown, have a look at this [Markdown guide](https://markdown-guide.readthedocs.io/en/latest/index.html). The Jupyter Notebook documentation also explains [how to use Markdown cells](https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Working%20With%20Markdown%20Cells.html).

## Simple Python data structures

In this chapter we will mainly focus on the pandas, bokeh and holoviews packages. We will also use a more fundamental library, [NumPy](https://numpy.org). Before introducing these libraries, however, we will first learn some basic Python data structures that we will use throughout the chapter.

### Lists

A list is an ordered and changeable (mutable) collection. It allows duplicate members.
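
A minimal sketch of those two properties, using an illustrative list named ``fruits`` (a hypothetical example, not used elsewhere in the chapter): duplicates are kept, and an item can be replaced in place by its index.

```python
# Illustrative data (hypothetical name): a list may hold duplicate values
fruits = ['apple', 'banana', 'apple']

# Lists are mutable: replace the item at index 1 in place
fruits[1] = 'cherry'
print(fruits)  # ['apple', 'cherry', 'apple']

# Order is preserved, so index() finds the first 'apple' at position 0
print(fruits.index('apple'))  # 0
```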

In [2]:
# A List of integers
my_list = [0,1,2,3]
my_list

[0, 1, 2, 3]

In [3]:
# A list of strings
list_strings = ['a', 'b', 'c', 'abc']
list_strings

['a', 'b', 'c', 'abc']

In [4]:
# Accessing items in a list
print('First value from my_list: ', my_list[0])
print('Last value from list_strings: ', list_strings[-1])
print('Second and third values from my_list: ', my_list[1:3])

First value from my_list:  0
Last value from list_strings:  abc
Second and third values from my_list:  [1, 2]


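Note that list indexes start at 0, so the indexes of a list `n` run over $[0, 1, 2, \dots, \mathrm{len}(n)-1]$, and negative indexes count from the end (``-1`` is the last element). When slicing, the stop index is not included: ``my_list[1:3]`` returns the elements at indexes 1 and 2.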
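
The indexing and slicing rules can be checked with a short sketch; it redefines ``my_list`` so it runs on its own, and ``last_index`` is an illustrative name.

```python
my_list = [0, 1, 2, 3]

# Valid indexes run from 0 to len(my_list) - 1 (last_index is illustrative)
last_index = len(my_list) - 1
print(my_list[last_index])  # 3, same as my_list[-1]

# In a slice start:stop, the stop index is excluded
print(my_list[1:3])  # [1, 2] -> indexes 1 and 2
print(my_list[:2])   # [0, 1] -> omitting start means "from the beginning"
print(my_list[2:])   # [2, 3] -> omitting stop means "to the end"
```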

In [5]:
# List of lists
list_of_lists = [[1,2,3,4,5], ['a','b','c','d','e']]
list_of_lists

[[1, 2, 3, 4, 5], ['a', 'b', 'c', 'd', 'e']]
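
Items inside a list of lists are reached with two indexes in a row: the first selects the inner list, the second selects an item inside it. A minimal sketch with the same data:

```python
list_of_lists = [[1, 2, 3, 4, 5], ['a', 'b', 'c', 'd', 'e']]

# First index: which inner list; second index: which item in it
print(list_of_lists[1])      # ['a', 'b', 'c', 'd', 'e']
print(list_of_lists[1][2])   # c
print(list_of_lists[0][-1])  # 5
```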

In [6]:
# Basic operations you can do with list elements
print('sum integers: ', my_list[0]+my_list[1])
print('sum strings: ', list_strings[0]+list_strings[1])
print('multiply: ', my_list[1]*my_list[2])
print('Divide: ', my_list[1]/my_list[3])

sum integers:  1
sum strings:  ab
multiply:  2
Divide:  0.3333333333333333


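Be careful, though: operators applied to whole lists do not work element by element. Adding two lists with ``+`` joins (concatenates) them, and multiplying a list by an integer with ``*`` repeats it, increasing its length.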
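
If you do want element-by-element arithmetic on plain lists, one common approach is a list comprehension, sketched below with illustrative names ``a`` and ``b``. NumPy arrays, introduced later in this notebook, do this element-wise by default.

```python
# Illustrative data (hypothetical names)
a = [0, 1, 2, 3]
b = [10, 20, 30, 40]

# `a + b` would concatenate; pair the items with zip() and add them instead
elementwise_sum = [x + y for x, y in zip(a, b)]
print(elementwise_sum)  # [10, 21, 32, 43]

# Multiply every item by 3 instead of repeating the list three times
tripled = [3 * x for x in a]
print(tripled)  # [0, 3, 6, 9]
```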

In [7]:
print('sum of lists (all integers): ', my_list+my_list)
print('sum of lists (integers and strings): ', my_list+list_strings)
print('multiplication of list and an integer: ', my_list*3)

sum of lists (all integers):  [0, 1, 2, 3, 0, 1, 2, 3]
sum of lists (integers and strings):  [0, 1, 2, 3, 'a', 'b', 'c', 'abc']
multiplication of list and an integer:  [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]


The following lines are examples of built-in methods you can use on lists.

In [8]:
print('Append: ')
list_strings.append(my_list) # Appends my_list as a single element at the end of list_strings
print(list_strings)

print('Count: ')
my_other_list = [1,1,1,1,1,1,2,2,3,3]
print(my_other_list.count(1)) # Number of times the value 1 appears in my_other_list

print('Sort: ')
not_ordered_list = [4,8,9,1,2,6,7,8,3]
not_ordered_list.sort()
print(not_ordered_list) # sort() works in place and, by default, sorts in ascending order

Append: 
['a', 'b', 'c', 'abc', [0, 1, 2, 3]]
Count: 
6
Sort: 
[1, 2, 3, 4, 6, 7, 8, 8, 9]
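
``sort()`` changes the list itself and returns ``None``. If you need to keep the original order, the built-in ``sorted()`` returns a new list instead. A minimal sketch, with illustrative names:

```python
# Illustrative data (hypothetical name)
numbers = [4, 8, 9, 1, 2, 6, 7, 8, 3]

# sorted() returns a new list and leaves the original untouched
ascending = sorted(numbers)
print(ascending)  # [1, 2, 3, 4, 6, 7, 8, 8, 9]
print(numbers)    # [4, 8, 9, 1, 2, 6, 7, 8, 3]

# list.sort() works in place and returns None; reverse=True sorts descending
result = numbers.sort(reverse=True)
print(numbers)  # [9, 8, 8, 7, 6, 4, 3, 2, 1]
print(result)   # None
```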


### Dictionaries

A dictionary is a changeable collection of key-value pairs that is indexed by its keys. (Since Python 3.7, dictionaries also keep the order in which keys were inserted.) As you will see, a dictionary is written as follows:

``dictionary = {key1: value1, key2: value2, ...}``

In [9]:
# A Dictionary
my_dictionary = {'A': [1,2,3], 'B': [4,5,6], 'C': [7,8,9]}
my_dictionary

{'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}

In [10]:
# Which are the keys? 
my_dictionary.keys()

dict_keys(['A', 'B', 'C'])

In [11]:
# Which are the values? 
my_dictionary.values()

dict_values([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
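
``keys()`` and ``values()`` return views of the dictionary rather than lists. A minimal sketch of two common follow-ups: looping over ``(key, value)`` pairs with ``items()`` and converting a view to a regular list. ``column_totals`` and ``key_list`` are illustrative names.

```python
my_dictionary = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}

# .items() yields (key, value) pairs, handy in a for loop
column_totals = {}  # illustrative name
for key, value in my_dictionary.items():
    column_totals[key] = sum(value)
print(column_totals)  # {'A': 6, 'B': 15, 'C': 24}

# Wrap a view in list() to get a regular list
key_list = list(my_dictionary.keys())
print(key_list)  # ['A', 'B', 'C']
```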

Accessing items in a dictionary is done using the keys, not the values. 

In [12]:
# Accessing items in a dictionary
my_dictionary['A']

[1, 2, 3]
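
Because dictionaries are changeable, you can add or overwrite entries by assigning to a key. Asking for a key that does not exist raises a ``KeyError``; checking with ``in`` or using ``.get()`` with a default avoids this. A minimal sketch, where the key ``'D'`` and the default ``'not found'`` are just illustrative:

```python
my_dictionary = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}

# Add a new key, or overwrite an existing one, by assigning to it
my_dictionary['D'] = [10, 11, 12]
my_dictionary['A'] = [0, 0, 0]

# Check whether a key exists before using it
print('B' in my_dictionary)  # True
print('Z' in my_dictionary)  # False

# my_dictionary['Z'] would raise a KeyError; .get() returns a default instead
print(my_dictionary.get('Z', 'not found'))  # not found
```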

## NumPy

NumPy is an open-source, fast and versatile Python library for linear algebra and array computing. It provides N-dimensional arrays and offers many mathematical functions, random number generators, Fourier transforms and much more. The core of NumPy is well-optimized C code, which makes it fast while keeping the flexibility of the Python language.

The only prerequisite for installing NumPy is Python itself. We will not go through all of the NumPy documentation, but we will cover some basics that will be needed later in the chapter. For more, see the [quick start tutorial](https://numpy.org/doc/stable/user/quickstart.html) on the NumPy website.

First, we need to import NumPy. By convention, it is imported with the name ``np``, so every time you want to call a NumPy function, you prefix it with ``np.``.

In [13]:
import numpy as np

We will start with NumPy's main object, the ``np.ndarray``, which is usually just called an array.

In [14]:
# Initialize an ndarray

my_array = np.array([0,1,2,3])
my_array

array([0, 1, 2, 3])

In [15]:
type(my_array)

numpy.ndarray
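
Besides its type, every array carries a few attributes that describe it. A minimal sketch: ``shape`` gives the size along each dimension, ``ndim`` the number of dimensions, ``size`` the total number of elements and ``dtype`` the element type. The exact integer ``dtype`` depends on the platform, so no output is shown for it.

```python
import numpy as np

my_array = np.array([0, 1, 2, 3])

print(my_array.shape)  # (4,) -> one dimension with 4 elements
print(my_array.ndim)   # 1
print(my_array.size)   # 4
print(my_array.dtype)  # an integer type; the exact one is platform dependent
```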

There are also many other ways to create a NumPy array, as in the following examples:

In [16]:
# Creating an array of zeros
print(np.zeros(10))
type(np.zeros(10))

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]


numpy.ndarray

In [17]:
# You can also make a 2D array (10 rows, 2 columns)
print(np.ones((10,2)))
type(np.ones((10,2)))

[[1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]]


numpy.ndarray
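
Another common way to build a 2D array is to create a 1D range and ``reshape`` it. The total number of elements must stay the same, and ``-1`` lets NumPy infer one dimension. In this minimal sketch, ``grid`` and ``flat`` are illustrative names. ``grid`` holds the same values as the ``my_2d_array`` used further below.

```python
import numpy as np

# arange(12) gives 0..11; reshape(3, 4) arranges them into 3 rows and 4 columns
grid = np.arange(12).reshape(3, 4)
print(grid)
print(grid.shape)  # (3, 4)

# -1 asks NumPy to infer the length, here 3 * 4 = 12
flat = grid.reshape(-1)
print(flat.shape)  # (12,)
```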

In [18]:
# Using a range of numbers: start, stop (excluded), step
np.arange(0,10,1)  

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [19]:
# Similar to arange, but with start, stop (included by default) and number of values
np.linspace(0,10,20)  

array([ 0.        ,  0.52631579,  1.05263158,  1.57894737,  2.10526316,
        2.63157895,  3.15789474,  3.68421053,  4.21052632,  4.73684211,
        5.26315789,  5.78947368,  6.31578947,  6.84210526,  7.36842105,
        7.89473684,  8.42105263,  8.94736842,  9.47368421, 10.        ])
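
The key practical difference is the stop value: ``arange`` excludes it, while ``linspace`` includes it unless ``endpoint=False`` is passed. The NumPy documentation also suggests ``linspace`` when a non-integer step is needed. A minimal sketch, with ``a``, ``b`` and ``c`` as illustrative names:

```python
import numpy as np

# arange: stop is excluded -> 0, 0.25, 0.5, 0.75
a = np.arange(0, 1, 0.25)

# linspace: stop is included by default -> 0, 0.25, 0.5, 0.75, 1
b = np.linspace(0, 1, 5)

# endpoint=False excludes stop, which matches the arange result here
c = np.linspace(0, 1, 4, endpoint=False)
print(np.allclose(a, c))  # True
```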

In [20]:
# Create an array taking samples from a uniform distribution
np.random.uniform(0,10,60) # From 0 to 10, take 60 samples

array([8.51162174, 9.64012813, 9.62155149, 0.06804477, 2.26183804,
       7.37773256, 0.20951853, 6.06025398, 0.62372821, 3.36900513,
       3.6152459 , 3.54419587, 2.35053404, 7.49261995, 7.77248497,
       9.01648134, 4.73641902, 4.81803255, 8.06503233, 9.62514529,
       8.45613343, 9.15220023, 9.52745131, 6.57160712, 7.28078441,
       3.13530863, 6.09100729, 5.25421489, 6.22259424, 7.85432689,
       3.15767569, 8.22057808, 7.90560314, 9.86956987, 1.06715329,
       0.62612211, 3.35540178, 2.22279988, 5.85551995, 8.66789785,
       6.75524757, 9.79180213, 2.05200461, 5.86123738, 8.75490599,
       1.6229012 , 3.51030221, 1.15574745, 7.57361952, 7.61480277,
       6.00715618, 8.79875871, 9.38236272, 8.76618187, 3.63016515,
       7.99986727, 4.43430465, 5.42904479, 6.4596398 , 5.38125066])

In [21]:
# Create an array of random integers from a discrete uniform distribution
np.random.randint(0,10,60) # From 0 (inclusive) to 10 (exclusive), take 60 samples

array([4, 7, 6, 4, 6, 5, 0, 3, 1, 9, 1, 1, 9, 2, 6, 2, 5, 7, 3, 6, 6, 9,
       4, 4, 0, 1, 3, 5, 6, 4, 3, 3, 1, 3, 1, 4, 8, 6, 0, 3, 0, 2, 5, 5,
       4, 7, 0, 0, 5, 7, 0, 5, 4, 2, 4, 7, 7, 7, 2, 9])
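
The random arrays above change every time the cell is run. When reproducible results are needed, one option is NumPy's ``Generator`` API, created with ``np.random.default_rng`` and a fixed seed. The sketch below uses seed ``42``, which is an arbitrary choice. ``rng.integers`` excludes the upper bound, just like ``randint``.

```python
import numpy as np

# A Generator with a fixed seed (42 is arbitrary) gives the same numbers on every run
rng = np.random.default_rng(seed=42)
uniform_samples = rng.uniform(0, 10, 5)   # 5 floats in [0, 10)
integer_samples = rng.integers(0, 10, 5)  # 5 integers from 0 to 9

# A second generator with the same seed reproduces the first draw exactly
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(uniform_samples, rng_again.uniform(0, 10, 5)))  # True
```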

You can also perform some basic operations, for example:

In [22]:
my_2d_array = np.array([[0,1,2,3],[4,5,6,7],[8,9,10,11]])
print('My 2D Array: ')
print(my_2d_array)
print('Mean: ', np.mean(my_2d_array))
print('Mean of each column (axis=0): ', np.mean(my_2d_array, axis=0))
print('Mean of each row (axis=1): ', np.mean(my_2d_array, axis=1))
print('Sine: ', np.sin(my_2d_array))
print('Cosine: ', np.cos(my_2d_array))

My 2D Array: 
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
Mean:  5.5
Mean of each column (axis=0):  [4. 5. 6. 7.]
Mean of each row (axis=1):  [1.5 5.5 9.5]
Sine:  [[ 0.          0.84147098  0.90929743  0.14112001]
 [-0.7568025  -0.95892427 -0.2794155   0.6569866 ]
 [ 0.98935825  0.41211849 -0.54402111 -0.99999021]]
Cosine:  [[ 1.          0.54030231 -0.41614684 -0.9899925 ]
 [-0.65364362  0.28366219  0.96017029  0.75390225]
 [-0.14550003 -0.91113026 -0.83907153  0.0044257 ]]
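
Unlike lists, arithmetic operators on arrays act element by element, and arrays of compatible shapes are combined through broadcasting. In the sketch below, a 1D array of 4 values is added to each of the 3 rows. Many ``np.*`` functions also exist as array methods, for example ``.sum(axis=0)``. ``doubled``, ``shifted`` and ``column_sums`` are illustrative names.

```python
import numpy as np

my_2d_array = np.array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])

# Element-wise: every value is multiplied by 2 (a list would be repeated instead)
doubled = my_2d_array * 2

# Broadcasting: the 4 values are added to each of the 3 rows
shifted = my_2d_array + np.array([10, 20, 30, 40])

# Method form of np.sum; axis=0 sums down each column
column_sums = my_2d_array.sum(axis=0)
print(column_sums)  # [12 15 18 21]
```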


Accessing elements in a NumPy array is done with indexes, which start at 0.

In [23]:
# Since it is a 2d array, this will access the first row
my_2d_array[0]

array([0, 1, 2, 3])

In [24]:
# Since it is a 2d array, this will access the first column
my_2d_array[:,0]

array([0, 4, 8])

In [25]:
# You can also access several values at the same time: rows 0-1 and columns 1-2
my_2d_array[:2, 1:3]

array([[1, 2],
       [5, 6]])
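
Two more indexing behaviours are worth knowing. A boolean condition can be used as an index to select the matching values. Basic slices such as ``my_2d_array[:2, 1:3]`` are *views*, so modifying them also modifies the original array. Call ``.copy()`` when you need an independent array. A minimal sketch with illustrative names:

```python
import numpy as np

my_2d_array = np.array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])

# A comparison returns a boolean array; indexing with it keeps matching values
mask = my_2d_array > 5
selected = my_2d_array[mask]
print(selected)  # [ 6  7  8  9 10 11]

# A basic slice is a view: changing it changes the original array
view = my_2d_array[:2, 1:3]
view[0, 0] = 100
print(my_2d_array[0, 1])  # 100

# .copy() gives an independent array
independent = my_2d_array[:2, 1:3].copy()
independent[0, 0] = -1
print(my_2d_array[0, 1])  # still 100
```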

## Create a Python Function

The last thing you will need to know in order to follow this chapter is how to define a function in Python. 

A function is a block of code that only runs when it is called. You can pass data, known as parameters, into a function, and it returns the result of the operation it applies. For example, if you need to calculate $\frac{x+y}{5}$ several times in your code, it is convenient to create a function that takes the parameters x and y and returns the result, instead of writing out the operation every time you want to use it.

The typical syntax and structure is the following:

In [26]:
# Define a function
def my_function(input1, input2):
    return (input1 + input2)/5

# Create 2 arrays to use with the function
my_array1 = np.array([0,5,10,15,20,25,30])
my_array2 = np.array([5,10,15,20,25,30,35])

# Apply the function
my_function(my_array1, my_array2)

array([ 1.,  3.,  5.,  7.,  9., 11., 13.])
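
Parameters can also have default values, and a docstring (the string right after ``def``) documents the function. It is what ``?`` and ``help()`` display. The function ``scaled_sum`` below is a hypothetical generalization of ``my_function`` where the divisor 5 becomes an optional parameter.

```python
# Hypothetical function: (x + y) / divisor, with divisor defaulting to 5
def scaled_sum(x, y, divisor=5):
    """Return (x + y) / divisor; divisor defaults to 5."""
    return (x + y) / divisor

print(scaled_sum(5, 10))             # 3.0, uses the default divisor
print(scaled_sum(5, 10, divisor=3))  # 5.0, overrides the default by name
```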

We will not be using many functions in this chapter, but they are mentioned in the pandas section.
(If you want to learn more about how to create functions in Python and the advantages they offer, visit this [link](https://docs.python.org/2.0/ref/function.html).)

Another important construct you should know about is the ``lambda`` function: a small anonymous function that can take any number of arguments but can only contain a single expression. To create a lambda function, use the following structure:

``lambda arguments : expression``

(To learn more about these functions, visit this [link](https://www.w3schools.com/python/python_lambda.asp).)

In [27]:
# Define the lambda function
my_lambda = lambda x : x + 1

# Call the function
my_lambda(3)

4

In [28]:
# Define the lambda function
my_lambda = lambda x,y : (x*y)/10

# Call the function
my_lambda(50,60)

300.0
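
Lambdas are most useful when another function expects a function as an argument. Examples are the ``key`` parameter of ``sorted()`` and ``map()``. In pandas, ``apply`` also accepts a function, and a lambda is a convenient way to write one inline. A minimal sketch with illustrative data:

```python
# Illustrative data (hypothetical name)
pairs = [('b', 2), ('a', 3), ('c', 1)]

# Sort by the second item of each tuple instead of the first
by_number = sorted(pairs, key=lambda pair: pair[1])
print(by_number)  # [('c', 1), ('b', 2), ('a', 3)]

# Apply a lambda to every item with map()
squares = list(map(lambda x: x ** 2, [1, 2, 3, 4]))
print(squares)  # [1, 4, 9, 16]

# A lambda behaves like an equivalent one-line def
def add_one(x):
    return x + 1

print(add_one(3) == (lambda x: x + 1)(3))  # True
```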

These are some very basic notions on how to store your data in Python lists and dictionaries, how to use NumPy arrays and functions, and how to define functions in Python. With these basics, you should be able to better understand the content of the upcoming notebooks.