# Python basics and NumPy review


# Semantics

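## Indentation: Whitespace Matters!

In Python, the statements that belong to a block, such as the body of a loop or of an ``if`` statement, are grouped by their indentation:

```python
lower = []
upper = []
midpoint = 5  # example value

for i in range(10):
    if i < midpoint:
        lower.append(i)
    else:
        upper.append(i)
```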

        
## Whitespace *Within* Lines Does Not Matter
While the mantra of *meaningful whitespace* holds true for whitespace *before* lines (which indicates a code block), whitespace *within* lines of Python code does not matter.
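
As a quick check, the three assignments below differ only in the spacing inside the line and should produce the same value. The variable names are purely illustrative.

```python
# Three spellings of the same expression; only the inner spacing differs.
x = 1 + 2
y=1+2
z   =   1    +    2

print(x, y, z)  # all three names hold the same value
```

Even though the interpreter does not care, a consistent style (for instance, one space around binary operators) usually makes code easier to read.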


## Everything Is an Object!

Python is an object-oriented programming language, and in Python everything is an object.

* **What is an object?** An object is an entity that contains data along with associated metadata and/or functionality. In Python, every entity has some **metadata (called attributes)** and **associated functionality (called methods)**. These attributes and methods are accessed via the dot syntax.
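
A minimal sketch of the dot syntax, using only built-in types (the variable names are illustrative):

```python
# A list object carries methods such as append().
L = [1, 2, 3]
L.append(100)
print(L)                 # [1, 2, 3, 100]

# Even simple types such as float have attributes and methods.
x = 4.5
print(x.real, x.imag)    # 4.5 0.0
print(x.is_integer())    # False

# Methods are objects too, so they also have a type.
print(type(x.is_integer))
```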

## Python Variables Are Pointers

* **What is a pointer?** A pointer is a variable that stores the memory address of another value. In other words, a pointer refers to the location in memory where an object is stored, rather than holding the object itself.
 
Assigning variables in Python is as easy as putting a variable name to the left of the equals (``=``) sign:

```python
# assign 4 to the variable x
x = 4
```

This may seem straightforward, but if you have the wrong mental model of what this operation does, the way Python works may seem confusing.
We'll briefly dig into that here.

In many programming languages, variables are best thought of as containers or buckets into which you put data.
So in C, for example, when you write

```C
// C code
int x = 4;
```

you are essentially defining a "memory bucket" named ``x``, and putting the value ``4`` into it.
**In Python, by contrast, variables are best thought of not as containers but as pointers.**
So in Python, when you write

```python
x = 4
```

you are essentially defining a *pointer* named ``x`` that points to some other bucket containing the value ``4``.
Note one consequence of this: because Python variables just point to various objects, there is no need to "declare" the variable, or even require the variable to always point to information of the same type!
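
The pointer model can be illustrated with a short sketch. The names ``x`` and ``y`` are just examples; the key point is that two names may refer to the same mutable object.

```python
# The same name can point to objects of different types over time.
x = 1          # x points to an int
x = 'hello'    # now x points to a str
x = [1, 2, 3]  # now x points to a list

# Assigning y = x makes both names point to the same list object.
y = x
x.append(4)    # modify the list through x
print(y)       # [1, 2, 3, 4]  (the change is visible through y)

# Rebinding x to a new object does not affect y.
x = 'something else'
print(y)       # [1, 2, 3, 4]
```

This behavior only matters for mutable objects such as lists; numbers and strings cannot be changed in place, so rebinding is the only way to "change" them.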


        
# Built-In Data Structures

We have seen Python's simple types: ``int``, ``float``, ``complex``, ``bool``, ``str``, and so on.
Python also has several built-in compound types, which act as containers for other types.
These compound types are:

| Type Name | Example                   |Description                            |
|-----------|---------------------------|---------------------------------------|
| ``list``  | ``[1, 2, 3]``             | Ordered collection                    |
| ``tuple`` | ``(1, 2, 3)``             | Immutable ordered collection          |
| ``dict``  | ``{'a':1, 'b':2, 'c':3}`` | Insertion-ordered (key,value) mapping |
| ``set``   | ``{1, 2, 3}``             | Unordered collection of unique values |

As you can see, round, square, and curly brackets have distinct meanings when it comes to the type of collection produced.
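
As a small illustration of the four compound types (all variable names here are made up for the example):

```python
my_list = [1, 2, 3]
my_tuple = (1, 2, 3)
my_dict = {'a': 1, 'b': 2, 'c': 3}
my_set = {1, 2, 3, 3, 2}      # duplicates are dropped

# Print the type name and the number of elements of each container.
for obj in (my_list, my_tuple, my_dict, my_set):
    print(type(obj).__name__, len(obj))

my_list[0] = 10               # lists are mutable
try:
    my_tuple[0] = 10          # tuples are immutable, so this fails
except TypeError as err:
    print("TypeError:", err)
```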


## Dictionaries
Dictionaries are extremely flexible mappings of keys to values, and form the basis of much of Python's internal implementation.
They can be created via a comma-separated list of ``key:value`` pairs within curly braces:

```python
numbers = {'one':1, 'two':2, 'three':3}

# Access a value via the key
numbers['two']

# Set a new key:value pair
numbers['ninety'] = 90
```
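
A few other dictionary operations tend to be useful in practice. The sketch below reuses the ``numbers`` dictionary from above; ``get`` and ``items`` are standard ``dict`` methods.

```python
numbers = {'one': 1, 'two': 2, 'three': 3}

# Membership tests look at the keys, not the values.
print('two' in numbers)       # True
print('ten' in numbers)       # False

# get() returns a default value instead of raising KeyError for a missing key.
print(numbers.get('ten', 0))  # 0

# items() yields (key, value) pairs.
for key, value in numbers.items():
    print(key, value)
```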



## Arithmetic Operations
Python implements seven basic binary arithmetic operators, two of which can double as unary operators.
They are summarized in the following table:

| Operator     | Name           | Description                                            |
|--------------|----------------|--------------------------------------------------------|
| ``a + b``    | Addition       | Sum of ``a`` and ``b``                                 |
| ``a - b``    | Subtraction    | Difference of ``a`` and ``b``                          |
| ``a * b``    | Multiplication | Product of ``a`` and ``b``                             |
| ``a / b``    | True division  | Quotient of ``a`` and ``b``                            |
| ``a // b``   | Floor division | Quotient of ``a`` and ``b``, removing fractional parts |
| ``a % b``    | Modulus        | Remainder after division of ``a`` by ``b``             |
| ``a ** b``   | Exponentiation | ``a`` raised to the power of ``b``                     |
| ``-a``       | Negation       | The negative of ``a``                                  |
| ``+a``       | Unary plus     | ``a`` unchanged (rarely used)                          |
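
The difference between true division, floor division, and modulus is easiest to see on a concrete pair of numbers. The values below are arbitrary examples.

```python
a, b = 11, 2

true_div = a / b      # 5.5  (always a float)
floor_div = a // b    # 5    (fractional part removed)
remainder = a % b     # 1
power = a ** b        # 121

# Floor division rounds toward negative infinity, not toward zero.
neg_floor = -a // b   # -6

print(true_div, floor_div, remainder, power, neg_floor)
```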

## Comparison Operations

Another type of operation which can be very useful is comparison of different values.
For this, Python implements standard comparison operators, which return Boolean values ``True`` and ``False``.
The comparison operations are listed in the following table:

| Operation  | Description                          |
|------------|--------------------------------------|
| ``a == b`` | ``a`` equal to ``b``                 |
| ``a != b`` | ``a`` not equal to ``b``             |
| ``a < b``  | ``a`` less than ``b``                |
| ``a > b``  | ``a`` greater than ``b``             |
| ``a <= b`` | ``a`` less than or equal to ``b``    |
| ``a >= b`` | ``a`` greater than or equal to ``b`` |
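
A short sketch of these operators in use, with arbitrary example values. Comparisons can be combined with arithmetic operators and can also be chained.

```python
a, b = 25, 30

print(a == b)   # False
print(a != b)   # True
print(a < b)    # True

# Comparisons can be chained: this checks 20 < a and a < 30 at once.
in_range = 20 < a < 30    # True

# Combined with the modulus operator to test whether a number is odd.
is_odd = a % 2 == 1       # True
print(in_range, is_odd)
```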


# Control Flow

## Conditional Statements: ``if``-``elif``-``else``
Conditional statements, often referred to as *if-then* statements, allow the programmer to execute certain pieces of code depending on some Boolean condition:

```python
x = -15  # example value

if x == 0:
    print("x is zero")
elif x > 0:
    print("x is positive")
else:
    print("x is negative")
```

In programming languages, a block of code is a set of statements that should be treated as a unit. In Python, code blocks are denoted by indentation, and an indented code block is always preceded by a colon (``:``) at the end of the previous line.


## ``for`` loops
Loops in Python are a way to repeatedly execute some code statement.
So, for example, if we'd like to print each of the items in a list, we can use a ``for`` loop:

```python
for number in [2, 3, 5, 7]:
    print(number, end=' ')  # print all on the same line, separated by spaces

for number in range(3, 5):
    print(number, end=';')  # print all on the same line, separated by semicolons
```

```
2 3 5 7 3;4;
```

## The join function for strings

``<separator_string>.join(iterable)``

Return a string which is the concatenation of the strings in ``iterable``. A ``TypeError`` will be raised if there are any non-string values in ``iterable``, including ``bytes`` objects. The separator between elements is ``<separator_string>``, the string on which the method is called.

Examples:

```python
my_list = ['1', '2', '3', '4']

# Using ";" as the separator
print(';'.join(my_list))
```

```
1;2;3;4
```

```python
# Using " - " as the separator
print(' - '.join(my_list))
```

```
1 - 2 - 3 - 4
```
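
Because ``join`` only accepts strings, a list of numbers has to be converted first. The sketch below shows the error and one possible conversion; the list ``numbers`` is an illustrative example.

```python
numbers = [1, 2, 3]

# Joining ints directly raises a TypeError.
try:
    ','.join(numbers)
except TypeError as err:
    print("TypeError:", err)

# Converting each element with str() first makes the join work.
as_text = ','.join(str(n) for n in numbers)
print(as_text)  # 1,2,3
```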


## Exercise 1

Write a program that finds all numbers between 2000 and 3200 (both included) that are divisible by 7 but are not a multiple of 5. The numbers obtained should be printed in a comma-separated sequence on a single line.

```python
# a % b : modulus operator, remainder after division of a by b.

# Tips:
#   First, iterate over each number between 2000 and 3200 (both included).
#   At each iteration, use an if statement to check whether the number meets the criteria.
#   If it does, add it to a list (create an empty list before the loop).
#   Finally, print the numbers in a comma-separated sequence on a single line.
#      For this, use the join function.
#   Before that, convert the list of ints into a list of strings
#   (cast each element using the str() function).
```
    

## Identity and Membership Operators

Like ``and``, ``or``, and ``not``, Python also contains prose-like operators to check for identity and membership.
They are the following:

| Operator      | Description                                       |
|---------------|---------------------------------------------------|
| ``a is b``    | True if ``a`` and ``b`` are identical objects     |
| ``a is not b``| True if ``a`` and ``b`` are not identical objects |
| ``a in b``    | True if ``a`` is a member of ``b``                |
| ``a not in b``| True if ``a`` is not a member of ``b``            |


```python
print("John" == "John Doe")
```

```
False
```

```python
print("John" in "John Doe")
```

```
True
```

```python
print("Doe" in "John Doe")
```

```
True
```
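
The distinction between identity (``is``) and equality (``==``) is worth a short sketch. The lists below are arbitrary examples.

```python
a = [1, 2, 3]
b = [1, 2, 3]   # a separate list with the same contents
c = a           # a second name for the same list object

print(a == b)      # True: the contents are equal
print(a is b)      # False: they are two different objects
print(a is c)      # True: both names point to the same object

print(2 in a)      # True
print(5 not in a)  # True
```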


## The split function

``str.split(sep=None, maxsplit=-1)``

Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done (thus, the list will have at most maxsplit+1 elements). If maxsplit is not specified or -1, then there is no limit on the number of splits (all possible splits are made).

If sep is given, consecutive delimiters are not grouped together and are deemed to delimit empty strings (for example, '1,,2'.split(',') returns ['1', '', '2']). The sep argument may consist of multiple characters (for example, '1<>2<>3'.split('<>') returns ['1', '2', '3']). Splitting an empty string with a specified separator returns [''].

If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace. Consequently, splitting an empty string or a string consisting of just whitespace with a None separator returns [].
  

```python
'1,2,3'.split(',')
```

```
['1', '2', '3']
```

```python
'1,2,3'.split(',', maxsplit=1)
```

```
['1', '2,3']
```

```python
'1,2,,3,'.split(',')
```

```
['1', '2', '', '3', '']
```
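
The behavior without a separator is not covered by the examples above, so here is a small sketch. The strings are made-up examples; note how an explicit ``' '`` separator behaves differently from the default.

```python
full_name = "  John   Doe  "

# Default separator: runs of whitespace count as one, no empty strings.
parts = full_name.split()
print(parts)                          # ['John', 'Doe']

# Explicit ' ' separator: every single space splits, leaving empty strings.
space_parts = full_name.split(' ')
print(space_parts)

# Unpacking the result of a split into two names.
first_name, last_name = "Jane Smith".split(maxsplit=1)
print(first_name, last_name)          # Jane Smith
```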

## Exercise 2: Write a function that counts the occurrences of a given name in a list of full names.

```python
# Let's first generate random full names.
# For this we are going to use the names package.

!pip install names  # Install the names package (notebook shell command).
```

```python
# Let's create a list with the full names.

# Import the names package
import names

# Import the random module to set the seed.
import random

random.seed(0)

# Create a list with N=number_of_names random names
number_of_names = 1000

# Let's use a list comprehension, which is more concise (and usually faster) than an explicit for loop.
full_names = [names.get_full_name() for _ in range(number_of_names)]

# The full_names list is now available


# Now we can continue with the exercise.
# Let's write the function that counts the occurrences of a given name in a list of full names.
def count_name_occurence(full_names_list, name_to_check):
    # One way to do this is to iterate over each full name in the list (using a for loop).
    #  Then, for each full name, split the string into two strings:
    #     - one with the first name and the other with the last name
    #  Then, if the first name matches, add 1 to a counter (initialized to 0 at the beginning).
    pass  # remove this once you write the code
```





## Exercise 3: Write a function that counts the number of names in a list of full names that start with a certain string.
E.g. names that start with John (like Johnathan).

```python
def count_name_occurence2(full_names_list, string_to_check):
    # One way to do this is to iterate over each full name in the list (using a for loop).
    #  Then, for each full name, split the string into two strings:
    #     - one with the first name and the other with the last name
    #  Then, if the first name starts with string_to_check, add 1 to a counter.
    pass  # remove this once you write the code
```

## Exercise 3, part 2: Write a function that counts the number of names in a list of full names that start with a certain string but are not equal to it.
E.g. names that start with John (like Johnathan), excluding the name John itself.

```python
def count_name_occurence3(full_names_list, string_to_check):
    # One way to do this is to iterate over each full name in the list (using a for loop).
    #  Then, for each full name, split the string into two strings:
    #     - one with the first name and the other with the last name
    #  Then, if the first name starts with string_to_check but is not equal to it, add 1 to a counter.
    pass  # remove this once you write the code
```

## Exercise 4: Using the function created in Exercise 2, create a dictionary with the counts of each name in the list. That is, a dict with names as keys and the name counts as values.

```python
# Iterate over each element in the list of full names. Every time a name matches an existing key,
# increment its value in the dict by one.
# If the name has not been added to the dict yet (use `key in my_dict` to check this), set the value to 1.
```


## Exercise 5: Write a Python function to remove all even numbers from a list

```python
input_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

def remove_even(input_list):
    # Write code here
    return input_list

# expected output: [1, 3, 5, 7, 9]
```

# NumPy review

## The Basics

**NumPy’s main object is the homogeneous multidimensional array**. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes.

**NumPy’s array class is called ndarray**. It is also known by the alias array. *Note that numpy.array is not the same as the Standard Python Library class array.array, which only handles one-dimensional arrays and offers less functionality.* 

The more important attributes of an ndarray object are:

* **ndarray.ndim** : the number of axes (dimensions) of the array.
* **ndarray.shape** : the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the number of axes, ndim.
* **ndarray.size** : the total number of elements of the array. This is equal to the product of the elements of shape.
* **ndarray.dtype** : an object describing the type of the elements in the array. One can create or specify dtypes using standard Python types. Additionally, NumPy provides types of its own; numpy.int32, numpy.int16, and numpy.float64 are some examples.
* **ndarray.itemsize**: the size in bytes of each element of the array. For example, an array of elements of type float64 has itemsize 8 (=64/8), while one of type int32 has itemsize 4 (=32/8). It is equivalent to ndarray.dtype.itemsize.
* **ndarray.data** : the buffer containing the actual elements of the array. **Normally, we won’t need to use this attribute because we will access the elements in an array using indexing facilities.**
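
The attributes listed above can be inspected on a small example array. This is only a sketch: the array contents are arbitrary, and the dtype is set explicitly because the default integer type can differ between platforms.

```python
import numpy as np

# A 3x5 array of 64-bit integers (dtype given explicitly for reproducibility).
a = np.arange(15, dtype=np.int64).reshape(3, 5)

print(a.ndim)      # 2
print(a.shape)     # (3, 5)
print(a.size)      # 15
print(a.dtype)     # int64
print(a.itemsize)  # 8
```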


## Array Creation

There are several ways to create arrays.

For example, you can create an array from a regular Python list or tuple using the ``array`` function.
**The element data type of the resulting array is deduced from the data types of the elements in the sequence.**



```python
import numpy as np

list_of_integers = [2, 3, 4]
my_array = np.array(list_of_integers)
my_array
```

```
array([2, 3, 4])
```

```python
list_of_lists = [[1.5, 2, 3], [4, 5, 6]]
my_2D_array = np.array(list_of_lists)
my_2D_array
```

```
array([[1.5, 2. , 3. ],
       [4. , 5. , 6. ]])
```

## Array initialization

Often, the elements of an array are originally unknown, but its size is known. Hence, NumPy offers several functions to create arrays with initial placeholder content. These minimize the necessity of growing arrays, an expensive operation.

The function **zeros** creates an array full of zeros, the function **ones** creates an array full of ones, and the function **empty** creates an array whose content is uninitialized: it holds whatever values happen to be in memory.
Finally, the function **full** creates an array filled with a given value. **By default, the dtype of the arrays created by zeros, ones, and empty is float64**, while **full** infers the dtype from the fill value unless ``dtype`` is given.


```python
np.zeros((3, 4))
```

```
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
```

```python
np.ones((2, 3, 4), dtype=np.int16)  # dtype can also be specified
```

```
array([[[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]], dtype=int16)
```

```python
np.empty((2, 3))  # uninitialized, output may vary
```

```
array([[1.5, 2. , 3. ],
       [4. , 5. , 6. ]])
```

```python
np.full((2, 2), 10)  # Array of 10s
```

```
array([[10, 10],
       [10, 10]])
```

## The arange function

To create sequences of numbers, NumPy provides a function analogous to Python's built-in ``range``, but returning an array.

The calling signature of this function is ``arange([start, ]stop, [step, ]dtype=None)``; see the documentation for more details.

This function can be called in different ways:

* **np.arange(stop)** : returns a sequence from 0 up to (but not including) stop, with a step of 1.
* **np.arange(start, stop)** : returns a sequence from start up to (but not including) stop, with a step of 1.
* **np.arange(start, stop, step)** : returns a sequence from start up to (but not including) stop, with a step defined by the user.

The dtype keyword can also be used to specify the desired data type of the sequence.

```python
np.arange(10)  # equivalent to np.arange(0, 10) and np.arange(0, 10, 1)
```

```
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
```

```python
np.arange(10, 30, 5)  # np.arange(start, stop, step)
```

```
array([10, 15, 20, 25])
```
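
``arange`` also accepts non-integer steps, but then the number of elements can be hard to predict because of floating-point rounding. When the number of points matters more than the step, ``np.linspace(start, stop, num)`` may be the more convenient choice. A minimal sketch with arbitrary values:

```python
import numpy as np

# Step-based: the stop value (2) is excluded.
float_seq = np.arange(0, 2, 0.3)
print(float_seq)

# Count-based: 9 evenly spaced points, both endpoints included.
even_seq = np.linspace(0, 2, 9)
print(even_seq)
```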

## Array operations

**The operations are element-wise!**

| Operator     | Name           | Description                                            |
|--------------|----------------|--------------------------------------------------------|
| ``a + b``    | Addition       | Sum of ``a`` and ``b``                                 |
| ``a - b``    | Subtraction    | Difference of ``a`` and ``b``                          |
| ``a * b``    | Multiplication | Product of ``a`` and ``b``                             |
| ``a / b``    | True division  | Quotient of ``a`` and ``b``                            |
| ``a // b``   | Floor division | Quotient of ``a`` and ``b``, removing fractional parts |
| ``a % b``    | Modulus        | Remainder after division of ``a`` by ``b``             |
| ``a ** b``   | Exponentiation | ``a`` raised to the power of ``b``                     |
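
A short sketch of element-wise behavior, assuming two arrays of the same shape (the values are arbitrary):

```python
import numpy as np

a = np.array([20, 30, 40, 50])
b = np.arange(4)       # array([0, 1, 2, 3])

diff = a - b           # array([20, 29, 38, 47])
squares = b ** 2       # array([0, 1, 4, 9])
prod = a * b           # array([0, 30, 80, 150]), element-wise, not a matrix product

# Comparison operators are element-wise too and return a Boolean array.
mask = a < 35          # array([ True,  True, False, False])

print(diff, squares, prod, mask)
```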

## Aggregations: min, max, sum, prod, mean, std, var, any, all

We commonly need to compute statistics on a large amount of data. NumPy has fast built-in aggregation functions to compute the following statistics:

* **min** : Returns the minimum along a given axis.
* **argmin** : Returns the indices of the minimum values along an axis.
* **max** : Returns the maximum along a given axis.
* **argmax** : Returns the indices of the maximum values along an axis.
* **sum** : Returns the sum of the array elements over the given axis.
* **cumsum** : Returns the cumulative sum of the array elements over the given axis.
* **prod** : Returns the product of the array elements over the given axis.
* **mean** : Returns the average of the array elements along the given axis.
* **std** : Returns the standard deviation of the array elements along the given axis.
* **var** : Returns the variance of the array elements along the given axis.
* **any** : Tests whether any array element along a given axis evaluates to True.
* **all** : Tests whether all array elements along a given axis evaluate to True.

Many unary operations, such as computing the sum of all the elements in the array, are also implemented as methods of the ndarray class.
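
The effect of the ``axis`` argument can be sketched on a small 2D array. The array below is an arbitrary example; without ``axis``, the aggregation runs over all elements.

```python
import numpy as np

data = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

total = data.sum()             # 66, over all elements
col_sums = data.sum(axis=0)    # array([12, 15, 18, 21]), one value per column
row_mins = data.min(axis=1)    # array([0, 4, 8]), one value per row
overall_mean = data.mean()     # 5.5
flat_argmax = data.argmax()    # 11, index into the flattened array

# The function form and the method form give the same result.
print(np.sum(data) == data.sum())
```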

## Indexing and slicing

One-dimensional arrays can be indexed, sliced, and iterated over, much like lists and other Python sequences.


```python
my_1D_array = np.arange(10)**3
my_1D_array

my_1D_array[2]  # Access the 3rd element

my_1D_array[2:5]  # Return a 1D view of the array from the 3rd to the 5th element.

# equivalent to my_1D_array[0:6:2] = -1000; from the start to position 6, exclusive, set every 2nd element to -1000
my_1D_array[:6:2] = -1000
my_1D_array

my_1D_array[::-1]  # reversed
pass  # Suppress output
```
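
Since a slice is a view rather than a copy, writing to it also changes the original array. The sketch below illustrates this with an arbitrary array and shows ``copy()`` as one way to get independent data.

```python
import numpy as np

a = np.arange(5)

# A slice is a view: it shares memory with the original array.
view = a[1:4]
view[0] = 99
print(a.tolist())      # [0, 99, 2, 3, 4]

# copy() creates independent data, so the original is left untouched.
independent = a[1:4].copy()
independent[0] = -1
print(a.tolist())      # [0, 99, 2, 3, 4]
```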

## Boolean array indexing

This advanced indexing occurs when the index is an array object of Boolean type, such as may be returned from comparison operators.

If bool_array.ndim == x.ndim, x[bool_array] returns a 1-dimensional array filled with the elements of x corresponding to the True values of bool_array.
In recent NumPy versions, an IndexError is raised if the shape of bool_array does not match the shape of x along the indexed dimensions.

**For now, use Boolean arrays with the same shape as the array being indexed. In the next workshop we will talk about broadcasting.**

### Example

A common use case for this is selecting or modifying the elements that satisfy a condition. For example, one may wish to add a constant to all negative elements of an array:

```python
x = np.array([1., -1., -2., 3])
my_bool_array = (x < 0)
my_bool_array
```

```
array([False,  True,  True, False])
```

```python
x[my_bool_array] += 20
x
```

```
array([ 1., 19., 18.,  3.])
```
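
Another typical filter is dropping NaN entries. The following is a minimal sketch with made-up data; ``np.isnan`` returns a Boolean array, and ``~`` negates it element-wise.

```python
import numpy as np

x_with_nan = np.array([1.0, np.nan, 2.0, np.nan, 3.0])

# True where the value is a valid number, False where it is NaN.
not_nan = ~np.isnan(x_with_nan)

# Keep only the valid entries.
clean = x_with_nan[not_nan]
print(clean)   # [1. 2. 3.]
```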

## Exercise 6: Write a function that computes the Euclidean norm of an array, over the flattened array.

```python
def my_norm(input_array):
    # The norm is the square root of the sum of the squares of all the elements.
    pass

# test the function
array_1 = np.arange(10)
my_norm(array_1)
```

## Exercise 7: Write a function that computes the root mean square error (RMSE) between two arrays (flattened).

```python
def my_rmse(input_array1, input_array2):
    # The RMSE is the square root of the mean of the squared differences,
    # computed over the flattened arrays.
    pass


# test the function
array_1 = np.arange(10)
array_2 = np.linspace(0, 100, 10)
my_rmse(array_1, array_2)
```

## Exercise 8: Now assume that you have two 2D arrays whose first dimension is time, and you want to compute the RMSE over the time dimension (first dimension).

```python
def my_rmse_over_time(input_array1, input_array2):
    # Compute the root mean square error along the time dimension (axis=0),
    # which gives one value per column.
    pass

# test the function
array_1 = np.arange(10*4).reshape(4, 10)
array_2 = np.random.rand(4, 10)*6
my_rmse_over_time(array_1, array_2)
```
