# Python Intro (for R programmers)

Teaching a programming language is outside the scope of a software package webinar, but we do realize that most of our audience is more familiar with the [R programming language](https://www.r-project.org) than with [Python](https://www.python.org/). This notebook covers some of the most common features and structures used by [MGSurvE](https://github.com/Chipdelmal/MGSurvE) so that our audience can get acquainted with the code structure and follow along as easily as possible.

## Warnings and Notes

Some important things for [R](https://www.r-project.org) programmers who are new to [Python](https://www.python.org/) are the following:

* In line with most programming languages, [Python](https://www.python.org/) is **zero-indexed**, meaning that arrays, lists, and other collections start from index `0` rather than `1`.
* In contrast to most other programming languages, [Python](https://www.python.org/) **does not use braces to delimit code blocks; instead, it uses indentation** to mark the body of functions and logic structures. While it might seem strange the first couple of times it's used, people who are used to following code-style practices will quickly get used to the idea.
* Semicolons (`;`) are not necessary to end lines, but they can be used to place multiple statements on one line.
* Multiple variables can be assigned in the same line through "unpacking": `(a, b, c) = (10, 4, 2)` (see the ["tuples section"](#tuple) for more information).
* Lists are mutable, and copies have to be made explicitly to avoid inadvertently changing the values of elements by reference (see the [lists](#list) section for more information).
* Growing lists in [Python](https://www.python.org/) is not as computationally expensive as it is in R (due to [amortized list expansion](https://medium.com/analytics-vidhya/amortized-runtime-analysis-for-python-lists-35e935e290db)).
* Value comparisons are done with `==`, whereas checking whether two variables refer to the same object in memory is done with `is` (for more info have a look at the ["boolean" section](#boolean)).
* Snake case is preferred over other word-separation styles for variable names (such as camel case), and using `.` inside a name results in errors, as it is the operator used to access attributes or methods in objects (see the [objects section](#objects) for more info).
* Array and matrix operations are done under an external library (see [Numpy section](#arrays-and-matrices-numpy) for more info).
* Array and matrix objects are static in size and homogeneous in data type (see [Numpy section](#arrays-and-matrices-numpy) for more information).
* Array and matrix objects are modified "in-place", so copies of the objects have to be explicitly requested in order to modify different sections of memory (see [Numpy section](#arrays-and-matrices-numpy) for more information).
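
A few of the points above can be seen together in a short sketch. The variable names are purely illustrative and are not part of MGSurvE:

```python
# Zero-indexing: the first element lives at index 0
letters = ["a", "b", "c"]
first = letters[0]                  # in R this would be letters[1]
last = letters[len(letters) - 1]    # the last valid index is length - 1

# Unpacking: several variables assigned in one line
(a, b, c) = (10, 4, 2)

# Semicolons are optional, but allow several statements on one line
x = 1; y = 2

# Indentation (not braces) delimits the body of each branch
if a > b:
    largest = a
else:
    largest = b

print(first, last, a + b + c, x + y, largest)
```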

## Handling Packages

### Installation

In [Python](https://www.python.org/), we usually install packages outside of our applications by running the following command on the terminal:

```bash
pip install MGSurvE
```

The use of virtual environments such as [venv](https://docs.python.org/3/library/venv.html) and [conda](https://docs.conda.io/en/latest/) is strongly suggested to avoid clashes between package versions and between Python installations. If we are working in Jupyter notebooks we can install packages directly in our working environment by running:

```bash
!pip install MGSurvE
```

By adding the `!` symbol we are telling the Jupyter notebook to interpret the line as a bash command, rather than a python one.

### Import

To import a package into our running application, we use the `import` command, followed by the package name.

In [1]:
import math
math.sqrt(49)

7.0

If we want to import all the functions and variables in a package we can use:

In [2]:
from math import *
sqrt(49)

7.0

Which is more similar to what we would use in [R](https://www.r-project.org). We have to be careful, however, as this can cause problems with clashing functions and variables if different packages use the same names (if two packages, for example, define the `sqrt` function).
A way to import specific functions from a package is:

In [3]:
from math import sqrt
sqrt(49)

7.0
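
To illustrate the name-clash problem mentioned above, here is a minimal sketch using two standard-library modules that both define `sqrt` (`math` and `cmath`). Keeping the module prefix, or a short alias, keeps the two functions apart:

```python
import math
import cmath  # complex-number counterpart of "math" in the standard library

# Both modules define "sqrt", but the module prefix tells them apart
real_root = math.sqrt(49)
complex_root = cmath.sqrt(-49)   # math.sqrt(-49) would raise a ValueError

# An alias shortens the prefix without merging the namespaces
import math as m
same_root = m.sqrt(49)

print(real_root, complex_root, same_root)
```

Had we run `from math import *` followed by `from cmath import *`, the second import would have silently replaced the first `sqrt`.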

### Virtual Environments

Virtual environments are strongly suggested for anyone who does development in [Python](https://www.python.org/). They help solve many problems with package dependencies, make the code reproducible, and allow installing packages on systems where we don't have admin privileges. Learning what virtual environments are and how to use them is outside the scope of this quick guide (as throughout the webinar we will be working with a Docker image that already contains all the required dependencies to run the code), but the main idea of these tools is to create isolated "versions" of the [Python](https://www.python.org/) language and the packages required to run a specific application, so that several different versions can co-exist in the same system. For more info on working with virtual environments, have a look at these guides: [venv](https://realpython.com/python-virtual-environments-a-primer/) and [conda](https://whiteboxml.com/blog/the-definitive-guide-to-python-virtual-environments-with-conda).

## Core Types

### Numeric


In most applications we can use numbers (both float and integers) the same way we would in [R](https://www.r-project.org). The only thing we have to be a bit careful with is staying within [underflow and overflow](https://pythonnumericalmethods.berkeley.edu/notebooks/chapter09.02-Floating-Point-Numbers.html) limits:

In [4]:
import sys
sys.float_info

sys.float_info(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308, min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15, mant_dig=53, epsilon=2.220446049250313e-16, radix=2, rounds=1)
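
As a rough sketch of what these limits mean in practice (the variable names are only illustrative), floats overflow to infinity and underflow to zero, whereas Python integers have arbitrary precision:

```python
import math
import sys

# Exceeding the largest representable float overflows to infinity
overflowed = sys.float_info.max * 10
print(math.isinf(overflowed))

# Values far below the smallest positive float underflow to zero
underflowed = sys.float_info.min / 1e20
print(underflowed == 0.0)

# Integers, in contrast, have arbitrary precision and do not overflow
big_int = 2**2000
print(big_int > sys.float_info.max)

# Rounding error: compare floats with a tolerance instead of "=="
print(0.1 + 0.2 == 0.3)
print(math.isclose(0.1 + 0.2, 0.3))
```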

### Strings

Strings, same as numerics, are pretty standard when compared to other programming languages. Since Python 3, strings are Unicode by default, so adding accents and other symbols is straightforward:

In [5]:
stringOne = "This is a string."
c = '''
This is a multiline (multilínea)
string.
'''
print(c)


This is a multiline (multilínea)
string.



One thing to take into account, though, is that strings are an immutable type. This means that we can't change the contents of a string "in place".

In [6]:
a = "This is a string"
# a[5] = "x" # Returns an error
# We replace the entries and re-assign the variable
a = a.replace("i","X")
a

'ThXs Xs a strXng'


We can also format and concatenate strings in complex ways:

In [7]:
template = '{:.2f} {:s} are worth ${:d} US'
print(template.format(4.5560, 'Argentinian Pesos', 1))

(name, height) = ("chipdelmal", 1.75555)
f"Hello, {name}. You are {height:.2f} m tall," + " and this string was concatenated"

4.56 Argentinian Pesos are worth $1 US


'Hello, chipdelmal. You are 1.76 m tall, and this string was concatenated'

### Boolean

One of the advantages of using booleans in python is that logic comparisons tend to be easy to read:

In [8]:
(True and False) or (False or not True)

False

One minor detail with booleans in [Python](https://www.python.org/) is that they are considered numeric (where `False` takes the value of `0` and `True` a value of `1`), so operations like the following are allowed:

In [9]:
True+(False/True)

1.0

And any number other than `0` is considered to be `True` if used in a boolean operation:

In [10]:
print(5 and True)
print(not (False and 0))

True
True


Finally, an important thing to remember is that the operator `is` is different from the logical comparison `==`. The `==` comparison returns `True` when two variables contain the same information while `is` returns `True` when two variables are the same object (point towards the same location in memory).

In [11]:
a = [1, 2, 3]
b = a
a[0] = 0
print(f"Information in 'a' {a}"); 
print(f"Information in 'b' {b}")
print(">>> b is a: {}".format(b is a))


a = [1, 2, 3]
b = a.copy()
print(f"Information in 'a' {a}"); 
print(f"Information in 'b' {b}")
print(f">>> b is a: {b is a}")

Information in 'a' [0, 2, 3]
Information in 'b' [0, 2, 3]
>>> b is a: True
Information in 'a' [1, 2, 3]
Information in 'b' [1, 2, 3]
>>> b is a: False

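The difference between the two operators is easiest to see when two separate objects hold the same contents. A minimal sketch:

```python
a = [1, 2, 3]
b = [1, 2, 3]   # a second list with the same contents
c = a           # a second name for the first list

print(a == b)   # True: the contents are equal
print(a is b)   # False: they are two separate objects
print(a is c)   # True: both names point to one object

# "is" is equivalent to comparing the objects' identities
print(id(a) == id(c))
```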

### None

The `None` object exists as an instance in memory, and there is only one instance of it for every [Python](https://www.python.org/) program. One important thing to take into account is that we should always use the `is` operator to make comparisons to the `None` object:

In [12]:
(a, b) = (None, None)
(a is None) and (b is None)

True
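
A common place where this matters is optional function arguments. In the sketch below, `describe` and its `value` argument are hypothetical names used only for illustration. Testing with `is None` distinguishes "nothing was passed" from values such as `0` that merely behave like `False`:

```python
def describe(value=None):
    # "is None" checks for the absence of a value, not for falsiness
    if value is None:
        return "nothing was provided"
    return f"got {value}"

print(describe())
print(describe(0))   # 0 is "falsy", but it is not None
```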

## Collections

In [R](https://www.r-project.org) we are used to handling lists and matrices as the most versatile collections of elements. Lists are still some of the most flexible collections of elements in [Python](https://www.python.org/), but named lists are usually replaced by dictionaries, and matrices are replaced by NumPy arrays. In this section we will go through some of the most widely used collection types.

### List

Lists are mutable collections that can contain elements of different types. We can define, slice, and manipulate lists as we would in most programming languages:

In [13]:
lstA = [1, 2, "a", 4]
lstA[0] = 5
print(lstA)

# Getting elements through indices
a = [7, 1, 2, 6, 0, 3, 2]
a[2:4]
# Replacing a slice (one element is replaced by two here, so the list grows)
a[2:3] = [0, 0]
# Getting elements from the end
a[-2]

[5, 2, 'a', 4]


3

One important thing to take into account, though, is that lists are modified "in-place", meaning that if a copy of the list is not explicitly created, we could run into "side-effect" problems when changing the values of its elements. For example, if we created two lists as follows, and modified one of them, both would be changed as they refer to the same object in memory:

In [14]:
# A and B are exactly the same list (same location in memory)
a = [1, 2, 3]
b = a
print(a is b)
# So, when we modify A, B is affected too
a[0] = 1000
print(a)
print(b)

True
[1000, 2, 3]
[1000, 2, 3]


To avoid this behaviour, we would need to create a copy of the first list when creating the second:

In [15]:
# A and B are different lists 
a = [1, 2, 3]
b = a.copy()
print(a is b)
# So, when we modify A, B is not affected
a[0] = 1000
print(a)
print(b)

False
[1000, 2, 3]
[1, 2, 3]
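
One caveat worth sketching: `.copy()` makes a *shallow* copy, so lists nested inside the list are still shared. The standard-library `copy.deepcopy` function copies the nested elements as well:

```python
import copy

nested = [[1, 2], [3, 4]]
shallow = nested.copy()        # new outer list, same inner lists
deep = copy.deepcopy(nested)   # new outer list and new inner lists

nested[0][0] = 1000
print(shallow[0][0])   # 1000: the inner list is shared
print(deep[0][0])      # 1: the deep copy is fully independent
```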


For [R](https://www.r-project.org) programmers it is important to note and remember that Python is zero-indexed, meaning that all lists start at `0`, unlike [R](https://www.r-project.org) that is one-indexed.

### Tuple

Tuples are similar to lists, with the difference that these collections are immutable, meaning that once they're defined they can't be modified:

In [16]:
myTuple = (2, 4, 6, 8)
print(myTuple[3::-1])

# This line would return an error, as we can't modify tuple's elements
# myTuple[0] = 10

(8, 6, 4, 2)
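
Tuples are also what makes "unpacking" work. The following sketch (with illustrative names such as `min_max` and `trapAt`, which are not part of any library) shows unpacking, swapping values, returning several values from a function, and using tuples as dictionary keys:

```python
# Unpacking a tuple into separate variables
point = (3.5, 7.25)
(x, y) = point

# Swapping two values without a temporary variable
(x, y) = (y, x)

# Functions can return several values at once as a tuple
def min_max(values):
    return (min(values), max(values))

(lowest, highest) = min_max([4, 1, 9])

# Being immutable, tuples can be used as dictionary keys
trapAt = {(0, 0): "trap A", (10, 5): "trap B"}

print(x, y, lowest, highest, trapAt[(10, 5)])
```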


### Dictionary

Dictionaries in [Python](https://www.python.org/) (hash tables in other programming languages) are quite efficient collections for storing and retrieving data. Rather than storing elements by position, a dictionary stores key-value pairs. Creating dictionaries is fairly simple and similar to defining named lists:

In [17]:
# We can create a dictionary in various different ways:
dictEx = {"age": 10 , "name": "Pusheen", "animal": "cat"}
dictEx = dict(name="Pusheen", animal="cat", age=10)
dictEx = dict(zip(["name","age","animal"], ["Pusheen",10,"cat"]))

Adding and retrieving elements can be done easily too:

In [18]:
print(dictEx["age"])
# We can also add a new element by key, or replace an existing one
dictEx["age"] = 11
dictEx["hobby"] = "blogging"
dictEx

10


{'name': 'Pusheen', 'age': 11, 'animal': 'cat', 'hobby': 'blogging'}

A couple of things we have to be careful with: the keys of a dictionary have to be hashable, which in practice means immutable (we can't use lists, for example, as keys); and dictionaries are not sorted by key. Since Python 3.7 they preserve insertion order when we iterate through them, but we shouldn't assume that the elements come out in any sorted order.
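
A few other everyday dictionary operations, sketched with an illustrative `pet` dictionary:

```python
pet = {"name": "Pusheen", "age": 10, "animal": "cat"}

# Iterating over key-value pairs
for (key, value) in pet.items():
    print(f"{key}: {value}")

# Checking whether a key exists
hasAge = "age" in pet

# Retrieving with a fallback instead of raising a KeyError
hobby = pet.get("hobby", "unknown")

# Removing an entry (and getting its value back)
removed = pet.pop("animal")

print(hasAge, hobby, removed, list(pet.keys()))
```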

### Set

Sets are useful collections whenever we want to check whether an element belongs to an unordered group. With this data collection we can perform mathematical operations such as union, intersection, difference, and symmetric difference, amongst others:

In [19]:
mySet = {2, 4, 6, 8}
myExt = {6, 8, 12, 14}
print(2 in mySet)
print(mySet.intersection(myExt))

True
{8, 6}
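
The remaining operations follow the same pattern, and each also has an operator form. A short sketch reusing the two sets from above (results are sorted before printing because sets have no guaranteed order):

```python
mySet = {2, 4, 6, 8}
myExt = {6, 8, 12, 14}

union = mySet | myExt          # same as mySet.union(myExt)
difference = mySet - myExt     # same as mySet.difference(myExt)
symmetric = mySet ^ myExt      # same as mySet.symmetric_difference(myExt)

# Sets drop duplicates, which makes them handy for finding unique values
unique = set([1, 1, 2, 3, 3])

print(sorted(union))
print(sorted(difference))
print(sorted(symmetric))
print(sorted(unique))
```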


### Arrays and Matrices ([Numpy](https://numpy.org/))

Unlike in [R](https://www.r-project.org), we usually do not handle matrix operations natively; rather, we use NumPy, a library created mostly in C and C++ with [Python](https://www.python.org/) bindings, to perform fast numeric operations on our mathematical constructs. To install NumPy, we can simply run `pip install numpy` or `conda install numpy`; to use it, we import the package and do operations as we usually would in any [Python](https://www.python.org/) application:

In [20]:
import numpy as np

myMatrix = np.array([
    [1, 2, 3],
    [6, 5, 4]
])
myMask = np.array([
    [1, 0, 0],
    [1, 1, 1]
])
matrixOperation = ((myMask*myMatrix)**2).T
print(matrixOperation)

[[ 1 36]
 [ 0 25]
 [ 0 16]]
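
Note that `*` in the cell above multiplies element by element, as it does in R. For R users looking for the equivalent of `%*%`, the matrix product is written with the `@` operator. A small sketch with two illustrative matrices:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])

elementwise = A * B      # element-by-element product (A * B in R)
matrixProduct = A @ B    # matrix product (A %*% B in R)

print(elementwise)
print(matrixProduct)
print(A.shape)           # dimensions, similar to dim(A) in R
```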


It is important to note a couple of things regarding arrays: first, they are static in size (elements in the array are contiguous in memory); and second, elements in the array must be homogeneous in [type](https://numpy.org/doc/stable/user/basics.types.html). For example, let's define a 2 by 5 array of boolean elements:

In [21]:
myArray = np.array([
    [0, 1, 0, False, True],
    [10.2, 9, 0, 0, 0]
], dtype=np.bool_)
print(f'My array is:\n{myArray};\nwith the "{myArray.dtype}" data type which uses {myArray.nbytes} bytes of memory.')


My array is:
[[False  True False False  True]
 [ True  True False False False]];
with the "bool" data type which uses 10 bytes of memory.


We can see that, as we passed a value to the `dtype` optional argument, all the elements in the array were coerced to a boolean type. Let's see what would have happened if we hadn't used the `dtype` argument:

In [22]:
myArray = np.array([
    [0, 1, 0, False, True],
    [10.2, 9, 0, 0, 0]
])
print(f'My array is:\n{myArray};\nwith the "{myArray.dtype}" data type which uses {myArray.nbytes} bytes of memory.')

My array is:
[[ 0.   1.   0.   0.   1. ]
 [10.2  9.   0.   0.   0. ]];
with the "float64" data type which uses 80 bytes of memory.


We can see that numpy coerces elements to the data type of the element that requires the largest precision, which might result in more memory being used. 
Finally, same as with lists, we have to take into account that numpy arrays are modified "in-place", so doing the following:

In [23]:
# A and B are exactly the same array (same location in memory)
a = np.array([1, 2, 3])
b = a
print(a is b)
# So, when we modify A, B is affected too
a[0] = 1000
print(a)
print(b)

True
[1000    2    3]
[1000    2    3]


we modify the object in memory to which both `a` and `b` point, so they both change their value. To avoid this, we need to create a copy explicitly:

In [24]:
# A and B are different arrays (B is an independent copy)
a = np.array([1, 2, 3])
b = a.copy()
print(a is b)
# So, when we modify A, B is not affected
a[0] = 1000
print(a)
print(b)

False
[1000    2    3]
[1 2 3]
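
A related point that often surprises newcomers: slicing a NumPy array returns a *view* onto the same memory rather than a copy, whereas slicing a plain list creates a new list. A minimal sketch:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
view = a[1:4]          # a slice is a view onto the same memory
safe = a[1:4].copy()   # an explicit, independent copy

view[0] = 1000         # this also changes a[1]
print(a)
print(safe)
print(np.shares_memory(a, view), np.shares_memory(a, safe))

# In contrast, slicing a plain list always creates a new list
lst = [1, 2, 3, 4, 5]
part = lst[1:4]
part[0] = 1000
print(lst)
```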


NumPy arrays are widely used internally by MGSurvE to perform all the algebraic operations required in the optimization cycle, so some knowledge of this package is needed by people interested in going deeper into the development of our software.

### Dataframes ([Pandas](https://pandas.pydata.org/))

Same as with arrays, dataframes are handled with external libraries in Python, where the most popular one by far is [Pandas](https://pandas.pydata.org/). We won't go into much detail here in terms of how to create and handle dataframes, as it's a wide subject and MGSurvE usually just makes use of these structures for initialization of objects or exporting optimization logs. Defining DataFrames and accessing elements can be done as follows:

In [25]:
import pandas as pd
 
data = {
    'x-coordinate': [10, 0, 1],
    'y-coordinate': [22, 14, 10],
    'trap-type': [1, 1, 0] 
}
df = pd.DataFrame(data)
print(df)
print(f"Trap types: {df['trap-type'].values}")

   x-coordinate  y-coordinate  trap-type
0            10            22          1
1             0            14          1
2             1            10          0
Trap types: [1 1 0]


## Control Flow

Control flow structures work in pretty much the same way as they do in other programming languages, although python does have some convenient shortcuts to make our lives easier.

### If-Elif-Else

If statements are pretty straightforward, the only thing we need to remember is that code-blocks are delimited by indentation:

In [26]:
x = -1
if x < 0:
  print("The number is negative.")
print("End of program")

The number is negative.
End of program


If-elif-else statements work the same way as in any other general-purpose language:

In [27]:
x = -10
if x == 0:
  print("The number is zero")
elif x < 0:
  print("The number is less than zero")
else:
  print("The number is more than zero")

The number is less than zero


And we can use single-line shorthands to make comparisons and assign values:

In [28]:
x = 0
result = (True if (x > 0) else False)
print(result)

False


### For

For loops iterate over collections of items (lists or tuples). We can use these loops in a traditional way:

In [29]:
total = 0
# range(start, stop, step) excludes the stop value:
# list(range(0, 10, 2)) is [0, 2, 4, 6, 8]
for i in range(0, 10, 1):
  total = total + i
  print(total)

0
1
3
6
10
15
21
28
36
45


And we can iterate elements of some collections directly:

In [30]:
numbers = ["One", "Two", "Three", "Four"]
for num in numbers:
  print("N: " + num)

N: One
N: Two
N: Three
N: Four


Furthermore, we can traverse a list and use the index of the currently inspected element at the same time by taking advantage of the enumerate function:

In [31]:
numbers = ["One", "Two", "Three", "Four"]
for (i, name) in enumerate(numbers):
  print("Number " + str(i) + ": " + name)

Number 0: One
Number 1: Two
Number 2: Three
Number 3: Four
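
Two related conveniences are sketched below with illustrative lists: `zip` walks through several collections in parallel, and `enumerate` accepts a `start` argument, which can be handy for R users who prefer counting from one:

```python
names = ["One", "Two", "Three"]
values = [1, 2, 3]

# zip pairs up the elements of both lists, position by position
pairs = []
for (name, value) in zip(names, values):
    pairs.append(f"{name} = {value}")
print(pairs)

# enumerate can start counting from any number
labels = []
for (i, name) in enumerate(names, start=1):
    labels.append(f"{i}: {name}")
print(labels)
```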


### List comprehension

One of the most beloved features in Python is the ability to do list comprehensions. By using some quick shorthands we can define and store list values easily:

In [32]:
squaredInts = [i**2 for i in range(10)]
print(squaredInts)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


We can even combine if-else statements in list comprehensions for more complex behaviors:
* if: `[f(x) for x in sequence if condition]`
* if else: `[f(x) if condition else g(x) for x in sequence]`

In [33]:
# Square of even numbers
compA = [(i**2) for i in range(1, 10) if (i%2==0)]
print(compA)
# Square of odds, root of evens
compB = [(i**2) if (i%2!=0) else (i**(1/2)) for i in range(1, 10)]
print(compB)

[4, 16, 36, 64]
[1, 1.4142135623730951, 9, 2.0, 25, 2.449489742783178, 49, 2.8284271247461903, 81]


And we can even do comprehensions with structures such as [dictionaries](#dictionary):

In [34]:
{ix+1: val for (ix, val) in enumerate(['one', 'two', 'three'])}

{1: 'one', 2: 'two', 3: 'three'}

### While

While loops hold few surprises in comparison to other programming languages. An example in which we sample random integers uniformly from 0 to 10 while the sampled value is at most 8, running the loop at most 10 times, would be:

In [35]:
from random import randint

(x, total) = (0, 0)
while (x <= 8):
  if total >= 10:
    break
  total = total + 1
  x = randint(0, 10)
print(total)

3


## Functions

Defining functions can be done by using the following template:
```python
def FUNCTION_NAME(ARGUMENTS): 
    FUNCTION_BODY
    return FUNCTION_RESULT
```

For example, defining a simple function to return the square of a number can be done as follows:

In [36]:
def squared(number):
    sqrdNum = number**2
    return sqrdNum

squared(2)

4

And adding optional arguments is the same as we do in [R](https://www.r-project.org):

In [37]:
def power(number, power=2):
    pwdNum = (number**power)
    return pwdNum

print(power(2))
print(power(2, power=8))

4
256
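
Functions can also return several values at once (as a tuple), and arguments can be passed by name. In this sketch, `summarize` and its `digits` argument are hypothetical names used only for illustration:

```python
def summarize(values, digits=2):
    # Returns three results packed in a tuple
    mean = sum(values) / len(values)
    return (min(values), max(values), round(mean, digits))

# The returned tuple can be unpacked directly into variables
(lowest, highest, mean) = summarize([1, 2, 4], digits=1)
print(lowest, highest, mean)

# Named arguments can be given in any order
print(summarize(digits=3, values=[1, 2, 4]))
```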


## Objects

Most programmers in [R](https://www.r-project.org) can get by without interacting much with [R6 objects](https://r6.r-lib.org/articles/Introduction.html). In Python, however, interacting with objects is core to most applications (many of Python's core types are objects themselves), and it pays off understanding at least some of the basic syntax to interact with them. Teaching the principles of object-oriented programming is, again, outside the scope of this quick guide, but let's get the general idea of how to do basic operations on generic objects:

In [38]:
# After defining an object, we can check its available methods with "dir"
myString = "Testing objects"
dir(myString)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'removeprefix',
 'removesuffix',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 ...]


To call one of the object methods or attributes, we use the dot operator (`.`):

In [39]:
myString = "Testing objects"
print(f"String is {myString.__len__()} elements long and capitalized, it looks like this: {myString.upper()}")

String is 15 elements long and capitalized, it looks like this: TESTING OBJECTS


The general syntax to access attributes and methods is:

```python
object.ATTRIBUTE_NAME
object.METHOD_NAME()
```

This is why the dot cannot be used as a separator within names in Python (or in almost any other programming language): the language reserves it for attribute and method access.

We will go through some object manipulation examples in the webinar (as the main structure in the package is an object), but one final thing to note is that some methods can modify the object's variables in place, and some might require assignment (depending on the `class` implementation).
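
To make the in-place versus assignment distinction concrete, here is a minimal sketch. The `Trap` class below is hypothetical and written only for this example; it is not MGSurvE's API. The same distinction appears in Python's built-ins, as with `list.sort()` and `sorted()`:

```python
class Trap:
    # A hypothetical class for illustration only
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def move(self, dx, dy):
        # Modifies the object in place and returns nothing (None)
        self.x += dx
        self.y += dy

    def moved(self, dx, dy):
        # Leaves the object untouched and returns a new one
        return Trap(self.x + dx, self.y + dy)

trap = Trap(0, 0)
trap.move(1, 2)               # no assignment needed
other = trap.moved(10, 10)    # assignment required to keep the result
print(trap.x, trap.y, other.x, other.y)

# The same pattern exists in built-in types
numbers = [3, 1, 2]
result = numbers.sort()       # sorts in place and returns None
ordered = sorted([3, 1, 2])   # returns a new sorted list
print(numbers, result, ordered)
```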

<hr>

# More Information

* [dataPy CADi](https://github.com/Chipdelmal/dataPy_CADi)
  * [Python 101 (part 1)](https://github.com/Chipdelmal/dataPy_CADi/blob/master/md/python101.md)
  * [Python 101 (part 2)](https://github.com/Chipdelmal/dataPy_CADi/blob/master/md/python101b.md)
  * [Python 101 (part 3)](https://github.com/Chipdelmal/dataPy_CADi/blob/master/md/python101c.md)
  * [Python 102](https://github.com/Chipdelmal/dataPy_CADi/blob/master/md/python102.md)
  * [Advanced Python](https://github.com/Chipdelmal/dataPy_CADi/blob/master/md/python103.md)
* [Reticulate's: Primer on Python for R Users](https://cran.r-project.org/web/packages/reticulate/vignettes/python_primer.html)
* [NumPy: the absolute basics for beginners](https://numpy.org/doc/stable/user/absolute_beginners.html)
* [10 minutes to pandas](https://pandas.pydata.org/docs/user_guide/10min.html)
* [Python Crash Course (Eric Matthes)](https://nostarch.com/python-crash-course-3rd-edition)