# Introduction to Python course

1. Introduction to Jupyter Notebooks
2. Variables and data structures
3. Lists and loops
4. Conditional structures
5. Dictionaries and frequency tables
6. Functions
7. Advanced functions
8. Object-oriented programming
9. Introduction to NumPy
10. Introduction to Pandas
11. Plotting and visualization

## 1. Jupyter Notebooks

<img src="Figures/Jupyter.png" alt="Drawing" style="width: 200px;"/>

Jupyter Notebook is an environment (IDE) that lets us interleave text content (in cells of type 'Markdown') with Python code (in cells of type 'Code') that can be run cell by cell, as the following examples show.

The basic shortcuts needed to follow this course and start using Jupyter Notebooks are:
* `Shift+Enter`: run the cell and select the next cell
* `Alt+Enter`: run the cell and insert a new cell below
* `Esc`: enter 'command' mode. Once in this mode:
    * `H`: show the list of keyboard shortcuts
    * `A`: insert a cell above
    * `B`: insert a cell below
    * `D,D`: pressing D twice deletes the cell
    * `Y`: change the cell to type 'Code'
    * `M`: change the cell to type 'Markdown'
    * `Enter`: switch to edit mode

#### Run the cell below using one of the shortcuts you have just learned

In [9]:
# Example of Python code.
# Change the values of a and b, and observe the result when you run the cell
a=87
b=34
c=a+b
print('The result of adding {} and {} is {}.'.format(a,b,c))

The result of adding 87 and 34 is 121.


## 2. Variables and data structures

In Python there is no need to declare variables. Each time a variable is created or modified, Python itself infers its type. The possible variable types are:
* **Numeric**
    * **Integer (int):** Whole numbers, positive and negative.
    * **Float:** Real numbers.
    * **Complex:** Complex numbers. The imaginary part is written with the suffix `j`, which represents the square root of `-1`.
* **Boolean**
    Variables that take the value true or false, written in Python as `True` and `False` with a capital first letter.
* **Data structures (collections of data)**
    Collections of values of the same or different types. Python provides:
    * **String**
    * **List**
    * **Tuple**
    * **Dictionary**

In [12]:
# Declare a variable a and check its type

a=2
type(a)

int

In [14]:
# Try declaring a variable of another type and check its type

a=2.0
type(a)

float
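
The same check works for the other types listed above. The following is a minimal sketch (the values and the name `examples` are illustrative, not part of the course material) that creates one value of each kind and inspects it with `type`:

```python
# One illustrative value for each type mentioned above
examples = [7, 2.5, 3 + 4j, True, 'text', [1, 2], (1, 2), {'key': 'value'}]

for value in examples:
    # type(value).__name__ is the short name of the type, e.g. 'int'
    print(repr(value), '->', type(value).__name__)

type_names = [type(value).__name__ for value in examples]
```

Running the cell prints each value next to the name of its type, which is a quick way to confirm how Python interpreted a literal.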

## 3. Lists and loops

## 4. Conditional structures

## 5. Dictionaries and frequency tables

## Python

Once you have IPython installed, you are ready to perform all sorts of operations. 

The software program that you use to invoke operators is called an **interpreter**. You enter your commands as a ‘dialog’ between you and the interpreter. Commands can be entered as part of a script (a text file with a list of commands to perform) or directly in a *cell*. 

To tell the interpreter what to do, you must **invoke** an operator:

In [16]:
3 + 4 + 9

16

In [18]:
list(range(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

It’s helpful to think of the computation carried out by an operator as involving four parts:

+ The name of the operator
+ The input arguments
+ The output value
+ Side effects

A typical operation takes one or more input arguments and uses the information in these to produce an output value. Along the way, the computer might take some action: display a graph, store a file, make a sound, etc. These actions are called side effects.
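
The four parts can be seen in a short sketch. The variable names below are illustrative; the functions used (`sorted`, `print`, `list.sort`) are standard Python:

```python
# sorted is the operator name, values is the input argument,
# and the new list it returns is the output value (no side effect)
values = [3, 1, 2]
ordered = sorted(values)

# print displays text on the screen (a side effect); its output value is None
returned = print(ordered)

# list.sort reorders the list in place (a side effect) and also returns None
result = values.sort()
```

Checking whether an operator works through its output value or through a side effect helps avoid mistakes such as storing the `None` returned by `values.sort()`.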

## Modules

Python is a general-purpose programming language, so when we want to use more specific commands (such as statistical operators or string processing operators) we usually need to import them before we can use them. For Scientific Python, one of the most important libraries that we need is **numpy** (Numerical Python), which can be loaded like this:

In [25]:
import numpy as np
np.sqrt(25)

5.0

In [26]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Access to the functions, variables and classes of a module depends on the way the module was imported:

In [27]:
import math
math.cos(math.pi)

-1.0

In [29]:
import math as m  # import using an alias
m.cos(m.pi)

-1.0

In [30]:
from math import cos,pi # import only some functions
cos(pi)

-1.0

In [32]:
from math import *   # global import
cos(pi)

-1.0
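
One practical consequence of the import style is name shadowing. The sketch below is an illustration, not part of the original notebook; it uses two standard-library modules, `math` and `cmath`, which both define `sqrt`:

```python
import math
import cmath

# With qualified names, both versions stay available side by side
real_root = math.sqrt(4)        # uses the real-valued square root
complex_root = cmath.sqrt(-1)   # uses the complex-valued square root

# When names are imported directly, the last import wins and hides the earlier one
from math import sqrt
from cmath import sqrt
shadowed = sqrt(4)              # this is now cmath.sqrt, so the result is complex

print(real_root, complex_root, shadowed, type(shadowed).__name__)
```

The same silent replacement can happen with a global import (`from module import *`), which is why qualified names or an alias are usually the safer choice.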

## Variables
Often the value returned by an operation will be used later on. Values can be stored for later use with the **assignment operator**:

In [37]:
a = 101
type(a)

int

The command has stored the value 101 under the name <code>a</code>. Such stored values are called **objects**. 

Making an assignment to an object defines the object. Once an object has been defined, it can be referred to and used in later computations. 

To refer to the value stored in the object, just use the object’s name itself. For instance:

In [39]:
a = np.sqrt(a)
a
type(a)

numpy.float64

There are some general rules for object names:

+ Use only letters and numbers and ‘underscores’ (_)
+ Do NOT use spaces anywhere in the name
+ A number cannot be the first character in the name
+ Capital letters are treated as distinct from lower-case letters (i.e., Python is case-sensitive)

In [40]:
3a = 10

SyntaxError: invalid syntax (<ipython-input-40-f986eee6e224>, line 1)
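
The naming rules can be checked without triggering an error. This is a small sketch with made-up candidate names; it relies on the standard `str.isidentifier` method and the `keyword` module, and it also flags reserved words such as `for`, which the list above does not mention but which cannot be used as names either:

```python
import keyword

candidates = ['a', 'a3', '3a', 'my_var', 'my var', 'Total', 'total', 'for']

for name in candidates:
    # isidentifier checks the character rules; iskeyword flags reserved words
    usable = name.isidentifier() and not keyword.iskeyword(name)
    print(repr(name), usable)

valid = [n for n in candidates if n.isidentifier() and not keyword.iskeyword(n)]
```

Note that `Total` and `total` are both accepted and refer to two different objects, because names are case-sensitive.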

## Dynamic Typing

When you assign a new value to an existing name, the name is rebound to the new value, whatever its type (*dynamic typing*), and the former value is discarded once nothing else refers to it. The former value of `a` was 10.0498756211, but after a new assignment:

In [41]:
a = 'a'
print(a)

a


The value of an object is changed only via the assignment operator. Using an object in a computation does not change the value. 

The brilliant thing about organizing operators in terms of input arguments and output values is that the output of one operator can be used as an input to another. This lets complicated computations be built out of simpler ones.

One way to connect the computations is by using objects to store the intermediate outputs:

In [42]:
a = np.arange(5)
np.sqrt(a)

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ])

You can also pass the output of an operator directly as an argument to another operator:

In [15]:
np.sqrt(np.arange(5))

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ])

### Data Types

Most of the examples used so far have dealt with numbers. But computers work with other kinds of information as well: text, photographs, sounds, sets of data, and so on. The word *type* is used to refer to the kind of information. 

It’s important to know about the types of data because operators expect their input arguments to be of specific types. When you use the wrong type of input, the computer might not be able to process your command.

For our purposes, it’s important to distinguish among several basic types:

+ Numeric (positive and negative) data: 
    + decimal and fractional numbers (**floats**), <code>a = 3.5</code>
    + arbitrary length whole numbers (**ints**):  <code>c=1809109863596239561236235625629561</code>
+ **Strings** of textual data - you indicate string data to the computer by enclosing the text in quotation marks (e.g., <code>name = "python"</code>).
+ **Boolean** data: <code>a = True</code> or <code>a = False</code>.
+ **Complex** numbers: <code>a = 2+3j</code>
+ Collection types: **tuples, lists, sets, dictionaries**, and also **files**.

In [59]:
a = True
type(a)

bool

In [56]:
a = 'a'
print(a)

a
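
The earlier point about operators expecting specific input types can be seen with a small sketch (the variable names are illustrative). Adding a string to an integer fails, while an explicit conversion makes the intent clear:

```python
text = '3'
number = 4

try:
    text + number            # str + int is not defined
except TypeError as error:
    print('TypeError:', error)

# Converting explicitly tells Python which operation is meant
as_number = int(text) + number      # numeric addition
as_text = text + str(number)        # string concatenation
print(as_number, as_text)
```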


## Operators

+ Addition (also string, tuple and list concatenation) <code>a + b</code>
+ Subtraction (also set difference): <code>a - b</code>
+ Multiplication (also string, tuple and list replication): <code>a * b</code>
+ Division: <code>a / b</code>
+ Floor division (the result is rounded towards minus infinity): <code>a // b</code>
+ Modulus or remainder: <code>a % b</code>
+ Exponentiation: <code>a ** b</code>
+ Assignment: <code>=</code>, <code>-=</code>, <code>+=</code>,<code>/=</code>,<code>*=</code>, <code>%=</code>, <code>//=</code>, <code>**=</code>
+ Boolean comparisons: <code>==</code>, <code>!=</code>, <code><</code>,<code>></code>,<code><=</code>, <code>>=</code>
+ Boolean operators: <code>and</code>, <code>or</code>, <code>not</code>
+ Membership test operators: <code>in</code>, <code>not in</code>
+ Object identity operators: <code>is</code>, <code>is not</code>
+ Bitwise operators (or, xor, and, complement): <code>|</code>, <code>^</code>, <code>&</code>, <code>~</code>
+ Left and right bit shift: <code><<</code>, <code>>></code>
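
A few of the operators above behave in ways that are easy to misremember, in particular floor division and the remainder with negative numbers. The following sketch uses illustrative values only:

```python
a, b = 7, 2

true_div = a / b            # true division always gives a float
floor_div = a // b          # floor division
floor_neg = -a // b         # rounds towards minus infinity, not towards zero
remainder_neg = -a % b      # the remainder takes the sign of the divisor
power = b ** 10             # exponentiation
repeated = 'ab' * 3         # string replication

x = [1, 2]
y = [1, 2]
same_value = x == y         # equal contents
same_object = x is y        # but two distinct objects
member = 3 in [1, 2, 3]     # membership test

bits = (6 & 3, 6 | 3, 6 ^ 3, 1 << 4)   # bitwise and, or, xor, left shift

print(true_div, floor_div, floor_neg, remainder_neg, power, repeated)
print(same_value, same_object, member, bits)
```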

### Python as a calculator

The Python language has a concise notation for arithmetic that looks very much like the traditional one.

In [61]:
a = 3+2
b= 3.5 * -8
c = 10/6
print(a, b, c, 10./6.)

5 -28.0 1.6666666666666667 1.6666666666666667


Some math functions are not built in, and they need to be imported from a specific module:

In [62]:
import math   # if the module has already been imported, Python reuses the loaded module
print(math.pi + math.sin(100) + math.ceil(2.3))

5.635227012480034


### String processing with Python

Strings are sequences of characters:

In [63]:
a = 'python'
type(a)

str

In [76]:
print("Hello" )

Hello


In [77]:
print("This is 'an example' of the use of quotes and double quotes")
print('This is "another example" of the use of quotes and double quotes')

This is 'an example' of the use of quotes and double quotes
This is "another example" of the use of quotes and double quotes


We can use the operator ``+`` to concatenate strings:

In [81]:
a = 'He'
b = 'llo'
c = a+b+'!'
print(c)

Hello!


Substrings within a string can be accessed using **slicing**. Slicing uses ``[]`` to contain the indices of the characters in a string, where the first index is $0$, and the last is $n - 1$ (assuming the string has $n$ characters). 

In [90]:
a = 'Python'
print(a[:], a[0], a[2:], a[:3], a[2:4], a[::2], a[1::2])

Python P thon Pyt th Pto yhn
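
Slicing also accepts negative indices and a negative stride. A short sketch on the same string (the result names are illustrative):

```python
a = 'Python'

last = a[-1]          # negative indices count from the end of the string
tail = a[-3:]         # the last three characters
backwards = a[::-1]   # a stride of -1 walks the string in reverse

print(last, tail, backwards, len(a))
```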


Most everyday string operations are methods of string objects; some additional constants and helpers are stored in a standard-library module called ``string``:

In [34]:
import string as st
help(st)

Help on module string:

NAME
    string - A collection of string operations (most are no longer used).

FILE
    /Users/eloi/anaconda2/lib/python2.7/string.py

MODULE DOCS
    https://docs.python.org/library/string

DESCRIPTION
    Beginning with Python 1.6, many of these functions are implemented as
    methods on the standard string object. They used to be implemented by
    a built-in module called strop, but strop is now obsolete itself.
    
    Public module variables:
    
    whitespace -- a string containing all characters considered whitespace
    lowercase -- a string containing all characters considered lowercase letters
    uppercase -- a string containing all characters considered uppercase letters
    letters -- a string containing all characters considered letters
    digits -- a string containing all characters considered decimal digits
    hexdigits -- a string containing all characters considered hexadecimal digits
    octdigits -- a string containing all characters 

In [91]:
a = 'a'

In [37]:
# press tab for help after .
a.

In [92]:
a='Hello'
b = a.lower()
print(b)

hello


In [94]:
import string as st
print(st.ascii_letters)
'a' in st.ascii_letters

abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ


True

### Conditionals

The conditional structure in Python is **<code>if</code>**. It is usually combined with 
relational operators: <code> <, <=, ==, >=, >, != </code>.

In [95]:
def main(celsius):
    fahrenheit = 9.0 /5.0 * celsius + 32
    print("The temperature in Fahrenheit is", fahrenheit)
    if fahrenheit > 90:
        print("It's really hot out there.")
    elif fahrenheit < 30:
        print("It's really cold out there.")
    else: pass
        
main(35)

The temperature in Fahrenheit is 95.0
It's really hot out there.


### Boolean operators

In [96]:
a = 4
b = 40
(a>2) and (b>30)

True

In [97]:
(a>2) or (b>100)

True

In [98]:
not(a>2)

False

### Loops

* For
* While

In [99]:
for i in range(0,10,1):
    print(i)

0
1
2
3
4
5
6
7
8
9


In [101]:
i=0
while i < 10 :
    print(i)
    i= i+1

0
1
2
3
4
5
6
7
8
9


**``if``** statements can be combined with **loops** (``for``, ``while``):

In [102]:
numbers = [-5, 3,2,-1,9,6]
total = 0

for n in numbers:
    if n >= 0:
        total += n

print(total)

20


In [103]:
def average(a):
    total = 0.0      # named total because sum would shadow the built-in function
    for i in a:
        total = total + i
    return total/len(a)

average([1,2,3,4])

2.5

In [104]:
def main(n):
    # For n >= 1, returns how many times n can be halved while staying at or above 1
    cont = 0
    while (int(n) > 0):
        cont += 1
        n = n/2
#        print(n)
    return cont-1

main(10)
# main(10.3)

3

## Data Collections

We need to represent data collections: words in a text, students in a course, experimental data, etc., or to store intermediate results. The simplest data collection is the <code>list</code> (an ordered sequence of objects):

### Lists

Lists are a built-in data type that holds other objects. A list is a collection of objects of any type: floats, integers, complex numbers, strings or even other lists.

Lists also support slicing to retrieve one or more elements. Basic lists are constructed using square brackets, ``[]``, and values are separated using commas.

In [105]:
l=[]
type(l)

list

In [108]:
x=[1,2,3,4,[1,2,3,4],'jordi']
print(x[4:], x[0], x[5])

[[1, 2, 3, 4], 'jordi'] 1 jordi


In [109]:
x[-2:]  # Negative indices count from the end of the list, so this selects the
        # last two elements. A negative stride, as in x[::-1], reverses the order.

[[1, 2, 3, 4], 'jordi']

Lists can be nested to represent multidimensional data, and elements are accessed by chaining indices:

In [110]:
x = [[1,2,3,4], [5,6,7,8]]
print(x[0], x[1][3])

[1, 2, 3, 4] 8


In [115]:

list(range(10))


[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [117]:
b = "This is an example".split()
print(b)

['This', 'is', 'an', 'example']


A list is an *ordered, mutable and dynamic* collection of *non-homogeneous* objects:

In [118]:
a = [1,2,3,4]
a[1] = 7
print(a)

[1, 7, 3, 4]


In [119]:
c = a + b
print(c)

[1, 7, 3, 4, 'This', 'is', 'an', 'example']


In [121]:
zeroes = [0] * 10
del zeroes[5:]
print(zeroes)

[0, 0, 0, 0, 0]


In [129]:
zeroes.append(1)
print(zeroes)

[0, 0, 0, 0, 0, 1]


In [130]:
zeroes.remove(1)
print(zeroes)

[0, 0, 0, 0, 0]


In [131]:
if 1 in zeroes:
    print(False)
else: 
    print(True)

True


### Dictionaries

A dictionary is a collection that allows the access of an *element* by using a *key*:

In [133]:
dict = {"d": "D", "b":"B", "c":"C"}
dict["d"]

'D'

Dictionaries are *mutable* and *dynamic*. Since Python 3.7 they preserve insertion order, but elements are accessed by key, not by position:

In [134]:
dict["a"]="A"
dict

{'d': 'D', 'b': 'B', 'c': 'C', 'a': 'A'}

In [135]:
"a" in dict

True

In [136]:
del dict["a"]
print(dict)

{'d': 'D', 'b': 'B', 'c': 'C'}


In [137]:
dict = {"d": "D", "b":"B", "c":"C"}
dict.items()

dict_items([('d', 'D'), ('b', 'B'), ('c', 'C')])
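
A common use of dictionaries is a frequency table, where each key is an item and each value is the number of times it appears. A minimal sketch, assuming a made-up sentence as input:

```python
text = 'the cat and the dog and the bird'

counts = {}
for word in text.split():
    # get returns the current count, or 0 if the word has not been seen yet
    counts[word] = counts.get(word, 0) + 1

# items() yields (key, value) pairs
for word, count in counts.items():
    print(word, count)
```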

### Tuples

Tuples are **immutable** sequences:

In [139]:
tup = ('a', 'b', 'c')
print(type(tup), tup[1:3])

<class 'tuple'> ('b', 'c')


In [140]:
tup[0]='d'

TypeError: 'tuple' object does not support item assignment
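
Immutability has practical uses. In the following sketch (the names `point` and `labels` are illustrative), a tuple is unpacked into separate names and used as a dictionary key, which a list cannot be:

```python
point = (2, 3)
x, y = point                      # unpacking assigns each element to a name

# Tuples of hashable elements are hashable, so they can serve as dictionary keys
labels = {(0, 0): 'origin', (2, 3): 'target'}
print(x, y, labels[point])

try:
    {[2, 3]: 'target'}            # a list is mutable and cannot be used as a key
except TypeError as error:
    print('TypeError:', error)
```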

## A program in Python

General Rules:

+ All text from a <code>#</code> symbol to the end of the line is treated as a comment.
+ Code must be **indented**, and a colon introduces each indented block. The Python standard for indentation is four spaces. Never use tabs: they can produce hard-to-find errors. Set your editor to convert tabs to spaces.
+ Typically, a statement must fit on one line. You can use a backslash <code>\</code> at the end of a line to continue a statement on the next line.
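
The rules above can be seen together in a small sketch (the variable names are illustrative). It also shows that a statement inside parentheses can span several lines without a backslash:

```python
# A backslash at the end of the line continues the statement
total = 1 + 2 + 3 + \
        4 + 5

# Inside parentheses, brackets or braces no backslash is needed
same_total = (1 + 2 + 3 +
              4 + 5)

if total == same_total:               # the colon opens an indented block
    print('Both forms give', total)   # four spaces of indentation
```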


In [19]:
# This program computes the factorial of 100.

fact = 1
n= 100
for factor in range(n,0,-1):
    fact = fact * factor 
print(fact)    

93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000


In [28]:
list(range(10,0,-1))

[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

<div class="alert alert-error"> When we write a colon at the end of an iteration, all lines indented at the next level are considered *part* of the iteration. 

When we write a line at the same indentation as the iteration, we are closing the iteration.</div>

## References

We can inspect the reference of an object:

In [141]:
a ='hello'
print(id(a))

4563005600


Two different objects:

In [142]:
a = [1,2,3]
b = [1,2,3]
print(id(a), id(b))
print (a is b)
print (a == b)

4562913544 4562838856
False
True


Object alias:

In [143]:
a = [1,2,3]
b = a                     # alias
print(id(a), id(b))

4562646152 4562646152


Cloning:

In [144]:
a = [1,2,3]
b = a[:]                  # cloning with :

print(a, b, b[1:], id(a), id(b), id(b[1:]))

[1, 2, 3] [1, 2, 3] [2, 3] 4562988936 4563024200 4562911432
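
Cloning with `[:]` copies only the outer list. The sketch below is an addition to the notebook and uses the standard `copy.deepcopy` function to show the difference when the list contains other lists:

```python
import copy

a = [[1, 2], [3, 4]]
shallow = a[:]                 # new outer list, same inner lists
deep = copy.deepcopy(a)        # new outer list and new inner lists

a[0][0] = 99                   # modify an inner list through a

print(shallow[0][0])           # the inner list is shared, so the change is visible
print(deep[0][0])              # the deep copy is independent
print(a is shallow, a[0] is shallow[0], a[0] is deep[0])
```

For flat lists of numbers or strings the slice is enough; a deep copy matters only when the elements themselves are mutable.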


When a list is passed as an argument to a function, the function receives a *reference* to the list, not a *copy*.

## Functions

To create a function, use ``def``.

In [145]:
def head(items):     # parameters are separated by commas
    return items[0]  # indentation marks the body of the function

numbers=[1,2,3,4]
print(head(numbers), numbers)

1 [1, 2, 3, 4]


In [146]:
def change_first_element(items):
    items[0]=0 # modifies the caller's list in place; the function returns None

numbers=[1,2,3,4]
change_first_element(numbers)
print(numbers)

[0, 2, 3, 4]


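A function that returns a list returns a reference to that list. Slicing creates a new list, so here the result is a different object from the argument:

In [None]:
def tail(items):
    return items[1:]     # the slice creates a new list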

numbers=[1,2,3,4]
rest = tail(numbers)
print(rest, numbers)
print(id(rest), id(numbers))

In [None]:
# Press tab
numbers.reverse()
print(numbers)

Sometimes it is important to perform a *sanity check* on what a predefined function actually does:

In [None]:
numbers=[1,2,3,4]
def test(l):
    return l.reverse() # reverse() returns None
print(numbers)
print(numbers, test(numbers)) # reverse() changes the list in place: without a copy, the original numbers is altered

In [None]:
numbers=[1,2,3,4]

def test(l):
    l.reverse() # reverse() changes the list in place: without a copy, the original numbers is altered
    return l

print(numbers, test(numbers))
print(id(numbers), id(test(numbers)))

In [None]:
numbers=[1,2,3,4]

def test(l):
    a=l[:]       # work on a copy so that the original numbers is not altered
    a.reverse()  # reverse() changes the list in place
    return a

print(numbers, test(numbers))
print(id(numbers), id(test(numbers)))

In [None]:
?list.reverse

## Algorithms in Python 


#### Factorial

The factorial of a non-negative integer $n$, denoted by $n!$, is the product of all positive integers less than or equal to $n$.  

In [None]:
def factorial(n):
    fact = 1
    for factor in range(n,0,-1):
        fact = fact * factor
    return fact

In [None]:
factorial(100)

#### Fibonacci

The Fibonacci Sequence is the series of numbers: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, ...

The general rule to compute the sequence is very simple: The next number is found by adding up the two numbers before it.

In [None]:
def fib1(n):
    if n==1:
        return 1
    if n==0:
        return 0
    return fib1(n-1) + fib1(n-2)

fib1(20)

# this recursive version is far too slow to compute fib1(100)

In [None]:
def fib2(n):
    a, b = 0, 1
    for i in range(1,n+1):
        a, b = b, a + b
    return a

n = 1000
if n<15:
    print(fib1(n))
else: 
    print(fib2(n))
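
The recursive definition is slow because it recomputes the same values many times. One possible remedy, sketched here as an addition to the notebook, is to cache results with the standard `functools.lru_cache` decorator (the name `fib_cached` is illustrative):

```python
from functools import lru_cache

@lru_cache(maxsize=None)       # remember every result already computed
def fib_cached(n):
    if n < 2:                  # fib(0) = 0 and fib(1) = 1
        return n
    return fib_cached(n - 1) + fib_cached(n - 2)

print(fib_cached(100))
```

With the cache, each value is computed once, so the recursive form becomes practical for `n = 100`. The iterative `fib2` remains the better choice for very large `n`, since it is not limited by the recursion depth.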

#### Greatest Common Divisor

The greatest common divisor of two positive integers $a$ and $b$ is the largest divisor common to $a$ and $b$.  The Euclidean algorithm, or Euclid's algorithm, is an iterative method for computing the greatest common divisor of two integers. 

+ If $a<b$, exchange $a$ and $b$.
+ Divide $a$ by $b$ and get the remainder, $r$. If $r=0$, report $b$ as the GCD of $a$ and $b$.
+ Replace $a$ by $b$ and replace $b$ by $r$. If $r \neq 0$ iterate.



In [None]:
def gcd(a,b): # Euclid's algorithm v1.0: pseudocode translation
    r = 1
    while r != 0:
        if a<b:
            c=a
            a=b
            b=c
        r = a%b 
        if r == 0:
            return b
        else:
            a = b
            b = r

gcd(100,16)

In [None]:
def gcd(a,b):   # Euclid's algorithm v2.0: idiomatic Python
    while a:
        a, b = b%a, a
    return b

gcd(100,16)

In [None]:
a = 0
a == False
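
Because `0` counts as false, the loop `while a:` stops as soon as the remainder reaches zero. As a sanity check, the idiomatic version can be compared with the standard library's `math.gcd`. The test pairs below are arbitrary examples:

```python
import math

def gcd(a, b):
    # Same loop as the idiomatic version above
    while a:
        a, b = b % a, a
    return b

pairs = [(100, 16), (16, 100), (17, 5), (0, 9), (48, 180)]
for a, b in pairs:
    print(a, b, gcd(a, b), math.gcd(a, b))

all_match = all(gcd(a, b) == math.gcd(a, b) for a, b in pairs)
```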

## Reading Data

The most convenient method that you can use to work with data is to load it directly into memory.

When you load a file (of any type), the entire dataset is available at all times and the loading process is quite direct:

In [3]:
with open("files/SeaIce.txt", 'r') as input_file:
    print('File content:\n' + input_file.read())

File content:
year mo    data_type region extent   area
1979  1      Goddard      N  15.54  12.33
1980  1      Goddard      N  14.96  11.85
1981  1      Goddard      N  15.03  11.82
1982  1      Goddard      N  15.26  12.11
1983  1      Goddard      N  15.10  11.92
1984  1      Goddard      N  14.61  11.60
1985  1      Goddard      N  14.86  11.60
1986  1      Goddard      N  15.02  11.79
1987  1      Goddard      N  15.20  11.81
1988  1        -9999      N  -9999  -9999
1989  1      Goddard      N  15.12  13.11
1990  1      Goddard      N  14.95  12.72
1991  1      Goddard      N  14.46  12.49
1992  1      Goddard      N  14.72  12.54
1993  1      Goddard      N  15.08  12.85
1994  1      Goddard      N  14.82  12.80
1995  1      Goddard      N  14.62  12.72
1996  1      Goddard      N  14.21  12.07
1997  1      Goddard      N  14.47  12.30
1998  1      Goddard      N  14.81  12.73
1999  1      Goddard      N  14.47  12.54
2000  1      Goddard      N  14.41  12.22
2001  1      Goddard

The entire dataset is loaded from the file into free memory. Of course, the loading process will fail if your system lacks sufficient memory to hold the dataset. When this problem occurs, you need to consider other techniques for working with the dataset, such as **streaming** it or **sampling** it.

Here’s an example of how you can stream data using Python:

In [5]:
with open("files/SeaIce.txt", 'r') as input_file:
    for observation in input_file:
        print('Reading Data: ' + observation, end="")

Reading Data: year mo    data_type region extent   area
Reading Data: 1979  1      Goddard      N  15.54  12.33
Reading Data: 1980  1      Goddard      N  14.96  11.85
Reading Data: 1981  1      Goddard      N  15.03  11.82
Reading Data: 1982  1      Goddard      N  15.26  12.11
Reading Data: 1983  1      Goddard      N  15.10  11.92
Reading Data: 1984  1      Goddard      N  14.61  11.60
Reading Data: 1985  1      Goddard      N  14.86  11.60
Reading Data: 1986  1      Goddard      N  15.02  11.79
Reading Data: 1987  1      Goddard      N  15.20  11.81
Reading Data: 1988  1        -9999      N  -9999  -9999
Reading Data: 1989  1      Goddard      N  15.12  13.11
Reading Data: 1990  1      Goddard      N  14.95  12.72
Reading Data: 1991  1      Goddard      N  14.46  12.49
Reading Data: 1992  1      Goddard      N  14.72  12.54
Reading Data: 1993  1      Goddard      N  15.08  12.85
Reading Data: 1994  1      Goddard      N  14.82  12.80
Reading Data: 1995  1      Goddard      N  14.62

The ``input_file`` file object contains a pointer to the open file. As the code performs data reads in the for loop, the file pointer moves to the next record.

Data streaming obtains all the records from a data source. You may find that
you don’t need all the records. You can save time and resources by simply
sampling the data.

In [6]:
n = 17
with open("files/SeaIce.txt", 'r') as input_file:
    for j, observation in enumerate(input_file):
        if j % n==0:
            print('Reading Line: ' + str(j) + ' Content: ' + observation, end="")

Reading Line: 0 Content: year mo    data_type region extent   area
Reading Line: 17 Content: 1995  1      Goddard      N  14.62  12.72
Reading Line: 34 Content: 2012  1      Goddard      N  13.77  11.87
Reading Line: 51 Content: 1993  2      Goddard      N  15.73  13.54
Reading Line: 68 Content: 2010  2      Goddard      N  14.59  12.60
Reading Line: 85 Content: 1991  3      Goddard      N  15.50  13.35
Reading Line: 102 Content: 2008  3      Goddard      N  15.22  13.20
Reading Line: 119 Content: 1990  4      Goddard      N  14.68  12.16
Reading Line: 136 Content: 2007  4      Goddard      N  13.87  11.75
Reading Line: 153 Content: 1989  5      Goddard      N  12.98  11.30
Reading Line: 170 Content: 2006  5      Goddard      N  12.62  10.39
Reading Line: 187 Content: 1988  6      Goddard      N  12.02   9.62
Reading Line: 204 Content: 2005  6      Goddard      N  11.29   8.74
Reading Line: 221 Content: 1987  7      Goddard      N   9.98   6.84
Reading Line: 238 Content: 2004  7      G

You can perform random sampling as well.

In [7]:
import random
sample_size = 0.01
with open("files/SeaIce.txt", 'r') as input_file:
    for j, observation in enumerate(input_file):
        if random.random()<=sample_size:
            print('Reading Line: ' + str(j) + ' Content: ' + observation, end= "") 

Reading Line: 53 Content: 1995  2      Goddard      N  15.24  13.30
Reading Line: 69 Content: 2011  2      Goddard      N  14.38  12.41
Reading Line: 125 Content: 1996  4      Goddard      N  14.22  12.23
Reading Line: 331 Content: 1992 10      Goddard      N   9.60   7.69
Reading Line: 340 Content: 2001 10      Goddard      N   8.59   6.59
Reading Line: 341 Content: 2002 10      Goddard      N   8.81   6.20
Reading Line: 380 Content: 2005 11      Goddard      N  10.47   8.73


A flat file presents the easiest kind of file to work with. 

A problem with using native Python techniques is that the input isn’t intelligent. For example, when a file contains a header, Python simply reads it as yet more data to process, rather than as a header (not a problem for Pandas!).

The least formatted and therefore easiest‐to‐read flat‐file format is the text file. However, a text file also treats all data as strings, so you often have to convert numeric data into other forms.

A comma‐separated value (CSV) file provides more formatting and more information, but it requires a little more effort to read.

At the high end of flat‐file formatting are custom data formats, such as an Excel file, which contains extensive formatting and could include multiple datasets in a single file.
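
Since a text file delivers every field as a string, numeric columns have to be converted explicitly. The sketch below works on three lines copied from the listing above instead of opening the file. Treating `-9999` as a marker for missing data is an assumption based on how the values look, not something stated by the data source:

```python
# Three lines copied from the listing above, including the record with missing values
lines = [
    '1979  1      Goddard      N  15.54  12.33',
    '1988  1        -9999      N  -9999  -9999',
    '1989  1      Goddard      N  15.12  13.11',
]

MISSING = -9999   # assumption: -9999 marks a missing measurement

records = []
for line in lines:
    # split() with no argument separates on any run of whitespace
    year, month, data_type, region, extent, area = line.split()
    extent = float(extent)
    if extent == MISSING:
        continue                      # skip records without a measurement
    records.append((int(year), int(month), extent, float(area)))

print(records)
```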

### Reading from a CSV file

A CSV file provides more formatting than a simple text file. In fact, CSV files can become quite complicated. There is a standard that defines the format of CSV files, and you can see it at https://tools.ietf.org/html/rfc4180.

The ``csv`` module is useful for working with data exported from spreadsheets and databases into text files formatted with fields and records, commonly referred to as comma-separated value (CSV).

In [8]:
import csv

f = open("files/Advertising.csv", 'r')
try:
    reader = csv.reader(f)
    for row in reader:
        print(row)
finally:
    f.close()

['', 'TV', 'Radio', 'Newspaper', 'Sales']
['1', '230.1', '37.8', '69.2', '22.1']
['2', '44.5', '39.3', '45.1', '10.4']
['3', '17.2', '45.9', '69.3', '9.3']
['4', '151.5', '41.3', '58.5', '18.5']
['5', '180.8', '10.8', '58.4', '12.9']
['6', '8.7', '48.9', '75', '7.2']
['7', '57.5', '32.8', '23.5', '11.8']
['8', '120.2', '19.6', '11.6', '13.2']
['9', '8.6', '2.1', '1', '4.8']
['10', '199.8', '2.6', '21.2', '10.6']
['11', '66.1', '5.8', '24.2', '8.6']
['12', '214.7', '24', '4', '17.4']
['13', '23.8', '35.1', '65.9', '9.2']
['14', '97.5', '7.6', '7.2', '9.7']
['15', '204.1', '32.9', '46', '19']
['16', '195.4', '47.7', '52.9', '22.4']
['17', '67.8', '36.6', '114', '12.5']
['18', '281.4', '39.6', '55.8', '24.4']
['19', '69.2', '20.5', '18.3', '11.3']
['20', '147.3', '23.9', '19.1', '14.6']
['21', '218.4', '27.7', '53.4', '18']
['22', '237.4', '5.1', '23.5', '12.5']
['23', '13.2', '15.9', '49.6', '5.6']
['24', '228.3', '16.9', '26.2', '15.5']
['25', '62.3', '12.6', '18.3', '9.7']
['26', '262.

In [9]:
with open("files/Advertising.csv", 'r') as input_file:
    reader = csv.reader(input_file)
    for row in reader:
        print(row)

['', 'TV', 'Radio', 'Newspaper', 'Sales']
['1', '230.1', '37.8', '69.2', '22.1']
['2', '44.5', '39.3', '45.1', '10.4']
['3', '17.2', '45.9', '69.3', '9.3']
['4', '151.5', '41.3', '58.5', '18.5']
['5', '180.8', '10.8', '58.4', '12.9']
['6', '8.7', '48.9', '75', '7.2']
['7', '57.5', '32.8', '23.5', '11.8']
['8', '120.2', '19.6', '11.6', '13.2']
['9', '8.6', '2.1', '1', '4.8']
['10', '199.8', '2.6', '21.2', '10.6']
['11', '66.1', '5.8', '24.2', '8.6']
['12', '214.7', '24', '4', '17.4']
['13', '23.8', '35.1', '65.9', '9.2']
['14', '97.5', '7.6', '7.2', '9.7']
['15', '204.1', '32.9', '46', '19']
['16', '195.4', '47.7', '52.9', '22.4']
['17', '67.8', '36.6', '114', '12.5']
['18', '281.4', '39.6', '55.8', '24.4']
['19', '69.2', '20.5', '18.3', '11.3']
['20', '147.3', '23.9', '19.1', '14.6']
['21', '218.4', '27.7', '53.4', '18']
['22', '237.4', '5.1', '23.5', '12.5']
['23', '13.2', '15.9', '49.6', '5.6']
['24', '228.3', '16.9', '26.2', '15.5']
['25', '62.3', '12.6', '18.3', '9.7']
['26', '262.
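
The `csv` module can also use the header row to name the fields. The following sketch uses `csv.DictReader` on a small in-memory sample built with `io.StringIO`, so it runs without the data file. The sample rows are copied from the output above:

```python
import csv
import io

# A small in-memory stand-in for the first rows of files/Advertising.csv
sample = io.StringIO(
    ',TV,Radio,Newspaper,Sales\n'
    '1,230.1,37.8,69.2,22.1\n'
    '2,44.5,39.3,45.1,10.4\n'
)

reader = csv.DictReader(sample)       # the first row supplies the field names
sales = [float(row['Sales']) for row in reader]   # values arrive as strings
print(sales)
```

Accessing columns by name avoids counting positions in each row, and the header is no longer mixed in with the data.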

When you have data to be imported into some other application, writing ``csv`` files is just as easy as reading them. 

Use ``writer()`` to create an object for writing, then iterate over the rows, using ``writerow()`` to write them.  

In [10]:
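import csv

# The with statement closes both files automatically, even if an error occurs.
# newline='' lets the csv module control line endings itself.
with open("files/Advertising.csv", 'r', newline='') as ifile, \
     open('test.csv', "w", newline='') as ofile:
    reader = csv.reader(ifile)
    writer = csv.writer(ofile, delimiter=',', lineterminator='\n')

    for row in reader:
        writer.writerow(row)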

## Advanced Python

### Functional Programming

* lambda

* map

* filter

* reduce

#### Lambda

A function without a name. Useful when you want to declare a function inline, usually to pass it as a parameter to another function.

In [None]:
def old_add (a,b):
    return a+b

new_add = lambda a, b: a + b

new_add(4,5) == 4 + 5  and new_add(4,5) == old_add(4,5)


In [None]:
? sorted

In [None]:
unsorted = [('b', 6), ('a', 10), ('d', 0), ('c', 4)]
print(sorted(unsorted))
print(sorted(unsorted, key=lambda x: x[1]))

#### Map

Takes a function and a collection of values as parameters, and applies the function to each element of the collection. Useful for the map-reduce functional pattern, typically when you want to process collections of data.

In [None]:
values = [1, 2, 3, 4, 5]
# Note: We convert the returned map object to
# a list data structure.

add_10 = list(map(lambda x: x + 10, values))
add_20 = list(map(lambda x: x + 20, values))
print(add_10)
print (add_20)


In [None]:
import math
list(map(math.sqrt,values))

#### Filter
Takes a boolean function and a collection of values as parameters. Applies the boolean function to each value in the collection and returns those for which the result is true. Useful for keeping only the values that pass a certain test.


In [None]:
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Note: We convert the returned filter object to
# a list data structure.
even = list(filter(lambda x: x % 2 == 0, values))
odd = list(filter(lambda x: x % 2 == 1, values))

print(even)

print(odd)

#### Reduce

Takes a function of two parameters and a collection of values. It applies the function cumulatively: first to the first two values, then to that result and the next value, and so on, until a single value remains. It is useful for summarizing a collection by repeatedly applying a function.


In [None]:
from functools import reduce

values = [1, 2, 3, 4]

summed = reduce(lambda a, b: a + b, values)
print(summed)
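
`reduce` is not limited to sums. The sketch below (result names are illustrative) computes a product and a maximum, and shows the optional third argument of `functools.reduce`, which supplies a starting value:

```python
from functools import reduce

values = [1, 2, 3, 4]

# Step by step: ((1 * 2) * 3) * 4
product = reduce(lambda a, b: a * b, values)

# The optional third argument is the starting value; it is returned for an empty input
empty_sum = reduce(lambda a, b: a + b, [], 0)

# Keep the larger of the two values at each step
largest = reduce(lambda a, b: a if a > b else b, values)

print(product, empty_sum, largest)
```

Without a starting value, calling `reduce` on an empty collection raises a `TypeError`, so the third argument is worth adding when the input may be empty.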


### Lists (and dictionary) comprehensions

List comprehensions are a way to fit a ``for`` loop, an ``if`` statement, and an assignment all in one line.

A list comprehension consists of the following parts:

+ An input sequence.
+ A variable representing members of the input sequence.
+ An optional predicate expression.
+ An output expression producing elements of the output list from members of the input sequence that satisfy the predicate.

Map and filter can be rewritten as list comprehensions:

In [None]:
num = [1, 4, -5, 10, -7, 2, 3, -1]
squared = [] # Not Pythonic: four lines for one simple iteration
for i in num:
    if i > 0:
        squared.append(i**2)
print(type(squared), squared)

squared = [ x**2 for x in num if x > 0] # One iteration: checks the condition first, then computes the square
print(type(squared), squared)

squar2= map(lambda x: x**2 ,filter(lambda x: x>0, num))  # Two chained steps: filter over num, then map over the filtered values
print(type(squar2),list(squar2))

+ Rewrite the previous map and filter examples using list comprehensions:

In [None]:
#TODO here!


* Iterating through two lists: nested ``for`` loops vs ``zip`` vs a dictionary comprehension

In [None]:
values1 = ['a','b','c','d']
values2 = [1,2,3,4]

print([(i, j) for i in values1 for j in values2])

print([(i, j) for i, j in zip(values1, values2)])

print({i: j for i, j in zip(values1, values2)})
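
To make the difference concrete, this small sketch (with made-up lists of different lengths) shows what each form produces: nested loops yield every combination, while ``zip`` pairs elements by position and stops at the shorter input.

```python
letters = ['a', 'b', 'c', 'd']
numbers = [1, 2, 3]  # deliberately one element shorter

# Nested loops: every letter combined with every number (4 * 3 pairs).
all_pairs = [(i, j) for i in letters for j in numbers]
print(len(all_pairs))

# zip: elements paired by position; the unmatched 'd' is dropped.
zipped = [(i, j) for i, j in zip(letters, numbers)]
print(zipped)

# Dictionary comprehension with a predicate: keep only odd values.
odd_lookup = {i: j for i, j in zip(letters, numbers) if j % 2 == 1}
print(odd_lookup)
```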
    

### Generators

There is a downside to list comprehensions: the entire list has to be stored in memory at once. This isn’t a problem for small lists like the ones in the above examples, or even for lists several orders of magnitude larger, but it becomes one when the data no longer fits comfortably in memory. We can use **<font color="red">generators</font>** to solve this problem.

Generator expressions do not load the whole list into memory at once, but instead create a *generator object* so only one list element has to be loaded at any time.

Generator expressions have the same syntax as list comprehensions, but with parentheses around the outside instead of brackets:

In [7]:
num = [1, 4, -5, 10, -7, 2, 3, -1]

squared = ( x**2 for x in num if x > 0 )
print(type(squared), squared)

<class 'generator'> <generator object <genexpr> at 0x10b192eb8>


In [8]:
# The elements of a generator are produced only when requested, so they must be consumed by iterating:

lis = []
for item in squared:
    lis.append(item)
print(lis)

[1, 16, 100, 4, 9]
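
Two consequences of this lazy behavior are worth seeing once. The sketch below shows that a generator can only be consumed a single time, and compares the size of a list object with that of a generator object using ``sys.getsizeof``. The variable names are illustrative, and the exact byte counts depend on the Python build, so only the comparison is printed.

```python
import sys

num = [1, 4, -5, 10, -7, 2, 3, -1]
squared = (x**2 for x in num if x > 0)

first_pass = list(squared)   # consumes the generator
second_pass = list(squared)  # nothing left to produce
print(first_pass)
print(second_pass)

# Compare container sizes: the list holds all its elements,
# the generator only holds its current state.
big_list = [x for x in range(100000)]
big_gen = (x for x in range(100000))
print(sys.getsizeof(big_list) > sys.getsizeof(big_gen))
```

If the values are needed more than once, either store them in a list or create a new generator.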


We can define our own generators with the ``yield`` statement. For example, let's build a generator for the binary representation of a number between 0 and 1 with arbitrary precision.

In [9]:
# binary representation of a number between 0 and 1 (b bits precision).

def res(n,b):
    bin_a = '.'
    for i in range(b):
        n *= 2
        bin_a +=  str(int(n))
        n = n % 1
    return bin_a

print(res(1/3.0,10))

.0101010101


In [11]:
# binary representation of a number between 0 and 1 (precision as needed).

def binRep(n):
    while True:
        n *= 2
        yield int(n)
        n = n % 1


a = binRep(1/3.) 
a_bin = '.'
for i in range(100):
    a_bin +=  str(next(a))
    
print(a_bin)

.0101010101010101010101010101010101010101010101010101010000000000000000000000000000000000000000000000
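
The long run of trailing zeros above comes from the limited precision of floating-point numbers, not from the generator itself. As an alternative to calling ``next`` in a loop, the sketch below uses ``itertools.islice`` to take a fixed number of items from an infinite generator. The function name ``bin_digits`` is hypothetical, and 0.375 is chosen because it is represented exactly in binary.

```python
from itertools import islice

def bin_digits(n):
    """Yield the binary digits of a number n with 0 <= n < 1, one at a time."""
    while True:
        n *= 2
        yield int(n)
        n = n % 1

# Take exactly five digits of 0.375 (binary .011) without writing a loop.
digits = list(islice(bin_digits(0.375), 5))
print('.' + ''.join(str(d) for d in digits))
```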


A more elegant way of reading a CSV file:
+  Wrap the CSV reader in a function that returns a generator
+  Use context managers ``with [callable] as [name]`` to ensure that the handle to the file is closed automatically.
+  Use the ``csv.DictReader`` class when headers are present (otherwise just use ``csv.reader``)

In [12]:
import csv

ADV = 'files/Advertising.csv'

def read_data(path):
    with open(path, 'r') as data:
        reader = csv.DictReader(data)
        for row in reader:
            yield row

for idx, row in enumerate(read_data(ADV)):
    if idx < 15: print(row)
    else: break

OrderedDict([('', '1'), ('TV', '230.1'), ('Radio', '37.8'), ('Newspaper', '69.2'), ('Sales', '22.1')])
OrderedDict([('', '2'), ('TV', '44.5'), ('Radio', '39.3'), ('Newspaper', '45.1'), ('Sales', '10.4')])
OrderedDict([('', '3'), ('TV', '17.2'), ('Radio', '45.9'), ('Newspaper', '69.3'), ('Sales', '9.3')])
OrderedDict([('', '4'), ('TV', '151.5'), ('Radio', '41.3'), ('Newspaper', '58.5'), ('Sales', '18.5')])
OrderedDict([('', '5'), ('TV', '180.8'), ('Radio', '10.8'), ('Newspaper', '58.4'), ('Sales', '12.9')])
OrderedDict([('', '6'), ('TV', '8.7'), ('Radio', '48.9'), ('Newspaper', '75'), ('Sales', '7.2')])
OrderedDict([('', '7'), ('TV', '57.5'), ('Radio', '32.8'), ('Newspaper', '23.5'), ('Sales', '11.8')])
OrderedDict([('', '8'), ('TV', '120.2'), ('Radio', '19.6'), ('Newspaper', '11.6'), ('Sales', '13.2')])
OrderedDict([('', '9'), ('TV', '8.6'), ('Radio', '2.1'), ('Newspaper', '1'), ('Sales', '4.8')])
OrderedDict([('', '10'), ('TV', '199.8'), ('Radio', '2.6'), ('Newspaper', '21.2'), ('Sale

The file is not opened, read, or parsed until you need it. This is powerful because it means that even for much larger data sets you will have efficient, portable code. 

In [13]:
data = read_data(ADV)
print(data)
for item in data:
    print(item)

<generator object read_data at 0x10b1c95c8>
OrderedDict([('', '1'), ('TV', '230.1'), ('Radio', '37.8'), ('Newspaper', '69.2'), ('Sales', '22.1')])
OrderedDict([('', '2'), ('TV', '44.5'), ('Radio', '39.3'), ('Newspaper', '45.1'), ('Sales', '10.4')])
OrderedDict([('', '3'), ('TV', '17.2'), ('Radio', '45.9'), ('Newspaper', '69.3'), ('Sales', '9.3')])
OrderedDict([('', '4'), ('TV', '151.5'), ('Radio', '41.3'), ('Newspaper', '58.5'), ('Sales', '18.5')])
OrderedDict([('', '5'), ('TV', '180.8'), ('Radio', '10.8'), ('Newspaper', '58.4'), ('Sales', '12.9')])
OrderedDict([('', '6'), ('TV', '8.7'), ('Radio', '48.9'), ('Newspaper', '75'), ('Sales', '7.2')])
OrderedDict([('', '7'), ('TV', '57.5'), ('Radio', '32.8'), ('Newspaper', '23.5'), ('Sales', '11.8')])
OrderedDict([('', '8'), ('TV', '120.2'), ('Radio', '19.6'), ('Newspaper', '11.6'), ('Sales', '13.2')])
OrderedDict([('', '9'), ('TV', '8.6'), ('Radio', '2.1'), ('Newspaper', '1'), ('Sales', '4.8')])
OrderedDict([('', '10'), ('TV', '199.8'), ('R
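
When only the first few rows are needed, the generator can be combined with ``itertools.islice`` instead of counting rows by hand. The following is a minimal sketch under stated assumptions: the function name ``read_rows`` and the three-row sample file are made up for the demonstration, and ``newline=''`` is the setting the ``csv`` documentation recommends when opening files.

```python
import csv
import os
import tempfile
from itertools import islice

def read_rows(path):
    """Yield one row (as a dict) at a time from a CSV file with headers."""
    with open(path, newline='') as handle:
        for row in csv.DictReader(handle):
            yield row

# Hypothetical sample data written to a temporary file for the demo.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
    tmp.write('TV,Radio,Sales\n230.1,37.8,22.1\n44.5,39.3,10.4\n17.2,45.9,9.3\n')
    sample_path = tmp.name

rows = read_rows(sample_path)  # nothing has been opened or read yet
first_two = [dict(row) for row in islice(rows, 2)]
print(first_two)

rows.close()  # stop the generator early; the with-block closes the file
os.remove(sample_path)
```

Closing the generator explicitly matters only when iteration stops early; a generator that runs to the end closes the file on its own.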

### Objects

You can define your own classes and objects.

In [None]:
# Creating a class

class Rectangle:
    # Class attributes: default values shared by all instances
    description = "This shape has not been described yet"
    author = "Nobody has claimed to make this shape yet"

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def area(self):
        return self.x * self.y

    def perimeter(self):
        return 2 * self.x + 2 * self.y

    def describe(self, text):
        self.description = text

    def authorName(self, text):
        self.author = text

    def scaleSize(self, scale):
        self.x = self.x * scale
        self.y = self.y * scale

#creating objects
a = Rectangle(100, 45)
b = Rectangle(10,230)

#describing the rectangles
a.describe("A fat rectangle")
b.describe("A thin rectangle")

In [None]:
#finding the area of your rectangle:
print(a.area())
 
#finding the perimeter of your rectangle:
print(a.perimeter())

#getting the description
print(a.description)
print(a.author)

In [None]:
#finding the area of your rectangle:
print(b.area())
print(b.description)

#making the rectangle 50% smaller
b.scaleSize(0.5)
b.describe("A small thin rectangle")
 
#re-printing the new area of the rectangle
print(b.area())
print(b.description)
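
In the ``Rectangle`` class, ``description`` and ``author`` are defined at class level, so they act as defaults until a method assigns to them through ``self``. The sketch below isolates that behavior with a hypothetical, stripped-down ``Shape`` class: describing one object does not change the others.

```python
class Shape:
    # Class attribute: a default shared by all instances.
    description = "This shape has not been described yet"

    def describe(self, text):
        # Assigning through self creates an instance attribute
        # that shadows the class attribute for this object only.
        self.description = text

first = Shape()
second = Shape()
first.describe("A described shape")

print(first.description)   # the instance attribute
print(second.description)  # still the class default
print(Shape.description)   # the class default itself is unchanged
```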

### Functions and objects -> Decorators


In Python, functions are first-class objects. This means that functions can be passed around and used as arguments, just like any other object (string, int, float, list, and so on). Python also allows you to use functions as return values, so you can write functions that accept other functions as parameters and return new functions. A decorator does exactly this: it wraps a function and modifies its behavior:


In [1]:
def my_decorator(func):
    def wrapper():
        print("Something is happening before the function is called.")
        func()
        print("Something is happening after the function is called.")
    return wrapper

def say_whee():
    print("Whee!")

say_whee = my_decorator(say_whee)  # Overwrite say_whee with the decorator's wrapper

say_whee()

Something is happening before the function is called.
Whee!
Something is happening after the function is called.


You can use the ``@`` syntax to apply a decorator when defining a function:

In [2]:
def my_decorator(func):
    def wrapper():
        print("Something is happening before the function is called.")
        func()
        print("Something is happening after the function is called.")
    return wrapper

@my_decorator  # Equivalent to: say_whee = my_decorator(say_whee)
def say_whee():
    print("Whee!")
    
say_whee()

Something is happening before the function is called.
Whee!
Something is happening after the function is called.


* Can ``my_decorator`` be applied to functions that take parameters?

In [4]:
@my_decorator
def greet(name):
    print(f"Hello {name}")
greet("Eloi")

TypeError: wrapper() takes 0 positional arguments but 1 was given

The call fails because ``wrapper`` accepts no arguments. Making ``wrapper`` accept ``*args`` and ``**kwargs`` and forward them to ``func`` fixes it:

In [5]:
def my_decorator(func):
    def wrapper(*args, **kwargs):
        print("Something is happening before the function is called.")
        func(*args, **kwargs)
        print("Something is happening after the function is called.")
    return wrapper

@my_decorator
def greet(name):
    print(f"Hello {name}")
greet("Eloi")

Something is happening before the function is called.
Hello Eloi
Something is happening after the function is called.
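
The wrapper above discards whatever ``func`` returns and replaces the name of the decorated function with ``wrapper``. A possible refinement, sketched here with a hypothetical decorator called ``logged``, returns the result of the wrapped function and uses ``functools.wraps`` from the standard library to keep its name and docstring.

```python
import functools

def logged(func):
    @functools.wraps(func)  # copy func's name and docstring onto wrapper
    def wrapper(*args, **kwargs):
        print(f"Calling {func.__name__}")
        result = func(*args, **kwargs)
        print(f"{func.__name__} finished")
        return result  # hand the original return value back to the caller
    return wrapper

@logged
def add(a, b):
    """Return the sum of a and b."""
    return a + b

total = add(2, 3)
print(total)
print(add.__name__)
```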


A list can be unpacked into the positional parameters of a function using ``*``, and a dictionary can be unpacked into its keyword parameters using ``**``:

In [6]:
args= [1,2,3]
kargs = {'d':4}

def f(a,b,c,d=0):
    print(a+b+c+d)
f(*args,**kargs)    

10
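
The same two symbols work in the other direction inside a function definition, as in the ``wrapper`` functions above: ``*args`` collects the positional arguments into a tuple and ``**kwargs`` collects the keyword arguments into a dictionary. A brief sketch, with the hypothetical function name ``describe_call``:

```python
def describe_call(*args, **kwargs):
    # args is a tuple of positional arguments, kwargs a dict of keyword arguments.
    return len(args), sorted(kwargs)

counts = describe_call(1, 2, 3, d=4)
print(counts)

# Unpacking at the call site produces exactly the same call.
positional = [1, 2, 3]
keyword = {'d': 4}
same_counts = describe_call(*positional, **keyword)
print(same_counts)
```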


### Some goodies

## Help: Python Tutorial

In [51]:
from IPython.display import HTML
HTML('<iframe src=http://docs.python.org/3/tutorial/index.html?useformat=mobile width=780 height=350></iframe>')

In [None]:
! pip install tqdm #this will not work in Windows 

In [None]:
from time import sleep
from tqdm import tqdm

# Wrap any iterable in tqdm() to display a progress bar while looping.
for _ in tqdm(range(1000)):
    sleep(0.01)