# Python Features
- Everything is an object
- Dynamically typed: types are determined at runtime
- Blocks are delimited by indentation. Configure your editor to use 4 spaces instead of tabs

## Installing packages

- This is the only time you should install something as a python-XXX system package
> sudo apt-get install python-setuptools python-dev

- Many packages depend on C libraries that need to be compiled
> sudo apt-get install build-essential

- numpy, scipy, pandas, etc.
> sudo apt-get install libblas-dev liblapack-dev libatlas-base-dev gfortran

- Now you can use pip (grab a coffee, because some packages take a while to compile the first time you install them)
> pip install numpy
> pip install scipy
> pip install pandas


## Base types
- Integers (int)
- Floats (float)
- Booleans (bool)
- Strings (str)

### Strings

Escape characters:
\' Single quote
\" Double quote
\t Tab
\n Newline (line break)
\\ Backslash

You can place an r before the beginning quotation mark of a string to make it a raw string. A raw string completely ignores all escape characters and prints any backslash that appears in the string.

Useful methods:
- upper(), lower(), isupper(), and islower() 
- isalpha() returns True if the string consists only of letters and is not blank.
- isalnum() returns True if the string consists only of letters and numbers and is not blank.
- isdecimal() returns True if the string consists only of numeric characters and is not blank.
- isspace() returns True if the string consists only of spaces, tabs, and new-lines and is not blank.
- istitle() returns True if the string consists only of words that begin with an uppercase letter followed by only lowercase letters.
- startswith() and endswith() methods return True if the string value they are called on begins or ends (respectively) with the string passed to the method; otherwise, they return False.
- join() method is useful when you have a list of strings that need to be joined together into a single string value. The join() method is called on a string, gets passed a list of strings, and returns a string. The returned string is the concatenation of each string in the passed-in list, with the calling string inserted between them.
- split() method does the opposite: It’s called on a string value and returns a list of strings. You can pass a delimiter string to the split() method to specify a different string to split upon. By default it splits on whitespace.
- strip(), rstrip(), and lstrip() remove whitespace from both ends, the right end, and the left end of a string, respectively.
- pyperclip module has copy() and paste() functions that can send text to and receive text from your computer’s clipboard. Sending the output of your program to the clipboard will make it easy to paste it to an email, word processor, or some other software.

In [None]:
# Raw strings:
print(r'That is Carol\'s cat.')

# Multiline:
print('''Dear Alice,

Eve's cat has been arrested for catnapping, cat burglary, and extortion.

Sincerely,
Bob''')

print('ABC'.join(['My', 'name', 'is', 'Simon']))

print('Hello' in 'Hello world')

# Justify text:
print('Hello'.rjust(10))
print('Hello'.rjust(20, '*'))
print('Hello'.center(20, '='))
print('Hello'.ljust(10, '-'))

# import pyperclip
# pyperclip.copy('Hello world!')
# pyperclip.paste()
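
The cell above does not exercise the checking, splitting, and stripping methods from the list. A minimal sketch of them follows; the sample text and variable names are made up for illustration.

```python
# Hypothetical sample text, chosen only for illustration
raw = '   Hello, World 42   '

clean = raw.strip()               # remove whitespace on both ends
words = clean.split()             # split on whitespace by default
csv_fields = 'a,b,c'.split(',')   # split on an explicit delimiter

print(clean)
print(words)
print(csv_fields)

# The is*() checks return booleans
print('hello'.isalpha(), 'hello123'.isalnum(), '123'.isdecimal())
print('Hello World'.istitle(), '   '.isspace())

# startswith() / endswith()
print(clean.startswith('Hello'), clean.endswith('!'))
```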

### Regular Expressions

Passing a string value representing your regular expression to **re.compile()** returns a Regex pattern object (or simply, a Regex object).

A Regex object’s **search()** method searches the string it is passed for any matches to the regex. The search() method will return None if the regex pattern is not found in the string. If the pattern is found, the search() method returns a Match object. 

Match objects have a **group()** method that will return the actual matched text from the searched string.

Regex tester:
http://regexpal.com/

The | character is called a **pipe**. You can use it anywhere you want to match one of many expressions. For example, the regular expression r'Batman|Tina Fey' will match either 'Batman' or 'Tina Fey'.

When both Batman and Tina Fey occur in the searched string, the first occurrence of matching text will be returned as the Match object. You can find all matching occurrences with the findall() method.

The **?** character flags the group that precedes it as an optional part of the pattern. Note that the question mark can have two meanings in regular expressions: declaring a nongreedy match or flagging an optional group.

The **\*** (called the star or asterisk) means “match zero or more”: the group that precedes the star can occur any number of times in the text. It can be completely absent or repeated over and over again.

The **+** (or plus) means “match one or more.” 

**Repetitions**: If you have a group that you want to repeat a specific number of times, follow the group in your regex with a number in curly brackets. For example, the regex (Ha){3} will match the string 'HaHaHa', but it will not match 'HaHa'.
You can also leave out the first or second number in the curly brackets to leave the minimum or maximum unbounded. For example, (Ha){3,} will match three or more instances of the (Ha) group, while (Ha){,5} will match zero to five instances. 

Python’s regular expressions are greedy by default, which means that in ambiguous situations they will match the longest string possible. The non-greedy version of the curly brackets, which matches the shortest string possible, has the closing curly bracket followed by a question mark.
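
As a small illustration of greedy versus non-greedy repetition, the sketch below runs both forms of the curly-bracket quantifier over the same text. The sample string is an assumption chosen for the example.

```python
import re

text = 'HaHaHaHaHa'  # five repetitions of 'Ha'

# Greedy: matches as many repetitions as allowed (up to 5)
greedy = re.compile(r'(Ha){3,5}').search(text).group()

# Non-greedy: the trailing ? makes it stop at the minimum (3)
nongreedy = re.compile(r'(Ha){3,5}?').search(text).group()

print(greedy)
print(nongreedy)
```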

#### Character classes

\d: Any numeric digit from 0 to 9.
\D: Any character that is not a numeric digit from 0 to 9.
\w: Any letter, numeric digit, or the underscore character. (Think of this as matching “word” characters.)
\W: Any character that is not a letter, numeric digit, or the underscore character.
\s: Any space, tab, or newline character. (Think of this as matching “space” characters.)
\S: Any character that is not a space, tab, or newline.

- The character class [0-5] will match only the numbers 0 to 5
- The character class [a-zA-Z0-9] will match all lowercase letters, uppercase letters, and numbers.
- By placing a caret character (^) just after the character class’s opening bracket, you can make a negative character class. A negative character class will match all the characters that are not in the character class. re.compile(r'[^aeiouAEIOU]')

- You can also use the caret symbol (^) at the start of a regex to indicate that a match must occur at the beginning of the searched text. Likewise, you can put a dollar sign ($) at the end of the regex to indicate the string must end with this regex pattern (r'^Hello' regular expression string matches strings that begin with 'Hello')

- The . (or dot) character in a regular expression is called a wildcard and will match any **one** character except for a newline.

- The dot-star (.*) matches “anything”: zero or more of any character except a newline.

- The dot-star will match everything except a newline. By passing re.DOTALL as the second argument to re.compile(), you can make the dot character match all characters, including the newline character.
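
The anchors and the re.DOTALL flag described above can be sketched as follows. The sample strings and variable names are illustrative assumptions.

```python
import re

begins_with_hello = re.compile(r'^Hello')
whole_string_digits = re.compile(r'^\d+$')

# search() returns None when there is no match
starts_ok = begins_with_hello.search('Hello world!') is not None
starts_bad = begins_with_hello.search('He said Hello.') is not None
digits_ok = whole_string_digits.search('1234567890') is not None
digits_bad = whole_string_digits.search('12345xyz67890') is not None
print(starts_ok, starts_bad, digits_ok, digits_bad)

text = 'Serve the public trust.\nProtect the innocent.'

# Without DOTALL the dot stops at the first newline
no_newline = re.compile(r'.*').search(text).group()
# With DOTALL the dot also matches the newline
with_newline = re.compile(r'.*', re.DOTALL).search(text).group()

print(repr(no_newline))
print(repr(with_newline))
```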


Summary:
The ? matches zero or one of the preceding group.

The * matches zero or more of the preceding group.

The + matches one or more of the preceding group.

The {n} matches exactly n of the preceding group.

The {n,} matches n or more of the preceding group.

The {,m} matches 0 to m of the preceding group.

The {n,m} matches at least n and at most m of the preceding group.

{n,m}? or *? or +? performs a nongreedy match of the preceding group.

^spam means the string must begin with spam.

spam$ means the string must end with spam.

The . matches any character, except newline characters.

\d, \w, and \s match a digit, word, or space character, respectively.

\D, \W, and \S match anything except a digit, word, or space character, respectively.

[abc] matches any character between the brackets (such as a, b, or c).

[^abc] matches any character that isn’t between the brackets.

---

- To make your regex case-insensitive, you can pass re.IGNORECASE or re.I as a second argument to re.compile(). 
- The sub() method for Regex objects is passed two arguments. The first argument is the replacement string for any matches. The second is the string to search with the regular expression. The sub() method returns a string with the substitutions applied.
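
A minimal sketch of case-insensitive matching combined with sub(). The pattern and sample sentences are assumptions made for the example.

```python
import re

# re.I is the short form of re.IGNORECASE
robocop = re.compile(r'robocop', re.I)

first = robocop.search('RoboCop is part man, part machine.').group()
every = robocop.findall('ROBOCOP protects; robocop serves.')
censored = robocop.sub('CENSORED', 'ROBOCOP protects; robocop serves.')

print(first)
print(every)
print(censored)
```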

In [None]:
import re

phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
mo = phoneNumRegex.search('My number is 415-555-4242.')
print('Phone number found: ' + mo.group())

# Grouping with parenthesis
phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('My number is 415-555-4242.')
mo.group(1) # '415'
mo.group(2) # '555-4242'
mo.group(0) # '415-555-4242'
mo.group() # '415-555-4242'

mo.groups() # ('415', '555-4242')
areaCode, mainNumber = mo.groups()

# Pipe
heroRegex = re.compile(r'Batman|Tina Fey')
mo1 = heroRegex.search('Batman and Tina Fey.')
mo1.group()

batRegex = re.compile(r'Bat(wo)?man')
mo1 = batRegex.search('The Adventures of Batman')
mo1.group()

batRegex = re.compile(r'Bat(wo)*man')
mo1 = batRegex.search('The Adventures of Batman')
mo1.group()


xmasRegex = re.compile(r'\d+\s\w+')
xmasRegex.findall('12 drummers, 11 pipers, 10 lords, 9 ladies, 8 maids, 7 swans, 6 geese, 5 rings, 4 birds, 3 hens, 2 doves, 1 partridge')
# ['12 drummers', '11 pipers', '10 lords', '9 ladies', '8 maids', '7 swans', '6 geese', '5 rings', '4 birds', '3 hens', '2 doves', '1 partridge']

vowelRegex = re.compile(r'[aeiouAEIOU]')
vowelRegex.findall('Robocop eats baby food. BABY FOOD.')
# ['o', 'o', 'o', 'e', 'a', 'a', 'o', 'o', 'A', 'O', 'O']

nameRegex = re.compile(r'First Name: (.*) Last Name: (.*)')
mo = nameRegex.search('First Name: Al Last Name: Sweigart')

namesRegex = re.compile(r'Agent \w+')
namesRegex.sub('CENSORED', 'Agent Alice gave the secret documents to Agent Bob.')
# 'CENSORED gave the secret documents to CENSORED.'


Regular expressions that match complicated patterns can become long and hard to read. You can mitigate this by telling the re.compile() function to ignore whitespace and comments inside the regular expression string. This “verbose mode” can be enabled by passing the variable re.VERBOSE as the second argument to re.compile().

Now instead of a hard-to-read regular expression like this:


phoneRegex = re.compile(r'((\d{3}|\(\d{3}\))?(\s|-|\.)?\d{3}(\s|-|\.)\d{4}(\s*(ext|x|ext.)\s*\d{2,5})?)')
you can spread the regular expression over multiple lines with comments like this:


phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?            # area code
    (\s|-|\.)?                    # separator
    \d{3}                         # first 3 digits
    (\s|-|\.)                     # separator
    \d{4}                         # last 4 digits
    (\s*(ext|x|ext.)\s*\d{2,5})?  # extension
    )''', re.VERBOSE)
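
To see the verbose pattern at work, here is a short sketch that compiles the same expression and searches a sample sentence. The sentence and the snake_case variable names are assumptions for illustration.

```python
import re

phone_regex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?            # area code
    (\s|-|\.)?                    # separator
    \d{3}                         # first 3 digits
    (\s|-|\.)                     # separator
    \d{4}                         # last 4 digits
    (\s*(ext|x|ext.)\s*\d{2,5})?  # extension
    )''', re.VERBOSE)

# Hypothetical sample sentence
mo = phone_regex.search('Call me at 415-555-4242 ext 123 tomorrow.')

full_match = mo.group()   # whole phone number, including the extension
area_code = mo.group(2)   # group 1 is the outer parenthesis, group 2 the area code

print(full_match)
print(area_code)
```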

## Data structures
### Lists
- Mutable object
- Can contain values of any type
- No maximum number of elements (as long as they fit in memory)
- Values can repeat
- They are ordered

#### Slices
Indices run from 0 (the first element) to -1 (the last element), and any of them can be omitted:
sublist = lista[start:stop:step]
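
A few slicing cases, as a sketch; the list of letters is an arbitrary example.

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f']

middle = letters[1:4]         # from index 1 up to, but not including, index 4
first_two = letters[:2]       # start omitted: begin at index 0
last_two = letters[-2:]       # stop omitted: go to the end
every_other = letters[::2]    # step of 2
backwards = letters[::-1]     # negative step reverses the list

print(middle, first_two, last_two, every_other, backwards)
```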

#### List Comprehension
For each value of an iterable (list, set, dict, tuple) you can:
    - filter it
    - apply a function to it

In [None]:
# Create a list with values
lista = [1, 2, 3]
# Append a value
lista.append(4)
# Get a value
primero = lista[0]

# Build a sublist
list2 = lista[1:-2]

# Iterate over the list
for valor in lista:
    print(valor)

# Iterate over a list of tuples:
l = [('a', 1), ('b', 2), ('c', 3)]
for letra, numero in l:
    print(letra, numero)

# List comprehension
l2 = [value * 2 for value in lista if value < 5]

# Print a dictionary built from a list
print({x: x**2 for x in lista if x < 5})

### Sets
- Mutable **set** of elements
- Values cannot repeat (adding an element that already exists is ignored)
- They are unordered

In [None]:
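# Create a set with values (named my_set so the built-in set() is not shadowed)
my_set = set([1, 2, 3, 1, 2])
# Add values
my_set.add(4)
my_set.add(1)  # already present, so it is ignored
print(my_set)

# Iterate over the set
for valor in my_set:
    if valor == 1:
        break
    print(valor)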


### Dictionaries
- Mutable key/value pairs
- Keys must be unique
- Each key can hold a list of values

In [None]:
d = {'a': 1, 'b': 2, 'c': 3}
print (d['a'])
d['d'] = 4
d.update({'e': 5, 'f': 6})
d['a'] = 123
print(d)

# Iterate over the dictionary by key
for key in d:
    value = d[key]
    if value < 3:
        continue
    print(key, value)

# Iterate over key/value pairs
for key, value in d.items():
    if value < 3:
        continue
    print(key, value)
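
The point that each key can hold a list of values could look like this. It is only a sketch: the owner and pet names are made up, and setdefault() is one of several ways to build such a dictionary.

```python
# Hypothetical (owner, pet) pairs
pairs = [('Alice', 'cat'), ('Bob', 'dog'), ('Alice', 'parrot')]

pets_by_owner = {}
for owner, pet in pairs:
    # setdefault() returns the existing list, or stores and returns a new empty one
    pets_by_owner.setdefault(owner, []).append(pet)

print(pets_by_owner)
print(len(pets_by_owner))  # number of keys
```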


### Tuples
- Immutable iterable similar to a list
- New elements cannot be added, and existing ones cannot be modified
- Values can repeat, and they are accessed by index

In [None]:
t1 = ()
t2 = (1, 2, 3)
print(t1)
print(t2[2])
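
A short sketch of what immutability means in practice; the tuple contents are arbitrary.

```python
t = (1, 2, 2, 3)

error_message = None
try:
    t[0] = 99  # tuples do not support item assignment
except TypeError as err:
    error_message = str(err)
    print('Cannot modify a tuple:', error_message)

# "Changing" a tuple means building a new one
extended = t + (4,)

print(t.count(2))  # values can repeat
print(extended)
```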

## Blocks
### If
if (bool1 and bool2) or (not bool2):
    print("Entered...")

The following values evaluate to False:

    None
    An empty string
    A number (integer or float) equal to 0
    An empty list/set/dictionary
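
The truthiness rules can be checked with bool(). The sketch below uses an arbitrary mix of sample values.

```python
values = [None, '', 0, 0.0, [], set(), {}, 'text', 42, [0]]

for v in values:
    print(repr(v), '->', bool(v))

falsy = [v for v in values if not v]
truthy = [v for v in values if v]

# [0] is truthy: the list is not empty, even though its only element is 0
print(truthy)
```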
    
### For
for i in range(0, 10):
    for j in range(1, 15):
        print(i, j)

- In Python, for iterates over iterables (lists, sets, dictionaries, tuples, generators, etc.)
- To repeat a loop a fixed number of times, use range(0, X), where X is the number of times the body should run. (In Python 2 the lazy equivalent was xrange(0, X).)

### While
- continue skips the rest of the loop body, but the while condition is evaluated again
- break ends the loop.
while True:
    value = funcion_super_compleja()
    if value and foobar:
        continue
    elif value:
        break
    print("This does not run when continue or break is executed")

## Functions
### Definition
- The definition does not declare whether a value is returned
- It does not declare the types of the values it receives
- It does not declare whether it raises an exception

- Python does not support function overloading (several versions of the same function that take a different number of parameters). Default parameter values are generally used for that

- *args: receives all the extra positional arguments passed to the function as a **tuple**.
- It also works when the function has regular parameters; *args collects all the extras

- \**kwargs: like *args, but as a **dictionary**; the extra arguments must be passed by name

In [None]:
def foobar(a, b, c=3, d=4):
    return a + b + c + d

print(foobar(1, 2))
print(foobar(1, 2, 3))
print(foobar(1, 2, 3, 5))
print(foobar(1, 2, d=8))
print(foobar(a=1, b=2, c=3, d=10))

def promedio(*args):
    suma = sum(args)
    cant_elems = 1.0 * len(args)
    return suma / cant_elems

promedio(1, 2, 3, 4, 5)
promedio(1, 2)

# Other parameters can also be added:
def promedio(x, y, *args):
    print(args)
    suma = sum(args) + x + y
    cant_elems = 2.0 + len(args)
    return suma / cant_elems

def foobar(x, y, **kwargs):
    print(x, y)
    print(kwargs)

foobar(1, 2, z=3, i=4)
d = {'foo': 123}
foobar(1, 2, **d)

### Useful functions
- zip: combines several lists and produces tuples. Each tuple holds one value from each list
- enumerate: yields the index and the element for each item of the list being iterated
- len: returns the number of elements in a list/set/etc. For a dictionary, it returns the number of keys
- min: returns the smallest element of a list
- max: returns the largest element
- sum: returns the sum of the elements of a list

In [None]:
l1 = [1, 2, 3, 4]
l2 = ['a', 'b', 'c', 'd']
l3 = ['+', '*', '-', '/']

for num, letra, signo in zip(l1, l2, l3):
    print (num, letra, signo)
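
The remaining functions from the list (enumerate, len, min, max, sum) can be sketched the same way. The scores list is an arbitrary example.

```python
scores = [70, 85, 92, 64]

# enumerate yields (index, element) pairs
for index, score in enumerate(scores):
    print(index, score)

indexed = list(enumerate(scores))

count = len(scores)
lowest = min(scores)
highest = max(scores)
total = sum(scores)
key_count = len({'a': 1, 'b': 2})  # for a dict, len() counts the keys

print(count, lowest, highest, total, key_count)
```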

## Generators
Used for iterables with many elements (possibly infinitely many). For example, producing a sequence of millions of elements.

A generator does not build all the elements up front; it produces each one when the next element is requested from the iterator.

A function returns a generator when it uses yield.

It has things in common with a Java iterator: once it reaches the end, it cannot be used again.

In [None]:
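# Two ways to print the elements: with a list and with a generator

def create_super_list(min, max):
    # Generator: yields one value at a time instead of building a list
    current = min
    while current < max:
        yield current
        current += 1

# With a list: every element is built in memory first (small range on purpose)
l = list(create_super_list(0, 10))
for x in l:
    print(x)

# A list can be iterated again
for y in l:
    print(y)

# More efficient: the generator produces each value only when it is requested
li2 = create_super_list(0, 10000000**10000000)
for z in li2:
    print(z)
    if z >= 10:
        break  # stop early; the full range would practically never finish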
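
The fact that a generator cannot be reused once it is exhausted can be shown with a small sketch. The function name count_up_to is hypothetical.

```python
def count_up_to(limit):
    """Yield 0, 1, ..., limit - 1, one value at a time."""
    current = 0
    while current < limit:
        yield current
        current += 1

gen = count_up_to(3)

first_pass = list(gen)    # consumes the generator
second_pass = list(gen)   # nothing left: the generator is exhausted

print(first_pass)
print(second_pass)

# To iterate again, create a new generator
fresh = list(count_up_to(3))
```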

## Lambda
Special functions that take one or more parameters and return a single value
- They cannot contain blocks: if, for, while.
- They are used a lot with the map, reduce, and filter functions

lambda argument: manipulate(argument)

**map()** is a function which takes two arguments: 
r = map(func, seq)

- The first argument func is the name of a function and the second a sequence (e.g. a list) seq. map() applies the function func to all the elements of the sequence seq



In [None]:
add = lambda x, y: x + y
print(add(3, 5))

# List sorting
a = [(1, 2), (4, 1), (9, 10), (13, -3)]
a.sort(key=lambda x: x[1])
print(a)

# Map function:
def fahrenheit(T):
    return ((float(9)/5)*T + 32)
 
temperatures = (36.5, 37, 37.5, 38, 39)
F = map(fahrenheit, temperatures)
temperatures_in_Fahrenheit = list(map(fahrenheit, temperatures))
print(temperatures_in_Fahrenheit)

# Map using Lambda:
C = [39.2, 36.5, 37.3, 38, 37.8] 
F = list(map(lambda x: (float(9)/5)*x + 32, C))
print(F)

print (list(filter(lambda val: 0 <= val < 38, C)))
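
The cell above covers map and filter; reduce is mentioned but not shown. A minimal sketch, assuming Python 3, where reduce lives in the functools module:

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5]

# reduce applies the function cumulatively: ((((1 + 2) + 3) + 4) + 5)
total = reduce(lambda acc, x: acc + x, numbers)

# An optional third argument sets the initial value of the accumulator
product = reduce(lambda acc, x: acc * x, numbers, 1)

print(total)
print(product)
```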

## Classes
- Using classes is not mandatory in Python
- Every method of a class must take the instance as its first parameter
- __init__ is similar to what other languages call the constructor
- There are no private, public, or protected attributes or methods. Everything is public. By convention, methods or attributes meant to be private or protected start with “_”

In [None]:
class Pepe:
    pass

class FooBar(Pepe):
    def __init__(self, a, b):
        super(FooBar, self).__init__()
        self._a = a
        self._b = b
    def sum(self):
        return self._a + self._b

instance = FooBar(1, 2)
print(instance.sum())

## Magic Methods
- They are methods that start and end with “__” (double underscore)
- We do not call them explicitly; Python calls them when a certain feature is used
- The most common ones are: __init__, __repr__, __str__, __eq__, __hash__, __enter__, __exit__
- A more complete list: http://www.diveintopython3.net/special-method-names.html

- __hash__ is especially important because it is used to check whether an element is already in a set or dictionary
- __eq__ is used to compare whether two instances of the class are equal

By default two objects are NOT equal. That is, if __eq__ is not defined, two objects are different even when they hold the same values

Something similar happens with __hash__. By default it differs for every instance of the same class, so in a set both instances get added
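
The default behavior versus custom __eq__ and __hash__ can be compared side by side. This is a sketch; the class names Plain and Point are hypothetical.

```python
class Plain:
    """No __eq__ or __hash__: instances compare by identity."""
    def __init__(self, value):
        self.value = value

class Point:
    """Defines __eq__ and __hash__ based on the stored values."""
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)
    def __hash__(self):
        return hash((self.x, self.y))

plain_equal = Plain(1) == Plain(1)              # different objects, so not equal
plain_set_size = len({Plain(1), Plain(1)})      # both are added to the set

point_equal = Point(1, 2) == Point(1, 2)        # equal by value
point_set_size = len({Point(1, 2), Point(1, 2)})  # the duplicate is ignored

print(plain_equal, plain_set_size, point_equal, point_set_size)
```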

In [None]:
class MyClass(object):
    def __init__(self, value):
        self.value = value
    def __add__(self, other):
        return MyClass(self.value + other.value)
    def __eq__(self, other):
        return self.value == other.value
    def __hash__(self):
        return hash(self.value)

v1 = MyClass(1)
v2 = MyClass(2)
print((v1 + v2).value)
# (v1.__add__(v2) is not called explicitly; the “+” operator is used directly. The same goes for __eq__)

# The following examples:
# Close the file automatically whether or not an exception occurred. If there is an exception, the file is closed and the exception is re-raised
# They use the __enter__ and __exit__ methods of the classes
with open('/tmp/mi_archivo.txt', 'w') as output_file:
    output_file.write('Foobar')

with open('/tmp/mi_archivo.txt') as input_file:
    for line in input_file:
        print(line)

# Databases can be used the same way (psycopg2 is a third-party package;
# DSN and SQL stand for a connection string and a query defined elsewhere)
import psycopg2
with psycopg2.connect(DSN) as conn:
    with conn.cursor() as curs:
        curs.execute(SQL)
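
To make the role of __enter__ and __exit__ concrete, here is a sketch of a hand-written context manager. The Resource class is hypothetical and only records the order in which things happen.

```python
class Resource:
    """Hypothetical resource that records when it is entered and exited."""
    def __init__(self):
        self.events = []
    def __enter__(self):
        self.events.append('enter')
        return self
    def __exit__(self, exc_type, exc_value, tb):
        # Runs even when the block raises; returning False re-raises the exception
        self.events.append('exit')
        return False

res = Resource()
try:
    with res:
        res.events.append('working')
        raise ValueError('boom')
except ValueError:
    res.events.append('handled')

print(res.events)
```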


## Help
- dir: given a variable, lists all of its methods and attributes
- help: shows the help text for a method of a variable.

In [None]:
lista = []
dir(lista)
[name for name in dir(lista) if not name.startswith('__')]
help(lista.append)

## Files
Calling **os.path.abspath(path)** will return a string of the absolute path of the argument. This is an easy way to convert a relative path into an absolute one.

Calling **os.path.isabs(path)** will return True if the argument is an absolute path and False if it is a relative path.

Calling **os.path.relpath(path, start)** will return a string of a relative path from the start path to path. If start is not provided, the current working directory is used as the start path.

Calling **os.path.getsize(path)** will return the size in bytes of the file in the path argument.

Calling **os.listdir(path)** will return a list of filename strings for each file in the path argument. (Note that this function is in the os module, not os.path.)

Calling **os.path.exists(path)** will return True if the file or folder referred to in the argument exists and will return False if it does not exist.

Calling **os.path.isfile(path)** will return True if the path argument exists and is a file and will return False otherwise.

Calling **os.path.isdir(path)** will return True if the path argument exists and is a folder and will return False otherwise.
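
A minimal sketch of the os.path calls described above, run against the current working directory so that it works on any machine. The variable names are illustrative.

```python
import os

absolute = os.path.abspath('.')          # absolute path of the current directory

print(os.path.isabs(absolute))           # an absolute path
print(os.path.isabs('.'))                # a relative path

# Path of the current directory relative to its parent
relative = os.path.relpath(absolute, os.path.dirname(absolute))
print(relative)

exists = os.path.exists(absolute)
is_dir = os.path.isdir(absolute)
is_file = os.path.isfile(absolute)
print(exists, is_dir, is_file)

# listdir() lives in os, not os.path
entries = os.listdir(absolute)
print(len(entries), 'entries')
```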


### There are three steps to reading or writing files in Python.

1. Call the **open()** function to return a File object. Modes: a, w, r, +

2. Call the **read()** or **write()** method on the File object.

3. Close the file by calling the **close()** method on the File object.
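
The three steps can be sketched as follows. A temporary folder from the tempfile module is used here, as an assumption, so the example does not depend on any particular path.

```python
import os
import shutil
import tempfile

folder = tempfile.mkdtemp()               # throwaway folder for the example
path = os.path.join(folder, 'hello.txt')

# Write mode ('w') creates or overwrites the file
f = open(path, 'w')                       # 1. open
f.write('Hello world!\n')                 # 2. write
f.close()                                 # 3. close

# Append mode ('a') adds to the end of the file
f = open(path, 'a')
f.write('Bacon is not a vegetable.\n')
f.close()

# Read mode ('r') is the default
f = open(path)
content = f.read()
f.close()

print(content)
shutil.rmtree(folder)                     # clean up the temporary folder
```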


### Shelve Module
You can save variables in your Python programs to binary shelf files using the shelve module. This way, your program can restore data to variables from the hard drive. The shelve module will let you add Save and Open features to your program. For example, if you ran a program and entered some configuration settings, you could save those settings to a shelf file and then have the program load them the next time it is run.

On Windows, saving a shelf named mydata creates three new files in the current working directory: mydata.bak, mydata.dat, and mydata.dir. On OS X, only a single mydata.db file will be created.

Your programs can use the shelve module to later reopen and retrieve the data from these shelf files. 

In [None]:
import os

os.getcwd() # Get current working directory
os.chdir('C:\\Windows\\System32') # Change working directory
# os.makedirs('C:\\path to dir\\name of dir')

os.path.basename(os.getcwd())
os.path.dirname(os.getcwd())
os.path.split(os.getcwd())

'/usr/bin'.split(os.path.sep) # Linux : ['', 'usr', 'bin']

helloFile = open('C:\\Users\\your_home_folder\\hello.txt', 'a+')

helloFile.seek(0) # 'a+' opens the file positioned at the end; rewind before reading
helloContent = helloFile.read()
print(helloContent)

helloFile.write('Bacon is not a vegetable.')
helloFile.close()

sonnetFile = open('sonnet29.txt')
sonnetFile.readlines() # returns the lines as a list of strings
sonnetFile.close()

# Shelve module
import shelve
shelfFile = shelve.open('mydata')
cats = ['Zophie', 'Pooka', 'Simon']
shelfFile['cats'] = cats
shelfFile.close()

shelfFile = shelve.open('mydata')
type(shelfFile)
# <class 'shelve.DbfilenameShelf'>
shelfFile['cats']
# ['Zophie', 'Pooka', 'Simon']
list(shelfFile.keys())
# ['cats']
list(shelfFile.values())
# [['Zophie', 'Pooka', 'Simon']]
shelfFile.close()


### Shutil
The shutil (or shell utilities) module has functions to let you copy, move, rename, and delete files in your Python programs.

Calling **shutil.copy(source, destination)** will copy the file at the path source to the folder at the path destination. (Both source and destination are strings.) If destination is a filename, it will be used as the new name of the copied file. This function returns a string of the path of the copied file.

**shutil.copytree()** will copy an entire folder and every folder and file contained in it. Calling shutil.copytree(source, destination) will copy the folder at the path source, along with all of its files and subfolders, to the folder at the path destination.

Calling **shutil.move(source, destination)** will move the file or folder at the path source to the path destination and will return a string of the absolute path of the new location. If destination points to a folder, the source file gets moved into destination and keeps its current filename. Since it’s easy to accidentally overwrite files in this way, you should take some care when using move(). The destination path can also specify a filename, in which case the source file is moved and renamed.

#### Deleting files and folders
Calling **os.unlink(path)** will delete the file at path.

Calling **os.rmdir(path)** will delete the folder at path. This folder must be empty of any files or folders.

Calling **shutil.rmtree(path)** will remove the folder at path, and all files and folders it contains will also be deleted.

Since Python’s built-in shutil.rmtree() function irreversibly deletes files and folders, it can be dangerous to use. A much better way to delete files and folders is with the third-party **send2trash** module. It will send folders and files to your computer’s trash or recycle bin instead of permanently deleting them.
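
The standard-library calls from this section can be tried safely inside a temporary folder. The sketch below assumes made-up file names (spam.txt, eggs.txt) and does not use send2trash.

```python
import os
import shutil
import tempfile

root = tempfile.mkdtemp()                 # throwaway folder for the example
src = os.path.join(root, 'spam.txt')
with open(src, 'w') as f:
    f.write('spam')

backup_dir = os.path.join(root, 'backup')
os.mkdir(backup_dir)

# Destination is a folder: the copy keeps the original filename
copied = shutil.copy(src, backup_dir)
copied_exists = os.path.exists(copied)

# Destination is a filename: the file is moved and renamed
renamed = shutil.move(src, os.path.join(root, 'eggs.txt'))
source_gone = not os.path.exists(src)

os.unlink(renamed)                        # delete a single file
shutil.rmtree(root)                       # delete the folder and everything in it

print(copied, renamed)
```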

#### Go through files in a tree
The **os.walk()** function is passed a single string value: the path of a folder. You can use os.walk() in a for loop statement to walk a directory tree, much like how you can use the range() function to walk over a range of numbers. Unlike range(), the os.walk() function will return three values on each iteration through the loop:
    1. A string of the current folder’s name
    2. A list of strings of the folders in the current folder
    3. A list of strings of the files in the current folder
    

In [None]:
import os

for folderName, subfolders, filenames in os.walk(os.getcwd()):
    print('The current folder is ' + folderName)

    for subfolder in subfolders:
        print('SUBFOLDER OF ' + folderName + ': ' + subfolder)
    for filename in filenames:
        print('FILE INSIDE ' + folderName + ': '+ filename)

    print('')

#### Zip Files
Your Python programs can both create and open (or extract) ZIP files using functions in the **zipfile** module.

A ZipFile object has a namelist() method that returns a list of strings for all the files and folders contained in the ZIP file. These strings can be passed to the getinfo() ZipFile method to return a ZipInfo object about that particular file. ZipInfo objects have their own attributes, such as file_size and compress_size in bytes, which hold integers of the original file size and compressed file size, respectively. While a ZipFile object represents an entire archive file, a ZipInfo object holds useful information about a single file in the archive.

The extractall() method for ZipFile objects extracts all the files and folders from a ZIP file into the current working directory. You can pass a folder name to extractall() to have it extract the files into a folder other than the current working directory. If the folder passed to the extractall() method does not exist, it will be created. 
The extract() method for ZipFile objects will extract a single file from the ZIP file.

To create your own compressed ZIP files, you must open the ZipFile object in write mode by passing 'w' as the second argument. (This is similar to opening a text file in write mode by passing 'w' to the open() function.)

When you pass a path to the write() method of a ZipFile object, Python will compress the file at that path and add it into the ZIP file. The write() method’s first argument is a string of the filename to add. The compress_type keyword argument is the compression type parameter, which tells the computer what algorithm it should use to compress the files; you can always just set this value to zipfile.ZIP_DEFLATED. If you want to simply add files to an existing ZIP file, pass 'a' as the second argument to zipfile.ZipFile() to open the ZIP file in append mode.

In [None]:
import zipfile, os
os.chdir("D:\\Documents\\GitHub\\playground\\projects\\python\\automating-boring-stuff\\resources")    # move to the folder with example.zip
exampleZip = zipfile.ZipFile('example.zip')
print(exampleZip.namelist())
#  ['spam.txt', 'cats/', 'cats/catnames.txt', 'cats/zophie.jpg']
spamInfo = exampleZip.getinfo('spam.txt')
print(spamInfo.file_size)
#   13908
print(spamInfo.compress_size)
#   3828
print('Compressed file is %sx smaller!' % (round(spamInfo.file_size / spamInfo.compress_size, 2)))
#   'Compressed file is 3.63x smaller!'
exampleZip.extractall()
exampleZip.extract('spam.txt', '.\\folder')
exampleZip.close()

newZip = zipfile.ZipFile('new.zip', 'w')
newZip.write('spam.txt', compress_type=zipfile.ZIP_DEFLATED)
newZip.close()
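
The cell above depends on an existing example.zip. As a self-contained sketch, the following creates a ZIP file in a temporary folder, appends to it, inspects it, and extracts it. File names and contents are assumptions made for the example.

```python
import os
import shutil
import tempfile
import zipfile

folder = tempfile.mkdtemp()
spam_path = os.path.join(folder, 'spam.txt')
eggs_path = os.path.join(folder, 'eggs.txt')
with open(spam_path, 'w') as f:
    f.write('spam ' * 1000)               # repetitive text compresses well
with open(eggs_path, 'w') as f:
    f.write('eggs')

zip_path = os.path.join(folder, 'new.zip')

# Write mode creates the archive; arcname is the name stored inside the ZIP
new_zip = zipfile.ZipFile(zip_path, 'w')
new_zip.write(spam_path, arcname='spam.txt', compress_type=zipfile.ZIP_DEFLATED)
new_zip.close()

# Append mode adds to the existing archive
new_zip = zipfile.ZipFile(zip_path, 'a')
new_zip.write(eggs_path, arcname='eggs.txt', compress_type=zipfile.ZIP_DEFLATED)
new_zip.close()

# Read the archive back
archive = zipfile.ZipFile(zip_path)
names = archive.namelist()
info = archive.getinfo('spam.txt')
original_size, compressed_size = info.file_size, info.compress_size
out_dir = os.path.join(folder, 'extracted')
archive.extractall(out_dir)               # the folder is created if missing
archive.close()

extracted = sorted(os.listdir(out_dir))
print(names, original_size, compressed_size, extracted)
shutil.rmtree(folder)                     # clean up
```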

## Concurrency
Python code effectively runs on a single core: in CPython, only one thread executes Python bytecode at a time

If you use threads, they run inside the main process, and only one of them executes Python code at any moment. To work around this, **multiprocessing** is used: it starts a separate process (via fork on Unix-like systems), which can therefore run on another core

This single-core limitation is due to the GIL (Global Interpreter Lock)

In [None]:
from multiprocessing import Process

def f(max):
    for i in range(0, max):
        print(i)

if __name__ == '__main__':
    p = Process(target=f, args=(100,))
    p.start()
    p.join()

## Exceptions
- try: exceptions raised by anything inside the block will be handled
- except: handles a specific type of error. The except block can handle the exception or re-raise it so something else handles it. If nothing handles it by the program's entry point, the program terminates with an error.
    - pass: ignores the error
    - raise: raises an exception
- finally: runs whether or not an error occurred. It is mostly used to close files or database connections. finally runs even if the except block raises an exception. It always runs.

Exceptions are raised with a raise statement. In code, a raise statement consists of the following:
- The raise keyword
- A call to the Exception() function
- A string with a helpful error message passed to the Exception() function

When Python encounters an error, it produces a treasure trove of error information called the **traceback**. The traceback includes the error message, the line number of the line that caused the error, and the sequence of the function calls that led to the error. This sequence of calls is called the call **stack**.

**traceback.format_exc()**: This function is useful if you want the information from an exception’s traceback but also want an except statement to gracefully handle the exception. You will need to import Python’s traceback module before calling this function.

An **assertion** is a sanity check to make sure your code isn’t doing something obviously wrong. These sanity checks are performed by assert statements. If the sanity check fails, then an AssertionError exception is raised. In code, an assert statement consists of the following:
- The assert keyword
- A condition (that is, an expression that evaluates to True or False)
- A comma
- A string to display when the condition is False

Example: assert podBayDoorStatus == 'open', 'The pod bay doors need to be "open".'

Assertions can be disabled by passing the -O option when running Python. This is good for when you have finished writing and testing your program and don’t want it to be slowed down by performing sanity checks (although most of the time assert statements do not cause a noticeable speed difference). Assertions are for development, not the final product. 
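
A small sketch of a failing assertion being caught. The helper check_door is hypothetical, and the example assumes Python is not run with -O (which would skip the assert).

```python
def check_door(status):
    """Hypothetical helper: the assert fails unless status is 'open'."""
    assert status == 'open', 'The pod bay doors need to be "open".'
    return True

message = None
try:
    check_door("I'm sorry, Dave. I'm afraid I can't do that.")
except AssertionError as err:
    message = str(err)
    print('AssertionError:', message)

door_ok = check_door('open')  # passes silently and returns True
```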

In [None]:
f = None
try:
    f = open('/etc/passwd', 'w')
    f.write("Do not run this as root")
except OSError:
    # OSError covers FileNotFoundError and PermissionError (the likely one for a non-root user)
    print("Could not open the file")
    raise
finally:
    if f:
        f.close()

In [None]:
def boxPrint(symbol, width, height):
    if len(symbol) != 1:
        raise Exception('Symbol must be a single character string.')
    if width <= 2:
        raise Exception('Width must be greater than 2.')
    if height <= 2:
        raise Exception('Height must be greater than 2.')
    print(symbol * width)
    for i in range(height - 2):
        print(symbol + (' ' * (width - 2)) + symbol)
    print(symbol * width)

for sym, w, h in (('*', 4, 4), ('O', 20, 5), ('x', 1, 3), ('ZZ', 3, 3)):
    try:
        boxPrint(sym, w, h)
    except Exception as err:
        print('An exception happened: ' + str(err))

In [None]:
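import os
import traceback
print(os.getcwd())  # errorInfo.txt is created in this directory
try:
    raise Exception('This is the error message.')
except Exception:
    # Save the full traceback to a file instead of letting the program crash
    with open('errorInfo.txt', 'w') as errorFile:
        errorFile.write(traceback.format_exc())
    print('The traceback info was written to errorInfo.txt.')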

## Logging
You start by creating a logger. It can be given an explicit name; by convention the module's name is used (logging.getLogger(__name__)), and calling logging.getLogger() with no name returns the root logger.

A logger has several levels:

DEBUG - logging.debug(): The lowest level. Used for small details. Usually you care about these messages only when diagnosing problems.

INFO - logging.info(): Used to record information on general events in your program or confirm that things are working at their point in the program.

WARNING - logging.warning(): Used to indicate a potential problem that doesn’t prevent the program from working but might do so in the future.

ERROR - logging.error(): Used to record an error that caused the program to fail to do something.

CRITICAL - logging.critical(): The highest level. Used to indicate a fatal error that has caused or is about to cause the program to stop running entirely.

- The logger can send messages to a file, standard output, standard error, email, and so on.
- The basic setup (logging.basicConfig()) configures logging to write either to a stream (standard error by default) or to a file.
- Messages you do not want logged can be hidden. For example, you may want them logged in development but not in production.
- Output can go to a file or to the console depending on where the program is running.
- This means you do not have to comment out code for production.
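
A minimal sketch of sending messages to a chosen destination and filtering by level. The logger name `demo.app` is arbitrary, and an in-memory `io.StringIO` buffer stands in for a file or the console so the example leaves nothing behind.

```python
import io
import logging

# 'demo.app' is an arbitrary logger name chosen for this sketch.
logger = logging.getLogger('demo.app')
logger.setLevel(logging.DEBUG)      # the logger itself lets everything through
logger.propagate = False            # keep the root logger out of this example

buffer = io.StringIO()              # stands in for a file or stderr
handler = logging.StreamHandler(buffer)
handler.setLevel(logging.WARNING)   # this destination only wants WARNING and up
handler.setFormatter(logging.Formatter('%(levelname)s - %(message)s'))
logger.addHandler(handler)

logger.debug('visible only while developing')   # filtered out by the handler
logger.warning('disk is almost full')

print(buffer.getvalue())
# WARNING - disk is almost full
```

Changing the handler's level (or swapping the handler) is enough to change what is logged; the logging calls in the code stay untouched.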

The logging.disable() function suppresses log messages so that you don’t have to go into your program and remove all the logging calls by hand. You simply pass logging.disable() a logging level, and it will suppress all log messages at that level or lower. So if you want to disable logging entirely, just add logging.disable(logging.CRITICAL) to your program.
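
A short sketch of that behaviour, using an arbitrary logger name (`demo.disable`) and an in-memory buffer so the effect can be inspected. Passing `logging.NOTSET` afterwards turns logging back on.

```python
import io
import logging

logger = logging.getLogger('demo.disable')   # arbitrary name for this sketch
logger.setLevel(logging.DEBUG)
logger.propagate = False
buffer = io.StringIO()
logger.addHandler(logging.StreamHandler(buffer))

logger.error('before disable')
logging.disable(logging.CRITICAL)    # suppress CRITICAL and everything below it
logger.critical('while disabled')    # not recorded
logging.disable(logging.NOTSET)      # undo: re-enable all levels
logger.error('after re-enable')

print(buffer.getvalue())
# before disable
# after re-enable
```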

Instead of displaying the log messages to the screen, you can write them to a text file. The logging.basicConfig() function takes a filename keyword argument, like so:

logging.basicConfig(filename='myProgramLog.txt', level=logging.DEBUG, format='%(asctime)s - %(levelname)s - %(message)s')
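
A runnable sketch of the same call, assuming Python 3.8 or later for the `force=True` argument. The log file is placed in a temporary directory (an assumption made here so the example does not clutter the project folder) and read back to show what was written.

```python
import logging
import os
import tempfile

# Write to a temporary directory so the sketch does not clutter the project.
log_path = os.path.join(tempfile.mkdtemp(), 'myProgramLog.txt')

logging.basicConfig(
    filename=log_path,
    level=logging.DEBUG,
    format='%(asctime)s - %(levelname)s - %(message)s',
    force=True,   # Python 3.8+: replace handlers set up by an earlier basicConfig()
)
logging.debug('Start of program')
logging.info('Something worth recording')
logging.shutdown()   # flush and close the file handler

with open(log_path) as log_file:
    lines = log_file.read().splitlines()
print(lines)
```

Each line in the file starts with a timestamp, followed by the level name and the message.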

In [None]:
import logging
logging.basicConfig(level=logging.DEBUG, format=' %(asctime)s - %(levelname)s - %(message)s')
logging.debug('Start of program')

def factorial(n):
    logging.debug('Start of factorial(%s)' % (n))
    total = 1
    for i in range(1, n + 1):  # start at 1; range(n + 1) would multiply by 0
        total *= i
        logging.debug('i is ' + str(i) + ', total is ' + str(total))
    logging.debug('End of factorial(%s)' % (n))
    return total

print(factorial(5))
logging.debug('End of program')

In [None]:
import logging
import socket

# Log differently depending on where the code is running.
# basicConfig() does nothing if the root logger already has handlers,
# so it is called only once here.
if socket.gethostname() == 'AR-IT01119':
    logging.basicConfig(level=logging.DEBUG)
else:
    logging.basicConfig(level=logging.WARNING, filename='/tmp/file.txt')

logger = logging.getLogger()

def foo(a, b, c):
    res = a + b + c
    # Values are passed as arguments, so formatting only happens if the message is emitted
    logger.debug('%s %s %s %s', a, b, c, res)
    return res

def super_recomplicada(a, b):
    logger.debug('The values are: %s, %s', a, b)
    if not b:
        logger.warning('The value of b was 0, so there is no result')
        return None
    return (a - b)

# Logging with exceptions
def division(a, b):
    return a / b

try:
    division(1, 0)
except Exception:
    # logger.exception() logs at ERROR level and appends the traceback
    logger.exception('An error occurred')


In [None]:
import logging
logging.basicConfig(level=logging.DEBUG, format=' %(asctime)s - %(levelname)s - %(message)s')
logging.debug('Some debugging details.')
# 2015-05-18 19:04:26,901 - DEBUG - Some debugging details.
logging.info('The logging module is working.')
# 2015-05-18 19:04:35,569 - INFO - The logging module is working.
logging.warning('An error message is about to be logged.')
# 2015-05-18 19:04:56,843 - WARNING - An error message is about to be logged.
logging.error('An error has occurred.')
# 2015-05-18 19:05:07,737 - ERROR - An error has occurred.
logging.critical('The program is unable to recover!')
# 2015-05-18 19:05:45,794 - CRITICAL - The program is unable to recover!

## Estructuración de proyectos
/tmp/foo 
    setup.py
    /src
        a.py
        /B
            foo.py
            __init__.py
        __init__.py

- The folder name src is not mandatory. It is often given the same name as the project being worked on.
- a.py is a module, B is a package, and foo.py is a module of package B.
- *.py files are modules.
- Folders that contain an __init__.py file are packages.
- For Python files to be importable from elsewhere, their file names must not contain spaces.

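When you run import foo, Python does the following:
- If foo has already been imported, it reuses the cached module from sys.modules.
- Otherwise it searches the directories in sys.path in order. The first entry is the directory of the script being run (or the current directory in an interactive session), so modules and packages (folders with an __init__.py file) located there are found first.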
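
A minimal sketch of that lookup. It builds a throwaway package in a temporary directory with the same shape as the tree above; the package name `demo_pkg` is hypothetical and chosen only to avoid clashing with a real module.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway project: <root>/demo_pkg/__init__.py and <root>/demo_pkg/B/foo.py
root = tempfile.mkdtemp()
package_dir = os.path.join(root, 'demo_pkg', 'B')
os.makedirs(package_dir)
for folder in (os.path.join(root, 'demo_pkg'), package_dir):
    # An __init__.py file marks each folder as a package.
    open(os.path.join(folder, '__init__.py'), 'w').close()
with open(os.path.join(package_dir, 'foo.py'), 'w') as module_file:
    module_file.write('ANSWER = 42\n')

# Python can only find demo_pkg once its parent directory is on sys.path.
sys.path.insert(0, root)
importlib.invalidate_caches()
foo = importlib.import_module('demo_pkg.B.foo')

print(foo.ANSWER)                        # 42
print('demo_pkg.B.foo' in sys.modules)   # True: cached for later imports
```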

### setup.py
- It can list all of the project's dependencies. (*)
- It is used to put the project on sys.path.
- It also holds extra information such as a description of the package and its developers.

setup.py describes all of the metadata about your project. There are quite a few fields you can add to a project to give it a rich set of metadata describing the project. However, there are only three required fields: name, version, and packages. The name field must be unique if you wish to publish your package on the Python Package Index (PyPI). The version field keeps track of different releases of the project. The packages field describes where you’ve put the Python source code within your project.


### Paquetes comunes
- click: for building command-line commands and parsing their options and arguments
- requests: for making HTTP requests to web pages
- fabric: for connecting over SSH and running commands from Python
- pyquery: for parsing HTML
- splinter: for controlling the browser from Python

In [None]:
# SETUP.PY
from setuptools import setup  # distutils was removed from the standard library in Python 3.12

setup(
    name='TowelStuff',
    version='0.1dev',
    packages=['towelstuff',],
    license='Creative Commons Attribution-Noncommercial-Share Alike license',
    long_description=open('README.txt').read(),
)

#### Command line arguments
The command line arguments will be stored in the variable sys.argv. The first item in the sys.argv list should always be a string containing the program’s filename ('name.py'), and the second item should be the first command line argument. 

In [None]:
import sys
if len(sys.argv) < 2:
    print('Usage: python pw.py [account] - copy account password')
    sys.exit()


# Web scraping
Modules:
- webbrowser. Comes with Python and opens a browser to a specific page.
- Requests. Downloads files and web pages from the Internet.
- Beautiful Soup. Parses HTML, the format that web pages are written in.
- Selenium. Launches and controls a web browser. Selenium is able to fill in forms and simulate mouse clicks in this browser.

---

The **webbrowser** module’s open() function can launch a new browser to a specified URL.

The **requests** module lets you easily download files from the Web without having to worry about complicated issues such as network errors, connection problems, and data compression.

The requests.get() function takes a string of a URL to download. By calling type() on requests.get()’s return value, you can see that it returns a Response object, which contains the response that the web server gave for your request.

One way to check for success is to call the raise_for_status() method on the Response object. This will raise an exception if there was an error downloading the file and will do nothing if the download succeeded.

You can save the web page to a file on your hard drive with the standard open() function and write() method. There are some slight differences, though. First, you must open the file in write binary mode by passing the string 'wb' as the second argument to open(). Even if the page is in plaintext (such as the Romeo and Juliet text you downloaded earlier), you need to write binary data instead of text data in order to maintain the Unicode encoding of the text.

The iter_content() method returns “chunks” of the content on each iteration through the loop. Each chunk is of the bytes data type, and you get to specify how many bytes each chunk will contain. One hundred thousand bytes is generally a good size, so pass 100000 as the argument to iter_content().

The write() method returns the number of bytes written to the file. In the previous example, there were 100,000 bytes in the first chunk, and the remaining part of the file needed only 78,981 bytes.
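
The chunked write can be sketched without a network connection. In the example below, `iter_chunks` is a hypothetical stand-in for `Response.iter_content()`, and the payload is dummy data with the same size as the download described above.

```python
import os
import tempfile

def iter_chunks(data, chunk_size):
    # Stand-in for Response.iter_content(): yields successive byte chunks.
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]

payload = b'x' * 178981   # dummy bytes, same size as the download described above
path = os.path.join(tempfile.mkdtemp(), 'download.bin')

written = []
with open(path, 'wb') as out_file:       # 'wb': write the bytes unchanged
    for chunk in iter_chunks(payload, 100000):
        written.append(out_file.write(chunk))   # write() returns the byte count

print(written)                  # [100000, 78981]
print(os.path.getsize(path))    # 178981
```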


#### Beautiful Soup
Beautiful Soup is a module for extracting information from an HTML page (and is much better for this purpose than regular expressions). The module is imported under the name bs4. Beautiful Soup examples will parse (that is, analyze and identify the parts of) an HTML file on the hard drive.

The bs4.BeautifulSoup() function needs to be called with a string containing the HTML it will parse. It returns a BeautifulSoup object.

You can retrieve a web page element from a BeautifulSoup object by calling the select() method and passing a string of a CSS selector for the element you are looking for. Selectors are like regular expressions: They specify a pattern to look for, in this case, in HTML pages instead of general text strings.

soup.select('div'): All elements named <div>
soup.select('#author'): The element with an id attribute of author
soup.select('.notice'): All elements that use a CSS class attribute named notice
soup.select('div span'): All elements named <span> that are within an element named <div>
soup.select('div > span'): All elements named <span> that are directly within an element named <div>, with no other element in between
soup.select('input[name]'): All elements named <input> that have a name attribute with any value
soup.select('input[type="button"]'): All elements named <input> that have an attribute named type with value button
    
The select() method will return a list of Tag objects, which is how Beautiful Soup represents an HTML element. The list will contain one Tag object for every match in the BeautifulSoup object’s HTML. Tag values can be passed to the str() function to show the HTML tags they represent. Tag values also have an attrs attribute that shows all the HTML attributes of the tag as a dictionary. 

The get() method for Tag objects makes it simple to access attribute values from an element. The method is passed a string of an attribute name and returns that attribute’s value.
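
The selectors above can be tried on a small inline page. The HTML below is made up for this sketch, and the built-in 'html.parser' backend is used so that only the bs4 package needs to be installed.

```python
import bs4

# A small made-up page, so the sketch runs without a network connection.
html = """
<html><body>
  <div id="author" class="notice">Written by <span>Al</span></div>
  <div><p><span>nested deeper</span></p></div>
  <input name="q" type="text"><input type="button" value="Go">
</body></html>
"""
soup = bs4.BeautifulSoup(html, 'html.parser')

print(len(soup.select('div')))                        # 2
print(soup.select('#author')[0].get('class'))         # ['notice']
print(len(soup.select('div span')))                   # 2 (any depth inside a div)
print(len(soup.select('div > span')))                 # 1 (direct children only)
print(len(soup.select('input[name]')))                # 1
print(soup.select('input[type="button"]')[0].attrs)   # {'type': 'button', 'value': 'Go'}
print(soup.select('#author')[0].get('missing'))       # None for an absent attribute
```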

In [None]:
import requests
res = requests.get('https://automatetheboringstuff.com/files/rj.txt')
type(res)
# <class 'requests.models.Response'>
res.status_code == requests.codes.ok
# True
try:
    res.raise_for_status()
except Exception as exc:
    print('There was a problem: %s' % (exc))
len(res.text)
# 178981
print(res.text[:250])

# Write to disk
playFile = open('RomeoAndJuliet.txt', 'wb')
for chunk in res.iter_content(100000):
    playFile.write(chunk)
playFile.close()

In [None]:
import requests, bs4
res = requests.get('http://nostarch.com')
res.raise_for_status()
noStarchSoup = bs4.BeautifulSoup(res.text, "lxml")
type(noStarchSoup)
elems = noStarchSoup.select('div')
print(type(elems))
print(len(elems))
print(type(elems[0]))
print(elems[0].getText())
print(str(elems[0]))
print(elems[0].attrs)
print(elems[0].get('id'))

#### Selenium

The selenium module lets Python directly control the browser by programmatically clicking links and filling in login information, almost as though there is a human user interacting with the page. Selenium allows you to interact with web pages in a much more advanced way than Requests and Beautiful Soup; but because it launches a web browser, it is a bit slower and hard to run in the background if, say, you just need to download some files from the Web.

Instead of import selenium, you need to run from selenium import webdriver.
 
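WebDriver objects have quite a few methods for finding elements on a page. They are divided into the find_element_* and find_elements_* methods. The find_element_* methods return a single WebElement object, representing the first element on the page that matches your query. The find_elements_* methods return a list of WebElement objects for every matching element on the page. The find_element_by_* and find_elements_by_* names listed below are the legacy spelling, which was removed in Selenium 4.3; the equivalent current calls are find_element(By.X, value) and find_elements(By.X, value), with By imported from selenium.webdriver.common.by.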


browser.find_elements_by_class_name(name): Elements that use the CSS class name

browser.find_elements_by_css_selector(selector): Elements that match the CSS selector

browser.find_elements_by_id(id): Elements with a matching id attribute value

browser.find_elements_by_link_text(text): <a> elements that completely match the text provided

browser.find_elements_by_partial_link_text(text): <a> elements that contain the text provided

browser.find_elements_by_name(name): Elements with a matching name attribute value

browser.find_elements_by_tag_name(name): Elements with a matching tag name (case insensitive; an <a> element is matched by 'a' and 'A')
    
##### WebElement Attributes and Methods

tag_name: The tag name, such as 'a' for an <a> element

get_attribute(name): The value of the element’s attribute with the given name

text: The text within the element, such as 'hello' in <span>hello</span>

clear(): For text field or text area elements, clears the text typed into it

is_displayed(): Returns True if the element is visible; otherwise returns False

is_enabled(): For input elements, returns True if the element is enabled; otherwise returns False

is_selected(): For checkbox or radio button elements, returns True if the element is selected; otherwise returns False

location: A dictionary with keys 'x' and 'y' for the position of the element in the page


WebElement objects returned from the find_element_* and find_elements_* methods have a click() method that simulates a mouse click on that element. This method can be used to follow a link, make a selection on a radio button, click a Submit button, or trigger whatever else might happen when the element is clicked by the mouse. 

Sending keystrokes to text fields on a web page is a matter of finding the <input> or <textarea> element for that text field and then calling the send_keys() method. 
    
Selenium has a module for keyboard keys that are impossible to type into a string value, which function much like escape characters. These values are stored in attributes in the selenium.webdriver.common.keys module. Since that is such a long module name, it’s much easier to run from selenium.webdriver.common.keys import Keys at the top of your program; if you do, then you can simply write Keys anywhere you’d normally have to write selenium.webdriver.common.keys. The commonly used Keys variables are listed below.


Keys.DOWN, Keys.UP, Keys.LEFT, Keys.RIGHT - The keyboard arrow keys
Keys.ENTER, Keys.RETURN - The ENTER and RETURN keys
Keys.HOME, Keys.END, Keys.PAGE_DOWN, Keys.PAGE_UP - The home, end, pagedown, and pageup keys
Keys.ESCAPE, Keys.BACK_SPACE, Keys.DELETE - The ESC, BACKSPACE, and DELETE keys
Keys.F1, Keys.F2,..., Keys.F12 - The F1 to F12 keys at the top of the keyboard
Keys.TAB - The TAB key


Selenium can simulate clicks on various browser buttons as well through the following methods:
browser.back(). Clicks the Back button.
browser.forward(). Clicks the Forward button.
browser.refresh(). Clicks the Refresh/Reload button.
browser.quit(). Clicks the Close Window button.

In [None]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

browser = webdriver.Firefox()
print(type(browser))
# <class 'selenium.webdriver.firefox.webdriver.WebDriver'>
browser.get('http://inventwithpython.com')
try:
    # find_element(By.X, value) replaces find_element_by_x(), removed in Selenium 4.3
    elem = browser.find_element(By.CLASS_NAME, 'bookcover')
    print('Found <%s> element with that class name!' % (elem.tag_name))
except Exception:
    print('Was not able to find an element with that name.')
# Found <img> element with that class name!

linkElem = browser.find_element(By.LINK_TEXT, 'Read It Online')
print(type(linkElem))
# <class 'selenium.webdriver.remote.webelement.WebElement'>

linkElem.click() # follows the "Read It Online" link

# Fill in and submit a login form
browser.get('https://mail.yahoo.com')
emailElem = browser.find_element(By.ID, 'login-username')
emailElem.send_keys('not_my_real_email')
passwordElem = browser.find_element(By.ID, 'login-passwd')
passwordElem.send_keys('12345')
passwordElem.submit()

# Send special keys to the page
browser = webdriver.Firefox()
browser.get('http://nostarch.com')
htmlElem = browser.find_element(By.TAG_NAME, 'html')
htmlElem.send_keys(Keys.END)     # scrolls to bottom
htmlElem.send_keys(Keys.HOME)    # scrolls to top

# Excel Documents



In [None]:
import openpyxl
from openpyxl.utils import get_column_letter, column_index_from_string

# Reading an existing workbook
wb = openpyxl.load_workbook('example.xlsx')
wb.sheetnames                    # replaces the deprecated get_sheet_names()
#['Sheet1', 'Sheet2', 'Sheet3']
sheet = wb['Sheet3']             # replaces the deprecated get_sheet_by_name()
sheet.title
anotherSheet = wb.active
sheet['A1'].value
c = sheet['B1']                  # any Cell object; B1 is chosen here for illustration
'Row ' + str(c.row) + ', Column ' + c.column_letter + ' is ' + str(c.value)
'Cell ' + c.coordinate + ' is ' + str(c.value)
for i in range(1, 8, 2):
    print(i, sheet.cell(row=i, column=2).value)

sheet.max_row
sheet.max_column

get_column_letter(27)
# 'AA'

column_index_from_string('AA')
# 27

tuple(sheet['A1':'C3'])
for rowOfCellObjects in sheet['A1':'C3']:
    for cellObj in rowOfCellObjects:
        print(cellObj.coordinate, cellObj.value)
    print('--- END OF ROW ---')

# sheet.columns is a generator in current openpyxl, so convert it before indexing
for cellObj in list(sheet.columns)[1]:
    print(cellObj.value)

# Creating and editing a workbook
wb = openpyxl.Workbook()
sheet = wb.active
sheet.title
#'Sheet'
sheet.title = 'Spam Bacon Eggs Sheet'
wb.save('example_copy.xlsx')
wb.create_sheet()
wb.create_sheet(index=0, title='First Sheet')
wb.create_sheet(index=2, title='Middle Sheet')   # added so the removal below has a target
wb.remove(wb['Middle Sheet'])    # replaces the deprecated remove_sheet()
sheet['A1'] = 'Hello world!'