# Python basics

Here are some links for Python beginners:

* https://pythonschool.net/category/basics.html

* http://www.practicepython.org/exercises/

* http://www.datacamp.com/

## Collection data types (data structures)

Containers are data structures that hold elements and support membership tests. You can think of them as real-life containers: a box, a house, etc. They live in memory and typically hold all their values in memory too. The most common ones are six built-in types plus two standard library modules:

* String
* List
* Tuple
* Set
* Frozenset
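* Dict (mapping type, similar to a Java HashMap)
* Collections (standard library module that provides extra container types such as Counter)
* Heapq (standard library module that implements a heap queue on top of a list)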
	
The most important reasons for choosing one implementation over the rest are: 

* Do we need to keep the given order?
* Does the data structure need to be changed after creation?
* Do we need unique elements, or can we accept duplicates?
* Is performance relevant? Do we need fast membership checks? Is the dataset very large?
* Do we need to sort the elements in some way?
* Do we need to work with the data itself, or can we use indexes to gain performance?
* What kind of data will the collection hold: homogeneous or heterogeneous?
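As a rough illustration of how these questions map to a choice of container, the sketch below removes duplicates from a list in three ways, depending on whether order matters. The `readings` list is made-up sample data, not part of the original notebook.

```python
# Hypothetical sample data with duplicates
readings = ["b", "a", "c", "a", "b"]

# Unique elements, order not guaranteed: use a set
unique_unordered = set(readings)

# Unique elements in first-seen order: dict keys keep insertion order (Python 3.7+)
unique_ordered = list(dict.fromkeys(readings))

# Unique elements in sorted order: sorted() always returns a new list
unique_sorted = sorted(set(readings))

print(unique_ordered)  # ['b', 'a', 'c']
print(unique_sorted)   # ['a', 'b', 'c']
```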


### Data structures comparison


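|Data structure|Built-in|Immutable|Hashable|Keeps order|Sortable|Unique|Is sequence?|Iterable| Delimiter  |Performance (membership test)|
|:-------------|:------:|:-------:|:------:|:---------:|:------:|:----:|:----------:|:------:|:----------:|:---------------------------:|
|string        |    X   |    X    |    X   |     X     |    X   |      |     X      |   X    |" " or ' '  | slow, O(n)                  |
|list          |    X   |         |        |     X     |    X   |      |     X      |   X    |    [ ]     | slow, O(n)                  |
|dict          |    X   |         |        | X (3.7+)  |    X   |X (keys)|          |   X    |{key: value}| fast, O(1) on average       |
|tuple         |    X   |    X    |    X   |     X     |    X   |      |     X      |   X    |    ( )     | slow, O(n)                  |
|set           |    X   |         |        |           |    X   |   X  |            |   X    |{ } or set()| fast, O(1) on average       |
|frozenset     |    X   |    X    |    X   |           |    X   |   X  |            |   X    |frozenset() | fast, O(1) on average       |
|collections.Counter| module |    |        | X (3.7+)  |    X   |X (keys)|          |   X    |Counter({ })| fast, O(1) on average       |
|heapq         | module |         |        |           |        |      |            |        |[ ] (works on a list)| O(log n) push and pop |

A tuple is hashable only if all of its elements are hashable. "Sortable" means that sorted() accepts the object and returns a new, sorted list.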


### Hashing vs Immutability


**Hashing** is the process of converting some large amount of data into a much smaller amount (typically a single integer) in a repeatable way so that it can be looked up in a table in constant-time (O(1)), which is important for high-performance algorithms and data structures. An object is hashable if it has a hash value which never changes during its lifetime.

**Immutability** is the idea that an object will not change in some important way after it has been created, especially in any way that might change the hash value of that object.

The two ideas are related because objects that are used as hash keys must typically be immutable, so that their hash value doesn't change. If the hash value were allowed to change, the location of the object in a data structure such as a hash table would change too, and the efficiency gained by hashing would be lost.
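A minimal sketch of this relationship follows. The helper `is_hashable` and the sample values are hypothetical, introduced only for illustration.

```python
point = (2, 3)            # a tuple of ints is immutable and hashable
distances = {point: 3.6}  # so it can be used as a dict key
print(distances[(2, 3)])  # 3.6


def is_hashable(obj):
    """Return True if hash(obj) succeeds (illustrative helper)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


print(is_hashable([2, 3]))             # False: lists are mutable
print(is_hashable((2, [3])))           # False: the tuple contains a list
print(is_hashable(frozenset({2, 3})))  # True
```

Note that a tuple is only hashable when everything inside it is hashable too.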

### Sortable vs ordered

**Ordered**: the items are kept in the order in which they were inserted. Sequences (string, list, tuple) are ordered and also support access by position. Since Python 3.7 a dict keeps insertion order as well, although it is not a sequence.

**Sorting**: sequence objects may be compared to other objects of the same sequence type, which is what makes collections of them sortable. The comparison uses lexicographical ordering: first the first two items are compared, and if they differ this determines the outcome of the comparison; if they are equal, the next two items are compared, and so on, until either sequence is exhausted. If two items to be compared are themselves sequences of the same type, the lexicographical comparison is carried out recursively.
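The comparison rules can be checked directly in the interpreter. The values below are made-up examples.

```python
# Lexicographical comparison: the first differing pair decides
first_pair = (1, 2, 3) < (1, 2, 4)     # True: 3 < 4 decides
prefix = [1, 2] < [1, 2, 0]            # True: a shorter prefix sorts first
words = "apple" < "banana"             # True: 'a' < 'b'

# Nested sequences of the same type are compared recursively
nested = (1, (2, 3)) < (1, (2, 4))     # True

# Ordered vs sorted: a list keeps insertion order until we ask for a sorted copy
arrival = ["pear", "apple", "fig"]
alphabetical = sorted(arrival)

print(first_pair, prefix, words, nested)
print(arrival)       # ['pear', 'apple', 'fig']
print(alphabetical)  # ['apple', 'fig', 'pear']
```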

### Sequence operations (string, list and tuples)

|Operation name|Operator |Explanation                               |
|:-------------|:-------:|:-----------------------------------------|
|indexing      |   [ ]   | Access an element of a sequence          |
|concatenation |    +    | Combine sequences together               |
|repetition    |    *    | Concatenate a repeated number of times   |
|membership    |   in    | Ask whether an item is in a sequence     |
|length        |   len   | Ask the number of items in the sequence  |
|slicing       |   [:]   | Extract a part of a sequence             |
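Here is a short sketch that applies each operation from the table to a string, a list and a tuple. The variable names and values are arbitrary.

```python
word = "python"
nums = [1, 2, 3]
pair = ("a", "b")

first_items = (word[0], nums[0], pair[0])  # indexing
longer = nums + [4]                        # concatenation returns a new list
doubled = pair * 2                         # repetition
has_th = "th" in word                      # membership (substring test for strings)
size = len(word)                           # length
middle = word[1:4]                         # slicing: indexes 1, 2 and 3

print(first_items)  # ('p', 1, 'a')
print(longer)       # [1, 2, 3, 4]
print(doubled)      # ('a', 'b', 'a', 'b')
print(has_th)       # True
print(size)         # 6
print(middle)       # yth
```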




### Check membership with asserts

In [14]:
# Check membership
# Note: assert is a keyword and not a function

assert 1 in [1, 2, 3]  # lists
assert 4 not in [1, 2, 3]
assert 1 in {1, 2, 3}      # sets
assert 4 not in {1, 2, 3}
assert 1 in (1, 2, 3)      # tuples
assert 4 not in (1, 2, 3)


### Access to items 

In [None]:
fruits = { 
    'apples': {'cost': 3, 'units': 100}, 
    'bananas': {'cost': 1, 'units': 80},
    'grapes': {'cost': 5, 'units': 500}
}

print(fruits['apples']['cost'])

### Constructors

In [74]:
### List Constructor: x=list(), x=[]

a = "some_variable"
x = ["hello", "how", "are", "you", "26", 17, a]  # lists can be heterogeneous
print("x is a        : ", type(x), x, "\n")

# remove an element using the index (del is a statement, not a function)
del x[0]
print(x)

# remove an element using the value
x.remove(17)
print(x)

x is a        :  <class 'list'> ['hello', 'how', 'are', 'you', '26', 17, 'some_variable'] 

['how', 'are', 'you', '26', 17, 'some_variable']
['how', 'are', 'you', '26', 'some_variable']


In [16]:
### Tuple Constructor

mytuple = tuple(x)

print("mytuple is a  : ", type(mytuple), mytuple)
print("mytuple is a hashable object: ", hash(mytuple))
# del mytuple[0]   # TypeError: 'tuple' object doesn't support item deletion; tuples are immutable
mytuple2 = (["this", "is", "a", "list"], "this", "is", "a", "tuple", "!", "!")  # duplicates are allowed
print("mytuple2 is a : ", type(mytuple2), mytuple2, "\n")


# tuples are immutable, but they may contain mutable objects such as lists

print("This tuple contains a list at index 0 : ", type(mytuple2[0]), mytuple2[0])
del mytuple2[0][2]  # the list inside the tuple is mutable, so it can be modified in place
print("One element has been removed : ", mytuple2[0], "\n")

mytuple is a  :  <class 'tuple'> ('how', 'are', 'you', '26', 17, 'some_variable')
mytuple is a hashable object:  2254645726041592945
mytuple2 is a :  <class 'tuple'> (['this', 'is', 'a', 'list'], 'this', 'is', 'a', 'tuple', '!', '!') 

This tuple contains a list at index 0 :  <class 'list'> ['this', 'is', 'a', 'list']
One element has been removed :  ['this', 'is', 'list'] 



In [17]:
### Set Constructor

myset = {"Oxygen", "Sulfur", "Selenium", "Tellurium", "Polonium", mytuple, "Oxygen"}  # set elements must be hashable; duplicates are dropped

print("This set contains a tuple : ", type(myset), myset, "\n")  # note that the order is not maintained and duplicates are not kept
myset.remove("Tellurium")  # sets are mutable, so any element can be removed
myset

This set contains a tuple :  <class 'set'> {'Selenium', 'Tellurium', 'Sulfur', ('how', 'are', 'you', '26', 17, 'some_variable'), 'Polonium', 'Oxygen'} 



{'Selenium',
 'Sulfur',
 ('how', 'are', 'you', '26', 17, 'some_variable'),
 'Polonium',
 'Oxygen'}

In [18]:
### Dict constructor

mydict = {9: "Fluorine", 17: "Chlorine", 35: "Bromine", 53: "Iodine", "Astatine": 85}  # provide key: value pairs; with bare values it would be a set

print("mydict is a  :", type(mydict), mydict, "\n")  # careful: the last pair is reversed, so "Astatine" is a key and 85 is its value
mydict["Tennessine"] = 17   # key: "Tennessine", value: 17
mydict["Tennessine"] = 117  # assigning to an existing key overwrites its value

print("mydict updated  :", mydict, "\n")

# build a dict from two lists
list1, list2 = ['a', 'b', 'c'], [1, 2, 3]
dict(zip(list1, list2))

mydict is a  : <class 'dict'> {9: 'Fluorine', 17: 'Chlorine', 35: 'Bromine', 53: 'Iodine', 'Astatine': 85} 

mydict updated  : {9: 'Fluorine', 17: 'Chlorine', 35: 'Bromine', 53: 'Iodine', 'Astatine': 85, 'Tennessine': 117} 


{'a': 1, 'b': 2, 'c': 3}

In [19]:
### Frozenset constructor

myfrozenset = frozenset({"Boron", "Aluminium", "Gallium", "Indium", "Thallium", "Boron"})  # duplicates are dropped

print("myfrozenset is a  :", type(myfrozenset))
print("myfrozenset is a hashable object:", hash(myfrozenset))

myfrozenset

myfrozenset is a  : <class 'frozenset'>
myfrozenset is a hashable object: 4514583614292879719


frozenset({'Aluminium', 'Boron', 'Gallium', 'Indium', 'Thallium'})

In [20]:
escape_velocity = {
    'earth': 1, 
    'jupiter': 5.32,
    'saturn': 3.17
}
del(escape_velocity['saturn'])
print(escape_velocity)

{'earth': 1, 'jupiter': 5.32}


### Comprehensions

enumerate() is a built-in function that lets us loop over an iterable with an automatic counter. Although the examples below apply enumerate() to a list, the result can be collected into a list, dict, tuple or set depending on the comprehension used:


In [21]:
# Given a list
mylist = ["a","b","c","d"]  # [] list

# This comprehension returns a list of tuples:
[(i, j) for i, j in enumerate(mylist)]  # returns a list of tuples [()]

[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]

In [22]:
# Alternatively, given that enumerate() already returns a tuple, 
# you can return it directly without unpacking it first:

[pair for pair in enumerate(mylist)] # returns a list

[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]

In [23]:
# This comprehension returns a dict: 
{i:j for i,j in enumerate(mylist)} # returns a dict{}

{0: 'a', 1: 'b', 2: 'c', 3: 'd'}

In [24]:
# This list comprehension returns a list of tuples:

# constructor for range function range(start, stop[, step]), if no step, default value is 1
[(i,j) for i in range(3) for j in 'abc'] # two iterators, returns a list [()]

[(0, 'a'),
 (0, 'b'),
 (0, 'c'),
 (1, 'a'),
 (1, 'b'),
 (1, 'c'),
 (2, 'a'),
 (2, 'b'),
 (2, 'c')]

In [25]:
# This is a list of dicts:
[{i:j} for i in range(3) for j in 'abc'] # two iterators, returns a list of dicts [{}]

[{0: 'a'},
 {0: 'b'},
 {0: 'c'},
 {1: 'a'},
 {1: 'b'},
 {1: 'c'},
 {2: 'a'},
 {2: 'b'},
 {2: 'c'}]

In [26]:
# A list of lists:
[[i,j] for i in range(3) for j in 'abc'] # returns a list of lists [[]]

[[0, 'a'],
 [0, 'b'],
 [0, 'c'],
 [1, 'a'],
 [1, 'b'],
 [1, 'c'],
 [2, 'a'],
 [2, 'b'],
 [2, 'c']]

In [27]:
# And a set of tuples: 
{(i,j) for i,j in enumerate('abcdef')} # returns a set {()}

{(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e'), (5, 'f')}

In [28]:
# You can just pass the enumerate tuple directly:
[t for t in enumerate('abcdef') ]  # returns a list of tuples [()]

[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e'), (5, 'f')]

In [29]:
my_list = ['apple', 'banana', 'grapes', 'pear']
for c, value in enumerate(my_list, 1):  
    print(c, value)

1 apple
2 banana
3 grapes
4 pear


In [30]:
# You can also create tuples containing the index and list item using a list
counter_list = list(enumerate(my_list, 0))
print(counter_list)

[(0, 'apple'), (1, 'banana'), (2, 'grapes'), (3, 'pear')]


In [31]:
a = "hello"

print("Type of a : ", type(a))

b = sorted(a)

print("Type of b : ", type(b))

Type of a :  <class 'str'>
Type of b :  <class 'list'>


In [32]:
basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
print(type(basket))

for f in sorted(set(basket)):
    print(f)

b = set(basket)    
    
print("A set from a list : ", type(b))
print("A sorted set is a list : ", type(sorted(set(basket))))

<class 'list'>
apple
banana
orange
pear
A set from a list :  <class 'set'>
A sorted set is a list :  <class 'list'>


In [33]:
from collections import Counter

# Tally occurrences of words in a list
cnt = Counter()

print(type(cnt))

for word in ['red', 'blue', 'red', 'green', 'blue', 'blue']:
    cnt[word] += 1
print(cnt)

<class 'collections.Counter'>
Counter({'blue': 3, 'red': 2, 'green': 1})


In [34]:
# Create the areas list
areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50]

# Print out second element from areas
print(areas[1])

# Print out last element from areas
print(areas[-1])

# Sum all the areas (the float entries of the list)
sum_areas = 0

for item in areas:
    if isinstance(item, float):
        sum_areas += item

print(sum_areas)

11.25
9.5
69.5


In [35]:
# Create list areas
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

# Create areas_copy
areas_copy = list(areas)

# Change areas_copy
areas_copy[0] = 5.0

# Print areas
print(areas)

[11.25, 18.0, 20.0, 10.75, 9.5]


## Maps and filters

In [36]:
x = range(10)

print(type(x))

def divis_by_5(num):
    return num % 5 == 0
    

<class 'range'>


In [37]:
print(map(divis_by_5, x))

<map object at 0x11bdf1588>


In [38]:
filter(divis_by_5, x)

<filter at 0x11bdf1d68>

In [1]:
x = [4, 8, -2, -6, 3]
greater_than_zero = filter(lambda n: (n > 0), x)
print(list(greater_than_zero))

[4, 8, 3]


## Function arguments: `*args` and `**kwargs`

In [39]:
def check_type(**kwargs):
    return type(kwargs)

print("the type is", check_type(a=1, b=2, c=3))


def mean(*args):
    """Returns the mean of all the numbers"""
    total_sum = 0  # Initial sum
    n = len(args)  # Number of arguments
    for x in args:
        total_sum = total_sum + x
    return total_sum / n
print((mean(3, 4), mean(40, 45, 50)))



the type is <class 'dict'>
(3.5, 45.0)


In [40]:

x = ['regulation', 'authorize', 'precaution', 'myth']
# Count how many times the letter 'i' appears in each word
count_i = list(map(lambda word: word.count('i'), x))
print(count_i)

[1, 1, 1, 0]


In [41]:
def easy_print(**kwargs):
    for p, q in kwargs.items():
        print('The value of ' + str(p) + " is " + str(q))

easy_print(x=15, y=30)

The value of x is 15
The value of y is 30


## Other Python 3.x objects

In [42]:
y = range(29, 32)

print(type(y))

print(sum(y))


<class 'range'>
90


In [43]:
x = range(100, 104)

print(type(x))

for i in x:
    print(i)

<class 'range'>
100
101
102
103


In [44]:
characters = ['The Hulk', 'Iron Man']
names = ['Bruce Banner', 'Tony Stark']

ave_1 = zip(characters, names)
avengers = list(ave_1)
print(avengers)

print(ave_1)  # the zip iterator itself; it is already exhausted because list() consumed it

[('The Hulk', 'Bruce Banner'), ('Iron Man', 'Tony Stark')]
<zip object at 0x11bdfde48>


## More exercises...

In [45]:
vibe = iter('Cisco Ramon')
print(vibe)
print(*vibe)

<str_iterator object at 0x11bdf71d0>
C i s c o   R a m o n


In [None]:
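# Exercise: which option belongs after `except` in the following function definition?
# Options: raise, NameError, TypeError, ValueError
# Answer: TypeError, which is raised when x does not support multiplication by a float

def mile_to_km(x):
    """Converts miles to kilometers"""
    try:
        return x * 1.609
    except TypeError:
        print('x must be int or float')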

In [None]:
def mean(*x):
    """Returns the mean of all the numbers"""
    total_sum = 0 # Initial sum
    n = len(x) # Number of arguments
    for i in x:
        total_sum = total_sum + i
    return total_sum/n
print((mean(2, 3), mean(30, 35, 40)))

In [None]:
avengers = ['maria hill', 'phil coulson', 'nick fury']
AVENGERS = [y.upper() for y in avengers]
print(AVENGERS)

In [None]:
flash_vil = ['Hunter Zolomon', 'Savitar', 'Eobard Thawne']
for i, b in enumerate(flash_vil):
    print(str(i) + ': ' + b)



In [None]:
z = "cautioned"
print(z.replace("a", "/"))

p = [3, 18, 15, 4, 8, 4]
print(sorted(p, reverse=False))

foo = [0, 5.22, "A", "Tue", 1.2]
foo[2:4] = ["Sun", "Mon"] # replaces the two elements at indexes 2 and 3
print(foo)

## Numpy arrays

* **Homogeneous in type: NumPy arrays cannot contain elements of different types.** If you try to build such an array, some of the elements are converted so that all of them share one type. This is known as type coercion.



* **Calculations all at once:** arithmetic operations are applied element-wise to the whole array, without an explicit loop.

* **Indexing with brackets:**
    * 1D array --> A[index]
    * 2D arrays --> A[index0, index1]
    
    
    
* **Slicing arrays:** 
    * 1D array --> A[slice]
    * 2D arrays --> A[slice0, slice1]
    * slice = start:stop:stride
        * Indexes from the start to the stop-1 in steps of stride
        * Missing start : implicitly at the beginning of array
        * Missing stop : implicitly at the end of array
        * Missing stride : implicitly stride 1
    * Negative indexes/slices: count from the end of the array
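The start:stop:stride rule can be tried out on a plain Python list, which follows the same slicing rule as a 1D NumPy array. The two-axis form A[slice0, slice1] is NumPy-only and is not covered by this sketch. The `data` list is made-up sample data.

```python
data = [10, 11, 12, 13, 14, 15, 16, 17]

window = data[2:6]        # indexes 2 to 5; the stop index is excluded
every_other = data[2:6:2] # same window, stride 2
head = data[:3]           # missing start: begins at index 0
tail = data[5:]           # missing stop: runs to the end
thirds = data[::3]        # missing start and stop, stride 3
last_two = data[-2:]      # negative index counts from the end
backwards = data[::-1]    # negative stride walks the list in reverse

print(window)       # [12, 13, 14, 15]
print(every_other)  # [12, 14]
print(head)         # [10, 11, 12]
print(tail)         # [15, 16, 17]
print(thirds)       # [10, 13, 16]
print(last_two)     # [16, 17]
print(backwards)    # [17, 16, 15, 14, 13, 12, 11, 10]
```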
    

In [2]:
import numpy as np
z = np.array([[7, 0, 4], 
              [2, 3, 3]])
print(z[0:, 2:])


x = np.array([11, 15, 12, 3, 12, 15])
print(x[1:3])

store = np.array([8, 6, 9, 1, 8, 0])
cost  = np.array([81, 89, 82, 78, 89, 60])
np_cols = np.column_stack((store, cost))
print(np_cols)

[[4]
 [3]]
[15 12]
[[ 8 81]
 [ 6 89]
 [ 9 82]
 [ 1 78]
 [ 8 89]
 [ 0 60]]


In [4]:
x = np.array([10, 8, 5, 24])
y = np.array([16, 23, 25, 22])
w = np.stack([x, y])
print(w)
z = np.column_stack([x, y])
print(z)
print(w.shape)
print(z.shape)

[[10  8  5 24]
 [16 23 25 22]]
[[10 16]
 [ 8 23]
 [ 5 25]
 [24 22]]
(2, 4)
(4, 2)


In [None]:
import numpy as np
x = np.array([4, 7, 5, 7, 5, 4, 5, 7])
y = np.array([1, 6, 0, 5, 0, 6, 4, 8])
print(np.corrcoef(x, y))

In [None]:
# Operations with ndarray objects

import numpy as np

m = np.array([6, 4, 2])
n = np.array([True, False, True])
# print(type(m))
print(m + n)

In [None]:
# Assumes `sales` is an existing pandas DataFrame with 'product' and 'sold' columns
sales[np.logical_or(sales['product'] == 'A', sales['sold'] > 100)]

In [None]:
from sklearn.model_selection import StratifiedShuffleSplit
import numpy as np

X = np.array([[5, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([0, 0, 1, 1])


print("0", X[0])
print("1", X[1])
print("2", X[2])
print("3", X[3])

# try different values for test_size, and try to understand the result. 0.4 and 0.3 are safe values.
sss = StratifiedShuffleSplit(n_splits=3, test_size=0.5, random_state=0)
print(sss.get_n_splits(X, y))  # number of re-shuffling and splitting iterations

for train_index, test_index in sss.split(X, y):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    print("X train rows: ", X_train)
    print("X test rows: ", X_test)

    y_train, y_test = y[train_index], y[test_index]

In [None]:
# Iterator

import numpy as np
x = np.array([[2, 1, 0],
              [6, 6, 5]])
for i in np.nditer(x):
    print(i)

In [10]:
# save ndarray object to txt/csv

import numpy
a = numpy.asarray([ [6,2,3], [4,3,6], [9,8,5] ])
numpy.savetxt("datasets/foo.csv", a, delimiter=',')


In [11]:
from numpy import genfromtxt
my_data = genfromtxt('datasets/foo.csv', delimiter=',')
print(my_data)

[[ 6.  2.  3.]
 [ 4.  3.  6.]
 [ 9.  8.  5.]]


In [5]:
x = np.array([4, 7, 4, 5, 7, 4, 4, 4])
y = np.array([0, 5, 3, 9, 3, 1, 7, 2])
print(np.corrcoef(x, y))

[[ 1.          0.23243927]
 [ 0.23243927  1.        ]]


## Pandas and DataFrames


A DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. It can be thought of as a dict-like container for Series objects, and it is the primary pandas data structure.

A Series is the data structure for a single column of a DataFrame: selecting one column of a DataFrame returns a Series that shares the DataFrame's index.

* Labelled tabular data structure
* Labels on rows: index
* Labels on columns: columns
* Columns are pandas Series

### Iterators for DataFrames

In [None]:
# Import the csv file located at the url "link_to_iris" in small DataFrame chunks of size 4
# Skip the first chunk and print the second one
import pandas as pd
link_to_iris = 'https://goo.gl/BR8npa'
iris = pd.read_csv(link_to_iris, chunksize = 4)
next(iris)
print(next(iris))



In [63]:


# Assumes `logins` is an existing DataFrame with a 'MONTH' column of strings
for i, p in logins.iterrows():
    logins.loc[i, 'month'] = p['MONTH'].lower()
print(logins)

      month
0   January
1  February
2     March


TypeError: tuple indices must be integers or slices, not str

In [1]:
import numpy as np
x = np.array([3, 4, False, True, 5.2])
print(x)

[ 3.   4.   0.   1.   5.2]


## References

* http://www.datacamp.com
* https://docs.python.org/3.3/tutorial/datastructures.html#dictionaries
* http://nvie.com/posts/iterators-vs-generators/
* http://www.thomas-cokelaer.info/tutorials/python/data_structures.html
* http://interactivepython.org/
* http://stackoverflow.com/questions/2671376/hashable-immutable
* https://docs.python.org/3.6/library/functions.html
* http://anandology.com/python-practice-book/iterators.html