# Information Retrieval Lab: Python Tutorial 🐍

### (Re)sources:
- [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/index.html) _by Jake VanderPlas (Code released under the MIT License)_
- ➡️ [A Whirlwind Tour of Python](https://jakevdp.github.io/WhirlwindTourOfPython/) _by Jake VanderPlas released under the "No Rights Reserved" CC0 license (O’Reilly). Copyright 2016 O’Reilly Media, Inc., 978-1-491-96465-1_ __(What the present tutorial is mostly based on)__
- [Python for Data Analysis](https://github.com/wesm/pydata-book) _by Wes McKinney (Code released under the MIT License)_
- [Python 3.8 Documentation](https://docs.python.org/3.8/)

__Obligatory Wikipedia excerpt:__


>_Python is an interpreted, high-level and general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python's design philosophy emphasizes code readability with its notable use of significant whitespace. Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects._


>_Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly, procedural), object-oriented, and functional programming. Python is often described as a "batteries included" language due to its comprehensive standard library._

>_An important goal of Python's developers is keeping it fun to use. This is reflected in the language's name—a tribute to the British comedy group Monty Python—and in occasionally playful approaches to tutorials and reference materials, such as examples that refer to spam and eggs (from a famous Monty Python sketch) instead of the standard foo and bar._

>_A common neologism in the Python community is pythonic, which can have a wide range of meanings related to program style. To say that code is pythonic is to say that it uses Python idioms well, that it is natural or shows fluency in the language, that it conforms with Python's minimalist philosophy and emphasis on readability. In contrast, code that is difficult to understand or reads like a rough transcription from another programming language is called unpythonic._

>_The language's core philosophy is summarized in the document __The Zen of Python__ ([PEP](https://www.python.org/dev/peps) 20) \[...\]_

In [1]:
# Make the notebook page wider
from IPython.display import HTML, display
display(HTML("<style>.container { width:80% !important; }</style>"))

In [2]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


### Python is Dynamically Typed

In [3]:
x = 1         # x is an integer
x = 1.        # x is now a float
x = 'Hello, world!'   # x is now a string
x = [1, 2, 3, 4] # x is now a list

#### A Python integer is not a C integer:

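In CPython, the reference implementation, even a plain integer is a full object that stores a reference count, a pointer to its type, a size, and the digits of the value. The __C struct__ behind the curtain looks roughly like this (with macros expanded):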

```C
struct _longobject {
    long ob_refcnt;
    PyTypeObject *ob_type;
    size_t ob_size;
    long ob_digit[1];
};
```
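
A small sketch of what this means in practice, assuming a CPython interpreter: integers never overflow, and larger values occupy more memory. The exact byte counts reported by `sys.getsizeof` depend on the build, so no values are shown for them.

```python
import sys

small = 1
big = 2 ** 100  # far beyond the range of a 64-bit C integer

print(big)                    # exact value, no overflow
print(sys.getsizeof(small))   # bytes used by the object (build-dependent)
print(sys.getsizeof(big))     # more digits are stored, so more bytes are used
```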

### Everything is an object (even functions)

In [4]:
x=1
type(x)

int

In [5]:
x=1.
type(x)

float

In [6]:
x = 'Hello, World!'
type(x)

str

In [7]:
x = [1,2,3,4]
type(x)

list
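
The heading also mentions functions. A minimal sketch showing that a function is an ordinary object that can be bound to another name, inspected, and stored in a list (`shout`, `alias`, and `handlers` are names made up for this illustration):

```python
def shout(text):
    """Return text in upper case with an exclamation mark."""
    return text.upper() + "!"

print(type(shout))        # <class 'function'>

alias = shout             # bind a second name to the same function object
print(alias("spam"))      # SPAM!
print(shout.__name__)     # shout

handlers = [shout, len]   # functions can be stored like any other value
print([f("eggs") for f in handlers])  # ['EGGS!', 4]
```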

In [9]:
x = [1, 2, 3]
y = x

In [10]:
x.append(4) # append 4 to x
print(y) # y's list is modified as well!

[1, 2, 3, 4]


In [11]:
x = 'something else'
print(y)  # y is unchanged

[1, 2, 3, 4]
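
The cells above show that assignment binds a name to an object rather than copying it. A short sketch of the difference between an alias and a copy (`alias` and `clone` are illustrative names):

```python
x = [1, 2, 3]
alias = x          # second name for the same list object
clone = x.copy()   # new list with the same elements (shallow copy)

x.append(4)
print(alias)       # [1, 2, 3, 4]  (same object as x)
print(clone)       # [1, 2, 3]     (independent list)
print(alias is x)  # True
print(clone is x)  # False
```

If an independent list is needed, copying it explicitly avoids surprises when the original is modified later.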


### Arithmetic Operations
Python implements seven basic binary arithmetic operators:

| Operator     | Name           | Description                                            |
|--------------|----------------|--------------------------------------------------------|
| ``a + b``    | Addition       | Sum of ``a`` and ``b``                                 |
| ``a - b``    | Subtraction    | Difference of ``a`` and ``b``                          |
| ``a * b``    | Multiplication | Product of ``a`` and ``b``                             |
| ``a / b``    | True division  | Quotient of ``a`` and ``b``                            |
| ``a // b``   | Floor division | Quotient of ``a`` and ``b``, rounded down (toward negative infinity) |
| ``a % b``    | Modulus        | Remainder after division of ``a`` by ``b``             |
| ``a ** b``   | Exponentiation | ``a`` raised to the power of ``b``                     |
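
A quick sketch of the operators in the table, using the arbitrarily chosen values 7 and 2. The negative operands show how floor division and modulus behave below zero:

```python
a, b = 7, 2

print(a + b)    # 9
print(a - b)    # 5
print(a * b)    # 14
print(a / b)    # 3.5  (true division always returns a float)
print(a // b)   # 3
print(-a // b)  # -4   (rounds toward negative infinity, not toward zero)
print(a % b)    # 1
print(-a % b)   # 1    (the result takes the sign of the divisor)
print(a ** b)   # 49
```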

### Boolean Operations
When working with Boolean values, Python provides operators to combine the values using the standard concepts of "and", "or", and "not". Predictably, these operators are expressed using the words `and`, `or`, and `not`:

In [12]:
a=5
b=2
c=3

In [13]:
a==5 and b==2

True

In [14]:
a==5 and not c==5

True

In [15]:
(a==5 or not b==6) and not c==3

False

In [16]:
a==5 or not b==6 and not c==3

True
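
The two results differ because of operator precedence: `not` binds more tightly than `and`, which in turn binds more tightly than `or`. A sketch that spells out the implicit grouping with parentheses (the variable names `implicit`, `explicit`, and `grouped` are illustrative):

```python
a, b, c = 5, 2, 3

implicit = a == 5 or not b == 6 and not c == 3
explicit = (a == 5) or ((not (b == 6)) and (not (c == 3)))  # how Python parses it
grouped = (a == 5 or not b == 6) and not c == 3             # parentheses change the result

print(implicit, explicit, grouped)  # True True False
```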

### Identity and Membership Operators

Python also contains prose-like operators to check for identity and membership.
They are the following:

| Operator      | Description                                       |
|---------------|---------------------------------------------------|
| ``a is b``    | True if ``a`` and ``b`` are identical objects     |
| ``a is not b``| True if ``a`` and ``b`` are not identical objects |
| ``a in b``    | True if ``a`` is a member of ``b``                |
| ``a not in b``| True if ``a`` is not a member of ``b``            |

In [18]:
a = [1,2,3]
b = a
b is a

True

In [19]:
a = [1,2,3]
b = [1,2,3]
b is a

False

In [20]:
1 in [1,2,3]

True

In [21]:
'foo' in [1,2,3]

False

In [22]:
5 not in [1,2,3]

True
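
Identity (`is`) should not be confused with equality (`==`). A minimal sketch of the difference, with `value` as an illustrative name:

```python
a = [1, 2, 3]
b = [1, 2, 3]

print(a == b)  # True:  same contents
print(a is b)  # False: two distinct list objects

value = None
print(value is None)  # True: `is` is the usual way to test for None
```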

### Python Scalar Types

| Type        | Example        | Description                                                  |
|-------------|----------------|--------------------------------------------------------------|
| ``int``     | ``x = 1``      | Integers (i.e., whole numbers)                               |
| ``float``   | ``x = 1.0``    | Floating-point numbers (i.e., approximations of real numbers) |
| ``complex`` | ``x = 1 + 2j`` | Complex numbers (i.e., numbers with real and imaginary part) |
| ``bool``    | ``x = True``   | Boolean: True/False values                                   |
| ``str``     | ``x = 'abc'``  | String: characters or text                                   |
| ``NoneType``| ``x = None``   | Special object indicating nulls                              |

### Python Data Structures

| Type Name | Example                   | Description                                                    |
|-----------|---------------------------|----------------------------------------------------------------|
| ``list``  | ``[1, 2, 3]``             | Ordered collection                                             |
| ``tuple`` | ``(1, 2, 3)``             | Immutable ordered collection                                   |
| ``dict``  | ``{'a':1, 'b':2, 'c':3}`` | (key, value) mapping; keeps insertion order since Python 3.7   |
| ``set``   | ``{1, 2, 3}``             | Unordered collection of unique values                          |

#### Lists

In [23]:
L = [2, 3, 5, 7]

In [24]:
# Length of a list
len(L)

4

In [25]:
# Append a value to the end
L.append(11)
L

[2, 3, 5, 7, 11]

In [26]:
# Addition concatenates lists
L + [13, 17, 19]

[2, 3, 5, 7, 11, 13, 17, 19]

In [27]:
# sort() method sorts in-place
L = [2, 5, 1, 6, 3, 4]
L.sort()
L

[1, 2, 3, 4, 5, 6]

In [28]:
L = [1, 'two', 3.14, [0, 3, 5]]  # a list can hold objects of different types

In [29]:
L = [2, 3, 5, 7, 11]

In [30]:
L[0]

2

In [31]:
L[1]

3

In [32]:
L[-1]

11

In [33]:
L[-2]

7

In [34]:
L[0:3]

[2, 3, 5]

In [35]:
L[:3]

[2, 3, 5]

In [36]:
L[3:]

[7, 11]

In [37]:
L[:3]+L[3:]

[2, 3, 5, 7, 11]

In [38]:
L.reverse()  # this happens in place
L

[11, 7, 5, 3, 2]

In [39]:
l = [7, 1, 2]
r = [9, 6, 8]
l + r*2

[7, 1, 2, 9, 6, 8, 9, 6, 8]

In [40]:
print(l)
l.sort()
print(l)

[7, 1, 2]
[1, 2, 7]


In [41]:
print(r)
sorted(r)

[9, 6, 8]


[6, 8, 9]

In [42]:
print(r)

[9, 6, 8]


In [46]:
# IPython/Jupyter only: a trailing ? shows the help for an object
sorted?

In [45]:
dir(l)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

#### Dicts

In [47]:
numbers = {'one':1, 'two':2, 'three':3}

In [48]:
numbers['two']

2

In [49]:
numbers['ninety'] = 90

In [50]:
k = numbers.keys()
k

dict_keys(['one', 'two', 'three', 'ninety'])

In [51]:
v = numbers.values()
v

dict_values([1, 2, 3, 90])

In [52]:
dir(numbers)

['__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'clear',
 'copy',
 'fromkeys',
 'get',
 'items',
 'keys',
 'pop',
 'popitem',
 'setdefault',
 'update',
 'values']

In [53]:
for thing in numbers.keys():
    print(thing)

one
two
three
ninety
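
Two of the methods listed by `dir(numbers)` come up constantly in practice: `items()` for looping over key/value pairs and `get()` for lookups that may miss. A short sketch using the same dictionary contents as above:

```python
numbers = {'one': 1, 'two': 2, 'three': 3, 'ninety': 90}

# items() yields (key, value) pairs
for name, value in numbers.items():
    print(name, '->', value)

print(numbers.get('two'))      # 2
print(numbers.get('four'))     # None (no KeyError for a missing key)
print(numbers.get('four', 0))  # 0    (explicit default)
print('four' in numbers)       # False (membership tests check the keys)
```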


#### Sets

In [54]:
primes = {2, 3, 5, 7}
odds = {1, 3, 5, 7, 9}

In [55]:
# union: items appearing in either
primes | odds      # with an operator
primes.union(odds) # equivalently with a method

{1, 2, 3, 5, 7, 9}

In [56]:
# intersection: items appearing in both
primes & odds             # with an operator
primes.intersection(odds) # equivalently with a method

{3, 5, 7}

In [57]:
# difference: items in primes but not in odds
primes - odds           # with an operator
primes.difference(odds) # equivalently with a method

{2}

In [58]:
# symmetric difference: items appearing in only one set
primes ^ odds                     # with an operator
primes.symmetric_difference(odds) # equivalently with a method

{1, 2, 9}

In [59]:
numbers = {'one', 'four', 'twenty'}  # rebinds the name to a set; k is still a view of the old dict's keys

In [60]:
k & numbers

{'one'}

In [61]:
k | numbers

{'four', 'ninety', 'one', 'three', 'twenty', 'two'}

In [62]:
k - numbers

{'ninety', 'three', 'two'}

In [63]:
k ^ numbers

{'four', 'ninety', 'three', 'twenty', 'two'}
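
The last four cells work because the object returned by `dict.keys()` is a set-like view: it supports the set operators and stays in sync with its dictionary. A minimal sketch with a hypothetical dictionary named `lookup`:

```python
lookup = {'one': 1, 'two': 2, 'three': 3}  # hypothetical dict for illustration
keys = lookup.keys()                        # a live view, not a copy
wanted = {'one', 'four', 'twenty'}

common = keys & wanted
print(common)        # {'one'}
print(type(common))  # <class 'set'>

lookup['four'] = 4            # the view reflects later changes to the dict
print(sorted(keys & wanted))  # ['four', 'one']
```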

### Control Flow

In [64]:
x = -15

if x == 0:
    print(x, "is zero")
elif x > 0:
    print(x, "is positive")
elif x < 0:
    print(x, "is negative")
else:
    print(x, "is unlike anything I've ever seen...")

-15 is negative


#### Conditional Expression

Introduced in PEP 308, and often referred to as a ternary operator:

```python
x = x_if_true if condition else x_if_false
```
which is the succinct version of:

```python
if condition:
    x = x_if_true
else:
    x = x_if_false
```

In [65]:
sun_shining = False
x = 35 if sun_shining else -4
print(x)

-4


In [66]:
def sun_shining(sun_shining=True):
    return 35 if sun_shining else -4

In [67]:
sun_shining(True)

35

In [68]:
sun_shining(False)

-4

#### for loops

In [69]:
for N in [2, 3, 5, 7]:
    print(N, end=' ') # print all on same line

2 3 5 7 

In [70]:
for i in range(10):
    print(i, end=' ')

0 1 2 3 4 5 6 7 8 9 

In [71]:
for n in range(20):
    # if the remainder of n / 2 is 0, skip the rest of the loop
    if n % 2 == 0:
        continue
    print(n, end=' ')

1 3 5 7 9 11 13 15 17 19 

#### while loops

In [72]:
i = 0
while i < 10:
    print(i, end=' ')
    i += 1

0 1 2 3 4 5 6 7 8 9 

In [73]:
a, b = 0, 1
amax = 100
L = []

while True:
    (a, b) = (b, a + b)
    if a > amax:
        break
    L.append(a)

print(L)


[1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]


### Functions

In [74]:
def fibonacci(N, a=0, b=1):
    L = []
    while len(L) < N:
        a, b = b, a + b
        L.append(a)
    return L

In [75]:
fibonacci(10)

[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]

In [76]:
fibonacci(10, b=3, a=1)

[3, 4, 7, 11, 18, 29, 47, 76, 123, 199]

In [77]:
def catch_all(*args, **kwargs):
    print("args =", args)
    print("kwargs = ", kwargs)

In [78]:
catch_all(1, 2, 3, a=4, b=5)

args = (1, 2, 3)
kwargs =  {'a': 4, 'b': 5}


In [79]:
catch_all('a', keyword=2)

args = ('a',)
kwargs =  {'keyword': 2}


In [80]:
inputs = (1, 2, 3)
keywords = {'pi': 3.14}

catch_all(*inputs, **keywords)

args = (1, 2, 3)
kwargs =  {'pi': 3.14}


In [81]:
add = lambda x, y: x + y
add(1, 2)

3

In [82]:
things = ["cat", "apple", "boat"]
sorted(things) # alphabetically, upper case first

['apple', 'boat', 'cat']

In [83]:
sorted(things, key=lambda x: len(x))

['cat', 'boat', 'apple']
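
The `key` argument accepts any callable, not only a lambda. A sketch with a mixed-case list (made up for this illustration) that shows the "upper case first" behaviour of the default sort and how a key function changes it:

```python
words = ["cat", "apple", "Boat"]  # illustrative list with one capitalized word

print(sorted(words))                 # ['Boat', 'apple', 'cat']  (upper case sorts first)
print(sorted(words, key=str.lower))  # ['apple', 'Boat', 'cat']  (case-insensitive)
print(sorted(words, key=len))        # ['cat', 'Boat', 'apple']  (same as key=lambda x: len(x))
print(sorted(words, key=len, reverse=True))  # ['apple', 'Boat', 'cat']
```

Passing an existing function such as `len` or `str.lower` directly is usually easier to read than wrapping it in a lambda.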

### Standard Library

#### Math

In [84]:
import math

math.log2(1024)

10.0

In [85]:
math.log(math.e)

1.0

In [86]:
math.cos(math.pi)

-1.0

#### Random

In [87]:
import random as rnd  ## you can re-name imported modules

rnd.randint(1, 6)  ## Here, the end points are both included

1

In [88]:
things = ['cat', 'apple', 'boat']
rnd.choice(things)

'apple'

#### urllib

In [89]:
## Modules can have sub-modules
import urllib.request as rq

response = rq.urlopen("http://en.wikipedia.org/wiki/Python")

print(response.read(151).decode('utf8'))

<HTML><HEAD>
<meta http-equiv=pragma content=nocache>
<META HTTP-EQUIV=Expires CONTENT=-1>
<SCRIPT>
var url = new DOMParser().parseFromString('<a></a>'


#### itertools

In [90]:
import itertools

perms = itertools.permutations([1, 2, 3], r=2)
# r-length tuples, all possible orderings, no repeated elements
# default r: length of the iterable

for p in perms:
    print(p)

(1, 2)
(1, 3)
(2, 1)
(2, 3)
(3, 1)
(3, 2)


In [91]:
combs = itertools.combinations([1, 2, 3], r=2)
# r-length tuples, in sorted order, no repeated elements

print(list(combs))

[(1, 2), (1, 3), (2, 3)]


### Comprehensions

#### List Comprehensions

In [92]:
[n for n in range(11)]

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [93]:
perms = itertools.permutations([1, 2, 3], r=2)
[p for p in perms]

[(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]

In [94]:
perms = itertools.permutations([1, 2, 3], r=2)
[thing for thing in perms]

[(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]

In [95]:
[n**2 for n in range(11)]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

In [96]:
[n for n in range(11) if n%3] # n%3 is shorthand for n%3!=0

[1, 2, 4, 5, 7, 8, 10]

In [97]:
[n if n%2 else -n for n in range(11) if n%3] 

[1, -2, -4, 5, 7, -8, -10]

In [98]:
[(n,n) if n%2 else (-n,9) for n in range(11) if n%3] 

[(1, 1), (-2, 9), (-4, 9), (5, 5), (7, 7), (-8, 9), (-10, 9)]

#### Dictionary Comprehensions

In [99]:
list_of_tuples = [(n,n) if n%2 else (-n,9) for n in range(11) if n%3] 
{a:b for a,b in list_of_tuples}

{1: 1, -2: 9, -4: 9, 5: 5, 7: 7, -8: 9, -10: 9}

In [100]:
numbers

{'four', 'one', 'twenty'}

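In [102]:
numbers = {'one':1, 'two':2, 'three':3, 'ninety':90}  # rebuild the dict: `numbers` was rebound to a set above
{v:k for k,v in numbers.items()}  # the method is items(), not item()

{1: 'one', 2: 'two', 3: 'three', 90: 'ninety'}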

#### Set Comprehensions

In [103]:
{a%4 for a in range(1000)}

{0, 1, 2, 3}

#### Lambdas
Lambdas can be used to create anonymous functions.

In [104]:
sum_lambda = lambda x, y: x+y

sum_lambda(3,2)

5

#### Map, Filter, Reduce

Python supports the `map`, `filter`, and `reduce` functions.
All three can be replaced with list comprehensions or loops, but they often provide a more elegant solution.
Keep in mind that `map` and `filter` return lazy iterators, so we have to wrap them in `list()` to actually compute and see the results; `reduce` returns a single value directly.

In [105]:
list(map(lambda x: x+1, [1,2,3,4,5]))

[2, 3, 4, 5, 6]

In [106]:
list(filter(lambda x: x % 2 == 0, [1,2,3,4,5]))

[2, 4]

In [107]:
# In Python 3, reduce() is no longer a built-in function
# and has to be imported from the functools module
from functools import reduce

reduce(lambda x, y: x+y, [1,2,3,4,5])

15
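
A short sketch of the laziness of `map` and of the comprehension-based equivalents of the three cells above (`values`, `lazy`, and the result names are illustrative):

```python
from functools import reduce

values = [1, 2, 3, 4, 5]

lazy = map(lambda x: x + 1, values)  # nothing is computed yet
print(next(lazy))   # 2             (items are produced on demand)
print(list(lazy))   # [3, 4, 5, 6]  (the remaining items)
print(list(lazy))   # []            (the iterator is now exhausted)

# Equivalent formulations without map, filter, or reduce
incremented = [x + 1 for x in values]
evens = [x for x in values if x % 2 == 0]
total = sum(values)

print(incremented)  # [2, 3, 4, 5, 6]
print(evens)        # [2, 4]
print(total)        # 15
```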