<div style="text-align: right">INFO 6105 Data Science Eng Methods and Tools, Lecture 2 Day 1</div>
<div style="text-align: right">Dino Konstantopoulos, 9 September 2019</div>

## A brief introduction to the language Python in 10 chapters


[Python](http://www.python.org/) is a modern, general-purpose, object-oriented, high-level programming language. It is widely used in science and engineering, and has gained considerable traction in scientific computing over the past 5 years. Some examples:

+ The Bureau of Meteorology uses it to drive its hydrology prediction
+ Python is used at NASA for the Mars rover Curiosity mission
+ Astronomy: 

> * The [Space Telescope Science Institute](http://www.stsci.edu/institute/software_hardware/pyraf/stsci_python) manages the operation of the Hubble Space Telescope with Python

Some positive attributes of Python that are often cited: 

* **Simple**: It is easy to read and relatively easy to learn (albeit not the easiest language to learn)
* **Expressive**: Fewer lines of code, fewer bugs and easy to maintain.
* **Powerful**: Python works as a script-type tool all the way to large projects, Big Data, High Performance Computing applications, data science, etc.
* **Batteries included**: The [**standard library**](http://docs.python.org/2/library/) is huge and includes some really cool libraries.

A Python (or Jupyter) notebook implements Don Knuth's [literate programming](https://en.wikipedia.org/wiki/Literate_programming) idea: mixing code with English text to explain every piece of computation. It's prrrrrrrfect :-)

![Literate_Programming](https://upload.wikimedia.org/wikipedia/en/6/62/Literate_Programming_book_cover.jpg)

In [14]:
# look, this is coode!
2 + 2

4

And here is where we're going to talk about the computation above. 

It adds two numbers. Not two vectors, nor two matrices, just two numbers. Boring, right? Especially after all your **neat R code**!

## 1. The philosophy of Python

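If you type the following, Python prints *The Zen of Python*, a short list of the language's guiding principles: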

In [21]:
import this

## 2. Operators

Variables in a computer program are **placeholders for data**. 

What *kind* of data (the **type** of the data), we'll talk about later. For now, let's use *simple types*. The assignment operator is ```=```. By typing the variable as the last line in a cell, you can examine the data it contains.

In [1]:
a = 5 
b = 2
print(b)
a

2


5

Here is a cell where we just print the data without assigning it to a variable:

In [2]:
a * 2

10

Oh, look: the cell above ***remembered*** the value of `a` from the cell before it. So we have **memory** across our notebook. Neat!

Here is the **increment** operator:

In [3]:
a += 2 # same as a = a + 2

In [4]:
a

7

and the **decrement** operator:

In [5]:
a -= 2 # same as a = a - 2

In [6]:
a

5

`**` is used for exponentiation 

In [7]:
x = 2

In [8]:
x**2

4

But you have another option:

In [9]:
pow(x,2)

4
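The `+=` and `-=` pattern shown earlier extends to the other arithmetic operators. Here is a minimal sketch (the variable name `n` is only for illustration):

```python
# Augmented assignment: each line is shorthand for n = n <operator> value
n = 10
n += 2    # n = n + 2   -> 12
n -= 4    # n = n - 4   -> 8
n *= 3    # n = n * 3   -> 24
n //= 5   # n = n // 5  -> 4 (integer division)
n **= 2   # n = n ** 2  -> 16
print(n)  # 16
```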

Oh god! How to remember all this?

You don't have to! Just ***google***!

## 3. Singular Types and Data structures

Ok, let's talk about the different *types* of simple numbers.

### Floats

The `float` type represents numbers with a decimal point (floating-point numbers).

In [10]:
x = 2.0 # can use 2. if you are lazy 

In [11]:
type(x)

float

In [12]:
x = float(2)

In [13]:
type(x)

float

In [14]:
x

2.0

### Integers and ```Long``` integers

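Integers are the simplest type. In languages with fixed-width integers, a 32-bit (four-byte) integer ranges from 0 to $2^{32}-1$ (4294967295) when unsigned, and from $-2^{31}$ to $2^{31}-1$ (-2147483648 to 2147483647) when signed. For comparison, a 16-bit integer ranges from 0 to 65535 when unsigned, and from -32768 to 32767 when signed.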
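The range of a fixed-width integer can be checked with a little arithmetic. The helper below is a sketch written for this note (`int_range` is a hypothetical name, not a built-in):

```python
def int_range(bits):
    """Return (unsigned_max, signed_min, signed_max) for a fixed-width integer."""
    unsigned_max = 2**bits - 1          # all bits set
    signed_min = -(2**(bits - 1))       # one bit is used for the sign
    signed_max = 2**(bits - 1) - 1
    return unsigned_max, signed_min, signed_max

print(int_range(16))  # (65535, -32768, 32767)
print(int_range(32))  # (4294967295, -2147483648, 2147483647)
```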

In [15]:
x = 1

In [16]:
type(x)

int

In [17]:
x = int(1.2) ### will take the integer part 

In [18]:
x

1

In Python 2, ```long``` integers have **no** range limitation (see [here](https://en.wikipedia.org/wiki/Arbitrary-precision_arithmetic)), and Python converts ```int``` to ```long``` automatically if needed. In Python 3 the two types are merged: every ```int``` has arbitrary precision, and the ```L``` suffix is no longer valid syntax, which is why the first cell below raises a `SyntaxError`.

Arbitrary-precision arithmetic, also called ***bignum arithmetic***, ***multiple-precision arithmetic***, or sometimes ***infinite-precision arithmetic***, indicates that calculations are performed on numbers whose digits of precision are limited only by the available memory of the host system. This contrasts with the faster fixed-precision arithmetic found in most arithmetic logic unit (ALU) hardware, which typically offers between 8 and 64 bits of precision.
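To see the contrast between arbitrary and fixed precision, compare a Python 3 `int` with a `float`, which stores only 53 bits of precision. This is a small sketch; the variable names are illustrative:

```python
big = 2**53 + 1            # exact: a Python 3 int grows as needed
as_float = float(big)      # a float cannot hold all 54 bits

print(big)                 # 9007199254740993
print(int(as_float))       # 9007199254740992 (the last digit was rounded away)
print(big == int(as_float))  # False
```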

In [19]:
x = 1L

SyntaxError: invalid syntax (<ipython-input-19-456d179137c0>, line 1)

In [20]:
type(x)

int

In [21]:
x = 2**64

In [22]:
type(x)

int

In [23]:
x

18446744073709551616

In [24]:
y = 10**80

In [25]:
y

100000000000000000000000000000000000000000000000000000000000000000000000000000000

How big do you think the previous number is?

What is your favorite big number?

### Booleans 

Booleans represent ```True``` and ```False``` values. Usually they arise as the result of a logical operation.

In [26]:
x = True

In [27]:
type(x)

bool

In [28]:
x = 1

In [29]:
x == 0

False

When you see a ```==``` sign, read it as a question: put an ```is``` in front of the comparison and a question mark at the end:

```is x = 0 ?```

In [30]:
y = (x == 0); y

False

In [31]:
x = [True, True, False, True]

In [32]:
sum(x)

3

Yikes! What is that square bracket monster above? We'll see further down (hint: it's a ```list```).
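Why does `sum` work on `True` and `False`? In Python, `bool` is a kind of `int`: `True` counts as 1 and `False` as 0. A short sketch (the name `flags` is only for illustration):

```python
flags = [True, True, False, True]

print(isinstance(True, int))    # True: bool is a subclass of int
print(True + True)              # 2
print(sum(flags))               # 3, the number of True values
print(sum(flags) / len(flags))  # 0.75, the fraction of True values
```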

## 4. More complicated Operators

In [33]:
# Modulo operation
7 % 3  # => 1

# Enforce precedence with parentheses
(1 + 3) * 2  # => 8

# negate with not
not True  # => False
not False  # => True

# Equality as a logical predicate is ==
1 == 1  # => True
2 == 1  # => False

# Inequality is !=
1 != 1  # => False
2 != 1  # => True

# More comparisons
1 < 10  # => True
1 > 10  # => False
2 <= 2  # => True
2 >= 2  # => True

# Note: Comparisons can be chained!
1 < 2 < 3  # => True
2 < 3 < 2  # => False

False

Here we begin to see some of the advantages of Python. It's pretty ***terse***, right? Try ```1 < 2 < 3``` in Java...
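A chained comparison is shorthand for joining the individual comparisons with `and`. The sketch below (with an illustrative variable `x`) also shows that adding parentheses changes the meaning:

```python
x = 2

chained = 1 < x < 3
spelled_out = (1 < x) and (x < 3)
print(chained, spelled_out)   # True True

# With parentheses, the first result (True, i.e. 1) is compared to the last number
print(3 > 2 > 1)     # True:  (3 > 2) and (2 > 1)
print((3 > 2) > 1)   # False: True > 1 is the same as 1 > 1
```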

## 5. Strings

You can define a string as any valid sequence of characters surrounded by double quotes:

In [34]:
sentence = "It's the end of the hurricane."; print(sentence)

It's the end of the hurricane.


Or single quotes:

In [35]:
sentence = '0 for NOAA and 1 for Trump so far.'; print(sentence)

0 for NOAA and 1 for Trump so far.


Or even triple quotes, which offer the luxury of spanning multiple lines:

In [36]:
sentence = """Who's going to win the Superbowl again?

Patriots maybe?"""; print(sentence)

Who's going to win the Superbowl again?

Patriots maybe?


In [37]:
len(sentence) #!

56

You can convert the types above (floats and ints) to a string with the ```str``` function:

In [38]:
str(3.14)

'3.14'

In [39]:
# Strings can be added
"Hello " + "world!"  # => "Hello world!"
# Strings can be added without using '+'
"Hello " "world!"  # => "Hello world!"

# ... or multiplied
"Hello" * 3  # => "HelloHelloHello"


'HelloHelloHello'

###  Slicing strings: A string is a python *iterable* 

In [40]:
# A string can be treated like a list of characters
"This is a string"[0]  # => 'T'

'T'

You can **index** a string variable. Indexing in Python starts at 0 (not 1 like in ```R```): the subscript refers to an **offset** from the starting position of an iterable, so the first element has an offset of zero.

If you want to know more, read [why python uses 0-based indexing](http://python-history.blogspot.co.nz/2013/10/why-python-uses-0-based-indexing.html).

`[start : stop : step]` is called `slicing`. It returns a new string made of the characters at the indices given by `range(start, stop, step)`.

In [41]:
sentence[0:9]

"Who's goi"

In [42]:
sentence[0:9:2]

'Wosgi'
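The `start:stop:step` triple can also be written out explicitly with the built-in `slice`, which makes the link to `range` visible. A sketch, redefining `sentence` so the block runs on its own (the names `every_other` and `indices` are illustrative):

```python
sentence = "Who's going to win the Superbowl again?\n\nPatriots maybe?"

every_other = slice(0, 9, 2)       # same as writing 0:9:2 inside the brackets
print(sentence[every_other])       # Wosgi

indices = list(range(0, 9, 2))     # the positions the slice visits
print(indices)                     # [0, 2, 4, 6, 8]
print("".join(sentence[i] for i in indices))  # Wosgi
```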

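A little trick: If we specify a step of -1, we start from the end and go to the start. That works because a negative step walks through the string backwards; with start and stop omitted, the slice covers the whole string in reverse: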

In [43]:
sentence[::-1]

"?ebyam stoirtaP\n\n?niaga lwobrepuS eht niw ot gniog s'ohW"

If we write `sentence[:-1]`, this returns all elements except the last one, because the index -1 refers to the last character. So this drops the question mark at the end of the sentence.

In [44]:
sentence[:-1]

"Who's going to win the Superbowl again?\n\nPatriots maybe"

One way to remember how slices work is to think of the indices as pointing between characters, with the left edge of the first character numbered 0. Then the right edge of the last character of a string of *n* characters has index *n*. For example, a 6-character string has edges numbered 0...6 from left to right, and the same edges (all but the rightmost) also carry the negative indices -6...-1.

The slice from i to j consists of all characters between the edges labeled i and j, respectively.

For non-negative indices, the length of a slice is the difference of the indices, if both are within bounds. For example, for a string `word` of at least 3 characters, the length of `word[1:3]` is 2.
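The edge picture can be tried out directly. The sketch below assumes a 6-character example string, `word = "Python"`, chosen here only for illustration:

```python
word = "Python"   # assumed example: 6 characters, so the edges are numbered 0..6
n = len(word)

print(word[1:3])          # yt (the characters between edge 1 and edge 3)
print(len(word[1:3]))     # 2  (3 - 1)
print(word[-1])           # n  (negative indices count from the right)
print(word[-n:] == word)  # True: edge -6 is the same place as edge 0
print(word[:2] + word[2:] == word)  # True: cutting at an edge loses nothing

# Indices do not wrap around: asking for a character past the end is an error
try:
    word[n]
except IndexError as err:
    print("IndexError:", err)
```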

Strings are **immutable**: You cannot change string elements in place:

In [45]:
sentence[2] = "blabla"

TypeError: 'str' object does not support item assignment
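Since a string cannot be edited in place, the usual approach is to build a *new* string from pieces of the old one. A minimal sketch, using one of the earlier example sentences (the names `patched` and `replaced` are illustrative):

```python
sentence = "It's the end of the hurricane."

# Glue together: everything before index 2, the new text, everything after index 2
patched = sentence[:2] + "blabla" + sentence[3:]
print(patched)    # Itblablas the end of the hurricane.

# String methods also return new strings
replaced = sentence.replace("hurricane", "storm")
print(replaced)   # It's the end of the storm.

print(sentence)   # It's the end of the hurricane. (the original is untouched)
```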

A lot of handy methods are available to manipulate strings:

In [46]:
print(sentence.upper())

WHO'S GOING TO WIN THE SUPERBOWL AGAIN?

PATRIOTS MAYBE?


In [47]:
sentence.endswith('.')

False

In [48]:
sentence.split() # by default split on whitespaces, returns a list (see below)

["Who's",
 'going',
 'to',
 'win',
 'the',
 'Superbowl',
 'again?',
 'Patriots',
 'maybe?']

### String concatenation and formatting

In [49]:
"The answer is " + "42"

'The answer is 42'

In [50]:
";".join(["The answer is ","42"]) # ["The answer is ","42"] is a list with two elements (separated by a ,)

'The answer is ;42'

In [51]:
a = 42

In [52]:
"The answer is %s" % ( a )

'The answer is 42'

In [53]:
"The answer is %4.2f" % ( a )

'The answer is 42.00'

In [54]:
"The answer is {0:<6.4f}, {0:<6.4f} and not {1:<6.4f} ".format(a,42.0001)

'The answer is 42.0000, 42.0000 and not 42.0001 '
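Assuming Python 3.6 or later, the same format codes can also be used in *f-strings*, which put the variable directly inside the braces. A short sketch:

```python
a = 42

plain = f"The answer is {a}"
two_decimals = f"The answer is {a:4.2f}"
padded = f"The answer is {a:<8.4f}|"   # left-aligned in a field 8 characters wide

print(plain)          # The answer is 42
print(two_decimals)   # The answer is 42.00
print(padded)         # The answer is 42.0000 |
```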

## 6. Container Types

Container types are types that hold *many* values (like in our R labs), each of which can be of a different type. Lists, tuples, sets, and dictionaries are the different container types. Sets are like lists except that they are unordered and their elements are always unique (cannot be duplicated). Tuples are like lists, but immutable. Dictionaries allow you to relate two items: a `Key` and its `Value`.
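As a quick preview of the key/value idea, here is a minimal dictionary sketch (the name `capitals` and its contents are made up for illustration):

```python
# Each entry relates a key (a country) to a value (its capital)
capitals = {"France": "Paris", "Greece": "Athens"}

print(capitals["Greece"])     # Athens (look up a value by its key)

capitals["Italy"] = "Rome"    # add a new key/value pair
print("Italy" in capitals)    # True (membership tests look at the keys)
print(len(capitals))          # 3
```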

And also, ***python does not know about matrices***!! Oh god! So how do we work with matrices in python?

Let's start with the basic container type: The `List`.

### Lists

In [55]:
int_list = [1,2,3,4,5,6]

In [56]:
int_list

[1, 2, 3, 4, 5, 6]

In [57]:
str_list = ['thing', 'stuff', 'truc']

In [58]:
str_list

['thing', 'stuff', 'truc']

lists can contain **anything** (items of distinct type):

In [59]:
mixed_list = [1, 1., 2+3J, 'sentence', """
long sentence
"""]

In [60]:
mixed_list

[1, 1.0, (2+3j), 'sentence', '\nlong sentence\n']

In [61]:
type(mixed_list[0])

int

#### Accessing elements and slicing lists 

```lists``` are iterable: their items (elements) can be accessed in the same way as we saw for strings.

In [62]:
int_list[0]

1

In [63]:
int_list[1]

2

In [64]:
int_list[::-1] ## same as int_list.reverse() but it is NOT operating in place

[6, 5, 4, 3, 2, 1]

In [65]:
int_list

[1, 2, 3, 4, 5, 6]

lists can be nested (list of lists). Oh... that would be one way to simulate matrices, right?

In [66]:
x = [[1,2,3],[4,5,6]]

In [67]:
x[0]

[1, 2, 3]

In [68]:
x[1]

[4, 5, 6]

In [69]:
x[0][1]

2
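A nested list only *simulates* a matrix: rows are easy to get, but columns and arithmetic need extra work. A small sketch of the limits (the names `first_row` and `second_column` are illustrative):

```python
x = [[1, 2, 3], [4, 5, 6]]

# A row is simply one of the inner lists
first_row = x[0]
print(first_row)         # [1, 2, 3]

# A column has to be collected element by element
second_column = [row[1] for row in x]
print(second_column)     # [2, 5]

# '+' glues two rows together; it does not add them like matrix rows
print(x[0] + x[1])       # [1, 2, 3, 4, 5, 6]
```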

```append``` is one of the most useful list methods

In [70]:
int_list.append(7); print(int_list)

[1, 2, 3, 4, 5, 6, 7]


lists are ***mutable***: you can change their elements in place 

---
# Note
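Difference between **extend** and **append**: `append` adds its argument as one single element, while `extend` adds each element of its argument separately.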

In [1]:
test1 = [[1,2,3],[4,5,6]]

In [7]:
test2 = []
test2.append(test1)
print(test2)
test3 = []
test3.extend(test1)
print(test3)

[[[1, 2, 3], [4, 5, 6]]]
[[1, 2, 3], [4, 5, 6]]


---

In [71]:
int_list[0] = 2; print(int_list)

[2, 2, 3, 4, 5, 6, 7]


In [72]:
int_list.reverse() 

In [73]:
int_list ### ! list object methods are applied 'in place'

[7, 6, 5, 4, 3, 2, 2]

In [74]:
int_list.count(2)

2

### Tuples

Tuples are also iterables, and they can be indexed and sliced like lists.

In [75]:
int_tup = (1,2,3,5,6,7)

In [76]:
int_tup[1:3]

(2, 3)

In [77]:
int_tup.index(2)

1

This construction is also possible

In [78]:
tup = 1,2,3

In [79]:
tup

(1, 2, 3)
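The comma syntax also works in the other direction: a tuple can be *unpacked* into separate variables. A brief sketch (the variable names are illustrative):

```python
tup = 1, 2, 3

# Unpacking: one variable per element
first, second, third = tup
print(second)           # 2

# The same mechanism gives a one-line swap
first, second = second, first
print(first, second)    # 2 1

# A one-element tuple needs a trailing comma
single = (1,)
print(type(single) is tuple)   # True
```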

Tuples however **are not** mutable, unlike lists.

In [80]:
int_tup[0] = 1

TypeError: 'tuple' object does not support item assignment

### Sets

Sets are like lists but are unordered and can contain ***no duplicate elements***. Don't worry about adding duplicates to a set, though. Python will get rid of duplicates for you!

Also, you can do set algebra with sets. You know, ***union***, ***intersection***, etc.

In [81]:
empty_set = set()

filled_set = {1, 2, 2, 3, 4}

filled_set.add(5) 

# Do set intersection with &
other_set = {3, 4, 5, 6}
filled_set & other_set  # => {3, 4, 5}

# Do set union with |
filled_set | other_set  # => {1, 2, 3, 4, 5, 6}

# Do set difference with -
{1, 2, 3, 4} - {2, 3, 5}  # => {1, 4}

# Check if set on the left is a superset of set on the right
{1, 2} >= {1, 2, 3}  # => False

# Check if set on the left is a subset of set on the right
{1, 2} <= {1, 2, 3}  # => True

# Check for existence in a set with in
2 in filled_set  # => True
10 in filled_set  # => False
10 not in filled_set # => True

True
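A common use of this property is removing duplicates from a list. A minimal sketch with a made-up word list:

```python
words = ["to", "be", "or", "not", "to", "be"]

unique = set(words)             # duplicates disappear
print(len(words), len(unique))  # 6 4

# Sets are unordered, so sort them when a stable order is needed
print(sorted(unique))           # ['be', 'not', 'or', 'to']
```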

**Useful trick: ```zipping``` lists**

`zip` is a wickedly useful function for objects, much like a zipper is for garments! Nothing easier for bringing items together. We'll use it ***a lot*** in class!

In [9]:
a = range(5); print(a)

range(0, 5)


In [10]:
b = range(5,10); print(b) 

range(5, 10)


In [11]:
a + b

TypeError: unsupported operand type(s) for +: 'range' and 'range'
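In Python 3, `range` produces its numbers lazily instead of building a list, so `+` is not defined for it. One way around the error, sketched here, is to convert each range to a list first:

```python
a = range(5)
b = range(5, 10)

combined = list(a) + list(b)   # list + list concatenates
print(combined)                # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```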

In [12]:
zip(a,b) # returns an iterator of tuples (in Python 2 it returned a list of tuples)

<zip at 0x1170c089408>

In [13]:
c = zip(a,b)

---
# Note
How the `zip` function works

In [17]:
test1 = range(0,5)
test2 = range(5,10)
print(test1)
print(test2)
test3 = zip(test1,test2)
for i,j in test3:
    print("i:{0} j:{1}".format(i,j))

range(0, 5)
range(5, 10)
i:0 j:5
i:1 j:6
i:2 j:7
i:3 j:8
i:4 j:9


---

In [87]:
c[1]

TypeError: 'zip' object is not subscriptable
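The error appears because a `zip` object hands out its pairs one at a time and does not support indexing. A sketch of one fix is to turn it into a list first (the name `pairs` is illustrative):

```python
a = range(5)
b = range(5, 10)
c = zip(a, b)

pairs = list(c)      # collect all the pairs into a list
print(pairs[1])      # (1, 6)
print(pairs)         # [(0, 5), (1, 6), (2, 7), (3, 8), (4, 9)]

# A zip object can only be walked through once: it is now used up
print(list(c))       # []
```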

Enough for today?

![sloth](https://tellingthetruth1993.files.wordpress.com/2015/06/sloth-from-imgsoup-com.jpg)