<div style="text-align: right">INFO 6105 Data Science Eng Methods and Tools, Lecture 2 Day 2</div>
<div style="text-align: right">Dino Konstantopoulos, 15 September 2022</div>

## A brief introduction to the language Python in 10 chapters

[Python](http://www.python.org/) is a modern, general-purpose, object-oriented, high-level programming language. It is widely used in science and engineering, and over the past 5 years it has gained considerable traction in scientific computing. Some examples:

+ Python is used at NASA for the Mars rover Curiosity mission 
+ The [Space Telescope Science Institute](http://www.stsci.edu/institute/software_hardware/pyraf/stsci_python) manages the operation of the Hubble Space Telescope with Python

Some positive attributes of Python that are often cited: 

* **Simple**: It is easy to read and relatively easy to learn (albeit not the easiest language to learn)
* **Expressive**: Fewer lines of code, fewer bugs and easy to maintain.
* **Powerful**: Python works as a script-type tool all the way to large projects, Big Data, High Performance Computing applications, data science, etc.
* **Batteries included**: The [**standard library**](http://docs.python.org/2/library/) is huge and includes some very useful libraries.
* **Many libraries**: There are tons of add-on libraries that will make your life easier, and guess what, they are all open source! So you can peek inside and change them at will.

A Python (or Jupyter) notebook implements Don Knuth's [literate programming](https://en.wikipedia.org/wiki/Literate_programming) idea: mixing code with English text that explains every piece of computation. It's prrrrrrrfect :-)

![Literate_Programming](https://upload.wikimedia.org/wikipedia/en/6/62/Literate_Programming_book_cover.jpg)

In [None]:
# look, this is code!
1 + 2

And here is where we're going to talk about the computation above. 

It adds two numbers. *Not* two vectors, *nor* two matrices, just two numbers. Boring, right? Especially after all your **high-dimensional R code**! But that is where programming starts. And if you feel pretty comfortable adding two numbers on a calculator, and you were comfortable adding vectors and spreadsheets in your R homework, there is no reason why you cannot graduate to advanced data science programming.

## 1. The philosophy of Python

If you type:

In [1]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


## 2. Operators

*Variables* in a computer program are **placeholders for data**. They are the most important, *1st component of programming*. Working with *containers* (many numbers at the same time) is the 2nd.

What *kind* of data a variable holds (the **type** of the data) is the 3rd most important concept; we'll talk about it later. For now, let's use *simple types*. The assignment operator is ```=```. If a variable is the last line of a cell, the notebook displays the data it contains.

In [140]:
a = 5 
a

5

Here is a cell where we just display the result of an expression *without* assigning it to a variable:

In [141]:
a * 2

10

Oh, look: the cell above ***remembered*** the value of `a` from the cell before it. So we have *memory* ***across our notebook cells***. Neat!

Here is the **increment** operator:

In [142]:
a += 2 # same as a = a + 2

In [143]:
a

7

and the **decrement** operator:

In [145]:
a -= 2 # same as a = a - 2

In [146]:
a

5
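The same shorthand (called *augmented assignment*) exists for the other arithmetic operators. A minimal sketch, starting again from `a = 5`:

```python
a = 5
a *= 3    # same as a = a * 3              -> 15
a //= 2   # floor division                 -> 7
a **= 2   # exponentiation                 -> 49
a %= 10   # remainder                      -> 9
a /= 3    # true division gives a float    -> 3.0
print(a)  # 3.0
```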

`**` is used for exponentiation 

In [3]:
x = 2

In [4]:
x**2

4

But you have another option:

In [5]:
b = pow(x,2)
b

4

In [6]:
import numpy
numpy.sqrt(b)

2.0
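For square roots there are also two alternatives that need only the standard library. A small sketch comparing them:

```python
import math

b = 4
print(math.sqrt(b))   # 2.0 (math.sqrt always returns a float)
print(b ** 0.5)       # 2.0 (raising to the power 1/2)
print(math.isqrt(b))  # 2   (integer square root, Python 3.8+)
```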

Oh god! How to remember all this?

You don't have to! Just ***google***!

## 3. Singular Types and Data structures

Ok, let's talk about the different *types* of simple numbers.

These are just single numbers, so we're not really doing *programming*. Rather, we're simply "*calculating*" :-) But instead of calculating with numbers, we're calculating with *Objects*, which can have different **types**.

### Floats

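The `float` type represents numbers with a fractional (decimal) part. Under the hood, a Python `float` is a 64-bit double-precision floating-point number.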

In [7]:
x = 2.0 # can use 2. if you feel lazy today

In [8]:
type(x)

float

In [9]:
x = float(2)

In [10]:
type(x)

float

In [11]:
x

2.0
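Because floats are stored in binary with finite precision, most decimal fractions are only approximated. A small sketch of the classic surprise, and of `math.isclose`, the standard-library way to compare floats:

```python
import math

total = 0.1 + 0.2
print(total)                     # 0.30000000000000004
print(total == 0.3)              # False: both sides are rounded binary approximations
print(math.isclose(total, 0.3))  # True: equal within a relative tolerance (default 1e-09)
```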

### Integers and ```Long``` integers

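Integers are the simplest type. In languages like C and Java, a typical integer occupies 32 bits (four bytes) and thus ranges from 0 to $2^{32}-1$ (0 to 4,294,967,295) when unsigned, and from $-2^{31}$ to $2^{31}-1$ (-2,147,483,648 to 2,147,483,647) when signed. Python 3's `int` has no such fixed limit, as we'll see below.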

In [19]:
x = 1

In [20]:
type(x)

int

In [21]:
x = int(1.2) # keeps the integer part by truncating toward zero, so x == 1

In [23]:
x=1.2

In [25]:
type(x)

float
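`int()` truncates toward zero, which differs from rounding *down* when the number is negative. A quick sketch contrasting it with `math.floor` and the built-in `round`:

```python
import math

print(int(1.7))          # 1
print(int(-1.7))         # -1  (toward zero)
print(math.floor(-1.7))  # -2  (toward minus infinity)
print(round(1.7))        # 2   (nearest integer)
```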

In Python 2 there was a separate ```long``` type with **no** range limitation (see [here](https://en.wikipedia.org/wiki/Arbitrary-precision_arithmetic)), and Python converted ```int``` to ```long``` *automatically* when needed. In Python 3 the two types were merged: ```int``` itself has arbitrary precision.

Arbitrary-precision arithmetic, also called ***bignum arithmetic***, ***multiple-precision arithmetic***, or sometimes ***infinite-precision arithmetic***, indicates that calculations are performed on numbers whose digits of precision are limited only by the available memory of the host system. This contrasts with the faster fixed-precision arithmetic found in most arithmetic logic unit (ALU) hardware, which typically offers between 8 and 64 bits of precision.

This mirrors Python's type system: you never declare in advance how big a number will be, because Python makes sure that it's never in the way of your productivity.
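To see what fixed precision means in practice, here is a sketch that *simulates* a 32-bit signed integer. The helper `wrap_int32` is hypothetical (written only for this illustration): it maps any Python `int` onto the range $-2^{31}$ to $2^{31}-1$ the way two's-complement hardware would, while Python's own `int` keeps the exact value.

```python
def wrap_int32(n: int) -> int:
    """Hypothetical helper: reduce n to the signed 32-bit range, like two's-complement overflow."""
    return (n + 2**31) % 2**32 - 2**31

biggest = 2**31 - 1              # largest signed 32-bit value
print(biggest + 1)               # 2147483648: Python's int just keeps growing
print(wrap_int32(biggest + 1))   # -2147483648: a 32-bit int would wrap around
print((2**64).bit_length())      # 65: Python uses as many bits as the number needs
```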

In [26]:
x = 1L

SyntaxError: invalid syntax (1261149322.py, line 1)

The `L` suffix for long literals does not work anymore in Python 3!

In [169]:
type(x)

int

In [170]:
x = 2**64

In [171]:
type(x)

int

In [172]:
x

18446744073709551616

Remember the number of atoms in the known universe from our ***Big numbers*** slides? Well, python has no problem with it:

In [6]:
y = 10**80

In [7]:
y

100000000000000000000000000000000000000000000000000000000000000000000000000000000

In [8]:
type(y)

int

Try *that* with a native fixed-width integer type in C or Java!

Btw, what is *your* favorite big number?

<br />
<center>
<img src = ipynb.images/neuron.webp width = 600 />
</center>

### Booleans 

Used to represent ```True``` and ```False``` values. Usually they arise as the result of a logical operation.

They are the data type of *predicate* expressions.

In [27]:
x = True

In [28]:
type(x)

bool

In [29]:
x = 1

In [30]:
x == 0

False

In [31]:
x == 1

True

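When you see a ```==``` sign, read it as a question: put "is" in front of it and a question mark at the end. So `x == 0` reads as:

```is x equal to 0 ?```

(This is only a reading aid. Python also has an `is` keyword, but it tests whether two names refer to the *same object*, not whether two values are equal.)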

In [32]:
y = (x == 0); y

False

In [33]:
x = [True, True, False, True]

In [34]:
sum(x)

3

Yikes! What is that square bracket monster above? We'll see further down (hint: it's a ```list```).
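The `sum` above works because `bool` is a subclass of `int`: `True` behaves like 1 and `False` like 0. A short sketch, which also shows a common idiom for counting how many items satisfy a condition (the `scores` data is made up for illustration):

```python
print(isinstance(True, int))           # True: bool is a subclass of int
print(True + True)                     # 2
print(sum([True, True, False, True]))  # 3

scores = [55, 72, 90, 48, 81]          # made-up example data
print(sum(s >= 60 for s in scores))    # 3: counts the scores that pass
```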

## 4. More complicated Operators

In [None]:
# Modulo operation
7 % 3  # => 1

# Enforce precedence with parentheses
(1 + 3) * 2  # => 8

# negate with not
not True  # => False
not False  # => True

# Equality as a logical predicate is ==
1 == 1  # => True
2 == 1  # => False

# Inequality is !=
1 != 1  # => False
2 != 1  # => True

# More comparisons
1 < 10  # => True
1 > 10  # => False
2 <= 2  # => True
2 >= 2  # => True

# Note: Comparisons can be chained!
1 < 2 < 3  # => True
2 < 3 < 2  # => False
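The modulo operator `%` has a companion, floor division `//`, which gives the quotient rounded down, while `/` always performs true division. A short sketch (note how both behave with a negative operand):

```python
print(7 / 3)          # 2.3333333333333335 (true division, always a float)
print(7 // 3)         # 2  (floor division)
print(7 % 3)          # 1  (remainder)
print(divmod(7, 3))   # (2, 1): quotient and remainder in one call
print(-7 // 3)        # -3 (rounds toward minus infinity)
print(-7 % 3)         # 2  (result takes the sign of the divisor)
```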

In [10]:
1 < 2 < 2

False

In [37]:
1<2<2

False

Here we begin to see some of the advantages of Python. It's pretty ***terse***, right? Try ```1 < 2 < 3``` in Java (it won't even compile)...
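Under the hood, a chain like `a < b < c` means `a < b and b < c` (with `b` evaluated only once). That is *not* the same as comparing from left to right, as this small sketch shows:

```python
print(1 < 2 < 3)     # True: same as (1 < 2) and (2 < 3)
print(3 > 2 > 1)     # True: same as (3 > 2) and (2 > 1)
print((3 > 2) > 1)   # False: (3 > 2) is True, i.e. 1, and 1 > 1 is False
```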

## 5. Strings

<br />
<center>
<img src = ipynb.images/bananas.png width = 300 />
</center>

You can define a string as any valid sequence of characters surrounded by double quotes:

In [183]:
sentence = "It's the end of the hurricane."; print(sentence)

It's the end of the hurricane.


Or single quotes:

In [11]:
sentence = '0 for NOAA and 1 for "climate change" so far.'; print(sentence)

0 for NOAA and 1 for "climate change" so far.


Or triple quotes, which let a string span multiple lines:

In [38]:
sentence = """Who's going to win the Superbowl again?

Mahomes maybe?"""; print(sentence)

Who's going to win the Superbowl again?

Mahomes maybe?


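`len` returns the number of elements in a container. An integer is not a container, so asking for its length raises a `TypeError`:

In [39]:
a = 1
try:
    len(a)
except TypeError:
    print("not a container data type!")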

not a container data type!


In [191]:
len(sentence) #!

55

In [192]:
sentence[0]

'W'

You can convert the numeric types above (floats and ints) to a string with the ```str``` function, and back again with ```float``` or ```int```:

In [193]:
pi = str(3.14)
pi

'3.14'

In [194]:
float(pi)

3.14

In [195]:
str(int(float(pi)))

'3'

In [196]:
# Strings can be added
"Hello " + "world!"  # => "Hello world!"
# Adjacent string literals are joined even without '+' (works for literals only, not variables)
"Hello " "world!"  # => "Hello world!"

# ... or multiplied
"Hello" * 3  # => "HelloHelloHello"


'HelloHelloHello'

###  Slicing strings: A string is a Python *sequence*

In [197]:
# A string can be treated like a list of characters
"This is a string"[0]  # => 'T'

'T'

You can INDEX a string variable. Indexing in Python starts at 0 (not 1 like in ```R```): the subscript refers to an **offset** from the starting position of a sequence, so the first element has an offset of zero.

If you want to know more, read [why Python uses 0-based indexing](http://python-history.blogspot.co.nz/2013/10/why-python-uses-0-based-indexing.html).

`[start : stop : step]` is called *slicing*. It returns a new string made of the characters at the indices produced by `range(start, stop, step)`, so `stop` itself is excluded. With a positive step, an omitted `start` defaults to the beginning and an omitted `stop` to the end.
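A small sketch of that equivalence: indexing with an explicit `slice(start, stop, step)` object (the object the `start:stop:step` notation builds) gives the same result as picking the characters at `range(start, stop, step)` one by one. The string `s` is just an example.

```python
s = "Who's going"

by_slice = s[0:9:2]
by_object = s[slice(0, 9, 2)]                     # the same slice, written explicitly
by_range = "".join(s[i] for i in range(0, 9, 2))  # indices 0, 2, 4, 6, 8

print(by_slice, by_object, by_range)  # Wosgi Wosgi Wosgi
```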

In [40]:
sentence[0:9]

"Who's goi"

In [41]:
sentence[0:9:2]

'Wosgi'

A little trick: if we specify a step of -1, slicing walks through the string backwards. With `start` and `stop` omitted, it starts at the last character and ends at the first, which reverses the string:

In [42]:
sentence[ : :-1]

"?ebyam semohaM\n\n?niaga lwobrepuS eht niw ot gniog s'ohW"

In [43]:
sentence

"Who's going to win the Superbowl again?\n\nMahomes maybe?"

If we write `sentence[:-1]`, we get all elements except the last one: the index -1 refers to the last character, and the stop index is always excluded. So this drops the question mark at the end of the sentence.

In [44]:
sentence[:-1]

"Who's going to win the Superbowl again?\n\nMahomes maybe"

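One way to remember how slices work is to think of the indices as pointing *between* characters, with the left edge of the first character numbered 0. Then the right edge of the last character of a string of *n* characters has index *n*. For the 6-character string `'python'`, the edges are numbered 0 to 6 from left to right; counting from the end instead, the edges in front of each character are numbered -6 (before `p`) to -1 (before `n`). A step of -1 simply walks over the characters from right to left: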

In [46]:
x='python'

In [49]:
x[::-1]

'nohtyp'

The slice from i to j consists of all characters between the edges labeled i and j.

For non-negative indices, the length of a slice is the difference of the indices, if both are within bounds. For example, the length of `x[1:3]` (which is `'yt'`) is 2.
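The edge picture can be printed for `'python'`: each character has a non-negative index counted from the start and a negative index counted from the end. A short sketch:

```python
x = 'python'
n = len(x)
for i, ch in enumerate(x):
    # i is the offset from the start; i - n is the same position counted from the end
    print(f"{ch}: index {i}, negative index {i - n}")

print(x[1:3], len(x[1:3]))   # yt 2
print(x[-2:])                # on: the last two characters
```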

Strings are **immutable**: You cannot change string elements in place:

In [201]:
sentence

"Who's going to win the Superbowl again?\n\nMahomes maybe?"

In [203]:
sentence[2] = "blabla"

TypeError: 'str' object does not support item assignment

In [14]:
sentence

"Who's going to win the Superbowl again?\n\nMahomes maybe?"

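To "change" a string, build a new one out of slices of the old one. Here we replace the characters at indices 1 and 2 (`ho`):

In [15]: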
new_sentence = sentence[0:1] + "______________________" + sentence[3:]
new_sentence

"W______________________'s going to win the Superbowl again?\n\nMahomes maybe?"

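String methods never modify the string either; they return a new one. For instance, `str.replace` gives a similar result to the slicing trick above without counting indices. A sketch (the third argument, 1, limits the replacement to the first occurrence):

```python
sentence = "Who's going to win the Superbowl again?\n\nMahomes maybe?"

new_sentence = sentence.replace("ho", "___", 1)  # replace only the first "ho"
print(new_sentence)
print(sentence.startswith("Who"))  # True: the original string is unchanged
```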
Many handy methods are available for manipulating strings:

In [50]:
print(sentence.upper())

WHO'S GOING TO WIN THE SUPERBOWL AGAIN?

MAHOMES MAYBE?


In [51]:
sentence.endswith('?')

True

In [52]:
sentence.startswith('W')

True

In [53]:
my_array = sentence.split() # by default splits on any run of whitespace (spaces, newlines), returns a list (see below)
my_array

["Who's",
 'going',
 'to',
 'win',
 'the',
 'Superbowl',
 'again?',
 'Mahomes',
 'maybe?']

In [54]:
' '.join(my_array)

"Who's going to win the Superbowl again? Mahomes maybe?"

### String concatenation and formatting

In [55]:
"The answer is " + "42"

'The answer is 42'

In [56]:
";".join(["The answer is ","42"]) # ["The answer is ","42"] is a list with two elements (separated by a ,)

'The answer is ;42'

In [57]:
a = 42

In [61]:
"The answer is %s" % ( a )

'The answer is 42'

In [62]:
"The answer is %4.2f" % ( a )

'The answer is 42.00'

In [63]:
"The answer is {0:<6.4f}, {0:<6.4f} and not {1:<6.4f} ".format(a,42.0001)

'The answer is 42.0000, 42.0000 and not 42.0001 '
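Since Python 3.6 there is also a more compact option, *f-strings*: put an `f` before the opening quote and write expressions directly inside the braces, with the same format specifications as `.format`. A minimal sketch:

```python
a = 42
print(f"The answer is {a}")       # The answer is 42
print(f"The answer is {a:4.2f}")  # The answer is 42.00
print(f"The answer is {a:<6.4f} and not {42.0001:<6.4f}")  # The answer is 42.0000 and not 42.0001
```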

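The `!` prefix passes a command to the operating system's shell. `pwd` (*print working directory*) is a Unix command, so it fails when the notebook runs on Windows, as it did here:

In [64]:
!pwd

'pwd' is not recognized as an internal or external command,
operable program or batch file.

The IPython magic `%pwd` works on every platform; on Windows, `!cd` is the shell equivalent.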


## 6. Container Types

<br />
<center>
<img src="ipynb.images/container.png" width = 400 />
</center>

Now we move on to *programming* (namely the *2nd fundamental concept of programming*): Working with multiple numbers/Objects at the same time, with so-called **container types**. Same as what we did in our R labs!

Container types are types that hold *many* values (like in our R labs), each of which can be of a different type. Lists, tuples, sets, and dictionaries are Python's main built-in container types. Sets are like lists except that their elements are always unique (cannot be duplicated) and unordered. Tuples are like lists, but immutable. Dictionaries relate two items: a `Key` and its `Value`.

There are a few differences between Python and R: in Python, indexing starts at 0, and you can use negative indices to count from the end. Also, core Python has no built-in matrix data type!

***Python does not know about matrices***!! Oh god! So how do we work with matrices in Python? Let's figure this out below.

Let's start with the basic container type: The `List`.

### Lists

In [214]:
int_list = [1, 2, 3, 4, 5, 6]  # square brackets create a list
int_list

[1, 2, 3, 4, 5, 6]

In [215]:
type(int_list)

list

In [216]:
str_list = ['thing', 'stuff', 'truc']

In [217]:
str_list

['thing', 'stuff', 'truc']

In [218]:
type(str_list)

list

Lists can contain **anything** (items of different types):

In [219]:
mixed_list = [1, 1., 2+3J, 'sentence', """
long sentence
"""]

In [65]:
a=2+3j

In [66]:
type(a)

complex

In [220]:
mixed_list

[1, 1.0, (2+3j), 'sentence', '\nlong sentence\n']

In [223]:
for l in mixed_list:
    print(type(l))
type(l)

<class 'int'>
<class 'float'>
<class 'complex'>
<class 'str'>
<class 'str'>


str

#### Accessing elements and slicing lists 

Lists are *sequences*, just like strings, so their items (elements) can be accessed and sliced the same way:

In [224]:
int_list[0]

1

In [225]:
int_list[1]

2

In [226]:
int_list[::-1] ## returns a NEW reversed list; unlike int_list.reverse(), it does NOT modify int_list in place

[6, 5, 4, 3, 2, 1]

In [None]:
int_list

Lists can be nested (a list of lists). Oh... wait... could that be one way to simulate matrices?

In [227]:
x = [[1,2,3],[4,5,6]]

In [228]:
x

[[1, 2, 3], [4, 5, 6]]

In [229]:
x[0]

[1, 2, 3]

In [230]:
x[1]

[4, 5, 6]

In [231]:
x[1][0]

4

In [None]:
x
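If a list of lists plays the role of a matrix, `x[i][j]` is the entry in row `i`, column `j`. A sketch of pulling out a column and transposing with list comprehensions (fine for small examples like this one):

```python
x = [[1, 2, 3],
     [4, 5, 6]]

n_rows, n_cols = len(x), len(x[0])
column_1 = [row[1] for row in x]  # second column
transpose = [[x[i][j] for i in range(n_rows)] for j in range(n_cols)]

print(column_1)   # [2, 5]
print(transpose)  # [[1, 4], [2, 5], [3, 6]]
```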

```append``` is one of the most useful list methods

In [232]:
int_list.append('hello!')
print(int_list)

[1, 2, 3, 4, 5, 6, 'hello!']
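`append` adds its argument as *one* new element, even if that argument is itself a list; its sibling `extend` adds the elements one by one. A quick sketch:

```python
nums = [1, 2]
nums.append([3, 4])
print(nums)   # [1, 2, [3, 4]]: one new element, a nested list

nums = [1, 2]
nums.extend([3, 4])
print(nums)   # [1, 2, 3, 4]: two new elements
```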


Lists are ***mutable***: you can change their elements in place:

In [233]:
int_list[0] = 2
print(int_list)

[2, 2, 3, 4, 5, 6, 'hello!']


In [234]:
int_list.reverse() 
print(int_list)

['hello!', 6, 5, 4, 3, 2, 2]


In [235]:
int_list ### ! methods like reverse() and append() modify the list 'in place' (and return None)

['hello!', 6, 5, 4, 3, 2, 2]
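Because such methods work in place, they return `None`. When you want a new list and the original left untouched, the built-in functions `sorted` and `reversed` are the usual choice. A sketch:

```python
nums = [3, 1, 2]

result = nums.sort()         # sorts nums in place...
print(result, nums)          # None [1, 2, 3]: ...and returns None

nums = [3, 1, 2]
print(sorted(nums), nums)    # [1, 2, 3] [3, 1, 2]: new list, original unchanged
print(list(reversed(nums)))  # [2, 1, 3]
```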

In [236]:
int_list.count(2)

2

### Tuples

Tuples are also sequences, and they can be indexed and sliced like lists:

In [237]:
int_tup = (1,2,3,5,6,7)
int_tup

(1, 2, 3, 5, 6, 7)

In [238]:
int_tup[1:3]

(2, 3)

In [239]:
int_tup.index(2)

1

Parentheses are optional: a comma-separated series of values also creates a tuple (this is called *tuple packing*):

In [240]:
tup = 1,2,3

In [241]:
tup

(1, 2, 3)
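The reverse operation, *tuple unpacking*, assigns each element to its own variable. A sketch, including the classic one-line swap:

```python
tup = 1, 2, 3   # packing
a, b, c = tup   # unpacking: a = 1, b = 2, c = 3
print(a, b, c)  # 1 2 3

a, b = b, a     # swap without a temporary variable
print(a, b)     # 2 1
```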

Unlike lists, however, tuples **are not** mutable:

In [242]:
int_tup[0] = 1

TypeError: 'tuple' object does not support item assignment

### Sets

Sets are like lists but contain ***no duplicate elements*** and keep no particular order. Don't worry about adding duplicates to a set, though: Python silently discards them for you!

Also, you can do set algebra with sets. You know, ***union***, ***intersection***, ***difference***, etc.

In [67]:
empty_set = set()

filled_set = {1, 2, 2, 3, 4}

filled_set.add(5) 

filled_set

{1, 2, 3, 4, 5}
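Since sets drop duplicates automatically, converting a list to a set is a quick way to find its distinct values. A sketch (the `votes` list is made-up data; `sorted` is used because sets have no order):

```python
votes = ["yes", "no", "yes", "maybe", "no", "yes"]  # made-up data

distinct = set(votes)
print(len(distinct))     # 3 distinct answers
print(sorted(distinct))  # ['maybe', 'no', 'yes']
```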

In [68]:
empty_set = set()

filled_set = {1, 2, 2, 3, 4}

filled_set.add(5) 

# Do set intersection with &
other_set = {3, 4, 5, 6}
filled_set & other_set  # => {3, 4, 5}

# Do set union with |
filled_set | other_set  # => {1, 2, 3, 4, 5, 6}

# Do set difference with -
{1, 2, 3, 4} - {2, 3, 5}  # => {1, 4}

# Check if set on the left is a superset of set on the right
{1, 2} >= {1, 2, 3}  # => False

# Check if set on the left is a subset of set on the right
{1, 2} <= {1, 2, 3}  # => True

# Check for existence in a set with in
2 in filled_set  # => True
10 in filled_set  # => False
10 not in filled_set # => True

True

In [69]:
aset={1,2,2,3,4}

In [70]:
aset

{1, 2, 3, 4}

`<` tests for a *proper* subset (a subset that is not equal to the other set):

In [245]:
{1, 2} < {1, 2, 3} 

True
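The difference between `<=` (subset) and `<` (proper subset) only shows up when the two sets are equal. A quick sketch:

```python
a = {1, 2}
print(a <= {1, 2})            # True: every set is a subset of itself
print(a < {1, 2})             # False: but not a *proper* subset of itself
print(a < {1, 2, 3})          # True
print(a.issubset({1, 2, 3}))  # True: method form of <=
```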

**Useful trick: ```zipping``` lists**

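`zip` is a wickedly useful built-in function for iterables, much like a zipper is for garments! Nothing makes it easier to bring items together: it pairs the 1st item of each iterable, then the 2nd, and so on. We'll use it ***a lot*** in class! First, a few ways to build the sequences we'll zip: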

In [71]:
my_list = [0,1,2,3,4,5]
my_list

[0, 1, 2, 3, 4, 5]

The same list, built with a *list comprehension*:

In [72]:
my_list_2 = [x for x in range(6)]
my_list_2

[0, 1, 2, 3, 4, 5]

In [73]:
my_list = (0,1,2,3,4,5)
my_list

(0, 1, 2, 3, 4, 5)

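Swapping the square brackets for parentheses gives a *generator expression*, which produces its values lazily; `list()` collects them:

In [74]: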
my_list_2 = (x for x in range(6))
list(my_list_2)

[0, 1, 2, 3, 4, 5]

`range` is lazy too, and `list()` materializes it as well:

In [75]:
my_list_2 = range(6)
list(my_list_2)

[0, 1, 2, 3, 4, 5]
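`range` objects and generator expressions both produce values on demand instead of storing them all, but they behave differently when reused: a `range` can be iterated over again and again, while a generator is used up after one pass. A sketch:

```python
r = range(6)
g = (x for x in range(6))

print(list(r), list(r))  # [0, 1, 2, 3, 4, 5] [0, 1, 2, 3, 4, 5]
print(list(g))           # [0, 1, 2, 3, 4, 5]
print(list(g))           # []: the generator is exhausted
print(len(r), r[2])      # 6 2: ranges also support len() and indexing
```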

In [255]:
print(list(range(0,100)))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]


In [256]:
my_list = range(0,100)
','.join(str(c) for c in my_list)

'0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99'

In [257]:
print(list(my_list))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]


In [76]:
a = range(5); print(a)

range(0, 5)


In [77]:
b = range(5, 10); print(b)

range(5, 10)


In [78]:
range(a[0], b[-1])  # ranges support indexing directly, no list() needed

range(0, 9)

In [262]:
list(zip(a, b))  # zip yields tuples one at a time; list() collects them into a list

[(0, 5), (1, 6), (2, 7), (3, 8), (4, 9)]
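`zip` can also undo itself: unpacking a list of pairs with `*` passes each pair as a separate argument, so `zip(*pairs)` regroups all the first elements together and all the second elements together. A sketch:

```python
pairs = list(zip(range(5), range(5, 10)))  # [(0, 5), (1, 6), (2, 7), (3, 8), (4, 9)]

firsts, seconds = zip(*pairs)  # "unzip"
print(firsts)   # (0, 1, 2, 3, 4)
print(seconds)  # (5, 6, 7, 8, 9)
```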

In [79]:
p=[1,2,3,4,5]

In [80]:
q=[6,7,8,9,10]

In [81]:
list(zip(p, q))

[(1, 6), (2, 7), (3, 8), (4, 9), (5, 10)]

Enough for today?

![sloth](https://tellingthetruth1993.files.wordpress.com/2015/06/sloth-from-imgsoup-com.jpg)

When the iterables have different lengths, `zip` stops at the end of the shortest one:

In [2]:
a = range(5); print(list(a))

[0, 1, 2, 3, 4]


In [3]:
b = range(6); print(list(b))

[0, 1, 2, 3, 4, 5]


In [4]:
list(zip(a,b))

[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
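If you need to keep the extra items, the standard library's `itertools.zip_longest` pads the shorter iterables with a `fillvalue` (default `None`). And since Python 3.10, `zip(..., strict=True)` raises a `ValueError` instead of silently truncating. A sketch of both (the version check keeps it runnable on older Pythons):

```python
import sys
from itertools import zip_longest

a = range(5)
b = range(6)

print(list(zip_longest(a, b)))                # [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (None, 5)]
print(list(zip_longest(a, b, fillvalue=-1)))  # last pair becomes (-1, 5)

if sys.version_info >= (3, 10):  # strict= only exists in Python 3.10+
    try:
        list(zip(a, b, strict=True))
    except ValueError as err:
        print("lengths differ:", err)
```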