<div style="text-align: right">INFO 6105 Data Sci Engineering Tools and Methods, Lecture 1, Day 2</div>
<div style="text-align: right">Prof. Dino Konstantopoulos, 9 January 2019</div>

## A brief introduction to the language Python


[Python](http://www.python.org/) is a modern, general-purpose, object-oriented, high-level programming language. It is widely used in science and engineering, and has gained considerable traction in scientific computing over the past few years. Some examples: 

+ The Bureau of Meteorology uses it to drive its hydrology prediction
+ NASA uses Python for the Mars rover Curiosity mission 
+ Astronomy: 

> * The [Space Telescope Science Institute](http://www.stsci.edu/institute/software_hardware/pyraf/stsci_python) manages the operation of the Hubble Space Telescope with Python
> * Cosmological simulations with [yt](http://yt-project.org/)

<div align="center">
![](http://yt-project.org/img/gallery/alma_natcov.jpg)
</div>  
<hr size=0>

Some positive attributes of Python that are often cited: 

* **Simplicity**: It is easy to read and easy to learn, and in many instances reads almost like pseudo-code
* **Expressiveness**: Fewer lines of code, which means fewer bugs and easier maintenance.
* **Powerful**: Python is not a language you grow out of. It can also be used for large projects, Big Data, High Performance Computing applications, etc.
* **Batteries included**: The [**standard library**](http://docs.python.org/2/library/) is huge and includes some really cool libraries.

Python is, along with R, one of the main languages of Data Science. It is the language we will use in class.

## The philosophy of Python

Typing the following prints the Zen of Python, a short list of the language's guiding principles:

In [None]:
import this

## Operators

The assignment operator is ```=```

In [None]:
a = 5 
a

In [None]:
a * 2

In [9]:
a += 2 # same as a = a + 2

In [None]:
a

In [11]:
a -=2

In [None]:
a

```**``` is used for exponentiation 

In [13]:
x = 2

In [None]:
x**2

In [None]:
pow(x,2)

## Singular Types and Data structures

### Floats

In [19]:
x = 2.0 # can use 2. if you are lazy 

In [None]:
type(x)

In [21]:
x = float(2)

In [None]:
type(x)

In [None]:
x

### Complex numbers 

Complex numbers can be created using the ```J``` notation or the ```complex``` function

In [23]:
x = 2 + 3J

In [None]:
print(type(x)); print(x)

In [25]:
x = complex(2, 3)

In [None]:
print(type(x)); print(x)

### Integers and ```Long``` integers

In [28]:
x = 1

In [None]:
type(x)

In [31]:
x = int(1.2) ### will take the integer part 

In [None]:
x

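In Python 2, normal integers range from $-2^{31}$ to $2^{31}-1$ on 32-bit builds (the limit is platform-dependent), while ```Long``` integers have NO range limitation (see [here](https://en.wikipedia.org/wiki/Arbitrary-precision_arithmetic)). Python 2 converts ```ints``` to ```Long``` automatically if needed. Python 3 merges the two types: every ```int``` has arbitrary precision.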

In [33]:
x = 1 # in Python 2 you could write 1L to force a Long; the L suffix is a SyntaxError in Python 3

In [None]:
type(x)

In [35]:
x = 2**64

In [None]:
type(x)

In [None]:
x
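As a quick check of the arbitrary-precision behaviour described above, here is a minimal sketch. It assumes Python 3, where there is a single ```int``` type, and the variable names are illustrative only.

```python
import sys

big = 2**64                 # beyond the range of a 64-bit signed integer
bigger = big * big          # still exact: no overflow and no rounding

print(type(big))            # <class 'int'> in Python 3
print(bigger)               # 340282366920938463463374607431768211456
# sys.maxsize is the largest container index, not a limit on int values
print(big > sys.maxsize)
```

The result of ```big * big``` is exactly $2^{128}$, which no fixed-width machine integer could hold.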

### Booleans 

Used to represent ```True``` and ```False```. Usually they arise as the result of a logical operation

In [38]:
x = True

In [None]:
type(x)

In [40]:
x = 1

In [None]:
x == 0

In [None]:
y = (x == 0); y

In [43]:
x = [True, True, False, True]

In [None]:
sum(x)

### Strings

You can define a string as any valid characters surrounded by single quotes

In [None]:
sentence = 'The Guide is definitive. Reality is frequently inaccurate.'; print(sentence)

Or double quotes 

In [None]:
sentence = "I'd take the awe of understanding over the awe of ignorance any day."; print(sentence)

Or triple quotes 

In [None]:
sentence = """Time is an illusion.

Lunchtime doubly so."""; print(sentence)

In [None]:
len(sentence) #!

And you can convert the types above (floats, complex, ints, Longs) to a string with the ```str``` function

In [None]:
str(complex(2,3))

####  A string is a Python *iterable* 

You can INDEX a string variable. Indexing in Python starts at 0 (not 1): the subscript refers to an **offset** from the starting position of an iterable, so the first element has an offset of zero

If you want to know more follow [why python uses 0-based indexing](http://python-history.blogspot.co.nz/2013/10/why-python-uses-0-based-indexing.html)

In [None]:
sentence[0:4]

In [None]:
sentence[::-1]
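The offset rule is easier to see on a short string. The sketch below uses a hypothetical variable ```word``` (not part of the cells above) and shows positive offsets, negative offsets, and the ```start:stop:step``` slice form.

```python
word = "Python"

first = word[0]           # offset 0 from the start -> 'P'
last = word[-1]           # negative offsets count back from the end -> 'n'
middle = word[1:4]        # start at offset 1, stop BEFORE offset 4 -> 'yth'
every_other = word[::2]   # step of 2 -> 'Pto'
backwards = word[::-1]    # step of -1 walks the string in reverse -> 'nohtyP'

print(first, last, middle, every_other, backwards)
```

Because the stop offset is excluded, ```word[a:b]``` always has length ```b - a``` when both offsets lie inside the string.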

But a string is **immutable**: you cannot change its elements in place, so the assignment below raises a ```TypeError```

In [None]:
sentence[2] = "blabla"
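Since in-place assignment fails, the usual workaround is to build a new string. This is a small sketch using a hypothetical variable ```quote``` so that ```sentence``` stays untouched for the cells that follow.

```python
quote = "Time is an illusion."

try:
    quote[0] = "t"               # strings do not support item assignment
except TypeError as err:
    print("TypeError:", err)

# Build new strings instead of modifying the old one
lowered = "t" + quote[1:]
replaced = quote.replace("an illusion", "a resource")

print(lowered)
print(replaced)
print(quote)                     # the original string is unchanged
```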

A lot of handy methods are available to manipulate strings

In [None]:
print(sentence.upper())

In [None]:
sentence.endswith('.')

In [None]:
sentence.split() # by default splits on whitespace, returns a list (see below)

#### String concatenation and formatting

In [None]:
"The answer is " + "42"

In [None]:
";".join(["The answer is ","42"]) # ["The answer is ","42"] is a list with two elements (separated by a ,)

In [59]:
a = 42

In [None]:
"The answer is %s" % ( a )

In [None]:
"The answer is %4.2f" % ( a )

In [None]:
"The answer is {0:<6.4f}, {0:<6.4f} and not {1:<6.4f} ".format(a,42.0001)

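The format specifiers used above are compact, so here is a sketch that spells out each part. The field widths are chosen for illustration, and the last line assumes Python 3.6 or later, where f-strings are available.

```python
a = 42

# {0:<8.2f} reads: argument 0, left-aligned (<), minimum width 8,
# 2 digits after the decimal point, fixed-point notation (f)
left = "[{0:<8.2f}]".format(a)
# > right-aligns within the same width
right = "[{0:>8.2f}]".format(a)
# Fields can be named instead of numbered
named = "The answer is {value:d}".format(value=a)
# f-strings (Python 3.6+) evaluate the expression in place
inline = f"The answer is {a:.1f}"

print(left)     # [42.00   ]
print(right)    # [   42.00]
print(named)    # The answer is 42
print(inline)   # The answer is 42.0
```
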
## Container Types

Container types are types that hold many values, each of which can be of a different type. Lists, tuples, sets, and dictionaries are the different Container types. Here we cover lists, tuples, and dictionaries. Sets are unordered collections whose elements are always unique (cannot be duplicated), so unlike lists they cannot be indexed by position.
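Since sets are not covered further in this section, here is a brief sketch of their behaviour. The variable names are illustrative only.

```python
letters = set("mississippi")      # duplicates collapse to unique elements
print(sorted(letters))            # sets are unordered, so sort for display

evens = {0, 2, 4, 6, 8}
squares = {0, 1, 4, 9}

both = evens & squares            # intersection
either = evens | squares          # union
only_evens = evens - squares      # difference

print(both, either, only_evens)
print(4 in evens)                 # membership tests are fast on sets
```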

### Lists

In [64]:
int_list = [1,2,3,4,5,6]

In [None]:
int_list

In [66]:
str_list = ['thing', 'stuff', 'truc']

In [None]:
str_list

lists can contain anything

In [None]:
mixed_list = [1, 1., 2+3J, 'sentence', """
long sentence
"""]

In [None]:
mixed_list

In [None]:
type(mixed_list[0])

#### Accessing elements and slicing lists 

```lists``` are iterable, and their items (elements) can be accessed in the same way as we saw for strings 

In [None]:
int_list[0]

In [None]:
int_list[1]

In [None]:
int_list[::-1] ## same as int_list.reverse() but it is NOT operating in place

In [None]:
int_list

lists can be nested (list of lists)

In [76]:
x = [[1,2,3],[4,5,6]]

In [None]:
x[0]

In [None]:
x[1]

In [None]:
x[0][1]

```append``` is one of the most useful list methods

In [None]:
int_list.append(7); print(int_list)

lists are mutable: you can change their elements in place 

In [None]:
int_list[0] = 2; print(int_list)

In [83]:
int_list.reverse() 

In [None]:
int_list ### ! list object methods are applied 'in place'

In [None]:
int_list.count(2)

### Tuples

Tuples are also iterables, and they can be indexed and sliced like lists

In [86]:
int_tup = (1,2,3,5,6,7)

In [None]:
int_tup[1:3]

In [None]:
int_tup.index(2)

This construction is also possible

In [89]:
tup = 1,2,3

In [None]:
tup

Tuples ARE NOT mutable, contrary to lists

In [None]:
int_tup[0] = 1

**Useful trick: ```zipping``` lists**

In [None]:
a = list(range(5)); print(a)

In [None]:
b = list(range(5,10)); print(b)

In [None]:
a + b

In [None]:
list(zip(a,b)) # returns a list of tuples
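Two properties of ```zip``` are worth knowing: it stops at the shortest input, and the ```*``` operator can undo it. The sketch below uses hypothetical lists ```names``` and ```lengths``` rather than ```a``` and ```b```, and assumes Python 3, where ```zip``` is lazy and is wrapped in ```list()``` for display.

```python
names = ['thing', 'stuff', 'truc']
lengths = [5, 5, 4]

pairs = list(zip(names, lengths))
print(pairs)        # [('thing', 5), ('stuff', 5), ('truc', 4)]

# zip stops as soon as the shortest input is exhausted
short = list(zip([1, 2, 3], 'ab'))
print(short)        # [(1, 'a'), (2, 'b')]

# zip(*pairs) transposes the pairs back into two tuples
unzipped_names, unzipped_lengths = zip(*pairs)
print(unzipped_names, unzipped_lengths)
```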

### List comprehensions

List comprehensions are one of the most useful and compact Python expressions. 

In [None]:
str_list

In [None]:
['my ' + x for x in str_list]

In [None]:
[x.upper() for x in str_list]

In [None]:
[x+y for x,y in zip(a,b)] # using zip (above)

In [None]:
a

In [None]:
[x + 6 if (x < 3) else x for x in a]
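The last comprehension can be unrolled into an ordinary loop, which makes the position of the ```if``` easier to read. The sketch below uses a hypothetical list ```numbers``` and also shows the other form, where a trailing ```if``` filters elements out.

```python
numbers = list(range(5))

# Conditional expression BEFORE the for: every element is kept,
# but elements below 3 are transformed
shifted = [n + 6 if n < 3 else n for n in numbers]

# The same logic written as an explicit loop
shifted_loop = []
for n in numbers:
    if n < 3:
        shifted_loop.append(n + 6)
    else:
        shifted_loop.append(n)

# Trailing if AFTER the for: elements that fail the test are dropped
small_only = [n for n in numbers if n < 3]

print(shifted)       # [6, 7, 8, 3, 4]
print(small_only)    # [0, 1, 2]
```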

### Dictionaries 

One of the more flexible built-in data structures is the dictionary. A dictionary maps a set of keys to a collection of associated values. These mappings are mutable and, unlike lists or tuples, are not indexed by position (dictionaries were unordered before Python 3.7 and preserve insertion order since). Hence, rather than using the sequence index to return elements of the collection, the corresponding key must be used. 

Dictionaries are specified by a comma-separated sequence of key/value pairs, with each key separated from its value by a colon. The dictionary is enclosed by curly braces. Dictionaries closely mirror JSON objects, a data format used widely on the Web. For example:

In [None]:
my_dict = {'a':16, 'b':(4,5), 'foo':'''(noun) a term used as a universal substitute 
           for something real, especially when discussing technological ideas and 
           problems'''}
my_dict

In [None]:
my_dict['b']

In [None]:
'a' in my_dict	# Checks to see if ‘a’ is in my_dict

In [None]:
'bar' in my_dict	# Checks to see if a key exists (has_key() was removed in Python 3)

In [None]:
list(my_dict.items())		# Returns key/value pairs as list of tuples

In [None]:
list(my_dict.keys())		# Returns list of keys

In [None]:
list(my_dict.values())	# Returns list of values

In [None]:
my_dict['c']

If we would rather not get the `KeyError` raised above, we can use the `get` method, which returns `None` if the key is not present, or a default value of your choice

In [8]:
my_dict.get('c')

In [None]:
my_dict.get('c', -1)
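A common use of `get` with a default is counting occurrences, because the first time a key is seen there is nothing stored for it yet. This is a minimal sketch with a hypothetical word list. It does not use `my_dict`.

```python
words = "the guide is definitive reality is frequently inaccurate".split()

counts = {}
for w in words:
    # get returns 0 when w has not been seen yet, so no KeyError is raised
    counts[w] = counts.get(w, 0) + 1

# items() yields (key, value) pairs; sorted() gives a predictable order
for word, n in sorted(counts.items()):
    print(word, n)

print(counts.get('towel'))    # None: the key is absent and no default was given
```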

## Logical operators 

Logical operators will **test** for some condition and return a boolean (True, False)

#### Comparison operators

+ `>` : Greater than
+ `>=` : Greater than or equal to
+ `<` : Less than
+ `<=` : Less than or equal to
+ `==` : Equal to
+ `!=` : Not equal to

**is / is not**

Use **==** (**!=**) when comparing values and **is** (**is not**) when comparing **identities**.

In [12]:
x = 5.

In [None]:
type(x)

In [14]:
y = 5

In [None]:
type(y)

In [None]:
x == y

In [None]:
x is y # x is a float, y is an int: they are two distinct objects in memory
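The difference between equality and identity shows most clearly with mutable objects. The sketch below uses hypothetical lists ```first```, ```second``` and ```alias```.

```python
first = [1, 2, 3]
second = [1, 2, 3]     # same contents, but a separate object
alias = first          # a second name for the SAME object

print(first == second)   # True: the values are equal
print(first is second)   # False: two distinct objects
print(first is alias)    # True: both names refer to one object

alias.append(4)          # modifying through one name ...
print(first)             # ... is visible through the other: [1, 2, 3, 4]

value = None
print(value is None)     # identity is the idiomatic test for None
```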

#### Some examples of common comparisons

In [18]:
a = 5
b = 6

In [None]:
a == b

In [None]:
a != b

In [None]:
(a > 4) and (b < 7)

In [None]:
(a > 4) and (b > 7)

In [None]:
(a > 4) or (b > 7)

**all** and **any** (lowercase, as they are built-in functions) can be used for a *collection* of booleans

In [24]:
x = [5,6,2,3,3]

In [25]:
cond = [item > 2 for item in x]

In [None]:
cond

In [None]:
all(cond)

In [None]:
any(cond)
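The intermediate list of booleans is not required: ```all``` and ```any``` accept any iterable, including a generator expression. The sketch below uses a hypothetical list ```values``` and also shows the results for an empty collection, which are easy to get wrong.

```python
values = [5, 6, 2, 3, 3]

# The generator expression is consumed lazily, and both functions
# stop as soon as the answer is known
all_above_two = all(v > 2 for v in values)    # False, because 2 > 2 fails
any_above_five = any(v > 5 for v in values)   # True, because 6 > 5

print(all_above_two, any_above_five)

# Edge case: an empty collection
print(all([]))   # True  (no element fails the test)
print(any([]))   # False (no element passes the test)
```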

## Control flow structures

#### Indentation is meaningful

In Python, there are no annoying curly braces, parentheses, or brackets to delimit flow control blocks as in other languages. Instead, the INDENTATION plays this role.

In [None]:
for x in range(10): 
    if x < 5:
        print(x**2)
    else:
        print(x)

**Note**: The standard is to use 4 spaces (**NOT** tabs) for the indentation. Set your favorite editor accordingly, for example in vi / vim: 

    set tabstop=4
    set expandtab
    set shiftwidth=4
    set softtabstop=4


When editing a code cell in IPython, the indentation is handled intelligently. Try typing the following in a new blank cell: 

    for x in range(10): 
        if x < 5:
            print(x**2)
        else:
            print(x)
            

In [None]:
for x in range(10):
    if x < 5: 
        pass  # keep typing here: the editor indents each new line for you

#### if ... elif ... else

In [None]:
x = 10

if x < 10: # not met
    x = x + 1
elif x > 10: # not met either
    x = x - 1
else: 
    x = x * 2
    
print(x)

In [None]:
x = 10

if (x > 5 and x < 8): 
    x = x+1
elif (x > 5 and x < 12): 
    x = x * 3
else:
    x = x-1
    
print(x)

#### The For loop 

The basic structure of a FOR loop is:

    for item in iterable: 
        expression(s)
        

In [None]:
count = 0
# In Python 3, range() is lazy: it produces values on demand instead of
# building a list, so it has a small memory footprint
# (Python 2 needed xrange() for this, because its range() built a full list)
x = range(1,10) 
for i in x:
    count += i
    print(count)

#### try ... except

You can see ```try ... except``` as a generalization of the ```if ... else``` construction, allowing more flexibility in handling failures in code

In [None]:
text = ('a','1','54.1','43.a')
for t in text:
    try:
        temp = float(t)
        print(temp)
    except ValueError:
        print(str(t) + ' is Not convertible to a float')

A list of built-in exceptions is available here 

[http://docs.python.org/3.1/library/exceptions.html](http://docs.python.org/3.1/library/exceptions.html)
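The ```try``` statement also accepts an ```else``` clause, which runs only when no exception was raised. The sketch below reuses the same input tuple to sort values into two hypothetical lists, ```converted``` and ```rejected```.

```python
text = ('a', '1', '54.1', '43.a')

converted, rejected = [], []
for t in text:
    try:
        value = float(t)
    except ValueError:
        rejected.append(t)        # runs only when float() fails
    else:
        converted.append(value)   # runs only when float() succeeded

print(converted)   # [1.0, 54.1]
print(rejected)    # ['a', '43.a']
```

Keeping only the risky call inside ```try``` makes it clear which statement the ```except``` clause is guarding.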

## Recycling code in Python

As with Matlab and R, it's a good idea to write **functions** for bits of code that you use often. 

The syntax for defining a function in Python is: 

    def name_of_function(arguments): 
        "Some code here that works on arguments and produces outputs"
        ...
        return outputs

Note that the execution block **must be indented** ... 

Unlike Matlab, you can create a file (a **module**: extension .py required) which contains **several** functions, and can also define variables, and import some other functions from other modules

In [None]:
%%file some_module.py 

PI = 3.14159 # defining a variable

from numpy import arccos # importing a function from another module

def f(x): 
    """
    This is a function which adds 5 to its argument
     
    """
    return x + 5

def g(x, y): 
    """
    This is a function which sums its 2 arguments
    """
    return x + y

In [3]:
import some_module

In [None]:
%whos

In [None]:
dir(some_module)

In [None]:
help(some_module)

In [None]:
some_module.PI

In [39]:
some_module.arccos?

In [None]:
some_module.f(7)

In [None]:
help(some_module.f)

In [42]:
from some_module import f

In [None]:
f(5)

In [44]:
import some_module as sm

In [None]:
sm.f(10)

The Zen of python says: 
    
```Namespaces are one honking great idea -- let's do more of those!```
    
so **don't** do: 

    from some_module import *
    
so as to avoid name conflicts.
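To see why name conflicts matter, consider the standard library modules ```math``` and ```cmath```, which both define ```sqrt```. With star imports the second import would silently replace the first. With explicit namespaces both remain available, as in this small sketch:

```python
import math
import cmath

# Each sqrt lives in its own namespace, so neither hides the other.
# After "from math import *" followed by "from cmath import *",
# the bare name sqrt would refer to the cmath version only.
print(math.sqrt(4))      # 2.0
print(cmath.sqrt(4))     # (2+0j)
print(cmath.sqrt(-1))    # 1j, whereas math.sqrt(-1) raises ValueError
```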

#### a bit more on functions: 

Functions can have **positional** as well as **keyword** arguments (keyword arguments have defaults, which can be `None` if that's allowed / tested)

In a call, positional arguments must always come before keyword arguments

In [46]:
def some_function(a,b,c=5,d=1e3): 
    res = (a + b) * c * d
    return res

In [None]:
some_function(2,3)

In [None]:
some_function(2, 3, c=5, d=0.01)
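A few more calling patterns are sketched below. The function ```scaled_sum``` is a hypothetical copy of the function above under a different name, so that nothing already defined is overwritten.

```python
def scaled_sum(a, b, c=5, d=1e3):
    """Return (a + b) * c * d, with defaults for c and d."""
    return (a + b) * c * d

default_call = scaled_sum(2, 3)              # c and d take their defaults
reordered = scaled_sum(2, 3, d=0.01, c=5)    # keyword arguments can come in any order
all_keywords = scaled_sum(b=3, a=2)          # positional parameters can be passed by name too

print(default_call)    # 25000.0
print(reordered)
print(all_keywords)    # 25000.0

# scaled_sum(c=5, 2, 3) would be a SyntaxError:
# a positional argument cannot follow a keyword argument
```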

You can return more than one output. By default, the outputs are packed into a tuple

In [49]:
def some_function(a, b): 
    return a+1, b+1, a*b

In [53]:
a,b,c = some_function(2,3)

In [None]:
c

In [None]:
res = some_function(2,3); type(res) # assigned to a single name, the three outputs form a tuple
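Both ways of receiving several outputs are shown side by side in the sketch below. The helper ```min_max_mean``` is hypothetical and is introduced only for this illustration.

```python
def min_max_mean(values):
    """Return the smallest value, the largest value and the mean."""
    return min(values), max(values), sum(values) / len(values)

# One name on the left: the outputs arrive as a single tuple
result = min_max_mean([2, 4, 9])
print(type(result), result)

# Several names on the left: the tuple is unpacked element by element
lowest, highest, mean = min_max_mean([2, 4, 9])
print(lowest, highest, mean)
```

The number of names on the left must match the number of returned values, otherwise Python raises a ```ValueError```.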