# Python Basic Concepts


## 1 Why Python?

As far as scientific computing is concerned, it's hard to find a good alternative to Python. Python is a do-it-all language: if you want to perform a statistical analysis, then model some data, and then build a GUI and a web platform to share the results with other users, you can do all of it with Python.

Nevertheless, Python tutorials for data analysis in engineering, finance and scientific applications are hard to find.
For this reason we made a complete set of courses and tutorials to address the needs of scientists and engineers:

* **Get data** (simulation, experiment control)
* **Manipulate and process data.**
* **Visualize results**... to understand what we are doing!
* **Communicate results:** produce figures for reports or publications, write presentations.

We use Python because it's:

* **OPEN**: The Python implementation is under an open source license that makes it freely usable and distributable, even for commercial use.
* **BATTERIES INCLUDED**: Rich collection of advanced scientific computing libraries and general libraries: we don’t want to re-program the plotting of a curve, a Fourier transform or a fitting algorithm. Don’t reinvent the wheel!
* **FRIENDLY and EASY TO LEARN**: Python allows you to do almost anything possible with a compiled language (C/C++/Fortran) without all the complexity. It is extensible in C or C++. Its clear syntax enhances readability: “Executable Pseudo Code”.
* **RUNS EVERYWHERE**: It runs on many Unix variants, on the Mac, on PCs under MS-DOS, Windows, Windows NT and OS/2, on Android and on many other platforms.

### <font color="Green">Alternatives to Python…</font>

* **Compiled languages: C, C++, Fortran, etc.**
    * *Advantages:* Fast execution, optimized compilers, highly optimized scientific libraries (for example BLAS for vector/matrix operations).
    * *Drawbacks:* Painful to use: these languages are difficult for non-computer-scientists.

* **Scripting languages: Matlab**
    * *Advantages:* Very rich collection of libraries, fast execution, good development environment.
    * *Drawbacks:* The base language is quite poor and can become restrictive for advanced users; expensive.

* **Other scripting languages: Scilab, Octave, Igor, R, IDL, etc.**
    * *Advantages:* Open source and free, or at least cheaper than Matlab; some advanced features (statistics in R, figures in Igor, etc.).
    * *Drawbacks:* Fewer available algorithms than Matlab; some are very powerful, but each is restricted to a single type of usage.

## 2 Python 101

### 2.1 Two important language features:

1. *Python is interpreted*
    1. The code doesn't require an explicit compilation step
    2. In IPython Notebook, code in cells is executed immediately
2. ***The indentation is part of the syntax***
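
A small sketch of what this means in practice (the function name `classify` is just an illustration): the indented lines form the bodies of the `if` and `else` blocks, and an `if` with no indented body is rejected before anything runs. It uses `print(...)` with a single argument, which behaves the same in Python 2 and Python 3.

```python
def classify(x):
    if x > 0:
        label = 'positive'      # indented: part of the 'if' block
    else:
        label = 'not positive'  # indented: part of the 'else' block
    return label                # back at function level: always executed

print(classify(3))    # positive
print(classify(-1))   # not positive

# Source code whose 'if' body is not indented is refused by the compiler
bad_source = "if True:\nprint('hi')\n"
try:
    compile(bad_source, '<example>', 'exec')
except IndentationError as err:
    print('IndentationError: ' + err.msg)
```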

### 2.2 The bricks of Python are:

* Operators such as +, -, *, / and so on, plus mathematical functions such as log and sqrt (from the `math` module).
* Built-in high level data types: strings, lists, dictionaries, etc.
* Control structures: if, if-else, if-elif-else, while, plus a powerful collection iterator (for).
* Multiple levels of organizational structure: functions, classes, scripts, modules, and packages. These assist in organizing code. An excellent and large example is the [Python standard library](http://docs.python.org/2/library/).

The operators are much like those in Matlab (one notable difference: exponentiation is `**`, not `^`).
We will explore the different data types and control structures later: they are very handy.

We defer the discussion of functions and classes to a more advanced tutorial. However, functions are much like Matlab functions, and classes are the basic concept of object-oriented programming, very useful for larger, more structured projects. Put simply, an object is an instance of a class, just as the Colosseum is an instance of the class of buildings.
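
For a first taste, here is a minimal sketch of a function and a class. The names `area`, `Building` and `describe` are hypothetical, chosen only to mirror the Colosseum analogy: `colosseum` is one instance (object) of the `Building` class.

```python
def area(width, height):
    """Return the area of a rectangle."""
    return width * height

class Building(object):
    """A hypothetical class describing a building."""

    def __init__(self, name, city):
        # Attributes stored on each instance
        self.name = name
        self.city = city

    def describe(self):
        return self.name + ' is in ' + self.city

print(area(3, 4))                          # 12

colosseum = Building('Colosseum', 'Rome')  # create an instance of Building
print(colosseum.describe())                # Colosseum is in Rome
print(isinstance(colosseum, Building))     # True
```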

### 2.3 In Python, variables, like everything else, are objects

Objects have many properties. For example, every object has a unique **id**. In the following example three variables are assigned on the same line, then `d = a` binds the name `d` to the object referenced by `a`. In other words, `d` and `a` are two different names for the same object, as confirmed by their identical ids.

In [3]:
a, b, c = 5, 6, 7
d = a
print a, d, id(a), id(d)

5 5 36237592 36237592


**'isinstance'** checks whether the passed value is an instance of one of the listed types: in this case `a` is an int:

In [4]:
isinstance(a, (int, float, bool))

True

### 2.4 Mutable / Immutable Objects

* **Mutable Objects** can be modified after being created
* **Immutable objects** can be read but not modified after being created. For example, a string is immutable, so you cannot add characters to it: you must build a new string and reassign it to the name.

Some Examples:

* **Strings**      are IMMUTABLE
* **Lists**        are MUTABLE
* **Tuples**       are IMMUTABLE
* **Sets**         are MUTABLE
* **Dictionaries** are MUTABLE
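
The difference can be seen with a quick experiment: a list keeps its id while its content changes (and, since two names can refer to the same object, a change made through one name is visible through the other), whereas a string refuses item assignment, so the only option is to build a new string.

```python
ls = [1, 2, 3]
original_id = id(ls)
ls[0] = 100                      # lists are mutable: change an item in place
print(ls)                        # [100, 2, 3]
print(id(ls) == original_id)     # True: still the same object

alias = ls                       # a second name for the SAME list
alias.append(4)
print(ls)                        # [100, 2, 3, 4]: the change is visible via both names

s = 'hello'
try:
    s[0] = 'H'                   # strings are immutable: item assignment fails
except TypeError as err:
    print('TypeError: ' + str(err))
s = 'H' + s[1:]                  # build a new string and rebind the name
print(s)                         # Hello
```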

### 2.5 Scripts, modules and namespaces

Some words on the organizational structure of Python code:

    
* A **script is the operational unit of programming**: a collection of constructs built with operators, data types, control structures, functions and classes, logically connected into a single body, saved as a single file with the .py or .pyw extension, and accomplishing a complete programming task.
    * You can run a script from the Python interpreter or from IPython.
    * You can import the functions of a script into another script with the `import` statement: you are then treating it as a module.
* **Packages are collections of modules**, stored in a single folder that can contain subfolders, each corresponding to a subpackage. Each folder contains a special file named `__init__.py`, possibly empty, that signals that the folder is a (sub)package. [Numpy](http://www.scipy.org/NumPy_for_Matlab_Users) and [Matplotlib](http://matplotlib.org/gallery.html) are examples of packages.
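
A sketch of how a package is laid out and imported. It builds the files on the fly in a temporary folder only to stay self-contained; in a real project you would create them by hand. The package `demo_pkg`, the module `shapes` and the function `square_area` are hypothetical names.

```python
import os
import sys
import tempfile

# Create a throwaway folder that will contain the package folder
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, 'demo_pkg')
os.mkdir(pkg_dir)

# An (empty) __init__.py marks the folder as a package
open(os.path.join(pkg_dir, '__init__.py'), 'w').close()

# A module inside the package, containing a single function
with open(os.path.join(pkg_dir, 'shapes.py'), 'w') as f:
    f.write('def square_area(side):\n    return side * side\n')

# Let the import system search the throwaway folder too
sys.path.insert(0, root)

from demo_pkg import shapes      # import the module from the package
print(shapes.square_area(3))     # 9
```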


Python is extended by modules. To use a module, it must first be imported. There are three ways to import modules:

* `import modulename` - keeps the full module name in the namespace. To use a name defined in the module, you write `modulename.name`
* `import modulename as alias` - replaces the full module name with a suitable alias. To use a name defined in the module, you write `alias.name`
* `from modulename import *` - *THIS IS NOT ADVISABLE IN MOST CASES*: copies all the public names of the module into the current namespace, which means that some existing names could be overwritten. To use a name defined in the module, you write just `name`

Some examples:

    import math             # Then every name must be prefixed with 'math.'
    import numpy as np      # Then every name must be prefixed with the alias 'np.'
    from pandas import *    # Imports EVERYTHING into the current namespace

In [5]:
import math             # Then 'math.' must be prefixed to every name from the module
math.sin(3)

0.1411200080598672
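
The three import styles side by side, using the standard `math` module. The last lines show the risk of `import *`: `math` defines its own `pow`, which shadows the built-in one and changes the result type from int to float.

```python
import math                 # 1) full module name as prefix
import math as m            # 2) short alias
print(math.sqrt(16))        # 4.0
print(m.sqrt(16))           # 4.0

print(pow(2, 3))            # built-in pow on integers: 8
from math import *          # 3) copy every public name of math into this namespace
print(pow(2, 3))            # now this is math.pow, which returns a float: 8.0
print(pow is math.pow)      # True: the built-in 'pow' has been shadowed
```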

## 3 Strings

Strings can be defined with either double or single quotes. Escape codes like `\t` (tab), `\n` (newline) or `\xHH` (the character with hexadecimal code HH) can be used. A string can be repeated k times with `*k`:

In [6]:
a, b, c = 'hello', "HELLO", "Hello, how's going?"
print a, '-'*2, b, '-'*2, c

hello -- HELLO -- Hello, how's going?


The **in** operator checks whether a substring is present:

In [7]:
a = '\t abcdef_gh \n '
'cd' in a

True

`strip` is one of the most used string methods in Python. **Find out by yourself what it does** by appending `?` to it in IPython (`a.strip?`)!

In [8]:
print a.strip()

abcdef_gh
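
In short, `strip` removes whitespace (spaces, tabs, newlines) from both ends of a string; its siblings `lstrip` and `rstrip` work on one end only, and all three accept an optional string of characters to remove instead of whitespace. `repr` is used here so that the remaining escape characters are visible.

```python
a = '\t abcdef_gh \n '
print(repr(a.strip()))            # 'abcdef_gh'       both ends
print(repr(a.lstrip()))           # 'abcdef_gh \n '   left end only
print(repr(a.rstrip()))           # '\t abcdef_gh'    right end only
print(repr('xxhixx'.strip('x')))  # 'hi'              custom characters instead of whitespace
```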


***Try by yourself*** &nbsp;the power of `split` by running the following code (`strip`, `split` and many other methods can be chained in the same statement with the **'.'** operator):

    a.strip().split('_')
    b = '236 23 32           23 55'
    b.split()

In [9]:
b = '236 23 32           23 55'
b.split()

['236', '23', '32', '23', '55']

***Try by yourself*** &nbsp;the following commands:

    c = a.strip()
    print c                                     # 'abcdef_gh'
    print c.upper()                             # 'ABCDEF_GH'
    print c.title()                             # 'Abcdef_Gh'
    print c.center(30,'=')                      # '==========abcdef_gh==========='
    print c.find('c')                           # 2 (indices start from zero)
    print c.split('_')                          # ['abcdef', 'gh']
    print c.replace('_','')                     # 'abcdefgh'
    print ' *** '.join(['one', 'two', 'three']) # one *** two *** three


**Exercise:** clean up the following string: remove the leading and trailing whitespace and the internal separator characters, and format the name so that each word has the first letter capitalized and the others lowercase (the output must be: **Johnn Richard Thompson**). Everything can be done in just one line!

In [10]:
name = '   JOHNN - Richard-Thompson  '
print ' '.join(name.strip().replace('-',' ').title().split())

Johnn Richard Thompson


More examples for `split`:  
```python
    # Split
    s1 = '236 23 32           23 55'
    s1.split()    # ['236', '23', '32', '23', '55'] - runs of whitespace act as one separator

    s3 = '236 32 ||  23  ||32--44||2|5||6'
    s3.split('||')      # ['236 32 ', '  23  ', '32--44', '2|5', '6']

    s3.split('||', 1)   # ['236 32 ', '  23  ||32--44||2|5||6'] - at most one split

    # Dealing with multiple separators using 'split': first turn them into one
    s4 = 'a;b,c;d'
    s4.replace(';',',').upper().split(',')   # ['A', 'B', 'C', 'D']

    # Alternative solution: regular expressions ('re' module)
    import re
    phrase = "Hey, '32' you - what are you doing here???"
    print ' '.join(re.findall(r'\w+', phrase))   # Hey 32 you what are you doing here
```  

## 4 String formatting

There are many ways to format output in Python. In the past, the most common was the `%` string formatting operator, which fills placeholders in a string with values, using the following syntax:  
`'Message %s' % (val,)`.

See the following examples (old way!):

In [11]:
"Hello %s, my name is %s, and my age is %d" % ('john', 'mike', 30)

'Hello john, my name is mike, and my age is 30'

In [12]:
print '%+08.4f' % 5.5567

+05.5567


But Python is a living language, in constant evolution, with a wide and active community that often proposes changes and improvements. A newer standard for string formatting was introduced, which uses the `format` method of the `str` class ([see PEP-3101](http://www.python.org/dev/peps/pep-3101/)).   

***Try to write some code by yourself:***
```python
    print 'Name: {0}, age: {1}'.format('John', 35)
    print 'Total with tax: ${0:.2f}'.format(13.00 * 1.2)
    print '{0}, {1}, {2}'.format('a', 'b', 'c')
    print '{}, {}, {}'.format('a', 'b', 'c')
    print '{2}, {1}, {0}'.format('a', 'b', 'c') 
    print '{0}{1}{0}'.format('abra', 'cad')
    print '{:,}'.format(1234567890)
    print 'Coordinates: {latitude}, {longitude}'.format(latitude='37.24N', longitude='-115.81W')
    
    coord = {'latitude': '37.24N', 'longitude': '-115.81W'}
    print 'Coordinates: {latitude}, {longitude}'.format(**coord)
    
    print '{:<30}'.format('left aligned')
    print '{:>30}'.format('right aligned')
    print '{:^30}'.format('centered')
    print '{:*^30}'.format('centered')
    
    points = 26.5
    total = 30
    print 'correct answers: {:.2%}'.format(points/total)
```

In [13]:
print 'Name: {0}, age: {1}'.format('John', 35)

Name: John, age: 35
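
The two formatting styles share most of the format specification. This sketch compares them on the number used in the old-style example and on named placeholders; the names and values are just examples.

```python
value = 5.5567
old_style = '%+08.4f' % value            # % operator
new_style = '{:+08.4f}'.format(value)    # str.format: same spec after the ':'
print(old_style)   # +05.5567
print(new_style)   # +05.5567
# '+' always show the sign, '0' pad with zeros, '8' minimum total width,
# '.4' four digits after the decimal point, 'f' fixed-point notation

# Named placeholders: a dict with %, keyword arguments with format
print('%(name)s is %(age)d' % {'name': 'Mike', 'age': 30})   # Mike is 30
print('{name} is {age}'.format(name='Mike', age=30))         # Mike is 30
```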


## 5 Lists

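Lists are ordered, heterogeneous containers: their items can be of different types. Lists are MUTABLE, so individual items can be reassigned without creating a new list object. Indices start from 0, not from 1, and negative indices count from the end (`ls[-1]` is the last item)!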

In [14]:
ls = [2, 3, 4, 5, 'six', 9]
ls[-1] = 8                  # Redefine the last element
print ls

[2, 3, 4, 5, 'six', 8]


***Try by yourself*** &nbsp;the following commands:
```python
    ls = [1, 2, 3]
    ls.append([11, 12, 'one'])
    ls.extend([33,44])
    ls.insert(2,[55,66])
    ls[1:1] = [77, 88, 99]     # See 'slicing' next
    ls = ls + ['aa', 'bb']
```

In [15]:
ls[1:1] = [77, 88, 99]
print ls

[2, 77, 88, 99, 3, 4, 5, 'six', 8]
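
The list methods listed above differ in subtle ways; the most common source of confusion is `append` versus `extend`, shown in this sketch:

```python
ls = [1, 2, 3]
ls.append([11, 12])      # adds ONE element: the whole list
print(ls)                # [1, 2, 3, [11, 12]]

ls = [1, 2, 3]
ls.extend([11, 12])      # adds EACH element of the argument
print(ls)                # [1, 2, 3, 11, 12]

ls.insert(1, 'x')        # insert before index 1
print(ls)                # [1, 'x', 2, 3, 11, 12]
print(len(ls))           # 6
```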


***Try by yourself*** &nbsp;more commands:
```python
    ls = [5, 6, 3, 7, 3, 9, 7]
    ls.sort()
    ls.reverse()
    ls.pop()
    ls.count(7)
    len(ls)          # Length
    range(10)        # Generate a list of integers
    range(4,20,3)    # range(start, stop, step)
```

In [16]:
ls = [5, 6, 3, 7, 3, 9, 7]
ls.count(7)

2

`sort` accepts a `key` argument, a function that computes the value used to compare each element: in this case the sort key is the length of the strings:

In [17]:
ls= ['Zr', 'wax', 'grid', 'I', 'Sir', 'zirconium']
ls.sort(key=len)
ls 

['I', 'Zr', 'wax', 'Sir', 'grid', 'zirconium']

`sort` modifies the list (it sorts in place). If you don't want to modify the list, use the `sorted` function, which returns a new sorted list. Note that uppercase letters sort before lowercase ones:

In [18]:
print sorted(ls)

['I', 'Sir', 'Zr', 'grid', 'wax', 'zirconium']
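
Since uppercase letters come before lowercase ones, a case-insensitive ordering needs a `key` function. A minimal sketch using the built-in `str.lower` as the key:

```python
ls = ['I', 'Zr', 'wax', 'Sir', 'grid', 'zirconium']
print(sorted(ls))                  # ['I', 'Sir', 'Zr', 'grid', 'wax', 'zirconium']
print(sorted(ls, key=str.lower))   # ['grid', 'I', 'Sir', 'wax', 'zirconium', 'Zr']
print(ls)                          # unchanged: ['I', 'Zr', 'wax', 'Sir', 'grid', 'zirconium']
```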


`in` checks if one element is in the list

In [19]:
'wax' in ls

True

`index` finds the position of a given element in a list

In [20]:
print ls, ls.index('grid')

['I', 'Zr', 'wax', 'Sir', 'grid', 'zirconium'] 4


Lists can be iterated with `for`. In Python you don't need an index to loop over a list, but you can get one with `enumerate` if you need it. Compare the following two examples:

In [21]:
for string in ls:
    print string.rjust(10)

         I
        Zr
       wax
       Sir
      grid
 zirconium


In [22]:
for index, string in enumerate(ls):
    print index, string.rjust(10)

0          I
1         Zr
2        wax
3        Sir
4       grid
5  zirconium


***List comprehension*** is one of the most important constructs in Python. The general syntax is:

    [expression(argument) for argument in list if boolean_expression]

The expression can itself contain a conditional expression (`a if condition else b`).
Let's see an example. Start from a list of numbers and build a second list containing the string representation of only the numbers divisible by three (in Python `x%y` is the remainder of the division x/y):

In [23]:
numbers = range(4, 20)
strings = [str(number) for number in numbers if not number%3]
print strings

['6', '9', '12', '15', '18']
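
To make the syntax concrete, here is the same filter written as an explicit loop, followed by a conditional expression (`a if condition else b`) placed in the expression part, where it transforms every element instead of filtering them out. The `'three'` label is just an illustration.

```python
numbers = range(4, 20)

# Explicit loop equivalent to: [str(number) for number in numbers if not number % 3]
strings = []
for number in numbers:
    if not number % 3:            # remainder 0: divisible by three
        strings.append(str(number))
print(strings)                    # ['6', '9', '12', '15', '18']

# Conditional expression in the expression part: every element is kept
labels = ['three' if n % 3 == 0 else str(n) for n in range(4, 10)]
print(labels)                     # ['4', '5', 'three', '7', '8', 'three']
```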


## 6 Slicing

Slicing works on any sequence (such as strings and lists) and is used to extract (slice) a part of it; on mutable sequences such as lists, slice assignment can also replace, delete or insert elements.  
***Try by yourself*** &nbsp;some slicing on a string:
```python    
    s = 'abcdefghi' + '123'       # s is a string
    s[:4]
    s[5:]
    s[::2]
    ls = list('abcdefghi')        # ls is a list
    ls[-1:-1] = ['i', 1, 2, 3]
    ls[0:3] = ['A', 'B', 'C']
    s1 = ''.join(str(x) for x in ls)   # convert every item to a string, then join them
```

In [24]:
s = 'abcdefghi' + '123'       # s is a string
s[:4]

ls = list('abcdefghi')        # ls is a list
ls[-1:-1] = ['i', 1, 2, 3]
print ls

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 1, 2, 3, 'i']
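
A sketch of the rules behind the slices above: the start index is included, the stop index is excluded, a negative step walks backwards, and on lists `del` and slice assignment can shrink or reshape the sequence.

```python
s = 'abcdefghi' + '123'   # 'abcdefghi123', indices 0..11
print(s[:4])      # abcd          start defaults to 0, stop index 4 is excluded
print(s[5:])      # fghi123       from index 5 to the end
print(s[::2])     # acegi2        every second character
print(s[::-1])    # 321ihgfedcba  a negative step walks backwards

ls = list(range(10))      # [0, 1, 2, ..., 9]
del ls[2:5]               # delete the elements at indices 2, 3 and 4
print(ls)                 # [0, 1, 5, 6, 7, 8, 9]
ls[1:3] = ['x']           # replace two elements with a single one
print(ls)                 # [0, 'x', 6, 7, 8, 9]
```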


## 7 Sets

Sets are unordered collections of **UNIQUE** elements. Sets are MUTABLE. On sets you can apply the typical set theory operations like union, intersection, and difference. Sets cannot be indexed: to index a set, you must first convert it into a list.

In [25]:
set1 = set([1, 2, 7, 7, 7, 9])
set2 = set([5, 4, 6, 7, 8, 8])
print set1, set2

set([1, 2, 9, 7]) set([8, 4, 5, 6, 7])


***Try by yourself*** &nbsp;the following commands:
```python
    set1 & set2      # intersection (AND)
    set1 | set2      # union (OR)
    set1 ^ set2      # symmetric difference (XOR)
    set1 - set2      # difference
    ls = list(set1)  # To index a set, first transform it to a list
```

In [26]:
ls = list(set1)
ls[0]

1
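
The results of the set operations, using the same two sets. Since sets are unordered, `sorted` is used here only to print them in a predictable order.

```python
set1 = set([1, 2, 7, 7, 7, 9])     # duplicates collapse: {1, 2, 7, 9}
set2 = set([5, 4, 6, 7, 8, 8])     # {4, 5, 6, 7, 8}

print(sorted(set1 & set2))   # [7]                       in both sets
print(sorted(set1 | set2))   # [1, 2, 4, 5, 6, 7, 8, 9]  in at least one set
print(sorted(set1 ^ set2))   # [1, 2, 4, 5, 6, 8, 9]     in exactly one set
print(sorted(set1 - set2))   # [1, 2, 9]                 in set1 but not in set2
print(7 in set1)             # True: membership test
```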

## 8 Tuples

Tuples are like lists but are **IMMUTABLE** and have only two methods, `count` and `index`. Tuples are indexable.

So, why use tuples? Mainly as immutable containers, for example to pass packed arguments to a function or to be used as keys in dictionaries (see next chapter).

In [27]:
def myfunction(pack):
    a, b, c, d = pack
    print a+b+c+d[0]-d[1]
    
t = 1, 2, 3, (8, 9)  # Pack the arguments into a tuple to pass them to a function

myfunction(t)

5
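
A sketch of both reasons in practice: a tuple rejects item assignment, offers only `count` and `index`, and can serve as a dictionary key, while a list cannot. The coordinate keys are just an example.

```python
t = (1, 2, 3)
try:
    t[0] = 10                            # tuples cannot be modified
except TypeError as err:
    print('TypeError: ' + str(err))

print(t.index(3))                        # 2: position of the value 3
print(t.count(2))                        # 1: occurrences of the value 2

# Tuples are hashable, so they can be dictionary keys...
labels = {(0, 0): 'origin', (1, 0): 'unit x'}
print(labels[(0, 0)])                    # origin

# ...while lists are not
try:
    bad_keys = {[0, 0]: 'origin'}
except TypeError as err:
    print('TypeError: ' + str(err))
```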


In [28]:
seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
for a, b, c in seq:
    print a, b, c     # Common use for tuple unpacking

1 2 3
4 5 6
7 8 9


**'zip'** can be used to reorganize (transpose) columns of data:

In [29]:
names = [('Johnn', 'Doe'), ('Ryan', 'Stuart')]
firstnames, lastnames = zip(*names)
print firstnames, ' -- ', lastnames

('Johnn', 'Ryan')  --  ('Doe', 'Stuart')


## 9 Dictionaries

Dictionaries are **'Associative Arrays'**: values are indexed by generic keys. In other words, the key can be an integer, but also a string, a tuple or any other immutable (more precisely, hashable) object.

Here we make a dictionary using tuples as keys and telephone numbers as values. Then we access a dictionary item by providing a key (a tuple):

In [30]:
data = [(('Johnn', 'Doe'), '(831) 758-7214'),
         (('Ryan', 'Stuart'), '(877) 359-8474'), (('Nick', 'Connor'), '(800) 445-2854')]
d = dict(data)
print d[('Johnn', 'Doe')]

(831) 758-7214


***Try by yourself*** &nbsp;the following commands:
```python
    d.keys()
    d.values()
    d.items()
```

In [31]:
print d.keys()

[('Ryan', 'Stuart'), ('Johnn', 'Doe'), ('Nick', 'Connor')]


When reading from a dictionary, check whether it contains the key by using `in`: if you ask for a missing key, Python raises a `KeyError` exception. Alternatively, you can use `get` with a default value to be returned if the key is not found:

In [32]:
('Johnn', 'Stuart') in d

False

In [33]:
print d.get(('Johnn', 'Stuart'),'Number not available')

Number not available
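
A sketch of the three ways to deal with a missing key, on a one-entry copy of the phone book:

```python
d = {('Johnn', 'Doe'): '(831) 758-7214'}
key = ('Johnn', 'Stuart')

# 1) Check first with 'in'
if key in d:
    number = d[key]
else:
    number = 'Number not available'

# 2) Try, and handle the KeyError raised for a missing key
try:
    number = d[key]
except KeyError:
    number = 'Number not available'

# 3) get: returns the default (None if not given) instead of raising
print(d.get(key))                           # None
print(d.get(key, 'Number not available'))   # Number not available
```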


When iterating over a dictionary, the order of the items is not guaranteed (this holds for Python 2; since Python 3.7, dictionaries preserve insertion order):

In [34]:
for key, value in d.items():
    print 'KEY: ', key, '\t \tVALUE: ', value

KEY:  ('Ryan', 'Stuart') 	 	VALUE:  (877) 359-8474
KEY:  ('Johnn', 'Doe') 	 	VALUE:  (831) 758-7214
KEY:  ('Nick', 'Connor') 	 	VALUE:  (800) 445-2854


In [35]:
# Iterating over a dictionary yields its keys
for key in d:
    print key, '\t', d[key]

('Ryan', 'Stuart') 	(877) 359-8474
('Johnn', 'Doe') 	(831) 758-7214
('Nick', 'Connor') 	(800) 445-2854


Some more examples:

In [36]:
# How to create a dictionary with a for loop (from a list of tuples)
d1 = dict([(n, str(n)) for n in xrange(5)])
print d1              # {0: '0', 1: '1', 2: '2', 3: '3', 4: '4'}

{0: '0', 1: '1', 2: '2', 3: '3', 4: '4'}


In [37]:
d1.pop(2)
d2 = {'10': 'ten', '11': 'eleven'}
d1.update(d2)   # {0: '0', 1: '1', 3: '3', 4: '4', '10': 'ten', '11': 'eleven'}
print d1

{0: '0', 1: '1', 3: '3', 4: '4', '10': 'ten', '11': 'eleven'}


In [38]:
# Creating dict from sequences
d3 = {}
for key, value in zip(list('abcd'), list('1234')):
    d3[key] = value
print d3

{'a': '1', 'c': '3', 'b': '2', 'd': '4'}
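
Two shorter alternatives build the same dictionaries: `dict(zip(...))` pairs keys and values directly, and a dict comprehension (available from Python 2.7) is the dictionary counterpart of a list comprehension.

```python
keys, values = list('abcd'), list('1234')

d3 = dict(zip(keys, values))          # zip pairs the items, dict builds the mapping
d1 = {n: str(n) for n in range(5)}    # dict comprehension

print(d3 == {'a': '1', 'b': '2', 'c': '3', 'd': '4'})   # True
print(d1 == {0: '0', 1: '1', 2: '2', 3: '3', 4: '4'})   # True
```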


## 10 Counters

Counters are a special kind of dictionary (a subclass of `dict`): they give you a simple and effective way to count items.

In [39]:
from collections import Counter

colorlist = ['red', 'blue', 'red', 'green', 'blue', 'blue', 'green', 'blue', 'cyan']
cnt = Counter(colorlist)

print 'Total of all counts:   ', sum(cnt.values())
print 'Most common elements:  '
for item, number in cnt.most_common(3):
    print '\t'*2, item, number
    
print 'Least common elements:  '
for item, number in cnt.most_common()[:-4:-1]:   # last 3 items, in reverse order
    print '\t'*2, item, number

Total of all counts:    9
Most common elements:  
		blue 4
		green 2
		red 2
Least common elements:  
		cyan 1
		red 2
		green 2
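
Counters also behave differently from plain dictionaries on missing items, and they can be updated incrementally. A short sketch with made-up data:

```python
from collections import Counter

cnt = Counter(['red', 'blue', 'red'])
print(cnt['purple'])             # 0: a missing item counts as zero, no KeyError
cnt.update(['blue', 'blue'])     # add more observations to the existing counts
print(cnt['blue'])               # 3

words = Counter('the cat and the hat'.split())
print(words.most_common(1))      # [('the', 2)]
```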


## 11 IF - FOR - WHILE

This is a brief overview of the flow-control instructions in Python

### 11.1 IF

In [40]:
a = 34
if a != 7:
    print "'a' is not 7"
if a > 15:
    print "'a' is greater than 15"
elif a == 15:
    print "'a' is exactly 15!"
else:
    print "'a' is less than 15"

'a' is not 7
'a' is greater than 15


### 11.2 FOR - ELSE

In Python you can iterate over lists, dictionaries, lines in a file and, in general, all 'ITERABLE' objects.

In [41]:
l = ['a', 'b', 'c', 'd', 'e', 'f']
for v in l:
    if v == 'e':
        break            # Exit the loop immediately
    elif v == 'b':
        continue         # Skip the rest of this iteration and go to the next element
    print v
print 'Done !'           # Executed once the loop is over

a
c
d
Done !
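
The loop above doesn't use the `else` clause. In this sketch (the function name `first_even` is hypothetical) the `else` block runs only when the loop completes without a `break`, which makes it handy for search loops:

```python
def first_even(numbers):
    for n in numbers:
        if n % 2 == 0:
            result = 'first even number: ' + str(n)
            break              # leaving with break SKIPS the else clause
    else:
        # Runs only when the loop finishes without hitting break
        result = 'no even number found'
    return result

print(first_even([1, 3, 4, 5]))   # first even number: 4
print(first_even([1, 3, 5]))      # no even number found
```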


In [42]:
for element in [3,4,5]:           # Elements in LISTS        
    print element
for element in (7,8,9):           # Elements in TUPLES        
    print element
for char in 'abc':                # Elements in STRINGS
    print char
import os.path
path = os.path.join(os.path.curdir, "example_data", "my_input.txt")
with open(path) as f:    # Elements in FILES (lines); 'with' closes the file afterwards
    for line in f:
        print line,

3
4
5
7
8
9
a
b
c
First Second
10     0.32432
20  1.324
21 7.237923
36 .83298932
56        237.327823


In [43]:
# enumerate yields (index, element) pairs:
for i, season in enumerate(['Spring', 'Summer', 'Fall', 'Winter']):
    print i, season

0 Spring
1 Summer
2 Fall
3 Winter


In [44]:
# Range can be used to quickly build a list to be used in a for loop:
print range(5)                 
print range(2,10,3)
print range(2,-6,-3)

[0, 1, 2, 3, 4]
[2, 5, 8]
[2, -1, -4]


### 11.3 WHILE

In [45]:
a = 0
while a < 10:
    a += 1
    print a,

1 2 3 4 5 6 7 8 9 10


In [46]:
a = 2**16
while a:
    print a,
    a = a // 2    # '//' is integer (floor) division

65536 32768 16384 8192 4096 2048 1024 512 256 128 64 32 16 8 4 2 1


---

Visit [www.add-for.com](<http://www.add-for.com/IT>) for more tutorials and updates.

This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.