# 1. SEMANTICS
- Variables
- Expressions
- Types
- Input

# Input from keyboard

`input()` reads a line of text from the keyboard and returns it as a `str`, without any conversion.

In [3]:
z = input("insert a number: ")
print(type(z))
print("your input: ", z)
if z>0:
    print("positive number: ", z)

insert a number: 54
<class 'str'>
your input:  54


TypeError: '>' not supported between instances of 'str' and 'int'

To use the `>` operator on `z`, we must explicitly convert it to the needed type.

# Built-in types

Commonly used numerical and string types are:

Type | Description
:----|:-----------
 str | text string, similar to C++ `std::string` (but immutable)
 float | double-precision floating-point number, like C `double`
 complex | complex number `x + yj`, with real part `x` and imaginary part `y`
 int | integer of arbitrary size
 bool | boolean value, `True` or `False`; a subclass of `int`

You must explicitly convert the output of `input()` to the desired type before using it.


In [5]:
x = int(input("Insert integer: ")) # What happens if input is float?
print("x = ", x)

Insert integer: 4.5


ValueError: invalid literal for int() with base 10: '4.5'

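Say we input 4.5. The call is then `int('4.5')`, which fails because `'4.5'` is not an integer literal. To fix it, convert to `float` first and then to `int`, i.e. `int(float('4.5'))`. Note that `int()` truncates toward zero, so 6.8 becomes 6.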

In [None]:
x = int(float(input("Insert integer: "))) # Better
print("x = ", x)

Insert integer: 6.8
x =  6


In [4]:
x = int(input("Insert integer number: "))
y = float(input("Insert a rational number: "))

if x > y:
    print("x: {0} > y: {1}".format(x, y))
else:
    print("y: {1} >= x: {0}".format(x, y))

Insert integer number: 1
Insert a rational number: 4.5
y: 4.5 >= x: 1


In [6]:
anint = int(input("Insert an integer: "))
print(anint)

Insert an integer: 3
3


`float()` also accepts a string containing an integer: entering `3` at the prompt below would give `3.0`.

In [7]:
afloat = float(input("Insert a float: "))
print(afloat)

Insert a float: 4.5
4.5


Python writes the imaginary unit as `j` (not `i`), so a complex number is entered as, e.g., `2+8j`.

In [10]:
acomplex = complex(input("Insert a complex number: "))
print(acomplex)

Insert a complex number: 2+8j
(2+8j)
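Since `input()` always returns a string, the same conversions can be tried directly on string literals, without a keyboard. A minimal sketch showing that `float()` accepts an integer literal, that `complex()` reads `j` as the imaginary unit, and that `complex()` rejects spaces around the `+`:

```python
# The same conversions applied to literal strings, as input() would return them
print(float("3"))            # an integer literal is accepted by float(): 3.0

z = complex("2+8j")          # 'j' marks the imaginary part
print(z, z.real, z.imag)     # (2+8j) 2.0 8.0

try:
    complex("2 + 8j")        # spaces around '+' are not allowed in the string
except ValueError as err:
    print("ValueError:", err)
```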


## Bool type
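Similar to `bool` in C++: it takes the values `True` and `False` and is the result of comparisons and logical operations. In Python, `bool` is a subclass of `int` (`True` behaves as 1, `False` as 0). Note that `bit_length()` below returns the number of bits needed to *represent* the value, not the memory used to store it.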

In [11]:
c = 2.3 < 3
print(c, type(c), c.bit_length())

c = bool(0)
print(c, c.bit_length())

c = bool(-3)
print(c, c.bit_length())

d = True
print(d, int(d))

print(type(2.3), type(True), type("hello"))

True <class 'bool'> 1
False 0
True 1
True 1
<class 'float'> <class 'bool'> <class 'str'>
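A quick way to check that `bool` behaves as a small integer type, and that a boolean is stored as a full Python object rather than a single bit. A short sketch; the size reported by `sys.getsizeof()` depends on the Python version and platform, so no expected value is given for it:

```python
import sys

print(isinstance(True, int))          # True: bool is a subclass of int
print(True + True, True * 10)         # 2 10: arithmetic treats True as 1
print((3 > 1) + (2 > 5) + (0 == 0))   # 2: adding comparisons counts the True ones
print(sys.getsizeof(True))            # object size in bytes, far more than 1 bit

# Truthiness: empty or zero values convert to False, everything else to True
print(bool(""), bool("0"), bool(0.0), bool(" "))   # False True False True
```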


## Integers in Python
Unlike in C/C++, you can have arbitrarily large integers in Python.

In [12]:
j = 3**334 # 3^334
print(j)
print(type(j))
print((3**11567).bit_length())

2282964069396179429161277601795098342183689069116233595351030111107374894317918598839132436948135567673806054712849856030005501307907699595360638611572383720569
<class 'int'>
18334


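In Python 3 the `/` operator always performs true division and returns a `float`, even when both operands are integers (in Python 2, `/` between two integers truncated the result). Use `//` for floor division.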

In [13]:
print(10/2) 
print(9/2) 
print(99/100) 
print(10.0/2.0) 
print(99.0/100.0) 

5.0
4.5
0.99
5.0
0.99
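The companion operators `//` (floor division) and `%` (remainder) deserve a quick check, because with negative operands Python rounds toward minus infinity, whereas C++ integer division truncates toward zero. A minimal sketch; the C++ value in the comment is for comparison only:

```python
print(9 / 2)          # 4.5: true division always returns a float
print(9 // 2)         # 4: floor division of two ints returns an int
print(9 % 2)          # 1: remainder
print(-9 // 2)        # -5: rounds toward minus infinity (C++ -9/2 gives -4)
print(-9 % 2)         # 1: the remainder takes the sign of the divisor
print(divmod(-9, 2))  # (-5, 1): quotient and remainder together
print(9.0 // 2)       # 4.0: floor division with a float operand returns a float
```

In all cases `(a // b) * b + a % b == a`, which is why the sign of `%` follows from the rounding of `//`.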


### Integers in arbitrary bases

A neat feature of integers in Python is the easy conversion from a string written in an arbitrary base (from 2 to 36), via the `base` argument of `int()`:

In [14]:
a = int('101', base=2)
b = int(101)
print(a,b)

5 101


In [15]:
int('101', base=3)

10

In [16]:
int('101', base=5)

26

In [17]:
int('101', base=7)

50

In [18]:
int('101', base=8)

65

In [19]:
int('101', base=16)

257

In [20]:
int('1F', base=16)

31

In [22]:
x = int('FF01', base=16)
print(x)
print(int('1110001',base=2))
int('FF01', base=16) + int('1110001', base=2)

65281
113


65394

# Inline help and inspection
Use the inline help facility in the interactive Python session:

In [23]:
help(int)

Help on class int in module builtins:

class int(object)
 |  int([x]) -> integer
 |  int(x, base=10) -> integer
 |  
 |  Convert a number or string to an integer, or return 0 if no arguments
 |  are given.  If x is a number, return x.__int__().  For floating point
 |  numbers, this truncates towards zero.
 |  
 |  If x is not a number or if base is given, then x must be a string,
 |  bytes, or bytearray instance representing an integer literal in the
 |  given base.  The literal can be preceded by '+' or '-' and be surrounded
 |  by whitespace.  The base defaults to 10.  Valid bases are 0 and 2-36.
 |  Base 0 means to interpret the base from the string as an integer literal.
 |  >>> int('0b100', base=0)
 |  4
 |  
 |  Built-in subclasses:
 |      bool
 |  
 |  Methods defined here:
 |  
 |  __abs__(self, /)
 |      abs(self)
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __and__(self, value, /)
 |      Return self&value.
 |  
 |  __bool__(self, /)
 |      True if 

And since **everything is an object in Python**, you can use `dir()` to list the attributes of any object, including its data and its methods (which are objects too).

In [24]:
x

65281

In [25]:
dir(str)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'removeprefix',
 'removesuffix',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',


In [26]:
help(int.bit_length)

Help on method_descriptor:

bit_length(self, /)
    Number of bits necessary to represent self in binary.
    
    >>> bin(37)
    '0b100101'
    >>> (37).bit_length()
    6



In [27]:
help(bin)

Help on built-in function bin in module builtins:

bin(number, /)
    Return the binary representation of an integer.
    
    >>> bin(2796202)
    '0b1010101010101010101010'



In [28]:
bin(int('F0F', base=16))

'0b111100001111'

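Conversely, the built-in functions `bin()`, `oct()`, and `hex()` convert an integer to a string in base 2, 8, or 16. Integer literals can also be written directly in these bases, using the prefixes `0b`, `0o`, and `0x`.

In [29]:
c = 23
print(bin(c), oct(c), hex(c))

0x1F + 3 + 0b111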

0b10111 0o27 0x17


41
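`int()` parses any base from 2 to 36, but the built-in output functions only cover bases 2, 8 and 16. For other bases, a hypothetical helper `to_base()` (our own name, not part of Python) can be sketched with repeated `divmod()`. The sketch also shows `format()` with the `'b'`, `'o'`, `'x'` specifiers (digits without the prefix) and `int(s, 0)`, which infers the base from the prefix as described in `help(int)`:

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def to_base(n, base):
    """Return the digits of the integer n in the given base (2 to 36) as a string."""
    if not 2 <= base <= 36:
        raise ValueError("base must be between 2 and 36")
    if n == 0:
        return "0"
    sign = "-" if n < 0 else ""
    n = abs(n)
    digits = ""
    while n > 0:
        n, r = divmod(n, base)      # r is the least significant remaining digit
        digits = DIGITS[r] + digits
    return sign + digits

print(to_base(26, 5), int(to_base(26, 5), base=5))        # 101 26
print(to_base(255, 16))                                   # ff
print(format(23, "b"), format(23, "o"), format(23, "x"))  # 10111 27 17
print(int("0b10111", 0), int("0o27", 0), int("0x17", 0))  # 23 23 23
```

The round trip `int(to_base(n, b), b) == n` is a convenient way to test such a helper.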

You can use these functions to practice converting between binary and hexadecimal: each hexadecimal digit corresponds to exactly four binary digits.

In [30]:
print(hex(0b00011000))

0x18


In [31]:
print(hex(0b10011010))

0x9a
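Hexadecimal is convenient precisely because each hex digit encodes exactly four bits. A sketch splitting `0b10011010` into its two 4-bit groups, using the shift (`>>`) and bitwise-and (`&`) operators, which these notes have not discussed yet:

```python
value = 0b10011010
high = value >> 4          # upper 4 bits: 0b1001
low = value & 0xF          # lower 4 bits: 0b1010 (0xF is the mask 0b1111)
print(format(high, "04b"), format(high, "x"))   # 1001 9
print(format(low, "04b"), format(low, "x"))     # 1010 a
print(hex(value))                               # 0x9a
```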


# 2. FLOW CONTROL

- `if/elif/else` conditional statements
- `while` loops
- `for` loops
- `try/except/else/finally` statements
- `break`, `continue`, and `pass`

The main difference with respect to C++ is the absence of `{}` and `;` to define the logical structure: a block is opened by `:` and delimited by its **indentation**, and a statement ends at the end of its line. For comparison, a C++ `if/else`:

```c++
if(x<0) {
} else {
    cout << x << endl;
}
```
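For comparison, the same logic in Python: the colon opens each block and the indentation delimits it. Since Python does not allow an empty block, the empty `if` branch needs the placeholder statement `pass` (discussed below). The value of `x` is an arbitrary example:

```python
x = 5.0
if x < 0:
    pass          # empty block: nothing to do (like {} in C++)
else:
    print(x)      # 5.0
```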

## Conditional statements with `if/elif/else`

In [32]:
x = float(input("Insert a number: "))
if x < 0:
    print("x < 0")
elif x < 1:
    print("0 <= x < 1")
elif x < 10:
    print("1 <= x < 10")
else:
    print("x >= 10")

Insert a number: 5
1 <= x < 10
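To check every branch without typing numbers by hand, the same chain can be wrapped in a function (functions are covered in Section 3) and called on a few sample values. A sketch; `classify` is an illustrative name. Only the first true condition is executed, which is why each `elif` can assume that the previous tests failed:

```python
def classify(x):
    """Return the interval containing x; the first true condition wins."""
    if x < 0:
        return "x < 0"
    elif x < 1:          # here x >= 0 is already known
        return "0 <= x < 1"
    elif x < 10:         # here x >= 1 is already known
        return "1 <= x < 10"
    else:
        return "x >= 10"

for value in (-2.0, 0.5, 5, 10):
    print(value, "->", classify(value))
```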


## `while` loop

In [33]:
w = -2
while w<0 or w>1:
    w = float(input("Insert x in [0,1]: "))

Insert x in [0,1]: 2.4
Insert x in [0,1]: 0.4


The same input validation can be written with an explicit flag that controls the loop:

In [34]:
control = True
while control:
    w = float(input("insert x in [0,1]: "))
    if w>=0 and w<=1:
        control = False

insert x in [0,1]: 2.4
insert x in [0,1]: 0.4
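A common alternative is an infinite `while True:` loop that is left with `break` once the input is valid, combined with `try/except` to reject text that is not a number (both are discussed below). To make the sketch runnable without a keyboard, a hypothetical list of answers replaces `input()`; `iter()` and `next()` simply hand them out one at a time. Note also the chained comparison `0 <= w <= 1`:

```python
# Hypothetical user answers standing in for successive input() calls
answers = iter(["2.4", "abc", "0.4"])

while True:
    text = next(answers)            # in real code: text = input("Insert x in [0,1]: ")
    try:
        w = float(text)
    except ValueError:
        print("not a number:", text)
        continue                    # ask again
    if 0 <= w <= 1:
        break                       # valid input: leave the loop
    print("out of range:", w)

print("accepted:", w)               # accepted: 0.4
```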


## `for` loop
We have already seen that a `for` loop requires a sequence (more generally, an iterable) of objects to iterate over. The built-in `range()` is the usual way to generate a sequence of integers:

In [35]:
range(0,10,1)

range(0, 10)

In [36]:
type(range(0,10,1))

range

In [37]:
help(range)

Help on class range in module builtins:

class range(object)
 |  range(stop) -> range object
 |  range(start, stop[, step]) -> range object
 |  
 |  Return an object that produces a sequence of integers from start (inclusive)
 |  to stop (exclusive) by step.  range(i, j) produces i, i+1, i+2, ..., j-1.
 |  start defaults to 0, and stop is omitted!  range(4) produces 0, 1, 2, 3.
 |  These are exactly the valid indices for a list of 4 elements.
 |  When step is given, it specifies the increment (or decrement).
 |  
 |  Methods defined here:
 |  
 |  __bool__(self, /)
 |      True if self else False
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(self, key, /)
 |      Return self[key].
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __hash

In [38]:
print(list(range(0,10)))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


In [39]:
for i in range(1,11,2):
    print("i: %-3d\t i^2: %d" % (i, i**2))
    print("i: {0}\t i^2: {1}".format(i,i**2))

i: 1  	 i^2: 1
i: 1	 i^2: 1
i: 3  	 i^2: 9
i: 3	 i^2: 9
i: 5  	 i^2: 25
i: 5	 i^2: 25
i: 7  	 i^2: 49
i: 7	 i^2: 49
i: 9  	 i^2: 81
i: 9	 i^2: 81


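This example shows two ways to format output: the C-style `%` operator, which uses the same format codes as `printf` (e.g. `%-3d` for a left-aligned integer 3 characters wide), and the `str.format()` method.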
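Since Python 3.6 there is also a third option, f-strings, which accept the same format specifications as `str.format()`. A small sketch checking that the three styles produce the same string for one value of `i`:

```python
i = 3
s1 = "i: %-3d\t i^2: %d" % (i, i**2)          # printf-style: %-3d = left-aligned, width 3
s2 = "i: {0:<3d}\t i^2: {1}".format(i, i**2)  # str.format with the same alignment
s3 = f"i: {i:<3d}\t i^2: {i**2}"              # f-string: expressions written inline
print(s1 == s2 == s3)   # True
print(repr(s1))         # 'i: 3  \t i^2: 9'
```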

## `try/except/else/finally`

In [40]:
x = 3
y = 2
#y = 0

try:
    print("Deleting the variable result.")
    del result
except NameError:
    print("Nothing to delete.")

try:
    # Floor division
    result = x // y
except ZeroDivisionError:
    print("Sorry! You are dividing by zero")
else:
    print("Yeah! Your answer is:", result)
finally: 
    # This block is always executed  
    # regardless of exception generation. 
    print("This is always executed")

# result was defined, calculated, stored, as long
# as everything went well with x//y.
# If so, result is still there and accessible
print(result)

Deleting the variable result.
Nothing to delete.
Yeah! Your answer is: 1
This is always executed
1
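To see which blocks run in each case without editing `y` by hand, the `try` statement can be wrapped in a function that records the order of execution. A sketch; `floor_divide` and its log are illustrative names, not part of Python:

```python
def floor_divide(x, y):
    """Return (result, log); log lists the blocks of the try statement that ran."""
    log = []
    result = None
    try:
        result = x // y
    except ZeroDivisionError:
        log.append("except")      # only when x // y raised
    else:
        log.append("else")        # only when no exception was raised
    finally:
        log.append("finally")     # always
    return result, log

print(floor_divide(3, 2))   # (1, ['else', 'finally'])
print(floor_divide(3, 0))   # (None, ['except', 'finally'])
```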


## `break`, `continue`, and `pass`

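Find the prime numbers between 2 and 9. Note the `else` clause attached to the inner `for` loop: it runs only if the loop completes without executing `break`.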

In [41]:
for n in range(2, 10):
    # Think of this as: if some x in [2,3,...,n) divides n then do...
    for x in range(2, n):
        if n % x == 0:
            print(n, 'equals', x, '*', n//x)
            break
    # ...otherwise do...
    else:
        # Loop fell through without finding a factor
        print(n, 'is a prime number')

2 is a prime number
3 is a prime number
4 equals 2 * 2
5 is a prime number
6 equals 2 * 3
7 is a prime number
8 equals 2 * 4
9 equals 3 * 3
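The same `for/else` idiom gives a compact primality test. A sketch with a hypothetical `is_prime()` helper; it only tries divisors up to $\sqrt{n}$ (a composite number always has a factor no larger than that), using `math.isqrt()` (Python 3.8+) for an exact integer square root:

```python
import math

def is_prime(n):
    """Return True if the integer n is prime."""
    if n < 2:
        return False
    for x in range(2, math.isqrt(n) + 1):
        if n % x == 0:
            break          # found a factor: the else clause is skipped
    else:
        return True        # the loop ended without break: no factor found
    return False

for n in range(2, 30):
    if is_prime(n):
        print(n, end=" ")
print()                    # 2 3 5 7 11 13 17 19 23 29
```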


Classify the numbers between 2 and 9 as even or odd. `continue` skips the rest of the loop body and moves on to the next iteration.

In [42]:
for num in range(2, 10):
    if num % 2 == 0:
        print("Found an even number", num)
        continue
    print("Found an odd number", num)
    print("Moving on to the next number")

Found an even number 2
Found an odd number 3
Moving on to the next number
Found an even number 4
Found an odd number 5
Moving on to the next number
Found an even number 6
Found an odd number 7
Moving on to the next number
Found an even number 8
Found an odd number 9
Moving on to the next number


Find the odd numbers between 2 and 9.

Python does not allow an empty block, so the keyword `pass` is needed as a placeholder. Unlike `continue`, it does not skip anything: it only tells the interpreter that there is nothing to do in this block. It is equivalent to an empty `{}` in C++.

In [43]:
for num in range(2, 10):
    if num % 2 == 0:
        pass
    else:
        print("Found an odd number", num)
    print("Moving on to the next number")

Moving on to the next number
Found an odd number 3
Moving on to the next number
Moving on to the next number
Found an odd number 5
Moving on to the next number
Moving on to the next number
Found an odd number 7
Moving on to the next number
Moving on to the next number
Found an odd number 9
Moving on to the next number


Often a `pass` can be avoided by inverting the condition, which gives more streamlined code.

In [44]:
for num in range(2, 10):
    if num % 2 != 0:
        print("Found an odd number", num)
    print("Moving on to the next number")

Moving on to the next number
Found an odd number 3
Moving on to the next number
Moving on to the next number
Found an odd number 5
Moving on to the next number
Moving on to the next number
Found an odd number 7
Moving on to the next number
Moving on to the next number
Found an odd number 9
Moving on to the next number


# 3. FUNCTIONS AND MODULES

As in other languages, a function is defined by its name and its arguments. Unlike C++, however, there is no return type, and you do not specify the types of the arguments: any object can be passed to any function, and whether the function can handle it is only discovered at runtime.

The generic structure of a function is
```python
def function(arg1, arg2, arg3=val):
    statements
    return value

next_statement
```

If a function ends without a `return` statement, it automatically returns `None`.

In [45]:
def decay(x, a=0.3, b=0.7):
    if x < a:
        print("Two body decay")
    elif x < b:
        print("Three body decay")
    else:
        print("Decay to 4 or more bodies")
    
decay(0.4)
decay(0.9, b=0.6)
# decay() has no return statement, so it returns None
v = decay(0.003)
print(type(v))

# Import NumPy module
import numpy as np
x = np.random.random()
print("x = %.4f"%x)
decay(x)

print("\nSo what is import all about?")

Three body decay
Decay to 4 or more bodies
Two body decay
<class 'NoneType'>
x = 0.7590
Decay to 4 or more bodies

So what is import all about?
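Arguments with a default value can be omitted, and any argument can be passed by name, in any order. A sketch with two illustrative functions (`line` mirrors the one used in the module examples below; `greet` is hypothetical):

```python
def line(x, m=1.0, q=0.0):
    return m * x + q

def greet(name):
    print("Hello,", name)   # no return statement

print(line(2.0))                    # 2.0: defaults m=1.0, q=0.0
print(line(2.0, q=0.5))             # 2.5: only q overridden
print(line(q=1.0, x=3.0, m=2.0))    # 7.0: all by name, any order
result = greet("Ada")               # prints: Hello, Ada
print(result is None)               # True
```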


In [46]:
help(np.random.random)

Help on built-in function random:

random(...) method of numpy.random.mtrand.RandomState instance
    random(size=None)
    
    Return random floats in the half-open interval [0.0, 1.0). Alias for
    `random_sample` to ease forward-porting to the new random API.



## Python application and modules
An important difference with respect to C++ is the lack of a designated entry point such as `main()`.

A typical C/C++ application `app.cc` is
```c++
#include <cstdio>
#include <cmath>
#include <cstdlib>

double uniform(double, double);

int main() {
  /*   code goes here */
  return 0;
}

double uniform(double a, double b) {
  /* uniform random number in [a, b) */
  return a + (b - a) * (std::rand() / (RAND_MAX + 1.0));
}
```
You compile and link the application using the math library as
```
g++ -o /tmp/app.exe app.cc -lm
```
and finally run the executable
```
/tmp/app.exe
```

Running the executable means that the operating system calls the `main()` function in `app.exe`.

**In Python, however, there is no such thing!**

A program is any file containing Python statements. Since Python is an interpreted language, the statements are executed in order, as they appear in the file.

The following examples showing the use of modules and namespaces are available in the classroom drive in the directory `examples/Python`.

Our first program is `example11.py`

```python
# This is my first module

print("==== Running example11.py")

a = 2.3
b = 4.5
c = a/b

def line(x, m=1., q=0.):
  print("x: {2}, m: {0}, q: {1}".format(m,q,x))
  return m*x+q

# Print using ''
print('a = {0}, b = {1}, c = {2}'.format(a, b, c))


print(line(2., q=2.3))
print(line(0., q=-1.3))
print("==== End of example11.py")
```

__Reminder__: you can execute the program from the command line with 
```
python3 example11.py
```
In Jupyter you can run a local file (with path relative to the directory where you started the notebook session) by using the magic `%run` command.  E.g.

```
%run ./example11.py
```

In [47]:
%run ./example11.py

==== Running example11.py
a = 2.3, b = 4.5, c = 0.5111111111111111
x: 2.0, m: 1.0, q: 2.3
4.3
x: 0.0, m: 1.0, q: -1.3
-1.3
==== End of example11.py


### Our first module
Suppose you want to use the `line()` function of this example in other programs. Rather than copying the code by hand, we want to use it as a library, which in Python is called a __module__.

Unlike C/C++, there is no special setup to create a module: any `.py` file can be imported as one.

We write a second program `example12.py`
```python
import example11

print('===== Running example12.py ===== ')

x = float(input("insert x:"))
y = example11.line(x)
print(y)

# A much shorter way
print(example11.line(float(input("insert x:"))))
```

and execute it from the command line

```shell
$ python3 example12.py
==== Running example11.py
a = 2.3, b = 4.5, c = 0.5111111111111111
x: 2.0, m: 1.0, q: 2.3
4.3
x: 0.0, m: 1.0, q: -1.3
-1.3
==== End of example11.py
===== Running example12.py ===== 
insert x:-123
x: -123.0, m: 1.0, q: 0.0
-123.0
insert x:23
x: 23.0, m: 1.0, q: 0.0
23.0
```

There are 2 important aspects to note:
  1. The function `line()` belongs to the `example11` namespace, so you __must__ call it as `example11.line`.
  2. Importing `example11` not only defines the function `line`, it also executes the rest of the program.
  This is expected because __Python is an interpreted language__: importing a module runs all of its top-level statements, and `def` is just one of them.
  
Let's address these 2 issues.

### Importing only some objects of a module
To address the first issue we can do the following in `example13.py`
```python
from example11 import line

print("++++ executing "+ __file__)

print(line(-3.4, q=0.5))
```
Now when we run the program:
```shell
$ python3 example13.py 
==== Running example11.py
a = 2.3, b = 4.5, c = 0.5111111111111111
x: 2.0, m: 1.0, q: 2.3
4.3
x: 0.0, m: 1.0, q: -1.3
-1.3
==== End of example11.py
++++ executing /Users/francesco/Documents/Work/Didattica/2022-23/2022-23 Computing Methods For Physics/Course Material 2022-23/examples/Python/example13.py
x: -3.4, m: 1.0, q: 0.5
-2.9
```

### Importing objects from a module and renaming them
A different approach is shown in `example14.py` where `line` is imported and renamed `p1`.
```python
from example11 import line as p1

print("++++ executing "+ __file__)

print(p1(-3.4, q=0.5))
```
Produces

```shell
==== Running example11.py
a = 2.3, b = 4.5, c = 0.5111111111111111
x: 2.0, m: 1.0, q: 2.3
4.3
x: 0.0, m: 1.0, q: -1.3
-1.3
==== End of example11.py
++++ executing /Users/francesco/Documents/Work/Didattica/2022-23/2022-23 Computing Methods For Physics/Course Material 2022-23/examples/Python/example14.py
x: -3.4, m: 1.0, q: 0.5
-2.9
```

### What is imported? 
All objects defined at the top level of a module are available, as attributes of the module (e.g. `example11.a`), when the module is imported.

This is shown in `example15.py`
```python 
import example11

print("++++ executing file: "+ __file__)

print("Calling example11.line(2.34, q=0.5): ", example11.line(2.34, q=0.5))

print("example11.a: %f" % example11.a)
```
When running in the terminal:

```shell
$ python3 example15.py
==== Running example11.py
a = 2.3, b = 4.5, c = 0.5111111111111111
x: 2.0, m: 1.0, q: 2.3
4.3
x: 0.0, m: 1.0, q: -1.3
-1.3
==== End of example11.py
++++ executing file: /Users/francesco/Documents/Work/Didattica/2022-23/2022-23 Computing Methods For Physics/Course Material 2021-22/examples/Python/example15.py
x: 2.34, m: 1.0, q: 0.5
Calling example11.line(2.34, q=0.5):  2.84
example11.a: 2.300000
```

### Importing only objects without executing statements
We now turn to our second problem, namely how to avoid running the statements in `example11.py` when importing it as a module.

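This can be done with a more advanced feature of Python, which we will understand better in future lectures, but the recipe itself is simple. Python sets the special variable `__name__` to `"__main__"` in the file that is executed directly, and to the module name (here `"mymodule"`) when the file is imported. Statements placed under `if __name__ == "__main__":` therefore run only when the file is executed as a program. A modified version of `example11.py` is `mymodule.py`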
```python
# This is my first module
a = 2.3
b = 4.5
c = a/b

def line(x, m=1., q=0.):
  print("=== in line === x: {2}, m: {0}, q: {1}".format(m,q,x))
  return m*x+q

print("__name__ : " +  __name__ + " in " + __file__)


if __name__ == "__main__":
  print("executing " +  __name__ + " in " + __file__)

  # Print using ''
  print('a = {0}, b = {1}, c = {2}'.format(a, b, c))
  print("calling line(): ", line(2., q=2.3))
  print("calling line()", line(0., q=-1.3))

  def p1(x, m=1., q=0.):
      print("x: {2}, m: {0}, q: {1}".format(m,q,x))
      return m*x+q
```
which has this behavior
```shell
$ python3 mymodule.py 
__name__ : __main__ in /Users/francesco/Documents/Work/Didattica/2022-23/2022-23 Computing Methods For Physics/Course Material 2022-23/examples/Python/mymodule.py
executing __main__ in /Users/francesco/Documents/Work/Didattica/2022-23/2022-23 Computing Methods For Physics/Course Material 2022-23/examples/Python/mymodule.py
a = 2.3, b = 4.5, c = 0.5111111111111111
=== in line === x: 2.0, m: 1.0, q: 2.3
calling line():  4.3
=== in line === x: 0.0, m: 1.0, q: -1.3
calling line() -1.3
```

To understand this better look at `example16.py`
```python
import mymodule

print("++++ executing namespace " + __name__ + " in file: " +  __file__)

# Local a variable
a = 'test string'

# Any object in mymodule can be used and there is no confusion with local a
print("mymodule.a: %f" % mymodule.a)
print("local a: ", a)

# Use line function from mymodule
print(mymodule.line(2.34, q=0.5))

# Function p1 is defined in mymodule only inside the
# if __name__ == "__main__" block, which is skipped on import, so this call fails
print(mymodule.p1(2.34, q=0.5))
```

At runtime we get the following error
```shell
$ python3 example16.py
__name__ : mymodule in /Users/francesco/Documents/Work/Didattica/2022-23/2022-23 Computing Methods For Physics/Course Material 2022-23/examples/Python/mymodule.py
++++ executing namespace __main__ in file: example16.py
mymodule.a: 2.300000
local a:  test string
=== in line === x: 2.34, m: 1.0, q: 0.5
2.84
Traceback (most recent call last):
  File "example16.py", line 17, in <module>
    print(mymodule.p1(2.34, q=0.5))
AttributeError: module 'mymodule' has no attribute 'p1'
```

When `mymodule` is imported, its `__name__` is `mymodule`, not `__main__`, so the code under `if __name__ == "__main__":` is skipped and `p1` is never defined. Only the program executed directly by the interpreter runs in the `__main__` namespace.
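A common way to organize a file that is both a script and a module is to put the script part in a function, often called `main()`, and call it only under the `__name__` guard. This is a convention, not something Python requires; the sketch below is a hypothetical restructuring of `mymodule.py`:

```python
# Hypothetical restructuring of mymodule.py with a main() function
a = 2.3
b = 4.5
c = a / b

def line(x, m=1.0, q=0.0):
    """Return m*x + q."""
    return m * x + q

def main():
    # Runs only when the file is executed directly, never on import
    print("a = {0}, b = {1}, c = {2}".format(a, b, c))
    print("calling line():", line(2.0, q=2.3))

if __name__ == "__main__":
    main()
```

With this layout, `import mymodule` defines `a`, `b`, `c`, `line` and `main` without printing anything, and every function is available to the importer.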


# 4. BUILT-IN DATA STRUCTURES: CONTAINERS AND SEQUENCES

- One of the great and popular features of Python is the presence of built-in containers for sequences of objects
  - These are provided by the STL in C++, but we have not discussed this (yet?)

- Since in Python everything is an object and all objects can be referenced in the same way, a single container can hold objects of different types
  - This is unlike the C++ arrays seen so far, whose elements must all have the same type
  
- These built-in types and the reference-driven flexibility of Python have made it very popular for data analysis

- Basic built-in data structures in Python are
  - tuple `(v1, v2, v3, ...)`
  - list `[v1, v2, v3, ...]`
  - dictionary `{key1:value1, key2:value2, key3:value3, ...}`
  - set `{v1, v2, v3, ...}`
  
- We will introduce more advanced types when discussing [NumPy](https://www.numpy.org) and [pandas](http://pandas.pydata.org) packages, e.g.
  - ndarrays
  - series
  - time series
  - DataFrame

## Tuples

A tuple is an **immutable** sequence of Python objects.
- Because tuples cannot be modified, Python can implement them more simply than lists (the next sequence container we will see), which makes them somewhat more efficient in memory use and performance.

To create a tuple, simply separate its elements with commas; the enclosing parentheses are optional.

In [48]:
a = 'lec23', 'lec24', 'lec25'
print(a)

('lec23', 'lec24', 'lec25')


In [49]:
len(a)

3

Since tuples are immutable (neither their content nor their size can change), they offer very few methods. The methods whose names do not start with a double underscore are the ones intended for direct use, and there are only two of them: `count()` and `index()`.

In [50]:
dir(tuple)

['__add__',
 '__class__',
 '__class_getitem__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'count',
 'index']

A very useful one is `count()`, which returns the number of occurrences of a value:

In [51]:
grades = (30, 22, 24, 23, 30, 18, 24, 27, 28, 28, 25, 24, 22, 30, 30, 18, 20)
grades.count(30)

4

A tuple can contain objects of different type

In [52]:
b = 'paul', 24, 1.75, 85.3
print(b)

('paul', 24, 1.75, 85.3)


In [53]:
print(a, b, 'hi')
print(type(b))

('lec23', 'lec24', 'lec25') ('paul', 24, 1.75, 85.3) hi
<class 'tuple'>


### Accessing tuple elements
Accessing the i-th element of a tuple is achieved with the `[]` operator
- Indexing starts with 0

In [54]:
print(a[2])
print(b[3])
print(type(b[1]))
print(len(b))
print(b[4])

lec25
85.3
<class 'int'>
4


IndexError: tuple index out of range

Note that out-of-bounds access raises an `IndexError` instead of silently reading invalid memory as in C/C++.

### Empty or one-element tuple

In [55]:
c = ()
print(type(c),c)

d = 'something',
print(type(d),d)

e = 'something'
print(type(e),e)

<class 'tuple'> ()
<class 'tuple'> ('something',)
<class 'str'> something


Note that the trailing `,` is what distinguishes a one-element tuple from a plain value: parentheses alone do not create a tuple, so `('something')` is just a string.

### Conversion to tuple

In [56]:
print(range(10))

range(0, 10)


In [57]:
tup = range(10)
print("length: ", len(tup))
print("tup:", tup)
print(type(tup))

length:  10
tup: range(0, 10)
<class 'range'>


Note that `tup` is not a tuple but a `range` object, which generates its integers on demand instead of storing them.

If you want a tuple you have to explicitly convert the output of `range(10)` to be a tuple.

In [58]:
tup = tuple(range(10))
print("length: ", len(tup))
print("tup: ", tup)
print(type(tup))

length:  10
tup:  (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
<class 'tuple'>
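Because a `range` stores only its start, stop and step, it can describe huge sequences without allocating memory for their elements, and it still supports `len()`, indexing and membership tests. A quick sketch:

```python
r = range(0, 10**12, 3)        # the 10**12/3 integers are not actually stored
print(len(r))                  # 333333333334
print(r[5])                    # 15
print(999 in r, 1000 in r)     # True False: checked arithmetically for ints
print(tuple(range(0, 10, 3)))  # (0, 3, 6, 9): conversion materializes the values
```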


### Iterating over a tuple
- It is easy with a `for` loop

In [59]:
for i in tup:
    print(i)

0
1
2
3
4
5
6
7
8
9


### Converting strings to tuples

In [60]:
tup = tuple("Hello World!")
print("tup: ", tup)
print(len(tup))

for i in tup:
    print(i)

for i in tup:
    print(i, end=" >> ")
    
print("\n")    


tup:  ('H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd', '!')
12
H
e
l
l
o
 
W
o
r
l
d
!
H >> e >> l >> l >> o >>   >> W >> o >> r >> l >> d >> ! >> 



### Tuples can contain any object

Even a function is a valid object to be placed in a tuple

In [61]:
def myprod(a, b=3.145, scale=1.0):
    return a*b*scale

tup = (1, 'name', myprod)
print("tup: ",tup)

for i in tup:
    print(type(i))

tup:  (1, 'name', <function myprod at 0x10fca2b90>)
<class 'int'>
<class 'str'>
<class 'function'>


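**Note how default values are more flexible than in C++: keyword arguments let you override any default (e.g. `scale`) while keeping the defaults of the preceding arguments (e.g. `b`)**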

In [62]:
bb = 3.2
print(tup[2](2, bb))
print(myprod(2, b=bb))
print(myprod(2, scale=bb))

6.4
6.4
20.128


### A tuple can contain tuples as its elements

In [63]:
x = a, b, c, tup

for i in x:
    print("i: ", i)

i:  ('lec23', 'lec24', 'lec25')
i:  ('paul', 24, 1.75, 85.3)
i:  ()
i:  (1, 'name', <function myprod at 0x10fca2b90>)


In [64]:
print(x[2])
print(x[0])
print(x[3][2](3,5))  # calling function myprod contained in tup

()
('lec23', 'lec24', 'lec25')
15.0


### And once again: tuples are immutable

You can bind a variable to a new tuple but you cannot change an element of a tuple

In [65]:
tup[0] = 'one'

TypeError: 'tuple' object does not support item assignment

In [66]:
y = 'one', a, (2,3)
print(y)
print(tup)

('one', ('lec23', 'lec24', 'lec25'), (2, 3))
(1, 'name', <function myprod at 0x10fca2b90>)


In [67]:
tup = y
print(tup)

('one', ('lec23', 'lec24', 'lec25'), (2, 3))


In [68]:
ntuple = 'lec23', 'lec27', 'lec25', 'lec25', 3.14, 3.56, 3.97
b = ntuple
print(b)
print(b.index('lec25'))
print(b.count('lec25'))
print(b.count(3.14))
print(type(b.count('lec25')))
print(b.index('test'))

('lec23', 'lec27', 'lec25', 'lec25', 3.14, 3.56, 3.97)
2
2
1
<class 'int'>


ValueError: tuple.index(x): x not in tuple

### Tuples are comparable

The comparison operators work with tuples and other sequences, which are compared element by element (lexicographically): if the first elements are equal, Python moves on to the second, and so on, until it finds elements that differ.

In [69]:
(0, 1, 2) < (5, 1, 2)

True

In [70]:
(0, 1, 2000000) < (0, 3, 4)

True

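Strings are compared character by character in the same way, which gives alphabetical order for letters of the same case. Note that parentheses around a single string do not make a tuple, so several cells below compare plain strings.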

In [71]:
('Jones', 'Sally') > ('Adams', 'Sam')

True

In [72]:
'J' > 'A'

True

In [73]:
('Jones', 'Sally') > ('Jones', 'Sally')

False

In [74]:
('Sally') > ('Sam')

False

In [75]:
('Sally') < ('Sam')

True

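Digits inside a string are **not** converted to numbers: they are compared as characters, by their Unicode code point. Digits (`'0'` to `'9'`) come before uppercase letters, which come before lowercase letters.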

In [76]:
('Jones10') > ('Jones11')

False

In [77]:
('Jones10') < ('Jones11')

True

In [78]:
('J') < ('1')

False

In [79]:
('J') == ('1')

False

In [80]:
('J') > ('1')

True

In [81]:
('j') > ('1')

True

In [82]:
('J') < ('j')

True
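The built-in `ord()` shows the code point used for each character comparison, and explains why digits embedded in strings can give surprising orders. If numeric order matters, storing the number as a separate integer element of a tuple makes the comparison numeric. A sketch:

```python
print(ord("1"), ord("J"), ord("j"))   # 49 74 106: digits < uppercase < lowercase
print("Jones9" > "Jones10")           # True: '9' > '1' as characters
print(sorted(["Jones10", "Jones9", "Jones11"]))   # ['Jones10', 'Jones11', 'Jones9']
print(("Jones", 9) < ("Jones", 10))   # True: 9 < 10 compared as integers
```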

In [83]:
(myprod) < (myprod)

TypeError: '<' not supported between instances of 'function' and 'function'

## Lists
- Lists are also a collection of objects but unlike tuples they are **mutable**
  - variable length
  - each element can be modified

In [84]:
alist = [2, 3, 4]
print(alist)
print(alist[2])
alist[2] = -3
print(alist)

[2, 3, 4]
4
[2, 3, -3]


Since lists are mutable, they offer more public methods than tuples: 11 instead of 2.

In [85]:
dir(list)

['__add__',
 '__class__',
 '__class_getitem__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

Like tuples, lists are protected against out-of-range indices.
First, let's check how long the list is with the built-in `len()` function, which returns the number of elements of any collection.

In [86]:
print(len(alist))

3


And let's try to access `alist` one element beyond its length.

In [87]:
alist[3]

IndexError: list index out of range

In [88]:
# Negative indices count from the end: -1 is the last element, -2 the one before it
print(alist)
print(alist[-2])

[2, 3, -3]
3


Negative indices are also protected against going out of range:

In [89]:
print(alist[-4])

IndexError: list index out of range

In summary, a valid index satisfies `-len(list)` $\leq$ index $<$ `len(list)`.
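One way to see the rule: a negative index `i` refers to the same element as `i + len(list)`. A short check over all valid indices of the three-element list used above:

```python
alist = [2, 3, -3]
n = len(alist)
for i in range(-n, n):
    # i % n maps every valid index, negative or not, to a position in 0..n-1
    print(i, alist[i], alist[i % n])
```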

A list can contain any type of data. In this example the list contains a string, an int, a float, a function, a tuple, and a nested list.

In [90]:
alist = ['one', 2, 3.24, myprod, (23, 24), ['lec1', 'lec2', [myprod, 3.14]]]
print(alist)
print(alist[5][2][0](6,7))

['one', 2, 3.24, <function myprod at 0x10fca2b90>, (23, 24), ['lec1', 'lec2', [<function myprod at 0x10fca2b90>, 3.14]]]
42.0


### Lists vs tuples
- A list is created using the `[]` operator or the explicit type `list`
- A tuple is created with the `()` operator or the explicit type `tuple`
- Lists and tuples are semantically similar
  - Many functions can take a tuple or a list
- Lists are used in data analysis to store data from iterators or generators

In [91]:
values = range(-3, 10, 2)
print(values)
print(list(values))
print(tuple(values))

range(-3, 10, 2)
[-3, -1, 1, 3, 5, 7, 9]
(-3, -1, 1, 3, 5, 7, 9)


Note that, as with tuples, you have to convert the output of `range` explicitly to get a list.

### List from tuple
You can create a list from a tuple by explicit conversion 

In [92]:
print(type(a))

<class 'tuple'>


In [93]:
print(a)
blist = list(a)
print(blist)
blist[2] = 'lec28'
blist
a = tuple(blist)
print(a)

('lec23', 'lec24', 'lec25')
['lec23', 'lec24', 'lec25']
('lec23', 'lec24', 'lec28')


### List slicing

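One of the most popular features in data analysis with Python is *slicing*: accessing a subset of a sequence by specifying a range of indices. The element at `start` is included, the one at `stop` is excluded, and any of the three values can be omitted (for a positive step, the defaults are the beginning, the end, and a step of 1).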

```list[start:stop:step]```

In [94]:
t = ['a', 'b', 'c', 'd', 'e', 'f'] 
print(t[1:3])
print(t[:4])
print(t[3:])
print(t[:])
print(t[::])
print(t[0:6:2])
print(t[:-2])
print(t[-2:])
print(t[-6:-2])
print(t[::-1])

['b', 'c']
['a', 'b', 'c', 'd']
['d', 'e', 'f']
['a', 'b', 'c', 'd', 'e', 'f']
['a', 'b', 'c', 'd', 'e', 'f']
['a', 'c', 'e']
['a', 'b', 'c', 'd']
['e', 'f']
['a', 'b', 'c', 'd']
['f', 'e', 'd', 'c', 'b', 'a']
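Two further properties are worth knowing: a slice is a new list, so modifying it leaves the original untouched, while assigning to a slice replaces that range of the original list in place, possibly changing its length. A sketch on the same list `t`:

```python
t = ['a', 'b', 'c', 'd', 'e', 'f']
part = t[1:3]              # a new list ['b', 'c']
part[0] = 'B'
print(t[1], part)          # b ['B', 'c']: t is unchanged

t[1:3] = ['x', 'y', 'z']   # replace 2 elements with 3
print(t)                   # ['a', 'x', 'y', 'z', 'd', 'e', 'f']
print(t[::2])              # ['a', 'y', 'd', 'f']
```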


### Adding and removing list elements
- To add an element at the end of the list, use the `append()` method
- To insert a value at a specific location by providing the index, use the `insert()` method
- To remove an element from the list at a specific location use the `pop()` method
- To remove the **first occurrence** of an element (removal by value) from a list use the `remove()` method

In [95]:
clist = ['one', 2, 3.14, 4, 'five']
clist.append(6)
print(clist)

['one', 2, 3.14, 4, 'five', 6]


In [96]:
clist.insert(2, 'two')
print(clist)

['one', 2, 'two', 3.14, 4, 'five', 6]


Note how the new element ends up __at__ the indicated index, i.e. it is inserted __before__ the element that previously occupied that position.

In [97]:
clist.pop(2)
print(clist)

['one', 2, 3.14, 4, 'five', 6]


Of the two, only `pop` returns a useful value: the element it has removed from the list. `insert` modifies the list in place and returns `None`.

In [98]:
x = clist.insert(2, 'test')
print(x)
x = clist.pop(2)
print(x)
print(clist)

None
test
['one', 2, 3.14, 4, 'five', 6]


You can also `remove()` an element by value. Only the first occurrence is removed, and the operation is not very efficient: Python scans the elements linearly until it finds a match.

In [99]:
print(4 in clist)
print(clist)
clist.append(4)
print(clist)

True
['one', 2, 3.14, 4, 'five', 6]
['one', 2, 3.14, 4, 'five', 6, 4]


In [100]:
if 4 in clist:
    clist.remove(4)
print(clist)

['one', 2, 3.14, 'five', 6, 4]


In [101]:
if 4 in clist:
    clist.remove(4)
print(clist)

['one', 2, 3.14, 'five', 6]
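The `if 4 in clist` guard above is there because `remove()` raises `ValueError` when the value is not in the list. An alternative sketch catches the exception instead; `remove_if_present` is a hypothetical helper name, not a built-in, and `items` is a separate list so `clist` is left untouched.

```python
def remove_if_present(lst, value):
    """Remove the first occurrence of value from lst; return True if something was removed."""
    try:
        lst.remove(value)
        return True
    except ValueError:  # list.remove raises ValueError when value is absent
        return False

items = ['one', 2, 3.14, 'five', 6, 2]
print(remove_if_present(items, 2))    # True: only the first 2 is removed
print(items)                          # ['one', 3.14, 'five', 6, 2]
print(remove_if_present(items, 42))   # False: 42 was never there
```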


A tuple stores references to its elements. If one of them is a mutable object such as a list, I can modify that list in place and the change is visible through the tuple; the tuple itself still holds the same objects, in the same number.

In [102]:
tt = ([1, 0], 'a')
print(tt, type(tt))

tt[0].append(1)
print(tt, type(tt))

([1, 0], 'a') <class 'tuple'>
([1, 0, 1], 'a') <class 'tuple'>


In [103]:
ll = [1,0]
tt = (ll, ll)
print(ll, tt)
print(type(ll), type(tt))

tt[0].append(1)
print(ll, tt)
print(type(ll), type(tt))

[1, 0] ([1, 0], [1, 0])
<class 'list'> <class 'tuple'>
[1, 0, 1] ([1, 0, 1], [1, 0, 1])
<class 'list'> <class 'tuple'>
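To make the last two examples explicit: both tuple slots point to the very same list object, while the tuple slots themselves cannot be rebound. A minimal check with the `is` operator:

```python
ll = [1, 0]
tt = (ll, ll)

# Both tuple slots refer to the very same list object
print(tt[0] is ll, tt[1] is ll)   # True True

# What is immutable is the tuple itself: its slots cannot be rebound
try:
    tt[0] = [5]
except TypeError as err:
    print("TypeError:", err)
```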


### Combining lists
Use `+` to combine existing lists into a new list

In [104]:
print(blist)
print(clist)
all = blist + ['id', 'name', 'major']
all2 = blist + clist
print(all, all2)

['lec23', 'lec24', 'lec28']
['one', 2, 3.14, 'five', 6]
['lec23', 'lec24', 'lec28', 'id', 'name', 'major'] ['lec23', 'lec24', 'lec28', 'one', 2, 3.14, 'five', 6]


Note that this is very different from the following, where `blist` becomes a single (nested) element:

In [105]:
all = [blist, 'id', 'name', 'major']
print(all)

[['lec23', 'lec24', 'lec28'], 'id', 'name', 'major']


In [106]:
print(all.index('id'))
print(all[-1])
print(all[-3])

1
major
id


To grow a list in place, use `extend`: it appends every element of an iterable (a list, a tuple, ...), whereas `+` builds a new list.

In [107]:
all.extend([2, 3, 4, 'test', 'python'])
print(all)
all.append(4.56)
print(all)
# extend accepts any iterable: a tuple works as well as a list
all.extend((2, 3))
print(all)
# append adds its single argument as one element, so the tuple (2, 3) becomes the last item
all.append((2, 3))
print(all)
print(all[all.index((2, 3))])
print(all[all.index((2, 3))][1])
print(all[-1][1])

[['lec23', 'lec24', 'lec28'], 'id', 'name', 'major', 2, 3, 4, 'test', 'python']
[['lec23', 'lec24', 'lec28'], 'id', 'name', 'major', 2, 3, 4, 'test', 'python', 4.56]
[['lec23', 'lec24', 'lec28'], 'id', 'name', 'major', 2, 3, 4, 'test', 'python', 4.56, 2, 3]
[['lec23', 'lec24', 'lec28'], 'id', 'name', 'major', 2, 3, 4, 'test', 'python', 4.56, 2, 3, (2, 3)]
(2, 3)
3
3


### More on the difference between `append()` and `extend()`

In [108]:
list_1 = [1, 2, 3]
list_2 = [1, 2, 3]

list_3 = [10, 11]

list_1.append(list_3)
list_2.extend(list_3)

In [109]:
print("append to a list: ", list_1)
print("extend a list: ", list_2)

append to a list:  [1, 2, 3, [10, 11]]
extend a list:  [1, 2, 3, 10, 11]
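A related subtlety: `+=` on a list behaves like `extend` and modifies the list in place, whereas `a = a + b` builds a new list and rebinds the name. The names `nums` and `alias` below are just for illustration.

```python
nums = [1, 2, 3]
alias = nums           # a second reference to the same list

nums += [10, 11]       # in place, like nums.extend([10, 11])
print(alias)           # [1, 2, 3, 10, 11]: alias sees the change

nums = nums + [12]     # builds a NEW list and rebinds nums
print(alias)           # [1, 2, 3, 10, 11]: alias still points to the old list
print(nums)            # [1, 2, 3, 10, 11, 12]
```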


### Sorting a list
Lists of elements that can be compared to each other can be sorted

In [110]:
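# print(all)
# all.sort()  # would raise TypeError: `all` mixes lists, tuples, strings and numbers, which cannot be compared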

In [111]:
months = ['january', 'february', 'march', 'april', 'may', 'june', 'july', 'august', 'september', 'october', 'november', 'december']
print(months)

['january', 'february', 'march', 'april', 'may', 'june', 'july', 'august', 'september', 'october', 'november', 'december']


In [112]:
months.sort()
print(months)

['april', 'august', 'december', 'february', 'january', 'july', 'june', 'march', 'may', 'november', 'october', 'september']


In [113]:
months.sort(key=len)
print(months)

['may', 'july', 'june', 'april', 'march', 'august', 'january', 'october', 'december', 'february', 'november', 'september']


In [114]:
help(list.sort)

Help on method_descriptor:

sort(self, /, *, key=None, reverse=False)
    Sort the list in ascending order and return None.
    
    The sort is in-place (i.e. the list itself is modified) and stable (i.e. the
    order of two equal elements is maintained).
    
    If a key function is given, apply it once to each list item and sort them,
    ascending or descending, according to their function values.
    
    The reverse flag can be set to sort in descending order.



In [115]:
months.sort(key=len,reverse=True)
print(months)

months.sort(reverse=True)
print(months)

['september', 'december', 'february', 'november', 'january', 'october', 'august', 'april', 'march', 'july', 'june', 'may']
['september', 'october', 'november', 'may', 'march', 'june', 'july', 'january', 'february', 'december', 'august', 'april']


In [116]:
print(months)
months.sort()
print(months)

['september', 'october', 'november', 'may', 'march', 'june', 'july', 'january', 'february', 'december', 'august', 'april']
['april', 'august', 'december', 'february', 'january', 'july', 'june', 'march', 'may', 'november', 'october', 'september']


### `sort()` vs `sorted()`
In the examples above, `sort()` is **applied** to the object and **modifies** it in place. If we prefer to keep the data intact and obtain a new sorted list instead, we use the built-in function `sorted()`.

In [117]:
months = ['january', 'february', 'march', 'april', 'may', 'june', 'july', 'august', 'september', 'october', 'november', 'december']
print(months)

sorted_months_byname = sorted(months)
print(sorted_months_byname)

sorted_months_bylen = sorted(months, key=len)
print(sorted_months_bylen)

print(months)

help(sorted)

['january', 'february', 'march', 'april', 'may', 'june', 'july', 'august', 'september', 'october', 'november', 'december']
['april', 'august', 'december', 'february', 'january', 'july', 'june', 'march', 'may', 'november', 'october', 'september']
['may', 'june', 'july', 'march', 'april', 'august', 'january', 'october', 'february', 'november', 'december', 'september']
['january', 'february', 'march', 'april', 'may', 'june', 'july', 'august', 'september', 'october', 'november', 'december']
Help on built-in function sorted in module builtins:

sorted(iterable, /, *, key=None, reverse=False)
    Return a new list containing all items from the iterable in ascending order.
    
    A custom key function can be supplied to customize the sort order, and the
    reverse flag can be set to request the result in descending order.



In [118]:
print(sorted(months))
print(months)

['april', 'august', 'december', 'february', 'january', 'july', 'june', 'march', 'may', 'november', 'october', 'september']
['january', 'february', 'march', 'april', 'may', 'june', 'july', 'august', 'september', 'october', 'november', 'december']
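`sorted()` accepts any iterable, not only lists, and always returns a new list. The `key` can be any one-argument function, for instance a `lambda` returning a tuple, which sorts by the first criterion and breaks ties with the second. The list `names` is just an example.

```python
# sorted() works on any iterable and always returns a list
print(sorted((3, 1, 2)))    # [1, 2, 3]
print(sorted("python"))     # ['h', 'n', 'o', 'p', 't', 'y']

# Sort by length, and alphabetically among words of equal length
names = ['may', 'june', 'july', 'april']
by_len_then_alpha = sorted(names, key=lambda w: (len(w), w))
print(by_len_then_alpha)    # ['may', 'july', 'june', 'april']
```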


### Lists and strings

In [119]:
chars = list("in a far away galaxy")
print(chars)
chars.count(' ')

['i', 'n', ' ', 'a', ' ', 'f', 'a', 'r', ' ', 'a', 'w', 'a', 'y', ' ', 'g', 'a', 'l', 'a', 'x', 'y']


4

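The `split()` method breaks a string into parts and returns a list of strings. Without arguments it splits on runs of whitespace; with an argument, e.g. `split('.')`, it splits on that separator.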

In [120]:
sentence = "I quite like Python"
words = sentence.split()
print(words)

['I', 'quite', 'like', 'Python']


In [121]:
speech = "I quite like Python. I liked C++ as well. Anyways, let's carry on."
sentences = speech.split('.')
print(sentences)

for sentence in sentences:
    words = sentence.split()
    print(words)

['I quite like Python', ' I liked C++ as well', " Anyways, let's carry on", '']
['I', 'quite', 'like', 'Python']
['I', 'liked', 'C++', 'as', 'well']
['Anyways,', "let's", 'carry', 'on']
[]
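The empty string at the end comes from the text after the last period. A small sketch that cleans the sentences up, plus `join()`, which does the opposite of `split()` and glues a list of strings together with a separator:

```python
speech = "I quite like Python. I liked C++ as well. Anyways, let's carry on."

# split('.') leaves an empty string after the final period:
# strip each piece and keep only the non-empty ones
sentences = []
for s in speech.split('.'):
    s = s.strip()
    if s:
        sentences.append(s)
print(sentences)

# join() is the inverse of split(): it glues strings with a separator
joined = " | ".join(sentences)
print(joined)
print(" ".join("I quite like Python".split()))   # I quite like Python
```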


### The `enumerate()` function
`enumerate()` keeps track of the index while iterating over a collection, such as a list: it yields `(index, element)` pairs.

`enumerate` can be exploited in `for` loops.

In [122]:
print(months)

for i,m in enumerate(months):
    print("month %-2d: %s" % (i+1, m))
    
# More readable (and more idiomatic) than the index-based loop below
#for i in range(0, len(months)):
#    print(i, months[i])

['january', 'february', 'march', 'april', 'may', 'june', 'july', 'august', 'september', 'october', 'november', 'december']
month 1 : january
month 2 : february
month 3 : march
month 4 : april
month 5 : may
month 6 : june
month 7 : july
month 8 : august
month 9 : september
month 10: october
month 11: november
month 12: december


In [123]:
data = 'name', 'surname', 'id'

for i,d in enumerate(data):
    print(i, "\t", d)

0 	 name
1 	 surname
2 	 id
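`enumerate` also takes an optional `start` argument, which avoids the `i+1` in the month example. A sketch on a short stand-in list `short` (so `months` is left untouched):

```python
short = ['january', 'february', 'march']   # a short stand-in for months

# start=1 makes the counter begin at 1, so there is no need for i+1
for i, m in enumerate(short, start=1):
    print("month %-2d: %s" % (i, m))

# Under the hood, enumerate yields (index, element) tuples
pairs = list(enumerate(short))
print(pairs)   # [(0, 'january'), (1, 'february'), (2, 'march')]
```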


### References and lists
Assigning a collection to a new variable does not copy it: both names refer to the same object. This example shows it explicitly.

In [124]:
newlist = months
print(newlist)

['january', 'february', 'march', 'april', 'may', 'june', 'july', 'august', 'september', 'october', 'november', 'december']


In [125]:
newlist.append('NewMonth')
print(months)
print(newlist)

['january', 'february', 'march', 'april', 'may', 'june', 'july', 'august', 'september', 'october', 'november', 'december', 'NewMonth']
['january', 'february', 'march', 'april', 'may', 'june', 'july', 'august', 'september', 'october', 'november', 'december', 'NewMonth']


So `newlist` __is not a new copy__. `newlist` and `months` are simply two references to the same list object!

To get a new copy, convert explicitly with `list()` (`months.copy()` and `months[:]` work as well).

In [126]:
newlist = list(months)
newlist.append('CrazyMonth')
print(months)
print(newlist)

['january', 'february', 'march', 'april', 'may', 'june', 'july', 'august', 'september', 'october', 'november', 'december', 'NewMonth']
['january', 'february', 'march', 'april', 'may', 'june', 'july', 'august', 'september', 'october', 'november', 'december', 'NewMonth', 'CrazyMonth']
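`list(months)` is a *shallow* copy: the outer list is new, but its elements are the same objects. This does not matter for strings, which are immutable, but it does for nested lists. The standard library's `copy.deepcopy` copies the inner objects too. A sketch on a hypothetical `nested` list:

```python
import copy

nested = [[1, 2], [3, 4]]

shallow = list(nested)          # new outer list, same inner lists
deep = copy.deepcopy(nested)    # inner lists are copied as well

nested[0].append(99)
print(shallow)   # [[1, 2, 99], [3, 4]]: the inner list is shared
print(deep)      # [[1, 2], [3, 4]]: fully independent
```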


## Dictionaries
- Another collection of objects
  - **Mutable** (variable length and modifiable)
  - Indexed by **key**, not by position (unlike tuples and lists); since Python 3.7 dictionaries preserve insertion order
  

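Very similar to the C++ associative containers `std::map<K,V>` and, more closely, `std::unordered_map<K,V>`, since Python dictionaries are implemented as __hash tables__. In other languages they are known as hashes (e.g. `perl`) or hash tables. To create a `dict` object: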

```python
my_dict = { key1 : value1, key2: value2, ... }
```

or

```python
my_dict = dict()
my_dict['key1'] = value1
my_dict['key2'] = value2
...
```

or
```python
my_dict = dict([('key1', value1), ('key2', value2), ...])
```

As usual, check out `dir` to get a sense of what you can do with a dictionary.

In [127]:
dir(dict)

['__class__',
 '__class_getitem__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__ior__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__ne__',
 '__new__',
 '__or__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__ror__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'clear',
 'copy',
 'fromkeys',
 'get',
 'items',
 'keys',
 'pop',
 'popitem',
 'setdefault',
 'update',
 'values']

### Dictionary creation

#### Empty dictionary

In [128]:
my_dict = dict()
print(my_dict)

my_dict = {}
print(my_dict)

{}
{}


#### From two separate lists to one dictionary

In [129]:
# Two separate lists...
months = ['january', 'february', 'march', 'april', 'may', 'june', 'july', 'august', 'september', 'october', 'november', 'december']
day_months = [31, 28, 31, 30, 31 , 30, 31, 31, 30, 31, 30 , 31]
print(months)
print(day_months)

print("month: ", months[0], " has ", day_months[0], " days")

['january', 'february', 'march', 'april', 'may', 'june', 'july', 'august', 'september', 'october', 'november', 'december']
[31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
month:  january  has  31  days


In [130]:
# ...a dictionary!
days_per_month = {}

for i, m in enumerate(months):
    days_per_month[m] = day_months[i]
    
print(days_per_month)
print(days_per_month['january'])
print(days_per_month['february'])

{'january': 31, 'february': 28, 'march': 31, 'april': 30, 'may': 31, 'june': 30, 'july': 31, 'august': 31, 'september': 30, 'october': 31, 'november': 30, 'december': 31}
31
28
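The indexing loop above can also be written with the built-in `zip()`, which pairs the i-th month with the i-th day count; `dict()` then turns the pairs into `key:value` entries. A minimal sketch with the same data:

```python
months = ['january', 'february', 'march', 'april', 'may', 'june', 'july',
          'august', 'september', 'october', 'november', 'december']
day_months = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

# zip yields ('january', 31), ('february', 28), ...; dict() builds the mapping
days_per_month = dict(zip(months, day_months))
print(days_per_month['january'])    # 31
print(days_per_month['february'])   # 28
```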


The `items()` method in dictionaries returns a view of the `(key, value)` tuples (a `dict_items` object; use `list()` if you need an actual list)

In [131]:
print(days_per_month.items())

dict_items([('january', 31), ('february', 28), ('march', 31), ('april', 30), ('may', 31), ('june', 30), ('july', 31), ('august', 31), ('september', 30), ('october', 31), ('november', 30), ('december', 31)])


#### You can also create a dictionary by hand

In [132]:
dict1 = {'a' : 1,
         'b' : (1,2,3),
         'c' : ['one','two'],
         'd' : 'example',
         56 : 'name'}
print(dict1)

{'a': 1, 'b': (1, 2, 3), 'c': ['one', 'two'], 'd': 'example', 56: 'name'}


In [133]:
students = {'rio': {'name':'john', 'age':23, 'id':123456},
            'nairobi': {'name':'susan', 'id':123123, 'age':21}, 
            'tokyo': {'name':'maria', 'id':123651, 'age':24}, }
print(students)

{'rio': {'name': 'john', 'age': 23, 'id': 123456}, 'nairobi': {'name': 'susan', 'id': 123123, 'age': 21}, 'tokyo': {'name': 'maria', 'id': 123651, 'age': 24}}


#### Adding a new key-value pair

In [134]:
students['oslo'] = {'name':'', 'age':30, 'id':111111} 
print(students)

{'rio': {'name': 'john', 'age': 23, 'id': 123456}, 'nairobi': {'name': 'susan', 'id': 123123, 'age': 21}, 'tokyo': {'name': 'maria', 'id': 123651, 'age': 24}, 'oslo': {'name': '', 'age': 30, 'id': 111111}}


If the key is already in use, its value will be updated (similar to modifying elements of a list)

In [135]:
students['oslo'] = {'name':'sergey', 'age':22, 'id':111112} 
print(students)

{'rio': {'name': 'john', 'age': 23, 'id': 123456}, 'nairobi': {'name': 'susan', 'id': 123123, 'age': 21}, 'tokyo': {'name': 'maria', 'id': 123651, 'age': 24}, 'oslo': {'name': 'sergey', 'age': 22, 'id': 111112}}


### `KeyError`, `in`, and `get`

`KeyError` is the error you hit when referencing a `key` which is not in the dictionary

In [136]:
students['osaka']

KeyError: 'osaka'

#### Use the `in` operator to check whether a `key` is in use

In [137]:
while True:
    name = input("Name (press return to end): ")  
    if(name==''): break
    if name not in students:
        print("{0} not in the list. sorry.".format(name))
    else: 
        print("name: {0}\t age: {1}\t id: {2}".format(students[name]['name'], students[name]['age'], students[name]['id']))

Name (press return to end): rio
name: john	 age: 23	 id: 123456
Name (press return to end): osaka
osaka not in the list. sorry.
Name (press return to end): 


#### `get`

- Dedicated getter method for dictionaries
- Has a fall-back feature to avoid `KeyError`
- Syntax:

```python
value = some_dict.get(key, value_if_key_not_found)
```

Notice also that we process the input with the method `lower()`, so that `Rio` matches the key `rio`.

In [192]:
while True:
    name = input("Name (press return to end): ").lower()  
    if(name==''): break
    val = students.get(name, "you silly person, this is not in here")
    print(val)

Name (press return to end): rio
{'name': 'john', 'age': 23, 'id': 123456}
Name (press return to end): Rio
{'name': 'john', 'age': 23, 'id': 123456}
Name (press return to end): osaka
you silly person, this is not in here
Name (press return to end): 


### Keys are unique
- There can be only one `value` for a given `key` in a dictionary made of `key:value`
- If you need several values for a `key`, map the key to a list of values, i.e. use a dictionary of `key:[value1, value2, ...]`

In [139]:
particles = {'boson':['Z', 'gluon', 'W', 'photon'],
             'meson':['pion', 'kaon'],
             'quark':['u','d','s'],
             'lepton':['electron', 'muon']}
particles

{'boson': ['Z', 'gluon', 'W', 'photon'],
 'meson': ['pion', 'kaon'],
 'quark': ['u', 'd', 's'],
 'lepton': ['electron', 'muon']}

In [140]:
particles['lepton'].append('tau')
particles

{'boson': ['Z', 'gluon', 'W', 'photon'],
 'meson': ['pion', 'kaon'],
 'quark': ['u', 'd', 's'],
 'lepton': ['electron', 'muon', 'tau']}
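When such a dictionary of lists is built incrementally, two standard tools avoid checking whether the key already exists: the `dict.setdefault` method (it appears in the `dir(dict)` output above) and `collections.defaultdict`. A sketch with the hypothetical names `groups` and `by_family`, so `particles` is left untouched:

```python
from collections import defaultdict

# dict.setdefault(key, default) returns d[key], first inserting default if key is missing
groups = {}
groups.setdefault('lepton', []).append('electron')
groups.setdefault('lepton', []).append('muon')
print(groups)   # {'lepton': ['electron', 'muon']}

# defaultdict(list) creates the empty list automatically the first time a key is used
by_family = defaultdict(list)
for family, name in [('quark', 'u'), ('quark', 'd'), ('boson', 'Z')]:
    by_family[family].append(name)
print(dict(by_family))   # {'quark': ['u', 'd'], 'boson': ['Z']}
```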

### Iterating over dict 
Iterating over a dictionary gives you its keys by default. You can also loop over the keys *explicitly* with `keys()`.

In [141]:
for p in particles:
    print(p)

boson
meson
quark
lepton


In [142]:
print(particles.keys())
for k in particles.keys():
    print(k)

dict_keys(['boson', 'meson', 'quark', 'lepton'])
boson
meson
quark
lepton


In [143]:
for k in particles:
    print(particles[k])

['Z', 'gluon', 'W', 'photon']
['pion', 'kaon']
['u', 'd', 's']
['electron', 'muon', 'tau']


#### There is more: two iteration variables!

You can loop through the `key:value` pairs in a dictionary using **two** iteration variables.
At each iteration, the first variable is the `key` and the second variable is its corresponding `value`.

In [144]:
for aaa,bbb in particles.items():
    print(aaa, 'list:', bbb)

boson list: ['Z', 'gluon', 'W', 'photon']
meson list: ['pion', 'kaon']
quark list: ['u', 'd', 's']
lepton list: ['electron', 'muon', 'tau']


### Accessing values without keys
If you do not care about the keys but need all the values, use the `values()` method.

Collecting the elements of all the value lists into a single list is called __flattening__: compare `extend` (flattened) and `append` (not flattened) below.

In [145]:
print(particles.values())

dict_values([['Z', 'gluon', 'W', 'photon'], ['pion', 'kaon'], ['u', 'd', 's'], ['electron', 'muon', 'tau']])


In [146]:
all_vals_ext = []
all_vals_app = []

for v in particles.values():
    print("Looping over values in dict")
    print(v)
    all_vals_ext.extend(v)   # adds each element of v
    all_vals_app.append(v)   # adds v itself as one element

print("All_vals (flattened)")
print(all_vals_ext)

print("All_vals (not flattened)")
print(all_vals_app)

Looping over values in dict
['Z', 'gluon', 'W', 'photon']
Looping over values in dict
['pion', 'kaon']
Looping over values in dict
['u', 'd', 's']
Looping over values in dict
['electron', 'muon', 'tau']
All_vals (flattened)
['Z', 'gluon', 'W', 'photon', 'pion', 'kaon', 'u', 'd', 's', 'electron', 'muon', 'tau']
All_vals (not flattened)
[['Z', 'gluon', 'W', 'photon'], ['pion', 'kaon'], ['u', 'd', 's'], ['electron', 'muon', 'tau']]


Keys and values do not need to share a single type: any hashable object can be a key.

In [147]:
dic2 = {123: (1,2,3), 'one': [1.2, 2.3] , (1,2): 'tuple'}
print(dic2)
for i in dic2:
    print(type(i), type(dic2[i]))

{123: (1, 2, 3), 'one': [1.2, 2.3], (1, 2): 'tuple'}
<class 'int'> <class 'tuple'>
<class 'str'> <class 'list'>
<class 'tuple'> <class 'str'>


The same flattened list of particles can be obtained with a double loop over the values.

In [148]:
flat=[]
for v in particles.values():
    for i in v:
        flat.append(i)
print(flat)

['Z', 'gluon', 'W', 'photon', 'pion', 'kaon', 'u', 'd', 's', 'electron', 'muon', 'tau']
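The standard library also offers a one-call alternative, `itertools.chain.from_iterable`, which walks through each inner list in turn. A sketch on a copy of the particle data named `families` (a hypothetical name, so `particles` is left untouched):

```python
from itertools import chain

families = {'boson': ['Z', 'gluon', 'W', 'photon'],
            'meson': ['pion', 'kaon'],
            'quark': ['u', 'd', 's'],
            'lepton': ['electron', 'muon', 'tau']}

# chain.from_iterable yields the elements of each inner list, one list after the other
flat = list(chain.from_iterable(families.values()))
print(flat)
```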


### Valid key types
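- Keys must be hashable: Python must be able to compute from the key an integer hash value that does not change during the key's lifetime (equal keys must have equal hashes).
- Hashable types:
    - immutable scalar types such as int, float, str
    - tuples, provided that all their elements are hashable
- You can check whether an object is hashable with `hash()`, which raises `TypeError` otherwise. Note that string hashes are randomized, so the values below change from one Python session to the next.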

In [149]:
hash('boson')

4365647662091428947

In [150]:
hash((2,3,2.4))

7938176375874763610

In [151]:
hash(3.1234324)

284615736650468355

In [152]:
c = 2.9
dict3 = {c:'Value of c', 5.9:'Value of something'}
print(dict3)

c = 5.4
dict3[c] = 'New'
print(dict3)

c = 5.6
dict3[-3.4] = 'New val'
print(dict3)

{2.9: 'Value of c', 5.9: 'Value of something'}
{2.9: 'Value of c', 5.9: 'Value of something', 5.4: 'New'}
{2.9: 'Value of c', 5.9: 'Value of something', 5.4: 'New', -3.4: 'New val'}


In [153]:
dict4 = {[1,2] : 'value'}

TypeError: unhashable type: 'list'

In [154]:
hash([1,2])

TypeError: unhashable type: 'list'
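Since `hash()` raises `TypeError` for unhashable objects, a small check can be written around it. `is_hashable` is a hypothetical helper, not a built-in. Note that a tuple is hashable only if its elements are, and that `frozenset` is the hashable, immutable counterpart of `set`.

```python
def is_hashable(obj):
    """Return True if obj can be used as a dictionary key."""
    try:
        hash(obj)
        return True
    except TypeError:   # raised for unhashable types such as list, dict, set
        return False

print(is_hashable((1, 2)))             # True
print(is_hashable((1, [2, 3])))        # False: the tuple contains a list
print(is_hashable(frozenset({1, 2})))  # True
print(is_hashable({1, 2}))             # False: sets are mutable
```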

### Sorting a dictionary by `key`

We can take advantage of the ability to sort a list of tuples to get a sorted version of a dictionary.

First we sort the dictionary by the `key` using the `items()` method and `sorted()` function.

In [155]:
d = {'a':1, 'c':3, 'b':2}
d.items()

dict_items([('a', 1), ('c', 3), ('b', 2)])

In [156]:
sorted(d.items())

[('a', 1), ('b', 2), ('c', 3)]

In [157]:
print(d) # NB: dictionary is unchanged

{'a': 1, 'c': 3, 'b': 2}


### Sorting a dictionary by `value`

One way to sort by `value` is to construct a list of tuples of the form `(value, key)`: tuples are compared element by element, so the list is sorted by `value` first.

In [158]:
tmp = list()
for k, v in d.items():
    tmp.append((v, k))
print(tmp)

[(1, 'a'), (3, 'c'), (2, 'b')]


In [159]:
tmp = sorted(tmp, reverse=True)
print(tmp)

[(3, 'c'), (2, 'b'), (1, 'a')]
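Swapping the pairs is not strictly necessary: `sorted()` can sort the `(key, value)` items directly by their second element through a `key` function. Since dictionaries keep insertion order (Python 3.7+), the sorted pairs can also be turned back into a dictionary in that order. A minimal sketch:

```python
d = {'a': 1, 'c': 3, 'b': 2}

# Sort the (key, value) pairs by their second element, the value
by_value = sorted(d.items(), key=lambda kv: kv[1], reverse=True)
print(by_value)         # [('c', 3), ('b', 2), ('a', 1)]

# Rebuild a dictionary whose insertion order follows the sorted pairs
ordered = dict(by_value)
print(ordered)          # {'c': 3, 'b': 2, 'a': 1}
```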


## Sets

A **set** is

- an unordered collection of __unique__ elements
- the natural example is the collection of the keys of a dictionary

A non-empty set is created with `{}` or with the `set` function, which takes a single iterable
```python
{v1, v2, v3, ...}
```
```python
my_set = set([v1, v2, v3, ...])
```

As usual, let's start with `dir`. The list includes the familiar set-theoretic operations (union, intersection, difference, ...).
Remember: SETL --> ABC --> Python.

In [160]:
dir(set)

['__and__',
 '__class__',
 '__class_getitem__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__iand__',
 '__init__',
 '__init_subclass__',
 '__ior__',
 '__isub__',
 '__iter__',
 '__ixor__',
 '__le__',
 '__len__',
 '__lt__',
 '__ne__',
 '__new__',
 '__or__',
 '__rand__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__ror__',
 '__rsub__',
 '__rxor__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__sub__',
 '__subclasshook__',
 '__xor__',
 'add',
 'clear',
 'copy',
 'difference',
 'difference_update',
 'discard',
 'intersection',
 'intersection_update',
 'isdisjoint',
 'issubset',
 'issuperset',
 'pop',
 'remove',
 'symmetric_difference',
 'symmetric_difference_update',
 'union',
 'update']

Consider our `days_per_month` dictionary, which stores the number of days in each month.

In [161]:
#day_len = days_per_month.values()
day_len = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
day_len

[31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

We can create a set from this list: duplicate values are dropped

In [162]:
days_set = set(day_len)
print(days_set)

new_set = {1, 2, 3, 4, 1, 34, 3, 2, 34}
print(new_set)

{28, 30, 31}
{1, 2, 3, 4, 34}


In [163]:
my_tuple = (1, 2, 3, 2, 2, 3, 1)
print(set(my_tuple))

{1, 2, 3}


### Note that `{}` on its own creates a dictionary, not a set

In [164]:
tmp = {}
type(tmp)

dict

In [165]:
tmp = set()
type(tmp)

set

### Examples of common operations

In [166]:
new_set2 = {1, 3, 33}

print(new_set)
print(new_set2)

{1, 2, 3, 4, 34}
{1, 3, 33}


In [167]:
print(new_set.intersection(new_set2))
print(new_set2.intersection(new_set))

{1, 3}
{1, 3}


In [168]:
new_set.union(new_set2)

{1, 2, 3, 4, 33, 34}

In [169]:
new_set.difference(new_set2)

{2, 4, 34}

In [170]:
new_set.symmetric_difference(new_set2)

{2, 4, 33, 34}

In [171]:
new_set & new_set2

{1, 3}

In [172]:
new_set | new_set2

{1, 2, 3, 4, 33, 34}

In [173]:
print(new_set)
new_set.add(5)
print(new_set)
new_set.remove(5)
print(new_set)

{1, 2, 3, 4, 34}
{1, 2, 3, 4, 34, 5}
{1, 2, 3, 4, 34}
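Two more things worth knowing: the remaining operations also have operator shortcuts, and `remove()` raises `KeyError` for a missing element while `discard()` silently ignores it. A sketch on two fresh sets `s1` and `s2` with the same contents as `new_set` and `new_set2`:

```python
s1 = {1, 2, 3, 4, 34}
s2 = {1, 3, 33}

# Operator shortcuts for the methods shown above
print(s1 - s2)          # same as s1.difference(s2)
print(s1 ^ s2)          # same as s1.symmetric_difference(s2)
print({1, 3} <= s1)     # subset test, same as {1, 3}.issubset(s1): True

# discard() ignores a missing element, remove() raises KeyError
s1.discard(100)
try:
    s1.remove(100)
except KeyError:
    print("100 is not in s1")
```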


## Comprehensions for lists, sets, dictionaries
- One of Python's most loved features
- Allows concise operations on collections without writing explicit loops
- Output of the operation is a new collection (set, list, dict)

The basic expression for lists is

```python
[ expression(element) for element in collection if some_condition ]
```

Similar ones hold for other collections.  Basically, rather than writing, e.g.,
```python
a = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
even = {0, 2, 4, 6, 8}
odd = {1, 3, 5, 7, 9}
```
by hand we can use a comprehension with an algorithm.

### Example 1: odd and even numbers

In [174]:
# For loop
aa = set()
for j in range(10):
    aa.add(j)
print(aa)

# Comprehension
a = {i for i in range(10)}
print(a)

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}


In [175]:
even = {i for i in range(0,10,2)}
print(even)

odd = {i for i in range(1,10,2)}
print(odd)

{0, 2, 4, 6, 8}
{1, 3, 5, 7, 9}
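The general form also allows an `if` filter and an arbitrary expression on each element, and the same syntax with `key: value` builds a dictionary. A short sketch of both (the names are illustrative):

```python
# The condition filters the elements; the leading expression transforms them
squares_of_even = [i**2 for i in range(10) if i % 2 == 0]
print(squares_of_even)   # [0, 4, 16, 36, 64]

# Set comprehension with a transformation: duplicates collapse
remainders = {i % 3 for i in range(10)}
print(remainders)        # {0, 1, 2}

# Dictionary comprehension: one key: value entry per element
parity = {i: ('even' if i % 2 == 0 else 'odd') for i in range(4)}
print(parity)            # {0: 'even', 1: 'odd', 2: 'even', 3: 'odd'}
```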


### Example 2: grades (using sets and lists for data analysis)

#### Digression: generating random numbers
Suppose we want to analyse the results of an exam. 

First we need to generate N grades between 10 and 30 (inclusive).

As we saw previously, the [random](https://docs.python.org/3/library/random.html) module provides many useful functions for generating random numbers or collections of numbers. Note that `randrange(10, 31)` excludes the upper bound, so it returns integers from 10 to 30.

In [176]:
import random as r

n_students = 50

grades = []

for i in range(n_students):
    grades.append(r.randrange(10,31))
print(grades)

[22, 23, 15, 16, 15, 10, 21, 23, 22, 26, 29, 29, 26, 15, 15, 12, 30, 24, 23, 20, 19, 10, 14, 11, 12, 21, 26, 13, 29, 22, 20, 25, 16, 18, 14, 14, 28, 22, 14, 26, 18, 10, 24, 23, 13, 15, 25, 25, 27, 12]


The same (modulo the randomness of the numbers!), but using a list **comprehension**

In [177]:
grades = [r.randrange(10,31) for i in range(n_students)]
print(grades)

[18, 27, 19, 27, 28, 12, 16, 28, 18, 24, 29, 11, 26, 12, 15, 11, 21, 12, 23, 23, 10, 20, 18, 27, 12, 13, 16, 21, 12, 20, 20, 28, 14, 30, 14, 28, 17, 23, 11, 24, 15, 25, 21, 14, 19, 12, 15, 12, 24, 18]


Using `set` we find the unique values of `grades`

In [178]:
vals = set(grades)
print(vals)
print(grades.count(18))
print(grades.count(23))

{10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23, 24, 25, 26, 27, 28, 29, 30}
4
3


In [179]:
data = {}
for v in vals:
    data[v] = grades.count(v)
    print("grade: {0}  frequency: {1}".format(v,data[v]))

grade: 10  frequency: 1
grade: 11  frequency: 3
grade: 12  frequency: 7
grade: 13  frequency: 1
grade: 14  frequency: 3
grade: 15  frequency: 3
grade: 16  frequency: 2
grade: 17  frequency: 1
grade: 18  frequency: 4
grade: 19  frequency: 2
grade: 20  frequency: 3
grade: 21  frequency: 3
grade: 23  frequency: 3
grade: 24  frequency: 3
grade: 25  frequency: 1
grade: 26  frequency: 1
grade: 27  frequency: 3
grade: 28  frequency: 4
grade: 29  frequency: 1
grade: 30  frequency: 1


In [180]:
data

{10: 1,
 11: 3,
 12: 7,
 13: 1,
 14: 3,
 15: 3,
 16: 2,
 17: 1,
 18: 4,
 19: 2,
 20: 3,
 21: 3,
 23: 3,
 24: 3,
 25: 1,
 26: 1,
 27: 3,
 28: 4,
 29: 1,
 30: 1}
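Calling `grades.count(v)` for every distinct value scans the whole list once per value. The standard library's `collections.Counter` counts all values in a single pass. A sketch on a freshly generated, hypothetical `sample_grades` list, so the notebook's `grades` is left untouched:

```python
import random
from collections import Counter

sample_grades = [random.randrange(10, 31) for _ in range(50)]

# Counter maps each distinct grade to the number of times it occurs
freq = Counter(sample_grades)
for grade in sorted(freq):
    print("grade: {0}  frequency: {1}".format(grade, freq[grade]))

# The three most frequent grades, as (grade, count) pairs
print(freq.most_common(3))
```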

The most basic question is how many people failed the exam.

You could do simple counting:

In [181]:
nfail = 0
for v in grades:
    if v < 18:
        nfail += 1
print("# grades <18:  %2d"%(nfail))

# grades <18:  21


In general, however, having a list of information rather than just a count is more flexible for future analyses.

In [182]:
failed = []
for v in grades:
    if v < 18:
        failed.append(v)
print("# grades <18:  {0}".format(len(failed)))

# grades <18:  21


Note that the following sequence of operations was performed:
  - creation of a new empty list
  - iteration over existing objects
  - check of some condition on each object
  - if the check succeeds, the object is added to the new list

Once again, this can be written concisely with __comprehension__.

In [183]:
new_failed  = [v for v in grades if v<18]
good_grades = [v for v in grades if v>=18]
print(len(new_failed), len(good_grades))

21 29


You can also call any function inside a comprehension, either in the condition or in the expression applied to each item.

In [184]:
def isodd(x):
    # True for odd numbers, False for even ones
    return x % 2 != 0
odds  = [v for v in grades if isodd(v)]
evens  = [v for v in grades if not isodd(v)]
print(len(odds))
print(len(evens))

import math
sqrts = [math.sqrt(v) for v in grades]
print(sqrts[:10])

21
29
[4.242640687119285, 5.196152422706632, 4.358898943540674, 5.196152422706632, 5.291502622129181, 3.4641016151377544, 4.0, 5.291502622129181, 4.242640687119285, 4.898979485566356]


### Example 3: comprehension with dictionaries
We now use a comprehension to invert our dict of months and days

In [185]:
# Two separate lists...
months = ['january', 'february', 'march', 'april', 'may', 'june', 'july', 'august', 'september', 'october', 'november', 'december']
day_months = [31, 28, 31, 30, 31 , 30, 31, 31, 30, 31, 30 , 31]
# ...a dictionary!
days_per_month = {}

for i, m in enumerate(months):
    days_per_month[m] = day_months[i]

In [186]:
days_per_month.values()

dict_values([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])

In [187]:
set(days_per_month.values())

{28, 30, 31}

In [188]:
inv_map = {i: [] for i in set(days_per_month.values())}
print(inv_map)

{28: [], 30: [], 31: []}


In [189]:
for i in days_per_month:
    inv_map[days_per_month[i]].append(i)
print(inv_map)

{28: ['february'], 30: ['april', 'june', 'september', 'november'], 31: ['january', 'march', 'may', 'july', 'august', 'october', 'december']}
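The inversion can also be written as a single nested comprehension. A sketch on a small hypothetical dictionary `sample_days`:

```python
sample_days = {'january': 31, 'february': 28, 'march': 31, 'april': 30}

# For each distinct day count d, collect the months whose value is d
inv = {d: [m for m, n in sample_days.items() if n == d]
       for d in set(sample_days.values())}
print(inv)
```

This version rescans the dictionary once per distinct value, while the two-step loop above needs a single pass; for twelve months the difference is irrelevant, and readability decides.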


# 5. FILES

As in C, you must first open a file on disk, obtaining a file object, before reading information from it or writing information into it
  
Main syntax
```Python
handle = open(filename, mode)
```

- `filename` is the path to the file and is a string
- `mode`: optional; possible modes are:


| Character | Meaning |
| :-: | :-- |
| `r` | open for reading (default) |
| `w` | open for writing, truncating the file first |
| `x` | create a new file and open it for writing, failing if the file already exists |
| `a` | open for writing, appending to the end of the file if it exists |
| `b` | binary mode |
| `t` | text mode (default) |
| `+` | open a disk file for updating (reading and writing) |

Opening a file can fail, raising an exception, e.g. when
  - the file or location does not exist (`FileNotFoundError`)
  - you have no permission for the location (`PermissionError`)

It is important to close the file to make sure that (when writing) all data are flushed from memory to disk and the file handle closed properly.
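The table above also lists the writing modes. A minimal sketch of `w` and `a`, writing to a hypothetical `notes.txt` inside a temporary directory (created with `tempfile.mkdtemp()`) so that no existing file is overwritten:

```python
import os
import tempfile

# Hypothetical file name in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'notes.txt')

fh = open(path, 'w')       # 'w' creates the file (or truncates it if it exists)
fh.write("first line\n")   # write() does not add a newline by itself
fh.close()

fh = open(path, 'a')       # 'a' appends at the end of the file
fh.write("second line\n")
fh.close()

fh = open(path, 'r')
content = fh.read()
fh.close()
print(content)
```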

In [230]:
# If you are running on google colab, make sure you
# upload examples/Python/words.txt to the directory
# content/ (the default search space for a jupyter
# notebook on colab)

fh = open('words.txt', 'r')

In [231]:
print(fh)
print(type(fh))
fh.close()

<_io.TextIOWrapper name='words.txt' mode='r' encoding='UTF-8'>
<class '_io.TextIOWrapper'>
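Here the file is closed with an explicit `close()`. A common alternative is the `with` statement, which closes the file automatically at the end of the block, even if an error occurs inside it. A sketch, again on a hypothetical scratch file in a temporary directory:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'example.txt')

# The file is closed automatically when the with block ends
with open(path, 'w') as fh:
    fh.write("hello\n")

print(fh.closed)   # True

with open(path, 'r') as fh:
    text = fh.read().rstrip()
print(text)        # hello
```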


Bonus track: see what files are available in the current working directory.

In [1]:
import os

l0 = os.listdir()

print(l0)

['1-Basics.ipynb', 'fall_asleep_55.py', 'Bilby.ipynb', 'example02.ipynb', '.DS_Store', 'GWPy.ipynb', '2-FirstApplications.ipynb', '3-MoreFuncs.ipynb', 'fall_asleep_3.py', 'GWs', 'example13.py', 'example12.py', 'PyCBC.ipynb', '__pycache__', 'example16.py', 'fall_asleep_2.py', 'example11.py', 'example01.py', '5-AnimatedPlots.ipynb', 'example15.py', 'fall_asleep_1.py', '4-NumPy.ipynb', 'example14.py', 'mymodule.py', 'fall_asleep_4.py', '.ipynb_checkpoints', 'words.txt', 'gravity1.py', '6-SciPy.ipynb', 'cmd_line_args.py', '7_MCMC.ipynb', 'polymorphism.py']


# Reading text files

### Line-by-line

A file handle opened in read mode can be treated as a **sequence of strings**.

Each line in the file is a string in the sequence.

NB: if you read numbers and want to process them, you will have to convert them from string to a number type.

In [233]:
fh = open('words.txt', 'r')
lines = []

for line in fh:
    print(line)
    lines.append(line)

print(lines)

fh.close()

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.



Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.



Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.



Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

['Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.\n', '\n', 'Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.\n', '\n', 'Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.\n', '\n', 'Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.\n']


#### Note the `\n` newline character
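- Each line keeps its trailing newline `\n`, and `print()` adds one more: this causes the extra blank lines
- Use `rstrip()` to remove trailing whitespace, including the newline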

In [234]:
fh = open('words.txt', 'r')
lines = []

for line in fh:
    line = line.rstrip()
    print(line)
    lines.append(line)

print(lines)

fh.close()   

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.

Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.

Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.

Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
['Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.', '', 'Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.', '', 'Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.', '', 'Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.']
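As noted earlier, every line is read as a string. If a file contains numbers, convert each stripped line explicitly. A sketch on a hypothetical `numbers.txt`, written first to a temporary directory, with one value per line and a blank line to skip:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'numbers.txt')
with open(path, 'w') as fh:
    fh.write("3.5\n\n-1\n42\n")

values = []
with open(path, 'r') as fh:
    for line in fh:
        line = line.rstrip()
        if not line:                 # skip blank lines
            continue
        values.append(float(line))   # each line is a str until converted

print(values)        # [3.5, -1.0, 42.0]
print(sum(values))   # 44.5
```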


### All at once

We can read the whole file (newlines and all) into a **single string** with `read()`

In [236]:
fh = open('words.txt', 'r')
content = fh.read()
print(len(content))
print(content[:20])

fh.close()

print(content)

449
Lorem ipsum dolor si
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.

Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.

Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.

Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
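For comparison, a small sketch of `read()`, `readlines()` and `str.splitlines()` on the same content; an in-memory `io.StringIO` holding made-up text stands in for `words.txt`:

```python
import io

# io.StringIO behaves like a text file opened for reading; made-up content
fh = io.StringIO("first line\n\nthird line\n")

content = fh.read()          # one single string, newlines included
fh.seek(0)                   # rewind to the start before reading again
raw_lines = fh.readlines()   # list of lines, each one keeping its '\n'
fh.close()

clean_lines = content.splitlines()  # list of lines with the newlines removed

print(repr(content))
print(raw_lines)
print(clean_lines)
```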



### So how do I get words?  `split()`!

In [237]:
fh = open('words.txt', 'r')
words = []

for line in fh:
    line = line.rstrip()
    words.extend(line.split())

print(words)

fh.close()

['Lorem', 'ipsum', 'dolor', 'sit', 'amet,', 'consectetur', 'adipiscing', 'elit,', 'sed', 'do', 'eiusmod', 'tempor', 'incididunt', 'ut', 'labore', 'et', 'dolore', 'magna', 'aliqua.', 'Ut', 'enim', 'ad', 'minim', 'veniam,', 'quis', 'nostrud', 'exercitation', 'ullamco', 'laboris', 'nisi', 'ut', 'aliquip', 'ex', 'ea', 'commodo', 'consequat.', 'Duis', 'aute', 'irure', 'dolor', 'in', 'reprehenderit', 'in', 'voluptate', 'velit', 'esse', 'cillum', 'dolore', 'eu', 'fugiat', 'nulla', 'pariatur.', 'Excepteur', 'sint', 'occaecat', 'cupidatat', 'non', 'proident,', 'sunt', 'in', 'culpa', 'qui', 'officia', 'deserunt', 'mollit', 'anim', 'id', 'est', 'laborum.']
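Note that `split()` leaves punctuation attached to the words (`'amet,'`, `'aliqua.'`). A minimal sketch for removing it, on a made-up sentence, strips the characters listed in `string.punctuation` from both ends of each word (punctuation inside a word, as in `don't`, is kept):

```python
import string

# A made-up sentence in the style of words.txt
text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."

# str.strip(chars) removes any of the given characters from both ends of each word
clean_words = [w.strip(string.punctuation) for w in text.split()]
print(clean_words)
```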


## Searching through a file
* `startswith()`
* `in`
* `endswith()`

In [238]:
fh = open('words.txt', 'r')
for line in fh:
    line = line.rstrip()
    if line.startswith('Duis'):
        print(line)

fh.close()

Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.


In [239]:
fh = open('words.txt', 'r')
for line in fh:
    line = line.rstrip()
    if 'esse' not in line:
        print(line)

fh.close()

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.

Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.


Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.


In [240]:
fh = open('words.txt', 'r')
for line in fh:
    line = line.rstrip()
    if line.endswith('laborum.'):
        print(line)

fh.close()

Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
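The three tests can also be applied in one place. A small sketch on a made-up in-memory list of lines; note that `startswith()` and `endswith()` also accept a tuple of alternatives:

```python
# Made-up lines standing in for the lines of a file
sample_lines = ["Duis aute irure dolor", "Ut enim ad minim", "Lorem ipsum dolor sit amet"]

# startswith() / endswith() accept a tuple: True if any alternative matches
starts = [l for l in sample_lines if l.startswith(("Duis", "Ut"))]
# `in` looks for a substring anywhere in the line; `not in` is its negation
no_dolor = [l for l in sample_lines if "dolor" not in l]
ends = [l for l in sample_lines if l.endswith("amet")]

print(starts)
print(no_dolor)
print(ends)
```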


## Writing to a text file

`write()` does not append a newline (it writes exactly the string it is given), so you need to add `\n` yourself to start a new line.

In [242]:
fname = 'output.txt'
fh = open(fname, 'w')

fh.write('first file in Python\n')
fh.write('a second line\n')
    
fh.close()

Check that the file was created by comparing the current directory listing with the one saved earlier in `l0`.

In [243]:
lnew = set(os.listdir())

new_items = lnew.difference(l0)
print(new_items)

{'output.txt'}


Check the file content the Pythonic way

In [244]:
fh = open(fname, 'r')
for line in fh:
    line = line.rstrip()
    print(line)

fh.close()

first file in Python
a second line
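Two related details, sketched with an in-memory `io.StringIO` standing in for the file: `write()` returns the number of characters written, and `print(..., file=fh)` is an alternative that appends the newline for you.

```python
import io

buf = io.StringIO()   # in-memory stand-in for a file opened with 'w'

n = buf.write("no newline here")   # write() returns the number of characters written
buf.write("\n")                    # the newline has to be written explicitly
print("a second line", file=buf)   # print() appends '\n' by default

print(n)
print(repr(buf.getvalue()))
```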


Check it again using Jupyter's `!` shell escape, which runs a shell command from a cell

In [245]:
!cat output.txt

first file in Python
a second line


In [246]:
!ls

1-Basics.ipynb            __pycache__               fall_asleep_2.py
2-FirstApplications.ipynb cmd_line_args.py          fall_asleep_3.py
3-MoreFuncs.ipynb         example01.py              fall_asleep_4.py
4-NumPy.ipynb             example02.ipynb           fall_asleep_55.py
5-AnimatedPlots.ipynb     example11.py              gravity1.py
6-SciPy.ipynb             example12.py              mymodule.py
7_MCMC.ipynb              example13.py              output.txt
Bilby.ipynb               example14.py              polymorphism.py
GWPy.ipynb                example15.py              words.txt
GWs                       example16.py
PyCBC.ipynb               fall_asleep_1.py


## Getting rid of `close()`

To make the code less C-like and more Pythonic, we can drop the explicit `close()` call by using the `with` statement.

`with` guarantees that `ofile` is an open file handle inside the `with` block. When the block ends, even because of an exception, `close()` is called automatically, so the handle can no longer be used.

In [247]:
fname = 'output2.txt'

with open(fname, 'w') as ofile:
    ofile.write('A new file in python\n')
    ofile.write('1.2 3.2 4.5\n')

In [248]:
!cat output2.txt

A new file in python
1.2 3.2 4.5
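A small sketch confirming that the handle is closed once the `with` block ends. It works in a temporary directory (via `tempfile`) with a hypothetical `demo.txt`, so nothing is left behind:

```python
import os
import tempfile

raised = False
# Throwaway directory: removed, with its content, when this block ends
with tempfile.TemporaryDirectory() as tmpdir:
    demo_name = os.path.join(tmpdir, "demo.txt")

    with open(demo_name, "w") as ofile:
        ofile.write("inside the with block\n")
        print("closed inside the block:", ofile.closed)   # False

    print("closed after the block: ", ofile.closed)       # True: close() ran on exit

    try:
        ofile.write("too late\n")
    except ValueError:   # I/O on a closed file raises ValueError
        raised = True
        print("writing after the block raised ValueError")
```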


**Deleting the two output files to avoid having a proliferation of small test files.**

In [249]:
!rm output.txt
!rm output2.txt

## Storing lists and multiple values

You can use C-style `%` formatting to format the elements of a list and store them

In [252]:
import random  # Let's generate some random numbers

nevents = 3

fname = 'output.txt'

with open(fname,'w') as f:
    for i in range(nevents):
        measurements = [random.random() for j in range(5)]
        for val in measurements:
            f.write("%.5f\t"%val)
        f.write('\n')

In [253]:
!cat output.txt
!rm output.txt

0.87885	0.27548	0.33935	0.98734	0.40571	
0.37831	0.60046	0.43215	0.59216	0.94938	
0.45907	0.16555	0.21361	0.43678	0.01313	
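The `%.5f` above is C-style `%` formatting. Python also offers `str.format()` and f-strings (Python 3.6+); a quick check that all three produce the same text for a made-up value:

```python
val = 0.123456789

# Three equivalent ways to format a float with 5 decimals followed by a tab
c_style = "%.5f\t" % val
fmt_method = "{:.5f}\t".format(val)
f_string = f"{val:.5f}\t"

print(repr(c_style), c_style == fmt_method == f_string)   # '0.12346\t' True
```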


A more Pythonic approach is to use the file's `writelines()` method with a generator expression. Like `write()`, `writelines()` writes each string as-is and adds no newlines.

In [287]:
import random

nevents = 10

fname = 'output.txt'
with open(fname,'w') as f:
    for i in range(nevents):
        f.writelines("%.3f\t"%val for val in [random.random() for j in range(5)])
        f.write('\n')

In [288]:
!cat output.txt

0.971	0.158	0.409	0.506	0.271	
0.875	0.701	0.364	0.100	0.563	
0.176	0.960	0.446	0.050	0.940	
0.294	0.668	0.681	0.832	0.796	
0.083	0.440	0.015	0.653	0.254	
0.798	0.869	0.177	0.413	0.000	
0.785	0.066	0.168	0.451	0.843	
0.792	0.822	0.968	0.243	0.990	
0.804	0.007	0.787	0.184	0.229	
0.825	0.886	0.532	0.949	0.094	
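Note the trailing tab at the end of every row: `writelines()` writes each string exactly as given, so the `\t` after the last value ends up in the file too. A minimal sketch of one alternative, building each row with `str.join()`; an in-memory `io.StringIO` stands in for the file, and the fixed seed is only there for reproducibility:

```python
import io
import random

random.seed(0)        # fixed seed only so the run is reproducible
buf = io.StringIO()   # in-memory stand-in for output.txt

nevents = 3
for i in range(nevents):
    row = [random.random() for j in range(5)]
    # join() puts '\t' only *between* values, so no tab is left at the end of the row
    buf.write("\t".join("%.3f" % val for val in row) + "\n")

text = buf.getvalue()
print(text)
```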


Let's read the file assuming we have to process its data
- we need to remove the `\t`'s
- we need to ensure we have floats

`split()` has no problem removing the tabs! `help(str.split)` says that, with no separator, it "will split on any whitespace character (including `\n` `\r` `\t` `\f` and spaces) and will discard empty strings from the result."

Should you need to split things differently, you can pass the separator string as an argument to `split()`. For example, see what happens if you try splitting with `split('0.')`. The separator must be a string (or `None`): passing the float `0.` raises a `TypeError`.

In [289]:
fname = 'output.txt'
lines = [l.strip() for l in open(fname)]
print(lines)

['0.971\t0.158\t0.409\t0.506\t0.271', '0.875\t0.701\t0.364\t0.100\t0.563', '0.176\t0.960\t0.446\t0.050\t0.940', '0.294\t0.668\t0.681\t0.832\t0.796', '0.083\t0.440\t0.015\t0.653\t0.254', '0.798\t0.869\t0.177\t0.413\t0.000', '0.785\t0.066\t0.168\t0.451\t0.843', '0.792\t0.822\t0.968\t0.243\t0.990', '0.804\t0.007\t0.787\t0.184\t0.229', '0.825\t0.886\t0.532\t0.949\t0.094']


In [290]:
fname = 'output.txt'
lines = [l.strip() for l in open(fname)]
raw_data = [l.split() for l in lines]
print(raw_data)

[['0.971', '0.158', '0.409', '0.506', '0.271'], ['0.875', '0.701', '0.364', '0.100', '0.563'], ['0.176', '0.960', '0.446', '0.050', '0.940'], ['0.294', '0.668', '0.681', '0.832', '0.796'], ['0.083', '0.440', '0.015', '0.653', '0.254'], ['0.798', '0.869', '0.177', '0.413', '0.000'], ['0.785', '0.066', '0.168', '0.451', '0.843'], ['0.792', '0.822', '0.968', '0.243', '0.990'], ['0.804', '0.007', '0.787', '0.184', '0.229'], ['0.825', '0.886', '0.532', '0.949', '0.094']]
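The difference between `split()` with and without a separator, on a made-up row containing a doubled tab and a trailing newline:

```python
row = "0.971\t0.158\t\t0.409\n"   # made-up row: a doubled tab and a trailing newline

print(row.split())                 # runs of whitespace act as one separator, empties dropped
print(row.split("\t"))             # every tab splits: the empty field and the '\n' are kept
print("0.971\t0.158".split("0."))  # the separator can be any string, not just whitespace

try:
    row.split(0.)                  # a float is not a valid separator
except TypeError:
    print("split(0.) raises TypeError: the separator must be a str or None")
```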


In [291]:
data = [[float(n) for n in l] for l in raw_data]
print(data)

[[0.971, 0.158, 0.409, 0.506, 0.271], [0.875, 0.701, 0.364, 0.1, 0.563], [0.176, 0.96, 0.446, 0.05, 0.94], [0.294, 0.668, 0.681, 0.832, 0.796], [0.083, 0.44, 0.015, 0.653, 0.254], [0.798, 0.869, 0.177, 0.413, 0.0], [0.785, 0.066, 0.168, 0.451, 0.843], [0.792, 0.822, 0.968, 0.243, 0.99], [0.804, 0.007, 0.787, 0.184, 0.229], [0.825, 0.886, 0.532, 0.949, 0.094]]


Even more concisely

In [292]:
fname = 'output.txt'
data = [[float(n) for n in l.strip().split('\t')] for l in open(fname)]
print(data)

[[0.971, 0.158, 0.409, 0.506, 0.271], [0.875, 0.701, 0.364, 0.1, 0.563], [0.176, 0.96, 0.446, 0.05, 0.94], [0.294, 0.668, 0.681, 0.832, 0.796], [0.083, 0.44, 0.015, 0.653, 0.254], [0.798, 0.869, 0.177, 0.413, 0.0], [0.785, 0.066, 0.168, 0.451, 0.843], [0.792, 0.822, 0.968, 0.243, 0.99], [0.804, 0.007, 0.787, 0.184, 0.229], [0.825, 0.886, 0.532, 0.949, 0.094]]
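The comprehensions above pass `open(fname)` directly and never call `close()`; CPython happens to close the file when the file object is garbage-collected, but that is an implementation detail. A sketch of the same one-liner inside a `with` block, using a small made-up file in a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Write a small made-up tab-separated file, in the same layout as output.txt
    path = os.path.join(tmpdir, "sample.txt")
    with open(path, "w") as f:
        f.write("0.971\t0.158\t\n0.875\t0.701\t\n")

    # Read and convert inside a with block so the file is closed deterministically
    with open(path) as f:
        sample_data = [[float(n) for n in line.split()] for line in f]

print(sample_data)
```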


In [293]:
!rm output.txt # Cleaning up

## Storing Lists, Dicts, and Tuples

As the examples above show, **a text file does not store Python objects automatically**: for a dictionary you need to take care of formatting the output file yourself.

In [271]:
import random

datum = {'val':-1.1, 'err':0.2}

fname = 'output.txt'

with open(fname,'w') as f:
    f.writelines("%s\t"%v for v in datum.keys())
    f.write('\n')
    for i in range(10):
        datum['val'] = random.uniform(-3.,3.)
        datum['err'] = random.normalvariate(0., 0.2)
        f.writelines("%.3f\t"%val for val in datum.values() )
        f.write('\n')

In [272]:
!cat output.txt
!rm output.txt # Cleaning up

val	err	
0.491	-0.050	
-0.696	0.297	
1.903	0.327	
-0.591	-0.029	
-1.813	-0.113	
-0.784	-0.067	
-1.350	-0.213	
-2.530	0.167	
-1.889	-0.142	
1.632	-0.064	
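The standard-library `csv` module can take care of part of the dictionary formatting. A minimal sketch, assuming the same two keys as above and writing to an in-memory `io.StringIO` instead of `output.txt` (for a real file, the `csv` documentation recommends opening it with `newline=''`):

```python
import csv
import io

buf = io.StringIO()   # in-memory stand-in for output.txt

fieldnames = ["val", "err"]
writer = csv.DictWriter(buf, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
writer.writeheader()                                      # header row: "val\terr"
writer.writerow({"val": "%.3f" % -1.1, "err": "%.3f" % 0.2})

print(buf.getvalue())

# Reading back: each row becomes a dict, but the values are strings until converted
buf.seek(0)
rows = [{k: float(v) for k, v in row.items()} for row in csv.DictReader(buf, delimiter="\t")]
print(rows)
```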


## Data storage with `pickle` 

Python provides the built-in `pickle` module for easily storing Python object hierarchies in a binary format: this is known as **pickling**.

Notice the `b` in the file mode `'wb'`: it stands for binary, and pickle files must be opened in binary mode.

In [278]:
import random
import pickle
import os

data = {'val':[], 'err':[]}
for i in range(10):
    data['val'].append(random.uniform(-3.,3.))
    data['err'].append(random.normalvariate(0., 0.2))

fname = 'pickle1.data'
with open(fname,'wb') as f:
    pickle.dump(data,f)

**Unpickling** is the opposite process, namely reading the binary file and rebuilding the object hierarchy.

Again the mode contains a `b`: `'rb'` opens the file for reading in binary mode.

In [279]:
fname = 'pickle1.data'
with open(fname,'rb') as f:
    in_data = pickle.load(f)

print(in_data)
print(data == in_data)
!rm pickle1.data

{'val': [-0.7680225795747164, 2.527001241047061, 2.5974287848465236, 2.637106796987837, -2.6304191103262893, 2.8659051671785036, -1.8419907563025608, 0.46763953798136004, 1.540037537220254, -0.9616677230325439], 'err': [0.157080213935106, -0.1748309460145931, 0.04416631796531073, 0.13280157857375383, 0.1549997179622558, 0.3404345802585028, 0.06194936329930785, 0.24800930897480633, -0.09785715889903192, -0.3844094971656137]}
True
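Pickle also works without a file: `pickle.dumps()` returns `bytes` and `pickle.loads()` rebuilds the object. A sketch with a made-up object showing that tuples, sets and non-string keys survive the round trip. As the `pickle` documentation warns, only unpickle data you trust, because loading a pickle can execute arbitrary code.

```python
import pickle

# A made-up object mixing types that a plain text file or JSON would not preserve
obj = {"point": (1.0, 2.0), "tags": {"a", "b"}, 3: [None, True]}

blob = pickle.dumps(obj)        # serialize to bytes, no file involved
restored = pickle.loads(blob)   # rebuild the object hierarchy from the bytes

print(type(blob))
print(restored == obj, type(restored["point"]), type(restored["tags"]))
```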


## Data storage with JSON 

A commonly used format for data storage that is cross platform and cross language is [JSON (JavaScript Object Notation)](https://www.json.org).

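The `json` module in Python converts built-in Python objects (dicts, lists and tuples, strings, numbers, booleans and `None`) into JSON for storage. Instances of your own classes are not handled automatically: you need to supply a conversion, for example through the `default` argument of `dump()`/`dumps()`.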

Converting (encoding) an object into JSON is commonly called **serialization**. Converting from JSON to Python objects is referred to as **deserialization**. See [this page](https://realpython.com/python-json/) for further details.

There are two commonly used functions:
- `dump(obj, fp)`: convert an object into JSON and write it to the file object `fp`
- `dumps(obj)`, note the extra **s** (for *string*): convert an object into a JSON string, without any file interaction

Apart from the file interaction, the two functions work the same way and accept the same formatting options.

The following is an example of a dictionary and a list being stored into JSON files.

In [281]:
import json
import os

dict_data = {'val':-1.1, 'err':0.2}

x = json.dumps(dict_data)
print(x, dict_data)
print(type(x), type(dict_data))

list_data = [z for z in range(10)]
y = json.dumps(list_data)
print(y, list_data)
print(type(y), type(list_data))

with open('data.json','w') as of:
    json.dump([dict_data, list_data], of)

{"val": -1.1, "err": 0.2} {'val': -1.1, 'err': 0.2}
<class 'str'> <class 'dict'>
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
<class 'str'> <class 'list'>
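JSON has fewer types than Python, so a round trip is not always exact. A small sketch with made-up data: a tuple comes back as a list and an integer dictionary key comes back as a string. `loads()` is the string counterpart of `load()`.

```python
import json

# Made-up data using a tuple and an integer dictionary key
original = {1: (2, 3)}

text = json.dumps(original)   # tuple -> JSON array, int key -> string key
back = json.loads(text)       # parse the JSON string back into Python objects

print(text)                   # {"1": [2, 3]}
print(back, back == original) # {'1': [2, 3]} False
```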


**Deserialize** the data from file with `load()` (its string counterpart is `loads()`)

In [282]:
with open('data.json') as infile:
    indata = json.load(infile)
print(indata)

# Showing consistency between what was written to and what was read from file
print(dict_data == indata[0])
print(type(indata[0]))
print(list_data == indata[1])
print(type(indata[1]))

[{'val': -1.1, 'err': 0.2}, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]
True
<class 'dict'>
True
<class 'list'>
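Instances of your own classes are not serializable out of the box: `json.dumps()` raises `TypeError` for them. One possible approach, sketched here with a hypothetical `Measurement` dataclass, is to pass a conversion function through the `default` argument (here `dataclasses.asdict`) and rebuild the object by hand after loading; subclassing `json.JSONEncoder` is another option.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical measurement class, for illustration only
@dataclass
class Measurement:
    val: float
    err: float

m = Measurement(-1.1, 0.2)

try:
    json.dumps(m)               # the default encoder does not know this class
except TypeError:
    print("TypeError: Measurement is not JSON serializable")

# default= is called for objects json cannot encode; asdict turns the dataclass into a dict
text = json.dumps(m, default=asdict)
print(text)

# Decoding gives back a plain dict; rebuild the object explicitly
m2 = Measurement(**json.loads(text))
print(m2 == m)
```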


In [283]:
!rm data.json # Cleaning up

# READY FOR `examples/Python/2-FirstApplications.ipynb`!