# Lecture 3

In this lecture we will talk about strings, printing and files, as well as introduce external libraries and NumPy.

The material covers:
* [1] Sections 2.4, 2.5.3 & 7.1 - 7.4
* [2] Sections 3.1.2, chapter 7
* [4] Data types and array creation from the NumPy User manual

We will also go through Computer Assignment 1.

## Strings

* Strings are _immutable_.

* Strings can use both single and double quotes as delimiters:

In [1]:
a = 'single "quote" string'
b = "double 'quote' string"
print(a)
print(b)

single "quote" string
double 'quote' string


This makes it easy to include double or single quotes as part of the string!

To construct multi-line strings, use triple quotes:

In [2]:
c = '''This is
a multi-
line string
'''
d = """This is
another multi-
line string"""
print(c)
print(d)

This is
a multi-
line string

This is
another multi-
line string


* Individual characters or slices of a string can be accessed by indexing
    * Indexing starts from **0** from the left
    * Indexing starts from **-1** from the right
    * For slices, the first index is *inclusive*, the second is *exclusive*, i.e. in math-notation: [i,j)

Some examples:

In [3]:
e = "Another example string"
print(e[0])
print(e[-3])
print(e[8:15])
print(e[14:7:-1])
print(e[-14:-6])

A
i
example
elpmaxe
example 


* There are quite a few methods available for manipulating strings. Some examples:
    * startswith(prefix[, start[, end]]) -> bool
    * endswith(suffix[, start[, end]]) -> bool
    * find(sub[, start[, end]]) -> int
    * lower() -> str
    * upper() -> str
    * replace(old, new[, count]) -> str
    * split(sep=None, maxsplit=-1) -> list of strings
    * strip([chars]) -> str
    
Examples:

In [4]:
print(c.startswith("This"))
print(d.endswith("list"))
print(a.upper())
print(c.replace("string","list"))

True
False
SINGLE "QUOTE" STRING
This is
a multi-
line list



For more methods and descriptions, use help().

**Remember strings are immutable and a new string is always returned!**

In [5]:
help(str)

Help on class str in module builtins:

class str(object)
 |  str(object='') -> str
 |  str(bytes_or_buffer[, encoding[, errors]]) -> str
 |  
 |  Create a new string object from the given object. If encoding or
 |  errors is specified, then the object must expose a data buffer
 |  that will be decoded using the given encoding and error handler.
 |  Otherwise, returns the result of object.__str__() (if defined)
 |  or repr(object).
 |  encoding defaults to sys.getdefaultencoding().
 |  errors defaults to 'strict'.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __format__(...)
 |      S.__format__(format_spec) -> str
 |      
 |      Return a formatted version of S as described by format_spec.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getatt
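As a quick check of the immutability rule stated above, the following minimal sketch shows that string methods return a new string and that item assignment is rejected. The variable names `s` and `t` are only illustrative.

```python
s = "hello world"

# Methods return a new string; the original is left untouched.
t = s.upper()
print(s)  # hello world
print(t)  # HELLO WORLD

# Item assignment is not allowed on a str and raises a TypeError.
try:
    s[0] = "H"
except TypeError as err:
    print("TypeError:", err)

# To "change" a string, build a new one and rebind the name.
s = "H" + s[1:]
print(s)  # Hello world
```

Rebinding the name `s` does not modify the old string object; it only makes `s` refer to a newly built one.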

* A common task (usually when reading files) is to split a string into parts
* This is very easy using the **split()** method

For example, let's say you have the string "Blue, Green, Red, Yellow, Black" and want the individual parts (colours) as a list of strings:

In [6]:
data = "Blue, Green, Red, Yellow, Black"
split_data = data.split( ',' )
print( split_data )

['Blue', ' Green', ' Red', ' Yellow', ' Black']


You can of course use any character to split on.

Also look at **rsplit()**
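Note that the parts in the output above still carry the space that followed each comma. A small sketch of two ways to get clean values, plus **rsplit()** with a limited number of splits (the names `raw`, `colours`, `head` and `last` are only illustrative):

```python
data = "Blue, Green, Red, Yellow, Black"

# Splitting on ',' keeps the space after each comma ...
raw = data.split(',')

# ... so strip each part to get clean values.
colours = [part.strip() for part in raw]
print(colours)

# Splitting on the full separator ', ' also works when the input is regular.
print(data.split(', '))

# rsplit() splits from the right; the second argument limits the number of splits.
head, last = data.rsplit(', ', 1)
print(head)  # Blue, Green, Red, Yellow
print(last)  # Black
```

Stripping each part is the more forgiving choice when the amount of whitespace around the separator may vary.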

* Strings can be used to provide a formatted view of some data using the **format()** method

In [7]:
f1 = '1: Hello {}, do you like {} tea?'
f2 = '2: Hello {0}, do you like {1} tea? I like {1} tea!'
f3 = '3: Hello {1}, do you like {0} tea?'
f4 = '4: Hello {0[0]}, do you like {0[1][2]} tea?'
f5 = '5: Hello {name}, do you like {colour} tea?'
print( f1.format( "John", "green") )
print( f2.format( "John", "green") )
print( f3.format( "John", "green") )
print( f4.format( ["John", ["black", "red", "green"]] ) )
print( f5.format( colour= "green", name="John" ) )

1: Hello John, do you like green tea?
2: Hello John, do you like green tea? I like green tea!
3: Hello green, do you like John tea?
4: Hello John, do you like green tea?
5: Hello John, do you like green tea?


## Printing

A basic and common task is (as seen above) to print information.

Printing is by default done to the file *stdout*, but can be re-directed to (other) files as well.

The full signature of the Python 3 print function is:

In [8]:
help(print)

Help on built-in function print in module builtins:

print(...)
    print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
    
    Prints the values to a stream, or to sys.stdout by default.
    Optional keyword arguments:
    file:  a file-like object (stream); defaults to the current sys.stdout.
    sep:   string inserted between values, default a space.
    end:   string appended after the last value, default a newline.
    flush: whether to forcibly flush the stream.



In [9]:
print(1,2,3,4,5,sep=':')
print("First output", end="|")
print("Second output")

1:2:3:4:5
First output|Second output


* There are two basic methods for formatting output in Python:
    * Using the **%** operator syntax from earlier versions of Python
    * Or the preferred method in Python 3, using string formatting with **format()**

In [10]:
print("My integer is: %d, my float is %4.3f" % (42, 3.141592653589793))
print("My integer is: {}, my float is {:4.3f}".format(42, 3.141592653589793))

My integer is: 42, my float is 3.142
My integer is: 42, my float is 3.142


The format specifier (the part after the **:** inside the braces) has this syntax:

```python
format_spec ::=  [[fill]align][sign][#][0][width][,][.precision][type]
fill        ::=  <any character>
align       ::=  "<" | ">" | "=" | "^"
sign        ::=  "+" | "-" | " "
width       ::=  integer
precision   ::=  integer
type        ::=  "b" | "c" | "d" | "e" | "E" | "f" | "F" | "g" | "G" | "n" | "o" | "s" | "x" | "X" | "%"
```

For full information on the *str.format* formats, see https://docs.python.org/3/library/string.html#format-string-syntax
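To make the grammar above more concrete, here is a short sketch that exercises fill, alignment, sign, zero padding, width, the thousands separator, precision and type. The square brackets are only there to make the field width visible, and the variable names are illustrative.

```python
value = 3.141592653589793

right = "[{:10.3f}]".format(value)         # width 10, 3 decimals (numbers right-align by default)
left = "[{:<10.3f}]".format(value)         # '<' left-aligns within the field
centred = "[{:^10.3f}]".format(value)      # '^' centres within the field
filled = "[{:*>10.3f}]".format(value)      # '*' as fill character, right-aligned
signed = "[{:+.2e}]".format(value)         # always show the sign, scientific notation
zero_padded = "[{:08.2f}]".format(-value)  # pad with zeros after the sign, width 8
grouped = "[{:,}]".format(1234567)         # ',' as thousands separator
hexed = "[{:x}]".format(255)               # lower-case hexadecimal

for text in (right, left, centred, filled, signed, zero_padded, grouped, hexed):
    print(text)
```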

## Files

* All input and output goes through files (the screen and keyboard are handled using the *default* files *sys.stdout* and *sys.stdin* respectively).
* *sys.stdout* and *sys.stdin* are available automatically (no explicit open needed)
* To open a file, use the **open()** function: *file_handle = open(filename_string, mode_string)*
* Typical modes are "r" for reading from a file, "w" for writing to a file (an existing file is overwritten) and "a" to append to an existing file
* After processing a file, close it using the **close()** method

Example:

In [11]:
file1 = open( "testfile1.txt", 'r' )

name2 = "testfile2.txt"
mode2 = 'w'
file2 = open( name2, mode2 )

file3 = open( "testfile1.txt", 'a')

file1.close()
file2.close()
file3.close()
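A common alternative to calling **close()** by hand is the `with` statement, which closes the file automatically when the block ends, even if an error occurs inside it. The sketch below is not part of the lecture example: it writes to a hypothetical file `example.txt` in a temporary directory so that it can run without any existing files.

```python
import os
import tempfile

# Hypothetical file name, created in a temporary directory for this sketch.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

# 'w' creates the file (or overwrites an existing one).
with open(path, 'w') as out_file:
    out_file.write("Apple\n")

# 'a' appends to the end of the existing file.
with open(path, 'a') as out_file:
    out_file.write("Pear\n")

# 'r' opens for reading; the file is closed when the block ends.
with open(path, 'r') as in_file:
    content = in_file.read()

print(content, end="")
print(in_file.closed)  # True
```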

### Reading from files

* You can read full lines or streams of characters from a file:
    * **readline()** - reads a full line (from a text file)
    * **read(n)** - reads *n* (or all) characters from the file
    
* Or you can iterate on the file object itself!

The following two examples do exactly the same thing:

In [12]:
file1 = open( "testfile1.txt", 'r' )
line = file1.readline()
while line:
    print(line)
    line = file1.readline()

file1.close()

Apple

Pear

Banana

Kitchen sink



In [13]:
file1 = open( "testfile1.txt", 'r' )

for line in file1:
    print(line)

file1.close()

Apple

Pear

Banana

Kitchen sink



The input file contains one word or phrase per line with no empty lines in between, yet the output above has an empty line after each entry. This is because each line read from the file keeps its trailing newline character, and **print()** adds another one.

A common way to avoid this is to use one of the **strip()**, **lstrip()** or **rstrip()** methods to clean the input:

In [14]:
file1 = open( "testfile1.txt", 'r' )

for line in file1:
    print(line.rstrip())

file1.close()

Apple
Pear
Banana
Kitchen sink


* You can use the **readlines()** method to read the entire file into a list of strings at once, but this is discouraged since it can lead to high memory usage and slower code for large files. It is much better to process the lines as you read them!
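As an illustration of processing lines while reading, the sketch below counts the lines and finds the longest entry without ever holding the whole file in memory. It first writes a stand-in for *testfile1.txt* to a temporary directory; the content is assumed from the output shown earlier.

```python
import os
import tempfile

# Stand-in for testfile1.txt (assumed content, one entry per line).
path = os.path.join(tempfile.mkdtemp(), "testfile1.txt")
with open(path, 'w') as f:
    f.write("Apple\nPear\nBanana\nKitchen sink\n")

count = 0
longest = ""
with open(path, 'r') as f:
    for line in f:
        if count == 0:
            # repr() makes the trailing newline kept by the file iterator visible.
            print(repr(line))
        word = line.rstrip()
        count += 1
        if len(word) > len(longest):
            longest = word

print(count, longest)
```

Only one line is held in memory at a time, so the same loop works for files far larger than the available memory.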

### Writing to files

* To write, simply use the **write** method
* Note that newlines have to be inserted manually

In [15]:
name2 = "testfile2.txt"
mode2 = 'w'
file2 = open( name2, mode2 )

file2.write("My test output\n")
file2.write("The magic number is: {}\n".format(42))

file2.close()

Content of "testfile2.txt":
```
My test output
The magic number is: 42
```

* As an alternative to **write()**, you can redirect the standard **print** function to use your file instead of *sys.stdout*!
    * Note the absence of a newline in the first print statement below (print adds a newline)
    * If you want the same behaviour as for the **write()** method, append ", end=''" to the list of arguments for **print()**

In [16]:
name2 = "testfile2.txt"
mode2 = 'w'
file2 = open( name2, mode2 )

print("My test output", file=file2)
print("The magic number is: {}\n".format(42), file=file2, end="")

file2.close()
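To check that **write()** and a redirected **print()** really produce the same content, the following sketch writes to two in-memory file-like objects instead of real files. `io.StringIO` is part of the standard library; its use here is an assumption made only so that the result can be compared directly.

```python
import io

# In-memory stand-ins for two output files.
via_write = io.StringIO()
via_print = io.StringIO()

# write() needs explicit newlines ...
via_write.write("My test output\n")
via_write.write("The magic number is: {}\n".format(42))

# ... while print() appends one by default.
print("My test output", file=via_print)
print("The magic number is: {}".format(42), file=via_print)

print(via_write.getvalue() == via_print.getvalue())  # True
```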

## External packages

Although Python includes a lot of modules in its standard library (https://docs.python.org/3/library/), a _lot_ of external packages are available!

A good starting point when looking for an external package is the Python Package Index (https://pypi.python.org/pypi), currently listing *73168* packages (up from *54406* packages one year ago)!

We have already installed a few of these in the Computer session, for example:
* NumPy
* SciPy
* Matplotlib

### Package structure

Usually packages (and namespaces!) are made up of a directory/file hierarchy.

For example, the NumPy package looks something like this on disk:
```
numpy/
-rw-r--r--   1 root  wheel    1338  5 Jan 06:41 __config__.py
-rw-r--r--   1 root  wheel    6516  2 Nov 13:22 __init__.py
drwxr-xr-x  11 root  wheel     374 13 Jan 22:21 __pycache__
-rw-r--r--   1 root  wheel   13078  2 Nov 13:22 _import_tools.py
-rw-r--r--   1 root  wheel  218813  2 Nov 13:22 add_newdocs.py
drwxr-xr-x   7 root  wheel     238 13 Jan 22:21 compat
drwxr-xr-x  35 root  wheel    1190 13 Jan 22:21 core
-rw-r--r--   1 root  wheel   13747  2 Nov 13:22 ctypeslib.py
drwxr-xr-x  32 root  wheel    1088 13 Jan 22:21 distutils
drwxr-xr-x  21 root  wheel     714 13 Jan 22:21 doc
-rw-r--r--   1 root  wheel    1864  2 Nov 13:22 dual.py
drwxr-xr-x  22 root  wheel     748 13 Jan 22:21 f2py
drwxr-xr-x  10 root  wheel     340 13 Jan 22:21 fft
drwxr-xr-x  30 root  wheel    1020 13 Jan 22:21 lib
drwxr-xr-x  10 root  wheel     340 13 Jan 22:21 linalg
drwxr-xr-x  13 root  wheel     442 13 Jan 22:21 ma
-rw-r--r--   1 root  wheel    9569  2 Nov 13:22 matlib.py
drwxr-xr-x   7 root  wheel     238 13 Jan 22:21 matrixlib
drwxr-xr-x  15 root  wheel     510 13 Jan 22:21 polynomial
drwxr-xr-x   9 root  wheel     306 13 Jan 22:21 random
-rw-r--r--   1 root  wheel     919  2 Nov 13:22 setup.py
drwxr-xr-x  11 root  wheel     374 13 Jan 22:21 testing
drwxr-xr-x   4 root  wheel     136 13 Jan 22:21 tests
-rw-r--r--   1 root  wheel     195  5 Jan 06:41 version.py
```
```
numpy/linalg/
-rw-r--r--  1 root  wheel    2310  2 Nov 13:22 __init__.py
drwxr-xr-x  6 root  wheel     204 13 Jan 22:21 __pycache__
-rwxr-xr-x  1 root  wheel  120924  5 Jan 06:41 _umath_linalg.so
-rw-r--r--  1 root  wheel    1198  2 Nov 13:22 info.py
-rwxr-xr-x  1 root  wheel   21148  5 Jan 06:41 lapack_lite.so
-rw-r--r--  1 root  wheel   67345  2 Nov 13:22 linalg.py
-rw-r--r--  1 root  wheel    1892  2 Nov 13:22 setup.py
drwxr-xr-x  6 root  wheel     204 13 Jan 22:21 tests
```
```
numpy/linalg/tests/
-rw-r--r--  1 root  wheel   1749  2 Nov 13:22 test_build.py
-rw-r--r--  1 root  wheel    710  2 Nov 13:22 test_deprecations.py
-rw-r--r--  1 root  wheel  40062  2 Nov 13:22 test_linalg.py
-rw-r--r--  1 root  wheel   2913  2 Nov 13:22 test_regression.py
```

and so on.

We can see, for example, that we have the main module *numpy* as well as a submodule *numpy.linalg* and a sub-submodule *numpy.linalg.tests*.

Each module contains its own functions, in its own namespace:

In [17]:
import numpy as np
dir(np)

['ALLOW_THREADS',
 'BUFSIZE',
 'CLIP',
 'DataSource',
 'ERR_CALL',
 'ERR_DEFAULT',
 'ERR_IGNORE',
 'ERR_LOG',
 'ERR_PRINT',
 'ERR_RAISE',
 'ERR_WARN',
 'FLOATING_POINT_SUPPORT',
 'FPE_DIVIDEBYZERO',
 'FPE_INVALID',
 'FPE_OVERFLOW',
 'FPE_UNDERFLOW',
 'False_',
 'Inf',
 'Infinity',
 'MAXDIMS',
 'MachAr',
 'NAN',
 'NINF',
 'NZERO',
 'NaN',
 'PINF',
 'PZERO',
 'PackageLoader',
 'RAISE',
 'SHIFT_DIVIDEBYZERO',
 'SHIFT_INVALID',
 'SHIFT_OVERFLOW',
 'SHIFT_UNDERFLOW',
 'ScalarType',
 'Tester',
 'True_',
 'UFUNC_BUFSIZE_DEFAULT',
 'UFUNC_PYVALS_NAME',
 'WRAP',
 '__NUMPY_SETUP__',
 '__all__',
 '__builtins__',
 '__cached__',
 '__config__',
 '__doc__',
 '__file__',
 '__git_revision__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '__version__',
 '_import_tools',
 '_mat',
 'abs',
 'absolute',
 'absolute_import',
 'add',
 'add_docstring',
 'add_newdoc',
 'add_newdoc_ufunc',
 'add_newdocs',
 'alen',
 'all',
 'allclose',
 'alltrue',
 'alterdot',
 'amax',
 'amin',
 'angle',


In [18]:
help(np.diagonal)

Help on function diagonal in module numpy.core.fromnumeric:

diagonal(a, offset=0, axis1=0, axis2=1)
    Return specified diagonals.
    
    If `a` is 2-D, returns the diagonal of `a` with the given offset,
    i.e., the collection of elements of the form ``a[i, i+offset]``.  If
    `a` has more than two dimensions, then the axes specified by `axis1`
    and `axis2` are used to determine the 2-D sub-array whose diagonal is
    returned.  The shape of the resulting array can be determined by
    removing `axis1` and `axis2` and appending an index to the right equal
    to the size of the resulting diagonals.
    
    In versions of NumPy prior to 1.7, this function always returned a new,
    independent array containing a copy of the values in the diagonal.
    
    In NumPy 1.7 and 1.8, it continues to return a copy of the diagonal,
    but depending on this fact is deprecated. Writing to the resulting
    
    In NumPy 1.9 it returns a read-only view on the original array.
    Attemp

In [19]:
import numpy.linalg as la
dir(la)

['LinAlgError',
 'Tester',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '_umath_linalg',
 'absolute_import',
 'bench',
 'cholesky',
 'cond',
 'det',
 'division',
 'eig',
 'eigh',
 'eigvals',
 'eigvalsh',
 'info',
 'inv',
 'lapack_lite',
 'linalg',
 'lstsq',
 'matrix_power',
 'matrix_rank',
 'norm',
 'pinv',
 'print_function',
 'qr',
 'slogdet',
 'solve',
 'svd',
 'tensorinv',
 'tensorsolve',
 'test']
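The same package/submodule pattern can be explored without any external installation. The sketch below uses the standard-library `xml` package as a stand-in for NumPy (an assumption made only to keep the example self-contained); like *numpy.linalg.tests*, `xml.etree.ElementTree` sits three levels deep.

```python
import xml                               # the top-level package
import xml.etree.ElementTree as ET       # package.subpackage.module under a short alias
from xml.etree import ElementTree        # the same module, imported by name

print(ET.__name__)                # xml.etree.ElementTree
print(ET is ElementTree)          # True: both names refer to one module object

# Packages (directories) have a __path__ attribute, plain modules (files) do not.
print(hasattr(xml, "__path__"))   # True
print(hasattr(ET, "__path__"))    # False

# Each module has its own namespace, which dir() lists.
print("fromstring" in dir(ET))    # True
```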

## NumPy - _the_ library for linear algebra in Python

(For detailed documentation on NumPy, look at the NumPy User Guide (http://docs.scipy.org/doc/numpy/user/), or the NumPy Reference Guide (http://docs.scipy.org/doc/numpy/reference/).)

Why NumPy?

Remember that, unlike for example Matlab, Python lists contain *references* to objects instead of the data itself. While this makes Python lists very flexible, it is not optimal for either performance or memory usage.

NumPy, on the other hand, stores the *data* directly in its array object. The data is furthermore stored contiguously in memory, which further improves performance.

Quite a few other external libraries (like SciPy) expect data in NumPy format to work at all, or to work efficiently.

In addition, basic element-wise operations, like multiplying the elements of the two vectors below, are not supported for Python lists.

In [20]:
a = [1, 3, 4]; b = [3, 2, 5]
# Multiply each element and add 1
c = a * b + 1 # Does not work!
#c = []
#for A, B in zip(a,b):
#    c.append(A*B+1)
#print(c)

TypeError: can't multiply sequence by non-int of type 'list'
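With plain Python lists the element-wise computation has to be spelled out explicitly. A minimal sketch, equivalent to the commented-out loop above but written as a list comprehension:

```python
a = [1, 3, 4]
b = [3, 2, 5]

# Multiply element by element and add 1.
c = [x * y + 1 for x, y in zip(a, b)]
print(c)      # [4, 7, 21]

# For lists, '*' with an integer means repetition, not arithmetic,
# which is why the error message mentions a "non-int".
print(a * 2)  # [1, 3, 4, 1, 3, 4]
```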

* The major NumPy data type is the **array** object, which can hold an N-dimensional array
* All elements in the array _must_ be of the same type
* Supported types include
    * int8, int16, int32, int64, int (= int32 or int64)
    * uint8, uint16, uint32, uint64
    * float16, float32, float64 (= float)
    * complex64, complex128 (= complex)
* The type is chosen automatically based on the input if no type specification is given

In [None]:
import numpy as np
a = np.array([1, 3, 4])
b = np.array([3, 2, 5], dtype=np.int8)
c = a * b + 1
print(c)
print(type(c[0]))
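The automatic type selection can be inspected through the `dtype` attribute of an array. In the sketch below the array names are illustrative, and the exact default integer type (int32 or int64) depends on the platform, so it is not stated in the comments.

```python
import numpy as np

ints = np.array([1, 3, 4])                   # all integers -> default integer type
floats = np.array([1, 3.5, 4])               # one float is enough to make all elements float64
small = np.array([3, 2, 5], dtype=np.int8)   # explicit type specification

print(ints.dtype, floats.dtype, small.dtype)

# Element-wise arithmetic works directly on arrays.
c = ints * small + 1
print(c, c.dtype)
```

When arrays of different integer types are combined, the result uses a type large enough to hold both.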

* In the example above, we created a NumPy array from a Python list. This is one way of creating a NumPy array; more will be shown later.
* The **shape** of an array is a tuple
* You can, for example, use a shape tuple to create new arrays of a given size using **zeros()**, **ones()** or **empty()**

In [None]:
a = np.array([[1., 3.],
              [2., 7.],
              [.4, 6.]])
np.shape(a)

In [None]:
print(a)

In [None]:
b = np.zeros((2,4), dtype=np.complex64)
print(b)
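A short sketch tying these together: the shape of one array is reused to create another, and the three creation functions are compared. The array names are illustrative.

```python
import numpy as np

a = np.array([[1., 3.],
              [2., 7.],
              [.4, 6.]])
print(a.shape)             # (3, 2): 3 rows, 2 columns

z = np.zeros(a.shape)      # same shape as a, filled with 0.0
o = np.ones((2, 4), dtype=np.int32)
e = np.empty((2, 2))       # allocated but NOT initialised: contents are arbitrary

print(z)
print(o)
print(e.shape)
```

Since **empty()** skips the initialisation, its contents should always be overwritten before being used.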

* To create a series of numbers you can use the **arange**(*[start=0,] stop[, step=1]*) function for integers
    * Note: like for slicing, start is *inclusive*, stop is *exclusive*
* For floats, use **linspace**(*start, stop, num=50*)
    * Note: both ends are *inclusive* by default!

In [None]:
a = np.arange(3)
b = np.arange(stop = 10, start = 24, step = -2)
print("a = {}".format(a))
print("b = {}".format(b))
c = np.linspace (-1, 1, 5)
print("c = {}".format(c))
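For reference, a sketch of what these calls produce, written with positional arguments. It also checks the default number of points returned by **linspace()** when *num* is left out; the name `d` is only illustrative.

```python
import numpy as np

a = np.arange(3)               # 0, 1, 2 (stop is exclusive)
b = np.arange(24, 10, -2)      # 24, 22, ..., 12 (10 is never reached)
c = np.linspace(-1, 1, 5)      # 5 evenly spaced points, both ends included
d = np.linspace(0, 1)          # num is left at its default

print(a)
print(b)
print(c)
print(len(d))
```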

## Computer Assignment 1

The best way to learn a new language (computer or other) is of course to use it!

In the computer assignments you will have plenty of opportunity to practice. Please try to do as much as possible by yourself (or in your group).

In Computer Assignment 1 you will be able to show your skills in:
* Basic Python language elements
* Creating your own functions
* Reading files
* Storing data in NumPy arrays
* Using Matplotlib to display 2D data
* Using algorithms from SciPy
* Documenting your Python code


* Lectures 1-3 should be enough to get you started and to do task 1 (and 3)
* In Lecture 4 we will go through material that will help you complete tasks 1 - 3
* Lecture 5 will then wrap up the block on these external libraries, and you will have the information needed to complete all tasks in CA1

Start by going to PingPong and creating your project group
* name the group according to your names or similar (EvyJohn, SmithBloom etc.)
* remember to check the boxes for the 3 computer assignments
* add members to the group (remember 1 or 2 members in each group!)

Deadline for CA1 is Sunday 7/2!

Make sure you read all the instructions, good luck!