# Computational Programming with Python
### Lecture 10: File handling and more about plotting


### Center for Mathematical Sciences, Lund University
Lecturers: Claus Führer, Malin Christersson, Robert Klöfkorn


# This lecture

- `split` and `join`
- File handling
- More about plotting


# `split` and `join`

The methods `split` and `join` can be used on instances of `str`.

```python
separation.join(some_list)    # returns a string

some_string.split(separation) # returns a list
```

In both cases, `separation` is a string.

## Examples using `join`

Using different separators:

In [1]:
town_list = ["Stockholm", "Paris", "London"]
towns1 = "".join(town_list)
towns2 = "\t".join(town_list)  # tab
towns3 = '"'.join(town_list) 
print(towns1)
print(towns2)
print(towns3)

StockholmParisLondon
Stockholm	Paris	London
Stockholm"Paris"London


When the list elements are not strings, they must be cast to `str`.

In [2]:
nr_list = [3, 7.4, 1e-4, 3+7j]
numbers = " and ".join(str(nr) for nr in nr_list)
print(numbers)

3 and 7.4 and 0.0001 and (3+7j)


## Examples using `split`

If no argument is given, whitespace is used as the separator:

In [3]:
msg ="Let no one ignorant of geometry enter!"
L1 = msg.split()
print(L1)

L2 = msg.split("of geometry")
print(L2)

['Let', 'no', 'one', 'ignorant', 'of', 'geometry', 'enter!']
['Let no one ignorant ', ' enter!']


In [4]:
filename = "test.min.js"
L3 = filename.split(".")
print("The file extension is", L3[-1])

The file extension is js


## Using `split` and `input`

We can use `split` to process several numbers entered by the user on one line:

In [None]:
answer = input("Enter numbers separated by comma: ")
strings = answer.split(',')
floats = [float(element) for element in strings]
print("The sum is", sum(floats)) 
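
The same idea can be tried without typing any input. The following is a minimal sketch; the helper name `parse_numbers` is made up for this illustration. It skips empty pieces, so that a trailing comma does not make `float` fail.

```python
def parse_numbers(text, separator=","):
    """Split text at separator and convert the non-empty pieces to floats."""
    pieces = text.split(separator)
    # float() tolerates surrounding blanks, but float('') raises a ValueError,
    # so pieces that are empty after stripping are skipped
    return [float(piece) for piece in pieces if piece.strip()]

numbers = parse_numbers("1.5, 2, 3.5,")
print("The sum is", sum(numbers))
```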


# File handling

## File I/O
File I/O (input and output) is essential when
- working with measured or scanned data
- interacting with other programs
- saving information for comparisons or other postprocessing needs
- ...

## File objects

A file is a Python object with associated methods:
```python
myfile = open('mydata.txt', 'w') #create a file for writing
```
If the file mydata.txt already exists (in the current working directory), it will be overwritten. If not, it will be created.

To write some data to the file:
```python
myfile.write('some data')
myfile.write('some other data')
```

When you're done, close the file:
```python
myfile.close()
```

## Example - Writing to a file

In [None]:
names = ["Emma", "Hugo", "Erik", "Josefin", "Mia", "Lukas", "Tim", "Bo"]

myfile = open('names.txt', 'w')
for name in names:
    myfile.write(name)
myfile.close()

- Run the code and check out the file that has been created.
- Comment out the last line so that `myfile` is never closed. Then check out names.txt.
- Concatenate each name with a blank `' '` when writing to the file. Then check out names.txt.
- Concatenate each name with a `'\n'` (new line) when writing to the file. Then check out names.txt.

## Reading from a file

```python
myfile = open('mydata.txt', 'r') # open an existing file for reading
```
The file mydata.txt must exist (in the current working directory).

The whole file can be read and stored in a string by

```python
s = myfile.read()
```

You can also read one line at a time by

```python
myfile.readline()
```

You can read all lines and make a list by using:

```python
L = myfile.readlines()
```

or use it like:
```python
for line in myfile.readlines():
    print(line)
```

## Files as generators
A file object behaves like a **generator** (strictly speaking, it is an iterator). We will talk more about generators later.

A generator is like a list, except that the values need not exist until they are asked for.

A main feature of generators is that they are disposable. When you read a line from a file object, that line is consumed: the next read continues with the following line (the file itself is not changed). The following code will therefore print three different things:

```python
print(myfile.readline()) # Line 1
print(myfile.readline()) # Line 2
print(myfile.readline()) # Line 3

```
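
Because a file object hands out its lines one at a time, it can also be used directly in a `for` loop. The sketch below assumes a small scratch file (the name `lines.txt` and its content are made up) and shows that the loop continues where `readline` stopped, and that nothing is left afterwards.

```python
import os
import tempfile

# create a small scratch file with three lines
path = os.path.join(tempfile.mkdtemp(), "lines.txt")
myfile = open(path, "w")
myfile.write("Line 1\nLine 2\nLine 3\n")
myfile.close()

myfile = open(path, "r")
first = myfile.readline()                  # consumes the first line
rest = [line.strip() for line in myfile]   # the loop continues after line 1
leftover = myfile.readline()               # an empty string: nothing is left
myfile.close()

print(first.strip(), rest, repr(leftover))
```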


## File close method

To reread a file from the beginning, close it and open it again.

```python
myfile.close() # closes the file object
``` 

It is automatically closed when
- the program ends
- the enclosing program unit (e.g. function) is left

Until a file is closed, the data you have written may not be visible in an external editor, because it can still be held in a buffer.
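
Whether a file object is open can be checked with its `closed` attribute. The following sketch uses a made-up scratch file `demo.txt` in a temporary directory; it writes one line, closes the file, and reads the content twice by opening the file two times.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")   # made-up scratch file

myfile = open(path, "w")
myfile.write("first line\n")
open_state = myfile.closed      # False: the file object is still open
myfile.close()                  # writes buffered data to disk
closed_state = myfile.closed    # True

myfile = open(path, "r")
first_pass = myfile.read()
myfile.close()

myfile = open(path, "r")        # reopening starts from the beginning again
second_pass = myfile.read()
myfile.close()

print(open_state, closed_state, first_pass == second_pass)
```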

## Example - Reading from a file

Assuming that the names in names.txt have been written as one name per row, we can make a list of names, and then "shuffle" them:

In [None]:
from random import shuffle

myfile = open('names.txt', 'r')
L = myfile.readlines()
myfile.close()

L = [element.strip() for element in L]

shuffle(L)
print(L)

- The string method `strip()` removes leading and trailing whitespace, including the newline character `'\n'` at the end of each line. See the result when not using `strip()`!
- Change the code that writes names.txt so that the names are separated by a blank. How do you read it now? Then use `split` to make a list when reading the file.

## What you could do - Random groups

Define a function that takes a filename and a maximum group size as arguments. The function can read the names from the file and then print random groups using the maximum group size. 

```python
def print_groups(filename, max_size):
   ...          
print_groups("names.txt", 3)
```
The output could look like this:
```
There are 8 names.
Maximum group size is 3
Group 1
    Lukas
    Mia
    Hugo
Group 2
    Bo
    Erik
    Josefin
Group 3
    Emma
    Tim
```
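
One possible way to approach this is sketched below; it is not the only solution. The helper `make_groups` is a name introduced here for illustration: it shuffles a copy of the names and cuts the result into slices of at most `max_size` elements. The sketch assumes that the file contains one name per line.

```python
from random import shuffle

def make_groups(names, max_size):
    """Return randomly composed groups with at most max_size names each."""
    names = list(names)          # copy, so the caller's list is not shuffled
    shuffle(names)
    return [names[i:i + max_size] for i in range(0, len(names), max_size)]

def print_groups(filename, max_size):
    myfile = open(filename, 'r')
    # skip empty lines and remove the trailing newline of each name
    names = [line.strip() for line in myfile.readlines() if line.strip()]
    myfile.close()
    print("There are", len(names), "names.")
    print("Maximum group size is", max_size)
    for number, group in enumerate(make_groups(names, max_size), start=1):
        print("Group", number)
        for name in group:
            print("   ", name)

# the grouping itself can be tried without a file
all_names = ["Emma", "Hugo", "Erik", "Josefin", "Mia", "Lukas", "Tim", "Bo"]
groups = make_groups(all_names, 3)
print([len(group) for group in groups])
```

With eight names and a maximum size of 3, the last group holds the two names that are left over.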

## Reading tabular data from a file

When reading tabular data, you can use `readline()` or `readlines()`, and then use `split()` for each line.

### Creating tabular data from a spreadsheet

When using a spreadsheet (e.g. Excel), you can save the file as a CSV file. In a CSV file, data on one row is separated by a comma or a semicolon.

![csvFile](http://cmc.education/slides/notebookImages/csvFile.png)
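
Reading such a file with `readline`, `readlines` and `split` could look like the following sketch. The file name `table.csv`, the semicolon separator and the numbers are made up for this illustration; the file is first written so that the example is self-contained.

```python
import os
import tempfile

# made-up table: a header row followed by two data rows
path = os.path.join(tempfile.mkdtemp(), "table.csv")
myfile = open(path, "w")
myfile.write("time;height\n0.0;1.5\n0.5;2.25\n")
myfile.close()

myfile = open(path, "r")
header = myfile.readline().strip().split(";")   # first line: column names
rows = []
for line in myfile.readlines():                 # remaining lines: numbers
    rows.append([float(item) for item in line.strip().split(";")])
myfile.close()

print(header)
print(rows)
```

For files with quoted entries or separators inside the data, the standard library module `csv` is probably a better choice than plain `split`.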

## The `with` statement

If you forget to close a file, problems can occur. Also, an error
might prevent you from closing the file. Consider

```python
myfile = open(filename, 'w')
myfile.write('some data')
a = 5/0
myfile.write('some other data')  
myfile.close()
```

The `with` statement helps with this:
```python
with open(filename, 'w') as myfile:
    myfile.write('some data')
    a = 5/0
    myfile.write('some other data') 
```

With this construction, the file is always closed, even if an exception occurs. It is shorthand for a `try`-`finally` block.
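
Roughly, the `with` block above behaves like the `try`/`finally` construction in the following sketch. The scratch file name is made up, and the outer `try`/`except` is only there so that the example runs to the end: the exception is not swallowed, it arrives after the file has been closed.

```python
import os
import tempfile

filename = os.path.join(tempfile.mkdtemp(), "with_demo.txt")  # made-up name

try:
    myfile = open(filename, 'w')
    try:
        myfile.write('some data')
        a = 5/0                           # raises ZeroDivisionError
        myfile.write('some other data')   # never reached
    finally:
        myfile.close()                    # runs even though 5/0 failed
except ZeroDivisionError:
    was_closed = myfile.closed
    print("The error still occurs, file closed:", was_closed)

# only the data written before the error ended up in the file
myfile = open(filename, 'r')
content = myfile.read()
myfile.close()
print(content)
```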


## File modes (read, write, etc.)

```python
file1 = open('file1.dat', 'r')  # read only
file2 = open('file2.dat', 'r+') # read/write
file3 = open('file3.dat', 'a')  # append
file4 = open('file4.dat', 'w')  # (over-) write
file5 = open('file5.dat', 'wb') # writing to binary file
file6 = open('file5.dat', 'rb') # reading from binary file
```

The modes `'r'`, `'r+'`, and `'rb'` require that the file exists. The modes `'w'`, `'wb'`, and `'a'` create the file if it does not exist.

#### File append example

```python
file3 = open('file3.dat', 'a')
file3.write('something new\n')  # note the '\n'
file3.close()
``` 
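
The difference between `'w'` and `'a'` can be seen in the following sketch, which uses a made-up scratch file `log.dat`: the first `open` starts with an empty file, the second one adds a line after the existing content.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "log.dat")   # made-up scratch file

with open(path, 'w') as logfile:      # 'w' starts with an empty file
    logfile.write('first entry\n')

with open(path, 'a') as logfile:      # 'a' keeps the existing content
    logfile.write('something new\n')

with open(path, 'r') as logfile:
    lines = logfile.readlines()

print(lines)
```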

## Saving numpy arrays

The `read` and `write` methods work with strings.
More complex data types (like arrays) cannot be written this way directly. NumPy provides its own functions for storing arrays.

In [None]:
from numpy import *

a = array([10, 20, 30])
savetxt('outfile.txt', a) # numpy function for saving a to text file

You can also store the array as a binary file. If the file name has no extension, `.npy` is added automatically.

In [None]:
a = array([10, 20, 30])
save('outfile', a) # saves a in outfile.npy

By using `numpy.savez`, one can save several arrays in one file.
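
A minimal sketch of how this could look is given below. The file name `arrays.npz` and the keyword names `first` and `second` are chosen freely for this illustration; the keywords become the names under which the arrays are found again after `load`.

```python
import os
import tempfile
from numpy import array, load, savez

a = array([10, 20, 30])
b = array([[1.5, 2.5], [3.5, 4.5]])

path = os.path.join(tempfile.mkdtemp(), "arrays.npz")   # made-up scratch file
savez(path, first=a, second=b)    # keyword names become the keys in the file

data = load(path)                 # dictionary-like object
a_again = data['first']
b_again = data['second']
data.close()

print(a_again)
print(b_again)
```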

## Reading numpy arrays

In [None]:
a = loadtxt('outfile.txt')
print("a =", a)

b = load('outfile.npy')
print("b =", b)

There are several more options, see the documentation.

## The module `pickle`

We can use the module `pickle` to write to, and read from, binary files. You can pickle (almost) any Python object.

#### Pickle `dump` example

We open the file for writing, in binary mode, by using `'wb'`.

In [None]:
import pickle
arr = array([1, 2, 3])
number = 367
with open('mydata.dat', 'wb') as myfile:
    pickle.dump(arr, myfile)
    pickle.dump(number, myfile)

#### Pickle `load` example

We open the file for reading, in binary mode, by using `'rb'`.

In [None]:
with open('mydata.dat', 'rb') as myfile:
    arr = pickle.load(myfile)
    number = pickle.load(myfile)
print(f"arr = {arr}, number = {number}")

You must read (`load`) the data in the same order as you wrote (`dump`) it.
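
One way to avoid keeping track of the order is to collect the objects in a single container, for instance a dictionary, and to pickle that container once. The following sketch assumes made-up data and a made-up scratch file `results.dat`.

```python
import os
import pickle
import tempfile

results = {"values": [1, 2, 3], "number": 367, "label": "run 1"}

path = os.path.join(tempfile.mkdtemp(), "results.dat")   # made-up scratch file
with open(path, 'wb') as myfile:
    pickle.dump(results, myfile)      # one dump for the whole dictionary

with open(path, 'rb') as myfile:
    restored = pickle.load(myfile)    # one load gives everything back

print(restored["number"], restored["values"])
```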

# More about plotting

## Backends using Jupyter Notebook

* `%matplotlib inline`   will give you static images
* `%matplotlib notebook` will give you interactive plots

Run **one** of the commands **once** during a session.

In [None]:
from matplotlib.pyplot import *
%matplotlib inline

## Backends using Spyder

Choose <code>Inline</code> (static) or <code>Automatic</code> (interactive). Then restart the kernel.

![inlineOrAutomatic](http://cmc.education/slides/notebookImages/inlineOrAutomatic.png)

## GUI and event driven programming

The general-purpose package `tkinter` can be used.

We are instead going to use `matplotlib.widgets` (next lecture).

For interactive programs choose:

* `Automatic` in Spyder
* `%matplotlib notebook` in Jupyter Notebook

## State machine or object-oriented programming (OOP)

`matplotlib.pyplot` can be used with command-like functions.

* Various states are preserved.
* A state holds information about <span class=alert>current figure</span> and <span class=alert>current axes</span>.
* Under the hood, OOP is used.

`matplotlib.pyplot` can also be used for OOP.
* Instead of functions, methods of objects are used.

For event driven programs, OOP **should** be used.

## State machine - two figures
Function calls such as `plot` can be seen as commands. The same command results in different outcomes depending on the state.

In [None]:
x1 = linspace(-2*pi, 2*pi, 100)
x2 = linspace(0, 2, 10)

plot(x1, sin(x1), 'r') # a figure is created
plot(x1, cos(x1), 'b--')
title('My first figure')
 
figure() # a new figure is created 

plot(x2, sqrt(x2), 'go')
plot(x2, x2**2, 'y^')
title('My second figure')

## State machine - two subplots in one figure
The plots are made in the <span class=alert>current axes</span> (the current subplot) of the <span class=alert>current figure</span>.

In [None]:
x1 = linspace(-2*pi, 2*pi, 100)
x2 = linspace(0, 2, 10)

subplot(211) # the first subplot using a 2 rows and 1 column grid
plot(x1, sin(x1), 'r')
plot(x1, cos(x1), 'b--')
title('My first subplot')
 
subplot(212) # the second subplot using a 2 rows and 1 column grid
plot(x2, sqrt(x2), 'go')
plot(x2, x2**2, 'y^') 
title('My second subplot') 

## matplotlib as an object-oriented interface

For event driven programs you will need references to objects.

![figAndAx](http://cmc.education/slides/notebookImages/figAndAx.png)

## OOP - getting references to objects

#### Use return values

```python
fig = figure()
ax = subplot(111)
```

#### Get current figure/axes

```python
fig = gcf()
ax = gca()
```


## OOP - getting references to objects (cont)

#### Use `subplots`

Only one axes object:

```python
fig, ax = subplots()
``` 

An array of axes objects:
```python
fig, (ax1, ax2) = subplots(nrows = 1, ncols = 2)
``` 

A two-dimensional array of axes objects:
```python
fig, ((ax1, ax2), (ax3, ax4)) = subplots(nrows = 2, ncols = 2)
``` 
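
Instead of unpacking, the array of axes objects can be kept in one variable and indexed or looped over. The sketch below assumes the name `axes` for that array and plots made-up curves `sin(k*x)` in the four subplots.

```python
from matplotlib.pyplot import subplots
from numpy import linspace, pi, sin

x = linspace(-2*pi, 2*pi, 100)
fig, axes = subplots(nrows=2, ncols=2)   # axes is a 2x2 array of axes objects
print(axes.shape)

# axes.flat visits the four axes objects row by row
for k, ax in enumerate(axes.flat, start=1):
    ax.plot(x, sin(k*x))
    ax.set_title(f"sin({k}x)")

ax3 = axes[1, 0]    # second row, first column
print(ax3.get_title())
```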

## Comparison of basic plotting

### State machine - using functions

In [None]:
x = linspace(-2*pi, 2*pi, 100)

plot(x, sin(x/2), label = 'sine')
plot(x, cos(x/2), label = r'$\cos(x/2)$')  # raw string, so that \c is not read as an escape
legend(loc = 'lower center', fontsize = 'small')

### OOP - using methods

In [None]:
x = linspace(-2*pi, 2*pi, 100)

fig, ax = subplots()
ax.plot(x, sin(x/2), label = 'sine')
ax.plot(x, cos(x/2), label = r'$\cos(x/2)$')  # raw string, so that \c is not read as an escape
ax.legend(loc = 'lower center', fontsize = 'small')

## OOP - the Figure class

Top level container for all plot elements.

Given 

```python
fig, ax = subplots()
# plotting code
```

the figure can be saved to a file:

```python
fig.savefig("my_figure.png")
# fig.canvas.get_supported_filetypes()
``` 
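
A complete example of saving a figure could look like the following sketch. The scratch directory and the value `dpi=150` are arbitrary choices for this illustration. The file format is taken from the file extension, and the supported formats depend on the backend in use.

```python
import os
import tempfile
from matplotlib.pyplot import subplots
from numpy import linspace, pi, sin

x = linspace(-2*pi, 2*pi, 100)
fig, ax = subplots()
ax.plot(x, sin(x))

# made-up scratch location; any writable path works
path = os.path.join(tempfile.mkdtemp(), "my_figure.png")
fig.savefig(path, dpi=150)     # format chosen from the extension .png

# dictionary with the file extensions the current backend can write
filetypes = fig.canvas.get_supported_filetypes()
print(sorted(filetypes))
```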

## OOP - the Axes class

<span style = "font-size: 80%">(Image from https://matplotlib.org/gallery/showcase/anatomy.html) </span>
  
<center><img src="https://matplotlib.org/_images/sphx_glr_anatomy_001.png" title = "anatomy" style = "width: 60%"></center>

## Axes objects - Coordinate axes

We can set the limits and add labels.

In [None]:
fig, ax = subplots(figsize=(8, 3))  # setting figsize in inches
x =linspace(-2*pi, 2*pi, 100)
ax.plot(x, sin(x))

ax.set_xlim(-3*pi, 3*pi)
ax.set_ylim(-1.5, 1.5)
ax.set_xlabel('x')
ax.set_ylabel('sin(x)')

## Axes objects - Ticks and ticklabels

In [None]:
fig, ax = subplots(figsize=(8, 3))
x =linspace(-2*pi, 2*pi, 100)
ax.plot(x, sin(x))

ax.set_xticks([-pi, 0, pi])
ax.set_xticklabels([r'$-\pi$', r'$0$', r'$\pi$'], fontsize = 16)  # using LaTeX in raw strings

# set minor ticks
ax.set_xticks([-pi*1.5, -pi*0.5, pi*0.5, pi*1.5], minor = True)

# use empty array to hide ticks
ax.set_yticks([])

## Axes objects - Spines

By default four **spines** are rendered at the boundary of a plot. We can change the position of spines, and hide them.

In [None]:
fig, ax = subplots(figsize=(8, 3))
x =linspace(-2*pi, 2*pi, 100)
ax.plot(x, sin(x))

ax.spines['left'].set_position('zero')
ax.spines['right'].set_color('none')      # hide
ax.spines['bottom'].set_position('zero')
ax.spines['top'].set_color('none')        # hide

## Line plots

The line plots are stored in the list `ax.lines`.

In [None]:
fig, ax = subplots(figsize = (8, 3))
x =linspace(-2*pi, 2*pi, 100)

ax.plot(x, sin(x), 'r--', label = "sine")
ax.plot(x, cos(x), 'b', label = "cosine")

print("The number of line plots is", len(ax.lines))
print("Each line plot is of the type", type(ax.lines[0]))

## Line plots (cont)

We can use getter methods.

In [None]:
print("The linestyle of the first line plot is", ax.lines[0].get_linestyle())
print("The color of the last line plot is", ax.lines[-1].get_color())

We can use setter methods.

In [None]:
ax.lines[0].set_linewidth(5)

We can get and set the `xdata` and `ydata` of a line plot.

In [None]:
print("The first y-value of the first plot is ", ax.lines[0].get_ydata()[0])
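
Setting the data works in the same way. In the following sketch, which repeats the setup so that it can be run on its own, the y-values of the sine curve are replaced by cosine values while the x-values are kept. The variable name `line` is chosen here for illustration.

```python
from matplotlib.pyplot import subplots
from numpy import cos, linspace, pi, sin

x = linspace(-2*pi, 2*pi, 100)
fig, ax = subplots(figsize=(8, 3))
ax.plot(x, sin(x))

line = ax.lines[0]
line.set_ydata(cos(x))     # same x-values, new y-values
print("The first y-value is now", line.get_ydata()[0])
```

The axis limits are not recomputed by `set_ydata`; if the new values leave the visible range, `ax.relim()` followed by `ax.autoscale_view()` can be used to rescale.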

## Line plots - getting a reference to a line plot

Rather than referencing a line plot as `ax.lines[0]`, we can make variables.

The plot method returns a list of line plots. In many cases that list has only one element.

```python
#unpacking a list of one element
li_sin, = ax.plot(x, sin(x))
li_cos, = ax.plot(x, cos(x))
``` 

We can make several line plots at once:

```python
#unpacking a list of two elements
li_sin, li_cos = ax.plot(x, sin(x), x, cos(x))
``` 

## Some setter methods for line plots

In [None]:
x = linspace(-2*pi, 2*pi, 100)
fig, ax = subplots(figsize=(8, 4))
li_sin, li_cos = ax.plot(x, sin(x), x, cos(x))

li_sin.set_label('sin(x)')
li_cos.set_label('cos(x)')
li_sin.set_linewidth(5)
li_cos.set_color('red')
li_cos.set_linestyle('--')
ax.legend()   # we use legend as a method of the axes object, not as a command

## Annotation

The arrow is given properties using a dictionary.

In [None]:
fig, ax = subplots(figsize = (8, 3))
x =linspace(-2*pi, 2*pi, 100)
ax.plot(x, sin(x))

# we need space above the curve
ax.set_ylim([-1.2, 2])

ax.annotate("local max", xy = (pi/2, 1), xytext = (pi, 1.8),
            arrowprops = {'facecolor': 'green', 'shrink': 0.1, 'width': 2})

## Fill the area between curves

In [None]:
fig, (ax1, ax2) = subplots(nrows = 1, ncols = 2, figsize = (15, 3))
x =linspace(-2*pi, 2*pi, 100)
y1 = sin(x)
y2 = cos(x)

ax1.plot(x, y1, 'r')
ax1.plot(x, y2, 'b')
ax1.fill_between(x, y1, y2, color = 'green', alpha = 0.3)

# specify fill condition using where
ax2.plot(x, y1, 'r')
ax2.plot(x, y2, 'b')
ax2.fill_between(x, y1, y2, where = y1 > y2, color = 'red', alpha = 0.3)