<span id="064cf737-f00c-4590-a664-55cebafdaac6"></span>

# Python tutorial 5

## Working with files

In our work, Python is mainly used as an analysis tool. To do this, we
need to learn how to open, read, analyse and write data.

When we open a file, we choose one of three things to do with it:

1.  `write` - we write data to a file; we use `'w'` to denote this
2.  `read` - we read data from a file; we use `'r'` to denote this
3.  `append` - we add more data to the end of a file; we use `'a'` to
    denote this

These three 'modes' have restrictions. For example, opening a file in
`read` mode will prevent you from overwriting any of the data.
In contrast, opening it in `write` mode will erase whatever the file
already contains, so be careful when you use it.

Let's open a file.

## Opening a file

In [None]:
text_file = "tutorial5-input.txt"

file = open(text_file, "r") # r for read!

content = file.read()
print(content)


<span id="7513b2c7-f089-4a53-bc34-51c6a82be469"></span> When we open a
file in Python, imagine that we're taking a book off a shelf and opening
it. Python will leave the book open unless we ask it to close it. So in
the above code, we've opened a file, but not closed it. This is usually
not a big problem, but can lead to issues like slowing your code down
because Python is keeping track of all these open files which you don't
need anymore. Closing a file is as simple as:

In [None]:
file.close()


<span id="c4f953e2-15a0-4e7c-965a-7e265e777949"></span> Closing a file
means the book is closed, so we can't read it again. What do you think
will happen below?

In [None]:
file = open(text_file, 'r')
content = file.read()
print(content)
file.close()
content2 = file.read()
print(content2)


<span id="08964dcd-29b6-4228-9a7a-d7a447cc775d"></span> However, a more
common way of opening a file to prevent this kind of problem is to use a
`with` statement

## Opening a file with `with`

Remember from our for loops and functions that Python separates chunks
of code using indents. Keep that in mind for this section.

In [None]:
with open("tutorial5-input.txt", 'r') as f:
    content = f.read()

print(content)
content2 = f.read()
print(content2)


<span id="7ef0612b-c64b-45ef-83d9-a7d3e8998f77"></span> So in the above
block, we've opened our file, but it is only open inside the indented
section. We opened our file, read the contents, then exited the indented
section. Exiting automatically closes the file, which is why the second
`f.read()` fails, and it prevents any left-open-file problems that you
might have otherwise. In general, this method is preferred, so you
should try to use it. There are some exceptions.
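
One way to see the automatic closing for yourself is to check the file object's `closed` attribute inside and outside the indented section. This is only a minimal sketch: the file name `demo.txt` and the temporary folder are made up for illustration and are not part of the tutorial data.

```python
import os
import tempfile

# Write a small throwaway file so the example does not touch the tutorial data.
# "demo.txt" is a hypothetical name chosen for this sketch.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo_path, "w") as f:
    f.write("hello\n")

with open(demo_path, "r") as f:
    inside = f.closed    # still inside the indented section, so the file is open
    content = f.read()

outside = f.closed       # the with block has ended, so the file is closed

print("closed inside the block? ", inside)
print("closed outside the block?", outside)
```

Checking `f.closed` is also a handy way to debug code where you are not sure whether a file is still open.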

## Writing to a file

Above we were opening a file in `read` mode, which means that we are
only allowed to read from it, not write to it (again like a book). We
can try to write to see what happens. Like reading, the command to write
is `write`.

In [None]:
text_file = "tutorial5-input.txt"

with open(text_file, "r") as f:
    f.write("This is a test.\n")





<span id="34cd0866-453c-4b40-991e-cd639d45beb7"></span> Our write fails
because we opened the file in read-only mode, so it is not writable.
This is very useful when you have data that you want to make sure
you don't accidentally overwrite. Python will not let you write to a
file which you opened as read.
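
If you want your code to carry on after a refused write, you can catch the error. The sketch below assumes a throwaway file (the name `protected.txt` is made up) and catches `io.UnsupportedOperation`, which is the error Python raises when you call `write` on a file opened in read mode.

```python
import io
import os
import tempfile

# Create a throwaway file with some "precious" data in it.
# "protected.txt" is a hypothetical name used only for this sketch.
demo_path = os.path.join(tempfile.mkdtemp(), "protected.txt")
with open(demo_path, "w") as f:
    f.write("original data\n")

error_message = None
try:
    with open(demo_path, "r") as f:
        f.write("This is a test.\n")   # not allowed in read mode
except io.UnsupportedOperation as err:
    error_message = str(err)
    print("Write refused:", error_message)

# The file still holds exactly what we put in it.
with open(demo_path, "r") as f:
    unchanged = f.read()
print(unchanged)
```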

In [None]:
text_file_write = "tutorial5-output.txt"
with open(text_file_write, 'w') as f:
    f.write("This is a test")


<span id="99451c3f-a001-41e8-98b3-489d3d284cb2"></span> This ran with no
errors. If we check our new file we should see our text in it

In [None]:
with open(text_file_write, 'r') as f:
    content = f.read()
print(content)


<span id="eaacad02-ce4e-41f6-8a7d-e05b063b587a"></span> What about if we
want to add some more to our file?

In [None]:
with open(text_file_write, 'w') as f:
    f.write('And this is also a test')


<span id="fd6415aa-07f5-43a2-899f-ef00405236d3"></span> What do you
think the content of this file is going to be now?

In [None]:
with open(text_file_write, 'r') as f:
    content = f.read()
print(content)


<span id="128acf78-5943-48b1-97b2-f0f1d39140c9"></span> We've lost our
initial statement! This is expected because we opened our file in
`write` mode both times. When you open a file as `write` it's going to
assume you want to start writing over it from the beginning. So always
be careful using `write` because it won't warn you before deleting your
data!
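
If you would rather get an error than silently lose data, one option is Python's `'x'` mode, which creates a new file and refuses to open one that already exists. This is a small sketch of that idea, not something the rest of the tutorial relies on; the file name `results.txt` is made up.

```python
import os
import tempfile

# "results.txt" is a hypothetical file name in a fresh temporary folder.
demo_path = os.path.join(tempfile.mkdtemp(), "results.txt")

# First time: the file does not exist yet, so 'x' creates it.
with open(demo_path, "x") as f:
    f.write("first version\n")

# Second time: the file exists, so 'x' raises an error instead of wiping it.
try:
    with open(demo_path, "x") as f:
        f.write("second version\n")
    overwritten = True
except FileExistsError:
    overwritten = False
    print("File already exists, nothing was overwritten")

with open(demo_path, "r") as f:
    kept = f.read()
print(kept)
```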

## Opening a file with append

If we want to add another line to our file we can open it in `append`
mode. This is like write but assumes that we want to add to the end of
the file rather than from the start

In [None]:
# Start off resetting our file
text_file_write = "tutorial5-output.txt"
with open(text_file_write, 'w') as f:
    f.write("This is a test\n") # Add a newline at the end for better formatting when appending


with open(text_file_write, 'r') as f:
    content = f.read()

print("File content:\n", content)

# Now let's append to the file

with open(text_file_write, 'a') as f:
    f.write("This is an appended line.")

with open(text_file_write, 'r') as f:
    content = f.read()

print("File content after appending:\n", content)


<span id="95d2a7e9-7b4c-4244-8387-bce05d48acc1"></span> So we've
successfully made a new file, written to it, read from it, and added to
it! There are some other points to keep in mind, but these are the
basics of opening and closing files.

## Working with data

Let's open some real data. Some data files are read like the method
above. Others just handle all the opening/closing for you. Let's work
with opening a MATLAB file. GitHub doesn't allow large files, so I'll
give it to you on a USB stick.

In [None]:
import scipy.io

file_path = "/Volumes/tomdrive/OA/Phantom_data/20260211_phantom_MBOldVsNew/reconstructed/tubing_top_recon.mat"

mat_data = scipy.io.loadmat(file_path)


<span id="47d9cf30-a7ad-4e0a-b380-f6c529eb9167"></span> This is
expected. Can anyone tell me what the error is telling us?

## h5py for Matlab

Newer MATLAB files (likely the ones that you're used to using) are
actually saved in a format based on HDF5 (MATLAB calls this version
7.3). This format is very good because it is space-efficient and lets
you read sections of the file (rather than having to read the whole
thing).

Let's try to use HDF5 to read this MATLAB file. A widely used HDF5
reader for Python is called h5py. HDF5/MATLAB files are read into Python
much like a dictionary. If you remember, a dictionary has a `key` and a
`value` associated with it. Let's look at what `keys` this file has.

In [None]:
import h5py
with h5py.File(file_path, "r") as f:
    for key in f.keys():
        print(key)


<span id="29b2ed56-d1df-4351-8407-c3a1201089f2"></span> It might look
familiar because this is an OA pre-processed data file. It has some
additional parts which we'll ignore; we're interested in `recon_swp`.
Let's check what kind of data it is. If you're ever unsure what options
you have in Python, there is a built-in function called `dir` which
lists everything that a variable can do.

In [None]:
with h5py.File(file_path, 'r') as f:
    print(dir(f['recon_swp']))


So there is a lot of information there. You can usually ignore the ones
with \_ at the beginning. That's used as a marker which means that
humans can usually ignore this value, but it's important for Python.

So let's print this again, but only the ones without the \_

In [None]:
with h5py.File(file_path, 'r') as f:
    dired = dir(f['recon_swp'])
    # remove the ones starting with '_'
    dired = [d for d in dired if not d.startswith('_')]
    print(dired)
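
The same trick works on any Python object, not just h5py ones. As a small sketch you can run without the data file, here it is applied to an ordinary list (the variable names are made up for illustration):

```python
# Any object will do; a plain list keeps the example self-contained.
numbers = [3, 1, 2]

all_names = dir(numbers)

# Keep only the names that do not start with an underscore.
public_names = [name for name in all_names if not name.startswith("_")]

print(len(all_names), "names in total")
print(public_names)
```

The filtered list contains the things you would normally call yourself, such as `append` and `sort`.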


So looking at this there are two fields I am interested in. Firstly,
what `type` of data is this, and what `shape` is it? Python already uses
the term `type` for objects, so to know what kind of data is inside of
this object, we can use the `dtype` field.

In [None]:
with h5py.File(file_path, 'r') as f:
    print('shape of data', f['recon_swp'].shape)
    print('type of data', f['recon_swp'].dtype)


Great! This looks like OA data to me. It's 27 (wavelengths) x 100 x 200
x 200. Let's read it in and look at it. Because the HDF5 format is used
for really big data, h5py will not load all our data by default, even
when we read from it. To tell it that we do want everything, we add
`[:]` at the end, which basically means "take all of the array". Because
we're reading all this data in, it might take a second or two.

In [None]:
with h5py.File(file_path, 'r') as f:
    data = f['recon_swp'][:]
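
Because h5py only reads what you ask for, you can also pull out a single wavelength instead of the whole array. The sketch below builds a tiny stand-in file so it runs without the USB stick; the file name `demo.h5`, the dataset name `recon_demo` and its shape are all made up for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Build a small stand-in HDF5 file (hypothetical names and shape).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.h5")
fake = np.arange(3 * 4 * 5 * 5, dtype="float32").reshape(3, 4, 5, 5)
with h5py.File(demo_path, "w") as f:
    f.create_dataset("recon_demo", data=fake)

with h5py.File(demo_path, "r") as f:
    dset = f["recon_demo"]       # a handle to the data on disk, not the data itself
    one_wavelength = dset[0]     # reads only the first 4 x 5 x 5 block
    everything = dset[:]         # reads the whole array into memory

print("one wavelength:", one_wavelength.shape)
print("everything:    ", everything.shape)
```

Both results are ordinary NumPy arrays, so they stay usable after the file has been closed.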



Let's look at it. It's got 27 wavelengths, so let's just select one.

In [None]:
import matplotlib.pyplot as plt
wl_index = 0 # which wavelength
fig, ax = plt.subplots(1, 3, figsize=(10, 4))
ax[0].imshow(data[wl_index].max(axis=0), aspect=1)
ax[1].imshow(data[wl_index].max(axis=1), aspect=2)
ax[2].imshow(data[wl_index].max(axis=2), aspect=2)
# Hide the axis ticks on every panel
for a in ax:
    a.set_xticks([])
    a.set_yticks([])
ax[0].set_title("Transverse")
ax[1].set_title("Coronal")
ax[2].set_title("Sagittal")
fig.suptitle(f"Wavelength index:{wl_index+1}")
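
Each panel above is a maximum projection: `.max(axis=...)` squashes the 3D volume along one direction and keeps the brightest value. Here is a rough sketch of what that does to the shape, using a small random array as a stand-in for one wavelength (the `(10, 20, 20)` shape is made up):

```python
import numpy as np

# Synthetic stand-in for data[wl_index]; the shape is chosen for illustration only.
rng = np.random.default_rng(0)
volume = rng.random((10, 20, 20))

# Taking the maximum along an axis removes that axis and leaves a 2D image.
proj0 = volume.max(axis=0)
proj1 = volume.max(axis=1)
proj2 = volume.max(axis=2)

print("axis 0:", proj0.shape)
print("axis 1:", proj1.shape)
print("axis 2:", proj2.shape)
```

The two projections that keep the short axis come out non-square, which is why the plotting cell above uses a different `aspect` for those panels.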


## Homework

1.  Try to write your own file. Any text file will do.
2.  Try to load a different MATLAB file. Have a go at plotting the three
    images like we did above.