# 7PAVITPR: Introduction to Statistical Programming
# Python practical 9

_Angus Roberts<br/>
Department of Biostatistics and Health Informatics<br/>
Institute of Psychiatry, Psychology and Neuroscience<br/>
King's College London<br/>_

# Simple file handling

__NOTE__ This notebook assumes the files _numbers.txt_ and _hba1c.csv_ are in the same directory as the notebook. If you are using Colab, you will need to upload the files to your Colab session. If you are using Jupyter Notebook, copy the files into the notebook's directory before running the cells.

The Python Standard Library has built-in file handling. We will look at the basics of reading files here. Many Python packages also provide their own file handling and reading. We will look at one of these later.

Below is an example - take a look and run it.




In [None]:
# Open a file, and iterate over its lines, printing them

with open('numbers.txt') as file:
    for line in file:
        print(line.rstrip())
    

There are several parts to this:

- `with` - keyword indicating a block of code that will be executed as a whole, with any cleaning up handled by Python. For files, this means that the file will be closed cleanly once it is finished with.
- `open(filepath)` - opens the file for reading (the default mode)
- `as` - introduces the variable name for the file
- __name__ - the variable name given to the open file (here, `file`)
- `:` - the next lines are an indented block. Once the block is completed, the file will be closed
- `for line in file:` - files are iterable, yielding one line at a time

(Note: It is possible to use `open()` without the `with...as...` statement, but you would have to close the file yourself, so it is not recommended.)
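As a rough sketch of what `with` saves you from writing, the manual equivalent looks something like the following. The small file created here with `tempfile` is only a stand-in so that the example runs anywhere (its contents are an assumption, not the course data). In the notebook you would open _numbers.txt_ directly.

```python
import os
import tempfile

# Create a small stand-in file so the example is self-contained
# (hypothetical contents; in the notebook you would use 'numbers.txt')
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('1\n2\n3\n')
    path = tmp.name

# Manual version: open the file, use it, and always close it,
# even if an error occurs while reading
file = open(path)
try:
    manual_lines = [line.rstrip() for line in file]
finally:
    file.close()

print(manual_lines)
print(file.closed)

# Tidy up the stand-in file
os.remove(path)
```

The `try...finally` is what guarantees the file is closed. The `with` statement does the same job in a single line.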

## <font color=green>💬 Discussion point</font>

In the above code, what is `rstrip()` doing? Why is it needed?


Files have other methods, for example:

In [None]:
# Open a file, read its lines into a list

with open('numbers.txt') as file:
    lines = file.readlines()
    
print(len(lines))
        

## <font color=green>💬 Discussion point</font>

- The above two pieces of code show different methods for reading lines from a file. How do you think they differ? When might you use one and not the other?
- Files also have a method `read()`, which reads the whole file at once. For text files, it returns the contents as a single string. Why might you use this method?
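To help ground the discussion, here is a small sketch showing what `readlines()` and `read()` each return. It uses a temporary stand-in file with made-up contents (an assumption, so the example runs on its own) rather than _numbers.txt_.

```python
import os
import tempfile

# Stand-in file with hypothetical contents
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('10\n20\n30\n')
    path = tmp.name

# readlines() returns a list of strings, one per line, newlines included
with open(path) as file:
    as_list = file.readlines()

# read() returns the whole file as a single string
with open(path) as file:
    as_string = file.read()

print(type(as_list), len(as_list))      # a list with one item per line
print(type(as_string), len(as_string))  # a str with one item per character

# Tidy up the stand-in file
os.remove(path)
```

Try printing `as_list` and `repr(as_string)` to see exactly where the newline characters end up.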


## <font color=green>❓ Question</font>

The file hba1c.csv, in the same directory as this notebook, contains HbA1C results for 50 patients. Each line has a patient ID followed by a comma followed by a result. For example:

- PT1234,5.6

which means that the patient with ID PT1234 has an HbA1C result of 5.6.

Create a dictionary mapping each patient ID to their result. Show that your code works by performing some lookups and tests on the dictionary (code provided for this below).

(__Hint:__ you will need to partition or split the line at the comma. There are useful methods of string that will help with this.)
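As an illustration of the hint, the sketch below applies both string methods to the example line given above. The trailing newline is added here on the assumption that lines read from the file will end with one.

```python
# Example line in the format described above, with a trailing newline
line = 'PT1234,5.6\n'

# split(',') returns a list of the pieces between commas
parts = line.rstrip().split(',')
print(parts)

# partition(',') returns a 3-tuple: (before, separator, after)
patient_id, sep, result = line.rstrip().partition(',')
print(patient_id, float(result))
```

Either method gives you the ID and the result as separate strings. The result still needs converting with `float()` before you store it.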

## <font color=green>⌨️ Your answer</font>

In [None]:
# Write your answer here

# Create an empty dictionary to hold your results
hba1c = dict()

# Open the file, using "with" 
with open('hba1c.csv') as file:
    lines = file.readlines()
    
# Iterate over the lines, putting the contents in the dictionary
# Don't forget to remove newlines,
# and to convert the string results to floats
for line in lines:
    # rstrip() removes the trailing newline, split(',') separates ID and result
    parts = line.rstrip().split(',')
    hba1c[parts[0]] = float(parts[1])
             
# Some tests - these should all print True
print(len(hba1c) == 50)
print(hba1c['pt1252'] == 7.68)
print(hba1c['pt1273'] == 3.47)
    
    

