# Notes 7 - Files and Modules

## 1) Files

So far, the only interaction our programs have had is taking user input and printing results to the 'console'. However, more complex programs usually need to load or save data in a more permanent way. For example, a program designed to analyse test results has to load a file containing all the results, perform some analysis and then save the calculated results to another file. Being able to use files is therefore an important skill.


### 1.1) Opening

Opening a file in Python is a lot simpler than in many other languages. The builtin function `open` takes a parameter `filename` and returns a file object, which you can then use to interact with the specified file. The syntax to open a text file called `names.txt` would be:

```python
file = open("names.txt")
```

We are mainly going to be using text files in this course, but you can handle other file formats in a similar way. The content of a text file is always read in as strings.

In [1]:
file = open("names.txt")

In [19]:
# file object is a TextIOWrapper which allows you to interact with file
print(type(file))

<class '_io.TextIOWrapper'>


When opening files you can choose to open them in several possible 'modes'. Each mode allows you to interact with a file differently, so choose depending on what you intend to do. The modes are:

- `"r"` - Read (the default). Opens a file in read-only mode. Raises an error if the specified file doesn't exist.

- `"w"` - Write. Opens a file for writing and clears everything currently in the file. Creates the file if it does not exist.

- `"a"` - Append. Opens a file for writing, but adds to the end of the existing text. Creates the file if it does not exist.


In [6]:
# can read the file
file = open("names.txt", "r")

# can write to file, deletes existing initially
file = open("names.txt", "w")
    
# can write to file, appends to existing
file = open("names.txt", "a")

You can also use the keyword `with` when opening files. This also opens the file and assigns the file object to a variable (the name after `as`). The difference is that `with` begins an indented block of code, and the file is only open inside this block. The benefit of this can be seen in section 1.4.

In [None]:
# open file assign to variable 'file'
with open("names.txt", "w") as file:
    # interact with file here
    pass
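
The behaviour of the `"r"` and `"w"` modes described above can be checked on a throwaway file. This is only a minimal sketch: it works inside a temporary folder, the file name `demo.txt` is made up for the example, and it assumes you are comfortable with `try`/`except`.

```python
import os
import tempfile

# work in a temporary folder so no real files are touched
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.txt")  # hypothetical file name

    # "r" raises FileNotFoundError because the file does not exist yet
    try:
        with open(path, "r") as file:
            pass
        read_failed = False
    except FileNotFoundError:
        read_failed = True

    # "w" creates the missing file instead of raising an error
    with open(path, "w") as file:
        pass
    created = os.path.exists(path)

print(read_failed)  # True
print(created)      # True
```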

### 1.2) Reading

Once you have opened a file in reading mode (`"r"`), you can access the content inside the file in three different ways. One option is to use the method `read`, which returns all the content of the file at once as a single string.

In [21]:
with open("names.txt") as file:
    print(file.read())

apple
orange
dog



In [20]:
# split() with no arguments splits the text on whitespace (spaces and newlines),
# so a file with one word per line becomes a list with one item per line
with open("names.txt") as file:
    content = file.read().split()
    print(content)

['apple', 'orange', 'dog']


The second option allows you to read in a file a line at a time, instead of reading the whole file in at once and storing in memory. This can be useful with large datasets as you won't have to store a huge amount of data in memory. It works by iterating through the file object in a for loop, as if it were a list where each line is an element.

In [22]:
with open("names.txt") as file:
    for line in file:
        print(line)

apple

orange

dog



In [None]:
with open("names.txt") as file:
    for line in file:
        # strip removes the newline character from each line
        print(line.strip())

The third option is to use the method `readline`. `readline` returns the next available line each time it is called, returning an empty string when the lines finish.

In [None]:
with open("names.txt") as file:
    print(file.readline())

In [None]:
# this works, but the for loop shown above is simpler and is recommended instead
with open("names.txt") as file:
    while True:
        line = file.readline()
        if not line:
            break
        print(line.strip())

### 1.3) Writing

Once you have opened a file in a writing mode (`"w"` or `"a"`, or `"r+"`, which opens an existing file for both reading and writing), you can write to the file using the method `write`, which takes the string to write as a parameter.

In [None]:
# clears file, writes "10" to it
with open("integers.txt", "w") as file:
    file.write(str(10))  # has to be string

In [None]:
# appends "100" to file
with open("integers.txt", "a") as file:
    file.write("100")

If you run the following block of code, it will print out the contents of the file. As you can see, both items written to the file are on the same line, forming the line `10100`. This is because `write` doesn't automatically insert a new line each time you write. If you want the content to be on a new line, you have to write the special 'newline' character, `\n`, to the file.

In [None]:
with open("integers.txt", "r") as file:
    print(file.read())

Therefore if we add a newline character (`\n`) to the end of our string, the next write will be on a new line.

In [None]:
scores = [100, 43, 34, 94, 23, 75]

with open("integers.txt", "w") as file:
    for s in scores:
        file.write(str(s) + "\n")

In [None]:
with open("integers.txt", "r") as file:
    print(file.read())
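
Because everything in a text file is read back as a string, numbers need converting before you can calculate with them. The following is a minimal sketch of a full round trip, assuming one score per line as written above; it uses a temporary folder so it doesn't touch your real `integers.txt`.

```python
import os
import tempfile

scores = [100, 43, 34, 94, 23, 75]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "integers.txt")

    # write one score per line
    with open(path, "w") as file:
        for s in scores:
            file.write(str(s) + "\n")

    # read the lines back and convert each one to an integer
    loaded = []
    with open(path) as file:
        for line in file:
            loaded.append(int(line.strip()))

print(loaded)       # [100, 43, 34, 94, 23, 75]
print(sum(loaded))  # 369
```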

### 1.4) Closing

When you open a file in Python, a file object is created and the operating system keeps the file open on behalf of your program. While it is open, some operating systems (Windows in particular) may stop other programs from deleting or renaming the file, and text you have written may still be sitting in a buffer rather than saved to disk. To release the file and make sure everything has been written, you need to close it once you are finished with it. If you don't close it, it will usually remain open until your program terminates, which also wastes system resources.

If you open a file by assigning it to a variable (`file = open(filename)`) you can close it by using the method `file.close()`. This should be done as soon as you no longer need the file open.

In [None]:
file = open("names.txt")
# do operations
file.close()

The benefit of opening a file using the `with` keyword is that it handles closing the file when the block of code completes, therefore you don't need to do anything.

In [None]:
with open("names.txt") as file:
    # do operations
    pass
# file is automatically closed
file.read()  # raises ValueError because the file is already closed
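
Every file object also has a `closed` attribute, which can be used to check the behaviour described above. This is only an illustrative sketch using a temporary copy of `names.txt`; catching the error with `try`/`except` is there purely so the example runs to the end.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "names.txt")

    # opened by assignment: stays open until close() is called
    file = open(path, "w")
    open_before_close = not file.closed
    file.close()
    closed_manually = file.closed

    # opened using with: closed automatically when the block ends
    with open(path) as file:
        closed_inside_block = file.closed
    closed_after_block = file.closed

    # reading from a closed file raises a ValueError
    try:
        file.read()
        read_failed = False
    except ValueError:
        read_failed = True

print(open_before_close, closed_manually)       # True True
print(closed_inside_block, closed_after_block)  # False True
print(read_failed)                              # True
```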

***

## 2) Using modules

We have made use of a lot of builtin functions; however, many more functions are available in Python through 'modules'. Modules are libraries which contain a collection of related functions, and many of them come packaged with the Python installation. A list of modules available by default can be found [here](https://docs.python.org/3/py-modindex.html).

### 2.1) Importing

In order to use a module and its functions you **must** first import it into your program. It is good practice to import any required modules at the start of your program.

In [9]:
# import the module named math
import math

You can then use the functions defined in that module using the syntax `module.function()`, replacing module and function with the desired option.

In [10]:
x = math.pow(5, 2)
print(x)

25.0


### 2.2) Using from

Each module contains a set of functions, and you can import individual functions by name using the keyword `from`. This allows you to call the function without writing the module name first, which can make your code easier to write if you use module functions repeatedly. However, you need to be careful with names: if your program already has a function with the same name (or you define one later), whichever was defined last replaces the other. For example, `from math import pow` replaces Python's builtin `pow` for the rest of your program.

The syntax to import a function `pow` from module `math` would be:

```python
from math import pow
```

In [None]:
from math import pow

x = pow(5, 2)
print(x)
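
The name clash mentioned above can be seen with `pow` itself, because Python already has a builtin function with that name. The sketch below is just for illustration; it uses the standard `builtins` module (not covered in these notes) to reach the original builtin after the import has replaced the bare name.

```python
import builtins
import math
from math import pow

# the bare name pow now refers to math.pow, not the builtin
uses_math_version = pow is math.pow

imported_result = pow(5, 2)          # math.pow always returns a float
builtin_result = builtins.pow(5, 2)  # the builtin keeps integers as integers

print(uses_math_version)  # True
print(imported_result)    # 25.0
print(builtin_result)     # 25
```

Writing `math.pow(5, 2)` after a plain `import math` avoids the clash, at the cost of a little more typing.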

In [None]:
# import multiple functions from the same module
from statistics import mean, median

### 2.3) Renaming modules

When importing modules you can also rename them using the keyword `as`. This allows you to refer to them using a different name in your program, which is sometimes done to shorten the name if you are repeatedly using functions from a module.

The syntax to rename a module `math` to `m` would be:
```python
import math as m
```

In [None]:
import math as m

x = m.pow(5, 2)
print(x)

***

## 3) math

The `math` module contains functions which perform mathematical operations. The full list of math functions can be found [here](https://docs.python.org/3/library/math.html#module-math).


### 3.1) sqrt

`sqrt(x)` returns the square root of `x`. 

In [11]:
import math

x = math.sqrt(25)
print(x)

5.0


### 3.2) pow

`pow(x, y)` returns `x` raised to the power `y`.

In [None]:
x = math.pow(5, 2)
print(x)

### 3.3) Other useful functions

`math` contains many more functions and constants which are useful when performing more complex mathematics, such as `cos` and `sin` for cosine and sine (angles are given in radians), `log` for logarithms and the constant `pi`.
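
A short sketch of these in use is shown below. The values are rounded before printing because floating point results can be off by a tiny amount; the variable names are just for illustration.

```python
import math

# pi is a constant, so it is used without brackets
cosine = math.cos(math.pi)      # cosine of pi radians (180 degrees)
sine = math.sin(math.pi / 2)    # sine of pi/2 radians (90 degrees)
natural_log = math.log(math.e)  # log uses base e unless a base is given
log_base_2 = math.log(8, 2)     # an optional second argument sets the base

print(round(cosine, 6))       # -1.0
print(round(sine, 6))         # 1.0
print(round(natural_log, 6))  # 1.0
print(round(log_base_2, 6))   # 3.0
```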

***

## 4) statistics

The `statistics` module provides functions for calculating mathematical statistics of numeric data. The full list of statistics functions can be found [here](https://docs.python.org/3/library/statistics.html#module-statistics).


### 4.1) mean

`mean(data)` returns the arithmetic mean of the numeric collection `data`. The arithmetic mean is commonly referred to as 'the average': it is the sum of the data divided by the number of data points.

In [13]:
import statistics

scores = [100, 43, 34, 94, 23, 75]

x = statistics.mean(scores)
print(x)

61.5


### 4.2) median

`median(data)` returns the median of the numeric collection `data`. The median is the 'middle' value of the data when it is sorted. If there is an even number of data points, it returns the mean of the two middle values.

In [None]:
scores = [100, 43, 34, 94, 23, 75]

x = statistics.median(scores)
print(x)
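
The difference between an odd and an even number of data points can be seen in this small sketch, which reuses the scores from above (the list names are just for illustration):

```python
import statistics

odd_scores = [100, 43, 34, 94, 23]       # sorted: 23, 34, 43, 94, 100
even_scores = [100, 43, 34, 94, 23, 75]  # sorted: 23, 34, 43, 75, 94, 100

odd_median = statistics.median(odd_scores)    # the single middle value
even_median = statistics.median(even_scores)  # mean of 43 and 75

print(odd_median)   # 43
print(even_median)  # 59.0
```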

### 4.3) mode

`mode(data)` returns the mode of the collection `data`. The mode is the single most common data point in the collection. If several values share the highest frequency, it returns the first one encountered (in Python 3.8 and later; earlier versions raise a `StatisticsError` instead).

In [14]:
scores = [100, 90, 90, 90, 45, 45, 20, 10]

x = statistics.mode(scores)
print(x)

90
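
When there is a tie, the result depends on the order of the data. The sketch below assumes Python 3.8 or later; it also uses `multimode`, a related function from the same module that is not covered in these notes, which returns every value that shares the highest frequency.

```python
import statistics

# 45 and 90 both appear twice
scores = [45, 45, 90, 90, 10]

first_mode = statistics.mode(scores)      # the first of the tied values
all_modes = statistics.multimode(scores)  # every tied value, as a list

print(first_mode)  # 45
print(all_modes)   # [45, 90]
```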


***

## 5) random

The `random` module provides functions for generating random numbers of various distributions and choosing random elements from sequences. The full list of random functions can be found [here](https://docs.python.org/3/library/random.html#module-random).


### 5.1) randint

`randint(a, b)` returns a random integer `N`, such that `a <= N <= b`.

In [15]:
import random

x = random.randint(50, 100)
print(x)

89


### 5.2) randrange

`randrange` returns a randomly selected element from `range(start, stop, step)`. It has the same parameter options as `range`, so the `stop` value itself is never returned:

In [None]:
# specify just a stop
x = random.randrange(100)
print(x)

In [None]:
# specify a start and stop
x = random.randrange(50, 100)  # 50 to 99, same as randint(50, 99)
print(x)

In [None]:
# specify a start, stop and step
x = random.randrange(50, 100, 5)
print(x)
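
The difference in end points between `randint` and `randrange` is easy to miss. As a rough check, the sketch below draws many values from small ranges and collects the distinct results in sets; with this many draws every possible value is all but certain to appear.

```python
import random

randint_values = set()
randrange_values = set()
step_values = set()

for _ in range(10000):
    randint_values.add(random.randint(1, 3))       # includes both ends
    randrange_values.add(random.randrange(1, 3))   # excludes the stop value
    step_values.add(random.randrange(50, 100, 5))  # multiples of 5 only

print(sorted(randint_values))    # [1, 2, 3]
print(sorted(randrange_values))  # [1, 2]
print(sorted(step_values))       # [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
```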

### 5.3) random

`random()` returns a random float in the range [0.0, 1.0), meaning 0.0 can be returned but 1.0 cannot.

In [None]:
x = random.random()
print(x)

### 5.4) uniform

`uniform(a, b)` returns a random float `N` such that `a <= N <= b` when `a <= b`, and `b <= N <= a` when `b < a`. In other words, the order of `a` and `b` doesn't matter: the result is always between the two values.

In [None]:
x = random.uniform(10, 20)
print(x)
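
As a quick illustration, the sketch below calls `uniform` with the arguments in both orders, and also builds a similar value by hand from `random()`. The variable names are made up for the example.

```python
import random

low_first = random.uniform(10, 20)
high_first = random.uniform(20, 10)  # arguments swapped, same range of results

# scaling random() by hand gives a float from 10 up to (but not including) 20
scaled = 10 + random.random() * 10

print(10 <= low_first <= 20)   # True
print(10 <= high_first <= 20)  # True
print(10 <= scaled <= 20)      # True
```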

### 5.5) choice

`choice(seq)` returns a random element from the non-empty sequence `seq`, such as a list. It raises an `IndexError` if `seq` is empty.

In [None]:
scores = [100, 43, 34, 94, 23, 75]

x = random.choice(scores)
print(x)

### 5.6) sample

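`sample(population, k)` returns a new list of `k` elements chosen from the sequence `population` without replacement, so no position is picked twice (the returned values are only unique if the values in `population` are). `k` cannot be larger than the length of `population`. From Python 3.11 the population must be a sequence such as a list, so a set has to be converted to a list first.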

In [None]:
scores = [100, 43, 34, 94, 23, 75]

x = random.sample(scores, 3)
print(x)

### 5.7) shuffle

`shuffle(seq)` shuffles the sequence `seq` in place, meaning it randomises the order of the elements.

In [None]:
scores = [100, 43, 34, 94, 23, 75]

random.shuffle(scores)
print(scores)
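
Note that `shuffle` changes the original list and returns `None`, whereas `sample` leaves the original alone and returns a new list. A minimal sketch of the difference, with variable names chosen for the example:

```python
import random

scores = [100, 43, 34, 94, 23, 75]

# sample returns a new, randomly ordered list; scores is left as it was
shuffled_copy = random.sample(scores, len(scores))
original_unchanged = scores == [100, 43, 34, 94, 23, 75]

# shuffle reorders scores itself and returns None
result = random.shuffle(scores)

print(original_unchanged)  # True
print(result)              # None
```

If you need to keep the original order as well, `sample` (or shuffling a copy of the list) is the safer choice.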

***

## 7) csv

As datasets become more complex you will often want to have multiple pieces of related data on a single line; however, reading and writing data like this with a plain file object is difficult. This is where comma-separated values (csv) files are useful. In a csv file, each line contains a row of data with the values separated by commas:

```
bob,21,male
ella,22,female
fred,19,male
```

In Python there is a `csv` module which makes handling csv files easier. The full documentation for the csv module can be found [here](https://docs.python.org/3/library/csv.html#module-csv).


### 7.1) File handling

Opening and closing csv files is performed the same way as with standard text files.

In [16]:
with open("people.csv") as csvfile:
    # perform functions
    pass

### 7.2) reader

`reader(csvfile)` is used to create an object which can be used to iterate over the lines in the given `csvfile`. The csv `reader` will split each line into a list, separating it using the commas.

In [17]:
import csv

with open("people.csv") as csvfile:
    csvreader = csv.reader(csvfile)
    for row in csvreader:
        print(row)

['bob', '21', 'male']
['ella', '22', 'female']
['fred', '19', 'male']
['sam', '40', 'male']
['olly', '21', 'male']
['amy', '23', 'female']


If you want to use the elements of each line you can unpack the list in several ways:

In [None]:
# using list indices
with open("people.csv") as csvfile:
    csvreader = csv.reader(csvfile)
    for row in csvreader:
        print("name: " + row[0] + ", age: " + row[1] + ", gender: " + row[2])

In [None]:
# unpacking list and assigning to variables
with open("people.csv") as csvfile:
    csvreader = csv.reader(csvfile)
    for row in csvreader:
        name, age, gender = row
        print("name: " + name + ", age: " + age + ", gender: " + gender)
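
Every value produced by the csv `reader` is a string, so numbers such as the ages need converting before you can calculate with them. The sketch below is illustrative only: `io.StringIO` stands in for an opened `people.csv` so the example runs without the real file.

```python
import csv
import io

# io.StringIO behaves like an opened text file containing this data
csvfile = io.StringIO("bob,21,male\nella,22,female\nfred,19,male\n")

ages = []
for name, age, gender in csv.reader(csvfile):
    ages.append(int(age))  # convert the string "21" to the integer 21

average_age = sum(ages) / len(ages)

print(ages)                   # [21, 22, 19]
print(round(average_age, 2))  # 20.67
```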

### 7.3) writer

`writer(csvfile)` is used to create an object which converts your data into comma-separated strings and writes them to the file. Each call to `writerow` writes one list as one line of the file.

In [None]:
import csv

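# newline="" stops extra blank lines appearing between rows on some systems (e.g. Windows)
with open("modules.csv", "w", newline="") as csvfile: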
    csvwriter = csv.writer(csvfile)
    csvwriter.writerow(["bob", "3"])
    csvwriter.writerow(["ella", "19"])    
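
One advantage of the csv `writer` over joining strings with commas yourself is that it copes with values which contain a comma. In this sketch `io.StringIO` is used in place of a real file so the written text can be inspected directly, and the name `"Smith, Ella"` is made up for the example.

```python
import csv
import io

buffer = io.StringIO()  # stands in for a file opened for writing
csvwriter = csv.writer(buffer)
csvwriter.writerow(["bob", 3])           # numbers are converted to strings
csvwriter.writerow(["Smith, Ella", 19])  # this value contains a comma

written = buffer.getvalue()
print(written)

# reading the text back gives the original two columns per row
rows = list(csv.reader(io.StringIO(written, newline="")))
print(rows)  # [['bob', '3'], ['Smith, Ella', '19']]
```

The value containing a comma is wrapped in double quotes in the written text, which is how the `reader` knows not to split it.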

### 7.4) DictReader
`DictReader(csvfile)` creates an object that maps each row to a dictionary. By default the values in the first row of the file are used as the keys, so the file needs a header row.


In [None]:
import csv

with open("people_dict.csv") as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for row in csv_reader:
        print(row['age'])                  
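
The contents of `people_dict.csv` are not shown here, so the sketch below assumes a header row of `name,age,gender` and uses `io.StringIO` in place of the real file. It shows where the dictionary keys come from.

```python
import csv
import io

# assumed layout: a header row followed by the data rows
csv_file = io.StringIO("name,age,gender\nbob,21,male\nella,22,female\n")

csv_reader = csv.DictReader(csv_file)
rows = []
for row in csv_reader:
    rows.append(row)

print(csv_reader.fieldnames)            # ['name', 'age', 'gender']
print(rows[0]["name"], rows[0]["age"])  # bob 21
print(len(rows))                        # 2
```

The header row becomes the keys and is not returned as a row of data.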


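### 7.5) DictWriter
`DictWriter(csvfile, fieldnames)` allows you to write dictionaries to a csv file. `fieldnames` lists the dictionary keys in the order they should appear as columns, and `writeheader()` writes them as the first row.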


In [None]:
with open('dict_names_write.csv', 'w', newline='') as csvfile:
    fieldnames = ['first_name', 'last_name']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

    writer.writeheader()
    writer.writerow({'first_name': 'Baked', 'last_name': 'Beans'})
    writer.writerow({'first_name': 'Lovely', 'last_name': 'Spam'})
    writer.writerow({'first_name': 'Wonderful', 'last_name': 'Spam'})
