# Computer Programming and Algorithms

## Week 8.3: Modules for reading files

* * *

<img src="https://github.com/engmaths/EMAT10007_2023/blob/main/weekly_content/img/full-colour-logo-UoB.png?raw=true" width="20%">

# Aims

In this video we will:

* Read a CSV file using a computer program
* Introduce the concept of a module
* Introduce the `csv` module
* Use the `csv` module to read data stored in a CSV file
* Use the `csv` module to save data to a CSV file

# CSV file 
- CSV stands for Comma-Separated Values
- A CSV file is a text file that stores tabular data.
- Each line in a CSV file represents a row in the table.
- The values within each row are separated by a specific delimiter, usually a comma (,).
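
To make the format concrete, here is a minimal sketch that turns a small table into CSV-style lines of text. The table contents and the variable names (`table`, `lines`) are chosen only for illustration.

```python
# A small table: the first row holds the headings, the second the values
table = [['Jan', 'Feb', 'Mar'], [6, 5, 8]]

# Build one line of text per row, separating the values with commas
lines = []
for row in table:
    line = ','.join(str(value) for value in row)
    lines.append(line)
    print(line)
```

Each printed line corresponds to one row of the table, with a comma between neighbouring values.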

# Reading a CSV file

Text files (containing human-readable data) can be easily read using a computer program.



```python
with open('temperature.csv') as file:
    print(file.read())
```

```
Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec
6,5,8,10,13,16,18,18,15,13,8,7
```



The data in the file is a set of numerical temperature values corresponding to the months of the year. 

It's likely we will want to use the individual numerical values. 

For example, we may want to perform statistical operations, like finding the mean.

```
Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec
6,5,8,10,13,16,18,18,15,13,8,7
```

We can cast the file object as a list. 

Each element of the list is one line of the file.

What type of data is each line stored as? 

```python
with open('temperature.csv') as file:
    lines = list(file)         # each element is one line of the file
    print(lines[1])
    print(type(lines[1]))
```

```
6,5,8,10,13,16,18,18,15,13,8,7

<class 'str'>
```


The individual numerical values, separated by commas, are stored as a single string.

We can transform this string to numerical data with several operations:
1. Convert file object to list
2. Access line of file containing numerical values
3. Convert string to list of values with commas removed
4. Remove the new line character from any strings in the list
5. Cast all strings as integer values

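```python
with open('temperature.csv') as file:
    lines = list(file)                              # 1. file object -> list of lines
    line = lines[1]                                 # 2. line containing the numerical values
    values = line.split(',')                        # 3. split the string at each comma
    values = [v.replace('\n', '') for v in values]  # 4. remove newline characters
    values = [int(v) for v in values]               # 5. cast each string as an integer
    print(values)

mean = sum(values) / len(values)
print(mean)
```

```
[6, 5, 8, 10, 13, 16, 18, 18, 15, 13, 8, 7]
11.416666666666666
```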


This is a lot of work to get the data into a usable form!

We can import a *module* to make it easier to read data from this file type.




# What is a module?

A module is a source code file that contains some additional variables and functions.

As we are using Python, the source code file is a .py file. 

To use a module:
- import the module using the keyword `import` followed by the module name (`import` statements should appear at the beginning of your program)
- prefix any functions from the module with the module name followed by a dot/point (.)
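
As a small illustration of these two steps, the sketch below uses `math`, another module that is installed with Python. The choice of `math` is an assumption made for this example only.

```python
# Import the module at the beginning of the program
import math

# Prefix names from the module with the module name and a dot
root = math.sqrt(16)   # a function from the math module
print(root)

print(math.pi)         # a variable from the math module
```

Without the `math.` prefix, Python would not know where to find `sqrt` or `pi`.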

# The `csv` module

Certain modules were installed on your computer automatically when Python was installed.

These include a module called `csv` for handling CSV files.

All functions from the `csv` module are listed here: https://docs.python.org/3/library/csv.html

# Reading a CSV file with the `csv` module

```python
import csv

with open('temperature.csv') as file:

    reader = csv.reader(file)   # convert the file object to a reader object

    for row in reader:          # each row is one line of the file
        print('Line:', row)
```

```
Line: ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
Line: ['6', '5', '8', '10', '13', '16', '18', '18', '15', '13', '8', '7']
```


The `reader` function (prefixed with the module name `csv`) is used after opening the file.

Notice the difference in how the data is represented in the output of the function.

Each line is now a list of string values, already split at the commas.

This avoids the need to separate the values manually.

The `reader` function returns a reader object which, like a file object, is *iterable* but not *subscriptable*.   
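
The difference between *iterable* and *subscriptable* can be sketched as follows. To keep the example runnable without `temperature.csv`, it uses `io.StringIO` as a stand-in for an open file; that stand-in and the shortened data are assumptions for illustration only.

```python
import csv
import io

# Stand-in for an open file (assumption: a shortened copy of the data)
file = io.StringIO('Jan,Feb,Mar\n6,5,8\n')

reader = csv.reader(file)

try:
    reader[1]                  # indexing a reader object is not allowed
except TypeError as error:
    print('TypeError:', error)

rows = list(reader)            # a list is subscriptable
print(rows[1])
```

Looping over the reader works directly, but selecting a line by its index requires converting the reader to a list first.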

The process of converting the imported data to numerical data requires fewer and less complicated steps than before. 

1. Convert file object to reader object 
2. Convert reader object to list (reader object is iterable but not subscriptable)
3. Access line of file containing numerical values
4. Cast all strings as integer values 

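```python
import csv

with open('temperature.csv') as file:
    reader = csv.reader(file, delimiter=',', dialect='excel')  # 1. file object -> reader object
    rows = list(reader)                                        # 2. reader object -> list
    values = rows[1]                                           # 3. line containing the numerical values
    values = [int(v) for v in values]                          # 4. cast each string as an integer
    print(values)
```

```
[6, 5, 8, 10, 13, 16, 18, 18, 15, 13, 8, 7]
```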


This example illustrates the purpose of importing a module. 

By importing a module, we can access additional functions, to handle tasks that we are likely to encounter.

This can avoid writing longer and more complicated code from scratch.  
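
As a further sketch of what the reader output makes easy, the heading row and the values row can be paired up so that a temperature can be looked up by month. The built-in functions `zip` and `dict` are not used elsewhere in this example and are introduced here as an assumption about what is convenient. `io.StringIO` is only a stand-in for the open file `temperature.csv`.

```python
import csv
import io

# Stand-in for the open file (assumption: same contents as temperature.csv)
file = io.StringIO(
    'Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec\n'
    '6,5,8,10,13,16,18,18,15,13,8,7\n'
)

rows = list(csv.reader(file))
months = rows[0]                        # heading row
values = [int(v) for v in rows[1]]      # numerical row, cast to integers

# Pair each month with its temperature
temperatures = dict(zip(months, values))
print(temperatures['Jul'])
```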

Instructions for how to use the functions from a module are given in the documentation for the module. 

`csv` module documentation: https://docs.python.org/3/library/csv.html

Let's look at the documentation for `csv.reader`

```
csv.reader(csvfile, dialect='excel', **fmtparams)
```

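The function takes:

__Positional argument__: `csvfile`
- Must always be given between the parentheses when we call the function

__Optional arguments__: `dialect`, `**fmtparams`
- Identified by a default value after `=` (e.g. `dialect='excel'`) or by the `**` prefix
- `**fmtparams` stands for any number of named formatting options, such as `delimiter=','`
- Only need to be given when calling the function if a value other than the default is required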
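
A minimal sketch of an optional formatting argument in use: the data below is hypothetical and separates values with semicolons, so `delimiter=';'` is passed by name. `io.StringIO` is used only as a stand-in for an open file.

```python
import csv
import io

# Hypothetical data that uses semicolons instead of commas
file = io.StringIO('Jan;Feb;Mar\n6;5;8\n')

# csvfile is given by position; delimiter is given by name
rows = list(csv.reader(file, delimiter=';'))
print(rows)
```

If `delimiter` is left out, the default comma is assumed and each line would be read as a single value.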

We are also told:  
*If csvfile is a file object, it should be opened with newline=''.*

This stops Python from translating newline characters when the file is read, so that the `csv` module can handle line endings itself. This is particularly important where the file was created on a different operating system.

`with open('temperature.csv', newline='') as file:`
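
Putting this together, a complete read might look like the sketch below. So that it runs on its own, it first creates a small file with the hypothetical name `temperature_copy.csv`, containing the same data as `temperature.csv`.

```python
import csv

# Create an example file so the sketch is self-contained
# (hypothetical file name; contents copied from temperature.csv)
with open('temperature_copy.csv', 'w', newline='') as file:
    file.write('Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec\n')
    file.write('6,5,8,10,13,16,18,18,15,13,8,7\n')

# Read the file, opening it with newline='' as the documentation advises
with open('temperature_copy.csv', newline='') as file:
    rows = list(csv.reader(file))

values = [int(v) for v in rows[1]]
mean = sum(values) / len(values)
print(mean)
```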

# Saving data to a CSV file using the `csv` module

To save data to a file, rather than read it, a second argument, the mode `'w'`, should be given when opening the file:

```python
with open('data.csv', 'w') as file:
```

- Creates a new file in the current working directory (or overwrites an existing file with the same name)
- Indicates that the file is to be written to, rather than read
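
One consequence of mode `'w'` is worth seeing in action: if the file already exists, its previous contents are replaced. A minimal sketch, using the hypothetical file name `mode_demo.csv`:

```python
# Hypothetical file name used only for this sketch
with open('mode_demo.csv', 'w') as file:
    file.write('first version\n')

# Opening the same file in 'w' mode again replaces its contents
with open('mode_demo.csv', 'w') as file:
    file.write('second version\n')

with open('mode_demo.csv') as file:
    contents = file.read()

print(contents)
```

Only the text from the second write remains in the file.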

The `writer` function (prefixed with the module name `csv`) is used after opening the file.

The `writer` function returns a writer object.

(From `csv` module documentation https://docs.python.org/3/library/csv.html):<br>
*If csvfile is a file object, it should be opened with newline=''.*

```python
import csv

with open('data.csv', 'w', newline='') as file:
    writer = csv.writer(file)   # convert the file object to a writer object
```

Like all objects, the writer object type has a set of specific properties and behaviours.

The section 'Writer Objects' in the `csv` module documentation lists methods that can be used with writer objects.

Writer objects have the following methods:

`csvwriter.writerow(row)` <br>Write the row parameter to the writer’s file object.<br>A row must be an iterable (e.g. list, tuple) of strings or numbers for writer objects.

```python
import csv

x = list(range(10))

with open('x_data.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(x)          # write the list x as a single row
```

`csvwriter.writerows(rows)`<br>Write all elements in rows (an iterable of row objects as described above)

```python
import csv

x = list(range(10))
y = [i**2 for i in x]

with open('xy_data.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerows([x, y])    # write x as the first row and y as the second
```
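
To check that the data was saved as expected, the file can be read back with `csv.reader`. The sketch below is a self-contained round trip that uses the hypothetical file name `xy_check.csv` so that no existing file is overwritten.

```python
import csv

x = list(range(10))
y = [i**2 for i in x]

# Write two rows to a file (hypothetical file name)
with open('xy_check.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerows([x, y])

# Read the rows back
with open('xy_check.csv', newline='') as file:
    rows = list(csv.reader(file))

# Values come back as strings, so cast them to integers
x_read = [int(v) for v in rows[0]]
y_read = [int(v) for v in rows[1]]
print(x_read == x, y_read == y)
```

Note that numbers are stored as text in the file, which is why the cast to `int` is needed after reading.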

# Summary

- By importing a module, we can access additional functions, to handle tasks that we are likely to encounter.
- This can avoid writing longer and more complicated code from scratch.
- Details of the functions associated with a module can be found in the module documentation.
- The `csv` module provides functions for handling data using CSV files
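- There are many other modules for easy handling of different file types. Some are installed with Python, while others are third-party and must be installed separately, e.g.:
    - `json`: JSON files (installed with Python)
    - `zipfile`: ZIP files (installed with Python)
    - `PyPDF2`: PDF files (third-party)
    - `pyxlsb`, `openpyxl`: Excel files (third-party)
    - `Pillow`: image files (third-party)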

### Need to see some more examples? 
https://python-adv-web-apps.readthedocs.io/en/latest/csv.html
<br>https://www.geeksforgeeks.org/working-csv-files-python/
<br>https://realpython.com/python-csv/

### Want some more advanced information?
https://docs.python.org/3/tutorial/modules.html