[< __INTRO MODULE 5__](./0.Introduction.ipynb)

---

# Index:

- [What is a CSV](#what-is-a-csv)
- [Reading data from a CSV](#reading-data-from-a-csv)
    - [`reader`](#reader)
    - [`DictReader`](#dictreader)
- [Saving data to a CSV](#saving-data-to-a-csv)
    - [`writerow`](#writerow)
    - [Parameters of `writerow`](#parameters-of-writerow)
    - [`DictWriter`](#dictwriter)

---

## What is a CSV

CSV stands for Comma-Separated Values. It is __one of the most popular formats for storing and exchanging information__ between programs: each line is a record, and the values within a record are separated by commas.

> NOTE: Many databases can import and export this format, as can the popular spreadsheet program Excel.

The structure of these files is as follows:
```csv
header1,header2
value1_H1,value1_H2
value2_H1,value2_H2
value3_H1,value3_H2
value4_H1,value4_H2
```

A real example could be the following:
```csv
Name,Phone
mother,222-555-101
father,222-555-102
wife,222-555-103
mother-in-law,222-555-104
```

---

## Reading data from a CSV

Throughout this module we will work with CSV files using Python's built-in `csv` module. In professional projects, however, the third-party `pandas` library is commonly recommended.

The `csv` module can be split into two parts: reading, through `reader`, and writing, through `writer`. Both are covered in detail below.

All of the following examples are based on the file [contacts.csv](./persistance/contacts.csv).

### `reader`

`reader` reads the content of a CSV file in a structured way. Note that it does not open the file itself: it receives an already opened file object (the result of `open`) and reads the data from it.

Besides the file object, `reader` accepts a `delimiter` parameter indicating which character separates the values within the CSV. It defaults to `,`, so it only needs to be passed when the file uses a different separator, such as `;`.

Below is an example of reading the file:

In [1]:
import csv

with open('./persistance/contacts.csv', newline='') as file:
    reader = csv.reader(file, delimiter=',')

#### And the data?

`reader` returns an iterator that yields each line of the CSV file as a list of strings, so we can loop over it to access the data row by row.

Below is an example where we show the data:

In [5]:
with open('./persistance/contacts.csv', newline='') as file:
    # No delimiter is passed here: the default (',') already matches this file.
    reader = csv.reader(file)
    for row in reader:
        print(row)

['Name', 'Phone']
['mother', '222-555-101']
['father', '222-555-102']
['wife', '222-555-103']
['mother-in-law', '222-555-104']
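
The `delimiter` parameter described earlier can be sketched as follows. This is a minimal example, assuming a hypothetical semicolon-separated version of the contacts data held in memory (`raw_csv` is not part of the original files). It also uses `next` to consume the header row before collecting the remaining rows.

```python
import csv
from io import StringIO

# Hypothetical sample data: the same kind of contacts, separated by ';'.
raw_csv = "Name;Phone\nmother;222-555-101\nfather;222-555-102\n"

with StringIO(raw_csv, newline="") as file:
    reader = csv.reader(file, delimiter=";")
    header = next(reader)  # consume the first row (the headers)
    rows = list(reader)    # remaining rows, each one a list of strings

print(header)
print(rows)
```

Without `delimiter=";"`, each line would be returned as a single-element list, because no comma is found to split on.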


### `DictReader`

The module also provides a more organised way of reading CSVs: a class called `DictReader`.

`DictReader` yields one dictionary per row, mapping each header (column name) to the value in that row. This means that, for each row, we can access the value we are interested in by using its header as the key. Rows are plain `dict` objects since Python 3.8; in Python 3.6 and 3.7 they were `OrderedDict` instances.

> NOTE: An `OrderedDict` is a dictionary that preserves the order in which its items were inserted, i.e. if you store a = 1 and then b = 2, iterating over the dictionary in a for loop yields a first and then b. Since Python 3.7, regular `dict` objects preserve insertion order as well.

`DictReader` uses `reader` internally to parse the CSV data, so it assumes that the values are separated by commas. If that is not the case, the delimiter must be passed as a parameter.

Below is an example where the CSV data seen previously is read:

In [24]:

with open('./persistance/contacts.csv', newline='') as file:
    data = csv.DictReader(file)
    for row in data:
        print(row)

{'Name': 'mother', 'Phone': '222-555-101'}
{'Name': 'father', 'Phone': '222-555-102'}
{'Name': 'wife', 'Phone': '222-555-103'}
{'Name': 'mother-in-law', 'Phone': '222-555-104'}
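
Accessing values by header, and passing a non-default delimiter to `DictReader`, might look like the sketch below. It assumes a hypothetical semicolon-separated sample (`raw_csv`) rather than the real `contacts.csv`, and builds a name-to-phone lookup (`phones`, an illustrative name).

```python
import csv
from io import StringIO

# Hypothetical sample data separated by ';' instead of ','.
raw_csv = "Name;Phone\nmother;222-555-101\nfather;222-555-102\n"

with StringIO(raw_csv, newline="") as file:
    data = csv.DictReader(file, delimiter=";")
    # Each row is a dictionary, so values are accessed by column name.
    phones = {row["Name"]: row["Phone"] for row in data}

print(phones["father"])
```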


#### About headers

Note that `DictReader` assumes that the first row of the CSV contains the headers.

If the CSV has no header row, this is a problem: the first data row is consumed as the headers, so that row is lost and the keys of every dictionary are meaningless.

An example is shown below:

In [29]:
from io import StringIO

raw_csv = """mother,222-555-101
father,222-555-102
wife,222-555-103
mother-in-law,222-555-104
"""

with StringIO(raw_csv, newline='') as file:
    data = csv.DictReader(file)
    for row in data:
        print(row)


{'mother': 'father', '222-555-101': '222-555-102'}
{'mother': 'wife', '222-555-101': '222-555-103'}
{'mother': 'mother-in-law', '222-555-101': '222-555-104'}


As we can see, `mother` and `222-555-101` become the keys for every row, which is wrong. Fortunately, `DictReader` accepts a parameter called `fieldnames`, which lets us specify the headers to use for a file that has none.

An example is shown below:

In [30]:
with StringIO(raw_csv, newline="") as file:
    data = csv.DictReader(file, fieldnames=["Name", "Phone"])
    for row in data:
        print(row)



{'Name': 'mother', 'Phone': '222-555-101'}
{'Name': 'father', 'Phone': '222-555-102'}
{'Name': 'wife', 'Phone': '222-555-103'}
{'Name': 'mother-in-law', 'Phone': '222-555-104'}
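
One caveat is worth sketching: `fieldnames` is meant for files without a header row. If the file does have one and `fieldnames` is passed anyway, the header line is returned as an ordinary data row. The example below assumes a small hypothetical sample (`raw_csv`) and also shows the `fieldnames` attribute, which exposes the headers that `DictReader` detected.

```python
import csv
from io import StringIO

# Hypothetical sample data that DOES include a header row.
raw_csv = "Name,Phone\nmother,222-555-101\n"

# Without fieldnames: headers are taken from the first row.
with StringIO(raw_csv, newline="") as file:
    data = csv.DictReader(file)
    detected = data.fieldnames  # headers read from the first line
    rows_auto = list(data)

# With fieldnames: the first row is treated as data.
with StringIO(raw_csv, newline="") as file:
    data = csv.DictReader(file, fieldnames=["Name", "Phone"])
    rows_forced = list(data)

print(detected)
print(rows_auto)
print(rows_forced)
```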



---

## Saving data to a CSV



### `writerow`
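
A minimal sketch of writing rows with `csv.writer` and its `writerow` method. It assumes an in-memory `StringIO` buffer instead of a real file, and the names `buffer` and `content` are illustrative. With a real file, the equivalent would be `open(path, 'w', newline='')`.

```python
import csv
from io import StringIO

# In-memory stand-in for a file opened with open(path, 'w', newline='').
buffer = StringIO(newline="")

writer = csv.writer(buffer)                 # default delimiter is ','
writer.writerow(["Name", "Phone"])          # header row
writer.writerow(["mother", "222-555-101"])  # one data row

content = buffer.getvalue()
print(content)
```

Each call to `writerow` receives one iterable of values and writes it as a single line, terminated by `\r\n` by default.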



### Parameters of `writerow`



### `DictWriter`
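
A minimal sketch of `DictWriter`, the writing counterpart of `DictReader`. It assumes an in-memory `StringIO` buffer instead of a real file, and the names `buffer` and `content` are illustrative. Rows are passed as dictionaries whose keys match the `fieldnames` given to the writer.

```python
import csv
from io import StringIO

# In-memory stand-in for a file opened with open(path, 'w', newline='').
buffer = StringIO(newline="")

writer = csv.DictWriter(buffer, fieldnames=["Name", "Phone"])
writer.writeheader()  # writes the header line: Name,Phone
writer.writerow({"Name": "mother", "Phone": "222-555-101"})

content = buffer.getvalue()
print(content)
```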


---

[< __INTRO MODULE 5__](./Introduction.ipynb)