# 08 - File Operations
---
<sup>[Return Home](../README.md)</sup>

## `8.1` File Opening

Opening a file requires **`open(file_path, mode)`**.
- If `file.txt` is in the _same folder_ as the Jupyter Notebook, the _relative_ path is simply `file.txt`.
- If the notebook's folder contains _another folder_ `black`, and `file.txt` is inside it, the path is `black/file.txt`. A path can go through as many folders as needed.
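
As a minimal sketch of the two layouts above, the snippet below builds both relative paths with `pathlib` from the standard library. A temporary directory stands in for the notebook's folder (an assumption made only so the demo can run anywhere); `black` and `file.txt` are the names used above.

```python
import tempfile
from pathlib import Path

# A temporary directory stands in for the notebook's folder (demo assumption).
with tempfile.TemporaryDirectory() as notebook_dir:
    base = Path(notebook_dir)

    # Case 1: file.txt sits next to the notebook.
    same_folder = base / "file.txt"
    same_folder.write_text("hello\n")

    # Case 2: file.txt sits inside the sub-folder "black".
    (base / "black").mkdir()
    nested = base / "black" / "file.txt"
    nested.write_text("world\n")

    # Paths relative to the notebook's folder.
    rel_same = same_folder.relative_to(base).as_posix()
    rel_nested = nested.relative_to(base).as_posix()
    print(rel_same)    # file.txt
    print(rel_nested)  # black/file.txt

    # open() accepts Path objects as well as plain strings.
    with open(nested, "r") as f:
        nested_content = f.read()
```
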

There are 2 ways to **safely access** a file and work with it:

```python
# Method 1 - Opening and closing manually
f = open("folder_1/file_1.txt", "r")
...
f.close()

# Method 2 - A context manager (closes the file automatically)
with open("folder_2/file_2.txt", "w") as f:
    ...
```
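
To see the difference between the two methods, the sketch below checks `f.closed` after each one. The scratch file path is hypothetical and created in a temporary folder so no real data is touched.

```python
import os
import tempfile

# Hypothetical scratch file for the demo.
path = os.path.join(tempfile.mkdtemp(), "file_1.txt")

# Method 1: close() has to be called explicitly.
# try/finally makes sure it still runs if the write raises an error.
f = open(path, "w")
try:
    f.write("first line\n")
finally:
    f.close()
closed_after_method_1 = f.closed

# Method 2: the with-block closes the file when the block ends.
with open(path, "r") as f:
    text = f.read()
    open_inside_with = not f.closed
closed_after_method_2 = f.closed

print(closed_after_method_1, open_inside_with, closed_after_method_2)
# True True True
```

Method 2 is usually the safer habit, since there is no `close()` call to forget.
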

| File Mode | Description | Related Functions |
| --- | --- | --- |
| **`r`** | **Read** from start of file (the file must exist) | `f.read()`, `f.readline()` |
| **`w`** | **Write** a new file (which **overwrites** that file if it exists) | `f.write()` |
| **`a`** | **Appends** new content to the end **without** overwriting | `f.write()` |
| `r+` | Read & write. The file must exist; writing overwrites existing content from the current position. | `f.read()`, `f.write()`, `f.seek()` |
| `a+` | Read & append. Writes always go to the end, so nothing is overwritten. | `f.read()`, `f.write()`, `f.seek()` |
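
The modes in the table can be compared side by side. This is a small sketch on a hypothetical scratch file (`modes.txt` in a temporary folder), not a recommendation for how to structure real code.

```python
import os
import tempfile

# Hypothetical scratch file for the demo.
path = os.path.join(tempfile.mkdtemp(), "modes.txt")

with open(path, "w") as f:       # creates the file
    f.write("one\n")
with open(path, "w") as f:       # "w" again: the old content is discarded
    f.write("two\n")
with open(path, "a") as f:       # "a": added to the end
    f.write("three\n")
with open(path, "r") as f:
    after_append = f.read()      # "two\nthree\n"

with open(path, "r+") as f:      # starts at position 0 and writes in place
    f.write("TWO")               # replaces the first three characters
    f.seek(0)                    # go back to the start before reading
    after_r_plus = f.read()      # "TWO\nthree\n"

with open(path, "a+") as f:      # writes go to the end regardless of position
    f.write("four\n")
    f.seek(0)
    after_a_plus = f.read()      # "TWO\nthree\nfour\n"

print(repr(after_a_plus))
```
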


## `8.2` Reading and Processing

As long as the file contains newlines (`\n`), these pesky characters will appear everywhere in your data.

| Reading type | Description |
| --- | --- |
| `s = f.read()` | Returns the entire file as one **str** (with all those newlines).<br>Generally **NOT recommended** for large files, since everything is loaded into memory at once |
| `for line in f:` | Iterates over the file one line at a time; each line still ends with `\n`. Personal favourite |
| `line = f.readline()` | Each call returns the next line as a str, or `""` at the end of the file |

```python
def READING_A_FILE():

    # METHOD 1 - f.read()
    with open("1_to_5.txt", "r") as f:
        content = f.read()       # "1\n2\n3\n4\n5\n"

    # splitlines() avoids the trailing '' that split("\n") would leave
    datas = content.splitlines()
    # ['1', '2', '3', '4', '5']


    # METHOD 2 - for line in f
    with open("1_to_5.txt", "r") as f:
        datas = []
        for line in f:
            # line.strip() removes leading/trailing whitespace, incl. "\n" and " "
            datas.append(line.strip())
    # ['1', '2', '3', '4', '5']


    # METHOD 3 - f.readline()
    # Keeps reading lines until EOF, where readline() returns ""
    with open("1_to_5.txt", "r") as f:
        datas = []
        line = f.readline()
        while line:
            datas.append(line.strip())
            line = f.readline()

    # ['1', '2', '3', '4', '5']
```
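
One detail worth checking when using `f.read()`: if the file ends with a newline, `split("\n")` leaves an empty string at the end of the list, while `splitlines()` does not. The sketch below recreates a `1_to_5.txt` in a temporary folder (an assumption, so the demo is runnable on its own) and compares the two.

```python
import os
import tempfile

# Recreate the sample file in a scratch folder (demo assumption).
path = os.path.join(tempfile.mkdtemp(), "1_to_5.txt")
with open(path, "w") as f:
    f.write("1\n2\n3\n4\n5\n")

with open(path, "r") as f:
    content = f.read()

by_split = content.split("\n")
by_splitlines = content.splitlines()

print(by_split)        # ['1', '2', '3', '4', '5', '']
print(by_splitlines)   # ['1', '2', '3', '4', '5']

# Everything read from a text file is a str; convert when numbers are needed.
numbers = [int(x) for x in by_splitlines]
print(sum(numbers))    # 15
```
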

Each line can then be processed with these string methods:

| Function | Description |
| --- | --- |
| `s = s.strip()` | **Cleans up** any leading and trailing **whitespace**, incl. newlines `\n` and spaces ` ` |
| `s = s.strip(XX)` | Removes any leading & trailing characters that appear in `XX` (treated as a set of characters, not a fixed substring) |
| `lst = s.split(XX)` | Returns a **list** of strs **separated** by XX.<br>With no argument, it splits on any run of whitespace |
| `",".join(lst)` | Joins a **list of strs** into one str separated by `,` here.<br>The separator can also be `""`, which simply concatenates the items.|
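
A few of the less obvious behaviours in the table, as a short sketch. The sample strings are made up for illustration.

```python
raw = "--xx-hello-xx--"

# strip(chars) removes any of the given characters from both ends,
# not the exact substring.
only_dashes = raw.strip("-")        # "xx-hello-xx"
dashes_and_x = raw.strip("-x")      # "hello"

# split() with no argument collapses runs of whitespace;
# split(" ") keeps an empty string for every extra space.
default_split = "a  b".split()      # ['a', 'b']
space_split = "a  b".split(" ")     # ['a', '', 'b']

# join() only accepts strings, so convert other types first.
joined = ",".join(["1", "2", "3"])              # "1,2,3"
joined_numbers = ",".join(str(n) for n in [1, 2, 3])   # "1,2,3"
concatenated = "".join(["a", "b", "c"])         # "abc"

print(only_dashes, dashes_and_x, default_split, space_split)
print(joined, joined_numbers, concatenated)
```
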

```python
# Demonstrating strip and split

strip = "    \n   x yz \n \n ".strip()
split_A = "1 2 3 4 5".split()
split_B = "1,2,3,4,5,".split(",")

print(strip)        # x yz
print(split_A)      # ['1', '2', '3', '4', '5']
print(split_B)      # ['1', '2', '3', '4', '5', '']
```

```
x yz
['1', '2', '3', '4', '5']
['1', '2', '3', '4', '5', '']
```


You can also _pre-process_ the data before reading it (like I do). <br> Processed data can be written back with `f.write(line)`, provided the file is open in write or append mode. Note that `f.write()` does not add a newline for you.
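
A minimal sketch of writing processed lines back out. Since `f.write()` writes exactly the string it is given, the `"\n"` has to be added by hand. The file names and sample data here are hypothetical.

```python
import os
import tempfile

# Hypothetical scratch files and sample data for the demo.
folder = tempfile.mkdtemp()
glued_path = os.path.join(folder, "glued.txt")
lines_path = os.path.join(folder, "lines.txt")
processed = ["alpha", "beta", "gamma"]

# Without "\n", everything ends up on a single line.
with open(glued_path, "w") as f:
    for line in processed:
        f.write(line)

# With "\n", each item gets its own line.
with open(lines_path, "w") as f:
    for line in processed:
        f.write(line + "\n")

with open(glued_path) as f:
    glued = f.read()      # "alphabetagamma"
with open(lines_path) as f:
    separate = f.read()   # "alpha\nbeta\ngamma\n"

print(repr(glued))
print(repr(separate))
```
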
<hr>

### Application on Reading, Processing and Writing

Given a "messy.csv" file that resembles a table of data, with headers & data as follows,
1. Print the list of data
2. **Clean up** the file by writing a new file at "cleaned.csv".

```text
 Name     , Class, Score
Xiao Ming,   20J13,89
Xiao Hong  , 20J13,76
Xiao Qiang, 20J12, 56
```

```python
def FULL():
    with open("messy.csv", "r") as f:
        datas = []
        line = f.readline()
        while line:
            # This strip() only trims the ends of the line;
            # spaces around the inner commas are still there
            datas.append(line.strip())
            line = f.readline()

    PROCESSED = []
    for data in datas:
        # Split by "," and strip each item individually
        lst = []
        for item in data.split(","):
            lst.append(item.strip())
        PROCESSED.append(lst)

    # Task 1 - print the list of data (skipping the header row)
    print(PROCESSED[1:])
    # [['Xiao Ming', '20J13', '89'], ['Xiao Hong', '20J13', '76'], ['Xiao Qiang', '20J12', '56']]

    # Task 2 - write to a new file (header row included)
    with open("cleaned.csv", "w") as f:
        for data in PROCESSED:
            f.write(",".join(data) + "\n")


# Not recommended but sure
def FULL_not_recommended():
    s = lambda i: i.strip()
    with open("messy.csv", "r") as f:
        datas = [
            [s(i) for i in s(line).split(",")]
            for line in f
        ]
    with open("cleaned.csv", "w") as f:
        [f.write(",".join(d) + "\n") for d in datas]


# Esoteric enough for you to not even try this.
def FULL_esoteric():
    with open("messy.csv") as f:
        with open("cleaned.csv", "w") as o:
            [o.write(",".join([i.strip() for i in line.strip().split(",")]) + "\n") for line in f]
```
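
For comparison, the same clean-up can be sketched with Python's standard `csv` module instead of manual `split`/`join`. This is an alternative technique, not the approach used above. The snippet first recreates the sample `messy.csv` in a temporary folder (an assumption, so it runs on its own), and `lineterminator="\n"` is passed because `csv.writer` ends rows with `\r\n` by default.

```python
import csv
import os
import tempfile

# Recreate the sample file in a scratch folder (demo assumption).
workdir = tempfile.mkdtemp()
messy_path = os.path.join(workdir, "messy.csv")
cleaned_path = os.path.join(workdir, "cleaned.csv")
with open(messy_path, "w") as f:
    f.write(
        " Name     , Class, Score\n"
        "Xiao Ming,   20J13,89\n"
        "Xiao Hong  , 20J13,76\n"
        "Xiao Qiang, 20J12, 56\n"
    )

# csv.reader splits each line on commas; the cells still need strip().
# newline="" is what the csv documentation recommends when opening files.
with open(messy_path, "r", newline="") as src:
    rows = [[cell.strip() for cell in row] for row in csv.reader(src)]

# Task 1 - print the data rows (skipping the header)
print(rows[1:])

# Task 2 - write the cleaned table, header included
with open(cleaned_path, "w", newline="") as dst:
    csv.writer(dst, lineterminator="\n").writerows(rows)

with open(cleaned_path, "r", newline="") as f:
    cleaned = f.read()
print(cleaned)
```

The `csv` module also handles quoted fields that contain commas, which a plain `split(",")` would break apart.
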