# ICN Programming Course

<p align="center">
    <img width="500" alt="image" src="https://github.com/Lenakeiz/ICN_Programming_Course/blob/main/Images/cog_neuro_logo_blue_png_0.png?raw=true">
</p>

---

# **WEEK 5** - Handling files, exceptions, and a working example with simple plotting

## Working with files

After conducting your experiments, you will typically have your results stored in files. For example:

 - A text file (.txt) may contain timestamps of neuronal spikes, each recorded on a new line.

 - A CSV file (.csv) may contain structured data such as subject IDs, ages, conditions, and behavioural accuracy across trials.

 - Domain-specific formats that depend on the kind of data you have recorded (fMRI, EEG, etc.).

Reading data from files is therefore the first step of any analysis pipeline.

Similarly, when running an experiment you will need to write data to files to save your analysis outputs, so that you or other researchers can check, share, and build on your work.

### Opening a file

The basic way to open a file in Python is using the `open()` function.

- The **first argument** is the file name (e.g. `"example.txt"`).  
  - This can be a **relative path**, which Python resolves from the current working directory (in Jupyter, usually the folder containing the notebook), e.g. just the file name if the file is in that folder.  
  - Or it can be a **full (absolute) path**, for example:  
    - On Windows:  
      ```python
      file = open("C:\\Users\\Andrea\\Documents\\data\\example.txt", "r")
      ```
      *(note the double backslashes: inside a Python string a single `\` starts an escape sequence, so `\\` stands for one backslash)*  
    - On macOS/Linux:  
      ```python
      file = open("/Users/andrea/Documents/data/example.txt", "r")
      ```

- The **second argument** is the *mode*, which tells Python what you want to do with the file.  

#### Common File Modes

| Mode | Meaning | Notes |
|------|---------|-------|
| `'r'` | Read | Default mode. Opens the file for reading. Fails if the file doesn’t exist. |
| `'w'` | Write | Creates a new empty file or overwrites an existing one. |
| `'a'` | Append | Opens a file for writing, but adds content to the end (does not erase existing data). |
| `'x'` | Create | Creates a new file. Fails if the file already exists. |
| `'r+'` | Read and Write | Opens an existing file for both reading and writing. |
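To see how the modes differ in practice, here is a minimal sketch that works on a throwaway file (the name `modes_demo.txt` is hypothetical) inside a temporary folder created with the standard `tempfile` module, so none of the course data files are touched. The `try ... except` construct used at the end is explained in the exception handling section.

```python
import os
import tempfile

# Hypothetical throwaway file inside a fresh temporary folder
folder = tempfile.mkdtemp()
path = os.path.join(folder, "modes_demo.txt")

# 'x' creates the file; it would fail if the file already existed
with open(path, "x") as f:
    f.write("first line\n")

# 'a' adds to the end without erasing what is already there
with open(path, "a") as f:
    f.write("second line\n")

with open(path, "r") as f:
    after_append = f.read()
print(repr(after_append))  # 'first line\nsecond line\n'

# 'w' erases the existing content before writing
with open(path, "w") as f:
    f.write("replaced\n")

with open(path, "r") as f:
    after_write = f.read()
print(repr(after_write))  # 'replaced\n'

# Opening with 'x' again fails because the file now exists
try:
    open(path, "x")
except FileExistsError:
    print("'x' refused to overwrite", os.path.basename(path))
```

The `'w'` step is the one to be careful with: it silently discards whatever the file contained before.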

```python
file = open("data/example.txt", "r")  # "r" means read mode; "/" also works on Windows
content = file.read()
print(content)
file.close()  # Always close the file after use
```

```
10 15 20 35 40
55 60 75 90 120
130 145 160 170 185
200 210 225 240 250
270 280 300 310 325
340 355 370 385 400
420 435 450 465 480
500 520 540 560 580
600 620 640 660 680
700 720 740 760 780
```


### Using a context manager (`with`)

When working with files, it’s important to close them once you are done.

Closing a file frees up system resources and ensures that any changes are properly saved.

If you forget to close a file, you may run into problems such as:  

- **Resource leaks**: the program keeps file handles open longer than needed. This also applies to Jupyter notebooks, where the kernel keeps running between cells, so a file opened in one cell can stay open until you close it.  
- **File locks**: some operating systems (notably Windows) prevent other programs from accessing a file while it is open.

To avoid these issues, Python provides a convenient structure called a **context manager**, used with the `with` statement.  

When you open a file inside a `with` block:  
- The file is automatically **closed** as soon as the block finishes, even if an error occurs inside the block.  
- This makes your code shorter, cleaner, and less error-prone compared to manually calling `file.close()`.  


```python
with open("data/example.txt", "r") as file:
    content = file.read()
    print(content)
```

```
10 15 20 35 40
55 60 75 90 120
130 145 160 170 185
200 210 225 240 250
270 280 300 310 325
340 355 370 385 400
420 435 450 465 480
500 520 540 560 580
600 620 640 660 680
700 720 740 760 780
```
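To check the claim that the file is closed as soon as the block finishes, even when an error occurs, here is a minimal self-contained sketch. It writes a throwaway file (hypothetical name `demo.txt`) into a temporary folder and then inspects the file object's `closed` attribute.

```python
import os
import tempfile

# Self-contained: write a hypothetical demo file into a temporary folder
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as f:
    f.write("10 15 20 35 40\n")

# Outside the with block the file object still exists, but it is closed
print("Closed after the block?", f.closed)  # True

# The file is closed even when an error interrupts the block
try:
    with open(path, "r") as f:
        first_line = f.readline()
        raise RuntimeError("simulated problem while processing")
except RuntimeError as err:
    print("Caught:", err)

print("Closed after the error?", f.closed)  # True
```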


### Creating the path

When working with files, it’s important to build paths in a way that works on **any operating system**.  
As mentioned, Windows uses backslashes (`\`) while macOS/Linux use forward slashes (`/`).

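If you type the path manually with one particular separator, your code may fail when the notebook is run on a different operating system.  

To stay operating-system independent, use `os.path.join()` from the `os` module, which joins the path components with the correct separator for the system the code is running on: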

```python
import os

path = os.path.join("data", "example.txt")

with open(path, "r") as file:
    content = file.read()
    print(content)
```

```
10 15 20 35 40
55 60 75 90 120
130 145 160 170 185
200 210 225 240 250
270 280 300 310 325
340 355 370 385 400
420 435 450 465 480
500 520 540 560 580
600 620 640 660 680
700 720 740 760 780
```
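To see what `os.path.join()` actually produces, the short sketch below prints the separator of the current system (`os.sep`) and a couple of joined paths. The folder names `participant_01` and `eeg` are hypothetical, used only to show that nested folders work the same way.

```python
import os

path = os.path.join("data", "example.txt")

# os.sep is the separator of the current operating system:
# "\\" on Windows, "/" on macOS and Linux
print(repr(os.sep))
print(path)  # data\example.txt on Windows, data/example.txt on macOS/Linux

# join() accepts any number of parts, so nested folders work the same way
# ("participant_01" and "eeg" are hypothetical folder names)
nested = os.path.join("data", "participant_01", "eeg", "example.txt")
print(nested)
```

The standard library's `pathlib` module offers an alternative with the same goal: `Path("data") / "example.txt"` builds the path with the `/` operator and the correct separator for the system.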


### Reading line by line

Sometimes you need to read stored data line by line, for example to preprocess it.
Reading line by line lets you parse and clean each row safely before storing it.

You can do this with a `for` loop over the file object: each iteration returns the next line as a string, including its trailing newline character (`\n`).

```python
import os
import numpy as np

file_path = os.path.join("data", "example.txt")

# Start with empty arrays
col1 = np.array([], dtype=int)
col2 = np.array([], dtype=int)
col3 = np.array([], dtype=int)
col4 = np.array([], dtype=int)
col5 = np.array([], dtype=int)

with open(file_path, "r") as f:
    for line in f:
        # removing leading/trailing whitespace and splitting by spaces
        # split() returns a list of strings
        parts = line.strip().split()
        if not parts:
            continue

        a, b, c, d, e = (int(x) for x in parts)

        # Append each value to the corresponding NumPy array
        col1 = np.append(col1, a)
        col2 = np.append(col2, b)
        col3 = np.append(col3, c)
        col4 = np.append(col4, d)
        col5 = np.append(col5, e)

# Quick check
print(col1[:5])
print(col2[:5])
print("Shapes:", col1.shape, col2.shape, col3.shape, col4.shape, col5.shape)
```

```
[ 10  55 130 200 270]
[ 15  60 145 210 280]
Shapes: (10,) (10,) (10,) (10,) (10,)
```
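`np.append` builds a brand-new array on every call, which becomes slow for long recordings. A common alternative, sketched below, is to collect the parsed rows in a plain Python list and convert to NumPy once at the end; NumPy's `np.loadtxt` can also read a simple whitespace-separated file like this in a single call. To stay self-contained, the sketch writes its own copy of the first three rows of `example.txt` to a temporary file (hypothetical name `example_copy.txt`).

```python
import os
import tempfile
import numpy as np

# Self-contained stand-in for data/example.txt (first three rows only)
path = os.path.join(tempfile.mkdtemp(), "example_copy.txt")
with open(path, "w") as f:
    f.write("10 15 20 35 40\n55 60 75 90 120\n130 145 160 170 185\n")

# Option 1: parse line by line into a list, convert to an array once
rows = []
with open(path, "r") as f:
    for line in f:
        parts = line.strip().split()
        if not parts:          # skip empty lines
            continue
        rows.append([int(x) for x in parts])

table = np.array(rows)         # shape (n_lines, 5)
col1 = table[:, 0]             # first column, like col1 above
print(table.shape)             # (3, 5)
print(col1)                    # [ 10  55 130]

# Option 2: let NumPy parse the whole file
table2 = np.loadtxt(path, dtype=int)
print(np.array_equal(table, table2))  # True
```

Keeping the data as one 2D array also means each column is just a slice (`table[:, k]`) rather than a separate variable.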


## Writing to a file

In real experiments you don’t just analyse data once and forget about it.

You often need to:
- **Save results** so you can reuse them later without re-running the entire analysis.  
- **Share data** with collaborators, who may use programming languages different from yours (MATLAB, R, etc.).  
- **Document your work**: a file output is a permanent record of what your code produced at that time.  
- **Feed into pipelines**: later steps in your research may take your processed data as input. 

<p align="center">
    <img width="650" alt="image" src="https://github.com/Lenakeiz/ICN_Programming_Course/blob/main/week_5/images/data_files_comics.png?raw=true">
</p>

Let’s say you have a 2D NumPy array with several rows (trials) and several measurements for each trial.

You can save a text file so that each row becomes one line, and values are separated by spaces.

```python
import numpy as np
import os

# get 50 random integers from 0 to 99 (the upper bound 100 is excluded) and put them in a 10x5 array
np.random.seed(42)  # for reproducibility
data = np.random.randint(0, 100, 50).reshape(10, 5)
print(data)

filepath = os.path.join("data", "output.txt")

with open(filepath, "w") as f:
    # the outer loop returns the entire row as a NumPy array
    for row in data:
        line = ""
        for x in row:
            line += str(x) + " "
        line = line.strip()  # remove the trailing space
        f.write(line + "\n")


# The same line can be built in a single expression:
# 1. the outer loop returns the entire row as a NumPy array
# 2. a generator expression converts each element of the row to a string
# 3. " ".join() pieces the strings together, separated by single spaces
# Note the "a" (append) mode: these rows are added after the ones written above,
# so output.txt ends up containing the data twice.
with open(filepath, "a") as f:
    for row in data:
        line = " ".join(str(x) for x in row)
        f.write(line + "\n")
```

```
[[51 92 14 71 60]
 [20 82 86 74 74]
 [87 99 23  2 21]
 [52  1 87 29 37]
 [ 1 63 59 20 32]
 [75 57 21 88 48]
 [90 58 41 91 59]
 [79 14 61 61 46]
 [61 50 54 63  2]
 [50  6 20 72 38]]
```
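NumPy can also write a 2D array in one call with `np.savetxt`. Passing `fmt="%d"` writes the values as integers and `delimiter=" "` (the default) separates them by single spaces, which gives the same one-row-per-line layout as the loops above. A minimal sketch, writing to a temporary file (hypothetical name `output_savetxt.txt`) rather than `data/output.txt`:

```python
import os
import tempfile
import numpy as np

np.random.seed(42)
data = np.random.randint(0, 100, 50).reshape(10, 5)

path = os.path.join(tempfile.mkdtemp(), "output_savetxt.txt")

# One line per row, values separated by a single space, written as integers
np.savetxt(path, data, fmt="%d", delimiter=" ")

with open(path, "r") as f:
    first_line = f.readline().strip()
print(first_line)  # 51 92 14 71 60

# Reading it back gives the original array
reloaded = np.loadtxt(path, dtype=int)
print(np.array_equal(data, reloaded))  # True
```

The manual loop is still worth knowing, since it gives you full control over each line (for example, adding a trial label at the start).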


## Working with `.csv` files

One of the most common file formats for structured data is the **CSV file** (Comma-Separated Values).

### Why CSV?

- **Human-readable**: you can open a CSV file in a simple text editor and still understand the data, since values are separated by commas.  
- **Universal format**: almost every software (Excel, MATLAB, R, SPSS, etc.) can read and export CSV files, which makes them a great choice for collaboration.  
- **Simple structure**: each line is one “record” (like one subject, one trial, or one measurement), and each value is separated by a delimiter (commas by default, but you can define custom delimiters).

A CSV file is usually structured as follows:

- The first row is usually a **header** describing the columns. This is optional.
- Each following row corresponds to one data entry for the columns described.
- Values within a row are separated by a **delimiter**, which is usually a comma but can also be a tab, a semicolon, etc.

In Python you can work with `.csv` files using the `csv` module.

```python
import csv
import os

datafile = os.path.join("data", "drug_study.csv")

# newline="" lets the csv module handle line endings itself, as its documentation recommends
with open(datafile, "r", newline="") as file:
    reader = csv.reader(file)
    # csv.reader returns an iterator, so next() retrieves the next row: here, the header
    # (an iterator implements the iterator protocol, i.e. it has a __next__() method)
    header = next(reader)
    print("Header:", header)

    for row in reader:
        print(row)  # each row is a list of strings
```
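Both `csv.reader` and `csv.writer` accept a `delimiter` argument, so the same code handles, for example, semicolon-separated files (common in locales that use the comma as a decimal separator). The sketch below writes a tiny table with `csv.writer` and reads it back with `csv.reader`; the file name, column names and values are hypothetical, made up for illustration.

```python
import csv
import os
import tempfile

# Hypothetical example data: header row followed by two data rows
rows_out = [
    ["id", "age", "condition", "accuracy"],
    ["s01", "24", "drug", "0.84"],
    ["s02", "31", "placebo", "0.61"],
]

path = os.path.join(tempfile.mkdtemp(), "semicolon_example.csv")

# newline="" is what the csv documentation recommends when opening CSV files
with open(path, "w", newline="") as f:
    writer = csv.writer(f, delimiter=";")
    writer.writerows(rows_out)

with open(path, "r", newline="") as f:
    raw_first_line = f.readline().strip()
print(raw_first_line)  # id;age;condition;accuracy

with open(path, "r", newline="") as f:
    reader = csv.reader(f, delimiter=";")
    rows_in = list(reader)
print(rows_in[1])  # ['s01', '24', 'drug', '0.84']
```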


### Reading a CSV file into a dictionary

Another way to import CSV data in Python is to read each row as a **dictionary**, which is what `csv.DictReader` does:

- The **keys** come from the header row (e.g., `id`, `age`, `condition`, `accuracy`).  
- The **values** are the data entries for that row.  

Using dictionaries can be especially useful when you want to work with **specific columns**.  
For example:  
- If you want to access only the `accuracy` column for analysis, and you have collected the values into one list per column (as the code below does), you can access them directly with `data["accuracy"]`.  
- If your data represents a **time series**, you can easily focus on the column that contains timestamps or signal values, without worrying about the position of that column in the file.

```python
import csv
import os

datafile = os.path.join("data", "drug_study.csv")

# define an empty dictionary: it will hold one list of values per column
data = {}

with open(datafile, "r", newline="") as file:
    reader = csv.DictReader(file)
    for row in reader:  # each row is a dict: {column name: value}
        for key, value in row.items():
            if key not in data:
                data[key] = []
            data[key].append(value)

print(data["accuracy"][:5])  # print first 5 accuracy values (read as strings)
```

```
['0.84', '0.8', '0.61', '0.63', '0.93']
```
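As the output shows, `csv.DictReader` returns every value as a string (`'0.84'`, not `0.84`), so numeric columns have to be converted before any calculation. A minimal sketch, using a small hypothetical CSV held in memory with `io.StringIO` in place of `drug_study.csv` (whose full contents are not shown here):

```python
import csv
import io

# Hypothetical stand-in for a file like drug_study.csv
csv_text = """id,age,condition,accuracy
s01,24,drug,0.84
s02,31,placebo,0.80
s03,28,drug,0.61
"""

data = {}
reader = csv.DictReader(io.StringIO(csv_text))
for row in reader:
    for key, value in row.items():
        # setdefault does the same job as the "if key not in data" check above
        data.setdefault(key, []).append(value)

# Convert the accuracy strings to floats before doing maths with them
accuracy = [float(v) for v in data["accuracy"]]
mean_accuracy = sum(accuracy) / len(accuracy)
print(data["accuracy"])         # ['0.84', '0.80', '0.61']
print(round(mean_accuracy, 2))  # 0.75
```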


## Exception handling

When you run a program, things don’t always go as planned.  
- Maybe a file you want to open is not there.  
- Maybe a user types letters when your program expects a number.  
- Maybe you try to divide by zero.  

In all these cases, Python will raise an **exception** indicating that something unexpected happened.  
If you don’t handle it, your program will **crash** and stop running.

Fortunately, you can handle exceptions as part of your program’s **control flow**.  
This is especially useful when the error only occurs in specific cases.

For example, imagine you are analysing EEG data from multiple participants and focusing on selected channels.  
Some participants might have poor recordings, so those channels contain no values or invalid values.  
Trying to process them could raise exceptions, but instead of letting your program crash, you can catch the errors, skip the bad data, and continue processing the valid participants.

In Python you can handle exceptions using the `try ... except ...` statement:

```python
try:
    # Code that might cause an error
    x = int("hello")  # this will fail
except ValueError:
    # What to do if that type of error happens
    print("That was not a number!")
```

```
That was not a number!
```


```python
try:
    result = 10 / 0
except ZeroDivisionError:
    print("You tried to divide by zero, which is not allowed.")
```
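The first case listed at the start of this section, a missing file, raises `FileNotFoundError`. A minimal sketch that tries a list of files and carries on when one is absent; the file names are hypothetical and the folder is a fresh temporary one, so only the file the sketch creates exists.

```python
import os
import tempfile

folder = tempfile.mkdtemp()

# Create one of the (hypothetical) participant files; the other is missing
with open(os.path.join(folder, "participant_01.txt"), "w") as f:
    f.write("10 15 20\n")

loaded = {}
missing = []
for name in ["participant_01.txt", "participant_02.txt"]:
    path = os.path.join(folder, name)
    try:
        with open(path, "r") as f:
            loaded[name] = f.read()
    except FileNotFoundError:
        missing.append(name)
        print("File not found, skipping:", name)

print("Loaded:", list(loaded))  # Loaded: ['participant_01.txt']
```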

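Sometimes you may not know exactly what kind of error could occur.  
You can catch the general `Exception` class, which covers virtually every error your code is likely to raise. Note that the `try` block stops at the first error: below, `int("hello")` fails before the division by zero is ever attempted.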

```python
try:
    x = int("hello") / 0
except Exception as e:
    print("Something went wrong:", e)
```

```
Something went wrong: invalid literal for int() with base 10: 'hello'
```
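Catching `Exception` is convenient, but it can also hide bugs you did not anticipate. When you know which errors are possible, you can list several `except` clauses; Python runs the first one whose type matches the error. A sketch with a hypothetical `safe_ratio` helper:

```python
def safe_ratio(numerator_text, denominator_text):
    """Hypothetical helper: divide two numbers given as strings."""
    try:
        return float(numerator_text) / float(denominator_text)
    except ValueError:
        print("Not a number:", numerator_text, denominator_text)
    except ZeroDivisionError:
        print("Denominator is zero")
    return None


print(safe_ratio("10", "4"))   # 2.5
print(safe_ratio("ten", "4"))  # prints "Not a number: ten 4", then None
print(safe_ratio("10", "0"))   # prints "Denominator is zero", then None
```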


Exception handling is especially useful inside loops: an invalid entry can be skipped while the rest of the data is still processed.

```python
values = ["10", "20", "hello", "30", "NaN", "40"]

for v in values:
    try:
        number = int(v)
        print("Processed:", number)
    except ValueError:
        print("Skipping invalid value:", v)
```

```
Processed: 10
Processed: 20
Skipping invalid value: hello
Processed: 30
Skipping invalid value: NaN
Processed: 40
```
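The same pattern scales up to the EEG scenario described earlier. In the sketch below, each participant has a list of channel values stored as strings; a participant whose recording contains an invalid entry, or no entries at all, is skipped, and the mean is computed only for the valid participants. The participant IDs and values are hypothetical, made up for illustration.

```python
# Hypothetical channel readings per participant, stored as strings
recordings = {
    "P01": ["1.2", "0.8", "1.0"],
    "P02": ["0.9", "bad", "1.1"],   # corrupted value
    "P03": [],                      # empty recording
    "P04": ["1.5", "1.3", "1.4"],
}

channel_means = {}
skipped = []

for participant, values in recordings.items():
    try:
        numbers = [float(v) for v in values]   # ValueError on "bad"
        mean = sum(numbers) / len(numbers)     # ZeroDivisionError if empty
    except (ValueError, ZeroDivisionError) as err:
        skipped.append(participant)
        print(f"Skipping {participant}: {err}")
        continue
    channel_means[participant] = mean

print("Valid participants:", list(channel_means))  # ['P01', 'P04']
```

A single `except` clause can take a tuple of exception types, as here. Note that `float("NaN")` succeeds (it returns a not-a-number float), so the string `"NaN"` would not be caught by this code, unlike with `int()` in the previous cell.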
