# Lesson 3: File Handling - Part 1

## File I/O Operations


Python provides built-in functions for reading from and writing to files. Files can be opened in different modes:
- **`r`**: Read mode (default).
- **`w`**: Write mode (overwrites the file if it exists, creates a new file if it doesn't).
- **`a`**: Append mode (writes to the end of the file).
- **`b`**: Binary mode (e.g., `rb`, `wb` for reading and writing binary files).
- **`+`**: Update mode (e.g., `r+` for reading and writing).
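
The modes listed above can be compared in a short sketch. The file name `modes_demo.txt` is hypothetical and used only for illustration; the sketch assumes the current directory is writable.

```python
# 'modes_demo.txt' is a hypothetical file name used only for this illustration.
path = 'modes_demo.txt'

# 'w' creates the file, or empties it first if it already exists
f = open(path, 'w')
f.write('first line\n')
f.close()

# 'a' keeps the existing content and writes at the end of the file
f = open(path, 'a')
f.write('second line\n')
f.close()

# 'r' (the default) opens the file for reading text
f = open(path, 'r')
text = f.read()
f.close()

# 'rb' reads the same file as raw bytes instead of text
f = open(path, 'rb')
raw = f.read()
f.close()

print(repr(text))
print(type(text).__name__, type(raw).__name__)
```

Opening the file with `'w'` a second time instead of `'a'` would discard `first line`, which is the most common way to lose data by accident.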

### Opening, Reading, Writing, and Closing Files
- Use the `open()` function to open a file.
- Use `read()`, `readline()`, or `readlines()` to read from a file.
- Use `write()` or `writelines()` to write to a file.
- Always close the file using `close()` to free up system resources.

**Example**:

```python
file = open('example.txt', 'w')
file.write('Hello, world!')
file.close()
```
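
The three reading methods differ in how much they return. The following is a minimal sketch, assuming a hypothetical file `read_demo.txt` that is created here just for the demonstration. Note that `writelines()` does not add newline characters, so they are included in the strings.

```python
# 'read_demo.txt' is a hypothetical file name used only for this illustration.
file = open('read_demo.txt', 'w')
file.writelines(['alpha\n', 'beta\n', 'gamma\n'])  # newlines must be supplied explicitly
file.close()

file = open('read_demo.txt', 'r')
first_line = file.readline()        # one line, including its trailing newline
remaining_lines = file.readlines()  # all lines that are left, as a list of strings
leftover = file.read()              # nothing is left, so this is an empty string
file.close()

print(repr(first_line))
print(remaining_lines)
print(repr(leftover))
```

Each call continues from where the previous one stopped, because the file object keeps track of its current position.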
            

In [None]:
# Writing to a file
file = open('sample.txt', 'w')
file.write('This is a sample text file.')
file.close()

In [None]:
# Reading from a file
# ...



In [None]:
# Appending to a file
# ...


In [None]:
# Reading the updated file
file = open('sample.txt', 'r')
updated_content = file.read()
print("Updated file content:", updated_content)
file.close()

## Using the `with` Statement for Efficient File Handling


The `with` statement provides a way to automatically close the file when you are done with it, even if an error occurs during file operations.

**Example**:
```python
with open('example.txt', 'r') as file:
   content = file.read()
   print(content)
```
- You don't need to call `file.close()` when using the `with` statement; it is handled automatically.
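
To check the claim about errors, the sketch below deliberately triggers an exception inside a `with` block and then inspects the file object's `closed` attribute. The file name `with_demo.txt` is hypothetical.

```python
# 'with_demo.txt' is a hypothetical file name used only for this illustration.
path = 'with_demo.txt'

with open(path, 'w') as file:
    file.write('not a number\n')

try:
    with open(path, 'r') as file:
        value = int(file.read())  # raises ValueError: the text is not an integer
except ValueError:
    print('Conversion failed, but the file was still closed.')

# The file object still exists after the block, but it is closed
print(file.closed)
```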
            

In [None]:
# Using the with statement for file operations
with open('sample.txt', 'r') as file:
    # ...

In [None]:
# Writing with the with statement
with open('sample.txt', 'w') as file:
    # ...

## Handling Large Files Efficiently


When dealing with large files, it's important to avoid loading the entire file into memory at once. Instead, read the file line-by-line or in chunks.

### Reading Files Line-by-Line
Use a `for` loop to iterate through the file's lines:

```python
with open('large_file.txt', 'r') as file:
    for line in file:
        print(line.strip())
```

### Reading in Chunks
Read a limited amount of data at a time using `read(size)`. In text mode, `size` counts characters; in binary mode (e.g., `rb`), it counts bytes:

```python
with open('large_file.txt', 'r') as file:
   chunk = file.read(1024)  # Read up to 1024 characters (text mode)
   while chunk:
      print(chunk)
      chunk = file.read(1024)
```
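
When exact byte counts matter, the file can be opened in binary mode, where `read(size)` returns at most `size` bytes. The following is a minimal sketch, assuming a hypothetical file `chunk_demo.bin` that is generated here so the example is self-contained.

```python
# 'chunk_demo.bin' is a hypothetical file name used only for this illustration.
path = 'chunk_demo.bin'

# Create a 4000-byte test file
with open(path, 'wb') as f:
    f.write(b'ACGT' * 1000)

chunk_size = 1024
total_bytes = 0
chunk_count = 0

with open(path, 'rb') as f:
    while True:
        chunk = f.read(chunk_size)  # at most chunk_size bytes
        if not chunk:               # an empty bytes object signals end of file
            break
        total_bytes += len(chunk)
        chunk_count += 1

print(total_bytes, chunk_count)
```

Only one chunk is held in memory at a time, so the same loop works for files far larger than the available RAM. The last chunk is usually shorter than `chunk_size`.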
            

In [None]:
# Simulating reading a large file line-by-line
with open('sample.txt', 'r') as file:
    # ...

In [None]:
# Simulating reading a large file in chunks
with open('sample.txt', 'r') as file:
    chunk = file.read(10)  # Read up to 10 characters at a time
    while chunk:
        print("Chunk:", chunk)
        chunk = file.read(10)

## Exercises: File I/O Operations

**Create a program that writes the numbers 1 to 10 to a file, each number on a new line**

In [None]:
with open('numbers.txt', 'w') as file:
   # ...
   

**Modify the above program to append numbers 11 to 20 to the same file using the `with` statement**

In [None]:
# ...


**Write a program that reads the contents of the file named `numbers.txt` line-by-line and prints each line**

In [None]:
with open('numbers.txt', 'r') as file:
   for line in file:
      print(line.strip())

**Calculate the frequency of occurrence of each base for a list of strings read from the file `testdna2.txt`, using dictionaries**

In [None]:
# ... copy and paste the code from the previous lab, and adapt it!
# ...


**Make a plot with the results!**

In [None]:
import matplotlib.pyplot as plt

# Extract keys and values
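# NOTE: `base_counts` is a placeholder name; use the dictionary you built in the previous cell.
# (Calling dict.keys() on the built-in `dict` type itself raises a TypeError.)
keys = list(base_counts.keys())
values = list(base_counts.values())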

# Plotting
# ...

# Adding labels and title
# ...

# Display the plot
plt.show()

**Calculate (A+T) / (C+G) ratio per sequence after loading multiple sequences from a file**

* The purpose of this exercise is to allow you to become familiar with the concepts of file parsing, loading file contents into convenient data structures and performing operations on loaded data in a convenient way.
* This notebook can also serve as a guide, with hints for solving the PWM on-site exercise (EX4).

In [None]:
sequences_file = 'gata3.txt'

# initiate an empty list to hold the information that we will load from the file
sequences_as_list_of_lists = list()

# ...

# open a connection to the file and attach it to a file handle

# use the file handle as a proxy to loop through the lines of the file one-by-one

# remove the newline character from the end of the string

# 2 ways to split the letters of a string to a list

# dynamically update the 2-d array

# detach the file handle from the file and close the connection to the file

print(sequences_as_list_of_lists)

* The cell above essentially creates a 2-dimensional array/matrix
* What if our objective is to go through the elements of such an array one-by-one, and perform some calculations? How can we do it?
* For example, how can we calculate the (A+T)/(G+C) ratio for each row in this 2-D array?

In [None]:
# ...

# Loop through each element of the list using indices (each element is another list)
    
# initialize 2 variables that will hold the AT and GC counts
    
# Loop through each element of the nested list

# check if the element is equal to 'A'

# check if the element is equal to 'T'

# check if the element is equal to 'G'

# if all 3 conditions above are false then we expect 'C'
        
## Are these if-elif-else conditions safe enough?

# print the calculated ratios

## How can we use print in a prettier form?
    
## Is the calculation of the ratio safe enough?