# IO and Context

IO is what we call reading from and writing to files. A context is a way of managing what is available to each code block. Objectives:

- Open a file
- Read data from a file
- Close a file
- Use a context to open a file then close it after reading.
- Create and write to a file.

Let's start again with an example:

In [None]:
# Open a file for reading
f = open('lorem_ipsum.txt', 'r') # 'r' is the default mode
file_contents = f.read()         # Read the entire file
print(file_contents)             # Print the file contents
f.close()                        # Close the file

That printed 5 paragraphs of 'lorem ipsum'*. What exactly did it do?

`open` is a Python function that here takes two arguments: the `path` (a fancy word for location) of a file, and the `mode`, which is the way in which you want to open the file.

Here our path was just the name of the file, `lorem_ipsum.txt`, as the file is in the same folder this notebook is in. We used the mode `r`, which means read.

We then assign the result of `open` to the variable `f`. The result of open is a `stream`**. What exactly this is isn't that important right now, but what we can do with it is important.

`f.read()` returns the entire content of the file as a string, and we assign this string to a variable, `file_contents`. Then we print that to output with `print(file_contents)`.

Lastly, and this is very important, we call `f.close()`. This closes the file and deactivates the stream. If we run any other commands on the stream, Python will raise an error saying that the file has been closed.
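
As a minimal sketch of that last point, the cell below reads from a stream after closing it. The file `demo.txt` and the temporary folder are hypothetical stand-ins, created here only so the example does not depend on `lorem_ipsum.txt`.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
f = open(path, "w")
f.write("hello\n")
f.close()

f = open(path, "r")
was_closed_before = f.closed   # The stream is still open at this point
f.read()
f.close()
is_closed_after = f.closed     # The stream has now been closed

# Any further use of the closed stream raises a ValueError
try:
    f.read()
except ValueError as err:
    error_message = str(err)
    print("Reading failed:", error_message)
```

The `closed` attribute is a quick way to check whether a stream can still be used.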


## Why do all this just to open a file?

Our file may be very large; in fact, it may be gigabytes in size. We want to be able to open it, search it, and close it. During the time it is open it will be using our computer's resources. We want to extract the information we need and close it to regain these resources. Thus `f.read()` is generally only used for smaller files, as otherwise we simply end up with a massive string stored in memory.

Once we have looked at contexts we will look into how to read files more selectively in memory efficient ways.


<details>
<summary>*What is lorem ipsum</summary>

'Lorem ipsum' is placeholder text generated when you need lots of words with very little meaning! [Lorem Ipsum Generator](https://loremipsum.io/)

</details>

<details>
<summary>**What is a stream</summary>

A file stream is a sequence of characters in a program that works with files. A stream provides a connection to an external resource, like a file, and allows the program to read from it or write to it. [More details](https://bito.ai/resources/python-file-stream-python-explained/) 

</details>


### Contexts

Contexts are ways to automate and protect file IO (or file handling, as it is sometimes called). Let's take a look at an example of using a file context:

In [None]:
with open('lorem_ipsum.txt', 'r') as f:
    file_contents = f.read()
    print(file_contents)

This code produces the same result as the code above, but it uses a context. The context here is created using the `with` keyword, which starts a block with the context variable `f`.
Much like a loop, we use indentation to specify what is inside the block. The context makes `f` available to the code in this block, then *automatically* closes `f` when the block is over.
This pattern is much safer than the first pattern, as it prevents the user from accidentally leaving files open and wasting their computer's resources!
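
To make the automatic closing concrete, here is a rough sketch of what the context does for us, written out by hand with `try`/`finally`. The file `demo.txt` is a hypothetical scratch file, and the `RuntimeError` is raised on purpose to show that the file is closed even when the block fails.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as f:
    f.write("first line\n")

# Roughly what a file context does: close the file no matter what happens
f = open(path, "r")
try:
    manual_contents = f.read()
finally:
    f.close()

# The context closes the file even if the block raises an error
try:
    with open(path, "r") as g:
        raise RuntimeError("something went wrong while reading")
except RuntimeError:
    pass

closed_after_error = g.closed
print(closed_after_error)
```

The `with` form is shorter and harder to get wrong, which is why it is the pattern used in the rest of this section.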

## Using the content of a file

Often we will want to work with the content of a file without assigning the entire content to a variable. For this we can use the properties of the `stream` to our advantage. As a side note, opening a file in this way has its uses, especially via the [stream methods](https://www.w3schools.com/python/python_ref_file.asp), of which `f.read()` is one. However, in most cases there will be a simpler way of loading your data, but that is out of scope for this course*.

The following code opens the file as before, but instead of using `f.read()`, which transfers the entire content of the file to a variable, we use a `for` loop with our stream. This should give you a hint as to a property of `f`: it is also an iterable that, when looped over, returns one line of the file at a time. For a file with a great many lines, such as a data file from an experiment, this is much more memory efficient*. Within the for loop we can summarize the data and print the summary.


<details>
<summary>*Other ways of opening files that are better and more common</summary>

This is the built-in way to load a file in Python. Most data processing libraries, such as `pandas`, `numpy`, and `scipy`, or domain-specific libraries such as `astropy`, `chempy`, and `biopython`, contain functions that load and process common file types.
The library methods are almost always superior for boilerplate or common tasks. This iteration method is, however, useful for cleaning or processing esoteric data.

</details>

In [None]:
# Loop over the lines in a file and print the word and character count
with open('lorem_ipsum.txt', 'r') as f:
    for line_num, line_content in enumerate(f):
        words = line_content.split() # Split on whitespace, this returns a list of the words
        chars = "".join(words) # Join the words back together, leaving out all spaces and newlines
        print(f"Line {line_num+1} has {len(words)} words and {len(chars)} characters excluding spaces and newlines.")


This is great, but what if I want to save this output? For this we will need to open a file for writing. To open a file for writing, we need to use another file mode. 

| Mode   | Keyword | Description                                                                    |
|--------|---------|--------------------------------------------------------------------------------|
| Read   | 'r'     | Reads from a file, raises an error if it does not exist.                       |
| Append | 'a'     | Writes to the end of a file. Creates the file if required.                     |
| Write  | 'w'     | Writes to a file, erasing any existing content. Creates the file if required.  |
| Create | 'x'     | Creates the file and opens it for writing, raises an error if it exists.       |

These are all the opening modes*. The modes in this table cannot be combined with each other (`'wx'` raises a `ValueError`). To make the program raise an error if the file already exists, use create mode, `'x'`, instead of `'w'`. This is very useful, as opening with `'w'` will delete the content of the file if it does exist.
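
The difference between `'w'` and `'x'` on an existing file can be sketched as follows. The file `notes.txt` is a hypothetical scratch file in a temporary folder, so nothing of yours is overwritten.

```python
import os
import tempfile

# Hypothetical scratch file, used so the example does not touch your own files
path = os.path.join(tempfile.mkdtemp(), "notes.txt")

with open(path, "w") as f:
    f.write("original content\n")

# 'x' refuses to open a file that already exists
try:
    with open(path, "x") as f:
        f.write("this is never written\n")
except FileExistsError:
    create_failed = True
else:
    create_failed = False

# Asking for two of the table's modes at once is rejected with a ValueError
try:
    open(path, "wx")
except ValueError:
    combined_rejected = True
else:
    combined_rejected = False

with open(path, "r") as f:
    still_there = f.read()

# 'w' silently empties the existing file as soon as it is opened
with open(path, "w") as f:
    pass

with open(path, "r") as f:
    after_w = f.read()

print(create_failed, combined_rejected, repr(still_there), repr(after_w))
```

Reaching for `'x'` when a file is not supposed to exist yet turns an accidental overwrite into a visible error.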


So to create and open a file we use `open(filepath, 'w')`. When we have a file stream in write mode, we can use the `write` or `writelines` methods. `write` writes a single string to the file, while `writelines` writes each string in a list to the file, one after another, without adding newlines between them.
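
A short sketch of the two methods side by side, assuming a hypothetical scratch file `words.txt`. It shows that `writelines` does not add line endings for you, so each string needs its own `'\n'` if you want one item per line.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration
path = os.path.join(tempfile.mkdtemp(), "words.txt")
lines = ["alpha", "beta", "gamma"]

# writelines writes the strings exactly as given, with nothing in between
with open(path, "w") as f:
    f.write("header\n")
    f.writelines(lines)

with open(path, "r") as f:
    no_newlines = f.read()

# Adding '\n' to each string puts every item on its own line
with open(path, "w") as f:
    f.write("header\n")
    f.writelines(line + "\n" for line in lines)

with open(path, "r") as f:
    with_newlines = f.read()

print(repr(no_newlines))
print(repr(with_newlines))
```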


<details>
<summary>*Format Modes</summary>

There are also format modes that can be combined with the modes above, e.g. `'rt'` is read text.

| Mode   | Keyword | Description                                   |
|--------|---------|-----------------------------------------------|
| Text   | 't'     | This is default, the file is standard text.   |
| Binary | 'b'     | Binary Mode for opening files such as images. |

</details>
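
As a small illustration of the format modes, the sketch below reads the same file once as text and once as binary. The file `cafe.txt` is a hypothetical scratch file, and passing `encoding="utf-8"` is an assumption made here so the result is the same on every system.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration
path = os.path.join(tempfile.mkdtemp(), "cafe.txt")

# Write the word "cafe" with an accented e, followed by a newline, as raw bytes
with open(path, "wb") as f:
    f.write("caf\u00e9\n".encode("utf-8"))

# Text mode decodes the bytes and gives back a str
with open(path, "rt", encoding="utf-8") as f:
    as_text = f.read()

# Binary mode gives back the undecoded bytes
with open(path, "rb") as f:
    as_bytes = f.read()

print(type(as_text).__name__, len(as_text))    # str 5
print(type(as_bytes).__name__, len(as_bytes))  # bytes 6
```

The lengths differ because the accented character is one character of text but two bytes on disk in UTF-8.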

In [None]:
# Loop over the lines in a file and collect the word and character count of each paragraph
file_summary = []
with open('lorem_ipsum.txt', 'r') as f:
    paragraph_num = 0
    for line_num, line_content in enumerate(f):
        words = line_content.split() # Split on whitespace, this returns a list of the words
        chars = "".join(words) # Join the words back together, leaving out all spaces and newlines
        # Check if the line is empty
        if len(words) > 0:
            # Number the paragraphs
            paragraph_num += 1
            file_summary.append(f"Paragraph {paragraph_num} has {len(words)} words and {len(chars)} characters excluding spaces and newlines.\n")
            #                                                                                                                                ^^
            #                                                                                                                                This is a newline character. It moves to the next line, but you won't see it in the output

# Write to a file
with open('ipsum_summary.txt', 'w') as f:
    f.writelines(file_summary)

In the above cell we have used two file contexts. The first reads the file and generates the summary information; note the small update made to ignore the blank lines between our paragraphs.
Then, using the file context created with `open(filename, 'w')`, we write the summary to a new file.
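
Because the summary file is opened with `'w'`, it is replaced every time the cell runs. If you wanted to keep adding to a file instead, append mode from the table above would be the one to use. A minimal sketch, with a hypothetical scratch file `log.txt`:

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration
path = os.path.join(tempfile.mkdtemp(), "log.txt")

# 'w' starts the file from scratch
with open(path, "w") as f:
    f.write("run 1\n")

# 'a' keeps what is already there and writes after it
with open(path, "a") as f:
    f.write("run 2\n")

with open(path, "r") as f:
    log_lines = f.readlines()

print(log_lines)
```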


# Challenge: reformat the lorem ipsum text to make it look better when opened

If you open the lorem_ipsum file you will see each paragraph is on a single very long line. This makes it very hard to read. In the cell below, write some code that performs the following operations.

1. Open the lorem_ipsum text in Python.
2. Set a 90 character limit for each line. Make sure that a newline character `'\n'` is placed before the word that would make the line longer than 90 characters.
3. Save these new lines and make sure between paragraphs there is a double new line.
4. Open a new file called lorem_ipsum_formatted.txt and write the lines to it.
5. Open the new file in an editor and make sure it looks correct.


<details>
<summary>Hint 1, How to get words</summary>

Once you have opened the file and started iterating over the lines,
you can separate each line into words using `line.split()`. Replace `line` with your loop variable.

</details>

<details>
<summary>Hint 2, How to get the length of a word</summary>

Remember that strings and lists share many properties. We can use `len(word)` to
get the length of the word.

</details>

<details>
<summary>Hint 3, I forgot the spaces</summary>

`line.split()` removes the whitespace. In fact, it splits on the whitespace, discarding it in the process.
In the list of words we are building for output, we need to put the spaces back in. After each word is appended, add a space.

<details>
<summary>Hint 3a, hint 3 made me go over my character count</summary>

The space needs to be counted so add one to your count!

</details>

</details>

<details>
<summary>Hint 4, A case of order </summary>

You need to check the length of what the line would be after you added the next word. If that would go over the limit, add the newline character, reset the count, and add the word to the start of the new line.

</details>

In [None]:


# Do not edit below this line. It will print helpful error messages if you have not completed the exercise correctly.
from example_helpers import check_formatted_lorem
check_formatted_lorem()

In [None]:
# Use this to check your work
with open("lorem_ipsum_formatted.txt", "r") as f:
    for line_num, line in enumerate(f):
        print(line_num, len(line))

<details>
<summary>Solution</summary>

```python
with open('lorem_ipsum.txt', 'r') as f:
    formatted_lines = []
    for line in f:
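        words_in_line = line.split()
        if not words_in_line:
            continue  # Skip blank lines; the paragraph break is added at the end of each paragraph
        current_line_length = 0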
        for word in words_in_line:
            if current_line_length+len(word)+1 < 90:
                formatted_lines.append(word)
                formatted_lines.append(" ")
            else:
                formatted_lines.append("\n")
                current_line_length = 0
                formatted_lines.append(word)
                formatted_lines.append(" ")
            current_line_length += len(word) + 1
        # We will always have a trailing space but instead we want a double newline to indicate the end of a paragraph.
        # Instead of removing the trailing space we will just replace the last list element with a double newline.
        formatted_lines[-1] = "\n\n"

with open('lorem_ipsum_formatted.txt', 'w') as f:
    f.writelines(formatted_lines)
```

</details>

## Next Section

[05-Conclusion](./05-Conclusion.ipynb)