 *Artificial Intelligence for Vision & NLP* &nbsp; | &nbsp;  *ATU Donegal - MSc in Big Data Analytics & Artificial Intelligence*

# Manipulating Text Files

Python uses file objects to interact with external files on your computer. A file object can represent any kind of file, such as audio, text, emails or spreadsheets.

<strong> Note: </strong> You will probably need to install particular libraries or modules to interact with those various file types, but they are easily available.

Python has a built-in `open()` function that allows us to open, read and write basic file types.


## Creating a File in Colab
Mounting Google Drive with `google.colab.drive` only works in Colab notebooks. We'll mount the drive, then create a basic text file and add some text to it:

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

file1 = open("/content/gdrive/My Drive/mytextfile.txt", "w")

contents = "This is the first line of my new text file.\nThis is the second line of the text file."

file1.write(contents)

file1.close()

A text file should now be created in your Google Drive.
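Outside Colab there is no Drive to mount. The following is a minimal sketch of the same step, assuming a temporary local folder stands in for `/content/gdrive/My Drive` (that folder is an assumption, not part of the original notebook). `os.path.exists` then confirms the file was created:

```python
import os
import tempfile

# Assumption: a temporary local folder stands in for "/content/gdrive/My Drive"
folder = tempfile.mkdtemp()
path = os.path.join(folder, "mytextfile.txt")

file1 = open(path, "w")
file1.write("This is the first line of my new text file.\nThis is the second line of the text file.")
file1.close()

# Check that the file now exists on disk
print(os.path.exists(path))  # True
```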

Knowing the Google Drive path you are working in is important when opening a file that is saved in the same location as your notebook. Of course, we can open a file from any location by giving its full path, not just from the working directory of the notebook.
Now we'll open the text file `mytextfile.txt` that we've created.


In [None]:
# Open the mytextfile.txt file I created earlier
my_text_file = open("/content/gdrive/My Drive/mytextfile.txt")

Let's examine some details about this text file:

In [None]:
my_text_file

The interpreter's output shows an `_io.TextIOWrapper` object: a wrapper that has opened the text file in <strong> read-only </strong> mode (`mode='r'`). It is now an open file object. We'll perform some reading and writing exercises, and then we have to close the file so that it is released back to the operating system.
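The file object's own attributes confirm what the interpreter's output shows. Here is a minimal sketch, assuming a temporary local copy of `mytextfile.txt` (a hypothetical path) rather than the Drive file:

```python
import os
import tempfile

# Hypothetical local file standing in for the Drive copy of mytextfile.txt
path = os.path.join(tempfile.mkdtemp(), "mytextfile.txt")
setup = open(path, "w")
setup.write("This is the first line of my new text file.\nThis is the second line of the text file.")
setup.close()

my_text_file = open(path)            # no mode given, so "r" (read-only) is used
print(type(my_text_file).__name__)   # TextIOWrapper
print(my_text_file.mode)             # r
print(my_text_file.closed)           # False while the file is open
my_text_file.close()
```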

## Reading and Seeking

Let's first read the file.

In [None]:
my_text_file.read()

If I try to read the file again, something unexpected happens: an empty string is returned.

In [None]:
# What happens if we try to read it again?
my_text_file.read()

This happens because the reading <strong> cursor position </strong> is at the end of the file after having read it, so there is nothing left to read.
We can reset the <strong> cursor position </strong> to index position 0 (the start of the file) like this:

In [None]:
# Seek to the start of file (index 0)
my_text_file.seek(0)

This command resets the cursor position back to the beginning point of the file.

Now if we read the file again, we get all of its contents.

In [None]:
my_text_file.read()
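The `.tell()` method reports where the cursor currently is, which makes the read/seek cycle visible. This is a sketch using a hypothetical temporary file. In text mode `.tell()` returns an opaque number that is meant to be passed back to `.seek()`, although 0 always means the start of the file:

```python
import os
import tempfile

# Hypothetical temporary file with two short lines
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
f = open(path, "w")
f.write("line one\nline two\n")
f.close()

f = open(path)
first = f.read()       # reads everything; the cursor is now at the end
second = f.read()      # '' because nothing is left after the cursor
f.seek(0)              # move the cursor back to the start
at_start = f.tell()    # 0
third = f.read()       # the full contents again
f.close()

print(repr(second), at_start, first == third)
```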

I can read the contents of the file into a string with this command. Make sure you reset the cursor position first with the <strong> `.seek(0)` </strong> command, otherwise an empty string will be read.

In [None]:
my_text_file.seek(0)
file_contents = my_text_file.read()

And I can show its contents using the `print()` function:

In [None]:
print(file_contents)

We no longer need to re-read the file, and can instead work directly with the contents of the string.
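For example, ordinary string methods work on the saved contents. This sketch assumes `file_contents` holds the two lines written at the start of this notebook:

```python
# Assumed: the same text that was written to mytextfile.txt earlier
file_contents = "This is the first line of my new text file.\nThis is the second line of the text file."

lines = file_contents.splitlines()   # split on line breaks
print(len(lines))                    # 2
print(len(file_contents.split()))    # 19 words in total
print("second" in file_contents)     # True
```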

It is important to close any files you open, so that the operating system can release them. We do this using the <strong> `.close()` </strong> method.

In [None]:
my_text_file.close()
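After closing, the file object still exists but can no longer be used. A small sketch with a hypothetical temporary file shows the `.closed` attribute and the `ValueError` raised when reading a closed file:

```python
import os
import tempfile

# Hypothetical temporary file
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
setup = open(path, "w")
setup.write("some text")
setup.close()

my_text_file = open(path)
my_text_file.close()
print(my_text_file.closed)  # True

caught = None
try:
    my_text_file.read()     # not allowed on a closed file
except ValueError as err:
    caught = err
    print("Error:", err)
```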

## The `.readlines()` Method

We can use the `.readlines()` method to read all the lines of a file into a list, with one string per line. Use this method with caution with large files, since everything will be held in memory. We will learn how to iterate over large files later in this course.

Open the file again, and then use the `.readlines()` method:

In [None]:
my_text_file = open("/content/gdrive/My Drive/mytextfile.txt")
all_my_lines = my_text_file.readlines()

In [None]:
my_text_file.close()
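Each string returned by `.readlines()` keeps its trailing `\n` (only the last line may lack one). This sketch, using a hypothetical temporary copy of the file, shows the raw list and one way to strip the line endings:

```python
import os
import tempfile

# Hypothetical temporary copy of mytextfile.txt
path = os.path.join(tempfile.mkdtemp(), "mytextfile.txt")
setup = open(path, "w")
setup.write("This is the first line of my new text file.\nThis is the second line of the text file.")
setup.close()

f = open(path)
all_my_lines = f.readlines()
f.close()

print(all_my_lines)
# Remove the trailing newline from each line
clean_lines = [line.rstrip("\n") for line in all_my_lines]
print(clean_lines)
```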

Now that we have the contents of the text file as a list of individual lines, we can process them. For example, we can use a loop to iterate through each line and print out the fourth word of each line.

In [None]:
for line in all_my_lines:
    print(line.split()[3])  # split() breaks the line on whitespace; index 3 is the fourth word
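The loop above assumes every line has at least four words; a blank or short line would raise an `IndexError`. A more defensive sketch, using a hypothetical list that includes such lines, checks the length first:

```python
# Hypothetical lines, including a short line and a blank line
all_my_lines = [
    "This is the first line of my new text file.\n",
    "Too short\n",
    "\n",
    "This is the second line of the text file.",
]

fourth_words = []
for line in all_my_lines:
    words = line.split()
    # Skip lines with fewer than four words instead of failing on words[3]
    if len(words) >= 4:
        fourth_words.append(words[3])

print(fourth_words)  # ['first', 'second']
```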

## Writing to a File

By default, the <strong> `open()` </strong> function will only allow us to read the file. We need to pass the mode `'w'` to write over the file, or `'w+'` to write over it and also read from it. For example:

In [None]:
my_text_file = open("/content/gdrive/My Drive/mytextfile.txt", "w+")

Let's check what's in the file now:

In [None]:
my_text_file.read()

This shows that the original contents of the text file have been erased. Use the `w+` option with caution!
Opening a file with `'w'` or `'w+'` *truncates the original*, meaning that anything that was in the original file is deleted! Let's add some new text to the file and see what happens to its contents.

In [None]:
my_text_file.write("This is new contents I'm adding to the text file.")

In [None]:
# Return the indexer to the start of the file
my_text_file.seek(0)
my_text_file.read()

The text file no longer contains the original text we entered into it earlier; it now contains only the new text. That is because we used the `w+` mode when we opened the file. Remember that `w+` allows us to read and write to the file, but truncates it first. If we want to add text to the end of an existing file, we need to open it in append mode.

In [None]:
# Close the file before we continue
my_text_file.close()
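The whole `w+` sequence can be condensed into one sketch, assuming a hypothetical temporary file that starts with some original text:

```python
import os
import tempfile

# Hypothetical temporary file with some original contents
path = os.path.join(tempfile.mkdtemp(), "mytextfile.txt")
setup = open(path, "w")
setup.write("Original text that will be lost.")
setup.close()

f = open(path, "w+")      # truncates the file as soon as it is opened
after_open = f.read()     # '' : the original text is already gone
f.write("This is new contents I'm adding to the text file.")
f.seek(0)                 # back to the start before reading
after_write = f.read()
f.close()

print(repr(after_open))
print(after_write)
```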

## Appending to a File
Passing the mode `'a'` to the `open()` function opens the file and places the cursor at the end, so anything written is appended. Like `w+`, `a+` lets us both read and write, but it does not truncate the file. If the file does not exist, one will be created.

In [None]:
my_text_file = open("/content/gdrive/My Drive/mytextfile.txt", "a+")

In [None]:
my_text_file.write("This is the first line of my text using the a+ option.")

In [None]:
my_text_file.close()

Lets look at the contents of the file.

In [None]:
my_text_file = open("/content/gdrive/My Drive/mytextfile.txt")

In [None]:
my_text_file.read()

The `a+` option lets us write contents to the end of the file.
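One detail worth seeing: because `a+` starts with the cursor at the end, an immediate `.read()` returns an empty string. A minimal sketch with a hypothetical temporary file:

```python
import os
import tempfile

# Hypothetical temporary file with one existing line
path = os.path.join(tempfile.mkdtemp(), "mytextfile.txt")
setup = open(path, "w")
setup.write("Original line.\n")
setup.close()

f = open(path, "a+")
at_open = f.read()             # '' : the cursor starts at the end of the file
f.write("Appended line.\n")    # added after the existing contents
f.seek(0)                      # go back to the start to read everything
everything = f.read()
f.close()

print(repr(at_open))
print(everything)
```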

Note that we can also press `SHIFT` + `TAB` at any time to view more detail on the function we are using. This shows more information on the various options (parameters) the function accepts, and works for most functions and methods.

What happens if we try to open a file that doesn't exist?

In [None]:
my_text_file = open("/content/gdrive/My Drive/testfile.txt")

Python raises a `FileNotFoundError`. The file is not automatically created because the default mode when opening a file is `r` (read-only).
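If a missing file is a possibility, the error can be caught instead of stopping the notebook. A sketch assuming a hypothetical path inside a fresh temporary folder:

```python
import os
import tempfile

# Hypothetical path to a file that does not exist yet
missing_path = os.path.join(tempfile.mkdtemp(), "testfile.txt")

try:
    my_text_file = open(missing_path)   # default mode "r"
except FileNotFoundError as err:
    print("Could not open file:", err)
    found = False
else:
    my_text_file.close()
    found = True

print(os.path.exists(missing_path))  # False: "r" mode never creates a file
```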

We can easily resolve this issue by changing the mode to `a+`. That will then create the new file if it does not currently exist.

In [None]:
my_text_file = open("/content/gdrive/My Drive/testfile.txt", "a+")

Now we'll add some text to this new file:

In [None]:
my_text_file.write("This is the first line of text in my new file.")

Now we'll close the file:

In [None]:
my_text_file.close()

Next we'll reopen the file with read permission only. Remember that this is the default mode when opening a file:

In [None]:
my_text_file = open("/content/gdrive/My Drive/testfile.txt")

Now we'll try to write some contents to the file. This won't work: we haven't specified a mode, so the default read-only `r` mode is used and Python raises an `io.UnsupportedOperation` error:

In [None]:
my_text_file.write("This is a test to see if I can write to my text file.")
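The `.writable()` method reports in advance whether writing is allowed, and the error can be caught explicitly. A sketch using a hypothetical temporary file:

```python
import io
import os
import tempfile

# Hypothetical temporary file
path = os.path.join(tempfile.mkdtemp(), "testfile.txt")
setup = open(path, "a+")
setup.write("This is the first line of text in my new file.")
setup.close()

my_text_file = open(path)          # default "r" mode
print(my_text_file.writable())     # False
write_failed = False
try:
    my_text_file.write("This is a test to see if I can write to my text file.")
except io.UnsupportedOperation as err:
    print("Write failed:", err)
    write_failed = True
finally:
    my_text_file.close()
```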

We can easily fix this error - close the file first, and then change the mode to `a+` to allow reading and writing:

In [None]:
my_text_file.close()
my_text_file = open("/content/gdrive/My Drive/testfile.txt", "a+")
my_text_file.write("I'm adding a new line to my test text file using the a+ option.")

Now we can seek to the start of the file and then read its contents into a string.

In [None]:
my_text_file.seek(0)
my_text_file.read()

All of the text is shown on one line. If we want each piece of text on its own line, we need to add the `\n` (newline) special character when we're writing text to the file. Here's an example; note that we include the special character inside the quote marks along with the text that we're appending to the end of the file:

In [None]:
my_text_file.write("\nThis is another new line in my text file.")
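Another option is to put the newline at the end of each string, or to let `print()` add it by passing the open file as its `file` argument. A sketch with a hypothetical temporary file:

```python
import os
import tempfile

# Hypothetical temporary file
path = os.path.join(tempfile.mkdtemp(), "lines.txt")

f = open(path, "w")
f.write("First line.\n")          # explicit newline at the end of the string
print("Second line.", file=f)     # print() adds the trailing newline itself
f.close()

f = open(path)
contents = f.read()
f.close()

print(repr(contents))  # 'First line.\nSecond line.\n'
```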

Now we'll seek back to the start of the file and read all of the file's contents again:

In [None]:
my_text_file.seek(0)
my_text_file.read()

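The output still shows `\n` rather than a line break, because the notebook displays the string's representation. To see the special character `\n` rendered as a new line, we need to use the `print()` function to show our text on the screen.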

In [None]:
my_text_file.seek(0)
for line in my_text_file:
    print(line)  # each line already ends with \n, so print() adds an extra blank line

We could also show the contents of the `.read()` command directly within the `print` statement.

In [None]:
my_text_file.seek(0)
print(my_text_file.read())
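The difference comes down to `repr()` versus `print()`: the notebook echoes a value using its representation, which shows escape sequences, while `print()` renders them. A small sketch with a hypothetical string:

```python
contents = "This is the first line.\nThis is another new line."

print(repr(contents))  # shows the \n escape, like the notebook's echo of .read()
print(contents)        # renders \n as a real line break
```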

## Aliases and Context Managers
We can assign temporary variable names as aliases, and manage the opening and closing of files automatically using a *context manager*.

We can use the `with` statement to control access to the text file. It automatically closes the file when the indented block finishes, even if an error occurs inside it. This is commonly used when interacting with text files in Python.

Here's an example of how to use the `with` command:

In [None]:
with open("/content/gdrive/My Drive/testfile.txt", "r") as my_text_file:
    file_contents = my_text_file.readlines()

Then we can show the contents of the text file. We don't need to call the `.close()` method, as that is taken care of by the `with` context manager.

In [None]:
file_contents

Note that the `with ... as ...:` context manager automatically closed `testfile.txt` after assigning its lines to `file_contents`.
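Roughly speaking, the `with` block behaves like opening the file and closing it in a `try ... finally`. This sketch uses a hypothetical temporary file and checks `.closed` for both versions:

```python
import os
import tempfile

# Hypothetical temporary file
path = os.path.join(tempfile.mkdtemp(), "testfile.txt")
setup = open(path, "w")
setup.write("line 1\nline 2\n")
setup.close()

# With a context manager
with open(path, "r") as my_text_file:
    file_contents = my_text_file.readlines()
print(my_text_file.closed)  # True: closed on leaving the block

# Roughly equivalent code without a context manager
my_other_file = open(path, "r")
try:
    other_contents = my_other_file.readlines()
finally:
    my_other_file.close()   # runs even if readlines() raises
print(my_other_file.closed)  # True
```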

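## Iterating through a File

A file object can be looped over directly, one line at a time, so the whole file never has to be held in memory: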

In [None]:
with open("/content/gdrive/My Drive/testfile.txt", "r") as my_text_file:
    for line in my_text_file:
        print(line, end="")  # end="" stops print() adding a second line break, as each line already ends with \n
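Combining this loop with `enumerate()` gives line numbers without loading the file into a list. A sketch with a hypothetical temporary file:

```python
import os
import tempfile

# Hypothetical temporary file with three lines
path = os.path.join(tempfile.mkdtemp(), "testfile.txt")
setup = open(path, "w")
setup.write("first\nsecond\nthird\n")
setup.close()

line_count = 0
with open(path) as my_text_file:
    # The file object yields one line at a time, unlike .readlines()
    for line_number, line in enumerate(my_text_file, start=1):
        print(line_number, line.rstrip("\n"))
        line_count += 1

print(line_count)  # 3
```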