# Working with Files

Python code can interact with external files stored on a local computer, a cloud storage drive, or a dataset. These can be text files, spreadsheets such as Excel documents, audio files, e-mails, video, and so on. Python has a built-in *open* function that lets us open, read, and write basic file types (like .txt files). Other file types might require installing an additional library or module.

# Writing a File

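First, we need a file to work with. In a Jupyter notebook, you can create one with the *%%writefile* cell magic, followed by the name and extension of the file. Everything below that first line of the cell is written to the file.

NOTE: do not add comments to a *%%writefile* cell. Every line after the first is written to the file as-is, comments included.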

In [1]:
%%writefile myfile.txt
Hello, this is the first file.
Hi again, this is the second line.

Writing myfile.txt
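
*%%writefile* only works inside a notebook. As a rough sketch of the same step in plain Python, the built-in *open* can create the file. The name `myfile_plain.txt` is an assumption, chosen here so that the notebook's own myfile.txt is left untouched.

```python
# Plain-Python sketch of the %%writefile cell above.
# "myfile_plain.txt" is a hypothetical name used only for this example.
lines = [
    "Hello, this is the first file.\n",
    "Hi again, this is the second line.\n",
]

# Mode "w" creates the file, or overwrites it if it already exists.
# The with block closes the file automatically (covered later in this notebook).
with open("myfile_plain.txt", "w") as f:
    f.writelines(lines)

# Read the file back to check what was written.
with open("myfile_plain.txt") as f:
    text = f.read()

print(repr(text))
```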


# Opening a File

Opening a file that sits in the same folder as the notebook is as easy as calling the built-in function *open* with the file name. However, there might be instances where you need to open files from a different folder, in which case you pass the full path.

On Windows: write each \ twice, so Python doesn't treat a single \ as the start of an escape sequence:

    "C:\\Users\\YourUserName\\Home\\Folder\\myfile.txt"

macOS and Linux: use forward slashes instead:

    "/Users/YourUserName/Folder/myfile.txt"
    
In a notebook, you can find the current working directory with *pwd*:

In [2]:
pwd

'/kaggle/working'
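
Outside a notebook, *pwd* is not available. The sketch below shows one way to get the same information and to build the paths shown earlier, using only the standard library. The folder names are the placeholder ones from the examples above, not real locations, and the `Pure*` path classes only manipulate text, so nothing here needs to exist on disk.

```python
import os
from pathlib import Path, PureWindowsPath, PurePosixPath

# Current working directory, as a string and as a Path object.
cwd = os.getcwd()
target = Path.cwd() / "myfile.txt"   # absolute path to a file in that folder
print(cwd)
print(target)

# A raw string (r"...") leaves backslashes alone, so it equals the doubled form.
doubled = "C:\\Users\\YourUserName\\Home\\Folder\\myfile.txt"
raw = r"C:\Users\YourUserName\Home\Folder\myfile.txt"
print(doubled == raw)

# pathlib joins the parts with the separator that fits each system.
win_path = PureWindowsPath("C:/Users/YourUserName/Home/Folder") / "myfile.txt"
posix_path = PurePosixPath("/Users/YourUserName/Folder") / "myfile.txt"
print(win_path)
print(posix_path)
```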

In [3]:
#As mentioned, in order to open a file, use open:

my_file = open("myfile.txt")

# Reading a File

After opening a file, different operations can be used with the file. The most basic one is reading it.

In [4]:
#Reading the file: file.read()

my_file.read()

'Hello, this is the first file.\nHi again, this is the second line.\n'

In [5]:
#Reading it again:

my_file.read()

''

Trying to read the file again returns an empty string ''. You can picture a reading *cursor* that sits at the end of the file once it has been read, so there is nothing left to read.
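
The cursor behaviour can be checked in a self-contained way, as in the sketch below. It writes its own scratch file in a temporary folder (the name `cursor_demo.txt` is an assumption for illustration) and uses *tell()*, which reports the current cursor position.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical scratch file, removed when the block ends.
    path = os.path.join(tmp, "cursor_demo.txt")
    with open(path, "w") as f:
        f.write("Hello, this is the first file.\nHi again, this is the second line.\n")

    f = open(path)
    start = f.tell()     # 0: the cursor starts at the beginning
    first = f.read()     # reads everything and leaves the cursor at the end
    second = f.read()    # nothing left to read, so this is ''
    f.seek(0)            # move the cursor back to the beginning
    third = f.read()     # the full text again
    f.close()

print(start, repr(second), first == third)
```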

In [6]:
#Resetting the cursor: file.seek(0)

my_file.seek(0)

0

In [7]:
#Reading it again

my_file.read()

'Hello, this is the first file.\nHi again, this is the second line.\n'

In [8]:
#The cursor can be placed on any other character.

my_file.seek(31)

31

In [9]:
my_file.read()

'Hi again, this is the second line.\n'

In [10]:
#The text in a file can be assigned to a string

my_file.seek(31)
second_line = my_file.read()
second_line

'Hi again, this is the second line.\n'

There is a way to get at individual lines without working out *seek()* positions by hand. The *readlines* method returns the lines of the file as a list of strings. However, this method should be used with caution: it loads the whole file into memory at once, which can cause issues with large files.

In [11]:
#First, reset the cursor to the first character

my_file.seek(0)

#Using the readlines method: file.readlines()

my_file.readlines()

['Hello, this is the first file.\n', 'Hi again, this is the second line.\n']
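
For large files, a common alternative to *readlines* is to loop over the file object itself, which yields one line at a time. The sketch below assumes a small scratch file (`lines_demo.txt`, a hypothetical name) so that it can run on its own.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical scratch file with the same two lines as myfile.txt.
    path = os.path.join(tmp, "lines_demo.txt")
    with open(path, "w") as f:
        f.write("Hello, this is the first file.\nHi again, this is the second line.\n")

    # Looping over the file object yields one line at a time,
    # so only the current line has to be held in memory.
    line_lengths = []
    with open(path) as f:
        for line in f:
            line_lengths.append(len(line.rstrip("\n")))

print(line_lengths)
```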

# Closing a File

Once you no longer need a file, it's important to close it. Otherwise, you may not be able to delete it, for example, because the system can still consider it in use.

In [12]:
#Closing a file: file.close()

my_file.close()
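
One way to make sure *close()* always runs, even if something goes wrong while the file is open, is a try/finally block. This is a minimal sketch with a hypothetical scratch file; the *closed* attribute reports whether a file object has been closed.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "close_demo.txt")   # hypothetical scratch file
    f = open(path, "w")
    try:
        f.write("some text")
    finally:
        # Runs whether or not the write succeeded.
        f.close()

    was_closed = f.closed    # True once close() has been called

    # Using a closed file raises ValueError.
    try:
        f.write("more text")
        error_raised = False
    except ValueError:
        error_raised = True

print(was_closed, error_raised)
```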

# Writing to a File

With the default call to *open*, we can only read the file. To write to a file, we need to pass a second argument. The table below lists the most common values for that argument in Python:

<table class="table table-bordered">
<tr>
<th style="width:10%">Argument</th><th style="width:45%">Description</th>
</tr>

<tr>
<td>r</td>
<td>Read only. Just reading the file.</td>
</tr>

<tr>
<td>w</td>
<td>Write Only. Overwriting a file or creating new files.</td>
</tr>

<tr>
    <td> a </td>
    <td>Append. Adding a line at the end of the file</td>
</tr>
    
<tr>
    <td> r+ </td>
    <td>ReadWrite. Being able to both read and write the file</td>
</tr>
    
<tr>
    <td> w+ </td>
    <td>WriteRead. Writing and reading, able to overwrite files or create new files.</td>
</tr>
    
<tr>
    <td> a+ </td>
    <td>AppendRead. Adding a line at the end of the file and the ability to read it.</td>
</tr>
</table>

Opening a file with *w* or *w+* truncates the original, meaning that anything that was in the original file **is deleted**.

The value passed as this argument is called the **file mode**.
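
The difference between the modes is easiest to see side by side. The sketch below is an illustration only: it works on a hypothetical scratch file in a temporary folder, so none of the notebook's files are touched.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "modes_demo.txt")   # hypothetical scratch file

    with open(path, "w") as f:     # "w" creates the file
        f.write("first")
    with open(path, "a") as f:     # "a" keeps "first" and adds to the end
        f.write(" second")
    with open(path) as f:          # the default mode is "r"
        after_append = f.read()

    with open(path, "w") as f:     # "w" again: the old text is gone
        f.write("third")
    with open(path) as f:
        after_overwrite = f.read()

    # "r" needs the file to exist already.
    try:
        open(os.path.join(tmp, "missing.txt"), "r")
        missing_raises = False
    except FileNotFoundError:
        missing_raises = True

print(after_append)
print(after_overwrite)
print(missing_raises)
```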

In [13]:
#Using the w+ argument to create a new file

my_file = open('myfile2.txt', 'w+')

In [14]:
#Writing to the file

my_file.write('The first line')

14

Note that the output of the previous cell is the number of characters written to the file.

In [15]:
#Reading the file

my_file.seek(0)
my_file.readlines()

['The first line']

In [16]:
#Close the file after it's been used

my_file.close()

## Appending

Passing the *a+* argument opens the file and puts the pointer at the end, so anything written is appended. If the file doesn't exist, a new one is created.

In [17]:
#Opening the file used previously

my_file = open('myfile2.txt', 'a+')
my_file.write('. And this, still on the first line.')

36

In [18]:
#Use \n to start the text you are writing on a new line

my_file.write('\nThis is the second line.')

25

In [19]:
my_file.seek(0)
print(my_file.read())

The first line. And this, still on the first line.
This is the second line.


In [20]:
#Remember to close the file after it's been used

my_file.close()
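
The append steps can also be reproduced in one self-contained sketch. It assumes a hypothetical file name, `append_demo.txt`, in a temporary folder, and shows both that *a+* creates a missing file and that *write* returns the number of characters written.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "append_demo.txt")  # hypothetical; does not exist yet

    # "a+" creates the missing file.
    f = open(path, "a+")
    count_first = f.write("The first line")
    f.close()

    # Reopening with "a+" keeps the existing text and adds to the end.
    f = open(path, "a+")
    count_second = f.write("\nThe second line")
    f.seek(0)                # the cursor is at the end, so rewind before reading
    lines = f.readlines()
    f.close()

print(count_first, count_second)
print(lines)
```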

# Alternative File Mode

There is another way to work with files that also removes the annoying need to close the file every time you use one. Opening the file in a *with* statement gives us the file object inside the indented block and closes it automatically when the block ends, so there is no need to call close().
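
A quick way to see the automatic closing is to check the *closed* attribute after the block ends. The sketch below uses a hypothetical scratch file and a simulated error to show that the file is closed in both cases.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "with_demo.txt")    # hypothetical scratch file

    with open(path, "w") as f:
        f.write("ONE FIRST\n")
    closed_after_block = f.closed    # True: no close() call was needed

    # The file is also closed when the block exits because of an error.
    try:
        with open(path) as g:
            raise RuntimeError("simulated failure")
    except RuntimeError:
        pass
    closed_after_error = g.closed

print(closed_after_block, closed_after_error)
```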

In [21]:
%%writefile myfile3.txt
ONE FIRST
TWO SECOND
THREE THIRD

Writing myfile3.txt


In [22]:
#Append

with open('myfile3.txt', 'a+') as f:
    f.write('FOUR FORTH')
    f.seek(0)
    print(f.read())

ONE FIRST
TWO SECOND
THREE THIRD
FOUR FORTH


In [23]:
#Read

with open('myfile3.txt', 'r') as f:
    print(f.read())

ONE FIRST
TWO SECOND
THREE THIRD
FOUR FORTH


In [24]:
#The contents in a file can be assigned to a variable

with open('myfile3.txt') as my_new_file:
    contents = my_new_file.read()
    
print(contents)

ONE FIRST
TWO SECOND
THREE THIRD
FOUR FORTH
