# Workshop 3

## _File Reading & Writing in Python_

In this workshop, we will cover the basics of reading and writing files that are _external_ to the Python scripts we write. There are many cases in which we may want to read and write files, and one pertinent example for Data Science is data cleaning. Folklore among data scientists is that data cleaning can take up to 90% of a project's lifecycle. We will not discuss the extremely difficult cases of data cleaning in this workshop, but we will introduce some basics to get you started.

## Interacting with a File 

In Python, we open a file using the *open()* function and assign the resulting file object to a variable name of our choice. This object represents the open file, ready to be interacted with through its methods. The primary methods we will use are *read()*, *readline()*, and *write()*. Finally, once we are done interacting with a file, we need to close the connection to the file so that any changes are finalized before the Python script exits. We do this by calling *close()* on the file object.


### Opening a File

When using the *open()* function, we will pass two arguments in the function call. The first is the filename as a string. For this workshop, we will assume that all of the files we open are in the same directory as the Python script we are running. However, you can alter the [file path](todo) to open a file from anywhere on your computer. The second argument is the _mode_ in which we open the file. If it is omitted, Python opens the file in read mode. We will use three modes in this workshop:

* Read mode -- This allows us to read from the file but not write any information to the file. The file must already exist. We can open a file with this mode by passing 'r' as the mode argument in the *open()* function.

* Write mode -- This allows us to write to the file but not read any information from the file. Opening a file in this mode creates it if it does not exist and erases its contents if it does. We can open a file with this mode by passing 'w' as the mode argument in the *open()* function.

* Append mode -- This allows us to append information to the end of the file without overwriting information currently in the file. We can open a file with this mode by passing 'a' as the mode argument in the *open()* function.
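
The three modes above can be compared side by side. The following is a minimal sketch, not part of the original workshop files: the file name `modes_demo.txt` is hypothetical, and it is created in a temporary directory (an assumption made here so that no real file on your computer is touched).

```python
import os
import shutil
import tempfile

# Hypothetical demo file inside a throwaway directory
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'modes_demo.txt')

# 'w' creates the file (or empties it if it already exists)
opened_file = open(path, 'w')
opened_file.write('first\n')
opened_file.close()

# 'a' keeps the existing contents and adds to the end
opened_file = open(path, 'a')
opened_file.write('second\n')
opened_file.close()

# 'r' lets us read the contents back
opened_file = open(path, 'r')
after_append = opened_file.read()
opened_file.close()

# Opening with 'w' again discards everything written so far
opened_file = open(path, 'w')
opened_file.write('third\n')
opened_file.close()

opened_file = open(path, 'r')
after_rewrite = opened_file.read()
opened_file.close()

print(repr(after_append))   # 'first\nsecond\n'
print(repr(after_rewrite))  # 'third\n'

# Remove the throwaway directory
shutil.rmtree(tmp_dir)
```

Printing with *repr()* makes the newline characters visible, which helps when checking exactly what ended up in the file.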

### Reading and Writing

Once we have opened a file, we will want to use one of the three methods discussed earlier to either read or modify its contents. The *read()* method reads _all_ of the remaining contents of the file at once. This can be useful in some scenarios, but we will mostly want to use the *readline()* method to step through the file one line at a time.
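
As a rough illustration of stepping through a file, the sketch below calls *readline()* repeatedly until it returns an empty string, which is how Python signals the end of the file. The file name `readline_demo.txt` and its contents are made up for this example, and the file lives in a temporary directory.

```python
import os
import shutil
import tempfile

# Hypothetical demo file inside a throwaway directory
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'readline_demo.txt')

# Create a small three-line file to read from
opened_file = open(path, 'w')
opened_file.write('1000\n2000\n3000\n')
opened_file.close()

# Step through the file one line at a time
lines_read = []
opened_file = open(path, 'r')
line = opened_file.readline()
while line != '':            # an empty string means end of file
    lines_read.append(line)
    print(repr(line))        # repr() shows the trailing newline
    line = opened_file.readline()
opened_file.close()

# Remove the throwaway directory
shutil.rmtree(tmp_dir)
```

Each string returned by *readline()* keeps its trailing newline, so the first line comes back as `'1000\n'` rather than `'1000'`.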

The *readline()* method always returns a _string_, including the newline character at the end of the line. However, sometimes we will want to process the contents of the file as other datatypes such as _integers_ or _floats_. To change a string to one of these other datatypes, we can _cast_ it by using either the *int()* function or the *float()* function. These functions require that the string look like a number of that type (no letters, and no punctuation other than a sign or, for floats, the decimal point); leading and trailing whitespace, including the newline, is ignored. Otherwise, Python raises a _ValueError_ and, unless the error is handled, the script stops. If we are unsure what type an object is, we can apply the *type()* function to find the answer.
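
A small sketch of casting may help here. The strings below are hypothetical stand-ins for what *readline()* might return; the `try`/`except` block is one possible way to keep the script running when a conversion fails, and is not something this workshop otherwise covers.

```python
# A hypothetical line, as readline() would return it
raw_line = '2000\n'
print(type(raw_line))            # <class 'str'>

# int() and float() ignore the surrounding whitespace, including '\n'
as_int = int(raw_line)
as_float = float('3.14\n')
print(type(as_int), as_int)      # <class 'int'> 2000
print(type(as_float), as_float)  # <class 'float'> 3.14

# A string that does not look like an integer raises ValueError
try:
    int('20a0')
    conversion_failed = False
except ValueError as err:
    conversion_failed = True
    print('Could not convert:', err)
```

Without the `try`/`except`, the failed conversion would stop the script at that line.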

Python provides convenient functionality for looping through the lines of a file: a _for_ loop over an open file object fetches the next line on each pass, so we do not have to call *readline()* ourselves. We can treat the file object much like a list of lines and iterate through its contents.

Finally, if we want to write contents to a file, we can open the file in write or append mode and call *write()* with a string. Remember that if you do not use append mode, Python will overwrite the file completely. If we are using append mode, we do not have to worry about *write()* overwriting pre-existing data. Note that *write()* does not add a newline for us, so we must include '\n' ourselves whenever we want the next piece of output to start on a new line.
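
The sketch below shows one way to write numbers to a file, under the assumption that we want one value per line. The list `values` and the file name `write_demo.txt` are hypothetical. Because *write()* only accepts strings, each number is converted with *str()* first.

```python
import os
import shutil
import tempfile

# Hypothetical demo file inside a throwaway directory
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'write_demo.txt')

values = [1000, 2000, 3000]  # made-up data for illustration

# Write one value per line; write() needs a string and adds no newline itself
opened_file = open(path, 'w')
for value in values:
    opened_file.write(str(value) + '\n')
opened_file.close()

# Append two more values, forgetting the newline after the first one
opened_file = open(path, 'a')
opened_file.write('4000')
opened_file.write('5000\n')
opened_file.close()

# Read the file back to check the result
opened_file = open(path, 'r')
contents = opened_file.read()
opened_file.close()
print(contents)

# Remove the throwaway directory
shutil.rmtree(tmp_dir)
```

The last line of the file reads `40005000`, because nothing separated the two appended strings.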

### Closing a File

To ensure our changes persist, we must call *close()* on the opened file object. Once we have done this, we can open the file with an external editor to examine the changes.
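
If we are unsure whether a file is still open, the file object's `closed` attribute can tell us. The following sketch uses a hypothetical file name, `close_demo.txt`, in a temporary directory, and shows that writing after *close()* raises an error.

```python
import os
import shutil
import tempfile

# Hypothetical demo file inside a throwaway directory
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'close_demo.txt')

opened_file = open(path, 'w')
opened_file.write('1000\n')
was_closed_before = opened_file.closed
print(was_closed_before)     # False

opened_file.close()
is_closed_after = opened_file.closed
print(is_closed_after)       # True

# Any further read or write on the closed file raises ValueError
try:
    opened_file.write('2000\n')
    write_after_close_failed = False
except ValueError as err:
    write_after_close_failed = True
    print('Error:', err)

# Remove the throwaway directory
shutil.rmtree(tmp_dir)
```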

```python
# Open the file with read permissions
opened_file = open('test.txt', 'r')

# Read the entire contents of the file
print(opened_file.read())

# Reading again returns an empty string, because we are already at the end of the file
print(opened_file.read())

# Close the file as we are done operating on it
opened_file.close()
```

```
1000200020002000
2000
2000
2000
2000
```




```python
# Open file with read permissions
opened_file = open('test.txt', 'r')

# Read only one line of the file
print(opened_file.readline())

# Read the next line of the file
print(opened_file.readline())

# Close the file as we are done operating on it
opened_file.close()
```

```
1000200020002000

2000
```



```python
# Open the file with write permissions
opened_file = open('test.txt', 'w')

# Write a new number to the first line
opened_file.write('1000')

# Close the file to finalize changes
opened_file.close()

# Read the file to check the changes
opened_file = open('test.txt', 'r')
print(opened_file.read())
opened_file.close()
```

```
1000
```


```python
# Open the file with append permissions
opened_file = open('test.txt', 'a')

# Write some more content into the file as a string
# (write() adds no newline, so this lands directly after the existing text)
opened_file.write('2000')

# Close the file to finalize changes
opened_file.close()

# Read the file to check the changes
opened_file = open('test.txt', 'r')
print(opened_file.read())
opened_file.close()
```

```
1000200020002000
2000
2000
2000
2000
```



## Using Python Tricks for File Reading and Writing

### For Loop for Each New Line

There are two key Python shortcuts we can use to improve our ability to interact with files. The first is using a _for_ loop to iterate over each of the lines in the file. Python lets us loop over an open file just like a list, with each item being one line of the file. With this approach, we still need to open the file before the loop and close it manually afterwards.

```python
# We can use built-in Python functionality to read files sequentially using a for loop structure
opened_file = open('test.txt', 'r')

# For each new line in the opened file, perform some operation (in this case print).
# Each line still ends with its newline character, so print() leaves a blank line after it.
for line in opened_file:
    print(line)

# Close file as we are done operating on it
opened_file.close()
```

```
1000200020002000

2000

2000

2000

2000
```
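
The blank lines in the output above come from the newline character at the end of each line. One common way to drop it is the string method *strip()*, sketched below with a hypothetical file, `numbers.txt`, created in a temporary directory.

```python
import os
import shutil
import tempfile

# Hypothetical demo file inside a throwaway directory
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'numbers.txt')

# Create a small three-line file to read from
opened_file = open(path, 'w')
opened_file.write('1000\n2000\n3000\n')
opened_file.close()

# Loop over the lines, removing surrounding whitespace from each one
cleaned_lines = []
opened_file = open(path, 'r')
for line in opened_file:
    cleaned = line.strip()   # drops the trailing '\n'
    cleaned_lines.append(cleaned)
    print(cleaned)
opened_file.close()

# Remove the throwaway directory
shutil.rmtree(tmp_dir)
```

With the newline removed, each value prints on its own line with no gaps in between.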



### Automatically Closing Files Using a With Structure

The _with_ structure temporarily defines a variable to be used within an indented block. Conveniently, when that variable is a file object, Python closes the file automatically on exiting the block. Using a _with_ structure therefore means we can never forget to close a file after operating on it. The _with_ structure may not apply to every problem, but it can be convenient in many contexts.

```python
# We can use one more Python trick to automatically
# close the file once we are done: the "with" statement

with open('test.txt', 'r') as opened_file:
    for line in opened_file:
        print(int(line))

# If I try to operate on the file outside of the indentation block, I get an error:
# opened_file.read()  # ValueError: I/O operation on closed file.
```

```
1000200020002000
2000
2000
2000
2000
```
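
The _with_ structure works for writing as well as reading. The sketch below combines the ideas from this workshop: it writes a few made-up numbers to a hypothetical file, `with_demo.txt`, reads them back, casts each line to an integer, and adds them up. The temporary directory is an assumption made here so the example does not touch `test.txt`.

```python
import os
import tempfile

# The temporary directory is itself managed by a with block
# and is deleted automatically at the end
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'with_demo.txt')

    # Write one number per line; the file is closed when the block ends
    with open(path, 'w') as out_file:
        for value in [1000, 2000, 2000]:
            out_file.write(str(value) + '\n')

    # Read the numbers back and sum them
    total = 0
    with open(path, 'r') as in_file:
        for line in in_file:
            total += int(line)   # int() ignores the trailing newline

    # The variable still exists here, but the file is already closed
    closed_after_block = in_file.closed

print(total)               # 5000
print(closed_after_block)  # True
```

Note that neither file needed an explicit call to *close()*.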