# Thursday, Week 1: Module
# Reading and Writing to Files

__Learning Objective:__ practice opening, reading, manipulating, and writing to files. 

Following: https://www.pythonforbeginners.com/files/reading-and-writing-files-in-python

Python knows how to open/read/write/close files; you don't need a special library to do this. 
The open() function opens a file. Nice and simple! 

open() returns a file object. File objects and files are not the same thing! Files are actual files on your computer with data. File objects have methods and attributes that tell you about the file you just opened and let you manipulate it. 

For instance, the name attribute of a file object tells you the name of the file you just opened and stored in the file object. The mode attribute of a file object tells you in what mode the file was opened (more on this in the next cell).
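
To make the difference between a file and a file object concrete, here is a minimal sketch that opens a file and inspects a few attributes of the resulting file object. The file name attributes_demo.txt is made up for this example, and the 'w' mode it uses is explained further down.

```python
# 'attributes_demo.txt' is a hypothetical file name used only for this sketch.
file = open('attributes_demo.txt', 'w')

print(file.name)    # the file name we passed to open()
print(file.mode)    # the mode we passed to open()
print(file.closed)  # False: the file is still open

file.close()
print(file.closed)  # True: the file object still exists, but the file is closed
```

Note that the file object is still around after close(); it just can't be used to read or write any more.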

In [None]:
# this cell does not contain code! don't run it. 
# just some general syntax for using open(). 

file_object = open('filename', 'mode')  # file_object is the variable where we're going to store the file object.
#filename is the name of the file we're trying to open
#mode can be a couple of different things and determines what we'll be able to do with the file. 

'r': Read mode: only read the file (don't write/edit)

'w': Write mode: write new information to the file (any existing file with the same name will be erased when this mode is used!) 

'a': Append mode: add new data to the end of the file (without overwriting the entire file)

'r+': Special read and write mode: handle both actions when working with a file (the file must already exist) 
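
The four modes above can be tried out one after another on a single file. This is only a sketch: modes_demo.txt is a made-up file name, and the lines written to it are placeholders.

```python
# 'modes_demo.txt' is a hypothetical file name used only for this sketch.
file = open('modes_demo.txt', 'w')   # 'w' creates the file (or erases an existing one)
file.write('first line\n')
file.close()

file = open('modes_demo.txt', 'a')   # 'a' keeps the old contents and adds to the end
file.write('second line\n')
file.close()

file = open('modes_demo.txt', 'r+')  # 'r+' allows both reading and writing
contents = file.read()               # reading moves our position to the end of the file
file.write('third line\n')           # so this write lands after the existing text
file.close()

file = open('modes_demo.txt', 'r')   # 'r' only allows reading
final_contents = file.read()
file.close()

print(final_contents)
```

After running this, the file should hold all three lines in the order they were written.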

In [2]:
file = open('testfile.txt','w') 
 
file.write('Hello, world! ') 
file.write('We have just created this text file! ') 
file.write('Now we are adding a new line to it.\n') 
file.write('And now we are adding another new line.\n') 
file.write('12345.\n') 
file.write('Okay, closing the file after this.') 

 
file.close() 

The read() method can be used when a file was opened in a mode that allows reading, such as 'r'. read(num) reads num characters of the file (or fewer, if the file runs out first).

read() with no argument reads the entire file all at once!
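
One detail worth seeing in action: the file object remembers how far it has read, so each call to read(num) continues where the previous call stopped. A small sketch, using a made-up file called read_demo.txt:

```python
# 'read_demo.txt' is a hypothetical file name used only for this sketch.
file = open('read_demo.txt', 'w')
file.write('Hello, world!')
file.close()

file = open('read_demo.txt', 'r')
first = file.read(5)    # the first 5 characters
second = file.read(2)   # the next 2 characters
rest = file.read()      # everything that is left
at_end = file.read()    # nothing left to read: an empty string
file.close()

print([first, second, rest, at_end])  # ['Hello', ', ', 'world!', '']
```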

In [3]:
file = open('testfile.txt', 'r') 
print(file.read() )
file.close()

Hello, world! We have just created this text file! Now we are adding a new line to it.
And now we are adding another new line.
12345.
Okay, closing the file after this.


In [4]:
file = open('testfile.txt', 'r') 
print(file.read(1) )
file.close()

H


In [5]:
file = open('testfile.txt', 'r') 
print(file.read(5) )
file.close()

Hello


You can also read a file line by line using the readline() and readlines() methods.

readline() reads one line at a time and you can run it again and again to get the next line each time. It will return a string with a single line from the file. 

In [6]:
file = open('testfile.txt', 'r')
print(file.readline())

Hello, world! We have just created this text file! Now we are adding a new line to it.



In [7]:
print(file.readline())

And now we are adding another new line.



In [8]:
print(file.readline())

12345.



In [9]:
print(file.readline())

Okay, closing the file after this.


In [10]:
print(file.readline())
file.close()




readlines() returns a list where each element is a line in the file represented as a string. That's great, because we are pros at working with lists! 
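
As a quick sketch of what that list looks like, the example below writes three short lines to a made-up file (readlines_demo.txt) and reads them back with readlines(). Once we have the list, the usual list tools such as len() and indexing work on it.

```python
# 'readlines_demo.txt' is a hypothetical file name used only for this sketch.
file = open('readlines_demo.txt', 'w')
file.write('first line\n')
file.write('second line\n')
file.write('third line')
file.close()

file = open('readlines_demo.txt', 'r')
lines = file.readlines()
file.close()

print(lines)       # ['first line\n', 'second line\n', 'third line']
print(len(lines))  # 3
print(lines[1])    # the second line (list indices start at 0)
```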

In [13]:
if 'G' in ['G', 'A'] and 'C' in ['C', 'D']:
    print('here')

here


We could also go through all the lines in a file in the following way: 

In [None]:
file = open('testfile.txt', 'r')

for line in file:
    print(line)

file.close()    

Opening a file in 'w' mode will erase any existing file with the same name, so whatever you write replaces the old contents. 

In [None]:
file = open('testfile.txt', 'w')

file.write('This will overwrite the file!\n') 
file.write('This will get added after the previous line.\n')

file.close()

In [None]:
file = open('testfile.txt', 'r')

print(file.readlines())

file.close()

But writing to a file in the 'a' mode will just append to the end of the file. 

In [None]:
file = open('testfile.txt', 'a')

file.write('But this will get added to the bottom of the existing file..\n') 
file.write('And so will this.')

file.close()

In [None]:
file = open('testfile.txt', 'r')

print(file.readlines())

file.close()

When a file is split into a list of lines, if there are any \n characters, they will be part of the string:

In [None]:
file = open('testfile.txt', 'r')

lines = file.readlines()

for line in lines:
    print([line])
    
file.close()    

You can get rid of the \n character by calling the function (actually, it's a method) strip() on a given line in the file. 

From documentation: https://www.programiz.com/python-programming/methods/string/strip

The strip([ chars ]) method returns a copy of the string with both leading and trailing characters removed (based on the string argument passed).

chars (optional) - a string specifying the set of characters to be removed.

If the chars argument is not provided, all leading and trailing whitespace (spaces, tabs, and newlines) is removed from the string.
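
Here is a small sketch of how the chars argument changes what strip() removes. The strings are made up for illustration and are wrapped in a list when printed so that any leftover whitespace is visible.

```python
line = '  12345.\n'

no_argument = line.strip()          # removes all leading and trailing whitespace
newline_only = line.strip('\n')     # removes only newline characters from the ends
several_chars = '..hello!!'.strip('.!')  # chars is a set: any '.' or '!' at the ends is removed

print([no_argument])    # ['12345.']
print([newline_only])   # ['  12345.']
print([several_chars])  # ['hello']
```

In every case the original string is left unchanged; strip() returns a new string.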

In [None]:
file = open('testfile.txt', 'r')

lines = file.readlines()

for line in lines:
    print([line.strip('\n')])
    
file.close()    

This can get a little confusing... watch out for this:

In [None]:
file = open('testfile.txt', 'r')

line = file.readline() 

print(line)  # print() displays the string; the \n at its end shows up as a line break,
# not as visible characters.
# the \n character is still part of the string line, though!

print(line == 'This will overwrite the file!')  # this prints False, even though print(line) looks like this
print(line == 'This will overwrite the file!\n')  # this prints True.
    
#print([line])
    
    
file.close()    
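
One way around this pitfall is to strip the newline before comparing. The sketch below uses a plain string in place of a line read from a file, so it runs on its own.

```python
# A stand-in for a line read from a file, including its trailing newline.
line = 'This will overwrite the file!\n'

raw_match = (line == 'This will overwrite the file!')
stripped_match = (line.strip('\n') == 'This will overwrite the file!')

print(raw_match)       # False: the \n is still part of the string
print(stripped_match)  # True: the \n was removed before comparing
```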

The with statement is another way to open a file. It is nice because it automatically closes the file for you.

In [None]:
file = open('testfile.txt', 'w')

file.write('Captains log, Stardate Friday, July 31st, 2020.\n')
file.write('Working with real data today, as one often does in scientific research.\n')
file.write('Reading and writing to files is an important skill for any programmer.\n')

file.close()

with open('testfile.txt') as file: 
    #any files opened will be closed automatically after you are done. 
    lines = file.readlines() 
    print(lines)
    print()
    print(lines[0])
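
To check that the with statement really does close the file, you can look at the closed attribute of the file object inside and outside the block. A minimal sketch, with with_demo.txt as a made-up file name:

```python
# 'with_demo.txt' is a hypothetical file name used only for this sketch.
with open('with_demo.txt', 'w') as file:
    file.write('one line\n')
    print(file.closed)  # False: inside the with block the file is open

print(file.closed)      # True: leaving the with block closed it

with open('with_demo.txt') as file:  # no mode given, so the default 'r' is used
    text = file.read()

print([text])           # ['one line\n']
```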

You can loop over the lines in a file like you normally would inside of the with statement.

In [None]:
with open('testfile.txt') as file: 
    #any files opened will be closed automatically after you are done. 
    for line in file:
        print(line)

And you can use it to process and manipulate the data in the file. 

In [None]:
with open('testfile.txt') as file: 
    #any files opened will be closed automatically after you are done. 
    for line in file:
        print([line])
        print([line.strip('\n')])
        print([line.split()])
        print('\n')

What happens if you want to work with multiple files, as you would if you wanted to analyze lots of patient medical histories and genome sequences?

In [None]:
file = open('log1.txt', 'w')
file.write('This is log 1.\n')
file.write('This is the second line of log 1.\n')
file.close()


file = open('log2.txt', 'w')
file.write('This is log 2.\n')
file.write('This is the second line of log 2.\n')
file.close()


file = open('log3.txt', 'w')
file.write('This is log 3.\n')
file.write('This is the second line of log 3.\n')
file.close()

Computers can only open files if you tell them where they are. Heretofore the computer has been assuming, since we didn't give it any special instructions, that the files we're creating and destroying are in the same folder as this Jupyter notebook. That's fine for practicing, but if you want to work with LOTS of files, it can be convenient to keep them in separate folders. 

The address of a folder is called a path.

You can import the os module to get your current path (and do lots of other things, too...)

In [None]:
import os
path_to_files = os.getcwd()
print(path_to_files)
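
Once you have a path to a folder, you often need the path to a file inside it. One common way to build it is os.path.join, which glues the pieces together with the right separator for your operating system. In this sketch, new_folder and log1.dat are placeholder names; nothing is created on disk.

```python
import os

current_folder = os.getcwd()

# 'new_folder' and 'log1.dat' are placeholder names; this only builds the text of a path.
path_to_log = os.path.join(current_folder, 'new_folder', 'log1.dat')

print(path_to_log)
print(os.path.dirname(path_to_log))   # the folder part of the path
print(os.path.basename(path_to_log))  # the file name part: log1.dat
```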

If you want to load a lot of files, one way to do this is to use the glob module. 

If you ask a computer to look for a file at /path/to/file/folder/filename.txt, it will open one file.

This gets tedious if you'd like to open many files that have similar names: you don't want to type each filename by hand!

Instead of writing pseudocode like this:

open /path/to/file/folder/log1.txt
open /path/to/file/folder/log2.txt
open /path/to/file/folder/log3.txt

You can tell the computer to look for all files with filenames matching a pattern. For this, we can use the star (`*`) character.

`/path/to/file/folder/log*`

Here, the `*` says: go to /path/to/file/folder/ and find all files whose names start with the word log, followed by anything else (the rest of the name can be anything, and it can be different for each file).

This will find the files log1.txt, log2.txt, log3.txt because they all start with log.

Suppose you had log1.txt, log2.txt, log3.txt but also log4.png, and you only wanted the .txt files. You could find just the txt files by typing `/path/to/file/folder/log*.txt`

Similarly, you could find all the .txt files by asking for `*.txt`
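
The three patterns above can be compared side by side. The sketch below creates the example files in a temporary folder (so it does not touch your own files) and then asks glob for each pattern. The helper matching_names and the extra file notes.txt are made up for this example. glob does not promise any particular order, so the results are sorted before printing.

```python
import glob
import os
import tempfile


def matching_names(folder, pattern):
    """Return the sorted file names in folder that match pattern (hypothetical helper)."""
    paths = glob.glob(os.path.join(folder, pattern))
    return sorted(os.path.basename(path) for path in paths)


# The temporary folder and everything in it is deleted when the with block ends.
with tempfile.TemporaryDirectory() as folder:
    for name in ['log1.txt', 'log2.txt', 'log3.txt', 'log4.png', 'notes.txt']:
        with open(os.path.join(folder, name), 'w') as file:
            file.write('placeholder\n')

    all_logs = matching_names(folder, 'log*')
    txt_logs = matching_names(folder, 'log*.txt')
    all_txt = matching_names(folder, '*.txt')

print(all_logs)  # ['log1.txt', 'log2.txt', 'log3.txt', 'log4.png']
print(txt_logs)  # ['log1.txt', 'log2.txt', 'log3.txt']
print(all_txt)   # ['log1.txt', 'log2.txt', 'log3.txt', 'notes.txt']
```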

In [None]:
import glob

file_list = glob.glob(path_to_files + '/log*')
print(file_list)

print()

for filename in file_list:
    print(filename)

In [None]:
for filename in file_list:
    file  = open(filename, 'r')
    print(file.readline())
    file.close()

In [None]:
for filename in file_list:
    print("Opening ", filename)
    file  = open(filename, 'r')
    print("Reading ", filename, "\n")

    for line in file:
        print(line)
        
    file.close()
    print("Closing ", filename, "\n")

## Activities:

__Task:__ Create multiple files with similar names, e.g. log1.dat, log2.dat, log3.dat, etc. Write a few lines of information to each of them. This can be whatever you like.

__Task:__ Go into your computer documents and by hand, create a new folder for the files you just created inside of the Classroom Activities folder. Move all the files you created into this new folder.  

In other words, if before your files were in the same folder as this jupyter notebook, I'd like you to please create a new folder where the notebook is and move all the files in there. 

Before:
- /some/folder/this_notebook.ipynb
- /some/folder/log1.dat
- /some/folder/log2.dat

After:
- /some/folder/this_notebook.ipynb
- /some/folder/new_folder/log1.dat
- /some/folder/new_folder/log2.dat

__Task:__ Now write a few lines of code that:
1. correctly sets the path to the folder containing the files
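2. opens all of the files contained therein one by one, using the * notation for pattern matching in a file name (open them in 'r+' mode so that you can both read and write; plain append mode 'a' only allows writing, so trying to read from it raises an error)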
3. for each file, reads in each line and prints it out. 
4. appends a new line to each file saying "This file was successfully opened and edited\n" 
5. don't forget to close each file when you are done! 