# I/O (input / output)

Here we introduce some basic ways of reading and writing
files. This is useful when a program requires a large input
or when we want to save the results of our calculations to disk.

In [1]:
# We can open a file in Python with the open() function, passing
# the file name as a string as the first argument. In many cases we
# also need to specify the mode in which to open the file. For now we
# will focus only on read ('r'), write ('w') and append ('a'), but you
# are free to read about the others.

# We create a new file called temporary_file.txt and specify
# we want to write to it.
new_file = open('temporary_file.txt', 'w')
print(new_file)  # file objects have their own type: _io.TextIOWrapper

<_io.TextIOWrapper name='temporary_file.txt' mode='w' encoding='UTF-8'>


In [2]:
# More precisely, our variable is an object (an instance of a class).
# We will not go into the details of classes at the moment; however,
# we can see the methods available on this object by typing the
# object name followed by a dot and then pressing Tab.
# Try it below.

# Press Tab after the dot. The line is commented out because running
# it as is raises a SyntaxError.
# new_file.
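Tab completion only works in an interactive session. As a rough alternative, the built-in dir() function lists the same names from code. The sketch below uses a throwaway file in a temporary directory (the name example.txt is just an illustration) and keeps only the public names, that is, those that do not start with an underscore.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name, used only for this illustration
    path = os.path.join(tmp_dir, 'example.txt')
    handle = open(path, 'w')

    # dir() returns every attribute name; keep the public ones
    public_names = [name for name in dir(handle) if not name.startswith('_')]
    handle.close()

print(public_names)
```

Among the printed names you should find write, flush and close, which are used in the following cells.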

In [None]:
# One possibility is the write() method. Let's try it.

write_this = "I am writing to a file\n"  # the \n creates a new line
new_file.write(write_this)

# If you check the file now, you will see that nothing has been written
# to it yet. For efficiency, data written to a file is first stored in a
# buffer; only when the buffer fills up is its content written to the
# file.

In [None]:
# We can force the write by flushing the buffer
new_file.flush()
# If we inspect the file now, we will see that our line has been written.
# The buffer size can be set with the buffering argument of open().
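As a small illustration of the buffering argument, the sketch below opens a text file with buffering=1, which requests line buffering, so each complete line is pushed to the file as soon as it is written. The file name buffered.txt and the helper read_back are hypothetical names chosen for this example.

```python
import os
import tempfile


def read_back(path):
    """Return the current content of the file at path."""
    with open(path, 'r') as reader:
        return reader.read()


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'buffered.txt')

    # buffering=1 selects line buffering (only available in text mode)
    line_buffered = open(path, 'w', buffering=1)
    line_buffered.write('first line\n')

    # The file is still open and flush() was never called explicitly
    seen_before_close = read_back(path)
    line_buffered.close()

print(repr(seen_before_close))
```

Line buffering is convenient for log files that we want to follow while the program is still running, at the cost of more frequent writes to disk.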

In [None]:
# Finally, let's close our file since we are done writing to it.
# Closing the file also flushes the buffer automatically.
new_file.close()

In [None]:
new_file.write(write_this)  # Raises ValueError: we can no longer write to a closed file
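If a program cannot be sure whether a file is still open, it can check the closed attribute or catch the error. The following is a minimal sketch using a throwaway file (closed_example.txt is a hypothetical name).

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'closed_example.txt')
    handle = open(path, 'w')
    handle.write('some text\n')
    handle.close()

    # The closed attribute tells us the state of the file object
    was_closed = handle.closed

    # Writing to a closed file raises ValueError
    try:
        handle.write('more text\n')
        error_message = None
    except ValueError as error:
        error_message = str(error)

print(was_closed)
print(error_message)
```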

In [None]:
# Alright, let's put everything together
log_file = open('temporary_file.txt', 'w')
for i in range(101):
    log_file.write(str(i) + '\n')
log_file.close()

# We have written the numbers from 0 to 100 to a file, each on a separate
# line. As you can see, I gave this file the same name as in our previous
# example; however, the line we had before, "I am writing to a file",
# disappeared. This is because the 'w' (write) mode replaces any existing
# file with this name when it opens it.

In [None]:
# But let's say we wrote to a file at some point in our program, closed
# it, and now want to write to it again without losing our previous data.
# The way to do this is with the 'a' (append) mode.

# First, we run the same code as before, but now the file is opened
# (still in 'w' mode) and closed inside the loop

for i in range(101):
    log_file = open('temporary_file.txt', 'w')
    log_file.write(str(i) + '\n')
    log_file.close()
    
# As expected, only the last number ends up in our file

In [None]:
# We have now changed the mode from 'w' to 'a'

for i in range(101):
    log_file = open('temporary_file.txt', 'a')
    log_file.write(str(i) + '\n')
    log_file.close()

# As you can see, even though we close the file on every iteration, we
# do not lose what we wrote before. However, it is important to be
# careful: we get the numbers from 0 to 100, but we also get an extra
# 100 at the beginning, left over from the previous cell.
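To compare the two modes side by side without touching temporary_file.txt, here is a small sketch that repeats the experiment with three numbers and two throwaway files (w_mode.txt and a_mode.txt are hypothetical names).

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path_w = os.path.join(tmp_dir, 'w_mode.txt')
    path_a = os.path.join(tmp_dir, 'a_mode.txt')

    for i in range(3):
        # 'w' truncates the file every time it is opened
        out = open(path_w, 'w')
        out.write(str(i) + '\n')
        out.close()

        # 'a' creates the file if needed and adds to the end otherwise
        out = open(path_a, 'a')
        out.write(str(i) + '\n')
        out.close()

    reader = open(path_w, 'r')
    lines_w = reader.readlines()
    reader.close()

    reader = open(path_a, 'r')
    lines_a = reader.readlines()
    reader.close()

print(lines_w)  # ['2\n']
print(lines_a)  # ['0\n', '1\n', '2\n']
```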

In [None]:
# Let's now read the content of our file

reading = open('temporary_file.txt', 'r')
lines = reading.readlines()
reading.close()  # close the file once we are done reading
print(lines)

# With readlines() we get a list of our lines. However, for large
# files readlines() is not the best choice because it loads the whole
# file into memory; in many cases it is better to read line by line.

In [None]:
# We can also read and write files using the with keyword

lines = []
with open('temporary_file.txt', 'r') as f:
    for line in f:
        lines.append(line)
print(lines)

# Here we accomplish the same as before. The benefit is that the file is
# read one line at a time, so we can process each line as it arrives
# instead of loading the whole file into memory at once.
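The memory benefit only appears when the lines are not all stored in a list. As a minimal sketch, the code below writes the numbers 0 to 100 to a throwaway file (numbers.txt is a hypothetical name) and then adds them up while reading, keeping only one line in memory at a time.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'numbers.txt')

    with open(path, 'w') as f:
        for i in range(101):
            f.write(str(i) + '\n')

    total = 0
    count = 0
    with open(path, 'r') as f:
        for line in f:
            # int() ignores the trailing '\n'
            total += int(line)
            count += 1

print(count, total)  # 101 5050
```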

In [None]:
# Since we are reading line by line, we can also act on each line. For
# example, the data we are getting is not completely useful: each line is
# a string that still ends in '\n', and we have lost the integer type
# because the numbers are now strings.

# So let's do some operations to clean up the data

lines = []
with open('temporary_file.txt', 'r') as f:
    for line in f:
        # The split() method is very important when reading a file,
        # especially when there is more than one column, as it
        # creates a list of the values separated by whitespace
        data = line.split()
        # Take the only element in the list and convert it back to int
        lines.append(int(data[0]))
print(lines)
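With more than one column, split() returns one string per column and each one can be converted to the type it needs. The sketch below uses made-up lines with three hypothetical columns (an index, a value and a label) instead of a real file, so the data and column meanings are assumptions for illustration only.

```python
# Hypothetical content of a three-column text file
sample_lines = [
    '1   2.50  alpha\n',
    '2   3.75  beta\n',
    '3  10.00  gamma\n',
]

indices = []
values = []
labels = []
for line in sample_lines:
    # Any amount of whitespace separates the fields; '\n' is dropped too
    fields = line.split()
    indices.append(int(fields[0]))
    values.append(float(fields[1]))
    labels.append(fields[2])

print(indices)
print(values)
print(labels)
```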


In [None]:
# With the with keyword the file is also closed automatically. Let's
# see an example of writing to a file

with open('temporary_file.txt', 'w') as f:
    for i in range(101):
        f.write(str(i) + '\n')
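The automatic closing also happens when something goes wrong inside the with block. The sketch below simulates a failure with a RuntimeError (the error and the file name with_example.txt are made up for the illustration) and then checks that the file was closed and that the data written before the failure reached the file.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'with_example.txt')

    try:
        with open(path, 'w') as f:
            f.write('written before the failure\n')
            raise RuntimeError('simulated failure')
    except RuntimeError:
        pass

    # The with block closed (and therefore flushed) the file for us
    closed_after_error = f.closed
    with open(path, 'r') as f_check:
        content = f_check.read()

print(closed_after_error)
print(repr(content))
```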

### Exercise 1: Using the files classroom_1.txt and classroom_2.txt inside the data directory, find the average GPA of classroom 1 and of classroom 2, and the average age of the students with a GPA greater than or equal to 3.8

# String formatting

In [None]:
# Many files have a strict format: the fields in the text file have
# specific widths, and other programs expect these widths when reading
# the file, so they have to be respected. How can we accomplish this?

# We will start with simple strings

word = 'formatting'
print(word)
print(f'{word}')  # A formatted string (f-string) without a format specification

# Now let's add the format
print(f'{word:>15s}')  # the 15 is the number of columns we assign
# The > means that we want it right-aligned

print(f'{word:<15s}')  # Left-aligned: padded with spaces on the right up to 15 columns
print(f'{word:^15s}')  # Centered
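Padding made of spaces is hard to see in printed output. One way to make it visible, sketched below, is to print the repr() of each string or to pick a fill character such as * in front of the alignment symbol.

```python
word = 'formatting'  # 10 characters

right = f'{word:>15s}'
left = f'{word:<15s}'
centered = f'{word:^15s}'

# repr() adds quotes, so the padding spaces become visible
print(repr(right))
print(repr(left))
print(repr(centered))

# A fill character placed before the alignment symbol replaces the spaces
filled = f'{word:*>15s}'
print(filled)  # *****formatting
```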

In [3]:
names = ['Michelle', 'Jacob', 'Mary']
last_names = ['Browning', 'Brown', 'Myers']

# Without formatting, the last names are not aligned.
# If we saved this to a file and later tried to add a comma between the
# name and the last name, it would not be easy.

# Also imagine there is a program that reads names and last names and
# expects the names in columns 1 to 20 and the last names in columns 21
# to 40. In our case, for the first line it would actually read:

# name = 'Michelle Browning   '
# last_name = '                    '

for i in range(len(names)):
    print(names[i], last_names[i])

Michelle Browning
Jacob Brown
Mary Myers


In [4]:
# Let's fix this issue by giving 20 columns to each string
for i in range(len(names)):
    print(f'{names[i]:20s}{last_names[i]:20s}')

Michelle            Browning            
Jacob               Brown               
Mary                Myers               
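A program that expects the names in columns 1 to 20 and the last names in columns 21 to 40 could recover them by slicing each line at fixed positions. The following sketch shows one way such a reader might work; it is an illustration, not the behaviour of any particular program.

```python
names = ['Michelle', 'Jacob', 'Mary']
last_names = ['Browning', 'Brown', 'Myers']

# Build the fixed-width records: 20 columns per field
records = [f'{name:20s}{last:20s}' for name, last in zip(names, last_names)]

parsed = []
for record in records:
    # Columns 1-20 are record[0:20], columns 21-40 are record[20:40]
    name = record[0:20].strip()
    last_name = record[20:40].strip()
    parsed.append((name, last_name))

print(parsed)
```

Because the fields sit at known positions, the reader does not depend on whitespace to separate them, so it would also work for values that contain spaces.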


In [5]:
# We can also use string formatting for numeric values. For integers
# there is nothing additional to learn: we just use the letter d
# instead of s to indicate that it is an integer value

for i in range(len(names)):
    print(f'{i:10d}{names[i]:20s}')  # by default numeric values are right-aligned
# while strings are left-aligned. Let's modify this to look better

         0Michelle            
         1Jacob               
         2Mary                


In [6]:
for i in range(len(names)):
    print(f'{i:10d}{names[i]:>20s}')

         0            Michelle
         1               Jacob
         2                Mary


In [7]:
# In the case of floats, we can also specify the number of decimal
# places by adding a dot and that number after the number of columns,
# followed by an f. First, here are some unformatted random floats
import numpy as np

for i in range(10):
    rand_num = 1000000 * np.random.random()
    print(rand_num)

569189.0029673155
876554.7416937583
752377.5417952172
901318.6527577641
954867.1951160504
86024.99579050305
89545.52026310869
807119.8223343416
16968.04783722117
766867.9729109536


In [8]:
for i in range(10):
    rand_num = 1000000 * np.random.random()
    print(f'{rand_num:12.2f}')  # A float in a field 12 columns wide, with 2 decimal places

   109493.47
   844430.50
   585430.69
    16878.18
   238843.33
   352028.82
   356459.44
   968897.98
   651175.02
   545042.19
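Since the random numbers change on every run, a few fixed values make the effect of the width and precision easier to check. The sketch below also shows that the width is only a minimum: a number that needs more columns is not truncated, which is something to watch for when writing strict formats.

```python
value = 3.14159

fixed = f'{value:12.2f}'
print(repr(fixed))  # '        3.14'

# The width is a minimum: this number needs 10 columns, not 6
too_narrow = f'{1234567.891:6.2f}'
print(repr(too_narrow))  # '1234567.89'

# The letter e gives scientific notation with the same width.precision idea
scientific = f'{value:12.3e}'
print(repr(scientific))  # '   3.142e+00'
```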


In [9]:
# Another way of formatting is the format() method of strings

# We define our particular format beforehand
fmt = '{:20s}{:20s}'
for i in range(len(names)):
    print(fmt.format(names[i], last_names[i]))  # we call the format() method

Michelle            Browning            
Jacob               Brown               
Mary                Myers               
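Both approaches use the same format specification, so they produce identical strings. The sketch below checks this and shows two variations that can be handy: named fields in format(), and a width taken from a variable in an f-string (the variable name width is just an example).

```python
name = 'Michelle'
last_name = 'Browning'

# Same specification, two ways of applying it
with_format = '{:20s}{:20s}'.format(name, last_name)
with_fstring = f'{name:20s}{last_name:20s}'
print(with_format == with_fstring)  # True

# Named fields make a long format string easier to read
named = '{first:20s}{last:20s}'.format(first=name, last=last_name)

# The width itself can come from a variable
width = 12
variable_width = f'{name:>{width}s}'
print(repr(variable_width))  # '    Michelle'
```

Keeping the format in a separate string, as with fmt above, is useful when the same layout is reused in several places.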


### Exercise 2: The GRO file format is used by GROMACS, an MD engine, to specify atomic coordinates. Read about this type of file at https://manual.gromacs.org/archive/5.0.3/online/gro.html. Ignoring the velocities, each atom line consists of 7 fields of different widths. The idea is to exploit this file format, together with 3D visualization tools such as PyMOL that read it, to graph the following function

In [16]:
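def function(grid, xs, ys):
    # grid is not used inside the function; it is kept in the signature
    X, Y = np.meshgrid(xs, ys)  # X[i][j] = xs[j] and Y[i][j] = ys[i]
    z = X ** 2 - Y ** 2
    return z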

In [19]:
# Hint: sticking to the strict format of the gro file, we want to write
# the above coordinates in the last 3 fields. For the first 4 fields we can
# write the following: i, LIG, OW1, i, where i is the index of our
# x, y, z array. The residue name must fit in 5 columns, so we use LIG
# rather than LIGAND, which would shift every later field by one column.

grid = 50
xs = 10 * np.linspace(-1, 1, grid)
ys = 10 * np.linspace(-1, 1, grid)
z = 0.1 * function(grid, xs, ys)

form = '{:5d}{:5s}{:>5s}{:5d}{:8.3f}{:8.3f}{:8.3f}\n'
with open('test.gro', 'w') as out:
    out.write('Title 3d Plot\n')
    out.write(f'{grid * grid:d}\n')  # number of atoms
    counter = 1
    for i in range(grid):
        for j in range(grid):
            # meshgrid orders its output so that z[i][j] belongs to the
            # point (xs[j], ys[i])
            out.write(form.format(counter, 'LIG', 'OW1', counter, xs[j], ys[i], z[i][j]))
            counter += 1
    out.write(f'{100:8.3f}{100:8.3f}{100:8.3f}\n')  # box vectors
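A quick way to check that a line respects the fixed widths is to look at its length and to slice it back into fields. The sketch below uses the same seven field widths as above (5, 5, 5, 5, 8, 8, 8, which add up to 44 columns) without the trailing newline; the coordinate values are made up. It also shows that a residue name longer than 5 characters is not truncated, so it pushes every later field one column to the right.

```python
GRO_FORMAT = '{:5d}{:5s}{:>5s}{:5d}{:8.3f}{:8.3f}{:8.3f}'

good_line = GRO_FORMAT.format(1, 'LIG', 'OW1', 1, 0.5, -0.5, 0.0)
long_name_line = GRO_FORMAT.format(1, 'LIGAND', 'OW1', 1, 0.5, -0.5, 0.0)

print(len(good_line))       # 44
print(len(long_name_line))  # 45

# Slice the well-formed line back into its fields
residue_number = int(good_line[0:5])
residue_name = good_line[5:10].strip()
atom_name = good_line[10:15].strip()
atom_number = int(good_line[15:20])
x = float(good_line[20:28])
y = float(good_line[28:36])
z_value = float(good_line[36:44])

print(residue_number, residue_name, atom_name, atom_number, x, y, z_value)
```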

# More ways of looping

In [12]:
# We have seen some ways to repeat an operation inside a loop;
# however

In [13]:
# dictionaries
# reading and writing files
# string formatting
# timing functions
# different ways to loop
#
# write a math function to PyMOL
# Lorenz attractor
#
# counting with a dictionary
# dynamic programming: Collatz
#
# more advanced MDAnalysis (hydrogen bonds)
#
# exercise: write RMSD code output to a file
#
# PCA