## IO: reading and writing files

The simplest method to read a datafile structured in columns is to use the numpy *loadtxt* function:

In [1]:
import numpy as np

In [2]:
data=np.loadtxt("data_files/files.txt")
print("Type of data: ", type(data))
print(data)

x=data[:,0]   # take all the elements of the first column
y=data[:,1]   # take all the elements of the second column

print("\nFirst column:  ", x)
print("Second column: ", y)

Type of data:  <class 'numpy.ndarray'>
[[-1.  1.]
 [ 0.  0.]
 [ 1.  1.]
 [ 2.  4.]
 [-2.  4.]
 [-3.  9.]
 [ 3.  9.]
 [ 4. 16.]
 [-4. 16.]]

First column:   [-1.  0.  1.  2. -2. -3.  3.  4. -4.]
Second column:  [ 1.  0.  1.  4.  4.  9.  9. 16. 16.]


Note that, as expected, the first element of this 2-dimensional array is its *first row*:

In [3]:
data[0]

array([-1.,  1.])
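As a possible shortcut, *loadtxt* can return the columns directly. The sketch below assumes a small in-memory text (the variable `text`, with made-up values and a comment header) in place of the real data file, and uses the `unpack` and `usecols` keywords of *loadtxt*:

```python
import io
import numpy as np

# In-memory stand-in for a two-column data file (illustrative values only)
text = """\
# x   y
-1.  1.
 0.  0.
 2.  4.
"""

# unpack=True transposes the result, so each column can be bound to its own name;
# lines starting with '#' are treated as comments and skipped by default
x, y = np.loadtxt(io.StringIO(text), unpack=True)

# usecols selects which column(s) to read (0-based index)
y_only = np.loadtxt(io.StringIO(text), usecols=1)

print("x:     ", x)
print("y:     ", y)
print("y_only:", y_only)
```

With a real file, the `io.StringIO(text)` argument would simply be replaced by the file name.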

To save data in a file, the numpy *savetxt* function can be used:

In [4]:
np.savetxt("data_files/my_x.out", data, fmt='%7.3f   %5.2e')

### Reading files (the Python way)

This approach is more *general* and *flexible*, but also more complicated...

In [5]:
f=open('data_files/files.txt', 'r')
print("Type of f: ", type(f))
print("f: ", f)

Type of f:  <class '_io.TextIOWrapper'>
f:  <_io.TextIOWrapper name='data_files/files.txt' mode='r' encoding='cp1252'>


In [6]:
data=f.read()
f.close()

print("Type of data", type(data))
print(data)

Type of data <class 'str'>
 -1.    1.
  0.    0.
  1.    1.
  2.    4.
 -2.    4.
 -3.    9.
  3.    9.
  4.   16.
 -4.   16.


Have a look at the first element of *data*...

In [7]:
data[0]

' '

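That is, ```data[0]``` is a string of one character (the *space* character).

The real *nature* of *data* is revealed by evaluating it without the *print* function: the notebook then shows the string's representation, with escape characters such as ```\n``` left visible, whereas *print* renders them as actual line breaks: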

In [8]:
data

' -1.    1.\n  0.    0.\n  1.    1.\n  2.    4.\n -2.    4.\n -3.    9.\n  3.    9.\n  4.   16.\n -4.   16.'
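Outside a notebook, the same two views of a string can be obtained with *print* and the built-in *repr* function. A short sketch, using a shortened made-up string rather than the full file content:

```python
# Shortened stand-in for the content read from the file
text = " -1.    1.\n  0.    0."

# print() renders the newline character as an actual line break
print(text)

# repr() returns the representation that the notebook displays
# when a cell ends with a bare expression: '\n' stays visible
text_repr = repr(text)
print(text_repr)
```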

What can we do with such a string?

In [9]:
data_split=data.split()  # start by splitting it
size=len(data_split)

Then convert each piece to a float with a *list comprehension* (to be explained later):

In [10]:
data_array=[float(data_split[dd].strip()) for dd in range(size)]

In [11]:
print(data_array)

[-1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 2.0, 4.0, -2.0, 4.0, -3.0, 9.0, 3.0, 9.0, 4.0, 16.0, -4.0, 16.0]
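The two steps above can be sketched on a shorter made-up string. Called without arguments, *split* cuts the string at any run of whitespace (spaces and newlines alike), so no further *strip* is needed, and the list comprehension can iterate over the pieces directly instead of over their indices:

```python
# Shortened stand-in for the content read from the file
text = " -1.    1.\n  0.    0.\n  1.    1."

# split() with no argument splits on any whitespace, newlines included,
# and never returns empty pieces
tokens = text.split()

# Convert every piece to a float, iterating over the items themselves
values = [float(token) for token in tokens]

print(tokens)
print(values)
```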


Then *define* the first column (*x*) and the second column (*y*) starting from *data_array*:

In [12]:
x=data_array[::2]   # slicing start:end:step 
y=data_array[1::2]

print("x: ", x)
print("y: ", y)

x:  [-1.0, 0.0, 1.0, 2.0, -2.0, -3.0, 3.0, 4.0, -4.0]
y:  [1.0, 0.0, 1.0, 4.0, 4.0, 9.0, 9.0, 16.0, 16.0]
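The same slicing idea extends to any number of columns: column *i* of a flattened table with `n_col` columns starts at index *i* and advances in steps of `n_col`. A small sketch with a made-up three-column list (the names `flat`, `n_col` and `columns` are illustrative):

```python
# Flattened three-column table (illustrative values), stored row by row
flat = [-1.0, 1.0, 10.0,
         0.0, 0.0, 20.0,
         2.0, 4.0, 30.0]
n_col = 3

# Column i: start at index i, take every n_col-th element
columns = [flat[start::n_col] for start in range(n_col)]

for index, column in enumerate(columns):
    print("column", index, ":", column)
```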


Here is another way to work, apparently more complicated, but *clearer* and more *general* than the previous one:

In [13]:
f=open('data_files/files.txt', 'r')

line=[]

flag=True
while flag:
    file_line=f.readline().strip().split()  
    if not file_line:
        print("End Of File")
        flag=False
    else:
        line.append(file_line)
                
f.close()

End Of File


Note that when *readline* tries to read a line *beyond* the end of the file, it returns an *empty* string, which *strip* and *split* turn into an *empty* list; in this case, ```not file_line``` is *True*, so that the boolean variable *flag* is set to *False* and, therefore, the *while* loop is terminated. Be aware that a blank line in the middle of the file also produces an empty list and would stop the reading early.  

At the end of the *while* loop, the variable *line* is a list of lists: each sublist corresponds to a row of the original file. Note that the elements of such sublists are *strings*:

In [14]:
print("Type of line: ", type(line), "\nline: ", line)

Type of line:  <class 'list'> 
line:  [['-1.', '1.'], ['0.', '0.'], ['1.', '1.'], ['2.', '4.'], ['-2.', '4.'], ['-3.', '9.'], ['3.', '9.'], ['4.', '16.'], ['-4.', '16.']]
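An alternative that avoids the flag variable is to iterate over the file object itself, which yields one line at a time and stops at the end of the file. The sketch below assumes an in-memory file (`io.StringIO`) with made-up content, including a blank line, in place of the real data file:

```python
import io

# In-memory stand-in for the data file; note the blank line in the middle
f = io.StringIO(" -1.    1.\n  0.    0.\n\n  2.    4.\n")

rows = []
for raw_line in f:          # the loop ends by itself at the end of the file
    fields = raw_line.split()
    if fields:              # skip blank lines instead of stopping at them
        rows.append(fields)

# Once the end of the file is reached, readline() returns an empty string
at_eof = f.readline()
f.close()

print(rows)
print(repr(at_eof))
```

Unlike the *while* loop above, this version keeps reading after a blank line.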


Now we will transform this list of lists of strings into a flat list of floats, and then define the *x* and *y* lists as the first and the second column of the original file, respectively:

In [15]:
row=len(line)
col=len(line[0])

data=[]

for ir in range(row):
    for ic in range(col):
        data.append(float(line[ir][ic]))

print("data: ", data, "\n")               
x=data[::2]
y=data[1::2]

print("x: ", x)
print("y: ", y)        

data:  [-1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 2.0, 4.0, -2.0, 4.0, -3.0, 9.0, 3.0, 9.0, 4.0, 16.0, -4.0, 16.0] 

x:  [-1.0, 0.0, 1.0, 2.0, -2.0, -3.0, 3.0, 4.0, -4.0]
y:  [1.0, 0.0, 1.0, 4.0, 4.0, 9.0, 9.0, 16.0, 16.0]
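A more compact variant, shown here only as a sketch on a shortened made-up list, converts the strings with a nested list comprehension and then obtains the columns by *transposing* the rows with the built-in *zip* function:

```python
# Shortened stand-in for the list of lists of strings built above
line = [['-1.', '1.'], ['0.', '0.'], ['2.', '4.']]

# Convert each field of each row to a float, keeping the row structure
rows = [[float(field) for field in fields] for fields in line]

# zip(*rows) pairs the first elements of all rows, then the second ones, and so on:
# each resulting tuple is one column of the table
x, y = (list(column) for column in zip(*rows))

print("x: ", x)
print("y: ", y)
```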


### Writing to a file (the Python way)

Data, whatever their *type*, must be written to a file as *strings*. *Newline* characters, if required, must be added explicitly...

In [16]:
f=open('data_files/file_from_python.out', 'w')

a=[0,1,2,3,4]
b=[5,6,7,8,9]

for ia, ib in zip(a,b):
    ias, ibs=str(ia), str(ib)
    ics=ias+"   "+ibs+"\n"
    f.write(ics)

f.close()
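When the columns must be aligned or written with a fixed number of decimals, the conversion to string can be done with a formatted string (*f-string*). This is only a sketch: it writes to an in-memory buffer (`io.StringIO`) instead of a real file, and the widths chosen are arbitrary:

```python
import io

a = [0, 1, 2]
b = [5, 6, 7]

# In-memory stand-in for a file opened in 'w' mode
buffer = io.StringIO()

for ia, ib in zip(a, b):
    # {ia:3d}: integer in a field of width 3
    # {ib:8.3f}: fixed-point number, width 8, three decimals
    buffer.write(f"{ia:3d}   {ib:8.3f}\n")

written = buffer.getvalue()
print(written)
```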

Here is a slightly different way, which uses the *writelines* method to write a list of strings to a file:

In [17]:
f=open('data_files/file_from_python.out', 'w')

a=[0,1,2,3,4]
b=[5,6,7,8,9]

for ia, ib in zip(a,b):
    ls=[str(ia), "   ", str(ib), "\n"]
    f.writelines(ls)

f.close()
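Despite its name, *writelines* does not add any newline: it simply writes the strings one after the other. It can therefore also be called once, on a list holding all the lines. A sketch, again using an in-memory buffer instead of a real file:

```python
import io

a = [0, 1, 2]
b = [5, 6, 7]

# Build all the lines first; each one carries its own newline character
lines = [f"{ia}   {ib}\n" for ia, ib in zip(a, b)]

# In-memory stand-in for a file opened in 'w' mode
buffer = io.StringIO()
buffer.writelines(lines)    # writes the strings as they are, with no separator

content = buffer.getvalue()
print(content)
```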

### Exercise

Merge a number *n* of data files, each containing a single column of data (float numbers), into a single data file containing *n* columns. The names of the files to be merged are listed in the file *file_names.txt*.

In [18]:
import numpy as np

path='data_files/'

f=open(path+'file_names.txt')

line=[]
flag=True
while flag:
    file_line=f.readline().strip()
    if not file_line:
        flag=False
    else:
        line.append(file_line)

f.close()

n_file=len(line)

print("Number of files to be read: ", n_file)
print("Files: ", line )

data=np.array([])

for ifile in line:
    col=np.loadtxt(path+ifile)
    data=np.append(data, col)

print("\nData array before 'reshuffling':\n",  data) 

l_data=data.size
n_row=int(l_data/n_file)
n_col=n_file

data=np.transpose(data.reshape(n_col, n_row))
print("\nData array after 'reshuffling':\n",data)

np.savetxt(path+'merged_file.out', data, fmt='%5.2f   '*n_file)

Number of files to be read:  3
Files:  ['file_1.dat', 'file_2.dat', 'file_3.dat']

Data array before 'reshuffling':
 [ 1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14. 15.]

Data array after 'reshuffling':
 [[ 1.  6. 11.]
 [ 2.  7. 12.]
 [ 3.  8. 13.]
 [ 4.  9. 14.]
 [ 5. 10. 15.]]
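The *append*, *reshape* and *transpose* sequence used above can be replaced, as one possible alternative, by the numpy *column_stack* function, which places 1-dimensional arrays side by side as columns. The sketch below uses small made-up arrays in place of the columns read from the files, and checks that both routes give the same result:

```python
import numpy as np

# Stand-ins for the single columns read from each file (illustrative values)
columns = [np.array([1.0, 2.0, 3.0]),
           np.array([4.0, 5.0, 6.0]),
           np.array([7.0, 8.0, 9.0])]

# Route 1: stack the 1-D arrays as the columns of a 2-D array
merged = np.column_stack(columns)

# Route 2: the approach of the exercise (flatten, reshape, transpose)
flat = np.concatenate(columns)
reshaped = np.transpose(flat.reshape(len(columns), -1))

print(merged)
print("Same result:", np.array_equal(merged, reshaped))
```

Both routes assume that all the files contain the same number of rows.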


### Exercise

By using Python functions only, change the *decimal* separator of the floating-point numbers in a file from the *comma* (',') to the *dot* ('.'). The function to be used is *replace*, which is actually a method of any Python string:

In [19]:
number='56,9786'
number_dot=number.replace(',','.')
print(number, number_dot)

56,9786 56.9786


Note that we can convert *number_dot* (which is a string) into a floating-point number:

In [20]:
print("Type of number_dot:", type(number_dot))
float_number=float(number_dot)
print("Type of float_number:", type(float_number), "  Number: ", float_number)

Type of number_dot: <class 'str'>
Type of float_number: <class 'float'>   Number:  56.9786


But we cannot do the same for *number* (a string containing a *comma*):

In [21]:
try:
   float(number)
except ValueError:
   print("You cannot convert %s in a floating point number!" % number)   

You cannot convert 56,9786 in a floating point number!


Now do the exercise...

In [22]:
file_input='data_files/commas.dat'
file_out='data_files/dots.dat'

# Open the input file
f=open(file_input, 'r')

# Read each line from the file, stripping
# the newline at the end of each line,
# and append the content to the 'line' list
line=[]
flag=True
while flag:
    file_line=f.readline().strip()
    if not file_line:
        flag=False
    else:
        line.append(file_line)
        
# close the input file        
f.close()

# From the created 'line' list, replace each comma in each
# element of such list and put the result in the 'new_list' list
new_list=[sd.replace(',','.') for sd in line]


# Open the output file and write to it the content
# of 'new_list', one element per row, by adding a
# 'newline' (\n) character to each element. An element
# of 'new_list' is the string representation of a pair of
# floating point numbers

f=open(file_out, 'w')
for iline in new_list:
    f.write(iline+'\n')
    
# Don't forget to close the file
f.close()


The same operations can be done by using a *context manager*, which closes the files automatically when they are no longer used; this is achieved by means of the *with* statement:

In [23]:
file_input='data_files/commas.dat'
file_out='data_files/dots.dat'

with open(file_input, 'r') as f:
    line=[]
    flag=True
    while flag:
        file_line=f.readline().strip()
        if not file_line:
            flag=False
        else:
            line.append(file_line)

new_list=[sd.replace(',','.') for sd in line]

with open(file_out, 'w') as f:
    for iline in new_list:
        f.write(iline+'\n')
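A single *with* statement can also manage both files at once, and the lines can be converted while they are being read. The following is a sketch under two assumptions: the input file is created on the fly in a temporary directory (with made-up content), and commas appear in it only as decimal separators, never as column separators:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    file_input = os.path.join(tmp_dir, "commas.dat")
    file_out = os.path.join(tmp_dir, "dots.dat")

    # Create a small input file with comma decimal separators (illustrative values)
    with open(file_input, "w") as f:
        f.write("1,5   2,25\n3,0   9,0\n")

    # Both files are opened together and closed automatically at the end of the block
    with open(file_input, "r") as f_in, open(file_out, "w") as f_out:
        for raw_line in f_in:
            # Each line keeps its own newline, so nothing has to be added back
            f_out.write(raw_line.replace(",", "."))

    # Read the result back to check it
    with open(file_out, "r") as f:
        converted = f.read()

print(converted)
```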