# Reading in data from files

### Basic File Actions: Open, do something and close
Python allows us to open a file, perform an action on it (reading, writing, appending) and then close the file.  We choose the mode of the file to be consistent with the action that we want to take.  The most common modes are:
* r = read only mode
* a = append mode
* w = write mode 
* r+ = read and write mode

In [40]:
# ourName = open( "filename" , "mode")
fileToRead = open( "random.txt", "r" )
contents = fileToRead.read()
fileToRead.close()
print contents

0 10
1 21
2 32
3 43
4 54
5 65
6 76
7 87
8 98
9 109
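
A common alternative to calling close() yourself is a `with` block, which closes the file automatically when the block ends, even if an error occurs while reading. The sketch below is not part of the original notebook: it first writes a small stand-in file (the name `sampleRandom.txt` and its contents are assumptions for illustration), and it calls `print()` as a function so that it also runs under Python 3.

```python
import os
import tempfile

# Create a small stand-in file (hypothetical name and contents)
path = os.path.join(tempfile.mkdtemp(), "sampleRandom.txt")
with open(path, "w") as fileToWrite:
    fileToWrite.write("0 10\n1 21\n2 32\n")

# The file is closed automatically when the with block ends
with open(path, "r") as fileToRead:
    contents = fileToRead.read()

print(contents)
print(fileToRead.closed)
```

After the block, `fileToRead.closed` reports whether the file was closed, so no explicit close() call is needed.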



This reads our entire file in at once.  We can also read in a certain number of characters or lines of text.
* read: Reads in the rest of the file, or, if you give it a number, at most that many characters
* readline: Reads in the rest of the current line, including the newline
* readlines: Reads in the rest of the file and returns it as a list of lines

Note that each call starts from the current position in the file, not from the beginning of the file (or even the beginning of a line).

In [41]:
# ourName = open( "filename", "mode")
fileToRead = open( "random.txt", "r" )
print fileToRead.readline()
print fileToRead.read(3) 
theRest = fileToRead.readlines() 
print theRest
fileToRead.close()

0 10

1 2
['1\n', '2 32\n', '3 43\n', '4 54\n', '5 65\n', '6 76\n', '7 87\n', '8 98\n', '9 109\n']
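
The position in the file can also be reset. As a hedged illustration that goes slightly beyond the cell above, this sketch uses the file object's `seek` method to jump back to the start of a small stand-in file (the file name and contents are assumptions for illustration).

```python
import os
import tempfile

# Create a small stand-in file (hypothetical name and contents)
path = os.path.join(tempfile.mkdtemp(), "sampleRandom.txt")
with open(path, "w") as fileToWrite:
    fileToWrite.write("0 10\n1 21\n2 32\n")

with open(path, "r") as fileToRead:
    firstLine = fileToRead.readline()   # reads the whole first line
    nextChars = fileToRead.read(3)      # continues partway into the second line
    fileToRead.seek(0)                  # jump back to the start of the file
    firstAgain = fileToRead.readline()  # reads the first line once more

print(repr(firstLine))
print(repr(nextChars))
print(repr(firstAgain))
```

Without the `seek(0)` call, the second `readline` would return the remainder of the second line instead of the first line.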


We can also loop over the lines of a file, which is much more efficient than reading in the entire file at once, because only one line is read into memory at a time.

In [42]:
fileToRead = open( "random.txt", "r" )
dataFormatted = []
for line in fileToRead:
    # Strip off the newline and split based on the space
    dataFormatted.append( line.strip('\n').split(' ') )
fileToRead.close()
print dataFormatted

[['0', '10'], ['1', '21'], ['2', '32'], ['3', '43'], ['4', '54'], ['5', '65'], ['6', '76'], ['7', '87'], ['8', '98'], ['9', '109']]
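
The values collected above are still strings. If numbers are needed, one option is to convert each field while looping. This is a minimal sketch under stated assumptions: the stand-in file name and contents are made up, and `dataNumeric` is a name introduced here for illustration.

```python
import os
import tempfile

# Create a small stand-in file (hypothetical name and contents)
path = os.path.join(tempfile.mkdtemp(), "sampleRandom.txt")
with open(path, "w") as fileToWrite:
    fileToWrite.write("0 10\n1 21\n2 32\n")

dataNumeric = []
with open(path, "r") as fileToRead:
    for line in fileToRead:
        # split() with no argument splits on any whitespace and drops the newline
        dataNumeric.append([int(value) for value in line.split()])

print(dataNumeric)
```

`int` raises a ValueError on a field that is not a whole number, so this approach only suits files that are known to be clean.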


We can also write to a file and append to a file in a very similar fashion.  Note that opening a file in write mode ("w") erases the contents of any existing file with that name.

In [43]:
# Write to a file
fileToWrite = open( "sampleWrite.txt", "w" )
fileToWrite.write( "Does this show up?" ) 
fileToWrite.close()

# Write to the same file again ("w" mode overwrites what was there)
fileToWrite = open("sampleWrite.txt", "w" )
fileToWrite.write( "Is this the second line?" ) 
fileToWrite.write( "\nAdd another Line" )
fileToWrite.close()

# Append to the file
fileToAppend = open( "sampleWrite.txt", "a" )
fileToAppend.write( "\nAppended to the end" ) 
fileToAppend.close()

# Check to see what's there
fileToCheck = open( "sampleWrite.txt", "r" )
print fileToCheck.read()
fileToCheck.close()

Is this the second line?
Add another Line
Appended to the end
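
The `r+` mode from the list of modes is not exercised above. As a rough sketch (the file name `sampleReadWrite.txt` and its contents are assumptions for illustration), opening with `r+` keeps the existing contents and starts at the beginning of the file, so a write replaces characters in place rather than erasing the file.

```python
import os
import tempfile

# Create a small stand-in file (hypothetical name and contents)
path = os.path.join(tempfile.mkdtemp(), "sampleReadWrite.txt")
with open(path, "w") as fileToWrite:
    fileToWrite.write("Hello world")

with open(path, "r+") as fileToUpdate:
    fileToUpdate.write("HELLO")   # overwrites the first five characters only
    fileToUpdate.seek(0)          # go back to the start before reading
    result = fileToUpdate.read()

print(result)
```

Unlike `w`, the rest of the original text survives, and unlike `a`, the write does not land at the end of the file.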


### Reading in regular data

The read commands we used before are very powerful, but they assume you will do all of your data prep and formatting as an additional step.  The numpy module comes with two excellent functions that allow you to read input files with known formatting more easily: loadtxt and genfromtxt.

In [44]:
import numpy as np

# Read in similar to before
allData = np.loadtxt( "random.txt", delimiter=" " )
print "All of the data:", allData

# Break into columns
colA, colB = np.loadtxt( "random.txt", unpack=True )
print "Columns: ", colA,colB

# Read in a single column
firstCol = np.loadtxt( "random.txt", usecols=[0],unpack=True )
print "First Column: ", firstCol

# Change the type
secColSt = np.loadtxt( "random.txt", dtype=str, usecols=[1], unpack=True )
print "Second Column as strings:", secColSt

All of the data: [[  0.  10.]
 [  1.  21.]
 [  2.  32.]
 [  3.  43.]
 [  4.  54.]
 [  5.  65.]
 [  6.  76.]
 [  7.  87.]
 [  8.  98.]
 [  9. 109.]]
Columns:  [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.] [ 10.  21.  32.  43.  54.  65.  76.  87.  98. 109.]
First Column:  [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
Second Column as strings: ['10' '21' '32' '43' '54' '65' '76' '87' '98' '109']


NumPy's genfromtxt is used the same way, but has some additional handling for missing values.  This can be useful if your data is incomplete or has some errors. You can substitute anything you want for missing values through filling_values - common choices are nan (not a number) or 0. Pick a value that lets you easily identify problems, so that you can either filter them out or know you are ok ignoring them.

In [45]:
import numpy as np
digitNum, piVal = np.genfromtxt( "corruptData.dat", delimiter=",", unpack=True,
                                 missing_values=' ', filling_values=np.nan )
print digitNum
print piVal

[ 0.  1.  2.  3. nan  5.  6.  7.  8.  9. 10. 11. 12.]
[ 3. nan  1.  4.  1.  5.  9.  2.  6.  5.  3.  5.  9.]
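
Once missing entries are marked with nan, they can be filtered out with a boolean mask. The following is a minimal sketch, not the notebook's own data: the file name `sampleCorrupt.dat` and its contents are made up, and it relies on genfromtxt treating empty fields as missing rather than passing `missing_values` explicitly.

```python
import os
import tempfile
import numpy as np

# Hypothetical sample data: two comma-separated columns with two empty fields
path = os.path.join(tempfile.mkdtemp(), "sampleCorrupt.dat")
with open(path, "w") as fileToWrite:
    fileToWrite.write("0,3\n1,\n2,1\n,4\n4,1\n")

digitNum, piVal = np.genfromtxt(path, delimiter=",", unpack=True,
                                filling_values=np.nan)

# Keep only the rows where both columns were read successfully
good = ~np.isnan(digitNum) & ~np.isnan(piVal)
cleanDigit = digitNum[good]
cleanPi = piVal[good]

print(cleanDigit)
print(cleanPi)
```

The mask `good` is True only where neither column is nan, so indexing with it drops every incomplete row from both arrays at once.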


## Lambda functions

Lambda functions are anonymous functions defined with the keyword lambda.

In [46]:
import numpy as np
multByPiLmb = lambda x: x * np.pi
multByPiLmb(3)

9.42477796076938

Note that this is similar to:

In [47]:
import numpy as np
def multByPiFxn( x ):
    # x is used instead of input, which would shadow the built-in function
    return x * np.pi

multByPiFxn(3)

9.42477796076938

However, because a lambda is an expression, it can be written inline wherever a function is expected, which a def statement cannot. For example, let's say you wanted to transform a variable as it was being read in.

In [48]:
import numpy as np
colA, colB = np.loadtxt( "random.txt", unpack=True )
print "Original Columns: ", colA,colB

colC, colD = np.loadtxt( "random.txt", converters = {0: lambda s: int(s)+10}, unpack=True )
print "Updated Columns: ", colC,colD

Original Columns:  [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.] [ 10.  21.  32.  43.  54.  65.  76.  87.  98. 109.]
Updated Columns:  [10. 11. 12. 13. 14. 15. 16. 17. 18. 19.] [ 10.  21.  32.  43.  54.  65.  76.  87.  98. 109.]
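
Another place where an inline lambda is commonly used is the `key` argument of the built-in `sorted`. This sketch is not part of the original notebook, and `rows` is made-up sample data in the same shape as the split lines read earlier.

```python
# Made-up sample rows: each row is a pair of strings
rows = [["2", "32"], ["0", "10"], ["1", "21"]]

# Sort by the numeric value of the second field, largest first
rowsSorted = sorted(rows, key=lambda row: int(row[1]), reverse=True)

print(rowsSorted)
```

The lambda is called once per row to compute the value that the rows are ordered by, so no separately named helper function is needed.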
