## Reading and using data from text files

There are several ways to read data from text files. These examples assume that you have a text file containing a list of data, possibly spanning multiple columns.

### numpy.loadtxt
This is the simplest method. It uses a NumPy function called `loadtxt` to read in text file data, as long as the data is regularly formatted (every row must contain the same number of values).

In [8]:
import numpy as np

myData = np.loadtxt("test_rainfile_hourly.txt")

In [9]:
myData

array([[  3.2,   0.6,   0.6, ...,   0. ,   0. ,   0. ],
       [  0. ,   0. ,   0. , ...,   0. ,   0. ,   0. ],
       [  0. ,   0. ,   0. , ...,   0. ,   0. ,   0. ],
       ..., 
       [  0.1,   1.1,   1. , ...,  12.5,  12.5,   3.6],
       [  0. ,   0. ,   0. , ...,   0. ,   0. ,   0. ],
       [  0. ,   0. ,   0. , ...,   0. ,   0. ,   0. ]])

As you can see, the object `myData` is a 2-dimensional array, filled with the values from the sample text data file. The `loadtxt` function has automatically created an array for us, with all of our data parsed into it from the text file.
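If you do not have the rainfall file to hand, the same behaviour can be tried on a small in-memory stand-in. This is only a sketch: the name `sample_text` and the values in it are made up for illustration, and `io.StringIO` is used here simply because `loadtxt` accepts file-like objects as well as filenames.

```python
import io
import numpy as np

# Hypothetical stand-in for a text file: three rows of four
# whitespace-separated values (made-up numbers, not the real dataset).
sample_text = io.StringIO(
    "# hourly rainfall (illustrative values)\n"
    "3.2 0.6 0.6 0.0\n"
    "0.0 0.0 0.0 0.0\n"
    "0.1 1.1 1.0 3.6\n"
)

# By default loadtxt splits on whitespace, treats lines starting with '#'
# as comments, and parses every value as a float.
sample = np.loadtxt(sample_text)

print(sample.shape)  # (3, 4): three rows, four columns
print(sample.ndim)   # 2
print(sample.dtype)  # float64
```

For comma-separated files, `loadtxt` also takes a `delimiter` argument (for example `delimiter=","`), and `skiprows` can be used to skip header lines that are not marked as comments.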

## Extracting subsets of data
Now we are going to extract certain subsets of this data. We can do this with something called array slicing, a built-in feature of NumPy arrays. Suppose we want the first column of our data:

In [10]:
myData[:,0]

array([  3.2,   0. ,   0. ,   0. ,   0. ,   0. ,   0. ,   0. ,   0. ,
         0. ,   0. ,   0. ,   0. ,   1.5,  15.8,   0. ,   0.3,   1. ,
         0. ,   0. ,   0. ,   0.1,   0. ,   0. ])

Some explanation. NumPy indexes a 2-dimensional array by row first and then by column, which matches the row-major order it uses by default to store array data. So when you want to access a certain data element within an array, you specify the row first, and then the column. Note that Python starts counting array positions from zero!

So here we are using the wildcard `:` symbol to say we want all elements from the row dimension, and `0` to say we want the "zeroth-column". This gives us a view of the array's first column.
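The word "view" matters here: a slice shares memory with the original array instead of copying it. The sketch below uses a small made-up array called `grid` (not the rainfall data) to show what that means in practice.

```python
import numpy as np

# Small illustrative array; the values are made up.
grid = np.array([[3.2, 0.6, 0.6],
                 [0.0, 0.0, 0.0],
                 [0.1, 1.1, 1.0]])

first_col = grid[:, 0]                    # all rows, column 0
print(np.shares_memory(first_col, grid))  # True: the slice is a view

# Writing through the view changes the original array.
first_col[1] = 9.9
print(grid[1, 0])  # 9.9

# Take an explicit copy if you need an independent array.
independent = grid[:, 0].copy()
independent[0] = -1.0
print(grid[0, 0])  # 3.2: the original is untouched
```

If you plan to modify a subset without touching the full dataset, calling `.copy()` on the slice is the safer choice.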

If we wanted the first row, we could just specify the following:

In [13]:
myData[0,:]

array([ 3.2,  0.6,  0.6,  0.6,  0.6,  0.6,  0.3,  0.3,  0.3,  0.3,  0.3,
        0.1,  0.1,  0.1,  0.1,  0.1,  0. ,  4.3,  2. ,  2. ,  2. ,  2. ,
        2. ,  2.3,  2.3,  2.3,  2.3,  2.3,  0.1,  0.1,  0.1,  0.1,  0.1,
        0. ,  4.3,  2. ,  2. ,  2. ,  2. ,  2. ,  2.3,  2.3,  2.3,  2.3,
        2.3,  0.1,  0.1,  0.1,  0.1,  0.1,  0. ,  4.3,  2. ,  2. ,  2. ,
        2. ,  2. ,  2.3,  2.3,  2.3,  2.3,  2.3,  0.1,  0.1,  0.1,  0.1,
        0.1,  0. ,  4.3,  2. ,  2. ,  2. ,  2. ,  2. ,  2.3,  2.3,  2.3,
        2.3,  2.3,  0.1,  0.1,  0.1,  0.1,  0.1,  0. ,  4.3,  2. ,  2. ,
        2. ,  2. ,  2. ,  2.3,  2.3,  2.3,  2.3,  2.3,  0.1,  0.1,  0.1,
        0.1,  0.1,  0. ,  5.5,  2. ,  2. ,  2. ,  2. ,  2. ,  0.3,  0.3,
        0.3,  0.3,  0.3,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  5.5,  2. ,
        2. ,  2. ,  2. ,  2. ,  0.3,  0.3,  0.3,  0.3,  0.3,  0. ,  0. ,
        0. ,  0. ,  0. ,  0. ,  5.5,  2. ,  2. ,  2. ,  2. ,  2. ,  0.3,
        0.3,  0.3,  0.3,  0.3,  0. ,  0. ,  0. ,  0

Similarly, if we want the second and third columns, we can do this:

In [15]:
myData[:,1:3]

array([[  0.6,   0.6],
       [  0. ,   0. ],
       [  0. ,   0. ],
       [  0. ,   0. ],
       [  0. ,   0. ],
       [  0. ,   0. ],
       [  0. ,   0. ],
       [  0. ,   0. ],
       [  0. ,   0. ],
       [  0. ,   0. ],
       [  0. ,   0. ],
       [  0. ,   0. ],
       [  0. ,   0. ],
       [  1.5,   0.4],
       [ 21.1,  21.1],
       [  1.7,   1.7],
       [  1.1,   1.1],
       [  1.3,   1. ],
       [  0. ,   0. ],
       [  0. ,   0. ],
       [  0. ,   0. ],
       [  1.1,   1. ],
       [  0. ,   0. ],
       [  0. ,   0. ]])

The syntax of array slicing is perhaps slightly confusing at first. You may not have realised (or may have forgotten, like I did when writing this) that when you specify a range of elements in an array slice, the range runs up to *but not including* the upper bound. So `myData[:,1:3]` returns the columns at index 1 and 2, and `myData[:,1:2]` would only give you one column of data.
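The half-open range is easiest to see by checking the shapes that come back. This is a minimal sketch on a made-up array named `table`, not the rainfall data.

```python
import numpy as np

# Illustrative 3x4 array: rows are [0..3], [4..7], [8..11].
table = np.arange(12).reshape(3, 4)

one_col = table[:, 1:2]   # range 1:2 covers index 1 only
two_cols = table[:, 1:3]  # range 1:3 covers indices 1 and 2
flat_col = table[:, 1]    # a plain index drops the column dimension

print(one_col.shape)   # (3, 1)
print(two_cols.shape)  # (3, 2)
print(flat_col.shape)  # (3,)
```

Note the difference between `1:2` and `1`: both select the same values, but the range keeps the result 2-dimensional while the plain index returns a 1-dimensional array.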

#### Summary
The colon syntax (`:`) specifies a range. However, omitting any bounds on the range will return the whole set of data from that dimension.
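You can also omit just one of the two bounds. A short sketch, using a made-up 1-dimensional array called `row`:

```python
import numpy as np

row = np.array([3.2, 0.6, 0.6, 0.3, 0.1])  # illustrative values

everything = row[:]   # no bounds: the whole dimension
first_two = row[:2]   # no lower bound: start from index 0
from_two = row[2:]    # no upper bound: run to the end

print(everything.size)  # 5
print(first_two.size)   # 2
print(from_two.size)    # 3
```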

### More slicing
You can extract a rectangular subset of data using the same notation. Here we are going to ask for four rows of data and, from those rows, two columns of data. (Don't worry that they are all zeros. That's just part of the dataset and is correct.)

In [18]:
myData[2:6,1:3]

array([[ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.]])

Lo and behold, we now have a 4x2 subset of the original `myData` array.
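Because the rainfall subset happens to be all zeros, it is hard to see which cells were picked. The sketch below repeats the same slice on a made-up array, `data`, whose values are simply 0 to 47, so each element reveals its original position.

```python
import numpy as np

# Illustrative 8x6 array; element [r, c] equals r * 6 + c.
data = np.arange(48).reshape(8, 6)

block = data[2:6, 1:3]  # rows 2, 3, 4, 5 and columns 1, 2

print(block.shape)  # (4, 2)
print(block[0, 0])  # 13, i.e. data[2, 1]
print(block[3, 1])  # 32, i.e. data[5, 2]
```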