# Different methods for loading data
This notebook describes several methods for reading data from a file.

## Imports

In [1]:
import numpy as np
import pandas as pd

### Setting some attributes

In [2]:
filename = "../data/ABCDE-data.csv"

## Using numpy
### numpy.loadtxt
Reads data from a simple text file. Some useful options:
* ```fname```: file, filename or **generator** to read.
* ```dtype```: type of the resulting array (default: float).
* ```delimiter```: string used to separate values (default: any whitespace).
* ```skiprows```: skip the first ```skiprows``` lines.
* ```usecols```: read only the given column numbers, 0 being the first column. Example: usecols = (1,3)
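
The options listed above can be tried without touching the real data file. The following is a minimal sketch, assuming a small made-up CSV held in memory with ```io.StringIO``` (the column names and values are hypothetical, not taken from ```ABCDE-data.csv```):

```python
import io

import numpy as np

# Hypothetical in-memory CSV standing in for a real file.
text = io.StringIO("A,B,C,D\n1,2.5,3,4\n5,6.5,7,8\n")

# Skip the header line, split on commas, keep only columns 1 and 3.
subset = np.loadtxt(text, delimiter=",", skiprows=1, usecols=(1, 3))
print(subset)        # two rows, two columns
print(subset.dtype)  # float is the default dtype

# Whitespace is the default delimiter; dtype changes the element type.
ints = np.loadtxt(io.StringIO("1 2\n3 4\n"), dtype=int)
print(ints)
```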

Typically ```numpy.loadtxt``` is used with data saved by ```numpy.savetxt```.
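
As an illustration of that pairing, here is a small round-trip sketch. The array values, the temporary file name ```roundtrip.csv``` and the header text are assumptions chosen for the example only:

```python
import os
import tempfile

import numpy as np

# Hypothetical array to save and read back.
original = np.array([[1.276, 21.4], [1.002, 21.95]])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "roundtrip.csv")  # hypothetical file name

    # savetxt writes the header as a comment line (prefixed with "# ").
    np.savetxt(path, original, delimiter=",", header="A,B")

    # loadtxt skips comment lines by default, so no skiprows is needed here.
    restored = np.loadtxt(path, delimiter=",")

print(np.allclose(original, restored))
```

Because the header is written as a comment, ```skiprows``` is only needed for files whose header line is not commented out, such as the CSV used in this notebook.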

In [4]:
data = np.loadtxt(filename, skiprows = 1, delimiter = ",")

In [10]:
print(data.dtype)
print(data[:5])

float64
[[  1.276  21.4    63.957 216.204 528.   ]
 [  1.002  21.95   61.697 204.484 514.   ]
 [  1.114  22.454  63.522 205.608 514.   ]
 [  1.133  22.494  61.59  206.565 501.   ]
 [  0.845  21.654  63.729 201.289 532.   ]]


### numpy.genfromtxt
Similar to ```numpy.loadtxt``` (it shares some options, typically the ones described previously for ```numpy.loadtxt```) but adds some useful features such as:
* ```names``` : if True, read the column names from the first row after the ```skip_header``` rows.
* ```dtype``` : if None, the dtype of each column is determined individually from its contents.
* ```missing_values``` : the set of strings corresponding to missing data (e.g. "na").
* ```filling_values``` : the set of values to be used as defaults when data are missing.
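
The file used in this notebook has no missing entries, so the last two options are not exercised below. A minimal sketch of how they might be used, assuming a made-up in-memory CSV where "na" marks a missing entry and -1 is an arbitrary fill value:

```python
import io

import numpy as np

# Hypothetical CSV in which "na" marks a missing entry.
text = io.StringIO("A,B,E\n1.5,na,528\n2.5,3.0,na\n")

data = np.genfromtxt(
    text,
    names=True,           # column names come from the first row
    delimiter=",",
    missing_values="na",  # string that marks missing data
    filling_values=-1,    # value substituted for missing entries
)

print(data.dtype.names)
print(data["B"])
print(data["E"])
```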

In [17]:
data = np.genfromtxt(filename, names = True, delimiter = ",")

In [16]:
print(data.dtype)
print(data[:5])

[('A', '<f8'), ('B', '<f8'), ('C', '<f8'), ('D', '<f8'), ('E', '<f8')]
[(1.276, 21.4  , 63.957, 216.204, 528.)
 (1.002, 21.95 , 61.697, 204.484, 514.)
 (1.114, 22.454, 63.522, 205.608, 514.)
 (1.133, 22.494, 61.59 , 206.565, 501.)
 (0.845, 21.654, 63.729, 201.289, 532.)]


In [18]:
data = np.genfromtxt(filename, names = True, dtype = None, delimiter = ",")

#### Note: with dtype = None, column 'E' is now of type i4 (32-bit integer)

In [19]:
print(data.dtype)
print(data[:5])

[('A', '<f8'), ('B', '<f8'), ('C', '<f8'), ('D', '<f8'), ('E', '<i4')]
[(1.276, 21.4  , 63.957, 216.204, 528)
 (1.002, 21.95 , 61.697, 204.484, 514)
 (1.114, 22.454, 63.522, 205.608, 514)
 (1.133, 22.494, 61.59 , 206.565, 501)
 (0.845, 21.654, 63.729, 201.289, 532)]
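
With ```names = True``` the result is a structured array, so each column can be selected by its name. A short sketch, using a made-up two-column sample rather than the real file:

```python
import io

import numpy as np

# Hypothetical two-column sample with a header row.
text = io.StringIO("A,E\n1.276,528\n1.002,514\n")

# dtype=None lets genfromtxt infer a type for each column separately.
data = np.genfromtxt(text, names=True, dtype=None, delimiter=",")

print(data.dtype.names)   # field names taken from the header
print(data["E"])          # one column, selected by name
print(data["E"].mean())   # regular array operations work on a field
```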
