# V3: Importing flat files using NumPy

## Why NumPy: 

* NumPy array: Standard for storing numerical data.
* Essential for other packages: e.g. scikit-learn

* NumPy has built-in functions for importing flat files as arrays:
    * loadtxt()
    * genfromtxt()

A basic call to loadtxt():

    import numpy as np
    filename = 'MNIST.txt'
    data = np.loadtxt(filename, delimiter=',')  # the default delimiter is whitespace
    print(data)
        
#### If the file has a header row of strings above otherwise numeric data, we need to skip the first row.
We can achieve this with skiprows:
    
    import numpy as np
    filename = 'MNIST.txt'
    data = np.loadtxt(filename, delimiter = ',', skiprows = 1 )
    print(data )
    
#### If we want to select particular columns of the file and skip the first row:
    
    import numpy as np
    filename = 'MNIST.txt'
    data = np.loadtxt(filename, delimiter =',', skiprows= 1, usecols = [0,2] ) # choosing first and third columns of data
    print(data)
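
To try skiprows and usecols without a file on disk, here is a minimal sketch that reads from an in-memory string. The column names and values are made up for illustration and are not taken from MNIST.txt.

```python
import io
import numpy as np

# Hypothetical CSV content standing in for a file on disk
csv_text = "label,pixel1,pixel2\n5,0,255\n0,128,64\n"

# Skip the header row and keep only the first and third columns
data = np.loadtxt(io.StringIO(csv_text), delimiter=',', skiprows=1, usecols=[0, 2])

print(data)        # numeric values from columns 0 and 2
print(data.shape)  # (rows, selected columns)
```

loadtxt() accepts a file-like object as well as a filename, which is why io.StringIO works here.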
    
#### Importing all data as strings by setting dtype to str:

    import numpy as np
    filename = 'MNIST.txt'
    data = np.loadtxt(filename, delimiter=',', dtype=str)
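
A small sketch of what dtype=str gives you, again using made-up in-memory data rather than the real file: the header row is kept because nothing has to be converted to a number, and the numeric rows can be converted afterwards.

```python
import io
import numpy as np

# Hypothetical CSV content with a header row
csv_text = "label,pixel1\n5,0\n0,128\n"

# Every field is read as a string, so the header row loads without errors
data = np.loadtxt(io.StringIO(csv_text), delimiter=',', dtype=str)
print(data[0])  # the header row, as strings

# Convert the remaining rows to floats once the header is set aside
numeric = data[1:].astype(float)
print(numeric)
```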
    
### Limitation of the loadtxt() function in NumPy:
* It works well for most cases, but it tends to break when the file has mixed datatypes, e.g. some columns holding strings and others holding floats.
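
The limitation can be seen in a short sketch. The data below is hypothetical, and genfromtxt() with dtype=None is shown as one possible way to read mixed columns, not necessarily the approach the next chapter takes.

```python
import io
import numpy as np

# Hypothetical CSV with one string column and one float column
csv_text = "name,score\nalice,9.5\nbob,7.0\n"

# loadtxt() tries to convert every field to float and fails on 'alice'
try:
    np.loadtxt(io.StringIO(csv_text), delimiter=',', skiprows=1)
    loadtxt_failed = False
except ValueError as err:
    loadtxt_failed = True
    print('loadtxt failed:', err)

# genfromtxt() with dtype=None infers a type per column;
# names=True takes the field names from the header row
records = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                        names=True, dtype=None, encoding='utf-8')
print(records['name'])
print(records['score'])
```

The result of genfromtxt() here is a one-dimensional structured array, so columns are accessed by field name rather than by column index.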

#### In the next chapter we will see how NumPy handles data of mixed types, although a DataFrame is the ideal way to work with such data.


## Example 1: Using NumPy to import flat files
In this exercise, you're now going to load the MNIST digit recognition dataset using the NumPy function loadtxt() and see just how easy it can be:

* The first argument will be the filename.
* The second will be the delimiter which, in this case, is a comma.
    
You can find more information about the MNIST dataset here on the webpage of Yann LeCun, who is currently Director of AI Research at Facebook and Founding Director of the NYU Center for Data Science, among many other things.

### Steps: 
1. Fill in the arguments of np.loadtxt() by passing file and a comma ',' for the delimiter.
2. Fill in the argument of print() to print the type of the object digits. Use the function type().
3. Execute the rest of the code to visualize one of the rows of the data.


    # Import packages
    import numpy as np
    import matplotlib.pyplot as plt

    # Assign filename to variable: file
    file = 'digits.csv'

    # Load file as array: digits
    digits = np.loadtxt(file, delimiter=',')

    # Print datatype of digits, then the array itself
    print(type(digits))
    print('\n\n\n', digits)

    # Select row 21, drop its first column, and reshape the rest to 28 x 28
    im = digits[21, 1:]
    im_sq = np.reshape(im, (28, 28))

    # Plot reshaped data
    plt.imshow(im_sq, cmap='Greys', interpolation='nearest')
    plt.show()

Output:

    <class 'numpy.ndarray'>



     [[ 12. 123.  12. 312.   3.  12.  12. 123.  12. 312.   3.  12.  12. 123.
       12. 312.   3.  12.  12. 123.  12. 312.   3.  12.  12. 123.  12. 312.
        3.]
     [ 12. 123.  12. 312.   3.  12.  12. 123.  12. 312.   3.  12.  12. 123.
       12. 312.   3.  12.  12. 123.  12. 312.   3.  12.  12. 123.  12. 312.
        3.]
     [ 12. 123.  12. 312.   3.  12.  12. 123.  12. 312.   3.  12.  12. 123.
       12. 312.   3.  12.  12. 123.  12. 312.   3.  12.  12. 123.  12. 312.
        3.]
     [ 12. 123.  12. 312.   3.  12.  12. 123.  12. 312.   3.  12.  12. 123.
       12. 312.   3.  12.  12. 123.  12. 312.   3.  12.  12. 123.  12. 312.
        3.]
     [ 12. 123.  12. 312.   3.  12.  12. 123.  12. 312.   3.  12.  12. 123.
       12. 312.   3.  12.  12. 123.  12. 312.   3.  12.  12. 123.  12. 312.
        3.]
     [ 12. 123.  12. 312.   3.  12.  12. 123.  12. 312.   3.  12.  12. 123.
       12. 312.   3.  12.  12. 123.  12. 312.   3.  12.  12. 123.  12. 312.
        3.]
     [ 12. 123.  12. 312.   3.  12.  12. 123.  12. 312.   3. 

    ValueError: cannot reshape array of size 28 into shape (28,28)

The reshape raises a ValueError because digits[21, 1:] holds only 28 values here: each row of the digits.csv used for this run has 29 columns, while a 28 x 28 image needs 784 values after the first column is dropped.
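
The size requirement behind this error can be checked without any file. The sketch below uses synthetic rows built with np.arange (not real MNIST data): a row of 785 values reshapes cleanly once its first column is dropped, while a row of 29 values does not.

```python
import numpy as np

# Synthetic row: 1 leading value followed by 784 values (28 * 28)
row = np.arange(785, dtype=float)
image = np.reshape(row[1:], (28, 28))
print(image.shape)

# Synthetic short row: 1 leading value followed by only 28 values
short_row = np.arange(29, dtype=float)
try:
    np.reshape(short_row[1:], (28, 28))
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print(err)
```

Checking digits.shape before slicing is a quick way to confirm that the loaded file has the expected number of columns.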