## Importing Data with NumPy

### 1. Run the following cells:

In [3]:
import numpy as np

In [4]:
np.set_printoptions(suppress = True, linewidth = 150)

### 2. Use loadtxt() to import the same file and display its contents. 
   <b> Hint:</b> By default, np.loadtxt() assumes all the values are numeric, so it crashes when it encounters text data. We can bypass this by setting the data type to NumPy strings when importing. 

In [5]:
lending_co_data_numeric_1 = np.loadtxt("../Data/Lending-Company-Numeric-Data.csv", delimiter = ',')
type(lending_co_data_numeric_1)
print(lending_co_data_numeric_1.shape)
lending_co_data_numeric_1

(1043, 6)


array([[ 2000.,    40.,   365.,  3121.,  4241., 13621.],
       [ 2000.,    40.,   365.,  3061.,  4171., 15041.],
       [ 1000.,    40.,   365.,  2160.,  3280., 15340.],
       ...,
       [ 2000.,    40.,   365.,  4201.,  5001., 16600.],
       [ 1000.,    40.,   365.,  2080.,  3320., 15600.],
       [ 2000.,    40.,   365.,  4601.,  4601., 16600.]])
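
The hint above can be tried without touching the course files. The following is a minimal sketch that assumes a small made-up CSV held in memory (the `csv_text` contents are hypothetical, not taken from the lending dataset): with the default float data type the text column raises an error, while `dtype = str` lets every field through.

```python
import io
import numpy as np

# Hypothetical in-memory CSV standing in for a file that has a text column.
csv_text = "1,Product A,100\n2,Product B,200"

# With the default (float) data type, the text column cannot be converted.
try:
    np.loadtxt(io.StringIO(csv_text), delimiter=",")
    failed = False
except ValueError:
    failed = True

# Asking for strings keeps every field as text.
as_text = np.loadtxt(io.StringIO(csv_text), delimiter=",", dtype=str)

print(failed)
print(as_text)
```

Every value comes back as a string, so numeric columns would still need converting before any arithmetic.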

### 3. Use np.genfromtxt() to import the "Lending-Company-Total-Price.csv" file and display its contents.
    You can open the file in a text editor like Notepad++ to check its delimiter. 

In [10]:
lending_co_TP = np.genfromtxt('../Data/Lending-Company-Total-Price.csv',delimiter = ',',dtype = str)
lending_co_TP

array([['LoanID', 'StringID', 'Product', ..., 'Location', 'Region', 'TotalPrice'],
       ['1', 'id_1', 'Product B', ..., 'Location 2', 'Region 2', '16600'],
       ['2', 'id_2', 'Product B', ..., 'Location 3', '', '16600'],
       ...,
       ['413', 'id_413', 'Product B', ..., 'Location 135', 'Region 1', '16600'],
       ['414', 'id_414', 'Product C', ..., 'Location 200', 'Region 6', '15600'],
       ['415', 'id_415', 'Product A', ..., 'Location 8', 'Region 2', '22250']], dtype='<U14')

### 4. Using the arguments of the np.genfromtxt() function, do the following data cleaning:
    A) Set the data type to strings. 
    B) Skip the first row of the dataset. 
    C) Skip the last 15 rows of the dataset. 
    D) Only pull data from the 2nd, 3rd and 5th columns. 
   (Note: You can do all of these at the same time.)

In [73]:
lending_co_TP = np.genfromtxt('../Data/Lending-Company-Total-Price.csv',
                              delimiter = ',',
                              dtype = str,
                              skip_header = 1,
                              skip_footer = 15,
                              usecols = (1,2,4))
lending_co_TP

array([['id_1', 'Product B', 'Location 2'],
       ['id_2', 'Product B', 'Location 3'],
       ['id_3', 'Product C', 'Location 5'],
       ...,
       ['id_398', 'Product A', 'Location 29'],
       ['id_399', 'Product B', 'Location 73'],
       ['id_400', 'Product B', 'Location 53']], dtype='<U12')
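
To see what each of these arguments contributes, here is a small sketch on a made-up five-line table (the rows in `csv_text` are hypothetical). It skips the header, drops the last two rows and keeps only the 2nd and 3rd columns, mirroring the steps of this exercise on a smaller scale.

```python
import io
import numpy as np

# Hypothetical sample: one header line followed by four data rows.
csv_text = "\n".join([
    "LoanID,StringID,Product,Price",
    "1,id_1,Product B,16600",
    "2,id_2,Product C,15600",
    "3,id_3,Product A,22250",
    "4,id_4,Product B,16600",
])

subset = np.genfromtxt(io.StringIO(csv_text),
                       delimiter=",",
                       dtype=str,
                       skip_header=1,    # drop the header row
                       skip_footer=2,    # drop the last two data rows
                       usecols=(1, 2))   # 2nd and 3rd columns (0-based indices)
print(subset)
```

Note that `usecols` counts from 0, which is why the 2nd, 3rd and 5th columns in the exercise are written as `(1,2,4)`.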

### 5. Refer to the documentation of the np.genfromtxt() function and examine the following arguments and what they do:
    A) comments
    B) converters 
    C) missing_values
    D) excludelist
    E) deletechars
    F) replace_space 
    G) autostrip
   (You're <b>not</b> expected to provide any coding for this part. The cell below is provided for your convenience in case you want to try using the arguments as you go through them.)

In [78]:
lending_co_LT = np.genfromtxt('../Data/lending-co-LT.csv',
                              delimiter = ',',
                              comments = "P",
                              #usecols = (1,2,4),
                              converters = None,
                              missing_values ='',
                              filling_values = 'Hello',
                              excludelist = ['LoanID'],
                              deletechars = 'Loan',
                              replace_space = 'y',
                              autostrip = True,
                              dtype = 'str')
lending_co_LT

array([['LoanID', 'StringID', ''],
       ['1', 'id_1', ''],
       ['2', 'id_2', ''],
       ...,
       ['1041', 'id_1041', ''],
       ['1042', 'id_1042', ''],
       ['1043', 'id_1043', '']], dtype='<U8')
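
Some of these arguments are easier to read on a small numeric example. The sketch below assumes a hypothetical in-memory table with a comment line, padded fields and one empty price. `comments` discards everything from the comment character to the end of the line, which is probably why the output above, produced with `comments = "P"`, stops where "Product" begins. `autostrip` trims the padding, and `filling_values` supplies the replacement for the missing entry.

```python
import io
import numpy as np

# Hypothetical sample with a comment line, padded fields and one empty price.
csv_text = "\n".join([
    "# loan id, price, duration",
    "1 , 16600 , 365",
    "2 ,       , 365   # price not recorded",
    "3 , 15600 , 365",
])

cleaned = np.genfromtxt(io.StringIO(csv_text),
                        delimiter=",",
                        comments="#",       # ignore everything after '#'
                        autostrip=True,     # trim whitespace around each field
                        filling_values=0)   # value used for missing entries
print(cleaned)
```

Empty fields count as missing by default; `missing_values` is only needed when the file marks gaps with some other token.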

### 6. Now that you understand what these arguments do, add them to the code you wrote a few cells ago. 
    A) comments
    B) converters 
    C) missing_values
    D) excludelist
    E) deletechars
    F) replace_space 
    G) autostrip
   <b>Note</b>: If you don't know what values to set, just refer to the documentation and set the defaults. 

In [49]:
lending_co_LT = np.genfromtxt('../Data/lending-co-LT.csv',
                              delimiter = ',',
                              comments = "P",
                              dtype = str,
                              converters = None,
                              missing_values ='',
                              filling_values = 'Hello',
                              excludelist = ['LoanID'],
                              deletechars = 'Loan',
                              replace_space = 'y',
                              autostrip = True)
lending_co_LT

array([['LoanID', 'StringID', ''],
       ['1', 'id_1', ''],
       ['2', 'id_2', ''],
       ...,
       ['1041', 'id_1041', ''],
       ['1042', 'id_1042', ''],
       ['1043', 'id_1043', '']], dtype='<U8')
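
`excludelist`, `deletechars` and `replace_space` act on field names rather than on the data, so they have no visible effect unless the header is read as names. A minimal sketch, assuming a hypothetical header with spaces and `names = True`:

```python
import io
import numpy as np

# Hypothetical sample whose header contains spaces.
csv_text = "\n".join([
    "Loan ID,Region,Total Price",
    "1,4,16600",
    "2,6,15600",
])

table = np.genfromtxt(io.StringIO(csv_text),
                      delimiter=",",
                      names=True,               # read the first line as field names
                      replace_space="_",        # spaces in names become underscores
                      excludelist=["Region"])   # excluded names get a trailing underscore
print(table.dtype.names)
print(table["Total_Price"])
```

The result is a structured array, so columns are accessed by their cleaned-up names instead of by position.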

### 7. The first and last columns were the only ones with numeric data, so let's import only them.

In [79]:
lending_co_LT = np.genfromtxt('../Data/lending-co-LT.csv',
                              usecols = (0,-1),delimiter = ',')
lending_co_LT

array([[   nan,    nan],
       [    1., 16600.],
       [    2., 16600.],
       ...,
       [ 1041., 16600.],
       [ 1042., 15600.],
       [ 1043., 16600.]])
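
The `nan` values in the first row come from the header, which cannot be converted to a float. A small sketch on a hypothetical three-line table shows the same effect and how `skip_header` removes it:

```python
import io
import numpy as np

# Hypothetical sample: numeric first and last columns, text in the middle.
csv_text = "\n".join([
    "LoanID,Product,TotalPrice",
    "1,Product B,16600",
    "2,Product C,15600",
])

# The header row is parsed like any other row and turns into nan.
with_header = np.genfromtxt(io.StringIO(csv_text), delimiter=",", usecols=(0, -1))

# Skipping it leaves only the numeric rows.
no_header = np.genfromtxt(io.StringIO(csv_text), delimiter=",",
                          usecols=(0, -1), skip_header=1)

print(with_header)
print(no_header)
```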

### 8. Alright, now that the data we're using only contains numbers, let's see how the output changes if we set the datatype argument to:
    A) The default (don't specify it)
    B) 32-bit integers (np.int32)
    C) 32-bit floats (np.float32)
    D) 64-bit complex numbers (np.complex64)
    E) Unicode (str; the old np.unicode alias has been removed from recent NumPy releases)
    F) Objects (object; the old np.object alias has been removed from recent NumPy releases)
    G) None

In [59]:
lending_co_LT = np.genfromtxt('../Data/lending-co-LT.csv',
                              usecols = (0,6),
                              delimiter = ',')
lending_co_LT

array([[   nan,    nan],
       [    1., 16600.],
       [    2., 16600.],
       ...,
       [ 1041., 16600.],
       [ 1042., 15600.],
       [ 1043., 16600.]])

In [60]:
lending_co_LT = np.genfromtxt('../Data/lending-co-LT.csv',
                              usecols = (0,6),
                              delimiter = ',',
                              dtype = np.int32)
lending_co_LT

array([[   -1,    -1],
       [    1, 16600],
       [    2, 16600],
       ...,
       [ 1041, 16600],
       [ 1042, 15600],
       [ 1043, 16600]])

In [61]:
lending_co_LT = np.genfromtxt('../Data/lending-co-LT.csv',
                              usecols = (0,6),
                              delimiter = ',',
                              dtype = np.float32)
lending_co_LT

array([[   nan,    nan],
       [    1., 16600.],
       [    2., 16600.],
       ...,
       [ 1041., 16600.],
       [ 1042., 15600.],
       [ 1043., 16600.]], dtype=float32)

In [62]:
lending_co_LT = np.genfromtxt('../Data/lending-co-LT.csv',
                              usecols = (0,6),
                              delimiter = ',',
                              dtype = np.complex64)
lending_co_LT

array([[   nan+0.j,    nan+0.j],
       [    1.+0.j, 16600.+0.j],
       [    2.+0.j, 16600.+0.j],
       ...,
       [ 1041.+0.j, 16600.+0.j],
       [ 1042.+0.j, 15600.+0.j],
       [ 1043.+0.j, 16600.+0.j]], dtype=complex64)

In [64]:
lending_co_LT = np.genfromtxt('../Data/lending-co-LT.csv',
                              usecols = (0,6),
                              delimiter = ',',
                              dtype = str)  # np.compat.unicode is an alias for str on Python 3
lending_co_LT

array([['LoanID', 'TotalPrice'],
       ['1', '16600.0'],
       ['2', '16600.0'],
       ...,
       ['1041', '16600.0'],
       ['1042', '15600.0'],
       ['1043', '16600.0']], dtype='<U10')

In [66]:
lending_co_LT = np.genfromtxt('../Data/lending-co-LT.csv',
                              usecols = (0,6),
                              delimiter = ',',
                              dtype = object)
lending_co_LT

array([[b'LoanID', b'TotalPrice'],
       [b'1', b'16600.0'],
       [b'2', b'16600.0'],
       ...,
       [b'1041', b'16600.0'],
       [b'1042', b'15600.0'],
       [b'1043', b'16600.0']], dtype=object)

In [67]:
lending_co_LT = np.genfromtxt('../Data/lending-co-LT.csv',
                              usecols = (0,6),
                              delimiter = ',',
                              dtype = None)
lending_co_LT

array([[b'LoanID', b'TotalPrice'],
       [b'1', b'16600.0'],
       [b'2', b'16600.0'],
       ...,
       [b'1041', b'16600.0'],
       [b'1042', b'15600.0'],
       [b'1043', b'16600.0']], dtype='|S10')
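
The comparison is easier to read once the header row is out of the way. The sketch below assumes a hypothetical two-column sample and loops over several data types with `skip_header = 1`, so no placeholder values such as `nan` or `-1` appear.

```python
import io
import numpy as np

# Hypothetical two-column sample; the header is skipped so every field is numeric.
csv_text = "\n".join(["LoanID,TotalPrice", "1,16600", "2,15600"])

results = {}
for label, dt in [("int32", np.int32), ("float32", np.float32),
                  ("complex64", np.complex64), ("str", str)]:
    results[label] = np.genfromtxt(io.StringIO(csv_text), delimiter=",",
                                   skip_header=1, dtype=dt)

# Show the resulting dtype and values for each request.
for label, arr in results.items():
    print(label, arr.dtype, arr.tolist())
```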

### 9. Setting the datatype to <i>None</i> means the function automatically chooses the datatype for each column of the text file, so let's see how this works for all the columns.

In [81]:
lending_co_LT = np.genfromtxt('../Data/lending-co-LT.csv',
                              delimiter = ',',
                              dtype = None)
lending_co_LT

array([[b'LoanID', b'StringID', b'Product', ..., b'Location', b'Region', b'TotalPrice'],
       [b'1', b'id_1', b'Product B', ..., b'Location 2', b'Region 2', b'16600.0'],
       [b'2', b'id_2', b'Product B', ..., b'Location 3', b'', b'16600.0'],
       ...,
       [b'1041', b'id_1041', b'Product B', ..., b'Location 23', b'Region 4', b'16600.0'],
       [b'1042', b'id_1042', b'Product C', ..., b'Location 52', b'Region 6', b'15600.0'],
       [b'1043', b'id_1043', b'Product B', ..., b'Location 142', b'Region 6', b'16600.0']], dtype='|S14')
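
In the output above the header row is part of the data, so every column ends up as byte strings. As a sketch of how per-column inference can look, the example below assumes a hypothetical three-column sample and adds `names = True` (so the header becomes field names) and an explicit `encoding` (so text comes back as regular strings rather than bytes). Neither argument is part of the exercise.

```python
import io
import numpy as np

# Hypothetical sample with an integer, a text and a float column.
csv_text = "\n".join([
    "LoanID,Product,TotalPrice",
    "1,Product B,16600.0",
    "2,Product C,15600.0",
])

table = np.genfromtxt(io.StringIO(csv_text),
                      delimiter=",",
                      dtype=None,         # infer a data type for each column
                      names=True,         # use the header as field names
                      encoding="utf-8")   # return str instead of bytes
print(table.dtype)
print(table["Product"])
```

The result is a one-dimensional structured array in which each field has its own data type, so `table["TotalPrice"]` can be used for arithmetic while `table["Product"]` stays text.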