# Getting Started with `NumPy` 


[**`NumPy`**](http://www.numpy.org/) is one of the fundamental packages for conducting any type of data science or data analysis project in `Python`. 
It seamlessly integrates with other packages like `scikit-learn` and `pandas`, and provides an extensive mathematical library for performing analyses or building learning algorithms.

Throughout this course (and the entire PSDS program) we will rely heavily on the `NumPy` package for data analysis. 
In this notebook we cover some of the data types that `NumPy` affords us as well as some of the methods that are commonly used.


------
 ## What we'll cover...
 
1. [Creating an Array](#array)
2. [Array Indexing](#indexing)
3. [Array Math](#math)
4. [Array Methods](#methods)
5. [Subsetting](#subset)
6. [Joining Arrays](#joining)

------


<a id='array'></a>
## 1. Creating an Array

One of the most popular objects that `NumPy` provides is the array (an object of type `numpy.ndarray`), which we usually create with the `np.array()` function. Arrays come with a variety of methods, but before we can do anything interesting with them, we first need to learn how to create them.

In [None]:
# Import the NumPy library and call it by a shorthand nickname 'np' using the 'as' statement
import numpy as np

# dtype=int (Python's built-in int) replaces np.int, which was removed in NumPy 1.24
one_d_data = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90], dtype=int)

print(one_d_data)
print()

two_d_data = np.array([[10, 20, 30, 40, 50, 60, 70, 80, 90],
                       [11, 21, 31, 41, 51, 61, 71, 81, 91],
                       [12, 22, 32, 42, 52, 62, 72, 82, 92]],
                      dtype=int)

print(two_d_data)


The `np.array()` function builds an array from a Python list (for `one_d_data`) or from a list of lists (for `two_d_data`). `two_d_data` is a 2-dimensional array, which can be thought of as having one dimension of rows and one dimension of columns. We can look at the shape (the size along each dimension) of this array with the `.shape` attribute...

In [None]:
two_d_data.shape

`two_d_data.shape` returns a tuple object, where the first value is the number of items in the first dimension (number of rows) and the second is the number of items in the second dimension (number of columns). 
This (row, column) ordering comes up over and over in `NumPy`, as well as in other data science libraries such as `pandas`, so be sure to remember which is which.
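Arrays also expose `.ndim` (the number of dimensions) and `.size` (the total number of elements) alongside `.shape`. The sketch below uses a stand-in 3x9 array called `grid`. It is built with `np.arange()` and `.reshape()` instead of being typed out like `two_d_data`, and the name is ours, for illustration only.

```python
import numpy as np

# Hypothetical stand-in for two_d_data: 3 rows x 9 columns holding 0..26
grid = np.arange(27).reshape(3, 9)

rows, cols = grid.shape  # .shape is a (rows, columns) tuple, so it unpacks
print(rows, cols)        # 3 9
print(grid.ndim)         # 2  -> number of dimensions
print(grid.size)         # 27 -> total elements, i.e. rows * cols
```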

### Alternative ways to create arrays

So far we have built arrays by passing a list (or a list of lists) to `np.array()`. Below are some other ways to construct arrays.

To start, `np.array()` accepts a tuple of values just as well as a list.
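Here is a minimal sketch that passes both a list and a tuple to `np.array()` and checks the data type `NumPy` infers from the values. The variable names are illustrative.

```python
import numpy as np

from_list = np.array([1, 2, 3])          # built from a Python list
from_tuple = np.array((1.5, 2.5, 3.5))   # built from a Python tuple

print(from_list.shape, from_tuple.shape)  # (3,) (3,)
print(from_list.dtype.kind)   # i  (an integer type; its width depends on the platform)
print(from_tuple.dtype)       # float64
```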

You can also create arrays from values of different types. However, every element of a single array must have the same data type. In the example below, `NumPy` coerces the `int`s to strings, so each numeric value is shown inside a pair of single quotes, e.g. `'1'`.

In [None]:
np.array(['x', 'y', 0, 1])
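Checking the `.dtype` attribute makes the coercion visible. The same one-type-per-array rule explains why mixing integers and floats produces a float array. A short sketch with illustrative variable names:

```python
import numpy as np

mixed = np.array(['x', 'y', 0, 1])
print(mixed)             # ['x' 'y' '0' '1']  -> the ints became strings
print(mixed.dtype.kind)  # U  -> a Unicode string dtype

nums = np.array([1, 2, 3.5])   # ints and a float in one list
print(nums.dtype)              # float64 -> the ints were upcast to floats
```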

We can even create an array filled with 0s. The `np.zeros()` function takes a tuple that specifies the number of rows and columns we would like. Notice how the code below creates an array of all zeros with 4 rows and 5 columns.

In [None]:
np.zeros((4,5))
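`np.zeros()` returns floating-point zeros by default. You can change this with the `dtype` argument. A brief sketch, where `z` and `z_int` are illustrative names:

```python
import numpy as np

z = np.zeros((4, 5))
print(z.shape, z.dtype)        # (4, 5) float64 -> floats by default

z_int = np.zeros((2, 3), dtype=int)   # request integer zeros instead
print(z_int.dtype.kind)        # i
```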

The `np.arange()` function returns an array of evenly spaced values. If you pass a single number `n`, it returns the integers from 0 up to, but not including, `n`.

In [None]:
np.arange(6)

If you pass two integers, it returns an array that starts at the first value and goes up to, but does not include, the second. You can also set the data type of the result with the `dtype` argument.

In [None]:
np.arange(2, 10, dtype=float)  # np.float was removed in NumPy 1.24; use the built-in float

If you add a third number, it becomes the spacing between values. The example below creates an array that begins at 2, stops before 5, and increases by 0.5 at each step.

In [None]:
np.arange(2, 5, 0.5)
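Here are the three call forms of `np.arange()` side by side. Each result is converted with `.tolist()` so the printed values are plain Python lists, and the variable names are illustrative.

```python
import numpy as np

stop_only = np.arange(6)            # stop
start_stop = np.arange(2, 10)       # start, stop
with_step = np.arange(2, 5, 0.5)    # start, stop, step

print(stop_only.tolist())    # [0, 1, 2, 3, 4, 5]
print(start_stop.tolist())   # [2, 3, 4, 5, 6, 7, 8, 9]
print(with_step.tolist())    # [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]

# In every form the stop value itself is excluded
print(5 in with_step)        # False
```

For non-integer steps, the `NumPy` documentation recommends `np.linspace()` instead. Floating-point rounding can make the number of values returned by `np.arange()` hard to predict.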

Finally, we can create an array of random values as well. `np.random.rand(3, 3)` returns a 3x3 array of floats drawn uniformly from the interval [0, 1)...

In [None]:
np.random.rand(3,3)
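`np.random.rand()` belongs to `NumPy`'s legacy random interface. Newer code often creates a `Generator` with `np.random.default_rng()`. Its `.random()` method also draws uniformly from [0, 1), and passing a seed makes the numbers reproducible. The sketch below uses a seed value of 42, chosen arbitrarily, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # seeded generator -> reproducible draws
sample = rng.random((3, 3))            # 3x3 floats drawn uniformly from [0, 1)

# A second generator with the same seed produces exactly the same values
again = np.random.default_rng(seed=42).random((3, 3))
print(np.array_equal(sample, again))   # True
```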

-----

<a id='indexing'></a>
## 2. Array Indexing

One of the major benefits of using `NumPy` is its indexing capabilities. Unlike a plain Python list of lists, a `NumPy` array lets us easily extract entire rows **and** columns, or slices of them.

The `[]` square brackets let us access elements of a `NumPy` array.
For a one-dimensional array, a single integer inside `[]` selects one element.
Array indexing, like Python `list` indexing, is **zero-based**.
Therefore, the first element is `[0]`, the second is `[1]`, and so on.


In [None]:
myArray = np.array(['A','B','C','D'])

print("First element is index [0], which equals : ",myArray[0])
print("Fourth and last element is index [3], which equals : ",myArray[3])
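Negative indices count backwards from the end, which is handy when you don't know an array's length in advance. A small sketch using the same letters as `myArray`, stored in an illustrative variable `letters`:

```python
import numpy as np

letters = np.array(['A', 'B', 'C', 'D'])

print(letters[-1])   # D -> last element
print(letters[-2])   # C -> second to last
print(letters[len(letters) - 1] == letters[-1])   # True
```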


If the array is two-dimensional, we can select data using the row index specified as the first number and the column as the second. 
For example, if we want to select the value at row-index 2 and column-index 4, we do this...

In [None]:
print(two_d_data)
print()

print("Element at row index 2 and column index 4 equals :", two_d_data[2,4])
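The comma form `[row, column]` can also be written as two separate lookups, `[row][column]`. Negative indices work on either axis. This sketch re-creates the values of `two_d_data` under the illustrative name `grid` so it runs on its own.

```python
import numpy as np

# Same values as two_d_data, re-created so this cell is self-contained
grid = np.array([[10, 20, 30, 40, 50, 60, 70, 80, 90],
                 [11, 21, 31, 41, 51, 61, 71, 81, 91],
                 [12, 22, 32, 42, 52, 62, 72, 82, 92]])

print(grid[2, 4])    # 52 -> row index 2, column index 4
print(grid[2][4])    # 52 -> same element, via two separate lookups
print(grid[-1, -1])  # 92 -> last row, last column
```

The comma form is generally preferred because it performs a single indexing operation. By contrast, `grid[2][4]` first extracts row 2 and then indexes into it.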

We can also take slices of the array by specifying a range of indices.
For example, to get the fourth through seventh columns of the second row (column indices 3 up to, but not including, 7), we would do this...

In [None]:
two_d_data[1, 3:7]
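Slices follow Python's `start:stop:step` pattern, so we can also skip columns or take a rectangular block. This sketch again uses a self-contained copy of the data named `grid`, an illustrative name.

```python
import numpy as np

grid = np.array([[10, 20, 30, 40, 50, 60, 70, 80, 90],
                 [11, 21, 31, 41, 51, 61, 71, 81, 91],
                 [12, 22, 32, 42, 52, 62, 72, 82, 92]])

print(grid[1, 3:7].tolist())    # [41, 51, 61, 71] -> column indices 3..6
print(grid[1, ::2].tolist())    # [11, 31, 51, 71, 91] -> every second column
print(grid[0:2, 0:2].tolist())  # [[10, 20], [11, 21]] -> top-left 2x2 block
```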

And if we wanted the entire second row, we would do this...

In [None]:
two_d_data[1, :]

To select an entire column, put the `:` slice in the row-index position, like so...

A bare `:` is shorthand for "every index along this axis", so `[:, 4]` means "all rows, column index 4".

In [None]:
print(two_d_data[:,4])
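Selecting a single column with an integer index returns a 1-D array. Slicing with a range of length one keeps the result two-dimensional. A sketch with a stand-in 3x9 array (illustrative names):

```python
import numpy as np

grid = np.arange(27).reshape(3, 9)   # stand-in 3x9 array

col = grid[:, 4]       # all rows, column index 4 -> 1-D result
col_2d = grid[:, 4:5]  # all rows, columns 4 up to 5 -> stays 2-D

print(col.shape)       # (3,)
print(col_2d.shape)    # (3, 1)
print(col.tolist())    # [4, 13, 22]
```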

### Modifying the array

Indexing is particularly important when it comes to modifying values of the array. For these next couple of code blocks, we are going to be working with a copy of the `two_d_data` array. 
One might think that you could just create a copy by using the assignment operator, `=` like so:

```python 
two_d_data_copy = two_d_data
```

However, assignment does not copy a `NumPy` array. It simply gives the same array a second name, so any future modification you make to `two_d_data_copy` will also modify the `two_d_data` array. So be careful!

If you need a copy of the `two_d_data` array, you should **always** make an explicit copy with the `np.copy()` function (the `two_d_data.copy()` method does the same thing).

In [None]:
two_d_data_copy = np.copy(two_d_data)
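The sketch below shows why the explicit copy matters by comparing plain assignment with `np.copy()` on a small illustrative array. It also uses `np.shares_memory()` to show that basic slices are *views*. A view shares data with the array it came from, so modifying a slice also modifies the original.

```python
import numpy as np

original = np.array([1, 2, 3])

alias = original                # assignment: a second name for the SAME array
alias[0] = 100
print(alias is original)        # True
print(original.tolist())        # [100, 2, 3] -> the original changed too

independent = np.copy(original) # a new array with its own data
independent[1] = -5
print(original.tolist())        # [100, 2, 3] -> unaffected by the copy's change

view = original[1:]             # basic slices are views into the same data
print(np.shares_memory(view, original))         # True
print(np.shares_memory(independent, original))  # False
```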

Now, if we want to change the value at any particular point we could just select the value by its indexes and assign it a new value like so...

In [None]:
two_d_data_copy[2,5] = 65
print("The original value of two_d_data[2,5] was {} but now it is {}."
      .format(two_d_data[2,5],two_d_data_copy[2,5]))

We can also change the values of an entire slice. For example, to set every value in the 4th column to 400, we assign 400 to that column's slice.

In [None]:
two_d_data_copy[:,3]= 400

print("The first few values of the 4th column (index = 3) of the two_d_data array were {} and now they are {}."
      .format(two_d_data[0:2,3],two_d_data_copy[0:2,3]))
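A slice can also be assigned a sequence with one value per position, not just a single value. A sketch on a small illustrative array of zeros named `grid`:

```python
import numpy as np

grid = np.zeros((3, 4), dtype=int)   # illustrative 3x4 array of zeros

grid[:, 3] = 400          # a single value fills the whole column
grid[:, 0] = [7, 8, 9]    # one value per row
grid[1, :] = np.arange(4) # a whole row at once: [0, 1, 2, 3]

print(grid.tolist())      # [[7, 0, 0, 400], [0, 1, 2, 3], [9, 0, 0, 400]]
```

The assigned sequence must match the length of the slice. Otherwise, `NumPy` raises a `ValueError`.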


---

<a id='math'></a>
## 3. Array Math

One of the major strengths of `NumPy` is that it can easily perform a variety of mathematical operations on arrays. Let's take a look at the column that we will be modifying in the next few examples...

In [None]:
two_d_data_copy = np.copy(two_d_data)
two_d_data_copy[:,3]

If we want to add 10 to every value in this column, all we have to do is write the column selection, the `+` operator, and then `10`.

In [None]:
two_d_data_copy[:,3] + 10

Similarly with subtraction...

In [None]:
two_d_data_copy[:,3] - 10

Multiplication!

In [None]:
two_d_data_copy[:,3] * .01

...or divide by 100.

In [None]:
two_d_data_copy[:,3] / 100
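None of the four operations above change `two_d_data_copy`, because each one returns a new array. To keep a result, assign it to a variable, or use an augmented operator such as `+=` to modify the array in place. A sketch using an illustrative array that holds the 4th column's values:

```python
import numpy as np

col = np.array([40, 41, 42])    # the values in two_d_data's 4th column

shifted = col + 10              # returns a NEW array
print(shifted.tolist())         # [50, 51, 52]
print(col.tolist())             # [40, 41, 42] -> col is unchanged

col += 10                       # in place: col itself now holds the new values
print(col.tolist())             # [50, 51, 52]
```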

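We can also perform operations between two arrays. When the arrays have the same shape, the operation is applied element by element. For example, multiplying column 4 by itself squares each of its values...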

In [None]:
two_d_data_copy[:,3] * two_d_data_copy[:,3]
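`*` multiplies arrays element by element. If you want the dot product instead (the sum of those element-wise products), the `@` operator computes it. A sketch with the same column values in an illustrative variable `col`:

```python
import numpy as np

col = np.array([40, 41, 42])   # the values in two_d_data's 4th column

squared = col * col            # element-wise: 40*40, 41*41, 42*42
print(squared.tolist())        # [1600, 1681, 1764]

dot = col @ col                # dot product: the sum of the element-wise products
print(dot)                     # 5045
print(dot == squared.sum())    # True
```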

----
<a id='methods'></a>
## 4. Array Methods

`NumPy` also provides a variety of convenient array methods to make our life easier. For example, if we wanted to find the sum of all of the values in the 4th column, we can simply use the `sum()` method.

In [None]:
# Methods are invoked on an array using the '.' notation then the function name

a = two_d_data[:,3]
a.sum() 
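Called with no arguments, `sum()` adds up every element. On a 2-D array, the `axis` argument gives one total per column (`axis=0`) or one total per row (`axis=1`). A sketch on a small illustrative 3x3 array:

```python
import numpy as np

grid = np.array([[10, 20, 30],
                 [11, 21, 31],
                 [12, 22, 32]])

total = grid.sum()              # every element
col_totals = grid.sum(axis=0)   # collapse the rows -> one total per column
row_totals = grid.sum(axis=1)   # collapse the columns -> one total per row

print(total)                 # 189
print(col_totals.tolist())   # [33, 63, 93]
print(row_totals.tolist())   # [60, 63, 66]
```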

We may also want to know the range of values, using the `min()` and `max()` methods.
Note that here we chain the method call directly onto the subset, without storing the column in a variable first.

In [None]:
two_d_data[:,3].min()

In [None]:
two_d_data[:,3].max()
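The range itself is just the difference between `max()` and `min()`. The related `argmin()` and `argmax()` methods return the *position* of the smallest or largest value rather than the value itself. A sketch using the 4th column's values in an illustrative variable `col`:

```python
import numpy as np

col = np.array([40, 41, 42])   # the values in two_d_data's 4th column

value_range = col.max() - col.min()
print(value_range)    # 2
print(col.argmin())   # 0 -> index of the smallest value
print(col.argmax())   # 2 -> index of the largest value
```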

The `two_d_data` array currently has 3 rows and 9 columns. There are situations in which you may want to transpose the array so that the rows become the columns and the columns become the rows. In order to do that, you simply call the `.transpose()` method.

In [None]:
two_d_data_tr = two_d_data.transpose()
two_d_data_tr.shape
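`.T` is a shorthand attribute for `.transpose()`. Transposing moves the element at `[i, j]` to `[j, i]`. A sketch with a stand-in 3x9 array (illustrative names):

```python
import numpy as np

grid = np.arange(27).reshape(3, 9)   # stand-in 3x9 array

flipped = grid.transpose()
print(flipped.shape)                     # (9, 3)
print(np.array_equal(flipped, grid.T))   # True -> .T is the same operation
print(flipped[4, 2] == grid[2, 4])       # True -> [i, j] moves to [j, i]
```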

----
<a id='subset'></a>
## 5. Subsetting

We may also want to subset the array based on whether values meet particular conditions.
We can use comparison operators such as `>=` to return an array of `bool`s, which can then be used to subset our array.

In this example, we check every row and select the rows whose fourth element (column index 3) is greater than or equal to 41.

In [None]:
two_d_data[:,3] >= 41

We can store this Boolean array in a variable and then use it to select rows, like so...

In [None]:
subset_rows = two_d_data[:,3] >= 41
two_d_data[subset_rows,:]

As we can see, only 2 rows (the last two of the original three rows) are returned, and the 4th value in each of them is >= 41.
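`True` counts as 1 and `False` as 0, so summing a Boolean mask counts the matching rows. `np.nonzero()` returns their indices. This sketch uses a cut-down, illustrative version of the data that keeps only the first four columns.

```python
import numpy as np

# First four columns of two_d_data, re-created for a self-contained example
grid = np.array([[10, 20, 30, 40],
                 [11, 21, 31, 41],
                 [12, 22, 32, 42]])

mask = grid[:, 3] >= 41              # one True/False per row
print(mask.tolist())                 # [False, True, True]
print(mask.sum())                    # 2 -> number of matching rows
print(np.nonzero(mask)[0].tolist())  # [1, 2] -> their row indices
print(grid[mask].shape)              # (2, 4)
```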

We can also specify multiple conditions.

In [None]:
subset_rows = (two_d_data[:,3] >= 41) & (two_d_data[:,7] >= 82)

two_d_data[subset_rows,:]

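When joining logical conditions:
  * use the `&` symbol to signify conjunction (and)
  * use the `|` symbol to signify disjunction (or)

Be careful when combining conditions. Wrap each comparison in parentheses, because `&` and `|` bind more tightly than comparison operators such as `>=`. `&` also binds more tightly than `|`, so add parentheses when mixing the two. Python's `and` and `or` keywords do not work element-wise on arrays.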
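The sketch below uses illustrative column variables `a` and `b`. It shows `&`, `|`, and `~` (negation) on Boolean masks, and what goes wrong when the parentheses are left out. Without them, Python reads `a >= 41 & b >= 12` as `a >= (41 & b) >= 12`, a chained comparison that raises a `ValueError`.

```python
import numpy as np

grid = np.array([[10, 20, 30, 40],
                 [11, 21, 31, 41],
                 [12, 22, 32, 42]])
a = grid[:, 3]   # [40, 41, 42]
b = grid[:, 0]   # [10, 11, 12]

both = (a >= 41) & (b >= 12)          # and -> only the last row
either = (a >= 42) | (b <= 10)        # or  -> first and last rows
flipped = ~((a >= 41) & (b >= 12))    # ~ negates every entry of a mask

print(both.tolist())     # [False, False, True]
print(either.tolist())   # [True, False, True]
print(flipped.tolist())  # [True, True, False]

# Without parentheses this is a chained comparison, which raises ValueError
try:
    a >= 41 & b >= 12
    failed = False
except ValueError:
    failed = True
print(failed)            # True
```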


---

<a id='joining'></a>
## 6. Joining Arrays

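We can also join arrays.
NumPy has numerous functions for joining arrays; the most common are:
  * `hstack` - horizontal stack (joins arrays side by side)
  * `vstack` - vertical stack (stacks arrays on top of each other as rows)
  * `dstack` - depth stack (stacks arrays along a third axis, i.e., the z-axis)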

In [None]:
# define a few 1-D arrays, each with 3 elements (shape (3,))

abc = np.array(['a','b','c'])
dfg = np.array(['d','f','g'])
hij = np.array(['h','i','j'])
klm = np.array(['k','l','m'])

In [None]:
# Notice that the argument is (abc, dfg): a single tuple holding two arrays

np.hstack((abc, dfg))

In [None]:
# We can use any size tuple, not just two elements

np.hstack((abc, dfg, hij, klm))

In [None]:
# Vertically stacking

mat1 = np.vstack((hij, klm))  # Assigned to a variable for later use

print(mat1)

In [None]:
# Vertically Stacking another

mat2 = np.vstack((abc, dfg))
print(mat2)

# Depth stacking creates a 3-D array, here of shape (2, 3, 2)

np.dstack((mat2, mat1))
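Checking the `.shape` of each result is a quick way to see what the three stacking functions do. For 1-D inputs, `np.hstack()` gives the same result as `np.concatenate()` along axis 0. This sketch reuses two of the small letter arrays, and `h`, `v`, and `d` are illustrative names.

```python
import numpy as np

abc = np.array(['a', 'b', 'c'])
dfg = np.array(['d', 'f', 'g'])

h = np.hstack((abc, dfg))   # end to end               -> shape (6,)
v = np.vstack((abc, dfg))   # each array becomes a row -> shape (2, 3)
d = np.dstack((v, v))       # along a new third axis   -> shape (2, 3, 2)

print(h.shape, v.shape, d.shape)   # (6,) (2, 3) (2, 3, 2)

# For 1-D inputs, hstack matches concatenate along axis 0
print(np.array_equal(h, np.concatenate((abc, dfg))))   # True
```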

---

### Digging Deeper
1. NumPy quickstart tutorial: https://docs.scipy.org/doc/numpy-dev/user/quickstart.html
2. CS231n Python Numpy tutorial: http://cs231n.github.io/python-numpy-tutorial/#numpy

---

# SAVE YOUR NOTEBOOK!!