# Understanding Memory Efficiency of Numpy Arrays

In [1]:
import numpy as np
import pandas as pd
import random
import sys

In [2]:
my_list = [10,12,14]
my_list

[10, 12, 14]

<div class="alert alert-block alert-info">
<p><b>`sys.getsizeof()`</b></p>

<p>Returns the number of bytes occupied by an object in memory. For a container such as a list, this counts the container itself but not the objects it refers to. It is good practice to check the size of the containers created in a program. <b>Make sure you `import sys` to use this function.</b></p>

<p> [More info](https://docs.python.org/3/library/sys.html#sys.getsizeof)</p>
</div>

In [3]:
print(sys.getsizeof(my_list))

88



In [4]:
my_numpy = np.array(my_list)
my_numpy

array([10, 12, 14])

In [5]:
print(sys.getsizeof(my_numpy))

108


In [6]:
my_list = [x for x in range(0,10000)]

In [7]:
print(sys.getsizeof(my_list))

87624


In [8]:
my_numpy = np.array(my_list)
my_numpy

array([   0,    1,    2, ..., 9997, 9998, 9999])

In [9]:
print(sys.getsizeof(my_numpy))

40096


### Python lists are not memory efficient

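As the example above shows, once the list reaches a decent size, the Python list occupies far more memory (more bytes) than the equivalent NumPy array. For the three-element case the array was actually slightly larger (108 bytes versus 88) because of its fixed overhead, but at 10,000 elements the list needed 87,624 bytes against 40,096 for the array. This is one of the main reasons to use NumPy arrays.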
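The comparison above can be taken one step further. `sys.getsizeof` on a list counts only the list object and its table of references, not the integer objects it points to. The sketch below (variable names are illustrative, and exact byte counts depend on your platform and Python/NumPy versions) adds the element sizes back in and compares the total with the array's raw data buffer, `ndarray.nbytes`. Treat the list total as a rough estimate, since CPython shares some small integer objects.

```python
import sys
import numpy as np

n = 10_000
values = list(range(n))
arr = np.array(values)

# Size of the list object itself: header plus one reference per element.
list_container_bytes = sys.getsizeof(values)

# Each int is a separate Python object with its own overhead, so add those in.
list_total_bytes = list_container_bytes + sum(sys.getsizeof(v) for v in values)

# The array stores raw fixed-width integers in one contiguous buffer.
array_data_bytes = arr.nbytes

print("list container only :", list_container_bytes)
print("list plus elements  :", list_total_bytes)
print("array data buffer   :", array_data_bytes)
print("bytes per element   :", arr.itemsize)
```

The array's data size is simply the element count times `itemsize`, which is why it grows so predictably.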

# Time Efficiency: Numpy Universal Functions (UFuncs) to the Rescue

## The Slow Python Lists 

We saw earlier that Python lists are **not memory efficient**, but we'll also see that they are **not time efficient** when performing operations on a large number of data elements.

This is **very bad news** for us, since that's pretty much the core of what we do as data scientists. Thankfully, NumPy provides us a way to perform repetitive operations with lightning speed.

<div class="alert alert-block alert-info">

Python has a <strong>huge</strong> community of developers and users who create awesome libraries like NumPy and give them away for free.

</div> 


Before we show how awesome NumPy is, let's show how bad the problem can be in normal Python. We'll start by using an example that is similar to your textbook.


### Reciprocals with Python Lists

In [10]:
# Define a function that will take an argument (parameter) called `lst`
# It will return another list with the reciprocal values
def compute_reciprocals_list(lst):
    
    #Create an empty list that gets appended with reciprocal values one at a time
    output = []
    
    # For each element 'elem' in the 'lst', compute the reciprocal and append it to 
    # the output list
    for elem in lst:
        output.append(1/elem)
    
    return output

list_one = [1,2,3,4,5,6,7]
print(compute_reciprocals_list(list_one))

[1.0, 0.5, 0.3333333333333333, 0.25, 0.2, 0.16666666666666666, 0.14285714285714285]


In [11]:
list_one = range(1,10)
%timeit -n 1 compute_reciprocals_list(list_one)

1 loop, best of 3: 1.2 µs per loop


#### Timing Code Execution: `%timeit` 

When dealing with large amounts of data, you are going to want to learn how to make your code run fast. To be able to make it faster, you have to be able to see how long the various parts of your code take to execute.

IPython (Jupyter) makes this extremely easy to do with the **`%timeit` magic command.**

<div class="alert alert-block alert-info">
`%timeit -n 1`: asks Jupyter to execute the statement once per timing run. The run is still repeated a few times (three in the output above), and the best time is reported.
</div>
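Outside of IPython/Jupyter, the standard library's `timeit` module offers a similar measurement. The following is a minimal sketch, assuming a simplified list-comprehension version of the reciprocal function; `number=1` plays the role of `-n 1` and `repeat=3` plays the role of `%timeit`'s `-r 3` option.

```python
import timeit

def compute_reciprocals_list(lst):
    """Return a new list holding 1/x for every x in lst."""
    return [1 / elem for elem in lst]

data = list(range(1, 10))

# number=1 -> call the function once per timing run
# repeat=3 -> collect three independent timing runs
timings = timeit.repeat(lambda: compute_reciprocals_list(data), number=1, repeat=3)

# Report the fastest run, which is least disturbed by other activity on the machine.
best = min(timings)
print("best of {} runs: {:.2f} µs".format(len(timings), best * 1e6))
```

The numbers you see will differ from run to run and from machine to machine.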


In [12]:
%timeit?

<div class="alert alert-block alert-info">
<h5> Measures of execution time </h5>
<p>$ ns $ - Nanosecond, equal to 1/1,000,000,000 of a second (1 billionth of a second)</p>
<p>$\mu s$ - Microsecond, equal to 1/1,000,000 of a second (1 millionth of a second)</p>
<p>$ ms$ - Millisecond, equal to 1/1000 of a second (1 thousandth of a second)</p>
<p>$ s$ - Second</p>
</div>

### Reciprocals with Numpy Arrays and For loops

In [13]:
# Define a function that will take an argument (parameter) called `array_one`
# It will return another array with the reciprocal values
def compute_reciprocals_numpy(array_one):
    
    # Create an `output` array that starts with the same number 
    # of elements that are in the `array_one` parameter.
    output = np.empty(len(array_one)) 
    
    # For each item in the `array_one` parameter
    # Retrieve its value and index.
    for index, value in enumerate(array_one):
        
        # Update the same index position in the `output` object
        # With 1.0 divided by the current iteration value
        output[index] = 1.0 / value
        
    # Return the updated `output` array.    
    return output

array_one = np.arange(1,10)
compute_reciprocals_numpy(array_one)

array([ 1.        ,  0.5       ,  0.33333333,  0.25      ,  0.2       ,
        0.16666667,  0.14285714,  0.125     ,  0.11111111])

In [14]:
array_one = np.arange(1,10)
%timeit -n 1 compute_reciprocals_numpy(array_one)

1 loop, best of 3: 16.5 µs per loop


In [15]:
big_list = [random.randint(1,100) for x in range(1,1000000)]
%timeit -n 1 compute_reciprocals_list(big_list)

1 loop, best of 3: 115 ms per loop


In [16]:
big_array = np.random.randint(1, 100, size=1000000)
%timeit -n 1 compute_reciprocals_numpy(big_array)

1 loop, best of 3: 1.65 s per loop


### Wait! Why is a NumPy array slower than a list? 

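You could be wondering why the NumPy array is slower than the Python list. Although we created a memory-efficient NumPy array, we are still processing it one element at a time in a traditional Python loop, and **LOOPS ARE SLOW**. Each pass through the loop also has to convert an array element into a Python object and back, which adds overhead.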

## UFuncs to the Rescue <a name="ufuncs"></a>

The NumPy package has **UFuncs**, or **Universal Functions**, which can dramatically improve the speed of operations on array elements. They are also referred to as **vectorized** operations.

Basically, these functions push the loop processing into the C code that lies underneath Python/NumPy so that the operations are performed much faster than normal.

This only works because all the data elements of an array are of the same type.
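To make the idea concrete, here is a small sketch that runs the same reciprocal calculation twice: once with an explicit Python loop and once with the vectorized expression `1.0 / array`. The array size and the helper name `reciprocals_loop` are illustrative choices, and the timings will vary by machine; the point is that both approaches produce the same values.

```python
import time
import numpy as np

values = np.random.randint(1, 100, size=100_000)

def reciprocals_loop(array_one):
    """Compute reciprocals one element at a time in a Python loop."""
    output = np.empty(len(array_one))
    for index, value in enumerate(array_one):
        output[index] = 1.0 / value
    return output

# Time the explicit loop.
start = time.perf_counter()
loop_result = reciprocals_loop(values)
loop_seconds = time.perf_counter() - start

# Time the vectorized division, where the loop runs inside NumPy's compiled code.
start = time.perf_counter()
ufunc_result = 1.0 / values
ufunc_seconds = time.perf_counter() - start

print("same values :", np.allclose(loop_result, ufunc_result))
print("loop        : {:.4f} s".format(loop_seconds))
print("vectorized  : {:.4f} s".format(ufunc_seconds))
```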

In [17]:
array_one = np.arange(1,10)

print(array_one)
print(compute_reciprocals_numpy(array_one))

[1 2 3 4 5 6 7 8 9]
[ 1.          0.5         0.33333333  0.25        0.2         0.16666667
  0.14285714  0.125       0.11111111]


In [18]:
# UFunc / Vectorized Version
# This notation is as if you are saying: take 1, divide it by each element
# of `array_one`, and store the results
print(1 / array_one)

[ 1.          0.5         0.33333333  0.25        0.2         0.16666667
  0.14285714  0.125       0.11111111]


In [19]:
simple_var = 10
1/ simple_var

0.1

In [20]:
simple_list = [10,20,30]
1/simple_list

TypeError: unsupported operand type(s) for /: 'int' and 'list'

In [21]:
big_array = np.random.randint(1, 100, size=1000000)

# Now time the UFunc approach
# Remember, the other way took a looooong time.
%timeit -n 1 (1.0 / big_array)

1 loop, best of 3: 3.08 ms per loop


### Takeaway: Loops are a big NO! NO!

As you can see, using vectorized functions (UFuncs) instead of loops gives a dramatic improvement in speed: about 3 ms for the vectorized division versus 1.65 s for the loop over the same million elements. This is very important when working with large datasets. Hence, **avoid writing loops and use the built-in functions in NumPy** to improve the speed.

### Arithmetic UFuncs
As we just demonstrated, there is a UFunc for division operations. It probably will not surprise you then to discover that all the normal Python arithmetic operations are replicated with UFuncs.

Here are some examples:

In [22]:
simple_int_array = np.arange(1, 6)
simple_int_array

array([1, 2, 3, 4, 5])

In [23]:
# Add 5 to each array element
simple_int_array + 5

array([ 6,  7,  8,  9, 10])

In [24]:
# Subtract each element from 10
# Notice the somewhat subtle difference here.  It's important.
10 - simple_int_array
#simple_int_array -10

array([9, 8, 7, 6, 5])

In [25]:
# You can perform multiple operations.
# Standard math order of operations is followed

# Raise each element to the 3rd power and subtract 10
simple_int_array ** 3 - 10

array([ -9,  -2,  17,  54, 115], dtype=int32)

In [26]:
my_list = [1,2,3]
my_list**3 -10

TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'

#### An Alternative Syntax
In addition to using standard mathematical operators (i.e. `+, -, *, /, **`), you can also accomplish the same thing by invoking the UFuncs by their names.

For example:

In [27]:
# Add 3.5 to each element of our `simple_int_array`
# Notice how the ints are "upcasted" to floats?
np.add(3.5, simple_int_array)


array([ 4.5,  5.5,  6.5,  7.5,  8.5])

In [28]:
# Divide each array element by 4
np.divide(simple_int_array, 4)

array([ 0.25,  0.5 ,  0.75,  1.  ,  1.25])

In [29]:
# And notice that the order of parameters is important
# When dividing and subtracting...
np.divide(4, simple_int_array)

array([ 4.        ,  2.        ,  1.33333333,  1.        ,  0.8       ])

#### Summary Table
Here is the summary table of common arithmetic UFuncs available to you.


| Operator      | Equivalent ufunc    | Description |                         
|---------------|---------------------|---------------------------------------|
|``+``          |``np.add``           |Addition (e.g., ``1 + 1 = 2``)         |
|``-``          |``np.subtract``      |Subtraction (e.g., ``3 - 2 = 1``)      |
|``-``          |``np.negative``      |Unary negation (e.g., ``-2``)          |
|``*``          |``np.multiply``      |Multiplication (e.g., ``2 * 3 = 6``)   |
|``/``          |``np.divide``        |Division (e.g., ``3 / 2 = 1.5``)       |
|``//``         |``np.floor_divide``  |Floor division (e.g., ``3 // 2 = 1``)  |
|``**``         |``np.power``         |Exponentiation (e.g., ``2 ** 3 = 8``)  |
|``%``          |``np.mod``           |Modulus/remainder (e.g., ``9 % 4 = 1``)|
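As a quick check of the table, the sketch below evaluates each operator alongside its named ufunc on a small illustrative array and confirms that the two forms agree. The array `a` and scalar `b` are arbitrary example values.

```python
import numpy as np

a = np.arange(1, 6)   # [1 2 3 4 5]
b = 2

# Each entry pairs the operator form with the equivalent named ufunc call.
pairs = {
    "+": (a + b, np.add(a, b)),
    "-": (a - b, np.subtract(a, b)),
    "unary -": (-a, np.negative(a)),
    "*": (a * b, np.multiply(a, b)),
    "/": (a / b, np.divide(a, b)),
    "//": (a // b, np.floor_divide(a, b)),
    "**": (a ** b, np.power(a, b)),
    "%": (a % b, np.mod(a, b)),
}

for symbol, (via_operator, via_ufunc) in pairs.items():
    print(symbol, via_operator, "matches ufunc:", np.array_equal(via_operator, via_ufunc))
```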

#### Operations between Two NumPy Arrays
Your textbook explains that you can invoke arithmetic UFuncs with `scalar` values or with other arrays.

For those who are not programming experts, a **scalar** is simply an object that holds a single value, like a number. This is opposed to a **container**-type object like a `list` or `ndarray` that holds multiple values.

Let's see how you can use UFuncs where both objects are arrays.



In [30]:
# Let's create two new arrays.
# One will have the numbers 1-5 and the other 6-10
one_to_five = np.arange(1, 6)
six_to_ten = np.arange(6, 11)

print(one_to_five, six_to_ten)

[1 2 3 4 5] [ 6  7  8  9 10]


In [31]:
# Now lets add them together.
# Notice how it takes the 1st element of both and adds them together
# then the second and so on...
np.add(one_to_five, six_to_ten)

array([ 7,  9, 11, 13, 15])

In [32]:
# The same thing will happen with other operations.
# Here we will divide each element of `one_to_five` by the matching element of `six_to_ten`
one_to_five / six_to_ten

array([ 0.16666667,  0.28571429,  0.375     ,  0.44444444,  0.5       ])

<div class="alert alert-block alert-warning">
<h5>Important Note!</h5>

<p>Being able to perform mathematical operations between two arrays is a really powerful tool. But take note that this works element by element only when the two arrays have the same shape, or shapes that NumPy can broadcast against each other (for example, an array and a scalar).</p>

</div>

In [33]:
# Shape mismatched arrays will cause problems...
np.arange(5) + np.arange(10)

ValueError: operands could not be broadcast together with shapes (5,) (10,) 
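The error message mentions *broadcasting*, which is NumPy's rule for deciding whether two shapes are compatible. The sketch below is a minimal illustration with made-up arrays: the mismatched `(5,)` and `(10,)` shapes fail, two `(5,)` arrays combine element by element, and a `(5, 1)` column combined with a `(10,)` row is stretched into a `(5, 10)` table.

```python
import numpy as np

short = np.arange(5)     # shape (5,)
long_ = np.arange(10)    # shape (10,)

# Shapes (5,) and (10,) are incompatible, so NumPy raises ValueError.
try:
    short + long_
    mismatch_failed = False
except ValueError as err:
    mismatch_failed = True
    print("ValueError:", err)

# Identical shapes: the operation is applied element by element.
same_shape = np.arange(5) + np.arange(5)
print(same_shape)        # [0 2 4 6 8]

# A (5, 1) column and a (10,) row are compatible: the result has shape (5, 10).
table = short.reshape(5, 1) + long_
print(table.shape)       # (5, 10)
```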

## Many more mathematical operations

* **`np.abs`**: get the absolute value
* **`np.sin`, `np.cos`, `np.tan`**: trigonometric operations
* **`np.power`, `np.exp`, `np.exp2`**: exponent operations
* **`np.log`, `np.log2`, `np.log10`**: logarithmic operations
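A short sketch of a few of these functions in use, with small made-up input arrays. Each call is applied to every element of the array at once, just like the arithmetic UFuncs.

```python
import numpy as np

x = np.array([-2.0, -1.0, 1.0, 2.0])
print(np.abs(x))                 # [2. 1. 1. 2.]

# Trigonometric functions expect angles in radians.
angles = np.array([0.0, np.pi / 2, np.pi])
print(np.sin(angles))

# np.exp2 computes 2 ** p for each p, and np.log2 undoes it.
powers = np.array([0, 1, 2, 3])
two_to_the_p = np.exp2(powers)
print(two_to_the_p)              # [1. 2. 4. 8.]
print(np.log2(two_to_the_p))     # [0. 1. 2. 3.]

# Base-10 logarithm of exact powers of ten.
print(np.log10(np.array([1, 10, 100])))
```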

# Array Aggregation with NumPy

We can use NumPy to compute summary statistics for the data in question. In the following, we will see some important summary statistics computed with NumPy functions.

## `np.sum`

In [34]:
# Let's get our familiar int array with 1 to 10 in it.
simple_int_array = np.arange(1, 11)
simple_int_array

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [35]:
# Here's how easy it is to get the sum of all the element values.
np.sum(simple_int_array)

55

<div class="alert alert-block alert-warning">
<h3>NumPy Aggregations vs. Built-in Aggregations</h3>
<p>We have seen in an earlier class that Python has a built-in `sum` function (as well as functions like `min` and `max`).</p>

<p>
However, it is important to note that you will almost always want to use the NumPy versions of these functions. <b>The standard Python versions won't have the speed advantages of the NumPy ones</b>, and they do not always support multi-dimensional arrays.
</p>

</div>

In [36]:
big_array = np.random.rand(1000000)
%timeit -n 1 sum(big_array)
%timeit -n 1 np.sum(big_array)

1 loop, best of 3: 122 ms per loop
1 loop, best of 3: 822 µs per loop


<div class="alert alert-block alert-info">
<p>
Though not discussed here, **practice `np.sum` on two-dimensional arrays by using the optional `axis` parameter**. See PDSH page 60 and the [online resources](https://docs.scipy.org/doc/numpy/reference/generated/numpy.sum.html).
</p>
</div>
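As a starting point for that practice, here is a minimal sketch on a made-up 2 x 3 array. With no `axis`, `np.sum` adds every element; `axis=0` collapses the rows and gives one sum per column; `axis=1` collapses the columns and gives one sum per row.

```python
import numpy as np

grid = np.arange(1, 7).reshape(2, 3)
# [[1 2 3]
#  [4 5 6]]

total = np.sum(grid)                 # all elements
column_sums = np.sum(grid, axis=0)   # one value per column
row_sums = np.sum(grid, axis=1)      # one value per row

print(total)         # 21
print(column_sums)   # [5 7 9]
print(row_sums)      # [ 6 15]
```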

### Aggregation functions available in NumPy

| Function            | Description |                         
|---------------------|---------------------------------------|
|``np.sum``           |Compute sum of elements                |
|``np.prod``          |Compute product of elements            |
|``np.mean``          |Compute mean of elements               |
|``np.std``           |Compute standard deviation of elements |
|``np.var``           |Compute variance of elements           |
|``np.min``           |Find the minimum value                 |
|``np.max``           |Find the maximum value                 |
|``np.argmin``        |Find the index of minimum value        |
|``np.argmax``        |Find the index of maximum value        |
|``np.median``        |Compute median of elements             |
|``np.percentile``    |Compute rank based stats of elements   |
|``np.any``           |Evaluate whether any elements are true |
|``np.all``           |Evaluate whether all elements are true |
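The sketch below applies several of these functions to one small made-up array, so you can check each result by hand.

```python
import numpy as np

scores = np.array([4, 8, 15, 16, 23, 42])

print(np.sum(scores))                # 108
print(np.mean(scores))               # 18.0
print(np.median(scores))             # 15.5
print(np.min(scores), np.max(scores))         # 4 42
print(np.argmin(scores), np.argmax(scores))   # 0 5
print(np.percentile(scores, 50))     # 15.5 (the 50th percentile is the median)
print(np.any(scores > 40))           # True
print(np.all(scores > 5))            # False
```

Note that `np.argmin` and `np.argmax` return positions, not values, which is what makes them useful for looking up the matching entry in a second, aligned array.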

## Revisit: ND Football Roster Example

In [37]:
# At this point you don't have to know the details of following data loading. 
# However, understand that it is loading the weights of all the athletes
# Read the roster once, then pull each column out as a NumPy array
roster = pd.read_csv('./data/nd-football-2017-roster.csv')
nd_player_weights = np.array(roster['Weight'])
nd_player_names = np.array(roster['Name'])
nd_player_heights = np.array(roster['Height'])

## Activity:

Use NumPy `nd_player_weights`,`nd_player_heights` and `nd_player_names` arrays to compute the following details. Also, these arrays are aligned in such a way that the $i^{th}$ indexed element in one array corresponds to $i^{th}$ indexed element in another array. 

* Average weight, average height
* The median weight, height
* Variance of weights, heights
* Name of lightest player (**Hint**: Use np.argmin)
* Height of heaviest player (**Hint**: Use np.argmax)

In [38]:
print(np.mean(nd_player_weights), np.mean(nd_player_heights))

237.133333333 73.7333333333


In [39]:
print(np.median(nd_player_weights), np.median(nd_player_heights))

226.5 74.0


In [40]:
print(np.var(nd_player_weights), np.var(nd_player_heights))

2094.82666667 6.24


In [41]:
np.argmin(nd_player_weights)

18

In [42]:
np.min(nd_player_weights)

175

In [43]:
nd_player_weights[18]

175

In [44]:
nd_player_names[18]

'Shaun Crawford'

In [45]:
nd_player_names[np.argmin(nd_player_weights)]

'Shaun Crawford'

In [46]:
nd_player_heights[np.argmax(nd_player_weights)]

74

# NumPy Array Comparisons & Masking

Now, we will learn another set of Numpy functions that will compare the value of each element to a given condition and (generally) return a new array specifying if each element did or did not meet that condition.

## Available Numpy Comparison Functions
You can invoke Numpy's comparison functions either through an operator or by an explicit function call. You need to be familiar with both styles as you will see both in other people's code. 

Here are the available functions:

| Operator    | Equivalent ufunc    |
|---------------|---------------------|
|``==``         |``np.equal``         |
|``!=``         |``np.not_equal``     |
|``<``          |``np.less``          |
|``<=``         |``np.less_equal``    |
|``>``          |``np.greater``       |
|``>=``         |``np.greater_equal`` |

In [47]:
# Which players weigh more than 200lbs?
# I'll use the operator syntax this time.
nd_player_weights > 200

array([ True, False,  True,  True,  True,  True,  True,  True,  True,
       False,  True,  True, False,  True, False,  True, False, False,
       False,  True,  True,  True,  True, False, False, False, False,
       False, False, False, False,  True,  True,  True, False,  True,
        True, False,  True,  True,  True, False,  True,  True, False,
       False,  True,  True, False,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True, False,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True], dtype=bool)

Interesting. It returns a new array that is full of `boolean` values. If the value is `True` at a given index, it means that specific player's weight was over 200 lbs.

<div class="alert alert-block alert-info">
<p>
For our purposes here, a boolean just means it is either true or false.
</p>
</div> 

In [48]:
# Which players are not 6ft (72 in) tall?
# This time I'll explicitly call the UFunc.
np.not_equal(nd_player_heights, 72)
# The above statement is equivalent to nd_player_heights != 72

array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True, False,  True, False, False, False,  True,  True,  True,
        True,  True,  True,  True,  True, False,  True,  True,  True,
        True,  True, False, False,  True,  True,  True, False,  True,
        True,  True,  True,  True, False,  True, False,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True], dtype=bool)

### Comparison UFuncs + `np.sum`, `np.all`, or `np.any`
Above, we answered the question, *which players weigh more than 200 lbs?* Now we will combine that information with additional functions to answer the following:

In [49]:
# Are any of the players > 200 lbs?
# The `np.any` function will return true if any array values are `True`.
np.any(nd_player_weights > 200)

True

In [50]:
# How many players weigh > 200 lbs?
np.sum(nd_player_weights > 200)

68

<div class="alert alert-block alert-info">
<h5>Where is `np.sum` getting a number from?</h5>
<p>
In an earlier tutorial we learned that the `np.sum` aggregate function adds all the values of an array together. But, there are no numeric values in an array full of `True/False` so where do these come from?
</p>
<p>
Turns out, that in Python the boolean `True` value has a corresponding numeric value of `1`. So, each time `np.sum` encounters `True` in the boolean array, it adds a `1` to its running total.
</p>
</div> 
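You can verify this behavior directly. The sketch below uses a small made-up weights array; `np.count_nonzero` is shown as an alternative way to count the `True` values, and `np.mean` of a boolean array gives the fraction of elements that are `True`.

```python
import numpy as np

# In plain Python, True behaves like 1 and False like 0 in arithmetic.
print(True + True)              # 2
print(int(True), int(False))    # 1 0

weights = np.array([180, 205, 250, 199, 310])   # made-up values
mask = weights > 200
print(mask)                     # [False  True  True False  True]

print(np.sum(mask))             # 3
print(np.count_nonzero(mask))   # 3
print(np.mean(mask))            # 0.6
```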

In [51]:
# Are ALL of the players > 200 lbs?
# The `np.all` function returns true if ALL the array values are true.
np.all(nd_player_weights > 200)

False

### Comparison UFuncs + Bitwise Boolean Operators
This one might be a little bit confusing at first, so we'll start with a practical example.

Let's say that we wanted to know which players were between 72 and 75 inches tall. **Bitwise boolean operators** allow us to combine comparisons and get the net result.

Let's demonstrate.

In [52]:
# Which players are between 72 and 75 inches tall?
(nd_player_heights >= 72) & (nd_player_heights <= 75)

array([False, False,  True,  True,  True, False,  True,  True,  True,
       False,  True,  True,  True,  True,  True,  True, False, False,
       False,  True,  True,  True, False,  True, False, False, False,
       False, False,  True,  True,  True,  True, False,  True,  True,
        True, False, False,  True,  True,  True,  True, False, False,
       False,  True, False,  True,  True,  True,  True,  True, False,
        True,  True,  True,  True, False,  True,  True,  True, False,
        True, False, False, False, False, False, False, False,  True,
       False, False, False, False, False, False,  True,  True, False,
       False,  True, False, False,  True,  True, False, False, False], dtype=bool)

<div class="alert alert-block alert-danger">
<h5>Parentheses are Important Here</h5>
<p>
The parentheses here are important because of 
<a href="https://docs.python.org/3/reference/expressions.html#operator-precedence" target="_blank">
Python's operator precedence rules</a>, which would lead to the following evaluation if I hadn't included the parentheses: `nd_player_heights >= (72 & nd_player_heights) <= 75`
</p>
<p>
This does not compute what we want; with arrays, the chained comparison typically raises a `ValueError` instead. So, be mindful to use parentheses to force the correct order of operations when combining UFuncs with bitwise boolean operators.
</p>
</p>
</div> 
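The sketch below shows both forms side by side on a small made-up heights array. The version without parentheses is parsed as a chained comparison, and Python then tries to reduce a whole array to a single `True`/`False`, which NumPy refuses to do for arrays with more than one element.

```python
import numpy as np

heights = np.array([70, 72, 74, 76])   # made-up values

# With parentheses: each comparison is evaluated first, then combined with &.
with_parens = (heights >= 72) & (heights <= 75)
print(with_parens)    # [False  True  True False]

# Without parentheses: & binds tighter than >= and <=, so this becomes
# heights >= (72 & heights) <= 75, a chained comparison that fails on arrays.
try:
    heights >= 72 & heights <= 75
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```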

In [53]:
# Ok, now let's bring back in `np.sum` to get a 
# count of the players that match this criteria
np.sum((nd_player_heights >= 72) & (nd_player_heights <= 75))

47

In [54]:
np.sum((nd_player_heights >= 72) & (nd_player_weights <= 200))

9

What we've done here is utilize a couple of **bitwise boolean operators**. When used, these operators evaluate each element of the two arrays being compared. For each element, it evaluates whether the two values match the operator condition, and then returns `True` or `False` for that element pair accordingly.

Yes, that is a mouthful of a sentence. So, practice how this works with a smaller set of arrays. But first, here is the full list of operators:

| Operator   | Equivalent ufunc  |
|------------|-------------------|
|`&`         |`np.bitwise_and`   |
|&#124;      |`np.bitwise_or`    |
|`^`         |`np.bitwise_xor`   |
|`~`         |`np.bitwise_not`   |
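Here is a small practice example with two made-up boolean arrays that together cover every combination of `True` and `False`. Each operator is applied element by element, and the last lines confirm that the operator form and the named ufunc agree.

```python
import numpy as np

left = np.array([True, True, False, False])
right = np.array([True, False, True, False])

print(left & right)   # [ True False False False]  True only where both are True
print(left | right)   # [ True  True  True False]  True where at least one is True
print(left ^ right)   # [False  True  True False]  True where exactly one is True
print(~left)          # [False False  True  True]  every value flipped

# The operators are shorthand for the named ufuncs in the table.
print(np.array_equal(left & right, np.bitwise_and(left, right)))
print(np.array_equal(left | right, np.bitwise_or(left, right)))
print(np.array_equal(left ^ right, np.bitwise_xor(left, right)))
print(np.array_equal(~left, np.bitwise_not(left)))
```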

## Activity:

Use NumPy `nd_player_weights`,`nd_player_heights` and `nd_player_names` arrays to compute the following details. Also, these arrays are aligned in such a way that the $i^{th}$ indexed element in one array corresponds to $i^{th}$ indexed element in another array. 

* How many players are above 72 inches in Height? 
* Are there any players between 250 lbs to 260 lbs? 
* How many players are either above 75 inches or below 70 inches? 
* How many players are not below 250 lbs?  

In [55]:
np.sum(nd_player_heights>72)

62

In [56]:
print("The number of players above 72 inches is {}".format(np.sum(nd_player_heights>72)))

The number of players above 72 inches is 62


In [57]:
np.any( (nd_player_weights>=250)& (nd_player_weights<=260)   )

True

In [58]:
np.sum( (nd_player_heights>75) | (nd_player_heights<70)   )

30

In [59]:
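# Note: this counts the players who ARE below 250 lbs.
# For "not below 250 lbs", count the complement: np.sum(~(nd_player_weights < 250))
np.sum(nd_player_weights<250)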

60

#### Special Note for `np.bitwise_not` (`~`)
Up above, I said that the bitwise boolean operators evaluate two arrays. Well, in the case of `np.bitwise_not`, that isn't true.

Unlike the other bitwise operators, this one simply reverses the values in a boolean array.
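A question phrased as "not below" is a natural fit for `~`. The sketch below uses a made-up weights array to show that negating the mask `weights < 250` gives the same result as `weights >= 250`, and that a mask and its negation together account for every element.

```python
import numpy as np

weights = np.array([180, 250, 249, 310, 205])   # made-up values

below_250 = weights < 250
not_below_250 = ~below_250

print(below_250)        # [ True False  True False  True]
print(not_below_250)    # [False  True False  True False]

# Negating "< 250" is the same as asking ">= 250".
print(np.array_equal(not_below_250, weights >= 250))   # True

# Every element is in exactly one of the two groups.
print(np.sum(below_250), np.sum(not_below_250), weights.size)   # 3 2 5
```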

## Comparison UFuncs as Array Masks

Here is a brief review of what we've covered:
1. Seen how the comparison ufuncs (np.equal, np.less, np.greater, etc) generate boolean arrays that indicate whether a given element of an array meets (or doesn't meet) the condition of the function.

1. We then showed how you could pass these boolean arrays to `np.sum`, `np.all`, and `np.any` to derive additional information on your data set.

1. Finally, we demonstrated how you could logically compare two boolean arrays with the **bitwise** operators to perform multistep data comparisons.

For the last segment of this tutorial, we are going to demonstrate using comparison functions to return the original items of the array that is being evaluated instead of a boolean array.

#### Array Masking
In the last lecture, we showed how you could select data from an array using index or slice notation. Here we will introduce another data selection technique called **masking**.

Basically, it looks a lot like slice notation. In case you've forgotten what that looks like, here is a reminder.

In [60]:
simple_int_array = np.arange(10)
simple_int_array

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [61]:
# Slice elements indexed 5, 6, 7 of our simple_int_array
simple_int_array[5:8]

array([5, 6, 7])

The difference with a mask is that instead of putting `[start:stop:step]` inside the brackets, you actually invoke a comparison function.

In [62]:
# Let's return all the values of simple_int_array that are less than 7
simple_int_array[simple_int_array < 7]

array([0, 1, 2, 3, 4, 5, 6])

**This way of masking is very important and used a lot.** Here is how it works:
1. The comparison UFunc inside the brackets is evaluated first. 
1. It returns a boolean array where the first 7 elements have `True` value, and the rest have `False`.
1. For each index of the boolean array with a `True` value, the corresponding index of the original array is returned.
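The three steps above can be traced explicitly by storing the mask in its own variable. The second half of the sketch applies a mask built from one array to a different, aligned array; the names and heights there are made up for illustration.

```python
import numpy as np

simple_int_array = np.arange(10)

# Steps 1 and 2: the comparison produces a boolean array of the same length.
mask = simple_int_array < 7
print(mask)
# [ True  True  True  True  True  True  True False False False]

# Step 3: indexing with the mask keeps only the positions marked True.
selected = simple_int_array[mask]
print(selected)    # [0 1 2 3 4 5 6]

# A mask built from one array can select from another array of the same length.
names = np.array(["Ann", "Bo", "Cy"])   # hypothetical names
heights = np.array([71, 79, 75])        # hypothetical heights in inches
tall_names = names[heights > 74]
print(tall_names)  # ['Bo' 'Cy']
```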

## Activity:

* Names of all players above 78 inches? 
* Names of all players below 250 lbs? 

In [63]:
nd_player_names[ nd_player_heights > 78 ]

array(['Mike McGlinchey'], dtype=object)

In [64]:
nd_player_names[ nd_player_weights < 250 ]

array(['Dexter Williams', 'C.J. Sanders', "Te'von Coney",
       'Montgomery VanGorder', 'Nyles Morgan', 'Equanimeous St. Brown',
       'Nick Watkins', 'Brandon Wimbush', 'Chris Finke', 'Ian Book',
       'Devin Studstill', 'C.J. Holmes', 'Nolan Henry', 'Isaiah Robertson',
       'Troy Pride Jr.', 'Justin Yoon', 'Shaun Crawford', 'Jalen Elliott',
       'Asmar Bilal', 'Drue Tranquill', 'Mick Assaf', 'Nick Coleman',
       'Brandon Garcia', 'Austin Webster', 'Ashton White', 'Julian Love',
       'Nicco Fertitta', 'Sam Kohler', 'Kevin Stepherson', 'D.J. Morgan',
       'Josh Adams', 'Tony Jones Jr.', 'Grant Hammann', 'Donte Vaughn',
       'Robert Regan', 'Deon McIntosh', 'Christopher Schilling',
       'Kier Murphy', 'Brett Segobiano', 'Temitope Agoro',
       'Jimmy Thompson', 'Julian Okwara', 'Jeff Riney', 'Brian Ball',
       'Jamir Jones', 'Jonathan Jones', 'Matt Bushland', 'Chris Bury',
       'Greer Martini', 'Brandon Hutson', 'Devyn Spruell', 'John Shannon',
       'Miles Boykin

#### The code below gets the names of players whose height is at least 72 inches and whose weight is at most 200 pounds

In [65]:
(nd_player_heights >= 72) & (nd_player_weights <= 200)

array([False, False, False, False, False, False, False, False, False,
       False, False, False,  True, False,  True, False, False, False,
       False, False, False, False, False,  True, False, False, False,
       False, False,  True,  True, False, False, False,  True, False,
       False, False, False, False, False,  True, False, False, False,
       False, False, False,  True, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False,  True, False,
       False, False, False, False, False, False, False, False, False], dtype=bool)

In [66]:
nd_player_names[(nd_player_heights >= 72) & (nd_player_weights <= 200)]

array(['Devin Studstill', 'Nolan Henry', 'Nick Coleman', 'Sam Kohler',
       'Kevin Stepherson', 'Grant Hammann', 'Temitope Agoro',
       'Matt Bushland', 'Arion Shinaver'], dtype=object)

## np.unique

Returns the sorted unique elements of an array. [More info](https://docs.scipy.org/doc/numpy/reference/generated/numpy.unique.html)

In [67]:
sample_array = np.array([1,2,2,1,2,3,2,23,2,1,3,2])
np.unique(sample_array)

array([ 1,  2,  3, 23])
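`np.unique` can also report how often each value occurs. The sketch below reuses the same sample array and passes the optional `return_counts=True` argument, which makes the function return a second array holding the count for each unique value.

```python
import numpy as np

sample_array = np.array([1, 2, 2, 1, 2, 3, 2, 23, 2, 1, 3, 2])

# values holds the sorted unique elements; counts holds how many times each appears.
values, counts = np.unique(sample_array, return_counts=True)

print(values)   # [ 1  2  3 23]
print(counts)   # [3 6 2 1]

# Pair them up for a readable summary.
for value, count in zip(values, counts):
    print("{} appears {} time(s)".format(value, count))
```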