# CSC271 Lecture Notes

### In this lesson:
1. [Comparison operators and Masks](#comparison): comparing with arrays and generating masks
2. [Data Filtering and Selection](#filtering): filtering and selection using masks
3. [Conditional Assignments](#conditional): conditional assignments using masks
4. [`where` function](#where): where as an alternative to conditional assignments

<a id="comparison"></a>
## 1. Comparison operators and Masks

NumPy has a set of comparison operators and corresponding (equivalent) functions that can be applied to arrays.

<div class="alert alert-block alert-info">

### Comparison operators/functions
<table>
  <thead>
    <tr>
      <th>Operator</th>
      <th>NumPy Function</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>==</td>
      <td>equal</td>
      <td>Equal to</td>
    </tr>
    <tr>
      <td>!=</td>
      <td>not_equal</td>
      <td>Not equal to</td>
    </tr>
    <tr>
      <td>&lt;</td>
      <td>less</td>
      <td>Less than</td>
    </tr>
    <tr>
      <td>&lt;=</td>
      <td>less_equal</td>
      <td>Less than or equal to</td>
    </tr>
    <tr>
      <td>&gt;</td>
      <td>greater</td>
      <td>Greater than</td>
    </tr>
    <tr>
      <td>&gt;=</td>
      <td>greater_equal</td>
      <td>Greater than or equal to</td>
    </tr>
  </tbody>
</table>
</div>


When applied to a NumPy ndarray, these operators produce an array of Boolean values.

In [2]:
import numpy as np

ages = np.array([16, 18, 46, 12, 97, 8, 32])
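As a minimal sketch of the statement above, the comparison `ages >= 18` and the equivalent function call `np.greater_equal(ages, 18)` should both yield the same Boolean array. The names `mask_op` and `mask_fn` are illustrative and not part of the original notes.

```python
import numpy as np

ages = np.array([16, 18, 46, 12, 97, 8, 32])

# Operator form: compares every element of the array with 18.
mask_op = ages >= 18

# Function form: the NumPy equivalent of the >= operator.
mask_fn = np.greater_equal(ages, 18)

print(mask_op)                            # one Boolean per element of ages
print(mask_op.dtype)                      # bool
print(np.array_equal(mask_op, mask_fn))   # True: both forms agree
```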

We call the array of Booleans generated from the conditional expression a **mask**. We can use NumPy's aggregate functions or the equivalent array methods to summarize the mask:


<div class="alert alert-block alert-info">

<table>
  <thead>
    <tr>
      <th><code>array</code> Method</th>
      <th>Function</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code>sum</code></td>
      <td><code>np.sum()</code></td>
      <td>Return the sum of all array elements</td>
    </tr>
    <tr>
      <td><code>any</code></td>
      <td><code>np.any()</code></td>
      <td>Return True if and only if <strong>any</strong> array element is True</td>
    </tr>
    <tr>
      <td><code>all</code></td>
      <td><code>np.all()</code></td>
      <td>Return True if and only if <strong>all</strong> array elements are True</td>
    </tr>
  </tbody>
</table>

</div>
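The methods in the table can be sketched on the adult mask as follows. The variable names (`is_adult`, `num_adults`, and so on) are hypothetical. Because `True` counts as 1 and `False` as 0, `sum` gives the number of elements that satisfy the condition.

```python
import numpy as np

ages = np.array([16, 18, 46, 12, 97, 8, 32])
is_adult = ages >= 18            # the mask

num_adults = is_adult.sum()      # same as np.sum(is_adult): counts the True values
any_adult = is_adult.any()       # same as np.any(is_adult)
all_adult = is_adult.all()       # same as np.all(is_adult)

print(num_adults, any_adult, all_adult)   # 4 True False
```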




Masks are commonly used for:
- data filtering and selection
- conditional assignment

<a id="filtering"></a>
## 2. Data Filtering and Selection

Data filtering involves selecting a subset of the data. A common workflow for doing this is:
- use a condition to generate a mask
- select only the array items that meet that condition by using the mask to index the array.

For example, let's filter to get only the ages of adults (at least 18 years old):

In [2]:
adult_age = ages[ages >= 18]
print(adult_age)

[18 46 97 32]


It is also possible to combine multiple conditions. For example, we can select teenagers, which we'll define as people between the ages of 13 and 18, inclusive.

We'd like to apply two conditions:
- `ages >= 13`
- `ages <= 18`

Here is the result of filtering with both conditions combined into a single mask:

In [4]:
ages = np.array([16, 18, 46, 12, 97, 8, 32])

combined_mask = (ages >= 13) & (ages <= 18)
print(ages[combined_mask])

[16 18]


To build that combined mask, we need to combine the two Boolean arrays (one per condition) to identify where both conditions are met. We'd like to apply the Boolean `and` operator to each pair of items as follows:

```
mask1:   [ True   True   True  False   True  False   True ]
mask2:   [ True   True  False   True  False   True  False ]
          ------------------------------------------------
         [ True   True  False  False  False  False  False ]
```


Python's Boolean operators (`and`, `or`, and `not`) can only be applied to single Boolean values, not to arrays of Booleans. Instead, we can use bitwise operators that apply to each corresponding pair of elements.
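The limitation described above can be observed directly. This sketch rebuilds `mask1` and `mask2` from the diagram and tries Python's `and` on them. The flag `and_failed` is a hypothetical name used only to record the outcome.

```python
import numpy as np

ages = np.array([16, 18, 46, 12, 97, 8, 32])
mask1 = ages >= 13
mask2 = ages <= 18

# `and` works on single Boolean values...
print(True and False)   # False

# ...but on arrays it asks NumPy for one truth value for the whole array,
# which is ambiguous, so NumPy raises a ValueError.
try:
    mask1 and mask2
    and_failed = False
except ValueError as err:
    and_failed = True
    print("ValueError:", err)
```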

<div class="alert alert-block alert-info">

### Bitwise operators

<table>
  <thead>
    <tr>
      <th>Operator</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code>&amp;</code></td>
      <td>Produce <code>True</code> if and only if <strong>both</strong> mask elements are <code>True</code></td>
    </tr>
    <tr>
      <td><code>|</code></td>
      <td>Produce <code>True</code> if <strong>either</strong> mask element is <code>True</code></td>
    </tr>
    <tr>
      <td><code>~</code></td>
      <td>Produce <code>True</code> if the mask element is <code>False</code> and <code>False</code> if it is <code>True</code></td>
    </tr>
  </tbody>
</table>

</div>

In [3]:
ages = np.array([16, 18, 46, 12, 97, 8, 32])
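A short sketch of the three operators from the table, applied to the `ages` array. The mask names `teen`, `outside`, and `not_teen` are illustrative assumptions and not part of the original notes.

```python
import numpy as np

ages = np.array([16, 18, 46, 12, 97, 8, 32])

# & : True only where both conditions hold (ages 13 to 18 inclusive).
teen = (ages >= 13) & (ages <= 18)

# | : True where either condition holds (younger than 13 or older than 18).
outside = (ages < 13) | (ages > 18)

# ~ : flips every element of a mask.
not_teen = ~teen

print(ages[teen])                         # [16 18]
print(ages[outside])                      # [46 12 97  8 32]
print(np.array_equal(outside, not_teen))  # True: the two masks match
```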



<div class="alert alert-block alert-info">
<h4>Tips for combining multiple comparisons:</h4>

1. Chained comparisons don't work (e.g., `13 <= ages <= 18`). You need to write the expression out completely: `(ages >= 13) & (ages <= 18)`

2. When combining multiple conditions, there must be parentheses around each condition.
   - The bitwise operators have higher precedence than the comparison operators.
   - Without the parentheses, the expression `ages >= 13 & ages <= 18` is equivalent to `ages >= (13 & ages) <= 18`.
</div>
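Both pitfalls in the tips above can be checked with a small sketch. The helper `raises_value_error` is hypothetical and exists only to capture the error instead of stopping the notebook. Both faulty expressions end up as chained comparisons, which Python evaluates with an implicit `and`, so both are expected to raise `ValueError`.

```python
import numpy as np

ages = np.array([16, 18, 46, 12, 97, 8, 32])

def raises_value_error(make_mask):
    """Return True if calling make_mask() raises ValueError."""
    try:
        make_mask()
    except ValueError:
        return True
    return False

# Tip 1: a chained comparison on an array.
chained_fails = raises_value_error(lambda: 13 <= ages <= 18)

# Tip 2: missing parentheses, parsed as ages >= (13 & ages) <= 18.
unparenthesized_fails = raises_value_error(lambda: ages >= 13 & ages <= 18)

# The fully written-out, parenthesized form works.
teen = (ages >= 13) & (ages <= 18)

print(chained_fails, unparenthesized_fails)   # True True
print(ages[teen])                             # [16 18]
```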


<a id="conditional"></a>
## 3. Conditional Assignments

A mask can also be used for conditional assignments. 

For example, let's imagine that we have a Python list and we want to replace all negative values with 0.

In [4]:
# Python list version.

data = [3, -1, 5, -7, 2]

# TODO

print(data)


[3, -1, 5, -7, 2]
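One possible way to fill in the `# TODO` for the list version, offered as a sketch and not necessarily the solution intended in lecture: loop over the indices and overwrite each negative value.

```python
data = [3, -1, 5, -7, 2]

# Visit every index and replace negative values in place.
for i in range(len(data)):
    if data[i] < 0:
        data[i] = 0

print(data)   # [3, 0, 5, 0, 2]
```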


Now, we'll do the same using a NumPy array and masking:

In [5]:
import numpy as np

data = np.array([3, -1, 5, -7, 2])

# TODO

print(data)

[ 3 -1  5 -7  2]
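For the NumPy version, one way the `# TODO` could be filled in (again a sketch, with the hypothetical name `negative` for the mask) is to index the array with the mask on the left-hand side of an assignment. Only the elements where the mask is `True` are overwritten.

```python
import numpy as np

data = np.array([3, -1, 5, -7, 2])

negative = data < 0    # mask: True where the value is negative
data[negative] = 0     # assign 0 only at those positions

print(data)   # [3 0 5 0 2]
```

The two steps can also be written in one line as `data[data < 0] = 0`.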


<a id="where"></a>
## 4. `where` function

An alternative to the conditional assignment above is to use masking in combination with NumPy's `where` function.

In [5]:
data = np.array([3, -1, 5, -7, 2])

data = np.where(data < 0, 0, data)

print(data)

[3 0 5 0 2]


This is useful in cases where you want to assign one value if the mask element is `True` and another if the mask element is `False`.

For example, back to our `ages` example, we can produce categories of ages:

In [None]:
ages = np.array([16, 18, 46, 12, 97, 8, 32])

data = np.where(ages >= 18, 'adult', 'child')
# where (condition, value if true, value if false)

print(data)

['child' 'adult' 'adult' 'child' 'adult' 'child' 'adult']


We can also nest these calls to further categorize adults into `'adult'` and `'senior'`:

In [None]:
ages = np.array([16, 18, 46, 12, 97, 8, 32])

data = np.where(ages >= 18, np.where(ages >= 65, 'senior', 'adult'), 'child')

print(data)

['child' 'adult' 'adult' 'child' 'senior' 'child' 'adult']
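To connect this back to conditional assignment, the same categories can be built with mask assignments instead of nested `where` calls. This is a sketch under the assumption that a fixed-width string array (`dtype='<U6'`, wide enough for `'senior'`) is acceptable. The names `nested` and `labels` are illustrative.

```python
import numpy as np

ages = np.array([16, 18, 46, 12, 97, 8, 32])

# Nested where, as in the cell above.
nested = np.where(ages >= 18, np.where(ages >= 65, 'senior', 'adult'), 'child')

# Mask assignments: start with 'child' everywhere, then overwrite.
labels = np.full(ages.shape, 'child', dtype='<U6')
labels[ages >= 18] = 'adult'
labels[ages >= 65] = 'senior'   # applied last so it overrides 'adult'

print(labels)
```

The order of the assignments matters here: the `'senior'` mask is a subset of the `'adult'` mask, so it has to be applied last.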
