In [1]:
%pip install numpy



In [1]:
import numpy as np

# Filtering Data With Logical Indexing

Sometimes you want to remove certain values from your dataset.  In NumPy, this can be done with **Logical Indexing**; in plain Python, the same thing is done with an **If Statement**.
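
As a rough sketch of that comparison, the snippet below removes the negative values from a small list twice: once with a plain Python `if` statement inside a loop, and once with NumPy logical indexing. The variable names and values are made up for illustration and are not part of the exercises.

```python
import numpy as np

values = [4, -1, 7, -3, 2]  # made-up example values

# Plain Python: test each value with an if statement
kept_python = []
for value in values:
    if value >= 0:
        kept_python.append(value)

# NumPy: build a True/False array, then use it as an index
array = np.array(values)
kept_numpy = array[array >= 0]

print(kept_python)  # [4, 7, 2]
print(kept_numpy)   # [4 7 2]
```

Both approaches keep the same values; the NumPy version simply applies the test to the whole array in one expression.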

## Step 1: Create a Logical NumPy Array

A single logical expression compares every value in an array at once and returns an array of True/False values.  This is broadcasting, the same mechanism used by the math operations we saw earlier:

```python
>>> data = np.array([1, 2, 3, 4, 5])
>>> data < 3
array([ True,  True, False, False, False])
```

**Exercises**: Make arrays of True/False values that answer the following questions about the dataset below for each element.

In [2]:
import numpy as np

list_of_values = [3, 7, 10, 2, 1, 7, np.nan, 20, -5]
data = np.array(list_of_values)

*Example*: Where are the values that are greater than zero?

In [3]:
data > 0

array([ True,  True,  True,  True,  True,  True, False,  True, False])

Where are the values that are less than four?

In [4]:
data < 4

array([ True, False, False,  True,  True, False, False, False,  True])

Where are the values that are equal to 7?

In [5]:
data == 7

array([False,  True, False, False, False,  True, False, False, False])

Where are the values that are greater than or equal to 7?

In [6]:
data >= 7

array([False,  True,  True, False, False,  True, False,  True, False])

Where are the values that are not equal to 7?

In [7]:
data != 7

array([ True, False,  True,  True,  True, False,  True,  True,  True])
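
The `np.nan` entry in this dataset deserves a closer look: in the results above it comes out as `False` for every comparison except `!=`. The sketch below reproduces that behavior on a shortened, made-up version of the dataset. `np.isnan` is a NumPy function not used elsewhere in this section; it is shown here only as one possible way to locate missing values.

```python
import numpy as np

data = np.array([3, 7, np.nan, -5])  # shortened, made-up version of the dataset

# NaN is neither greater than, less than, nor equal to any number
print(data > 0)        # [ True  True False False]

# ...so "not equal" is the one comparison where NaN gives True
print(data != 7)       # [ True False  True  True]

# np.isnan marks the missing values directly
print(np.isnan(data))  # [False False  True False]
```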

## Step 2: Filter with Logical Indexing

If an array of True/False values is used to *index* another array of the same size, NumPy returns only the values at the positions where the indexing array is True:

```python
>>> data = np.array([1, 2, 3, 4, 5])
>>> mask = data > 3
>>> mask
array([False, False, False,  True,  True])
>>> data[mask]
array([4, 5])
```

Both steps can also be done in a single expression.  Sometimes this can make things clearer!


```python
>>> data[data > 3]
array([4, 5])
```
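
The "same size" requirement can be checked with a short experiment. The sketch below uses a hypothetical `bad_mask` that is shorter than the data; in recent NumPy versions, indexing with it is expected to raise an `IndexError` (the exact wording of the message may vary between versions).

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])

good_mask = data > 3                      # same length as data
print(data[good_mask])                    # [4 5]

bad_mask = np.array([True, False, True])  # only 3 elements, data has 5
try:
    data[bad_mask]
    mismatch_raised = False
except IndexError as error:
    mismatch_raised = True
    print("IndexError:", error)
```

Building the mask from the array itself, as in `data > 3`, guarantees that the sizes match.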


**Exercises**:  Using the data below, extract only the values that correspond to each question.

In [6]:
data = np.array([3, 1, -6, 8, 20, 2, np.nan, 7, 1, np.nan, 9, 7, 7, -7])
data

array([ 3.,  1., -6.,  8., 20.,  2., nan,  7.,  1., nan,  9.,  7.,  7.,
       -7.])

*Example*: The values that are less than 0

In [7]:
data[data < 0]

array([-6., -7.])

The values that are greater than 3

In [10]:
data[data > 3]

array([ 8., 20.,  7.,  9.,  7.,  7.])

The values not equal to 7

In [11]:
data[data != 7]

array([ 3.,  1., -6.,  8., 20.,  2., nan,  1., nan,  9., -7.])

The values equal to 20

In [12]:
data[data == 20]

array([20.])

### Statistics on Filtered Data



**Exercises**: Using the following dataset, have Python calculate the answers to the questions below:

In [8]:
data = np.array([3, 1, -6, 8, 20, 2, 7, 1, 9, 7, 7, -7])
data

array([ 3,  1, -6,  8, 20,  2,  7,  1,  9,  7,  7, -7])

*Example*: How many values are greater than 4?  

In [15]:
len(data[data > 4])

6

How many values are equal to 7?

In [16]:
len(data[data == 7])

3

What is the mean value of the positive numbers?

In [17]:
np.mean(data[data > 0])

6.5

What is the mean value of the negative numbers?

In [18]:
np.mean(data[data < 0])

-6.5

What is the median value of the values that are greater than 5?

In [19]:
np.median(data[data > 5])

7.5

What proportion of the values are positive?  (hint: sum and len, or mean)

In [20]:
np.mean(data > 0)

0.8333333333333334

What proportion of the values are less than or equal to 8?

In [21]:
np.mean(data <= 8)

0.8333333333333334
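
The "sum and len, or mean" hint works because `True` counts as 1 and `False` as 0 when a logical array is summed or averaged. A minimal sketch using the dataset from these exercises (the names `mask`, `count`, `proportion_a`, and `proportion_b` are chosen here for illustration):

```python
import numpy as np

data = np.array([3, 1, -6, 8, 20, 2, 7, 1, 9, 7, 7, -7])
mask = data > 0

count = np.sum(mask)              # True counts as 1, False as 0
proportion_a = count / len(data)  # "sum and len"
proportion_b = np.mean(mask)      # "mean" of the mask itself

print(count)  # 10
print(proportion_a, proportion_b)
```

Note the difference between `np.mean(mask)`, which gives the proportion of values that pass the test, and `np.mean(data[mask])`, which gives the average of the values that pass it.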

## Modifying Data Using Logical Indexing

## Using Logical Indexing to Link Two Different Variables in a Dataset