# Data 6: Booleans and Predicates

* Booleans
* Boolean Predicates with `where` Table method

In [None]:
from datascience import *
import numpy as np
import warnings
warnings.filterwarnings("ignore")

%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
plt.rcParams["patch.force_edgecolor"] = True

import seaborn as sns

## Boolean predicates with `where` Table method

### SAT Data

Today we will be working with a dataset showing aggregated (average) SAT scores by state ([source 1](https://commonwealthfoundation.org/2014/12/22/sat-scores-by-state-2014/), [source 2](https://reports.collegeboard.org/sat-suite-program-results/data-archive)).

**Note**: This data is from 2014, so the total score is out of 2400 (over three sections each out of 800) instead of 1600.

In [None]:
sat = Table.read_table('data/sat2014-lecture.csv')
sat

Add a `Combined` column that sums up the three sections' scores.

In [None]:
sat = sat.with_columns(
    'Combined',
    sat.column('Critical Reading') + \
        sat.column('Math') + \
        sat.column('Writing')
)

Recall the table methods and properties we can use to learn more about our data and even create new data:

In [None]:
sat.sort("State", descending=True).take(np.arange(5))

## `.where`

We've already seen how we can use `tbl.where()` to find rows that _exactly_ match what we're looking for. For example:

In [None]:
sat.where('State', 'California')

But `tbl.where` is also capable of so much more! The second argument in `.where` can accept a **predicate**, which tells Python what condition to match rows on. See the [Data 6 Python Reference](https://data6.org/notes/reference).

In [None]:
sat.where("Combined", are.above(1800))
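To build intuition for what a predicate is, here is a minimal plain-Python sketch. It is not how the `datascience` library implements `are.above`; it only illustrates the idea that a predicate is a function that takes one value and returns `True` or `False`, and that filtering keeps the values for which it returns `True`. The helper names `above` and `keep_where` and the scores are made up for illustration.

```python
def above(threshold):
    """Return a predicate: a function that answers 'is value > threshold?'."""
    def predicate(value):
        return value > threshold
    return predicate

def keep_where(values, predicate):
    """Keep only the values for which the predicate returns True."""
    return [v for v in values if predicate(v)]

# Made-up scores, for illustration only (not from the SAT table)
toy_scores = [1450, 1810, 1795, 1900]

is_above_1800 = above(1800)
print(is_above_1800(1810))   # True
print(is_above_1800(1795))   # False
print(keep_where(toy_scores, is_above_1800))  # [1810, 1900]
```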

Note that `are.equal_to(z)` is the same as just passing in `z` itself as the second argument.

In [None]:
sat.where("State", are.containing("Dakota"))

In [None]:
sat.where("Math", are.between(580, 600))

### Method Chaining: Multiple Conditions

We can match rows to multiple conditions/predicates by chaining `where` method calls together. For example, we can look for states where the participation rate is above 20% and the average combined SAT score is above 1500.

In [None]:
sat.where("Participation Rate", are.above(20)).where("Combined", are.above(1500)) # Filter the `sat` table to find states where participation is above 20% and combined score is above 1500

In [None]:
# better formatting (note parentheses)
(
    sat.where("Participation Rate", are.above(20))
        .where("Combined", are.above(1500))
)
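Chaining two `where` calls keeps only the rows that pass both conditions, because the second filter runs on the output of the first. The plain-Python sketch below illustrates that idea with a made-up list of rows; the labels and numbers are hypothetical and are not taken from the SAT table.

```python
# Hypothetical rows: (label, participation rate in %, combined score)
toy_rows = [
    ("A", 5, 1750),
    ("B", 60, 1520),
    ("C", 75, 1480),
    ("D", 15, 1600),
]

# Step 1: keep rows with participation above 20
step1 = [row for row in toy_rows if row[1] > 20]

# Step 2: from those, keep rows with combined score above 1500
step2 = [row for row in step1 if row[2] > 1500]

# Filtering in two steps gives the same rows as one filter using `and`
both = [row for row in toy_rows if row[1] > 20 and row[2] > 1500]

print(step2)          # [('B', 60, 1520)]
print(step2 == both)  # True
```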

**Task**

Filter the `sat` table to find states where participation is below 10% and combined score is between 1200 and 1400.

In [None]:
... 

To match any of several values, we can put them in an array and then use `are.contained_in`.

In [None]:
deep_south = np.array(['Alabama', 'Georgia',
                       'Louisiana', 'Mississippi',
                       'South Carolina'])
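Conceptually, when given an array of values, `are.contained_in` checks whether each value is a member of that array, much like Python's `in` operator. Here is a small sketch of that membership test on its own. It uses a plain list named `deep_south_list` and a made-up list of `candidates`, both assumptions for illustration, and does not touch the `sat` table.

```python
# Same names as the deep_south array, stored in a plain Python list
deep_south_list = ['Alabama', 'Georgia', 'Louisiana',
                   'Mississippi', 'South Carolina']

# Membership test: is this name one of the values in the collection?
print('Georgia' in deep_south_list)     # True
print('California' in deep_south_list)  # False

# Keep only the names that appear in deep_south_list
candidates = ['California', 'Georgia', 'Texas', 'Alabama']
matches = [name for name in candidates if name in deep_south_list]
print(matches)  # ['Georgia', 'Alabama']
```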

**Task**

Filter the `sat` table to include only the states listed in the `deep_south` array.

In [None]:
...

**Task**

Find the states in the deep south with participation lower than 10% and combined score greater than or equal to 1600.

In [None]:
...

**Just for fun:** consider the scatter plot of all states' participation rates and combined SAT scores. Does this scatter plot imply that **lower participation _causes_ higher SAT scores? If not, what might be going on here?**

In [None]:
import plotly.express as px

px.scatter(data_frame = sat.to_df(), 
           x = 'Combined', 
           y = 'Participation Rate', 
           hover_data = {'State': True},
           title = 'SAT (2014) Participation Rate by state')

## Booleans

In [None]:
3 > 1 + 1

In [None]:
3 < -1 * 2

In [None]:
1 < 1 + 1 < 3

In [None]:
s = "Data " + "6"
s == "Data 6"

In [None]:
# is age at least age_limit?
age_limit = 21
age = 17
age >= age_limit

Note: Real password checkers are more secure than the example below, to be clear...

In [None]:
# is password_guess equal to true_password?
true_password = 'qwerty1093x!'
password_guess = 'QWERTY1093x!'
true_password == password_guess

### Comparison Operators

In [None]:
3 == 3

In [None]:
'hello' != 'howdy'

In [None]:
-3 > -2

In [None]:
-3 < -2

In [None]:
"apple" >= "banana"

### Be careful about *equality* vs. *assignment*...

`=` and `==` have very different meanings in Python.

In [None]:
# set x equal to 5
x = 5

In [None]:
# is x equal to 5?
x == 5

In [None]:
x = "some other value" # reset

In [None]:
# valid. what does it do?
y = x == 5

In [None]:
x

In [None]:
y

### Comparisons across types

#### Equality across types

In [None]:
17 == '17'

In [None]:
'zebra' != True

In [None]:
True == 1.0

#### Inequality across types

In [None]:
banana = 10
'apple' >= banana

In [None]:
'alpha' >= 5 

In [None]:
5 > True

In the above cell, Python treats the boolean value `True` as the integer 1, so the comparison is evaluated as `5 > 1`!
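A few more checks make this behavior concrete. In Python, `bool` is a subclass of `int`, so `True` behaves like 1 and `False` like 0 in arithmetic and comparisons. By contrast, ordering a string against a number raises a `TypeError` in Python 3. The variable name `raised` below is only there to record that the error occurred.

```python
print(isinstance(True, int))  # True
print(True + True)            # 2
print(5 > True)               # True, same as 5 > 1
print(False == 0)             # True

# Ordering a string against an integer is not defined in Python 3
raised = False
try:
    'alpha' >= 5
except TypeError as err:
    raised = True
    print("TypeError:", err)
```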