# Probability: AND

## Pandas with AND

In the last section, we saw how to get counts in a data frame based on a given condition. Now we are interested in getting counts of two (or possibly more) events joined by the word *and*.

We will look at three methods of doing this, using our `assessment_scores.csv` file.

In [None]:
import pandas as pd

In [None]:
df = pd.read_csv('Datasets/assessment_scores.csv')

In each method, we will examine the probability of randomly selecting a student who is from Ohio (OH) **and** scored more than 550 on the *math10* assessment:

$$P(\text{from Ohio and scored more than 550 on math10})$$

So, this question involves two conditional statements, both of which must be true:

1. The student is from Ohio (OH).
2. The student scored more than 550 on the *math10* assessment.

We will also store the denominator as a variable, since we will be using it frequently.

In [None]:
denom = len(df)

In [None]:
denom
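Before turning to the three methods, it may help to see the idea on a small scale. The sketch below uses a made-up five-row data frame (the name `toy` and its values are illustrative assumptions, not rows from `assessment_scores.csv`) to show that each condition produces a column of True/False values, and that AND keeps only the rows where both are True.

```python
import pandas as pd

# Hypothetical toy data; the real assessment_scores.csv will differ.
toy = pd.DataFrame({
    "state": ["OH", "OH", "MI", "OH", "PA"],
    "math10": [600, 500, 700, 560, 580],
})

# Each condition on its own is a boolean Series (one True/False per row).
from_ohio = toy["state"] == "OH"
high_math = toy["math10"] > 550

# AND is True only on rows where both conditions are True.
both = from_ohio & high_math

# Probability = (rows satisfying both conditions) / (all rows).
toy_prob = both.sum() / len(toy)
print(toy_prob)
```

Two of the five toy students are from Ohio and scored above 550, so the sketch prints 0.4.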

### Method 1: Using `df[df["column"]]`

A couple of things to note when using this method:

* We need to wrap each part of the conditional statement in parentheses.
* We use an ampersand, `&`, to represent the word AND.

In [None]:
numer1 = df[
    (df["state"] == "OH") & (df["math10"] > 550)
]

In [None]:
len(numer1) / denom
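The parentheses requirement noted above comes from Python's operator precedence: `&` binds more tightly than `==` and `>`, so without parentheses Python tries to evaluate `"OH" & df["math10"]` first. The sketch below, on a hypothetical three-row `toy` frame, shows the parenthesized form working and the unparenthesized form raising an error. The exact error type and message may vary by pandas version, so the sketch catches a generic `Exception`.

```python
import pandas as pd

# Hypothetical toy data, for illustration only.
toy = pd.DataFrame({
    "state": ["OH", "OH", "MI"],
    "math10": [600, 500, 700],
})

# With parentheses: two boolean Series combined row by row.
mask = (toy["state"] == "OH") & (toy["math10"] > 550)
print(mask.tolist())

# Without parentheses, `&` is applied before `==` and `>`,
# so the expression no longer means what we intended and raises an error.
try:
    toy["state"] == "OH" & toy["math10"] > 550
    failed = False
except Exception as exc:
    failed = True
    print("Unparenthesized form failed:", type(exc).__name__)
```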

### Method 2: Using the Dot Operator

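Recall from the previous section that the dot operator method is very similar in structure to Method 1. Keep in mind that dot access only works when the column name is a valid Python identifier (no spaces, for example) and does not clash with an existing data frame attribute or method.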

In [None]:
numer2 = df[
    (df.state == "OH") & (df.math10 > 550)
]

In [None]:
len(numer2) / denom

### Method 3: Using a Query

Using the `df.query()` method is just a *little* bit different from the two methods above. This time the whole condition is written as a single string, and instead of using an ampersand to represent the word AND, we use `and` directly.

In [None]:
numer3 = df.query(
    'state == "OH" and math10 > 550'
)

In [None]:
len(numer3) / denom
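As a quick check that the three methods are interchangeable, the sketch below runs all of them on a hypothetical `toy` frame and confirms they select the same rows. It also shows one optional convenience of `query`: a local Python variable (here the illustrative name `cutoff`) can be referenced inside the query string by prefixing it with `@`.

```python
import pandas as pd

# Hypothetical toy data, for illustration only.
toy = pd.DataFrame({
    "state": ["OH", "OH", "MI", "OH", "PA"],
    "math10": [600, 500, 700, 560, 580],
})

cutoff = 550  # illustrative local variable used inside the query string

# Method 1: bracket notation.
m1 = toy[(toy["state"] == "OH") & (toy["math10"] > cutoff)]

# Method 2: dot operator.
m2 = toy[(toy.state == "OH") & (toy.math10 > cutoff)]

# Method 3: query string; @cutoff refers to the Python variable above.
m3 = toy.query('state == "OH" and math10 > @cutoff')

# All three select the same rows, so they give the same probability.
same_rows = m1.equals(m2) and m1.equals(m3)
print(same_rows, len(m3) / len(toy))
```

Which method to use is largely a matter of taste; the query form tends to read more like the original sentence, while the bracket form works with any column name.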

### Extracting a Range of Dates 

We can even use any of the three methods above to get a sense of how many students were born in a given month.

For instance, let's say we are asked to find the probability that a randomly selected student is born in July.

*Note*: July has 31 days.

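For a student to be born in July, their *dob* needs to be greater than or equal to July 1 **and** less than or equal to July 31. Because we read the file without any date parsing, *dob* is stored as text, so the comparisons below are alphabetical string comparisons. They match the intended date range when every *dob* is written as zero-padded `MM/DD/YYYY` with the year 2006.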

In [None]:
numer1July = df[
    (df['dob'] >= "07/01/2006") & (df['dob'] <= "07/31/2006")
]

In [None]:
len(numer1July) / denom

In [None]:
numer2July = df[
    (df.dob >= "07/01/2006") & (df.dob <= "07/31/2006")
]

In [None]:
len(numer2July) / denom

In [None]:
numer3July = df.query(
    'dob >= "07/01/2006" and dob <= "07/31/2006"'
)

In [None]:
len(numer3July) / denom
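The cells above compare *dob* against strings. If the dates might span more than one year, or might not be zero-padded, one alternative is to convert the column to real dates first. The sketch below assumes *dob* is text in `MM/DD/YYYY` form (an assumption about the file) and uses a made-up `toy` frame; it shows both a strict July 2006 range and a "July of any year" test, since which one is wanted depends on the question being asked.

```python
import pandas as pd

# Hypothetical toy data; the format of dob in the real file is an assumption.
toy = pd.DataFrame({
    "dob": ["07/04/2006", "06/30/2006", "07/31/2006", "08/01/2006", "07/15/2005"],
})

# Convert the text column to datetime values.
dob = pd.to_datetime(toy["dob"], format="%m/%d/%Y")

# Option A: born between July 1, 2006 and July 31, 2006 (inclusive).
in_july_2006 = dob.between(pd.Timestamp("2006-07-01"), pd.Timestamp("2006-07-31"))

# Option B: born in July of any year.
any_july = dob.dt.month == 7

print(in_july_2006.sum() / len(toy))
print(any_july.sum() / len(toy))
```

In this toy frame, two students fall in July 2006 and three were born in July of some year, so the two options give 0.4 and 0.6 respectively.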

## Conditional Probability

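In a conditional probability, we restrict our attention to the items (students) that satisfy a given condition. The denominator is the count of students satisfying the condition that follows any of these phrases: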

* If 
* Given that
* Suppose

For our example, we will look at the probability that a student receives a 350 or higher on the *hist10* exam given that their birthday is in July.

$$P(\text{hist10} \geq 350 \mid \text{birthday in July})$$

* The numerator is the number of students with hist10 scores $\geq$ 350 **and** born in July.
* The denominator is the number of students born in July.
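In symbols, the two bullets above amount to the usual counting form of conditional probability. Writing $A$ for the event of interest, $B$ for the given condition, and $n(\cdot)$ for a count of students, and assuming at least one student satisfies $B$:

$$P(A \mid B) = \frac{n(A \text{ and } B)}{n(B)} = \frac{P(A \text{ and } B)}{P(B)}$$

Here $A$ is "hist10 $\geq$ 350" and $B$ is "born in July". The second equality holds because dividing both counts by the total number of students does not change the ratio.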

### Method 1

In [None]:
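# Note: this rebinds `denom` (previously the integer len(df))
# to a data frame containing only the students born in July.
denom = df[
    (df['dob'] >= "07/01/2006") & (df['dob'] <= "07/31/2006")
]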

In [None]:
numer = df[
    (df['hist10'] >= 350) & (df['dob'] >= "07/01/2006") & (df['dob'] <= "07/31/2006")
]

In [None]:
len(numer) / len(denom)

### Method 2

In [None]:
denom = df[
    (df.dob >= "07/01/2006") & (df.dob <= "07/31/2006")
]

In [None]:
numer = df[
    (df.hist10 >= 350) & (df.dob >= "07/01/2006") & (df.dob <= "07/31/2006")
]

In [None]:
len(numer) / len(denom)

### Method 3

In [None]:
denom = df.query(
    'dob >= "07/01/2006" and dob <= "07/31/2006"'
)

In [None]:
numer = df.query(
    'hist10 >= 350 and dob >= "07/01/2006" and dob <= "07/31/2006"'
)

In [None]:
len(numer) / len(denom)
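Another way to think about the same calculation is to filter down to the "given" group first and then ask what fraction of that group satisfies the remaining condition. The sketch below illustrates this on a hypothetical `toy` frame; the boolean column `born_in_july` is an invented stand-in for the date comparison and does not exist in `assessment_scores.csv`.

```python
import pandas as pd

# Hypothetical toy data; born_in_july stands in for the dob comparison.
toy = pd.DataFrame({
    "hist10": [400, 300, 350, 500, 200],
    "born_in_july": [True, True, True, False, False],
})

# Step 1: keep only the students who satisfy the "given that" condition.
given = toy[toy["born_in_july"]]

# Step 2: within that group, count those who also satisfy the event.
numer_toy = given[given["hist10"] >= 350]
cond_prob = len(numer_toy) / len(given)

# Equivalent shortcut: the mean of a boolean Series is the fraction of Trues.
cond_prob_mean = (given["hist10"] >= 350).mean()

print(cond_prob, cond_prob_mean)
```

Filtering first avoids repeating the July condition in both the numerator and the denominator, which leaves less room for the two to drift out of sync.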

### Exercise 1

Find the probability that a randomly selected student 