# Filtering Data Based on Criteria
In this lesson, we will use a small dataset of weather forecasts for Chapel Hill covering Thursday, March 25th, through Saturday, April 3rd. Each position in the lists below holds the forecast for one day in that ten-day span, in order.

Our analysis goal is to find the average high temperature on days when rain is unlikely, meaning the chance of rain is below 30%.

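We will approach this problem from a column-oriented perspective: instead of storing one record per day, we store each attribute (high, low, chance of rain) as its own list, and the values at the same index across those lists describe the same day.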
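To make the contrast concrete, here is a minimal sketch that converts a row-oriented table (one dict per day) into the column-oriented layout this lesson uses. The helper `to_columns` is hypothetical and not part of the lesson; it assumes every row has the same keys, and the sample rows are the first three days of the lesson's dataset.

```python
# Row-oriented vs. column-oriented storage.
# `to_columns` is a hypothetical helper for illustration only; it assumes
# every row has the same keys.

# Row-oriented: one dict per day (first three days of the lesson's data).
rows: list[dict[str, float]] = [
    {"high": 77, "low": 67, "rain": 0.3},
    {"high": 84, "low": 51, "rain": 0.2},
    {"high": 78, "low": 64, "rain": 0.4},
]


def to_columns(table: list[dict[str, float]]) -> dict[str, list[float]]:
    """Convert a list of row dicts into a dict of column lists."""
    columns: dict[str, list[float]] = {}
    for row in table:
        for key, value in row.items():
            # setdefault creates the column list the first time a key is seen
            columns.setdefault(key, []).append(value)
    return columns


columns = to_columns(rows)
print(columns["high"])  # [77, 84, 78]
print(columns["rain"])  # [0.3, 0.2, 0.4]
```

In the column layout, all the highs sit in one list, so a question about a single attribute only has to scan that one list.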

First, let's look at our dataset.

```python
col_data: dict[str, list[float]] = {
    "high": [77, 84, 78, 79, 65, 67, 74, 61, 55, 61],  # daily high, °F
    "low":  [67, 51, 64, 45, 43, 53, 56, 37, 34, 42],  # daily low, °F
    "rain": [.3, .2, .4, .8, 0., .2, .4, .5, .1, .1],  # chance of rain, 0.0 to 1.0
}

col_data
```

```
{'high': [77, 84, 78, 79, 65, 67, 74, 61, 55, 61],
 'low': [67, 51, 64, 45, 43, 53, 56, 37, 34, 42],
 'rain': [0.3, 0.2, 0.4, 0.8, 0.0, 0.2, 0.4, 0.5, 0.1, 0.1]}
```

## Produce a mask based on criteria


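A mask is a list of `bool` values with one entry per day: `True` where the day meets the criterion and `False` where it does not. `less_than` builds one by comparing each value in a column against a threshold.

```python
def less_than(col: list[float], threshold: float) -> list[bool]:
    """Return a mask that is True wherever a value in col is below threshold."""
    result: list[bool] = []
    for item in col:
        # item < threshold already evaluates to True or False, so this line
        # is equivalent to the longer form:
        # if item < threshold:
        #     result.append(True)
        # else:
        #     result.append(False)
        result.append(item < threshold)
    return result

# Example testing calls:
# less_than(col_data["rain"], 0.3)
# less_than(col_data["high"], 65)
no_rain_mask: list[bool] = less_than(col_data["rain"], 0.3)
print(no_rain_mask)
```

```
[False, True, False, False, True, True, False, False, True, True]
```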
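Because `item < threshold` is already a `bool`, the loop in `less_than` can also be written as a list comprehension. The name `less_than_comprehension` is hypothetical; this is a sketch meant to produce the same mask as `less_than`.

```python
def less_than_comprehension(col: list[float], threshold: float) -> list[bool]:
    """Same mask as less_than, built with a list comprehension."""
    # One bool per item: True where the item is below the threshold
    return [item < threshold for item in col]


rain: list[float] = [0.3, 0.2, 0.4, 0.8, 0.0, 0.2, 0.4, 0.5, 0.1, 0.1]
print(less_than_comprehension(rain, 0.3))
# [False, True, False, False, True, True, False, False, True, True]
```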


Printing the three columns lets us line the mask up against the raw values:

```python
print(col_data["rain"])
print(col_data["high"])
print(col_data["low"])
```

```
[0.3, 0.2, 0.4, 0.8, 0.0, 0.2, 0.4, 0.5, 0.1, 0.1]
[77, 84, 78, 79, 65, 67, 74, 61, 55, 61]
[67, 51, 64, 45, 43, 53, 56, 37, 34, 42]
```


## Masked function

`masked` takes a column and a mask (a list of `bool` values, one per day) and returns only the values in the column whose corresponding mask value is `True`.

```python
def masked(col: list[float], mask: list[bool]) -> list[float]:
    """Return the values of col whose corresponding mask entry is True."""
    result: list[float] = []
    # Loop over the indices of the mask. This assumes col and mask have the
    # same length, so index i refers to the same day in both lists.
    for i in range(len(mask)):
        if mask[i]:
            result.append(col[i])  # keep the COLUMN value at the same index
    return result

# Example testing calls:
# print(col_data["rain"])
# print(no_rain_mask)
# print(col_data["high"])
# masked(col_data["high"], no_rain_mask)

highs_of_no_rain_days: list[float] = masked(col_data["high"], no_rain_mask)
print(highs_of_no_rain_days)
```

```
[84, 65, 67, 55, 61]
```
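The index loop in `masked` depends on the column and mask having the same length: a longer mask raises `IndexError`, while a shorter one silently drops the last values of the column. A minimal sketch of an alternative, under the hypothetical name `masked_zip`, pairs values with `zip` and checks the lengths explicitly:

```python
def masked_zip(col: list[float], mask: list[bool]) -> list[float]:
    """Keep the values of col whose paired mask entry is True.

    Hypothetical variant of `masked`: it rejects mismatched lengths up front
    instead of relying on them matching.
    """
    if len(col) != len(mask):
        raise ValueError(f"column has {len(col)} values but mask has {len(mask)}")
    # zip pairs col[i] with mask[i]; keep the value when its flag is True
    return [value for value, keep in zip(col, mask) if keep]


high: list[float] = [77, 84, 78, 79, 65, 67, 74, 61, 55, 61]
no_rain_mask: list[bool] = [False, True, False, False, True, True, False, False, True, True]
print(masked_zip(high, no_rain_mask))  # [84, 65, 67, 55, 61]
```

On Python 3.10 and later, `zip(col, mask, strict=True)` is another option: it raises `ValueError` itself when the inputs differ in length.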


## Compute the average

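```python
def mean(col: list[float]) -> float:
    """Return the arithmetic mean of col."""
    return sum(col) / len(col)  # built-in sum

mean(highs_of_no_rain_days)
```

```
66.4
```

So the average high on days with less than a 30% chance of rain is 66.4 degrees.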
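Python's standard library also provides `statistics.mean`, which gives the same result here. One difference is worth knowing: a mask with no `True` values produces an empty list, and the lesson's `mean` then divides by zero, while `statistics.mean` raises `statistics.StatisticsError`. A short sketch comparing the two:

```python
import statistics


def mean(col: list[float]) -> float:
    """The lesson's mean: sum divided by count."""
    return sum(col) / len(col)


highs_of_no_rain_days: list[float] = [84, 65, 67, 55, 61]
print(mean(highs_of_no_rain_days))             # 66.4
print(statistics.mean(highs_of_no_rain_days))  # 66.4

# An empty column, e.g. from a mask where no day matched
try:
    mean([])
except ZeroDivisionError:
    print("mean([]) raised ZeroDivisionError")

try:
    statistics.mean([])
except statistics.StatisticsError:
    print("statistics.mean([]) raised StatisticsError")
```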

## With these helper functions, we can perform many analyses

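```python
# What is the average chance of rain on days when the low is below 50 degrees?
less_than_50_mask: list[bool] = less_than(col_data["low"], 50)
rain_below_50_deg: list[float] = masked(col_data["rain"], less_than_50_mask)
average: float = mean(rain_below_50_deg)
print(average)
```

```
0.30000000000000004
```

The exact average of the selected values (0.8, 0.0, 0.5, 0.1, and 0.1) is 0.3; the extra digits are floating-point rounding error, because values such as 0.1 have no exact binary representation.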
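Since float results can carry tiny rounding errors like the output above, it is safer to compare them with `math.isclose` than with `==`, and to round them when displaying. A minimal sketch applied to the five rain values selected by the mask:

```python
import math

rain_below_50_deg: list[float] = [0.8, 0.0, 0.5, 0.1, 0.1]
average: float = sum(rain_below_50_deg) / len(rain_below_50_deg)

# average may or may not equal 0.3 exactly, so compare within a tolerance.
print(math.isclose(average, 0.3))  # True

# For display, round to a fixed number of decimal places.
print(round(average, 2))           # 0.3
print(f"{average:.0%}")            # 30%
```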


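```python
def not_mask(mask: list[bool]) -> list[bool]:
    """Return a new mask with every value flipped (True becomes False and vice versa)."""
    result: list[bool] = []
    for item in mask:
        result.append(not item)
    return result

# What is the average low on days when the high is at least 80 degrees?
mask_a: list[bool] = less_than(col_data["high"], 80)  # high < 80
mask_b: list[bool] = not_mask(mask_a)                 # high >= 80

values: list[float] = masked(col_data["low"], mask_b)
print(mean(values))
```

```
51.0
```

Only one day (March 26, with a high of 84) qualifies, so the average is simply that day's low.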
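Masks can also be combined. As a sketch, a hypothetical `and_mask` helper (not defined in this lesson) keeps only the days that satisfy two criteria at once; here it finds the average low on days with under a 30% chance of rain and a high below 70 degrees. Compact versions of the lesson's helpers are repeated so the block runs on its own.

```python
def less_than(col: list[float], threshold: float) -> list[bool]:
    return [item < threshold for item in col]


def masked(col: list[float], mask: list[bool]) -> list[float]:
    return [value for value, keep in zip(col, mask) if keep]


def mean(col: list[float]) -> float:
    return sum(col) / len(col)


def and_mask(a: list[bool], b: list[bool]) -> list[bool]:
    """Hypothetical helper: element-wise AND of two equal-length masks."""
    result: list[bool] = []
    for x, y in zip(a, b):
        result.append(x and y)
    return result


col_data: dict[str, list[float]] = {
    "high": [77, 84, 78, 79, 65, 67, 74, 61, 55, 61],
    "low":  [67, 51, 64, 45, 43, 53, 56, 37, 34, 42],
    "rain": [.3, .2, .4, .8, 0., .2, .4, .5, .1, .1],
}

dry: list[bool] = less_than(col_data["rain"], 0.3)  # chance of rain < 30%
mild: list[bool] = less_than(col_data["high"], 70)  # high < 70 degrees
both: list[bool] = and_mask(dry, mild)              # both criteria hold

print(masked(col_data["low"], both))        # [43, 53, 34, 42]
print(mean(masked(col_data["low"], both)))  # 43.0
```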
