# Filtering Data Based on Criteria

In this lesson, we will use a small dataset of weather projections for Chapel Hill covering the ten days from Thursday, March 25th, through Saturday, April 3rd. Each successive entry is the projection for the next day in that timeframe.

Our analysis goal is to find the average temperatures on days when rain is unlikely (less than a 30% chance).
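Written as a formula, with $r_i$ and $h_i$ standing for the chance of rain and the high on day $i$ (notation introduced here only for illustration), the target for the highs is the mean over the set $S$ of qualifying days:

$$
S = \{\, i : r_i < 0.3 \,\}, \qquad \bar{h}_S = \frac{1}{|S|} \sum_{i \in S} h_i
$$

The lows are averaged the same way over the same set $S$.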

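We will approach this problem from a column-oriented perspective: each measurement (high, low, and chance of rain) is stored as its own list, and the values at the same index across those lists describe the same day.

First, let's look at our dataset.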

```python
# Column-oriented data: each key maps to one column of ten daily values.
# "high" and "low" are temperatures; "rain" is the chance of rain as a
# fraction (0.3 means 30%).
col_data: dict[str, list[float]] = {
    "high": [77, 84, 78, 79, 65, 67, 74, 61, 55, 61],
    "low":  [67, 51, 64, 45, 43, 53, 56, 37, 34, 42],
    "rain": [.3, .2, .4, .8, 0., .2, .4, .5, .1, .1]
}

print(col_data)
```

```
{'high': [77, 84, 78, 79, 65, 67, 74, 61, 55, 61], 'low': [67, 51, 64, 45, 43, 53, 56, 37, 34, 42], 'rain': [0.3, 0.2, 0.4, 0.8, 0.0, 0.2, 0.4, 0.5, 0.1, 0.1]}
```
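Because the data is stored by column, a single day's projection is spread across all three lists at the same index. The sketch below gathers one day's values with a hypothetical `day_values` helper (not part of the lesson) to make that layout concrete.

```python
# Same column-oriented data as above: each key maps to one column.
col_data: dict[str, list[float]] = {
    "high": [77, 84, 78, 79, 65, 67, 74, 61, 55, 61],
    "low":  [67, 51, 64, 45, 43, 53, 56, 37, 34, 42],
    "rain": [.3, .2, .4, .8, 0., .2, .4, .5, .1, .1],
}


def day_values(data: dict[str, list[float]], index: int) -> dict[str, float]:
    """Collect one day's projection by reading the same index from every column.

    Hypothetical helper for illustration only.
    """
    return {name: column[index] for name, column in data.items()}


# Index 0 is the first day in the timeframe.
print(day_values(col_data, 0))  # prints {'high': 77, 'low': 67, 'rain': 0.3}
```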


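## Produce a 'mask' based on criteria

A mask is a list of `bool` values with one entry per day: `True` where that day meets the criterion and `False` otherwise. The `less_than` function builds a mask by comparing every value in a column against a threshold.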

```python
def less_than(col: list[float], threshold: float) -> list[bool]:
    # True for each value strictly below the threshold, False otherwise.
    return [num < threshold for num in col]


# Days with less than a 30% chance of rain.
no_rain_mask: list[bool] = less_than(col_data["rain"], 0.3)
print(no_rain_mask)
print()
# Days with a high below 65 (a high of exactly 65 does not count).
print(less_than(col_data["high"], 65))
```

```
[False, True, False, False, True, True, False, False, True, True]

[False, False, False, False, False, False, False, True, True, True]
```
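Criteria can also be combined. One way, sketched below with a hypothetical `and_masks` helper (not part of the lesson), is an element-wise AND of two masks. It assumes both masks have the same length, since `zip` silently stops at the shorter one.

```python
def less_than(col: list[float], threshold: float) -> list[bool]:
    return [num < threshold for num in col]


def and_masks(mask_a: list[bool], mask_b: list[bool]) -> list[bool]:
    """Element-wise AND: True only where both masks are True.

    Hypothetical helper; assumes both masks have the same length.
    """
    return [a and b for a, b in zip(mask_a, mask_b)]


high: list[float] = [77, 84, 78, 79, 65, 67, 74, 61, 55, 61]
rain: list[float] = [.3, .2, .4, .8, 0., .2, .4, .5, .1, .1]

# Days that are both unlikely to rain and have a high below 70.
cool_and_dry: list[bool] = and_masks(less_than(rain, 0.3), less_than(high, 70))
print(cool_and_dry)
# prints [False, False, False, False, True, True, False, False, True, True]
```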


## The `masked` function

The `masked` function takes a column and a mask (a list of `bool` values of the same length) and returns only the values in the column whose corresponding mask entry is `True`.


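```python
def masked(col: list[float], mask: list[bool]) -> list[float]:
    # Keep col[i] only where mask[i] is True.
    return [col[i] for i in range(len(col)) if mask[i]]


# Highs on the days that are unlikely to rain.
print(masked(col_data["high"], no_rain_mask))
```

```
[84, 65, 67, 55, 61]
```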

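Next, a `mean` helper averages the values in a column.

```python
def mean(col: list[float]) -> float:
    # Raises ZeroDivisionError if col is empty (e.g. a mask that selects no days).
    return sum(col) / len(col)
```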
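With `mean` defined, the helpers can be chained to answer the analysis goal stated at the start. The sketch below repeats the earlier definitions so it runs on its own, and it reads "average temperatures" as the mean high and the mean low computed separately, which is our interpretation of the goal.

```python
def less_than(col: list[float], threshold: float) -> list[bool]:
    return [num < threshold for num in col]


def masked(col: list[float], mask: list[bool]) -> list[float]:
    return [col[i] for i in range(len(col)) if mask[i]]


def mean(col: list[float]) -> float:
    return sum(col) / len(col)


col_data: dict[str, list[float]] = {
    "high": [77, 84, 78, 79, 65, 67, 74, 61, 55, 61],
    "low":  [67, 51, 64, 45, 43, 53, 56, 37, 34, 42],
    "rain": [.3, .2, .4, .8, 0., .2, .4, .5, .1, .1],
}

# Days where rain is unlikely (less than a 30% chance).
no_rain_mask: list[bool] = less_than(col_data["rain"], 0.3)

# Apply the same mask to both temperature columns, then average each.
avg_high: float = mean(masked(col_data["high"], no_rain_mask))
avg_low: float = mean(masked(col_data["low"], no_rain_mask))
print(avg_high, avg_low)  # prints 66.4 44.6
```

Five days qualify, so the highs average to 332 / 5 = 66.4 and the lows to 223 / 5 = 44.6.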

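The `not_mask` function inverts a mask, selecting the days that do *not* meet a criterion. Below, inverting the mask for highs below 80 selects the days with a high of 80 or more, and we average the lows on those days.

```python
def not_mask(mask: list[bool]) -> list[bool]:
    # Flip every entry: True becomes False and vice versa.
    return [not item for item in mask]


mask_a: list[bool] = less_than(col_data["high"], 80)  # high below 80
mask_b: list[bool] = not_mask(mask_a)                 # high of 80 or more

values: list[float] = masked(col_data["low"], mask_b)
print(mean(values))
```

```
51.0
```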
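For ordinary numbers, inverting a `less_than` mask gives the same result as testing `>=` directly. The sketch below checks this with a hypothetical `at_least` helper (not part of the lesson) and lists which days pass: only one (the high of 84), which is why the mean of the lows is exactly that day's low of 51.

```python
def less_than(col: list[float], threshold: float) -> list[bool]:
    return [num < threshold for num in col]


def not_mask(mask: list[bool]) -> list[bool]:
    return [not item for item in mask]


def at_least(col: list[float], threshold: float) -> list[bool]:
    # Hypothetical helper: a direct ">=" test, for comparison with not_mask.
    return [num >= threshold for num in col]


high: list[float] = [77, 84, 78, 79, 65, 67, 74, 61, 55, 61]

inverted: list[bool] = not_mask(less_than(high, 80))
print(inverted == at_least(high, 80))  # prints True
# Indices of the days that pass the inverted mask.
print([i for i, keep in enumerate(inverted) if keep])  # prints [1]
```

The two approaches differ only for `float("nan")`: `not (nan < 80)` is `True`, while `nan >= 80` is `False`.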
