# Filtering Data Based on Criteria

In this lesson, we will use a small, older dataset of weather projections for Chapel Hill covering Thursday, March 25th, through Saturday, April 3rd, of 2022. Each row (the values at the same position in every column) is the projection for the next day in that timeframe.

Our analysis goal is to find the average temperatures on days when rain is unlikely (a chance of rain below 30%).

We will consider approaching this problem from a column-oriented perspective.

First, let's look at our dataset.

In [9]:
col_data: dict[str, list[float]] = {
    "high": [77, 84, 78, 79, 65, 67, 74, 61, 55, 61],
    "low":  [67, 51, 64, 45, 43, 53, 56, 37, 34, 42],
    "rain": [.3, .2, .4, .8, 0., .2, .4, .5, .1, .1]
}

col_data

{'high': [77, 84, 78, 79, 65, 67, 74, 61, 55, 61],
 'low': [67, 51, 64, 45, 43, 53, 56, 37, 34, 42],
 'rain': [0.3, 0.2, 0.4, 0.8, 0.0, 0.2, 0.4, 0.5, 0.1, 0.1]}

### Produce a "Mask" Based on Criteria

In [16]:
def less_than(column: list[float], threshold: float) -> list[bool]:
    """Return a mask that is True wherever a column value is below threshold."""
    result: list[bool] = []
    for item in column:
        # The comparison already evaluates to a bool, so append it directly
        # instead of branching with if/else to append True or False.
        result.append(item < threshold)
    return result


no_rain: list[bool] = less_than(col_data["rain"], 0.3)
print(no_rain)

[False, True, False, False, True, True, False, False, True, True]
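The loop in `less_than` can also be written as a list comprehension. The following is a minimal sketch, not a replacement for the function above; `less_than_comp` is a hypothetical name introduced here only for comparison.

```python
def less_than_comp(column: list[float], threshold: float) -> list[bool]:
    """Return a mask: True where the value is strictly below threshold."""
    return [item < threshold for item in column]


# Same rain column as in col_data, repeated so this cell runs on its own.
rain: list[float] = [.3, .2, .4, .8, 0., .2, .4, .5, .1, .1]
mask: list[bool] = less_than_comp(rain, 0.3)
print(mask)
```

Because the comparison is strict, the first day (a 30% chance of rain) is not counted as a low-rain day.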


#### Masking the Values
* The `True` and `False` values in the mask mark which rain percentages met the criteria.
* Each mask entry lines up, by position, with the high and low temperatures for the same day, so the mask can be used to select the matching temperatures.

In [19]:
# masked:
# - takes a column and a list of masks (bool values) of the same length
# - returns only the column values whose mask entry is True
def masked(column: list[float], masks: list[bool]) -> list[float]:
    result: list[float] = []
    for i in range(len(masks)):
        if masks[i]:
            result.append(column[i])
    return result


print(col_data["rain"])
print(no_rain)
print(col_data["high"])
hightemp_norain: list[float] = masked(col_data["high"], no_rain)
print(hightemp_norain)

[0.3, 0.2, 0.4, 0.8, 0.0, 0.2, 0.4, 0.5, 0.1, 0.1]
[False, True, False, False, True, True, False, False, True, True]
[77, 84, 78, 79, 65, 67, 74, 61, 55, 61]
[84, 65, 67, 55, 61]
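Indexing by position works, but pairing each value with its mask entry using `zip` avoids the index bookkeeping. This is a sketch under the assumption that the column and the mask should always have the same length; `masked_zip` is a hypothetical name, and the length check is an addition that the original `masked` does not perform.

```python
def masked_zip(column: list[float], mask: list[bool]) -> list[float]:
    """Keep only the column values whose mask entry is True."""
    # Assumption: a length mismatch is a mistake, so fail loudly.
    if len(column) != len(mask):
        raise ValueError("column and mask must have the same length")
    return [value for value, keep in zip(column, mask) if keep]


# Data repeated from the cells above so this cell runs on its own.
high: list[float] = [77, 84, 78, 79, 65, 67, 74, 61, 55, 61]
no_rain_mask: list[bool] = [False, True, False, False, True,
                            True, False, False, True, True]
print(masked_zip(high, no_rain_mask))
```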


#### Compute the Average High on Days with a Low Chance of Rain

In [20]:
def mean(column: list[float]) -> float:
    return sum(column) / len(column)

mean(hightemp_norain)

66.4
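The stated goal covers average temperatures in general, so the same mask can be applied to the `low` column as well. The sketch below repeats compact versions of the helpers so it runs on its own; the name `lowtemp_norain` is introduced here for illustration. Note that `mean` divides by `len(column)`, so it would raise `ZeroDivisionError` if a mask selected no days at all.

```python
def less_than(column: list[float], threshold: float) -> list[bool]:
    return [item < threshold for item in column]


def masked(column: list[float], masks: list[bool]) -> list[float]:
    return [column[i] for i in range(len(masks)) if masks[i]]


def mean(column: list[float]) -> float:
    return sum(column) / len(column)


low: list[float] = [67, 51, 64, 45, 43, 53, 56, 37, 34, 42]
rain: list[float] = [.3, .2, .4, .8, 0., .2, .4, .5, .1, .1]

# Reuse one mask for a different column: lows on low-rain days.
no_rain: list[bool] = less_than(rain, 0.3)
lowtemp_norain: list[float] = masked(low, no_rain)
print(lowtemp_norain)
print(mean(lowtemp_norain))
```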

#### Other Analyses with the Same Functions

For example, the average chance of rain on days with a low below 50:

In [22]:
# Mask of days whose low is below 50, then the rain chances on those days.
cold_days: list[bool] = less_than(col_data["low"], 50)
print(cold_days)
colddays_rain: list[float] = masked(col_data["rain"], cold_days)
print(mean(colddays_rain))

[False, False, False, True, True, False, False, True, True, True]
0.30000000000000004
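The trailing digits in `0.30000000000000004` come from binary floating-point arithmetic, not from the data: values such as 0.1 cannot be stored exactly, and the small errors can show up when they are summed. The exact digits printed may differ between Python versions. A minimal sketch of two common ways to handle this, rounding for display and comparing with a tolerance:

```python
import math

# Rain chances on the five cold days selected above.
rain_on_cold_days: list[float] = [0.8, 0.0, 0.5, 0.1, 0.1]
average: float = sum(rain_on_cold_days) / len(rain_on_cold_days)

# Round only when displaying the result.
print(round(average, 2))

# Compare with a tolerance rather than with ==.
print(math.isclose(average, 0.3))
```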
