In [None]:
import pandas as pd

open_seven_days_df = pd.read_parquet("../../data/pandas/open_seven_days_df.parquet")

A recurring theme in this course is learning how the same transformations are expressed across languages: the ability to select the proper tool for the job depends on knowing _what tools exist_.

In this lesson, we'll cover creating columns, then selecting and filtering data in Pandas.

Creating a column in Pandas is as simple as assigning to it with the syntax:

```python
dataframe['column_name'] = column_value
```
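A minimal sketch on a made-up toy dataframe (the `parks` name, its columns, and its values are hypothetical, not the course data). Assigning a scalar fills every row with that value, while assigning a Series computed from an existing column produces one value per row:

```python
import pandas as pd

# Hypothetical toy dataframe (made-up names and values, not the course data)
parks = pd.DataFrame(
    {"name": ["Park A", "Park B", "Park C"], "acres": [640, 1280, 3200]}
)

# Assigning a scalar broadcasts that single value to every row
parks["country"] = "USA"

# Assigning a Series computes one value per row (there are 640 acres in a square mile)
parks["sq_miles"] = parks["acres"] / 640

print(parks)
```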

In [None]:
import numpy as np

# label each park "Closed" or "Open" based on its Thursday hours
open_seven_days_df["closed_open"] = np.where(
    open_seven_days_df["standardHours.thursday"] == "Closed", "Closed", "Open"
)
# the comparison already yields True/False, so it can be assigned directly
open_seven_days_df["is_closed"] = (
    open_seven_days_df["standardHours.thursday"] == "Closed"
)

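Now you might be asking: when are we assigning a single value to a column, and when are we performing a calculation on a column? Great question! The answer lies in _vectorization_: performing a calculation on an entire column at once rather than looping over it value by value. Assigning a single value repeats it down every row, while assigning the result of an operation on a column computes a new value for every row in one step.

Many operations (arithmetic, comparisons, string concatenation, `np.where`) are vectorized and can act directly on other columns, while others need to be _applied_ row by row. We'll talk about applying row-wise functions later in the course; for now we'll focus on vectorized operations.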
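To make the distinction concrete, here is a minimal sketch on a made-up dataframe (the `df` name and its `thursday` column are hypothetical). The same boolean column is computed once with a vectorized comparison and once with a row-wise `DataFrame.apply(..., axis=1)`:

```python
import pandas as pd

# Hypothetical toy data: Thursday hours for three made-up parks
df = pd.DataFrame({"thursday": ["Closed", "9:00AM - 5:00PM", "All Day"]})

# Vectorized: one comparison over the whole column
df["is_closed_vectorized"] = df["thursday"] == "Closed"

# Row-wise: a Python function is called once per row
df["is_closed_applied"] = df.apply(lambda row: row["thursday"] == "Closed", axis=1)

print(df)
```

Both columns hold the same values; the vectorized version avoids calling a Python function for every row, which is why it is usually much faster on large dataframes.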

In [None]:
# list the column names to confirm the new columns were added
open_seven_days_df.columns

In [None]:
# vectorized string concatenation: the prefix is prepended to every value in the column
open_seven_days_df["open_closed"] = (
    "Today, the park is: " + open_seven_days_df["closed_open"]
)

open_seven_days_df["open_closed"]

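It's also possible to select in Pandas using `iloc` and `loc`. As the names suggest, `iloc` selects by integer _location_ (position, counting from 0), while `loc` selects by _label_: index values for rows and names for columns. One gotcha: an `iloc` slice excludes its end position, while a `loc` slice includes its end label.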

In [None]:
# iloc selects by position: position 0 up to, but not including, position 1, i.e. the first row
open_seven_days_df.iloc[0:1]

In [None]:
# loc slices by index label and includes the end label, so this gets the rows labelled 6 through 7; label 6 happens to be the first row :)
open_seven_days_df.loc[6:7]
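A minimal sketch of the difference, using a made-up dataframe (hypothetical `df`) whose index labels don't start at 0, as often happens after filtering. Note the different slice endpoints:

```python
import pandas as pd

# Hypothetical dataframe with non-default index labels
df = pd.DataFrame({"park": ["A", "B", "C", "D"]}, index=[6, 7, 9, 12])

# iloc: integer positions, end position excluded -> positions 0 and 1
print(df.iloc[0:2])

# loc: index labels, end label included -> labels 6, 7, and 9
print(df.loc[6:9])

# loc also takes a column name: row label 7, column "park"
print(df.loc[7, "park"])
```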

Filtering in Pandas is most easily accomplished by passing a boolean condition inside the selection brackets; only the rows where the condition is `True` are kept. For example:

In [None]:
parks_df = pd.read_parquet("../../data/nps/nps_public_data_parks.parquet")

# keep only the row(s) whose fullName matches exactly
parks_df[parks_df["fullName"] == "Zion National Park"]
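Under the hood, the condition is itself a boolean Series with one `True`/`False` per row, and indexing with it keeps the `True` rows. A small sketch with made-up stand-in rows (this `parks` dataframe is hypothetical, not the real `parks_df`):

```python
import pandas as pd

# Hypothetical stand-in for parks_df with just two columns
parks = pd.DataFrame(
    {
        "fullName": ["Zion National Park", "Arches National Park", "Yellowstone National Park"],
        "states": ["UT", "UT", "ID,MT,WY"],
    }
)

# The condition on its own is a boolean Series (a "mask")
mask = parks["fullName"] == "Zion National Park"
print(mask.tolist())  # [True, False, False]

# Indexing with the mask keeps only the rows where it is True
print(parks[mask])
```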

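We can combine any number of boolean conditions with `&` (and), `|` (or), and `~` (not) to successively filter a dataframe this way. Python's `and`/`or` keywords don't work element-wise on columns, and because `&` and `|` bind more tightly than comparisons like `<` and `==`, each comparison needs its own parentheses: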

In [None]:
# parks whose states include both Utah and Arizona
parks_df[parks_df["states"].str.contains("UT") & parks_df["states"].str.contains("AZ")]

In [None]:
# parks in both Utah and Arizona, or in Wyoming
parks_df[
    (parks_df["states"].str.contains("UT") & parks_df["states"].str.contains("AZ"))
    | parks_df["states"].str.contains("WY")
]

In [None]:
# National Parks west of 140°W and north of 60°N (i.e., in Alaska)
parks_df[
    (parks_df["longitude"] < -140)
    & (parks_df["latitude"] > 60)
    & (parks_df["designation"] == "National Park")
]
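A minimal sketch of the three operators on made-up rows (the `parks` dataframe and its values are hypothetical, and the comma-separated `states` format is an assumption about the real data):

```python
import pandas as pd

# Hypothetical stand-in for parks_df (made-up rows)
parks = pd.DataFrame(
    {
        "fullName": ["Park A", "Park B", "Park C", "Park D"],
        "states": ["AZ,UT", "UT", "WY", "CA"],
        "latitude": [37.0, 38.5, 44.6, 36.5],
    }
)

# Each condition is a boolean Series aligned on the same index
in_ut = parks["states"].str.contains("UT")
in_az = parks["states"].str.contains("AZ")

# & (and), | (or) and ~ (not) combine the Series element-wise
both = parks[in_ut & in_az]
either = parks[in_ut | in_az]
not_ut = parks[~in_ut]

# Comparisons get their own parentheses because & binds more tightly than >
north_not_ut = parks[(parks["latitude"] > 38) & ~in_ut]

# Python's `and` needs a single True/False, so it raises on a Series
try:
    parks[in_ut and in_az]
except ValueError as err:
    print("ValueError:", err)
```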

We can select entire columns by passing a list of column names (note the double brackets), and combine this with our filtering, too:

In [None]:
# a list of column names selects those columns; the result is still a DataFrame
parks_df[["fullName", "states"]]

In [None]:
# filter the rows first, then keep only the two columns we care about
parks_df[
    (parks_df["longitude"] < -140)
    & (parks_df["latitude"] > 60)
    & (parks_df["designation"] == "National Park")
][["fullName", "states"]]
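The same result can be written as a single `.loc[rows, columns]` selection. A sketch on made-up rows (the `parks` dataframe and its values are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for parks_df (made-up rows)
parks = pd.DataFrame(
    {
        "fullName": ["Park A", "Park B", "Park C"],
        "states": ["AK", "AK", "UT"],
        "designation": ["National Park", "National Preserve", "National Park"],
        "latitude": [61.0, 67.0, 37.3],
        "longitude": [-150.0, -153.0, -113.0],
    }
)

mask = (
    (parks["longitude"] < -140)
    & (parks["latitude"] > 60)
    & (parks["designation"] == "National Park")
)

# Chained form: filter rows, then pick columns
chained = parks[mask][["fullName", "states"]]

# Single-step form: .loc[row_selector, column_selector]
single_step = parks.loc[mask, ["fullName", "states"]]

print(single_step)
```

Both forms return the same data when reading. The single-step `.loc` form is generally preferred when you go on to assign into the selection, since chained indexing like `parks[mask][cols] = ...` may write to a temporary copy instead of the original dataframe (pandas warns about this with `SettingWithCopyWarning` in many versions).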