## Create a subset of the global-streetscapes dataset

In [1]:
# --------------------------------------
import warnings

warnings.filterwarnings("ignore")

# --------------------------------------
import ibis
ibis.options.interactive = True

# --------------------------------------
import streetscapes as scs

### Create or load the subset

In [None]:
# Directory containing CSV files
data_dir = scs.conf.CSV_DIR

# Directory containing Parquet files
parquet_dir = scs.conf.PARQUET_DIR

# Name of the subset to create
subset = "amsterdam_side"

Load the entire dataset. We are going to progressively extract subsets from it below.

In [None]:
df_all = scs.load_subset()

### Subset dataset

In this example we select images of Amsterdam that were taken during the day with a sideways viewing direction. First, we filter by city.

In [None]:
df_ams = df_all[df_all["city"] == "Amsterdam"]

Show a data excerpt.

In [None]:
df_ams.head()

Next, narrow the Amsterdam images down by lighting condition. First, we check which values occur in the data.

In [None]:
df_ams[["lighting_condition"]].distinct()

Filter by lighting condition (here, we use `day`).

In [None]:
df_day = df_ams[df_ams["lighting_condition"] == "day"]
df_day.columns

Finally, filter by view direction (we use `side` here).

In [None]:
df_side = df_day[df_day["view_direction"] == "side"]
df_side.columns

Check how many rows are left after filtering.

In [None]:
df_side.count()

### Create dataframe to download images

Keep only the information needed to download the images and save it to a Parquet file.

In [None]:
df_to_download = df_side[["uuid", "source", "orig_id"]]
df_to_download.head()

In [None]:
df_to_download.to_parquet(parquet_dir / f"{subset}.parquet")

In [None]:
# Read the saved subset back in; a separate name avoids overwriting the earlier df_ams
df_saved = ibis.read_parquet(parquet_dir / f"{subset}.parquet")

In [None]:
df_saved.head()

We can achieve the same outcome with a single Streetscapes function, `load_subset`. For now, basic conditions can be specified with functions from Python's `operator` module, such as equal to (`operator.eq`), greater than (`operator.gt`) or less than (`operator.lt`). A condition given without an operator is implicitly interpreted as `operator.eq`. We are working on more sophisticated filtering options.

In [None]:
# Define the criteria for creating the subset
criteria = {
    "city": "Amsterdam",  # Equivalent to "city": (operator.eq, "Amsterdam")
    "view_direction": "side",
    "lighting_condition": "day",
}

# Define the columns to keep in the subset
columns = ["uuid", "source", "orig_id", "lat", "lon"]

# Create or load the subset
df_city = scs.load_subset(
    subset,
    criteria=criteria,
    columns=columns,
    recreate=True,
    save=False,
)
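
To illustrate how criteria of this shape can be interpreted, here is a minimal sketch in plain Python. It is not the Streetscapes implementation: the helper names `normalise` and `row_matches` are hypothetical, the records are made-up dictionaries, and the only thing taken from the text above is the convention that a bare value means `operator.eq` while a tuple carries an explicit `(operator, value)` pair.

```python
import operator


def normalise(condition):
    """Return an (operator, value) pair; a bare value means equality."""
    if isinstance(condition, tuple):
        return condition
    return operator.eq, condition


def row_matches(row, criteria):
    """Check a single record (a plain dict) against every criterion."""
    for column, condition in criteria.items():
        op, value = normalise(condition)
        if not op(row[column], value):
            return False
    return True


# Made-up records standing in for rows of the dataset
rows = [
    {"uuid": "a", "city": "Amsterdam", "lat": 52.37},
    {"uuid": "b", "city": "Amsterdam", "lat": 52.30},
    {"uuid": "c", "city": "Paris", "lat": 48.85},
]

# One implicit equality test and one explicit comparison
example_criteria = {"city": "Amsterdam", "lat": (operator.gt, 52.35)}

selected = [row["uuid"] for row in rows if row_matches(row, example_criteria)]
```

Here `selected` contains only the record that is both in Amsterdam and north of latitude 52.35. The sketch ignores the case where the value to compare against is itself a tuple.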

Make sure that the number of rows matches what we obtained above.

In [None]:
df_city.count()

Show a data excerpt.

In [None]:
df_city.head()