# Pipeline - School Roads - Street Images

This notebook uses pre-gathered latitude/longitude and bearing points from roads around schools to gather street-level images. These images are then used to determine whether school warning signage and lighting are present around those schools, which in turn helps automatically determine the iRAP 5-star attribute code for school signage.

This notebook is one of a series of notebooks orchestrated by the "pipeline_schoolroads.ipynb" notebook, found under the <this repo>/task7-feature-extraction-using-aerial-level-data/code/pipeline directory.

## Imported Data

Incoming data is assumed to be in parquet format and to include columns for latitude, longitude, and bearing, which are used to gather street-level images.
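
The column assumption above can be checked before any requests are made. The following is a minimal sketch, not part of the original pipeline: the helper name `missing_columns` and the sample values are hypothetical, and the column names `lat`, `lon`, and `bearing` are taken from how later cells read each row.

```python
import pandas as pd

# Columns the later cells read from each row (taken from this notebook's usage).
REQUIRED_COLUMNS = ("lat", "lon", "bearing")


def missing_columns(frame: pd.DataFrame, required=REQUIRED_COLUMNS) -> list:
    """Return the required column names that are absent from the frame."""
    return [col for col in required if col not in frame.columns]


# Hypothetical two-row sample standing in for the real parquet input.
sample = pd.DataFrame(
    {"lat": [38.9, 40.7], "lon": [-77.0, -74.0], "bearing": [90.0, 270.0]}
)
print(missing_columns(sample))  # prints []
print(missing_columns(sample.drop(columns="bearing")))  # prints ['bearing']
```

Running a check like this right after the parquet file is read fails fast on a malformed input instead of partway through the API lookups.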

## Exported Data

This notebook exports .jpg images named <lat>_<lon>_<bearing>.jpg. It also writes a parquet file that adds each point's image file path to the input data.
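
As an illustration of the naming format, the sketch below builds a file name from a point and recovers the values from it. The function names and coordinates are hypothetical; the format itself mirrors what the download cell produces, assuming the values are rendered with Python's default float formatting.

```python
def streetview_filename(lat: float, lon: float, bearing: float) -> str:
    """Build '<lat>_<lon>_<bearing>.jpg' from a point and its bearing."""
    return "{}_{}_{}.jpg".format(lat, lon, bearing)


def parse_streetview_filename(filename: str) -> tuple:
    """Recover (lat, lon, bearing) as floats from a name built above."""
    stem = filename[: -len(".jpg")]
    lat, lon, bearing = stem.split("_")
    return float(lat), float(lon), float(bearing)


# Hypothetical coordinates, for illustration only.
name_example = streetview_filename(38.8977, -77.0365, 90.0)
print(name_example)  # prints 38.8977_-77.0365_90.0.jpg
print(parse_streetview_filename(name_example))  # prints (38.8977, -77.0365, 90.0)
```

Because the underscore is the only separator and negative signs use a hyphen, downstream steps can split the name back into its three parts without ambiguity.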

In [9]:
# Parameters cell used to indicate parameters which will be used at runtime.
# Note: the below is a default parameter value which is overridden when the
# notebook is executed as part of a pipeline via Prefect + Papermill

name = "usa"
link = "https://nces.ed.gov/programs/edge/data/EDGE_GEOCODE_PUBLICSCH_1819.zip"
unzip = True
target = "EDGE_GEOCODE_PUBLICSCH_1819.xlsx"
lat_colname = "LAT"
lon_colname = "LON"
# expected at runtime in the form {"google": {"key": "<API key>"}}
credentials = {}

In [10]:
import os

import google_streetview.api
import google_streetview.helpers
import pandas as pd
import requests
from box import Box
from IPython.display import Image

In [11]:
# read in our source school data
df = pd.read_parquet("{}/data/{}_school_road_points.parquet".format(os.getcwd(), name))
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 215 entries, 0 to 214
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   road_name    215 non-null    object 
 1   lon          215 non-null    float64
 2   lat          215 non-null    float64
 3   road_length  215 non-null    float64
 4   bearing      215 non-null    float64
dtypes: float64(4), object(1)
memory usage: 8.5+ KB


In [12]:
# attempt to gather gstreetview results
def gather_google_streetview_meta(lat, lon, bearing, creds):
    """Look up Street View metadata for a point; return None if unavailable."""
    params = [
        {
            "size": "640x640",  # max 640x640 pixels
            "location": "{lat},{lon}".format(lat=lat, lon=lon),
            "heading": str(bearing),
            "key": creds,
        }
    ]
    try:
        # Create a results object
        results = google_streetview.api.results(params)
        # only return the result if its status was ok
        if results.metadata[0]["status"] == "OK":
            return results
    except Exception:
        # treat any lookup failure as a missing result
        pass
    # use None consistently so downstream null checks work
    return None

In [13]:
# work with limited subset because of personal google credentials limitation
# note: remove this block in the future
df = df[:100].copy()

In [14]:
df["google_streetview_meta"] = df.apply(
    lambda row: gather_google_streetview_meta(
        row["lat"], row["lon"], row["bearing"], credentials["google"]["key"]
    ),
    axis=1,
)
df["google_streetview_meta"].head()

0    <google_streetview.api.results object at 0x000...
1    <google_streetview.api.results object at 0x000...
2    <google_streetview.api.results object at 0x000...
3    <google_streetview.api.results object at 0x000...
4    <google_streetview.api.results object at 0x000...
Name: google_streetview_meta, dtype: object
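
For reference, the lookup parameters above correspond to query parameters of Google's Street View Static API. The sketch below shows how such a request URL could be assembled by hand. It is an assumption about the underlying endpoints, not a description of how the `google_streetview` package works internally, and `build_streetview_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed endpoints of Google's Street View Static API.
IMAGE_ENDPOINT = "https://maps.googleapis.com/maps/api/streetview"
METADATA_ENDPOINT = IMAGE_ENDPOINT + "/metadata"


def build_streetview_url(lat, lon, bearing, key, endpoint=IMAGE_ENDPOINT):
    """Assemble a request URL from the same parameters used in this notebook."""
    query = urlencode(
        {
            "size": "640x640",
            "location": "{},{}".format(lat, lon),
            "heading": str(bearing),
            "key": key,
        }
    )
    return "{}?{}".format(endpoint, query)


# "YOUR_API_KEY" is a placeholder, not a working credential.
print(build_streetview_url(38.8977, -77.0365, 90.0, "YOUR_API_KEY"))
```

Passing `endpoint=METADATA_ENDPOINT` yields the matching metadata request, which is the kind of lookup whose `status` field the cell above checks.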

In [15]:
# make a data dir for dumping the image downloads to per data source
data_dir = "{}/data/{}_schoolroads_street_images".format(os.getcwd(), name)
os.makedirs(data_dir, exist_ok=True)

In [16]:
# download images using results from meta lookups
def download_google_streetview_image(google_streetview_api_results, data_dir):
    """Download the image for an OK metadata result and return its file path."""
    if google_streetview_api_results.metadata[0]["status"] != "OK":
        return None

    url = google_streetview_api_results.links[0]

    # use filename format lat_lon_bearing.jpg
    filename = "{}_{}.jpg".format(
        google_streetview_api_results.params[0]["location"].replace(",", "_"),
        google_streetview_api_results.params[0]["heading"],
    )
    filepath = "{}/{}".format(data_dir, filename)

    # only download files if we don't already have them to conserve requests with Google acct
    if not os.path.isfile(filepath):
        # get image from url; skip the file if the request did not succeed
        response = requests.get(url, timeout=30)
        if not response.ok:
            return None

        # write image to filepath
        with open(filepath, "wb") as handler:
            handler.write(response.content)

    return filepath

In [None]:
# skip rows whose metadata lookup failed (pd.notna is False for None and pd.NA)
df["google_streetview_image"] = df.apply(
    lambda row: download_google_streetview_image(
        row["google_streetview_meta"], data_dir
    )
    if pd.notna(row["google_streetview_meta"])
    else None,
    axis=1,
)
df["google_streetview_image"].head()

In [18]:
# show a count for the number of files in the data_dir
len(
    [
        fname
        for fname in os.listdir(data_dir)
        if os.path.isfile("{}/{}".format(data_dir, fname))
    ]
)

197

In [19]:
df.drop(columns="google_streetview_meta").to_parquet(
    "{}/data/{}_school_road_points_streetview.parquet".format(os.getcwd(), name)
)