# Retrieval of Google Street View (GSV) Images using the Street View Static API
In this notebook, we download GSV images at the coordinates provided by datasets from the [INHABIT](https://gis.usgs.gov/inhabit/) database and the [EDDMaps](https://www.eddmaps.org/distribution/viewmap.cfm?sub=2425) database.  
Both datasets provide information about Kudzu locations.  

Before running the code, make sure that `inhabit.xlsx` and `eddmaps.csv` are inside the `data` folder of your base directory. If they are not, you can download them from
[here](https://drive.google.com/drive/folders/1CM5DLkSH5283FnFY7Vh0yJ4qKHmk1sgd?usp=drive_link) and copy them into that folder.

Make sure to set your API key (`api_key`) and your base folder (`your_directory`).

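Hard-coding the key in the notebook makes it easy to leak when the notebook is shared. One alternative is sketched below, assuming the key and base folder are stored in environment variables. The variable names `GSV_API_KEY` and `GSV_BASE_DIR` and the helper `load_setting` are hypothetical and not part of this notebook.

```python
import os

def load_setting(name, default=None):
    """Return the environment variable `name`, or `default` if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    return value if value else default

# Hypothetical variable names; the fallbacks are the placeholders used later in the notebook.
api_key = load_setting("GSV_API_KEY", "YOUR_API_KEY_GOES_HERE")
your_directory = load_setting("GSV_BASE_DIR", "YOUR_BASE_DIRECTORY_GOES_HERE")

if api_key == "YOUR_API_KEY_GOES_HERE":
    print("No API key found in the environment; set it manually before downloading.")
```

If you use this approach, skip the later assignments that overwrite `api_key` and `your_directory` with placeholders.
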
## 1. Data Preprocessing

### 1.1 Install Libraries

In [None]:
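# If using Windows, run this cell
# (requests is used for the API calls, openpyxl is needed by pandas to read .xlsx files)
!pip install utm pandas numpy requests openpyxl

# If using macOS with conda, run this in a terminal instead:
# conda install -c conda-forge utm pandas numpy requests openpyxl

### 1.2 Import Libraries

In [None]:
# Import the libraries used in this notebook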
import requests
import utm
import csv
import pandas as pd
import numpy as np
import os

### 1.3 Create Dataset

In [None]:
your_directory = "YOUR_BASE_DIRECTORY_GOES_HERE"
output_dir = 'output'

In [None]:
# Join paths
output_folder = os.path.join(your_directory, output_dir)
print("Joined path:", output_folder)

# Create the output folder if it doesn't exist
os.makedirs(output_folder, exist_ok=True)

#### INHABIT Dataset

In [None]:
file_name = 'data/inhabit.xlsx'
file_path = os.path.join(your_directory, file_name)
print("Joined path:", file_path)

In [None]:
# Load the full INHABIT dataset from the 'all_data' sheet
df = pd.read_excel(file_path, sheet_name='all_data')
print(df)

#### EDDMaps Dataset

In [None]:
new_data = pd.read_csv(os.path.join(your_directory, 'data/eddmaps.csv'))
new_data = new_data.rename(columns={'Latitude': 'Y', 'Longitude': 'X'})
print(new_data)

#### Join Datasets

In [None]:
df = pd.merge(df, new_data, on=['X', 'Y'], how='outer')
print(df)

In [None]:
# Drop duplicates based on all columns
df = df.drop_duplicates()

# Reset index if needed
df.reset_index(drop=True, inplace=True)

print(df)

In [None]:
# Check for missing or NaN values in your DataFrame for location coordinates
missing_rows = df[df[['Y', 'X']].isnull().any(axis=1)]
print(f"Missing or invalid locations: {len(missing_rows)}")

# Drop rows with missing coordinates so they are not sent to the API
df = df.dropna(subset=['Y', 'X']).reset_index(drop=True)

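To see how the outer join above treats locations that appear in only one source, here is a small sketch on made-up coordinates. The values and the `source_a`/`source_b` columns are illustrative and not taken from either dataset. It uses the `indicator` option of `pd.merge`, which adds a `_merge` column recording where each row came from.

```python
import pandas as pd

# Two toy tables that share exactly one location (-83.5, 34.2)
inhabit = pd.DataFrame({"X": [-84.1, -83.5], "Y": [33.9, 34.2], "source_a": ["inhabit", "inhabit"]})
eddmaps = pd.DataFrame({"X": [-83.5, -82.0], "Y": [34.2, 35.0], "source_b": ["eddmaps", "eddmaps"]})

# The outer join keeps every location; indicator=True labels each row's origin
merged = pd.merge(inhabit, eddmaps, on=["X", "Y"], how="outer", indicator=True)
print(merged)

# Number of rows found in the left table only, the right table only, or both
print(merged["_merge"].value_counts())
```

Because the join matches on exact coordinate equality, two records of the same site with slightly different coordinates remain separate rows.
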
## 2. Download GSV Images

We access coordinates from the DataFrame `df` using the `.loc` indexer, e.g., `df.loc[0, "X"]`.

##### *Longitude*

In [None]:
df.loc[0, "X"]

##### *Latitude*

In [None]:
df.loc[0, "Y"]

### 2.1 Define API Metadata

Make sure to set your API key.


In [None]:
# Endpoints of the Street View Static API (metadata and image requests)
meta_base = 'https://maps.googleapis.com/maps/api/streetview/metadata?'
pic_base = 'https://maps.googleapis.com/maps/api/streetview?'
api_key = 'YOUR_API_KEY_GOES_HERE'

The Street View Static API provides a metadata endpoint that reports whether imagery is available for a location. In this code we only download images whose metadata has `status = 'OK'`, since requests with any other status do not return usable imagery.  
[Street View Static API Documentation](https://developers.google.com/maps/documentation/streetview/metadata#status-codes)

In [None]:
# Define your views
views = [0, 90, 180, 270]

# Define image counter
img_count = 0
valid_img_count = 0

# Set the path for checkpoint file
checkpoint_file = 'image_status_checkpoint.csv'

# Load the already processed images (if the checkpoint file exists)
processed_images = set()

if os.path.exists(checkpoint_file):
    with open(checkpoint_file, 'r') as file:
        reader = csv.reader(file)
        for row in reader:
            latitude, longitude, view, status = row[:4]
            processed_images.add((latitude, longitude, view))
            img_count += 1  # Increment total image count
            if status == "OK":
                valid_img_count += 1  # Increment valid image count for "OK" status images

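The resume logic depends on comparing string keys: values read back from the CSV are strings, so the download loop has to convert coordinates and headings with `str()` before the lookup. Here is a minimal sketch of that round trip, using an in-memory buffer instead of the real checkpoint file (the coordinates are made up).

```python
import csv
import io

# Write two checkpoint rows the same way the download loop does
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow([33.9, -84.1, 0, "OK"])
writer.writerow([33.9, -84.1, 90, "ZERO_RESULTS"])

# Read them back: every field comes back as a string
buffer.seek(0)
processed = set()
for row in csv.reader(buffer):
    latitude, longitude, view, status = row[:4]
    processed.add((latitude, longitude, view))

# Lookup with string keys versus lookup with the raw numeric values
print((str(33.9), str(-84.1), str(0)) in processed)
print((33.9, -84.1, 0) in processed)
```

Only the string-keyed lookup matches, which is why the loop builds its key from `str(latitude)`, `str(longitude)` and `str(view)`.
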
#### Check the number of images processed

In [None]:
print(len(df)*4)

In [None]:
print(f'img_count = {img_count}')
print(f'valid_img_count = {valid_img_count}')

### 2.2 Download Images 

In [None]:
# Open the checkpoint file once, and use append mode to write every processed point
with open(checkpoint_file, 'a', newline='') as checkpoint:
    checkpoint_writer = csv.writer(checkpoint)

    # Loop through your DataFrame
    for ind, row in df.iterrows():
        for view in views:
            latitude = df.loc[ind, 'Y']
            longitude = df.loc[ind, 'X']
            location = f"{latitude},{longitude}"

            # Skip if this image was already processed
            if (str(latitude), str(longitude), str(view)) in processed_images:
                #print(f"Skipping already processed image {latitude}_{longitude}_{view}")
                continue

            img_count += 1  # Increment total image count for each new image

            # Define metadata request parameters
            meta_params = {'key': api_key, 'location': location}

            # Make metadata request
            meta_response = requests.get(meta_base, params=meta_params)

            # Initialize pic_response to None
            pic_response = None
            status = "UNKNOWN"  # Default status if no metadata is received

            try:
                # Check if metadata request was successful
                if meta_response.status_code == 200:
                    meta_data = meta_response.json()
                    status = meta_data.get('status', 'UNKNOWN')  # Get status, or default to 'UNKNOWN'
                    print(f"{img_count}_Image_{latitude}_{longitude}_{view} Status: {status}")

                    # Only download and save images if status is "OK"
                    if status == "OK":
                        # Define picture request parameters
                        pic_params = {
                            'key': api_key,
                            'location': location,
                            'heading': view,
                            'size': "512x512",
                            'fov': "120",
                        }

                        # Make picture request
                        pic_response = requests.get(pic_base, params=pic_params)

                        # Check if picture request was successful
                        if pic_response.status_code == 200:
                            valid_img_count += 1
                            # Build the image filename from the coordinates and heading
                            image_filename = os.path.join(output_folder, f"image_{latitude}_{longitude}_{view}.jpg")

                            # Save the downloaded image with the coordinates as the filename
                            with open(image_filename, 'wb') as file:
                                file.write(pic_response.content)
                            print(f"{valid_img_count}_Image saved: {image_filename}")
                        else:
                            print(f"Error downloading image for location: {location}_{view}")
                else:
                    print(f"Error fetching metadata for location: {location}_{view}")

                # Write the status of every image (whether "OK" or not)
                checkpoint_writer.writerow([latitude, longitude, view, status])
                checkpoint.flush()  # Ensure data is immediately written to file

                # Add the current image to the processed set
                processed_images.add((str(latitude), str(longitude), str(view)))

            except Exception as e:
                print(f"Error processing metadata or image: {e}")

            finally:
                # Close the response connections
                meta_response.close()
                # A Response object is falsy for error status codes, so compare with None explicitly
                if pic_response is not None:
                    pic_response.close()  # Close only if a picture request was made

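The metadata check in the loop above can be exercised without sending any request. The sketch below builds the metadata request URL with the standard library and applies the same `status == "OK"` test to hand-written sample responses. The sample dictionaries are illustrative and only mimic the `status` field, and `build_metadata_url` and `has_imagery` are hypothetical helper names.

```python
from urllib.parse import urlencode

META_BASE = "https://maps.googleapis.com/maps/api/streetview/metadata?"

def build_metadata_url(latitude, longitude, api_key):
    """Assemble the metadata URL for one location (no request is sent)."""
    params = {"key": api_key, "location": f"{latitude},{longitude}"}
    return META_BASE + urlencode(params)

def has_imagery(meta_data):
    """Apply the same rule as the download loop: only status 'OK' counts."""
    return meta_data.get("status", "UNKNOWN") == "OK"

url = build_metadata_url(33.9, -84.1, "YOUR_API_KEY_GOES_HERE")
print(url)

# Hand-written sample responses, not real API output
print(has_imagery({"status": "OK"}))
print(has_imagery({"status": "ZERO_RESULTS"}))
```
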
### 2.3 Number of Images Checker
If no more images are being downloaded after running the cell above, run the cell below.  
If `img_count` does not match the number of possible images or the number of checked images, go to the notebook `CoordinatesCheck.ipynb`.

In [None]:
print(f'img_count = {img_count}')

print(f'Number of possible images = {len(df)*4}')

# The checkpoint file has no header row, so read it with header=None
check_df = pd.read_csv(checkpoint_file, header=None)
print(f'Number of checked images = {len(check_df)}')

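When the counts disagree, a breakdown of the checkpoint by status can help narrow down the cause. Here is a small sketch using only the standard library. It reads made-up sample rows from an in-memory buffer; in the notebook you would open `checkpoint_file` instead.

```python
import csv
import io
from collections import Counter

# Made-up rows in the checkpoint layout: latitude, longitude, view, status
sample = "33.9,-84.1,0,OK\n33.9,-84.1,90,OK\n35.0,-82.0,0,ZERO_RESULTS\n"

# Tally the fourth column, skipping malformed rows
status_counts = Counter(
    row[3] for row in csv.reader(io.StringIO(sample)) if len(row) >= 4
)

for status, count in status_counts.most_common():
    print(f"{status}: {count}")
```
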
## 3. Image Quality Checker

Define a function to count the files in a folder.

In [None]:
def count_files(folder_path):
    # Initialize a counter variable
    file_count = 0

    # Iterate through each file in the folder
    for file_name in os.listdir(folder_path):
        # Check if the path is a file (not a directory)
        if os.path.isfile(os.path.join(folder_path, file_name)):
            file_count += 1

    return file_count

Count how many images we obtained and how many requests did not produce an image (for example, because of `status=ZERO_RESULTS`).

In [None]:
# Call the function to count the files
num_ok_images = count_files(output_folder)
num_bad_files = img_count - num_ok_images

# Print the number of files in the folder
print(f"Number of good images: {num_ok_images}")
print(f"Number of bad images: {num_bad_files}")
print(f"Number of images: {num_ok_images+num_bad_files}")


Check the number of locations we lost, assuming that all four views of a lost location failed.

In [None]:
num_points = num_bad_files/4
print("Number of lost points:", num_points)
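
Dividing by four assumes that every lost location fails for all four headings at once. As a cross-check, the sketch below counts the locations that have no `OK` row at all. It works on made-up checkpoint rows, and `count_lost_locations` is a hypothetical helper, not part of the notebook.

```python
from collections import defaultdict

def count_lost_locations(rows):
    """Count locations with no 'OK' status among their checkpoint rows.

    rows: iterable of (latitude, longitude, view, status) tuples.
    """
    ok_by_location = defaultdict(bool)  # missing keys start as False
    for latitude, longitude, _view, status in rows:
        ok_by_location[(latitude, longitude)] |= (status == "OK")
    return sum(1 for has_ok in ok_by_location.values() if not has_ok)

# Made-up rows: the first location has imagery, the second does not
rows = [
    ("33.9", "-84.1", "0", "OK"),
    ("33.9", "-84.1", "90", "OK"),
    ("35.0", "-82.0", "0", "ZERO_RESULTS"),
    ("35.0", "-82.0", "90", "ZERO_RESULTS"),
]
lost = count_lost_locations(rows)
print("Number of lost locations:", lost)
```
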