# Download Images using the Google Street View API
In this notebook, we download Street View images for the locations where the dataset says kudzu is supposed to be.

Before running the code, make sure to download the following file to your Google Drive:
[/excel file path](https://)

This file contains the coordinates of 20% of the points where the kudzu plant is supposed to be. The first 10% is based on gHM and the other 10% is based on TWI.

## 1. Data preprocessing

## 1.1 Install utm library

In [1]:
# If using Windows run this
!pip install utm

# If using MacOS run "conda install -c conda-forge utm" in the terminal



In [2]:
# using Python
import requests
import utm
import csv
import pandas as pd
import numpy as np
import os

In [3]:
your_directory = "/home/student/Desktop/KudzuClassification/ObjectDetection"
output_dir = 'output'

In [4]:
# Join paths
output_folder = os.path.join(your_directory, output_dir)
print("Joined path:", output_folder)

# Create the output folder if it doesn't exist
os.makedirs(output_folder, exist_ok=True)

Joined path: /home/student/Desktop/KudzuClassification/ObjectDetection/output


In [33]:
file_name = 'gps_twi_ghm.xlsx'
file_path = os.path.join(your_directory, file_name)
print("Joined path:", file_path)

Joined path: /home/student/Desktop/KudzuClassification/ObjectDetection/gps_twi_ghm.xlsx


Use pandas to read the `all_data` sheet of the Excel file `gps_twi_ghm.xlsx` into a DataFrame called `data_df`.

In [34]:
# Full dataset as a DataFrame
data_df = pd.read_excel(file_path, sheet_name='all_data')
print(data_df)

               X          Y       gHM       TWI
0     -86.060254  37.201708  0.000000  7.725325
1     -80.328543  38.294871  0.000000  6.619407
2     -83.486841  35.518393  0.000000  6.786229
3     -83.662781  35.555762  0.000000  4.754903
4     -83.331794  35.708465  0.000000  4.955486
...          ...        ...       ...       ...
24085 -84.408286  33.731479  0.952147  6.811067
24086 -95.358267  29.764465  0.953498  4.499810
24087 -83.920285  35.967406  0.954781  5.097647
24088 -83.920285  35.967406  0.954781  5.097647
24089 -73.934428  40.743897  0.955685  6.394989

[24090 rows x 4 columns]


In [35]:
df = data_df
print(df)

               X          Y       gHM       TWI
0     -86.060254  37.201708  0.000000  7.725325
1     -80.328543  38.294871  0.000000  6.619407
2     -83.486841  35.518393  0.000000  6.786229
3     -83.662781  35.555762  0.000000  4.754903
4     -83.331794  35.708465  0.000000  4.955486
...          ...        ...       ...       ...
24085 -84.408286  33.731479  0.952147  6.811067
24086 -95.358267  29.764465  0.953498  4.499810
24087 -83.920285  35.967406  0.954781  5.097647
24088 -83.920285  35.967406  0.954781  5.097647
24089 -73.934428  40.743897  0.955685  6.394989

[24090 rows x 4 columns]


In [36]:
new_data = pd.read_csv('cleaned_data.csv')

In [37]:
new_data = new_data.rename(columns={'Latitude': 'Y', 'Longitude': 'X'})
print(new_data)

             Y         X
0     32.22668 -84.71198
1     40.38670 -73.98484
2     35.17294 -79.38491
3     35.01225 -79.52738
4     34.89793 -79.58489
...        ...       ...
6202  35.05172 -83.20240
6203  34.00546 -84.35153
6204  31.52195 -83.42056
6205  33.97158 -83.36531
6206  31.22127 -82.34601

[6207 rows x 2 columns]


In [38]:
df = pd.merge(df, new_data, on=['X', 'Y'], how='outer')
print(df)

               X          Y  gHM       TWI
0     -86.060254  37.201708  0.0  7.725325
1     -86.060254  37.201708  0.0  7.725325
2     -80.328543  38.294871  0.0  6.619407
3     -83.486841  35.518393  0.0  6.786229
4     -83.662781  35.555762  0.0  4.754903
...          ...        ...  ...       ...
30292 -83.202400  35.051720  NaN       NaN
30293 -84.351530  34.005460  NaN       NaN
30294 -83.420560  31.521950  NaN       NaN
30295 -83.365310  33.971580  NaN       NaN
30296 -82.346010  31.221270  NaN       NaN

[30297 rows x 4 columns]
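
The outer merge above pairs rows only when `X` and `Y` are exactly equal as floats, so it can be useful to see which source each row came from. A minimal sketch, using made-up coordinates rather than the project data, of the `indicator` option of `pd.merge`:

```python
import pandas as pd

# Toy stand-ins for df and new_data (values are illustrative only)
left = pd.DataFrame({"X": [-86.06, -80.33], "Y": [37.20, 38.29], "gHM": [0.0, 0.1]})
right = pd.DataFrame({"X": [-86.06, -84.71], "Y": [37.20, 32.23]})

# indicator=True adds a _merge column: 'both', 'left_only' or 'right_only'
merged = pd.merge(left, right, on=["X", "Y"], how="outer", indicator=True)

source_counts = merged["_merge"].value_counts().to_dict()
print(source_counts)
```

Rows marked `right_only` are the ones that show up with `NaN` for `gHM` and `TWI` in the merged table.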


In [39]:
# Drop duplicates based on all columns
df = df.drop_duplicates()

# Reset index if needed
df.reset_index(drop=True, inplace=True)

print(df)

               X          Y  gHM       TWI
0     -86.060254  37.201708  0.0  7.725325
1     -80.328543  38.294871  0.0  6.619407
2     -83.486841  35.518393  0.0  6.786229
3     -83.662781  35.555762  0.0  4.754903
4     -83.331794  35.708465  0.0  4.955486
...          ...        ...  ...       ...
28089 -83.202400  35.051720  NaN       NaN
28090 -84.351530  34.005460  NaN       NaN
28091 -83.420560  31.521950  NaN       NaN
28092 -83.365310  33.971580  NaN       NaN
28093 -82.346010  31.221270  NaN       NaN

[28094 rows x 4 columns]


In [40]:
# Check for missing or NaN values in your DataFrame for location coordinates
missing_rows = df[df[['Y', 'X']].isnull().any(axis=1)]
print(f"Missing or invalid locations: {len(missing_rows)}")

# Drop rows with missing coordinates so the download loop only sees complete locations
df = df.dropna(subset=['Y', 'X']).reset_index(drop=True)

Missing or invalid locations: 0
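
The check above only detects missing values. As a hedged sketch, a range check could also flag coordinates that are present but impossible. The helper name `is_valid_location` is hypothetical and not part of the original notebook:

```python
import math

def is_valid_location(lat, lon):
    """Return True when lat and lon are real numbers inside the valid ranges."""
    try:
        lat, lon = float(lat), float(lon)
    except (TypeError, ValueError):
        return False
    if math.isnan(lat) or math.isnan(lon):
        return False
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0

# In this notebook Y holds the latitude and X holds the longitude
print(is_valid_location(37.2017, -86.0603))       # a point from the dataset
print(is_valid_location(95.0, -86.0603))          # latitude out of range
print(is_valid_location(float("nan"), -86.0603))  # missing latitude
```

Applied to the DataFrame, this could look like `df.apply(lambda r: is_valid_location(r["Y"], r["X"]), axis=1)`.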


## We joined both data sets into one and now we can proceed

You can access coordinates in the DataFrame `df` with the `loc` indexer, e.g., `df.loc[0, "X"]`.

In [18]:
import requests
import os
import csv

In [19]:
df.loc[0, "X"]

-86.0602542323586

In [20]:
df.loc[0, "Y"]

37.2017078478203

Define the base URLs for metadata and image requests to the Google Street View API, and set your API key.


In [21]:
meta_base = 'https://maps.googleapis.com/maps/api/streetview/metadata?'
pic_base = 'https://maps.googleapis.com/maps/api/streetview?'
# Replace the placeholder with your own key; the download loop needs api_key to be defined
api_key = 'your_api_key_goes_here'
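
To keep the key out of the notebook, one option is to read it from an environment variable. The sketch below is only an illustration: the variable name `GOOGLE_MAPS_API_KEY` and the helper `build_metadata_url` are assumptions, not part of the original code. It also shows how the `location` and `key` parameters end up in the metadata URL:

```python
import os
from urllib.parse import urlencode

META_BASE = "https://maps.googleapis.com/maps/api/streetview/metadata"

# GOOGLE_MAPS_API_KEY is a hypothetical variable name; use whatever your setup defines
api_key = os.environ.get("GOOGLE_MAPS_API_KEY", "your_api_key_goes_here")

def build_metadata_url(latitude, longitude, key):
    """Return the metadata URL for one location given as "latitude,longitude"."""
    query = urlencode({"location": f"{latitude},{longitude}", "key": key})
    return f"{META_BASE}?{query}"

# DEMO_KEY is a dummy value so the real key is never printed
url = build_metadata_url(37.2017078478203, -86.0602542323586, "DEMO_KEY")
print(url)
```

`requests.get(meta_base, params=...)` performs the same encoding internally, so this helper is only useful for inspecting or logging the request.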

Images are saved to the output folder created earlier inside the project folder.

The code below downloads the images. It first checks the status in each image's metadata and only downloads images with `status == "OK"`, because locations with any other status have no imagery. Documentation: https://developers.google.com/maps/documentation/streetview/metadata#status-codes
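
Not every non-`OK` status means the same thing. The download loop records every status in the checkpoint, so a location that failed with a temporary error is skipped on later runs. A minimal sketch of one way to tell the cases apart follows; the grouping into `RETRYABLE` and `NO_IMAGERY` and the helper `next_action` are assumptions for illustration:

```python
# Status values listed in the Street View metadata documentation
NO_IMAGERY = {"ZERO_RESULTS", "NOT_FOUND"}
RETRYABLE = {"OVER_QUERY_LIMIT", "UNKNOWN_ERROR"}

def next_action(status):
    """Map a metadata status to 'download', 'skip', 'retry' or 'investigate'."""
    if status == "OK":
        return "download"
    if status in NO_IMAGERY:
        return "skip"
    if status in RETRYABLE:
        return "retry"
    # REQUEST_DENIED, INVALID_REQUEST or anything unexpected needs a closer look
    return "investigate"

for status in ["OK", "ZERO_RESULTS", "OVER_QUERY_LIMIT", "REQUEST_DENIED"]:
    print(status, "->", next_action(status))
```

With a mapping like this, only `download` and `skip` results would be written to the checkpoint as finished.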


In [22]:
# Define your views
views = [0, 90, 180, 270]

# Define image counter
img_count = 0
valid_img_count = 0

# Set the path for checkpoint file
checkpoint_file = 'image_status_checkpoint.csv'

# Load the already processed locations (if exists)
processed_images = set()

if os.path.exists(checkpoint_file):
    with open(checkpoint_file, 'r') as file:
        reader = csv.reader(file)
        for row in reader:
            latitude, longitude, view, status = row[:4]
            processed_images.add((latitude, longitude, view))
            img_count += 1  # Increment total image count
            if status == "OK":
                valid_img_count += 1  # Increment valid image count for "OK" status images
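
The unpacking `row[:4]` above raises a `ValueError` if the checkpoint contains a blank or truncated line. As a sketch under that assumption, a small loader can skip such rows and tally the statuses at the same time. The function name `load_checkpoint` and the demo file are hypothetical:

```python
import csv
import os
import tempfile
from collections import Counter

def load_checkpoint(path):
    """Return (processed keys, status counts) from a checkpoint CSV without a header."""
    processed, statuses = set(), Counter()
    if not os.path.exists(path):
        return processed, statuses
    with open(path, newline="") as handle:
        for row in csv.reader(handle):
            if len(row) < 4:
                continue  # skip blank or truncated lines
            latitude, longitude, view, status = row[:4]
            processed.add((latitude, longitude, view))
            statuses[status] += 1
    return processed, statuses

# Demo on a throwaway file that contains one blank line
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_checkpoint.csv")
    with open(demo_path, "w", newline="") as handle:
        handle.write("37.2,-86.06,0,OK\r\n\r\n37.2,-86.06,90,ZERO_RESULTS\r\n")
    processed, statuses = load_checkpoint(demo_path)

print(len(processed), dict(statuses))
```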

### Check how many locations are left

In [24]:
print(len(df)*4)

112376


In [25]:
print(f'img_count = {img_count}')
print(f'valid_img_count = {valid_img_count}')

img_count = 112377
valid_img_count = 41894


In [26]:
# Open the checkpoint file once, and use append mode to write every processed point
with open(checkpoint_file, 'a', newline='') as checkpoint:
    checkpoint_writer = csv.writer(checkpoint)

    # Loop through your DataFrame
    for ind, row in df.iterrows():
        for view in views:
            latitude = df.loc[ind, 'Y']
            longitude = df.loc[ind, 'X']
            location = f"{latitude},{longitude}"

            # Skip if this image was already processed
            if (str(latitude), str(longitude), str(view)) in processed_images:
                #print(f"Skipping already processed image {latitude}_{longitude}_{view}")
                continue

            img_count += 1  # Increment total image count for each new image

            # Define metadata request parameters
            meta_params = {'key': api_key, 'location': location}

            # Make metadata request
            meta_response = requests.get(meta_base, params=meta_params)

            # Initialize pic_response to None
            pic_response = None
            status = "UNKNOWN"  # Default status if no metadata is received

            try:
                # Check if metadata request was successful
                if meta_response.status_code == 200:
                    meta_data = meta_response.json()
                    status = meta_data.get('status', 'UNKNOWN')  # Get status, or default to 'UNKNOWN'
                    print(f"{img_count}_Image_{latitude}_{longitude}_{view} Status: {status}")

                    # Only download and save images if status is "OK"
                    if status == "OK":
                        # Define picture request parameters
                        pic_params = {
                            'key': api_key,
                            'location': location,
                            'heading': view,
                            'size': "512x512",
                            'fov': "120",
                        }

                        # Make picture request
                        pic_response = requests.get(pic_base, params=pic_params)

                        # Check if picture request was successful
                        if pic_response.status_code == 200:
                            # Define the image filename based on the coordinates
                            valid_img_count += 1
                            image_filename = os.path.join(output_folder, f"image_{latitude}_{longitude}_{view}.jpg")

                            # Save the downloaded image with the coordinates as the filename
                            with open(image_filename, 'wb') as file:
                                file.write(pic_response.content)
                            print(f"{valid_img_count}_Image saved: {image_filename}")
                        else:
                            print(f"Error downloading image for location: {location}_{view}")
                else:
                    print(f"Error fetching metadata for location: {location}_{view}")

                # Write the status of every image (whether "OK" or not)
                checkpoint_writer.writerow([latitude, longitude, view, status])
                checkpoint.flush()  # Ensure data is immediately written to file

                # Add the current image to the processed set
                processed_images.add((str(latitude), str(longitude), str(view)))

            except Exception as e:
                print(f"Error processing metadata or image: {e}")

            finally:
                # Close the response connections
                meta_response.close()
                # Compare against None: a Response is falsy for 4xx/5xx status codes
                if pic_response is not None:
                    pic_response.close()
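
The metadata request in the loop above uses only the key and the location, not the heading, so it returns the same status for all four views of a point. A minimal sketch of checking each point once and reusing the result follows. The generator `iter_image_jobs` and the stub `fake_status` are hypothetical; the stub stands in for the real metadata request so the example runs offline:

```python
def iter_image_jobs(points, views, get_status):
    """Yield (latitude, longitude, view, status), calling get_status once per point."""
    for latitude, longitude in points:
        status = get_status(latitude, longitude)
        for view in views:
            yield latitude, longitude, view, status

# Stub in place of the metadata request; it records how often it is called
calls = []

def fake_status(latitude, longitude):
    calls.append((latitude, longitude))
    return "OK" if latitude > 36 else "ZERO_RESULTS"

points = [(37.2017, -86.0603), (35.5184, -83.4868)]
jobs = list(iter_image_jobs(points, [0, 90, 180, 270], fake_status))
print(len(jobs), "jobs from", len(calls), "status lookups")
```

In the notebook, `get_status` would wrap the `requests.get(meta_base, ...)` call, ideally with a `timeout` so a stalled connection does not block the run.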

### Number of images checker
If `img_count` doesn't match the number of possible images, go to the file `CoordinatesCheck.ipynb`.

In [27]:
print(f'Number of possible images = {len(df)*4}')

Number of possible images = 112376


In [28]:
file_path = 'image_status_checkpoint.csv'
check_df = pd.read_csv(file_path)
print(f'Number of checked images = {len(check_df)}')

Number of checked images = 112376
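
By default `pd.read_csv` treats the first row of a file as the header. The download loop writes no header row, so for a checkpoint written only by that loop the first record is not counted, which may be one reason this number can differ by one from `img_count`. A small sketch on a throwaway file; the column names are assumed labels for the four fields the loop writes:

```python
import os
import tempfile
import pandas as pd

# Assumed labels for the four fields written to the checkpoint
columns = ["latitude", "longitude", "view", "status"]

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_checkpoint.csv")
    with open(demo_path, "w") as handle:
        handle.write("37.2,-86.06,0,OK\n37.2,-86.06,90,OK\n35.5,-83.48,0,ZERO_RESULTS\n")

    # Default: the first record is consumed as column names
    default_read = pd.read_csv(demo_path)
    # header=None keeps every record as data
    explicit_read = pd.read_csv(demo_path, header=None, names=columns)

print(len(default_read), len(explicit_read))
```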


### Image count checker

Define a function to count the files in a folder

In [29]:
def count_files(folder_path):
    # Initialize a counter variable
    file_count = 0

    # Iterate through each file in the folder
    for file_name in os.listdir(folder_path):
        # Check if the path is a file (not a directory)
        if os.path.isfile(os.path.join(folder_path, file_name)):
            file_count += 1

    return file_count
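
`count_files` counts every file in the folder, including any that are not downloaded images. As a hedged alternative, the sketch below counts only files that follow the `image_..._.jpg` naming used by the download loop. The function name `count_images` and the demo files are hypothetical:

```python
import tempfile
from pathlib import Path

def count_images(folder_path, pattern="image_*.jpg"):
    """Count regular files in folder_path whose names match pattern."""
    return sum(1 for path in Path(folder_path).glob(pattern) if path.is_file())

# Demo folder with two image-like files and one unrelated file
with tempfile.TemporaryDirectory() as tmp:
    for name in ["image_37.2_-86.06_0.jpg", "image_37.2_-86.06_90.jpg", "notes.txt"]:
        (Path(tmp) / name).write_bytes(b"")
    n_images = count_images(tmp)

print(n_images)
```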

Count how many images we obtained and how many had no imagery (for example, status `ZERO_RESULTS`)

In [30]:
# Call the function to count the files
num_ok_images = count_files(output_folder)
num_bad_files = img_count - num_ok_images

# Print the number of files in the folder
print(f"Number of good images: {num_ok_images}")
print(f"Number of bad images: {num_bad_files}")
print(f"Number of images: {num_ok_images+num_bad_files}")


Number of good images: 41956
Number of bad images: 70421
Number of images: 112377


Estimate the number of points with no imagery by dividing the number of bad images by the four views per point

In [31]:
num_points = num_bad_files/4
print("Number of lost points:", num_points)

Number of lost points: 17605.25
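
The fractional result shows that the number of bad images is not a multiple of four, so the division only approximates the number of lost points. A minimal sketch of counting them directly from checkpoint-style rows follows. The helper `count_lost_points` and the demo rows are illustrative, not taken from the real checkpoint:

```python
from collections import defaultdict

def count_lost_points(rows):
    """Count locations for which no view has status 'OK'."""
    ok_by_point = defaultdict(bool)
    for latitude, longitude, view, status in rows:
        ok_by_point[(latitude, longitude)] |= (status == "OK")
    return sum(1 for has_ok in ok_by_point.values() if not has_ok)

# Two locations: the first has imagery, the second does not
rows = [
    ("37.2", "-86.06", "0", "OK"),
    ("37.2", "-86.06", "90", "OK"),
    ("35.5", "-83.48", "0", "ZERO_RESULTS"),
    ("35.5", "-83.48", "90", "ZERO_RESULTS"),
    ("35.5", "-83.48", "90", "ZERO_RESULTS"),  # a duplicate row does not change the count
]
lost_points = count_lost_points(rows)
print("Number of lost points:", lost_points)
```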
