# Image Data Cleaning: Rotation Sorting Code

## Introduction & Objectives <a name="intro"></a>

The purpose of this notebook was twofold:

1. Sort through a city directory containing several zip archives of Google Street View (GSV) images.

2. Identify and delete images whose rotation components are undesirable for the project.

This process was specific to our project and may therefore not be applicable to other contexts. 

The folder structure was as follows:

```
City_A
    File_1.zip
        latitude_longitude_verticalrotation_horizontalrotation.jpg
        latitude_longitude_verticalrotation_horizontalrotation.jpg
        latitude_longitude_verticalrotation_horizontalrotation.jpg
        ...
    File_2.zip
        latitude_longitude_verticalrotation_horizontalrotation.jpg
        latitude_longitude_verticalrotation_horizontalrotation.jpg
        latitude_longitude_verticalrotation_horizontalrotation.jpg
        ...
    File_3.zip
        latitude_longitude_verticalrotation_horizontalrotation.jpg
        latitude_longitude_verticalrotation_horizontalrotation.jpg
        latitude_longitude_verticalrotation_horizontalrotation.jpg
        ...
    ...
```
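The notebook assumes the archives have already been extracted before any images are deleted. A minimal sketch of that step with the standard-library `zipfile` module follows, assuming each `.zip` sits directly inside the city folder and should be extracted into a folder of the same name. `extract_city_archives` is a hypothetical helper, not part of the original notebook.

```python
import os
import zipfile


def extract_city_archives(city_dir):
    """Extract every .zip archive directly inside city_dir into a folder
    named after the archive (e.g. File_1.zip -> File_1/).

    Hypothetical helper: returns the list of folders that were written to.
    """
    extracted = []
    for name in sorted(os.listdir(city_dir)):
        # Skip anything that is not a zip archive
        if not name.lower().endswith('.zip'):
            continue
        archive_path = os.path.join(city_dir, name)
        target_dir = os.path.join(city_dir, os.path.splitext(name)[0])
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(target_dir)
        extracted.append(target_dir)
    return extracted
```

Extracting into separate subfolders keeps images from different archives apart; the deletion step below uses `os.walk`, so it reaches images in every subfolder.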

The workflow is as follows:

1. [Getting Set-Up](#sec1)
2. [Deleting Undesirable Images](#sec2)

## Getting Set-Up <a name="sec1"></a>

Set the pairs of vertical and horizontal rotations that you would like to keep among all GSV images. Add each pair to the list called `rotation_pairs` using the syntax `(vertical rotation, horizontal rotation)`.

```python
# Import libraries
import os

# Set working directory (replace with the folder containing the city's images)
working_directory = '/Users/lucamartial/Desktop/Cambridge Practicum/CV Project/Image exploration/Round 1'
os.chdir(working_directory)

# Choose pairs of (vertical, horizontal) rotations to keep
rotation_pairs = [(90, 0), (270, 0)]

# Build the filename suffixes to keep, e.g. (90, 0) -> '_90_0.jpg'.
# The leading underscore stops '90_0.jpg' from also matching '..._190_0.jpg'.
# str.endswith accepts a tuple of suffixes, so store them as a tuple.
rotations = tuple('_' + '_'.join(map(str, pair)) + '.jpg' for pair in rotation_pairs)
```
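Suffix matching depends entirely on the filename layout shown above; for instance, a bare suffix such as `'90_0.jpg'` would also match `..._190_0.jpg`. A stricter alternative parses each name and compares the `(vertical, horizontal)` pair exactly. The sketch below assumes latitude and longitude contain no underscores and that rotations are written as integers; `parse_gsv_filename` and `keep_image` are hypothetical helpers, not part of the original notebook.

```python
import os


def parse_gsv_filename(filename):
    """Split 'latitude_longitude_vertical_horizontal.jpg' into its parts.

    Assumes exactly four underscore-separated fields and integer rotations.
    Returns None for names that do not follow this pattern.
    """
    stem, ext = os.path.splitext(filename)
    if ext.lower() != '.jpg':
        return None
    parts = stem.split('_')
    if len(parts) != 4:
        return None
    lat, lon, vertical, horizontal = parts
    try:
        return float(lat), float(lon), int(vertical), int(horizontal)
    except ValueError:
        return None


def keep_image(filename, rotation_pairs):
    """True if the file's (vertical, horizontal) pair is in rotation_pairs."""
    parsed = parse_gsv_filename(filename)
    return parsed is not None and (parsed[2], parsed[3]) in set(rotation_pairs)
```

For example, `keep_image('42.36_-71.06_90_0.jpg', [(90, 0), (270, 0)])` is `True`, while the same call on `'42.36_-71.06_190_0.jpg'` is `False`. If the real filenames use non-integer rotations (e.g. `90.0`), the `int` conversion would need to change.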

## Deleting Undesirable Images <a name="sec2"></a>

Once all archived files containing GSV images for a city have been downloaded and unzipped, run the following code to delete all images with undesirable rotations:

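```python
# Walk the working directory and all of its subfolders
for root, dirs, files in os.walk(working_directory):
    for name in files:
        # Delete .jpg images whose name does not end with a rotation to keep;
        # non-image files (e.g. the .zip archives or this notebook) are left alone
        if name.endswith('.jpg') and not name.endswith(rotations):
            os.remove(os.path.join(root, name))
```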
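Because `os.remove` deletes files permanently, it can help to preview the result first. A minimal dry-run sketch, assuming the same suffix tuple as `rotations` above; `find_images_to_delete` is a hypothetical helper that only lists paths and deletes nothing.

```python
import os


def find_images_to_delete(directory, suffixes):
    """Return sorted paths of .jpg files under directory whose names do not
    end with any of the given suffixes. Nothing is deleted."""
    to_delete = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if name.endswith('.jpg') and not name.endswith(suffixes):
                to_delete.append(os.path.join(root, name))
    return sorted(to_delete)
```

After calling `candidates = find_images_to_delete(working_directory, rotations)` and inspecting `len(candidates)` or a few sample paths, the same list can be passed to `os.remove` one path at a time.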