 Organizing files in different folders for testing and training. 

In [34]:
# from https://stackoverflow.com/questions/38511444/python-download-files-from-google-drive-using-url
import requests

def download_file_from_google_drive(id, destination):
    URL = "https://docs.google.com/uc?export=download"

    session = requests.Session()

    response = session.get(URL, params = { 'id' : id }, stream = True)
    token = get_confirm_token(response)

    if token:
        params = { 'id' : id, 'confirm' : token }
        response = session.get(URL, params = params, stream = True)

    save_response_content(response, destination)    

def get_confirm_token(response):
    for key, value in response.cookies.items():
        if key.startswith('download_warning'):
            return value

    return None

def save_response_content(response, destination):
    """
    response = streamed requests response holding the file contents
    destination = filename for output
    """    
    CHUNK_SIZE = 32768

    with open(destination, "wb") as f:
        for chunk in response.iter_content(CHUNK_SIZE):
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)

Download the Google Drive file into a zipped folder on your computer called ```NWPU_images.zip```. This should be 405 MB.

In [35]:
file_id = '14kkcuU6wd9UMvjaDrg3PNI-e_voCi8HL'
destination = 'NWPU_images.zip'
download_file_from_google_drive(file_id, destination)
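A quick sanity check after the download can save time, because a failed request may leave a small non-zip file under the same name. The following is a minimal sketch, not part of the original workflow: the helper name `looks_like_valid_zip` and the 400 MB threshold are assumptions chosen for illustration, based only on the 405 MB size quoted above.

```python
import os
import zipfile

def looks_like_valid_zip(path, min_bytes=1):
    """Return True if path is an existing file of at least min_bytes
    bytes that the zipfile module recognizes as a zip archive."""
    return (
        os.path.isfile(path)
        and os.path.getsize(path) >= min_bytes
        and zipfile.is_zipfile(path)
    )

if __name__ == "__main__":
    # Hypothetical threshold: slightly below the expected 405 MB
    print(looks_like_valid_zip('NWPU_images.zip', min_bytes=400 * 10**6))
```

If this prints `False`, it is worth re-running the download cell before unzipping.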

#### Using system commands to work with the files

Unzip the folder (this may take a few minutes) as a new folder called ```images```

In [36]:
import zipfile
def unzip_nwpu(f):
    """
    f = file to be unzipped
    """    
    with zipfile.ZipFile(f, 'r') as zip_ref:
        zip_ref.extractall()

In [37]:
unzip_nwpu(destination)

Load file system utilities for moving and deleting files (```os``` and ```shutil```).

In [38]:
import shutil, os

Rename the ```images``` directory

In [39]:
try:
    os.rename('images','nwpu_images')
except OSError:
    # the directory was already renamed (e.g. this cell was run before)
    pass

Remove the non-lake directories that we won't need. First, find all subdirectories (except the first, which is the parent directory):

In [40]:
subdirecs = [x[0] for x in os.walk('nwpu_images')][1:]

Then get a list of all subdirectories that do not contain the word "lake":

In [41]:
to_delete = [s for s in subdirecs if 'lake' not in s]

Use ```shutil.rmtree``` to delete those directories and the imagery they contain.

In [42]:
for k in to_delete:
    shutil.rmtree(k, ignore_errors=True) 
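One caveat with the filter above is that it tests the whole path string, so it would keep every directory if a parent folder happened to contain "lake". A minimal sketch of an alternative that checks only the name of each immediate subdirectory is shown here; the helper `non_matching_subdirs` is hypothetical and assumes the class folders sit directly under the parent directory.

```python
import os

def non_matching_subdirs(parent, keyword):
    """Return the paths of immediate subdirectories of parent whose
    directory name does not contain keyword."""
    with os.scandir(parent) as entries:
        return sorted(
            entry.path
            for entry in entries
            if entry.is_dir() and keyword not in entry.name
        )
```

Calling `non_matching_subdirs('nwpu_images', 'lake')` and printing the result before passing it to ```shutil.rmtree``` gives a chance to review what will be deleted.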

Finally, rename the ```lake``` subdirectory to ```data```, for consistency with dataset 1. It will become apparent why we use the ```data``` subdirectory in Part 3.

In [43]:
os.rename('nwpu_images'+os.sep+'lake','nwpu_images'+os.sep+'data')

In [6]:
import shutil, glob, os

## create the directory for the testing images. exist_ok=True avoids an
## error if you have run this cell before
os.makedirs('nwpu_images'+os.sep+'data'+os.sep+'Testing', exist_ok=True)

In [7]:
## create the directory for the training images. exist_ok=True avoids an
## error if you have run this cell before
os.makedirs('nwpu_images'+os.sep+'data'+os.sep+'Training', exist_ok=True)

Split the data into training and testing sets using regular expressions. Files numbered 001 to 599 are used for training, and files numbered 600 to 700 are used for testing.

In [26]:
import shutil, os, re

# change into the data directory that holds the lake images
print (os.getcwd())
os.chdir('c:\\Users\\rdubey\\Desktop\\DeepLearningSatelliteImage\\Buscombe_liveProject_Feb2020\\2_Data\\nwpu_images\\data')
print (os.getcwd())
path = os.getcwd()

# images 001 to 599 are training data; images 600 to 700 are testing data
train_pattern = re.compile(r'lake_[0-5]\d\d\.jpg')
test_pattern = re.compile(r'lake_(6\d\d|700)\.jpg')

# cycle through each entry in the current directory
for file in os.listdir(path):
    # copy each image into the matching subdirectory; anything else
    # (including the Training and Testing folders) is skipped
    if train_pattern.fullmatch(file):
        shutil.copy(file, os.path.join(path, 'Training'))
    elif test_pattern.fullmatch(file):
        shutil.copy(file, os.path.join(path, 'Testing'))

c:\Users\rdubey\Desktop\DeepLearningSatelliteImage\Buscombe_liveProject_Feb2020\2_Data\nwpu_images\data
c:\Users\rdubey\Desktop\DeepLearningSatelliteImage\Buscombe_liveProject_Feb2020\2_Data\nwpu_images\data
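The split rule can be checked without touching any files. The sketch below is illustrative only: it assumes the images are named `lake_001.jpg` through `lake_700.jpg`, and the function name `assign_split` is hypothetical. It classifies each name and counts how many fall into each group.

```python
import re

# Assumed naming scheme: lake_001.jpg ... lake_700.jpg
TRAIN_PATTERN = re.compile(r'lake_[0-5]\d\d\.jpg')
TEST_PATTERN = re.compile(r'lake_(6\d\d|700)\.jpg')

def assign_split(filename):
    """Return 'Training', 'Testing', or None for names that fit neither rule."""
    if TRAIN_PATTERN.fullmatch(filename):
        return 'Training'
    if TEST_PATTERN.fullmatch(filename):
        return 'Testing'
    return None

# Build the assumed file names and count how many land in each split
names = ['lake_%03d.jpg' % i for i in range(1, 701)]
counts = {'Training': 0, 'Testing': 0}
for name in names:
    counts[assign_split(name)] += 1

print(counts)  # {'Training': 599, 'Testing': 101}
```

Under that naming assumption the split is 599 training images and 101 testing images, and directory names such as `Training` match neither pattern.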


##### Finally, a note about labels
If you have developed your own geospatial data sets in this way, you may find the high-resolution label (“ground-truth”) data from the [European Commission’s Global Surface Water Explorer](https://global-surface-water.appspot.com/) very useful. Using this dataset will likely require some familiarity with a GIS such as [QGIS](https://qgis.org/en/site/) or with geospatial processing tools such as [GDAL for Python](https://pypi.org/project/GDAL/).