<a href="https://colab.research.google.com/github/fellowship/platform-demos3/blob/master/InriaAerialImages/Preprocessing_Slidingwindowpatches256_withoverlap%26padding.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Inria aerial satellite Images

The aim of this challenge is to classify each pixel as 'building' or 'not building', which makes it a semantic segmentation task. We have 180 aerial images from various cities, each with a resolution of 5000 x 5000, along with segmentation masks of the same size. Our goal is to classify every pixel in the test images and generate masks for them as well.

In this notebook we download the dataset provided by Inria. Because each image is very large and we have only 180 training images, we slice each original image into 256 x 256 patches with 6 pixels of overlap, then save the patches to Google Drive for later use in training the model.

## Download dataset

In [0]:
#Downloading dataset
!curl -k https://files.inria.fr/aerialimagelabeling/getAerial.sh | bash

In [0]:
!rm -rf /content/aerialimagelabeling*
!rm -rf /content/NEW2-AerialImageDataset.zip

In [0]:
!curl -s https://course.fast.ai/setup/colab | bash
!pip install PyDrive

Updating fastai...
Done.
Collecting PyDrive
  Downloading https://files.pythonhosted.org/packages/52/e0/0e64788e5dd58ce2d6934549676243dc69d982f198524be9b99e9c2a4fd5/PyDrive-1.3.1.tar.gz (987kB)
Building wheels for collected packages: PyDrive
  Building wheel for PyDrive (setup.py) ... done
  Stored in directory: /root/.cache/pip/wheels/fa/d2/9a/d3b6b506c2da98289e5d417215ce34b696db856643bad779f4
Successfully built PyDrive
Installing collected packages: PyDrive
Successfully installed PyDrive-1.3.1


In [0]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [0]:
#import required modules
import tarfile,re,os,pathlib,shutil

import fastai
from fastai.vision import *
import torch
import torchvision.transforms.functional as F

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

## Preprocess the data

In [0]:
img_path = Path('/content/AerialImageDataset/train/images')
mask_path = Path('/content/AerialImageDataset/train/gt')
test_path = Path('/content/AerialImageDataset/test/images')
valid_path = Path('/content/AerialImageDataset/valid')

images_patch_path = Path('/content/AerialImagePatches/train')
masks_patch_path = Path('/content/AerialImagePatches/masks')
tests_patch_path = Path('/content/AerialImagePatches/test')
valids_patch_path = Path('/content/AerialImagePatches/valid')

Split off a validation set by moving the first five images of each city from the training folder to the validation folder.

In [0]:
#move first five images from each city for valid hold outs

def train_valid_split(src: Path, dst: Path):
  # create the validation folder if it does not exist yet
  dst.mkdir(parents=True, exist_ok=True)
  for file in os.listdir(src):
    # tiles numbered 1-5, e.g. <city>1.tif or <city>-w1.tif
    if re.match(r"[a-zA-Z]*[1-5]{1}\.tif", file) or re.match(r"[a-zA-Z]*-\w[1-5]{1}\.tif", file):
      shutil.move(str(src/file), str(dst/file))
        
train_valid_split(img_path,valid_path)
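
To check which tiles the two regular expressions actually pick up, here is a small standalone sketch. The file names in `samples` are hypothetical examples that follow the `<city><index>.tif` naming pattern, and `is_validation_tile` is a helper introduced only for this illustration.

```python
import re

# Same two patterns as in train_valid_split, written here as raw strings.
PATTERNS = (r"[a-zA-Z]*[1-5]{1}\.tif", r"[a-zA-Z]*-\w[1-5]{1}\.tif")

def is_validation_tile(name: str) -> bool:
    """Return True if `name` would be moved to the validation folder."""
    return any(re.match(p, name) for p in PATTERNS)

# Hypothetical file names, used only to exercise the patterns.
samples = ["austin1.tif", "austin5.tif", "austin6.tif", "austin15.tif",
           "tyrol-w3.tif", "tyrol-w23.tif"]
selected = [s for s in samples if is_validation_tile(s)]
print(selected)
```

Only single-digit indices from 1 to 5 match, so a name such as `austin15.tif` stays in the training folder.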

Clean up any previously created folder for AerialImagePatches

In [0]:
!rm -rf AerialImagePatches

The cells below generate 256 x 256 patches with an overlap of 6 pixels from the 5000 x 5000 images and save them for model training.

In [0]:
#variables declaration
orig_images = ImageList.from_folder(img_path)
orig_valids = ImageList.from_folder(valid_path)
orig_masks = ImageList.from_folder(mask_path)
orig_tests = ImageList.from_folder(test_path)

get_patch_file_name = lambda pth,x,y: pth/f'{x.stem.split("/")[-1]}_{y}.tif'

channels = 3
size = 256
step = 250
pad = 3 # with overlapping patches, an unpadded image would lose edge pixels and be hard to stitch back into the original
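
To see why `pad = 3` pairs well with `size = 256` and `step = 250`, the following sketch works through the window arithmetic for one side of a 5000 pixel image. The helper names `patches_per_side` and `uncovered_pixels` are introduced only for this illustration, and the count formula assumes a sliding window that keeps full windows only.

```python
def patches_per_side(length: int, size: int, step: int, pad: int) -> int:
    """Number of full windows of `size` taken every `step` pixels along one padded side."""
    padded = length + 2 * pad
    return (padded - size) // step + 1

def uncovered_pixels(length: int, size: int, step: int, pad: int) -> int:
    """Pixels at the far edge of the padded side that no window reaches."""
    padded = length + 2 * pad
    n = patches_per_side(length, size, step, pad)
    return padded - ((n - 1) * step + size)

per_side = patches_per_side(5000, size=256, step=250, pad=3)
per_image = per_side ** 2
lost_without_pad = uncovered_pixels(5000, size=256, step=250, pad=0)
lost_with_pad = uncovered_pixels(5000, size=256, step=250, pad=3)
print(per_side, per_image, lost_without_pad, lost_with_pad)
```

With 3 pixels of padding the padded side is 5006 pixels, which 20 windows cover exactly, giving 400 patches per image. Without padding, only 19 windows fit and the last 244 pixels of each side would be dropped.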

In [0]:
def generate_patches(image, i, dest_img_path, channels:int, size:int, step:int):
  # `i` is the item index supplied by parallel(); it is not used here
  img = PIL.Image.open(image) # open the image
  img = F.pad(img, (pad, pad, pad, pad)) # pad every side by `pad` pixels (global defined above)
  img = pil2tensor(img.convert("RGB"), np.float32).div_(255) # convert to a float tensor in [0, 1]
  # unfold into a grid of patches with shape (1, rows, cols, channels, size, size)
  patches = img.data.unfold(0, channels, channels).unfold(1, size, step).unfold(2, size, step)
  cnt = 1
  for c in range(patches.shape[0]):
    for row in range(patches.shape[1]):
      for col in range(patches.shape[2]):
        patch = patches[c][row][col]
        imgpil = F.to_pil_image(patch)
        # number patches row by row so they can be stitched back together after prediction
        filename = get_patch_file_name(dest_img_path, image, cnt)
        cnt = cnt+1
        imgpil.save(filename) # save the patch

# create the patch sets for train, valid, masks and test
sets = [(orig_images, images_patch_path, channels, size, step),
        (orig_valids, valids_patch_path, channels, size, step),
        (orig_masks, masks_patch_path, channels, size, step),
        (orig_tests, tests_patch_path, channels, size, step)]
for img_lst,dst_path,channels,size,step in sets:
  if not dst_path.exists():
    dst_path.mkdir(parents=True, exist_ok=True)
  parallel(partial(generate_patches, dest_img_path=dst_path, channels=channels, size=size, step=step), img_lst.items)
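
Because the patches are numbered row by row through `cnt`, the number in each file name is enough to recover where a patch belongs. The sketch below shows one possible mapping, assuming the same `size`, `step` and `pad` values as above and a 5000 pixel image side. The helper `patch_position` is hypothetical and is not part of the notebook's pipeline.

```python
def patch_position(cnt: int, size: int = 256, step: int = 250,
                   pad: int = 3, length: int = 5000):
    """Map a 1-based patch number to (row, col, top, left).

    `top` and `left` give the pixel offset in the ORIGINAL (unpadded) image
    of the patch centre that remains after cropping `pad` pixels from every
    side of the patch. This works because size - 2 * pad == step.
    """
    per_side = (length + 2 * pad - size) // step + 1
    row, col = divmod(cnt - 1, per_side)
    return row, col, row * step, col * step

first = patch_position(1)
second_row_start = patch_position(21)
last = patch_position(400)
print(first, second_row_start, last)
```

Under these assumptions, cropping 3 pixels from each side of every predicted patch leaves 250 x 250 tiles that cover the original 5000 x 5000 image without gaps or overlap.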

In [0]:
len(get_image_files(img_path)),len(get_image_files(mask_path)),len(get_image_files(valid_path)),len(get_image_files(test_path))

(155, 180, 25, 180)

In [0]:
len(get_image_files(images_patch_path)),len(get_image_files(masks_patch_path)), len(get_image_files(valids_patch_path)), len(get_image_files(tests_patch_path))

(62000, 72000, 10000, 72000)

In [0]:
train_source_dir = '/content/AerialImagePatches/train'
valid_source_dir = '/content/AerialImagePatches/valid'
mask_source_dir = '/content/AerialImagePatches/masks'
test_source_dir = '/content/AerialImagePatches/test'

In [0]:
# Create a gzip-compressed tar archive of source_dir (mode "w:gz"); note the file names used below still end in .tar
def make_tarfile(output_filename, source_dir):
  with tarfile.open(output_filename, "w:gz") as tar:
    tar.add(source_dir, arcname=os.path.basename(source_dir))
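
As a quick sanity check of the archive layout, the sketch below runs the same `make_tarfile` logic on a throwaway folder and reads the archive back. The folder and file names are placeholders created in a temporary directory, not the real patch folders.

```python
import os
import tarfile
import tempfile

def make_tarfile(output_filename, source_dir):
    # Same logic as above: gzip-compressed archive rooted at the folder name.
    with tarfile.open(output_filename, "w:gz") as tar:
        tar.add(source_dir, arcname=os.path.basename(source_dir))

with tempfile.TemporaryDirectory() as tmp:
    # Placeholder folder with a single dummy file standing in for a patch.
    src = os.path.join(tmp, "train")
    os.makedirs(src)
    with open(os.path.join(src, "austin1_1.tif"), "wb") as fh:
        fh.write(b"placeholder")

    archive = os.path.join(tmp, "train.tar")
    make_tarfile(archive, src)

    # The default read mode detects the compression automatically.
    with tarfile.open(archive) as tar:
        names = sorted(tar.getnames())

    # gzip streams start with the magic bytes 0x1f 0x8b.
    with open(archive, "rb") as fh:
        is_gzip = fh.read(2) == b"\x1f\x8b"

print(names, is_gzip)
```

The archive contains paths relative to the folder name (`train/...`), and it is gzip data even though the file name ends in `.tar`. `tarfile.open` with its default mode reads it regardless of the extension.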

In [0]:
make_tarfile('train.tar', train_source_dir)

In [0]:
make_tarfile('masks.tar', mask_source_dir)

In [0]:
make_tarfile('valid.tar', valid_source_dir)

In [0]:
make_tarfile('test.tar', test_source_dir)

In [0]:
# Create GoogleDrive instance with authenticated GoogleAuth instance.
drive = GoogleDrive(gauth)

def upload_to_drive(file, title):
  uploaded = drive.CreateFile({'title': title})
  uploaded.SetContentFile(file)
  uploaded.Upload()
  print('Uploaded file %s with ID %s'%(file, uploaded.get('id')))

In [0]:
upload_to_drive("train.tar","train.tar")

Uploaded file train.tar with ID 1O2zzW0MiL_nM8mP7YH3lBXusdvEMhq8o


In [0]:
upload_to_drive("masks.tar","masks.tar")

Uploaded file masks.tar with ID 1aWbwESwvfSbncf6QDUXOcXn2MNK3mTb6


In [0]:
upload_to_drive("valid.tar","valid.tar")

Uploaded file valid.tar with ID 1D80DpCfFFI4dY-B-7KtDogCEboPCBV3_


In [0]:
upload_to_drive("test.tar","test.tar")

Uploaded file maskoriginal.tar with ID 12Udu0DJKurINRh0xhc9fFojm0N79lQNp


In [0]:
#remove unwanted files
!rm -rf AerialImageDataset