# Tutorial for Using Split-Raster for Deep Learning

In this demo, we split a large image into small tiles, which is useful for deep learning and computer vision tasks. The package can also be used to tile large images for other applications.

For example, suppose we have a large image of size 1000-by-1000 and want to split it into 256-by-256 tiles. The `SplitRaster` package generates 16 tiles of 256x256 pixels, automatically padding the edges. You can adjust the tile size and the overlap between tiles for your own applications.
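The tile count in the example above can be checked with a little arithmetic. The sketch below is not the package's code; `tile_grid` is a hypothetical helper that assumes a plain sliding window whose last position is padded out to a full tile. With no overlap it reproduces the 16 tiles quoted here.

```python
import math


def tile_grid(height, width, crop_size, stride=None):
    """Return (rows, cols, padded_height, padded_width) for a sliding-window split.

    Hypothetical helper for illustration only. `stride` defaults to
    `crop_size`, which means no overlap between neighbouring tiles.
    """
    stride = crop_size if stride is None else stride
    # Window positions needed to cover each axis, always at least one.
    rows = max(math.ceil((height - crop_size) / stride), 0) + 1
    cols = max(math.ceil((width - crop_size) / stride), 0) + 1
    # Size the image must be padded to so that the last window fits.
    padded_height = (rows - 1) * stride + crop_size
    padded_width = (cols - 1) * stride + crop_size
    return rows, cols, padded_height, padded_width


rows, cols, padded_h, padded_w = tile_grid(1000, 1000, crop_size=256)
print(rows * cols, (padded_h, padded_w))  # 16 (1024, 1024)
```

How the package derives its stride when tiles overlap is not shown in this tutorial, so treat the `stride` argument as an assumption rather than a description of `splitraster`.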

## Known Issues: OSGEO / GDAL

The osgeo module is part of the GDAL library, which is a translator library for raster and vector geospatial data formats.
You can install GDAL using pip, but it has some system dependencies. On a Mac, you can install these using Homebrew:
```bash
brew install gdal
```

Then install the Python GDAL package:

```bash
pip install GDAL
```
Please note that installing GDAL can be complex due to its system dependencies. If you encounter issues, you may need to consult the GDAL documentation or seek help from the community.
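To confirm that the Python bindings can be imported after installation, a quick check like the one below may help. It is a minimal sketch: `gdal_version` is a hypothetical helper name, and it only catches `ImportError`, so a broken system library could still surface as a different exception.

```python
import importlib


def gdal_version():
    """Return the GDAL release string, or None if the bindings are missing."""
    try:
        # Equivalent to `from osgeo import gdal`, but tolerant of a missing install.
        gdal = importlib.import_module("osgeo.gdal")
    except ImportError:
        return None
    return gdal.VersionInfo("RELEASE_NAME")


version = gdal_version()
if version is None:
    print("GDAL Python bindings not found; see the installation notes above.")
else:
    print(f"GDAL {version} is available.")
```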

## Setup Env with Conda/MiniConda

Set up your local or cloud environment for this demo.

```bash
conda create -n split_raster_py310 python=3.10 -y
conda activate split_raster_py310
conda install gdal -y
conda install ipykernel -y
pip install --upgrade pip
pip install splitraster
```

This demo uses Python 3.10, but the package is compatible with Python 3.7 through 3.12.

In [1]:
# Clean the output folder
!rm -rf ../data/processed/RGB_TIF
!rm -rf ../data/processed/GT_TIF


In [2]:
from splitraster import geo

input_image_path = "../data/raw/TIF/RGB5k.tif"
gt_image_path = "../data/raw/TIF/GT5k.tif"

save_path = "../data/processed/RGB_TIF"
save_path_gt = "../data/processed/GT_TIF"

crop_size = 256
repetition_rate = 0 # <----- change this value to 0.5 for 50% overlap
overwrite = True # <----- change this value to False for no overwrite demo

n = geo.split_image(input_image_path, save_path, crop_size,
                   repetition_rate=repetition_rate, overwrite=overwrite)
print(f"{n} tiles sample of {input_image_path} are added at {save_path}")


n = geo.split_image(gt_image_path, save_path_gt, crop_size,
                   repetition_rate=repetition_rate, overwrite=overwrite)
print(f"{n} tiles sample of {gt_image_path} are added at {save_path_gt}")



Input Image File Shape (D, H, W):(3, 5000, 5000)
crop_size=256, stride=256
Padding Image File Shape (D, H, W):(3, 5120, 5120)


Generating: 100%|██████████| 400/400 [00:00<00:00, 2127.55img/s]


400 tiles sample of ../data/raw/TIF/RGB5k.tif are added at ../data/processed/RGB_TIF
Input Image File Shape (D, H, W):(1, 5000, 5000)
crop_size=256, stride=256
Padding Image File Shape (D, H, W):(1, 5120, 5120)


Generating: 100%|██████████| 400/400 [00:00<00:00, 2673.49img/s]

400 tiles sample of ../data/raw/TIF/GT5k.tif are added at ../data/processed/GT_TIF





In [3]:
!ls ../data/processed/RGB_TIF

0001.tif 0051.tif 0101.tif 0151.tif 0201.tif 0251.tif 0301.tif 0351.tif
0002.tif 0052.tif 0102.tif 0152.tif 0202.tif 0252.tif 0302.tif 0352.tif
0003.tif 0053.tif 0103.tif 0153.tif 0203.tif 0253.tif 0303.tif 0353.tif
0004.tif 0054.tif 0104.tif 0154.tif 0204.tif 0254.tif 0304.tif 0354.tif
0005.tif 0055.tif 0105.tif 0155.tif 0205.tif 0255.tif 0305.tif 0355.tif
0006.tif 0056.tif 0106.tif 0156.tif 0206.tif 0256.tif 0306.tif 0356.tif
0007.tif 0057.tif 0107.tif 0157.tif 0207.tif 0257.tif 0307.tif 0357.tif
0008.tif 0058.tif 0108.tif 0158.tif 0208.tif 0258.tif 0308.tif 0358.tif
0009.tif 0059.tif 0109.tif 0159.tif 0209.tif 0259.tif 0309.tif 0359.tif
0010.tif 0060.tif 0110.tif 0160.tif 0210.tif 0260.tif 0310.tif 0360.tif
0011.tif 0061.tif 0111.tif 0161.tif 0211.tif 0261.tif 0311.tif 0361.tif
0012.tif 0062.tif 0112.tif 0162.tif 0212.tif 0262.tif 0312.tif 0362.tif
0013.tif 0063.tif 0113.tif 0163.tif 0213.tif 0263.tif 0313.tif 0363.tif
0014.tif 0064.tif 0114.tif 0164.tif 0214.tif 0264.tif 0314.tif 0
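Before training on the tiles, it can be worth checking that every RGB tile has a ground-truth tile with the same file name. The sketch below uses only the standard library; `check_tile_pairs` is a hypothetical helper, and it assumes both folders use the same numbering scheme, as the listing above suggests.

```python
from pathlib import Path


def check_tile_pairs(image_dir, label_dir, ext=".tif"):
    """Return the sorted tile names found in both folders.

    Raises ValueError if a tile exists in only one of the two folders.
    """
    image_names = {p.name for p in Path(image_dir).glob(f"*{ext}")}
    label_names = {p.name for p in Path(label_dir).glob(f"*{ext}")}
    # Names that appear in exactly one of the two folders.
    unmatched = image_names ^ label_names
    if unmatched:
        raise ValueError(
            f"{len(unmatched)} tiles have no counterpart, e.g. {sorted(unmatched)[:5]}"
        )
    return sorted(image_names)


# Example with the folders used in this demo:
# names = check_tile_pairs("../data/processed/RGB_TIF", "../data/processed/GT_TIF")
# print(f"{len(names)} matched tile pairs")
```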

## Random Sampling Code

If you want to create a small dataset for early-stage exploration, use the random sampling code. The following code generates 20 randomly cropped image/label pairs (500x500 each, as set by `crop_size`) from the 5000x5000 image.

In [4]:
# Clean the output folder
!rm -rf ../data/processed/Rand/RGB_TIF
!rm -rf ../data/processed/Rand/GT_TIF


In [5]:
from splitraster import geo
input_image_path = "../data/raw/TIF/RGB5k.tif"
gt_image_path = "../data/raw/TIF/GT5k.tif"

input_save_path = "../data/processed/Rand/RGB_TIF"
gt_save_path = "../data/processed/Rand/GT_TIF"

n = geo.random_crop_image(input_image_path, input_save_path,  gt_image_path, gt_save_path, crop_size=500, crop_number=20, img_ext='.png', label_ext='.png', overwrite=True)

print(f"{n} sample pairs of {input_image_path, gt_image_path} are added at {input_save_path, gt_save_path}.")

Generating: 100%|██████████| 20/20 [00:00<00:00, 692.43img/s]

20 sample pairs of ('../data/raw/TIF/RGB5k.tif', '../data/raw/TIF/GT5k.tif') are added at ('../data/processed/Rand/RGB_TIF', '../data/processed/Rand/GT_TIF').





In [6]:
!ls ../data/processed/Rand/RGB_TIF

0001.png 0004.png 0007.png 0010.png 0013.png 0016.png 0019.png
0002.png 0005.png 0008.png 0011.png 0014.png 0017.png 0020.png
0003.png 0006.png 0009.png 0012.png 0015.png 0018.png
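The idea behind random sampling can be illustrated without the package. The sketch below is not how `geo.random_crop_image` is implemented; `random_crop_windows` is a hypothetical helper that draws crop windows lying fully inside the image. Applying the same window to the image and to its ground truth is what would keep each sample pair aligned.

```python
import random


def random_crop_windows(height, width, crop_size, crop_number, seed=None):
    """Return `crop_number` windows as (top, left, bottom, right) tuples.

    Hypothetical helper for illustration. Every window fits inside the
    image, so no padding is needed.
    """
    if crop_size > height or crop_size > width:
        raise ValueError("crop_size must not exceed the image size")
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    windows = []
    for _ in range(crop_number):
        top = rng.randint(0, height - crop_size)
        left = rng.randint(0, width - crop_size)
        windows.append((top, left, top + crop_size, left + crop_size))
    return windows


windows = random_crop_windows(5000, 5000, crop_size=500, crop_number=20, seed=0)
print(len(windows))  # 20
```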


In [8]:
# print the current time
from datetime import datetime
print(f"Latest run time {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

Latest run time 2024-03-23 16:59:37


---