# Unpack the SEN12MSCR Dataset in the required format

Standalone notebook that downloads the SEN12MSCR dataset and reformats it for the DSen2-CR model. Running the notebook in full downloads, extracts, and reorganizes the dataset as required. For ease and speed, however, it may be better to run the commands under "Get dataset from online" and "Decompress the dataset into current folder" in a terminal instead.

## Setup

Set the kernel of this notebook to the dsen2cr_env conda environment:

Open a terminal with the dsen2cr_env environment activated and run:
```conda install -c anaconda ipykernel```\
```python -m ipykernel install --user --name dsen2cr_env```

Set the kernel to dsen2cr_env using the button in the top right corner.

In [5]:
#Please update this folder. It is the folder that the data will be extracted to. It should be the same folder set as the input_data_folder variable in dsen2cr_main.py
input_data_folder = '/path/to/input_data_folder'

## Get dataset from online.
Note: these commands may finish significantly sooner if run in parallel, each in a separate terminal at the same time.

In [None]:
!wget --no-check-certificate -O ROIs1158_spring_s1.tar.gz "https://dataserv.ub.tum.de/s/m1554803/download?path=%2F&files=ROIs1158_spring_s1.tar.gz" 

In [None]:
!wget --no-check-certificate -O ROIs1158_spring_s2_cloudy.tar.gz "https://dataserv.ub.tum.de/s/m1554803/download?path=%2F&files=ROIs1158_spring_s2_cloudy.tar.gz" 

In [None]:
!wget --no-check-certificate -O ROIs1158_spring_s2.tar.gz "https://dataserv.ub.tum.de/s/m1554803/download?path=%2F&files=ROIs1158_spring_s2.tar.gz" 

In [None]:
!wget --no-check-certificate -O ROIs1868_summer_s1.tar.gz "https://dataserv.ub.tum.de/s/m1554803/download?path=%2F&files=ROIs1868_summer_s1.tar.gz" 

In [None]:
!wget --no-check-certificate -O ROIs1868_summer_s2_cloudy.tar.gz "https://dataserv.ub.tum.de/s/m1554803/download?path=%2F&files=ROIs1868_summer_s2_cloudy.tar.gz" 

In [None]:
!wget --no-check-certificate -O ROIs1868_summer_s2.tar.gz "https://dataserv.ub.tum.de/s/m1554803/download?path=%2F&files=ROIs1868_summer_s2.tar.gz" 

In [None]:
!wget --no-check-certificate -O ROIs1970_fall_s1.tar.gz "https://dataserv.ub.tum.de/s/m1554803/download?path=%2F&files=ROIs1970_fall_s1.tar.gz" 

In [None]:
!wget --no-check-certificate -O ROIs1970_fall_s2_cloudy.tar.gz "https://dataserv.ub.tum.de/s/m1554803/download?path=%2F&files=ROIs1970_fall_s2_cloudy.tar.gz" 

In [None]:
!wget --no-check-certificate -O ROIs1970_fall_s2.tar.gz "https://dataserv.ub.tum.de/s/m1554803/download?path=%2F&files=ROIs1970_fall_s2.tar.gz" 

In [None]:
!wget --no-check-certificate -O ROIs2017_winter_s1.tar.gz "https://dataserv.ub.tum.de/s/m1554803/download?path=%2F&files=ROIs2017_winter_s1.tar.gz" 

In [None]:
!wget --no-check-certificate -O ROIs2017_winter_s2_cloudy.tar.gz "https://dataserv.ub.tum.de/s/m1554803/download?path=%2F&files=ROIs2017_winter_s2_cloudy.tar.gz" 

In [None]:
!wget --no-check-certificate -O ROIs2017_winter_s2.tar.gz "https://dataserv.ub.tum.de/s/m1554803/download?path=%2F&files=ROIs2017_winter_s2.tar.gz"
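
The twelve cells above differ only in season and data type, so the same downloads could be driven from a short loop. The following is a minimal sketch and not part of the original notebook: the helper names `archive_names`, `archive_url`, `download_one`, and `download_all` are hypothetical, the default of four workers is an assumption, and `wget` is assumed to be on the PATH. It runs several downloads at once, in the spirit of the note about parallel terminals.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# URL pattern copied from the wget cells above
BASE_URL = "https://dataserv.ub.tum.de/s/m1554803/download?path=%2F&files="
SEASONS = ["ROIs1158_spring", "ROIs1868_summer", "ROIs1970_fall", "ROIs2017_winter"]
DATA_TYPES = ["s1", "s2_cloudy", "s2"]


def archive_names():
    """Return the twelve archive filenames, e.g. ROIs1158_spring_s1.tar.gz."""
    return [f"{season}_{data_type}.tar.gz" for season in SEASONS for data_type in DATA_TYPES]


def archive_url(name):
    """Build the download URL for one archive (hypothetical helper)."""
    return BASE_URL + name


def download_one(name):
    """Run the same wget command as the cells above for a single archive."""
    subprocess.run(
        ["wget", "--no-check-certificate", "-O", name, archive_url(name)],
        check=True,  # raise if wget exits with an error
    )
    return name


def download_all(max_workers=4):
    """Download every archive, up to max_workers at a time (assumed default)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(download_one, archive_names()))
```

Nothing is fetched until `download_all()` is called, so the cell can be run safely to inspect `archive_names()` first.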

## Decompress the dataset into current folder
Add v to the options to see verbose output from the tar command (!tar -xzvf instead of !tar -xzf).\
Note: these commands may finish significantly sooner if run in parallel, each in a separate terminal at the same time.

In [None]:
!tar -xzf ROIs1158_spring_s1.tar.gz

In [None]:
!tar -xzf ROIs1158_spring_s2_cloudy.tar.gz

In [None]:
!tar -xzf ROIs1158_spring_s2.tar.gz

In [None]:
!tar -xzf ROIs1868_summer_s1.tar.gz

In [None]:
!tar -xzf ROIs1868_summer_s2_cloudy.tar.gz

In [None]:
!tar -xzf ROIs1868_summer_s2.tar.gz

In [None]:
!tar -xzf ROIs1970_fall_s1.tar.gz

In [None]:
!tar -xzf ROIs1970_fall_s2_cloudy.tar.gz

In [None]:
!tar -xzf ROIs1970_fall_s2.tar.gz

In [None]:
!tar -xzf ROIs2017_winter_s1.tar.gz

In [None]:
!tar -xzf ROIs2017_winter_s2_cloudy.tar.gz

In [None]:
!tar -xzf ROIs2017_winter_s2.tar.gz
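
The extraction cells can likewise be collapsed into one loop. Below is a minimal sketch that uses Python's standard `tarfile` module in place of the `tar` command. `extract_archives` is a hypothetical helper, not something the original notebook defines, and it extracts into the current folder unless another destination is given.

```python
import os
import tarfile


def extract_archives(archive_paths, destination="."):
    """Extract each .tar.gz archive into destination, skipping missing files.

    Returns the list of archives that were actually extracted.
    """
    extracted = []
    for path in archive_paths:
        if not os.path.isfile(path):
            print(f"skipping {path}: not found")
            continue
        # "r:gz" reads a gzip-compressed tar, like `tar -xzf`
        with tarfile.open(path, "r:gz") as archive:
            archive.extractall(destination)
        extracted.append(path)
    return extracted
```

For example, `extract_archives(["ROIs1158_spring_s1.tar.gz"])` would unpack that one archive next to the notebook, and any archive that has not been downloaded yet is reported and skipped.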

## Create new folders to move data into

In [None]:
import os
import shutil

# One destination folder per data type
s1_folder = os.path.join(input_data_folder, "s1")
s2_cloudy_folder = os.path.join(input_data_folder, "s2_cloudy")
s2_cloudFree_folder = os.path.join(input_data_folder, "s2_cloudFree")

# exist_ok=True makes this cell safe to re-run, like mkdir -p
for folder in (s1_folder, s2_cloudy_folder, s2_cloudFree_folder):
    os.makedirs(folder, exist_ok=True)

## Rename and move all data into the input_data_folder

In [None]:
import os
import shutil

s1_data = ["ROIs1158_spring_s1", "ROIs1868_summer_s1", "ROIs1970_fall_s1", "ROIs2017_winter_s1"]

#unpack all files within the folders in the s1_data list
for season_sentinel_folder in s1_data:
    for sentinel_ROI_folder in os.listdir(season_sentinel_folder):
        for filename in os.listdir(os.path.join(season_sentinel_folder,sentinel_ROI_folder)):
            if(filename != ".ipynb_checkpoints"):
                #extract information from filename
                season_id, season, sentinel, ROI, patch = filename.split("_")
                patch = patch[1:]
                #create new filename
                new_filename = f"{season_id}_{season}_{ROI}_p{patch}"
                #rename from old filename to new filename, and put into correct folder
                to_rename = os.path.join(season_sentinel_folder, sentinel_ROI_folder, filename)
                new_name =  os.path.join(input_data_folder, "s1",new_filename) #CHECK THIS
                shutil.move(to_rename, new_name)

In [None]:
s2_cloudy_data = ["ROIs1158_spring_s2_cloudy", "ROIs1868_summer_s2_cloudy", "ROIs1970_fall_s2_cloudy", "ROIs2017_winter_s2_cloudy"]

#unpack all files within the folders in the s2_cloudy_data list
for season_sentinel_folder in s2_cloudy_data:
    for sentinel_ROI_folder in os.listdir(season_sentinel_folder):
        for filename in os.listdir(os.path.join(season_sentinel_folder, sentinel_ROI_folder)):
            if(filename != ".ipynb_checkpoints"):
                #extract information from filename
                season_id, season, sentinel1, sentinel2, ROI, patch = filename.split("_")
                patch = patch[1:]
                #create new filename
                new_filename = f"{season_id}_{season}_{ROI}_p{patch}"
                #rename from old filename to new filename, and put into correct folder
                to_rename = os.path.join(season_sentinel_folder, sentinel_ROI_folder, filename)
                new_name =  os.path.join(input_data_folder, "s2_cloudy", new_filename) #CHECK THIS
                shutil.move(to_rename, new_name)

In [None]:
s2_data = ["ROIs1158_spring_s2", "ROIs1868_summer_s2", "ROIs1970_fall_s2", "ROIs2017_winter_s2"]

#unpack all files within the folders in the s2_data list
for season_sentinel_folder in s2_data:
    for sentinel_ROI_folder in os.listdir(season_sentinel_folder):
        for filename in os.listdir(os.path.join(season_sentinel_folder, sentinel_ROI_folder)):
            if(filename != ".ipynb_checkpoints"):
                #extract information from filename
                season_id, season, sentinel, ROI, patch = filename.split("_")
                patch = patch[1:]
                #create new filename
                new_filename = f"{season_id}_{season}_{ROI}_p{patch}"
                #rename from old filename to new filename, and put into correct folder
                to_rename = os.path.join(season_sentinel_folder,sentinel_ROI_folder, filename)
                new_name =  os.path.join(input_data_folder, "s2_cloudFree", new_filename) #CHECK THIS
                shutil.move(to_rename, new_name)
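
The three cells above apply one renaming rule: drop the sensor fields (`s1`, `s2`, or `s2_cloudy`) and keep the season ID, season, ROI, and patch. As a worked example, here is a minimal sketch of that rule as a single function. `new_patch_name` is a hypothetical helper, and the sample filenames are assumed from the `split("_")` patterns in the cells above rather than taken from the dataset itself.

```python
def new_patch_name(filename):
    """Map an original patch filename to its renamed form.

    The sensor fields in the middle (one for s1/s2, two for s2_cloudy)
    are dropped; the first two fields and the last two are kept.
    """
    parts = filename.split("_")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected filename: {filename}")
    season_id, season = parts[0], parts[1]
    roi, patch = parts[-2], parts[-1]
    # Same construction as the cells above: strip the first character
    # of the patch field, then prefix it with "p"
    return f"{season_id}_{season}_{roi}_p{patch[1:]}"


print(new_patch_name("ROIs1158_spring_s1_1_p30.tif"))
print(new_patch_name("ROIs1158_spring_s2_cloudy_1_p30.tif"))
```

Both sample names map to `ROIs1158_spring_1_p30.tif`, so the same patch ends up with one name in all three destination folders.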

## Ensure all files have been downloaded, extracted, and moved
Get the number of files in the s1, s2_cloudy, and s2_cloudFree folders

In [None]:
import os

def count_files(folder):
    #count the visible entries in a folder, like `ls folder | wc -l`
    return len([name for name in os.listdir(folder) if not name.startswith(".")])

len_s1 = count_files(s1_folder)
len_s2_cloudy = count_files(s2_cloudy_folder)
len_s2_cloudFree = count_files(s2_cloudFree_folder)

print("Number of s1, s2_cloudy, and s2_cloudFree files: ", len_s1, " ", len_s2_cloudy, " ", len_s2_cloudFree)

Check whether each folder contains the expected number of files (122218)

In [None]:
if (len_s1 != 122218 or len_s2_cloudy != 122218 or len_s2_cloudFree != 122218):
    print("errors with downloading, extracting, and moving data")
else:
    print("Data Processed successfully! You may remove all of the generated files and folders.")
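
Equal counts do not by themselves show that the same patches are present in each folder. Because the renaming step gives a patch the same filename in `s1`, `s2_cloudy`, and `s2_cloudFree`, a stricter check could compare the filenames directly. The following is a minimal sketch of such a check; `find_unmatched` is a hypothetical helper and not part of the original notebook.

```python
import os


def find_unmatched(folders):
    """Return, per folder, the filenames present elsewhere but missing here."""
    listings = {
        path: {name for name in os.listdir(path) if not name.startswith(".")}
        for path in folders
    }
    # Every filename seen in at least one folder
    all_names = set().union(*listings.values())
    return {path: sorted(all_names - names) for path, names in listings.items()}
```

Calling `find_unmatched([s1_folder, s2_cloudy_folder, s2_cloudFree_folder])` would return an empty list for each folder when every patch is present in all three.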