
# Useful Goods
Here is a collection of snippets that will come in handy sooner or later. Topics include file processing, dataset creation, and automating repetitive processing steps.

## 01. Download a zip file and extract it to a designated directory
You can mount your own Google Drive to read the data, but you can also download a zip file directly from a URL and extract it without going through Drive.

```python
from io import BytesIO
from urllib.request import urlopen
from zipfile import ZipFile

# Download the archive into memory and extract all of its members into ./data
zipurl = 'https://puchan-hci.github.io/myweb/DeepLearning-with-Pytorch/data/00/sample.zip'
with urlopen(zipurl) as zipresp:
    with ZipFile(BytesIO(zipresp.read())) as zfile:
        zfile.extractall('./data')
```

After a successful download and extraction, the extracted files appear in the file browser on the left. <br>
<img src="https://puchan-hci.github.io/myweb/DeepLearning-with-Pytorch/img/00_1.jpg" width="30%">
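The download-and-extract step can be wrapped in a small reusable function. This is a minimal sketch: `extract_zip_bytes` is a hypothetical helper name, it takes the raw archive bytes (in the cell above, that would be `zipresp.read()`), and the demo builds a tiny archive in memory with `ZipFile.writestr` so it runs without network access.

```python
import os
import tempfile
from io import BytesIO
from zipfile import ZipFile


def extract_zip_bytes(data: bytes, dest_dir: str) -> list[str]:
    """Extract an in-memory zip archive into dest_dir and return its member names."""
    os.makedirs(dest_dir, exist_ok=True)
    with ZipFile(BytesIO(data)) as zfile:
        names = zfile.namelist()
        zfile.extractall(dest_dir)  # recreates the archive's folder structure
    return names


# Offline demo: build a tiny zip in memory instead of downloading one.
buffer = BytesIO()
with ZipFile(buffer, "w") as zf:
    zf.writestr("images/cat.jpg", b"fake jpeg bytes")

out_dir = tempfile.mkdtemp()
members = extract_zip_bytes(buffer.getvalue(), out_dir)
print(members)  # ['images/cat.jpg']
print(os.path.isfile(os.path.join(out_dir, "images", "cat.jpg")))  # True
```

Passing bytes rather than a URL keeps the helper independent of how the archive was obtained (HTTP, Google Drive, or a local file).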

## 02. Create a new folder and its subfolder

```python
import os
import shutil

# Create a destination folder and one subfolder inside it.
# The parent folder is reused if it already exists, but an existing subfolder
# stops execution so that earlier results are not mixed with new copies.
def check_dir(new_dirpath, new_sub_folder):

    # Check whether the parent destination folder exists; create it if not
    if os.path.isdir(new_dirpath):
        print(new_dirpath + " already exists")
    else:
        os.mkdir(new_dirpath)
        print(new_dirpath + " created")

    # Check whether the subfolder exists; stop if it does, create it otherwise
    new_sub_dirpath = os.path.join(new_dirpath, new_sub_folder)
    if os.path.isdir(new_sub_dirpath):
        # Raise an exception (rather than calling exit()) so the cell stops with a clear error
        raise FileExistsError(new_sub_dirpath + " already exists")
    else:
        os.mkdir(new_sub_dirpath)

    return new_sub_dirpath
```
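`check_dir` creates the parent and the subfolder with two separate `os.mkdir` calls. A more compact variant is sketched below, assuming the same "refuse to reuse an existing subfolder" behavior is wanted: `os.makedirs` creates any missing parents in one call, and `exist_ok=False` makes it raise `FileExistsError` only when the final subfolder already exists. The name `make_split_dir` is hypothetical.

```python
import os
import tempfile


def make_split_dir(parent: str, subfolder: str) -> str:
    """Create parent/subfolder, creating parent if needed; fail if the subfolder exists."""
    path = os.path.join(parent, subfolder)
    # makedirs creates missing intermediate folders; with exist_ok=False it raises
    # FileExistsError if the leaf folder is already there.
    os.makedirs(path, exist_ok=False)
    return path


root = tempfile.mkdtemp()
dataset = os.path.join(root, "animal")
val_dir = make_split_dir(dataset, "val")      # creates animal/ and animal/val
train_dir = make_split_dir(dataset, "train")  # animal/ already exists, which is fine
print(sorted(os.listdir(dataset)))  # ['train', 'val']

# Asking for val a second time is refused.
try:
    make_split_dir(dataset, "val")
    reuse_refused = False
except FileExistsError:
    reuse_refused = True
print(reuse_refused)  # True
```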

## 03. Recursively search a folder and copy every file with a given extension

```python
import os
import shutil

# Recursively search the specified folder and copy every file whose name ends
# with the given extension into new_sub_dirpath (the folder structure is flattened)
def copy_files(check_dirpath, check_file, new_sub_dirpath):

    # Stop if the source folder does not exist
    if not os.path.isdir(check_dirpath):
        raise FileNotFoundError("not found " + check_dirpath)

    # Recursively walk through the folder tree
    for dirpath, dirnames, filenames in os.walk(check_dirpath):
        for file in filenames:
            filepath = os.path.join(dirpath, file)
            # Check the file extension and process the file if it matches
            if file.endswith(check_file):
                # Copy the file (copy2 also preserves metadata such as timestamps)
                shutil.copy2(filepath, new_sub_dirpath)
                new_file_path = os.path.join(new_sub_dirpath, file)
                print("copied -> ", new_file_path)
```
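`copy_files` walks the tree with `os.walk` and matches names with `str.endswith`. The same idea can be sketched with `pathlib`, where `Path.rglob("*.jpg")` does the recursive search. Two assumptions differ slightly from the original: the pattern requires a literal dot before the extension (so a name like `notajpg` is not matched, whereas `endswith('jpg')` would match it), and the destination folder is created if missing. As in the original, matching is case-sensitive on Linux/Colab and the copy is flat, so files with the same name in different subfolders overwrite each other. `copy_by_extension` is a hypothetical name.

```python
import shutil
import tempfile
from pathlib import Path


def copy_by_extension(src_dir: str, ext: str, dest_dir: str) -> list[Path]:
    """Recursively copy every file under src_dir ending in .<ext> into dest_dir (flat)."""
    src, dest = Path(src_dir), Path(dest_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"not found {src}")
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    # Materialize the search results first, then copy each matching file.
    for path in sorted(src.rglob(f"*.{ext}")):
        if path.is_file():
            target = dest / path.name
            shutil.copy2(path, target)  # copy2 also preserves timestamps/metadata
            copied.append(target)
    return copied


# Demo on a throwaway folder tree.
src = Path(tempfile.mkdtemp())
(src / "cats").mkdir()
(src / "cats" / "a.jpg").write_bytes(b"x")
(src / "b.jpg").write_bytes(b"y")
(src / "notes.txt").write_text("skip me")

dest = Path(tempfile.mkdtemp()) / "val"
copied = copy_by_extension(str(src), "jpg", str(dest))
print(sorted(p.name for p in copied))  # ['a.jpg', 'b.jpg']
```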

### Let's run it.
<img src="https://puchan-hci.github.io/myweb/DeepLearning-with-Pytorch/img/00_2.jpg" width="30%">

```python
# Target image extension and the folder to search
image_data = 'jpg'
image_folder = './data'

# New dataset folder and its subfolders
new_path = "./animal"
val_folder = "val"
train_folder = "train"

# Create the destination folders (stops with an error if a subfolder already exists)
val_dirpath = check_dir(new_path, val_folder)
train_dirpath = check_dir(new_path, train_folder)
print(val_dirpath)
print(train_dirpath)

# Recursively search image_folder and copy every .jpg file into both subfolders
copy_files(image_folder, image_data, val_dirpath)
copy_files(image_folder, image_data, train_dirpath)
```


## 04. Randomly split the images in a folder into two halves and copy each half to a different folder

```python
import glob
import os
import random
import shutil

# Glob pattern for the files to search (top level of ./data only, not recursive)
folder_path = './data/*.jpg'

# Get a list of the .jpg files in the folder
file_list = glob.glob(folder_path)

# Randomly shuffle the file list and split it in half
# (with an odd count, the second half gets the extra file)
random.shuffle(file_list)
half_point = len(file_list) // 2
first_half = file_list[:half_point]
second_half = file_list[half_point:]

# Copy each half into a separate folder
new_path = 'dataset'
destination_folder1 = 'val'
destination_folder2 = 'train'
val_dirpath = check_dir(new_path, destination_folder1)
train_dirpath = check_dir(new_path, destination_folder2)

for selected_file in first_half:
    selected_filename = os.path.basename(selected_file)
    print('First half: ', selected_filename)
    shutil.copy(selected_file, os.path.join(val_dirpath, selected_filename))

for selected_file in second_half:
    selected_filename = os.path.basename(selected_file)
    print('Second half: ', selected_filename)
    shutil.copy(selected_file, os.path.join(train_dirpath, selected_filename))
```
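Because `random.shuffle` is called without a seed, the split changes every time the cell runs. If a reproducible split is preferred, the shuffle-and-halve step can be isolated in a function that uses its own seeded generator. This is a sketch under that assumption: `split_in_half` and the file names in the demo are hypothetical, and it shuffles a copy so the caller's list is left untouched.

```python
import random


def split_in_half(items, seed=None):
    """Shuffle a copy of items with a local RNG and split it at len // 2."""
    shuffled = list(items)      # do not modify the caller's list
    rng = random.Random(seed)   # local generator: seeding it leaves the global RNG alone
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


files = [f"./data/img{i}.jpg" for i in range(5)]  # hypothetical file names
val_files, train_files = split_in_half(files, seed=42)
print(len(val_files), len(train_files))  # 2 3
```

As in the cell above, an odd number of files puts the extra one in the second half; calling the function again with the same seed returns the same split.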

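```python
# Clean up: list the working directory, then delete the generated folders.
# Lines starting with "!" are shell commands run by Jupyter/Colab.
!ls
!rm -rf  animal dataset
```

Output of `!ls` (before the folders are removed):

```
animal	data  dataset  sample_data
```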
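The `!ls` and `!rm -rf` lines are shell escapes, so they only work inside Jupyter/Colab on a Unix-like system. A portable pure-Python cleanup can be sketched with `shutil.rmtree`; `remove_dirs` is a hypothetical helper that skips paths that do not exist (similar to `rm -rf`) and reports what it deleted.

```python
import os
import shutil
import tempfile


def remove_dirs(*paths):
    """Delete each directory tree that exists and return the paths actually removed."""
    removed = []
    for path in paths:
        if os.path.isdir(path):
            shutil.rmtree(path)
            removed.append(path)
    return removed


# Demo in a throwaway working folder.
work = tempfile.mkdtemp()
for name in ("animal", "data", "dataset"):
    os.makedirs(os.path.join(work, name))

removed = remove_dirs(os.path.join(work, "animal"), os.path.join(work, "dataset"))
print(sorted(os.listdir(work)))  # ['data']
```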
