# 04. PyTorch Custom Datasets

- A custom dataset is a collection of data relating to(관련된) a specific problem you're working on.
- a custom dataset can be comprised of(구성될 수 있다) almost anything.
- 어떤걸 만들려고 한다면 그거에 맞는 커스텀 데이터셋이 필요하다다
- PyTorch includes many existing functions to load in various custom datasets in the TorchVision, TorchText, TorchAudio and TorchRec domain libraries.



## 0. Importing PyTorch and setting up device-agnostic(기기 독립적) code

In [None]:
import torch
from torch import nn  # 뉴럴 네트워크 신경망

torch.__version__

'2.7.1+cu126'

In [2]:
# Setup device-agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"


## 1. Get data

In [None]:
import requests  #요청하다, 요구하다 #웹사이트에서 데이터(파일, 텍스트)를 가지고 오는데 유용용
import zipfile
from pathlib import Path

# Setup path to data folder
data_path = Path("data/'")
image_path = data_path / "pizza_steak_sushi"   #경로 만들어 놓기기

# Setup path to data folder
if image_path.is_dir():   #디렉토리가 있냐?
    print(f"{image_path} directory exists")
else:
    print(f"Did not find {image_path} directory, creating one....")
    image_path.mkdir(parents = True, exist_ok = True) #디렉토리 만들기
    
    # Download pizza, steak, sushi data
    with open(data_path / "pizza_steak_sushi.zip", "wb") as f:  #파일을 새로 만들기 위한 준비 작업 # 아직 pizza_steak_sushi.zip은 없어
                                                                #"wb" write binary 이진 모드로 쓰기(zip,이미지, 영상을 작성 시), "빈 파일" 열어두기
        request = requests.get("https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip") # get 함수로 주소의 데이터 요청하기
        print("Downloading pizza,steak,sushi data...")                                                                   #GitHub에 있는 .zip 파일을 "GET 요청"으로 받아온다
        f.write(request.content)  #request.content 안에 "pizza_steak_sushi.zip" 이 있다.
        # as f 열린 파일 객체에 이름 부여하기
        
    # Unzip pizza, steak, sushi data # 압축된 덩어리를 풀기 위해 zipfile.ZipFile를 사용
    with zipfile.ZipFile(data_path / "pizza_steak_sushi.zip", 'r') as zip_ref:
        print("Unzipping pizza, steak, sushi data...")
        zip_ref.extractall(image_path)   #아카이브(보관소)에 저장된 모든 member를 가지고 온다는데? #zip 안에 있는 모든 파일을 한꺼번에 불러와줌 


Did not find data\'\pizza_steak_sushi directory, creating one....
Downloading pizza,steak,sushi data...
Unzipping pizza, steak, sushi data...


## 2. 데이터 준비

In [8]:
## 하위디렉토리의 수, 하위디렉토리의 이미지 수, 각 하위 디렉토리의 이름 walk하기기

import os 
def walk_through_dir(dir_path):
    """
  Walks through dir_path returning its contents.
  Args:
    dir_path (str or pathlib.Path): target directory
  
  Returns:
    A print out of:
      number of subdiretories in dir_path
      number of images (files) in each subdirectory
      name of each subdirectory
  """
    for dirpath, dirnames, filenames in os.walk(dir_path):
        print(f"There are {len(dirnames)} directories and {len(filenames)} images in '{dirpath}'.")
     

In [None]:
walk_through_dir(image_path) # image_path 로 했으니  zip 데이터는 불러오지 않는다.

There are 2 directories and 0 images in 'data\'\pizza_steak_sushi'.
There are 3 directories and 0 images in 'data\'\pizza_steak_sushi\test'.
There are 0 directories and 25 images in 'data\'\pizza_steak_sushi\test\pizza'.
There are 0 directories and 19 images in 'data\'\pizza_steak_sushi\test\steak'.
There are 0 directories and 31 images in 'data\'\pizza_steak_sushi\test\sushi'.
There are 3 directories and 0 images in 'data\'\pizza_steak_sushi\train'.
There are 0 directories and 78 images in 'data\'\pizza_steak_sushi\train\pizza'.
There are 0 directories and 75 images in 'data\'\pizza_steak_sushi\train\steak'.
There are 0 directories and 72 images in 'data\'\pizza_steak_sushi\train\sushi'.


In [10]:
# Setup train and testing paths
train_dir = image_path / "train"
test_dir = image_path / "test"

train_dir, test_dir

(WindowsPath("data/'/pizza_steak_sushi/train"),
 WindowsPath("data/'/pizza_steak_sushi/test"))

### 2.1 Visualize an image