# ImageNet Dataset Setup Notebook

This notebook handles the initial setup and preparation of the ImageNet (ILSVRC2012) dataset for machine learning tasks. It covers the following steps:

1. **Downloading the Dataset**: Downloads the ImageNet training and validation archives from image-net.org.

2. **Extracting the Dataset**: Extracts the downloaded tar archives into a directory structure suitable for machine learning models. This includes creating a separate directory for each class in the training set.

3. **File Path Extraction**: Walks the extracted directories and compiles a list of file paths for all images. This list supports efficient data loading during model training.

4. **Saving File Paths**: Saves the list of image file paths to a file on Google Drive so that future sessions and other notebooks, particularly those for model training and validation, can access the dataset quickly.

The goal is to take care of the data handling for the large-scale ImageNet dataset here, so that later stages of the project, such as model training and evaluation, can load the data directly.
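
Steps 3 and 4 can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual implementation: the function names `collect_image_paths`, `save_paths`, and `load_paths`, the extension filter, and the one-path-per-line text format are all assumptions made for the example.

```python
import os

# Assumed set of image extensions; the comparison is case-insensitive.
IMAGE_EXTENSIONS = (".jpeg", ".jpg", ".png")


def collect_image_paths(root_dir):
    """Walk root_dir recursively and return a sorted list of image file paths."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            if name.lower().endswith(IMAGE_EXTENSIONS):
                paths.append(os.path.join(dirpath, name))
    return sorted(paths)


def save_paths(paths, out_file):
    """Write one path per line so the list can be reloaded in a later session."""
    with open(out_file, "w") as f:
        f.write("\n".join(paths))


def load_paths(in_file):
    """Read back a list previously written by save_paths."""
    with open(in_file) as f:
        return f.read().splitlines()
```

In a Colab session, `root_dir` would point at the extracted training or validation directory and `out_file` at a location under the mounted Google Drive; the exact locations depend on where the extraction step writes its output.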


In [None]:
from google.colab import drive
# Mount Google Drive; Colab will prompt you to authorize access
drive.mount('/content/drive')

In [None]:
import os

# Directory on Google Drive where the downloaded tar archives are stored
target_dir = '/content/drive/MyDrive/AnomalyDetection/Datasets/ImageNet/TrainValTar'
os.makedirs(target_dir, exist_ok=True)

In [None]:
# Download the training archive (-c resumes an interrupted download, -P sets the output directory)
!wget -c https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_train.tar -P {target_dir}

In [None]:
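# Download the validation archive (-c resumes an interrupted download, -P sets the output directory)
!wget -c https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar -P {target_dir}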
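
Because the archives are large and the downloads may be resumed several times, it can be worth checking each file before extracting it. The sketch below is one possible way to do that and is not part of the original notebook: `md5_of_file` is a hypothetical helper, and the expected checksum has to come from the dataset provider, so none is hard-coded here.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing the returned digest with the published value for each archive would reveal a truncated or corrupted download before the extraction step runs.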

In [None]:
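# Switch to the scripts/ folder (relative to the current working directory)
%cd scripts/
# Make the extraction script executable, then run it
!chmod +x extract_imagenet.sh
!./extract_imagenet.sh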
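
The contents of `extract_imagenet.sh` are not shown here. As a rough Python illustration of the per-class extraction described in step 2, the sketch below assumes the training archive is a tar file containing one inner tar per class, each named after its class ID. The function name `extract_train_archive` and its arguments are hypothetical, and the script itself may work differently.

```python
import os
import shutil
import tarfile


def extract_train_archive(train_tar, out_dir):
    """Extract a tar of per-class tars into out_dir/<class_id>/ directories."""
    os.makedirs(out_dir, exist_ok=True)
    with tarfile.open(train_tar) as outer:
        for member in outer:
            # Only the inner per-class archives are of interest.
            if not member.isfile() or not member.name.endswith(".tar"):
                continue
            class_id = os.path.splitext(os.path.basename(member.name))[0]
            class_dir = os.path.join(out_dir, class_id)
            os.makedirs(class_dir, exist_ok=True)
            # Open the inner tar directly from the outer archive,
            # without writing it to disk first.
            with tarfile.open(fileobj=outer.extractfile(member)) as inner:
                for image in inner:
                    if not image.isfile():
                        continue
                    # Keep only the file name to avoid writing outside class_dir.
                    target = os.path.join(class_dir, os.path.basename(image.name))
                    with inner.extractfile(image) as src, open(target, "wb") as dst:
                        shutil.copyfileobj(src, dst)
```

Reading the inner archives in place avoids storing a second copy of every class tar, which matters when the working disk is small.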