In [11]:
import os
import subprocess


# TACO Dataset Import and Preparation

This notebook automates the setup of the TACO dataset. It includes steps to:
1. Clone the official TACO GitHub repository.
2. Install the necessary Python dependencies.
3. Install the COCO API.
4. Download the dataset images.
5. Confirm that the download succeeded.

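TACO (Trash Annotations in Context) is an open image dataset of litter, annotated in COCO format and commonly used for trash classification, object detection, and segmentation tasks. Running the cells in order leaves a local copy of the repository, its dependencies, and the images ready for subsequent analysis and modeling.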


## Step 1: Clone the TACO GitHub Repository

To use the TACO dataset, the repository must first be cloned from GitHub. This step uses the `git` command to create a local copy of the repository.

The clone is skipped if a `TACO` directory already exists. If `git` is not installed or not accessible on the system path, the cell prints an error message asking you to install Git.
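
Before running the steps, it can help to check that the required tools are present. The following is a minimal sketch, not part of the TACO repository: `check_prerequisites` is a hypothetical helper that looks for `git` on the system path and checks that `pip` is importable by the current interpreter.

```python
import importlib.util
import shutil


def check_prerequisites(commands=("git",), modules=("pip",)):
    """Return a dict mapping each prerequisite name to True if it is available."""
    status = {}
    for command in commands:
        # shutil.which returns the executable's full path, or None if it is not on PATH
        status[command] = shutil.which(command) is not None
    for module in modules:
        # find_spec returns None when the module cannot be imported by this interpreter
        status[module] = importlib.util.find_spec(module) is not None
    return status


for name, available in check_prerequisites().items():
    print(f"{name}: {'found' if available else 'MISSING'}")
```

Anything reported as missing should be installed before continuing with the cells below.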


In [12]:
import shutil

# Step 1: Clone the TACO GitHub repository
repo_url = "https://github.com/pedropro/TACO.git"
if not os.path.exists("TACO"):
    try:
        # Use "git" from the system path; fall back to a full path on Windows if it is not found
        git_command = "git"
        if os.name == 'nt' and shutil.which("git") is None:
            # Assumed default Git for Windows location; update this path if Git is installed elsewhere
            git_command = r"C:\Program Files\Git\cmd\git.exe"

        result = subprocess.run([git_command, "clone", repo_url], check=True, capture_output=True, text=True)
        # git writes its clone progress to stderr, so show both streams
        print(result.stdout + result.stderr)
    except FileNotFoundError:
        print("Error: Git is not installed or not found in the system path. Please install Git and ensure it is accessible.")
    except subprocess.CalledProcessError as e:
        print("Error during Git clone:", e.stderr)
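
The error handling in the cell above (a missing executable versus a command that runs but fails) recurs in the later steps. As a sketch only, it could be factored into a small helper; `run_step` is a hypothetical name and is not used elsewhere in this notebook.

```python
import subprocess
import sys


def run_step(command, description):
    """Run a command and return True on success, printing a short diagnosis otherwise."""
    try:
        subprocess.run(command, check=True, capture_output=True, text=True)
    except FileNotFoundError:
        # The executable itself could not be found
        print(f"{description}: '{command[0]}' was not found on the system path.")
        return False
    except subprocess.CalledProcessError as e:
        # The command ran but exited with a non-zero status
        print(f"{description} failed (exit code {e.returncode}): {e.stderr.strip()}")
        return False
    return True


# Example: confirm that pip can be invoked through the current interpreter
run_step([sys.executable, "-m", "pip", "--version"], "pip check")
```

Returning a boolean instead of raising lets a later cell decide whether to continue or stop.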

## Step 2: Install Dependencies

The repository contains a `requirements.txt` file that lists the necessary Python dependencies. This step changes into the cloned `TACO` directory and installs them with `pip`.

If `pip` is not available or the installation fails, the cell prints an error message, and the problem must be resolved before proceeding.


In [None]:
import sys

# Step 2: Change into the TACO directory and install its dependencies
if os.path.exists("TACO"):
    os.chdir("TACO")

requirements_path = "requirements.txt"
if os.path.exists(requirements_path):
    try:
        # Run pip through the current interpreter so packages are installed into the kernel's environment
        subprocess.run([sys.executable, "-m", "pip", "install", "-r", requirements_path], check=True)
    except subprocess.CalledProcessError as e:
        print("Error: pip is unavailable or could not install the dependencies (exit code {}).".format(e.returncode))

## Step 3: Install COCO API

The TACO annotations are stored in COCO (Common Objects in Context) format, and the COCO API is used to parse and visualize them. This step installs the COCO API directly from the `philferriere/cocoapi` GitHub repository.

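## Step 4: Download the Dataset

The dataset images are downloaded with a script provided in the TACO repository (`download.py`). The script retrieves the images referenced by the annotations and stores them under the `data` directory.


In [14]:
import sys

# Step 3: Install the COCO API
subprocess.run([sys.executable, "-m", "pip", "install", "git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI"], check=True)


def has_images(root="data"):
    # True if at least one image file exists anywhere under root
    return any(name.lower().endswith((".jpg", ".jpeg", ".png"))
               for _, _, files in os.walk(root) for name in files)


# Step 4: Download the dataset images
# Assumption: the cloned repository already ships a data directory with the annotation files,
# so the presence of image files, not of the directory itself, is used as the check.
if not has_images("data"):
    subprocess.run([sys.executable, "download.py"], check=True)

# Step 5: Confirm dataset download
if has_images("data"):
    print("Dataset downloaded successfully and is ready for use!")
else:
    print("Dataset download failed. Please check for errors.")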


Dataset downloaded successfully and is ready for use!
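
The check above only confirms that the download produced files on disk. A more thorough verification could compare the annotation file against the files actually present. The following is a minimal sketch under stated assumptions: the annotations are a COCO-format JSON file at `data/annotations.json`, each image record has a `file_name` relative to `data`, and `summarize_coco_dataset` is a hypothetical helper rather than part of the TACO repository.

```python
import json
import os


def summarize_coco_dataset(annotation_path, image_root):
    """Summarize a COCO-format annotation file and list referenced images missing on disk."""
    with open(annotation_path, "r", encoding="utf-8") as f:
        coco = json.load(f)

    images = coco.get("images", [])
    # Each COCO image record stores its path relative to the dataset root in "file_name"
    missing = [
        image["file_name"]
        for image in images
        if not os.path.isfile(os.path.join(image_root, image["file_name"]))
    ]
    return {
        "images": len(images),
        "annotations": len(coco.get("annotations", [])),
        "categories": len(coco.get("categories", [])),
        "missing": missing,
    }


# Assumed location of the annotation file inside the cloned repository
annotation_path = os.path.join("data", "annotations.json")
if os.path.isfile(annotation_path):
    summary = summarize_coco_dataset(annotation_path, "data")
    print(f"{summary['images']} images, {summary['annotations']} annotations, "
          f"{summary['categories']} categories, {len(summary['missing'])} image files missing")
else:
    print(f"Annotation file not found at {annotation_path}")
```

A non-empty `missing` list suggests that the download was interrupted and that `download.py` should be run again.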


## Conclusion

This notebook automates importing and preparing the TACO dataset so that the setup is repeatable. If any errors occur during setup, check the system's configuration (e.g., the Git and pip installations) or consult the TACO GitHub repository for troubleshooting.

You can now proceed to data exploration, preprocessing, and modeling using the TACO dataset.
