# Download BD Sports-10 Dataset Through PyPI Package  

### ========================================================================================================
## Note:
## ⚠️The package and dataset may be updated in the future, so ensure you are using the correct version.⚠️
### You can easily switch dataset versions:
 - The code below uses the resized version by default. To use the original version, comment out the resized import and uncomment the original import.
 - Update the ZIP file name in the existence check accordingly if you want to check the original dataset.
### ========================================================================================================


## 📥 Download and Prepare BD Sports-10 Dataset

The **BD Sports-10 Dataset** is available in two versions:

- **Resized Version (224×224 pixels, 3,000 videos)** – optimized for machine learning and deep learning.  
- **Original Version (1920×1080 pixels, 43.84 GB)** – full-resolution videos for detailed analysis (requires ~100 GB disk space).
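
Because the original version needs roughly 100 GB of free disk space, it can help to check the available space before installing or downloading anything. The following is a minimal sketch using only the standard library; the helper name `has_enough_space` and the choice of the current directory as the download location are assumptions made for illustration, not part of the package.

```python
import shutil

def has_enough_space(path=".", required_gb=100.0):
    """Return True if the filesystem containing `path` has at least required_gb free."""
    free_gb = shutil.disk_usage(path).free / (1024 ** 3)
    return free_gb >= required_gb

# Assumption: the dataset is downloaded into the current working directory.
if has_enough_space(".", required_gb=100):
    print("Enough free space for the original version.")
else:
    print("Not enough free space; consider the resized version.")
```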

### Step-by-Step Guide

1. **Install the dataset**
   - Use `pip` to install the **resized** or **original** version.
   - Example for resized version: `!pip install bd-sports-10-resized==0.4.0`
   - Example for original version: `!pip install bd-sports-10-original==0.2.0`

2. **Import the download function**
   - Import the function corresponding to the version you want to download.

3. **Run the download**
   - Execute the download function to fetch the dataset into your environment.

4. **Verify the download**
   - Optionally, check if the ZIP file exists to confirm the download was successful.

5. **Extract the dataset**
   - Use a helper function to extract all videos from the ZIP file, including any nested ZIPs.
   - Optionally, remove unnecessary folders after extraction.

6. **Explore the dataset**
   - List the class folders and sample videos to verify the dataset structure.
   - Each class folder contains 300 videos, and there are 10 sports classes.
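
Step 6 can also be checked programmatically. The sketch below counts the entries in each class folder and reports any deviation from the expected 10 classes with 300 videos each. The helper names `count_videos_per_class` and `check_structure` are hypothetical, and the base directory is the Kaggle path used in the exploration cell of this notebook; adjust it for other environments.

```python
import os

def count_videos_per_class(base_dir):
    """Map each class folder name in base_dir to the number of entries it contains."""
    counts = {}
    for name in sorted(os.listdir(base_dir)):
        class_path = os.path.join(base_dir, name)
        if os.path.isdir(class_path):
            counts[name] = len(os.listdir(class_path))
    return counts

def check_structure(base_dir, expected_classes=10, expected_per_class=300):
    """Return a list of problems found; an empty list means the layout matches."""
    counts = count_videos_per_class(base_dir)
    problems = []
    if len(counts) != expected_classes:
        problems.append(f"expected {expected_classes} classes, found {len(counts)}")
    for name, n in counts.items():
        if n != expected_per_class:
            problems.append(f"{name}: expected {expected_per_class} videos, found {n}")
    return problems

base_dir = "/kaggle/working/extracted/BD_Sports_10"  # path used in the exploration cell
if os.path.isdir(base_dir):
    issues = check_structure(base_dir)
    print("Structure OK" if not issues else "\n".join(issues))
```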

### ⚠️ Notes

- You can easily switch between **resized** and **original** versions by importing the respective download function.  
- The **resized version** is lightweight and suitable for ML/DL tasks, while the **original version** provides full-resolution videos for detailed research.


In [None]:
import os

In [2]:


!pip install bd-sports-10-resized==0.4.0

# Or install the original version (43.84 GB download; requires about 100 GB of free disk space):

# !pip install bd-sports-10-original==0.2.0

Collecting bd-sports-10-resized==0.4.0
  Downloading bd_sports_10_resized-0.4.0-py3-none-any.whl.metadata (4.1 kB)
Downloading bd_sports_10_resized-0.4.0-py3-none-any.whl (10 kB)
Installing collected packages: bd-sports-10-resized
Successfully installed bd-sports-10-resized-0.4.0


In [3]:
import os

# ==========================
# Step 1: Install the package
# ==========================
# For the resized version:
# pip install bd-sports-10-resized==0.4.0

# For the original version:
# pip install bd-sports-10-original==0.2.0


# ==========================
# Step 2: Import the download function
# ==========================
# For the resized version:
from bd_sports_10_dataset_resized import download_dataset

# For the original version:
# from bd_sports_10_dataset_original import download_dataset


# ==========================
# Step 3: Run the download
# ==========================
download_dataset()


# ==========================
# Step 4 (Optional): Check if the ZIP file exists
# ==========================
# For the resized version:
zip_file_resized = "bd_sports_10_dataset_resized.zip"

if os.path.exists(zip_file_resized):
    print("✅ Resized dataset download successful!")
else:
    print("❌ Download failed.")


# For the original version:

# zip_file_original = "bd_sports_10_dataset_original.zip"

# if os.path.exists(zip_file_original):
#     print("✅ Original dataset download successful!")
# else:
#     print("❌ Download failed.")


# ==========================
# Note:
# You can easily switch dataset versions:
# - To use the original version, comment out the resized import and uncomment the original import.
# - Update the zip_file variable accordingly if you want to check the original dataset.
# ==========================


Downloading: 100%|█████████████████████████████████████████████| 3.51G/3.51G [02:29<00:00, 25.2MB/s]



✅ Download completed in 150.97 seconds.
✅ Resized dataset download successful!
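
The existence check above only confirms that a file with the expected name is present. As an optional extra, the sketch below also verifies that the file is a readable ZIP archive whose members pass their CRC checks. The helper name `zip_is_intact` is hypothetical, and `testzip()` reads the whole archive, so expect it to take some time on a multi-gigabyte file.

```python
import os
import zipfile

def zip_is_intact(zip_path):
    """Return True if zip_path is a readable ZIP whose members all pass the CRC check."""
    if not zipfile.is_zipfile(zip_path):
        return False
    with zipfile.ZipFile(zip_path) as zf:
        # testzip() returns the name of the first corrupt member, or None if all are fine
        return zf.testzip() is None

zip_file = "bd_sports_10_dataset_resized.zip"
if os.path.exists(zip_file):
    print("ZIP looks intact." if zip_is_intact(zip_file) else "ZIP is corrupt or incomplete.")
```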


# Extract the zip file and remove unnecessary inner folder

# 📝 Note:
### The following ZIP extraction code is intended for the **Resized version** of the BD Sports-10 dataset.
### For the original version, which contains two separate folders ("annotations" and "dataset"),
### you may need slightly different extraction logic to handle both folders properly.
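
Before adapting the extraction logic for the original version, it may help to look at what the archive contains at its top level. The following sketch lists the top-level names in a ZIP without extracting it. The helper name `top_level_entries` is hypothetical, and the file name is the one from the commented-out check for the original version; this notebook does not confirm the actual layout of that archive.

```python
import os
import zipfile

def top_level_entries(zip_path):
    """Return the sorted top-level names (files or folders) inside a ZIP archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return sorted({name.split("/", 1)[0] for name in zf.namelist() if name})

zip_path = "bd_sports_10_dataset_original.zip"  # name from the commented-out check
if os.path.exists(zip_path):
    print(top_level_entries(zip_path))
```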


In [None]:
import zipfile
import os
import shutil

def extract_all(zip_path, extract_dir, unwanted_folder_name=None):
    """
    Extract all files from a ZIP and any nested ZIPs into extract_dir.
    Removes inner ZIPs after extraction and, if unwanted_folder_name is given,
    deletes that folder from extract_dir.
    """
    if not os.path.exists(zip_path):
        print(f"❌ ZIP file not found: {zip_path}")
        return

    os.makedirs(extract_dir, exist_ok=True)

    def extract_zip(z_path, target_dir):
        """Helper to extract a single ZIP file."""
        try:
            with zipfile.ZipFile(z_path, 'r') as zip_ref:
                zip_ref.extractall(target_dir)
                print(f"✅ Extracted: {z_path}")
        except zipfile.BadZipFile:
            print(f"⚠️ Skipped invalid ZIP: {z_path}")

    # Step 1: Extract main ZIP
    extract_zip(zip_path, extract_dir)

    # Step 2: Repeatedly scan for nested ZIPs and extract them until none remain
    extracted_new_zip = True
    while extracted_new_zip:
        extracted_new_zip = False
        for root, dirs, files in os.walk(extract_dir):
            for file in files:
                if file.lower().endswith(".zip"):
                    nested_zip = os.path.join(root, file)
                    extract_zip(nested_zip, extract_dir)
                    os.remove(nested_zip)  # delete the inner ZIP (even if invalid)
                    extracted_new_zip = True

    # Step 3 (optional): Remove an unwanted folder directly under extract_dir
    if unwanted_folder_name:
        unwanted_path = os.path.join(extract_dir, unwanted_folder_name)
        if os.path.isdir(unwanted_path):
            shutil.rmtree(unwanted_path)
            print(f"🗑️ Removed: {unwanted_path}")


# Usage

zip_path = "/kaggle/working/bd_sports_10_dataset_resized.zip"
extract_dir = "/kaggle/working/extracted"


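# unwanted_folder_name is optional and left at its default (None) here;
# the leftover folder is removed explicitly below.
extract_all(zip_path, extract_dir)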


folder_path = "/kaggle/working/extracted/BD Sports-10 Dataset (224×224 Pixels, Resized Vers"

if os.path.exists(folder_path):
    shutil.rmtree(folder_path)
    print(f"🗑️ Successfully removed: {folder_path}")
else:
    print("❌ Folder not found.")





# Print the sports class folder structure

In [None]:
import os

base_dir = "/kaggle/working/extracted/BD_Sports_10"

# list class folders
classes = sorted(os.listdir(base_dir))

# show a few sample videos from each class
for cls in classes[:3]:  # show first 3 classes only
    class_path = os.path.join(base_dir, cls)
    all_videos = sorted(os.listdir(class_path))
    videos = all_videos[:5]  # first 5 videos
    print(f"\n🎯 Class: {cls} ({len(all_videos)} videos total)")
    for v in videos:
        print("   ├──", v)
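
Since each class has its own folder, the listing above extends naturally to a labeled file index for ML/DL work. The sketch below pairs every video path with an integer label derived from the sorted class folder names. The helper name `build_index` and the labeling scheme are assumptions made for illustration; the dataset itself does not prescribe them.

```python
import os

def build_index(base_dir):
    """Return (class_names, samples); samples is a list of (video_path, label_index)."""
    class_names = sorted(
        d for d in os.listdir(base_dir) if os.path.isdir(os.path.join(base_dir, d))
    )
    samples = []
    for label, cls in enumerate(class_names):
        class_path = os.path.join(base_dir, cls)
        for fname in sorted(os.listdir(class_path)):
            samples.append((os.path.join(class_path, fname), label))
    return class_names, samples

base_dir = "/kaggle/working/extracted/BD_Sports_10"
if os.path.isdir(base_dir):
    class_names, samples = build_index(base_dir)
    print(f"{len(class_names)} classes, {len(samples)} videos")
```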


# Show the bd_sports_10_resized package metadata

In [3]:
!pip show -f bd-sports-10-resized


Name: bd_sports_10_resized
Version: 0.4.0
Summary: Resized version of BD Sports 10 dataset with downloader and progress bar
Home-page: https://data.mendeley.com/datasets/rnh3x48nfb/1
Author: Wazih Ullah Tanzim, Syed Md. Minhaz Hossain
Author-email: wazihullahtanzim@gmail.com
License: CC BY 4.0
Location: /usr/local/lib/python3.11/dist-packages
Requires: requests, tqdm
Required-by: 
Files:
  bd_sports_10_dataset_resized/__init__.py
  bd_sports_10_dataset_resized/__pycache__/__init__.cpython-311.pyc
  bd_sports_10_dataset_resized/__pycache__/downloader.cpython-311.pyc
  bd_sports_10_dataset_resized/downloader.py
  bd_sports_10_resized-0.4.0.dist-info/INSTALLER
  bd_sports_10_resized-0.4.0.dist-info/METADATA
  bd_sports_10_resized-0.4.0.dist-info/RECORD
  bd_sports_10_resized-0.4.0.dist-info/REQUESTED
  bd_sports_10_resized-0.4.0.dist-info/WHEEL
  bd_sports_10_resized-0.4.0.dist-info/licenses/LICENSE.txt
  bd_sports_10_resized-0.4.0.dist-info/top_level.txt
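
Because the package may be updated in the future, the installed version can also be checked from Python rather than by reading the `pip show` output. The following is a minimal sketch using the standard-library `importlib.metadata`; the helper name `installed_version` is hypothetical, and `0.4.0` is the version pinned in this notebook.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package_name):
    """Return the installed version string of package_name, or None if not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None

expected = "0.4.0"  # version pinned in this notebook
found = installed_version("bd-sports-10-resized")
if found is None:
    print("bd-sports-10-resized is not installed.")
elif found != expected:
    print(f"Installed version {found} differs from the pinned {expected}.")
else:
    print(f"bd-sports-10-resized {found} is installed.")
```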
