# Food Dataset Analysis (EDA)

### Suggestions / Things to Explore in EDA (both datasets):

Note: for each insight found, explain what it tells us about the dataset and why it is significant.

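- [ ] **Dataset directory and split integrity:** verify the expected directory structure and examine the `UECFOOD256` directory (e.g. `/root/.cache/kagglehub/datasets/rkuo2000/uecfood256/versions/1/UECFOOD256`) and its contents. Confirm that per-class image counts match expectations (the 1,000 images per class figure applies to Food-101; the UEC-Food256 folders in the directory walk below contain roughly 100 to 170 files each)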
- [ ] **Image resolutions and aspect ratios:** plot width and height histograms, the aspect-ratio distribution, and a width-vs-height scatter plot, then flag resolution outliers
- [ ] **Brightness / contrast and dynamic range:** inspect pixel-intensity histograms and per-image mean/std. Note any classes that are overly dark, blown out, or low-contrast (relevant when choosing a normalization)
- [ ] **Sharpness / blur and quality issues:** compute a blur score per image (variance of the Laplacian) and identify classes with many blurry images
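
A minimal sketch of the class-count check, assuming each class is a subfolder of the dataset root and images use common extensions (`IMAGE_EXTS` is an assumption; UEC-Food256 class folders may also hold non-image text files, which this skips). `count_images_per_class` and `flag_unexpected_counts` are hypothetical helpers, and the 50% tolerance is an arbitrary starting point.

```python
from pathlib import Path

# Assumed image extensions; any other files in a class folder are ignored.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def count_images_per_class(root):
    """Return {class_folder_name: number_of_image_files} for each subfolder of root."""
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_EXTS
            )
    return counts


def flag_unexpected_counts(counts, expected=None, tolerance=0.5):
    """Return classes whose count differs from `expected` by more than tolerance * expected.

    If `expected` is None, the (upper) median class count is used as the reference.
    """
    if expected is None:
        values = sorted(counts.values())
        expected = values[len(values) // 2]
    return {
        name: n for name, n in counts.items()
        if abs(n - expected) > tolerance * expected
    }
```

In the notebook this could be called as `count_images_per_class(os.path.join(path, "UECFOOD256"))`, using the `path` returned by `kagglehub.dataset_download`. Comparing against the median rather than a fixed 1,000 avoids assuming a per-class size that this dataset may not have.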

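The resolution and brightness checks can be sketched with Pillow and NumPy (both assumed to be installed). `image_stats` and `iqr_outliers` are hypothetical helpers: brightness is approximated by the grayscale mean, contrast by the grayscale standard deviation, and the dark/bright cut-offs of 10 and 245 are illustrative assumptions. Outliers are flagged with Tukey's IQR fences.

```python
import numpy as np
from PIL import Image


def image_stats(image_path):
    """Size, aspect ratio, and grayscale brightness/contrast for one image."""
    with Image.open(image_path) as img:
        width, height = img.size
        gray = np.asarray(img.convert("L"), dtype=np.float64)
    return {
        "path": str(image_path),
        "width": width,
        "height": height,
        "aspect_ratio": width / height,
        "mean": float(gray.mean()),  # brightness on a 0-255 scale
        "std": float(gray.std()),    # global contrast
        # Share of near-black / near-white pixels (hypothetical thresholds).
        "frac_dark": float((gray < 10).mean()),
        "frac_bright": float((gray > 245).mean()),
    }


def iqr_outliers(values, k=1.5):
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=np.float64)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)
```

Collecting one dict per image into a list (or a pandas DataFrame) makes the width/height histograms and the width-vs-height scatter straightforward, and grouping the `mean` and `std` columns by class highlights dark or low-contrast categories.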

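A NumPy-only sketch of the Laplacian-variance blur score, assuming a 2-D grayscale array as input. It applies the 4-neighbour kernel `[[0, 1, 0], [1, -4, 1], [0, 1, 0]]`, the same kernel `cv2.Laplacian(gray, cv2.CV_64F)` uses with its default aperture, but drops border pixels instead of padding them, so values can differ slightly from OpenCV's. The threshold of 100 in the hypothetical `blurry_fraction_by_class` helper is a commonly quoted rule of thumb, not a calibrated value.

```python
import numpy as np


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian; low values suggest few sharp edges (blur)."""
    g = np.asarray(gray, dtype=np.float64)
    # Discrete Laplacian on interior pixels: up + down + left + right - 4 * center.
    lap = (
        g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
        - 4.0 * g[1:-1, 1:-1]
    )
    return float(lap.var())


def blurry_fraction_by_class(scores_by_class, threshold=100.0):
    """Fraction of images per class whose blur score is below `threshold`."""
    return {
        cls: sum(s < threshold for s in scores) / len(scores)
        for cls, scores in scores_by_class.items()
        if scores  # skip classes with no scored images
    }
```

A grayscale array can be obtained with `np.asarray(Image.open(p).convert("L"))`. Since Laplacian variance depends on resolution and scene content, comparing classes by the fraction of low-scoring images is usually more informative than a single global cut-off; inspect a few flagged images before trusting the threshold.
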
### UEC-Food256 Dataset
Things to consider while exploring the dataset:

- [ ] **Dataset directory names:** after download, the class folders are named by number (1-256). Rename each folder using the `category.txt` file, which maps each id to its category name.
  - After renaming, check whether any folders share the same name. If so, decide whether to merge them or keep them separate, and document the reasoning.
- [ ]
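
A sketch of the renaming step, assuming `category.txt` holds one id and name per line separated by whitespace, with a header line starting with `id` that is skipped. The helper names are hypothetical. Names that collide (compared case-insensitively, since Windows and macOS file systems usually are) are reported rather than renamed, so the merge-or-keep decision stays manual, and `apply_renames` defaults to a dry run.

```python
import re
from collections import defaultdict
from pathlib import Path

# Characters that are not allowed in Windows file names.
_UNSAFE = re.compile(r'[<>:"/\\|?*]')


def read_categories(category_file):
    """Parse category.txt into {id_string: name}; lines without a numeric id are skipped."""
    id_to_name = {}
    for line in Path(category_file).read_text(encoding="utf-8").splitlines():
        parts = line.split(maxsplit=1)  # "2\teels on rice" -> ["2", "eels on rice"]
        if len(parts) == 2 and parts[0].isdigit():
            id_to_name[parts[0]] = parts[1].strip()
    return id_to_name


def plan_renames(root, id_to_name):
    """Map numeric class folders to name-based paths and report names shared by several ids."""
    renames = {}
    ids_by_name = defaultdict(list)
    for folder in Path(root).iterdir():
        if folder.is_dir() and folder.name in id_to_name:
            safe_name = _UNSAFE.sub("_", id_to_name[folder.name])
            renames[folder] = folder.with_name(safe_name)
            ids_by_name[safe_name.lower()].append(folder.name)
    duplicates = {
        name: sorted(ids, key=int) for name, ids in ids_by_name.items() if len(ids) > 1
    }
    return renames, duplicates


def apply_renames(renames, duplicates, dry_run=True):
    """Rename folders whose new name is unique; duplicated names are left untouched."""
    renamed = []
    for src, dst in renames.items():
        if dst.name.lower() in duplicates or dst.exists():
            continue  # needs a manual merge-or-keep decision
        if not dry_run:
            src.rename(dst)
        renamed.append((src.name, dst.name))
    return sorted(renamed)
```

Running it first with `dry_run=True` shows the planned renames and the `duplicates` dict, which is exactly the list of folders the merge-or-keep question applies to.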

### Resources:
(may be helpful)
*   https://neptune.ai/blog/data-exploration-for-image-segmentation-and-object-detection
*   https://medium.com/@juanabascal78/exploratory-image-analysis-part-1-advanced-density-plots-19b255075dbd
*   https://www.datacamp.com/tutorial/seeing-like-a-machine-a-beginners-guide-to-image-analysis-in-machine-learning

## Import + Download Dataset

In [10]:
%pip install python-dotenv
%pip install roboflow


Note: you may need to restart the kernel to use updated packages.


In [None]:
%pip install kagglehub



In [11]:
# RUN FOR UEC-FOOD256 DATASET

import kagglehub 
# Download latest version 
path = kagglehub.dataset_download("rkuo2000/uecfood256")
print("Path to dataset files:", path)

Resuming download from 2155872256 bytes (2075984473 bytes left)...
Resuming download to C:\Users\msgal\.cache\kagglehub\datasets\rkuo2000\uecfood256\1.archive (2155872256/4231856729) bytes left.


100%|██████████| 3.94G/3.94G [08:52<00:00, 3.90MB/s]

Extracting files...





Path to dataset files: C:\Users\msgal\.cache\kagglehub\datasets\rkuo2000\uecfood256\versions\1


In [None]:
# RUN FOR YUSUF FOOD DATASET

import os

from dotenv import load_dotenv
from roboflow import Roboflow

load_dotenv()  # loads variables from .env into the environment

api_key = os.getenv("YF_API_KEY")
if not api_key:
    raise RuntimeError("YF_API_KEY is not set; add it to your .env file")

rf = Roboflow(api_key=api_key)
project = rf.workspace("caretech").project("food-dataset-uj20h-w2s4m")
version = project.version(1)
dataset = version.download("yolov8")  # export in YOLOv8 format (images + .txt labels)


loading Roboflow workspace...
loading Roboflow project...
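
For the Roboflow export, a comparable integrity summary can be computed from the YOLO label files. This sketch assumes the usual YOLOv8 export layout (`train/`, `valid/`, `test/`, each with `images/` and `labels/`) and one `class x_center y_center width height` row per box in normalized coordinates; `yolo_split_summary` is a hypothetical helper. It reports images without a label file, box counts per class id, and each box's area as a fraction of the image.

```python
from collections import Counter
from pathlib import Path


def yolo_split_summary(dataset_dir, splits=("train", "valid", "test")):
    """Per split: image count, images lacking a label file, boxes per class id, box areas."""
    summary = {}
    for split in splits:
        img_dir = Path(dataset_dir) / split / "images"
        lbl_dir = Path(dataset_dir) / split / "labels"
        if not img_dir.is_dir():
            continue  # split not present in this export
        images = sorted(p for p in img_dir.iterdir() if p.is_file())
        boxes_per_class = Counter()
        missing_labels = 0
        box_areas = []
        for img in images:
            label = lbl_dir / (img.stem + ".txt")
            if not label.exists():
                missing_labels += 1
                continue
            for line in label.read_text().splitlines():
                fields = line.split()
                if len(fields) >= 5:
                    boxes_per_class[int(fields[0])] += 1
                    if len(fields) == 5:  # plain box row; polygon rows are not used for area
                        box_areas.append(float(fields[3]) * float(fields[4]))
        summary[split] = {
            "images": len(images),
            "missing_labels": missing_labels,
            "boxes_per_class": dict(boxes_per_class),
            "box_areas": box_areas,
        }
    return summary
```

It could be called as `yolo_split_summary(dataset.location)`; class ids map to names through the `names` list in the export's `data.yaml`.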


In [4]:
import os

for subdir, dirs, files in os.walk(path):
    print(f"{subdir} → {len(files)} files")

/Users/leonardosiu/.cache/kagglehub/datasets/rkuo2000/uecfood256/versions/1 → 0 files
/Users/leonardosiu/.cache/kagglehub/datasets/rkuo2000/uecfood256/versions/1/UECFOOD256 → 2 files
/Users/leonardosiu/.cache/kagglehub/datasets/rkuo2000/uecfood256/versions/1/UECFOOD256/135 → 116 files
/Users/leonardosiu/.cache/kagglehub/datasets/rkuo2000/uecfood256/versions/1/UECFOOD256/61 → 109 files
/Users/leonardosiu/.cache/kagglehub/datasets/rkuo2000/uecfood256/versions/1/UECFOOD256/95 → 106 files
/Users/leonardosiu/.cache/kagglehub/datasets/rkuo2000/uecfood256/versions/1/UECFOOD256/132 → 106 files
/Users/leonardosiu/.cache/kagglehub/datasets/rkuo2000/uecfood256/versions/1/UECFOOD256/59 → 117 files
/Users/leonardosiu/.cache/kagglehub/datasets/rkuo2000/uecfood256/versions/1/UECFOOD256/92 → 168 files
/Users/leonardosiu/.cache/kagglehub/datasets/rkuo2000/uecfood256/versions/1/UECFOOD256/66 → 108 files
/Users/leonardosiu/.cache/kagglehub/datasets/rkuo2000/uecfood256/versions/1/UECFOOD256/104 → 105 file