## Attribute Prediction 

---

#### Anno_fine – Train/Val/Test Subfolders

Each split (`train/`, `val/`, `test/`) includes:

- `*.txt` – image filenames for the split (e.g. `train.txt`)
- `*_cate.txt` – category label (integer) for each image
- `*_attr.txt` – multi-label attribute vector (1 = present, -1 = absent, 0 = unknown)
- `*_bbox.txt` – bounding box coordinates `[x1, y1, x2, y2]`
- `*_landmarks.txt` – fashion landmark coordinates 
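
As a rough sketch of how these per-split files might be read together, the snippet below assumes that each file holds one whitespace-separated line per image, in the same order across files, with no header lines. The helper names `read_rows` and `load_split`, and the `<prefix>.txt` / `<prefix>_cate.txt` naming inside each folder, are assumptions made for illustration and should be checked against the actual files.

```python
from pathlib import Path


def read_rows(path, cast=str):
    """Read a text file into one list of `cast` values per non-empty line."""
    rows = []
    with open(path, "r") as f:
        for line in f:
            parts = line.split()
            if parts:
                rows.append([cast(p) for p in parts])
    return rows


def load_split(split_dir, prefix):
    """Hypothetical loader: combine the per-split files into one record per image.

    Assumes files named <prefix>.txt, <prefix>_cate.txt, <prefix>_attr.txt and
    <prefix>_bbox.txt, aligned line by line and without header lines.
    """
    split_dir = Path(split_dir)
    names = read_rows(split_dir / f"{prefix}.txt")
    cates = read_rows(split_dir / f"{prefix}_cate.txt", int)
    attrs = read_rows(split_dir / f"{prefix}_attr.txt", int)
    boxes = read_rows(split_dir / f"{prefix}_bbox.txt", int)

    if not (len(names) == len(cates) == len(attrs) == len(boxes)):
        raise ValueError("per-split files are not aligned line by line")

    records = []
    for name, cate, attr, box in zip(names, cates, attrs, boxes):
        records.append({
            "image": name[0],
            "category": cate[0],
            "attributes": attr,   # 1 = present, -1 = absent, 0 = unknown
            "bbox": tuple(box),   # (x1, y1, x2, y2)
        })
    return records
```

A call such as `load_split("path/to/train", "train")` would then return one dictionary per image. The length check is there to catch misaligned files early.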

---

#### File Descriptions (in Anno_fine root)

- `list_attr_cloth.txt`  
  Attribute names and their types; maps indices to names.

- `list_attr_img.txt`  
  Attribute vectors for all images in the dataset (not needed if using pre-split files).

- `list_category_cloth.txt`  
  Category names and their types.
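
To illustrate the index-to-name mapping that `list_attr_cloth.txt` provides, here is a minimal sketch. It assumes the file starts with a count line and a header line, followed by one `name type` row per attribute. It also assumes that the row order matches the positions in each attribute vector, which is worth verifying against the data. The function names `parse_attr_names` and `present_attributes` are hypothetical.

```python
def parse_attr_names(lines):
    """Return attribute names in file order, skipping the count and header lines."""
    names = []
    for line in lines[2:]:
        line = line.strip()
        if not line:
            continue
        # The type is the last whitespace-separated token; the rest is the name.
        name, _attr_type = line.rsplit(None, 1)
        names.append(name)
    return names


def present_attributes(vector, names):
    """Names of attributes marked 1 (present) in a label vector."""
    return [name for value, name in zip(vector, names) if value == 1]


# Small in-memory sample in the same layout as the file.
sample_lines = [
    "3\n",
    "attribute_name   attribute_type\n",
    "floral               1\n",
    "graphic              1\n",
    "striped              1\n",
]
names = parse_attr_names(sample_lines)
print(present_attributes([1, -1, 0], names))  # prints ['floral']
```

Splitting from the right keeps multi-word attribute names intact, should any exist in the file.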

---

#### Eval/list_eval_partition.txt

Maps each image name to `train`, `val`, or `test`.  
These assignments were used to create the splits and are already applied in the `Anno_fine` subfolders.

---

#### img/

Contains all image files (JPG), referenced by the filenames listed in `train.txt`, `val.txt`, and `test.txt`.



In [1]:
file_path = "../data/attribute pred/Anno_fine/list_attr_cloth.txt"

with open(file_path, "r") as f:
    lines = f.readlines()

print("Total attributes:", lines[0].strip())
print("Header:", lines[1].strip())
print("First 5 attributes:")
for line in lines[2:7]:
    print(line.strip())


Total attributes: 26
Header: attribute_name   attribute_type
First 5 attributes:
floral               1
graphic              1
striped              1
embroidered          1
pleated              1


In [2]:
file_path = "../data/attribute pred/Anno_fine/list_category_cloth.txt"

with open(file_path, "r") as f:
    lines = f.readlines()

num_categories = int(lines[0].strip())
header = lines[1].strip()
cat_lines = lines[2:]

print("Total categories:", num_categories)
print("Header:", header)
print("\nFirst 5 categories with types:")
for line in cat_lines[:5]:
    print(line.strip())


Total categories: 50
Header: category_name  category_type

First 5 categories with types:
Anorak         1
Blazer         1
Blouse         1
Bomber         1
Button-Down    1


## In-shop Clothes Retrieval

---

#### Eval/list_eval_partition.txt

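Main annotation file for this benchmark.  
The first line is the image count and the second is the column header (`image_name item_id evaluation_status`). Each remaining line has:  
`<image_name>  <item_id>  <split>`  
where `split ∈ {train, query, gallery}`.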

- **train** → images used to learn visual embeddings  
- **query** → images used to test retrieval  
- **gallery** → database of images the model searches over
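
The three roles above can be separated with a few lines of parsing. This is a sketch that assumes the layout described here (a count line, a header line, then three whitespace-separated fields per row). The function name `parse_partition` is hypothetical, and the `query` and `gallery` rows in the sample are placeholders, not real dataset entries.

```python
from collections import defaultdict


def parse_partition(lines):
    """Group (image_name, item_id) pairs by split, skipping the count and header lines."""
    splits = defaultdict(list)
    for line in lines[2:]:
        parts = line.split()
        if len(parts) != 3:
            continue  # ignore blank or malformed lines
        image_name, item_id, split = parts
        splits[split].append((image_name, item_id))
    return dict(splits)


# In-memory sample; the last two rows are made-up placeholders.
sample_lines = [
    "4\n",
    "image_name item_id evaluation_status\n",
    "img/WOMEN/Dresses/id_00000002/02_1_front.jpg  id_00000002 train\n",
    "img/WOMEN/Dresses/id_00000002/02_2_side.jpg   id_00000002 train\n",
    "img/placeholder/query_image.jpg               id_placeholder query\n",
    "img/placeholder/gallery_image.jpg             id_placeholder gallery\n",
]
partition = parse_partition(sample_lines)
print({split: len(rows) for split, rows in partition.items()})
```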

---

#### list_item_inshop.txt

A list of all unique item IDs in the dataset.  
Useful for identifying the number of distinct clothing pieces.

---

#### img/

Contains all in-shop images, organized into folders like:  
`img/WOMEN/Blouses_Shirts/id_00000001/02_2_side.jpg`

These are referenced directly by `list_eval_partition.txt`.
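
Because the folder layout encodes the item ID, a path can be broken into its parts as a sanity check against the `item_id` column. The sketch below assumes every path has exactly the five components seen in the example. The key names (`group`, `category`, `item_id`, `filename`) and the function name `split_image_path` are labels chosen here for illustration, not terms from the dataset.

```python
from pathlib import PurePosixPath


def split_image_path(image_name):
    """Split an in-shop image path into its folder components."""
    parts = PurePosixPath(image_name).parts
    if len(parts) != 5 or parts[0] != "img":
        raise ValueError(f"unexpected path layout: {image_name}")
    _, group, category, item_id, filename = parts
    return {
        "group": group,        # e.g. WOMEN
        "category": category,  # e.g. Blouses_Shirts
        "item_id": item_id,    # e.g. id_00000001
        "filename": filename,  # e.g. 02_2_side.jpg
    }


info = split_image_path("img/WOMEN/Blouses_Shirts/id_00000001/02_2_side.jpg")
print(info["item_id"])  # prints id_00000001
```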

---

#### Notes

- The retrieval task is evaluated by comparing query images to gallery images.
- A successful retrieval returns gallery images with the **same `item_id`** as the query, typically from different views (e.g., front, side).
- This benchmark does **not** use category or attribute labels; only **item identity** matters.
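
One way to turn the "same `item_id`" criterion into a number is a top-k hit rate: a query counts as a hit if any of its k highest-ranked gallery images shares its item ID. The notes above do not name a specific metric, so this choice is an assumption. The function `top_k_hit_rate` and the toy IDs are hypothetical.

```python
def top_k_hit_rate(query_ids, ranked_gallery_ids, k):
    """Fraction of queries whose top-k ranked gallery items include the query's item_id.

    query_ids: one item_id per query image.
    ranked_gallery_ids: for each query, gallery item_ids sorted from best to worst match.
    """
    if len(query_ids) != len(ranked_gallery_ids):
        raise ValueError("need one ranked gallery list per query")
    if not query_ids:
        return 0.0
    hits = sum(
        1 for qid, ranked in zip(query_ids, ranked_gallery_ids) if qid in ranked[:k]
    )
    return hits / len(query_ids)


# Toy rankings with placeholder IDs.
query_ids = ["id_a", "id_b"]
ranked = [["id_c", "id_a", "id_b"], ["id_a", "id_c", "id_b"]]
print(top_k_hit_rate(query_ids, ranked, k=1))  # prints 0.0
print(top_k_hit_rate(query_ids, ranked, k=2))  # prints 0.5
```

In practice the ranked lists would come from sorting gallery embeddings by similarity to each query embedding.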


In [1]:
file_path = "/Users/elenezuroshvili/Desktop/Thesis/fashion-multitask-model/data/retrieval/list_eval_partition.txt"

with open(file_path, "r") as f:
    lines = f.readlines()

print("Total images:", lines[0].strip())
print("Header:", lines[1].strip())
print("\nFirst 5 entries:")
for line in lines[2:7]:
    print(line.strip())

Total images: 52712
Header: image_name item_id evaluation_status

First 5 entries:
img/WOMEN/Dresses/id_00000002/02_1_front.jpg                           id_00000002 train
img/WOMEN/Dresses/id_00000002/02_2_side.jpg                            id_00000002 train
img/WOMEN/Dresses/id_00000002/02_4_full.jpg                            id_00000002 train
img/WOMEN/Dresses/id_00000002/02_7_additional.jpg                      id_00000002 train
img/WOMEN/Skirts/id_00000003/02_1_front.jpg                            id_00000003 train


In [2]:
file_path = "/Users/elenezuroshvili/Desktop/Thesis/fashion-multitask-model/data/retrieval/Anno/list_item_inshop.txt"
with open(file_path, "r") as f:
    item_lines = f.readlines()

# First line holds the item count; each remaining line is one item ID.
num_items = int(item_lines[0].strip())
item_ids = [line.strip() for line in item_lines[1:]]

print(f"\n Total unique item IDs listed: {num_items}")
print(" First 5 item IDs:", item_ids[:5])


 Total unique item IDs listed: 7982
 First 5 item IDs: ['id_00000001', 'id_00000002', 'id_00000003', 'id_00000004', 'id_00000005']
