# 🧠 Generate Tumor Metadata from Figshare `.mat` Files

This notebook parses raw `.mat` files and constructs a metadata CSV containing:

- `file`: image filename (`.png`)
- `label`: tumor type (1 = meningioma, 2 = glioma, 3 = pituitary)
- `has_mask`: boolean flag, `True` when the tumor mask contains at least one nonzero pixel
- `tumorBorder`: list of [x, y] border coordinates
- `label_name`: readable tumor label (e.g. `"glioma"`)

The metadata file is saved as `tumor_metadata.csv` and will be used for:
- Dataset analysis
- Model training/testing
- Visualization tools

In [1]:
import os
import h5py
import pandas as pd
import numpy as np

# === Local Paths ===
mat_folder = "/Users/aarogyagyawali/Downloads/Figshare datasets/BrainTumorPositive_DS_Matlab"
output_metadata_path = "/Users/aarogyagyawali/Downloads/BrainScanAI_final_output/tumor_metadata.csv"

# === Initialize Metadata ===
metadata = []

mat_files = sorted([f for f in os.listdir(mat_folder) if f.endswith('.mat')])

for file_name in mat_files:
    file_path = os.path.join(mat_folder, file_name)
    base_name = os.path.splitext(file_name)[0]

    try:
        with h5py.File(file_path, 'r') as f:
            cjdata = f['cjdata']
            
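            # h5py only opens HDF5-based (MATLAB v7.3) files; the scalar label is a 1x1 dataset.
            label = int(np.array(cjdata['label'])[0][0])
            tumor_mask = np.array(cjdata['tumorMask'])
            has_mask = bool(np.any(tumor_mask))

            # Flatten the border vector and pair consecutive values into [x, y] points.
            # An odd-length vector cannot be paired, so it is stored as an empty list.
            border_data = np.array(cjdata['tumorBorder'])
            flat_border = border_data.flatten()
            tumor_border = flat_border.reshape(-1, 2).tolist() if flat_border.size % 2 == 0 else []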

            metadata.append({
                "file": base_name + ".png",
                "label": label,
                "has_mask": has_mask,
                "tumorBorder": tumor_border
            })

    except Exception as e:
        print(f"❌ Error in {file_name}: {e}")

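# === Save to CSV ===
os.makedirs(os.path.dirname(output_metadata_path), exist_ok=True)  # create the output folder if missing
df = pd.DataFrame(metadata)
df.to_csv(output_metadata_path, index=False)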
print("✅ Metadata saved to:", output_metadata_path)


✅ Metadata saved to: /Users/aarogyagyawali/Downloads/BrainScanAI_final_output/tumor_metadata.csv
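
The border handling in the loop above can be illustrated on a small synthetic array. This is a minimal sketch, assuming the border is stored as a flat vector in the order x1, y1, x2, y2, ... and that h5py returns it as a 2-D array with a singleton dimension. The sample values and the helper name `pair_border` are made up for illustration.

```python
import numpy as np

def pair_border(border_data):
    """Pair a flat coordinate vector into [x, y] points (hypothetical helper)."""
    flat = np.asarray(border_data).flatten()
    # An odd number of values cannot be split into pairs; mirror the loop's fallback.
    if flat.size % 2 != 0:
        return []
    return flat.reshape(-1, 2).tolist()

# Synthetic stand-in for np.array(cjdata['tumorBorder']); the (1, 6) shape is an assumption.
sample = np.array([[10.0, 20.0, 11.5, 21.5, 12.0, 23.0]])
points = pair_border(sample)
print(points)
```

Because the array is flattened before pairing, the result should not depend on whether the vector arrives as a row or a column.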


In [7]:
# Reload the CSV written by the first cell. `tumorBorder` comes back as a string, not a list.
df = pd.read_csv(output_metadata_path)

label_map = {1: "meningioma", 2: "glioma", 3: "pituitary"}
df["label_name"] = df["label"].map(label_map)  # labels outside the map become NaN

df.to_csv(output_metadata_path, index=False)
print("✅ Added 'label_name' column and saved.")


✅ Added 'label_name' column and saved.
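
One side effect of the round trip through CSV is that `tumorBorder` is written as text, so `pd.read_csv` returns each border as a string rather than a list. Below is a minimal sketch of one way to recover the lists, using `ast.literal_eval` from the standard library on an in-memory CSV. The two sample rows are made up, and the notebook itself does not perform this step.

```python
import ast
import io

import pandas as pd

# Made-up rows that mimic the structure of tumor_metadata.csv.
original = pd.DataFrame({
    "file": ["1.png", "2.png"],
    "label": [1, 2],
    "has_mask": [True, True],
    "tumorBorder": [[[10.0, 20.0], [11.5, 21.5]], []],
})

# Round-trip through CSV in memory instead of touching the real file.
buffer = io.StringIO()
original.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# After reloading, each border is a string such as "[[10.0, 20.0], [11.5, 21.5]]".
print(type(reloaded.loc[0, "tumorBorder"]))

# literal_eval parses the Python-literal text back into nested lists without executing code.
reloaded["tumorBorder"] = reloaded["tumorBorder"].apply(ast.literal_eval)
print(reloaded.loc[0, "tumorBorder"][1])
```

Passing `converters={"tumorBorder": ast.literal_eval}` to `pd.read_csv` should give the same result at load time.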


In [9]:
print(df.head(3))           # Preview first few rows
print(df["label_name"].value_counts())  # Count per tumor type


      file  label  has_mask  \
0    1.png      1      True   
1   10.png      1      True   
2  100.png      1      True   

                                         tumorBorder  label_name  
0  [[267.6152450090744, 231.37568058076226], [277...  meningioma  
1  [[248.86411149825784, 256.89198606271776], [23...  meningioma  
2  [[193.26370732550265, 175.8076305348121], [185...  meningioma  
glioma        1426
pituitary      930
meningioma     708
Name: label_name, dtype: int64
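
The preview lists `1.png`, `10.png`, `100.png` first because `sorted` compares filenames as strings. If numeric order is preferred, a sort key can be supplied. This is a minimal sketch, assuming every filename stem is an integer as in the rows above; the helper name `numeric_stem` and the sample names are hypothetical.

```python
import os

def numeric_stem(file_name):
    """Return the integer stem of a name like '10.mat' (hypothetical helper)."""
    return int(os.path.splitext(file_name)[0])

names = ["1.mat", "10.mat", "100.mat", "2.mat", "20.mat"]
string_order = sorted(names)                     # lexicographic, as in the notebook
numeric_order = sorted(names, key=numeric_stem)  # ordered by integer value
print(string_order)
print(numeric_order)
```

Row order does not affect the label counts; it only changes how the CSV reads when inspected by eye.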
