A minimal project to run k-means split analysis with metadata.

Project structure:

- explore_splits.py: Main analysis script
- splits/: Split data files
  - k_means_split.json: K-means clustering splits
  - intersect_attributes.json: Available attributes list
- data/: Required data files
  - all_filepath.txt: Image file paths
  - labels_images.txt: Image labels
  - mcrae-norms-grouped-with-concepts.json: Feature to concepts mapping
  - things/: Concept metadata
    - concepts-and-categories.json: Concept definitions and categories
    - Categories_final_20200131.tsv: Supercategory mappings

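
Before running the analysis, it can help to confirm that the required files are in place. The following is a minimal sketch, not part of the project. The helper `missing_files` and the constant `EXPECTED_FILES` are hypothetical names, and the assumption that `things/` sits under `data/` is inferred from the listing rather than confirmed.

```python
from pathlib import Path
from typing import List

# Relative paths taken from the file listing in this README.
# Assumption: things/ is nested under data/; adjust if your layout differs.
EXPECTED_FILES = [
    "explore_splits.py",
    "splits/k_means_split.json",
    "splits/intersect_attributes.json",
    "data/all_filepath.txt",
    "data/labels_images.txt",
    "data/mcrae-norms-grouped-with-concepts.json",
    "data/things/concepts-and-categories.json",
    "data/things/Categories_final_20200131.tsv",
]


def missing_files(root: Path, expected: List[str]) -> List[str]:
    """Return the expected relative paths that are not regular files under root."""
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files(Path("."), EXPECTED_FILES)
    if missing:
        print("Missing files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected files are present.")
```

Run it from the project root; an empty result means every listed file was found.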
Install the dependencies and run the script:

    pip install -r requirements.txt
    python explore_splits.py

This will analyze the "has_wheels" attribute in the k-means splits and show:
- Split statistics (train/test distribution)
- Positive/negative sample counts
- Sample examples with metadata
- Supercategory distribution
- Summary statistics
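
To illustrate the kind of statistics listed above, here is a rough sketch that computes split sizes, positive/negative counts, and a supercategory distribution. It does not reproduce `explore_splits.py`. The record layout (`concept`, `label`, `supercategory`), the `summarize_split` helper, and the toy data are all hypothetical, since the actual schema of `k_means_split.json` is not described here.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical record layout; the real split files may be structured differently.
Sample = Dict[str, object]


def summarize_split(split: Dict[str, List[Sample]]) -> Dict[str, Dict[str, object]]:
    """Return size, positive/negative counts, and supercategory counts per partition."""
    summary: Dict[str, Dict[str, object]] = {}
    for part, samples in split.items():
        positives = sum(1 for s in samples if s["label"] == 1)
        summary[part] = {
            "total": len(samples),
            "positive": positives,
            "negative": len(samples) - positives,
            "supercategories": Counter(s["supercategory"] for s in samples),
        }
    return summary


# Toy data standing in for one attribute (e.g. "has_wheels"); not real project data.
toy_split = {
    "train": [
        {"concept": "car", "label": 1, "supercategory": "vehicle"},
        {"concept": "bicycle", "label": 1, "supercategory": "vehicle"},
        {"concept": "apple", "label": 0, "supercategory": "food"},
    ],
    "test": [
        {"concept": "truck", "label": 1, "supercategory": "vehicle"},
        {"concept": "banana", "label": 0, "supercategory": "food"},
    ],
}

summary = summarize_split(toy_split)
for part, stats in summary.items():
    print(part, stats["total"], stats["positive"], stats["negative"],
          dict(stats["supercategories"]))
```
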

Requirements:

- numpy
- pandas
- Standard Python libraries (json, csv, pathlib, collections, typing)