# Feature Extraction - HOG
## Extracting HOG features from preprocessed grayscale images

Steps:
1. **Load the preprocessed images**: read from preprocessed_hog/
2. **Extract HOG features**: compute the HOG descriptor
3. **Save features**: write the feature vectors to .npz


In [1]:
# 1. Import libraries
import os
import json
import numpy as np
from skimage.feature import hog
from tqdm import tqdm


In [2]:
# 2. Load the preprocessed grayscale images

# Load train data
train_data = np.load('preprocessed_hog/train_data.npz')
X_train_img = train_data['X']
y_train = train_data['y']

# Load test data
test_data = np.load('preprocessed_hog/test_data.npz')
X_test_img = test_data['X']
y_test = test_data['y']

# Load class mapping
with open('preprocessed_hog/class_mapping.json', 'r') as f:
    class_mapping = json.load(f)
class_mapping = {int(k): v for k, v in class_mapping.items()}

print(f"Train images: {X_train_img.shape}")
print(f"Test images: {X_test_img.shape}")
print(f"Number of classes: {len(class_mapping)}")


Train images: (10968, 128, 128)
Test images: (2743, 128, 128)
Number of classes: 23


In [3]:
# 3. Define the HOG feature extraction function
# HOG parameters
HOG_ORIENTATIONS = 9
HOG_PIXELS_PER_CELL = (8, 8)
HOG_CELLS_PER_BLOCK = (2, 2)

# Extract HOG features from a normalized grayscale image
def extract_hog_features(img_gray):

    # Rescale to 0-255 (uint8) before computing HOG; assumes input values in [0, 1]
    img_255 = (img_gray * 255).astype('uint8')

    # Compute the HOG descriptor
    features = hog(
        img_255,
        orientations=HOG_ORIENTATIONS,
        pixels_per_cell=HOG_PIXELS_PER_CELL,
        cells_per_block=HOG_CELLS_PER_BLOCK,
        visualize=False,
        feature_vector=True
    )
    
    return features

print(f"HOG parameters:")
print(f"   - Orientations: {HOG_ORIENTATIONS}")
print(f"   - Pixels per cell: {HOG_PIXELS_PER_CELL}")
print(f"   - Cells per block: {HOG_CELLS_PER_BLOCK}")


HOG parameters:
   - Orientations: 9
   - Pixels per cell: (8, 8)
   - Cells per block: (2, 2)
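
`extract_hog_features` multiplies by 255 and casts to `uint8`, which only makes sense if the loaded arrays are floats in [0, 1]. The notebook does not show the dtype or value range of `X_train_img`, so a small guard can be run before extraction. This is a minimal sketch: `check_unit_range` is a hypothetical helper, not part of the original notebook, and the demo array is synthetic.

```python
import numpy as np

def check_unit_range(images, name="images"):
    """Raise if `images` is not a float array with values in [0, 1]."""
    images = np.asarray(images)
    if not np.issubdtype(images.dtype, np.floating):
        raise TypeError(f"{name}: expected a float dtype, got {images.dtype}")
    lo, hi = float(images.min()), float(images.max())
    if lo < 0.0 or hi > 1.0:
        raise ValueError(f"{name}: values span [{lo}, {hi}], expected [0, 1]")
    return lo, hi

# Synthetic stand-in for X_train_img (assumption: real data has the same layout)
demo = np.random.default_rng(0).random((4, 128, 128)).astype(np.float32)
lo, hi = check_unit_range(demo, "demo")
```

In the notebook this would be called as `check_unit_range(X_train_img, "train")` and `check_unit_range(X_test_img, "test")` right after loading.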


In [4]:
# 4. Extract HOG features from the train data

X_train_features = []
for img in tqdm(X_train_img, desc="Train"):
    features = extract_hog_features(img)
    X_train_features.append(features)

X_train_features = np.array(X_train_features)

print(f"Train features shape: {X_train_features.shape}")
print(f"Feature vector length: {X_train_features.shape[1]}")


Train: 100%|██████████| 10968/10968 [00:33<00:00, 323.96it/s]


Train features shape: (10968, 8100)
Feature vector length: 8100
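
The length 8100 follows from the parameters: a 128×128 image with 8×8-pixel cells has 16×16 cells, 2×2-cell blocks sliding one cell at a time give 15×15 blocks, and each block holds 2×2×9 = 36 values, so 15 × 15 × 36 = 8100. The sketch below reproduces that arithmetic; `hog_feature_length` is a hypothetical helper written for illustration and assumes blocks advance by one cell.

```python
def hog_feature_length(image_shape, orientations=9,
                       pixels_per_cell=(8, 8), cells_per_block=(2, 2)):
    """Expected HOG vector length, assuming blocks slide one cell at a time."""
    # Number of whole cells along each axis
    n_cells_r = image_shape[0] // pixels_per_cell[0]
    n_cells_c = image_shape[1] // pixels_per_cell[1]
    # Number of block positions along each axis
    n_blocks_r = n_cells_r - cells_per_block[0] + 1
    n_blocks_c = n_cells_c - cells_per_block[1] + 1
    values_per_block = cells_per_block[0] * cells_per_block[1] * orientations
    return n_blocks_r * n_blocks_c * values_per_block

length = hog_feature_length((128, 128))
print(length)  # 8100
```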


In [5]:
# 5. Extract HOG features from the test data

X_test_features = []
for img in tqdm(X_test_img, desc="Test"):
    features = extract_hog_features(img)
    X_test_features.append(features)

X_test_features = np.array(X_test_features)

print(f"Test features shape: {X_test_features.shape}")
print(f"Feature vector length: {X_test_features.shape[1]}")


Test: 100%|██████████| 2743/2743 [00:08<00:00, 310.76it/s]

Test features shape: (2743, 8100)
Feature vector length: 8100





In [6]:
# 6. Save features

output_dir = 'features_hog'
os.makedirs(output_dir, exist_ok=True)

# Save train features
np.savez_compressed(
    f'{output_dir}/train_features.npz',
    X=X_train_features,
    y=y_train
)

# Save test features
np.savez_compressed(
    f'{output_dir}/test_features.npz',
    X=X_test_features,
    y=y_test
)

# Save class mapping
with open(f'{output_dir}/class_mapping.json', 'w') as f:
    json.dump(class_mapping, f, indent=2, ensure_ascii=False)

# Save feature info
feature_info = {
    'method': 'HOG (Histogram of Oriented Gradients)',
    'hog_orientations': HOG_ORIENTATIONS,
    'hog_pixels_per_cell': HOG_PIXELS_PER_CELL,
    'hog_cells_per_block': HOG_CELLS_PER_BLOCK,
    'feature_vector_length': int(X_train_features.shape[1]),
    'train_samples': len(X_train_features),
    'test_samples': len(X_test_features),
    'num_classes': len(class_mapping)
}

with open(f'{output_dir}/feature_info.json', 'w') as f:
    json.dump(feature_info, f, indent=2)

print("Saved train_features.npz, test_features.npz, and metadata")


Saved train_features.npz, test_features.npz, and metadata
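
A quick way to confirm the save step worked is to reload each file and check its shapes. The sketch below uses a hypothetical helper, `verify_feature_file`, and runs it on a temporary file of synthetic data rather than the real `features_hog/` outputs. It also shows why the class mapping keys need converting back with `int(k)` on load, and why `hog_pixels_per_cell` comes back as a list: JSON has no integer keys and no tuple type.

```python
import json
import os
import tempfile

import numpy as np

def verify_feature_file(path, expected_dim):
    """Reload a saved .npz and check that X and y are consistent."""
    with np.load(path) as data:
        X, y = data["X"], data["y"]
    assert X.ndim == 2 and X.shape[1] == expected_dim, X.shape
    assert len(X) == len(y), (len(X), len(y))
    return X.shape

with tempfile.TemporaryDirectory() as tmp:
    # Synthetic stand-in for train_features.npz
    demo_path = os.path.join(tmp, "demo_features.npz")
    np.savez_compressed(demo_path,
                        X=np.zeros((5, 8100), dtype=np.float32),
                        y=np.arange(5))
    demo_shape = verify_feature_file(demo_path, expected_dim=8100)

    # JSON round trip: int keys become strings, tuples become lists
    info_path = os.path.join(tmp, "info.json")
    with open(info_path, "w") as f:
        json.dump({"hog_pixels_per_cell": (8, 8), "mapping": {0: "a"}}, f)
    with open(info_path) as f:
        reloaded = json.load(f)
```

In the notebook the equivalent call would be `verify_feature_file(f'{output_dir}/train_features.npz', X_train_features.shape[1])`.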


In [7]:
# 7. Summary
print(f"\n✅ Done!")
print(f"\n📦 Directory '{output_dir}':")
print(f"   ├── train_features.npz: {X_train_features.shape}")
print(f"   ├── test_features.npz: {X_test_features.shape}")
print(f"   ├── class_mapping.json")
print(f"   └── feature_info.json")
print(f"\n📝 Features: HOG ({X_train_features.shape[1]} dimensions)")
print(f"\n➡️ Next: Scale → PCA → Train with SVC + SMOTE")



✅ Done!

📦 Directory 'features_hog':
   ├── train_features.npz: (10968, 8100)
   ├── test_features.npz: (2743, 8100)
   ├── class_mapping.json
   └── feature_info.json

📝 Features: HOG (8100 dimensions)

➡️ Next: Scale → PCA → Train with SVC + SMOTE
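
The next stage is not part of this notebook. As a rough sketch of what the scale, PCA, and SVC steps might look like with scikit-learn, the code below chains them in one pipeline so the scaler and PCA are fitted on the training split only. The data is synthetic, and the 95% variance threshold and RBF kernel are assumptions chosen for illustration. SMOTE is left out because it lives in the separate imbalanced-learn package.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-ins for X_train_features / y_train / X_test_features
rng = np.random.default_rng(0)
X_demo_train = rng.random((60, 200))
y_demo_train = np.arange(60) % 3
X_demo_test = rng.random((10, 200))

# A float n_components keeps enough components to explain that share of variance
clf = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95),
    SVC(kernel="rbf"),
)
clf.fit(X_demo_train, y_demo_train)
pred = clf.predict(X_demo_test)
```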
