# **Week 1**

## **Team 3**
- Shinto Machado
- Adrián García
- Gerard Asbert
- Kunal Purkayastha

# **Introduction and Hypothesis**

In this notebook we build and analyse a Bag of Visual Words (BoVW) pipeline for image classification.  
The main goal is to understand the role of each component of the pipeline and to identify a good overall configuration by changing one parameter at a time.

**Hypothesis:** Our hypothesis is...
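
To make the core step of the pipeline concrete, the sketch below shows how a set of local descriptors can be turned into a BoVW histogram once a codebook exists. It is only an illustration: the `bovw_histogram` function and the toy 2-D data are assumptions for this example and are not taken from `bovw.py`, where the codebook would normally come from a clustering step such as k-means.

```python
import numpy as np

def bovw_histogram(descriptors: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Assign each descriptor to its nearest visual word and count the assignments."""
    # Squared Euclidean distance from every descriptor (N, D) to every word (K, D).
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    words = np.argmin(dists, axis=1)
    # One bin per visual word, L1-normalized so images with different
    # numbers of descriptors remain comparable.
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)

# Toy data (hypothetical): two visual words and four 2-D descriptors.
codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
descriptors = np.array([[0.1, -0.2], [9.5, 10.2], [0.3, 0.4], [0.0, 1.0]])
hist = bovw_histogram(descriptors, codebook)
print(hist)  # three descriptors fall on word 0, one on word 1
```

The resulting fixed-length vector is what the classifier sees, regardless of how many keypoints the image produced.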

# Imports

In [None]:
from bovw import BOVW
from main import Dataset, train, test

from typing import *
from PIL import Image

import numpy as np
import glob
import tqdm
import os
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold   

# 1. Load the dataset, prepare 3-fold cross-validation and run the basic pipeline

In [2]:
data_train = Dataset(ImageFolder="../places_reduced/train")
data_val   = Dataset(ImageFolder="../places_reduced/val")
data = data_train + data_val

kfold = KFold(n_splits=3, shuffle=True, random_state=42)

accuracies = []

for fold, (train_idx, test_idx) in enumerate(kfold.split(data), start=1):
    print(f"\n========== Fold {fold} ==========")
    train_data = [data[i] for i in train_idx]
    test_data  = [data[i] for i in test_idx]

    bovw = BOVW()
    bovw, classifier = train(dataset=train_data, bovw=bovw)

    acc = test(dataset=test_data, bovw=bovw, classifier=classifier)
    accuracies.append(acc)

print("\n========== 3-Fold Cross-Validation ==========")
print("Accuracies per fold:", accuracies)
print("Average accuracy:", np.mean(accuracies))




Phase [Training]: Extracting the descriptors: 100%|██████████| 7266/7266 [00:31<00:00, 233.83it/s]


Fitting the codebook
Computing the bovw histograms
Fitting the classifier
Accuracy on Phase[Train]: 0.21464019851116625


Phase [Eval]: Extracting the descriptors: 100%|██████████| 3634/3634 [00:32<00:00, 111.55it/s]


Computing the bovw histograms
predicting the values
Accuracy on Phase[Test]: 0.18495038588754134



Phase [Training]: Extracting the descriptors: 100%|██████████| 7267/7267 [01:04<00:00, 113.44it/s]


Fitting the codebook
Computing the bovw histograms
Fitting the classifier
Accuracy on Phase[Train]: 0.21223814773980154


Phase [Eval]: Extracting the descriptors: 100%|██████████| 3633/3633 [00:28<00:00, 128.34it/s]


Computing the bovw histograms
predicting the values
Accuracy on Phase[Test]: 0.1886376172090458



Phase [Training]: Extracting the descriptors: 100%|██████████| 7267/7267 [00:28<00:00, 256.95it/s]


Fitting the codebook
Computing the bovw histograms
Fitting the classifier
Accuracy on Phase[Train]: 0.22236007719878687


Phase [Eval]: Extracting the descriptors: 100%|██████████| 3633/3633 [00:14<00:00, 255.00it/s]


Computing the bovw histograms
predicting the values
Accuracy on Phase[Test]: 0.18109151047409042

Accuracies per fold: [0.18495038588754134, 0.1886376172090458, 0.18109151047409042]
Average accuracy: 0.18489317119022586
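
Since later sections compare configurations by their cross-validation accuracy, it may help to report the spread across folds together with the mean. A minimal sketch using the three fold accuracies printed above (the variable names are ours):

```python
from statistics import mean, stdev

# Test accuracies of the three folds, copied from the output above.
fold_accuracies = [0.18495038588754134, 0.1886376172090458, 0.18109151047409042]

mean_acc = mean(fold_accuracies)
std_acc = stdev(fold_accuracies)  # sample standard deviation (n - 1 in the denominator)

print(f"Accuracy: {mean_acc:.4f} ± {std_acc:.4f}")
```

`statistics.stdev` uses the same sample estimate as the default of pandas' `std`, so the numbers stay comparable with summaries computed through `groupby(...).agg(['mean', 'std'])`.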


# 2. Different Descriptors (SIFT, ORB, AKAZE and Dense SIFT)

In [None]:
detector_types = ['DENSE_SIFT', 'SIFT', 'ORB', 'AKAZE']

results = {}

for det in detector_types:
    print(f"\n\n###############################")
    print(f"### Descriptor: {det}")
    print(f"###############################")

    kfold = KFold(n_splits=3, shuffle=True, random_state=42)

    accuracies = []

    for fold, (train_idx, test_idx) in enumerate(kfold.split(data), start=1):
        print(f"\n========== Fold {fold} ==========")
        train_data = [data[i] for i in train_idx]
        test_data  = [data[i] for i in test_idx]

        # Use the corresponding descriptor
        bovw = BOVW(detector_type=det)
        bovw, classifier = train(dataset=train_data, bovw=bovw)

        acc = test(dataset=test_data, bovw=bovw, classifier=classifier)
        accuracies.append(acc)

    print("\n========== 3-Fold Cross-Validation ==========")
    print("Accuracies per fold:", accuracies)
    print("Average accuracy:", np.mean(accuracies))

    results[det] = {
        "fold_accuracies": accuracies,
        "mean_accuracy": np.mean(accuracies)
    }

print("\n\n===== Summary over descriptors =====")
for det in detector_types:
    print(f"{det}: mean accuracy = {results[det]['mean_accuracy']:.4f}")
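
The fold loop above is the same as in section 1, and the remaining experiments will likely repeat it again. One possible way to factor it out is sketched below. The helper name `cross_validate` and its parameters are assumptions made for this example; it only relies on the call pattern of `train` and `test` already used in this notebook, and the stand-in functions at the bottom exist purely so the block runs on its own.

```python
from typing import Any, Callable, Iterable, List, Sequence, Tuple

def cross_validate(
    data: Sequence[Any],
    splits: Iterable[Tuple[Sequence[int], Sequence[int]]],
    make_bovw: Callable[[], Any],
    train_fn: Callable[..., Tuple[Any, Any]],
    test_fn: Callable[..., float],
) -> List[float]:
    """Run one train/test cycle per (train_idx, test_idx) pair and collect the accuracies."""
    accuracies = []
    for train_idx, test_idx in splits:
        train_data = [data[i] for i in train_idx]
        test_data = [data[i] for i in test_idx]
        # A fresh BoVW object per fold, so no codebook leaks between folds.
        bovw, classifier = train_fn(dataset=train_data, bovw=make_bovw())
        accuracies.append(test_fn(dataset=test_data, bovw=bovw, classifier=classifier))
    return accuracies

# Stand-ins (hypothetical) that mimic the signatures of `train` and `test`.
def fake_train(dataset, bovw):
    return bovw, "classifier"

def fake_test(dataset, bovw, classifier):
    return len(dataset) / 10

demo_splits = [([0, 1, 2, 3], [4, 5]), ([2, 3, 4, 5], [0, 1])]
demo_accuracies = cross_validate(list(range(6)), demo_splits, object, fake_train, fake_test)
print(demo_accuracies)
```

With the objects from this notebook, the call inside the descriptor loop would look like `cross_validate(data, kfold.split(data), lambda: BOVW(detector_type=det), train, test)`.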


In [2]:
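
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# 3-fold cross-validation accuracies (%) obtained in the previous experiment,
# using default values for each local descriptor.
# Named `cv_results` so it does not overwrite the `data` dataset variable.
cv_results = {
    'Method': ['AKAZE', 'AKAZE', 'AKAZE',
               'ORB', 'ORB', 'ORB',
               'SIFT', 'SIFT', 'SIFT',
               'DENSE_SIFT', 'DENSE_SIFT', 'DENSE_SIFT'],
    'Accuracy': [41.2, 45.7, 48.4,
                 42.3, 46.9, 48.7,
                 60.2, 66.1, 68.4,
                 74.3, 76.6, 80.1]
}

df = pd.DataFrame(cv_results)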

# Summary per method: mean and standard deviation
summary = (
    df.groupby('Method')['Accuracy']
      .agg(['mean', 'std'])
      .reset_index()
      .sort_values('mean')  # ascending order
)

fig, ax = plt.subplots(figsize=(8, 5))

y_pos = np.arange(len(summary))

# Horizontal bars with error bars (std)
ax.barh(
    y_pos,
    summary['mean'],
    xerr=summary['std'],
    align='center',
    alpha=0.85,
    linewidth=1.2,
    edgecolor='black'
)

# Individual points for each fold (to show the dispersion)
for i, method in enumerate(summary['Method']):
    xs = df[df['Method'] == method]['Accuracy'].values
    # small vertical jitter so the points do not overlap exactly
    jitter = np.linspace(-0.12, 0.12, len(xs))
    ax.scatter(
        xs,
        np.full_like(xs, y_pos[i], dtype=float) + jitter,
        s=45,
        edgecolor='black',
        linewidth=0.6,
        alpha=0.9
    )

ax.set_yticks(y_pos)
ax.set_yticklabels(summary['Method'])
ax.set_xlabel('Accuracy (%)', fontsize=12)
ax.set_title('Cross-validation accuracy by local descriptor', fontsize=14)

ax.grid(axis='x', linestyle='--', alpha=0.5)
ax.set_xlim(40, 85)

plt.tight_layout()
plt.show()


## 2.1 Dense SIFT with tiny steps and different scales
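
Dense SIFT replaces keypoint detection with a regular grid, so the step and the set of scales directly control how many descriptors each image yields. The sketch below only generates the grid. The function `dense_grid` and its parameters are hypothetical and say nothing about how `DENSE_SIFT` is implemented in `bovw.py`. With OpenCV, each `(x, y, size)` triple could be wrapped in a `cv2.KeyPoint` and passed to a SIFT object's `compute` method.

```python
from typing import List, Sequence, Tuple

def dense_grid(
    width: int, height: int, step: int, scales: Sequence[float]
) -> List[Tuple[float, float, float]]:
    """Return (x, y, size) triples on a regular grid, one per scale at every grid point."""
    points = []
    # Start half a step in, so the grid is roughly centred in the image.
    for y in range(step // 2, height, step):
        for x in range(step // 2, width, step):
            for size in scales:
                points.append((float(x), float(y), float(size)))
    return points

coarse = dense_grid(width=8, height=4, step=4, scales=[4, 8])
fine = dense_grid(width=8, height=4, step=2, scales=[4, 8])
print(len(coarse), len(fine))  # halving the step roughly quadruples the count
```

Because the descriptor count grows quadratically as the step shrinks, very small steps mainly cost extraction and clustering time, which is worth tracking alongside accuracy.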

In [None]:

#Numerical results

In [None]:

#Plots and visual results

# 3. Different numbers of local features

In [None]:

#Numerical results

In [None]:

#Plots and visual results

# 4. Different Codebook sizes k (10, 100, 1000, ...)

In [None]:

#Numerical results

In [None]:

#Plots and visual results

# 5. Different Classifiers (...)

In [None]:

#Numerical results

In [None]:

#Plots and visual results

# 6. Dimensionality reduction


# 7. Spatial Pyramids
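
A spatial pyramid adds coarse layout information to the orderless BoVW histogram by computing one histogram per cell of increasingly fine grids and concatenating them. The sketch below is a simplified, unweighted variant with no per-level weights. The function name, its parameters and the toy keypoints are assumptions for illustration and are not part of the existing pipeline.

```python
import numpy as np

def spatial_pyramid_histogram(xy, words, image_size, k, levels=2):
    """Concatenate visual-word histograms over 1x1, 2x2, ... grids (unweighted)."""
    width, height = image_size
    xy = np.asarray(xy, dtype=float)
    words = np.asarray(words, dtype=int)
    blocks = []
    for level in range(levels):
        cells = 2 ** level
        # Grid cell of every keypoint; clip so border points stay in the last cell.
        cx = np.minimum((xy[:, 0] / width * cells).astype(int), cells - 1)
        cy = np.minimum((xy[:, 1] / height * cells).astype(int), cells - 1)
        for row in range(cells):
            for col in range(cells):
                in_cell = (cx == col) & (cy == row)
                blocks.append(np.bincount(words[in_cell], minlength=k))
    feature = np.concatenate(blocks).astype(float)
    return feature / max(feature.sum(), 1.0)

# Toy example (hypothetical): four keypoints near the corners of a 10x10 image,
# already quantized into k = 2 visual words.
xy = [[1, 1], [9, 1], [1, 9], [9, 9]]
words = [0, 1, 1, 0]
feature = spatial_pyramid_histogram(xy, words, image_size=(10, 10), k=2, levels=2)
print(feature.shape)  # k * (1 + 4) bins for two levels
```

The feature length grows as `k * (1 + 4 + 16 + ...)`, so the codebook size and the number of levels have to be chosen together.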


# 8. Fisher Vectors


# 9. Conclusion

Our hypothesis...
