**Practical 09**

**Aim: Learning to Rank**
*   Implement a learning to rank algorithm (e.g., RankSVM or RankBoost).
*   Train the ranking model using labelled data and evaluate its effectiveness.


In [None]:
print("T074 Kermeen")
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import ndcg_score

# Step 1: Dataset
X = np.array([
    [3, 2, 1],
    [2, 1, 0],
    [0, 1, 2],
    [1, 2, 0],
    [2, 1, 3],
    [1, 0, 2]
])

relevance = np.array([3, 2, 1, 3, 1, 2])
queries = np.array([1, 1, 1, 2, 2, 2])

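# Step 2: Create pairwise data
# For every ordered pair (i, j) within a query whose relevance differs,
# the sample is the feature difference X[i] - X[j] and the label is +1 or -1.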
X_pairs, y_pairs = [], []

for q in np.unique(queries):
    idx = np.where(queries == q)[0]
    for i in idx:
        for j in idx:
            if relevance[i] > relevance[j]:
                X_pairs.append(X[i] - X[j])
                y_pairs.append(1)
            elif relevance[i] < relevance[j]:
                X_pairs.append(X[i] - X[j])
                y_pairs.append(-1)

X_pairs = np.array(X_pairs)
y_pairs = np.array(y_pairs)

# Step 3: Train RankSVM (a linear SVM fitted on the pairwise difference vectors)
model = LinearSVC()
model.fit(X_pairs, y_pairs)

# Step 4: Evaluate using NDCG, computed per query and then averaged
ndcg_scores = []

for q in np.unique(queries):
    idx = np.where(queries == q)[0]
    scores = model.decision_function(X[idx])
    ndcg = ndcg_score([relevance[idx]], [scores])
    ndcg_scores.append(ndcg)

print("Average NDCG Score:", np.mean(ndcg_scores))

T074 Kermeen
Average NDCG Score: 0.947499501061509
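
To make Step 2 concrete, the following sketch rebuilds the pairs for query 1 only and prints them. It reuses the three documents and relevance labels of query 1 from the dataset above; the helper name `pairwise_differences` is illustrative and not part of the original program.

```python
import numpy as np

# Documents and labels of query 1, copied from the dataset above
X_q1 = np.array([[3, 2, 1], [2, 1, 0], [0, 1, 2]])
rel_q1 = np.array([3, 2, 1])

def pairwise_differences(X, relevance):
    """Return difference vectors and labels for every ordered pair with different relevance.

    Hypothetical helper for illustration; it mirrors the nested loops of Step 2.
    """
    diffs, labels = [], []
    n = len(relevance)
    for i in range(n):
        for j in range(n):
            if relevance[i] == relevance[j]:
                continue  # ties (including i == j) carry no preference
            diffs.append(X[i] - X[j])
            labels.append(1 if relevance[i] > relevance[j] else -1)
    return np.array(diffs), np.array(labels)

diffs, labels = pairwise_differences(X_q1, rel_q1)
for d, y in zip(diffs, labels):
    print(d, y)
```

Each unordered pair appears twice, once in each direction with opposite sign and opposite label, so the two classes are balanced by construction.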


## **Practical No. 9 – Learning to Rank (RankSVM)**

### **Aim**

To implement a **Learning to Rank algorithm (RankSVM)**, train it on labelled data, and evaluate its ranking quality using **NDCG**.

---

## **Code Summary (What the program does)**

1. Creates a small **sample ranking dataset** (six documents across two queries)
2. Converts the ranking data into **pairwise comparisons**
3. Trains a **RankSVM model** on the pairs
4. Evaluates ranking quality using **NDCG**

---

## **Short Explanation of Each Part**

### **1. Import Required Libraries**

* **NumPy**: Handles numerical data and arrays
* **LinearSVC**: Linear SVM classifier from scikit-learn, applied to pairwise differences to implement RankSVM
* **ndcg_score**: scikit-learn metric that measures ranking quality

---

### **2. Dataset Generation**

* Feature matrix `X` holds one row of three features per document
* `relevance` gives the graded relevance label of each document (higher means more relevant)
* `queries` gives the query ID of each document, so documents are grouped by query
* `generate_dataset()` returns the features, relevance labels, and query IDs

---

### **3. Pairwise Data Creation (Core of RankSVM)**

* RankSVM works on **document pairs**
* For each pair of documents `(i, j)` under the same query:

  * If `doc_i` is more relevant than `doc_j` → label **+1**
  * If `doc_i` is less relevant than `doc_j` → label **−1**
  * If both have the same relevance, the pair is skipped
* The feature difference `(doc_i − doc_j)` is the training input for the pair

---

### **4. Model Training**

* A **linear Support Vector Machine** (`LinearSVC`) is trained on the difference vectors
* It learns a weight vector that aims to score the more relevant document of each pair higher

---

### **5. Model Evaluation (NDCG)**

* `decision_function` scores each document of a query
* **NDCG** compares the ordering induced by these scores with the ideal ordering by relevance (1.0 is a perfect ranking)
* The NDCG averaged over all queries summarises model effectiveness

---

### **6. Main Function**

* Executes all steps in order:

  * Load data
  * Create pairwise data
  * Train model
  * Evaluate ranking

---

## **Conclusion**

This program implements **RankSVM** with a linear SVM on pairwise differences, trains it on labelled ranking data, and evaluates it using **NDCG**, demonstrating a basic **Learning to Rank system**. On the six documents it was trained on, it reaches an average NDCG of about 0.947.

In [None]:
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import ndcg_score

# Generate dataset
def generate_dataset():
    X = np.array([
        [3, 2, 1],
        [2, 1, 0],
        [0, 1, 2],
        [1, 2, 0],
        [2, 1, 3],
        [1, 0, 2]
    ])
    relevance = np.array([3, 2, 1, 3, 1, 2])
    queries = np.array([1, 1, 1, 2, 2, 2])
    return X, relevance, queries

# Create pairwise data
def create_pairwise_data(X, relevance, queries):
    X_pairs, y_pairs = [], []
    for q in np.unique(queries):
        idx = np.where(queries == q)[0]
        for i in idx:
            for j in idx:
                if relevance[i] > relevance[j]:
                    X_pairs.append(X[i] - X[j])
                    y_pairs.append(1)
                elif relevance[i] < relevance[j]:
                    X_pairs.append(X[i] - X[j])
                    y_pairs.append(-1)
    return np.array(X_pairs), np.array(y_pairs)

# Train RankSVM
def train_rank_svm(X_pairs, y_pairs):
    model = LinearSVC()
    model.fit(X_pairs, y_pairs)
    return model

# Evaluate model using NDCG
def evaluate_model(model, X, relevance, queries):
    ndcg_scores = []
    for q in np.unique(queries):
        idx = np.where(queries == q)[0]
        scores = model.decision_function(X[idx])
        ndcg = ndcg_score([relevance[idx]], [scores])
        ndcg_scores.append(ndcg)
    print("Average NDCG Score:", np.mean(ndcg_scores))

# Main execution
def main():
    X, relevance, queries = generate_dataset()
    X_pairs, y_pairs = create_pairwise_data(X, relevance, queries)
    model = train_rank_svm(X_pairs, y_pairs)
    evaluate_model(model, X, relevance, queries)

if __name__ == "__main__":
    main()


Average NDCG Score: 0.947499501061509
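
To show what the `ndcg_score` call in `evaluate_model` measures, the sketch below computes NDCG by hand for one query. It assumes the convention scikit-learn documents for `ndcg_score`: the gain is the raw relevance label, the discount is `1 / log2(rank + 1)`, and the result is divided by the DCG of the ideal ordering. The function names `dcg` and `ndcg` and the example scores are made up for illustration, and unlike scikit-learn this version does not average over tied scores.

```python
import math

def dcg(relevances_in_ranked_order):
    """Discounted cumulative gain with linear gain and log2 discount."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances_in_ranked_order, start=1))

def ndcg(relevance, scores):
    """NDCG of the ordering induced by `scores` (higher score ranks first).

    Illustrative helper; assumes there are no tied scores.
    """
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    ranked = [relevance[k] for k in order]
    ideal = sorted(relevance, reverse=True)
    return dcg(ranked) / dcg(ideal)

relevance = [3, 2, 1]             # labels of query 1
perfect_scores = [0.9, 0.5, 0.1]  # made-up scores that rank the documents correctly
swapped_scores = [0.5, 0.9, 0.1]  # made-up scores that swap the top two documents

print(ndcg(relevance, perfect_scores))  # 1.0
print(ndcg(relevance, swapped_scores))  # about 0.92
```

Swapping the two most relevant documents costs roughly 0.08 here because the log discount weights the top positions most heavily.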


In [None]:
print("T074 Kermeen")

import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import ndcg_score

# Dataset
X = np.array([
    [3, 2, 1], [2, 1, 0], [0, 1, 2],
    [1, 2, 0], [2, 1, 3], [1, 0, 2]
])
relevance = np.array([3, 2, 1, 3, 1, 2])
queries = np.array([1, 1, 1, 2, 2, 2])

# Create pairwise data
X_pairs, y_pairs = [], []

for q in np.unique(queries):
    idx = np.where(queries == q)[0]
    for i in idx:
        for j in idx:
            if relevance[i] != relevance[j]:
                X_pairs.append(X[i] - X[j])
                y_pairs.append(1 if relevance[i] > relevance[j] else -1)

X_pairs = np.array(X_pairs)
y_pairs = np.array(y_pairs)

# Train RankSVM
model = LinearSVC()
model.fit(X_pairs, y_pairs)

# Evaluate using NDCG
scores = [model.decision_function(X[queries == q]) for q in np.unique(queries)]
ndcg_scores = [ndcg_score([relevance[queries == q]], [s]) for q, s in zip(np.unique(queries), scores)]

print("Average NDCG Score:", np.mean(ndcg_scores))


T074 Kermeen
Average NDCG Score: 0.947499501061509
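
NDCG condenses each ranking into one number, but it can also help to look at the ranked list itself. The sketch below sorts the documents of each query by score. The weight vector `w` is made up for illustration and is not the one learned by `LinearSVC`; with the trained model, `model.decision_function(X[queries == q])` would supply the scores instead.

```python
import numpy as np

X = np.array([
    [3, 2, 1], [2, 1, 0], [0, 1, 2],
    [1, 2, 0], [2, 1, 3], [1, 0, 2]
])
queries = np.array([1, 1, 1, 2, 2, 2])

# Hypothetical weights, chosen by hand for this example only
w = np.array([0.5, 1.0, -0.5])

rankings = {}
for q in np.unique(queries):
    idx = np.where(queries == q)[0]
    scores = X[idx] @ w
    # argsort is ascending, so negate the scores to put the best document first
    rankings[int(q)] = idx[np.argsort(-scores, kind="stable")].tolist()

print(rankings)  # {1: [0, 1, 2], 2: [3, 4, 5]}
```

Each list holds row indices into `X`, best first. With these weights query 2 places document 4 (relevance 1) above document 5 (relevance 2), which is the kind of mistake NDCG penalises.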
