### Section 5.1: GCN Model Integration in an NLP Pipeline

In this section, we’ll integrate our prepared data into a GCN model to perform an NLP task. We’ll build on the adjacency matrix and node features developed in previous sections to create a GCN model that can analyze sentence structure. We’ll go through each stage of model integration, including defining the GCN architecture, setting up a sample NLP task (such as sentence classification), and running training and evaluation.

**Contents:**

1. **Defining the GCN Model Architecture**
2. **Setting Up the NLP Task (Sentence Classification)**
3. **Preparing the Data for the Model**
4. **Training the GCN Model**
5. **Evaluating the Model**
6. **Code Walkthrough**

---



### 1. Defining the GCN Model Architecture

The GCN architecture has three parts:
   - **Input Layer**: Accepts node features (one-hot encoding, POS tags, embeddings).
   - **Hidden Layers**: Graph convolutional layers for message passing and feature aggregation.
   - **Output Layer**: Generates final predictions for each node or an aggregated graph-level output.



#### Code Example: GCN Model Definition


In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    def __init__(self, in_features, out_features):
        """
        A single GCN layer that performs a graph convolution operation.

        Parameters:
        - in_features (int): Number of input features for each node.
        - out_features (int): Number of output features for each node after the layer.
        """
        super(GCNLayer, self).__init__()
        # Linear transformation to map input features to out_features
        self.linear = nn.Linear(in_features, out_features)

    def forward(self, node_features, adj_matrix):
        """
        Forward pass of the GCN layer.

        Parameters:
        - node_features (torch.Tensor): Input node features matrix of shape (num_nodes, in_features).
        - adj_matrix (torch.Tensor): Adjacency matrix of the graph of shape (num_nodes, num_nodes).

        Returns:
        - torch.Tensor: Output node features after applying the graph convolution, shape (num_nodes, out_features).
        """
        # Step 1: Apply a linear transformation to node features
        transformed_features = self.linear(node_features)

        # Step 2: Aggregate the features from neighboring nodes using the adjacency matrix
        aggregated_features = torch.matmul(adj_matrix, transformed_features)

        # Step 3: Normalize by node degrees to account for varying neighbor counts
        # (row sums of the adjacency matrix; with self-loops every degree is at least 1)
        degree = adj_matrix.sum(dim=1, keepdim=True)  # Column vector of node degrees, shape (num_nodes, 1)
        normalized_features = aggregated_features / degree  # Each row becomes the mean over its neighborhood

        # Step 4: Apply a ReLU non-linear activation function
        return F.relu(normalized_features)


class GCNModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        """
        A two-layer GCN model, stacking two GCN layers for message passing.

        Parameters:
        - input_dim (int): Dimension of input features for each node.
        - hidden_dim (int): Dimension of hidden layer output features.
        - output_dim (int): Dimension of the final output layer (e.g., number of classes for classification).
        """
        super(GCNModel, self).__init__()
        # First GCN layer from input_dim to hidden_dim
        self.gcn1 = GCNLayer(input_dim, hidden_dim)
        # Second GCN layer from hidden_dim to output_dim
        self.gcn2 = GCNLayer(hidden_dim, output_dim)

    def forward(self, node_features, adj_matrix):
        """
        Forward pass of the two-layer GCN model.

        Parameters:
        - node_features (torch.Tensor): Input node features matrix of shape (num_nodes, input_dim).
        - adj_matrix (torch.Tensor): Adjacency matrix of the graph of shape (num_nodes, num_nodes).

        Returns:
        - torch.Tensor: Output node features after two layers, shape (num_nodes, output_dim).
        """
        # First layer: apply GCN layer 1
        x = self.gcn1(node_features, adj_matrix)
        # Second layer: apply GCN layer 2
        x = self.gcn2(x, adj_matrix)
        return x
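
The normalization step in `GCNLayer` can be checked by hand on a tiny graph. The sketch below uses a hypothetical 3-node path graph and made-up features (neither comes from the sentence data) to show that dividing by the row sums turns aggregation into a neighborhood mean. For comparison it also computes the symmetric normalization D^(-1/2) A D^(-1/2) used in the original GCN formulation by Kipf and Welling, which this section's layer does not use.

```python
import torch

# Hypothetical 3-node path graph 0 - 1 - 2, with self-loops on the diagonal
adj = torch.tensor([[1., 1., 0.],
                    [1., 1., 1.],
                    [0., 1., 1.]])

# Made-up 2-dimensional node features
x = torch.tensor([[1., 0.],
                  [0., 1.],
                  [2., 2.]])

# Row normalization, as in GCNLayer: D^-1 A X
deg = adj.sum(dim=1, keepdim=True)   # shape (3, 1), degrees 2, 3, 2
row_norm = (adj @ x) / deg           # each row is the mean over that node's neighborhood

# Symmetric normalization for comparison: D^-1/2 A D^-1/2 X
d_inv_sqrt = deg.pow(-0.5)
sym_norm = d_inv_sqrt * (adj @ (d_inv_sqrt * x))

print(row_norm)
print(sym_norm)
```

Row normalization keeps every output row on the same scale as the inputs, while the symmetric form also down-weights contributions from high-degree neighbors.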



### 2. Setting Up the NLP Task (Sentence Classification)

In this example, we’ll set up a simple **sentence classification** task where the model classifies a sentence as either **positive** or **negative**. Each word in the sentence will be a node, and the sentence-level prediction will be obtained by aggregating information across nodes.

#### Task Setup:
- **Labels**: Assign binary labels (0 for negative, 1 for positive) to sentences.
- **Aggregation**: After passing node features through the GCN, aggregate the node embeddings to get a graph-level representation, which we then use for sentence classification.
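
The aggregation step can be sketched as follows. The node outputs here are made-up numbers for a hypothetical 4-token sentence, not the output of the model in this section; the point is only the shape change from `(num_nodes, num_classes)` to `(1, num_classes)`.

```python
import torch

# Made-up node-level outputs for a 4-token sentence and 2 classes
node_outputs = torch.tensor([[0.1, 0.9],
                             [0.4, 0.2],
                             [0.0, 1.5],
                             [0.3, 0.6]])

# Mean readout: average each class score over all tokens
mean_readout = node_outputs.mean(dim=0, keepdim=True)        # shape (1, 2)

# Max readout: keep the strongest score per class instead
max_readout = node_outputs.max(dim=0, keepdim=True).values   # shape (1, 2)

print(mean_readout.shape, max_readout.shape)
```

Mean aggregation treats every token equally, whereas max aggregation lets a single strongly scored token dominate the sentence-level result.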

---



### 3. Preparing the Data for the Model

To prepare the data, we’ll:
1. Use the adjacency matrix created from dependency parsing.
2. Define node features (e.g., one-hot, POS, embeddings).
3. Combine the node features and adjacency matrix as input to the GCN model.
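
Step 1 can be illustrated without running a parser. The sketch below builds a symmetric adjacency matrix with self-loops from a list of head indices. The `heads` list is an assumed dependency parse of "The cat sat on the mat ." written out by hand; a real parser may attach some tokens differently.

```python
import numpy as np

# Assumed head index for each of the 7 tokens in "The cat sat on the mat ."
# Token i attaches to heads[i]; the root ("sat", index 2) points to itself.
heads = [1, 2, 2, 2, 5, 3, 2]

n = len(heads)
adj = np.zeros((n, n), dtype=int)
for i, h in enumerate(heads):
    adj[i, h] = 1   # edge from child to head
    adj[h, i] = 1   # reverse edge, making the graph undirected
np.fill_diagonal(adj, 1)  # self-loops so each node keeps its own features

print(adj)
```

Making the matrix symmetric lets information flow in both directions along each dependency arc, and the self-loops guarantee that no row sums to zero during degree normalization.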



#### Example Data Setup


In [2]:
import numpy as np
import torch
import spacy

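# Load the spaCy model for POS tagging, dependency parsing, and token vectors.
# Note: en_core_web_sm ships without static word vectors, so token.vector
# returns context-sensitive tensors rather than pretrained word embeddings.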
nlp = spacy.load("en_core_web_sm")

# Define vocabulary and one-hot encoding function
# (tokens missing from the vocabulary, such as ".", get an all-zero row)
vocab = ["The", "cat", "sat", "on", "the", "mat"]
vocab_dict = {word: i for i, word in enumerate(vocab)}

def one_hot_encode(sentence_tokens, vocab_dict):
    features = []
    for token in sentence_tokens:
        one_hot = [0] * len(vocab_dict)
        if token in vocab_dict:
            one_hot[vocab_dict[token]] = 1
        features.append(one_hot)
    return np.array(features)

def pos_tag_features(sentence):
    doc = nlp(sentence)
    pos_tags = [token.pos_ for token in doc]
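    # Keep tags in first-seen order so the index mapping is deterministic
    # (set() ordering can change between runs). The mapping is built per
    # sentence, so indices are not comparable across sentences.
    unique_tags = list(dict.fromkeys(pos_tags))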
    pos_dict = {tag: i for i, tag in enumerate(unique_tags)}
    features = []
    for tag in pos_tags:
        one_hot = [0] * len(pos_dict)
        one_hot[pos_dict[tag]] = 1
        features.append(one_hot)
    return np.array(features)

def word_embedding_features(sentence):
    doc = nlp(sentence)
    features = [token.vector for token in doc]
    return np.array(features)

def create_combined_features(sentence, vocab_dict):
    doc = nlp(sentence)
    sentence_tokens = [token.text for token in doc]
    one_hot_feats = one_hot_encode(sentence_tokens, vocab_dict)
    pos_feats = pos_tag_features(sentence)
    embedding_feats = word_embedding_features(sentence)
    combined_feats = np.concatenate((one_hot_feats, pos_feats, embedding_feats), axis=1)
    return combined_feats

# Example sentence
sentence = "The cat sat on the mat."
combined_features = create_combined_features(sentence, vocab_dict)

# Create an adjacency matrix with self-loops for the sentence
def create_adjacency_matrix_with_loops(sentence):
    doc = nlp(sentence)
    num_tokens = len(doc)
    adj_matrix = np.zeros((num_tokens, num_tokens), dtype=int)
    for token in doc:
        adj_matrix[token.i][token.head.i] = 1
        adj_matrix[token.head.i][token.i] = 1
    np.fill_diagonal(adj_matrix, 1)
    return adj_matrix

adj_matrix_with_loops = create_adjacency_matrix_with_loops(sentence)

# Convert data to PyTorch tensors
node_features = torch.tensor(combined_features, dtype=torch.float32)
adj_matrix = torch.tensor(adj_matrix_with_loops, dtype=torch.float32)
label = torch.tensor([1], dtype=torch.long)  # Example label (1 for positive, 0 for negative)

# Display tensors to confirm setup
print("Node Features Tensor:\n", node_features)
print("Adjacency Matrix Tensor:\n", adj_matrix)
print("Label Tensor:", label)


Node Features Tensor:
 tensor([[ 1.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,
          0.0000e+00,  1.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,
          0.0000e+00,  1.0466e+00, -6.3125e-01, -5.6540e-01,  2.7119e+00,
         -1.0801e+00, -4.9187e-02, -7.9210e-01,  6.1598e-02, -6.1989e-01,
          1.6166e+00,  1.4493e+00,  1.3127e+00, -6.7903e-01, -1.2306e+00,
         -7.8954e-01, -1.0821e+00, -8.0464e-01,  1.6262e+00, -8.7126e-01,
          4.0537e-01, -1.1336e+00, -3.7326e-01, -6.6686e-01, -1.6324e+00,
          1.8673e+00, -2.4132e-01,  1.0853e+00,  8.6994e-02, -9.4281e-02,
          6.0370e-01,  1.2150e+00, -1.2031e+00,  9.7626e-01, -2.0013e+00,
         -6.6515e-02,  9.5435e-01,  2.6909e-01, -7.1802e-01,  2.5988e-01,
          3.8899e+00, -8.0076e-02,  1.2519e+00, -1.3616e+00,  9.7839e-01,
         -9.9233e-01, -8.0711e-02, -4.8829e-01,  2.3329e+00,  1.2838e+00,
          9.2897e-02, -9.7115e-01, -3.6849e-01,  5.5837e-01,  5.8041e-01,
          8.447

In [3]:
import numpy as np
import torch

# Assume combined_features and adj_matrix_with_loops are predefined
# Example label for the sentence (e.g., 1 for positive sentiment, 0 for negative)
sentence_label = 1

# Convert feature and adjacency data to PyTorch tensors
node_features = torch.tensor(combined_features, dtype=torch.float32)  # Node features as float tensor
adj_matrix = torch.tensor(adj_matrix_with_loops, dtype=torch.float32)  # Adjacency matrix as float tensor
label = torch.tensor([sentence_label], dtype=torch.long)  # Sentence label as long tensor for classification

# Display the converted tensors to confirm their structure
print("Node Features Tensor:\n", node_features)
print("Adjacency Matrix Tensor:\n", adj_matrix)
print("Label Tensor:", label)




### 4. Training the GCN Model

We’ll use a simple training loop to train the GCN model on the single labeled sentence prepared above. The model minimizes the cross-entropy loss between the predicted class scores and the true label.
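
As a quick illustration of what the loss measures, the sketch below feeds made-up sentence-level scores (not produced by the model) to `nn.CrossEntropyLoss` and recomputes the same value by hand. Note that the loss expects raw scores of shape `(1, num_classes)` and a class index of shape `(1,)`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Made-up sentence-level scores (logits) for [negative, positive]
sentence_logits = torch.tensor([[0.2, 1.4]])   # shape (1, num_classes)
target = torch.tensor([1])                      # shape (1,), class index

loss = nn.CrossEntropyLoss()(sentence_logits, target)

# Same value by hand: negative log-probability assigned to the true class
manual = -F.log_softmax(sentence_logits, dim=1)[0, target.item()]

print(loss.item(), manual.item())
```

Because the softmax is applied inside the loss, the model should output unnormalized scores rather than probabilities.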



#### Code Example: Training Loop


In [4]:
# Define model, loss function, and optimizer
input_dim = node_features.shape[1]  # Dimensionality of input node features
hidden_dim = 8  # Hidden layer dimension for GCN
output_dim = 2  # Output dimension (e.g., binary classification: positive and negative)

# Initialize the GCN model with input, hidden, and output dimensions
model = GCNModel(input_dim, hidden_dim, output_dim)

# Define loss function for classification
criterion = nn.CrossEntropyLoss()

# Use Adam optimizer for training the model
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Number of epochs for training
epochs = 20

# Training loop
for epoch in range(epochs):
    # Forward pass: compute node-level outputs
    node_outputs = model(node_features, adj_matrix)

    # Aggregate node outputs by taking the mean to get a graph-level embedding
    sentence_output = node_outputs.mean(dim=0, keepdim=True)  # Mean aggregation for sentence-level representation

    # Compute the loss using the aggregated sentence embedding
    loss = criterion(sentence_output, label)

    # Backward pass and optimization
    optimizer.zero_grad()  # Clear previous gradients
    loss.backward()        # Backpropagate to compute gradients
    optimizer.step()        # Update model parameters

    # Print the loss for each epoch
    print(f"Epoch {epoch+1}/{epochs}, Loss: {loss.item():.4f}")


Epoch 1/20, Loss: 0.6529
Epoch 2/20, Loss: 0.5152
Epoch 3/20, Loss: 0.4485
Epoch 4/20, Loss: 0.3842
Epoch 5/20, Loss: 0.3227
Epoch 6/20, Loss: 0.2654
Epoch 7/20, Loss: 0.2130
Epoch 8/20, Loss: 0.1671
Epoch 9/20, Loss: 0.1285
Epoch 10/20, Loss: 0.0970
Epoch 11/20, Loss: 0.0721
Epoch 12/20, Loss: 0.0530
Epoch 13/20, Loss: 0.0388
Epoch 14/20, Loss: 0.0283
Epoch 15/20, Loss: 0.0207
Epoch 16/20, Loss: 0.0152
Epoch 17/20, Loss: 0.0112
Epoch 18/20, Loss: 0.0084
Epoch 19/20, Loss: 0.0064
Epoch 20/20, Loss: 0.0049



### 5. Evaluating the Model

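After training, we can evaluate the model by predicting labels. For simplicity, the example below runs the model on the same sentence it was trained on, so it only confirms that the model fit that example; a meaningful evaluation requires held-out sentences.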



#### Code Example: Evaluation



In [5]:
with torch.no_grad():
    # Evaluation phase: run the forward pass without tracking gradients,
    # which reduces memory usage and speeds up the computation.
    # Forward pass to get node-level outputs from the model
    node_outputs = model(node_features, adj_matrix)

    # Aggregate node outputs by taking the mean to get the sentence-level embedding
    sentence_output = node_outputs.mean(dim=0, keepdim=True)

    # Determine the predicted label by taking the class with the highest score
    _, predicted = torch.max(sentence_output, dim=1)

    # Display the predicted label and the true label
    print("Predicted Label:", predicted.item())
    print("True Label:", label.item())


Predicted Label: 1
True Label: 1
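
With more than one labeled sentence, the same prediction step extends to an accuracy score. The sketch below assumes the sentence-level scores have already been computed and stacked into one tensor; the numbers and labels are made up for illustration.

```python
import torch
import torch.nn.functional as F

# Made-up sentence-level scores for three sentences, classes [negative, positive]
logits = torch.tensor([[0.3, 2.1],
                       [1.7, 0.2],
                       [0.9, 1.0]])
labels = torch.tensor([1, 0, 0])

probs = F.softmax(logits, dim=1)     # per-sentence class probabilities
predicted = probs.argmax(dim=1)      # same class that torch.max selects on the logits

# Fraction of sentences whose predicted class matches the label
accuracy = (predicted == labels).float().mean().item()

print(predicted.tolist(), accuracy)
```

Softmax does not change which class wins, but the probabilities show how confident each prediction is.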



### 6. Code Walkthrough: Full Pipeline

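The complete code below combines the GCN model, tensor preparation, training, and evaluation. It assumes that `combined_features`, `adj_matrix_with_loops`, and `sentence_label` have already been defined by the earlier cells.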




#### Explanation:
1. **GCNLayer**:
   - Defines a single GCN layer that applies a linear transformation, aggregates neighbor information, normalizes by node degrees, and applies ReLU.

2. **GCNModel**:
   - Combines two `GCNLayer` instances, allowing the model to propagate information across two hops.

3. **Data Preparation**:
   - Converts `combined_features` (node features), `adj_matrix_with_loops` (adjacency matrix with self-loops), and `sentence_label` (target label) into PyTorch tensors.

4. **Training Loop**:
   - Runs for 20 epochs.
   - For each epoch, performs a forward pass through the model, computes the mean of node embeddings for sentence classification, calculates the loss, backpropagates, and updates the model parameters.
   
5. **Evaluation**:
   - Runs a forward pass without gradient computation, aggregates node embeddings, and uses `torch.max` to predict the class label.
   - Prints the predicted and true labels for comparison.

This code is designed for binary sentence classification using GCNs, where the sentence representation is derived by aggregating node-level features.
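
The two-hop claim for the stacked layers can be checked on a small graph. The sketch below uses a hypothetical 4-node path graph (not the sentence graph) and looks at which node pairs are connected after one and two applications of the adjacency matrix.

```python
import torch

# Hypothetical 4-node path graph 0 - 1 - 2 - 3 with self-loops
adj = torch.tensor([[1., 1., 0., 0.],
                    [1., 1., 1., 0.],
                    [0., 1., 1., 1.],
                    [0., 0., 1., 1.]])

one_hop = adj > 0            # pairs that exchange information in one layer
two_hop = (adj @ adj) > 0    # pairs that exchange information after two layers

print(one_hop[0].tolist())   # node 0 sees nodes 0 and 1
print(two_hop[0].tolist())   # node 0 now also sees node 2, but not node 3
```

Each additional layer widens a node's receptive field by one hop, so tokens that are three or more dependency arcs apart do not influence each other directly in a two-layer model.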

In [9]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

# Define GCN Layer
class GCNLayer(nn.Module):
    def __init__(self, in_features, out_features):
        """
        Initialize a single GCN layer with a linear transformation.

        Parameters:
        - in_features (int): Number of input features for each node.
        - out_features (int): Number of output features for each node.
        """
        super(GCNLayer, self).__init__()
        self.linear = nn.Linear(in_features, out_features)

    def forward(self, node_features, adj_matrix):
        """
        Forward pass for the GCN layer.

        Parameters:
        - node_features (torch.Tensor): Node features matrix.
        - adj_matrix (torch.Tensor): Adjacency matrix of the graph.

        Returns:
        - torch.Tensor: Activated, normalized node features.
        """
        # Linear transformation on node features
        transformed_features = self.linear(node_features)

        # Aggregation step: multiply with adjacency matrix to aggregate neighbors
        aggregated_features = torch.matmul(adj_matrix, transformed_features)

        # Normalization by node degrees
        degree = adj_matrix.sum(dim=1, keepdim=True)  # Column vector of node degrees
        normalized_features = aggregated_features / degree

        # Apply ReLU non-linearity
        return F.relu(normalized_features)

# Define GCN Model
class GCNModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        """
        A two-layer GCN model for node feature transformation.

        Parameters:
        - input_dim (int): Dimensionality of input features.
        - hidden_dim (int): Dimensionality of hidden layer.
        - output_dim (int): Dimensionality of output layer (e.g., number of classes).
        """
        super(GCNModel, self).__init__()
        self.gcn1 = GCNLayer(input_dim, hidden_dim)
        self.gcn2 = GCNLayer(hidden_dim, output_dim)

    def forward(self, node_features, adj_matrix):
        """
        Forward pass through the two-layer GCN model.

        Parameters:
        - node_features (torch.Tensor): Node features matrix.
        - adj_matrix (torch.Tensor): Adjacency matrix of the graph.

        Returns:
        - torch.Tensor: Node-level outputs.
        """
        # First GCN layer
        x = self.gcn1(node_features, adj_matrix)
        # Second GCN layer
        x = self.gcn2(x, adj_matrix)
        return x

# Prepare node features, adjacency matrix, and label
node_features = torch.tensor(combined_features, dtype=torch.float32)  # Node features tensor
adj_matrix = torch.tensor(adj_matrix_with_loops, dtype=torch.float32)  # Adjacency matrix tensor
label = torch.tensor([sentence_label], dtype=torch.long)  # Label tensor

# Model setup
input_dim = node_features.shape[1]  # Input feature dimension
hidden_dim = 8                      # Hidden layer dimension
output_dim = 2                      # Output dimension (e.g., binary classification)
model = GCNModel(input_dim, hidden_dim, output_dim)
criterion = nn.CrossEntropyLoss()    # Loss function
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)  # Optimizer

# Training loop
epochs = 20
for epoch in range(epochs):
    # Forward pass
    node_outputs = model(node_features, adj_matrix)

    # Mean aggregation for sentence-level embedding
    sentence_output = node_outputs.mean(dim=0, keepdim=True)

    # Compute loss
    loss = criterion(sentence_output, label)

    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Print loss for each epoch
    print(f"Epoch {epoch+1}/{epochs}, Loss: {loss.item():.4f}")

# Evaluation
with torch.no_grad():
    # Forward pass on evaluation
    node_outputs = model(node_features, adj_matrix)
    sentence_output = node_outputs.mean(dim=0, keepdim=True)

    # Get the predicted label by finding the index with the max value
    _, predicted = torch.max(sentence_output, dim=1)

    # Print the predicted and true labels
    print("Predicted Label:", predicted.item())
    print("True Label:", label.item())


Epoch 1/20, Loss: 0.7942
Epoch 2/20, Loss: 0.7038
Epoch 3/20, Loss: 0.6931
Epoch 4/20, Loss: 0.6931
Epoch 5/20, Loss: 0.6931
Epoch 6/20, Loss: 0.6931
Epoch 7/20, Loss: 0.6931
Epoch 8/20, Loss: 0.6931
Epoch 9/20, Loss: 0.6931
Epoch 10/20, Loss: 0.6931
Epoch 11/20, Loss: 0.6931
Epoch 12/20, Loss: 0.6931
Epoch 13/20, Loss: 0.6931
Epoch 14/20, Loss: 0.6931
Epoch 15/20, Loss: 0.6931
Epoch 16/20, Loss: 0.6931
Epoch 17/20, Loss: 0.6931
Epoch 18/20, Loss: 0.6931
Epoch 19/20, Loss: 0.6931
Epoch 20/20, Loss: 0.6931
Predicted Label: 0
True Label: 1


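**Observation:** the prediction is wrong in this run. The loss stalls at 0.6931 (ln 2) from epoch 3 onward, which is the cross-entropy value when both class scores are equal. Because `GCNLayer` applies ReLU to the output layer as well, the most likely cause is that both scores were clamped to zero for every node; in that state the gradients vanish, so raising `epochs` to 40 is unlikely to help on its own. If a re-run succeeds, that is more plausibly due to a different random initialization (as in the training run in step 4) than to the extra epochs. Removing the ReLU from the output layer avoids the problem.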
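
A loss that sits at exactly ln 2 can be reproduced in isolation. The first part of the sketch below uses made-up negative scores to show that ReLU clamps them to zero, which yields a loss of ln 2 and a zero gradient. The second part defines a hypothetical variant of the layer, `GCNLayerOptionalReLU`, and a model, `GCNClassifier`, that skips the activation on the output layer. Both names are introduced here for illustration and are not part of the code above.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

# Part 1: made-up scores that are negative for both classes before the ReLU
pre_activation = torch.tensor([[-0.3, -1.2]], requires_grad=True)
label = torch.tensor([1])

logits = F.relu(pre_activation)               # both scores are clamped to 0
loss = nn.CrossEntropyLoss()(logits, label)
loss.backward()

print(round(loss.item(), 4), round(math.log(2), 4))  # 0.6931 0.6931
print(pre_activation.grad)                            # all zeros: no learning signal


# Part 2: hypothetical layer variant with an optional activation
class GCNLayerOptionalReLU(nn.Module):
    def __init__(self, in_features, out_features, activate=True):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.activate = activate

    def forward(self, node_features, adj_matrix):
        out = adj_matrix @ self.linear(node_features)       # transform, then aggregate
        out = out / adj_matrix.sum(dim=1, keepdim=True)     # normalize by node degree
        return F.relu(out) if self.activate else out


class GCNClassifier(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.gcn1 = GCNLayerOptionalReLU(input_dim, hidden_dim, activate=True)
        self.gcn2 = GCNLayerOptionalReLU(hidden_dim, output_dim, activate=False)

    def forward(self, node_features, adj_matrix):
        x = self.gcn1(node_features, adj_matrix)
        return self.gcn2(x, adj_matrix)   # raw scores, which may be negative
```

With raw scores on the output layer, the class scores can differ in sign and magnitude, so the cross-entropy gradient is not cut off at the last layer.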


### Summary and Key Takeaways

- **GCN Model Architecture**: Our model includes two GCN layers for sentence classification, allowing each node to aggregate information from its neighbors.
- **Sentence-Level Classification**: We use mean aggregation of node features to obtain a sentence representation for classification.
- **Training and Evaluation**: The model is trained using cross-entropy loss, here on a single example; evaluating it on unseen sentences requires a held-out set.

With this GCN model setup, you’re now equipped to perform sentence classification using graph-based techniques. In the next section, we’ll explore ways to experiment with model configurations, including using different aggregation methods and tuning hyperparameters to optimize performance.