# **Feature extraction: Gray Level Co-occurrence Matrix (GLCM)**

In this section we describe the process of feature extraction from an image, including the method used and the metrics employed. The main objective of this process is to identify and quantify the image features that are relevant for diagnostic classification. The technique used to extract the features of interest is texture analysis. The texture of an image refers to the surface appearance of the object or material depicted in it. Its analysis focuses on measuring and quantifying the intuitive qualities that define the term, such as roughness, smoothness, uniformity, granulation, and the presence of lines or specific patterns. These textural qualities are measured through the distribution of pixel intensity values in the image.

The distribution of pixel intensities describes which gray levels occur in an image and how often each one occurs. A homogeneous texture, such as a blank sheet of paper, concentrates its pixels in a narrow range of intensity levels, whereas a more complex texture, with a greater diversity of elements or patterns, spreads its pixels over a wider range of intensity levels.
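As a small illustration of this idea, the sketch below computes the fraction of pixels at each gray level for two toy images. The function name `intensity_distribution` and both 4x4 images are hypothetical and chosen only for this example; they are not part of the dataset used later.

```python
from collections import Counter

def intensity_distribution(image):
    """Fraction of pixels at each gray level (first-order histogram)."""
    pixels = [value for row in image for value in row]
    counts = Counter(pixels)
    return {level: counts[level] / len(pixels) for level in sorted(counts)}

# Hypothetical homogeneous image: every pixel has the same gray level
blank_page = [[255] * 4 for _ in range(4)]

# Hypothetical varied image: four gray levels, each used equally often
varied = [[0, 64, 128, 255],
          [255, 128, 64, 0],
          [64, 0, 255, 128],
          [128, 255, 0, 64]]

print(intensity_distribution(blank_page))  # {255: 1.0}
print(intensity_distribution(varied))      # {0: 0.25, 64: 0.25, 128: 0.25, 255: 0.25}
```

The homogeneous image puts all of its pixels in a single gray level, while the varied image spreads them over several levels.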

There are different methods for texture analysis, such as the wavelet transform, the Fourier transform, Gabor filters, Local Binary Patterns (LBP) and GLCM, among others. In this work, the gray level co-occurrence matrix (GLCM) is used, as it is one of the most popular and effective techniques for texture analysis [
  

**GLCM**

GLCM is a statistical method for image analysis developed by Robert M. Haralick in the 1970s. It is based on the spatial distribution of gray levels in an image and provides information about the structure and composition of the image. It is used in a variety of fields involving image processing and computer vision, such as medical image analysis, geology and security.

The GLCM is a frequency matrix that describes the spatial relationship between pairs of pixels in an image for a given distance and orientation (angle). It counts how often a pixel with gray level i appears in that specific spatial relationship with a pixel of gray level j. The matrix is square, with one row and one column per gray level, so its dimension equals the number of gray levels considered in the image. Because they consider pairs of neighboring pixels, co-occurrence matrices are second-order measures: they provide information about the structure and texture of an image that cannot be obtained directly from individual pixel values (first-order measures).
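The counting described above can be sketched in plain Python. This is a minimal illustration, not the scikit-image implementation used later in the notebook. The function `glcm`, its parameters and the two toy images are assumptions made for the example. Both images have the same first-order histogram, yet their co-occurrence matrices differ.

```python
from collections import Counter

def glcm(image, dx, dy, levels, symmetric=True, normed=True):
    """Co-occurrence matrix of `image` (list of rows of ints in [0, levels))
    for the pixel offset (dx, dy): dx columns to the right, dy rows down."""
    rows, cols = len(image), len(image[0])
    matrix = [[0.0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                matrix[i][j] += 1
                if symmetric:
                    matrix[j][i] += 1  # also count the pair in reverse order
    if normed:
        total = sum(map(sum, matrix))
        matrix = [[value / total for value in row] for row in matrix]
    return matrix

# Two hypothetical 4x4 images with the same histogram (eight 0s, eight 1s)
checkerboard = [[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]]
halves = [[0, 0, 1, 1],
          [0, 0, 1, 1],
          [0, 0, 1, 1],
          [0, 0, 1, 1]]

hist_checker = Counter(v for row in checkerboard for v in row)
hist_halves = Counter(v for row in halves for v in row)

# Distance 1, angle 0 degrees: each pixel is paired with its right-hand neighbor
glcm_checker = glcm(checkerboard, dx=1, dy=0, levels=2)
glcm_halves = glcm(halves, dx=1, dy=0, levels=2)

print(hist_checker == hist_halves)  # True: first-order statistics are identical
print(glcm_checker)                 # [[0.0, 0.5], [0.5, 0.0]]
print(glcm_halves)                  # roughly [[0.33, 0.17], [0.17, 0.33]]
```

In the checkerboard every horizontal neighbor has a different gray level, so all the mass sits off the diagonal. In the split image most neighbors are equal, so most of the mass sits on the diagonal.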

Once the GLCM has been calculated, texture characteristics such as energy, entropy, contrast, homogeneity and correlation can be derived from it. Each of these characteristics is briefly explained below.

* **Energy**: Measures the uniformity of the distribution of gray-level pairs in the image. A high energy indicates that a few gray-level pairs dominate the matrix, as in a uniform image where neighboring pixels have a high probability of having the same gray values. Conversely, a low energy indicates that the pairs are spread over many different gray-level combinations, so there are abrupt variations or varied patterns in the image. The energy is calculated by summing the squares of all the elements of the co-occurrence matrix, $\sum_{i=1}^{N}\sum_{j=1}^{N} G_{ij}^{2}$, where $N$ is the number of gray levels in the gray-level image segments.

* **Entropy**: Quantifies the randomness or irregularity of the distribution of pixel values in an image. If the entropy is high, the image has a more heterogeneous and complex texture, implying that there are visual patterns or textures that may be of interest for analysis. If the entropy is low, the image is more homogeneous and uniform, and the way pixel values are distributed is more regular and predictable.

* **Contrast**: Measures the degree to which the two pixels of each pair differ in gray level. A high contrast means that adjacent pixels have very different gray values, so there is a large difference between light and dark pixels, forming an image with a strong, sharp texture. Conversely, a low contrast indicates that adjacent pixels have similar gray values, resulting in an image with a soft or blurred texture.

* **Homogeneity**: Describes the uniformity of the intensities of adjacent pixels in an image. If it is low, the gray values of adjacent pixels are very different, which may be due to the presence of various objects, textures, illumination changes or shadows, among other factors. If it is high, the gray values of adjacent pixels are similar, suggesting a uniform and smooth texture in the image.

* **Correlation**: A statistical measure of the linear dependence between the gray levels of the two pixels in each pair. A high correlation value indicates that the gray level of a pixel is well predicted by the gray level of its neighbor, as in images with large, smoothly varying regions.
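As a rough illustration of the five measures listed above, the following sketch computes them from a normalized GLCM in plain Python. The function `glcm_features` and the two 2x2 matrices are hypothetical. The text does not give the formulas, so the base-2 logarithm for entropy and the 1 / (1 + (i - j)^2) weighting for homogeneity are assumptions based on commonly used definitions. Note also that scikit-image's `graycoprops` reports the square root of the sum of squares as `energy` (the sum itself is its `ASM` property), so the energy here follows the definition in the text rather than that library's.

```python
import math

def glcm_features(p):
    """Texture features of a normalized GLCM `p` (square list of lists summing to 1)."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]

    energy = sum(v * v for _, _, v in cells)  # sum of squared entries
    entropy = -sum(v * math.log2(v) for _, _, v in cells if v > 0)
    contrast = sum(v * (i - j) ** 2 for i, j, v in cells)
    homogeneity = sum(v / (1 + (i - j) ** 2) for i, j, v in cells)

    mean_i = sum(i * v for i, _, v in cells)
    mean_j = sum(j * v for _, j, v in cells)
    var_i = sum(v * (i - mean_i) ** 2 for i, _, v in cells)
    var_j = sum(v * (j - mean_j) ** 2 for _, j, v in cells)
    covariance = sum(v * (i - mean_i) * (j - mean_j) for i, j, v in cells)
    spread = math.sqrt(var_i * var_j)
    # Assumption: with zero variance (constant image) correlation is reported as 1.0
    correlation = covariance / spread if spread > 0 else 1.0

    return {"energy": energy, "entropy": entropy, "contrast": contrast,
            "homogeneity": homogeneity, "correlation": correlation}

# Hypothetical GLCMs over two gray levels
smooth = [[0.5, 0.0], [0.0, 0.5]]    # neighbors always share the same gray level
checker = [[0.0, 0.5], [0.5, 0.0]]   # neighbors always differ

smooth_features = glcm_features(smooth)
checker_features = glcm_features(checker)
print(smooth_features)
print(checker_features)
```

Both matrices give the same energy and entropy, because each spreads its mass over two equally likely pairs. Contrast, homogeneity and correlation separate them: the smooth case has contrast 0, homogeneity 1 and correlation 1, while the checkerboard case has contrast 1, homogeneity 0.5 and correlation -1.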

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
from psutil import virtual_memory
ram_gb = virtual_memory().total / 1e9
print('Your runtime has {:.1f} gigabytes of available RAM\n'.format(ram_gb))

if ram_gb < 20:
  print('Not using a high-RAM runtime')
else:
  print('You are using a high-RAM runtime!')

Your runtime has 27.3 gigabytes of available RAM

You are using a high-RAM runtime!


In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import os
from zipfile import ZipFile
from PIL import Image
from skimage import data, io, color
import sys
import cv2
from cv2 import cvtColor
from skimage.feature import graycomatrix, graycoprops  # the "grey*" spellings are deprecated

In [None]:
train_csv = '/content/drive/MyDrive/train.csv'
train_images= '/content/drive/MyDrive/train_images.zip'

In [None]:
# Extract every PNG from the archive into 'DR', skipping files that were already extracted
os.makedirs('DR', exist_ok=True)
with ZipFile(train_images, 'r') as zip_ref:
    # Loop through all files in the zip file
    for filename in zip_ref.namelist():
        # Only image files are needed (adjust the extension for other image types)
        if filename.endswith('.png') and not os.path.exists(os.path.join('DR', filename)):
            zip_ref.extract(filename, path='DR')

In [None]:
train = pd.read_csv(train_csv, delimiter=',')
# Binary label: 'No' when diagnosis is 0, 'Si' (Spanish for "yes") for any other value
train['labels'] = np.where(train['diagnosis'] == 0, 'No', 'Si')
nRow, nCol = train.shape
print(f'There are {nRow} rows and {nCol} columns in the training set')
train.head()

There are 3662 rows and 3 columns in the training set


| | id_code | diagnosis | labels |
|---|---|---|---|
| 0 | 000c1434d8d7 | 2.0 | Si |
| 1 | 001639a390f0 | 4.0 | Si |
| 2 | 0024cdab0c1e | 1.0 | Si |
| 3 | 002c21358ce6 | 0.0 | No |
| 4 | 005b95c28852 | 0.0 | No |


In [None]:
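# Length of each id_code; the image ids shown above have 12 characters
train['id_length'] = train['id_code'].str.len()
print(train['id_length'].unique())
# Drop the rows whose id_code is a single character
train = train[train['id_length'] != 1]
train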

[12  1]


| | id_code | diagnosis | labels | id_length |
|---|---|---|---|---|
| 0 | 000c1434d8d7 | 2.0 | Si | 12 |
| 1 | 001639a390f0 | 4.0 | Si | 12 |
| 2 | 0024cdab0c1e | 1.0 | Si | 12 |
| 3 | 002c21358ce6 | 0.0 | No | 12 |
| 4 | 005b95c28852 | 0.0 | No | 12 |
| ... | ... | ... | ... | ... |
| 3657 | ffa47f6a7bf4 | 2.0 | Si | 12 |
| 3658 | ffc04fed30e6 | 0.0 | No | 12 |
| 3659 | ffcf7b45f213 | 2.0 | Si | 12 |
| 3660 | ffd97f8cd5aa | 0.0 | No | 12 |


# GLCM

In [None]:
import os
import cv2
import numpy as np
from PIL import Image
from skimage.feature import graycomatrix, graycoprops  # the "grey*" spellings are deprecated
import pandas as pd

# Define the GLCM parameters
distance = [1]
directory = 'DR'
angles = [0, np.pi/4, np.pi/2, 3*np.pi/4]
properties = ['correlation', 'homogeneity', 'contrast', 'energy', 'dissimilarity']
output_size = (224, 224)

# Column names: one per (property, angle in degrees) pair, plus the filename
columns = [f"{name}_{int(round(np.rad2deg(ang)))}" for name in properties for ang in angles]
columns.append("filename")

# Folder where the preprocessed images are saved
preprocessed_dir = os.path.join(directory, "Preprocessed_images")
os.makedirs(preprocessed_dir, exist_ok=True)

# One row of texture features per image
texture_features = []

# Loop over all the PNG images in the directory
for filename in os.listdir(directory):
    filepath = os.path.join(directory, filename)
    if not (os.path.isfile(filepath) and os.path.splitext(filepath)[1].lower() == '.png'):
        continue

    with Image.open(filepath) as img:
        # Resize the image; convert to RGB so the array always has three channels
        resized_img = img.convert('RGB').resize(output_size)

    # Preprocess: grayscale (PIL arrays are RGB, not BGR), equalize, blur, Otsu threshold
    gray_img = cv2.cvtColor(np.array(resized_img), cv2.COLOR_RGB2GRAY)
    equalized_img = cv2.equalizeHist(gray_img)
    blur = cv2.GaussianBlur(equalized_img, (5, 5), 0)
    thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

    # Compute the GLCM and the texture features (ordered by property, then by angle)
    glcm_mat = graycomatrix(thresh, distances=distance, angles=angles, symmetric=True, normed=True)
    block_glcm = np.hstack([graycoprops(glcm_mat, prop).ravel() for prop in properties])
    texture_features.append(list(block_glcm) + [filename])

    # Save the preprocessed image
    Image.fromarray(thresh).save(os.path.join(preprocessed_dir, filename))

# Create the pandas DataFrame for GLCM features data (feature columns stay numeric)
glcm_df = pd.DataFrame(texture_features, columns=columns)



In [None]:
print(glcm_df.head(10))

        correlation_0      correlation_45      correlation_90  \
0  0.9591237381454681  0.9499284054812472  0.9696182756636074   
1  0.9543529548861951  0.9401528624451998  0.9624391917217748   
2  0.9686938523922082  0.9598283632328647  0.9760227482194658   
3  0.9102065863454754  0.8891924193662563  0.9348806315459068   
4  0.9375675139947603  0.9258517750976872  0.9611045503133605   
5   0.930973665626338  0.9143741489838481  0.9476697627837024   
6  0.9206182774841879  0.8989630322933457  0.9268271177546529   
7   0.952270309677347  0.9309738220688081  0.9465846557493076   
8  0.9081049495473961  0.8896642344144806  0.9410328049062968   
9  0.9559542708125693  0.9348448484653087  0.9507482503252632   

      correlation_135       homogeneity_0      homogeneity_45  \
0  0.9509749179068804  0.9795807112034207  0.9749848002334496   
1   0.941600777063461  0.9771784419332351  0.9700782819512644   
2  0.9629115136845184  0.9861469138752621  0.9821837082048525   
3  0.8908822953236336  0

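The textural features that can be extracted from a fundus image to detect diabetic retinopathy are energy, entropy, contrast, homogeneity and correlation. Of these, the code above computes correlation, homogeneity, contrast and energy, together with dissimilarity; entropy is not in its `properties` list.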

Energy measures the uniformity of the distribution of pixel intensities, entropy measures the randomness or complexity of the texture, contrast measures the difference in intensities between pairs of adjacent pixels, homogeneity measures the uniformity of adjacent pixel intensities, and correlation measures the linear dependence between the gray levels of neighboring pixels.

Each of these features can provide valuable information for the analysis and detection of diabetic retinopathy (DR). Specifically, the **main goal of GLCM is to identify and quantify features relevant to the classification of the presence of diabetic retinopathy**. These features are intended to capture the textural changes produced by structural alterations in blood vessels, microaneurysms, exudates, and other specific changes associated with diabetic retinopathy. The extracted features are used as input data to train machine learning models that classify whether or not a patient has DR based on fundus images.