# Assignment nº1
## Lung Cancer Classification using Computerized Tomography (CT) Data
### Laboratory of Artificial Intelligence and Data Science (2024/25)

##### Work assembled by Alejandro Gonçalves (202205564), Francisca Mihalache (202206022), João Sousa (202205238) and Vítor Ferreira (201109428).

## Table of contents <a name="contents"></a>
1. [Introduction](#introduction)
2. [Business Understanding](#business)
8. [References](#references)

## Introduction <a name="introduction"></a>
[[go back to the top]](#contents)

The primary objective of this project is to develop a robust system for **predicting the malignancy** of lung nodules using radiomics, which could aid in the early diagnosis of lung cancer and minimize variability in human interpretation. 

To ensure a systematic and efficient approach, we will follow the **CRISP-DM** (Cross-Industry Standard Process for Data Mining) framework. CRISP-DM is a widely adopted methodology that consists of six phases: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment. For the purposes of this academic project, our focus will be on the first five phases, covering all necessary aspects of model development, from problem definition to performance assessment, while excluding the deployment phase.

## Business Understanding <a name="business"></a>
[[go back to the top]](#contents)

Lung cancer remains the leading cause of cancer-related deaths globally, with survival rates highly dependent on the stage at which the disease is detected. Despite advancements in medical imaging, only about **16% of lung cancer cases** are diagnosed at an early, localized stage, when treatment is more likely to be effective **[1]**. For example, it is estimated that in 2024, approximately **234,580 new cases of lung cancer** will be diagnosed in the United States, with **125,070 deaths** resulting from the disease **[2]**. These statistics underscore the critical need for early detection, which can significantly improve patient outcomes, increasing the five-year survival rate from around 5% in advanced stages to over 50% in early stages.

**Computed Tomography (CT)** imaging has become a valuable, non-invasive tool for identifying lung nodules —potential indicators of lung cancer. However, traditional radiological analysis of CT images poses challenges: interpretations can vary among radiologists due to the subjective nature of image analysis, and reviewing large volumes of imaging data is time-consuming. To mitigate these limitations, **Computer-Aided Diagnosis (CAD) systems** have been developed to support radiologists by automatically evaluating the malignancy risk of lung nodules, potentially increasing diagnostic accuracy and efficiency.

A promising approach within CAD systems is **radiomics** — the extraction and analysis of a large number of quantitative features from medical images to provide non-invasive diagnostic and prognostic insights. Radiomics enables the transformation of imaging data into mineable, high-dimensional information, allowing for a more objective and detailed understanding of lung nodules. Moreover, the rise of machine learning (ML) and deep learning (DL) techniques, particularly Convolutional Neural Networks (CNNs), has demonstrated the potential to automatically learn and recognize patterns within medical images, facilitating improved classification and malignancy prediction.

Previous studies have utilized these techniques in conjunction with feature extraction tools to enhance classification performance. Despite their potential, however, these approaches face challenges in data privacy, standardization, and clinical validation. Addressing these issues is crucial to ensure reliable and ethically deployment in clinical settings, enabling improved early detection of lung cancer.

## References <a name="references"></a>
[[go back to the top]](#references)

**[1]** World Health Organization. "Latest Global Cancer Data: Cancer Burden Rises to 18.1 Million New Cases and 9.6 Million Cancer Deaths in 2018". Press Release Nº263; 2018.

**[2]** American Cancer Society. "Cancer Facts & Figures 2024". Atlanta: 2024.




In [1]:
import statistics
import pandas as pd
import pylidc as pl
from pylidc.utils import consensus
import SimpleITK as sitk
from radiomics import featureextractor
import numpy as np

# Initialize the feature extractor
extractor = featureextractor.RadiomicsFeatureExtractor()

# Get the list of additional features from pl.annotation_feature_names
additional_features = pl.annotation_feature_names

# Query the LIDC-IDRI dataset for scans with annotations
scans_with_annotations = pl.query(pl.Scan).filter(pl.Scan.annotations.any()).all()

# Lists to store the extracted features and patient IDs
features_list = []

# Variable to create unique IDs for the nodules
nodule_id_counter = 1

# Iterating through all scans with annotations
for scan in scans_with_annotations:
    # Get the patient ID
    patient_id = scan.patient_id

    # Clusterize the annotations for the scan and retrieve all annotations
    nods = scan.cluster_annotations()

    # Iterating through all nodules of the patient
    for anns in nods:
        # Check if the current nodule has annotations
        if anns:
            # Convert consensus annotations into a mask
            cmask, _, _ = pl.utils.consensus(anns, clevel=0.5, pad=[(20, 20), (20, 20), (0, 0)])

            # Convert the pixel array to a SimpleITK image
            image = sitk.GetImageFromArray(cmask.astype(float))

            # Extract radiomic features using PyRadiomics
            features = extractor.execute(image, image, label=1)  # Use label 1 for the nodule

            # Add the patient ID to the features
            features['Patient_ID'] = patient_id

            # Add a unique ID for the nodule
            features['Nodule_ID'] = f'Nodule_{nodule_id_counter}'
            nodule_id_counter += 1

            def calculate_value(value):
                try:
                    return statistics.mode(value)
                except statistics.StatisticsError:
                    return np.mean(value)


            def calculate_mean(value):
                return np.mean(value)

            subtlety_value = calculate_value([ann.subtlety for ann in anns])
            internalStructure_value = calculate_value([ann.internalStructure for ann in anns])
            calcification_value = calculate_value([ann.calcification for ann in anns])
            sphericity_value = calculate_value([ann.sphericity for ann in anns])
            margin_value = calculate_value([ann.margin for ann in anns])
            lobulation_value = calculate_value([ann.lobulation for ann in anns])
            spiculation_value = calculate_value([ann.spiculation for ann in anns])
            texture_value = calculate_value([ann.texture for ann in anns])
            malignancy_value = calculate_mean([ann.malignancy for ann in anns])

            # Add the additional features to the features dictionary
            for feature_name in additional_features:
                features['subtlety'] = subtlety_value
                features['internalStructure'] = internalStructure_value
                features['sphericity'] = sphericity_value
                features['margin'] = margin_value
                features['lobulation'] = lobulation_value
                features['spiculation'] = spiculation_value
                features['texture'] = texture_value
                features['malignancy'] = malignancy_value  # final_malignancy

            # Add the features to the list
            features_list.append(features)

        # Create a DataFrame to store the features
features_df = pd.DataFrame(features_list)

# Save the features to a CSV file
features_df.to_csv('radiomic_features_test.csv', index=False)


GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Avera

Failed to reduce all groups to <= 4 Annotations.
Some nodules may be close and must be grouped manually.


GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Avera

Failed to reduce all groups to <= 4 Annotations.
Some nodules may be close and must be grouped manually.


GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Avera

Failed to reduce all groups to <= 4 Annotations.
Some nodules may be close and must be grouped manually.


GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Avera

Failed to reduce all groups to <= 4 Annotations.
Some nodules may be close and must be grouped manually.


GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Avera

Failed to reduce all groups to <= 4 Annotations.
Some nodules may be close and must be grouped manually.


GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Avera

Failed to reduce all groups to <= 4 Annotations.
Some nodules may be close and must be grouped manually.


GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Avera

Failed to reduce all groups to <= 4 Annotations.
Some nodules may be close and must be grouped manually.


GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Avera

Failed to reduce all groups to <= 4 Annotations.
Some nodules may be close and must be grouped manually.


GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Avera

Failed to reduce all groups to <= 4 Annotations.
Some nodules may be close and must be grouped manually.


GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Avera

Failed to reduce all groups to <= 4 Annotations.
Some nodules may be close and must be grouped manually.


GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Avera

Failed to reduce all groups to <= 4 Annotations.
Some nodules may be close and must be grouped manually.


GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Avera

Failed to reduce all groups to <= 4 Annotations.
Some nodules may be close and must be grouped manually.


GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated


Failed to reduce all groups to <= 4 Annotations.
Some nodules may be close and must be grouped manually.


GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Avera

Failed to reduce all groups to <= 4 Annotations.
Some nodules may be close and must be grouped manually.


GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated
GLCM is symmetrical, therefore Sum Average = 2 * Joint Avera

In [2]:
import statistics
import pandas as pd
import pylidc as pl
from pylidc.utils import consensus
import SimpleITK as sitk
from radiomics import featureextractor
import numpy as np
from skimage import measure
from scipy.fft import fftn
import matplotlib.pyplot as plt

# Inicializar o extractor de características
extractor = featureextractor.RadiomicsFeatureExtractor()

# Obter a lista de características adicionais
additional_features = pl.annotation_feature_names

# Consultar o conjunto de dados LIDC-IDRI para scans com anotações
scans_with_annotations = pl.query(pl.Scan).filter(pl.Scan.annotations.any()).all()

# Listas para armazenar as características extraídas e IDs de pacientes
features_list = []

# Variável para criar IDs únicos para os nódulos
nodule_id_counter = 1

# Função para calcular a Transformada de Fourier 3D
def calculate_fourier_3d(cmask):
    fourier_transform_3d = fftn(cmask) #funcão para calcular a fourier 
    return np.abs(fourier_transform_3d)

# Iterar por todos os scans com anotações
for scan in scans_with_annotations:
    # Obter o ID do paciente
    patient_id = scan.patient_id

    # Clusterizar as anotações para o scan e recuperar todas as anotações
    nods = scan.cluster_annotations()

    # Iterar por todos os nódulos do paciente
    for anns in nods:
        # Verificar se o nódulo atual tem anotações
        if anns:
            # Converter anotações de consenso em uma máscara
            cmask, _, _ = consensus(anns, clevel=0.5, pad=[(20, 20), (20, 20), (0, 0)])

            # Verifique a forma da máscara
            print(f"Shape of cmask: {cmask.shape}")

            # Verificar se a máscara contém rótulos válidos
            print(f'Mask unique values: {np.unique(cmask)}')  # Exibir valores únicos na máscara

            if np.sum(cmask) > 0:
                # Convert the pixel array to a SimpleITK image
                image = sitk.GetImageFromArray(cmask.astype(float))

                # Extrair características radiômicas usando PyRadiomics
                features = extractor.execute(image, image, label=1)  # Usar o rótulo 1 para o nódulo

                # Adicionar o ID do paciente às características
                features['Patient_ID'] = patient_id

                # Adicionar um ID único para o nódulo
                features['Nodule_ID'] = f'Nodule_{nodule_id_counter}'
                nodule_id_counter += 1

                # Calcular a Transformada de Fourier 3D
                fourier_magnitude_3d = 
                features['Fourier_Magnitude_3D'] = fourier_magnitude_3d.flatten().tolist()  # Armazenar como lista

                # Extrair características adicionais
                subtlety_value = statistics.mode([ann.subtlety for ann in anns])
                internalStructure_value = statistics.mode([ann.internalStructure for ann in anns])
                calcification_value = statistics.mode([ann.calcification for ann in anns])
                sphericity_value = statistics.mode([ann.sphericity for ann in anns])
                margin_value = statistics.mode([ann.margin for ann in anns])
                lobulation_value = statistics.mode([ann.lobulation for ann in anns])
                spiculation_value = statistics.mode([ann.spiculation for ann in anns])
                texture_value = statistics.mode([ann.texture for ann in anns])
                malignancy_value = np.mean([ann.malignancy for ann in anns])  # Média para malignidade

                # Adicionar as características adicionais ao dicionário de características
                features['subtlety'] = subtlety_value
                features['internalStructure'] = internalStructure_value
                features['sphericity'] = sphericity_value
                features['margin'] = margin_value
                features['lobulation'] = lobulation_value
                features['spiculation'] = spiculation_value
                features['texture'] = texture_value
                features['malignancy'] = malignancy_value

                # Adicionar as características à lista
                features_list.append(features)

            else:
                print(f'No valid labels found in mask for patient {patient_id}. Skipping this nodule.')

# Criar um DataFrame para armazenar as características
features_df = pd.DataFrame(features_list)

# Salvar as características em um arquivo CSV
features_df.to_csv('radiomic_features_test_t.csv', index=False)

# Visualizar a magnitude da Transformada de Fourier para a primeira fatia, se necessário
if len(features_list) > 0 and 'Fourier_Magnitude_3D' in features_list[0]:
    fourier_magnitude_3d_sample = features_list[0]['Fourier_Magnitude_3D']
    plt.imshow(np.log1p(np.array(fourier_magnitude_3d_sample).reshape(cmask.shape)), cmap='gray')
    plt.title('Magnitude da Transformada de Fourier - 3D (Exemplo)')
    plt.axis('off')
    plt.show()


GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated


Shape of cmask: (74, 84, 6)
Mask unique values: [False  True]


ValueError: n_components=50 must be between 0 and min(n_samples, n_features)=1 with svd_solver='full'

In [1]:
import statistics
import pandas as pd
import pylidc as pl
from pylidc.utils import consensus
import SimpleITK as sitk
from radiomics import featureextractor
import numpy as np
from sklearn.decomposition import PCA
from skimage import measure
from scipy.fft import fftn
import matplotlib.pyplot as plt

# Inicializar o extractor de características
extractor = featureextractor.RadiomicsFeatureExtractor()

# Obter a lista de características adicionais
additional_features = pl.annotation_feature_names

# Consultar o conjunto de dados LIDC-IDRI para scans com anotações
scans_with_annotations = pl.query(pl.Scan).filter(pl.Scan.annotations.any()).all()

# Listas para armazenar as características extraídas e IDs de pacientes
features_list = []

# Variável para criar IDs únicos para os nódulos
nodule_id_counter = 1

# Função para calcular a Transformada de Fourier 3D
def calculate_fourier_3d(cmask):
    fourier_transform_3d = fftn(cmask)  # Calcular Fourier 3D
    return np.abs(fourier_transform_3d)

# Função para aplicar o PCA à Fourier Magnitude 3D
def apply_pca(fourier_magnitude_3d, n_components=50):
    # Inicializar o PCA
    pca = PCA(n_components=n_components)
    # Achatar a matriz da Fourier Magnitude para 2D antes de aplicar o PCA
    flat_fourier = fourier_magnitude_3d.flatten().reshape(1, -1)
    # Aplicar o PCA
    reduced_fourier = pca.fit_transform(flat_fourier)
    return reduced_fourier.flatten()

# Iterar por todos os scans com anotações
for scan in scans_with_annotations:
    # Obter o ID do paciente
    patient_id = scan.patient_id

    # Clusterizar as anotações para o scan e recuperar todas as anotações
    nods = scan.cluster_annotations()

    # Iterar por todos os nódulos do paciente
    for anns in nods:
        # Verificar se o nódulo atual tem anotações
        if anns:
            # Converter anotações de consenso em uma máscara
            cmask, _, _ = consensus(anns, clevel=0.5, pad=[(20, 20), (20, 20), (0, 0)])

            # Verifique a forma da máscara
            print(f"Shape of cmask: {cmask.shape}")

            # Verificar se a máscara contém rótulos válidos
            print(f'Mask unique values: {np.unique(cmask)}')  # Exibir valores únicos na máscara

            if np.sum(cmask) > 0:
                # Convert the pixel array to a SimpleITK image
                image = sitk.GetImageFromArray(cmask.astype(float))

                # Extrair características radiômicas usando PyRadiomics
                features = extractor.execute(image, image, label=1)  # Usar o rótulo 1 para o nódulo

                # Adicionar o ID do paciente às características
                features['Patient_ID'] = patient_id

                # Adicionar um ID único para o nódulo
                features['Nodule_ID'] = f'Nodule_{nodule_id_counter}'
                nodule_id_counter += 1

                # Calcular a Transformada de Fourier 3D
                fourier_magnitude_3d = calculate_fourier_3d(cmask)

                # Aplicar PCA à Fourier Magnitude 3D
                reduced_fourier = apply_pca(fourier_magnitude_3d, n_components=50)

                # Armazenar o Fourier Magnitude 3D reduzido
                features['Fourier_Magnitude_3D_PCA'] = reduced_fourier.tolist()

                # Extrair características adicionais
                subtlety_value = statistics.mode([ann.subtlety for ann in anns])
                internalStructure_value = statistics.mode([ann.internalStructure for ann in anns])
                calcification_value = statistics.mode([ann.calcification for ann in anns])
                sphericity_value = statistics.mode([ann.sphericity for ann in anns])
                margin_value = statistics.mode([ann.margin for ann in anns])
                lobulation_value = statistics.mode([ann.lobulation for ann in anns])
                spiculation_value = statistics.mode([ann.spiculation for ann in anns])
                texture_value = statistics.mode([ann.texture for ann in anns])
                malignancy_value = np.mean([ann.malignancy for ann in anns])  # Média para malignidade

                # Adicionar as características adicionais ao dicionário de características
                features['subtlety'] = subtlety_value
                features['internalStructure'] = internalStructure_value
                features['sphericity'] = sphericity_value
                features['margin'] = margin_value
                features['lobulation'] = lobulation_value
                features['spiculation'] = spiculation_value
                features['texture'] = texture_value
                features['malignancy'] = malignancy_value

                # Adicionar as características à lista
                features_list.append(features)

            else:
                print(f'No valid labels found in mask for patient {patient_id}. Skipping this nodule.')

# Criar um DataFrame para armazenar as características
features_df = pd.DataFrame(features_list)

# Salvar as características em um arquivo CSV
features_df.to_csv('radiomic_features_with_pca.csv', index=False)


GLCM is symmetrical, therefore Sum Average = 2 * Joint Average, only 1 needs to be calculated


Shape of cmask: (74, 84, 6)
Mask unique values: [False  True]


ValueError: n_components=50 must be between 0 and min(n_samples, n_features)=1 with svd_solver='full'