In [17]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import radiomics
import pylidc as pl

In [18]:
help(dir)

Help on built-in function dir in module builtins:

dir(...)
    dir([object]) -> list of strings
    
    If called without an argument, return the names in the current scope.
    Else, return an alphabetized list of names comprising (some of) the attributes
    of the given object, and of attributes reachable from it.
    If the object supplies a method named __dir__, it will be used; otherwise
    the default dir() logic is used and returns:
      for a module object: the module's attributes.
      for a class object:  its attributes, and recursively the attributes
        of its bases.
      for any other object: its attributes, its class's attributes, and
        recursively the attributes of its class's base classes.



---
## Data Exploration and Extraction
---

In [53]:
# Count the scans available in the local pylidc database
print(f"There are {pl.query(pl.Scan).count()} Scans available")

There are 1018 Scans available


In [55]:
# Count the annotations available in the local pylidc database
print(f"There are {pl.query(pl.Annotation).count()} Annotations available")

There are 6859 Annotations available


In [44]:
# Fetch the first annotation and print its semantic features with their integer scores
ann = pl.query(pl.Annotation).first()
ann.print_formatted_feature_table()

Feature              Meaning                    # 
-                    -                          - 
Subtlety           | Obvious                  | 5 
Internalstructure  | Soft Tissue              | 1 
Calcification      | Absent                   | 6 
Sphericity         | Ovoid                    | 3 
Margin             | Near Sharp               | 4 
Lobulation         | No Lobulation            | 1 
Spiculation        | No Spiculation           | 1 
Texture            | Solid                    | 5 
Malignancy         | Indeterminate            | 3 


---
### Annotation Class Analysis
---

> The Nodule model class holds the information from a single physician's annotation of a nodule >= 3 mm associated with a particular scan. A nodule has many contours, each of which refers to the contour drawn for the nodule in one scan slice.

In [73]:
# List the public, non-callable attributes defined on the Annotation class
annotationParams = [param for param, paramValue in pl.Annotation.__dict__.items() if param[0] != '_' and not callable(paramValue)]
annotationParams

['id',
 'scan_id',
 'scan',
 'subtlety',
 'internalStructure',
 'calcification',
 'sphericity',
 'margin',
 'lobulation',
 'spiculation',
 'texture',
 'malignancy',
 'Subtlety',
 'InternalStructure',
 'Calcification',
 'Sphericity',
 'Margin',
 'Lobulation',
 'Spiculation',
 'Texture',
 'Malignancy',
 'centroid',
 'diameter',
 'surface_area',
 'volume',
 'contour_slice_zvals',
 'contour_slice_indices',
 'contours_matrix',
 'contours']
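The filter used in the cell above can be tried on a toy class to see what it keeps and what it drops. The sketch below is only an illustration: `Demo` and `public_non_callable` are hypothetical names, not part of pylidc. It shows that properties and plain class attributes survive the filter, while methods and underscore-prefixed names are removed.

```python
class Demo:
    """Toy stand-in for a model class; hypothetical, not part of pylidc."""

    table_name = "demo"      # plain class attribute: kept
    _hidden = 0              # leading underscore: filtered out

    @property
    def diameter(self):      # property objects are not callable: kept
        return 1.0

    def visualize(self):     # regular method is callable: filtered out
        return None


def public_non_callable(cls):
    """Return the names in cls.__dict__ that are public and not callable."""
    return [
        name
        for name, value in cls.__dict__.items()
        if not name.startswith("_") and not callable(value)
    ]


demo_params = public_non_callable(Demo)
print(demo_params)  # ['table_name', 'diameter']
```

Because a class `__dict__` only holds names defined directly on that class, attributes inherited from a base class would not show up in such a list.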

According to the documentation, an instance of the [Annotation class](https://pylidc.github.io/annotation.html) contains the following important attributes:

| Parameter         | Type    | Range                   |
| :---------------: | :-----: | :---------------------: |
| subtlety          | Integer | $\in \{1, 2, 3, 4, 5\}$ |
| internalStructure | Integer | $\in \{1, 2, 3, 4\}$    |
| calcification     | Integer | $\in \{1, 2, 3, 4, 6\}$ |
| sphericity        | Integer | $\in \{1, 2, 3, 4, 5\}$ |
| margin            | Integer | $\in \{1, 2, 3, 4, 5\}$ |
| lobulation        | Integer | $\in \{1, 2, 3, 4, 5\}$ |
| spiculation       | Integer | $\in \{1, 2, 3, 4, 5\}$ |
| texture           | Integer | $\in \{1, 2, 3, 4, 5\}$ |
| malignancy        | Integer | $\in \{1, 2, 3, 4, 5\}$ |

subtlety:
    
    Difficulty of detection. Higher values indicate easier detection.
    
    1. 'Extremely Subtle'
    2. 'Moderately Subtle'
    3. 'Fairly Subtle'
    4. 'Moderately Obvious'
    5. 'Obvious'

internalStructure:

    Internal composition of the nodule.

    1. 'Soft Tissue'
    2. 'Fluid'
    3. 'Fat'
    4. 'Air'

calcification:

    Pattern of calcification, if present.

    1. 'Popcorn'
    2. 'Laminated'
    3. 'Solid'
    4. 'Non-central'
    5. 'Central'
    6. 'Absent'

sphericity:

    The three-dimensional shape of the nodule in terms of its roundness.

    1. 'Linear'
    2. 'Ovoid/Linear'
    3. 'Ovoid'
    4. 'Ovoid/Round'
    5. 'Round'

margin:

    Description of how well-defined the nodule margin is.

    1. 'Poorly Defined'
    2. 'Near Poorly Defined'
    3. 'Medium Margin'
    4. 'Near Sharp'
    5. 'Sharp'

lobulation:

    The degree of lobulation, ranging from none to marked.

    1. 'No Lobulation'
    2. 'Nearly No Lobulation'
    3. 'Medium Lobulation'
    4. 'Near Marked Lobulation'
    5. 'Marked Lobulation'

spiculation:

    The extent of spiculation present.

    1. 'No Spiculation'
    2. 'Nearly No Spiculation'
    3. 'Medium Spiculation'
    4. 'Near Marked Spiculation'
    5. 'Marked Spiculation'

texture: 

    Radiographic solidity: internal texture (solid, ground glass, or mixed). 

    1. 'Non-Solid/GGO'
    2. 'Non-Solid/Mixed'
    3. 'Part Solid/Mixed'
    4. 'Solid/Mixed'
    5. 'Solid'

malignancy: 

    Subjective assessment of the likelihood of malignancy, assuming the scan originated from a 60-year-old male smoker. 

    1. 'Highly Unlikely'
    2. 'Moderately Unlikely'
    3. 'Indeterminate'
    4. 'Moderately Suspicious'
    5. 'Highly Suspicious'
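The integer scores and the labels listed above form a simple lookup. As a minimal sketch, assuming only the labels transcribed from these lists, the mapping can be written as a dictionary. `FEATURE_LABELS` and `describe` are hypothetical helpers introduced for illustration; in pylidc itself the capitalized attributes seen earlier (for example `Subtlety`) appear to serve this purpose.

```python
# Hypothetical lookup table transcribed from the lists above (subset of features).
FEATURE_LABELS = {
    "subtlety": {
        1: "Extremely Subtle",
        2: "Moderately Subtle",
        3: "Fairly Subtle",
        4: "Moderately Obvious",
        5: "Obvious",
    },
    "calcification": {
        1: "Popcorn",
        2: "Laminated",
        3: "Solid",
        4: "Non-central",
        5: "Central",
        6: "Absent",
    },
    "malignancy": {
        1: "Highly Unlikely",
        2: "Moderately Unlikely",
        3: "Indeterminate",
        4: "Moderately Suspicious",
        5: "Highly Suspicious",
    },
}


def describe(feature, score):
    """Translate an integer score into its label, or raise ValueError."""
    try:
        return FEATURE_LABELS[feature][score]
    except KeyError:
        raise ValueError(f"no label for {feature}={score!r}") from None


# Scores taken from the feature table printed for the first annotation.
first_annotation_scores = {"subtlety": 5, "calcification": 6, "malignancy": 3}
labels = {name: describe(name, score) for name, score in first_annotation_scores.items()}
print(labels)
```

The resulting labels match the "Meaning" column of the feature table printed earlier for the first annotation.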

---
### Scan Class Analysis
---

> The Scan model class refers to the top-level XML file from the LIDC. A scan has many pylidc.Annotation objects, which correspond to the unblindedReadNodule XML attributes for the scan.

In [75]:
# List the public, non-callable attributes defined on the Scan class
scanParams = [param for param, paramValue in pl.Scan.__dict__.items() if param[0] != '_' and not callable(paramValue)]
scanParams

['id',
 'study_instance_uid',
 'series_instance_uid',
 'patient_id',
 'slice_thickness',
 'pixel_spacing',
 'contrast_used',
 'is_from_initial',
 'sorted_dicom_file_names',
 'slice_zvals',
 'slice_spacing',
 'spacings',
 'annotations',
 'zvals']

According to the documentation, an instance of the [Scan class](https://pylidc.github.io/scan.html) contains the following attributes:

study_instance_uid 
> string – DICOM attribute (0020,000D).

series_instance_uid 
> string – DICOM attribute (0020,000E).

patient_id 
> string – Identifier of the form “LIDC-IDRI-dddd” where dddd is a string of integers.

slice_thickness 
> float – DICOM attribute (0018,0050). Note that this may not be equal to the slice_spacing attribute (see below).

slice_zvals 
> ndarray – The “z-values” for the slices of the scan (i.e., the last coordinate of the ImagePositionPatient DICOM attribute) as a NumPy array sorted in increasing order.

slice_spacing 
> float – This computed property is the median of the differences between consecutive slice coordinates, i.e., consecutive entries of scan.slice_zvals.

pixel_spacing 
> float – DICOM attribute (0028,0030). This is normally two values. All scans in the LIDC have equal resolutions in the transverse plane, so only one value is used here.

contrast_used 
> bool – If the DICOM file for the scan had any Contrast tag, this is marked as True.

is_from_initial 
> bool – Indicates whether or not this PatientID was tagged as part of the initial 399 release.

sorted_dicom_file_names 
> string – This attribute is no longer used and can be ignored.
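
The `slice_spacing` description above (median of the differences between consecutive slice z-values) can be sketched in plain Python. This is an illustration of the described computation, not pylidc's actual implementation; `median_slice_spacing` and the z-values are made up for the example.

```python
from statistics import median


def median_slice_spacing(zvals):
    """Median gap between consecutive z-values, after sorting them."""
    ordered = sorted(zvals)
    gaps = [upper - lower for lower, upper in zip(ordered, ordered[1:])]
    if not gaps:
        raise ValueError("need at least two slices to compute a spacing")
    return median(gaps)


# Made-up z-values in mm; one slice is missing between -120.0 and -115.0.
example_zvals = [-125.0, -122.5, -120.0, -115.0, -112.5, -110.0]
spacing = median_slice_spacing(example_zvals)
print(spacing)  # 2.5
```

Using the median rather than the mean keeps the result at the nominal 2.5 mm even though one gap is 5.0 mm, which is one plausible reason such a value can differ from `slice_thickness`.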