# [U-Net Segmentation Approach to Cancer Diagnosis](https://www.kaggle.com/c/data-science-bowl-2017#tutorial)
*an approach to predicting whether a CT scan belongs to a patient who has cancer or will develop it within the next 12 months*

General Approach:
1. train a network to segment potentially cancerous nodules
2. use the characteristics of that segmentation to predict whether the scanned patient will be diagnosed with cancer within a 12-month time frame


# Downloading Instructions
1. **pydicom** (dicom): run in the Anaconda command prompt: `pip install pydicom` ([reference](http://pydicom.readthedocs.io/en/latest/getting_started.html))
2. **SimpleITK**: run in the Anaconda command prompt: `conda install -c https://conda.anaconda.org/simpleitk SimpleITK` ([reference](https://itk.org/Wiki/SimpleITK/GettingStarted))
3. **xgboost**: run in the Anaconda command prompt: `pip install xgboost` ([reference](http://machinelearningmastery.com/gentle-introduction-xgboost-applied-machine-learning/), [long version reference](https://www.ibm.com/developerworks/community/blogs/jfp/entry/Installing_XGBoost_For_Anaconda_on_Windows?lang=en))
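
After running the commands above, a quick check like the following can confirm that the packages are visible to the active Python environment. This is a minimal sketch: the helper names `find_installed` and `check_packages` are hypothetical, and the import names are assumptions (pydicom releases before 1.0 import as `dicom`, later ones as `pydicom`).

```python
import importlib.util

# Candidate import names per package. These are assumptions:
# pydicom releases before 1.0 import as `dicom`, later ones as `pydicom`.
PACKAGES = {
    "pydicom": ("pydicom", "dicom"),
    "SimpleITK": ("SimpleITK",),
    "xgboost": ("xgboost",),
}


def find_installed(candidates):
    """Return the first importable module name in `candidates`, or None."""
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


def check_packages(packages=PACKAGES):
    """Map each package label to its importable module name (or None)."""
    return {label: find_installed(names) for label, names in packages.items()}


if __name__ == "__main__":
    for label, module in check_packages().items():
        status = f"importable as {module}" if module else "NOT FOUND"
        print(f"{label}: {status}")
```

The check only looks the modules up without importing them, so it stays fast even for heavy packages.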

## Installing Keras, TensorFlow, cuDNN, CUDA Toolkit
*how to install Keras and the GPU-enabled version of TensorFlow, along with the GPU computing libraries they depend on (CUDA Toolkit and cuDNN)*

**Follow the instructions [here](https://github.com/3-musketeers/kaggle-dsb/blob/master/pipeline/build-simple-model/rough-draft/model_dependency_setup.md)**

## Downloading Data
**Follow the instructions [here](https://github.com/3-musketeers/kaggle-dsb/blob/master/pipeline/build-simple-model/rough-draft/model_data_setup.md)**

# Dependency Descriptions
1. **numpy**: an extension to the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large library of high-level mathematical functions to operate on these arrays
2. **scikit-image** (skimage): collection of algorithms for image processing
3. **scikit-learn**: simple and efficient tools for data mining and data analysis
4. **keras** (tensorflow backend): high-level neural networks library, written in Python (runs on top of TensorFlow)
5. **matplotlib**: a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms
6. **pydicom** (dicom): a pure Python package for working with DICOM files such as medical images, reports, and radiotherapy objects
7. **SimpleITK**: an open-source, cross-platform system that provides developers with an extensive suite of software tools for image analysis
8. **pandas**: high-performance, easy-to-use data structures and data analysis tools for the Python programming language
9. **glob**: a module that finds all the pathnames matching a specified pattern according to the rules used by the Unix shell (results returned in arbitrary order)
10. **csv**: a module that implements classes to read and write tabular data in CSV format
11. **os**: a module that provides a portable way of using operating system dependent functionality
12. **xgboost**: a library designed and optimized for gradient-boosted tree algorithms
13. **pickle**: standard mechanism for object serialization
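
The standard-library modules in this list (`glob`, `csv`, `os`, `pickle`) can be illustrated together. The sketch below is only an example of how they might be combined: the file names, column names, and helper functions are hypothetical and are not taken from the project. Because `glob` returns matches in arbitrary order, the results are sorted explicitly.

```python
import csv
import glob
import os
import pickle
import tempfile


def list_files(folder, pattern):
    """Return matching paths sorted by name (glob alone guarantees no order)."""
    return sorted(glob.glob(os.path.join(folder, pattern)))


def write_rows(path, header, rows):
    """Write a header plus data rows to a CSV file."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def read_rows(path):
    """Read every row of a CSV file as a list of strings."""
    with open(path, newline="") as f:
        return list(csv.reader(f))


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file names, written out of alphabetical order on purpose.
    for name in ("scan_b.csv", "scan_a.csv"):
        write_rows(os.path.join(tmp, name), ["id", "cancer"], [[name, 0]])

    paths = list_files(tmp, "scan_*.csv")
    names = [os.path.basename(p) for p in paths]
    rows = read_rows(paths[0])

    # Round-trip the parsed rows through pickle, e.g. to cache them on disk.
    cache_path = os.path.join(tmp, "rows.pkl")
    with open(cache_path, "wb") as f:
        pickle.dump(rows, f)
    with open(cache_path, "rb") as f:
        restored = pickle.load(f)
```

Note that `csv.reader` returns every field as a string, so numeric columns need to be converted after reading.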

## Details:
1. U-Net style convolutional network: identifies regions containing nodules (U-Net was originally designed for segmenting neuronal structures)
2. appearance of nodules within the CT scan: indicates the possibility of cancer
3. Lung Nodule Analysis 2016 (LUNA2016):
   1. provides training examples with marked nodules (CT images with annotated nodule locations) in order to train the U-Net to find these nodules
   2. use the LUNA data set to generate an appropriate training set for our U-Net
   3. use these examples to train our supervised segmenter
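
One way to turn an annotated nodule location into a segmentation target is to draw a filled disk on an otherwise empty mask. The sketch below is an illustration under stated assumptions, not the project's actual preprocessing: `make_nodule_mask` is a hypothetical helper, and it assumes the nodule center and diameter have already been converted to pixel units for the slice in question.

```python
import numpy as np


def make_nodule_mask(shape, center_rc, diameter_px):
    """Return a uint8 mask of `shape` with a filled disk marking one nodule.

    `center_rc` is the (row, col) of the nodule center and `diameter_px`
    its diameter, both assumed to be in pixel units already.
    """
    rows, cols = np.ogrid[:shape[0], :shape[1]]
    radius = diameter_px / 2.0
    dist_sq = (rows - center_rc[0]) ** 2 + (cols - center_rc[1]) ** 2
    # Pixels within the radius (inclusive) are labeled 1, the rest 0.
    return (dist_sq <= radius ** 2).astype(np.uint8)


# Hypothetical example: a 512x512 slice with a 20-pixel-wide nodule.
mask = make_nodule_mask((512, 512), center_rc=(200, 300), diameter_px=20)
```

Each CT slice paired with a mask like this forms one (image, target) example for a supervised segmenter.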

    import tensorflow as tf
    import keras

Output:

    Using TensorFlow backend.
