# Course Python Part

## Outlines and Objectives

In this part, we will cover the main ***concepts and essential theoretical background*** for the core techniques of predictive modeling. Most of these concepts are **implemented in Python as modules**, so that whenever a module is needed it can simply be called instead of being rewritten. In that regard, each piece of code and each concept will be introduced, discussed, and tested.

We have around 6 clinical use cases across the Python course days. You will have 4 folders, one per day, each with its own code (DS.py) and notebooks. All materials will be discussed through ***thought experiments and exercises*** in which attendees will try out the pseudocode and the actual implementations themselves.

<u> Our objectives are to learn: </u> 
1. Day 1:
    - Introduction to science and challenges
    - Quick refresher on Python and pandas
    - Data analysis
    - Linear Regression
        - Plotting the actual training outcomes against the model predictions to see how well the model fits
        - Plotting the learning curve to evaluate the performance
2. Day 2:
    - Linear Regression
        - Partitioning data into folds to compare the overall performance with the performance on individual folds
        - Testing whether feature selection methods improve the results
    - Regularization:
        - Ridge Regression: Find the best hyper-parameter and validate on nonlinear synthetic data
        - LASSO and Elastic Net: Concepts and Exercise
    - Classification
        - Decision Tree (DT): building, information theory, and visualization
        - Support Vector Machine (SVM) and K-nearest neighbors (KNN): Concepts
        - Plot the performance of each classifier
    - Classifiers as Regressors
        - DT, SVM, KNN as regressors
3. Day 3:
    - Logistic Regression
        - Performance
        - Odds Ratio and Risk Ratio
    - Framingham Risk Score
        - Calculations 
        - Toy Example
    - Weka Tour
4. Day 4:
    - Multiclass, Multilabel, vs. Multioutput
    - Random Forest Trees
    - Ensemble methods: Bagging, Boosting, and Voting
        - Concepts and performance of all of the above topics
        - How to implement a Random Forest and an AdaBoost regressor.

<sub>
    Framingham risk score: http://onlinelibrary.wiley.com/doi/10.1002/sim.1742/abstract
</sub>

<sub>
    Weka: http://www.cs.waikato.ac.nz/ml/weka/downloading.html
</sub>

<sub>
   ***Instructor email:*** samir.abdelrahman@utah.edu
</sub>

# Predictive Analytics (PA)

PA is a multidisciplinary field that uses a combination of statistics and machine learning modeling techniques to predict unknown future events.

<img src="../images/PA.png" height= 75% width=75%>

<sub>
       Figure Reference: http://www.predictiveanalyticstoday.com/what-is-predictive-analytics/
</sub>

# Predictive Modeling (PM)
PM is usually a combination of machine learning and/or simulation techniques used to discover patterns in a given dataset of historical records (data points).

<img src="../images/Datasets.png" height= 35% width=35% style="float: left;">


<img src="../images/WeatherExample.png" height= 42% width=42% style="right;">


<sub>
       Validation Dataset Reference: http://www.predictiveanalyticstoday.com/what-is-predictive-analytics/
</sub>

<sub>
       Weather Dataset Reference: https://www.coursehero.com/tutors-problems/Statistics-and-Probability/10616556-We-will-build-a-na%25C3%25AFve-Bayes-classifier-based-on-the-below-weather-dat/       

</sub>

# Thought Experiments

**In the weather dataset, what are the predictors/features/attributes and the outcome variables? What is the data type of each variable?**

**Is (outlook==sunny) a good predictive value? What about (outlook==sunny) and (temperature==hot)?**

**What is the difference between 10-fold cross-validation and bootstrapping?**

- For K-fold cross-validation, see the link:     
    https://en.wikipedia.org/wiki/Cross-validation_(statistics).  
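As a starting point for the question above, the following is a minimal sketch of how the two resampling schemes pick row indices. It uses only the Python standard library; the function names `k_fold_indices` and `bootstrap_indices` and the dataset size of 20 rows are illustrative assumptions, not part of the course modules.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k disjoint test folds; yield (train, test) pairs."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # every index lands in exactly one fold
    for test in folds:
        test_set = set(test)
        train = [i for i in indices if i not in test_set]
        yield train, test

def bootstrap_indices(n, seed=0):
    """Draw n training indices WITH replacement; rows never drawn form the test set."""
    rng = random.Random(seed)
    train = [rng.randrange(n) for _ in range(n)]
    chosen = set(train)
    test = [i for i in range(n) if i not in chosen]  # "out-of-bag" rows
    return train, test

# Hypothetical dataset of 20 rows
for fold, (train, test) in enumerate(k_fold_indices(20, k=10)):
    print(f"fold {fold}: {len(train)} train rows, test rows = {sorted(test)}")

train, test = bootstrap_indices(20)
print(f"bootstrap: {len(set(train))} distinct train rows, {len(test)} out-of-bag rows")
```

In the K-fold scheme every row is tested exactly once and never appears in its own training set; in the bootstrap scheme some rows are repeated in the training sample and the rest are left out.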


# Machine Learning Algorithm Categories 

1. Supervised learning
2. Unsupervised learning
3. Semi-supervised learning
4. Reinforcement learning


# Supervised Learning [Predictive learning]

## Approach

1. It is the task of inferring a function from labeled training data.
2. Each training record is an example: a vector of features/attributes (predictors) together with a label (outcome) variable.
3. The inferred function is then applied to new, unseen examples to infer their labels.

## Types 

1. If the label is categorical, the task is classification **(classification types?)**.
2. If the label is continuous, the task is regression **(regression types?)**.
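To make the two types concrete, here is a minimal sketch that applies the same K-nearest-neighbors idea to a categorical label (majority vote) and to a continuous label (mean of the neighbors). KNN is chosen only for illustration; the records, the feature meaning (age, systolic blood pressure), and the function names are hypothetical.

```python
from collections import Counter

def k_nearest(train, x, k):
    """Return the labels of the k training examples closest to x (squared Euclidean distance)."""
    by_distance = sorted(
        train, key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], x))
    )
    return [label for _, label in by_distance[:k]]

def knn_classify(train, x, k=3):
    """Categorical label: predict the majority label among the k neighbors."""
    return Counter(k_nearest(train, x, k)).most_common(1)[0][0]

def knn_regress(train, x, k=3):
    """Continuous label: predict the mean label of the k neighbors."""
    labels = k_nearest(train, x, k)
    return sum(labels) / len(labels)

# Hypothetical records: ((age, systolic BP), label)
risk_data = [((30, 110), "low"), ((35, 115), "low"), ((40, 120), "low"),
             ((60, 150), "high"), ((65, 160), "high"), ((70, 155), "high")]
stay_data = [((30, 110), 2.0), ((35, 115), 2.5), ((40, 120), 3.0),
             ((60, 150), 6.0), ((65, 160), 7.0), ((70, 155), 6.5)]

new_patient = (62, 152)
print(knn_classify(risk_data, new_patient))  # categorical outcome -> classification
print(knn_regress(stay_data, new_patient))   # continuous outcome -> regression
```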

## Challenges

1. Obtaining enough training data.
2. External datasets for validation.
3. Time and memory constraints.


# Unsupervised Learning [Descriptive learning]

##  Approach

1. It is the task of inferring a similarity function from unlabeled training data in order to partition the dataset into subgroups (subpopulations).
2. Labels, if available, are used only to measure the quality of the clusters.


##  Types 

1. Discovering subpopulations (clustering).
2. Discovering frequent associations and patterns among predictors (association/pattern mining).
3. Using visualization to explore the data and dimensionality reduction to represent predictors.
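As an illustration of the first type (clustering), the sketch below implements a plain k-means loop with the standard library. The text above does not prescribe a particular algorithm, so k-means, the point coordinates, and the function names are assumptions made for this example.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20, seed=0):
    """Partition points into k clusters; returns (centroids, assignment per point)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, assignment

# Hypothetical unlabeled data with two visible subpopulations
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, assignment = kmeans(points, k=2)
print(centroids)
print(assignment)
```

No labels are used at any point: the similarity function (here, squared Euclidean distance) alone decides the grouping.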


## Challenges

1. Validation: quantitative vs. qualitative, and intra-cluster (within a cluster) vs. inter-cluster (between clusters).
2. Noise and outliers **(difference?)**.
3. Time and memory constraints.


# Semi-supervised Learning 

##  Approach
 
 ### [1]
1. Integrate unsupervised and supervised learners in a single task.
2. Usually used when we have many unlabeled examples and few labeled examples.
3. There are many variants that differ in the order in which the learning algorithms are applied. The aim is to improve the overall learning.

### [2]
1. Use labeled seeds as initial examples for clustering. These seeds should be sampled according to the ratio of positive to negative examples in the data.
2. Run a clustering algorithm to find the clusters around the seeds.
3. Repeat the above two steps to refine and redistribute the clusters.
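Approach [2] can be sketched roughly as follows. This is a simplified reading of the steps above: the seeds are fixed rather than re-sampled, nearest-centroid assignment stands in for "a clustering algorithm", and all names and data points are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    return tuple(sum(v) / len(points) for v in zip(*points))

def seeded_clustering(seeds, unlabeled, iterations=10):
    """seeds: dict label -> list of labeled points; unlabeled: list of points.

    Returns (centroids, assigned), where assigned maps each label to the
    seeds plus the unlabeled points that ended up in that cluster.
    """
    # Step 1: the labeled seeds define the initial cluster centers
    centroids = {label: mean_point(pts) for label, pts in seeds.items()}
    assigned = {}
    for _ in range(iterations):
        # Step 2: grow clusters around the seeds (seeds always keep their label)
        assigned = {label: list(pts) for label, pts in seeds.items()}
        for p in unlabeled:
            nearest = min(centroids, key=lambda lab: squared_distance(p, centroids[lab]))
            assigned[nearest].append(p)
        # Step 3: refine the centers and redistribute on the next pass
        centroids = {label: mean_point(pts) for label, pts in assigned.items()}
    return centroids, assigned

# Hypothetical data: one labeled seed per class, four unlabeled points
seeds = {"positive": [(8, 8)], "negative": [(1, 1)]}
unlabeled = [(1, 2), (2, 1), (9, 8), (8, 9)]
centroids, assigned = seeded_clustering(seeds, unlabeled)
print(assigned)
```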

# Reinforcement Learning

1. A task that divides the learning algorithm into cooperating, independent objects (agents), each of which runs in its own thread.
2. Each agent monitors the environment and learns how to react, ignore, or set up new rules.
3. An agent should run forever unless it is killed.

<img src="../images/Agent.png" height= 35% width=35% style="  right;">




**Thought Experiment: What types of agents exist in real life?**


<sub>
       Agent Structure: https://en.wikipedia.org/wiki/Reinforcement_learning 
</sub>



# Thought Experiments

- Select one of the learning algorithm types
    - Supervised (classification/regression).
    - Unsupervised (clustering/pattern mining).
    - Semi-supervised.
    - Reinforcement Learning.

to address each of the following clinical cases, and explain why:
- Length of stay.
- Mortality.
- Chronic kidney disease detection.
- Chronic kidney disease staging.
- Integration among different data resources.
- An interactive healthcare system that helps the provider, the patient, and the medical assistant find the most appropriate decision
    - The system is either closed or open to the external world.


# Learning Challenges

1. Bias-variance tradeoff.
2. Learner complexity and amount of training data.
    - Use assumptions.
    - Use adequate number of examples/data points.
3. Dimensionality/Feature space: 
    - Feature selection methods.
    - Statistical hypothesis testing.
    - **What is the difference?**
4. Data Quality.
5. Heterogeneity of feature data types.
    - Categorical, numeric, images, signals, and free-text.
6. Data types (Not all learners and implementations work with each type):
    - Categorical.
        - Encoding.
            - Each Category is a binary variable.
                - DictVectorizer in sklearn
            - Each Category is replaced by an integer value.
    - Continuous
        - Scaling.
        - Normalization.
        - Standardization.
7. Other problems: redundancy (derived predictors), severe multicollinearity, and non-linearity.
    - Use correlation analysis to remove highly correlated or derived predictors.
    - Use learning algorithms that detect this correlation and deal with it, such as neural networks.
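The data-type handling in item 6 can be sketched with the standard library as follows. The two encoders mirror the two categorical options listed above (one binary variable per category, or one integer per category); min-max normalization and z-score standardization are shown as common interpretations of the continuous options. The column values and function names are hypothetical, and in practice a library class such as sklearn's `DictVectorizer` would do the binary encoding.

```python
from statistics import mean, pstdev

def one_hot(values):
    """Each category becomes its own 0/1 variable."""
    categories = sorted(set(values))
    return [{c: int(v == c) for c in categories} for v in values]

def integer_encode(values):
    """Each category is replaced by an integer (alphabetical order here)."""
    mapping = {c: i for i, c in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values]

def min_max_normalize(values):
    """Rescale to [0, 1]; assumes the column is not constant."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical columns
outlook = ["sunny", "overcast", "rainy", "sunny"]
sbp = [110, 120, 130, 140]

print(one_hot(outlook))
print(integer_encode(outlook))
print(min_max_normalize(sbp))
print(standardize(sbp))
```

Note that integer encoding imposes an artificial order on the categories, which some learners will interpret as meaningful; binary encoding avoids this at the cost of more columns.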


# Bias-variance tradeoff

![image.png](attachment:image.png)

***Are the two errors part of the generalization error?***


Toy example: $y = 3 + 2x + 3x^2 + 5x^3 + \dots$

1. Bias: the difference between the model's expected prediction and the actual value.
   - High bias leads to model underfitting: the model is too general to identify specific, detailed data patterns.
   - Solutions:
      - Introduce new predictors.
      - Reduce parameter regularization constraints.  
2. Variance: the variability of the predictions themselves (for example, across different training sets).
   - High variance leads to model overfitting: the model is too specific to the training data and sensitive to noise and outliers.
   - Solutions:
      - Increase the number of examples.
      - Increase parameter regularization constraints.
      - Reduce the variability of feature values using data preprocessing techniques.
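For reference, the standard decomposition of the expected squared error ties the two definitions together. The notation is an assumption introduced here: $f$ is the true function, $\hat{f}$ is the model learned from a random training set, and $y = f(x) + \varepsilon$ with noise $\varepsilon$ of zero mean and variance $\sigma^2$.

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{Variance}}
+ \underbrace{\sigma^2}_{\text{irreducible noise}}
$$

In the toy example, fitting only $3 + 2x$ would leave the $x^2$ and $x^3$ terms unexplained (a bias contribution), while fitting a very high-degree polynomial to a few noisy points would change substantially from one training sample to the next (a variance contribution).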

<sub>
       Model fits: http://docs.aws.amazon.com/machine-learning/latest/dg/model-fit-underfitting-vs-overfitting.html 
</sub>


# Bias-variance curves


![image.png](attachment:image.png)

# Predictive Modeling Challenges

1. Model updates.
2. Classification vs. prediction.
3. Time-varying variables.
4. Inference vs. reasoning vs. prediction.



# Model Updates

In clinical settings, new predictor values appear from time to time:
- Examples: hospital services, visit types, new diseases, and the transition from ICD-9 to ICD-10.
- If the model is not updated periodically, it becomes outdated (isolated from current practice).

Solutions:
- Offline updates: periodically pull new training data and add it to the current data, build a new model from the combined data, and replace the old model with the new one.
- Online updates: incrementally add new examples, train an updated model on the new training dataset while the old one keeps running, and switch to the updated model whenever the user is not interacting with the model.
- Online updates with user feedback: for each prediction error, the user corrects the example's prediction, and the model is updated with the corrected examples [online learning].
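The last option (online learning) can be sketched as a model that adjusts itself after every corrected example. The sketch below assumes a linear model trained by stochastic gradient descent on squared error; that choice, the class name `OnlineLinearModel`, and the synthetic stream $y = 2x + 1$ are illustrative assumptions rather than the course's implementation.

```python
class OnlineLinearModel:
    """Linear model y ≈ w·x + b, updated one example at a time (SGD on squared error)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = learning_rate

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def update(self, x, y_true):
        """Correct the model using one example whose true outcome is now known."""
        error = self.predict(x) - y_true
        self.w = [wi - self.lr * error * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * error
        return error

# Hypothetical stream: the true relationship is y = 2x + 1
model = OnlineLinearModel(n_features=1)
for _ in range(500):                      # repeated passes over the same small stream
    for i in range(11):
        x = i / 10
        model.update([x], 2 * x + 1)      # feedback arrives, model is adjusted

print(model.w, model.b)                   # approaches [2.0] and 1.0
print(model.predict([0.5]))               # approaches 2.0
```

Unlike the offline option, no stored training set is rebuilt: each example is used once when it arrives and can then be discarded.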


# Classification vs. prediction

**Thought Experiment: What is the difference?**

**Thought Experiment: What are the differences among visit-level, patient-level, and observation-level performance?**
    

# Time-varying variables

- Given predictors with multiple readings that change over time, such as vital signs, laboratory tests, and medication prescriptions:
    - Use abstractions to summarize each predictor, such as average, max, min, or name/type of reading (SBP, HR, ...).
        - High-level abstractions turn the task into classification rather than prediction and lose accuracy.
    - Use time-series analysis to represent the predictor over time and predict its changes over time.
    - Use longitudinal analysis if you have many patients with multiple readings over the study period.
- Otherwise, use cross-sectional analysis with the study period divided into time slots T1, T2, …, Tn:
    - Ignore time variation inside each T.
    - Use abstractions to represent time-varying features.
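The abstraction and time-slot ideas above can be combined in a few lines. The following sketch groups readings by patient, time slot, and reading type, and summarizes each group by mean, min, and max. The readings, the 7-day slot length, and the function name `summarize` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical readings: (patient_id, day, reading_type, value)
readings = [
    ("p1", 1, "SBP", 120), ("p1", 2, "SBP", 135), ("p1", 9, "SBP", 150),
    ("p1", 1, "HR", 70), ("p1", 8, "HR", 90),
    ("p2", 3, "SBP", 110), ("p2", 10, "SBP", 115),
]

def summarize(readings, slot_length=7):
    """Abstract each (patient, time slot, reading type) group into mean/min/max."""
    groups = defaultdict(list)
    for patient, day, kind, value in readings:
        slot = (day - 1) // slot_length + 1  # days 1-7 -> T1, days 8-14 -> T2, ...
        groups[(patient, f"T{slot}", kind)].append(value)
    return {
        key: {"mean": mean(vals), "min": min(vals), "max": max(vals)}
        for key, vals in groups.items()
    }

for key, stats in sorted(summarize(readings).items()):
    print(key, stats)
```

Within each slot the order and timing of the readings are discarded, which is exactly the loss of detail the bullets above warn about.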

# Inference vs. Reasoning vs. Prediction

- All three involve some level of uncertainty.
- From the given observations/parameters, what is the probability of:
    - Past/current status ==> Inference (I).
    - Future status  ==> Prediction (P).
- Given a sequence of time events, what is the probability that an event occurs before, after, or overlapping with another event ==> Reasoning (R). The provider may want to know:
    - Why did the patient show up now, given his or her prior states? **I/P/R?**
    - Will the patient cancel or not show up within the next 2 days? **I/P/R?**
    - Which medication prescribed to the patient leads to high SBP? **I/P/R?**
- Many predictive analytics applications or models may include all three types of conclusions.


# Supervised Learning Examples

![image.png](attachment:image.png)


![image.png](attachment:image.png)

Figure sources: Wikipedia

![image.png](attachment:image.png)


![image.png](attachment:image.png)

Figure sources: Wikipedia

# Classification Evaluation Metrics

![image.png](attachment:image.png)

ROC and AUC: http://gim.unmc.edu/dxtests/roc3.htm
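The usual confusion-matrix metrics can be computed directly from the four cell counts. The sketch below uses the standard definitions; the counts passed in at the bottom are hypothetical and are not taken from the figures in this notebook.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts (assumes no zero denominators)."""
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

# Hypothetical counts for 100 patients
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```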

# Exercise

![image.png](attachment:image.png)

**What are the values of PPV and NPV?**

**What is the relationship among precision, PPV, and NPV?**

**What are the best values for the metrics?**


# Unsupervised Learning Metrics

![image.png](attachment:image.png)


# Precision/Recall Tradeoff Threshold

**Consider that we have 6 positives and 8 negatives**


![image.png](attachment:image.png)
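The tradeoff can be reproduced numerically by sweeping the decision threshold over a set of scored examples. The sketch below uses 6 positives and 8 negatives as in the setup above, but the individual scores and their ordering are hypothetical and do not come from the figure.

```python
def precision_recall_at(scores, labels, threshold):
    """Predict positive when score >= threshold; return (precision, recall)."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and y == 1 for p, y in zip(predicted, labels))
    fp = sum(p and y == 0 for p, y in zip(predicted, labels))
    fn = sum((not p) and y == 1 for p, y in zip(predicted, labels))
    precision = tp / (tp + fp) if tp + fp else 1.0  # convention when nothing is predicted positive
    recall = tp / (tp + fn)
    return precision, recall

# Hypothetical classifier scores for 14 examples: 6 positives (1) and 8 negatives (0)
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.65, 0.60, 0.50, 0.45, 0.40, 0.30, 0.25, 0.20, 0.10]
labels = [1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0]

for threshold in (0.75, 0.55, 0.35):
    precision, recall = precision_recall_at(scores, labels, threshold)
    print(f"threshold {threshold}: precision = {precision:.2f}, recall = {recall:.2f}")
```

With these scores, lowering the threshold from 0.75 to 0.35 raises recall from 0.50 to 1.00 while precision falls from 0.75 to 0.60.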