# Machine Learning Terms, Definitions, and Jargon Part 1

#### Key Concepts:
- **Machine Learning**: A set of statistical and mathematical modeling techniques that learn to make predictions from data rather than from explicitly programmed rules.
  - **Simplified Definition**: Systems that improve their task performance through exposure to data (experience).
  - **Main Paradigms**:
    - **Supervised Learning**: Uses labeled data to learn a mapping function from input to output.
    - **Unsupervised Learning**: No labeled outputs; aims to find structure in the data.
    - **Reinforcement Learning**: Learns through trial and error with feedback on actions taken.

#### Supervised Learning:
- **Basic Principle**: Involves input-output pairs (e.g., ECG as input, heart attack diagnosis as output).
- **Traditional Programming**: Uses manually written rules (functions) to transform input to output.
  - Example: Early ECG machines used rule-based programming with expert input.
- **Machine Learning Approach**: Learns the function from large datasets rather than explicitly coding it.
  - **Advantages**: 
    - Can handle complex, varying inputs (e.g., ECGs from different patients).
    - Produces models that improve with exposure to more data.
    - Avoids rigid, simplistic rule-based systems, making it more adaptable.
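
The contrast between a hand-written rule and a learned function can be sketched in a few lines. This is only an illustration: the single feature (`st_elevation_mm`), the cutoff of 2.0, and the eight labeled examples are all made up for the sketch and are not taken from any real ECG system.

```python
# Toy, made-up examples: (st_elevation_mm, has_heart_attack). Illustrative only.
examples = [(0.1, 0), (0.3, 0), (0.5, 0), (0.8, 0),
            (1.4, 1), (1.9, 1), (2.3, 1), (2.8, 1)]

def rule_based(st_elevation_mm):
    """Traditional programming: a person writes the rule and picks the cutoff."""
    return 1 if st_elevation_mm >= 2.0 else 0

def accuracy(predict, data):
    """Fraction of examples where the prediction matches the label."""
    return sum(predict(x) == y for x, y in data) / len(data)

def learn_threshold(data):
    """Machine learning in miniature: pick the cutoff that best fits the labels."""
    xs = sorted(x for x, _ in data)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max(candidates,
               key=lambda t: accuracy(lambda x: 1 if x >= t else 0, data))

threshold = learn_threshold(examples)

def learned(st_elevation_mm):
    """Same form as the rule, but the cutoff came from the data."""
    return 1 if st_elevation_mm >= threshold else 0

print(accuracy(rule_based, examples), accuracy(learned, examples))
```

The learned cutoff changes if the labeled examples change, which is the sense in which the model "improves with exposure to more data."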

#### How Supervised Learning Works:
- **Input-Output Relationship**: We provide both the inputs (e.g., medical images) and their correct outputs (labels) for the model to learn.
  - Example: ECG data paired with diagnoses (labels) allows the model to learn how to interpret new ECGs.
- **Model**: A machine learning entity that represents the learned relationship between input and output.
  - **Model Learning**: Training adjusts the model's parameters so that its predictions closely match the known outputs for the given inputs. For this reason a model is often called a "function approximator."

#### Labels and Data:
- **Labels**: The correct output provided during training, guiding the model to learn the correct function.
  - Examples of labels: Medical conditions (e.g., diabetes, sepsis), imaging diagnoses (e.g., pneumonia on chest x-rays).
- **Importance of Large Data**: Supervised learning requires a significant amount of labeled data to train effectively.

#### Case Studies in Healthcare:
- **Google/UCSF Collaboration**: Used EHR data to predict mortality, readmission, and diagnosis labels at discharge. The model learned from labeled data and was able to predict these outcomes on new data.
- **Stanford Chest X-Ray AI**: A model trained on tens of thousands of chest x-rays to classify 14 conditions at the level of practicing radiologists.

#### Key Points:
- Machine learning, especially supervised learning, is effective for tasks that involve clear outcomes and large amounts of labeled data.
- **Supervised Learning Success**: Dependent on accurate labels and large, varied datasets.
- Unlike traditional programming, machine learning doesn't rely on hand-coded rules but learns patterns from the data.

This summary covers the foundational understanding and terminology of machine learning in healthcare, focusing on supervised learning, relevant examples, and its advantages over traditional programming.

# Machine Learning Terms, Definitions, and Jargon Part 2

This section introduces important machine learning concepts, starting with terminology like *features* (inputs), *labels* (outputs), and *examples* (input-output pairs), and how these examples form a *dataset*. The dataset is typically split into three parts:

1. **Training set**: Used for the model to learn the relationship between features and labels.
2. **Validation set**: Held out during training and used to assess the model's ability to generalize to unseen data.
3. **Test set**: Used only after training to confirm the model’s performance on completely unseen examples.

The training process involves a loop where the model learns a function to predict outputs from inputs. The validation set is used intermittently to adjust hyperparameters and assess generalization performance. The test set provides a final unbiased measure of how well the model generalizes to new data.
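
A minimal sketch of the three-way split, assuming a 70/15/15 ratio and a fixed shuffle seed (both are choices made for this example, not values from the notes). The `dataset` here is a list of placeholder `(features, label)` pairs.

```python
import random

def split_dataset(examples, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle the examples and split them into train / validation / test lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]   # whatever remains is the test set
    return train, val, test

# Each example is a (features, label) pair; these are placeholders.
dataset = [({"id": i}, i % 2) for i in range(100)]
train, val, test = split_dataset(dataset)
print(len(train), len(val), len(test))
```

Shuffling before splitting keeps any ordering in the raw data (for example, by admission date) from leaking into one subset. In practice, medical datasets are often split by patient so the same person never appears in two subsets.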

*Hyperparameter tuning* involves adjusting high-level parameters that influence the training process. The ultimate goal is to create a model that performs well on unseen data, demonstrating good generalization.
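
The tuning loop can be sketched as: train once per candidate hyperparameter value, score each on the validation set, keep the best, and only then look at the test set. The example below assumes a one-feature model `y ≈ w * x` with a penalty strength `alpha` as the hyperparameter; the data points and the candidate list are made up for illustration.

```python
def fit_ridge_1d(data, alpha):
    """Fit y ≈ w * x with an L2 penalty on w; alpha is the hyperparameter."""
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / (sxx + alpha)

def mse(w, data):
    """Mean squared error of the model y ≈ w * x on a dataset."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

# Toy, made-up (x, y) pairs, already split into the three sets.
train = [(1.0, 2.3), (2.0, 4.1), (3.0, 6.4), (4.0, 8.2)]
val = [(1.5, 3.0), (3.5, 7.1)]
test = [(2.5, 5.0)]

candidates = [0.0, 0.1, 1.0, 10.0]
# Parameters (w) are learned on the training set; alpha is chosen on validation.
val_errors = {alpha: mse(fit_ridge_1d(train, alpha), val) for alpha in candidates}
best_alpha = min(val_errors, key=val_errors.get)

# The test set is used once, after the hyperparameter is fixed.
final_w = fit_ridge_1d(train, best_alpha)
print(best_alpha, mse(final_w, test))
```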

Key takeaways:
- **Generalization** is the model's ability to work well on new, unseen data.
- **Hyperparameters** are high-level settings that control how training runs; they are tuned against the validation set rather than learned from the training data.
- **Word embeddings** are a numerical way to represent unstructured data like text, enabling models to use them as features.

This explanation is the foundation for deeper concepts like hyperparameter tuning and model evaluation.

# How Machines Learn Part 1

### **Supervised Learning Overview**
- **Inputs (Features)**: Data points like patient weight, nodule size, or lab values.
- **Outputs (Labels)**: What you're trying to predict, such as BMI (numerical) or nodule malignancy (categorical).
- **Types of Problems**: 
  - **Regression**: Predicts numerical values (e.g., BMI).
  - **Classification**: Predicts categorical labels (e.g., benign vs. malignant nodules).

### **Regression Example: Predicting BMI**
- In a **regression problem**, the label is a numerical value (BMI).
- **Model Training**: You use input features (like weight) and known BMI labels to train the model.
- **Loss Function**: The model aims to minimize the difference (loss) between predicted and actual BMI values by adjusting its parameters (weights and biases).
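
As a rough sketch of what "minimizing the loss by adjusting weights and biases" means, the code below fits `BMI ≈ w * x + b` by gradient descent on mean squared error. Several things are assumptions of the example: the five patient weights are made up, every toy patient is given the same height (1.70 m) so that BMI is an exact linear function of weight, and the weight feature is standardized so that a fixed learning rate converges.

```python
def standardize(values):
    """Rescale values to mean 0 and standard deviation 1."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std for v in values], mean, std

def mse(w, b, data):
    """Loss: mean squared difference between predicted and actual labels."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train_linear_regression(data, lr=0.1, epochs=500):
    """Adjust the weight w and bias b step by step to reduce the loss."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data: made-up weights in kg, one assumed height for all patients.
weights_kg = [55.0, 62.0, 70.0, 81.0, 95.0]
bmi_labels = [kg / 1.70 ** 2 for kg in weights_kg]

xs, mean, std = standardize(weights_kg)
data = list(zip(xs, bmi_labels))
w, b = train_linear_regression(data)

def predict_bmi(weight_kg):
    """Apply the same scaling used in training, then the learned line."""
    return w * (weight_kg - mean) / std + b

print(mse(w, b, data), predict_bmi(70.0))
```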
  
### **Classification Example: Lung Nodule Prediction**
- In a **classification problem**, the label is a category (malignant or benign).
- **Binary Labels**: Here, "0" represents benign, and "1" represents malignant.
- **Sigmoid Function**: Used to transform the model's output into a probability between 0 and 1, giving an interpretable result (e.g., 0.8 means 80% likelihood of malignancy).
- **Logistic Regression**: A classification model that fits the data using a sigmoid function.
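
The sigmoid step can be written out directly. In this sketch the single feature (`nodule_size_mm`) and the parameter values `w=0.3` and `b=-4.5` are hand-picked for illustration; a real logistic regression would learn them from labeled nodules.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_malignancy(nodule_size_mm, w=0.3, b=-4.5):
    """Logistic regression with hypothetical, hand-picked parameters."""
    return sigmoid(w * nodule_size_mm + b)

for size in (5.0, 15.0, 30.0):
    print(size, round(predict_malignancy(size), 3))
```

The linear part `w * x + b` can be any real number; the sigmoid is what turns it into a value that can be read as a probability.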
  
### **Key Concepts in Classification**
- **Threshold/Decision Boundary**: The probability cutoff that determines the classification outcome. In healthcare, this can be adjusted based on the desired model sensitivity (e.g., 30% probability for screening purposes).
- **Evaluation Metrics**: Selecting the right threshold and evaluation metrics (e.g., sensitivity, false positives) is crucial, especially in medical contexts where pretest probability and clinical utility matter.
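
The effect of moving the threshold can be checked on a handful of predictions. The probabilities and labels below are made up for the sketch; the point is only that lowering the cutoff from 0.5 to 0.3 raises sensitivity at the cost of more false positives.

```python
def confusion_counts(probs, labels, threshold):
    """Count true/false positives and negatives at a given probability cutoff."""
    pairs = list(zip(probs, labels))
    tp = sum(p >= threshold and y == 1 for p, y in pairs)
    fp = sum(p >= threshold and y == 0 for p, y in pairs)
    fn = sum(p < threshold and y == 1 for p, y in pairs)
    tn = sum(p < threshold and y == 0 for p, y in pairs)
    return tp, fp, fn, tn

def sensitivity_specificity(probs, labels, threshold):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, fn, tn = confusion_counts(probs, labels, threshold)
    return tp / (tp + fn), tn / (tn + fp)

# Toy, made-up model outputs and true labels (1 = malignant).
probs = [0.05, 0.20, 0.35, 0.40, 0.55, 0.70, 0.85, 0.95]
labels = [0, 0, 1, 0, 0, 1, 1, 1]

print(sensitivity_specificity(probs, labels, 0.5))  # (0.75, 0.75)
print(sensitivity_specificity(probs, labels, 0.3))  # (1.0, 0.5)
```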

This summary sets the foundation for understanding the practical applications of supervised learning in healthcare, from regression problems (predicting continuous outcomes like BMI) to classification tasks (predicting disease status).

# How Machines Learn Part 2

### **Multiple Features in Classification**
- **Binary Classification with Multiple Features**: 
  - **Example**: Predicting pneumonia using two features: white blood cell count (x1) and temperature (x2).
  - **Label (y)**: 0 (normal) or 1 (abnormal).

### **Visualization in Higher Dimensions**
- **3D Visualization**: With two features, the data can be plotted in three dimensions: one axis per feature and a third for the label or predicted probability. Each point represents a patient with or without pneumonia.
- **Decision Boundary**: The model must learn a decision boundary that separates the two classes. With two features, the boundary is the line in the feature plane where the model's predicted probability equals 0.5 (50% chance of either class); with three features it becomes a plane, and with more features a hyperplane.

### **Generalization to Higher Dimensions**
- **Decision Boundary**: The concept remains the same regardless of the number of features. The model finds a decision boundary that separates classes based on the features.
- **Model Training**: Involves adjusting parameters (weights) corresponding to each feature to minimize the loss between predictions and actual labels.
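
A sketch of training with one weight per feature, using the pneumonia example with two inputs. The eight patients are made up, and both features are assumed to be already standardized (z-scores) so that plain gradient descent behaves well; the learning rate and epoch count are arbitrary choices for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Probability of class 1: sigmoid of the weighted sum of features plus bias."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_logistic(X, y, lr=0.5, epochs=1000):
    """Gradient descent on mean log loss; learns one weight per feature and a bias."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi   # prediction minus label
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Toy, made-up patients: [standardized WBC count (x1), standardized temperature (x2)].
X = [[-1.2, -0.8], [-0.7, -1.1], [-0.9, -0.3], [-0.2, -0.9],
     [0.4, 0.9], [0.8, 0.5], [1.1, 1.3], [1.4, 0.7]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic(X, y)
print(w, b, predict_proba(w, b, [1.0, 1.0]))
```

The decision boundary is the set of points where `w[0] * x1 + w[1] * x2 + b == 0`, which is exactly where the predicted probability is 0.5. Adding a third feature only adds a third weight; the training loop is unchanged.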

### **Handling Complex Data**
- **Structured Data**: Simple models that multiply each feature by a learned weight and sum the results can work well.
- **Unstructured Data**: Higher-dimensional feature vectors (e.g., images, text) often require more complex models. 
  - **Example**: In images, features might represent shapes or lines. In natural language, features might combine word embeddings to create context-aware representations.

### **Neural Networks and Deep Learning**
- **Layered Features**: Neural networks use layers of parameters to create complex feature combinations. This approach is essential for handling unstructured data like images and text.
- **Deep Learning**: Builds on the idea of layered features and is used for complex problems like image classification.
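
To make "layers of parameters create feature combinations" concrete, here is a two-layer network small enough to trace by hand. The weights are hand-picked for the illustration rather than learned, and the task (output 1 when exactly one of two inputs is 1) is chosen only because no single straight-line boundary can solve it, while one hidden layer can.

```python
def relu(z):
    """A common activation: pass positive values through, clip negatives to 0."""
    return max(0.0, z)

def identity(z):
    return z

def dense(inputs, weights, biases, activation):
    """One layer: each output is activation(weighted sum of inputs + bias)."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def tiny_network(x):
    # Hidden layer builds two intermediate features from the raw inputs.
    hidden = dense(x, weights=[[1.0, 1.0], [1.0, 1.0]],
                   biases=[0.0, -1.0], activation=relu)
    # Output layer combines the intermediate features.
    output = dense(hidden, weights=[[1.0, -2.0]],
                   biases=[0.0], activation=identity)
    return output[0]

for x in ([0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]):
    print(x, tiny_network(x))
```

Deep learning models for images or text follow the same pattern with many more layers and with weights learned from data.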

### **Recap of Supervised Learning Concepts**
- **Supervised Learning**: Involves training a model on labeled input-output pairs to learn how to map features to labels.
- **Classification and Regression**: Classification predicts categorical labels (e.g., cancer vs. no cancer), while regression predicts numerical values.
- **Model Training**: Involves learning the parameters that minimize the loss on the training data.

### **Practical Application**
- **Example Application**: For a model trained to classify skin lesions as cancerous or not, features are the pixels of the image, and the output is a binary classification. After training, the model should predict labels for new, unlabeled images based on learned parameters and decision boundaries.

This summary outlines the progression from simple to complex models in machine learning, illustrating how concepts scale from single-feature problems to high-dimensional unstructured data.

# Supervised Machine Learning Approaches: Regression and the "No Free Lunch" Theorem

### **Traditional Machine Learning Algorithms**

1. **Overview of Algorithms**:
   - **Linear Regression**: Models relationships between numerical features and numerical outcomes.
   - **Logistic Regression**: Models a categorical (typically binary) outcome by predicting the probability of each class.
   - **Decision Trees**: A hierarchical model that splits data into subsets based on feature values to make predictions.
   - **Support Vector Machines (SVMs)**: Finds the optimal hyperplane that separates data into different classes.
   - **Neural Networks**: Complex models with multiple layers of interconnected nodes for learning complex patterns.

2. **Algorithm Selection**:
   - **No Free Lunch Theorem**: There is no single best algorithm for all problems. The performance of algorithms depends on the dataset's size and structure. Choosing the right algorithm often involves experimentation and comparison.

3. **Regression Methods**:
   - **Polynomial Regression**: Extends linear regression to fit non-linear data by using polynomial features to capture relationships that are not a straight line.
   - **Regularization Techniques**:
     - **Lasso Regression**: Adds a penalty proportional to the absolute value of coefficients to prevent overfitting and encourage sparsity.
     - **Ridge Regression**: Adds a penalty proportional to the square of the coefficients to manage multicollinearity and overfitting.
     - **Elastic Net Regression**: Combines penalties from both Lasso and Ridge regressions to balance between them.

4. **Key Concept**:
   - **Error Measurement**: Regression algorithms are concerned with minimizing the error between predicted and actual values. Different regression techniques adjust how errors are calculated and penalized to better fit the data.
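
The three regularized losses differ only in the penalty term added to the prediction error. The sketch below writes them in a plain textbook form; the parameter names `alpha` (penalty strength) and `l1_ratio` (mix between the two penalties) are assumptions of this example, and library implementations may scale the terms differently.

```python
def mse(y_true, y_pred):
    """Mean squared error between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def l1_penalty(coefs):
    """Sum of absolute coefficient values (used by Lasso)."""
    return sum(abs(c) for c in coefs)

def l2_penalty(coefs):
    """Sum of squared coefficient values (used by Ridge)."""
    return sum(c * c for c in coefs)

def lasso_loss(y_true, y_pred, coefs, alpha):
    return mse(y_true, y_pred) + alpha * l1_penalty(coefs)

def ridge_loss(y_true, y_pred, coefs, alpha):
    return mse(y_true, y_pred) + alpha * l2_penalty(coefs)

def elastic_net_loss(y_true, y_pred, coefs, alpha, l1_ratio=0.5):
    """Weighted mix of the Lasso and Ridge penalties."""
    penalty = l1_ratio * l1_penalty(coefs) + (1 - l1_ratio) * l2_penalty(coefs)
    return mse(y_true, y_pred) + alpha * penalty

coefs = [3.0, -4.0]
y_true, y_pred = [1.0, 2.0], [1.0, 2.0]   # perfect fit, so only the penalty remains
print(lasso_loss(y_true, y_pred, coefs, 0.1),
      ridge_loss(y_true, y_pred, coefs, 0.1),
      elastic_net_loss(y_true, y_pred, coefs, 0.1))
```

Because the squared penalty grows faster for large coefficients, Ridge mainly shrinks big weights, while the absolute-value penalty in Lasso can push small weights all the way to zero.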

This introduction to traditional algorithms sets the stage for understanding more complex models, including neural networks, and highlights the importance of algorithm selection based on dataset characteristics.

# Other Traditional Supervised Machine Learning Approaches

### **Decision Tree Algorithms**:
1. **Structure**:
   - Decision trees build a model based on a series of decisions related to the features in a dataset.
   - The decisions are represented as branches that fork into a tree structure, and the final predictions are made at the leaves (end nodes).

2. **Training**:
   - Decision trees are fast to train and are especially useful for simple classification tasks.
   - An example task: predicting pneumonia based on features like temperature and white blood cell count.
   - The algorithm learns which features are most important by creating logical branching points (nodes) that maximize the discrimination between classes (e.g., pneumonia vs. no pneumonia).

3. **Advantages**:
   - Easy to understand and interpret.
   - Simple visual structure where the most important features are clearly highlighted.
   - Can work well with small datasets to find critical features.

4. **Disadvantages**:
   - Accuracy may suffer: a tree that is too shallow underfits, while a tree grown too deep overfits the training data.
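
How a tree chooses a branching point can be sketched with a single split. The example assumes Gini impurity as the measure of class mixing (one common choice; the notes do not name a criterion) and uses six made-up patients with temperature and white blood cell count as features.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels; 0.0 means the group is pure."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(rows, labels):
    """Try every feature and midpoint threshold; keep the lowest weighted impurity."""
    best = None  # (weighted impurity, feature index, threshold)
    n = len(rows)
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, j, t)
    return best

# Toy, made-up patients: [temperature in °C, WBC count in 10^3/µL]; 1 = pneumonia.
rows = [[36.8, 7.0], [37.0, 12.5], [37.2, 6.5],
        [38.4, 8.0], [38.9, 14.0], [39.5, 11.0]]
labels = [0, 0, 0, 1, 1, 1]

impurity, feature, threshold = best_split(rows, labels)
print(impurity, feature, threshold)
```

A full decision tree repeats this search inside each resulting group until the groups are pure or a depth limit is reached. In this toy data, temperature separates the classes cleanly and white blood cell count does not, so the first node splits on temperature.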

---

### **Random Forests**:
1. **Ensemble Learning**:
   - **Random Forests** improve on individual decision trees by building an ensemble of many trees, thus increasing the overall predictive accuracy.
   - Each tree is trained on a randomly sampled subset of features and data points, generating diversity in the decision trees.

2. **Combination**:
   - After training, the trees are combined, either by averaging (for regression tasks) or majority voting (for classification tasks), to create a more robust final model.
   - The ensemble reduces the sensitivity of the model to small changes in the data, which often affects individual trees.

3. **Wisdom of the Crowd**:
   - The idea behind random forests can be likened to **diversifying investments in finance**. Just as a diversified investment portfolio (gold, stocks, bonds) is more stable, a random forest, by combining many trees, yields a more accurate and stable prediction than a single decision tree.

4. **Strengths**:
   - **Accuracy**: More accurate than individual decision trees.
   - **Resilience**: Handles variance well, as bad classifiers (trees) are balanced out by good ones within the forest.
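
The ensemble idea can be sketched with very small trees. This is a simplification, not a full random forest: each "tree" here is a one-split stump on one randomly chosen feature, trained on a bootstrap sample (rows drawn with replacement), and the forest predicts by majority vote. The data, `n_trees=25`, and the seed are made up for the example.

```python
import random
from collections import Counter

def train_stump(rows, labels, feature):
    """Best single-threshold rule on one feature: predict 1 when value > threshold."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(r[feature] for r in rows)):
        acc = sum((r[feature] > t) == (y == 1)
                  for r, y in zip(rows, labels)) / len(rows)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return feature, best_t

def train_forest(rows, labels, n_trees=25, seed=0):
    """Each stump sees a random sample of data points and one random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]   # bootstrap sample
        feature = rng.randrange(len(rows[0]))            # random feature
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feature))
    return forest

def predict(forest, row):
    """Majority vote across all stumps (classification)."""
    votes = Counter(int(row[f] > t) for f, t in forest)
    return votes.most_common(1)[0][0]

# Toy, made-up patients: [temperature in °C, WBC count in 10^3/µL]; 1 = pneumonia.
rows = [[36.6, 6.0], [36.9, 7.5], [37.1, 8.0], [37.3, 9.0],
        [38.2, 11.5], [38.6, 13.0], [39.0, 12.0], [39.4, 15.0]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

forest = train_forest(rows, labels)
print(predict(forest, [40.0, 18.0]), predict(forest, [36.0, 5.0]))
```

Individual stumps end up with different thresholds because each one saw a different sample; the vote smooths over those differences. For regression, the vote would be replaced by an average of the trees' outputs.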

---

This approach highlights the power of ensemble methods, where combining many simple models results in better performance, just like having a well-diversified investment strategy.

# Support Vector Machine (SVM)

1. **Overview**:
   - SVM is a supervised learning algorithm primarily used for **classification problems**, much like logistic regression.
   - Its key objective is to find the **best separating line (or hyperplane)** in a dataset that divides classes such as "pneumonia: yes" vs. "pneumonia: no" based on features like white blood cell count and temperature.

2. **Key Concept**:
   - SVM focuses on the **data points that are closest to the decision boundary**, called **support vectors**.
   - The goal of SVM is to create a hyperplane that maximizes the **margin** between the two classes, making the separation more balanced and less influenced by outliers.

3. **Difference from Logistic Regression**:
   - While logistic regression tries to fit the best line based on all data points (treating all points equally), SVM emphasizes those points closest to the boundary, creating a more **robust model** in the presence of outliers.
   - SVM's approach can handle situations where there are **outliers** or anomalies in the dataset, by **ignoring** their influence to a certain degree.

4. **Generalization**:
   - SVM is advantageous when you need a model that **generalizes better** on unseen data: it avoids overfitting to outliers, which can skew a logistic regression model.
   - For example, in medical data, outliers occur frequently, and SVM helps to maintain model accuracy by focusing on the **core relationships** in the data.

5. **Multidimensional Data**:
   - SVM is particularly well-suited for datasets with **many features** (high-dimensional data), as it can efficiently find the best separating hyperplane in these complex spaces.
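
One way to see why only points near the boundary matter is to compare per-point losses. The sketch below assumes the hinge loss usually associated with SVMs and labels coded as +1 / -1; the hyperplane `w = [1.0, 1.0]`, `b = 0.0` and the three points are hand-picked for illustration, not fitted.

```python
import math

def decision_value(w, b, x):
    """Signed score: positive on one side of the hyperplane, negative on the other."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def hinge_loss(w, b, x, y):
    """y is +1 or -1. Exactly zero once a point is correctly classified beyond the margin."""
    return max(0.0, 1.0 - y * decision_value(w, b, x))

def logistic_loss(w, b, x, y):
    """Log loss for comparison: shrinks toward zero but never reaches it."""
    return math.log(1.0 + math.exp(-y * decision_value(w, b, x)))

def margin_width(w):
    """Distance between the two margin lines: 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wj * wj for wj in w))

w, b = [1.0, 1.0], 0.0        # hypothetical hyperplane
on_margin = [0.5, 0.5]        # a support vector: sits exactly on the margin
far_away = [5.0, 5.0]         # correctly classified, far from the boundary
wrong_side = [-0.5, 0.0]      # labeled +1 but on the negative side

for point in (on_margin, far_away, wrong_side):
    print(point, hinge_loss(w, b, point, 1), logistic_loss(w, b, point, 1))
print(margin_width(w))
```

Points like `far_away` contribute nothing to the hinge loss, so moving them does not move the boundary. Points on or inside the margin, and misclassified points such as `wrong_side`, still do.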

### Practical Example in Medicine:
- In a **medical dataset**, where most patients follow typical patterns, but a few extreme outliers exist, SVM provides a **more generalized model** that focuses on the core data points and is less affected by anomalies. This makes SVM a strong tool for handling real-world healthcare data that often includes noisy or irregular entries.

### Key Takeaways:
- **SVM** creates a balanced model by focusing on **support vectors** and maximizing the margin between classes.
- It is ideal for **high-dimensional datasets** and offers **robustness** to **outliers**.


# Unsupervised Machine Learning

1. **Unsupervised Learning Overview**:
   - Unlike **supervised learning**, where the model is trained using labeled data, **unsupervised learning** deals with datasets that lack labels.
   - The primary goal is to **discover patterns** and structures within the data without predefined categories.

2. **Clustering**:
   - **Clustering** is a common unsupervised learning task where the objective is to group similar items together.
   - It helps to **define categories** based on inherent similarities rather than predefined labels.
   - Common clustering algorithms include **K-means** and **hierarchical clustering**.

3. **Medical Applications**:
   - Unsupervised learning, particularly clustering, has been used in healthcare to find novel patterns in large, unlabeled datasets.
   - For example, in heart failure research, clustering of electronic health records revealed distinct patient groups with different clinical characteristics and outcomes. This approach led to new classifications and risk stratification strategies.

4. **Evaluating Unsupervised Learning**:
   - One of the challenges with unsupervised learning is assessing the **meaningfulness** of the identified clusters.
   - Effective clustering should produce groups that are **reproducible** and actionable. For instance, if a new patient’s cluster assignment predicts different clinical outcomes or treatment needs, this indicates that the clustering has practical significance.

5. **Contrast with Supervised Learning**:
   - **Supervised learning** relies on labeled data and aims to predict outcomes based on known labels.
   - **Unsupervised learning** explores unlabeled data to uncover hidden structures or patterns, which can be useful for exploratory analysis and discovering new insights.

6. **Transition to Neural Networks**:
   - With a foundation in supervised and unsupervised learning, we are now set to explore more advanced techniques, such as **neural networks** and **deep learning**, which build on these principles.
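
The K-means algorithm mentioned under Clustering can be sketched as two alternating steps: assign every point to its nearest centroid, then move each centroid to the mean of its assigned points. The six two-feature "patients" and the choice of starting centroids are made up for this example; real uses typically try several random starts.

```python
def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to centroids and updating centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [sum((a - c) ** 2 for a, c in zip(p, centroid))
                     for centroid in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its points.
        centroids = [
            [sum(dim) / len(cluster) for dim in zip(*cluster)] if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Toy, made-up data: two unlabeled groups of patients described by two features.
points = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
          [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]

centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(centroids)
print([len(c) for c in clusters])
```

No labels are used anywhere: the groups come only from distances between points. Whether the resulting clusters are clinically meaningful still has to be checked separately, as described under Evaluating Unsupervised Learning.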

### Key Takeaways:
- **Unsupervised learning** helps find patterns in **unlabeled data**, often using techniques like **clustering**.
- It’s valuable in fields like healthcare for discovering new categories or risk factors from large datasets.
- Evaluating the effectiveness of clustering involves checking if the results are reproducible and actionable.
- Understanding these concepts sets the stage for diving into more complex models like neural networks and deep learning.