# Machine Learning

### 1. **Supervised vs. Unsupervised Learning**

* **Supervised Learning**: Learning from labeled data to make predictions (e.g., classification, regression)
* **Unsupervised Learning**: Learning from unlabeled data to find patterns or groupings (e.g., clustering, dimensionality reduction)
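* **Semi-supervised and Self-supervised Learning**: Semi-supervised learning combines a small labeled set with a larger unlabeled set; self-supervised learning derives its training signal from the unlabeled data itself (e.g., predicting masked parts of the input)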
* **Reinforcement Learning**: Learning by interacting with the environment and receiving rewards or penalties
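
To make the first two paradigms concrete, here is a minimal sketch using scikit-learn. The toy dataset (two well-separated blobs) and all variable names are assumptions made for illustration. The classifier is given the labels, while the clustering model sees only the features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Toy data (an assumption for illustration): two well-separated 2-D blobs.
X, y = make_blobs(
    n_samples=200,
    centers=[[-3.0, -3.0], [3.0, 3.0]],
    cluster_std=1.0,
    random_state=0,
)

# Supervised: the classifier sees the labels y during training.
clf = LogisticRegression().fit(X, y)
train_accuracy = clf.score(X, y)

# Unsupervised: KMeans sees only X and has to find the two groups itself.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_ids = kmeans.labels_

# Cluster ids are arbitrary (0/1 may be swapped relative to y),
# so take the better of the two possible mappings.
agreement = float(np.mean(cluster_ids == y))
agreement = max(agreement, 1.0 - agreement)

print(f"supervised training accuracy: {train_accuracy:.2f}")
print(f"cluster/label agreement:      {agreement:.2f}")
```

On data this cleanly separated both approaches recover the same grouping; on real data the clusters found without labels need not match any labeling you care about.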

### 2. **Data Preprocessing and Cleaning**

* **Handling Missing Data**: Imputation strategies (mean, median, interpolation, etc.)
* **Data Scaling and Normalization**: Standardization (Z-score) and Min-Max scaling
* **Data Encoding**:

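  * One-Hot Encoding: one binary column per category, suited to nominal (unordered) categorical data
  * Label Encoding: one integer per category, which implies an ordering and so is best suited to ordinal data or target labels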
* **Feature Engineering**: Creating new features from raw data (e.g., polynomial features, interaction terms)
* **Outlier Detection and Treatment**: Identifying and dealing with extreme values that may distort the model
* **Dealing with Imbalanced Datasets**: Resampling techniques (oversampling, undersampling, SMOTE)
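
Several of the preprocessing steps above can be sketched in a few lines of NumPy. The `ages` and `colors` arrays are made-up example data, and in practice a library such as scikit-learn would usually handle these steps; the point here is only to show what each transformation computes.

```python
import numpy as np

# Made-up example data: one numeric feature with a missing value,
# one categorical feature.
ages = np.array([25.0, np.nan, 40.0, 31.0])
colors = np.array(["red", "blue", "red", "green"])

# Handling missing data: replace NaN with the mean of the observed values.
age_mean = np.nanmean(ages)
ages_filled = np.where(np.isnan(ages), age_mean, ages)

# Standardization (Z-score): zero mean, unit standard deviation.
ages_standardized = (ages_filled - ages_filled.mean()) / ages_filled.std()

# Min-Max scaling: rescale to the range [0, 1].
ages_minmax = (ages_filled - ages_filled.min()) / (
    ages_filled.max() - ages_filled.min()
)

# Label encoding: map each category to an integer code (sorted order).
categories, label_codes = np.unique(colors, return_inverse=True)

# One-hot encoding: one binary column per category.
one_hot = np.eye(len(categories))[label_codes]

print("filled:      ", ages_filled)
print("standardized:", ages_standardized.round(2))
print("min-max:     ", ages_minmax.round(2))
print("categories:  ", categories, "codes:", label_codes)
print(one_hot)
```

When these statistics (mean, standard deviation, min, max) are used in a real pipeline, they should be computed on the training split only and then applied to validation and test data, to avoid leaking information.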

### 3. **Data Splitting and Evaluation**

* **Training, Validation, and Test Split**: Partitioning data so the model is fit on the training set, tuned on the validation set, and evaluated once on the held-out test set
* **Cross-Validation**: K-fold cross-validation and leave-one-out cross-validation
* **Evaluation Metrics**:

  * Regression metrics (Mean Absolute Error, Mean Squared Error, R-squared)
  * Classification metrics (Accuracy, Precision, Recall, F1-score, ROC Curve, AUC)
  * Confusion Matrix (True Positives, True Negatives, False Positives, False Negatives)
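
As an illustration of how the classification metrics follow from the confusion matrix, here is a small dependency-free sketch. The function names and the example labels are hypothetical, and the code assumes binary labels with 1 as the positive class.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 from the confusion counts."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 4 positives and 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this example accuracy is higher than recall because the negatives dominate, which is one reason accuracy alone can be misleading on imbalanced data.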

### 4. **Overfitting and Underfitting**

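* **Bias-Variance Trade-off**: Overly simple models tend to underfit (high bias), while overly complex models tend to overfit (high variance); the goal is the level of complexity that minimizes error on unseen data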
* **Regularization**: L1 (Lasso) and L2 (Ridge) regularization to prevent overfitting
* **Early Stopping**: Halting training once performance on a validation set stops improving, before the model starts fitting noise
* **Cross-Validation for Tuning**: Using cross-validation to tune hyperparameters and prevent overfitting
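
To show what L2 regularization does to a model, the sketch below fits ridge regression in closed form with NumPy. The synthetic data, the helper name `ridge_fit`, and the omission of an intercept are simplifying assumptions for illustration.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge solution w = (X^T X + lam * I)^(-1) X^T y (no intercept)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


# Synthetic regression data (an assumption for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
true_w = np.array([3.0, -2.0, 0.0, 0.0, 1.0])
y = X @ true_w + rng.normal(scale=0.5, size=50)

# A larger penalty shrinks the weight vector towards zero.
lambdas = [0.0, 1.0, 10.0, 100.0]
weight_norms = [float(np.linalg.norm(ridge_fit(X, y, lam))) for lam in lambdas]

for lam, norm in zip(lambdas, weight_norms):
    print(f"lambda={lam:>5}: ||w|| = {norm:.3f}")
```

With `lam=0.0` this reduces to ordinary least squares. Increasing `lam` trades a little extra bias for lower variance, and the value is normally chosen by cross-validation rather than by hand.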

### 5. **Bias and Fairness in Machine Learning**

* **Bias in Data**: Understanding how biased data can lead to biased models
* **Fairness Metrics**: Equal opportunity, demographic parity, and equalized odds
* **Ethical Considerations**: Addressing issues like discrimination, privacy concerns, and transparency
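
The fairness metrics listed above can be computed directly from predictions once a group attribute is available. The following dependency-free sketch uses made-up data for two hypothetical groups, `"a"` and `"b"`. Summarizing equalized odds as the larger of the TPR and FPR gaps is one common convention, not the only one.

```python
def rate(values):
    """Fraction of 1s in a list of 0/1 values (0.0 for an empty list)."""
    return sum(values) / len(values) if values else 0.0


def group_rates(y_true, y_pred, groups, group):
    """Selection rate, true positive rate and false positive rate for one group."""
    rows = [(t, p) for t, p, g in zip(y_true, y_pred, groups) if g == group]
    selection_rate = rate([p for _, p in rows])
    tpr = rate([p for t, p in rows if t == 1])
    fpr = rate([p for t, p in rows if t == 0])
    return selection_rate, tpr, fpr


# Made-up example: four individuals in each of two groups.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]

sel_a, tpr_a, fpr_a = group_rates(y_true, y_pred, groups, "a")
sel_b, tpr_b, fpr_b = group_rates(y_true, y_pred, groups, "b")

# Demographic parity: positive predictions should be equally frequent per group.
demographic_parity_gap = abs(sel_a - sel_b)
# Equal opportunity: true positive rates should match across groups.
equal_opportunity_gap = abs(tpr_a - tpr_b)
# Equalized odds: both TPR and FPR should match; report the larger gap.
equalized_odds_gap = max(abs(tpr_a - tpr_b), abs(fpr_a - fpr_b))

print("demographic parity gap:", demographic_parity_gap)
print("equal opportunity gap: ", equal_opportunity_gap)
print("equalized odds gap:    ", equalized_odds_gap)
```

Here equal opportunity is satisfied while demographic parity and equalized odds are not, which shows that the criteria measure different things and can disagree on the same predictions.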

### 6. **Feature Selection and Dimensionality Reduction**

* **Feature Selection**: Choosing the most relevant features for a model (filter, wrapper, and embedded methods)
* **Principal Component Analysis (PCA)**: Projecting data onto the orthogonal directions of greatest variance, reducing dimensionality while preserving as much variance as possible
* **t-SNE and UMAP**: Non-linear dimensionality reduction for visualization
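
As a rough sketch of what PCA computes, the example below implements it with NumPy's SVD. The helper name `pca` and the synthetic data are assumptions for illustration; scikit-learn's `PCA` class would be the usual choice in practice.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its top principal components using the SVD."""
    # PCA operates on mean-centered data.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained_variance = singular_values**2 / (X.shape[0] - 1)
    explained_ratio = explained_variance / explained_variance.sum()
    projected = X_centered @ Vt[:n_components].T
    return projected, explained_ratio


# Synthetic data (an assumption): the second feature is almost a multiple
# of the first, so most of the variance lies along a single direction.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = np.hstack(
    [
        latent,
        2.0 * latent + 0.1 * rng.normal(size=(200, 1)),
        0.5 * rng.normal(size=(200, 1)),
    ]
)

projected, explained_ratio = pca(X, n_components=2)
print("projected shape:", projected.shape)
print("explained variance ratio:", explained_ratio.round(3))
```

The explained variance ratio is what typically guides the choice of `n_components`: keep enough components to cover the share of variance you need.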

### 7. **Model Selection and Hyperparameter Tuning**

* **Grid Search**: Exhaustive search over a predefined set of hyperparameters
* **Random Search**: Evaluating a fixed number of configurations sampled at random from the hyperparameter space, often cheaper than a full grid
* **Bayesian Optimization**: Using probabilistic models to select the most promising hyperparameters
* **Automated Machine Learning (AutoML)**: Tools like Auto-sklearn and TPOT that automate model selection and hyperparameter tuning
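
Grid search and random search can be compared side by side with scikit-learn's `GridSearchCV` and `RandomizedSearchCV`. The dataset (Iris), the k-nearest-neighbors model, and the candidate values are arbitrary choices made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Candidate hyperparameters (illustrative values): 6 x 2 = 12 combinations.
param_grid = {
    "n_neighbors": [1, 3, 5, 7, 9, 11],
    "weights": ["uniform", "distance"],
}

# Grid search: evaluates every combination with 5-fold cross-validation.
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
grid.fit(X, y)

# Random search: evaluates only n_iter randomly sampled combinations.
random_search = RandomizedSearchCV(
    KNeighborsClassifier(),
    param_distributions=param_grid,
    n_iter=5,
    cv=5,
    random_state=0,
)
random_search.fit(X, y)

print("grid search:  ", grid.best_params_, f"{grid.best_score_:.3f}")
print("random search:", random_search.best_params_, f"{random_search.best_score_:.3f}")
```

Random search fits fewer models here (5 candidates instead of 12), at the cost of possibly missing the best combination. The saving grows quickly as more hyperparameters are added to the search space.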

### 8. **Ensemble Methods**

* **Bagging**: Training multiple models on bootstrap samples of the data and aggregating their predictions to reduce variance (e.g., Random Forests)
* **Boosting**: Sequentially building models to correct errors of previous models (e.g., Gradient Boosting, XGBoost, AdaBoost)
* **Stacking**: Training a meta-model on the predictions of several base models (often of different types) for improved performance
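
The three ensemble styles can be tried out with scikit-learn's built-in estimators. This is a minimal sketch: the dataset (breast cancer), the number of trees, and the choice of logistic regression as the stacking meta-model are all assumptions for illustration, and none of the models are tuned.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

models = {
    # Bagging-style: many trees fit on bootstrap samples, predictions aggregated.
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    # Boosting: trees are added sequentially, each correcting earlier errors.
    "gradient_boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
    # Stacking: a meta-model learns how to combine the base models' predictions.
    "stacking": StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
            ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
        ],
        final_estimator=LogisticRegression(),
    ),
}

# Mean accuracy over 3-fold cross-validation for each ensemble.
scores = {
    name: float(cross_val_score(model, X, y, cv=3).mean())
    for name, model in models.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Which ensemble performs best depends on the data, so comparing them under the same cross-validation scheme, as above, is a reasonable starting point.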

### 9. **Basic Understanding of ML Frameworks**

* Popular ML libraries and frameworks such as:

  * **Scikit-learn** (for classical ML algorithms)
  * **TensorFlow/PyTorch** (for deep learning)
  * **XGBoost/LightGBM** (for gradient boosting)


