This repository contains explanations and implementations of various machine learning algorithms. Below are the details of each algorithm, including theoretical backgrounds and coding examples.
Simple Linear Regression is a method for predicting a dependent variable ($Y$) from the value of a single independent variable ($X$). The relationship is modeled by a linear equation, with a short code sketch after the definitions below:

$$Y = a + bX$$

where:
- $a$ is the intercept
- $b$ is the slope
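For illustration, here is a minimal sketch of fitting this model with scikit-learn; the library choice and the toy data are assumptions made for the example, not part of the definition above:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative toy data; scikit-learn expects X as a 2-D array
X = np.array([[1], [2], [3], [4], [5]])
Y = np.array([2.1, 4.3, 6.2, 8.1, 10.3])

model = LinearRegression()
model.fit(X, Y)

print("intercept a:", model.intercept_)
print("slope b:", model.coef_[0])
print("prediction for X=6:", model.predict([[6]])[0])
```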
Multiple Linear Regression predicts a dependent variable from multiple independent variables. The linear relationship is expressed as follows, with a code sketch after the definitions:

$$Y = a + b_1X_1 + b_2X_2 + \ldots + b_nX_n$$

where:
- $a$ is the intercept
- $b_1, b_2, \ldots, b_n$ are the coefficients
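A similar sketch for the multiple-variable case, again using scikit-learn with made-up data; the same `LinearRegression` estimator handles any number of feature columns:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative toy data: each row holds the features (X1, X2) of one sample
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]])
Y = np.array([7.2, 6.9, 14.1, 13.8, 19.9])

model = LinearRegression()
model.fit(X, Y)

print("intercept a:", model.intercept_)
print("coefficients b_1, ..., b_n:", model.coef_)
```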
Simple Logistic Regression is used for binary classification problems. The logistic model predicts the probability of the dependent variable being in one of the two categories:

$$P(Y=1) = \frac{1}{1 + e^{-(a + bX)}}$$
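As a rough sketch, scikit-learn's `LogisticRegression` fits this model; the feature values and labels below are invented for the example:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative toy data: one feature, binary labels
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)

# predict_proba returns [P(Y=0), P(Y=1)] for each sample
print("P(Y=1 | X=2.0):", clf.predict_proba([[2.0]])[0, 1])
```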
Multiple Logistic Regression extends simple logistic regression to handle multiple independent variables. The model is:

$$P(Y=1) = \frac{1}{1 + e^{-(a + b_1X_1 + b_2X_2 + \ldots + b_nX_n)}}$$
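The multiple-variable case is the same call with a wider feature matrix; again a sketch with invented data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative toy data: two features per sample, binary labels
X = np.array([[0.5, 1.0], [1.0, 0.5], [1.5, 1.0],
              [3.0, 2.5], [3.5, 3.0], [4.0, 3.5]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)

print("intercept a:", clf.intercept_[0])
print("coefficients b_1, ..., b_n:", clf.coef_[0])
print("P(Y=1 | X=(2, 2)):", clf.predict_proba([[2.0, 2.0]])[0, 1])
```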
Ensemble learning combines the predictions of multiple models, typically achieving better overall performance than any individual model. Two popular ensemble techniques are:
Random Forest is an ensemble of decision trees, generally trained with the "bagging" method: each tree is fit on a random bootstrap sample of the data, and predictions are averaged or voted across trees. This typically improves accuracy and helps control overfitting.
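A minimal sketch with scikit-learn's `RandomForestClassifier` on the built-in Iris dataset; the dataset, split, and hyperparameters are illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# 100 trees, each fit on a bootstrap sample of the training data (bagging)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```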
XGBoost (Extreme Gradient Boosting) is a powerful boosting technique that builds the model in a stage-wise fashion and optimizes it using the gradient boosting framework: each new tree is trained to correct the residual errors of the trees built so far.
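A comparable sketch with the `xgboost` package (an assumption here: it is a separate install, e.g. `pip install xgboost`, and the hyperparameters below are illustrative rather than tuned):

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Trees are added one stage at a time, each fitting the gradient of the loss
clf = XGBClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
clf.fit(X_train, y_train)

print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```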
A Decision Tree is a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.
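To make the "simple decision rules" concrete, this sketch fits a shallow tree and prints the rules it learned; the depth limit and dataset are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()

# max_depth=2 keeps the tree small enough to read as plain if/else rules
clf = DecisionTreeClassifier(max_depth=2, random_state=42)
clf.fit(data.data, data.target)

# Print the learned decision rules as text
print(export_text(clf, feature_names=list(data.feature_names)))
```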
K-means clustering partitions the dataset into K clusters, where each data point belongs to the cluster with the nearest mean. It minimizes intra-cluster variance.
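A short sketch with scikit-learn's `KMeans` on two made-up 2-D blobs, printing the cluster assignment of each point and the learned cluster means:

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative toy data: two well-separated groups of points
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
              [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
kmeans.fit(X)

print("labels:", kmeans.labels_)
print("cluster means:", kmeans.cluster_centers_)
```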