### Logistic Regression
- A statistical method for modeling an outcome as a function of **one or more independent** variables.
- The outcome is typically binary (0 or 1, true or false, yes or no).
- **Linear regression** outputs continuous values.
- **Logistic regression** outputs probabilities that map to binary outcomes.

#### Working Mechanism
1. **Sigmoid Function:** Maps the linear combination of the input features (attributes or columns) to a probability between 0 and 1.

   **Formula**:

   $$\sigma(z) = \frac{1}{1 + e^{-z}}$$

   where $z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$ is the linear combination of the input features.

2. **Probability and Decision Threshold:** The output of the sigmoid function is interpreted as the probability that the dependent variable equals 1. A threshold (commonly 0.5) is used to decide the final classification.
   - If $\sigma(z) \geq 0.5$, predict 1
   - If $\sigma(z) < 0.5$, predict 0

3. **Cost Function and Optimization:** Logistic regression uses the cross-entropy loss (also called log loss) as the cost function. **Gradient descent** or another optimization algorithm is used to find the coefficients $\beta_0, \dots, \beta_n$ that minimize it.
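
The steps above can be sketched in plain NumPy. This is a minimal illustration, not how scikit-learn fits the model: the function names (`sigmoid`, `log_loss`, `fit_logistic`), the learning rate, the iteration count, and the toy data are all assumptions made for the example, and no regularization is applied.

```python
import numpy as np

def sigmoid(z):
    """Map real-valued scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y, p, eps=1e-12):
    """Mean cross-entropy between labels y and predicted probabilities p."""
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Fit the intercept beta0 and weights beta by batch gradient descent."""
    n_samples, n_features = X.shape
    beta0, beta = 0.0, np.zeros(n_features)
    for _ in range(n_iter):
        p = sigmoid(beta0 + X @ beta)
        error = p - y  # gradient of the log loss with respect to z
        beta0 -= lr * error.mean()
        beta -= lr * (X.T @ error) / n_samples
    return beta0, beta

# Hypothetical toy data: one centered feature, binary label
X = np.array([[-1.5], [-0.5], [0.5], [1.5]])
y = np.array([0, 0, 1, 1])

beta0, beta = fit_logistic(X, y)
probs = sigmoid(beta0 + X @ beta)
preds = (probs >= 0.5).astype(int)  # decision threshold of 0.5
print(preds, log_loss(y, probs))
```

Because the toy data is perfectly separable and the sketch has no regularization, the weights keep growing with more iterations; library implementations typically add a penalty term to avoid this.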

##### Example:
Assume we have a dataset of students' scores and whether they passed an exam (0 or 1). We want to predict whether a student will pass based on their score.

In [42]:
from sklearn.linear_model import LogisticRegression
import numpy as np

In [43]:
# Dummy data
X = np.array([[30], [50], [70], [90]]) # Exam scores (one feature per student)
y = np.array([0, 0, 1, 1]) # 0 = fail, 1 = pass

In [44]:
# Train the model
model = LogisticRegression()
model.fit(X, y)

In [45]:
# Predict probability of passing for a new score
score = np.array([[85]])
prob = model.predict_proba(score) # Columns: [P(fail), P(pass)]

In [46]:
prob

array([[5.61474608e-05, 9.99943853e-01]])
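
To connect this output back to the sigmoid formula, the following sketch recomputes the probability of passing from the fitted parameters. It assumes the same toy data and default settings as the example; `intercept_` and `coef_` are the fitted $\beta_0$ and $\beta_1$ exposed by scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[30], [50], [70], [90]])
y = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X, y)

# z = beta_0 + beta_1 * score, using the fitted parameters
score = np.array([[85]])
z = model.intercept_ + score @ model.coef_.T
manual_prob = 1.0 / (1.0 + np.exp(-z))

# Column 1 of predict_proba is P(y = 1)
sklearn_prob = model.predict_proba(score)[:, 1]
print(manual_prob.ravel(), sklearn_prob)
```

The two values should agree up to floating-point error, which shows that `predict_proba` is the sigmoid applied to the linear combination.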

### K-Nearest Neighbors (K-NN)
- K-NN is a non-parametric, instance-based learning algorithm used for classification and regression.
- The prediction for an input is determined by its $K$ nearest neighbors in the training data.

#### Working Mechanism
1. **Distance Metric:** K-NN uses a distance metric (e.g., Euclidean distance) to measure the similarity between data points.

   **Formula**: 

   $$
   d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
   $$

2. **Choosing K**: The parameter $K$ determines the number of neighbors to consider. The choice of $K$ affects the bias-variance tradeoff:

   - Small $K$: Low bias, high variance
   - Large $K$: High bias, low variance
  
3. **Classification:**

   - Identify the $K$ nearest neighbors.
   - Use majority voting to assign the class label. Each neighbor votes for its class, and the class with the most votes is assigned to the input point.

4. **Regression:**

   - Identify the $K$ nearest neighbors.
   - The output is the average of the values of these $K$ neighbors.
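
The mechanism above can be sketched from scratch in a few lines. This is an illustrative version only: the helper names (`euclidean`, `k_nearest`, `knn_classify`, `knn_regress`) and the toy data are assumptions for the example, and real implementations use faster neighbor search than this linear scan.

```python
import numpy as np
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two points."""
    return np.sqrt(np.sum((a - b) ** 2))

def k_nearest(X_train, x, k):
    """Indices of the k training points closest to x."""
    distances = np.array([euclidean(row, x) for row in X_train])
    return np.argsort(distances, kind="stable")[:k]

def knn_classify(X_train, y_train, x, k):
    """Majority vote among the k nearest neighbors."""
    idx = k_nearest(X_train, x, k)
    return Counter(y_train[idx].tolist()).most_common(1)[0][0]

def knn_regress(X_train, y_train, x, k):
    """Average of the target values of the k nearest neighbors."""
    idx = k_nearest(X_train, x, k)
    return float(np.mean(y_train[idx]))

# Hypothetical toy data: class labels and numeric targets for the same points
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
labels = np.array([0, 0, 0, 1, 1])
values = np.array([1.0, 2.0, 3.0, 10.0, 12.0])

query = np.array([0.5, 0.5])
print(knn_classify(X_train, labels, query, k=3))  # 0: all three neighbors are class 0
print(knn_regress(X_train, values, query, k=3))   # 2.0: mean of 1.0, 2.0, 3.0
```

In this sketch a tied vote is resolved in favor of the class seen first among the sorted neighbors; choosing an odd $K$ for binary problems avoids ties altogether.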
   


##### Example:
Assume we have a dataset of points labeled as red (0) or blue (1), and we want to classify a new point.

In [16]:
# Import modules
from sklearn.neighbors import KNeighborsClassifier

In [30]:
# Example data (uses the NumPy import from the earlier cell)
X = np.array([[1, 2], [2, 3], [3, 4], [6, 7], [7, 9]]) # Points
y = np.array([0, 0, 0, 1, 1]) # 0 = red, 1 = blue

In [18]:
# Train the model
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X, y)

In [27]:
# Predict the class for a new point
new_point1 = np.array([[9, 10]])
new_point2 = np.array([[5, 5]])
p1 = model.predict(new_point1)
p2 = model.predict(new_point2)

In [29]:
print(f"First prediction: {p1}") # Blue
print(f"Second Prediction: {p2}") # Red

First prediction: [1]
Second Prediction: [0]
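
To see why the second point is classified as red, it can help to inspect which neighbors voted. The sketch below assumes the same data and `n_neighbors=3` as the example and uses the classifier's `kneighbors` and `predict_proba` methods.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1, 2], [2, 3], [3, 4], [6, 7], [7, 9]])
y = np.array([0, 0, 0, 1, 1])
model = KNeighborsClassifier(n_neighbors=3).fit(X, y)

new_point = np.array([[5, 5]])
distances, indices = model.kneighbors(new_point)

neighbor_labels = y[indices[0]]
print(distances[0])     # distances to the 3 nearest training points
print(X[indices[0]])    # the neighbors themselves
print(neighbor_labels)  # their labels; the majority is the prediction
print(model.predict_proba(new_point))  # fraction of votes per class
```

Two of the three nearest neighbors are red (0) and one is blue (1), so the majority vote yields class 0.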


### Decision Trees
- A decision tree is a non-parametric supervised learning method used for classification and regression.
- It works by splitting the data into subsets based on the values of input features, creating a tree-like model of decisions.

**Note:** Non-parametric statistics refers to a branch of statistics that does not rely on assumptions about the underlying distribution of the data. A non-parametric model likewise has no fixed set of parameters; its complexity can grow with the training data.

#### Working Mechanism
1. **Splitting Criteria**:

   - **Classification Trees**:
     Use criteria like Gini impurity, entropy (information gain), or misclassification error to decide the best split.

     $$
     Gini(D) = 1 - \sum_{i=1}^{C} p_i^2
     $$

     where $p_i$ is the probability that a randomly chosen element of $D$ belongs to class $i$, and $C$ is the total number of classes.

   - **Regression Trees**:
     Use criteria like mean squared error (MSE) to decide the best split.

     $$
     MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
     $$

     where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $n$ is the number of data points.

2. **Tree Building**:
   - Start with the entire dataset at the root.
   - Choose the best feature and threshold to split the data based on the chosen criteria.
   - Recursively split the subsets until a stopping condition is met (e.g., maximum depth, minimum samples per leaf).
     

3. **Pruning**:

   - Pruning reduces the size of the tree and helps prevent overfitting by removing branches that contribute little predictive power.
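
The splitting step for a classification tree can be sketched as follows: compute the Gini impurity of a node, then search the thresholds on a single feature for the split with the lowest weighted impurity. The helper names (`gini`, `best_split`) and the toy data are assumptions for illustration; a full tree builder would repeat this search over every feature and recurse on each side.

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum_i p_i^2 over the class proportions p_i."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(x, y):
    """Search thresholds on one feature for the lowest weighted Gini impurity."""
    values = np.unique(x)
    # Candidate thresholds: midpoints between consecutive distinct values
    thresholds = (values[:-1] + values[1:]) / 2
    best_threshold, best_impurity = None, gini(y)
    for t in thresholds:
        left, right = y[x <= t], y[x > t]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if weighted < best_impurity:
            best_threshold, best_impurity = t, weighted
    return best_threshold, best_impurity

# Hypothetical toy data: one numeric feature, binary label
x = np.array([25, 30, 45, 50, 65])
y = np.array([0, 0, 1, 1, 1])

print(gini(y))           # impurity at the root: 1 - (0.4^2 + 0.6^2) = 0.48
print(best_split(x, y))  # threshold 37.5 separates the classes (impurity 0.0)
```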

##### Example:
Assume we have a dataset of patients with their age, one additional binary feature, and whether they have a certain disease (0 or 1). We want to predict the disease status of new patients.

In [32]:
# Import modules
from sklearn.tree import DecisionTreeClassifier

In [33]:
# Example data
X = np.array([[25, 0], [30, 0], [45, 1], [50, 1], [65, 1]]) # Columns: age, binary feature
y = np.array([0, 0, 1, 1, 1]) # Disease status

In [34]:
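# Train the model
# Note: both features separate the classes perfectly here, so which one the tree
# splits on (and thus the prediction for [40, 0]) can vary unless random_state is fixed
model = DecisionTreeClassifier()
model.fit(X, y)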

In [37]:
new_data1 = np.array([[40, 0]])
new_data2 = np.array([[52, 1]])
p1 = model.predict(new_data1)
p2 = model.predict(new_data2)

In [38]:
print(f"First prediction: {p1}") # Non-diseased
print(f"Second prediction: {p2}") # Diseased

First prediction: [0]
Second prediction: [1]
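
To see which rule the tree actually learned, the fitted model can be printed as text. The sketch below assumes the same data as the example; the column names `age` and `feature_2` are hypothetical labels chosen for readability, and `random_state=0` is set only to make the tie between the two equally good splits repeatable.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

X = np.array([[25, 0], [30, 0], [45, 1], [50, 1], [65, 1]])
y = np.array([0, 0, 1, 1, 1])

# random_state is fixed here only to make the tie-breaking repeatable
model = DecisionTreeClassifier(random_state=0).fit(X, y)

# "age" and "feature_2" are hypothetical column names for illustration
rules = export_text(model, feature_names=["age", "feature_2"])
print(rules)
print(model.get_depth(), model.get_n_leaves())
```

Either feature yields a tree with a single split and two pure leaves, so the depth and leaf count are the same whichever one is chosen; only the printed rule differs.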
