Classification is a fundamental task in machine learning where the goal is to assign each input to one of a set of predefined categories or classes. There are several types of classification methods, each suited to different kinds of data and problems. Here’s an overview of the primary types of classification methods:

### 1. **Binary Classification (as per course)**
Binary classification involves categorizing data into one of two classes. It is the simplest form of classification and is used in scenarios where there are only two possible outcomes.

**Examples:**
- Spam detection (spam or not spam)
- Disease diagnosis (disease or no disease)

**Common Algorithms:**
- Logistic Regression
- Support Vector Machine (SVM)
- Decision Trees
- Random Forest
- Neural Networks

### 2. **Multiclass Classification (as per course)**
Multiclass classification deals with problems where there are more than two classes. Each instance is assigned to exactly one of three or more classes.

**Examples:**
- Handwriting recognition (digits 0-9)
- Animal species classification (cat, dog, mouse, etc.)

**Common Algorithms:**
- Softmax Regression (extension of Logistic Regression)
- Decision Trees
- Random Forest
- Gradient Boosting Machines (GBM)
- Neural Networks (with a softmax layer)
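
As a minimal sketch of how a softmax layer turns raw class scores into a single multiclass prediction, the snippet below uses only the standard library. The class names and score values are made up for illustration.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    # Subtracting the largest score avoids overflow and does not change the result.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for three classes.
classes = ["cat", "dog", "mouse"]
scores = [2.0, 1.0, 0.1]

probs = softmax(scores)
# The predicted class is the one with the highest probability.
predicted = classes[probs.index(max(probs))]
print(probs, predicted)
```

Because the probabilities sum to 1, exactly one class wins, which matches the multiclass setting where each instance gets a single label.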

### 3. **Multilabel Classification**
In multilabel classification, each instance can be assigned to multiple classes simultaneously. This is useful in situations where the presence of multiple labels is possible.

**Examples:**
- Text categorization (a document may belong to multiple topics)
- Image tagging (an image may contain multiple objects like "cat" and "dog")

**Common Algorithms:**
- Binary Relevance (treating each label as a separate binary classification problem)
- Classifier Chains
- Adapted algorithms like Random Forest and SVM for multilabel settings
- Neural Networks
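
Binary Relevance can be illustrated by the data transformation it relies on: one 0/1 target vector per label. The sketch below shows only that step, with made-up label sets. Each resulting vector would then be handed to its own binary classifier.

```python
# Hypothetical multilabel data: each instance carries a set of labels.
instance_labels = [{"cat", "dog"}, {"dog"}, set(), {"cat"}]


def binary_relevance_targets(label_sets):
    """Build one binary target vector per label (1 = label present)."""
    all_labels = sorted(set().union(*label_sets))
    return {
        label: [1 if label in labels else 0 for labels in label_sets]
        for label in all_labels
    }


targets = binary_relevance_targets(instance_labels)
for label, column in targets.items():
    # Each column is an independent binary classification problem.
    print(label, column)
```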

### 4. **Imbalanced Classification**
Imbalanced classification deals with datasets where the classes are not represented equally. One class may be significantly underrepresented, which can pose challenges for standard classification algorithms.

**Examples:**
- Fraud detection (fraudulent transactions are much fewer than legitimate ones)
- Rare disease detection

**Common Techniques:**
- Resampling (oversampling the minority class or undersampling the majority class)
- SMOTE (Synthetic Minority Over-sampling Technique)
- Adjusting class weights in algorithms
- Anomaly detection techniques
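
One way to adjust class weights is the inverse-frequency heuristic `n_samples / (n_classes * class_count)`. The sketch below assumes that heuristic and a made-up 95/5 label split; other weighting schemes are possible.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency (a common heuristic)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical imbalanced dataset: 95 legitimate (0) and 5 fraudulent (1) examples.
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)
print(weights)
```

The minority class receives the larger weight, so mistakes on it contribute more to the training loss.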

### 5. **Ordinal Classification**
Ordinal classification is used when the target variable is ordinal, meaning the classes have a natural order but the intervals between them are not necessarily equal.

**Examples:**
- Customer satisfaction ratings (very dissatisfied, dissatisfied, neutral, satisfied, very satisfied)
- Education levels (high school, bachelor’s, master’s, PhD)

**Common Algorithms:**
- Ordinal Logistic Regression
- Decision Trees and Ensemble Methods adapted for ordinal data

### 6. **Hierarchical Classification**
Hierarchical classification deals with classifying data into a hierarchy of classes. This is useful when the classes are organized in a tree-like structure.

**Examples:**
- Document classification in a hierarchical taxonomy
- Animal classification (species, genus, family, order, etc.)

**Common Techniques:**
- Recursive classification (classifying at each level of the hierarchy)
- Adapted algorithms that take the hierarchy into account

### Choosing the Right Classification Method
Selecting the appropriate classification method depends on various factors:
- **Nature of the Problem**: The type of classification problem (binary, multiclass, multilabel) dictates the suitable algorithms.
- **Data Characteristics**: The amount and quality of the data, the presence of imbalanced classes, and whether the data is ordinal or hierarchical.
- **Performance Requirements**: Accuracy, interpretability, computational efficiency, and scalability needs.

### Summary
Classification is a diverse field with several specialized types depending on the nature of the problem and the data. Understanding the characteristics and requirements of your specific problem is crucial for selecting the most appropriate classification method and achieving the best performance.

### Classification Algorithms

### Logistic Regression

Logistic Regression is a statistical method used for binary classification that models the probability of a binary outcome based on one or more predictor variables. It is a type of regression analysis where the dependent variable is categorical, typically taking values of 0 or 1.

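Why we cannot use a linear regression line for classification:

The best-fit line shifts when outliers are present, which moves the point where the line crosses the decision threshold, so the class predictions are not accurate enough. A straight line is also unbounded, so its output can fall below 0 or above 1 and cannot be read as a probability.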
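
The outlier effect can be seen in a small numerical sketch. It fits an ordinary least-squares line to made-up 1-D data, classifies with a 0.5 threshold (an assumed cut-off), and then refits after adding a single far-away point.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def decision_boundary(slope, intercept, threshold=0.5):
    """x value at which the fitted line crosses the threshold."""
    return (threshold - intercept) / slope


# Hypothetical data: class 0 for x = 1..3, class 1 for x = 4..6.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]

boundary_clean = decision_boundary(*fit_line(xs, ys))
# Add one correctly labelled but extreme point at x = 30.
boundary_outlier = decision_boundary(*fit_line(xs + [30], ys + [1]))

print(boundary_clean, boundary_outlier)
```

With the outlier, the boundary moves past x = 4, so that point (true class 1) is now predicted as class 0 even though its label never changed.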

<img src="Logistics Regression.jpg" width = "550">

<img src ="Logistics Regression-formula.png">

<img src="sigmoid-formula.png" width="250">

Using the sigmoid function above, any input between -infinity and +infinity is mapped to an output between 0 and 1. This can be verified by substituting x = -infinity and x = +infinity into the formula: the output tends to 0 and 1 respectively.

<h4><b><strong>1. </strong></b><span>When </span><i><em class="GFGEditorTheme__textItalic">z</em></i><span> is large and positive, </span><span>e<sup>-z</sup></span><span> becomes very small, and</span><img src="https://quicklatex.com/cache3/b9/ql_c83c24e540ce5f5be4f3c9f1ed182ab9_l3.svg" class="ql-img-inline-formula quicklatex-auto-format" alt="\frac{1}{1+e^{-z}}  " title="Rendered by QuickLaTeX.com" height="35" width="60" style="vertical-align: 21px;"><span> approaches 1.</span></h4>

<blockquote><p dir="ltr"><span>when z=5, </span></p>


<p><img src="https://quicklatex.com/cache3/64/ql_7720d9f27e3d48ed712fa305b11a9464_l3.svg" class="ql-img-inline-formula quicklatex-auto-format" alt=" \begin{aligned}  \frac{1}{1+e^{-5}}&amp;= \frac{1}{{1+\frac{1}{e^{5}}}} \\ &amp;=\frac{1}{(\frac{e^{5}+1}{e^{5}})} \\ &amp;=\frac{e^{5}}{e^{5}+1} \\&amp; = 0.9933071490757153 \end{aligned}" title="Rendered by QuickLaTeX.com" height="244" width="352" style="vertical-align: 0px;"></p>

<p dir="ltr"><span>  Which is close to 1 indicating that the sigmoid function, when applied to a large positive value like 5, outputs a probability close to 1,  suggesting a high confidence in the positive class.</span></p>

</blockquote>

<p dir="ltr"><b><strong>2. When z is large and negative, e<sup>-z</sup> becomes very large and dominates the denominator, and </strong></b><img src="https://quicklatex.com/cache3/b9/ql_c83c24e540ce5f5be4f3c9f1ed182ab9_l3.svg" class="ql-img-inline-formula quicklatex-auto-format" alt="\frac{1}{1+e^{-z}}  " title="Rendered by QuickLaTeX.com" height="35" width="60" style="vertical-align: 21px;"><b><strong> approaches 0.</strong></b></p>


<blockquote><p dir="ltr"><span>when z= -5, the expression becomes,</span></p>


<p><img src="https://quicklatex.com/cache3/78/ql_1b392b4026753dd8afd3431900d40f78_l3.svg" class="ql-img-inline-formula quicklatex-auto-format" alt=" \begin{aligned}    \frac{1}{1+e^{-(-5)}}&amp;= \frac{1}{1+e^{5}}   \\&amp; = 0.0066928509242848554   \end{aligned}" title="Rendered by QuickLaTeX.com" height="89" width="425" style="vertical-align: 0px;"></p>

<p dir="ltr"><span>Which is close to 0, indicating that the sigmoid function, when applied to a large negative value like -5, outputs a probability close to 0. In other words, when z is a large negative number, the exponential term e<sup>-z</sup> dominates the denominator, causing the sigmoid function to approach 0.</span></p>

</blockquote>
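
The two worked cases above can be reproduced with a short sketch. The branch on the sign of `z` is an implementation choice made here to avoid overflow for very negative inputs; it computes the same function 1 / (1 + e^(-z)).

```python
import math


def sigmoid(z):
    """Numerically stable sigmoid: 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, rewrite as e^z / (1 + e^z) so exp() never overflows.
    e = math.exp(z)
    return e / (1.0 + e)


for z in (-1000, -5, 0, 5, 1000):
    print(z, sigmoid(z))
```

The outputs stay within 0 and 1 for every input, and z = 5 and z = -5 give the values worked out by hand above.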

When the cost function of an optimization problem is non-convex, training can ***get stuck at a local minimum instead of the global minimum***. The presence of multiple local minima makes it hard to find the optimal parameters, and a model trapped in a local minimum will not achieve the best possible performance. This is why logistic regression uses log loss (the cross-entropy function) as its cost: unlike squared error applied to the sigmoid output, log loss is convex in the model parameters, so it has no spurious local minima.
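
A minimal numerical sketch of the convexity point follows. It compares log loss with squared error on the sigmoid output for a single made-up example (x = 1, y = 1) and one weight `w`. A convex function has non-negative second differences everywhere, so a negative value signals non-convexity. This is only an illustration for one example, not a proof.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def squared_error(w):
    # Squared error for one example with x = 1, y = 1.
    return (sigmoid(w) - 1.0) ** 2


def log_loss(w):
    # -ln(sigmoid(w)) for the same example, written as ln(1 + e^(-w)).
    return math.log1p(math.exp(-w))


def min_second_difference(f, lo=-6.0, hi=6.0, step=0.1):
    """Smallest value of f(w - h) - 2 f(w) + f(w + h) over a grid of w."""
    n = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n + 1)]
    return min(f(w - step) - 2.0 * f(w) + f(w + step) for w in grid)


print("squared error:", min_second_difference(squared_error))
print("log loss:     ", min_second_difference(log_loss))
```

The squared-error curve bends downward for strongly negative `w`, while the log-loss curve never does on this grid.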

### Log Loss for Logistic Regression
Log loss is a classification evaluation metric used to compare the models built during model development. It is well suited to evaluating the soft probabilities a model predicts, because it scores the probability itself rather than only the final class label.

In logistic regression, the log of the corrected probabilities is obtained by taking the natural logarithm (base e) of the probability the model assigns to each example's true class.

<p><img src="https://quicklatex.com/cache3/a0/ql_a91bc2f9bf3fb99be03da27affe287a0_l3.svg" class="ql-img-inline-formula quicklatex-auto-format" alt="\begin{aligned} \text{log loss} &amp;= \ln{\sigma} \\ &amp;= \ln{\frac{1}{1+e^{-z}}} \end{aligned} " title="Rendered by QuickLaTeX.com" height="93" width="234" style="vertical-align: 0px;"></p>

<blockquote><p dir="ltr"><span>Let’s find the log of the predicted probability for the examples, starting with z=5 </span></p>

<p dir="ltr"><img src="https://quicklatex.com/cache3/18/ql_d7ae18d3d98b17e88e1a1280bd07e118_l3.svg" class="ql-img-inline-formula quicklatex-auto-format" alt=" \begin{aligned}    \ln{(z=5)}&amp;=\ln{\frac{1}{1+e^{-5}}} \\ &amp;= \ln{\left (\frac{1}{{1+\frac{1}{e^{5}}}}  \right )}  \\ &amp;=\ln{\left (\frac{e^{5}}{1+e^{5}}  \right )}  \\ &amp;=\ln{e^5}-\ln{(1+e^5)}  \\ &amp;= 5 - \ln{(1+2.78^5)} \\ &amp;= 5 - \ln{(1+2.71828^5)} \\ &amp;= 5-5.006712007735388 \\&amp;\approx  -0.00671200  \end{aligned}" title="Rendered by QuickLaTeX.com" height="417" width="405" style="vertical-align: 0px;"></p>

<p dir="ltr"><span>and for z=-5,</span></p>

<p dir="ltr"><img src="https://quicklatex.com/cache3/0d/ql_73fce0a3861ba27e3e10445f5ac02b0d_l3.svg" class="ql-img-inline-formula quicklatex-auto-format" alt=" \begin{aligned}    \ln{(z=-5)}&amp;=\ln{\frac{1}{1+e^{-(-5)}}} \\ &amp;= \ln{\left (\frac{1}{1+e^{5}}  \right )}  \\ &amp;=\ln{1}-\ln{(1+e^5)}  \\ &amp;= 0 - \ln{(1+2.78^5)} \\ &amp;= - \ln{(1+2.71828^5)} \\&amp;\approx  -5.00671200  \end{aligned}" title="Rendered by QuickLaTeX.com" height="288" width="377" style="vertical-align: 0px;"></p>

<p dir="ltr"><span>These log values are negative. In order to maintain the common convention that lower loss scores are better, we take the negative average of these values to deal with the negative sign.</span></p>

</blockquote>

Hence, the log loss can be summarized with the following formula:

<p><img src="https://quicklatex.com/cache3/14/ql_2f92860e0315785753a33d562227d014_l3.svg" class="ql-img-inline-formula quicklatex-auto-format" alt="J = -\sum_{i=1}^{m}y_i\log \left ( h_\theta\left ( x_i \right ) \right ) + \left ( 1-y_i \right )\log \left (1- h_\theta\left ( x_i \right ) \right )" title="Rendered by QuickLaTeX.com" height="29" width="602" style="vertical-align: -8px;"></p>

<ul><li value="1"><span>m is the number of training examples.</span></li><li value="2"><img src="https://www.geeksforgeeks.org/wp-content/ql-cache/quicklatex.com-aa4072aa7d68e86ba6302fb6c7240c2d_l3.svg" class="ql-img-inline-formula quicklatex-auto-format" alt="y_i" title="Rendered by QuickLaTeX.com" height="17" width="20" style="vertical-align: -5px;"><span> is the true class label for the i-th example (either 0 or 1).</span></li><li value="3"><img src="https://quicklatex.com/cache3/d8/ql_c7cef34134a78f50d8b3e81b45c334d8_l3.svg" class="ql-img-inline-formula quicklatex-auto-format" alt="h_\theta(x_i)" title="Rendered by QuickLaTeX.com" height="27" width="67" style="vertical-align: -7px;"><span>  is the predicted probability for the i-th example, as calculated by the logistic regression model.</span></li><li value="4"><img src="https://www.geeksforgeeks.org/wp-content/ql-cache/quicklatex.com-a372b7ef1ffaec3b4ad80e0141550990_l3.svg" class="ql-img-inline-formula quicklatex-auto-format" alt="\theta" title="Rendered by QuickLaTeX.com" height="19" width="11" style="vertical-align: 0px;"><span>  denotes the model parameters.</span></li></ul>


<p dir="ltr" style="text-align: start"><span>The first term in the sum represents the cross-entropy for the positive class (</span><img src="https://quicklatex.com/cache3/28/ql_92120be29655b76c7d55619fcc0e8b28_l3.svg" class="ql-img-inline-formula quicklatex-auto-format" alt="y_i = 1     " title="Rendered by QuickLaTeX.com" height="23" width="67" style="vertical-align: 28px;"><span>), and the second term represents the cross-entropy for the negative class (</span><img src="https://quicklatex.com/cache3/cb/ql_1a0f0a4cfa88029b507e8f65a0a56ccb_l3.svg" class="ql-img-inline-formula quicklatex-auto-format" alt="y_i = 0     " title="Rendered by QuickLaTeX.com" height="23" width="69" style="vertical-align: 28px;"><span>). The goal of logistic regression is to minimize the cost function by adjusting the model parameters</span><img src="https://quicklatex.com/cache3/d5/ql_1a5c5aaf819cc6a606fbf45c34a3aed5_l3.svg" class="ql-img-inline-formula quicklatex-auto-format" alt="\theta     " title="Rendered by QuickLaTeX.com" height="18" width="11" style="vertical-align: 33px;"></p>

In Summary:

<ul><li value="1"><span>Calculate predicted probabilities using the sigmoid function.</span></li><li value="2"><span>Apply the natural logarithm to the corrected probabilities.</span></li><li value="3"><span>Sum up and average the log values, then negate the result to get the Log Loss.</span></li></ul>
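
The three steps above can be sketched as follows. The inputs reuse z = 5 and z = -5 from the worked examples, with assumed true labels 1 and 0. The small `eps` clipping is an added safeguard against taking log(0) and is not part of the formula itself.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(z_values, labels, eps=1e-15):
    """Negative average log of the probability assigned to the true class."""
    total = 0.0
    for z, y in zip(z_values, labels):
        p = sigmoid(z)                      # step 1: predicted probability
        corrected = p if y == 1 else 1 - p  # probability of the true class
        corrected = min(max(corrected, eps), 1 - eps)
        total += math.log(corrected)        # step 2: natural log
    return -total / len(labels)             # step 3: average and negate


z_values = [5.0, -5.0]
labels = [1, 0]  # assumed true labels for the two examples
loss = log_loss(z_values, labels)
print(loss)
```

Both predictions are confident and correct, so the loss is close to 0. Flipping either label would make it jump to roughly 2.5.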

### Cost Function for Logistic Regression

<p><img src="https://quicklatex.com/cache3/de/ql_28005cb1997ce27e1d5ae918fef6bfde_l3.svg" class="ql-img-inline-formula quicklatex-auto-format" alt="Cost(h_{\theta}(x),y) = \left\{\begin{matrix} -log(h_{\theta}(x)) &amp; if&amp;y=1 \\ -log(1-h_{\theta}(x))&amp; if&amp; y = 0 \end{matrix}\right. " title="Rendered by QuickLaTeX.com" height="63" width="536" style="vertical-align: -25px;"></p>

<ul><li value="1"><b><strong>Case 1: </strong></b><span>If y = 1, that is, the true label of the class is 1, the cost is 0 when the predicted probability h</span><sub><span>θ</span></sub><span>(x) is 1 as well. But as h</span><sub><span>θ</span></sub><span>(x) deviates from 1 and approaches 0, the cost grows without bound and tends to infinity, as the graph below shows.&nbsp;</span></li><li value="2"><b><strong>Case 2: </strong></b><span>If y = 0, that is, the true label of the class is 0, the cost is 0 when the predicted probability h</span><sub><span>θ</span></sub><span>(x) is 0 as well. But as h</span><sub><span>θ</span></sub><span>(x) deviates from 0 and approaches 1, the cost grows without bound and tends to infinity, as the graph below shows.</span></li></ul>

<p dir="ltr"><img alt="download-(4)" height="400" src="https://media.geeksforgeeks.org/wp-content/uploads/20231205010859/download-(4).png" width="650"></p>
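
The two cost cases can also be checked numerically. This sketch evaluates the piecewise cost at a handful of made-up predicted probabilities to show how the penalty grows as the prediction moves away from the true label.

```python
import math


def cost(h, y):
    """Per-example cost: -log(h) if y == 1, else -log(1 - h)."""
    return -math.log(h) if y == 1 else -math.log(1.0 - h)


# Hypothetical predicted probabilities, from confident-positive to confident-negative.
predictions = [0.99, 0.9, 0.5, 0.1, 0.01]

costs_y1 = [cost(h, 1) for h in predictions]  # true label is 1
costs_y0 = [cost(h, 0) for h in predictions]  # true label is 0

for h, c1, c0 in zip(predictions, costs_y1, costs_y0):
    print(f"h={h:<5} cost if y=1: {c1:.4f}   cost if y=0: {c0:.4f}")
```

The cost for y = 1 rises as h falls toward 0, and the cost for y = 0 rises as h climbs toward 1, mirroring the two curves in the graph.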

<p dir="ltr"><span>With this modification of the cost function, we have a loss function that penalizes the model more and more heavily as the predicted probability deviates further from the actual label.</span></p>

<p dir="ltr"><span>The choice of cost function, log loss or cross-entropy, is significant for logistic regression. It quantifies the disparity between predicted probabilities and actual outcomes, providing a measure of how well the model aligns with the ground truth.</span></p>

TODO: A few more points remain; review the recording from 12:20 onwards.