<h1>Supervised Learning</h1>
<li><strong>Supervised learning</strong> algorithms are trained using labeled examples, such as an input where the desired output is known</li>
<li>For example, a segment of text could have a category label, such as:</li>
<li>> <strong>Spam</strong> vs <strong>Legitimate</strong> Email</li>
<li>> <strong>Positive</strong> vs <strong>Negative</strong> Movie Review</li>
<li>The network receives a set of inputs along with the corresponding correct outputs, and the algorithm learns by comparing its actual output with the correct outputs to find errors</li>
<li>It then modifies the model accordingly</li>
<li>Supervised learning is commonly used in applications where historical data predicts likely future events</li>
<li>1. Get your data! Customers, sensors, etc.</li>
<li>2. Clean and format your data (using Pandas)</li>
<li>3. Split the data into Training Data and Test Data; use the training data for Model Training & Building</li>
<li>4. Model Testing (you can adjust model parameters by returning to Model Training & Building)</li>
<li>5. Model Deployment</li>
<li>What we just showed is a simplified approach to supervised learning, but it contains an issue!</li>
<li>Is it fair to use our single split of the data to evaluate our model's performance?</li>
<li>After all, we were given the chance to update the model parameters again and again</li>
<li>To fix this issue, data is often split into 3 sets:</li>
<li>> Training Data: Used to train model parameters</li>
<li>> Validation Data: Used to determine what model hyperparameters to adjust</li>
<li>> Test Data: Used to get some final performance metric</li>
<li>This means after we see the results on the <strong>final test set</strong> we don't get to go back and adjust any model parameters!</li>
<li>This final measure is what we report as the true performance of the model</li>
<li>In this course, we will generally keep things simple by using a single <strong>train/test split</strong></li>
<li>We will simply train and then evaluate on a test set (leaving the option to students to go back and adjust parameters)</li>
<li>After going through the course, you will be able to easily perform another split to get <strong>3 data sets</strong> if you desire</li>
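<p>A minimal sketch of the three-way split described above, using only the Python standard library. The helper name <code>three_way_split</code>, the 70/15/15 proportions and the fixed seed are illustrative assumptions, not something the course prescribes; in practice a library helper such as scikit-learn's <code>train_test_split</code> is often applied twice to get the same effect.</p>

```python
import random


def three_way_split(data, val_frac=0.15, test_frac=0.15, seed=42):
    """Hypothetical helper: shuffle a copy of `data` and cut it into
    training, validation and test sets (proportions are assumptions)."""
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("fractions must be >= 0 and sum to less than 1")
    items = list(data)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(items)  # fixed seed makes the split reproducible
    n_test = round(len(items) * test_frac)
    n_val = round(len(items) * val_frac)
    test = items[:n_test]               # held back until the very end
    val = items[n_test:n_test + n_val]  # used to compare hyperparameter choices
    train = items[n_test + n_val:]      # used to fit model parameters
    return train, val, test


# 100 labeled examples as (features, label) pairs
examples = [((i,), i % 2) for i in range(100)]
train, val, test = three_way_split(examples)
print(len(train), len(val), len(test))  # 70 15 15
```

<p>Passing <code>val_frac=0</code> gives the simpler train/test split used in this course. Whichever variant is used, the test slice should be scored only once, after all adjustments have been made.</p>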

<h2>Evaluating Performance (Classification)</h2>
<li>We just learned that after our machine learning process is complete, we will use performance metrics to evaluate how our model did</li>
<li>Let's discuss classification metrics in more detail!</li>
<li>The key classification metrics we need to understand:</li>
<li>> Accuracy</li>
<li>> Recall</li>
<li>> Precision</li>
<li>> F1-Score</li>
<li>But first, we should understand the reasoning behind these metrics and how they will actually work in the real world</li>
<li>Typically in any classification task your model can only achieve two results:</li>
<li>> Either your model was correct in its prediction</li>
<li>> Or your model was <strong>incorrect</strong> in its prediction</li>
<li>Fortunately, the idea of correct vs incorrect extends to situations where you have multiple classes</li>
<li>For the purposes of explaining the metrics, let's imagine a <strong>binary classification</strong> situation, where we only have two available classes</li>
<li>In our example, we will attempt to predict if an image is a dog or a cat</li>
<li>Since this is supervised learning, we will first <strong>fit/train</strong> a model on <strong>training data</strong>, then <strong>test</strong> the model on <strong>testing data</strong></li>
<li>Once we have the model's predictions from the <strong>X_test</strong> data, we compare them to the <strong>true y values</strong> (the correct labels)</li>
<li>We repeat this process for all the images in our X_test data</li>
<li>At the end we will have a count of correct matches and a count of incorrect matches</li>
<li>The key realization we need to make is that <strong>in the real world, not all incorrect or correct matches hold equal value!</strong></li>
<li>Also, in the real world, a single metric won't tell the complete story!</li>
<li>To understand all of this, let's bring back the 4 metrics we mentioned and see how they are calculated</li>
<li>We can organize our predicted values against the real values in a confusion matrix</li>
<li>> Accuracy: Accuracy in classification problems is the <strong>number of correct predictions</strong> made by the model divided by the <strong>total number of predictions</strong></li>
<li>>> For example, if the X_test set was 100 images and our model <strong>correctly</strong> predicted 80 images, then we have <strong>80/100 = 0.8, or 80% accuracy</strong></li>
<li>>> Accuracy is useful when target classes are well balanced</li>
<li>>> In our example, we would have roughly the same number of cat images as dog images</li>
<li>>> Accuracy is <strong>not</strong> a good choice with <strong>unbalanced</strong> classes!</li>
<li>>> Imagine we had 99 images of dogs and 1 image of a cat; if our model was simply a line that always predicted <strong>dog</strong>, we would get 99% accuracy! In this situation we'll want to understand <strong>recall and precision</strong></li>
<li>> Recall: Ability of a model to find all the relevant cases within a dataset. The precise definition of recall is the number of true positives <strong>divided by</strong> the number of true positives plus the number of false negatives: Recall = TP / (TP + FN)</li>
<li>> Precision: Ability of a classification model to identify only the relevant data points. Precision is defined as the number of true positives divided by the number of true positives plus the number of false positives: Precision = TP / (TP + FP)</li>
<li>> Recall and Precision</li>
<li>>> Often you have a trade-off between Recall and Precision. While recall expresses the ability to find all relevant instances in a dataset, precision expresses the proportion of the data points our model said were relevant that actually were relevant</li>
<li>> F1-Score: In cases where we want to find an optimal blend of precision and recall we can combine the two metrics using what is called the F1-score</li>
<li>>> The F1-score is the harmonic mean of precision and recall, taking both metrics into account in the following equation:</li>
<li>>> F1 = 2 * ((precision * recall) / (precision + recall))</li>
<li>>> We use the harmonic mean instead of a simple average because it punishes extreme values. A classifier with a precision of 1.0 and a recall of 0.0 has a simple average of 0.5 but an F1 score of 0</li>
<li>We can also view all correctly classified versus incorrectly classified images in the form of a confusion matrix</li>
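<p>The definitions above can be checked by hand. Below is a minimal, standard-library-only sketch that counts the four confusion-matrix cells and derives accuracy, precision, recall and F1 from them. The helper names <code>confusion_counts</code>, <code>f1_from</code> and <code>classification_metrics</code> are hypothetical, not from any library. It reproduces the 99 dogs / 1 cat example, treating the rare <strong>cat</strong> class as the positive class, which is an assumption made for illustration. Returning 0.0 when a denominator is zero is also a convention chosen here; scikit-learn exposes a similar choice through a <code>zero_division</code> parameter.</p>

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (TP, FP, FN, TN), treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, really positive
            else:
                fp += 1  # predicted positive, really negative (Type I error)
        elif actual == positive:
            fn += 1      # predicted negative, really positive (Type II error)
        else:
            tn += 1      # predicted negative, really negative
    return tp, fp, fn, tn


def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * (precision * recall) / (precision + recall)


def classification_metrics(y_true, y_pred, positive):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Convention used here: report 0.0 instead of dividing by zero
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1_from(precision, recall)}


# 99 dog images, 1 cat image, and a "model" that always predicts dog
y_true = ["dog"] * 99 + ["cat"]
y_pred = ["dog"] * 100
print(classification_metrics(y_true, y_pred, positive="cat"))
# {'accuracy': 0.99, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}

# Simple average vs harmonic mean for precision = 1.0, recall = 0.0
print((1.0 + 0.0) / 2, f1_from(1.0, 0.0))  # 0.5 0.0
```

<p>The always-dog model scores 99% accuracy while finding none of the cats, which is exactly the gap that recall, precision and F1 expose.</p>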

<table>
	<tbody>
		<tr>
			<td>&nbsp;</td>
			<td>&nbsp;</td>
			<td colspan="2">predicted condition</td>
		</tr>
		<tr>
			<td>&nbsp;</td>
			<td>total population</td>
			<td>prediction positive</td>
			<td>prediction negative</td>
		</tr>
		<tr>
			<td rowspan="2">true condition</td>
			<td>condition positive</td>
			<td>True Positive (TP)</td>
			<td>False Negative (FN) (Type II error)</td>
		</tr>
		<tr>
			<td>condition negative</td>
			<td>False Positive (FP) (Type I error)</td>
			<td>True Negative (TN)</td>
		</tr>
	</tbody>
</table>
<li>The main point to remember with the confusion matrix and the various calculated metrics is that they are all fundamentally ways of comparing the predicted values versus the true values</li>
<li>What constitutes "good" metrics will really depend on the specific situation</li>
<li>Still confused about the confusion matrix?</li>
<li>No problem! Check out the Wikipedia page for it; it has a really good diagram with all the formulas for all the metrics</li>
<li>Throughout the training, we'll usually just print out metrics (e.g. accuracy)</li>
<li>Let's think back on this question: what is a good enough accuracy?</li>
<li>This all depends on the context of the situation! Did you create a model to predict the presence of a disease? Is the disease presence well balanced in the general population? (Probably not!)</li>
<li>Often models are used as quick diagnostic tests to run <strong>before</strong> a more invasive test (e.g. getting a urine test before getting a biopsy). We also need to consider what is at stake!</li>
<li>Often we have a precision/recall trade-off: we need to decide whether the model should focus on reducing False Positives or False Negatives</li>
<li>In disease diagnosis, it is probably better to err in the direction of False Positives (favoring recall), so we make sure we correctly classify as many cases of disease as possible!</li>
<li>All of this is to say, machine learning is not performed in a "vacuum", but is instead a collaborative process where we should consult with experts in the domain (e.g. medical doctors)</li>
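<p>To make the precision/recall trade-off concrete, the sketch below assumes a hypothetical disease-screening model that outputs a score between 0 and 1 and flags a patient as positive when the score reaches a chosen threshold. The scores and labels are made up for illustration, and <code>precision_recall_at</code> is a hypothetical helper. Lowering the threshold catches more true cases (higher recall) at the cost of more false alarms (lower precision).</p>

```python
def precision_recall_at(scores, labels, threshold):
    """Flag a case as positive (1) when score >= threshold; return (precision, recall)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up model scores (estimated chance of disease) and true labels (1 = disease)
scores = [0.95, 0.85, 0.70, 0.60, 0.55, 0.40, 0.35, 0.20, 0.10, 0.05]
labels = [1,    1,    0,    1,    0,    1,    0,    0,    0,    0]

for threshold in (0.8, 0.5, 0.3):
    p, r = precision_recall_at(scores, labels, threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
# threshold=0.8: precision=1.00, recall=0.50
# threshold=0.5: precision=0.60, recall=0.75
# threshold=0.3: precision=0.57, recall=1.00
```

<p>For a screening test followed by a more invasive one, the lowest threshold here may be the better choice: every sick patient is flagged, and the extra false positives are caught by the follow-up test.</p>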