# Basic steps of ML

1. **Data Collection**
2. **Data Preprocessing** (cleaning and preparing the data, usually done by a human)
3. **Model Selection** (choosing the type of algorithm to train)
4. **Model Training** (we feed the data to the model as inputs and outputs using the `.fit` method)
5. **Model Testing/Prediction** (we test the model using the `.predict` method)
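
The five steps above can be sketched end to end in plain Python. This is only an illustration: `TinyLinearModel` is a hypothetical stand-in that mimics the `.fit` / `.predict` convention (it is not a scikit-learn class), and the data values are made up for the example.

```python
# Step 1: data collection (toy data where output = 7 * input)
inputs = [1, 2, 3, 4, 5]
outputs = [7, 14, 21, 28, 35]

# Step 2: preprocessing, here just a split into training and test portions
train_x, train_y = inputs[:4], outputs[:4]
test_x, test_y = inputs[4:], outputs[4:]


# Step 3: model selection (hypothetical one-feature least-squares line)
class TinyLinearModel:
    def fit(self, xs, ys):
        mean_x = sum(xs) / len(xs)
        mean_y = sum(ys) / len(ys)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        sxx = sum((x - mean_x) ** 2 for x in xs)
        self.slope = sxy / sxx
        self.intercept = mean_y - self.slope * mean_x
        return self

    def predict(self, xs):
        return [self.slope * x + self.intercept for x in xs]


# Step 4: training on the training portion only
model = TinyLinearModel().fit(train_x, train_y)

# Step 5: testing on data the model has not seen
predictions = model.predict(test_x)
print(predictions, "expected:", test_y)
```

Holding back part of the data for testing is a design choice in this sketch; it checks the model on inputs it was not trained on.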

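### ML models can be classified into 3 types

1. **Supervised learning** (data with known outputs, i.e. labels) - predictions can be checked against the known outputs, so accuracy is easy to measure
2. **Unsupervised learning** (data with unknown outputs, i.e. no labels) - results are harder to verify because there are no known outputs to compare against
3. **Reinforcement learning** (the model learns from previous experience, using rewards and penalties for the actions it takes)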

### Supervised learning 
1. **Regression** - predicts a continuous numeric value by learning the pattern in the data, so it can produce outputs it never saw during training
2. **Classification** - predicts one of the labels it was trained on, so it cannot produce an output it never saw during training

### Unsupervised Learning

1. **Clustering** - an ML technique that groups unlabelled examples based on their similarities
2. **Association** - a rule-based ML method for discovering relations between variables in large datasets
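
Clustering can be illustrated with a naive k-means loop on one-dimensional data. The notes do not name a specific algorithm, so k-means is an assumption here, and `kmeans_1d`, the sample values and the starting centroids are all hypothetical.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Naive 1-D k-means: returns a cluster label per point and the centroids."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        labels = [
            min(range(len(centroids)), key=lambda c: abs(p - centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels, centroids


# Unlabelled values that visibly form a low group and a high group
values = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
labels, centroids = kmeans_1d(values, centroids=[1.0, 10.0])
print(labels)     # cluster index assigned to each value
print(centroids)  # centre of each discovered group
```

No outputs are given to the algorithm; the groups come only from how close the values are to each other.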

In [6]:
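import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LinearRegression

# Toy data: every output is 7 times its input
inputs = np.array([[1], [2], [3], [4], [5]])
outputs = np.array([7, 14, 21, 28, 35])

# Classifier: treats each output value as a separate class label
alg = SVC()
alg.fit(inputs, outputs)

# Regressor: learns the numeric relationship between input and output
model = LinearRegression()
model.fit(inputs, outputs)

# Predict for inputs far outside the training range
new_inputs = np.array([[10], [69], [420]])
result1 = model.predict(new_inputs)
result2 = alg.predict(new_inputs)

print(result1)
print("Linear Regression: ", model.score(inputs, outputs))
print(result2)
print("SVC: ", alg.score(inputs, outputs))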



[  70.  483. 2940.]
Linear Regression:  1.0
[35 35 35]
SVC:  1.0


## Model Performance Metrics

Performance metrics are used to evaluate the effectiveness and efficiency of a model.

#### TP TN FP FN

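**TP** : True Positive (a positive case correctly predicted as positive)  
**TN** : True Negative (a negative case correctly predicted as negative)  
**FP** : False Positive (a negative case incorrectly predicted as positive)  
**FN** : False Negative (a positive case incorrectly predicted as negative)  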
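
The four counts can be obtained by comparing predictions with the actual labels. This is a minimal sketch: `confusion_counts` is a hypothetical helper, and the two label lists are made-up data where `1` means positive and `0` means negative.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP and FN for a binary problem."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, really positive
        elif p == positive and a != positive:
            fp += 1  # predicted positive, really negative
        elif p != positive and a == positive:
            fn += 1  # predicted negative, really positive
        else:
            tn += 1  # predicted negative, really negative
    return tp, tn, fp, fn


actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
tp, tn, fp, fn = confusion_counts(actual, predicted)
print(tp, tn, fp, fn)  # 3 3 1 1
```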



### Accuracy
i.e. the proportion of all predictions that are correct

**Accuracy** = (TP+TN) / (TP+TN+FP+FN)

### Precision

Precision tells us how many of the positive predictions made by the model are actually correct. It matters most when false positives are costly.  
Precision is important in, for example, video recommendation systems and e-commerce websites.  
It gives us insight into the model's ability to avoid false positives.

**Precision** = TP / (TP + FP)

### Recall (Sensitivity)
Also known as sensitivity or the true positive rate, recall measures the proportion of true positive predictions among all actual positive instances in the dataset. It gives us insight into the model's ability to avoid false negatives.  
It is calculated as:  
**Recall** = TP / (TP + FN)

TP : patients with the disease correctly diagnosed as having the disease  
FP : patients without the disease incorrectly diagnosed as having the disease  
FN : patients with the disease incorrectly diagnosed as not having the disease  
TN : patients without the disease correctly diagnosed as not having the disease
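
The three formulas can be applied to the diagnosis example directly. The counts below are invented for illustration (100 hypothetical patients), and the function names are not from any library.

```python
def accuracy(tp, tn, fp, fn):
    # Share of all predictions that are correct
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    # Share of positive predictions that are truly positive
    return tp / (tp + fp)


def recall(tp, fn):
    # Share of actual positives that the model found
    return tp / (tp + fn)


# Hypothetical diagnosis results for 100 patients
tp, fp, fn, tn = 40, 10, 5, 45

print("Accuracy: ", accuracy(tp, tn, fp, fn))
print("Precision:", precision(tp, fp))
print("Recall:   ", round(recall(tp, fn), 3))
```

With these counts, 10 healthy patients are wrongly flagged (which lowers precision) and 5 sick patients are missed (which lowers recall).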


## Precision vs Recall

| Precision | Recall |
| --- | --- |
| Can be seen as a measure of quality. | Can be seen as a measure of quantity. |
| Higher precision means that an algorithm returns more relevant results than irrelevant ones. | Higher recall means that an algorithm returns most of the relevant results (whether or not irrelevant ones are also returned). |
| Measures the accuracy of positive predictions. | Measures the completeness of the positive predictions. |

## Differences between Precision and Recall

1. **Precision** is seen as a measure of quality, while **Recall** is seen as a measure of quantity.
2. Higher **Precision** means that an algorithm returns more relevant results than irrelevant ones, whereas higher **Recall** means that an algorithm returns most of the relevant results (even if irrelevant ones are also returned).
3. **Precision** measures the accuracy of positive predictions, while **Recall** measures the completeness of the predictions.
4. **Precision** is crucial in scenarios where false positives are costly, such as spam detection or fraud detection.  
    **Recall** is crucial in scenarios where false negatives are costly, such as disease diagnosis or search-and-rescue operations.

### Example:
- In a **spam email classifier**:
  - **Precision** ensures that emails classified as spam are truly spam (avoiding false positives like important emails being marked as spam).
  - **Recall** ensures that most spam emails are identified (avoiding false negatives like spam emails being missed).

- In a **medical diagnosis system**:
  - **Precision** ensures that patients diagnosed with a disease truly have the disease (avoiding false positives like unnecessary treatments).
  - **Recall** ensures that most patients with the disease are correctly diagnosed (avoiding false negatives like missing a diagnosis).  
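
The trade-off in the spam example can be made concrete by comparing two filters on the same mailbox. Both filters and all counts are hypothetical: the mailbox is assumed to contain 100 spam emails, a "strict" filter flags only obvious spam, and a "lenient" filter flags anything suspicious.

```python
# Hypothetical results on a mailbox containing 100 spam emails
filters = {
    "strict": {"tp": 60, "fp": 2, "fn": 40},    # flags little, rarely wrong
    "lenient": {"tp": 95, "fp": 30, "fn": 5},   # flags a lot, misses little
}

results = {}
for name, c in filters.items():
    results[name] = {
        # Of the emails flagged as spam, how many really were spam
        "precision": c["tp"] / (c["tp"] + c["fp"]),
        # Of the real spam emails, how many were flagged
        "recall": c["tp"] / (c["tp"] + c["fn"]),
    }

for name, metrics in results.items():
    print(name, round(metrics["precision"], 3), round(metrics["recall"], 3))
```

Under these assumed counts the strict filter has the higher precision (few important emails are lost) and the lenient filter has the higher recall (little spam gets through), so the better choice depends on which mistake is more costly.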

## Student Score Model

In [12]:
import numpy as np
from sklearn.svm import SVC

# Marks (inputs) and the remark assigned to each mark (class labels)
inputs = np.array([[20], [34], [35], [59], [60], [74], [75]])
outputs = np.array(['fail', 'better luck next time', 'pass', 'good score keep going', 'first class', 'just missed distinction', 'distinction congrats'])

# The outputs are text labels, so this is a classification task;
# LinearRegression cannot be fitted to string outputs
alg = SVC()
alg.fit(inputs, outputs)

# Predict remarks for a new set of marks
result2 = alg.predict(np.array([[34], [36], [58], [61], [73], [78]]))
print(result2)
print("SVC: ", alg.score(inputs, outputs))



['better luck next time' 'pass' 'good score keep going' 'first class'
 'just missed distinction' 'distinction congrats']
SVC:  1.0
