
# Machine Learning Algorithms for Data Science

This guide is designed to help data science students decide when to use **Linear Regression**, **Polynomial Regression**, and **Logistic Regression**, what kind of data each algorithm requires, and how to implement each one in Python.

---

## 1. Linear Regression

### When to Use:
- **Use Linear Regression** when the target is continuous and you believe it changes linearly with the features (i.e., a one-unit change in a feature shifts the prediction by a constant amount, the feature's coefficient, regardless of the feature's current value).
- **Typical Problems**: Predicting house prices, predicting sales over time.

### Type of Data Required:
- **Features**: Numeric, continuous variables.
- **Target Variable**: Continuous variable.

### How to Use:
The `scikit-learn` library makes it easy to implement linear regression. Here's a code snippet for fitting a simple linear regression model.

```python
from sklearn.linear_model import LinearRegression
import numpy as np
import matplotlib.pyplot as plt

# Sample data (replace with your own): roughly linear, approximately y = 2x + 1 with noise
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([3.2, 4.8, 7.1, 9.0, 10.9])

# Initialize and fit the model
model = LinearRegression()
model.fit(X, y)

# Predict
y_pred = model.predict(X)

# Visualization
plt.scatter(X, y, color='blue', label='Actual')
plt.plot(X, y_pred, color='red', label='Predicted')
plt.title('Linear Regression')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()
```

### Visualization:
The red line represents the predicted values based on the model. If the relationship is truly linear, the points should be close to the line.
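The fit can also be checked numerically. The sketch below uses made-up toy data to print the learned slope and intercept (`coef_`, `intercept_`), the R² score from `sklearn.metrics.r2_score`, and the residuals (actual minus predicted) for one truly linear target (y = 2x + 1) and one curved target (y = x²).

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
import numpy as np

X = np.array([[1], [2], [3], [4], [5]])

# Two hypothetical toy targets: one truly linear, one curved
y_linear = 2 * X.ravel() + 1
y_curved = X.ravel() ** 2

for name, y in [("linear", y_linear), ("curved", y_curved)]:
    model = LinearRegression().fit(X, y)
    residuals = y - model.predict(X)
    print(f"{name}: slope={model.coef_[0]:.2f}, "
          f"intercept={model.intercept_:.2f}, "
          f"R^2={r2_score(y, model.predict(X)):.3f}")
    # If a straight line is appropriate, residuals scatter randomly around 0
    print("  residuals:", np.round(residuals, 2))
```

For the curved data the straight line still scores an R² of about 0.96, yet the residuals run positive at both ends and negative in the middle. A systematic pattern like this, rather than the R² value alone, is the sign that a straight line is the wrong shape.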

---

![Linear regression plot](attachment:image.png)

## 2. Polynomial Regression

### When to Use:
- **Use Polynomial Regression** when the target is continuous but its relationship with the features is curved, for example when a feature's effect on the target grows or shrinks as the feature's value increases.
- **Typical Problems**: Modeling growth rates, complex curves in engineering, or any dataset where a curve fits the data better than a straight line.

### Type of Data Required:
- **Features**: Numeric, continuous variables (like Linear Regression).
- **Target Variable**: Continuous variable.

### How to Use:
Polynomial regression is implemented by expanding each feature into polynomial terms (powers of the feature up to the chosen degree) and then fitting an ordinary linear regression on the expanded features.
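Concretely, for a single feature and degree 3, each input value x becomes the row [1, x, x², x³], where the leading 1 is a constant (bias) column that `PolynomialFeatures` adds by default. A minimal sketch to inspect the expansion on two hypothetical inputs:

```python
from sklearn.preprocessing import PolynomialFeatures
import numpy as np

X = np.array([[2], [3]])

# Default: each x becomes [1, x, x^2, x^3]; the row for x = 2 is [1, 2, 4, 8]
poly = PolynomialFeatures(degree=3)
print(poly.fit_transform(X))

# include_bias=False drops the constant column; the row for x = 3 is [3, 9, 27]
poly_nb = PolynomialFeatures(degree=3, include_bias=False)
print(poly_nb.fit_transform(X))
```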

```python
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
import numpy as np
import matplotlib.pyplot as plt

# Sample data (replace with your own): y = x^3
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1, 8, 27, 64, 125])

# Transform features to polynomial terms x, x^2, x^3 (degree 3).
# include_bias=False drops the constant column, since LinearRegression fits its own intercept.
poly = PolynomialFeatures(degree=3, include_bias=False)
X_poly = poly.fit_transform(X)

# Initialize and fit the model on the expanded features
model = LinearRegression()
model.fit(X_poly, y)

# Predict on a dense grid so the fitted curve plots smoothly
# (new inputs must go through the same poly.transform step)
X_grid = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)
y_grid = model.predict(poly.transform(X_grid))

# Visualization
plt.scatter(X, y, color='blue', label='Actual')
plt.plot(X_grid, y_grid, color='red', label='Predicted')
plt.title('Polynomial Regression (Degree 3)')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()
```

### Visualization:
The red curve shows the predictions of the polynomial regression model. If the data follows a non-linear trend, the curve will fit the data points better than a straight line.
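To check that claim numerically, the sketch below fits degree 1 (a straight line) and degree 3 to the same toy cubic data and compares their R² scores. It wraps both steps in `sklearn.pipeline.make_pipeline`, an alternative to calling `fit_transform` and `predict` separately, so new inputs are expanded automatically at prediction time.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
import numpy as np

# Toy data from the example above: y = x^3
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1, 8, 27, 64, 125])

scores = {}
for degree in (1, 3):
    # The pipeline applies the same polynomial expansion at fit and predict time
    model = make_pipeline(PolynomialFeatures(degree=degree, include_bias=False),
                          LinearRegression())
    model.fit(X, y)
    scores[degree] = r2_score(y, model.predict(X))
    print(f"degree {degree}: R^2 = {scores[degree]:.3f}")

# Predict for an unseen input with the last (degree 3) model; x = 6 is expanded automatically
print(model.predict(np.array([[6]])))
```

The straight line reaches an R² of about 0.89, while degree 3 fits these exactly cubic points perfectly. On real data, compare degrees on held-out data instead: on the training points, a higher degree always fits at least as well, even when it generalizes worse.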

---

![Polynomial regression plot](attachment:image.png)

## 3. Logistic Regression

### When to Use:
- **Use Logistic Regression** when you are dealing with a binary classification problem (i.e., the target variable has only two possible outcomes such as 0/1, True/False, or Yes/No).
- **Typical Problems**: Spam detection, predicting whether a student will pass/fail, predicting if a tumor is malignant/benign.

### Type of Data Required:
- **Features**: Numeric or categorical features (categorical features must be encoded first, e.g., with one-hot encoding).
- **Target Variable**: Binary variable (0/1).
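Because logistic regression needs a purely numeric feature matrix, categorical columns are usually one-hot encoded before fitting. The sketch below uses a small hypothetical pass/fail dataset (the names `hours`, `method`, and `passed` and all their values are made up for illustration), encodes the categorical column with `OneHotEncoder`, scales the numeric column with `StandardScaler`, and stacks them into one matrix.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: hours studied (numeric) and study method (categorical)
hours = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0]])
method = np.array([["self"], ["group"], ["self"], ["tutor"],
                   ["group"], ["tutor"], ["self"], ["group"]])
passed = np.array([0, 0, 0, 1, 0, 1, 1, 1])

# One-hot encode the categorical column: one 0/1 column per category
encoder = OneHotEncoder(handle_unknown="ignore")
method_encoded = encoder.fit_transform(method).toarray()

# Scale the numeric column, then stack everything into one feature matrix
scaler = StandardScaler()
hours_scaled = scaler.fit_transform(hours)
X = np.hstack([hours_scaled, method_encoded])

model = LogisticRegression().fit(X, passed)
print(encoder.categories_)                # categories learned for the method column
print(model.predict_proba(X[:2])[:, 1])   # predicted probability of class 1 ("pass")
```

When predicting on new data, reuse the fitted `encoder` and `scaler` through `.transform()`, not `.fit_transform()`. `sklearn.compose.ColumnTransformer` inside a `Pipeline` can automate this bookkeeping.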

### How to Use:
Here’s how to implement Logistic Regression for a binary classification task.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn import metrics
import matplotlib.pyplot as plt

# Load the Iris dataset and keep only classes 0 (setosa) and 1 (versicolor)
# to turn it into a binary classification problem
iris = load_iris()
mask = iris.target != 2
X = iris.data[mask]
y = iris.target[mask]

# Train-test split (70% train, 30% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Standardize: fit the scaler on the training set only, then apply it to both sets
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Initialize and fit the model
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict and evaluate on the held-out test set
y_pred = model.predict(X_test)
accuracy = metrics.accuracy_score(y_test, y_pred)

# Output accuracy
print(f'Accuracy: {accuracy * 100:.2f}%')

# Visualization: test points on the first two (standardized) features, colored by true label
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap='winter')
plt.title('Logistic Regression: Test Set (True Labels)')
plt.xlabel(f'{iris.feature_names[0]} (standardized)')
plt.ylabel(f'{iris.feature_names[1]} (standardized)')
plt.show()
```

### Visualization:
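The plot shows the test points on the first two features of the Iris dataset (sepal length and sepal width, after standardization), colored by their true class labels. It does not display the model's predictions or decision boundary: the model was trained on all four features, so its boundary cannot be drawn in this 2D view. Because setosa and versicolor are linearly separable, a very high test accuracy is expected here and should not be taken as typical for harder problems.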
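To actually see what the classifier learns, one option is to fit a separate model on only the first two features, so its decision boundary lives in 2D. The sketch below makes that assumption (it also scales and fits on all 100 setosa/versicolor samples for simplicity, with no train-test split), evaluates `predict_proba` on a grid, and draws the 0.5-probability contour, which is the decision boundary.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
import numpy as np
import matplotlib.pyplot as plt

# Binary subset of Iris (setosa vs versicolor), first two features only
iris = load_iris()
mask = iris.target != 2
X = StandardScaler().fit_transform(iris.data[mask][:, :2])
y = iris.target[mask]

# Fit a model on just these two features so its boundary can be drawn in 2D
model = LogisticRegression().fit(X, y)

# Evaluate the predicted probability of class 1 on a grid covering the data
x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200), np.linspace(y_min, y_max, 200))
grid = np.c_[xx.ravel(), yy.ravel()]
proba = model.predict_proba(grid)[:, 1].reshape(xx.shape)

# Shade by probability and draw the 0.5 contour (the decision boundary)
plt.contourf(xx, yy, proba, levels=20, cmap='winter', alpha=0.3)
plt.contour(xx, yy, proba, levels=[0.5], colors='black')
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='winter', edgecolor='k')
plt.xlabel(f'{iris.feature_names[0]} (standardized)')
plt.ylabel(f'{iris.feature_names[1]} (standardized)')
plt.title('Logistic Regression Decision Boundary (2 features)')
plt.show()
```

The boundary comes out as a straight line because logistic regression is a linear classifier in its input features; only the probabilities on either side of it are curved through the sigmoid.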

---
![Logistic regression test-set scatter plot](attachment:image.png)

## When to Use Each Algorithm

- **Linear Regression**: Use when the target is continuous and its relationship with the features is linear (each feature has a constant effect on the target, given by its coefficient). Typically used for predicting continuous outcomes.
  
- **Polynomial Regression**: Use when the target is continuous but the data shows a curved (non-linear) pattern. It extends linear regression by fitting a polynomial instead of a straight line.

- **Logistic Regression**: Use when the target is binary (e.g., True/False, Yes/No). Despite its name, it is a classification method: it models the probability of the positive class and assigns a label by thresholding that probability, at 0.5 by default in scikit-learn's `predict`.
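Side by side in equation form (a single feature $x$ for the polynomial case, $p$ features for the others), the three models are shown below. All three are linear in their coefficients $\beta$: polynomial regression gets its curvature from the expanded features, and logistic regression passes the same kind of linear combination through the sigmoid function to produce a probability.

$$
\begin{aligned}
\text{Linear:}\quad & \hat{y} = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p \\
\text{Polynomial (degree } d\text{):}\quad & \hat{y} = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_d x^d \\
\text{Logistic:}\quad & P(y = 1 \mid \mathbf{x}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p)}}
\end{aligned}
$$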

---

## Conclusion

Selecting the right algorithm depends on the type of problem you're trying to solve and the characteristics of the data. Here's a quick guide:

- **Linear Relationship** → Linear Regression.
- **Non-linear Relationship (Continuous)** → Polynomial Regression.
- **Binary Classification** → Logistic Regression.

### References
- [Linear Regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html)
- [Polynomial Regression](https://scikit-learn.org/stable/auto_examples/linear_model/plot_polynomial_interpolation.html)
- [Logistic Regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html)

