# Machine Learning for Data Science
Machine learning is a subfield of artificial intelligence in which models learn patterns from data and use them to make predictions or decisions, rather than following hand-written rules. This notebook introduces machine learning, covers core supervised algorithms, and walks through practical implementation using Python's scikit-learn library.

## Introduction to Machine Learning
Machine learning is commonly divided into several types. This notebook covers the two most widely used in data science:

- **Supervised Learning**: The model learns from labeled data (input-output pairs) to make predictions.
- **Unsupervised Learning**: The model identifies patterns or groupings in data without labeled outcomes.

### Use Cases in Data Science
1. **Supervised Learning:** Predicting house prices based on features such as location, size, and amenities.
2. **Unsupervised Learning:** Customer segmentation in marketing to identify distinct groups of customers with similar behavior.
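
The customer segmentation use case can be sketched with k-means clustering. This is a minimal illustration under stated assumptions: the two features (annual spend, visits per month), the group sizes, and the choice of two clusters are all hypothetical, and k-means is just one of several clustering methods that could be used here.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer data: columns are [annual spend, visits per month]
rng = np.random.default_rng(42)
low_spenders = rng.normal(loc=[200, 2], scale=[30, 0.5], size=(50, 2))
high_spenders = rng.normal(loc=[1000, 10], scale=[30, 0.5], size=(50, 2))
customers = np.vstack([low_spenders, high_spenders])

# Fit k-means with two clusters; note that no labels are passed to the model
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
segments = kmeans.fit_predict(customers)

print("Customers per segment:", np.bincount(segments))
print("Segment centers:\n", kmeans.cluster_centers_)
```

Unlike the supervised examples, `fit_predict` receives only the inputs. With real data the features would usually be scaled first, since k-means relies on distances and a large-valued feature such as spend would otherwise dominate.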

## Supervised Learning Algorithms
Supervised learning uses labeled datasets to train models. Here are some key algorithms:

### 1. Linear Regression
Linear regression is used to predict a continuous target variable based on one or more input features.

#### Example:
```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np

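# Generate synthetic data (seeded so the results are reproducible)
rng = np.random.default_rng(42)
X = rng.random((100, 1)) * 10  # Input feature in [0, 10)
y = 3 * X[:, 0] + rng.standard_normal(100) * 2  # Target: slope 3 plus Gaussian noise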

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Evaluate model
print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
```
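
Mean squared error depends on the scale of the target, so it is often reported together with R-squared. The sketch below is a self-contained variant of the example above (the seed and variable names are illustrative choices). It computes both metrics, checks R-squared against its definition, and inspects the fitted slope.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data with a true slope of 3 and Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X[:, 0] + rng.normal(0, 2, size=100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# R-squared by definition: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print("MSE:", mse)
print("R-squared (sklearn):", r2)
print("R-squared (manual):", r2_manual)
print("Fitted slope:", model.coef_[0], "intercept:", model.intercept_)
```

The fitted slope should land close to the true value of 3, which is a quick sanity check that the model recovered the relationship used to generate the data.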

### 2. Logistic Regression
Logistic regression is used for classification, most commonly binary: it models the probability that an input belongs to a class and predicts the class from that probability.

#### Example:
```python
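from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Generate synthetic binary classification data.
# With n_features=2, both features must be informative: the defaults
# (n_informative=2, n_redundant=2) exceed the feature count and raise a ValueError.
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, n_classes=2, random_state=42)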

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Evaluate model
accuracy = model.score(X_test, y_test)
print("Accuracy:", accuracy)
```
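
Because logistic regression models class probabilities, it can help to look at them directly rather than only at the predicted labels. The following is a minimal sketch on synthetic data; the 0.8 threshold is an arbitrary illustrative value, not a recommendation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = LogisticRegression().fit(X_train, y_train)

# predict_proba returns one column per class; each row sums to 1
proba = clf.predict_proba(X_test)
print("First five probability rows:\n", np.round(proba[:5], 3))

# predict() corresponds to thresholding the positive-class probability at 0.5
default_labels = clf.predict(X_test)
thresholded = (proba[:, 1] > 0.5).astype(int)

# A stricter (hypothetical) threshold labels fewer samples as positive
strict_labels = (proba[:, 1] > 0.8).astype(int)
print("Positives at 0.5:", thresholded.sum(), "| at 0.8:", strict_labels.sum())
```

Raising the threshold trades recall for precision, which can matter when false positives are costly.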

### 3. Decision Trees
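Decision trees are versatile algorithms used for both classification and regression tasks. They repeatedly split the data on feature thresholds, producing a tree of if-then rules that is easy to inspect.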

#### Example:
```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

# Generate synthetic data
X, y = make_classification(n_samples=200, n_features=4, n_classes=2, random_state=42)

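# Train decision tree classifier (random_state makes the fitted tree reproducible)
model = DecisionTreeClassifier(random_state=42)
model.fit(X, y)

# Importances sum to 1; higher values mean the feature contributed more to the splits
print("Feature importances:", model.feature_importances_)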
```
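
Since decision trees also handle regression, here is a minimal sketch using `DecisionTreeRegressor` on a noisy sine curve. The data, the `max_depth=3` setting, and the query points are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic non-linear target: sin(x) plus a little Gaussian noise
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=80)

# Limiting the depth keeps the tree small and reduces overfitting to the noise
reg = DecisionTreeRegressor(max_depth=3, random_state=42)
reg.fit(X, y)

# Each prediction is the mean target value of the leaf the input falls into
X_new = np.array([[1.0], [2.5], [4.0]])
preds = reg.predict(X_new)

print("Tree depth:", reg.get_depth(), "| leaves:", reg.get_n_leaves())
print("Predictions:", preds)
```

A regression tree predicts a piecewise-constant function, so a depth-3 tree can output at most eight distinct values.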

### 4. Support Vector Machines (SVM)
SVMs are most often used for classification. They find a maximum-margin decision boundary, which is linear in the simplest case and can be made non-linear through kernel functions.

#### Example:
```python
from sklearn.svm import SVC

# Train support vector classifier
# (reuses X_train, X_test, y_train, y_test from the logistic regression example)
model = SVC(kernel='linear')
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print("SVM Accuracy:", accuracy)
```
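
To illustrate a non-linear decision boundary, the sketch below compares a linear kernel with an RBF kernel on concentric circles, a dataset that no straight line can separate. The dataset parameters are illustrative choices.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two classes arranged as an inner and an outer ring
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Same estimator, different kernels
linear_acc = SVC(kernel="linear").fit(X_train, y_train).score(X_test, y_test)
rbf_acc = SVC(kernel="rbf").fit(X_train, y_train).score(X_test, y_test)

print("Linear kernel accuracy:", linear_acc)
print("RBF kernel accuracy:", rbf_acc)
```

On data like this the RBF kernel should score far higher, because it can draw a closed boundary around the inner ring.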

## Model Evaluation and Selection
Evaluating a model on held-out data estimates how well it will generalize to unseen data.

- **Metrics for Regression:** Mean Squared Error (MSE), R-squared.
- **Metrics for Classification:** Accuracy, Precision, Recall, F1-Score.

#### Example:
```python
from sklearn.metrics import classification_report, confusion_matrix

# Generate evaluation metrics for classification
# (model is the SVC trained in the previous example)
y_pred = model.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```
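
For model selection, a single train/test split can be noisy. One common approach, sketched here under assumptions, is k-fold cross-validation with `cross_val_score`. The two candidate models, the 5 folds, and the F1 scoring choice are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=5, random_state=42)

# Candidate models to compare (hypothetical shortlist)
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=3, random_state=42),
}

cv_results = {}
for name, estimator in candidates.items():
    # Each model is trained and scored on 5 different train/validation splits
    scores = cross_val_score(estimator, X, y, cv=5, scoring="f1")
    cv_results[name] = scores
    print(f"{name}: mean F1 = {scores.mean():.3f} (std {scores.std():.3f})")
```

Comparing the mean score together with its spread across folds gives a more stable basis for choosing a model than one test-set number.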

## Basic Implementation with scikit-learn
Scikit-learn is a powerful library for machine learning in Python. It provides tools for data preprocessing, model training, and evaluation.

#### Workflow:
1. **Load Data**: Import or generate datasets.
2. **Preprocess Data**: Handle missing values, scale features.
3. **Split Data**: Divide data into training and test sets.
4. **Train Model**: Fit a machine learning model.
5. **Evaluate Model**: Use metrics to assess performance.

#### Example:
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Load data (synthetic here; replace with your own dataset)
X, y = make_classification(n_samples=300, n_features=5, random_state=42)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train random forest classifier (seeded for reproducible results)
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

print("Model accuracy:", model.score(X_test, y_test))
```
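
The preprocessing step of the workflow (handling missing values and scaling features) can be folded into a scikit-learn `Pipeline`. This is a minimal sketch: the missing values are injected artificially, and median imputation, standard scaling, and logistic regression are illustrative choices rather than recommendations.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Load data
X, y = make_classification(n_samples=300, n_features=5, random_state=42)

# Simulate missing data by blanking out roughly 5% of the values (illustrative only)
rng = np.random.default_rng(42)
X = X.copy()
X[rng.random(X.shape) < 0.05] = np.nan

# 3. Split data before fitting any preprocessing, so the test set stays unseen
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# 2 + 4. Preprocess and train: impute -> scale -> classify, fitted on training data only
pipeline = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler(),
    LogisticRegression(),
)
pipeline.fit(X_train, y_train)

# 5. Evaluate: the same fitted preprocessing is applied to the test set automatically
pipeline_acc = pipeline.score(X_test, y_test)
print("Pipeline accuracy:", pipeline_acc)
```

Wrapping the steps in a pipeline means the imputer and scaler learn their statistics from the training data alone, which avoids leaking test-set information into preprocessing.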

## Practice Exercises
1. Train a linear regression model on a dataset of your choice. Evaluate its performance.
2. Implement a decision tree classifier to predict whether customers will churn based on their behavior.
3. Use logistic regression for a binary classification task (e.g., predicting whether an email is spam).
4. Train a support vector machine (SVM) for a multi-class classification problem.
