#### Feature Store
A feature store is a centralized repository for storing, sharing, and managing the features used by machine learning models. It keeps feature definitions consistent between training and serving (avoiding training-serving skew), lets teams reuse features across models, and provides low-latency feature access for real-time inference.
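To make the training/serving consistency idea concrete, here is a minimal in-memory sketch. The `InMemoryFeatureStore` class, its methods, and the `avg_order_value` feature are hypothetical and are not the API of any real feature-store product. The sketch assumes each feature value is written once with a timestamp and then read back on two paths: a point-in-time lookup for building training rows (so no future values leak into training data) and a latest-value lookup for online serving.

```python
from bisect import bisect_right
from collections import defaultdict


class InMemoryFeatureStore:
    """Hypothetical toy feature store: timestamped feature values per entity."""

    def __init__(self):
        # entity_id -> feature_name -> list of (timestamp, value), kept sorted by time
        self._data = defaultdict(lambda: defaultdict(list))

    def write(self, entity_id, feature_name, timestamp, value):
        series = self._data[entity_id][feature_name]
        series.append((timestamp, value))
        series.sort(key=lambda pair: pair[0])  # keep history ordered by timestamp

    def get_online(self, entity_id, feature_names):
        """Serving path: the latest known value of each feature."""
        result = {}
        for name in feature_names:
            series = self._data[entity_id][name]
            result[name] = series[-1][1] if series else None
        return result

    def get_as_of(self, entity_id, feature_names, timestamp):
        """Training path: each feature's value as it was at `timestamp`."""
        result = {}
        for name in feature_names:
            series = self._data[entity_id][name]
            times = [t for t, _ in series]
            idx = bisect_right(times, timestamp)  # count of writes at or before timestamp
            result[name] = series[idx - 1][1] if idx > 0 else None
        return result


store = InMemoryFeatureStore()
store.write("user_1", "avg_order_value", timestamp=1, value=20.0)
store.write("user_1", "avg_order_value", timestamp=5, value=35.0)

print(store.get_as_of("user_1", ["avg_order_value"], timestamp=3))  # {'avg_order_value': 20.0}
print(store.get_online("user_1", ["avg_order_value"]))  # {'avg_order_value': 35.0}
```

Both paths read from the same stored values, which is the property that keeps training data and serving inputs consistent.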

#### Offline vs. Online Inference
- **Offline inference**: Predictions are made in batches, often on a schedule, and results are stored for later use (e.g., nightly churn prediction).
- **Online inference**: Predictions are made in real time as requests arrive, enabling immediate responses (e.g., fraud detection during a transaction).
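The two modes can use the same trained model; what differs is when and how many rows are scored. A minimal sketch, assuming a scikit-learn model trained on two synthetic numeric features (`feature1`, `feature2`); the helper names `run_batch_scoring` and `predict_one` are hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

FEATURES = ["feature1", "feature2"]

# Train a small model on synthetic data (assumption: two numeric features).
rng = np.random.default_rng(0)
X_train = pd.DataFrame(rng.random((200, 2)), columns=FEATURES)
y_train = (X_train["feature1"] + X_train["feature2"] > 1).astype(int)
model = LogisticRegression().fit(X_train, y_train)


def run_batch_scoring(model, records: pd.DataFrame) -> pd.DataFrame:
    """Offline inference: score a whole table at once; results are stored for later use."""
    scored = records.copy()
    scored["prediction"] = model.predict(records[FEATURES])
    return scored  # in practice this would be written to a table or file


def predict_one(model, request: dict) -> int:
    """Online inference: score a single incoming request immediately."""
    row = pd.DataFrame([request], columns=FEATURES)
    return int(model.predict(row)[0])


nightly_batch = pd.DataFrame(rng.random((5, 2)), columns=FEATURES)
print(run_batch_scoring(model, nightly_batch))
print(predict_one(model, {"feature1": 0.9, "feature2": 0.8}))
```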

#### Batch vs. Real-Time Pipelines
- **Batch pipelines**: Process large volumes of data at once, suitable for periodic updates and analytics.
- **Real-time pipelines**: Process data as it arrives, enabling immediate actions and up-to-date insights.
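A toy illustration of the two processing models, computing the average of some hypothetical event values (e.g., transaction amounts). The batch version waits until all data is available and processes it in one pass; the streaming version keeps a small state and updates it per event, so an up-to-date result exists at every moment. Real systems would use a batch engine or a stream processor; this sketch only shows the difference in processing style.

```python
from dataclasses import dataclass


def batch_average(values):
    """Batch pipeline: process the full accumulated dataset in one pass."""
    return sum(values) / len(values)


@dataclass
class StreamingAverage:
    """Real-time pipeline: update the result incrementally as each event arrives."""
    count: int = 0
    mean: float = 0.0

    def update(self, value: float) -> float:
        self.count += 1
        # Incremental mean: new_mean = old_mean + (x - old_mean) / n
        self.mean += (value - self.mean) / self.count
        return self.mean


events = [12.0, 15.0, 9.0, 20.0]

print(batch_average(events))  # 14.0, computed once after all data has arrived

stream = StreamingAverage()
for event in events:
    current = stream.update(event)  # 12.0, 13.5, 12.0, 14.0 after each event
print(current)  # 14.0
```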

#### Model Monitoring & Drift
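Model monitoring tracks the performance of deployed models over time. Drift occurs when the distribution of the input data changes (data drift) or when the relationship between the inputs and the target changes (concept drift); either can cause model accuracy to degrade. Monitoring helps detect drift early and trigger retraining or other updates.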
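One common way to check a single feature for data drift is the Population Stability Index (PSI), which compares how a reference sample (e.g., training data) and a current sample (e.g., recent production inputs) fall into the same bins. The sketch below is one possible implementation using decile bins from the reference sample; the function name, bin count, and `eps` smoothing are choices made here, not a standard API. Note that PSI only detects input drift; detecting concept drift generally requires comparing predictions with ground-truth labels.

```python
import numpy as np


def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """PSI between a reference sample and a current sample of one feature.

    Bin edges are quantiles of the reference sample; eps avoids log(0)
    when a bin is empty.
    """
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    # Interior edges from reference quantiles; the two outer bins are open-ended.
    edges = np.quantile(reference, np.linspace(0, 1, n_bins + 1)[1:-1])
    ref_frac = np.bincount(np.digitize(reference, edges), minlength=n_bins) / len(reference)
    cur_frac = np.bincount(np.digitize(current, edges), minlength=n_bins) / len(current)
    ref_frac = np.clip(ref_frac, eps, None)
    cur_frac = np.clip(cur_frac, eps, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))


rng = np.random.default_rng(42)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)
stable_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)
shifted_feature = rng.normal(loc=1.0, scale=1.0, size=5_000)  # simulated data drift

print(population_stability_index(train_feature, stable_feature))   # close to 0
print(population_stability_index(train_feature, shifted_feature))  # much larger
```

A commonly quoted rule of thumb treats PSI below about 0.1 as little change and above about 0.25 as a significant shift, but thresholds should be tuned per feature and use case.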

#### End-to-End ML Pipeline Design
A typical ML pipeline includes:
1. **Data Collection**: Gather raw data from various sources.
2. **Data Processing**: Clean, transform, and engineer features.
3. **Model Training**: Train models using processed data.
4. **Model Evaluation**: Assess model performance.
5. **Deployment**: Serve the model for inference.
6. **Monitoring**: Track model performance and data drift.
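A sketch of how these stages can fit together, assuming scikit-learn and synthetic data. Bundling processing and the model into a single `Pipeline` object means the same transformations run at training time and at serving time, and persisting that one object with `joblib` stands in for the deployment step (the temporary file path is only for illustration). Monitoring (step 6) is not shown.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# 1. Data collection (synthetic stand-in for real data sources)
rng = np.random.default_rng(0)
X = rng.random((300, 2))
y = (X[:, 0] + X[:, 1] > 1).astype(int)

# 2-3. Processing and training bundled together, so the exact same
# scaling is applied during training and during serving.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression()),
])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
pipeline.fit(X_train, y_train)

# 4. Evaluation on held-out data
print(f"Test accuracy: {pipeline.score(X_test, y_test):.2f}")

# 5. Deployment: persist the whole pipeline as one artifact, then reload it
with tempfile.TemporaryDirectory() as tmp_dir:
    artifact_path = os.path.join(tmp_dir, "pipeline.joblib")
    joblib.dump(pipeline, artifact_path)
    served = joblib.load(artifact_path)
    # The reloaded pipeline accepts raw (unscaled) inputs directly.
    print(served.predict(X_test[:3]))
```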

#### ML Project Lifecycle

*[Figure: ML project lifecycle diagram]*

The following basic example walks through the first four stages of the pipeline (data collection, processing, training, and evaluation) on synthetic data.

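```python
# Basic ML Pipeline Example
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Fix the random seed so the results are reproducible
rng = np.random.default_rng(42)

# 1. Data Collection
# (Here, we create synthetic data. The label depends on the features so the
# model has a real pattern to learn; purely random labels would make the
# accuracy score meaningless.)
data = pd.DataFrame({
    'feature1': rng.random(100),
    'feature2': rng.random(100),
})
data['label'] = (data['feature1'] + data['feature2'] > 1).astype(int)

# 2. Data Processing
# Select the feature columns and the target, then hold out 20% of rows for evaluation
X = data[['feature1', 'feature2']]
y = data['label']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 3. Model Training
model = LogisticRegression()
model.fit(X_train, y_train)

# 4. Model Evaluation
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Model accuracy: {accuracy:.2f}")

# Steps 5 (Deployment) and 6 (Monitoring) are not covered in this basic example.
```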