## Artificial Intelligence vs. Machine Learning vs. Deep Learning

#### Artificial intelligence
Artificial intelligence (AI) refers to the simulation of human intelligence in machines, enabling them to perform tasks such as reasoning, decision-making, and problem-solving. It focuses on building systems that can mimic intelligent human behavior. AI can be rule-based, knowledge-driven, or data-driven, and it includes subfields such as robotics, natural language processing (NLP), and computer vision.

#### Machine learning
Machine learning (ML) is a subset of AI that uses algorithms to let systems learn from data and improve with experience, without being explicitly programmed for each task. It focuses on identifying patterns and making predictions or decisions based on data.

#### Deep learning
Deep learning (DL) is a specialized subset of ML that uses artificial neural networks with multiple layers to process and analyze data. It excels with large datasets and complex, unstructured data such as images, audio, and text. DL models learn features directly from raw data, which greatly reduces the need for manual feature engineering.

## Machine Learning Paradigms

### 1. Supervised Learning
Used when labeled data is available: the model learns by example.

- **Regression:** Predicting continuous values  
  _Example:_ House price prediction  
- **Classification:** Predicting discrete categories  
  _Example:_ Spam vs. Not Spam detection  
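
As a minimal sketch of both tasks, the snippet below fits a one-feature least-squares line (regression) and a nearest-centroid rule (classification) in plain Python. The sizes, prices, and word counts are made-up illustration data, and `fit_line`, `fit_centroids`, and `classify` are hypothetical helper names, not part of any library.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar

def fit_centroids(xs, labels):
    """Nearest-centroid classifier: store the mean feature value per class."""
    return {
        c: mean(x for x, lab in zip(xs, labels) if lab == c) for c in set(labels)
    }

def classify(centroids, x):
    """Predict the class whose centroid is closest to x."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))

# Regression: made-up (size in m^2, price in thousands) pairs.
sizes = [50, 80, 100, 120]
prices = [150, 240, 300, 360]
slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 90 + intercept

# Classification: made-up count of suspicious words per email.
counts = [0, 1, 2, 8, 9, 10]
labels = ["ham", "ham", "ham", "spam", "spam", "spam"]
centroids = fit_centroids(counts, labels)
prediction = classify(centroids, 7)
```

In both cases the labels (prices, spam flags) are what make the learning supervised; the only difference is whether the target is a number or a category.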

---

### 2. Unsupervised Learning
Used when data has no labels: the model finds hidden patterns.

- **Clustering:** Grouping similar data points  
  _Example:_ Customer segmentation  
- **Dimensionality Reduction:** Simplifying data while retaining structure  
  _Example:_ PCA for visualization  
- **Anomaly Detection:** Identifying unusual data points  
  _Example:_ Fraud detection  
- **Association:** Discovering relationships between variables  
  _Example:_ Market basket analysis, such as deciding item placement in a Walmart store  
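
Clustering can be sketched with a bare-bones k-means on one feature. This is an illustrative toy, assuming made-up yearly spend values and a hypothetical `kmeans_1d` helper that seeds the centroids with the first `k` points; real projects would normally use a library implementation.

```python
def kmeans_1d(points, k, iterations=10):
    """Plain k-means on 1-D data; the first k points seed the centroids."""
    centroids = list(points[:k])
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Made-up yearly spend per customer: a low-spend and a high-spend group.
spend = [20, 950, 25, 1000, 30, 1050]
centroids, clusters = kmeans_1d(spend, k=2)
```

No labels are given anywhere: the two customer segments emerge only from how close the values are to each other.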

---

### 3. Semi-Supervised Learning
A hybrid approach where the model is trained on a small amount of labeled data and a large amount of unlabeled data.  
_Example:_ Text classification with limited labeled samples, or Google Photos, which groups similar faces on its own and needs only a few name labels from the user.
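
One common way to use the unlabeled data is self-training (pseudo-labeling). The sketch below is a simplified illustration, assuming 1-D features, a nearest-centroid base model, and a hypothetical `margin` threshold for deciding when a prediction is confident enough to keep.

```python
def centroids_of(examples):
    """Mean feature value per label from (value, label) pairs."""
    groups = {}
    for value, label in examples:
        groups.setdefault(label, []).append(value)
    return {label: sum(vs) / len(vs) for label, vs in groups.items()}

def self_train(labeled, unlabeled, margin=2.0, rounds=5):
    """Self-training: repeatedly pseudo-label the confident unlabeled points."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        cents = centroids_of(labeled)
        still_unlabeled = []
        for x in pool:
            # Rank the classes by distance from x to their centroid.
            ranked = sorted(cents, key=lambda c: abs(x - cents[c]))
            best, runner_up = ranked[0], ranked[1]
            gap = abs(x - cents[runner_up]) - abs(x - cents[best])
            if gap >= margin:
                labeled.append((x, best))      # accept the pseudo-label
            else:
                still_unlabeled.append(x)      # retry in a later round
        if len(still_unlabeled) == len(pool):
            break                              # nothing new was labeled
        pool = still_unlabeled
    return centroids_of(labeled), pool

# Two labeled examples and several unlabeled ones (made-up values).
labeled = [(1.0, "a"), (9.0, "b")]
unlabeled = [0.5, 1.5, 2.0, 8.0, 8.5, 9.5, 5.0]
centroids, leftover = self_train(labeled, unlabeled)
```

The point at 5.0 is equally far from both classes, so it stays unlabeled instead of being forced into one of them.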

---

### 4. Reinforcement Learning
An agent learns by interacting with an environment and receiving feedback (rewards or penalties).  
_Example:_ Game-playing AI (like AlphaGo), autonomous robotics, or driverless cars
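
The reward-driven loop can be illustrated with tabular Q-learning on a toy problem. This is a sketch under stated assumptions: the five-cell corridor environment, the reward of 1.0 at the goal, and all hyperparameter values are invented for the example.

```python
import random

N_STATES = 5            # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)      # index 0 = step left, index 1 = step right

def step(state, action_index):
    """Move in the corridor; reward 1.0 only when the goal is reached."""
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(2)                  # explore
            else:
                best = max(q[state])                       # exploit, breaking
                action = rng.choice(                       # ties at random
                    [a for a in range(2) if q[state][a] == best]
                )
            next_state, reward, done = step(state, action)
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = train()
policy = ["left" if q[0] > q[1] else "right" for q in q_table[:-1]]
```

The agent is never told that moving right is correct; it only sees the reward at the goal, and the preference for "right" propagates backward through the Q-table.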


## ⚙️ Batch vs. Online Machine Learning

### 🧩 What is Batch (Offline) Learning?
- In **Batch Learning**, the model is trained on the **entire dataset at once**.
- Once trained, the model remains **static** until it is retrained from scratch on new data.
- Commonly used for **stable, large datasets** where data doesn’t change rapidly.

🧠 **Example:**  
Training a fraud detection model on last year’s transaction data and redeploying it after full retraining.

---

### ⚠️ Problems with Batch Learning
- **Computationally expensive:** Retraining the whole model every time new data arrives is resource-heavy.  
- **Slow adaptation:** Cannot handle streaming or real-time data efficiently.  
- **Storage issues:** Requires storing and reprocessing the entire dataset.  
- **Downtime:** Model updates are not continuous, so performance may degrade between retraining cycles.

---

### ❌ Disadvantages of Batch ML
- Not suitable for **dynamic environments** (e.g., real-time recommendation systems).  
- **Outdated models** between retraining cycles.  
- Hard to scale with **massive, evolving datasets**.

---

### ⚡ Online (Incremental) Machine Learning
- Learns **incrementally**, updating the model **as new data arrives**.
- Ideal for **streaming data** or scenarios where data is continuously generated.  
- The model evolves in **real time**, adapting to new patterns and trends.

🧠 **Example:**  
Stock price prediction systems that continuously update as new price data streams in.
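
Incremental updating can be sketched as a model that takes one gradient step per incoming observation. The class name `OnlineLinearRegressor`, its `learn_one` method, and the synthetic `stream` generator are assumptions made for this illustration; they do not refer to a specific library.

```python
class OnlineLinearRegressor:
    """One-feature linear model updated by a single SGD step per observation."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
        self.weight = 0.0
        self.bias = 0.0

    def predict(self, x):
        return self.weight * x + self.bias

    def learn_one(self, x, y):
        """Update the parameters from one (x, y) pair, then discard it."""
        error = self.predict(x) - y
        self.weight -= self.learning_rate * error * x
        self.bias -= self.learning_rate * error

def stream(n):
    """Stand-in for a live feed: yields points on the line y = 2x + 1."""
    for i in range(n):
        x = (i % 11) / 10
        yield x, 2 * x + 1

model = OnlineLinearRegressor()
for x, y in stream(3000):
    model.learn_one(x, y)      # the model is usable after every update
```

Unlike batch training, no observation is stored: each pair is used once and dropped, so memory use stays constant however long the stream runs.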

---

### 🔁 When to Use Which Type?
| Scenario | Recommended Approach |
|-----------|----------------------|
| Static data, rarely updated | **Batch Learning** |
| Large data that fits in memory | **Batch Learning** |
| Streaming or continuously updating data | **Online Learning** |
| Real-time personalization or adaptive systems | **Online Learning** |
| Concept drift* is expected | **Online Learning** |

---
\*Concept drift refers to changes in the data patterns and relationships that the ML model has learned, which can degrade the quality of the model in production.
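
One simple way to notice concept drift is to watch the model's recent prediction error. The check below is a hypothetical heuristic written for illustration (the `detect_drift` name, window size, and ratio threshold are all assumptions), not a standard named algorithm.

```python
def detect_drift(errors, window=20, ratio=2.0):
    """Return the index of the first observation at which the mean of the
    last `window` errors exceeds `ratio` times the mean of the first
    `window` errors, or None if that never happens."""
    baseline = sum(errors[:window]) / window
    for end in range(2 * window, len(errors) + 1):
        recent = errors[end - window:end]
        if sum(recent) / window > ratio * baseline:
            return end - 1
    return None

# Made-up absolute prediction errors: stable at 1.0, then jumping to 5.0.
errors = [1.0] * 100 + [5.0] * 50
drift_at = detect_drift(errors)
```

Here the error level changes at index 100 and the check fires a few observations later, once enough large errors have entered the window.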

### 🔢 Learning Rate in Online ML
- Defines **how much the model updates** with each new observation.  
- A **high learning rate** → fast adaptation, but a risk of instability.  
- A **low learning rate** → stable updates, but slower convergence.  
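
The trade-off can be seen on a tiny example: an online estimate of a noisy level, updated with two different learning rates. The signal and the `track` helper are invented for this sketch.

```python
def track(observations, learning_rate, start=0.0):
    """Online estimate of a level: move toward each observation by a
    fraction `learning_rate` of the current error."""
    estimate, history = start, []
    for obs in observations:
        estimate += learning_rate * (obs - estimate)
        history.append(estimate)
    return history

def spread(values):
    return max(values) - min(values)

# Made-up signal: true level 10 with alternating +1 / -1 noise.
signal = [10 + (1 if i % 2 == 0 else -1) for i in range(200)]

fast = track(signal, learning_rate=0.9)
slow = track(signal, learning_rate=0.05)

fast_jitter = spread(fast[-20:])   # large: keeps chasing the noise
slow_jitter = spread(slow[-20:])   # small: smooth once it has converged
```

After five observations the fast estimate is already near 10 while the slow one is still far below it; by the end, the slow estimate is the steadier of the two.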


### 🧰 Out-of-Core Learning
- Technique used when the dataset **doesn’t fit in memory (RAM)**.  
- Processes data in **mini-batches** (chunks) sequentially instead of all at once.  
- Supported in **scikit-learn** (estimators with `partial_fit`), **TensorFlow** (`tf.data` input pipelines), and **PyTorch** (`DataLoader`).
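
The chunked processing described above can be sketched with the standard library alone. In this illustration an in-memory `io.StringIO` stands in for a large CSV file on disk, and `read_chunks` and `fit_out_of_core` are hypothetical helper names; only one chunk of rows is held in memory at a time.

```python
import csv
import io
from itertools import islice

def read_chunks(file_obj, chunk_size):
    """Yield lists of (x, y) float pairs, at most chunk_size rows at a time."""
    reader = csv.reader(file_obj)
    next(reader)                                   # skip the header row
    while True:
        chunk = [(float(x), float(y)) for x, y in islice(reader, chunk_size)]
        if not chunk:
            return
        yield chunk

def fit_out_of_core(open_data, epochs=200, chunk_size=10, learning_rate=0.5):
    """Mini-batch gradient descent for y = w * x + b, one chunk in memory."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        with open_data() as f:                     # re-read the data each epoch
            for chunk in read_chunks(f, chunk_size):
                grad_w = sum((w * x + b - y) * x for x, y in chunk) / len(chunk)
                grad_b = sum((w * x + b - y) for x, y in chunk) / len(chunk)
                w -= learning_rate * grad_w
                b -= learning_rate * grad_b
    return w, b

# Stand-in for a large CSV on disk: 100 rows on the line y = 3x + 2.
xs = [(i * 7 % 100) / 100 for i in range(100)]
csv_text = "x,y\n" + "\n".join(f"{x},{3 * x + 2}" for x in xs)

w, b = fit_out_of_core(lambda: io.StringIO(csv_text))
```

With a real file, `open_data` would be something like `lambda: open(path, newline="")`, where `path` is whatever file holds the data.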

---

### ⚠️ Disadvantages of Out-of-Core Learning
- **Slower training** due to disk I/O operations.  
- **Complex implementation:** needs proper data pipeline management.  
- **Limited model support:** only some model types can be trained this way efficiently.

---

🧭 **Summary**
- **Batch ML** → Great for stability and large offline datasets.  
- **Online ML** → Essential for adaptability, streaming data, and continuous learning.  
- **Out-of-Core ML** → Bridges the gap when data size exceeds memory limits.


### 🔁 Instance-Based vs. Model-Based Learning
**Instance-based learning:**  
A machine learning approach where the system memorizes the training examples and classifies new instances by comparing them to the stored ones with a similarity measure. It is also called lazy learning because it delays computation until a new query needs to be classified. Instead of building a global model, it relies on local approximations derived from the stored instances.  
_Example:_ K-Nearest Neighbors (KNN)
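
A minimal KNN sketch makes the "lazy" aspect visible: `fit` only stores the data, and all computation happens in `predict`. The `KNNClassifier` class and the 2-D points are made up for this illustration.

```python
from collections import Counter
from math import dist

class KNNClassifier:
    """Instance-based learner: 'training' only stores the examples."""

    def __init__(self, k=3):
        self.k = k
        self.examples = []

    def fit(self, points, labels):
        self.examples = list(zip(points, labels))   # no model is built here
        return self

    def predict(self, query):
        # All the work happens at query time (lazy learning).
        by_distance = sorted(self.examples, key=lambda ex: dist(ex[0], query))
        votes = Counter(label for _, label in by_distance[: self.k])
        return votes.most_common(1)[0][0]

# Made-up 2-D points forming two groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["red", "red", "red", "blue", "blue", "blue"]
knn = KNNClassifier(k=3).fit(points, labels)
```

Calling `knn.predict((2, 2))` votes among the three closest stored points, so the training set must stay available for as long as predictions are needed.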

**Model-based learning:**  
This approach constructs a model from the training data. The model's parameters summarize the data, so the raw examples are no longer needed at prediction time. Algorithms such as linear regression, logistic regression, decision trees, and random forests build an underlying model from which predictions are made for new data points.

Reference: https://vitalflux.com/instance-based-learning-model-based-learning-differences/

## Machine Learning Development Life Cycle (MLDLC)
- Framing the problem
- Gathering the data
- Data preprocessing (missing values, duplicates, outliers, scaling issues)
- Exploratory data analysis (EDA) (visualizations, univariate and bivariate analysis, outlier detection, class imbalance)
- Feature engineering and selection
- Model training, evaluation, and selection
- Model deployment
- Testing
- Optimization and maintenance
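
To make the data preprocessing step above concrete, here is a small sketch covering three of the listed issues: duplicates, missing values, and scaling. The `preprocess` helper, the choice of median imputation, and the sample records are assumptions made for illustration; outlier handling is left out.

```python
from statistics import median

def preprocess(rows):
    """Drop exact duplicates, fill missing values (None) with the column
    median, then min-max scale every column to the range [0, 1]."""
    unique = list(dict.fromkeys(rows))                 # keeps first occurrence
    columns = [list(col) for col in zip(*unique)]
    for col in columns:
        fill = median(v for v in col if v is not None)
        col[:] = [fill if v is None else v for v in col]
        lo, hi = min(col), max(col)
        col[:] = [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in col]
    return [tuple(row) for row in zip(*columns)]

# Made-up (age, income) records with one duplicate and one missing value.
raw = [(20, 1000), (30, None), (20, 1000), (40, 3000)]
clean = preprocess(raw)
```

After this step both columns share the same 0 to 1 range, so a distance-based model would not be dominated by the larger income values.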

## Challenges in Machine Learning
- Data collection
- Insufficient data, especially labeled data
- Non-representative data: the sample does not cover the whole population (sampling bias or sampling noise)
- Poor-quality data
- Irrelevant features
- Overfitting
- Underfitting
- Software integration
- Offline learning and deployment
- Cost involved
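
Overfitting and underfitting from the list above can be demonstrated by comparing training error with error on held-out data. The sketch below uses a toy k-nearest-neighbors regressor on made-up data; the noise pattern and the noise-free validation targets are simplifications chosen to keep the example deterministic.

```python
def knn_predict(train, query, k):
    """Average the targets of the k training points closest to query."""
    nearest = sorted(train, key=lambda xy: abs(xy[0] - query))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, data, k):
    """Mean squared error of the k-NN predictions on `data`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

# Made-up data: the true relation is y = x; training targets carry +/-2 noise.
train = [(x, x + (2 if x % 2 == 0 else -2)) for x in range(20)]
validation = [(x + 0.5, x + 0.5) for x in range(19)]

# Maps k -> (training error, validation error).
results = {k: (mse(train, train, k), mse(train, validation, k)) for k in (1, 4, 20)}
```

With `k=1` the model memorizes the noisy training points (zero training error, poor validation error), which is overfitting. With `k=20` it predicts a single global average (both errors high), which is underfitting. `k=4` gives the lowest validation error of the three.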