# Machine Learning

## What is ML?
Machine Learning (ML) is a subset of Artificial Intelligence (AI) that enables systems to learn from data and improve their performance over time without being explicitly programmed. It involves developing algorithms that can identify patterns, make predictions, and adapt based on new data.

## AI, ML, and Data Science: What Is the Difference?
- **Artificial Intelligence (AI)**: A broad field that aims to create intelligent machines that can simulate human intelligence, including reasoning, learning, and decision-making.
- **Machine Learning (ML)**: A subset of AI that focuses on training models to learn from data and make predictions or decisions without explicit programming.
- **Data Science**: An interdisciplinary field that uses statistical, mathematical, and computational techniques to extract insights from data. It overlaps with AI and ML, using them as tools, and also includes data engineering, visualization, and analytics.

## Steps in an ML Project
### 1. Data Collection
The first step in an ML project is collecting relevant data. Data can come from various sources such as databases, APIs, or user-generated inputs. The quality and quantity of data significantly impact the performance of the ML model.

### 2. Data Modeling
#### Problem Definition
- Clearly define the problem the ML model is intended to solve.

#### Data
- Identify the type of data available:
  - **Structured**: Data organized into tables with rows and columns.
  - **Unstructured**: Data such as images, videos, and text.
  - **Static**: Pre-collected, non-changing datasets.
  - **Streaming**: Real-time data that continuously updates.

#### Evaluation
- Determine success criteria:
  - **Accuracy**: The fraction of all predictions that are correct.
  - **Precision**: Of the instances predicted as positive, the fraction that are actually positive: TP / (TP + FP).
  - **Recall**: Of the instances that are actually positive, the fraction the model identifies: TP / (TP + FN).
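
The three metrics above can be computed directly from the counts of true and false positives and negatives. The following is a minimal sketch for a binary task, assuming labels are encoded as 0 and 1; the function names and the sample labels are made up for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def precision(y_true, y_pred):
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    # Return 0.0 when the model made no positive predictions.
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(y_true, y_pred):
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    # Return 0.0 when there are no actual positives.
    return tp / (tp + fn) if (tp + fn) else 0.0


# Illustrative labels only (not real data).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

print(accuracy(y_true, y_pred))   # 0.625
print(precision(y_true, y_pred))  # 0.6
print(recall(y_true, y_pred))     # 0.75
```

The example shows why accuracy alone can mislead: the same predictions score differently on each metric, so the right criterion depends on whether false positives or false negatives are more costly.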

#### Features
- Select relevant features that will influence the model’s predictions.
  - **Feature Variables**:
    - Numerical: Continuous values (e.g., age, height).
    - Categorical: Discrete values (e.g., gender, category labels).
  - **Target Variables**: The outcome the model aims to predict.
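
As a rough illustration of the distinction above, the sketch below separates a tiny, made-up dataset into a feature matrix `X` and a target vector `y`, and one-hot encodes the categorical column so a model can consume it as numbers. The column names (`age`, `plan`, `churned`) are hypothetical.

```python
# Hypothetical records: "age" is numerical, "plan" is categorical,
# and "churned" is the target the model should predict.
rows = [
    {"age": 34, "plan": "basic", "churned": 0},
    {"age": 51, "plan": "premium", "churned": 1},
    {"age": 27, "plan": "basic", "churned": 0},
]


def one_hot(value, categories):
    """Encode a categorical value as a list of 0/1 indicators."""
    return [1 if value == category else 0 for category in categories]


# Fix the category order so every row is encoded the same way.
categories = sorted({row["plan"] for row in rows})

X = [[row["age"]] + one_hot(row["plan"], categories) for row in rows]
y = [row["churned"] for row in rows]

print(categories)  # ['basic', 'premium']
print(X)           # [[34, 1, 0], [51, 0, 1], [27, 1, 0]]
print(y)           # [0, 1, 0]
```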

#### Modeling
- Select the appropriate model type and training strategy.
  - **Data Splitting**: Divide data into three sets:
    - **Training Set** (70-80%): Used to train the model.
    - **Validation Set** (10-15%): Used to tune hyperparameters and detect overfitting during development.
    - **Test Set** (10-15%): Used to evaluate final model performance.
  - Ensure that no example appears in more than one set; otherwise the evaluation rewards memorization rather than generalization.
  - Train the model using suitable algorithms.
  - Tune hyperparameters to optimize performance.
  - Compare different models and check each one for common problems:
    - **Underfitting**: The model is too simple and performs poorly. Solutions include increasing model complexity, adjusting hyperparameters, or training longer.
    - **Overfitting**: The model is too complex and memorizes training data. Solutions include collecting more data, simplifying the model, or using regularization techniques.
    - **Data Leakage**: Information from the validation or test set unintentionally reaches the model during training, which inflates its measured performance.
    - **Data Mismatch**: Differences in data distribution between training and real-world application.
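
The data splitting and de-duplication steps above can be sketched with the standard library alone. This is one possible approach, not a prescribed method: the function name `split_dataset`, the 80/10/10 fractions, and the synthetic examples are assumptions for illustration.

```python
import random


def split_dataset(examples, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle and split examples into train, validation, and test sets.

    Exact duplicates are dropped first so that no example can end up
    in more than one set. Examples must be hashable (e.g. tuples).
    """
    unique = list(dict.fromkeys(examples))  # drops duplicates, keeps order
    rng = random.Random(seed)               # fixed seed for reproducibility
    rng.shuffle(unique)

    n_train = int(len(unique) * train_frac)
    n_val = int(len(unique) * val_frac)

    train = unique[:n_train]
    val = unique[n_train:n_train + n_val]
    test = unique[n_train + n_val:]          # whatever remains
    return train, val, test


# Synthetic (feature, label) pairs, with two duplicates appended.
examples = [(i, i % 2) for i in range(100)] + [(0, 0), (1, 1)]
train, val, test = split_dataset(examples)

print(len(train), len(val), len(test))  # 80 10 10
```

Shuffling before splitting matters when the data is ordered (for example by date or by class). For time-dependent data a chronological split is usually more appropriate than a random one.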

#### Experiments
- Keep track of different model configurations and test alternative approaches.
- Iterate based on evaluation results.

### 3. Deployment
After a model is trained and evaluated, it is deployed into production for real-world use. This step includes:
- Integrating the model into an application or service.
- Monitoring model performance over time.
- Updating or retraining the model as new data becomes available.

## Types of ML
### 1. Supervised Learning
- **Classification**: Categorizing data into predefined labels (e.g., spam detection).
- **Regression**: Predicting continuous values (e.g., house price estimation).
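
To make the regression case concrete, here is a minimal sketch of simple linear regression fitted by ordinary least squares in plain Python. The house sizes and prices are invented so that they lie exactly on a line; real data would be noisy and would normally call for a library implementation.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data: size in square meters, price in thousands.
sizes = [50, 80, 100, 120]
prices = [110, 170, 210, 250]

slope, intercept = fit_line(sizes, prices)
predicted = slope * 90 + intercept  # predict the price of a 90 m² house

print(slope, intercept)  # 2.0 10.0
print(predicted)         # 190.0
```

The labeled `(size, price)` pairs are what make this supervised: the model learns the mapping from known inputs to known outputs and then applies it to an unseen input.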

### 2. Unsupervised Learning
- **Clustering**: Grouping data into clusters based on similarity (e.g., customer segmentation).
- **Association Rule Learning**: Discovering relationships between variables (e.g., market basket analysis).
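
Clustering can be illustrated with a small k-means sketch on one-dimensional data. The values below stand in for a hypothetical "monthly spend" feature, and the starting centroids are chosen by hand; production implementations pick initial centroids more carefully and handle many dimensions.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Run a basic k-means loop on 1-D points from given starting centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: abs(point - centroids[i]),
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Made-up spend values forming two obvious groups.
spend = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
centroids, clusters = kmeans_1d(spend, centroids=[1.0, 2.0])

print(centroids)  # [1.5, 10.5]
print(clusters)   # [[1.0, 1.5, 2.0], [10.0, 10.5, 11.0]]
```

No labels are supplied anywhere: the two groups emerge from the similarity of the values alone, which is what distinguishes this from the supervised examples.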

### 3. Reinforcement Learning
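- **Skill Acquisition**: Training an agent through trial and error to choose actions that maximize a cumulative reward (e.g., game playing, robot control).
- **Real-time Learning**: Adapting behavior continuously as a dynamic environment changes.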

### 4. Transfer Learning
Reusing a pre-trained model on a new task to improve performance and reduce training time.

