# **Machine Learning** 

Machine learning (ML) is a subset of artificial intelligence (AI) that enables systems to learn from data, identify patterns, and make decisions or predictions without being explicitly programmed for each task. Unlike traditional programming, where the rules are written by hand, ML systems typically improve their performance as they are trained on more data.

___

# **1. Definition of Machine Learning**

- [Machine Learning Blogs](https://mailchimp.com/resources/types-of-machine-learning-alogrithms/)

Arthur Samuel (1959) is commonly credited with defining it as:

__"The field of study that gives computers the ability to learn without being explicitly programmed."__

In practical terms, ML involves:

* __Input:__ Feeding data into an algorithm.

* __Learning:__ The algorithm detects patterns and relationships in the data.

* __Output:__ The system makes predictions or decisions based on learned patterns.
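
The three stages above can be sketched in a few lines of plain Python. This is a minimal illustration, not a recommended implementation: the study-hours data is made up, and ordinary least squares on a single feature stands in for "the algorithm".

```python
# Input: hypothetical (hours studied, exam score) pairs.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 54.0, 56.0, 58.0, 60.0]

def learn_line(xs, ys):
    """Learning: fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = learn_line(hours, scores)

def predict(x):
    """Output: apply the learned pattern to a new input."""
    return slope * x + intercept

print(predict(6.0))  # 62.0 for this toy data (slope 2, intercept 50)
```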

___


# **2. Core Components of Machine Learning**

To understand ML in depth, it helps to break it into its key components:

### (a) Data

Data is the foundation of ML.

Types of data:

* __Structured__ (e.g., tabular data: sales, numbers, categories)

* __Unstructured__ (e.g., images, videos, text, audio)

Quality and quantity of data heavily affect model performance.

### (b) Features and Labels

* __Features (X):__ Input variables or characteristics describing each instance (e.g., height, weight).

* __Labels (y):__ The target variable we want to predict (e.g., "disease: yes/no").
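
As a small sketch of how features and labels are usually separated, the snippet below builds `X` and `y` from a few invented patient records. The field names (`height_cm`, `weight_kg`, `disease`) and values are assumptions for illustration only.

```python
# Hypothetical patient records: each dictionary is one instance.
records = [
    {"height_cm": 170, "weight_kg": 65, "disease": "no"},
    {"height_cm": 160, "weight_kg": 82, "disease": "yes"},
    {"height_cm": 182, "weight_kg": 77, "disease": "no"},
]
feature_names = ["height_cm", "weight_kg"]

# Features (X): one row of input values per instance.
X = [[record[name] for name in feature_names] for record in records]

# Labels (y): the target to predict, encoded as 1 ("yes") or 0 ("no").
y = [1 if record["disease"] == "yes" else 0 for record in records]

print(X)  # [[170, 65], [160, 82], [182, 77]]
print(y)  # [0, 1, 0]
```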

### (c) Model

* A mathematical or computational representation of the relationship between input and output.

* Models learn by adjusting parameters during training.

### (d) Algorithm

An algorithm is a method or set of rules used to find patterns in data and optimize the model (e.g., gradient descent).

### (e) Loss/Cost Function

* A mathematical function measuring how far predictions are from actual values.

* The model tries to minimize this loss.
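
To show how the model, algorithm, and loss function fit together, here is a minimal sketch that assumes a one-parameter model `y_hat = w * x`, mean squared error as the loss, and plain gradient descent as the algorithm. The data, learning rate, and step count are illustrative choices.

```python
def mse_loss(w, xs, ys):
    """Loss: mean squared error of the model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mse_gradient(w, xs, ys):
    """Derivative of the MSE loss with respect to the parameter w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def gradient_descent(xs, ys, learning_rate=0.05, steps=200):
    """Algorithm: repeatedly move w against the gradient to reduce the loss."""
    w = 0.0  # model parameter, adjusted during training
    for _ in range(steps):
        w -= learning_rate * mse_gradient(w, xs, ys)
    return w

xs = [1.0, 2.0, 3.0]
ys = [3.0, 6.0, 9.0]  # toy data generated by y = 3 * x

w = gradient_descent(xs, ys)
print(w, mse_loss(w, xs, ys))  # w approaches 3 and the loss approaches 0
```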


<img src="../images/ml-lifcycle.png" alt="Life Cycle of ML" width="300" />


___

# **3. How Machine Learning Works**

The ML process typically involves these steps:

* __Collect Data:__ Gather relevant, representative, and high-quality data.

* __Preprocess Data:__ Clean the data (e.g., handle missing values) and transform it (e.g., normalize features).

* __Split Data:__

    * Training set (to teach the model).

    * Validation/test set (to tune the model and check performance on unseen data).

* __Choose a Model & Algorithm:__ Pick an approach suited to the problem (e.g., linear regression, decision tree, neural network).

* __Train the Model:__ Feed training data and adjust model parameters to minimize loss.

* __Evaluate Performance:__ Measure accuracy, precision, recall, F1 score, etc.

* __Deploy the Model:__ Use it to predict on new, unseen data.

* __Iterate & Improve:__ Retrain and optimize with new data.
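
The steps above can be sketched end to end in plain Python. Everything here is an assumption made for illustration: the message-length data is invented, the split is a simple ordered 75/25 cut, and a nearest-centroid classifier stands in for whichever model would actually be chosen.

```python
# Collect: hypothetical (message_length, is_spam) rows; None marks a missing value.
raw = [(120, 0), (900, 1), (None, 0), (150, 0), (850, 1),
       (100, 0), (950, 1), (130, 0), (880, 1)]

# Preprocess (clean): drop rows with a missing feature.
clean = [(x, y) for x, y in raw if x is not None]

# Split: first 75% for training, the rest held out as unseen test data.
cut = int(0.75 * len(clean))
train, test = clean[:cut], clean[cut:]

# Preprocess (transform): min-max scaling fitted on the training set only,
# so no information from the test set leaks into training.
lo = min(x for x, _ in train)
hi = max(x for x, _ in train)

def scale(x):
    return (x - lo) / (hi - lo)

# Choose and train a model: the parameters are the mean scaled feature per class.
def fit(rows):
    centroids = {}
    for label in {y for _, y in rows}:
        values = [scale(x) for x, y in rows if y == label]
        centroids[label] = sum(values) / len(values)
    return centroids

def predict(centroids, x):
    return min(centroids, key=lambda label: abs(scale(x) - centroids[label]))

model = fit(train)

# Evaluate: compare predictions with the true labels of the test set.
def classification_metrics(y_true, y_pred):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

y_true = [y for _, y in test]
y_pred = [predict(model, x) for x, _ in test]
metrics = classification_metrics(y_true, y_pred)
print(metrics)

# Deploy: predict on a new, unseen input.
print(predict(model, 700))
```

With only two held-out rows the metrics are not meaningful in themselves; the point is the order of operations, in particular that scaling statistics come from the training set.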

<img src="../images/Machine_learning_works.png" alt="Working of Machine Learning" width="400"/>








___

# **4. Types of Machine Learning**

ML is broadly categorized into:

### (a) Supervised Learning
__Definition:__ The model is trained on labeled data (inputs and correct outputs are known).

__Goal:__ Learn a mapping from input features to output labels.

__Examples:__

* Predicting house prices (regression)

* Email spam detection (classification)

__Algorithms:__ Linear regression, logistic regression, SVM, decision trees, random forest, neural networks.
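
As one possible illustration of supervised classification, the sketch below trains a single-feature logistic regression with batch gradient descent. The spam data (count of suspicious words per email) is invented, and the learning rate and epoch count are assumptions that happen to work for this toy set.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit p(spam | x) = sigmoid(w * x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error per example: predicted probability minus true label.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * sum(errors) / n
    return w, b

# Hypothetical labeled data: suspicious-word count per email, 1 = spam.
xs = [0, 1, 2, 6, 7, 8]
ys = [0, 0, 0, 1, 1, 1]

w, b = train_logistic_regression(xs, ys)

def predict(x):
    """Classify as spam (1) when the predicted probability is at least 0.5."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0

print([predict(x) for x in xs])  # [0, 0, 0, 1, 1, 1]
```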

### (b) Unsupervised Learning

__Definition:__ The model is trained on unlabeled data (no predefined output).

__Goal:__ Discover patterns, structure, or relationships.

__Examples:__

* Market segmentation (clustering)

* Anomaly detection

__Algorithms:__ K-means clustering, hierarchical clustering, PCA (dimensionality reduction).
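
A minimal K-means sketch, assuming 2-D points and a deliberately simple initialization (the first `k` points become the starting centroids; real implementations usually pick them more carefully). The customer data is hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math

def k_means(points, k, max_iters=100):
    """Cluster points by alternating assignment and centroid-update steps."""
    # Assumption: use the first k points as initial centroids (deterministic).
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering is stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(
                    sum(coords) / len(members) for coords in zip(*members)
                )
    return labels, centroids

# Hypothetical customers: (visits per month, average basket size).
customers = [(1, 1), (9, 9), (1, 2), (8, 9), (2, 1), (9, 8)]
labels, centroids = k_means(customers, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

No labels are supplied; the two segments emerge from the distances between points alone.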

### (c) Reinforcement Learning (RL)

__Definition:__ The model learns by interacting with an environment, receiving rewards or penalties.

__Goal:__ Learn a sequence of actions that maximize long-term rewards.

__Examples:__

* Game-playing AI (e.g., AlphaGo)

* Robotics navigation

__Key Concepts:__

* Agent (decision-maker)

* Environment (the world the agent acts in)

* Reward signal (feedback that scores the agent's actions)
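
To make the agent, environment, and reward signal concrete, here is a small sketch using tabular Q-learning, one common RL algorithm chosen here for illustration. The five-cell corridor environment, the reward of 1 at the goal, and the hyperparameters are all assumptions.

```python
import random

N_STATES = 5      # corridor cells 0..4; cell 4 is the goal
ACTIONS = [0, 1]  # 0 = move left, 1 = move right

def step(state, action):
    """Environment: return (next_state, reward, done) for one move."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0  # reward signal: only reaching the goal pays
    return next_state, reward, done

def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Agent: learn action values Q[state][action] from experience."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(200):  # cap on episode length
            action = rng.choice(ACTIONS)  # explore uniformly at random
            next_state, reward, done = step(state, action)
            # Q-learning target: immediate reward plus discounted best future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q

q_table = train()
# Greedy policy: in each non-goal cell, pick the action with the higher value.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)  # [1, 1, 1, 1]: always move right, toward the goal
```

The discount factor `gamma` is what makes the agent value long-term reward: cells closer to the goal end up with higher values than cells farther away.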


<img src="../images/ml-types.png" alt="Types of Machine Learning" width="400" height="250"/>



### (d) Semi-supervised Learning

__Definition:__ The model is trained on a small amount of labeled data combined with a large amount of unlabeled data.

__Applications:__ Medical diagnosis (where labeled data is scarce).
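
One simple way to use unlabeled data is self-training (pseudo-labeling), sketched below under strong simplifying assumptions: a single numeric feature, a nearest-centroid classifier, made-up measurements, and every pseudo-label accepted regardless of confidence (practical systems usually keep only confident ones).

```python
def fit_centroids(labeled):
    """Mean feature value per class from (value, label) pairs."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))

labeled = [(1.0, "healthy"), (9.0, "sick")]  # small labeled set
unlabeled = [1.5, 2.0, 8.0, 8.5, 2.5, 7.5]   # larger unlabeled set

# Step 1: train on the labeled data only.
model = fit_centroids(labeled)
# Step 2: let the model label the unlabeled data (pseudo-labels).
pseudo_labeled = [(x, predict(model, x)) for x in unlabeled]
# Step 3: retrain on labeled and pseudo-labeled data together.
model = fit_centroids(labeled + pseudo_labeled)

print(model)  # {'healthy': 1.75, 'sick': 8.25}
```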

### (e) Self-supervised Learning

__Definition:__ The system generates its own training labels from raw data, for example by predicting a hidden or upcoming part of the input (used to pretrain large language models such as GPT).
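
A rough sketch of how labels can come from the raw data itself: each word in a sentence becomes the target for the words that precede it. Whitespace tokenization and a fixed two-word context are simplifying assumptions; real language models use subword tokens and much longer contexts.

```python
def next_word_pairs(text, context_size=2):
    """Turn raw text into (context, target) training pairs with no manual labels."""
    tokens = text.lower().split()
    pairs = []
    for i in range(context_size, len(tokens)):
        context = tokens[i - context_size:i]  # input: the preceding words
        target = tokens[i]                    # label: the word that follows
        pairs.append((context, target))
    return pairs

pairs = next_word_pairs("Machine learning systems learn from data")
for context, target in pairs:
    print(context, "->", target)
# ['machine', 'learning'] -> systems
# ['learning', 'systems'] -> learn
# ['systems', 'learn'] -> from
# ['learn', 'from'] -> data
```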


___

# **5. Important Concepts in Machine Learning**

* __Overfitting:__ Model performs well on training data but poorly on unseen data (memorization instead of learning patterns).

* __Underfitting:__ Model is too simple to capture the complexity of data.

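* __Bias-Variance Tradeoff:__ Making a model more flexible usually lowers bias but raises variance, so the goal is a balance that minimizes error on unseen data.

    * __Bias:__ Error due to oversimplified assumptions (leads to underfitting).

    * __Variance:__ Error due to sensitivity to noise in the training data (leads to overfitting).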

* __Regularization:__ Techniques (e.g., L1, L2) to reduce overfitting.

* __Cross-validation:__ Splitting the data into several train/validation folds to get a more reliable estimate of model performance.
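
As an illustration of cross-validation, the sketch below generates k-fold index splits by hand. The function name `k_fold_indices` is hypothetical, and the folds are contiguous and unshuffled for readability; in practice the data is usually shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

for train_idx, val_idx in k_fold_indices(6, 3):
    print(train_idx, val_idx)
# [2, 3, 4, 5] [0, 1]
# [0, 1, 4, 5] [2, 3]
# [0, 1, 2, 3] [4, 5]
```

Each sample is used for validation exactly once, and the k validation scores are typically averaged.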

___

# **6. Advanced Topics**

* __Deep Learning (DL):__ ML subset using neural networks with many layers (e.g., CNNs, RNNs, Transformers).

* __Transfer Learning:__ Reusing a model pre-trained on one task as the starting point for a new, related task.

* __Online Learning:__ The model updates incrementally as new data streams in, rather than being retrained from scratch.

* __Explainable AI (XAI):__ Making ML models interpretable to humans.

* __Generative AI:__ Models that create new data (e.g., GPT for text, Stable Diffusion for images).
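
Of the topics above, online learning is the easiest to sketch briefly. The example assumes a linear model updated by one stochastic gradient step per incoming example; the class name `OnlineLinearModel`, the simulated stream, and the learning rate are all hypothetical.

```python
class OnlineLinearModel:
    """Linear model y_hat = w * x + b, updated one example at a time."""

    def __init__(self, learning_rate=0.05):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def update(self, x, y):
        """Take one gradient step on half the squared error of a single example."""
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error

def data_stream(n_batches):
    """Hypothetical stream: noise-free samples of y = 2 * x + 1."""
    for _ in range(n_batches):
        for x in (0.0, 1.0, 2.0, 3.0):
            yield x, 2.0 * x + 1.0

model = OnlineLinearModel()
for x, y in data_stream(1000):
    model.update(x, y)  # the model stays usable after every single update

print(round(model.w, 3), round(model.b, 3))  # should approach 2 and 1
```

Because each update touches only one example, the full dataset never has to be held in memory.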


<img src="../images/bias.png" width="300" />


___

# **7. Applications of Machine Learning**

ML is used across many domains:

* __Healthcare:__ Disease diagnosis, drug discovery.

* __Finance:__ Fraud detection, credit scoring.

* __Retail:__ Recommendation systems.

* __Transportation:__ Self-driving cars.

* __NLP:__ Chatbots, translation.

* __Computer Vision:__ Face recognition, object detection.

___

# **8. Challenges in Machine Learning**

* Data quality and quantity (garbage in → garbage out).

* Interpretability (black-box models).

* Ethical concerns (bias, fairness, privacy).

* Scalability (training on massive datasets).

* Deployment (integrating models into real-world systems).

___

# **9. ML vs. AI vs. DL**

* __AI:__ Broad field of building intelligent systems.

* __ML:__ Subfield of AI focused on data-driven learning.

* __DL:__ Subfield of ML focusing on neural networks with many layers.