### Q1. Explain the following with an example:
1. Artificial Intelligence
2. Machine Learning
3. Deep Learning

#### **Artificial Intelligence (AI)**
   - **Definition**: The broad field of creating machines or systems that can perform tasks that typically require human intelligence.
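   - **Includes**: Subfields such as Machine Learning and Deep Learning, as well as non-learning techniques such as rule-based expert systems and search.
   - **Example**: A chess program that picks moves by searching the game tree with hand-written evaluation rules.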

#### **Machine Learning (ML)**
   - **Definition**: A subset of AI that involves training algorithms to learn patterns from data and make predictions or decisions based on it.
   - **Techniques**: Supervised learning, unsupervised learning, reinforcement learning, etc.
   - **Example**: A spam filter that learns from emails labeled as spam or not spam.

#### **Deep Learning (DL)**
   - **Definition**: A subset of Machine Learning that uses neural networks with many layers (deep neural networks) to model complex patterns in large datasets.
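   - **Applications**: Image recognition, speech recognition, and natural language processing.
   - **Example**: A convolutional neural network that classifies objects in photos from raw pixels.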

#### **Summary Table**

| **Category**         | **Description**                                                 |
|----------------------|-----------------------------------------------------------------|
| **Artificial Intelligence (AI)** | Broad field focused on creating systems that mimic human intelligence. |
| **Machine Learning (ML)**       | Subset of AI involving algorithms that learn from data.         |
| **Deep Learning (DL)**          | Subset of ML using deep neural networks to model complex patterns. |
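To make the distinction concrete, the sketch below contrasts a hand-written rule (AI in the broad sense, with no learning) with a threshold derived from labeled examples (ML). The toy data, the function names, and the midpoint rule are illustrative assumptions, not a standard algorithm.

```python
# Hypothetical toy data: (number of exclamation marks in an email, is_spam).
emails = [(0, False), (1, False), (2, False), (6, True), (8, True), (9, True)]


def rule_based_is_spam(exclamations):
    """Rule-based AI: a human picked the threshold; nothing is learned."""
    return exclamations > 3


def learn_threshold(examples):
    """ML: derive the threshold from labeled data.

    Uses the midpoint between the two class means, a deliberately
    simple learning rule chosen only for illustration.
    """
    spam = [x for x, is_spam in examples if is_spam]
    ham = [x for x, is_spam in examples if not is_spam]
    return (sum(spam) / len(spam) + sum(ham) / len(ham)) / 2


threshold = learn_threshold(emails)


def learned_is_spam(exclamations):
    """Classify with the threshold that was learned from the data."""
    return exclamations > threshold
```

A deep learning version would replace the single learned threshold with a multi-layer neural network trained on the raw email text.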

### Q2. What is Supervised Learning? List some examples of Supervised Learning.

#### Supervised Learning

**Definition**: A machine learning method where the model is trained on labeled data to predict outcomes for new data.

**Examples**:
1. **Classification**:
   - Spam detection
   - Image classification
   - Medical diagnosis

2. **Regression**:
   - House price prediction
   - Stock price forecasting
   - Temperature prediction
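As a minimal sketch of the regression case, the code below fits a straight line to labeled examples with ordinary least squares and then predicts the label of an unseen input. The house data and the name `fit_line` are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labeled data: area in square metres -> price in thousands.
areas = [50, 70, 90, 110]
prices = [150, 210, 270, 330]  # constructed as 3 * area, so the fit is exact

slope, intercept = fit_line(areas, prices)

# Prediction for a new, unseen 100 square-metre house.
predicted = slope * 100 + intercept
```

The labels (`prices`) are what make this supervised: the model is fitted by comparing its outputs against known answers.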

### Q3. What is Unsupervised Learning? List some examples of Unsupervised Learning.

#### Unsupervised Learning

**Definition**: A machine learning method where the model is trained on unlabeled data to identify patterns or groupings without explicit guidance on the outcomes.

**Examples**:
1. **Clustering**:
   - Customer segmentation
   - Image segmentation

2. **Association Rule Mining**:
   - Market basket analysis

3. **Dimensionality Reduction**:
   - Principal Component Analysis (PCA)
   - t-Distributed Stochastic Neighbor Embedding (t-SNE)
   - Feature extraction for visualizations
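Clustering can be sketched with a small k-means implementation (Lloyd's algorithm). No labels are used anywhere; the groups come only from the distances between points. The customer data is invented, and using the first `k` points as starting centroids is a simplification made here so the result is deterministic.

```python
import math


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm, with the first k points as initial centroids."""
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
              for p in points]
    return centroids, labels


# Hypothetical customers: (monthly visits, average basket size).
customers = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, labels = kmeans(customers, k=2)
```

On this data the algorithm separates the three low-activity customers from the three high-activity ones.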

### Q4. What are the differences between AI, ML, DL, and DS?

#### Differences Between AI, ML, DL, and DS:

| **Term**         | **Definition**                                                                 | **Scope**                                                      | **Examples**                                                 |
|------------------|---------------------------------------------------------------------------------|----------------------------------------------------------------|--------------------------------------------------------------|
| **Artificial Intelligence (AI)** | The broad field of creating systems that can perform tasks requiring human intelligence. | Encompasses all techniques for making machines "intelligent." | Chatbots, recommendation systems, autonomous vehicles.      |
| **Machine Learning (ML)**       | A subset of AI where systems learn from data to improve their performance without being explicitly programmed. | Focuses on algorithms and statistical models that learn from data. | Spam detection, image recognition, predictive analytics.    |
| **Deep Learning (DL)**           | A subset of ML involving neural networks with many layers (deep neural networks) to model complex patterns in large datasets. | Specializes in complex models and large-scale data.           | Speech recognition, image classification, natural language processing. |
| **Data Science (DS)**            | The interdisciplinary field of extracting insights and knowledge from data through various techniques, including statistics, data analysis, and machine learning. | Overlaps with AI and ML; also covers data preparation, analysis, and visualization. | Data-driven decision-making, business intelligence, data visualization. |

#### Summary
- **AI** is the broad field aiming to create intelligent systems.
- **ML** is a subset of AI focused on algorithms that learn from data.
- **DL** is a subset of ML that uses deep neural networks for more complex learning.
- **DS** is an interdisciplinary field that overlaps with AI and ML and applies statistics, data analysis, and machine learning to derive insights from data.

### Q5. What are the main differences between supervised, unsupervised, and semi-supervised learning?

| Feature                      | Supervised Learning                                | Unsupervised Learning                             | Semi-Supervised Learning                          |
|------------------------------|----------------------------------------------------|--------------------------------------------------|--------------------------------------------------|
| **Definition**               | Learning from labeled data                         | Learning from unlabeled data                      | Learning from a mix of labeled and unlabeled data |
| **Data Requirement**         | Requires labeled data                              | Requires unlabeled data                           | Requires both labeled and unlabeled data          |
| **Output**                   | Predicts the output based on input data            | Finds hidden patterns or intrinsic structures     | Predicts outputs, using unlabeled data to improve the model |
| **Example Algorithms**       | Linear Regression, Decision Trees, SVM, etc.       | K-Means, PCA, Hierarchical Clustering, etc.       | Semi-Supervised SVM, Semi-Supervised Clustering   |
| **Use Case**                 | Classification, Regression                         | Clustering, Association                           | When labeled data is expensive, but unlabeled data is abundant |
| **Goal**                     | Map inputs to outputs                              | Discover structure such as groups or lower-dimensional representations | Improve learning accuracy using fewer labels      |
| **Complexity**               | Moderate to high, depending on the algorithm       | Varies by algorithm; results are harder to evaluate without labels | Varies depending on the amount of labeled data and the algorithm used |
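The semi-supervised column can be illustrated with self-training, one common semi-supervised strategy: a model fitted on the few labeled points labels the unlabeled point it is most confident about, adds it to the labeled set, and repeats. The sketch below uses a nearest-centroid classifier on one-dimensional data; the data, the function names, and the confidence measure are illustrative assumptions.

```python
def class_centroids(labeled):
    """Mean feature value per class, computed from (value, label) pairs."""
    sums = {}
    for x, y in labeled:
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    return {y: total / count for y, (total, count) in sums.items()}


def self_train(labeled, unlabeled):
    """Self-training: repeatedly pseudo-label the most confident point.

    Assumes at least two classes appear in `labeled`.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    while pool:
        cents = class_centroids(labeled)

        def confidence(x):
            # Gap between the distances to the two closest class centroids.
            distances = sorted(abs(x - c) for c in cents.values())
            return distances[1] - distances[0]

        best = max(pool, key=confidence)
        label = min(cents, key=lambda y: abs(best - cents[y]))
        labeled.append((best, label))  # treat the prediction as a label
        pool.remove(best)
    return labeled


# Two labeled points and six unlabeled ones (hypothetical values).
seed_labels = [(1.0, "A"), (9.0, "B")]
unlabeled = [2.0, 3.0, 8.0, 7.0, 4.0, 6.0]
result = dict(self_train(seed_labels, unlabeled))
```

Only two labels were supplied, yet every point ends up labeled. The risk of this approach is that an early wrong pseudo-label is reinforced in later rounds.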

### Q6. What is a train, test, and validation split? Explain the importance of each term.

#### Train, Test, and Validation Split

A **train, test, and validation split** divides a dataset into three non-overlapping parts so that a model can be fitted, tuned, and evaluated on separate data, which gives a realistic picture of how well it generalizes. Each term and its importance:

| **Term**          | **Definition** | **Importance** |
|-------------------|----------------|----------------|
| **Training Set**  | The portion of the dataset used to train the model. | Used to fit the model, allowing it to learn patterns and relationships from the data. It directly influences the model's performance and accuracy. |
| **Validation Set**| A subset of the dataset used to tune model hyperparameters and evaluate the model during training. | Guides hyperparameter choices and helps detect overfitting, because it measures performance on data the model was not fitted on. Since it is reused for tuning, its estimate becomes optimistic over time, which is why a separate test set is needed. |
| **Test Set**      | The portion of the dataset used to assess the model's final performance. | Provides an unbiased evaluation of the final model. It measures how well the model generalizes to new, unseen data and helps ensure the model's reliability and robustness. |
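A minimal sketch of such a split, using only the standard library. The function name and the 70/15/15 proportions are assumptions chosen for illustration; suitable proportions depend on the dataset size.

```python
import random


def train_val_test_split(items, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle once, then cut the data into three non-overlapping parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# 100 hypothetical sample indices.
train, val, test = train_val_test_split(range(100))
```

Shuffling before cutting avoids a split that follows the original ordering of the data. For time series, a chronological split is usually more appropriate than a shuffled one.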


### Q7. How can unsupervised learning be used in anomaly detection?

#### Using Unsupervised Learning for Anomaly Detection

**Unsupervised learning** can detect anomalies (outliers) without labeled data. Here’s how:

1. **Clustering**:
   - **Method**: Group similar data points (e.g., K-means, DBSCAN).
   - **Detection**: Points far from every cluster centroid, in very small clusters, or labeled as noise by DBSCAN are treated as anomalies.

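2. **Density- and Isolation-Based Methods**:
   - **Method**: Estimate data density (e.g., Kernel Density Estimation) or measure how easily a point is separated from the rest by random splits (Isolation Forest).
   - **Detection**: Points in low-density regions, or points that are isolated after only a few splits, are anomalies.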

3. **Principal Component Analysis (PCA)**:
   - **Method**: Project the data onto a few principal components, then reconstruct it from that projection.
   - **Detection**: Points with large reconstruction errors are anomalies.

4. **Autoencoders**:
   - **Method**: A neural network learns to compress the data and reconstruct it from the compressed representation.
   - **Detection**: Points with high reconstruction errors are anomalies.

#### Example Workflow

1. **Data Collection**: Gather dataset.
2. **Preprocessing**: Normalize/scale data.
3. **Algorithm Selection**: Choose appropriate method.
4. **Model Training**: Fit the model on the unlabeled dataset.
5. **Anomaly Scoring**: Score each point.
6. **Threshold Setting**: Classify based on score threshold.
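7. **Evaluation**: If some labeled anomalies are available, assess performance with metrics such as precision and recall; otherwise, review flagged points manually.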
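The workflow steps can be sketched end to end with a very simple detector: standardize the features, treat the data centroid as the "model", score each point by its distance from that centroid, and flag scores above a threshold. The data, the threshold of 2.0, and all function names are assumptions for illustration; a real system would use one of the methods listed earlier.

```python
import math
import statistics


def standardize(rows):
    """Preprocessing: scale each feature to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    stdevs = [statistics.pstdev(c) or 1.0 for c in columns]  # avoid dividing by 0
    return [tuple((v - m) / s for v, m, s in zip(row, means, stdevs))
            for row in rows]


def anomaly_scores(rows):
    """Training and scoring: distance of each point from the data centroid."""
    centroid = tuple(statistics.fmean(c) for c in zip(*rows))
    return [math.dist(row, centroid) for row in rows]


def precision_recall(flags, truth):
    """Evaluation: only possible when some true labels are known."""
    tp = sum(1 for f, t in zip(flags, truth) if f and t)
    fp = sum(1 for f, t in zip(flags, truth) if f and not t)
    fn = sum(1 for f, t in zip(flags, truth) if not f and t)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical sensor readings; the last row is the known anomaly.
rows = [(10, 100), (11, 102), (9, 98), (10, 101), (12, 99),
        (11, 100), (9, 101), (10, 99), (40, 400)]
truth = [False] * 8 + [True]

scaled = standardize(rows)
scores = anomaly_scores(scaled)
flags = [s > 2.0 for s in scores]  # threshold chosen by hand for this data
precision, recall = precision_recall(flags, truth)
```

The threshold is the main design choice: lowering it catches more anomalies at the cost of more false alarms.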

#### Practical Example: Fraud Detection

- **Data**: Transaction records.
- **Algorithm**: Isolation Forest.
- **Process**:
  - Train model on historical data.
  - Score each transaction.
  - Classify high-score transactions as potential fraud.
  - Investigate flagged transactions further.
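The process above can be sketched with a simplified, from-scratch isolation forest. This is a teaching sketch rather than a production implementation: it omits the subsampling used by the full algorithm, the transaction data is invented, and the 0.6 threshold is an assumption. The score follows the usual isolation forest form, `2 ** (-mean_depth / c(n))`, so values near 1 indicate points that are isolated after very few random splits.

```python
import math
import random


def build_tree(rows, rng, depth=0, max_depth=8):
    """Recursively split rows at random feature values; None marks a leaf."""
    if len(rows) <= 1 or depth >= max_depth:
        return None
    feature = rng.randrange(len(rows[0]))
    values = [row[feature] for row in rows]
    low, high = min(values), max(values)
    if low == high:
        return None  # simplification: stop if the chosen feature is constant
    split = rng.uniform(low, high)
    left = [row for row in rows if row[feature] < split]
    right = [row for row in rows if row[feature] >= split]
    return (feature, split,
            build_tree(left, rng, depth + 1, max_depth),
            build_tree(right, rng, depth + 1, max_depth))


def path_length(tree, row, depth=0):
    """Number of splits needed to reach the leaf that contains `row`."""
    if tree is None:
        return depth
    feature, split, left, right = tree
    child = left if row[feature] < split else right
    return path_length(child, row, depth + 1)


def expected_path_length(n):
    """Normalizing constant c(n), using the harmonic number approximation."""
    if n <= 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_scores(rows, n_trees=100, seed=0):
    """Train on historical data, then score every transaction."""
    rng = random.Random(seed)
    trees = [build_tree(rows, rng) for _ in range(n_trees)]
    c = expected_path_length(len(rows))
    scores = []
    for row in rows:
        mean_depth = sum(path_length(t, row) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_depth / c))
    return scores


# Hypothetical transactions: (amount, number of items). The last is unusual.
transactions = [(20, 1), (25, 2), (30, 3), (35, 4), (40, 5),
                (45, 6), (50, 7), (55, 8), (5000, 90)]

scores = anomaly_scores(transactions)
# Flag high-score transactions for further investigation.
flagged = [i for i, s in enumerate(scores) if s > 0.6]
```

The unusual transaction is separated from the others by almost any first split, so its average path is short and its score is the highest in the set.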

**Benefits**:
- **Scalability**: Handles large datasets.
- **Flexibility**: No need for labeled data.
- **Adaptability**: Can be retrained on new data to pick up new kinds of anomalies.

### Q8. List some commonly used supervised learning algorithms and unsupervised learning algorithms.

#### Commonly Used Supervised Learning Algorithms

1. **Linear Regression**:
   - **Usage**: Predicting continuous values.
   - **Example**: House price prediction.

2. **Logistic Regression**:
   - **Usage**: Binary classification (with multinomial extensions for more than two classes).
   - **Example**: Spam detection in emails.

3. **Decision Trees**:
   - **Usage**: Classification and regression.
   - **Example**: Customer churn prediction.

4. **Support Vector Machines (SVM)**:
   - **Usage**: Classification and regression.
   - **Example**: Image classification.

5. **K-Nearest Neighbors (KNN)**:
   - **Usage**: Classification and regression.
   - **Example**: Handwriting recognition.

6. **Naive Bayes**:
   - **Usage**: Classification.
   - **Example**: Text classification.

7. **Random Forest**:
   - **Usage**: Classification and regression.
   - **Example**: Loan default prediction.

8. **Gradient Boosting Machines (GBM)**:
   - **Usage**: Classification and regression.
   - **Example**: Fraud detection.

9. **Neural Networks**:
   - **Usage**: Classification and regression.
   - **Example**: Image and speech recognition.
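As one concrete instance from this list, K-Nearest Neighbors is short enough to sketch in full: a query point receives the majority label of its `k` closest training points. The training data and the name `knn_predict` are illustrative assumptions.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbors."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical labeled points: (features, label).
training_data = [
    ((1.0, 1.0), "low"), ((1.5, 2.0), "low"), ((2.0, 1.5), "low"),
    ((8.0, 8.0), "high"), ((8.5, 9.0), "high"), ((9.0, 8.5), "high"),
]

near_low = knn_predict(training_data, (2.0, 2.0))
near_high = knn_predict(training_data, (8.0, 9.0))
```

KNN has no training phase beyond storing the data, so its cost is paid at prediction time, when distances to the stored points are computed.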

#### Commonly Used Unsupervised Learning Algorithms

1. **K-Means Clustering**:
   - **Usage**: Clustering data into K groups.
   - **Example**: Customer segmentation.

2. **Hierarchical Clustering**:
   - **Usage**: Creating a hierarchy of clusters.
   - **Example**: Gene sequence analysis.

3. **DBSCAN (Density-Based Spatial Clustering of Applications with Noise)**:
   - **Usage**: Clustering based on data density.
   - **Example**: Identifying clusters of points in spatial data.

4. **Principal Component Analysis (PCA)**:
   - **Usage**: Dimensionality reduction.
   - **Example**: Reducing features for visualization.

5. **Independent Component Analysis (ICA)**:
   - **Usage**: Signal separation.
   - **Example**: Separating mixed audio signals.

6. **Autoencoders**:
   - **Usage**: Data compression and anomaly detection.
   - **Example**: Image denoising.

7. **t-Distributed Stochastic Neighbor Embedding (t-SNE)**:
   - **Usage**: Data visualization.
   - **Example**: Plotting high-dimensional data, such as word embeddings, in two dimensions.

8. **Gaussian Mixture Models (GMM)**:
   - **Usage**: Probabilistic clustering.
   - **Example**: Clustering customer data with soft assignments.
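As one concrete instance from this list, DBSCAN can be sketched in a few dozen lines. Unlike K-Means, it does not need the number of clusters in advance, and it marks isolated points as noise. The point data and the `eps` and `min_pts` values are assumptions for illustration, and the neighbor search is a plain linear scan rather than the spatial index a library would use.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point
    return labels


# Two dense groups and one isolated point (hypothetical coordinates).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

On this data the two dense groups become clusters 0 and 1, and the isolated point is labeled as noise, which is also why DBSCAN is useful for anomaly detection.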