<div style="max-width:66ch;">

# Exercise - AI intro

---
These are introductory exercises in AI theory with a focus on **terminology** and **concepts**. The purpose of this exercise is to get an overview of the field of machine learning and AI. We will introduce these concepts throughout the course.

<p class = "alert alert-info" role="alert"><b>Note</b>: when you find yourself repeating code, create functions to reuse it instead.</p>

<p class = "alert alert-info" role="alert"><b>Remember</b> to use <b>descriptive variable, function, index </b> and <b> column names</b> in order to get readable code </p>

<p class = "alert alert-info" role="alert"><b>Remember</b> to format your input questions in a pedagogical way to guide the user.</p>

The number of stars (\*), (\*\*), (\*\*\*) denotes the difficulty level of the task

---

</div>

<div style="max-width:66ch;">

## 0. Glossary (\*)

You can use an LLM for help and/or search for results online, but try to summarize the meaning in your own words in order to properly learn the terminology. I don't expect you to know all of these terms now, but after this course you should know these and many more by heart.

| Terminology                | Meaning                                                                                                  |
| :------------------------- | -------------------------------------------------------------------------------------------------------- |
| supervised learning        | Model learns from labeled data to make predictions.                                                       |
| unsupervised learning      | Model learns patterns from unlabeled data.                                                                |
| machine learning           | Computers learn from data to make decisions or predictions.                                               |
| deep learning              | Subset of ML that uses neural networks with many layers to learn complex patterns.                        |
| pattern recognition        | Identifying patterns or features in data.                                                                 |
| reinforcement learning     | Learning by interacting with an environment and receiving feedback.                                        |
| data science               | Extracting insights from data using scientific methods.                                                    |
| data engineering           | Preparing and managing data for analysis.                                                                  |
| computer vision            | Teaching computers to interpret visual information.                                                        |
| algorithm                  | Step-by-step procedure for solving problems.                                                              |
| bias                       | Systematic error in predictions due to assumptions or data collection.                                     |
| variance                   | Variability in predictions due to fluctuations in training data.                                           |
| overfitting                | Model fits noise in the training data, so it performs well on training data but poorly on new data.       |
| underfitting               | Model is too simple to capture data patterns.                                                             |
| gradient descent           | Optimization algorithm that minimizes a loss by repeatedly stepping in the direction of the negative gradient. |
| transfer learning          | Adapting a model trained on one task to another task.                                                      |
| regression                 | Predicting numerical values.                                                                              |
| classification             | Predicting categories or labels.                                                                          |
| artificial neural networks | Computing systems inspired by brain neurons.                                                              |
| data augmentation          | Increasing dataset size/diversity with transformations.                                                   |
| synthetic data             | Artificially generated data mimicking real-world distributions.                                            |
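| regularization             | Techniques that constrain or penalize model complexity to reduce overfitting.                             |
| perceptron                 | Simplest form of neural network: a single neuron that applies a threshold to a weighted sum of inputs.    |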
| qualitative data           | Non-numeric data describing qualities.                                                                   |
| quantitative data          | Numeric data expressing quantities.                                                                      |
| independent variable       | Input variable (feature) used to explain or predict another variable.                                     |
| dependent variable         | Output variable (target) that is explained or predicted from the independent variables.                   |
| label                      | Known target value or category assigned to a data instance.                                               |


</div>


<div style="max-width:66ch;">

---

## 1. Machine learning, deep learning and AI

Draw an illustration of how machine learning, deep learning and artificial intelligence relate to each other and explain it with your own words.

</div>


<div style="max-width:66ch;">

---

## 2. Regression and classification

What is the main difference between regression and classification? 

&nbsp; a) Give an example of a problem that can be solved with regression. 
**Predicting house prices based on various features.**

&nbsp; b) Give an example of a problem that can be solved with classification.
**Classifying emails as spam or not spam based on their content.**


The main difference between regression and classification lies in the nature of the output they produce.

Regression deals with predicting a continuous value. It aims to establish a relationship between input variables and a continuous output variable. For example, predicting house prices based on features like size, location, number of bedrooms, etc., is a regression problem.

Classification, on the other hand, deals with predicting a categorical label or class. It categorizes data into different classes or groups based on certain features. For instance, spam email detection is a classification problem where emails are categorized as either spam or not spam based on various features like keywords, sender information, etc.
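
The contrast in output type can be sketched in a few lines of plain Python. This is a toy illustration, not a recommended workflow: the data values, the `fit_line` and `classify_email` helpers, and the spam word list are all made up for this example, and the classifier is a hand-written rule rather than a trained model.

```python
# Toy data (hypothetical): house size in square metres -> price in thousands.
sizes = [50.0, 70.0, 90.0, 110.0]
prices = [150.0, 210.0, 270.0, 330.0]


def fit_line(x_values, y_values):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    mean_x = sum(x_values) / len(x_values)
    mean_y = sum(y_values) / len(y_values)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    variance = sum((x - mean_x) ** 2 for x in x_values)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


# Regression: the prediction is a continuous number.
slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 100.0 + intercept


def classify_email(text, spam_words=("winner", "free", "prize")):
    """Toy rule (not a trained model): returns one of two discrete labels."""
    hits = sum(word in text.lower() for word in spam_words)
    return "spam" if hits >= 2 else "not spam"


# Classification: the prediction is a category.
predicted_label = classify_email("You are a WINNER, claim your free prize")
```

Here `predicted_price` can take any value on a continuous scale, while `predicted_label` can only be one of the predefined classes.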

</div>


<div style="max-width:66ch;">

---

## 3. Scaling data (\*)

What does scaling data mean, and why do some machine learning algorithms require data to be scaled? 
**Scaling data means transforming the features so that they have comparable ranges. Algorithms that rely on distances (for example k-nearest neighbors) or on gradient-based optimization are sensitive to feature magnitude: without scaling, features with large numeric ranges dominate the distance calculations and convergence becomes slower. Common scaling methods include normalization (rescaling to a fixed range such as 0 to 1) and standardization (rescaling to mean 0 and standard deviation 1).**
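
The two scaling methods named above can be sketched with only the Python standard library. The helper names `min_max_scale` and `standardize` and the income values are assumptions made for this illustration; the sketch also assumes the feature is not constant, since that would cause a division by zero.

```python
import statistics


def min_max_scale(values):
    """Normalization: rescale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    return [(value - low) / (high - low) for value in values]


def standardize(values):
    """Standardization: rescale values to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [(value - mean) / std for value in values]


# Hypothetical feature with a large numeric range.
incomes = [20000.0, 40000.0, 60000.0, 80000.0, 100000.0]

normalized = min_max_scale(incomes)
standardized = standardize(incomes)
```

After either transformation, a feature such as income no longer outweighs a feature such as age simply because its raw numbers are larger.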
</div>


<div style="max-width:66ch;">

---

## 4. Train|test split (\*)

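What is the purpose of splitting the data into a training part and a test part? 
**Splitting the data into training and test sets makes it possible to evaluate how well a machine learning model generalizes to new, unseen data. The model is fitted on the training set only, and the test set is held back to estimate real-world performance. A large gap between training and test performance reveals overfitting. Hyperparameters should be tuned on a separate validation set or with cross-validation, not on the test set, so that the test score remains an honest estimate.**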
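
A minimal sketch of such a split, using only the standard library. The function name `split_train_test`, the 30 % test fraction and the seed are assumptions chosen for this example, not values prescribed by the exercise.

```python
import random


def split_train_test(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    shuffled = rows[:]  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


samples = list(range(10))  # stand-in for ten data rows
train_rows, test_rows = split_train_test(samples)
```

Shuffling before splitting guards against any ordering in the original data, and fixing the seed makes the split reproducible.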

</div>


<div style="max-width:66ch;">

---

## 5. Data leakage (\*)

What is data leakage, why is it bad and how can you avoid it?
**Data leakage occurs when information that would not be available at prediction time, such as information from the test set or from the target itself, is used when building the model. It's bad because it leads to overly optimistic performance metrics and to models that perform poorly on genuinely new data. To avoid it, split the data before any preprocessing, fit scalers and other feature-engineering steps on the training data only, and evaluate with cross-validation in which preprocessing is refitted within each fold.**
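
One common way leakage creeps in during feature engineering is computing scaling statistics on the full dataset before splitting. The sketch below contrasts that with the leak-free version; the height values and the `scale` helper are made up for this illustration.

```python
import statistics

# Hypothetical feature values, already split into train and test.
train_heights = [150.0, 160.0, 170.0, 180.0]
test_heights = [190.0, 200.0]

# Leaky: the mean is computed on train + test, so information about the
# test set influences how the training features are transformed.
leaky_mean = statistics.fmean(train_heights + test_heights)

# Leak-free: statistics come from the training data only ...
train_mean = statistics.fmean(train_heights)
train_std = statistics.pstdev(train_heights)


def scale(values, mean, std):
    """Standardize values using the given (training) statistics."""
    return [(value - mean) / std for value in values]


# ... and the same training statistics are reused to transform the test data.
scaled_train = scale(train_heights, train_mean, train_std)
scaled_test = scale(test_heights, train_mean, train_std)
```

The leaky mean differs from the training mean, which shows that the test rows have shifted the transformation that the model would be trained on.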
</div>


<div style="max-width:66ch;">

---
## 6. Cross-validation (\*)

How does cross-validation work, and when is it good to use cross-validation?
**In k-fold cross-validation the data is split into k equally sized subsets (folds). The model is trained on k-1 folds and validated on the remaining fold, and this is repeated k times so that every fold is used for validation exactly once. The k scores are then averaged. It's good to use when you have limited data, when you want to compare models or tune hyperparameters, or when you want a more robust performance estimate than a single train-test split can give.**
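
The rotation of training and validation subsets can be sketched as an index generator. This is a simplified k-fold illustration: the function name `k_fold_indices` is hypothetical, and the sketch assumes the rows have already been shuffled, since it splits them in order.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if fold < n_samples % k else 0) for fold in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(n_samples=10, k=5))
```

Each row lands in the validation part of exactly one fold, so every data point contributes to both training and evaluation across the k rounds.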

</div>


<div style="max-width:66ch;">

---

## 7. Confusion matrix (\*)

When can you and should you use a confusion matrix? 
**You can use a confusion matrix for any classification problem where the true labels are known. It tabulates predicted classes against actual classes, which shows which kinds of mistakes the model makes (for example false positives versus false negatives). You should use it when accuracy alone is misleading, such as with imbalanced classes or when different errors have different costs, and as the basis for metrics like precision and recall.**
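
For a binary problem the four cells of the matrix can be counted by hand. The sketch below uses made-up labels and a hypothetical helper `confusion_counts`; "spam" is arbitrarily treated as the positive class.

```python
from collections import Counter


def confusion_counts(true_labels, predicted_labels, positive="spam"):
    """Count true/false positives and negatives for a binary classifier."""
    counts = Counter()
    for truth, prediction in zip(true_labels, predicted_labels):
        if prediction == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


# Hypothetical ground truth and model predictions.
true_labels = ["spam", "spam", "ham", "ham", "ham", "spam"]
predicted_labels = ["spam", "ham", "ham", "spam", "ham", "spam"]

counts = confusion_counts(true_labels, predicted_labels)
accuracy = (counts["tp"] + counts["tn"]) / len(true_labels)
precision = counts["tp"] / (counts["tp"] + counts["fp"])
recall = counts["tp"] / (counts["tp"] + counts["fn"])
```

Precision answers "of the emails flagged as spam, how many were spam?", while recall answers "of the actual spam emails, how many were caught?". Neither can be read off from accuracy alone.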

---

</div>



<div style="background-color: #FFF; color: #212121; border-radius: 20px; width:25ch; box-shadow: rgba(0, 0, 0, 0.16) 0px 1px 4px; display: flex; justify-content: center; align-items: center;">
<div style="padding: 1em; width: 60%;">
    <h2 style="font-size: 1.2rem;">Kokchun Giang</h2>
    <a href="https://www.linkedin.com/in/kokchungiang/" target="_blank" style="display: flex; align-items: center; gap: .4em; color:#0A66C2;">
        <img src="https://content.linkedin.com/content/dam/me/business/en-us/amp/brand-site/v2/bg/LI-Bug.svg.original.svg" width="20"> 
        LinkedIn profile
    </a>
    <a href="https://github.com/kokchun/Portfolio-Kokchun-Giang" target="_blank" style="display: flex; align-items: center; gap: .4em; margin: 1em 0; color:#0A66C2;">
        <img src="https://github.githubassets.com/images/modules/logos_page/GitHub-Mark.png" width="20"> 
        GitHub portfolio
    </a>
    <span>AIgineer AB</span>
    </div>
</div>