# Machine Learning

1. For each of the following examples, describe at least one possible input and
output. Justify your answers:  
* 1.1 A self-driving car
* 1.2 Netflix recommendation system
* 1.3 Signature recognition
* 1.4 Medical diagnosis


1. Answer here

* 1.1 A self-driving car

Input(s): Sensor data about the car and its surroundings, such as camera images, lidar data, GPS coordinates, and speedometer readings.

Output: Control signals to the car's steering, acceleration, and braking systems to navigate safely to a destination.

Justification: The self-driving car needs to continuously analyse its environment and make decisions based on real-time data to navigate roads safely and efficiently.

* 1.2 Netflix recommendation system

Input: User viewing history, ratings, browsing behaviour, genre preferences, and demographic information.

Output: Personalised movie or TV show recommendations tailored to each user's tastes and preferences.

Justification: Netflix uses machine learning algorithms to analyse user data and provide recommendations that increase user engagement and satisfaction.

* 1.3 Signature recognition

Input: Images or scans of handwritten signatures.

Output: Verification or authentication of the signature, indicating whether it matches a reference signature or belongs to an authorised individual.

Justification: Signature recognition systems are used for security and authentication purposes, such as verifying the identity of individuals during financial transactions or document signings.

* 1.4 Medical diagnosis

Input: Patient medical history, symptoms, physical examination findings, laboratory test results, imaging scans (e.g., X-rays, MRIs), and genetic data.

Output: Diagnosis of the patient's medical condition, including identification of diseases, disorders, or health risks, along with recommended treatments or interventions.

Justification: Medical diagnosis systems analyse complex and diverse data to assist healthcare professionals in identifying and treating diseases accurately and efficiently, ultimately improving patient outcomes.


2. For each of the following case studies, determine whether it is appropriate to utilise regression or classification machine learning algorithms. Justify your answers:
* 2.1 Classifying emails as promotion or social based on their content and metadata.
* 2.2 Forecasting the stock price of a company based on historical data and market trends.
* 2.3 Sorting images of animals into different species based on their visual features.
* 2.4 Predicting the likelihood of a patient having a particular disease based on medical history and diagnostic test results.

2. Answer here
* 2.1 Classifying emails as promotion or social based on their content and metadata:

Appropriateness: Classification algorithm.

Justification: This is a binary classification problem: the goal is to assign each email to one of two classes (promotion or social). Classification algorithms suit this task because they learn patterns from labelled examples and predict a discrete category from features extracted from the email content and metadata.

* 2.2 Forecasting the stock price of a company based on historical data and market trends:

Appropriateness: Regression algorithm.

Justification: The target, a stock price, is a continuous variable. Regression algorithms suit this task because they learn the relationship between input features (e.g., historical stock prices, market indicators) and a continuous target, then use that relationship to predict future values.
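To make the idea of a continuous target concrete, here is a minimal sketch of a regression fit: ordinary least squares on a single feature, using only the standard library. The prices are made-up numbers and `fit_line` is a hypothetical helper, not part of the exercise; real stock forecasting would need far richer features and careful validation.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for one feature: y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical closing prices for five consecutive days (made-up numbers).
days = [1, 2, 3, 4, 5]
prices = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(days, prices)

# The prediction is a number on a continuous scale, not a class label.
predicted_day_6 = slope * 6 + intercept
```

The output can take any value on a continuous scale, which is what distinguishes this from the classification cases in 2.1 and 2.3.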

* 2.3 Sorting images of animals into different species based on their visual features:

Appropriateness: Classification algorithm.

Justification: This is a multi-class classification problem where the goal is to assign images of animals to one of several species categories based on their visual features. Classification algorithms are appropriate for this task as they can learn to distinguish between different classes and assign new images to the most likely category.

* 2.4 Predicting the likelihood of a patient having a particular disease based on medical history and diagnostic test results:

Appropriateness: Classification algorithm.

Justification: The question concerns one particular disease, so this is a binary classification problem: the patient either has the disease or does not. Although a likelihood is a number between 0 and 1, it is the probability of belonging to a class, so a probabilistic classifier (e.g., logistic regression) fits the task. Such a model learns from historical patient records with known outcomes and outputs the probability that a new patient has the disease.
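A classifier can report a likelihood and still be a classifier. The sketch below shows the shape of a logistic model: a weighted sum of features is squashed into a probability, which is then thresholded into a class. The weights, bias, and feature values are hand-picked assumptions for illustration only; in practice they would be learned from labelled patient data.

```python
import math

def sigmoid(z):
    """Map any real number to a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(features, weights, bias):
    """Logistic model: probability that the patient has the disease."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

# Hypothetical, hand-picked parameters (NOT learned from real data).
weights = [0.8, 1.5]  # e.g. one weight per diagnostic indicator
bias = -2.0

patient = [1.0, 1.0]  # hypothetical feature values for one patient
probability = predict_probability(patient, weights, bias)

# Thresholding the probability turns it into a discrete class.
label = "disease" if probability >= 0.5 else "no disease"
```

The 0.5 threshold is also an assumption; a medical setting might choose a lower threshold to reduce missed cases.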

3. For each of the following real-world problems, determine whether it is appropriate to utilise a supervised or unsupervised machine learning algorithm. Justify your answers:
* 3.1 Detecting anomalies in a manufacturing process using sensor data without prior knowledge of specific anomaly patterns.
* 3.2 Predicting customer lifetime value based on historical transaction data and customer demographics.
* 3.3 Segmenting customer demographics based on their purchase history, browsing behaviour, and preferences.
* 3.4 Analysing social media posts to categorise them into different themes.


3. Answer here
* 3.1 Detecting anomalies in a manufacturing process using sensor data without prior knowledge of specific anomaly patterns.

Appropriateness: Unsupervised learning algorithm.

Justification: Since there is no prior knowledge of specific anomaly patterns, there are no labelled examples of anomalies to learn from, so unsupervised learning is suitable. Techniques like clustering or autoencoders can learn what normal sensor data looks like and flag readings that deviate from it as anomalies.
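As a rough illustration of detecting anomalies without labels, the sketch below flags readings that lie far from the mean of the data. This z-score rule is a simpler stand-in for the clustering and autoencoder techniques named above, not an implementation of them. The readings and the threshold of 2.0 are assumptions chosen for the example.

```python
from statistics import mean, stdev

def find_anomalies(readings, threshold=2.0):
    """Return readings more than `threshold` standard deviations from the mean."""
    mu = mean(readings)
    sigma = stdev(readings)
    if sigma == 0:
        return []  # all readings identical, nothing stands out
    return [r for r in readings if abs(r - mu) / sigma > threshold]

# Hypothetical temperature readings from one sensor (made-up numbers).
readings = [20.1, 19.9, 20.0, 20.2, 19.8, 20.0, 20.1, 19.9, 35.0, 20.0]

anomalies = find_anomalies(readings)
```

No reading was ever labelled as normal or anomalous; the rule relies only on how the data is distributed. Note that in a small sample a single extreme value inflates the standard deviation, which is why a modest threshold is used here.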

* 3.2 Predicting customer lifetime value based on historical transaction data and customer demographics.

Appropriateness: Supervised learning algorithm.

Justification: This is a regression problem where the goal is to predict a continuous value (customer lifetime value) based on input features (historical transaction data and customer demographics). Supervised learning algorithms are appropriate because they learn the relationship between the features and the target variable from labelled data, here past customers whose lifetime value is already known.

* 3.3 Segmenting customer demographics based on their purchase history, browsing behaviour, and preferences.

Appropriateness: Unsupervised learning algorithm.

Justification: Customer segmentation is typically an unsupervised learning task: there are no predefined segment labels, and the goal is to group customers by their similarities in purchase history, browsing behaviour, and preferences. Clustering algorithms like K-means or hierarchical clustering are commonly used for this purpose.
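The K-means procedure mentioned above can be sketched in a few lines of plain Python: assign each customer to the nearest centroid, move each centroid to the mean of its cluster, and repeat. The customer features (orders per year, average basket value), the data, and the choice of starting centroids are all assumptions for illustration.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternate between assigning points and updating centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Squared Euclidean distance from this point to every centroid.
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical customers: (orders per year, average basket value).
customers = [(2, 15.0), (3, 20.0), (1, 10.0), (20, 80.0), (22, 95.0), (18, 90.0)]

# Start from two of the data points; real code would usually pick these at random.
centroids, clusters = kmeans(customers, [customers[0], customers[3]])
```

The algorithm is never told which customers belong together; the two segments (occasional low spenders and frequent high spenders) emerge from the data. With features on very different scales, it is common to standardise them first so that no single feature dominates the distance.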

* 3.4 Analysing social media posts to categorise them into different themes.

Appropriateness: Supervised learning algorithm, assuming the themes are predefined.

Justification: If the themes are known in advance and a set of posts has been labelled with them, this is a classification problem: a supervised algorithm learns from the labelled examples and assigns new posts to the predefined themes. If the themes are not predefined and need to be discovered, an unsupervised approach such as topic modelling is the better fit.

4. For each of the following real-world problems, determine whether it is appropriate to utilise semi-supervised machine learning algorithms. Justify your answers:
* 4.1 Predicting fraudulent financial transactions using a dataset where most transactions are labelled as fraudulent or legitimate.
* 4.2 Analysing customer satisfaction surveys where only a small portion of the data is labelled with satisfaction ratings.
* 4.3 Identifying spam emails in a dataset where the majority of emails are labelled.
* 4.4 Predicting the probability of default for credit card applicants based on their complete financial and credit-related information.


4. Answer here
* 4.1 Predicting fraudulent financial transactions using a dataset where most transactions are labelled as fraudulent or legitimate.

Appropriateness: No; a supervised learning algorithm is more appropriate.

Justification: Since most transactions are already labelled as fraudulent or legitimate, labelled data is abundant and the model can learn from it directly. Semi-supervised learning is mainly useful when labels are scarce, which is not the case here.

* 4.2 Analysing customer satisfaction surveys where only a small portion of the data is labelled with satisfaction ratings.

Appropriateness: Yes, semi-supervised learning algorithm.

Justification: Since only a small portion of the data is labelled, semi-supervised learning is suitable. It uses the small labelled set to get started and the large amount of unlabelled data to refine the model, which can give better predictions than training on the labelled portion alone.
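One common semi-supervised strategy is self-training: fit a model on the few labelled examples, label the unlabelled examples it is confident about, and refit. The sketch below uses a deliberately simple nearest-centroid model on a single numeric feature. The feature (a hypothetical 1 to 10 score derived from each survey response), the data, and the confidence margin are all assumptions for illustration.

```python
from statistics import mean

def fit_centroids(labelled):
    """Nearest-centroid 'model': the mean feature value of each class."""
    classes = {label for _, label in labelled}
    return {c: mean([x for x, label in labelled if label == c]) for c in classes}

def predict(centroids, x):
    """Return (label, margin); a larger margin means a more confident guess.

    Assumes at least two classes.
    """
    ranked = sorted(centroids, key=lambda c: abs(x - centroids[c]))
    margin = abs(x - centroids[ranked[1]]) - abs(x - centroids[ranked[0]])
    return ranked[0], margin

def self_train(labelled, unlabelled, min_margin=2.0, rounds=5):
    """Repeatedly pseudo-label confident examples and refit the centroids."""
    labelled, unlabelled = list(labelled), list(unlabelled)
    for _ in range(rounds):
        centroids = fit_centroids(labelled)
        confident, remaining = [], []
        for x in unlabelled:
            label, margin = predict(centroids, x)
            if margin >= min_margin:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:
            break  # nothing new to learn from
        labelled += confident
        unlabelled = remaining
    return fit_centroids(labelled)

# Two labelled surveys and seven unlabelled ones (made-up scores).
labelled = [(1.0, "dissatisfied"), (9.0, "satisfied")]
unlabelled = [2.0, 3.0, 8.0, 7.0, 5.5, 1.5, 8.5]

model = self_train(labelled, unlabelled)
```

The ambiguous response scored 5.5 never receives a pseudo-label because its margin stays below the threshold, which shows how self-training tries to avoid reinforcing uncertain guesses.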

* 4.3 Identifying spam emails in a dataset where the majority of emails are labelled.

Appropriateness: No; a supervised learning algorithm is more appropriate.

Justification: Since the majority of emails are labelled as spam or not spam, supervised learning is appropriate. The model can be trained effectively on the labelled data to classify new emails accurately, and the few unlabelled emails add little.

* 4.4 Predicting the probability of default for credit card applicants based on their complete financial and credit-related information.

Appropriateness: No; a supervised learning algorithm is more appropriate.

Justification: The applicants' information is described as complete, and credit risk modelling typically has plenty of labelled data (historical records of defaults and non-defaults). Supervised learning algorithms are well suited to learning from this data to predict the probability of default for new applicants. Semi-supervised learning would only be worth considering if a large share of the records lacked an outcome label, which is usually not the case in this scenario.