# Week 0 - Problem Set

Practice what you learnt about the fundamentals of Machine Learning with these problems.

__1. In a markdown block, create an IPO (Input-Process-Output) table for a traditional algorithm that converts Celsius to Fahrenheit.__

Hint: The formula is `F = (C × 9/5) + 32`


|input|process|output|
|---|---|---|
|Temperature in Celsius (C)|F = C × 9/5 + 32|Temperature in Fahrenheit (F)|
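
The process column above can be sketched as a short function. This is a minimal illustration of the traditional approach, where a person writes the rule directly; the function name `celsius_to_fahrenheit` is an assumption chosen for this example.

```python
def celsius_to_fahrenheit(celsius):
    """Apply the hand-written rule F = C * 9/5 + 32."""
    return celsius * 9 / 5 + 32


# The same input always gives the same output (deterministic).
for c in (-40, 0, 100):
    print(f"{c} C = {celsius_to_fahrenheit(c)} F")
```

Because the rule is written out explicitly, anyone reading the code can see exactly how each output was produced.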

__2. In a markdown block, create an IPO table for a machine learning model that predicts temperature conversion. How would this differ from the traditional algorithm?__

|input|process|output|
|---|---|---|
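|Temperature in Celsius (C), plus example Celsius/Fahrenheit pairs used for training|Model learns the relationship from the training examples (black box)|Predicted temperature in Fahrenheit, approximate and dependent on the training data|

Unlike the traditional algorithm, the conversion rule is not written by a programmer: the model infers it from examples, so its output is an estimate rather than an exact calculation.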
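
As a rough sketch of what "learning the conversion" could look like, the code below fits a straight line to example pairs using ordinary least squares in plain Python. The example temperatures, the helper name `fit_line`, and the choice of a simple linear model are all assumptions made for illustration, and the training pairs are generated from the known formula rather than measured.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training examples (Celsius inputs with Fahrenheit labels).
celsius_examples = [-40.0, 0.0, 20.0, 37.0, 100.0]
fahrenheit_examples = [c * 9 / 5 + 32 for c in celsius_examples]

# The "rule" is estimated from the data instead of being typed in.
slope, intercept = fit_line(celsius_examples, fahrenheit_examples)
predicted = slope * 25.0 + intercept

print(f"learned slope: {slope}, learned intercept: {intercept}")
print(f"prediction for 25 C: {predicted}")
```

The learned slope and intercept should land very close to 9/5 and 32, but only because the examples follow the formula; with noisy or unrepresentative examples the learned rule would drift.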

__3. Explain in your own words what a "black box" means in the context of machine learning. Why might this be a problem?__

A 'black box' is a system whose inputs and outputs are visible, but whose inner workings are either hidden or too complex to understand. Many machine learning models are black boxes because their behaviour comes from patterns learned during training rather than from rules a person wrote.

This can be a problem because it prevents people from knowing *how* a model arrived at its output, which makes issues within the model harder to find and fix.

__4. A company wants to build a machine learning model to predict which job applicants to interview. They plan to train it on 5 years of historical hiring data. Identify TWO potential sources of bias in this approach.__

Two potential sources of bias in this approach are human bias and algorithmic bias. Human bias: the model would learn the biases and prejudices embedded in the past hiring decisions, which may also be outdated compared with current hiring requirements. Algorithmic bias: machine learning models are prone to error and may miss the nuances that would otherwise earn someone an interview.

__5. Match each ethical principle with its description:__

| Principle | Description |
|-----------|-------------|
| A. Fairness |  1. Decisions can be explained and understood |
| B. Transparency |  2. Someone is responsible for the model's outcomes |
| C. Accountability |  3. Personal information is protected |
| D. Privacy |  4. Model treats all groups equitably |

1 - B

2 - C

3 - D

4 - A


__6. Look at the code below. Explain why the ML model doesn't give exactly 6 as the answer, even though 4 + 2 = 6.__

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Training data
X = np.array([[1, 1], [2, 2], [3, 1], [5, 3]])
y = np.array([2, 4, 4, 8])

model = LinearRegression()
model.fit(X, y)

prediction = model.predict([[4, 2]])[0]
print(f"ML Model says 4 + 2 = {prediction:.4f}")
print(f"Actual answer: 4 + 2 = 6")
```

```
ML Model says 4 + 2 = 6.0000
Actual answer: 4 + 2 = 6
```


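The model does not add 4 and 2. It estimates weights from the training data and computes its prediction with floating-point arithmetic, so the result is an approximation of 6 rather than a guaranteed exact value (hence the decimals). Here the training data happens to follow y = x1 + x2 exactly, so the prediction rounds to 6.0000; with noisy or differently patterned training data it would drift away from 6.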
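
One way to see where the prediction comes from is to look at the weights a least-squares fit learns from the same training data. The sketch below uses NumPy's `np.linalg.lstsq` rather than scikit-learn's `LinearRegression`, which is a swapped-in technique for illustration; the variable names `w1`, `w2`, and `intercept` are assumptions.

```python
import numpy as np

# Same training data as in the question.
X = np.array([[1, 1], [2, 2], [3, 1], [5, 3]], dtype=float)
y = np.array([2, 4, 4, 8], dtype=float)

# Append a column of ones so the solver can also learn an intercept.
A = np.hstack([X, np.ones((X.shape[0], 1))])
weights, *_ = np.linalg.lstsq(A, y, rcond=None)
w1, w2, intercept = weights

# The prediction is a weighted sum of the inputs, not a literal addition.
prediction = w1 * 4 + w2 * 2 + intercept

print(f"w1={w1:.4f}, w2={w2:.4f}, intercept={intercept:.4f}")
print(f"raw prediction: {prediction!r}")
print(f"close to 6: {np.isclose(prediction, 6.0)}")
```

The weights are floating-point estimates, so the raw prediction may differ from 6 by a tiny rounding error even though it prints as 6.0000 at four decimal places.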

__7. A facial recognition system is trained primarily on images of people with light skin. What type of bias is this, and what problems might it cause?__

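This is an example of data bias, because the training images do not represent everyone who will use the system. The system is likely to be less accurate for people with darker skin, for example by failing to recognise them or by matching them to the wrong person.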

__8. Consider the following scenario:__

A hospital uses a machine learning model to predict which patients need urgent care. The model was trained on historical data where patients from wealthier areas received more tests and therefore had more documented health issues.

a) What type of bias might this model have?

b) How might this affect patients from less wealthy areas?

c) What ethical principle is being violated?

a. Data bias (and likely human bias as well).

b. This will cause issues in giving urgent care to patients from poorer areas. Because they have had fewer tests (and thus fewer diagnoses), the algorithm will likely assume they are less in need of urgent care, which could lead to more preventable deaths.

c. The model violates the principles of fairness and safety, as it disadvantages those who are less wealthy, and its mistakes could be fatal.

__9. Complete the table comparing Traditional Programming and Machine Learning:__

| Aspect | Traditional Programming | Machine Learning |
|--------|------------------------|------------------|
| How rules are created | Written by humans | ? |
| Outputs | Deterministic | ? |
| Can improve without reprogramming | No | ? |
| Transparency | ? | Often a black box |
| Best for | Problems with clear rules | ? |


| Aspect | Traditional Programming | Machine Learning |
|--------|------------------------|------------------|
| How rules are created | Written by humans | Learned from training data by the algorithm |
| Outputs | Deterministic | Probabilistic predictions based on past data |
| Can improve without reprogramming | No | Yes, by retraining on new data |
| Transparency | Generally high: the rules are explicit in the code and can be read and traced, provided the source code is available (for example, open-source programs) | Often a black box |
| Best for | Problems with clear rules | Problems with patterns too complex for humans to write rules for, such as predicting and analysing large amounts of data |

__10. Read the following case study and answer the questions:__

### Case Study: Predictive Policing

A city implements a machine learning system to predict where crimes are likely to occur. The system is trained on historical arrest data from the past 10 years. Police are then sent to patrol areas the model predicts will have high crime.

**Questions:**

a) If certain neighborhoods have historically been over-policed (more police presence leading to more arrests), how might this affect the model's predictions?

b) Could this create a feedback loop? Explain how.

c) What questions should be asked before deploying such a system?

d) Who should be accountable if the system leads to unfair outcomes?

a. Areas with a higher police presence would have more arrests, leading the model to assume that these areas also have a higher crime rate.

b. Yes. If the model predicts that an area has a higher crime rate, more police are sent there, which leads to more arrests being made in that area. Those arrests are then added to the data the AI is updated with, which reinforces the original prediction.
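
The loop can be illustrated with a toy simulation. Everything here is a hypothetical assumption made for the sketch: two areas named `A` and `B` with identical underlying crime, one arrest per 10 patrols in both, starting records of 55 and 45 arrests, and a policy that sends 70 patrols to whichever area has more recorded arrests and 30 to the other.

```python
def simulate_patrols(recorded, rounds, hotspot_patrols=70, other_patrols=30):
    """Send more patrols to the area with more recorded arrests, then
    add the resulting arrests back into the records."""
    recorded = dict(recorded)  # copy so the caller's data is not modified
    for _ in range(rounds):
        hotspot = max(recorded, key=recorded.get)
        for area in recorded:
            patrols = hotspot_patrols if area == hotspot else other_patrols
            # Both areas have the same true rate: 1 arrest per 10 patrols.
            recorded[area] += patrols // 10
    return recorded


start = {"A": 55, "B": 45}  # A was historically over-policed
end = simulate_patrols(start, rounds=10)

share_start = start["A"] / sum(start.values())
share_end = end["A"] / sum(end.values())
print(f"recorded arrests after 10 rounds: {end}")
print(f"share of arrests in A: {share_start:.3f} -> {share_end:.3f}")
```

Even though both areas behave identically in this sketch, area A's share of recorded arrests keeps growing, because the records reflect where the police were sent rather than where crime happened.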

c. Before deploying such a system, its potential for bias and its ethical implications should be examined. Is the training data biased (as shown above)? Could this lead to unfair outcomes, given that areas which are less wealthy and/or have more people of colour can have higher arrest rates due to prejudice in the police system? Could it compromise safety, for example through wrongful arrests that severely impact people's lives, or by causing crime in other areas to get less attention?

d. If such a system led to unfair outcomes (which it likely would), those who programmed and trained the model should be held accountable for not considering the ethical implications, as should those who chose to use it, especially if they blindly follow the model's decisions instead of thinking critically about its outputs.

__11. EXTENSION: Research one real-world example of bias in a machine learning system (not mentioned in the lesson). Write a short paragraph describing:__

- What the system was designed to do
- What bias was discovered
- What the consequences were
- How it could have been prevented