# Fairness

### What is fairness?
An algorithm is considered unfair if it negatively impacts or disproportionately harms a group of people, such as those defined by race, gender, disability, ethnicity, etc.
Fairness-related harms include:
- Allocation- one group is favored over another
- Quality of service- a model trained for one specific scenario is applied to a more complex one, which may lead to a poorly performing service
- Stereotyping- associating a group with pre-assigned attributes
- Denigration- unfairly criticizing and/or labeling something or someone
- Over/under representation- a group appears too often or too rarely

### Detecting and Mitigating Unfairness
Unfairness can be caused by a few different things, such as:
- over-reliance on historical data in which certain groups may have been over- or under-represented
- general lack of representation in data
- unfair assumptions made during development

To mitigate unfairness, you can:
- identify harms and benefits
	- false negatives (reject but y=1)- the true label is positive but the model predicts negative
	- false positives (accept but y=0)- the true label is negative but the model predicts positive
- identify the affected groups
- define fairness metrics
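
A minimal sketch of the steps above, using only the standard library: it counts false positives and false negatives separately for each affected group and compares them. The function name `error_rates_by_group`, the group labels, and the data are made up for illustration, and the gap in false negative rate is just one possible fairness metric.

```python
from collections import defaultdict


def error_rates_by_group(y_true, y_pred, groups):
    """Return {group: {"fpr": ..., "fnr": ...}} for binary labels (0 or 1)."""
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "pos": 0, "neg": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        c = counts[group]
        if truth == 1:
            c["pos"] += 1
            if pred == 0:
                c["fn"] += 1  # rejected although y = 1
        else:
            c["neg"] += 1
            if pred == 1:
                c["fp"] += 1  # accepted although y = 0
    rates = {}
    for group, c in counts.items():
        rates[group] = {
            "fpr": c["fp"] / c["neg"] if c["neg"] else None,
            "fnr": c["fn"] / c["pos"] if c["pos"] else None,
        }
    return rates


# Made-up labels and predictions for two hypothetical groups, "a" and "b".
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = error_rates_by_group(y_true, y_pred, groups)
# One possible fairness metric: the gap in false negative rate between groups.
fnr_gap = abs(rates["a"]["fnr"] - rates["b"]["fnr"])
print(rates, fnr_gap)
```

In this toy data the model makes the same number of mistakes for both groups, but group "a" only sees false positives and group "b" only sees false negatives, which is the kind of difference an overall accuracy number hides.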

The open-source Python package Fairlearn can be used to assess and mitigate unfairness in a system.

# Techniques of Machine Learning

High-level overview of creating a machine learning process:
1. Decide on the question- start by asking a question that cannot be answered by a simple conditional or rules-based program
2. Collect and prepare data- the quality and quantity of the data determine how well the initial question can be answered. Visualizing the data is important here. This step also includes splitting the data into training and testing sets to build a model
3. Choose a training method- depending on the question and the nature of the data, choose how to train a model so that it best reflects the data and makes accurate predictions
4. Train the model- use various algorithms to train a model to recognize patterns in the data
5. Evaluate the model- use testing data to see how well your model is performing
6. Parameter tuning- based on model performance, redo the process with different parameters, which control the behavior of the algorithms used to train the model
7. Predict- give the model new inputs and check the accuracy of its predictions
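
The steps above could look roughly like this in code. This is only a sketch under assumptions that are not in these notes: scikit-learn is installed, the data is synthetic (`make_classification`), the model is a `LogisticRegression`, and the tuned parameter is its regularization strength `C`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Step 2: collect and prepare data (synthetic here) and split it.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Steps 3-6: choose a method, then train and compare a few parameter values.
# Cross-validation on the training data keeps the test set untouched.
best_c, best_score = None, -1.0
for c in (0.01, 1.0, 100.0):
    candidate = LogisticRegression(C=c, max_iter=1000)
    score = cross_val_score(candidate, X_train, y_train, cv=5).mean()
    if score > best_score:
        best_c, best_score = c, score

# Step 4 again: train the final model with the chosen parameter.
model = LogisticRegression(C=best_c, max_iter=1000)
model.fit(X_train, y_train)

# Step 5: evaluate on the held-out test data.
test_accuracy = model.score(X_test, y_test)

# Step 7: predict for new inputs (the first three test rows stand in for them).
predictions = model.predict(X_test[:3])
print(best_c, test_accuracy, predictions)
```

Here the tuning loop runs before the final evaluation so that the test set is only used once; other orderings of steps 5 and 6 are possible when a separate validation set is available.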

## Pre-building tasks

### Data
- collect data- be aware of sources and bias. Document origin
- prepare data- there can be several steps involved in this
	- collate the data and normalize it if it comes from diverse sources
	- improve quality and quantity through various methods (like converting strings to numbers), generate new data based on the original data, or randomize and shuffle the data
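
A small standard-library sketch of some of these preparation steps: converting strings to numbers, normalizing a column, and shuffling. The records and column names are hypothetical, and min-max scaling is just one common way to normalize.

```python
import random

# Hypothetical raw records collected from different sources.
raw = [
    {"color": "red", "size": "10"},
    {"color": "blue", "size": "30"},
    {"color": "red", "size": "20"},
    {"color": "green", "size": "40"},
]

# Convert strings to numbers: parse "size", map each color to an integer code.
color_codes = {c: i for i, c in enumerate(sorted({r["color"] for r in raw}))}
rows = [[color_codes[r["color"]], float(r["size"])] for r in raw]

# Normalize "size" to the range [0, 1] with min-max scaling.
sizes = [row[1] for row in rows]
lo, hi = min(sizes), max(sizes)
for row in rows:
    row[1] = (row[1] - lo) / (hi - lo)

# Shuffle with a fixed seed so the order is random but reproducible.
random.Random(0).shuffle(rows)
print(color_codes, rows)
```

Integer codes imply an order between categories (blue < green < red here), which some models will pick up on; one-hot encoding is a common alternative when that order is meaningless.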

### Features and Target
- A feature is a measurable property of the data. It is often a column of the dataset, such as "date", "size", or "color"
	- features are usually represented as `X` in code and are the input variables used to train a model
- A target is the thing you are trying to predict, usually represented as `y`

#### Selecting a feature variable
- Feature selection and extraction-
	- feature extraction creates new features from functions of the original features, whereas feature selection returns a subset of existing features
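
To make the difference concrete, here is a tiny sketch with made-up rows. The helper names `select_features` and `extract_area` and the columns are hypothetical, chosen only for illustration.

```python
# Hypothetical dataset: each row is a dict of feature name -> value.
rows = [
    {"width": 2.0, "height": 3.0, "color_code": 0},
    {"width": 4.0, "height": 1.5, "color_code": 2},
]


def select_features(rows, names):
    """Feature selection: keep a subset of the existing columns."""
    return [{name: row[name] for name in names} for row in rows]


def extract_area(rows):
    """Feature extraction: build a new column as a function of old ones."""
    return [{**row, "area": row["width"] * row["height"]} for row in rows]


selected = select_features(rows, ["width", "height"])
extracted = extract_area(rows)
print(selected)
print(extracted)
```

Selection never produces values that were not already in the data, while extraction does: both rows end up with the same `area` even though their `width` and `height` differ.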

#### Visualizing Data
Visualizing data may help uncover hidden correlations, bias, or unbalanced data.
Libraries like seaborn or matplotlib can be used for this.

#### Splitting the Dataset
The data should be split into two or more parts of unequal size, each of which still represents the data well.
- training- the model is fit to this part of the data to train it. It should be the majority of the data
- testing- a test dataset is an independent group of data, often gathered from the original data, used to confirm the performance of the built model
- validating- a validation set is not always needed, but it can be used to tune hyperparameters to improve the model
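
A minimal sketch of such a split using only the standard library. The function name `split_dataset` and the 70/15/15 proportions are assumptions for illustration, not a fixed rule.

```python
import random


def split_dataset(rows, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle a copy of rows, then cut it into train/validation/test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


data = list(range(100))  # stand-in for 100 rows of a real dataset
train, val, test = split_dataset(data)
print(len(train), len(val), len(test))
```

Shuffling before cutting matters when the rows are stored in some order (for example by date or by class), because otherwise the parts would not represent the data equally well.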

## Building a model
Using training data, build a statistical model of the data, training it with algorithms. Training a model means exposing it to data and allowing it to make assumptions about perceived patterns.
- Decide on training method
- train the model- often done by calling `model.fit`

Evaluate the model
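- model fitting- refers to how accurately the model's underlying function handles unfamiliar data
- underfitting occurs when the model is too simple to capture the patterns in the training data; overfitting occurs when it fits the training data so closely that it performs poorly on unfamiliar data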

![image.png](attachment:image.png)
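
Underfitting and overfitting can be seen by comparing the error on training data with the error on testing data. The sketch below is a toy illustration on synthetic data (a noisy straight line) with three hand-written predictors; the data, the noise level, and the predictor names are all assumptions made for this example.

```python
import random

rng = random.Random(0)
# Synthetic data: y = 2x + noise. Even x values train, odd x values test.
points = [(float(x), 2.0 * x + rng.gauss(0.0, 3.0)) for x in range(40)]
train = points[0::2]
test = points[1::2]


def mse(predict, data):
    """Mean squared error of a prediction function on (x, y) pairs."""
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)


# Too simple: always predict the mean of the training targets.
mean_y = sum(y for _, y in train) / len(train)


def predict_mean(x):
    return mean_y


# Reasonable: a least-squares straight line fitted to the training data.
mean_x = sum(x for x, _ in train) / len(train)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
         / sum((x - mean_x) ** 2 for x, _ in train))


def predict_line(x):
    return mean_y + slope * (x - mean_x)


# Too flexible: memorize the training set and copy the nearest neighbor.
def predict_nearest(x):
    return min(train, key=lambda p: abs(p[0] - x))[1]


for name, predict in [("mean", predict_mean), ("line", predict_line),
                      ("nearest", predict_nearest)]:
    print(name, round(mse(predict, train), 2), round(mse(predict, test), 2))
```

The mean predictor has a high error on both sets (underfitting). The nearest-neighbor predictor has zero training error because it memorizes the training points, so a gap between its training and testing error is the typical sign of overfitting.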