#### About

> Bias and Fairness

Bias and fairness are important ethical considerations in machine learning. Bias refers to systematic errors in a model's predictions, while fairness refers to the model's equitable treatment of different groups, regardless of their demographic or other characteristics. Machine learning models learn patterns from data, so if the training data is skewed, the model's predictions may be skewed as well. Bias can take several forms, such as sampling bias, measurement bias, label bias, and algorithmic bias. Biased predictions can lead to unfair treatment of different groups, resulting in discrimination, inequality, and injustice.


For example, suppose a company uses a machine learning model to screen job candidates based on their resumes. The model was trained on a dataset of previous applicants that was predominantly male and lacked diversity in gender, race, and other demographics. The model learns these skewed patterns and may reproduce them in its predictions, for instance by favoring male applicants over female applicants, or applicants of certain races over others. Such bias leads to unfair treatment of women and of applicants from disadvantaged groups, and makes the recruitment process discriminatory.

> Various techniques can be used to address bias and fairness issues in machine learning, such as:

1. Bias reduction techniques: Techniques that aim to reduce or eliminate bias in the training data or the model, such as resampling, reweighting, adversarial training, and fairness-aware machine learning algorithms.
2. Fairness-aware evaluation metrics: Metrics that measure how fairly model predictions are distributed across groups, such as demographic parity, equal opportunity, and equalized odds.
3. Feature engineering: Techniques that involve careful selection or construction of features to reduce bias and ensure fairness to different groups.
4. Transparency and explainability: Ensure that machine learning models are transparent and explainable so that bias and unfair treatment can be identified and eliminated.
5. Ethical considerations: Incorporate ethical considerations into the design, development, and deployment of machine learning models, such as ensuring diverse representation in the training data, recognizing potential biases, and regularly reviewing and monitoring the model.
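
To make the fairness metrics in item 2 more concrete, here is a minimal sketch of two of them, assuming binary labels, binary predictions, and a single group attribute. The helper names and the toy arrays are hypothetical and are not part of the notebook's code.

```python
import numpy as np

def selection_rate(y_pred, mask):
    """Fraction of positive predictions within the group selected by mask."""
    return y_pred[mask].mean()

def true_positive_rate(y_true, y_pred, mask):
    """Recall within the group: share of true positives predicted positive."""
    positives = mask & (y_true == 1)
    return y_pred[positives].mean()

def demographic_parity_difference(y_pred, group):
    """Largest gap in selection rate between any two groups."""
    rates = [selection_rate(y_pred, group == g) for g in np.unique(group)]
    return max(rates) - min(rates)

def equal_opportunity_difference(y_true, y_pred, group):
    """Largest gap in true positive rate between any two groups."""
    tprs = [true_positive_rate(y_true, y_pred, group == g) for g in np.unique(group)]
    return max(tprs) - min(tprs)

# Hand-made toy data (hypothetical, not taken from any real dataset)
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0])
group = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

dp_diff = demographic_parity_difference(y_pred, group)
eo_diff = equal_opportunity_difference(y_true, y_pred, group)
print(f"Demographic parity difference: {dp_diff:.2f}")  # 0.50
print(f"Equal opportunity difference: {eo_diff:.2f}")   # 0.50
```

A value close to 0 means the groups are treated similarly under that metric. Here group "a" is selected three times as often as group "b" (0.75 versus 0.25), and its true positives are recognized twice as often (1.0 versus 0.5).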

In [9]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.datasets import fetch_openml


In [24]:
# Load the "Adult" dataset
adult = fetch_openml(name='adult', version=2)


In [26]:
# Convert the dataset to a pandas DataFrame
df = pd.DataFrame(adult['data'], columns=adult['feature_names'])
y = adult['target']


In [29]:
# Perform one-hot encoding for categorical features
X = pd.get_dummies(df)

In [30]:
X

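| | age | fnlwgt | education-num | capital-gain | capital-loss | hours-per-week | workclass_Federal-gov | workclass_Local-gov | workclass_Never-worked | workclass_Private | ... | native-country_Portugal | native-country_Puerto-Rico | native-country_Scotland | native-country_South | native-country_Taiwan | native-country_Thailand | native-country_Trinadad&Tobago | native-country_United-States | native-country_Vietnam | native-country_Yugoslavia |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 25.0 | 226802.0 | 7.0 | 0.0 | 0.0 | 40.0 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 1 | 38.0 | 89814.0 | 9.0 | 0.0 | 0.0 | 50.0 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 2 | 28.0 | 336951.0 | 12.0 | 0.0 | 0.0 | 40.0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 3 | 44.0 | 160323.0 | 10.0 | 7688.0 | 0.0 | 40.0 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 4 | 18.0 | 103497.0 | 10.0 | 0.0 | 0.0 | 30.0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 48837 | 27.0 | 257302.0 | 12.0 | 0.0 | 0.0 | 38.0 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 48838 | 40.0 | 154374.0 | 9.0 | 0.0 | 0.0 | 40.0 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 48839 | 58.0 | 151910.0 | 9.0 | 0.0 | 0.0 | 40.0 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 48840 | 22.0 | 201490.0 | 9.0 | 0.0 | 0.0 | 20.0 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |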


In [31]:

# Hold out 20% of the rows for testing; a fixed seed keeps the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [32]:
# Fit a logistic regression classifier with scikit-learn's default settings
model = LogisticRegression()
model.fit(X_train, y_train)


In [33]:
# Predict the income class for each row of the held-out test set
y_pred = model.predict(X_test)


In [34]:
# Overall share of correct predictions on the test set
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")

Accuracy: 0.8037


In [35]:
# Per-class precision, recall, and F1 score
report = classification_report(y_test, y_pred)
print("Classification Report:")
print(report)

Classification Report:
              precision    recall  f1-score   support

       <=50K       0.81      0.97      0.88      7479
        >50K       0.71      0.27      0.39      2290

    accuracy                           0.80      9769
   macro avg       0.76      0.62      0.64      9769
weighted avg       0.79      0.80      0.77      9769



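The overall accuracy of 0.80 hides a large gap between the two classes: recall for the >50K class is only 0.27, so the model misses most high earners. This partly reflects the class imbalance in the test set (7,479 rows labeled <=50K versus 2,290 labeled >50K). The dataset may also contain other biases, as it may not be fully representative of the real-world population. For example, certain demographic groups may be under-represented, which can lead to biased predictions for those groups. Because the evaluation above reports only aggregate metrics, it does not by itself show how the model treats different groups.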
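
One way to check for imbalanced representation is to look at each group's share of the data and at the share of positive labels within each group. The sketch below uses a small hand-made frame. The column names `sex` and `income` and all values are assumptions for illustration. With the real data, a demographic column of `df` and the target `y` would take their place.

```python
import pandas as pd

# Hypothetical toy frame standing in for a demographic column and the target
toy = pd.DataFrame({
    "sex": ["Male"] * 6 + ["Female"] * 4,
    "income": [">50K", ">50K", "<=50K", "<=50K", "<=50K", "<=50K",
               ">50K", "<=50K", "<=50K", "<=50K"],
})

# Share of each group in the data (representation)
representation = toy["sex"].value_counts(normalize=True)

# Share of the positive label within each group (base rate)
base_rate = (toy["income"] == ">50K").groupby(toy["sex"]).mean()

print(representation)
print(base_rate)
```

If one group makes up a much smaller share of the rows, or has a much lower base rate, a model trained on the data may learn to associate group membership with the outcome.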

To address bias and fairness in this example, we can use techniques such as resampling, reweighting, or fairness-aware machine learning algorithms to mitigate bias in the training data or the model.
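
As an illustration of reweighting, here is a minimal sketch of one common scheme (often called reweighing), which weights each sample by P(group) * P(label) / P(group, label) so that group and label look statistically independent in the weighted data. This is one possible approach, not necessarily the one intended above. The function name and the toy arrays are hypothetical.

```python
import numpy as np

def reweighing_weights(group, label):
    """Weight each sample by P(group) * P(label) / P(group, label)."""
    group = np.asarray(group)
    label = np.asarray(label)
    weights = np.empty(len(label), dtype=float)
    for g in np.unique(group):
        for y in np.unique(label):
            cell = (group == g) & (label == y)
            if cell.any():
                expected = (group == g).mean() * (label == y).mean()
                weights[cell] = expected / cell.mean()
    return weights

# Hand-made toy data: group "a" has a higher positive rate than group "b"
group = np.array(["a"] * 6 + ["b"] * 4)
label = np.array([1, 1, 1, 1, 0, 0, 1, 0, 0, 0])

w = reweighing_weights(group, label)

# After weighting, the positive rate is the same in both groups
for g in ["a", "b"]:
    in_group = group == g
    rate = np.sum(w[in_group] * label[in_group]) / np.sum(w[in_group])
    print(g, round(rate, 2))
```

Weights like these could then be passed to scikit-learn estimators that accept them, for example through the `sample_weight` argument of `LogisticRegression.fit`. Whether this actually improves fairness for a given model should be checked with group-level metrics.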