Biases in data and models.

This repository explores biases and abuses in data and studies their effects through a series of experiments. The experiments will be conducted in Jupyter Notebook to analyze and understand the impact of biases in data and to find ways to minimize them.

Tech Stack

Keras with TensorFlow
Numerical Python Stack
Word2Vec
Scikit-Learn
Jupyter

Datasets

Introduction

In today's data-driven world, it is crucial to be aware of the biases and abuses that can exist within datasets. Biases can arise from various sources, such as data collection methods, sampling techniques, or even human judgment. These biases can lead to skewed results and unfair outcomes, impacting decision-making processes and perpetuating inequalities.

The purpose of this project is to shed light on the presence of biases and abuses in data and trained models, and to explore ways to mitigate their effects.

Topics to explore

Bias in Natural Language Processing models
Convolutional Neural Network Manifold Learning
Global Black-box Explanation
Local Black-box Explanation
FairML
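As a rough illustration of the first topic, one common way to probe bias in word embeddings such as Word2Vec is to build a direction from a definitional word pair (for example "he" minus "she") and measure how strongly other words align with it. The sketch below assumes plain Python lists as vectors. The toy vectors and the helper names `cosine`, `bias_direction` and `direct_bias` are hypothetical and made up for illustration; they are not taken from the notebooks. With a trained Word2Vec model, the `toy` dictionary would be replaced by the model's word vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def bias_direction(a, b):
    """Difference vector of a definitional word pair, e.g. 'he' - 'she' (hypothetical helper)."""
    return [x - y for x, y in zip(a, b)]


def direct_bias(embeddings, pair, words):
    """Cosine of each word with the bias direction; 0 means neutral (hypothetical helper)."""
    direction = bias_direction(embeddings[pair[0]], embeddings[pair[1]])
    return {w: cosine(embeddings[w], direction) for w in words}


# Hypothetical 3-dimensional toy vectors, made up for illustration only.
toy = {
    "he": [1.0, 0.2, 0.0],
    "she": [-1.0, 0.2, 0.0],
    "engineer": [0.6, 0.5, 0.3],
    "nurse": [-0.6, 0.5, 0.3],
    "doctor": [0.0, 0.8, 0.2],
}

scores = direct_bias(toy, ("he", "she"), ["engineer", "nurse", "doctor"])
for word, score in scores.items():
    print(f"{word:>8}: {score:+.3f}")
```

A score near 0 suggests the word is neutral with respect to the chosen pair, while large positive or negative scores indicate alignment with one side. A single word pair gives a noisy direction, so such analyses often average over several pairs.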

Biases in Data

Biases in data can occur in different forms, including:

Selection Bias: When certain groups or characteristics are overrepresented or underrepresented in the dataset due to biased sampling methods.
Confirmation Bias: When data is selectively collected or interpreted to support preconceived notions or beliefs.
Measurement Bias: When measurement instruments or techniques introduce systematic errors or inaccuracies.
Cultural Bias: When data reflects the biases and perspectives of a particular culture or group.
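Selection bias in particular becomes measurable when a reference distribution is known. The following minimal sketch assumes a hypothetical list of group labels and hypothetical reference shares, both invented for illustration, and divides each group's share in the sample by its share in the reference population. The function name `representation_ratios` is hypothetical.

```python
from collections import Counter


def representation_ratios(sample_groups, reference_shares):
    """Ratio of each group's share in the sample to its share in a reference population.

    A ratio above 1 means the group is overrepresented, below 1 underrepresented.
    """
    counts = Counter(sample_groups)  # missing groups count as 0
    total = len(sample_groups)
    return {
        group: (counts[group] / total) / share
        for group, share in reference_shares.items()
    }


# Hypothetical sample and reference shares, invented for illustration.
sample = ["A"] * 70 + ["B"] * 30
reference = {"A": 0.5, "B": 0.5}

ratios = representation_ratios(sample, reference)
print(ratios)
```

Here group A has a ratio of about 1.4 (overrepresented) and group B about 0.6 (underrepresented). The result is only as trustworthy as the reference shares, which may themselves carry measurement or cultural bias.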

Experimental Setup

The experiments will be conducted using Jupyter Notebook, a popular tool for data analysis and visualization. The datasets used in the experiments will be carefully selected to highlight different types of biases and potential abuses. The code and analysis will be documented in the Jupyter Notebook files provided in this repository.

Results and Analysis

The results obtained from the experiments will be analyzed to identify the presence and impact of biases in the data. Various statistical techniques and machine learning algorithms will be used to quantify and understand the biases. Additionally, strategies and methodologies to minimize biases and improve the fairness of the data will be explored.
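One simple way to quantify bias in a model's outputs is to compare positive-prediction rates across groups. The sketch below computes two common group-fairness measures, the demographic parity difference and the disparate impact ratio, on hypothetical binary predictions. The data and the helper names `positive_rate` and `fairness_gaps` are assumptions for illustration, not the method used in the notebooks.

```python
def positive_rate(predictions, groups, group):
    """Fraction of positive (1) predictions among members of `group`."""
    selected = [p for p, g in zip(predictions, groups) if g == group]
    return sum(selected) / len(selected)


def fairness_gaps(predictions, groups, privileged, unprivileged):
    """Demographic parity difference and disparate impact ratio between two groups.

    Assumes both groups are non-empty and the privileged group's positive rate is nonzero.
    """
    p_priv = positive_rate(predictions, groups, privileged)
    p_unpriv = positive_rate(predictions, groups, unprivileged)
    return {
        "parity_difference": p_unpriv - p_priv,  # 0 means equal positive rates
        "disparate_impact": p_unpriv / p_priv,   # 1 means equal positive rates
    }


# Hypothetical binary predictions for two groups, made up for illustration.
preds = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["x", "x", "x", "x", "y", "y", "y", "y"]

gaps = fairness_gaps(preds, groups, privileged="x", unprivileged="y")
print(gaps)
```

In this toy example group y receives positive predictions at one third the rate of group x (parity difference of -0.5, disparate impact of about 0.33). A frequently cited rule of thumb, the four-fifths rule, treats a disparate impact ratio below 0.8 as a warning sign; neither metric alone establishes that a model is fair or unfair.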

Conclusion

By studying biases and abuses in data, I aim to raise awareness about their existence and impact on decision-making processes. Through rigorous experimentation and analysis, I strive to develop best practices and guidelines to minimize biases and promote fairness in data-driven applications.

Please refer to the Jupyter Notebook files in this repository for detailed experiments, code, and analysis.

Insights

Insights are presented as critical questions and discussions at the end of each notebook.

Repository contents

Bias in NLP
CNN manifold learning
FairML
Global Black-box explanation
Local Black-box
README.MD

arvinsingh/biases-in-data

Folders and files

Latest commit

History

Repository files navigation

Biases in data and models.

Tech Stack

Datasets

Introduction

Topics to explore

Biases in Data

Experimental Setup

Results and Analysis

Conclusion

Insights

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages