aelena/trustworthy-ai

Towards a Trustworthy AI Program

This repository outlines a comprehensive program for setting up an in-house Trustworthy AI initiative or capability group. It spans areas beyond technology itself, including ethics, law, social sciences, and philosophy. The goal is to build a Body of Knowledge (BoK) for AI auditors within organizations.

flowchart TD
    A("fa:fa-book-open AI Auditor BoK") -- Builds --> B("fa:fa-code Technical Expertise")
    A -- Follows --> C("fa:fa-comment-dots Regulatory & Ethics")
    A -- Leverages --> D("fa:fa-shapes Auditor Skills")
    B -- Feeds into --> E("Organizational AI&ML BoK")
    C -- Feeds into --> E
    D -- Feeds into --> E
    E -- Builds --> F("Trustworthy AI Capability")

    style B color:#424242,fill:#BBDEFB,stroke:#FFF9C4
    style C color:#FFFFFF,fill:#2962FF,stroke:#FFF9C4
    style D color:#FFFFFF,fill:#757575,stroke:#FFF9C4
    style F color:#FFFFFF,fill:#FF6D00,stroke:#FFF9C4
Disclaimer

Although I harbour encyclopedic ambitions, this tsundoku-ish repo can only be a work in progress: part learning journey, part intellectual pursuit. It is not intended to be a final one-stop shop.

The ultimate goal is to build a BoK for a team of AI Auditors inside an organization, according to the definition of Body of Knowledge offered by Wikipedia.

No affiliate links whatsoever.




1. Foundations

What is Trustworthy AI

Detailed exploration of the concept

The topic of Trustworthy AI has garnered significant attention due to the rapid development and deployment of AI technologies. This interest is driven by the need to ensure that AI systems are safe, fair, explainable, and accountable.

Six crucial dimensions in achieving trustworthy AI:

  • Safety & Robustness
  • Nondiscrimination & Fairness
  • Explainability
  • Privacy
  • Accountability & Auditability
  • Environmental Well-being

Key references:

Core Principles

Detailed principles page

The five foundational ethical principles for AI:

  1. Beneficence - AI should be designed to benefit humanity
  2. Non-maleficence - AI should not cause harm
  3. Autonomy - AI should respect human agency and decision-making
  4. Justice - AI should be fair and non-discriminatory
  5. Explicability - AI decisions should be explainable and transparent

AI/ML Fundamentals

Get more than a passing familiarity with the underlying technology and main paradigms.



2. AI Development Lifecycle

Detailed lifecycle page

Understanding every stage of the AI & ML development lifecycle is critical for auditors. The lifecycle can be thought of in four phases:

  1. Phase 1: Before ML - check if non-ML solutions can solve the problem
  2. Phase 2: Simple ML models (logistic regression, gradient-boosted trees, k-NN)
  3. Phase 3: Optimizing simple models (hyperparameter search, feature engineering, ensembles)
  4. Phase 4: Complex models if simpler approaches don't meet requirements
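
The phases above can be read as a gate: a more complex model is only justified when it beats the simpler option by a meaningful margin. The following is a minimal sketch of that idea, not a method taken from this repository. The toy data, the helper names, and the `min_gain` threshold are all hypothetical.

```python
from collections import Counter

def majority_baseline(train_labels):
    """Phase 1 stand-in: always predict the most common training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda x: most_common

def one_nn(train_x, train_labels):
    """Phase 2 stand-in: 1-nearest-neighbour on a single numeric feature."""
    def predict(x):
        idx = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
        return train_labels[idx]
    return predict

def accuracy(model, xs, labels):
    """Fraction of examples the model labels correctly."""
    return sum(model(x) == y for x, y in zip(xs, labels)) / len(labels)

def justify_next_phase(simple_acc, complex_acc, min_gain=0.05):
    """True if the more complex option improves accuracy by at least min_gain.

    min_gain is a hypothetical threshold that a team would set for itself.
    """
    return (complex_acc - simple_acc) >= min_gain

# Toy data, invented for illustration only.
train_x = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0, 1.1]
train_y = [0, 0, 0, 1, 1, 1, 1]
test_x = [0.15, 0.25, 0.85, 0.95]
test_y = [0, 0, 1, 1]

base_acc = accuracy(majority_baseline(train_y), test_x, test_y)
knn_acc = accuracy(one_nn(train_x, train_y), test_x, test_y)
move_on = justify_next_phase(base_acc, knn_acc)
```

An auditor can ask for exactly this kind of evidence: the recorded baseline, the gain obtained at each step, and the threshold that was agreed before the more complex model was built.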

Problem Scoping & Data

Data Quality & Governance

Dedicated page on data quality

Key aspects to evaluate:

  • Data quality - accuracy, completeness, consistency
  • Relevance - alignment with problem scope
  • Contextual appropriateness - time, location, scenario representation
  • Bias and variety - representation across groups
  • Provenance - sourcing, documentation, trustworthiness
  • Evaluating Data Quality
  • A 2024 Survey of ETL tools
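
Two of the aspects above, completeness and representation across groups, lend themselves to simple automated checks. The sketch below uses only the standard library. The records, the field names, and the `min_share` cut-off are hypothetical and chosen purely for illustration.

```python
from collections import Counter

MISSING = (None, "")

def completeness(records, fields):
    """Share of records with a non-missing value, per field."""
    total = len(records)
    return {
        f: sum(1 for r in records if r.get(f) not in MISSING) / total
        for f in fields
    }

def group_shares(records, field):
    """Share of each group among the records where the field is present."""
    values = [r[field] for r in records if r.get(field) not in MISSING]
    counts = Counter(values)
    return {group: n / len(values) for group, n in counts.items()}

def underrepresented(shares, min_share):
    """Groups whose share falls below a chosen (hypothetical) minimum."""
    return sorted(g for g, s in shares.items() if s < min_share)

# Invented example records.
records = [
    {"age": 34, "region": "north", "income": 52000},
    {"age": None, "region": "north", "income": 48000},
    {"age": 51, "region": "north", "income": None},
    {"age": 29, "region": "south", "income": 61000},
]

field_completeness = completeness(records, ["age", "region", "income"])
region_shares = group_shares(records, "region")
flagged = underrepresented(region_shares, min_share=0.3)
```

Checks like these do not establish that a dataset is fit for purpose. They give the auditor concrete numbers to question.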

Labeling & Augmentation

Dedicated page

Synthetic Data

Dedicated page

Synthetic data is evolving fast with interesting use cases.

Opportunities:

  • Addressing data deficits and representation concerns
  • Privacy protection and bias reduction
  • Lower cost than real-world data collection
  • Meeting compliance requirements

Risks:

References:

Model Training & Alignment

Training Techniques

RLHF & Human Feedback Methods

Dedicated RLHF page

Reinforcement Learning from Human Feedback incorporates human input to enhance AI model training. Key approaches:

  • Binary/Scalar Feedback
  • Comparative RLHF
  • Proximal Policy Optimization (PPO)
  • Direct Preference Optimization (DPO)
  • Constitutional AI
  • RLAIF (AI Feedback)
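
To make one of these approaches concrete, the sketch below computes the Direct Preference Optimization loss for a single preference pair, following the published DPO objective. The log-probabilities are made-up numbers, and `beta=0.1` is an assumed setting rather than a recommendation.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) response pair.

    Inputs are sequence log-probabilities under the policy being trained
    and under the frozen reference model.
    """
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    z = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) written as a numerically stable softplus(-z)
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))

# Policy identical to the reference: the loss equals log(2).
neutral = dpo_loss(-12.0, -15.0, -12.0, -15.0)

# Policy raised the chosen response and lowered the rejected one.
improved = dpo_loss(-10.0, -18.0, -12.0, -15.0)
```

The loss falls as the policy prefers the chosen response over the rejected one more strongly than the reference model does, which is why no separate reward model appears in the computation.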

Model Evaluation & Validation

Deployment & Monitoring

An undeployed model is worthless, and an unmonitored one is a risk.

Key concerns:

  • Concept drift - the relationship between inputs and the target changes over time
  • Data drift - the distribution of production data shifts away from the training data
  • Infrastructure monitoring - SLAs, failures, latencies, scalability
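
One common way to quantify a shift between training and production data is the Population Stability Index (PSI), computed per feature over shared bins. The sketch below is a minimal standard-library version. The bin edges and the sample values are hypothetical, and what counts as a material shift is a decision for each team.

```python
import math

def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index between two samples of one feature.

    edges are the interior bin boundaries. eps avoids log(0) for empty bins.
    """
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[sum(v >= e for e in edges)] += 1
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Invented data: production values are the training values shifted by 10.
training = list(range(100))
production = [v + 10 for v in training]
edges = [25, 50, 75]

no_shift = psi(training, training, edges)
shifted = psi(training, production, edges)
```

A monitoring job would run this on a schedule for each feature and alert when the value crosses the agreed threshold.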

References:



3. Trustworthiness Dimensions

Transparency & Explainability

Dedicated transparency page | Algorithmic transparency page

Techniques to make AI models interpretable and decisions understandable:
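
As one model-agnostic illustration, permutation importance shuffles a single feature and measures how much accuracy drops. The sketch below is a minimal standard-library version. The `model` function is a hypothetical black box that only looks at its first feature, and the data is invented.

```python
import random

def permutation_importance(predict, rows, labels, n_features, seed=0):
    """Accuracy drop per feature when that feature's column is shuffled."""
    rng = random.Random(seed)

    def acc(data):
        return sum(predict(r) == y for r, y in zip(data, labels)) / len(labels)

    baseline = acc(rows)
    drops = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        # Rebuild each row with only feature j replaced by a shuffled value.
        shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
        drops.append(baseline - acc(shuffled))
    return drops

def model(row):
    """Hypothetical black-box classifier that ignores feature 1."""
    return int(row[0] > 0.5)

rows = [(i / 20, ((i * 7) % 20) / 20) for i in range(20)]
labels = [model(r) for r in rows]
importances = permutation_importance(model, rows, labels, n_features=2)
```

The ignored feature gets an importance of exactly zero, while the feature the model relies on shows a positive drop. Libraries such as SHAP and LIME provide richer, per-prediction explanations.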

Fairness & Bias

Dedicated bias page | Types of AI bias

Privacy & Security

Differential Privacy

Dedicated page
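
As a minimal illustration of the idea, the sketch below applies the Laplace mechanism to a counting query using only the standard library. The data, the `epsilon` value, and the function names are hypothetical. Production systems would rely on a vetted library such as those listed under Code Examples rather than hand-rolled noise.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample, as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(values, predicate, epsilon, rng):
    """Counting query released with epsilon-differential privacy.

    Adding or removing one record changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Invented example: how many people in the dataset are over 40?
ages = [23, 35, 41, 52, 29, 64, 38, 47]
rng = random.Random(42)
released = private_count(ages, lambda a: a > 40, epsilon=1.0, rng=rng)
```

A smaller `epsilon` means more noise and stronger privacy. The auditor's question is whether the chosen privacy budget is documented and justified.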

Adversarial Attacks & Defenses

Dedicated attacks page

MLSecOps

Dedicated page

Robustness & Reliability

Safety & Alignment

AI Safety Fundamentals

  • AI Alignment - making AI systems do what humans want without unintended side effects
  • Risk Assessment - identifying and mitigating AI risks
  • Fail-safe Mechanisms - graceful degradation strategies
  • GenAI Safety - NIST GenAI Risk Management Profile
  • Long-term Safety - AGI considerations (Roman V. Yampolskiy's work)

Agentic AI Safety

Dedicated page

As AI systems become more autonomous, new safety considerations emerge:

  • Agent autonomy and oversight
  • Tool use and function calling risks
  • Multi-agent coordination
  • Sandboxing and containment
  • Human-in-the-loop requirements
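
Several of these considerations (tool-use risk, containment, human-in-the-loop) can be combined in a single control point that every tool call must pass through. The sketch below is a hypothetical design, not an API from any agent framework. `ToolGate`, its fields, and the tool names are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ToolGate:
    """Hypothetical wrapper that mediates every agent tool call."""
    allowed: set                           # tools the agent may use at all
    needs_approval: set                    # tools that require a human decision
    approve: Callable[[str, tuple], bool]  # human reviewer callback
    audit_log: list = field(default_factory=list)

    def call(self, name, func, *args):
        if name not in self.allowed:
            self.audit_log.append((name, "blocked"))
            raise PermissionError(f"tool '{name}' is not on the allowlist")
        if name in self.needs_approval and not self.approve(name, args):
            self.audit_log.append((name, "denied"))
            raise PermissionError(f"tool '{name}' was not approved")
        self.audit_log.append((name, "executed"))
        return func(*args)

gate = ToolGate(
    allowed={"search", "send_email"},
    needs_approval={"send_email"},
    approve=lambda name, args: False,  # stand-in reviewer that rejects everything
)
result = gate.call("search", str.upper, "eu ai act")
```

The audit log matters as much as the gate itself, because it gives auditors a record of what the agent attempted, not only what it was allowed to do.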

Frontier Model Evaluations

Evaluating capabilities and risks of frontier AI models:

  • Capability Evaluations - dangerous capability assessments
  • Uplift Studies - measuring capability gains from AI assistance
  • Automated Red Teaming - AI testing AI
  • Pre-deployment Testing - safety assessments before release

Red Teaming

Red teaming overview

AI/ML red teaming identifies vulnerabilities and weaknesses before they can be exploited:

Vendor Tools:



4. Governance & Regulation

Legal Frameworks

Sector-Specific Regulations:

More on legal aspects

Organizational Governance

Ethics Frameworks

ISO Standards:

Sustainability & Environmental Impact

Dedicated page

Key topics:

  • Understanding sustainability concerns around AI/ML models
  • Tools and techniques to measure environmental impact
  • Carbon footprint of training and inference
  • Green AI practices
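
A first-order estimate of the carbon footprint of training multiplies the energy drawn by the hardware by the data-centre overhead (PUE) and the carbon intensity of the grid. The sketch below shows that arithmetic. Every figure in the example is hypothetical, and measured values should replace them in practice.

```python
def training_energy_kwh(gpu_count, gpu_power_watts, hours, pue):
    """Energy used by the accelerators, scaled by data-centre overhead."""
    return gpu_count * gpu_power_watts / 1000 * hours * pue

def emissions_kg_co2e(energy_kwh, grid_kg_co2e_per_kwh):
    """Emissions for that energy at a given grid carbon intensity."""
    return energy_kwh * grid_kg_co2e_per_kwh

# Hypothetical run: 8 GPUs at 300 W for 100 hours, PUE 1.2, 0.4 kg CO2e/kWh.
energy = training_energy_kwh(gpu_count=8, gpu_power_watts=300, hours=100, pue=1.2)
emissions = emissions_kg_co2e(energy, grid_kg_co2e_per_kwh=0.4)
```

With these assumed inputs the run uses about 288 kWh and emits about 115 kg CO2e. The same formula applies to inference, with hours replaced by the cumulative serving time.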


5. Auditing & Assessment

Systematic Auditing of AI Models

Challenges page

Comprehensive auditing frameworks must:

  • Consider multiple dimensions: governance, strategy, performance, monitoring, review
  • Cover both technical aspects and ethical considerations
  • Adhere to evolving standards (ISO/IEC 42001:2023, EU AI Act, etc.)
  • Evaluate monitoring metrics and remediation procedures
  • Include the entire AI lifecycle
  • Account for stakeholder interests and ethical metrics

Key Challenges:

  • Absence of standardized frameworks
  • Rapidly evolving field requiring continuous learning
  • AI system complexity and black-box nature
  • Different regulatory requirements across jurisdictions
  • Skills gap in the industry
  • Difficulty validating massive training datasets

References:

Audit Process & Methodology

Detailed process page

AI Assurance

Dedicated assurance page

AI assurance provides confidence that an AI system is designed, developed, and deployed responsibly. Key aspects:

  • Independent evaluation
  • Criteria-based assessment
  • Transparency
  • Accountability

Audit Planning and Scoping

Audit Execution Techniques

  • Data sampling and analysis - examining training and test data for bias, quality, representativeness
  • Data lineage and provenance - integrity verification
  • Model evaluation and testing - LIME, SHAP, adversarial testing, stress testing
  • Source code and architecture review - security vulnerabilities
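
Adversarial testing, one of the techniques above, can be illustrated with the Fast Gradient Sign Method (FGSM) against a logistic regression model. The sketch below uses only the standard library. The weights, the input, and `epsilon` are hypothetical values chosen so that the effect is visible.

```python
import math

def predict_proba(weights, bias, x):
    """Probability of class 1 under a logistic regression model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 / (1 + math.exp(-z))

def fgsm(weights, bias, x, y, epsilon):
    """Move each feature by epsilon in the direction that increases log-loss."""
    p = predict_proba(weights, bias, x)
    # For logistic regression, d(log-loss)/dx_i = (p - y) * w_i
    grad = [(p - y) * w for w in weights]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

# Invented model and input.
weights, bias = [2.0, -1.0], 0.0
x, y = [0.3, 0.1], 1

clean_score = predict_proba(weights, bias, x)
x_adv = fgsm(weights, bias, x, y, epsilon=0.2)
adv_score = predict_proba(weights, bias, x_adv)
```

Here a perturbation of at most 0.2 per feature is enough to push the score across the 0.5 decision boundary. An audit would report how large a perturbation is needed before predictions change, ideally using a maintained toolkit rather than hand-written attacks.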

Tools & Techniques

Evidence gathering techniques

Documentation Standards

Specialized Auditing Skills

AI Performance Metrics

Programming for AI Auditing

  • Basic Python for data analysis and model inspection
  • Libraries: AI Fairness 360, SHAP, LIME
  • Understanding different roles: data scientist, AI product owner
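
As an example of the level of Python involved, the sketch below computes two widely used group fairness measures by hand: the demographic parity difference and the disparate impact ratio. The predictions and group labels are invented. Libraries such as AI Fairness 360 offer maintained implementations of these and many other metrics.

```python
def selection_rates(predictions, groups):
    """Share of positive predictions per group."""
    totals, positives = {}, {}
    for p, g in zip(predictions, groups):
        totals[g] = totals.get(g, 0) + 1
        positives[g] = positives.get(g, 0) + (p == 1)
    return {g: positives[g] / totals[g] for g in totals}

def demographic_parity_difference(rates):
    """Gap between the highest and lowest selection rate."""
    return max(rates.values()) - min(rates.values())

def disparate_impact_ratio(rates):
    """Lowest selection rate divided by the highest."""
    return min(rates.values()) / max(rates.values())

# Invented model outputs for two groups of four people each.
predictions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = selection_rates(predictions, groups)
dp_diff = demographic_parity_difference(rates)
di_ratio = disparate_impact_ratio(rates)
```

A commonly cited rule of thumb flags ratios below 0.8, but the appropriate metric and threshold depend on the use case and jurisdiction.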

Soft Skills for AI Auditors

Critical & Ethical Decision Making

The AI ethics literature has converged on five core principles: transparency, justice and fairness, non-maleficence, responsibility, and privacy.

  • Ability to critically evaluate AI-generated outputs
  • Healthy skepticism towards AI insights
  • Navigating ethical dilemmas in auditing

References:

Communication and Stakeholder Management

  • Explaining technical concepts to non-technical audiences
  • Communicating with stakeholders of varying AI literacy
  • Negotiation and conflict resolution in audit scenarios
  • Sector-specific knowledge


6. Resources

Code Examples

Practical Python implementations of key Trustworthy AI techniques are available in the code/ folder:

| Topic | File | Libraries |
|---|---|---|
| Bias Testing | bias_testing.py | AIF360, Fairlearn |
| Explainability | explainability.py | SHAP, LIME |
| Adversarial Testing | adversarial_testing.py | ART, PyRIT |
| Evaluation Frameworks | eval_frameworks.py | Inspect AI, Custom |
| Differential Privacy | differential_privacy.py | Opacus, TensorFlow Privacy |

Each file contains verbose explanations of the underlying concepts, practical runnable examples, and best practices for production use.

Tools, Templates & Checklists

Commercial Auditing Tools

Training & Certifications

Training:

Certifications:

Books & Papers

Vendor Resources
