ai-safety

NeurIPS workshop: We examine the risk of powerful malignant intelligent actors spreading their influence over networks of agents with varying intelligence and motivations.

ai-safety multi-agents

Updated Dec 11, 2023
Python

Nkluge-correa / Model-Library

The Model Library is a project that maps the risks associated with modern machine learning systems.

ai deep-learning ai-safety large-language-models

Updated Apr 4, 2024
Python

Dunchead / ai-safety

Mapping AI risks and possible solutions

ai ai-safety ai-risk

Updated May 6, 2024
JavaScript

dynaroars / neuralsat

DPLL(T)-based Verification tool for DNNs

abstraction sat-solver software-verification ai-safety robustness dpll adversarial-attacks robustness-verification dnn-verification ai-assurance neural-network-veri

Updated Jun 9, 2024
Python

tamlhp / awesome-privex

Awesome PrivEx: Privacy-Preserving Explainable AI (PPXAI)

awesome explanation risk-assessment ai-safety model-explanation explainable-ai xai ai-risk privacy-preserving-explainable-ai privacy-preserving-xai privacy-preserving-explanation privacy-preserving-model-explanation privex

Updated Apr 23, 2024

ea-uct / ai-safety-event-2021

A repository for the event on AI safety hosted by the Effective Altruism Society at the University of Cape Town.

ai-safety effective-altruism

Updated Sep 16, 2021

ai-fail-safe / mulligan

A library designed to shut down an agent exhibiting unexpected behavior, providing a potential "mulligan" to human civilization; IN CASE OF FAILURE, DO NOT JUST REMOVE THIS CONSTRAINT AND START IT BACK UP AGAIN

failsafe ai-safety anomaly-detection ai-alignment fail-safe

Updated Oct 30, 2022

ai-fail-safe / gene-drive

A project to ensure that all child processes created by an agent "inherit" the agent's safety controls

failsafe ai-safety ai-alignment fail-safe

Updated Oct 29, 2022

esbenkc / benchmarks

📊 Benchmarking the safety of AI systems

ai hackathon ai-safety alignment-jam

Updated Jul 1, 2023
Jupyter Notebook

Jakobovski / ai-safety-cheatsheet

A compilation of AI safety ideas, problems, and solutions.

agi artificial-intelligence alignment ai-safety agi-safety

Updated Mar 12, 2023

AlexTMjugador / redwoodresearch-interp-docker

📦 Redwood Research's transformer interpretability tools, conveniently packaged in a Docker container for simple and reproducible deployments.

docker ai ai-safety redwood-research ai-interpretability

Updated Apr 21, 2024
Dockerfile

lancopku / DAN

[Findings of EMNLP 2022] Expose Backdoors on the Way: A Feature-Based Efficient Defense against Textual Backdoor Attacks

natural-language-processing ai-safety backdoor-attacks backdoor-defense

Updated Feb 26, 2023
Python

yyy01 / LLMRiskEval_RCC

LLMs evaluation tool for robustness, consistency, and credibility

evaluation ai-safety adversarial-attacks large-language-models

Updated Aug 30, 2023
Python

ztjona / ztjona.github.io

My personal website.

machine-learning deep-learning ai-safety

Updated Jun 5, 2024
HTML

zhoumingyi / CustomDLCoder

Code for our paper "Model-less Is the Best Model: Generating Pure Code Implementations to Replace On-Device DL Models" that has been accepted by ISSTA'24

software-engineering program-analysis ai-safety

Updated Mar 31, 2024
Python

ai-safety

Here are 93 public repositories matching this topic...

EffectiveAltruismUCT / indabaX-ai-safety-workshop-2023

endlessloop2 / UC-AI-Thinkathon-2023

dynaroars / vnncomp-benchmark-generation

HorizonEventsAgency / tracker

Nkluge-correa / Aira

romaingrx / Second-Order-Jailbreak

ai-fail-safe / mulligan
