ReFT: Representation Finetuning for Language Models
Updated May 18, 2024 - Python
Responsible AI Toolbox is a suite of tools providing model and data exploration and assessment user interfaces and libraries that enable a better understanding of AI systems. These interfaces and libraries empower developers and stakeholders of AI systems to develop and monitor AI more responsibly, and take better data-driven actions.
Fit interpretable models. Explain blackbox machine learning.
A curated list of awesome open source libraries to deploy, monitor, version, and scale your machine learning models
Local Universal Rule-based Explanations
A curated list of awesome responsible machine learning resources.
[ICML'24] Official PyTorch Implementation of TimeX++
Code accompanying a review article on interpretability and XAI. Includes examples for both simple (sparse regression) and sophisticated (concept bottlenecks) approaches, using notebooks that can be run in a few minutes.
Code for the paper "Are Large Language Models Post Hoc Explainers?"
A game theoretic approach to explain the output of any machine learning model.
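The game-theoretic approach this description refers to is the Shapley value: a feature's attribution is its average marginal contribution over all coalitions of the other features. A minimal sketch of the exact (exponential-time) computation, using a hypothetical toy payoff function in place of a real model — practical libraries approximate this sum rather than enumerate it:

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value_fn):
    """Exact Shapley values by enumerating every coalition.
    Exponential in the number of features; fine for a toy example."""
    n = len(features)
    phi = {i: 0.0 for i in features}
    for i in features:
        rest = [j for j in features if j != i]
        for size in range(n):
            # Weight of coalitions of this size in the Shapley sum
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(rest, size):
                coalition = set(S)
                phi[i] += weight * (value_fn(coalition | {i}) - value_fn(coalition))
    return phi

# Hypothetical toy "model": additive payoff plus one interaction bonus
x = {0: 1.0, 1: 2.0, 2: 3.0}

def v(S):
    out = sum(x[j] for j in S)
    if {0, 1} <= S:
        out += 4.0  # interaction, shared equally between features 0 and 1
    return out

phi = shapley_values(x, v)
# Efficiency property: attributions sum to v(all features)
assert abs(sum(phi.values()) - v(set(x))) < 1e-9
```

By symmetry, the 4.0 interaction bonus is split evenly between features 0 and 1 (phi = {0: 3.0, 1: 4.0, 2: 3.0}), which is the behavior that makes Shapley attributions attractive for explaining model outputs.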
Official implementation of the paper "Guided Attention for Interpretable Motion Captioning"
Robust multimodal brain registration via keypoints
🔅 Shapash: User-friendly Explainability and Interpretability to Develop Reliable and Transparent Machine Learning Models
Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals
Running interpretability experiments with application to weak-to-strong generalization
A collection of anomaly detection methods (iid/point-based, graph, and time series), including active learning for anomaly detection/discovery, Bayesian rule-mining, and descriptions for diversity/explanation/interpretability. Analysis of incorporating label feedback with ensemble and tree-based detectors. Includes adversarial attacks with Graph Convol…
Creating a PyTorch LSTM to classify movies by genre and visualizing the model's reasoning process
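Visualizing an LSTM's reasoning usually means inspecting its gate activations over time. A minimal sketch of a single LSTM cell step in pure Python (scalar input and state, hypothetical weights), showing the four gates such a visualization would trace — a real classifier like the one above would use vectorized framework layers instead:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, W):
    """One LSTM cell step. W maps gate name -> (w_x, w_h, b);
    all quantities are scalars for readability."""
    def gate(name, squash):
        w_x, w_h, b = W[name]
        return squash(w_x * x + w_h * h_prev + b)

    i = gate("input", sigmoid)   # how much of the candidate to write
    f = gate("forget", sigmoid)  # how much old cell state to keep
    o = gate("output", sigmoid)  # how much cell state to expose
    g = gate("cell", math.tanh)  # candidate cell update

    c = f * c_prev + i * g       # new cell state
    h = o * math.tanh(c)         # new hidden state (fed to the classifier head)
    return h, c

# Hypothetical weights, just to run one step from a zero state
W = {k: (0.5, 0.25, 0.0) for k in ("input", "forget", "output", "cell")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=0.0, W=W)
```

Plotting `i`, `f`, and `o` per timestep over a movie synopsis is one way to surface which tokens the model retains or discards when assigning a genre.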
The website for NDIF, the National Deep Inference Fabric
Interpretability for sequence generation models 🐛 🔍
TrustyAI Explainability Toolkit