Training and exploration of linear probes into Othello-GPT by Li et al. (2022)
Optimizing Mind static website v1
A collection of infrastructure and tools for research in neural network interpretability.
The repository contains data and scripts for the study "From Prediction Markets to Interpretable Collective Intelligence" by Alexey V. Osipov and Nikolay N. Osipov (arXiv:2204.13424 [cs.GT]).
Interpretations on the HPA dataset.
Summarize "Interpretable Machine Learning" book.
Investigation of state space model interpretability using SHAP (SHapley Additive exPlanations), co-authored with Yin Li and Lancaster Wu
Axis Tour: Word Tour Determines the Order of Axes in ICA-transformed Embeddings
Techniques for interpreting ConvNets
Code for the paper: PatchX: Explaining Deep Models by Intelligible Pattern Patches for Time-series Classification
Feature selection is widely used in nearly all data science pipelines. This repo provides functions that perform a form of backward stepwise selection based on XGBoost classifier feature importance and a set of other input values, returning the number of features to keep with respect to a preferred AUC score.
This Alignment Jam Hackathon project explores whether the concept of "logit lens" applies to the encoder and decoder layers in Whisper, an end-to-end speech recognition model.
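The logit lens concept that the project tests can be sketched abstractly: decode each intermediate layer's hidden state through the model's final unembedding matrix and see which token it already favors. A toy sketch, assuming made-up matrices and vocabulary (Whisper's real encoder/decoder activations would replace them):

```python
# Toy logit-lens sketch: project each layer's hidden state through a
# (made-up) unembedding matrix and report the top token per layer.

def logit_lens(hidden_states, unembed, vocab):
    """For each layer's hidden vector, return the highest-logit token."""
    tops = []
    for h in hidden_states:
        logits = [sum(hi * wi for hi, wi in zip(h, row)) for row in unembed]
        tops.append(vocab[max(range(len(logits)), key=logits.__getitem__)])
    return tops

vocab = ["the", "cat", "sat"]
unembed = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # one row per vocab token
layers = [[0.9, 0.1], [0.2, 0.8]]               # hidden state after each layer
print(logit_lens(layers, unembed, vocab))  # → ['the', 'cat']
```

Whether this trick transfers from GPT-style decoders to Whisper's encoder layers is exactly the question the hackathon project explores.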
StellarGraph - Machine Learning on Graphs
A Quick Look at B-cos Nets' Adversarial Robustness
This code is part of the paper: "A Deep Dive Into Neural Synchrony Evaluation for Audio-visual Translation" published at ACM ICMI 2022.
Technical audit of Automated Decision System for Fairness and Bias
In this repo I apply several variants of Grad-CAM (GradCAM, GradCAMPlusPlus, EigenCAM, etc.) to a pretrained ResNet model and report ROAD as an evaluation metric for interpretability.
Neural model interpretation on MRI data
Creating the model and approach to manage and adjust the process/equipment
Where I learn and explore mechanistic interpretability of transformers