[Ongoing] Knowledge base additions #142

jphall663 · 2023-10-27T19:00:04Z

~~executive order on AI~~
~~NIST 800-30 rev1 https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-30r1.pdf~~
~~IEEE 1012 (199X or 2016) https://people.eecs.ku.edu/~hossein/Teaching/Stds/1012.pdf~~
~~https://www.uspto.gov/sites/default/files/documents/USPTO_AI-Report_2020-10-07.pdf~~
~~https://www.commerce.gov/issues/intellectual-property (see if you think it misses the mark, it might b/c I don't see an AI focus)~~
~~https://standards.ieee.org/ieee/3119/10729/~~

jphall663 · 2023-10-30T12:17:01Z

~~https://www.frontiermodelforum.org/uploads/2023/10/FMF-AI-Red-Teaming.pdf~~

~~https://github.com/openai/openai-cookbook/tree/main~~

jphall663 · 2023-10-30T12:41:04Z

~~https://resources.oreilly.com/examples/0636920415947/-/blob/master/Attack_Cheat_Sheet.png <- community resources~~

datherton09 · 2023-10-30T17:16:14Z

All added. Waiting on EO. Decided to go ahead and add the "Intellectual property" page because I could still imagine it being a useful resource/portal (especially considering the USPTO falls under it, and it contains a specific resource we link to).

jphall663 · 2023-10-31T15:29:25Z

~~https://www.imda.gov.sg/resources/press-releases-factsheets-and-speeches/press-releases/2023/generative-ai-evaluation-sandbox <- GAI resources~~

datherton09 · 2024-02-21T17:47:13Z

[ALL ADDED, 2/21/2024]

benchmarks:

https://wavesbench.github.io/
https://github.com/huggingface/evaluate
https://github.com/AI-secure/DecodingTrust
https://docs.google.com/spreadsheets/u/1/d/e/2PACX-1vQObeTxvXtOs--zd98qG2xBHHuTTJOyNISBJPthZFr3at2LCrs3rcv73d4of1A78JV2eLuxECFXJY43/pubhtml
https://safetyprompts.com/

python software:

https://github.com/lilacai/lilac

official guidance:

https://www.ohchr.org/sites/default/files/documents/issues/business/b-tech/taxonomy-GenAI-Human-Rights-Harms.pdf

community resources:

https://www.hackerone.com/vulnerability-and-security-testing-blog
https://www.synack.com/wp-content/uploads/2022/09/Crowdsourced-Security-Landscape-Government.pdf

CSET stuff (just double check we reference somehow):
-- https://cset.georgetown.edu/article/translating-ai-risk-management-into-practice/
-- https://cset.georgetown.edu/publication/repurposing-the-wheel/
-- https://cset.georgetown.edu/publication/adding-structure-to-ai-harm/
-- https://cset.georgetown.edu/article/understanding-ai-harms-an-overview/
-- https://cset.georgetown.edu/publication/ai-incident-collection-an-observational-study-of-the-great-ai-experiment/
https://www.scsp.ai/wp-content/uploads/2023/11/SCSP_JHU-HCAI-Framework-Nov-6.pdf
https://openai.com/research/building-an-early-warning-system-for-llm-aided-biological-threat-creation
https://c2pa.org/
https://aiverifyfoundation.sg/downloads/Cataloguing_LLM_Evaluations.pdf
https://partnershiponai.org/modeldeployment/
https://cdn.openai.com/openai-preparedness-framework-beta.pdf

https://dominiquesheltonleipzig.com/country-legislation-frameworks/

red-teaming section:

https://www.hackerone.com/thought-leadership/ai-safety-red-teaming
https://cset.georgetown.edu/article/what-does-ai-red-teaming-actually-mean/

jphall663 · 2024-03-15T01:11:48Z

Red teaming -- but do we want to start hosting papers?

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal (2024)
Mantas Mazeika, Long Phan, Xuwang Yin, Andy Zou, Zifan Wang, Norman Mu, Elham Sakhaee, Nathaniel Li, Steven Basart, Bo Li, David Forsyth, Dan Hendrycks
https://arxiv.org/pdf/2402.04249.pdf

Red-Teaming for Generative AI: Silver Bullet or Security Theater?
Michael Feffer, Anusha Sinha, Zachary C. Lipton, Hoda Heidari
https://arxiv.org/pdf/2401.15897.pdf

Red Teaming Game: A Game-Theoretic Framework for Red Teaming Language Models
Chengdong Ma, Ziran Yang, Minquan Gao, Hai Ci, Jun Gao, Xuehai Pan, Yaodong Yang
https://arxiv.org/pdf/2310.00322.pdf

Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment (2023)
https://arxiv.org/pdf/2308.09662.pdf

Language Model Unalignment: Parametric Red-Teaming to Expose Hidden Harms and Biases
Rishabh Bhardwaj, Soujanya Poria
https://arxiv.org/pdf/2310.14303.pdf

jphall663 · 2024-03-15T01:12:27Z

GAI Critiques:

reasoning gap: https://arxiv.org/pdf/2402.19450.pdf
stealing language models: https://arxiv.org/pdf/2403.06634.pdf
dialect prejudice: https://arxiv.org/pdf/2403.00742.pdf

jphall663 assigned datherton09 Oct 27, 2023

datherton09 closed this as completed Oct 31, 2023

jphall663 changed the title ~~Things to add for next week (week of 10/30)~~ [Ongoing] Knowledge base additions Jan 12, 2024

jphall663 reopened this Jan 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Ongoing] Knowledge base additions #142

[Ongoing] Knowledge base additions #142

jphall663 commented Oct 27, 2023 •

edited

jphall663 commented Oct 30, 2023 •

edited

jphall663 commented Oct 30, 2023 •

edited

datherton09 commented Oct 30, 2023

jphall663 commented Oct 31, 2023 •

edited

datherton09 commented Feb 21, 2024 •

edited

jphall663 commented Mar 15, 2024 •

edited

jphall663 commented Mar 15, 2024 •

edited

[Ongoing] Knowledge base additions #142

[Ongoing] Knowledge base additions #142

Comments

jphall663 commented Oct 27, 2023 • edited

jphall663 commented Oct 30, 2023 • edited

jphall663 commented Oct 30, 2023 • edited

datherton09 commented Oct 30, 2023

jphall663 commented Oct 31, 2023 • edited

datherton09 commented Feb 21, 2024 • edited

jphall663 commented Mar 15, 2024 • edited

jphall663 commented Mar 15, 2024 • edited

jphall663 commented Oct 27, 2023 •

edited

jphall663 commented Oct 30, 2023 •

edited

jphall663 commented Oct 30, 2023 •

edited

jphall663 commented Oct 31, 2023 •

edited

datherton09 commented Feb 21, 2024 •

edited

jphall663 commented Mar 15, 2024 •

edited

jphall663 commented Mar 15, 2024 •

edited