diff --git a/_data/authors.yml b/_data/authors.yml
index daa61473f526..5a566726eb6c 100644
--- a/_data/authors.yml
+++ b/_data/authors.yml
@@ -578,4 +578,23 @@ Lyle Ungar:
       url: "mailto:ungar@cis.upenn.edu"
     - label: "Website"
       icon: "fas fa-fw fa-link"
-      url: "https://www.cis.upenn.edu/~ungar/"
\ No newline at end of file
+      url: "https://www.cis.upenn.edu/~ungar/"
+
+Arthur Wayne:
+  name: "Arthur Wayne"
+  avatar: "https://arthursolwayne.com/pfp.png"
+  bio: "Researcher at Penn"
+  home: "https://arthursolwayne.com/"
+  links:
+    - label: "Email"
+      icon: "fas fa-fw fa-envelope-square"
+      url: "mailto:artwayne@seas.upenn.edu"
+    - label: "Website"
+      icon: "fas fa-fw fa-link"
+      url: "https://arthursolwayne.com"
+    - label: "Twitter"
+      icon: "fab fa-fw fa-twitter-square"
+      url: "https://x.com/arthursolwayne"
+    - label: "GitHub"
+      icon: "fab fa-fw fa-github"
+      url: "https://github.com/arthursolwayne"
\ No newline at end of file
diff --git a/_posts/2025-02-19-BAP.md b/_posts/2025-02-19-BAP.md
new file mode 100644
index 000000000000..93a4d6e86f53
--- /dev/null
+++ b/_posts/2025-02-19-BAP.md
@@ -0,0 +1,699 @@
+---
+published: true
+title: "Where’s the Bug? Attention Probing for Scalable Fault Localization"
+excerpt: "Attention probing of LLMs enables accurate bug localization without direct localization labels, code execution, or the largest of LLMs."
+header:
+  overlay_color: "#000"
+  overlay_filter: "0.5"
+  overlay_image: assets/images/BAP/waldo.jpg
+  teaser: assets/images/BAP/bap_fig1.png
+  actions:
+    - label: "Paper"
+      url: "https://arxiv.org/abs/2502.13966"
+    - label: "Code"
+      url: "https://github.com/adaminsky/BAP"
+authors:
+  - Adam Stein|equal
+  - Arthur Wayne|equal
+  - Aaditya Naik
+  - Mayur Naik
+  - Eric Wong
+
+step1:
+  - url: "/assets/images/BAP/codeprobes_1_blog.png"
+    image_path: "/assets/images/BAP/codeprobes_1_blog.png"
+    alt: ""
+    title: ""
+step2:
+  - url: "/assets/images/BAP/codeprobes_2_blog.png"
+    image_path: "/assets/images/BAP/codeprobes_2_blog.png"
+    alt: ""
+    title: ""
+step3:
+  - url: "/assets/images/BAP/codeprobes_3_blog.png"
+    image_path: "/assets/images/BAP/codeprobes_3_blog.png"
+    alt: ""
+    title: ""
+---
+
+> Locating bugs in code remains a challenging problem for both humans and automated systems.
+> Existing methods require executable test cases, training on costly line-level annotations, or resource-intensive LLMs.
+> We present the Bug Attention Probe (BAP), a method that learns to localize bugs without direct localization labels through an attention mechanism over latent code tokens.
+> BAP identifies buggy lines of code by assigning high attention weights to the tokens comprising the buggy line.
+> BAP not only outperforms traditional fault localization baselines and prompting of large-scale LLMs, but also requires a fraction of their computational cost.
+
+Pinpointing the location of bugs in code, known as fault localization (FL), is a central challenge in software engineering. As large language models (LLMs) become increasingly capable at code-related tasks, automating FL becomes even more useful and important. For example, LLMs are now used to propose complete bug fixes from just a user's bug report, but their effectiveness is limited by their ability to first locate the bug.
+
+In software engineering, this problem has been studied extensively, but the proposed methods often require code execution and still perform poorly overall.
+Deep learning has recently enabled test- and execution-free FL by training models with line-level supervision (i.e., labels for which lines have a bug and which do not), but this form of supervision is costly to collect. Such training can now even be avoided entirely by prompting the best LLMs.
+
+The limitations of existing FL approaches make them hard to apply to real-world code.
+LLMs enable FL on arbitrary code, empowering tools that autonomously fix bugs without special setup or training, but this comes at a cost.
+While an individual LLM call is cheap, relying on the best LLMs becomes very expensive on large, rapidly evolving codebases where the model may be evaluated on every code file for every code change. This leads us to our central question:
+
+How do we achieve strong FL performance without tests, strong supervision, or large-scale models?
+{: .notice--info}
+
+To this end, we propose the *Bug Attention Probe* (BAP), which elicits strong localization performance from a small LLM with a lightweight form of supervision that requires no line-level localization labels. Before presenting BAP, we first introduce the problem of FL.
+
+## Fault Localization (FL)
+What is a bug in the first place? For our purposes, a code fragment either has a bug or it does not, so we define a bug as a property $$b: \mathcal{P}\rightarrow \{0, 1\}$$ where $$\mathcal{P}$$ is the space of all possible programs. A fragment of code $$c\in\mathcal{P}$$ is buggy if $$b(c)=1$$.
+
+We can now say what it means to localize a bug, using the notion of a counterfactual explanation.
+In particular, a program has a bug localized to line $$i$$ if modifying only line $$i$$ would remove the bug. Not all bugs can be localized to a single line: sometimes multiple lines must be modified to make a code fragment bug-free. The ground-truth FL label for a buggy code fragment is therefore a set of one or more line numbers that must be modified to remove the bug. For example, a bug whose fix touches lines 3 and 6 has the label $$\{3, 6\}$$.
+
+We can thus think of an FL method as a mapping from a buggy code fragment to the set of lines we should change to remove the bug, $$l: \mathcal{P}\rightarrow 2^{\mathbb{Z}^+}$$. An FL method is directly useful since it outputs which lines of code should be modified to fix a bug, but performing FL in practice presents several challenges.
+
+### Challenges in FL
+
+We identify three main challenges faced by existing FL methods.
+
+1. **Need for Strong Supervision**: Learning-based FL methods train on datasets of code fragments paired with fault localization line numbers. Such datasets require large amounts of manual effort to create, since building them presupposes a working method for fault localization. It is much easier to collect large amounts of code labelled only as buggy or not, since there are many accurate bug detection signals such as failing tests.
+2. **Localizing Multi-line Bugs**: Most real-world bugs span multiple lines, but existing FL methods mostly focus on the single-line case.
+3. **Resource Efficiency**: LLMs are state-of-the-art for FL, but only the most powerful ones perform well, and they are expensive, especially when called repeatedly.
+
+To address these challenges, we next present our method based on LLM probing.
+
+## Attention Probing for FL
+
+BAP is built around the insight that an LLM's attention mechanism can implicitly reveal the most suspicious parts of code.
+If we train a small model to classify code as buggy or not (weak supervision for the task of FL), its learned attention weights may reveal which tokens drive that decision.
+Summing those attention weights at the line level then produces a human-interpretable localization, with no explicit line-level bug labels needed.
+
+BAP consists of three steps: training with weak supervision, token-level attention extraction, and line-level attention aggregation. We walk through each step below.
+
+**Step 1: Weak Supervision.** BAP trains on a bug detection dataset: each sample is labeled just "buggy" or "clean", and no line-level information is needed. While the training signal is only a binary label, BAP uses an attention mechanism over a frozen LLM's latent representations to implicitly learn to localize the bug.
+
+{% include gallery id="step1" %}
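+
+To make Step 1 concrete, here is a minimal PyTorch-style sketch of the probe and its training loop. The names, dimensions, and single-head design are illustrative assumptions for exposition rather than the exact BAP implementation (see our [code](https://github.com/adaminsky/BAP) for that), and `dataset` stands in for precomputed frozen-LLM hidden states paired with binary labels.
+
+```python
+# Illustrative sketch: an attention probe over frozen LLM hidden states,
+# trained only with binary buggy/clean labels (weak supervision).
+import torch
+import torch.nn as nn
+
+class AttentionProbe(nn.Module):
+    def __init__(self, hidden_dim: int, head_dim: int = 64):
+        super().__init__()
+        self.query = nn.Linear(hidden_dim, head_dim)
+        self.key = nn.Linear(hidden_dim, head_dim)
+        self.value = nn.Linear(hidden_dim, head_dim)
+        self.classify = nn.Linear(head_dim, 1)  # buggy-vs-clean logit
+
+    def forward(self, hidden):
+        # hidden: (seq_len, hidden_dim) latent code tokens from the frozen LLM
+        q = self.query(hidden[-1:])          # the final token acts as the query
+        k, v = self.key(hidden), self.value(hidden)
+        attn = torch.softmax(q @ k.T / k.shape[-1] ** 0.5, dim=-1)  # (1, seq_len)
+        pooled = attn @ v                    # attention-weighted summary
+        return self.classify(pooled).squeeze(), attn.squeeze(0)
+
+probe = AttentionProbe(hidden_dim=2048)
+optimizer = torch.optim.Adam(probe.parameters(), lr=1e-4)
+loss_fn = nn.BCEWithLogitsLoss()
+
+# Placeholder data: (hidden_states, label) pairs where label is 1.0 iff buggy.
+dataset = [(torch.randn(128, 2048), torch.tensor(1.0))]
+for hidden, label in dataset:
+    logit, _ = probe(hidden)     # only the detection logit receives supervision
+    loss = loss_fn(logit, label)
+    optimizer.zero_grad()
+    loss.backward()
+    optimizer.step()
+```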
+
+**Step 2: Attention Extraction.** After BAP is trained on the detection task, we run it on a code fragment and collect each code token's attention score to the final token, giving token-level localization information. These raw scores are not immediately interpretable on their own, since programmers operate at the code or line level rather than the token level.
+
+{% include gallery id="step2" %}
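+
+Continuing the sketch above, extraction might look as follows. The model here is just an example of a small open LLM whose hidden size matches the probe; note that the scores come from the trained probe's attention, not from the LLM's own attention heads.
+
+```python
+# Illustrative sketch: encode a fragment with the frozen LLM, then read off
+# the trained probe's per-token attention scores.
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_name = "meta-llama/Llama-3.2-1B"  # example model with hidden size 2048
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+llm = AutoModelForCausalLM.from_pretrained(model_name)
+
+code = 'if (age >= 18) {\n    System.out.println("You\'re a minor!");\n}'
+inputs = tokenizer(code, return_tensors="pt")
+with torch.no_grad():
+    out = llm(**inputs, output_hidden_states=True)
+hidden = out.hidden_states[-1][0]  # (seq_len, hidden_dim) from the last layer
+
+_, token_attn = probe(hidden)      # one localization score per code token
+```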
+
+**Step 3: Line-level Ranking.** For each code line, we sum the attention scores of all tokens belonging to that line to obtain line-level localization scores. The line(s) with the highest scores are flagged as most likely to contain the bug.
+
+{% include gallery id="step3" %}
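+
+Finally, the token scores can be pooled by line using the tokenizer's character offsets. Again a sketch with our own naming rather than BAP's API; special tokens simply fall into line 0 here.
+
+```python
+# Illustrative sketch: sum the probe's token scores over each source line.
+def line_scores(code, token_attn, offsets):
+    line_starts = [0] + [i + 1 for i, ch in enumerate(code) if ch == "\n"]
+    scores = [0.0] * len(line_starts)
+    for (start, _end), score in zip(offsets, token_attn.tolist()):
+        line = sum(s <= start for s in line_starts) - 1  # line containing token
+        scores[line] += score
+    return scores
+
+offsets = tokenizer(code, return_offsets_mapping=True)["offset_mapping"]
+ranked = sorted(enumerate(line_scores(code, token_attn, offsets)),
+                key=lambda kv: kv[1], reverse=True)
+print(ranked[0])  # (line_index, score) of the most suspicious line
+```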
+
+## A Toy Example
+We now demonstrate BAP on a simple example.
+Consider the following Java code snippet, which we wrote (without much thought):
+```java
+Integer vote;
+public void addVote(int age) {
+    if (age >= 18) {
+        System.out.println("You're a minor!");
+    } else {
+        vote++;
+        System.out.println("You can vote!");
+    }
+}
+```
+If you look closely at the code, there are two major bugs! The first is that the condition of the if statement is inverted (`>= 18` prints "You're a minor!"). The second is subtler: the `vote` field is an `Integer` reference that is never initialized, so it is `null` and incrementing it throws a `NullPointerException`. How would we apply BAP to this code, and what would the output look like?
+
+### BAP Attends to Buggy Lines
+
+| BAP score | Code |
+| ---: | :--- |
+| 0.03 | `Integer vote;` |
+| 0.07 | `public void addVote(int age) {` |
+| **0.28** | `if (age >= 18) {` |
+| 0.05 | `System.out.println("You're a minor!");` |
+| 0.04 | `} else {` |
+| **0.28** | `vote++;` |
+| 0.10 | `System.out.println("You can vote!");` |
+| 0.02 | `}` |
+| 0.01 | `}` |
+
+Figure 1: BAP's attention visualization showing line-level bug detection scores. Higher scores indicate lines that the model identifies as likely containing bugs. The scores correctly highlight both bugs: the inverted condition (`age >= 18`) and the potential null reference error (`vote++`).
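+
+Reading the ranking off these scores is straightforward. The numbers below are exactly those in Figure 1 (with the two closing braces disambiguated so the keys are unique):
+
+```python
+# Figure 1's line-level scores; ranking them recovers both bugs.
+scores = {
+    "Integer vote;": 0.03,
+    "public void addVote(int age) {": 0.07,
+    "if (age >= 18) {": 0.28,
+    'System.out.println("You\'re a minor!");': 0.05,
+    "} else {": 0.04,
+    "vote++;": 0.28,
+    'System.out.println("You can vote!");': 0.10,
+    "}  // closes else": 0.02,
+    "}  // closes method": 0.01,
+}
+top2 = sorted(scores, key=scores.get, reverse=True)[:2]
+print(top2)  # ['if (age >= 18) {', 'vote++;'] -- the two real bugs
+```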
+
+After training on code examples labelled with just "buggy" or "clean," Figure 1 shows that BAP's line-level attention highlights `if (age >= 18)` and `vote++;` as the top suspects for buggy lines. No test harness or line-level annotations were needed, only examples of what buggy and non-buggy code snippets tend to look like.
+
+In reality, the bugs we evaluate on are far less trivial and much more closely resemble real-world bug encounters.
+
+## BAP Evaluation
+We evaluate BAP on eight diverse fault localization datasets spanning Python, Java, and C bugs, and show that BAP outperforms the baselines on top-1 FL accuracy. The first result below summarizes top-1 FL accuracy across all eight datasets for BAP compared to GPT-4o prompting and LLMAO, a recent deep learning FL approach.
+
+Because BAP produces a ranking over every line of code in a given snippet, multiple suspicious lines can be identified simultaneously. The second result below covers BAP's performance on multi-line bug localization, where BAP again outperforms the baselines.
+
+BAP is also significantly more efficient than prompting, outperforming large open-weight models at a small fraction of the computational cost. Even a 1B parameter base model with BAP outperforms zero-shot prompting of 90B parameter models and GPT-4o, as the third result below shows.
+
+- **Top-1 accuracy:** Averaged across all eight datasets, BAP improves top-1 accuracy by 34.6% over the strongest baseline and by 93.4% over zero-shot prompting of GPT-4o.
+- **Multi-line bugs:** On multi-line bugs, BAP captures more relevant lines than typical single-line localizers, showing better precision in its top-k lines.
+- **Efficiency:** BAP's smaller memory requirements and lower FLOPs mean it can be integrated into continuous integration workflows or potentially even local development IDEs.
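+
+For reference, a prediction counts as correct at top-1 when the probe's highest-ranked line is one of the ground-truth buggy lines. Here is a minimal sketch of this standard top-k FL metric (our formulation for illustration, not the paper's evaluation code):
+
+```python
+# Top-k fault localization accuracy over a benchmark of programs.
+def topk_accuracy(ranked_lines, buggy_lines, k=1):
+    """ranked_lines[i]: predicted ranking for program i (most suspicious first);
+    buggy_lines[i]: set of ground-truth buggy line numbers for program i."""
+    hits = sum(
+        any(line in truth for line in pred[:k])
+        for pred, truth in zip(ranked_lines, buggy_lines)
+    )
+    return hits / len(buggy_lines)
+
+# Example: two programs, one of which is localized correctly at top-1.
+print(topk_accuracy([[3, 7], [2, 5]], [{3}, {9, 10}], k=1))  # 0.5
+```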
+
+## Future Directions
+
+We are excited about future research on leveraging LLM probing for FL and related problems. One current limitation of BAP is that we evaluated only on code fragments of at most a few hundred lines; extending BAP to full repositories is an interesting direction. We also have yet to incorporate BAP into a full bug repair pipeline, where the FL information from BAP is handed to an LLM that proposes fixes. Finally, beyond code bugs, can an approach like BAP find "bugs" in non-code data? For instance, can probing an LLM's latent representations let us localize a mistake in an answer to the particular erroneous reasoning step?
+
+## Conclusion
+
+The best LLMs are now highly capable at FL, enabling FL in many real-world settings where existing techniques are inapplicable due to their need for execution scripts, test cases, or custom models. These LLMs, however, are expensive.
+BAP demonstrates that strong line-level FL can be elicited from a lightweight LLM by probing with just weak supervision. We show that BAP performs strongly across eight FL benchmarks, localizes multi-line bugs, and uses significantly less compute than LLM prompting of comparable performance.
+
+Check out our **[Paper](https://arxiv.org/abs/2502.13966)** and **[Code](https://github.com/adaminsky/BAP)** if you are interested in learning more or using BAP.
+
+```bibtex
+@article{stein2025s,
+  title={Where's the Bug? Attention Probing for Scalable Fault Localization},
+  author={Stein, Adam and Wayne, Arthur and Naik, Aaditya and Naik, Mayur and Wong, Eric},
+  journal={arXiv preprint arXiv:2502.13966},
+  year={2025}
+}
+```
\ No newline at end of file
diff --git a/assets/images/BAP/bap_fig1.png b/assets/images/BAP/bap_fig1.png
new file mode 100644
index 000000000000..44c91185dd8a
Binary files /dev/null and b/assets/images/BAP/bap_fig1.png differ
diff --git a/assets/images/BAP/codeprobes_1.png b/assets/images/BAP/codeprobes_1.png
new file mode 100644
index 000000000000..811084b80893
Binary files /dev/null and b/assets/images/BAP/codeprobes_1.png differ
diff --git a/assets/images/BAP/codeprobes_2.png b/assets/images/BAP/codeprobes_2.png
new file mode 100644
index 000000000000..757e9f827299
Binary files /dev/null and b/assets/images/BAP/codeprobes_2.png differ
diff --git a/assets/images/BAP/codeprobes_3.png b/assets/images/BAP/codeprobes_3.png
new file mode 100644
index 000000000000..6efe116eb48f
Binary files /dev/null and b/assets/images/BAP/codeprobes_3.png differ
diff --git a/assets/images/BAP/waldo.jpg b/assets/images/BAP/waldo.jpg
new file mode 100644
index 000000000000..da123a361363
Binary files /dev/null and b/assets/images/BAP/waldo.jpg differ