---
title: "Why WHERE clauses can break your vector search"
author: "Safouane Chergui"
date: "2026-02-04"
format: html
toc: true
toc-location: body
toc-depth: 4
categories: [Python, NLP, vector search]
---

The goal of this blog post is to show you how adding a WHERE clause to your semantic search can break it and how some vector DBs solve the filtering problem.

This blog post will have the following layout:

# 1. What is the filtering problem?

Imagine yourself working on implementing vector search for an ecommerce website.

A user might want to look for Apple or Google **wireless earbuds**.

As a system developer, you might design a system that retrieves results semantically close to **wireless earbuds** and then applies a filter to keep only `Apple` or `Google` products.
While this filtering step looks benign on the surface, it can break your whole vector search.
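
To make the naive design concrete, here is a minimal sketch of "search first, filter afterwards" using brute-force cosine similarity in plain Python. The catalog, the 2-D embeddings, and the `search_then_filter` helper are all made up for illustration; a real system would use a learned embedding model and a vector index. It only shows one visible symptom of the naive approach: the top-k results can contain no product that passes the filter.

```python
import math

# Hypothetical toy catalog: (name, brand, made-up 2-D embedding).
products = [
    ("Sony wireless earbuds", "Sony", [0.99, 0.10]),
    ("Bose wireless earbuds", "Bose", [0.98, 0.15]),
    ("JBL wireless earbuds", "JBL", [0.97, 0.20]),
    ("Apple wireless earbuds", "Apple", [0.90, 0.40]),
    ("Google phone case", "Google", [0.10, 0.99]),
]


def cosine(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search_then_filter(query, k, allowed_brands):
    """Take the k most similar products, then keep only the allowed brands."""
    ranked = sorted(products, key=lambda p: cosine(query, p[2]), reverse=True)
    top_k = ranked[:k]
    return [name for name, brand, _ in top_k if brand in allowed_brands]


# Made-up embedding standing in for the query "wireless earbuds".
query = [1.0, 0.1]
results = search_then_filter(query, k=3, allowed_brands={"Apple", "Google"})
print(results)
```

In this toy setup the three nearest products all come from other brands, so the filter discards every one of them and the user sees an empty list, even though a relevant Apple product exists in the catalog.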

The goal of this blog post is to show how this filtering breaks vector search & how some (and only some) vector DBs or libraries solve this issue.

# 2. Quick recap on graphs

## 2.1. What is a graph?

Before explaining HNSW, one of the most widely used indexing methods behind vector search, we'll take a quick look at graphs.

A *graph* is a structure that is composed of:

- nodes: the points representing the entities in your graph
- edges: lines linking nodes together

A subway network can be represented as a *graph*. The stations are the nodes, and two stations that are directly connected by a line are linked by an edge.

Now, if there exists a path between two nodes, one is reachable from the other.
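
As a minimal sketch of these definitions, a graph can be stored as an adjacency list (a dictionary mapping each node to its neighbors), and reachability can be checked with a breadth-first search. The `subway` map and the `is_reachable` helper below are hypothetical names chosen for illustration.

```python
from collections import deque

# Hypothetical subway map: each station (node) lists the stations
# it shares an edge with. Station "D" has no edges at all.
subway = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B"],
    "D": [],
}


def is_reachable(graph, start, target):
    """Return True if a path of edges leads from start to target."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


print(is_reachable(subway, "A", "C"))  # True: A -> B -> C
print(is_reachable(subway, "A", "D"))  # False: no edge touches D
```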

## 2.2. Connected graphs

A graph is connected if any node is reachable from any other through the edges in the graph.

In the example below, all 4 subway stations are reachable from one another.

<div align="center">

```{mermaid}
flowchart TB
    A["A"] --- B["B"]
    B --- C["C"]
    A --- D["D"]
    B --- D
    C --- D
```
</div>

If some node is unreachable from another node, the graph is disconnected. In the example below, there is no path that goes from A to C.

<div align="center">

```{mermaid}
flowchart TB
    A["A"] --- B["B"]
    C["C"] --- D["D"]
```

</div>
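
The two diagrams above can be checked programmatically. The sketch below groups nodes into connected components with a depth-first traversal; a graph is connected exactly when there is a single component. The helper names `connected_components` and `is_connected` are illustrative and do not come from any particular library.

```python
def connected_components(nodes, edges):
    """Group nodes into sets of mutually reachable nodes."""
    # Build an undirected adjacency list.
    neighbors = {node: set() for node in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    components, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        # Depth-first traversal collecting everything reachable from start.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        components.append(component)
    return components


def is_connected(nodes, edges):
    """A graph is connected when all nodes fall into one component."""
    return len(connected_components(nodes, edges)) == 1


nodes = ["A", "B", "C", "D"]
first_diagram = [("A", "B"), ("B", "C"), ("A", "D"), ("B", "D"), ("C", "D")]
second_diagram = [("A", "B"), ("C", "D")]

print(is_connected(nodes, first_diagram))   # True
print(is_connected(nodes, second_diagram))  # False
```

For the second diagram, `connected_components` returns two groups, one containing A and B and one containing C and D, which is why no path leads from A to C.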

**Removing