Semantica: Semantic Code Search Using Vectorized Abstract Syntax Trees

Wouldn't it be cool to use vector DBs to search for semantically (not syntactically) similar code in the public domain?

For real?

It's not a game changer or anything. This is just a fun experiment I built to get familiar with Tree-sitter and pgvector :) There's still a lot of room for improvement before it reaches high accuracy. It isn't a product, just an intellectual escapade.

Instructions to run

Use Node 18 and run npm install.
Go to your Supabase dashboard and get the two environment variables you need in your .env file. These are:

NEXT_PUBLIC_SUPABASE_URL=YOUR_URL
NEXT_PUBLIC_SUPABASE_ANON_KEY=YOUR_API_KEY
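If you want the app to fail fast when either variable is missing, a small check like the one below can help. This is a hypothetical helper, not part of Semantica: only the two variable names come from the instructions above, while `readSupabaseConfig` and `SupabaseConfig` are illustrative names.

```typescript
// Hypothetical helper: the variable names come from the README,
// the validation logic is an illustrative assumption.
interface SupabaseConfig {
  url: string;
  anonKey: string;
}

const REQUIRED_VARS = [
  "NEXT_PUBLIC_SUPABASE_URL",
  "NEXT_PUBLIC_SUPABASE_ANON_KEY",
] as const;

// Read both variables from an env-like record and throw if either is missing or empty.
function readSupabaseConfig(
  env: Record<string, string | undefined>,
): SupabaseConfig {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    url: env["NEXT_PUBLIC_SUPABASE_URL"] as string,
    anonKey: env["NEXT_PUBLIC_SUPABASE_ANON_KEY"] as string,
  };
}
```

In a Next.js project, pass the variables with literal references, e.g. `readSupabaseConfig({ NEXT_PUBLIC_SUPABASE_URL: process.env.NEXT_PUBLIC_SUPABASE_URL, NEXT_PUBLIC_SUPABASE_ANON_KEY: process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY })`, because Next.js only inlines `NEXT_PUBLIC_` values into browser code when they are referenced literally.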

Demo

Semantica has two very basic features:

Save a code snippet to the DB. The snippet then becomes searchable by other users.
Retrieve the most semantically similar code snippets from the DB, given a snippet of your own.

Behind the scenes, Semantica:

Converts the code snippet to a vectorized AST using Tree-sitter. Right now JS is the only language for which Semantica has a grammar.
Normalizes and stores the vectorized AST.
Uses the dot product to find the code snippets with the most similar embeddings. Because the vectors are normalized, the dot product equals their cosine similarity. The match threshold is 0.9.
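The three steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not Semantica's implementation: the README does not say how the AST is turned into a vector, so `vectorize` here just counts node types from a small hypothetical vocabulary (`VOCAB`); `AstNode` is a stand-in exposing only the `type` and `children` fields that Tree-sitter nodes also have; and an in-memory array replaces the pgvector table. All helper names are hypothetical. Only the normalization, the dot-product comparison, and the 0.9 threshold come from the description above.

```typescript
// Minimal stand-in for a Tree-sitter node (assumption: only `type` and `children` are used).
interface AstNode {
  type: string;
  children: AstNode[];
}

// Hypothetical vocabulary: each node type gets one dimension of the vector.
const VOCAB: string[] = [
  "program", "expression_statement", "binary_expression", "call_expression",
  "member_expression", "arrow_function", "array", "identifier", "number",
];

// Bag-of-node-types embedding: count each vocabulary type anywhere in the tree.
function vectorize(root: AstNode): number[] {
  const counts = new Array<number>(VOCAB.length).fill(0);
  const stack: AstNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop()!;
    const i = VOCAB.indexOf(node.type);
    if (i >= 0) counts[i] += 1;
    stack.push(...node.children);
  }
  return counts;
}

// Scale to unit length so that the dot product equals cosine similarity.
function normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? v.slice() : v.map((x) => x / norm);
}

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

const MATCH_THRESHOLD = 0.9;

interface StoredSnippet {
  code: string;
  embedding: number[]; // already normalized
}

// In-memory stand-in for the pgvector table.
const store: StoredSnippet[] = [];

function saveSnippet(code: string, ast: AstNode): void {
  store.push({ code, embedding: normalize(vectorize(ast)) });
}

// Return stored snippets scoring at or above the threshold, best match first.
function searchSimilar(ast: AstNode): { code: string; score: number }[] {
  const query = normalize(vectorize(ast));
  return store
    .map((s) => ({ code: s.code, score: dot(query, s.embedding) }))
    .filter((r) => r.score >= MATCH_THRESHOLD)
    .sort((x, y) => y.score - x.score);
}
```

Normalizing once at save time means each search only needs a dot product per stored row. A plain node-type histogram ignores tree structure, so it is a deliberately crude embedding and won't necessarily reproduce the matches described below.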

For example, you can add two numbers with the addition operator, or by putting them in an array and reducing it. The two snippets are syntactically different but semantically similar, and Semantica matches them.

Demo video: semantica-demo-compressed.mp4

Material that was useful to build this

How to install Tree-sitter on a NextJS project
What are embeddings?
Measuring Similarity From Embeddings
