From 4e7816d97967bec4f3fe9ddeb5810904659ec53a Mon Sep 17 00:00:00 2001
From: Benedict Lee
Date: Sat, 20 Apr 2024 12:48:52 +0900
Subject: [PATCH 01/11] Rename tab model to code completion model

---
 index.md                              | 2 +-
 where-we-are-today/code-completion.md | 3 +++
 where-we-are-today/tab.md             | 3 ---
 3 files changed, 4 insertions(+), 4 deletions(-)
 create mode 100644 where-we-are-today/code-completion.md
 delete mode 100644 where-we-are-today/tab.md

diff --git a/index.md b/index.md
index e593d24..1292b01 100644
--- a/index.md
+++ b/index.md
@@ -18,7 +18,7 @@ What we call “AI copilots” are much more than a single LLM. As the Berkeley
 
 As more of software development is automated, we are seeing more human engineering time go into monitoring, maintaining, and improving the different components that make up AI software development systems. That said, most copilots to date have been black box, SaaS solutions with roughly the same components, which you often have little to no ability to understand or improve.
 
-- [Tab model](./where-we-are-today/tab.md)
+- [Code Completion model](./where-we-are-today/code-completion.md)
 - [Chat model](./where-we-are-today/chat.md)
 - [Local context engine](./where-we-are-today/local.md)
 - [Server context engine](./where-we-are-today/server.md)

diff --git a/where-we-are-today/code-completion.md b/where-we-are-today/code-completion.md
new file mode 100644
index 0000000..104e627
--- /dev/null
+++ b/where-we-are-today/code-completion.md
@@ -0,0 +1,3 @@
+# Code Completion model
+
+The “code completion” model component is used to power autocomplete suggestions and is typically a 1-15B parameter model. As a result, you can run these models on your laptop or on a server. Because developers need a suggestion within 500ms, you generally need to use a smaller model in order to meet the latency requirements. However, the quality of suggestions you get from models that are too small is bad. Thus, the tab-autocomplete model is optimized primarily with these two constraints in mind. Examples of models used for tab-autocomplete include Codex, DeepSeek Coder Base, StarCoder 2, Replit Code, etc.

diff --git a/where-we-are-today/tab.md b/where-we-are-today/tab.md
deleted file mode 100644
index 2c95fa2..0000000
--- a/where-we-are-today/tab.md
+++ /dev/null
@@ -1,3 +0,0 @@
-# Tab model
-
-The “tab” model component is used to power autocomplete suggestions and is typically a 1-15B parameter model. As a result, you can run these models on your laptop or on a server. Because developers need a suggestion within 500ms, you generally need to use a smaller model in order to meet the latency requirements. However, the quality of suggestions you get from models that are too small is bad. Thus, the tab-autocomplete model is optimized primarily with these two constraints in mind. Examples of models used for tab-autocomplete include Codex, DeepSeek Coder Base, StarCoder 2, Replit Code, etc.

From 23034daa3cb0db29a099109fd355c46d8de3e6d1 Mon Sep 17 00:00:00 2001
From: Benedict Lee
Date: Sat, 20 Apr 2024 13:55:04 +0900
Subject: [PATCH 02/11] Rename tab model to code completion model

---
 where-we-are-today/chat.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/where-we-are-today/chat.md b/where-we-are-today/chat.md
index a5c1b6d..697582a 100644
--- a/where-we-are-today/chat.md
+++ b/where-we-are-today/chat.md
@@ -1,3 +1,3 @@
 # Chat model
 
-The “chat” model component is used to power question-answer experiences and is typically a 30B+ parameter model. Latency is not as important as it is for the “tab” model, so most people choose the one that gives them the best possible responses, oftentimes opting for SaaS API endpoints. When SaaS isn’t possible or preferred, open-source models are self-hosted on a server for the entire team to use. Examples of models used for chat experiences include GPT-4, DeepSeek Coder 33B, Claude 3, Code Llama 70B, etc.
+The “chat” model component is used to power question-answer experiences and is typically a 30B+ parameter model. Latency is not as important as it is for the “code completion” model, so most people choose the one that gives them the best possible responses, oftentimes opting for SaaS API endpoints. When SaaS isn’t possible or preferred, open-source models are self-hosted on a server for the entire team to use. Examples of models used for chat experiences include GPT-4, DeepSeek Coder 33B, Claude 3, Code Llama 70B, etc.

From 169630ae0bb927a71d2a06030695174d5be5a79b Mon Sep 17 00:00:00 2001
From: Benedict Lee
Date: Sat, 20 Apr 2024 13:58:29 +0900
Subject: [PATCH 03/11] Update code completion model description with new examples

---
 where-we-are-today/code-completion.md | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/where-we-are-today/code-completion.md b/where-we-are-today/code-completion.md
index 104e627..5878f41 100644
--- a/where-we-are-today/code-completion.md
+++ b/where-we-are-today/code-completion.md
@@ -1,3 +1,23 @@
 # Code Completion model
 
+The "code completion" model component is used to support autocomplete suggestions and is also referred to as the "tab-autocomplete model". Tab-autocomplete models use models trained with special templates, such as FiM(Fill in the Middle), that specialize in code infilling, and typically use small models, on the order of 1-15B for low latency. Because developers need a suggestion within 500ms, you generally need to use a smaller model in order to meet the latency requirements. However, the quality of suggestions you get from models that are too small is bad. Thus, the tab-autocomplete model is optimized primarily with these two constraints in mind. Examples of models used for tab-autocomplete are shown below:
-The “code completion” model component is used to power autocomplete suggestions and is typically a 1-15B parameter model. As a result, you can run these models on your laptop or on a server. Because developers need a suggestion within 500ms, you generally need to use a smaller model in order to meet the latency requirements. However, the quality of suggestions you get from models that are too small is bad. Thus, the tab-autocomplete model is optimized primarily with these two constraints in mind. Examples of models used for tab-autocomplete include Codex, DeepSeek Coder Base, StarCoder 2, Replit Code, etc.
+OpenSource Models
+2B Class
+- codegemma-2b
+- deepseek-coder-1.3b-base
+- starcoder2-3b
+
+7B Class
+- codegemma-7b
+- codellama-7b
+- deepseek-coder-6.7b-base
+- starcoder2-7b
+
+10B+ Class
+- codellama-13b
+- deepseek-coder-33b-base
+- starcoder2-15b
+
+ClosedSource Model
+- Codex-001
+- Codex-002

From 8446218f940ccc868a10864c4ee3df05af9f0b70 Mon Sep 17 00:00:00 2001
From: Benedict Lee
Date: Sat, 20 Apr 2024 14:00:46 +0900
Subject: [PATCH 04/11] Update code completion model description with new examples

---
 where-we-are-today/code-completion.md | 23 +----------------------
 1 file changed, 1 insertion(+), 22 deletions(-)

diff --git a/where-we-are-today/code-completion.md b/where-we-are-today/code-completion.md
index 5878f41..dbc6beb 100644
--- a/where-we-are-today/code-completion.md
+++ b/where-we-are-today/code-completion.md
@@ -1,23 +1,2 @@
 # Code Completion model
-The "code completion" model component is used to support autocomplete suggestions and is also referred to as the "tab-autocomplete model". Tab-autocomplete models use models trained with special templates, such as FiM(Fill in the Middle), that specialize in code infilling, and typically use small models, on the order of 1-15B for low latency. Because developers need a suggestion within 500ms, you generally need to use a smaller model in order to meet the latency requirements. However, the quality of suggestions you get from models that are too small is bad. Thus, the tab-autocomplete model is optimized primarily with these two constraints in mind. Examples of models used for tab-autocomplete are shown below:
-
-OpenSource Models
-2B Class
-- codegemma-2b
-- deepseek-coder-1.3b-base
-- starcoder2-3b
-
-7B Class
-- codegemma-7b
-- codellama-7b
-- deepseek-coder-6.7b-base
-- starcoder2-7b
-
-10B+ Class
-- codellama-13b
-- deepseek-coder-33b-base
-- starcoder2-15b
-
-ClosedSource Model
-- Codex-001
-- Codex-002
+The "code completion" model component is used to support autocomplete suggestions and is also referred to as the "tab-autocomplete model". Tab-autocomplete models use models trained with special templates, such as FiM(Fill in the Middle), that specialize in code infilling, and typically use small models, on the order of 1-15B for low latency. Because developers need a suggestion within 500ms, you generally need to use a smaller model in order to meet the latency requirements. However, the quality of suggestions you get from models that are too small is bad. Thus, the tab-autocomplete model is optimized primarily with these two constraints in mind. Examples of models used for tab-autocomplete include Codex, CodeGemma(2,7B), CodeLlama(7,13B), DeepSeek Coder Base(1.3,6.7,33B), StarCoder2(3,7,15B), ReplitCode(3B), etc.
\ No newline at end of file

From aa2110139adf412556b8d0d8ca47b9c53f9af25a Mon Sep 17 00:00:00 2001
From: Benedict Lee
Date: Sat, 20 Apr 2024 14:17:49 +0900
Subject: [PATCH 05/11] Added llama3 as an example chat model.

---
 where-we-are-today/chat.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/where-we-are-today/chat.md b/where-we-are-today/chat.md
index 697582a..2a3a422 100644
--- a/where-we-are-today/chat.md
+++ b/where-we-are-today/chat.md
@@ -1,3 +1,3 @@
 # Chat model
 
-The “chat” model component is used to power question-answer experiences and is typically a 30B+ parameter model. Latency is not as important as it is for the “code completion” model, so most people choose the one that gives them the best possible responses, oftentimes opting for SaaS API endpoints. When SaaS isn’t possible or preferred, open-source models are self-hosted on a server for the entire team to use. Examples of models used for chat experiences include GPT-4, DeepSeek Coder 33B, Claude 3, Code Llama 70B, etc.
+The “chat” model component is used to power question-answer experiences and is typically a 30B+ parameter model. Latency is not as important as it is for the “code completion” model, so most people choose the one that gives them the best possible responses, oftentimes opting for SaaS API endpoints. When SaaS isn’t possible or preferred, open-source models are self-hosted on a server for the entire team to use. Examples of models used for chat experiences include GPT-4, DeepSeek Coder 33B, Claude 3, Code Llama 70B, Llama3 70B etc.

From 6ee22e693bfc10ac04fd00eda52e295db208f730 Mon Sep 17 00:00:00 2001
From: Benedict Lee
Date: Mon, 22 Apr 2024 09:03:17 +0900
Subject: [PATCH 06/11] rename autocomplete model name

Co-authored-by: Ty Dunn
---
 index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/index.md b/index.md
index 1292b01..e372247 100644
--- a/index.md
+++ b/index.md
@@ -18,7 +18,7 @@ What we call “AI copilots” are much more than a single LLM. As the Berkeley
 
 As more of software development is automated, we are seeing more human engineering time go into monitoring, maintaining, and improving the different components that make up AI software development systems. That said, most copilots to date have been black box, SaaS solutions with roughly the same components, which you often have little to no ability to understand or improve.
 
-- [Code Completion model](./where-we-are-today/code-completion.md)
+- [Autocomplete model](./where-we-are-today/autocomplete.md)
 - [Chat model](./where-we-are-today/chat.md)
 - [Local context engine](./where-we-are-today/local.md)
 - [Server context engine](./where-we-are-today/server.md)

From 7000220dfd1c4d55ba85579864f3943455ab6748 Mon Sep 17 00:00:00 2001
From: Benedict Lee
Date: Mon, 22 Apr 2024 09:03:43 +0900
Subject: [PATCH 07/11] rename autocomplete model name

Co-authored-by: Ty Dunn
---
 where-we-are-today/chat.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/where-we-are-today/chat.md b/where-we-are-today/chat.md
index 2a3a422..ddfadaf 100644
--- a/where-we-are-today/chat.md
+++ b/where-we-are-today/chat.md
@@ -1,3 +1,3 @@
 # Chat model
 
-The “chat” model component is used to power question-answer experiences and is typically a 30B+ parameter model. Latency is not as important as it is for the “code completion” model, so most people choose the one that gives them the best possible responses, oftentimes opting for SaaS API endpoints. When SaaS isn’t possible or preferred, open-source models are self-hosted on a server for the entire team to use. Examples of models used for chat experiences include GPT-4, DeepSeek Coder 33B, Claude 3, Code Llama 70B, Llama3 70B etc.
+The “chat” model component is used to power question-answer experiences and is typically a 30B+ parameter model. Latency is not as important as it is for the “autocomplete” model, so most people choose the one that gives them the best possible responses, oftentimes opting for SaaS API endpoints. When SaaS isn’t possible or preferred, open-source models are self-hosted on a server for the entire team to use. Examples of models used for chat experiences include GPT-4, DeepSeek Coder 33B, Claude 3, Code Llama 70B, Llama 3 70B etc.
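The chat.md text above notes that when SaaS isn’t possible or preferred, teams self-host an open-source model behind a shared server. As a minimal sketch of what that looks like from the client side, assuming a self-hosted server that exposes an OpenAI-compatible endpoint (as servers like vLLM and Ollama do), with the base URL and model name below being purely illustrative:

```python
# Minimal sketch: point an OpenAI-compatible client at a team's self-hosted
# chat model instead of a SaaS endpoint. The base_url and model name are
# illustrative assumptions, not values taken from the patches above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-70B-Instruct",
    messages=[{"role": "user", "content": "Why does this function deadlock?"}],
)
print(response.choices[0].message.content)
```

Because the same client code works against both SaaS and self-hosted endpoints, switching between the two is mostly a matter of changing the base URL.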
From 1204b752ed73d2b39032a6b5078dfef6eb209b1f Mon Sep 17 00:00:00 2001
From: Benedict Lee
Date: Mon, 22 Apr 2024 09:03:57 +0900
Subject: [PATCH 08/11] rename autocomplete model name

Co-authored-by: Ty Dunn
---
 where-we-are-today/code-completion.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/where-we-are-today/code-completion.md b/where-we-are-today/code-completion.md
index dbc6beb..a2ff9c4 100644
--- a/where-we-are-today/code-completion.md
+++ b/where-we-are-today/code-completion.md
@@ -1,2 +1,2 @@
-# Code Completion model
+# Autocomplete model
 The "code completion" model component is used to support autocomplete suggestions and is also referred to as the "tab-autocomplete model". Tab-autocomplete models use models trained with special templates, such as FiM(Fill in the Middle), that specialize in code infilling, and typically use small models, on the order of 1-15B for low latency. Because developers need a suggestion within 500ms, you generally need to use a smaller model in order to meet the latency requirements. However, the quality of suggestions you get from models that are too small is bad. Thus, the tab-autocomplete model is optimized primarily with these two constraints in mind. Examples of models used for tab-autocomplete include Codex, CodeGemma(2,7B), CodeLlama(7,13B), DeepSeek Coder Base(1.3,6.7,33B), StarCoder2(3,7,15B), ReplitCode(3B), etc.
\ No newline at end of file

From b8f78aaa15dfc9f19e5e9a6d43620244e11097e3 Mon Sep 17 00:00:00 2001
From: Benedict Lee
Date: Mon, 22 Apr 2024 09:04:46 +0900
Subject: [PATCH 09/11] rename autocomplete model name and add model paper ling

Co-authored-by: Ty Dunn
---
 where-we-are-today/code-completion.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/where-we-are-today/code-completion.md b/where-we-are-today/code-completion.md
index a2ff9c4..127e67b 100644
--- a/where-we-are-today/code-completion.md
+++ b/where-we-are-today/code-completion.md
@@ -1,2 +1,2 @@
 # Autocomplete model
-The "code completion" model component is used to support autocomplete suggestions and is also referred to as the "tab-autocomplete model". Tab-autocomplete models use models trained with special templates, such as FiM(Fill in the Middle), that specialize in code infilling, and typically use small models, on the order of 1-15B for low latency. Because developers need a suggestion within 500ms, you generally need to use a smaller model in order to meet the latency requirements. However, the quality of suggestions you get from models that are too small is bad. Thus, the tab-autocomplete model is optimized primarily with these two constraints in mind. Examples of models used for tab-autocomplete include Codex, CodeGemma(2,7B), CodeLlama(7,13B), DeepSeek Coder Base(1.3,6.7,33B), StarCoder2(3,7,15B), ReplitCode(3B), etc.
\ No newline at end of file
+The "autocomplete" model component is used to power code completion suggestions and is typically a 1-15B parameter model. The models are run on your laptop or on a server and have generally been trained with special templates like fill-in-the-middle (FIM) for code infilling. Because developers need a suggestion within 500ms, you generally need to use a smaller model in order to meet the latency requirements. However, the quality of suggestions you get from models that are too small is bad. Thus, the tab-autocomplete model is optimized primarily with these two constraints in mind. Examples of models used for code completion include [Codex](https://arxiv.org/pdf/2107.03374.pdf), [CodeGemma](https://developers.googleblog.com/2024/04/gemma-family-expands.html), [Code Llama](https://arxiv.org/pdf/2308.12950.pdf), [DeepSeek Coder Base](https://deepseekcoder.github.io/), [StarCoder 2](https://arxiv.org/pdf/2402.19173.pdf), etc.
\ No newline at end of file

From 21841ee5c2b7c036fe8488aaf4685f40a52d7633 Mon Sep 17 00:00:00 2001
From: Benedict Lee
Date: Mon, 22 Apr 2024 09:10:18 +0900
Subject: [PATCH 10/11] Rename code-completion.md to autocomplete.md

---
 where-we-are-today/{code-completion.md => autocomplete.md} | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
 rename where-we-are-today/{code-completion.md => autocomplete.md} (95%)

diff --git a/where-we-are-today/code-completion.md b/where-we-are-today/autocomplete.md
similarity index 95%
rename from where-we-are-today/code-completion.md
rename to where-we-are-today/autocomplete.md
index 127e67b..9b751c2 100644
--- a/where-we-are-today/code-completion.md
+++ b/where-we-are-today/autocomplete.md
@@ -1,2 +1,2 @@
 # Autocomplete model
-The "autocomplete" model component is used to power code completion suggestions and is typically a 1-15B parameter model. The models are run on your laptop or on a server and have generally been trained with special templates like fill-in-the-middle (FIM) for code infilling. Because developers need a suggestion within 500ms, you generally need to use a smaller model in order to meet the latency requirements. However, the quality of suggestions you get from models that are too small is bad. Thus, the tab-autocomplete model is optimized primarily with these two constraints in mind. Examples of models used for code completion include [Codex](https://arxiv.org/pdf/2107.03374.pdf), [CodeGemma](https://developers.googleblog.com/2024/04/gemma-family-expands.html), [Code Llama](https://arxiv.org/pdf/2308.12950.pdf), [DeepSeek Coder Base](https://deepseekcoder.github.io/), [StarCoder 2](https://arxiv.org/pdf/2402.19173.pdf), etc.
\ No newline at end of file
+The "autocomplete" model component is used to power code completion suggestions and is typically a 1-15B parameter model. The models are run on your laptop or on a server and have generally been trained with special templates like fill-in-the-middle (FIM) for code infilling. Because developers need a suggestion within 500ms, you generally need to use a smaller model in order to meet the latency requirements. However, the quality of suggestions you get from models that are too small is bad. Thus, the tab-autocomplete model is optimized primarily with these two constraints in mind. Examples of models used for code completion include [Codex](https://arxiv.org/pdf/2107.03374.pdf), [CodeGemma](https://developers.googleblog.com/2024/04/gemma-family-expands.html), [Code Llama](https://arxiv.org/pdf/2308.12950.pdf), [DeepSeek Coder Base](https://deepseekcoder.github.io/), [StarCoder 2](https://arxiv.org/pdf/2402.19173.pdf), etc.
From 9d310f825c3eb29bf099eb78b2469d95b5db6141 Mon Sep 17 00:00:00 2001
From: Ty Dunn
Date: Mon, 22 Apr 2024 08:42:53 -0700
Subject: [PATCH 11/11] Update where-we-are-today/autocomplete.md

---
 where-we-are-today/autocomplete.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/where-we-are-today/autocomplete.md b/where-we-are-today/autocomplete.md
index 9b751c2..a3b639e 100644
--- a/where-we-are-today/autocomplete.md
+++ b/where-we-are-today/autocomplete.md
@@ -1,2 +1,3 @@
 # Autocomplete model
+
 The "autocomplete" model component is used to power code completion suggestions and is typically a 1-15B parameter model. The models are run on your laptop or on a server and have generally been trained with special templates like fill-in-the-middle (FIM) for code infilling. Because developers need a suggestion within 500ms, you generally need to use a smaller model in order to meet the latency requirements. However, the quality of suggestions you get from models that are too small is bad. Thus, the tab-autocomplete model is optimized primarily with these two constraints in mind. Examples of models used for code completion include [Codex](https://arxiv.org/pdf/2107.03374.pdf), [CodeGemma](https://developers.googleblog.com/2024/04/gemma-family-expands.html), [Code Llama](https://arxiv.org/pdf/2308.12950.pdf), [DeepSeek Coder Base](https://deepseekcoder.github.io/), [StarCoder 2](https://arxiv.org/pdf/2402.19173.pdf), etc.
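The fill-in-the-middle (FIM) templates that autocomplete.md mentions are worth making concrete. As a minimal sketch, assuming the StarCoder-style FIM sentinel tokens (other families such as Code Llama and DeepSeek Coder use different sentinels) and `bigcode/starcoder2-3b` as an illustrative small checkpoint chosen with the sub-500ms latency budget in mind:

```python
# Minimal sketch of a fill-in-the-middle (FIM) autocomplete prompt.
# Sentinel tokens follow the StarCoder convention; Code Llama and
# DeepSeek Coder use different ones. The checkpoint is illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder2-3b"  # small model, chosen for latency
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Everything before the cursor is the prefix; everything after is the suffix.
prefix = "def binary_search(arr, target):\n    lo, hi = 0, len(arr) - 1\n"
suffix = "\n    return -1\n"
prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)

# The tokens generated after the prompt are the suggested "middle",
# i.e. the text the editor inserts at the cursor position.
completion = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(completion)
```

The prompt shape is the whole trick: a base model trained on this template learns to emit the middle given a prefix and suffix, which is what lets it behave as an autocomplete engine rather than a left-to-right completer.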