
Merge pull request #42 from cosmonaut-nz/re-arch
Google provider using Gemini-pro
avastmick committed Dec 21, 2023
2 parents 8b1e8b2 + f7b2dd9 commit 26dbde1
Showing 12 changed files with 308 additions and 191 deletions.
5 changes: 3 additions & 2 deletions Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "cosmonaut_code"
version = "0.1.2"
version = "0.2.0"
edition = "2021"
license = "CC BY-NC-ND 4.0"
readme = "README.md"
@@ -38,10 +38,11 @@ sha2 = "0.10.8"
handlebars = "4.5.0"
# linguist-rs = "1.1.2" # Using direct repository fetch in place of crates.io as version is out of date (local code changes)
linguist-rs = { git = "https://github.com/cosmonaut-nz/linguist-rs.git", version = "1.1.2" }
gcp_auth = "0.10.0"


[dev-dependencies]
tempfile = "3.8.1"

[build-dependencies]
linguist-rs-build = { git = "https://github.com/cosmonaut-nz/linguist-rs.git", version = "1.1.1" }
72 changes: 43 additions & 29 deletions README.md
@@ -8,15 +8,15 @@ it's a code explorer, explainer and assessment tool.

to be honest, we built this tool because we needed it for the work we do. sure there are some great tools out there, but none of them quite hit the mark for our needs.

it's in pure rust! so it gotta be good, right? :roll_eyes:
generative ai will maximise and 10x developer output. what about the quality? this tool aims to address that - i.e., use the source of the problem to provide a solution.

### goals

1. provide a viable tool for local codebase analysis.
2. help new developers to quickly get up to speed on a large or legacy codebase.
3. provide a tool that will help developers and code maintainers manage their code and start a conversation on quality.
4. allow code owners to check the overall health of the code in a simple way.
5. output an actionable report that will improve the code base.
5. output an actionable report with auto-PRs that will improve the code base over time.

### non-goals

@@ -33,22 +33,37 @@ it's in pure rust! so it gotta be good, right? :roll_eyes:
5. entry-point for due diligence of technology assets.
6. code owner reporting on technical-debt and general health of asset.

## current state

currently, most things are working and a solid report is produced in either `json` or `html`.

the most stable and tested provider is openai. the best results, by far, are with the `gpt-4` service, which uses the latest `preview` model. the openai `gpt-3.5` works, but tends to overstate issues and the quality of resolution offered for issues is not as good. it does run faster, however, and is cheaper to run.

google are late to the party, but have come in the door with a half-drunken bottle. the public instance of `gemini-pro` is faster and cheaper, and produces better results than openai's `gpt-3.5`. it is slightly behind the `gpt-4` `preview` model, but not far. do your own testing; we've found the late 2023 comparisons online to be highly misleading. the google provider is not as well tested as openai, so you will see more errors in the log output. it should recover from these errors, but it is less robust.

## disclaimer

this is really early days. running over a really big repo with the latest model will be super slow and possibly fail. we've tested it up to ~1500 code files; with timeout retries etc. that takes a couple of hours and costs about 5 usd. your mileage may vary. we think the value will come when it can be run over multiple models and the output compared and filtered.

it produces false flags. it overplays or (rarely downplays) security issues. there is significant variation between review runs on the same repository, particularly with older models.
as with all similar tools, it does produce false flags. it overplays or (rarely) downplays security issues. in some cases it may flag so many issues that the response is truncated, creating an error. we are working on this.

there is significant variation between models, and even between review runs on the same repository with the same model, particularly with older models. some models are silent on obvious issues and fixated on trivial ones.

there are issues with the language file type matching via the github linguist regex. we will likely move to something more robust, or fix the crate that causes the mismatching.

we recommend that you run it multiple times at first to gain a baseline; fix the big issues and then let it run periodically.

right now it's a barebones offering. it works, and we have gotten value from it, but there is a lot more to do. but it's been fun to do.
right now it's deliberately a barebones offering. it works well, and we have gotten value from it, but there is a lot more to do. it's been fun to do.

use it as it is intended, as a starting point for a conversation on quality and current practices.
the google public api provider works, but is less robust than openai.

there is a local instance wired up. it does work, but it is highly fragile and unlikely to complete. it currently uses lm studio.

## usage

download pre-release

[MacOS Apple Silicon](https://github.com/cosmonaut-nz/cosmonaut-code/releases/download/v0.1.2/cosmonaut_code_0.1.2_macos-aarch64)
[MacOS Apple Silicon](https://github.com/cosmonaut-nz/cosmonaut-code/releases/download/v0.2.0/cosmonaut_code_0.2.0_macos-aarch64)

### configuration

@@ -58,33 +73,31 @@ configure: add a `settings.json`, maybe in the `settings` folder, with the follo

{
"sensitive": {
"api_key": "[YOUR_OPENAI_API_KEY]",
"org_id": "[YOUR_OPENAI_ORG_ID]",
"org_name": "[YOUR_OPENAI_ORG_NAME]"
"api_key": "[YOUR_API_KEY]"
},
"repository_path": "[FULL_PATH_TO_REPO]",
"report_output_path": "[FULL_PATH_TO_OUTPUT]",
"chosen_service": "gpt-4",
"output_type": "html",
"review_type": "general",
"chosen_provider": "[CHOICE OF PROVIDER]",
"chosen_service": "[CHOICE OF SERVICE]",
"output_type": "html"
}

```

`chosen_provider` is in:

1. `openai` (default)
2. `google` (note API key only, ADC does not work as this is the public version)

`chosen_service` is in:

1. `gpt-4` (default)
2. `gpt-3.5`
3. `gemini-pro` (for google provider)

`review_type` is in:

1. "general" = full review - (default)
2. "security" = security review only
3. "stats" = mock run, not using LLM for code review

`output_type` is in:

1. `HTML`
1. `html`
2. `json` - (default)
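
for example, a working `settings.json` for the google provider might look like this sketch (the api key and paths are placeholders, and the field values are assumptions based on the options listed above):

```json
{
    "sensitive": {
        "api_key": "[YOUR_GOOGLE_API_KEY]"
    },
    "repository_path": "/path/to/your/repo",
    "report_output_path": "/path/to/report/output",
    "chosen_provider": "google",
    "chosen_service": "gemini-pro",
    "output_type": "html"
}
```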

run:
@@ -94,11 +107,12 @@
export SENSITIVE_SETTINGS_PATH=[PATH_TO_YOUR_SETTINGS.JSON]

```

download release above

```bash

mv cosmonaut_code_0.1.2_macos-aarch64 cosmonaut_code
mv cosmonaut_code_0.2.0_macos-aarch64 cosmonaut_code

```

@@ -110,7 +124,7 @@ mv cosmonaut_code_0.1.2_macos-aarch64 cosmonaut_code

## via rust locally

### tldr;
### tldr

install rust; clone the repo; cd repo; add config (see above); `cargo run`.
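
a minimal sketch of those steps, assuming a unix-like shell and that your `settings.json` sits in the `settings` folder as described above:

```bash
# install rust via rustup (https://rustup.rs)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# clone the repo and move into it
git clone https://github.com/cosmonaut-nz/cosmonaut-code.git
cd cosmonaut-code

# point the tool at your settings file (see configuration above)
export SENSITIVE_SETTINGS_PATH=./settings/settings.json

# build and run the review
cargo run
```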

@@ -151,25 +165,25 @@ see [contributing](CONTRIBUTING.md) for the rules, they are standard though.

we do our best to release working code. we hacked this out pretty quickly, so the code's quality is not all that great right now.

status today is: *"it works, but it is not that pretty or that user-friendly."*
status today is: *"it works, and the happy path is pretty solid. deviate from the path and there be dragons"*

## outline tasks

- [X] load local repository
- [X] enable open review of code
- [X] output in json
- [X] output in html
- [ ] packaging so user can either install via `cargo install` or download the binary
- [ ] output in pdf
- [X] packaging so user can either install via `cargo install` or download the binary (macos apple silicon only)
- [X] (fine) tune the prompts for clarity and accuracy
- [X] more configuration and adjustment of prompts
- [ ] github actions integration
- [ ] enable private llm review of code (likely llama-based) run on a cloud service
- [X] enable google gemini review of code
- [ ] enable a private google gemini review of code using vertex ai (coming soon)
- [ ] github actions integration (coming soon)
- [X] enable private llm review of code (likely llama-based) run on a cloud service. (not fully tested, but wired in to use lm studio)
- [ ] better collation of static data from `git` and the abstract syntax tree (ast) to feed the generative ai
- [ ] proper documentation
- [ ] gitlab pipeline integration
- [ ] enable google palm review of code
- [ ] enable anthropic claud review of code
- [ ] enable meta llama review of code
- [ ] make adding other providers easy and robust - e.g., an anthropic claude review of code
- [ ] comparison of different llms review output on same code (this could be very cool!)

`>_ we are cosmonaut`
2 changes: 2 additions & 0 deletions src/dev_mode/mod.rs
@@ -157,6 +157,7 @@ pub mod test_providers {
content: _get_code_str(test_source_file)?,
}],
};
info!("Prompt data: {:?}", prompt_data);
let result = review_or_summarise(request_type, settings, provider, &prompt_data).await?;
info!("Result: {:?}", result);
Ok(())
@@ -187,6 +188,7 @@ pub mod test_providers {
content: _get_code_str(test_source_file)?,
}],
};
info!("Prompt data: {:#?}", prompt_data);
let result = review_or_summarise(request_type, settings, provider, &prompt_data).await?;
info!("Result: {:?}", result);

