
Validator, logging and modelling improvements #127

Merged · 14 commits merged into master on Jan 29, 2024
Conversation

@MartBakler (Collaborator) commented on Jan 23, 2024

Fixed a bug regarding lists of Pydantic objects in the validator
Added extensive logging
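
For illustration, a minimal sketch of the kind of check the fix targets, i.e. validating a list whose elements are Pydantic objects. The class and function names below are assumptions for the example, not the repository's actual validator API:

    from typing import Type
    from pydantic import BaseModel, ValidationError

    class Person(BaseModel):  # example model, not from the repository
        name: str
        age: int

    def validate_pydantic_list(items: list, model: Type[BaseModel]) -> bool:
        # Every element must already be an instance of the model, or be a dict
        # that parses into it without raising a ValidationError.
        try:
            for item in items:
                if not isinstance(item, model):
                    model(**item)
            return True
        except (ValidationError, TypeError):
            return False

    print(validate_pydantic_list([{"name": "Ada", "age": 36}], Person))              # True
    print(validate_pydantic_list([{"name": "Ada", "age": "not a number"}], Person))  # False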

@MartBakler changed the title "fixed a bug regarding lists of pydantic objects" to "Pydantic and validator bugfixes" on Jan 24, 2024
@MartBakler changed the title "Pydantic and validator bugfixes" to "Validator and logging additions" on Jan 24, 2024
@MartBakler marked this pull request as ready for review on January 24, 2024 13:39
@MartBakler changed the title "Validator and logging additions" to "Validator, logging and modelling additions" on Jan 25, 2024
@MartBakler changed the title "Validator, logging and modelling additions" to "Validator, logging and modelling improvements" on Jan 25, 2024
@JackHopkins (Contributor) left a comment:

Heya, I've left a few questions that we should go through.

@@ -453,6 +455,9 @@ def _check_finetuning_condition(self, func_hash):
        # if the patch dataset size hasn't been read in yet, read it in
patch_dataset_size = self._get_dataset_info(PATCHES, func_hash, type="length")
self.dataset_sizes[PATCHES][func_hash] = patch_dataset_size
logging.info(f"Function {function_description.__name__} [{align_dataset_size} aligns | {patch_dataset_size} runs] will be finetuned from"\
f" {self.function_configs[func_hash].teacher_models.model_name} using {self.function_configs[func_hash].distilled_model.provider} in"\
Contributor: Isn't teacher_models a list? If so, how can you get model_name from it?
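
If teacher_models is indeed a list, the log line would need to index it or join the names rather than read model_name off the list itself. A self-contained sketch of that idea; the config class and model names here are stand-ins, not the repository's:

    import logging

    class ModelConfig:  # stand-in for the repository's model config objects (assumption)
        def __init__(self, model_name: str):
            self.model_name = model_name

    teacher_models = [ModelConfig("teacher-a"), ModelConfig("teacher-b")]  # hypothetical names

    # Join each config's model_name instead of accessing .model_name on the list itself:
    logging.info("will be finetuned from %s", ", ".join(m.model_name for m in teacher_models))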

@@ -129,20 +136,27 @@ def get_generation_case(self, args, kwargs, function_description, llm_parameters
        # no examples needed, using a finetuned model. Don't save to the finetune dataset
if is_distilled_model and suitable_for_distillation:
prompt = self.construct_prompt(f, args, kwargs, [], distilled_model)
if func_hash not in self.current_generators:
self.current_generators[func_hash] = {"model": "finetuned_model_placeholder", "examples": []}
Contributor: Why is this a placeholder?

return prompt, distilled_model, suitable_for_distillation, True

else:
aligns = self.function_modeler.get_symbolic_alignments(function_description.__hash__(), max=16)
examples = [f"Inputs:\nArgs: {align['args']}\nKwargs: {align['kwargs']}\nOutput: {align['output']}" for align in
aligns]

if func_hash not in self.current_generators:
self.current_generators[func_hash] = {"model": "", "examples": examples}
Contributor: Can we change the variable name current_generators to something more intuitive?

context_length = 128000,
instructions="You are given below a function description and input data. The function description of what the function must carry out can be found in the Function section, with input and output type hints. The input data can be found in Input section. Using the function description, apply the function to the Input and return a valid output type, that is acceptable by the output_class_definition and output_class_hint.\nINCREDIBLY IMPORTANT: Only output a JSON-compatible string in the correct response format. Use the [END] tokens to specify when the output ends.",
parsing_helper_tokens={"start_token": "[START]", "end_token": "[END]"}),
"gpt-3.5-turbo-1106-finetune": OpenAIConfig(model_name = "", context_length = 14000),
Contributor: Why is the model_name empty?
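
Presumably the empty model_name is a placeholder until a finetuned model id exists; a hedged sketch of that idea, where the dataclass below is a stand-in rather than the repository's OpenAIConfig:

    from dataclasses import dataclass

    @dataclass
    class FinetuneConfigSketch:  # stand-in for OpenAIConfig (assumption)
        model_name: str
        context_length: int

    # Empty until a finetuning job finishes and reports its model id.
    config = FinetuneConfigSketch(model_name="", context_length=14000)
    config.model_name = "ft:gpt-3.5-turbo-1106:org::abc123"  # e.g. finetuning_response.fine_tuned_model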

finetuning_response: FineTuningJob = self.client.fine_tuning.jobs.create(training_file=training_file_id,
model="gpt-3.5-turbo",
finetuning_response: FineTuningJob = self.client.fine_tuning.jobs.create(training_file=training_file_id,
model="gpt-3.5-turbo-1106",
Contributor: Why is this model hardcoded?
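
One way to avoid hardcoding the base model would be to read it from configuration. A sketch using the OpenAI Python client; the environment variable name and placeholder file id are assumptions for the example:

    import os
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    training_file_id = "file-abc123"  # placeholder; a real id comes from a prior client.files.create upload

    # Take the base model from configuration instead of hardcoding it in the call.
    base_model = os.environ.get("FINETUNE_BASE_MODEL", "gpt-3.5-turbo-1106")
    finetuning_response = client.fine_tuning.jobs.create(
        training_file=training_file_id,
        model=base_model,
    )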

Contributor: Bump.

@MartBakler merged commit 21672ec into master on Jan 29, 2024