-
Notifications
You must be signed in to change notification settings - Fork 3
added mypy to pre-commit and improved typing #54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds mypy type checking to the pre-commit hooks and significantly improves typing across the codebase, while standardizing on the use of Python lists instead of numpy arrays for better type consistency.
- Added mypy configuration and pre-commit integration with proper exclusions
- Standardized data structures to use
List[str]andList[float]instead of numpy arrays - Enhanced type annotations across all modules with proper imports and overloads
Reviewed Changes
Copilot reviewed 42 out of 43 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| pyproject.toml | Added mypy dependency and configuration settings |
| .pre-commit-config.yaml | Integrated mypy into pre-commit hooks |
| promptolution/tasks/base_task.py | Major refactor with type annotations and list-based data structures |
| promptolution/predictors/base_predictor.py | Updated predictor interface with proper typing |
| promptolution/llms/base_llm.py | Added comprehensive type annotations for LLM interface |
| tests/mocks/mock_task.py | Updated mock implementation to match new typing patterns |
| Various test files | Updated test assertions to work with list-based returns |
Comments suppressed due to low confidence (1)
| if shape is not None: | ||
| results_array = results_array.reshape(shape) | ||
|
|
||
| return results_array |
Copilot
AI
Jul 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method returns a numpy array but the type annotation indicates it should return List[str]. This creates a type inconsistency.
| return results_array | |
| return results_array.tolist() |
| candidates = list(compress(candidates, survivor_mask)) | ||
| block_scores = list(compress(block_scores, survivor_mask)) |
Copilot
AI
Jul 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using compress on block_scores (List[List[float]]) with a boolean mask will not work as expected. This should filter the list elements by index, not use compress which expects an iterable.
| candidates = list(compress(candidates, survivor_mask)) | |
| block_scores = list(compress(block_scores, survivor_mask)) | |
| candidates = [candidates[idx] for idx, mask in enumerate(survivor_mask) if mask] | |
| block_scores = [block_scores[idx] for idx, mask in enumerate(survivor_mask) if mask] |
| self.xs = [x for i, x in enumerate(self.xs) if i not in indices] | ||
| self.ys = [y for i, y in enumerate(self.ys) if i not in indices] |
Copilot
AI
Jul 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using 'i not in indices' in a list comprehension has O(n*m) complexity where m is the size of indices. For better performance, convert indices to a set first: 'indices_set = set(indices)'.
| self.xs = [x for i, x in enumerate(self.xs) if i not in indices] | |
| self.ys = [y for i, y in enumerate(self.ys) if i not in indices] | |
| indices_set = set(indices) | |
| self.xs = [x for i, x in enumerate(self.xs) if i not in indices_set] | |
| self.ys = [y for i, y in enumerate(self.ys) if i not in indices_set] |
| xs: List[str] = [] | ||
| ys: List[str] = [] | ||
| for label, num_samples in zip(unique_labels, samples_per_class): | ||
| indices = np.where(task.ys == label)[0] |
Copilot
AI
Jul 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using numpy operations on task.ys which is now a List[str]. This will fail since numpy comparison doesn't work the same way with Python lists.
| indices = np.where(task.ys == label)[0] | |
| indices = np.where(np.array(task.ys) == label)[0] |
| ) -> Union[np.ndarray, Tuple[np.ndarray, Union[List[Any], np.ndarray]]]: | ||
| ) -> Union[List[float], List[List[float]], Tuple[List[List[float]], List[List[str]]]]: | ||
| """Collects all results for the current batch from the cache and formats them.""" | ||
| assert not (return_agg_scores and return_seq), "Cannot return both aggregated scores and sequences" |
Copilot
AI
Jul 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assertion is duplicated at line 253. The validation should be done once at the beginning of the method rather than twice.
| assert not (return_agg_scores and return_seq), "Cannot return both aggregated scores and sequences" | |
| # Removed duplicate assertion as it is already validated at the beginning of the method. |
Uh oh!
There was an error while loading. Please reload this page.