Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

record per-key types for heterogenous dicts and suggest TypedDict annotation #105

Closed
carljm opened this issue Jul 31, 2018 · 9 comments
Closed

Comments

@carljm
Copy link
Contributor

carljm commented Jul 31, 2018

Currently MonkeyType assumes dicts are homogenous. When heterogenous dicts are seen, we just combine all the value types into a Union (and then if that Union gets too big, by default we rewrite it to Any). It'd be more useful in such cases to suggest a TypedDict annotation with per-key value types.

The trick here is that we don't really know the intention, so there'd be some level of heuristic involved. Like, if there's multiple non-None value types (and probably if the number of keys is less than some threshold), rather than recording just a single value type, we record per-key value types. Then we'd also need some smarts when combining traces into an annotation: if we have any traces with per-key value types, try to construct a consistent TypedDict, but if we find that key names and value types aren't matching up consistently, fallback to normal Dict.

There's enough fuzziness in the heuristics here and this'd be enough work that it might not be worth it, but for codebases that use lots of heterogenous dicts, it'd make the generated annotations a lot more useful.

@devdoomari3
Copy link

devdoomari3 commented Dec 3, 2018

how about having a wrapper that 'acts like a dict' and shows intention-to-be-TypedDict ?

eg). TypedDictPlaceHolder: https://gist.github.com/devdoomari3/3e6e9e8177d2ff97c4ef0091209984f5

usage:

test1 = TypedDictPlaceHolder(
    a=1,
    b=2,
)

test2 = TypedDictPlaceHolder(**{  // wrap a dictionary
    'a': 1,
    'b': 3,
})

import json
print(json.dumps(test2)) // json-encoding friendly ("quacks like a dict")

@devdoomari3
Copy link

devdoomari3 commented Dec 10, 2018

@carljm my co-worker @Anylee2142 came up with an improvisation:

  • monkeytype extracts 'type-details' for dictionaries
  • custom clients can do whatever they want from 'type-details':
    • eg. a vscode-plugin for "would you like to convert to TypedDict?"
    • eg. pycodegen-monkeytype for extracting type-info from monkeytype sqlite file

Here's an example of monkeytype "type_details":

{
'qualname' : 'Dict', 'elem_types': [... some existing things ...],
'type_details': {
     name: str,
     age: int,
     id: int,
     ...
     }
}

what do you think about this? I'm hoping this can get merged (so we don't have to maintain a separate version of MonkeyType)

@carljm
Copy link
Contributor Author

carljm commented Dec 10, 2018

@devdoomari3 Yes, recording type_details is what's needed; just need to be careful not to drown the backing store if we run into some massive dicts, so there should probably be a dict size threshold beyond which we don't record details.

And we would want MonkeyType itself to automatically generate TypedDict annotation if it finds a dict for which the set of keys and their types is "consistent enough" across traces, not just leave that to custom clients.

One interesting choice is whether to ever generate total=False TypedDict, or always bail back to regular Dict if the set of keys isn't consistent across all traces. For simplicity in the initial version I'd probably lean towards only generating TypedDict if the same set of keys are always present.

Thanks for working on this issue!

@devdoomari3
Copy link

@carljm thank @Anylee2142 --- it's his idea & implementation

I'll get him to do a pull-request today.

ps: I personally don't believe in heuristics though...
( num. of consistent keys / infering 'intentions' are difficult to get right...)
but I might be wrong

@carljm
Copy link
Contributor Author

carljm commented Dec 11, 2018

I personally don't believe in heuristics though

MonkeyType is all about best-effort annotations to be reviewed and improved by a human. We don't ever expect that MonkeyType can get things annotated correctly in a purely automated way, because good annotations require more knowledge about the intent of the code than MonkeyType can have access to. This isn't new with this feature, it's already how MonkeyType works; there are a lot of heuristics in the existing type rewriters (e.g. "Union larger than 5 elements becomes Any," for one example.)

@devdoomari3
Copy link

@carljm then would an IDE-plugin for reviewing/improving generated types be helpful?

I don't like looking at git diffs... / 'applied' types getting mixed up with current edits

@carljm
Copy link
Contributor Author

carljm commented Dec 11, 2018

would an IDE-plugin for reviewing/improving generated types be helpful?

Not sure, I guess if it would be useful to you you can build it and see if it's useful to others? No reason for it to live in this repo, but if it works well and gains some interest we could certainly link it from the MonkeyType docs.

I don't like looking at git diffs... / 'applied' types getting mixed up with current edits

Haven't had that problem or heard that complaint in using MonkeyType at Instagram. You generate draft annotations for a module using MonkeyType, open up the module in your editor, and edit the annotations as needed until they are ready for commit.

@devdoomari3
Copy link

devdoomari3 commented Dec 11, 2018

@carljm you're right that the IDE plugin shouldn't live in this repo.
I can give a go with making a VSCode plugin, but making one for pycharm might be a bit off for me.

I don't like looking at git diffs... / 'applied' types getting mixed up with current edits

this is just a personal opinion (I'm too lazy).

@carljm
Copy link
Contributor Author

carljm commented Nov 7, 2019

Fixed with merge of #143.

@carljm carljm closed this as completed Nov 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants