Filtration support #62
Ahh!!! This is gonna be annoying, but I am actually working on a filter system myself :-) See my branch here: https://github.com/bee-san/pyWhat/tree/bee-filter
However!! I really like some of the things here. The arguments, listing tags, and docs are great!!
Here's how I wanted to do it: So, we have 1 application which requires 2 filters:
By implementing filtering at the identifier level, every time ciphey runs it'd have to re-create the filters with the API. This means that if it runs 10 times a second, it re-makes the filters 10 times a second. You can get around this by filtering at the object attributes level, so you make the object once and the filters are there -- but this prevents you from changing the filters or adding new ones. You can of course cheat and manually make the object, but that's cheating and I'd prefer a high level API for users. I propose adding distributions. A distribution is a regex list which has been filtered (see my branch https://github.com/bee-san/pyWhat/tree/bee-filter/pywhat/filtration_distribution ). Then, the regex_identifier (or identifier) takes this distribution object and uses the This means we can:
I also envisioned adding magic methods to distributions like So going back to the Ciphey example, I imagine something like this:
filters = {"Tags": ["Networking", "Credentials"], "Min_Rarity": 0.6}
distribution = Distribution(filters)
And then adding decoding stuff:
filters1 = {"Tags": ["Needs Decoding", "IPv6"]} # This tag doesn't exist yet, we also want short names for each regex as a tag in the future
distribution2 = Distribution(filters1)
In total:
filters = {"Tags": ["Networking", "Credentials"], "Min_Rarity": 0.6}
distribution = Distribution(filters)
filters1 = {"Tags": ["Needs Decoding", "IPv6"]} # This tag doesn't exist yet, we also want short names for each regex as a tag in the future
distribution2 = Distribution(filters1)
identifier("text here", distribution)
So we pass the identifier the distribution object. If no distribution object is passed, we should fall back to a distribution object, made once at the start of the program, that contains everything with no filters.
Hope that makes sense!! Sorry I haven't had much time to do this, I've been busy 😅 If you fancy picking up my branch you can always merge what I have with yours? 🥺🙏 Thank you so much for contributing!!! |
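For readers following along, here is a minimal sketch of the "filter once, reuse many times" idea behind a distribution. It is not pyWhat's implementation: the `RegexEntry` type, the three toy patterns in `ALL_REGEXES`, and the free function `identify` are all made up for illustration, and the filter keys simply mirror the `"Tags"` and `"Min_Rarity"` keys used in the comment above.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RegexEntry:
    """Hypothetical record for one regex; not pyWhat's real data format."""
    name: str
    pattern: str
    rarity: float
    tags: tuple


# Toy data standing in for the project's full regex list (assumption).
ALL_REGEXES = [
    RegexEntry("IPv4", r"\b(?:\d{1,3}\.){3}\d{1,3}\b", 0.7, ("Networking",)),
    RegexEntry("Email", r"\b[\w.+-]+@[\w-]+\.[\w.]+\b", 0.5, ("Identifiers",)),
    RegexEntry("AWS Access Key", r"\bAKIA[0-9A-Z]{16}\b", 1.0, ("Credentials",)),
]


class Distribution:
    """A regex list that has been filtered (and compiled) exactly once."""

    def __init__(self, filters=None):
        filters = filters or {}
        wanted = set(filters.get("Tags", []))
        min_rarity = filters.get("Min_Rarity", 0.0)
        # The filtering cost is paid here, not on every identify() call.
        self.regexes = [
            (entry, re.compile(entry.pattern))
            for entry in ALL_REGEXES
            if entry.rarity >= min_rarity
            and (not wanted or wanted & set(entry.tags))
        ]


def identify(text, distribution):
    """Return the names of all regexes in the distribution that match text."""
    return [entry.name for entry, rx in distribution.regexes if rx.search(text)]


dist = Distribution({"Tags": ["Networking", "Credentials"], "Min_Rarity": 0.6})
names = identify("server at 10.0.0.1, contact admin@example.com", dist)
everything = identify("server at 10.0.0.1, contact admin@example.com", Distribution())
```

With this shape, a caller that runs many times per second (the Ciphey case) builds `dist` once and passes it to every call, while a different caller can build its own distribution without touching the first.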
Sounds interesting! |
Are you interested in picking this up or should I finish it? 😄 |
I will finish it. However, need some time😀 |
I am thinking about making API look like this:
What are your thoughts, @bee-san? |
I can see where you're going, but also I think repeatedly switching the variables over is a bad idea? In an ideal world the API would look like:
distribution1 = Distribution(filter1)
distribution2 = Distribution(filter2)
id = identifier.Identifier() # No distribution
id.identify(text, distribution1)
id.identify(text, distribution2)
id.identify(text) # Uses no distribution, which means it uses "everything".
You can achieve that last one by creating a distribution for the whole program with everything, and setting it as the default in the function like:
class Identifier():
    def __init__(self):
        self.default_distribution = Distribution()
    def identify(self, text, distribution=None):
        # `self` is not available in a default argument, so fall back here
        if distribution is None:
            distribution = self.default_distribution
That way it either uses:
I just dislike the idea of repeatedly changing a variable in an object, I'd much rather it be more functional like 😄 Thanks so much for the ❓ ! |
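A note on the default-distribution idea: Python evaluates default argument values when the function is defined, so a signature cannot refer to `self` in a default. A common workaround is a `None` sentinel that is resolved inside the method. The sketch below assumes stand-in `Distribution` and `Identifier` classes (not the real pyWhat ones), and `identify` just returns the distribution it would use so the fallback is visible.

```python
class Distribution:
    """Stand-in for a filtered regex list (hypothetical, illustration only)."""

    def __init__(self, filters=None):
        self.filters = dict(filters or {})


class Identifier:
    def __init__(self):
        # Built once per Identifier: "everything", i.e. no filters.
        self.default_distribution = Distribution()

    def identify(self, text, distribution=None):
        # None means "caller did not choose", so use the default.
        if distribution is None:
            distribution = self.default_distribution
        # A real implementation would match `text` against the distribution;
        # here we only report which distribution was selected.
        return distribution


ident = Identifier()
networking = Distribution({"Tags": ["Networking"]})
used_default = ident.identify("some text")
used_custom = ident.identify("some text", networking)
```

This keeps the call style functional: the identifier holds no per-call state, and each call names the distribution it wants.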
Can you explain how this works? I'm not familiar with this pattern 😄 |
After applying the changes, API will be like this:
import pywhat
id = pywhat.Identifier()
dist = pywhat.Distribution(some_filter)
id.identify(some_text, dist)
Importing |
@bee-san, I guess supporting distribution both as an attribute of |
Sure! That makes sense :) I can see why having multiple-identifiers might be handy if the program has hundreds of filters 😄 But also I see why having one identifier is cool if it only has one or two :) |
Now I am really confused on what went wrong. |
Soo, now it is possible to import the API and use Distributions:
from pywhat import *
dist = Distribution({"Tags": ["Identifiers"], "ExcludeTags": ["Credentials"], "MinRarity": 0.2, "MaxRarity": 0.8})
id = Identifier()
res = id.identify(DATA, distribution=dist)
dist2 = dist | Distribution({"Tags": ["Finance", "Media"]}) # not supported yet :(
id.distribution = dist2
id.identify(DATA)
Tests are urgently needed and it would be great if someone could help me with that 😏
P.S.: I really dislike how the CLI option parsing looks. I should move parsing to another function or check if click has some advanced solutions to offer. |
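The `|` in the snippet above is marked as not supported yet, so its semantics are undecided in this thread. One plausible reading is a set union of the regexes each distribution selected. The sketch below assumes exactly that, with a toy `Distribution` that stores regex names in a frozenset; the class body, the `names` attribute, and the `&` counterpart are all hypothetical.

```python
class Distribution:
    """Toy distribution holding the names of the regexes it selected."""

    def __init__(self, names):
        self.names = frozenset(names)

    def __or__(self, other):
        # Union: everything matched by either distribution (assumed semantics).
        if not isinstance(other, Distribution):
            return NotImplemented
        return Distribution(self.names | other.names)

    def __and__(self, other):
        # Intersection: only regexes present in both (assumed semantics).
        if not isinstance(other, Distribution):
            return NotImplemented
        return Distribution(self.names & other.names)

    def __repr__(self):
        return f"Distribution({sorted(self.names)})"


identifiers = Distribution({"Email", "Phone Number"})
finance = Distribution({"Bitcoin Address", "Phone Number"})
combined = identifiers | finance
shared = identifiers & finance
```

Returning `NotImplemented` for foreign types lets Python raise a normal `TypeError`, and returning a new object keeps existing distributions unchanged.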
Sure! I'll perhaps work on some tests later, I have been very sick this week but I am feeling better 😄 |
@piatrashkakanstantinass are you in the discord? http://discord.skerritt.blog :) |
Sure! r49behind#6377 :) |
So, I have written some tests for distributions. Tests for identifiers still need to be written. Also, I decided not to create a Filter helper class since it does not offer any significant functionality. |
hii!!! nice!!! do we not already have identifier tests? Or do you mean identifiers with Filters:tm: tests? :) |
Yes, we do, but I want to test the actual behaviour of API with Filters™️XD |
@bee-san Well, it is done, I guess... |
Let me review 😄 !!!!! |
Looks great!!!! I'll write some docs based on your tests 😄 Thanks so much! Just a few comments and questions :-)
Small update! I checked out this branch, will test manually and update docs then accept :) |
Some bugs with the CLI tool itself, but the API strangely works fine! Will write docs for the API :)
Okay, going to work on that😀 |
Aww, I forgot to return distribution in parse_options() 😅 |
Ahh always the way! The great news is that after this (and after I write some docs 😉 ) we can release!!! |
On it! ✍🏻 |
Add filtration support(#29)
pywhat --tags
or
pywhat --rarity min:max --include_tags tag1,tag2 --exclude_tags tag1,tag2 TEXT
'min' and 'max' can be omitted.
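As an illustration of how a `min:max` value with optional ends might be parsed, here is a small sketch. The helper names `parse_rarity` and `parse_tags` and the 0.0 and 1.0 fallbacks are assumptions for this example; the PR's actual `parse_options()` may work differently.

```python
def parse_rarity(value, default_min=0.0, default_max=1.0):
    """Parse 'min:max' where either side may be empty (hypothetical helper)."""
    low, sep, high = value.partition(":")
    if not sep:
        raise ValueError(f"expected 'min:max', got {value!r}")
    # An empty side falls back to the assumed default bound.
    return (
        float(low) if low else default_min,
        float(high) if high else default_max,
    )


def parse_tags(value):
    """Split 'tag1,tag2' into a list, ignoring empty pieces and stray spaces."""
    return [tag.strip() for tag in value.split(",") if tag.strip()]


both = parse_rarity("0.2:0.8")
only_max = parse_rarity(":0.8")
only_min = parse_rarity("0.2:")
tags = parse_tags("tag1,tag2")
```

Keeping the parsing in small functions like these also makes it harder to forget a return value, since each one can be tested on its own.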
ToDo: