Hashtagify transformation #246

Merged: 9 commits, Oct 1, 2021

pithysr (Contributor) commented Aug 31, 2021

This transformation rewrites an input sentence by identifying named entities and other common words and turning them into hashtags, as is common on social media.
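As a rough illustration of the idea, here is a minimal sketch that turns an already-identified list of phrases into hashtags. It assumes the entity and keyword detection has been done elsewhere (the PR relies on NLTK and spaCy for that step). The helper names `to_hashtag` and `hashtagify` are hypothetical and not taken from the PR.

```python
import re
from typing import Iterable


def to_hashtag(phrase: str) -> str:
    """Join the words of a phrase into one hashtag ('New Delhi' -> '#NewDelhi')."""
    words = re.findall(r"\w+", phrase)
    return "#" + "".join(word[0].upper() + word[1:] for word in words)


def hashtagify(sentence: str, phrases: Iterable[str]) -> str:
    """Replace each whole-word occurrence of the given phrases with a hashtag."""
    # Longest phrases first, so "New Delhi" is handled before "Delhi".
    for phrase in sorted(phrases, key=len, reverse=True):
        tag = to_hashtag(phrase)
        # Skip matches inside existing hashtags or inside longer words.
        pattern = r"(?<![#\w])" + re.escape(phrase) + r"(?!\w)"
        sentence = re.sub(pattern, lambda match: tag, sentence)
    return sentence


print(hashtagify("New Delhi is among the many famous places in India.", ["New Delhi"]))
# prints: #NewDelhi is among the many famous places in India.
```

Sorting by length is a design choice of this sketch: it keeps a shorter phrase from splitting a longer one that contains it.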

tasks = [
    TaskType.TEXT_CLASSIFICATION,
    TaskType.TEXT_TO_TEXT_GENERATION,
    TaskType.TEXT_TAGGING,

Collaborator:

TEXT_TAGGING will not be applicable here because the number of tokens differs between the input sentence and the transformed sentence (New Delhi --> #NewDelhi).
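To make the mismatch concrete, the sketch below counts whitespace tokens before and after the transformation. The BIO-style tags (`B-LOC`, `I-LOC`, `O`) and the whitespace tokenization are assumptions chosen for illustration; neither comes from the PR.

```python
# One tag per whitespace token of the original sentence (illustrative tags).
original = "New Delhi is among the many famous places in India ."
tags = ["B-LOC", "I-LOC", "O", "O", "O", "O", "O", "O", "O", "B-LOC", "O"]

# The same sentence after "New Delhi" has been merged into one hashtag.
transformed = "#NewDelhi is among the many famous places in India ."

original_tokens = original.split()
transformed_tokens = transformed.split()

# The original tokens line up with the tags one-to-one.
tags_fit_original = len(original_tokens) == len(tags)
# Merging two tokens into one leaves the tag sequence one entry too long.
tags_fit_transformed = len(transformed_tokens) == len(tags)

print(len(original_tokens), len(transformed_tokens))  # prints: 11 10
print(tags_fit_original, tags_fit_transformed)        # prints: True False
```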

Contributor Author:

I agree. It can be removed.

Contributor Author:

Should I remove it and commit again?

nltkdl("maxent_ne_chunker")
nltkdl("punkt")
nltkdl("averaged_perceptron_tagger")
nltkdl("stopwords")

Collaborator:

You might want to move this to the constructor.
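One way to read this suggestion is to run the downloads when the transformation is instantiated rather than when the module is imported. The sketch below assumes `nltkdl` is an alias for `nltk.download`. The class name `DownloadOnInit` and the `download` parameter are hypothetical; the parameter exists only so the example can run without network access.

```python
class DownloadOnInit:
    """Fetch NLTK resources at construction time, not at import time."""

    # Resource names taken from the module-level calls under review.
    RESOURCES = (
        "maxent_ne_chunker",
        "punkt",
        "averaged_perceptron_tagger",
        "stopwords",
    )

    def __init__(self, download=None):
        if download is None:
            # Assumption: nltkdl in the PR refers to nltk.download.
            from nltk import download
        for resource in self.RESOURCES:
            download(resource)


# Usage with a stand-in downloader that only records what was requested.
requested = []
DownloadOnInit(download=requested.append)
print(requested)
```

With this layout, importing the module has no side effects, and the resources are requested once per constructed instance.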


def __init__(self, seed=666, max_outputs=1):
    super().__init__(seed, max_outputs=max_outputs)
    self.nlp = spacy.load("en_core_web_sm")

Collaborator:

You can use spacy like this

```
#NewDelhi is among the many famous places in India.
```


Collaborator:

I would suggest adding the robustness evaluation section too.

return perturbed_texts


class HashtagifyTransformation(SentenceOperation):

Collaborator:

Please add a docstring for the class HashtagifyTransformation.
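A class docstring along these lines might satisfy the request. This is only a sketch: `SentenceOperation` below is a stand-in for the project's base class, and the meaning given for `max_outputs` is an assumption, not something stated in the PR.

```python
class SentenceOperation:
    """Stand-in for the project's base class (assumed interface)."""

    def __init__(self, seed=0, max_outputs=1):
        self.seed = seed
        self.max_outputs = max_outputs


class HashtagifyTransformation(SentenceOperation):
    """Turn named entities and other common words in a sentence into hashtags.

    Example: "New Delhi is among the many famous places in India."
    becomes "#NewDelhi is among the many famous places in India."

    Args:
        seed: Random seed passed to the base class.
        max_outputs: Assumed here to cap the number of returned sentences.
    """

    def __init__(self, seed=666, max_outputs=1):
        super().__init__(seed, max_outputs=max_outputs)
```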

mille-s self-requested a review September 30, 2021 11:14
kaustubhdhole (Collaborator):

Hi @pithysr, these changes look great! I would strongly recommend that you also do the robustness evaluation for your PR, as other PRs have done. I am merging this now. I think the failed build has been fixed in another PR (it is not an issue with your PR).

kaustubhdhole merged commit bb4dc6c into GEM-benchmark:main on Oct 1, 2021