Starting point for TidierText.jl #118

Closed
kdpsingh opened this issue Aug 22, 2023 · 4 comments

Comments

@kdpsingh
Member

If we decide to make a TidierText.jl package, here are two great resources as a starting point:

@drizk1
Member

drizk1 commented Oct 7, 2023

Over the last week or so, I have been building on the above work to produce this draft of TidierText.jl. I thought I would share it here; anyone is welcome to join in or share thoughts.

bind_tf_idf, most of the unnest_* functions, and anti_join are implemented, with some optional arguments and basic macros that work with TidierData.jl, as shown below:

@chain sentences begin
    #@unnest_tokens(word, text, to_lower = true)
    @unnest_regex(word, text, "\\.? ")
    @group_by(doc_id, word)
    @summarize(n = n())
    @ungroup()
    @arrange(doc_id)
    @antijoin(get_stopwords())
    @bind_tf_idf(word, doc_id, n)
end
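
For context, here is a sketch of the statistic that bind_tf_idf is named after. It assumes the draft follows the convention of R's tidytext `bind_tf_idf` (natural-log inverse document frequency); the thread does not confirm the exact formula used. The symbols are introduced here for illustration only: $n_{t,d}$ is the count of term $t$ in document $d$ (the `n` column in the pipeline above), $N$ is the number of documents, and $N_t$ is the number of documents that contain $t$.

```latex
\mathrm{tf}(t,d) = \frac{n_{t,d}}{\sum_{t'} n_{t',d}}, \qquad
\mathrm{idf}(t) = \ln\frac{N}{N_t}, \qquad
\mathrm{tf\text{-}idf}(t,d) = \mathrm{tf}(t,d)\cdot\mathrm{idf}(t)
```

Under this convention, a term that appears in every document has $\mathrm{idf} = \ln 1 = 0$, so its tf-idf is zero regardless of how often it occurs.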

@kdpsingh
Member Author

kdpsingh commented Oct 7, 2023

This looks amazing, @drizk1! Do you want to transfer this repo into TidierOrg? Then we can work together to get this in shape for release.

@drizk1
Member

drizk1 commented Oct 8, 2023

Sounds good! I'll get it transferred over shortly.

@kdpsingh
Member Author

Now that TidierText.jl is up and off to the races, will close this issue.
