Add more code examples / tutorials #117

Open
bdewilde opened this issue Jul 2, 2017 · 4 comments

@bdewilde
Collaborator

bdewilde commented Jul 2, 2017

Expected Behavior

Users expect to learn from code examples and tutorials more so than from reading an API reference. We should oblige.

Current Behavior

Fairly brief usage examples are embedded throughout the code in docstrings, but there are few "end-to-end" examples to follow along with.

Possible Solution

Create a separate directory for tutorials, and add more detailed examples (in Jupyter notebooks?) there. Create additional rst files to include in the official docs. Examples that users have asked me for:

  • Generating a terms list (to pass to a Vectorizer) using the textacy.extract and textacy.keyterms modules, e.g. textacy.extract.pos_regex_matches() and textacy.keyterms.sgrank().
  • How to apply text_utils.clean_terms() to a terms list.
  • How to remove specific terms from a terms list, e.g. custom stop words.
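As a starting point for the last item, here is a minimal sketch of filtering custom stop words out of a terms list. It is plain Python and does not call textacy; `remove_terms` and the sample data are hypothetical names made up for illustration, and the terms are assumed to be plain strings.

```python
# Hypothetical helper for illustration; not part of textacy.
def remove_terms(terms, unwanted):
    """Yield each term whose lowercased form is not in ``unwanted``."""
    unwanted = {term.lower() for term in unwanted}
    for term in terms:
        if term.lower() not in unwanted:
            yield term


# Made-up sample data: a terms list and some custom stop words.
terms = ["Topic model", "the", "spaCy", "Vectorizer", "etc"]
custom_stop_words = {"The", "etc"}

filtered = list(remove_terms(terms, custom_stop_words))
print(filtered)
```

The comparison is case-insensitive, and the original casing of the surviving terms is preserved.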

Context

I've gotten more than one email about this... Clearly there's a need.

@theSage21

theSage21 commented Mar 1, 2018

I'd like to pick up this issue. Besides the three you mentioned, is there anything else I should look out for? Anything I should avoid doing, or make sure that I do?

@bdewilde
Collaborator Author

bdewilde commented Mar 1, 2018

Hey @theSage21, thanks for signing up! 👍

The workflow for topic modeling is mostly standard and well-covered in textacy; it includes file I/O, preprocessing, spaCy parsing, tokenization into terms, vectorization, model training, and visualization of results. It's another good candidate for a tutorial.
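To make the "tokenization into terms, vectorization" steps concrete, here is a rough sketch of turning tokenized documents into a document-term count matrix. It uses only the standard library and is not how textacy's Vectorizer is implemented; `build_doc_term_matrix`, the whitespace tokenization, and the sample documents are all assumptions made for illustration.

```python
from collections import Counter


# Hypothetical helper for illustration; not textacy's Vectorizer.
def build_doc_term_matrix(tokenized_docs):
    """Return (vocabulary, matrix) where matrix[i][j] is the count of
    vocabulary[j] in document i."""
    vocabulary = sorted({term for doc in tokenized_docs for term in doc})
    index = {term: j for j, term in enumerate(vocabulary)}
    matrix = []
    for doc in tokenized_docs:
        row = [0] * len(vocabulary)
        for term, count in Counter(doc).items():
            row[index[term]] = count
        matrix.append(row)
    return vocabulary, matrix


# Made-up documents; splitting on whitespace stands in for real parsing.
docs = ["the cat sat", "the dog sat down"]
tokenized = [doc.split() for doc in docs]

vocabulary, matrix = build_doc_term_matrix(tokenized)
print(vocabulary)
print(matrix)
```

A real tutorial would swap the whitespace split for spaCy-parsed terms and feed the matrix to a topic model; a dense list of lists is only workable for toy corpora.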

Investigating similarities of documents / sentences using metrics in the similarity module might be interesting, and could also incorporate some of the network module.
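As one example of what such a tutorial could show, here is a small sketch that scores document pairs with Jaccard similarity over token sets and collects the results as weighted edges, which is the kind of input a network analysis would start from. It is written in plain Python rather than against the similarity or network modules; the function names, the threshold, and the sample documents are hypothetical.

```python
from itertools import combinations


# Hypothetical helpers for illustration; they do not call textacy.
def jaccard(tokens_a, tokens_b):
    """Jaccard similarity of two token collections, treated as sets."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 0.0  # convention chosen here for two empty documents
    return len(set_a & set_b) / len(set_a | set_b)


def similarity_edges(tokenized_docs, threshold=0.0):
    """Return (i, j, score) for each document pair scoring above threshold."""
    edges = []
    for (i, doc_a), (j, doc_b) in combinations(enumerate(tokenized_docs), 2):
        score = jaccard(doc_a, doc_b)
        if score > threshold:
            edges.append((i, j, score))
    return edges


# Made-up documents, tokenized by whitespace for simplicity.
docs = ["the cat sat", "the dog sat down", "birds fly"]
tokenized = [doc.split() for doc in docs]

sim = jaccard(tokenized[0], tokenized[1])
edges = similarity_edges(tokenized)
print(sim, edges)
```

Only the first two documents share any tokens, so they are the only pair that ends up connected.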

Really, though, I recommend just doing an analysis that's interesting to you, using textacy. Write it up, and if there are rough spots in terms of usability or gaps in functionality, be sure to let me know! ;)

@tmthyjames

Awesome library! Just discovered it at work and am about to give it a go! I'm about to train a topic model and would love to post a tutorial here soon. Great stuff!

@scarroll32

I'm having an issue with the TF-IDF function, but I suspect I'm either using it incorrectly or misunderstanding how it's meant to be used. I've posted a question on Stack Overflow, and I'd be happy to write a usage doc once I understand it properly.

https://stackoverflow.com/questions/55764766/calculate-td-idf-for-a-single-word-in-textacy
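For reference, here is a bare-bones sketch of computing TF-IDF for a single word by hand. It assumes the textbook weighting (raw term count times log(N / df)); textacy and other libraries may apply different smoothing or normalization, so the numbers need not match their output. The `tf_idf` function and the toy corpus are made up for illustration.

```python
import math


# Hypothetical helper for illustration; uses the unsmoothed textbook
# formula, which may differ from what a given library computes.
def tf_idf(term, doc, corpus):
    """TF-IDF of ``term`` in ``doc``, where ``corpus`` is a list of
    tokenized documents that includes ``doc``."""
    tf = doc.count(term)  # raw count of the term in this document
    df = sum(1 for other in corpus if term in other)  # docs containing it
    if df == 0:
        return 0.0  # term never appears, so there is nothing to weight
    return tf * math.log(len(corpus) / df)


# Made-up toy corpus of pre-tokenized documents.
corpus = [
    ["cat", "sat"],
    ["dog", "sat"],
    ["cat", "cat", "ran"],
]

score = tf_idf("cat", corpus[2], corpus)
print(score)
```

With this formula, a word that occurs in every document scores zero, and a word that occurs nowhere is given zero by the explicit guard.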
