Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add files via upload #48

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Add files via upload #48

wants to merge 3 commits into from

Conversation

kritim13
Copy link

Snorkel RE example/EDA notebook

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

Review Jupyter notebook visual diffs & provide feedback on notebooks.


Powered by ReviewNB

@cthoyt
Copy link
Member

cthoyt commented Jun 29, 2020

Is there any chance you can add some of the code in these notebooks as python modules that are imported in the notebooks? This has several benefits:

  1. Allows for code quality checks (on top of review-notebook-app)
  2. Allows for code review directly in this PR
  3. Makes notebooks more reproducible
  4. Removes clutter from notebook so the notebooks can be focused on the scientific story rather than the code that accomplishes it

for example, the code block

@preprocessor(pre=[spacy])
def get_source_target(cand: DataPoint) -> DataPoint:
    """
    Returns the source and target mentioned in the sentence
    """
    person_names = []

    source = [token.text for token in cand.doc if token.text in example_sources]
    target = [token.text for token in cand.doc if token.text in example_targets]
    
    try:
        cand.source_target = (source[0],target[0])
    except:
        cand.source_target = (np.nan,np.nan)
    return cand

would be much better if it were parametrized like this, and stored in a python module for reuse:

def make_source_target_preprocessor(spacy, sources, targets):
    @preprocessor(pre=[spacy])
    def get_source_target(cand: DataPoint) -> DataPoint:
        """Returnsthe source and target mentioned in the sentence."""
        person_names = []

        source = [token.text for token in cand.doc if token.text in sources]
        target = [token.text for token in cand.doc if token.text in targets]
    
        try:
            cand.source_target = (source[0], target[0])
        except:
            cand.source_target = (np.nan, np.nan)
        return cand
    return get_source_target

Then in your notebook you can do something like:

from coronawhy_vt.kriti import make_source_target_preprocessor
get_source_target = make_source_target_preprocessor(spacy, example_sources, example_targets)
...

@kritim13
Copy link
Author

Sure, that's a great idea @cthoyt !

@cthoyt
Copy link
Member

cthoyt commented Jun 29, 2020

Please let me know if there’s something I can do to support you. I’ll definitely be able to give feedback as you add pieces as python modules

@kritim13
Copy link
Author

Yes, sure, will do. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants