Skip to content

Conversation

@KenelmQLH
Copy link
Collaborator

@KenelmQLH KenelmQLH commented Aug 31, 2021

Thanks for sending a pull request!
Please make sure you click the link above to view the contribution guidelines,
then fill out the blanks below.

Description

  1. add params in sif to decided whether to check formulas
  2. modify is_sif,to_sif,sif4sci to reduce time consumption

What does this implement/fix? Explain your changes.

  1. add params in sif to decided whether to check formulas
  2. modify is_sif,to_sif,sif4sci to reduce time consumption

Pull request type

  • [DATASET] Add a new dataset
  • [BUGFIX] Bugfix
  • [FEATURE] New feature (non-breaking change which adds functionality)
  • [BREAKING] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [STYLE] Code style update (formatting, renaming)
  • [REFACTOR] Refactoring (no functional changes, no api changes)
  • [BUILD] Build related changes
  • [DOC] Documentation content changes
  • [OTHER] Other (please describe):

Changes

EduNLP/SIF/parser/parser.py
EduNLP/SIF/sif.py
tests/test_sif/test_sif.py

Does this close any currently open issues?

Y, #37 ([FEATURE] Add an option for checking formulas in is_sif)

Any relevant logs, error output, etc?

N

Checklist

Before you submit a pull request, please make sure you have to following:

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [FEATURE], [BREAKING], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage and al tests passing
  • Code is well-documented (extended the README / documentation, if necessary)
  • If this PR is your first one, add your name and github account to AUTHORS.md

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@codecov-commenter
Copy link

codecov-commenter commented Aug 31, 2021

Codecov Report

Merging #79 (13159a5) into dev (73232ad) will not change coverage.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff            @@
##               dev       #79   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           46        46           
  Lines         1365      1371    +6     
=========================================
+ Hits          1365      1371    +6     
Impacted Files Coverage Δ
EduNLP/SIF/parser/parser.py 100.00% <100.00%> (ø)
EduNLP/SIF/sif.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 73232ad...13159a5. Read the comment docs.

@tswsxk
Copy link
Contributor

tswsxk commented Sep 1, 2021

It seems that this PR is relevant to issue #37 ?

@tswsxk tswsxk linked an issue Sep 1, 2021 that may be closed by this pull request
@KenelmQLH KenelmQLH self-assigned this Sep 1, 2021
@KenelmQLH KenelmQLH added the enhancement New feature or request label Sep 1, 2021
@KenelmQLH
Copy link
Collaborator Author

examples need to be changed if the codes are approved

return sif_item


def sif4sci(item: str, figures: (dict, bool) = None, safe=True, symbol: str = None, tokenization=True,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you sepecify some modes such as ast in tokenzation_params, is it possible to set check_formula as False?

Copy link
Collaborator Author

@KenelmQLH KenelmQLH Sep 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is ok, but tokenzation_params mainly work during 'tokenize', and check_formula is used in is_sif before 'tokenize'. Maybe it is better to use seperately for clear direction.
Or we can use it within 'safe':
when safe = 0, don't use is_sif and don't check anything
when safe = 1, use is_sif but don't check formula
when safe = 2, use is_sif and check formula

Copy link
Contributor

@tswsxk tswsxk Sep 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

safe looks like a boolean variable, maybe safe_mode is a better choice?

Copy link
Contributor

@tswsxk tswsxk Sep 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can also set a wise safe_mode?

return True
return False
return True, item
return False, item_parser.text
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why return text?



def is_sif(item):
def is_sif(item, check_formula=True):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is sif should only return True or False. If it is necessary to modify the item for reducing duplicate opertaion in the following procedure, an extra argument is expected.

True if check the validity of formulas in item
False if not check the validity of formulas in item, which is faster
cache: bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cache is not intuitive, maybe return_parser is a better choice

figures:
when it is a dict, it means the id-to-instance information for figures in 'FormFigureID{...}' format,
when it is a bool, it means whether to instantiate figures in 'FormFigureBase64{...}' format
safe_mode: int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe mode is better than safe_mode



def to_sif(item):
def to_sif(item, check_formula=True, cache_parser: Parser = None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cache_parser -> parser

@KenelmQLH KenelmQLH merged commit a294dc0 into bigdata-ustc:dev Oct 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Add an option for checking formulas in is_sif

3 participants