
@KenelmQLH
Collaborator

@KenelmQLH KenelmQLH commented Aug 25, 2021


Description

Fix the NumPy error raised by gensim_vec.W2V.infer_vector.

What does this implement/fix? Explain your changes.

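NumPy cannot operate on nested lists whose items have different lengths (ragged lists).
For example, in tokens = [[[1, 2, ..., 256], [1, 2, ..., 256]], [[1, 2, ..., 256]]] the first item holds two token vectors and the second holds only one, so np.mean(tokens, axis=1) cannot be applied.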
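A minimal sketch of the failure and one possible workaround, assuming each batch item is a list of equal-size token vectors and that np.mean(tokens, axis=1) is meant to average each item's token vectors. The helper mean_per_item is hypothetical and is not necessarily what this PR changed in gensim_vec.py; the vectors are shortened to 3 dimensions for readability.

```python
import numpy as np


def mean_per_item(tokens):
    """Average the token vectors of each batch item separately.

    Hypothetical helper: it works even when items hold different numbers
    of tokens, because np.mean only ever sees one rectangular item at a time.
    """
    return np.stack([np.mean(np.asarray(item, dtype=float), axis=0) for item in tokens])


# Ragged batch: the first item has two token vectors, the second has one.
tokens = [
    [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]],
    [[6.0, 7.0, 8.0]],
]

try:
    np.mean(tokens, axis=1)  # fails because the batch is not rectangular
except ValueError as err:  # AxisError (a ValueError subclass) or ValueError, depending on NumPy version
    print("np.mean failed:", type(err).__name__)

print(mean_per_item(tokens))
```

Averaging per item trades one vectorized call for a Python-level loop over the batch, which is usually acceptable when batches are small.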

Pull request type

  • [DATASET] Add a new dataset
  • [BUGFIX] Bugfix
  • [FEATURE] New feature (non-breaking change which adds functionality)
  • [BREAKING] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [STYLE] Code style update (formatting, renaming)
  • [REFACTOR] Refactoring (no functional changes, no api changes)
  • [BUILD] Build related changes
  • [DOC] Documentation content changes
  • [OTHER] Other (please describe):

Changes

EduNLP/Vector/gensim_vec.py
examples/vectorization/i2v.ipynb
tests/test_vec/test_vec.py

Does this close any currently open issues?

No.

Any relevant logs, error output, etc?

Error log from the current version:

VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  return array(a, dtype, copy=False, order=order, subok=True)

AxisError: axis 1 is out of bounds for array of dimension 1
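Both log lines can be reproduced with a ragged batch alone, outside EduNLP. Older NumPy releases (such as those current in 2021) turn the ragged list into a one-dimensional object array, which triggers the VisibleDeprecationWarning, and then reject axis=1 with AxisError. NumPy 1.24 and later refuse to build the array at all and raise ValueError. Because AxisError subclasses ValueError, catching ValueError covers both. The 2-dimensional vectors below are placeholders.

```python
import numpy as np

# Two batch items with different numbers of token vectors (ragged).
tokens = [[[0.1, 0.2], [0.3, 0.4]], [[0.5, 0.6]]]

# Passing dtype=object explicitly shows the shape older NumPy inferred
# implicitly: a single dimension holding two Python lists.
as_objects = np.array(tokens, dtype=object)
print(as_objects.shape, as_objects.ndim)

error = None
try:
    np.mean(tokens, axis=1)
except ValueError as err:
    # Older NumPy: AxisError after the deprecation warning.
    # NumPy >= 1.24: ValueError about an inhomogeneous shape.
    error = err
print(type(error).__name__, error)
```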

Checklist

Before you submit a pull request, please make sure you have the following:

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [FEATURE], [BREAKING], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage and all tests pass
  • Code is well-documented (extended the README / documentation, if necessary)
  • If this PR is your first one, add your name and GitHub account to AUTHORS.md

Comments

  • If this change is backward incompatible, explain why it must be made.
  • Interesting edge cases to note here

@tswsxk
Contributor

tswsxk commented Aug 26, 2021

Is your example a list with the same width? I don't follow.

@tswsxk
Contributor

tswsxk commented Aug 26, 2021

The batch items passed to infer_* are expected to have the same length.
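If batch items passed to infer_* must have the same length, a caller-side check can turn the opaque AxisError into an explicit message before NumPy is involved. This is a minimal sketch; check_same_length is a hypothetical name, not part of EduNLP's API, and whether the library validates inputs this way is not stated in this thread.

```python
import numpy as np


def check_same_length(batch):
    """Raise a descriptive ValueError if batch items differ in length.

    Hypothetical helper illustrating the "same length" contract for
    infer_* inputs; otherwise returns the common item length.
    """
    lengths = [len(item) for item in batch]
    if len(set(lengths)) > 1:
        raise ValueError(
            f"batch items must have the same length, got lengths {lengths}"
        )
    return lengths[0] if lengths else 0


# Rectangular batch: the check passes and np.mean yields one vector per item.
uniform = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
check_same_length(uniform)
print(np.mean(uniform, axis=1))

# Ragged batch: the check fails with a readable message instead of AxisError.
ragged = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]]]
message = None
try:
    check_same_length(ragged)
except ValueError as err:
    message = str(err)
print(message)
```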

@tswsxk tswsxk linked an issue Sep 1, 2021 that may be closed by this pull request
@KenelmQLH KenelmQLH self-assigned this Sep 1, 2021
@KenelmQLH KenelmQLH added the bug Something isn't working label Sep 1, 2021
@tswsxk tswsxk merged commit cfd1ae3 into bigdata-ustc:dev Sep 2, 2021

Labels

bug Something isn't working


2 participants