This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Data-iterator tutorial made python3 compatible. #9460

Merged
merged 2 commits into from Jan 25, 2018

pracheer
Contributor

Description

I ran into two main issues while running the tutorial at http://mxnet.incubator.apache.org/tutorials/basic/data.html on Python 3:

  1. The behavior of zip changed in Python 3. It now returns an iterator, which is exhausted after a single pass, instead of a list. More info: https://stackoverflow.com/questions/31683959/the-zip-function-in-python-3/31684038#31684038
  2. Some MXNet methods expect a string argument on Python 2 but bytes on Python 3.
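
A minimal sketch of the first issue, using only the standard library. The names and shapes are made up for illustration and are not taken from the tutorial:

```python
# Hypothetical names and shapes, for illustration only.
names = ["data", "label"]
shapes = [(32, 100), (32,)]

pairs = zip(names, shapes)   # Python 3: a lazy, one-shot iterator
first_pass = list(pairs)     # consumes the iterator
second_pass = list(pairs)    # nothing left to yield on Python 3

print(first_pass)
print(second_pass)

# Materializing the pairs once gives a list that can be traversed repeatedly.
reusable = list(zip(names, shapes))
assert list(reusable) == list(reusable) == first_pass
```

On Python 3 the second pass is empty, while the list built with `list(zip(...))` can be read as often as needed.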

Checklist

Essentials

  • Passed code style checking (make lint)
  • [Y] Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • [Y] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

@pracheer
Contributor Author

string to bytes in python 3 environment

```python
def str_or_bytes(str):
```
Contributor

mxnet.base.py_str

Contributor Author

If I understand it correctly, py_str accepts either a string or bytes and always returns a string. What we need here is the reverse: the input is always a string, and the output is either a string or bytes, depending on the Python version.

We could optionally add a similar helper to base.py to do this. Let me know your thoughts.
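
For illustration, a helper of the kind described above might look like the sketch below. This is an assumption about the intended behavior, not the code from this PR; the name `to_native_arg` and the UTF-8 default are hypothetical choices.

```python
import sys


def to_native_arg(text, encoding="utf-8"):
    """Return `text` unchanged on Python 2 and encoded as bytes on Python 3.

    Hypothetical helper: the input is assumed to always be a text string.
    """
    if sys.version_info[0] >= 3:
        return text.encode(encoding)
    return text


converted = to_native_arg("data")
print(type(converted).__name__)  # 'bytes' on Python 3, 'str' on Python 2
```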


 @property
 def provide_label(self):
-    return self._provide_label
+    return zip(self._label_names, self._label_shapes)
Contributor

?

Contributor Author

Can you be more specific about your comment here? :-)

In Python 2, zip returned a list of tuples, which can be traversed any number of times. In Python 3, zip returns an iterator that is exhausted after one traversal. We are therefore creating a new iterator (via zip) every time provide_label is accessed.

Member

Isn't this already done in __init__?
self._provide_data = zip(data_names, data_shapes)

Contributor Author

Is there a guarantee that the caller reads provide_data and provide_label only once? If yes, we don't need to change these functions. If no, we have to call zip explicitly every time.

In Python 2, zip returns a list, so the second time provide_data is called, the same list is simply used again.
In Python 3, however, zip returns an iterator. The second time provide_data is called, that iterator has already been exhausted by the first call, so it yields nothing and the caller fails. We therefore need to re-initialize it via zip every time provide_data is called.
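
The difference can be reproduced without MXNet. The sketch below uses two hypothetical stand-in classes (not the tutorial's SimpleIter) that differ only in whether `__init__` stores the raw `zip` object or a list built from it:

```python
class LazyPairs(object):
    """Hypothetical stand-in that stores the raw zip object."""

    def __init__(self, names, shapes):
        self._provide_data = zip(names, shapes)

    @property
    def provide_data(self):
        return self._provide_data


class ListPairs(object):
    """Hypothetical stand-in that stores a list built from zip."""

    def __init__(self, names, shapes):
        self._provide_data = list(zip(names, shapes))

    @property
    def provide_data(self):
        return self._provide_data


names, shapes = ["data"], [(32, 100)]

lazy = LazyPairs(names, shapes)
lazy_first = list(lazy.provide_data)
lazy_second = list(lazy.provide_data)    # empty on Python 3

eager = ListPairs(names, shapes)
eager_first = list(eager.provide_data)
eager_second = list(eager.provide_data)  # same pairs again

print(lazy_first, lazy_second)
print(eager_first, eager_second)
```

Building the list once in `__init__` avoids calling zip on every property access, which appears to be the approach taken by the follow-up commit "Create list of zipped elements to simplify SimpleIter".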

pracheer requested a review from szha as a code owner on January 24, 2018, 18:37
Ubuntu added 2 commits January 24, 2018 22:51
piiswrong merged commit 5166e57 into apache:master on Jan 25, 2018
yuxiangw pushed a commit to yuxiangw/incubator-mxnet that referenced this pull request Jan 25, 2018
* Data-iterator tutorial made python3 compatible.

* Create list of zipped elements to simplify SimpleIter.
rahul003 pushed a commit to rahul003/mxnet that referenced this pull request Jun 4, 2018
zheng-da pushed a commit to zheng-da/incubator-mxnet that referenced this pull request Jun 28, 2018