Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds new CI, Covers Multiple OS, Python Versions and Packaging #226

Closed
wants to merge 25 commits into from
Closed

Adds new CI, Covers Multiple OS, Python Versions and Packaging #226

wants to merge 25 commits into from

Conversation

oke-aditya
Copy link
Contributor

@oke-aditya oke-aditya commented Jul 26, 2020

  • With the current change I added CI for more OS systems. It is good idea to build on multiple OS before publishing.

  • This technique is taken from PyTorch Lightning.

  • I have added CI for releasing to both PyPi test server and PyPi official. This is again taken from Lightning

  • We need a test to build package as well. And test PyPi releasing with a dummy release.

@codecov-commenter
Copy link

codecov-commenter commented Jul 26, 2020

Codecov Report

Merging #226 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #226   +/-   ##
=======================================
  Coverage   80.17%   80.17%           
=======================================
  Files         101      101           
  Lines        1897     1897           
=======================================
  Hits         1521     1521           
  Misses        376      376           
Flag Coverage Δ
#unittests 80.17% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.


Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3b2084e...a2b57cf. Read the comment docs.

@oke-aditya
Copy link
Contributor Author

E AttributeError: module 'torch.distributed' has no attribute 'group'

I was too able to reproduce this error on my windows Machine
It arises from conftest.py code in our repo
Rest all, Linux, MacOs build for python 3.6, 3.7 and 3.8
This is a problem if torch.distributed is not initialized properly. Which is strange coz it builds on linux and not on windows.

@oke-aditya
Copy link
Contributor Author

oke-aditya commented Jul 26, 2020

@lgvaz can you fix up the windows build issue of efficient det repo ? Once that is solved and I fix up the packaging We can then add these CI changes.

I will fix up the setup.py issue.
Also we haven't solved how to include efficient det in our settings.ini so that is still a bit of concern.

Helps in #182 #99 #135 #209

@lgvaz
Copy link
Collaborator

lgvaz commented Jul 27, 2020

@oke-aditya This is very very hard for me to solve, because, well.. I don't have windows hahahhah

Does windows support multi-gpu after all?

@oke-aditya
Copy link
Contributor Author

oke-aditya commented Jul 27, 2020

Windows does support torch.distributed the issue maybe I am using cpu only which might not ship it. I am raising an issue with PyTorch itself as I tried from torch.distributed import group locally also did not work.

We are installing the same package in Linux and MacOs as It is also CPU only. It should work. Microsoft provides nice support for windows with pytorch now so it should be fixed.

@oke-aditya
Copy link
Contributor Author

oke-aditya commented Jul 27, 2020

I'm facing a very small issue with Packaging install CI.
What it does is checks if the package can be installed properly from the wheel file.

Current scenario: -
I built the wheel, it gets saved in the dist folder. I cannot install that wheel using pip install dist/
More issues are it is version specific wheel name is mantisshrimp.0.0.8.whl

Possible solutions: -
Somehow if I can name the wheel as mantisshrimp-test.whl then I can install it by pip install dist/mantisshrimp-test.whl
Or some way to install wheel like *.whl. I am unable to do that as of now.

Other possible solution: -
Run pip install . Install the setup. This isn't a good thing, if we want to test the.whl.

It's very strange as same workflow runs for PyTorch Lightning. And it doesn't run for us.
Version key error I couldn't understand, it was being paseesd though. When I tried explicit passing, It resulted in error duplicate key found.

@oke-aditya
Copy link
Contributor Author

oke-aditya commented Jul 27, 2020

So Windows CI error is linked to pytorch/pytorch#42095.
I guess torch itself does not support distributed GPU / CPU on windows but maybe soon it will. Let's see if there is a workaround or for now we can drop support for windows altogether.

Another option is mark these tests that fail on windows for now and skip them on windows OS. This is slightly poor technique as its somewhere in between of being supported or not.

Windows users have option of docker container with WSL2, code on windows; train on Linux VMs, This isn't our issue anyways coz if PyTorch doesn't support it. We cannot do it either.

@oke-aditya
Copy link
Contributor Author

AAh the install bug was fixed with MANIFEST.in file. Why did we exclude it anyways ? It was there in nbdev template. We should have kept it.

@oke-aditya
Copy link
Contributor Author

oke-aditya commented Jul 27, 2020

Again the build failed coz we tried to import pandas as pd. We are getting hit by #181 again and again.
I am removing pandas from imports file. I guess we do not need it.

@oke-aditya
Copy link
Contributor Author

oke-aditya commented Jul 27, 2020

Nice, we are getting there. New issue. We import PyTorch Lightning. Note we do not package this with our repo. It is not in our requirements as well, it is in additional requirements. So when we build the wheel, we do not package this dependency or take care that it is installed. The same strategy that of HuggingFace as well. They do not package PyTorch or Tensorflow or take care it is installed. Again this error arises from nasty imports folder, which is causing lot of concern.

  1. Do the tests required Fastai and PyTorch Lightning?
  2. Do we need this in imports ?

Fixes possible.

  1. I can install it while checking our wheel in package.yml file.
  2. We can bundle this in requirements (bad idea limits the minimal installation).

Note that egg or wheel file will always install minimal version of mantisshrimp.

@lgvaz
Copy link
Collaborator

lgvaz commented Jul 27, 2020

  1. Tests requires fastai, lightning, efficientdet, albumentations, everything, so for now let's only use with [all]

  2. We don't need it on imports =)

@oke-aditya
Copy link
Contributor Author

Tests run on all.
Wheel build check, checks our code only on our dependencies as imports.

@oke-aditya
Copy link
Contributor Author

Okay. So the final broken things are Tests on Windows. This time it is not due to the distributed issue. It is due to refactor of COCO metrics. Windows we use different COCO Api repository. @lgvaz can you have a look it seems small dtype conversion error.

@oke-aditya
Copy link
Contributor Author

Maybe this is the cause rbgirshick/py-faster-rcnn#481. We are using an upgraded version of numpy but the coco API needs a lower one ? The COCO APi is always painful and a bottleneck.

@oke-aditya
Copy link
Contributor Author

Currently we don't have any test to let us know if it is fixed. So I guess better will be to merge this and try to fix it up. It will be easier to fix as then we will know if Windows CI passes.

@oke-aditya oke-aditya changed the title Revamps CI Adds new CI, Covers Multiple OS, Python Versions and Packaging Jul 27, 2020
@oke-aditya
Copy link
Contributor Author

Following tests fail only on Windows

tests\metrics\test_coco_metric.py FF                            
tests\models\efficient_det\test_coco_metric.py F   

@lgvaz @ai-fast-track as we said Windows support is experimental. And reason is PyCoco Tools and PyTorch both have its limitations.

Suggestions to Proceed: -

  1. Merge this PR and try fixing up windows.
  2. Skip these tests for now. Mark them as COCO in PyTest and skip it in windows.
  3. Not support windows for now.

@ai-fast-track
Copy link
Collaborator

Does this (pip install) fix the issue with Windows when spinning the WIndows test machine?

C.2- Installing cocoapi in Windows:

pycoco cannot be installed using the command above (see issue-185 in the cocoapi repository). We are using this workaround:

pip install "git+https://github.com/philferriere/cocoapi.git#egg=pycocotools&subdirectory=PythonAPI"

@oke-aditya
Copy link
Contributor Author

oke-aditya commented Jul 28, 2020

I am using the same COCO API on workflow file. Still I'm getting error. You can have look at workflow here

Also in a few days. We need to bump to PyTorch 1.6

The reason is COCO API is very old and maybe there is some internal numpy typecast issue from our side or from this github repo itself. This is a liability to be carried forward.

@oke-aditya
Copy link
Contributor Author

Bumping all the tests to PyTorch 1.6 with the new relase today,

@oke-aditya
Copy link
Contributor Author

oke-aditya commented Jul 28, 2020

Wow. Additional trouble now. We can't bump to PyTorch 1.6. @lgvaz this needs multi-headed attention now (pun).

Problem is with Ross's EfficientDet tests. Also, a test of fastai is failing.
Maybe raise an issue with Ross?
Error is

E       RuntimeError: Integer division of tensors using div or / is no longer supported, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.

from
preds = efficientdet.predict(model=fridge_efficientdet_model, batch=batch)

To avoid this from next time. We should run CI tests on PyTorch current as well as nightly. This was not very expected. But this increases our test count 2x. Which will now be too many.

The Fastai error is

E AttributeError: '_FakeLoader' object has no attribute 'generator'

arising from learner.fine_tune()

A fix for now maybe to ship with PyTorch 1.5 and try to cover 1.6.
Once we reach 1.6 we will start building on PyTorch nightlies.

@oke-aditya
Copy link
Contributor Author

K converting this to draft PR. Will raise 2 new PRs one for CI and one for the PyPI release.

@oke-aditya oke-aditya closed this Jul 30, 2020
@oke-aditya
Copy link
Contributor Author

Do let me know if we need to delete this branch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants