GSOC 2019: Make sure top 50 PyPI packages are supported #229

Open
kayhayen opened this Issue Jan 26, 2019 · 28 comments

Member

kayhayen commented Jan 26, 2019

Nuitka works with most software. The aim of this project is to make sure that holds for the top 50 packages on PyPI, by compiling and running their example code.

In the first stage, you would identify issues and report them to the bug tracker. In the second stage, you would develop tools that help to narrow down issues, e.g. determining precisely which extension module fails to load, even when a segfault happens, then put those tools to use and try to fix a few of the simpler issues.
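One way such a narrowing-down tool could start, as a rough sketch only: import each suspect module in its own child process, so that a segfault kills the child and not the tool. The function names `probe_import` and `find_failing_modules` are made up for this illustration, and the sketch probes the plain interpreter, not a compiled binary.

```python
import subprocess
import sys


def probe_import(module_name, python=sys.executable, timeout=60):
    """Import one module in a child process and classify the outcome.

    Returns "ok", "import-error" (a Python exception was raised), or
    "crash" (the child died abnormally, e.g. from a segfault).
    """
    result = subprocess.run(
        [python, "-c", "import " + module_name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    if result.returncode == 0:
        return "ok"
    # On POSIX a negative return code means death by signal (-11 is SIGSEGV).
    # On Windows an access violation shows up as a very large exit code.
    if result.returncode < 0 or result.returncode > 255:
        return "crash"
    return "import-error"


def find_failing_modules(module_names):
    """Map each module that does not import cleanly to its outcome."""
    outcomes = {name: probe_import(name) for name in module_names}
    return {name: kind for name, kind in outcomes.items() if kind != "ok"}
```

Passing a different interpreter path via `python` would let the same probe run against another Python installation.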

Setting up these as automated tests would be the ultimate goal, so we can follow these top 50 packages with Nuitka over time and make sure they continue to work.

In the past it has happened, for example, that Jinja2 broke on Python 3.7, and it would be good to discover such breakage immediately.

Skills: Python programming, pip installation, Linux and/or Windows installs of Python, one is good, both would be great.

aditya-hari commented Feb 3, 2019

Hey @kayhayen

This is something that sounds really interesting to me and I'd love to be able to contribute to this.
But I am completely new to Open source as a whole. I don't expect to be able to do something significant enough to qualify for GSoC or whatever but I'd love for this to be my first foray into open source and to gain experience.

Since this and #230 both are labelled as easy in terms of difficulty, I think I should start here.

But I don't even know what "Linux and/or Windows installs of Python" means in terms of skills, which both the easy difficulty ideas call for, and I don't know how to begin.

Could you please guide me to some resources or tell me how I should go about finding my bearings? I am competent with Python and consider myself to be a quick learner, so I am sure I will be able to get up to speed soon.

Member Author

kayhayen commented Feb 3, 2019

Welcome @aditya-hari, and I agree this is not yet detailed enough. Actually, today I was going to flesh this out further with pointers.

As for the skill, it is more of an asset than a requirement. We would prefer that you do this on one of those platforms; doing it on macOS, for example, would make it too difficult for us to help you. We do not care whether you use Windows or Linux, but in my experience the latter works better for the debugging that will be needed, i.e. in case of a crash of standalone compiled software.

As for more pointers, they are coming later today, or so I hope.

Member Author

kayhayen commented Feb 3, 2019

So, first of all, we need to define what we mean by top 50 PyPI packages. For Python, downloading packages via pip is very widespread. So the first task would be to look at the top 100 and identify 50 packages that are actually relevant to Nuitka.

Of course pip itself leads the list, but we will skip it: nobody needs it compiled or uses it as a library in their software. But urllib3 is number 2, and a very important piece of software that people rely on, so breaking it in Nuitka compilation would be bad.
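The selection step described here could be sketched roughly as follows. The helper name `select_candidates` is hypothetical, and apart from pip and urllib3 (whose positions are mentioned above) the ranking in the example is invented for illustration, not real download data.

```python
def select_candidates(ranked_names, excluded, limit=50):
    """Return the first `limit` package names not in `excluded`, in rank order."""
    skip = {name.lower() for name in excluded}
    selected = []
    for name in ranked_names:
        if len(selected) >= limit:
            break
        if name.lower() not in skip:
            selected.append(name)
    return selected


# Illustrative ranking only; apart from pip and urllib3 the order is made up.
ranking = ["pip", "urllib3", "six", "requests", "setuptools"]
print(select_candidates(ranking, excluded={"pip"}, limit=3))
```

The exclusion set is where packages judged irrelevant to Nuitka would be recorded, ideally together with a short reason.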

What the student would be expected to do is read its documentation in search of a tutorial or small examples that show it working. Distill a minimal test out of that and add it to the existing pool of tests in tests/standalone, say as tests/standalone/Urllib3Using.py, and get it to work in standalone mode, using the Nuitka argument --standalone.

Then, if possible and easy, the package's own tests should be made to work. For py.test and nose2 there is a good chance of getting them working against a compiled package. For that you would use the Nuitka argument --package, throw in magic environment variables if necessary (when the tests live below the package namespace, say urllib3.tests), and then run the tests. Look at the test results, which hopefully pass and fail equally well compiled and uncompiled (never assume that released software passes all its tests uncompiled in your environment, or in any). You then try to identify the cause of any difference, and report it as a Nuitka issue, or sometimes as an issue of the software we are testing.
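The idea of tests passing and failing "equally well" can be made concrete with a small comparison, sketched below. It assumes the outcomes of both runs have already been collected into dictionaries; how to extract them from py.test or nose2 output is left open, and the test names are hypothetical.

```python
def compare_outcomes(uncompiled, compiled):
    """Return tests whose outcome differs between the two runs.

    Both arguments map a test name to an outcome string such as
    "passed" or "failed". A test missing from one run is reported
    with the outcome "missing" on that side.
    """
    differences = {}
    for test in sorted(set(uncompiled) | set(compiled)):
        before = uncompiled.get(test, "missing")
        after = compiled.get(test, "missing")
        if before != after:
            differences[test] = (before, after)
    return differences


# Hypothetical results: test_b fails either way, so only test_c is suspicious.
plain = {"test_a": "passed", "test_b": "failed", "test_c": "passed"}
nuitka = {"test_a": "passed", "test_b": "failed", "test_c": "failed"}
print(compare_outcomes(plain, nuitka))
```

Only the differences are candidates for a Nuitka bug report; a test that fails in both runs points at the environment or at the package itself.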

When those work, we should try and turn this into a re-usable test as well, so we can apply them in an automated fashion.

Then on to the next package on the list. The main benefit to the student will be getting to know the 50 most important software packages of Python on at least a cursory level. Something the mentors won't even do. And that will teach you a lot, and the mentors too. And it will keep people using Nuitka from running into problems, because our testing will find them before Nuitka releases.

aditya-hari commented Feb 4, 2019

Thanks for the elaboration, definitely helps a lot.

But I am still unsure of how to begin. Would going through the existing tests be of any help to me? And how do I set up a developer environment, or something along those lines, to work in? I am terribly sorry, but I am a complete beginner at this.

Member Author

kayhayen commented Feb 4, 2019

A good example of a test for standalone usage is this one:

app = QCoreApplication([])

Here PyQt is used to create an application and signal a slot, which as a compiled function could be problematic. And in fact it was, until we reported it upstream with a potential patch to their source. The latter would be fantastic if you manage it, but is not required.

Member Author

kayhayen commented Feb 4, 2019

And this one is the current test runners test, with only one small package:

One of the goals would be to document the secret environment variables to expand the package namespace for tests, and then to add more packages. This will be lower priority in general though, as currently this is the least fleshed out, and standalone mode is vastly more popular and important to the users.

Running those tests does have the advantage of finding more errors in compiled code. Making e.g. a PyQt.so out of PyQt/__init__.py and everything below it is possible, but copying the other needed extension modules to the right directories is a manual task. Surely we would provide such enhancements if we see it becoming usable as part of this project. Or if you wish, we can guide you through it. But as I said, that wouldn't be required.

aditya-hari commented Feb 4, 2019

So I'm definitely showing my inexperience here, but how do I start? Apologies for the constant banal questions, I just really want to help.

Member Author

kayhayen commented Feb 4, 2019

You would start by making a minimal program that uses urllib3, compile it with Nuitka, and run it standalone. You would then integrate it into the test suites mentioned above. Then, optionally, try to package urllib3 as a compiled package, and run its tests against that. Then move on to the next package.
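The compile step could be driven from Python along these lines. This is only a sketch: it uses the `python -m nuitka --standalone` invocation that appears later in this thread, and the helper names `standalone_command` and `compile_standalone` are hypothetical.

```python
import subprocess
import sys


def standalone_command(script, python=sys.executable):
    """Build the command line for a standalone compilation of `script`."""
    return [python, "-m", "nuitka", "--standalone", script]


def compile_standalone(script, python=sys.executable):
    """Run the compilation; raises CalledProcessError if Nuitka fails."""
    subprocess.run(standalone_command(script, python), check=True)
```

Calling `compile_standalone("test.py")` requires Nuitka to be installed for the chosen interpreter and can take a while to finish.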

Member Author

kayhayen commented Feb 4, 2019

And obviously, the urllib3 documentation is most definitely going to contain a tutorial example, some prime use case showing what people usually use it for. In this case, it is probably going to be downloading files via the http and/or https protocols.
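A minimal program of that kind might look like the sketch below. It relies only on `urllib3.PoolManager` and its `request` method; the function name `fetch`, the optional `http` parameter, and the guarded import are choices made for this illustration so that the script fails readably when urllib3 is absent.

```python
try:
    import urllib3
except ImportError:  # keeps the failure mode readable if the package is absent
    urllib3 = None


def fetch(url, http=None):
    """GET `url` and return (status, body_length)."""
    if http is None:
        if urllib3 is None:
            raise RuntimeError("urllib3 is not installed")
        http = urllib3.PoolManager()
    response = http.request("GET", url)
    return response.status, len(response.data)
```

To turn this into a test program, the script would end with a call such as `print(fetch("https://urllib3.readthedocs.io/en/latest/"))`, and the output of the compiled and uncompiled runs would be compared.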

PrajwalM2212 commented Feb 4, 2019

@kayhayen I am interested in this project and the opportunity it gives to get to know the top 50 packages.

aditya-hari commented Feb 4, 2019

Alright, I'll get right to that

sannanansari commented Feb 5, 2019

> And obviously, the urllib3 documentation is most definitely going to contain a tutorial example, some prime use case showing what people usually use it for. In this case, it is probably going to be downloading files via the http and/or https protocols.

Hi @kayhayen, I am Mohammed Sanan, currently pursuing my B.Tech in computer science. I am interested in this project. I can learn more than 50 packages from this GSoC project.
The documentation helps.
screenshot 69
But after all the fixing, this error is occurring.

Member Author

kayhayen commented Feb 5, 2019

@sannanansari make sure to use the right kind of MinGW64, the one linked. This error is typical of using MinGW. The user manual has links to what is working.

aditya-hari commented Feb 5, 2019

@kayhayen I cannot seem to get it to work in standalone mode. I get the warning "Cannot find urllib3 as relative or absolute import", which I am not sure what to make of, and it doesn't produce the .bin file. It works without standalone though.

Edit - Alright, maybe I just haven't been able to figure out how to use Nuitka. I tried to run PyQt5Using.py with --standalone and still don't get the .bin file. No warning though.

sannanansari commented Feb 6, 2019

OK, I will get right on it.

sannanansari commented Feb 6, 2019

> You would start by making a minimal program that uses urllib3, compile it with Nuitka, and run it standalone. You would then integrate it into the test suites mentioned above. Then, optionally, try to package urllib3 as a compiled package, and run its tests against that. Then move on to the next package.

@kayhayen I have gone through this step. The Nuitka compiler compiles the urllib3 code and generates an .exe file, but with --standalone it gives a warning and does not generate an .exe file.
screenshot 72
Could you explain this part? How can I fix this error?

Member

kamgha commented Feb 6, 2019

@sannanansari It means that six.py is calling the __import__() function on line 82: https://github.com/urllib3/urllib3/blob/master/src/urllib3/packages/six.py#L82 . You'll have to find out what values name (a Python module name) can take in this case and pass them via --include-plugin-files or --include-plugin-directory on your Nuitka command line.

This is just a warning, and acting on it may not be mandatory in every case. Did you let Nuitka finish? Your screenshot does not look like it has returned to the command prompt yet.

sannanansari commented Feb 7, 2019

Sorry for uploading an incomplete screenshot, but no output is shown after that.
screenshot 75
As for finding the Python module you mentioned, it is the builtins module. But I get an error when I include --include-plugin-files/directory.
Please ignore the lines from "Traceback".

Member

kamgha commented Feb 7, 2019

@sannanansari I think this is drifting off-topic. I've just compiled the example script from https://urllib3.readthedocs.io/en/latest/ with python -m nuitka --standalone test.py, and it compiles fine and generates the executable. The warning is just a warning; it may cause trouble at runtime, but it need not. builtins is part of Python, I doubt that you'd need it here, and it is not a directory.

Member Author

kayhayen commented Feb 8, 2019

Just to give the necessary background @sannanansari beyond what @kamgha said.

The six module is a popular Python2/Python3 compatibility layer that is often included in the package itself, therefore living in the urllib3 namespace here. It abstracts some of the imports that moved in Python3, and uses __import__ to execute them.
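To see why this is opaque to a compiler, consider the sketch below, loosely modeled on the `__import__` followed by `sys.modules` lookup pattern that six uses. The helper name `import_module_by_name` and the way the module name is assembled are made up for this illustration.

```python
import sys


def import_module_by_name(name):
    """Import a module whose name is only known at runtime."""
    __import__(name)          # returns the top-level package, so ignore it
    return sys.modules[name]  # look up the exact (sub)module instead


# The name is assembled at runtime, so there is no `import json` statement
# anywhere in this file for a static analysis to find.
module_name = "".join(["js", "on"])
json_module = import_module_by_name(module_name)
print(json_module.dumps({"answer": 42}))  # prints {"answer": 42}
```

When running under the interpreter this works, because the standard library is on disk. A standalone build only contains what the compiler decided to include, which is why an unseen import can surface later as an ImportError.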

Nuitka has a hard time seeing through tricks like this, which is why it gives a warning, as an indication to the user that something might go astray here. But it definitely doesn't have to be the case.

Sometimes programs or libraries have folders that they scan and load on the fly, e.g. plugins. These are also loaded via __import__, and in that case Nuitka wants you to tell it about such directories.
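A plugin loader of the kind described here might look like the following sketch. The function name `load_plugins`, the plugin file name, and the throwaway directory are all hypothetical; the point is only that the set of imported modules depends on what is found on disk at runtime.

```python
import os
import sys
import tempfile


def load_plugins(directory):
    """Import every .py file in `directory` and return {name: module}."""
    sys.path.insert(0, directory)
    try:
        plugins = {}
        for filename in sorted(os.listdir(directory)):
            name, extension = os.path.splitext(filename)
            if extension == ".py":
                plugins[name] = __import__(name)
        return plugins
    finally:
        sys.path.remove(directory)


# Demonstration with a throwaway plugin folder (names are hypothetical).
with tempfile.TemporaryDirectory() as plugin_dir:
    with open(os.path.join(plugin_dir, "hello_plugin.py"), "w") as f:
        f.write("GREETING = 'hello from a plugin'\n")
    loaded = load_plugins(plugin_dir)
    print(loaded["hello_plugin"].GREETING)
```

Nothing in `load_plugins` names `hello_plugin`, so a compiler cannot know to include it unless it is told about the directory, which is what the --include-plugin-directory option mentioned earlier in the thread is for.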

This is one of the usability quirks. I think Nuitka ought to be changed to not warn about "*.six" modules doing __import__ as these are all going to be harmless if they are six code. Plugins can suppress warnings, and such a check could easily be added. However, this wouldn't be on your plate with this issue.

You most often will see this warning and be able to ignore it. And where that is not the case, an ImportError will occur at runtime in standalone mode, and will be best explained by looking at this kind of warning.

I hope this clarifies things a bit. It is kind of expected that you won't know these yet. This idea will also naturally expose you to these kinds of usability issues of Nuitka, but they are easy to learn.

sannanansari commented Feb 8, 2019

Ok.
I want you to see these two things.
screenshot 78
In the first part, C:\Users\Ansari\Anaconda3\python -m nuitka testing.py gives me a warning, but an executable is created and it runs.
But in the second part, C:\Users\Ansari\Anaconda3\python -m nuitka --standalone testing.py gives a warning and no executable is created.

Being new, I don't understand the concept of plugins yet. You say plugins can suppress the warning, but the main problem here is that no executable is created. The warning appears in the first part as well, yet there the executable is created. I don't understand that.

Member

kamgha commented Feb 8, 2019

@sannanansari Please give us the output of dir /S after you have run Nuitka with --standalone.

sannanansari commented Feb 9, 2019

screenshot 82
This screenshot is of C:\Users\Ansari\Anaconda3\python -m nuitka testing.py. After looking into the directory, I saw a new folder. In this first case, testing.build and the executable are created, but testing.dist is not.

screenshot 81
With C:\Users\Ansari\Anaconda3\python -m nuitka --standalone testing.py, testing.build and testing.dist are created, but no executable next to them.

Sorry about that, I was only concentrating on the executable file.

sannanansari commented Feb 11, 2019

Why is standalone important, when I can use Nuitka directly, which already provides an executable?

Member

JorjMcKie commented Feb 11, 2019

@sannanansari - the executable produced in non-standalone mode only works in the presence of a (fully!) compatible Python installation.
A standalone compile can be deployed to any compatible operating system - with or without a Python installation.
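The difference in what ends up on disk, as reported in this thread, can be summarized in a small sketch. The helper `expected_outputs` is hypothetical, and the file names (`.exe` on Windows, `.bin` otherwise, executable inside the `.dist` folder) are assumptions based on the reports above; they may differ between Nuitka versions.

```python
import os


def expected_outputs(script, standalone, windows):
    """Return the paths a compilation of `script` is expected to create."""
    base = os.path.splitext(os.path.basename(script))[0]
    executable = base + (".exe" if windows else ".bin")
    outputs = [base + ".build"]
    if standalone:
        # The whole folder is the deployable item; the executable is inside it.
        outputs.append(base + ".dist")
        outputs.append(base + ".dist/" + executable)
    else:
        outputs.append(executable)
    return outputs


print(expected_outputs("testing.py", standalone=True, windows=True))
```

Looking for the executable next to the script after a standalone build will therefore come up empty, because it sits inside the dist folder.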

Member

JorjMcKie commented Feb 11, 2019

@sannanansari - when I say "a standalone can be deployed ...", then this means that the dist folder containing the executable is the deployable item: copy it to any compatible OS, and you will find the exe file in it usable.

Please also refer to the discussion around "One-File" distribution in this context (#230):
There is a request to transform the dist folder into a single (big) exe file that can be run directly on the deployed-to system to execute the original compiled script within it.
This request can be viewed as solved as of a few days ago for Windows and Linux (thanks to @akshai9899); Mac OSX is still missing.

sannanansari commented Feb 12, 2019

I had not seen it, really sorry about that. It is working. I have shown my inexperience here and I apologize for that. So what should I do next?

sannanansari commented Feb 12, 2019

@kayhayen I can email you directly if you don't mind.
