v.1.1.28.1. Added support for running Opy utility in an alternate co… #32

Merged
merged 23 commits into from
Dec 14, 2018

Conversation

BuvinJT

@BuvinJT BuvinJT commented Oct 15, 2018

Added support for employing Opy in an alternate context, i.e. as an import that another Python script can use to provide a more robust packaging process. The obfuscation can then act as one programmatically driven "stage" within that.

Added mask_external_modules option/feature to slightly improve the obfuscation.

JdeH and others added 6 commits December 16, 2017 15:03
…text, i.e. as an import from another script.

Enabled installation as a site-packages library.  Changed setup.py to use distutils.core rather than setuptools.
Added __init__.py to make this into a standard library.  Added opymaster.py module as a bridge between the library and the original module.
Changed reference in opy.py to __builtins__ to __builtin__ (no "s") after importing that module.  This resolved a problem with supporting the alternate contexts.
Added unit test for the new context.
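
For context on the `__builtins__` change: in CPython, `__builtins__` refers to the built-in module when code runs as `__main__`, but to that module's dictionary when the same code is imported, which is why code that works as a script can break as a library. A minimal sketch of a context-independent lookup follows; the names `builtins_module` and `builtin_names` are illustrative and are not taken from Opy's source.

```python
# Import the built-in module by name instead of relying on __builtins__,
# whose type depends on whether this file is run as a script or imported.
try:
    import __builtin__ as builtins_module  # Python 2 module name
except ImportError:
    import builtins as builtins_module     # Python 3 module name


def builtin_names():
    """Return the set of built-in identifiers, however this file is loaded."""
    return set(dir(builtins_module))


print("len" in builtin_names())
```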
…ls.core back to setuptools module. Refactored opymaster.py module into settings.py module, and eliminated the need to copy that into a project when implementing opy in the original manner. Expanded upon / improved the readme.
…is able to employ for the extended configurations rather than requiring an external hard file.
…y provides aliases for those external imports which are set to not be obfuscated, before the main obfuscation process. When the main process then runs, the result is to obfuscate those aliases, thereby increasing the total amount of obfuscation.
BuvinJ added 3 commits October 27, 2018 16:09
…into a separate module. Added options features: dry_run and subset_files. Added tracking of import identifiers which are obfuscated or not. Added analyze() function to library, drawing upon each of these enhancements.
@JdeH
Member

JdeH commented Nov 5, 2018

Hi, thanks for all the work.
Currently I am a bit busy but I'll look into your PRs as soon as I find the time.
Thanks in advance.

Jacques

@BuvinJT
Author

BuvinJT commented Nov 5, 2018

Great! No rush.

This is a really cool program, btw. There is a definite need for this, and a great many engineers could benefit from it.

I have more enhancements (and likely patches to those!) on the way. I'm working on this in parallel to another library (currently closed source) which uses this as a resource, and that is helping to drive the development here. I'm trying my best to minimize my changes to your code, and as you'll see, on a few occasions I changed or added something, and then found a cleaner way to get the original work back closer to what it was.

If you find that you have the time, I'd like to discuss a few things. I have both suggestions and questions.

@geatec

geatec commented Nov 5, 2018

Ok, I'll get back to you on this, probably within 14 days!
Currently working against a deadline for a customer...
Jacques (sorry, used my company git account)

BuvinJ added 9 commits November 5, 2018 16:31
…tched glitch in library method of wrapping the utility (i.e. using module reload rather than simply import).
…es of known problems to be resolved. Updated documentation.
…, this will allow for leaving all publicly accessible module members in clear text. The value in that option is for library obfuscation, where many (public) identifiers must be preserved in clear text. Perhaps more importantly, this includes the first example implementation of the ast module (a built-in comprehensive Python language parser), which may have many applications moving forward.
…rnal modules" references in clear text, you may instead opt to bundle the source of those into your obfuscated version of the code, so that you can obfuscate that as well. Whereupon you might find it necessary (or easier) to modify your imports during the obfuscation process.

Added new beta feature "prepped_only".  Similar to dry_run, prevents the production of obfuscated results. Instead, the clear text, "pre-obfuscation stage" of the files will be produced.  This includes module "replacements", "masks", string obfuscations, etc.
… patches" to obfuscated results. When the utility (or the user configuration) isn't quite working as desired, this lets you just tweak specific file lines using functions e.g. "replaceInLine".
…y 2 & 3. Added six library as requirement.
…ry analyze function. This allows wrapper scripts to map clear text paths to the obfuscated results. The primary purpose at this point being for use with OpyFile patching.
@JdeH
Member

JdeH commented Dec 2, 2018

Hi,

How time flies... Do you feel all of this has stabilized enough for me to take a thorough look in anticipation of merging the pull requests or do you expect there's more to come / change? In the latter case maybe I'd better still wait a little. What do you think?

I'd like to do as much as possible in one go, since it'll require some testing (which no doubt you'll also have done already).

Kind regards
Jacques

@BuvinJT
Author

BuvinJT commented Dec 2, 2018

Hi Jacques,

It's pretty reliable right now. I've documented the known bugs and weaknesses, both in the readme and in examples. I was focused on Python 2 to begin with, but I just ironed things out for Py3 and pushed that. The dual support is one of the major advantages of your project over other such work, so it was important to me that it be preserved.

I am developing this as a supporting component to a larger project, for building distributions. I'm close to posting a preliminary version of that. I think you'll really want to check that out when I make it available, because the two of these support each other. (It should be a matter of days now, max, before I post that.)

One thing I really want to get working here is the need for leaving "external" imports in clear text. In my wrapper project, I've started rolling in pip to gather source code from such libraries, which can then be bundled into a distribution, and the whole thing obfuscated via Opy. In a dream scenario, I wish that could be made to work with minimal effort from the client.

I added a "parser" module as you'll see. In part, I do a lot of ugly, painful, manual parsing of the language. But then, in another feature I started employing the "ast" library - which is the right way to achieve such. The ugly parts of my code should be rewritten to use that awesome library instead at some point.

As for the specific merge question, I recommend merging my fork into your project on an alternate "develop" branch. That provides the benefit of including my work, but there is no immediate need to brand it "release" quality today.

@BuvinJT
Author

BuvinJT commented Dec 2, 2018

Perhaps, bump this "develop" branch to v 1.2?

Note that everything I added is optional, btw. So nothing is required to be used that really diverges from your work. Other than the new "masking" feature, I believe that I defaulted all those things to NOT being employed automatically.

…ing the same values as provided by (the "dry run") analyze function.
BuvinJ added 3 commits December 6, 2018 20:55
…bfuscate and analyze functions, rather than a tuple, making the return values more easily expandable / "future proof" compared to unpacking a tuple with an expected order / length.
@BuvinJT BuvinJT force-pushed the master branch 2 times, most recently from 87e3c62 to 39ab183 Compare December 11, 2018 13:54
@BuvinJT
Author

BuvinJT commented Dec 11, 2018

I'm hoping that I can leave this alone for the moment now, to give you time to review and merge it. I discovered today that I had been adding commits with emails/user names that I didn't want in an open source / public repo, however. As such, I had to run a script to fix that and perform a forced push. If this rendered it difficult to merge those commits into your original repo, let me know. I'll figure out how to correct it by forking from your work again then reapplying my changes on top of that.

@BuvinJT
Author

BuvinJT commented Dec 11, 2018

I released a version of the "counterpart" project to Opy I've been talking about, i.e. "Distribution Builder". Here's the url: https://github.com/BuvinJT/distbuilder
It's still an early beta, but it does function, and will give you a good idea about what I'm doing there and how I'm wrapping and building upon Opy.

@JdeH
Member

JdeH commented Dec 13, 2018

@BuvinJT

Since you want to use Opy as part of another Python application c.q. library, I've made it easier to import Opy as a module. This work is in the opy_as_a_module branch. Sorry that this comes so late but I couldn't find the time until now.

Example on how to use Opy as a module, from opy/developmen/tests/dog_walker/obfuscate.py:

import sys

sys.path.append ('../../../..')

import opy

print (opy.run ('plain_code', 'obfuscated_code', 'obfuscate.cnf'))

I've read your additions and I'd like to provide some feedback, starting with what in my view is most urgent. If I understood right you've introduced the use of the AST module. However, abstract syntax trees are Python version dependent, e.g. ASTs for Python 3.6 and 3.7 differ, and those for 2.7 and 3.7 differ quite a lot.

While a much more powerful version of Opy is possible using ASTs, they were deliberately avoided up to now to prevent version dependencies, as Opy should be usable from Python 2.7 up to 3.7 and further.

If AST's are used, it's a whole different ballgame: more flexibility in obfuscation, much simpler code, but also: no version independency anymore.
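
To make the version dependency concrete, here is a small sketch (not part of Opy) that reports which node classes the running interpreter uses for simple literals. The helper name `literal_node_types` is made up for illustration. On Python 3.8 and later the parser emits `Constant` for strings, numbers and `None`, whereas earlier versions emit `Str`, `Num` and, on Python 3, `NameConstant`, so code that inspects literals has to account for both.

```python
import ast
import sys

# Node class names that different Python versions use for literal values.
LITERAL_NODE_NAMES = {"Str", "Num", "Bytes", "NameConstant", "Constant"}


def literal_node_types(source):
    """Return the sorted literal node class names found when parsing `source`."""
    tree = ast.parse(source)
    found = {type(node).__name__ for node in ast.walk(tree)}
    return sorted(found & LITERAL_NODE_NAMES)


# The printed list depends on the interpreter that runs this script.
print(sys.version_info[:2], literal_node_types("x = 'a'; y = 1; z = None"))
```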

What's your take on this?

Kind regards
Jacques

@BuvinJT
Author

BuvinJT commented Dec 13, 2018

Thanks for reviewing this work and getting started on incorporating it, Jacques!

First, regarding the import:

I converted Opy into an actual library, installable via pip and then importable from anywhere. That is a necessity, especially when it is nested inside another library which also works that way. So, we should preserve that and not require something like appending on to the sys.path.

Also note that when my fork runs opy, it returns a collection of results (like a dictionary mapping clear-text file paths to obfuscated ones, the words which were obfuscated, etc.) - or an "analyze" function can be called which works in a similar way, but employs my "dry run" option to not actually create any files. Both of those are hard requirements for my wrapper project.
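
As a rough illustration of why a results object is easier to extend than a tuple, here is a minimal sketch. The class name `ObfuscationResults`, its attribute names, and the `fake_analyze` stand-in are all hypothetical; they are not the fork's actual API.

```python
class ObfuscationResults(object):
    """Hypothetical container for what an obfuscate/analyze call might report."""

    def __init__(self, obfuscated_file_map, obfuscated_words, clear_text_words):
        self.obfuscated_file_map = obfuscated_file_map  # clear path -> obfuscated path
        self.obfuscated_words = obfuscated_words        # identifiers that were renamed
        self.clear_text_words = clear_text_words        # identifiers left untouched


def fake_analyze(paths):
    """Stand-in for a dry run: pretends each file maps to '<path>.obf'."""
    file_map = {path: path + ".obf" for path in paths}
    return ObfuscationResults(file_map, ["helper"], ["os"])


results = fake_analyze(["pkg/main.py"])
print(results.obfuscated_file_map["pkg/main.py"])
```

Callers read only the attributes they need, so adding another attribute later does not break code that was written against the earlier set, which is not true of positional tuple unpacking.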

Regarding AST:

I will not presume to know much on the subject. I never used AST until I did so here. But, the little bit that I used it, there were no issues with crossing versions.

I love that Opy works in 2 and 3. Every competitor only works in 3 that I've seen. Yet, I personally, have tons of v2 legacy code I would like to employ this on as well. I think for sure that feature must be preserved, as a way to differentiate your work from others.

That said, if a better product can be produced using AST, I'd have to argue in favor of that. Better trumps shorter/cleaner. I want to seriously protect proprietary code, or code which is security sensitive. I love Python, but the fact that it is inherently in clear text, and even a standalone version of a program (via pyinstaller or py2exe) can be reverse engineered back to the original code, is a HUGE problem.

If using AST requires more code, and/or explicit checks for v2 / 3, then so be it. Speed is not overly relevant. No one needs this process to be lightning quick. Also, the present length of the code is pretty minimal right now (great job by the way doing so much with so little!), if the code base swells to double or triple that's still very short.

@BuvinJT
Author

BuvinJT commented Dec 13, 2018

I don't know if you ever used Qt? I do daily, as I'm also a C++ developer. Qt just released a whole new version of "PySide", called "PySide 2". That is Qt for Python! It lets you create interfaces using all of the Qt Library, or even QML. That's like a dream come true, because Python is the "king of the backend", and Qt is the "king of the frontend" arguably. Now you can use both in one project.

Anyway, I've recently spoken with the lead developer of PySide 2 face-to-face. What they are actively working on right now is exactly what my "Distribution Builder" library does. I asked if they would build in an obfuscator, and the answer was they didn't yet have plans for it, but the concept is intriguing. When I shared that I already wrote what they are planning (wrapping PyInstaller, Qt Installer, etc), plus an obfuscator built-in, there was interest in getting my work if I wanted to share it open source (as they would in turn).

My point being, I'd like to present my project to Qt, along with this awesome Opy component. If we have this well enough developed, the work could get rolled into the PySide project potentially. In which case, it's likely to be employed by a gigantic number of users (even if they don't know it!).

@BuvinJT
Author

BuvinJT commented Dec 13, 2018

For my own personal plans, the reason I'm developing these libraries is because I'm creating a large scale project in Python that I want to sell commercially. I can't have the source readily accessible, thus this tool is critical.

Also, I'm breaking my project into a slew of libraries, which all come together to form the final product. I want to be able to share obfuscated versions of the libraries with multiple collaborators, as they work on their own library components in clear text. Then, they could build and run the big main product, with the new code they are developing on the fly, to confirm that it works in the target context. But, each of those collaborators would not need to have the clear text work from everyone else, or the option of stealing the project on the whole.

@BuvinJT
Author

BuvinJT commented Dec 13, 2018

I skipped over the point you made regarding "future proofing" with AST. That is important for sure. It's very hard to be confident that any code is future proof though. Python 4 could break away from 3 in any number of manners. Some of the major Python libraries created for 2 were not immediately made available in v3. For instance "Twisted" was not fully functional in v3 for a long time (and I'm not sure if it's finally squared away now?). Anyway, I think that the utility could just write a warning to stderr when run on a new Python release, stating that it has not yet been tested and confirmed.
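
A minimal sketch of such a warning might look like the following. The constant `LATEST_TESTED` and both function names are assumptions made up for this illustration; nothing like them is confirmed to exist in Opy.

```python
import sys

# Assumed for illustration: the newest (major, minor) release the tool was tested on.
LATEST_TESTED = (3, 7)


def untested_version_warning(version=None, latest_tested=LATEST_TESTED):
    """Return a warning string if `version` is newer than `latest_tested`, else None."""
    major_minor = tuple((version or sys.version_info)[:2])
    if major_minor > latest_tested:
        return ("Warning: Python %d.%d has not been tested with this tool "
                "(latest tested release: %d.%d)." % (major_minor + latest_tested))
    return None


def warn_if_untested():
    """Write the warning to stderr, leaving stdout clean for normal output."""
    message = untested_version_warning()
    if message:
        sys.stderr.write(message + "\n")


warn_if_untested()
```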

Potentially, we might want to consider developing formal unit tests too. Where we run a series of short, atomic tests, and then confirm the results are as expected. Displaying SUCCESS/FAILURE on the screen as each is tried. With that in place, we'd readily identify bugs in future Python releases.
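
One possible shape for that, sketched under assumptions: the runner and the sample check below are hypothetical, and the "obfuscated" source is a hand-written stand-in, since a real test would call the obfuscator itself.

```python
def run_checks(checks):
    """Run (name, callable) pairs, print SUCCESS/FAILURE for each, return failed names."""
    failures = []
    for name, check in checks:
        try:
            check()
            print("SUCCESS  " + name)
        except Exception as error:
            failures.append(name)
            print("FAILURE  " + name + ": " + repr(error))
    return failures


def check_same_behaviour():
    """Clear and obfuscated versions of a snippet should compute the same value."""
    clear = "def add(a, b):\n    return a + b\nresult = add(2, 3)\n"
    # Hand-written stand-in for obfuscator output.
    obfuscated = "def l1l1(l11, l1l):\n    return l11 + l1l\nresult = l1l1(2, 3)\n"
    clear_namespace, obfuscated_namespace = {}, {}
    exec(clear, clear_namespace)
    exec(obfuscated, obfuscated_namespace)
    assert clear_namespace["result"] == obfuscated_namespace["result"]


failed = run_checks([("same behaviour after renaming", check_same_behaviour)])
```

Running the same checks under each supported interpreter would surface version-specific breakage early.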

@JdeH JdeH changed the base branch from master to opy_db December 14, 2018 11:24
@JdeH JdeH merged commit 81335da into QQuick:opy_db Dec 14, 2018
@JdeH
Member

JdeH commented Dec 14, 2018

@BuvinJT

Thanks for your clear explanation of what's required for the distbuilder project.
I think that the possibility to distribute obfuscated versions of Python software will contribute to the popularity of Python,
although I hope that most Python software will remain human readable!

About your additions to Opy:

Using the AST module has some drawbacks with regard to Python version independency.
But if you decide to use it in your fork it opens up a world of possibilities.

Opy's simplistic way of parsing, while version independent, results in a number of limitations.
At the core of Opy is a bag of tricks to circumvent these limitations.
Once you have the AST at your disposition, there's no need to perform those tricks,
like replacing strings by placeholders and then, after obfuscation, replacing them back again.
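
The placeholder trick can be sketched roughly as follows. This is a simplified illustration written for this discussion, not Opy's actual code: it handles only single-line string literals, ignores comments and triple-quoted strings, and all function names are made up.

```python
import re

SINGLE_QUOTED = r"'(?:\\.|[^'\\])*'"
DOUBLE_QUOTED = r'"(?:\\.|[^"\\])*"'
STRING_PATTERN = re.compile(SINGLE_QUOTED + "|" + DOUBLE_QUOTED)


def protect_strings(source):
    """Swap each string literal for a numbered placeholder; return (text, literals)."""
    saved = []

    def stash(match):
        saved.append(match.group(0))
        return "__STR_%d__" % (len(saved) - 1)

    return STRING_PATTERN.sub(stash, source), saved


def rename(source, mapping):
    """Rename whole-word identifiers according to `mapping`."""
    pattern = re.compile(r"\b(%s)\b" % "|".join(map(re.escape, mapping)))
    return pattern.sub(lambda match: mapping[match.group(1)], source)


def restore_strings(source, saved):
    """Put the original literals back in place of their placeholders."""
    return re.sub(r"__STR_(\d+)__", lambda match: saved[int(match.group(1))], source)


source = "name = 'name'\nprint(name)\n"
protected, literals = protect_strings(source)
print(restore_strings(rename(protected, {"name": "l1l"}), literals))
```

The identifier `name` is renamed while the string `'name'` survives untouched, which is the whole point of stashing the literals first.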

Also all of the following restrictions would completely disappear if you use the AST everywhere:

  • A comment after a string literal should be preceded by whitespace.
  • A ' or " inside a string literal should be escaped with \ rather than doubled.
  • If the pep8_comments option is False (the default), a # in a string literal can only be used at the start, so use 'p''#''r' rather than 'p#r'.
  • If the pep8_comments option is set to True, however, only a # cannot be used in the middle or at the end of a string literal
  • No renaming backdoor support for methods starting with __ (non-overridable methods, also known as private methods)

If you have the parse tree at your disposal, you can easily distinguish between e.g. names of variables and functions and the contents of string literals.
You can also easily find out what the imported modules are.
You can take apart expressions and put them back together again in an obfuscated way and many other things.
In short, obfuscation based on the AST is far superior to what Opy currently does.
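
As a small sketch of what the parse tree offers, the function below separates identifiers, imported modules and string literal contents. It assumes Python 3.8 or later, where string literals appear as `ast.Constant` nodes, and the name `summarize` is made up for this example.

```python
import ast


def summarize(source):
    """Return (identifier names, imported modules, string literal values) for `source`."""
    names, imports, strings = set(), set(), []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            names.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.Import):
            imports.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            if node.module:  # None for relative imports such as "from . import x"
                imports.add(node.module)
        elif isinstance(node, ast.Constant) and isinstance(node.value, str):
            strings.append(node.value)
    return names, imports, strings


sample = "import os\nfrom sys import path\ndef greet(who):\n    return 'greet ' + who\n"
print(summarize(sample))
```

Note that the word `greet` inside the string is reported as literal content, not as an identifier, with no placeholder step required.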

So if you decide to use the AST anyhow, I think it's better to base the whole obfuscation on that.
I anticipate that, from where you are now, gradually you'll use the AST more and more,
moving away from Opy's regular expression based parsing scheme, since AST's will give you much more flexibility.
But since you've already invested quite some time, it may indeed be a gradual evolution.

My suggestion is to leave Opy more or less "as is", maybe with some simple improvements,
and have a separate branch for distbuilder, including AST based obfuscation, as it is a different approach.
For now I've called it opy_db and branched it from the current master branch, as that's what your code is currently based on.
Furthermore, maybe it would be a good thing to reserve the name opy_db at PyPi.
If you prefer I can do it for you.
Of course you can pick any other name you like, or just make it part of the distbuilder project.

KR
Jacques

@BuvinJT
Author

BuvinJT commented Dec 17, 2018

Sorry for the slow response! I've been ill, and haven't been able to work for several days.

Thank you for adding the branch! Also, I'd really appreciate it if you would be able to set up the PyPi hooks for it. I've never done that myself, but do plan to now for the distbuilder project. Since Opy is your project, and I'm just tacking on features, it probably makes sense for the PyPi registration etc. for it to be in your name / control.

One minor request I will make is to change the name of the branch and PyPi project to opy_distbuilder, or perhaps opy_dstbldr to reduce characters. The trouble with the suffix "db" is that it's the standard abbreviation for database - so I didn't want to create confusion for anyone who might be led to think this was related to something entirely different.

In theory, rewriting Opy to use AST at its foundation would be a great thing to get done for sure. But, as this is a side-project on a side-project, and a few layers deep, I can't expect to realistically get that done any time soon. Using AST selectively will have to suffice for the moment.

@BuvinJT
Author

BuvinJT commented Dec 17, 2018

With this new long lived, parallel branch, we will need to work out a name / version resolution. Should the branch still install a library named "opy" or should it install one called "opy_distbuilder"? Should the new branch start over on the version number, be kept the same as opy, or skip ahead of it? This is extremely important for the distbuilder, because it will define package/version requirements and automatically manage them via pip.

@JdeH
Member

JdeH commented Dec 21, 2018

@BuvinJT

I've renamed opy_db to opy_distbuilder for clarity, as you proposed.
It may be wise to consider that the master branch of the opy_distbuilder variety of opy.
So maybe your development should happen on branches derived from opy_distbuilder, aiming for opy_distbuilder itself to evolve into a stable, tested branch, I leave that to your own judgement.
In general I'd like to lay the responsibility for the opy_distbuilder branch with you.
In principle I'll accept pull requests from you for that branch without retesting.

Since these branches diverge, let's decouple the versioning.
It isn't a problem if e.g. there would be an opy 4.0.0 and an opy_distbuilder 4.0.0.
The distinction is still clear from the names opy and opy_distbuilder.

Having independent versions means you can hand out version numbers for opy_distbuilder at will,
keeping maximal control of the version relations between distbuilder and opy_distbuilder.

Having dependent versions, on the other hand, would suggest co-evolution,
which may not always be the case.

Of course (parts of) the code may converge in the future, one including fruitful new parts of the other and vice versa, but that doesn't pose any problems.
I think of them as sibling projects, based on the same principles, but sufficiently different to have different (but related) names.

@BuvinJT
Author

BuvinJT commented Dec 26, 2018

Thanks for the branch rename. I will treat that like the master for my fork now.

I set up both distbuilder and opy_distbuilder on PyPi. Executing pip install distbuilder will install both packages on your machine. Note that while the package which is installed is called opy_distbuilder, it is still being imported as opy. So long as the original is not installed into site packages, that will work without conflict generally speaking. Let me know if you'd prefer changing that.
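
Because the installed package name (opy_distbuilder) differs from the import name (opy), it can be useful to check which file an `import opy` would actually load. The helper below is a generic standard-library sketch for Python 3; the name `import_origin` is made up here, and it is not part of either project.

```python
import importlib.util


def import_origin(module_name):
    """Return where a top-level module would be loaded from, or None if not importable."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec is not None else None


# Prints a file path if some package providing "opy" is installed, otherwise None.
print(import_origin("opy"))
```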

I restarted the versioning on this branch / fork, branding it as a beta release v.0.9.0.

On the PyPi registration, I listed you as the primary author, followed by myself, and then I set the email address to point to me. I assigned the "home page" to your GitHub page, on the opy_distbuilder branch.

When you get a chance, perhaps you could check the PyPi details, and give me your approval? There are also a few meta data changes I made, and pushed to my fork, which should be merged into the dedicated branch on your repo sometime. With that done, I'm hoping to leave this alone for a little while (or at least not have a merge request for a few months to bother you with).

My distbuilder project will be actively developed for a little while. There are a few important things I still need to get on that to even bump it from "alpha" to "beta", but I think you'll find it interesting / useful in its present state if you want to play around with it.


3 participants