Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove external dependencies #51

Closed
wants to merge 3 commits into from

Conversation

balr0g
Copy link
Contributor

@balr0g balr0g commented Jul 11, 2014

This removes the file and ssdeep dependencies from the source tree.
These dependencies are available in distro repositories and should not be embedded as this leads to various issues.

The deps.sh script has been updated to install these packages. The other two C libraries, which are fairly obscure, remain in the source tree.

autoreconf will have to be run. (Ideally, ./configure should be kept out of the repo via .gitignore and only provided in release tarballs.)

@balr0g balr0g mentioned this pull request Jul 11, 2014
@devttys0
Copy link
Collaborator

Originally, deps.sh did install libfuzzy/libmagic from the repos. This was a source of many distro-specific headaches, particularly WRT libmagic.

Some distros (especially long-term support distros like RHEL/CentOS) had old versions of libraries in their repos which did not support some of the newer features that binwalk required.

Several versions of libmagic had a NULL pointer dereference bug that was triggered under some circumstances by the binwalk API. This was fixed in the latest libmagic release, and was not present prior to v5.10 (or something like that, can't remember the exact version number ATM).

This meant that you either had to have the very latest libmagic (which no repos ever do), or an older version of libmagic (but not too old, or you'd have the problems already mentioned above).

Ultimately this made getting binwalk working a royal PITA for some people, so the C libraries were just included and built along with binwalk. The libmagic and libfuzzy libraries are installed under different names (libinmagic and libinfuzzy respectively) so as to prevent any conflicts with existing or future copies of these libraries on the system. Only binwalk should know about and be using the libinmagic/libinfuzzy libraries that were installed during the binwalk build.

I'm open to suggestions, but it will take quite a bit of convincing to get me to remove these from binwalk at this point. There is a configure option to prevent building any of the C libraries, I could add additional options to specifically disable building/installing libmagic/libfuzzy for those who want to use the libraries from their respective repos.

Regardless, this patch cannot be accepted as-is, since it breaks binwalk completely; the ctypes wrappers used to interface with these libraries are looking for libinmagic and libinfuzzy, not libmagic and libfuzzy as they would be named when installed from the repo. Though it probably wouldn't hurt to fall back to those default system libraries if libinmagic/libinfuzzy couldn't be found, especially if the suggested configure options are added.

@balr0g
Copy link
Contributor Author

balr0g commented Jul 12, 2014

Would it be reasonable for deps.sh to detect the repo version and install a local version if it's too old?

It would be nice if deps.sh had a prefix option as well...

@balr0g
Copy link
Contributor Author

balr0g commented Jul 12, 2014

I did not see the libinmagic/libinfuzzy distinction... will take care of this.

@devttys0
Copy link
Collaborator

I don't think that having deps.sh detect the repo version is a good option.

First, I don't think running deps.sh should be a requirement to get a working install of binwalk. It's really just a convenience script to install all the optional stuff that people might want, in particular all the extraction utilities. So this wouldn't help people running distros not supported by deps.sh, or who want to do stuff manually (or just don't want the "extras").

Second, even if the library versions are OK now, there is no guarantee that they'll be OK in the future. Let's say, for example, that the libmagic version is old enough to not have the NULL pointer bug, but new enough to support all the features needed by binwalk. Later, the repo is updated with a newer libmagic (but not the newest) which does have the NULL pointer bug and all of a sudden binwalk starts breaking.

Third, the deps.sh solution would still require maintaining copies of the libraries in the binwalk repo, and I can't think of any conflicts that would arise from always building and using the bundled libraries.

Having separate copies of these libraries, especially libmagic, also allows me the freedom of using the latest-greatest features from the libraries without having to worry about what versions might be available from various repos.

As for having a deps.sh prefix option, can apt/yum be told to install to a prefix directory? I've never tried, but I didn't see any obvious options to do this.

@balr0g
Copy link
Contributor Author

balr0g commented Jul 12, 2014

First, I don't think running deps.sh should be a requirement to get a working install of binwalk. It's really just a convenience script to install all the optional stuff that people might want, in particular all the extraction utilities. So this wouldn't help people running distros not supported by deps.sh, or who want to do stuff manually (or just don't want the "extras").

With other software, people running their own distros generally are responsible for installing the required versions using the package manager, in /usr/local, or in the prefix of their choice.

Second, even if the library versions are OK now, there is no guarantee that they'll be OK in the future. Let's say, for example, that the libmagic version is old enough to not have the NULL pointer bug, but new enough to support all the features needed by binwalk. Later, the repo is updated with a newer libmagic (but not the newest) which does have the NULL pointer bug and all of a sudden binwalk starts breaking.

The configure script should check the installed version against the required version and bail out if the required version is not installed. I can contribute such functionality. What other libraries and versions does binwalk require? (By the way, red hat really should backport that NULL pointer bug to their version.)

Third, the deps.sh solution would still require maintaining copies of the libraries in the binwalk repo, and I can't think of any conflicts that would arise from always building and using the bundled libraries.

Would it? I thought the deps.sh could download the tarballs from upstream and install them.
To use a specific library, rather than renaming it, you'd use CFLAGS/LDFLAGS, or have a --libmagic-path parameter in the configure script, etc. Again this should be something I could contribute.

I'm unsure whether apt/yum can be told to install in a prefix — probably not. But a prefix is usually used for stuff compiled by hand so it doesn't clutter the system directories. (/usr/local is used for this reason, but some of us want /usr/local to contain what we put there and not what other software puts there.) Things installed via apt/yum will get updated with the system while things installed in the prefix will not.

@balr0g balr0g closed this Jul 12, 2014
@balr0g balr0g deleted the remove-external-dependencies branch July 12, 2014 16:23
@balr0g balr0g reopened this Jul 12, 2014
@balr0g
Copy link
Contributor Author

balr0g commented Jul 12, 2014

Added logic to the configure.ac to check the installed versions of libmagic and ssdeep/libfuzzy.

To override, specify PATH/LDFLAGS/CPPFLAGS/CFLAGS. I wish libfuzzy specified its version in the header file...

(This is not quite complete; I need to check the python code that finds the libraries since I don't think it looks in a prefix specified with these flags.)

@balr0g
Copy link
Contributor Author

balr0g commented Jul 12, 2014

It would be nice to be able to specify the LDFLAGS dirs to the binwalk python module so it could find libraries that are installed in a non-default prefix. What would be the best way to do this?

I'm thinking a libpaths.py file that the configure script can write the paths to.

@balr0g
Copy link
Contributor Author

balr0g commented Jul 12, 2014

At this point it's simply necessary to modify setup.py and the python code to write the compile-time value of the LDFLAGS environment variable somewhere and use it first in find_library().

@devttys0
Copy link
Collaborator

If you just want to install to a specified prefix directory, you should be able to do that at this point:

 ./configure --prefix=/foo --disable-libmagic --disable-libfuzzy

Then install libmagic/libfuzzy manually. As long as they are installed to a directory in LD_LIBRARY_PATH, find_library should find them.

@balr0g
Copy link
Contributor Author

balr0g commented Jul 12, 2014

I'm now running into this issue:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=754317

All I can think of is that the libmagic magic file format has changed and that the magic files that binwalk bundles need to be regenerated (why can't these be generated from libmagic source at compile-time?)

@devttys0
Copy link
Collaborator

With other software, people running their own distros generally are responsible for installing the required versions using the package manager, in /usr/local, or in the prefix of their choice.

For optional features (extraction, graphing, etc), yes, users of non-debian/RH distros will need to install the extra software themselves. But I'd prefer to make the core functionality of binwalk work "out of the box", regardless of your distro or package manager of choice.

The configure script should check the installed version against the required version and bail out if the required version is not installed.

Yes, my point is that this does not prevent the installed version from changing to some non-compatible version in the future. Bundling a known-good copy prevents this.

Would it? I thought the deps.sh could download the tarballs from upstream and install them.

It technically could, and I originally tried this, but it's a PITA for libfuzzy, as it is hosted on sourceforge which uses javascript to redirect you to a dynamic download URL each time (AFAIK, this is the only way to download the libfuzzy tarball). It was easier to bundle libfuzzy rather than parse javascript in deps.sh.

I'm now running into this issue:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=754317

Are you using libmagic 5.18, which is the version bundled with binwalk?

the magic files that binwalk bundles need to be regenerated (why can't these be generated from libmagic source at compile-time?)

How would these be generated from the source? They're just text files.

@balr0g
Copy link
Contributor Author

balr0g commented Jul 12, 2014

For optional features (extraction, graphing, etc), yes, users of non-debian/RH distros will need to install the extra software themselves. But I'd prefer to make the core functionality of binwalk work "out of the box", regardless of your distro or package manager of choice.

That's what deps.sh is for.
I can cite many references not to bundle libs:
http://fedoraproject.org/wiki/Packaging:No_Bundled_Libraries
http://wiki.gentoo.org/wiki/Why_not_bundle_dependencies
https://blog.flameeyes.eu/2009/01/bundling-libraries-for-despair-and-insecurity

It's bad for security and maintainability.

Yes, my point is that this does not prevent the installed version from changing to some non-compatible version in the future. Bundling a known-good copy prevents this.

The software should not depend on the behaviour of a specific version of a library if at all possible. This seems to be the case for libmagic.

It technically could, and I originally tried this, but it's a PITA for libfuzzy, as it is hosted on sourceforge which uses javascript to redirect you to a dynamic download URL each time (AFAIK, this is the only way to download the libfuzzy tarball). It was easier to bundle libfuzzy rather than parse javascript in deps.sh.

Nope, SourceForge uses standard 302 redirects. If I use curl -L or wget for the URL given at the ssdeep page, which is http://sourceforge.net/projects/ssdeep/files/ssdeep-2.10/ssdeep-2.10.tar.gz/download, it redirects me to and downloads the correct tarball (not necessarily the correct filename as wget doesn't respect server-side naming to prevent overwriting local files).

Are you using libmagic 5.18, which is the version bundled with binwalk?

I was using libmagic 5.19. I figure this version is incompatible?

How would these be generated from the source? They're just text files.

They appear to be organized very differently from the ones packaged with the libmagic sources, which means some sort of postprocessing was applied.

@devttys0
Copy link
Collaborator

That's what deps.sh is for.

That's not what I want or intended deps.sh to be for.

The software should not depend on the behaviour of a specific version of a library if at all possible.

I agree, but that is not the case for libmagic specifically. Newer versions of libmagic will even break the file utility, so I don't think this is a binwalk-specific issue.

Nope, SourceForge uses standard 302 redirects.

I stand corrected.

They appear to be organized very differently from the ones packaged with the libmagic sources

I'll have to take a look at how libmagic processes them, I didn't think anything special was done to the magic files at build time but I could be wrong.

So, at this point, what is the problem you're trying to solve? Can you install things to a prefix directory, or are there still problems with that?

@balr0g
Copy link
Contributor Author

balr0g commented Jul 12, 2014

That's not what I want or intended deps.sh to be for.
Would you prefer there to be a separate prereqs script that downloads and installs the specified versions of libmagic/ssdeep to /usr/local or the given prefix?

I agree, but that is not the case for libmagic specifically. Newer versions of libmagic will even break the file utility, so I don't think this is a binwalk-specific issue.

In that case it would be better suited for binwalk to install these libs in a prefix (say /opt/binwalk/ or /usr/local/opt/binwalk) and then link to the version in that location.

I'll have to take a look at how libmagic processes them, I didn't think anything special was done to the magic files at build time but I could be wrong.

Where did they come from?

So, at this point, what is the problem you're trying to solve? Can you install things to a prefix directory, or are there still problems with that?

With the autoconf and Python changes in the last few commits, yes. I'd like to make it so that the bundled deps are done in a cleaner way: with the code in the fork, it's currently possible to install with the following commands and it will work nearly seamlessly:

PATH=/path/to/prereq-prefix/bin:$PATH CPPFLAGS=-I/path/to/prereq-prefix/include LDFLAGS=-L/path/to/prereq-prefix/lib ./configure --prefix=/path/to/install-prefix

PATH=/path/to/prereq-prefix/bin:$PATH CPPFLAGS=-I/path/to/prereq-prefix/include LDFLAGS=-L/path/to/prereq-prefix/lib make install

prereq-prefix and install-prefix can be the same, but they don't need to be.

(It would be cleaner, at least to me, if instead of installing libinfuzzy/libinmagic, these libs are installed in a private prefix somewhere with the correct names. What would happen if everyone named their libs libin*?)

The part that's not seamless is that PYTHONPATH has to be declared when running binwalk to make it look in the prefix. This should be fixable by having the setup script modify the binwalk start script during install-time to declare the prefix location in the PYTHONPATH. I'm still looking into this.

@devttys0
Copy link
Collaborator

WRT to the printf format errors you're receiving from libmagic:

The pre-processed magic files included with the libmagic source code for file-5.18 contain the offending format strings such as 'edition %lu'. These offending format strings are not present in the pre-processed magic files included with the libmagic source code for file-5.19.

Further, the only processing that I see being done on the magic signatures are that they are "compiled" using the 'file -C' command. Even then, the Makefile specifically checks to see if the installed file utility is the same version as the one it expects, else it refuses to run the command; this suggests that the magic file format is closely tied to the file/libmagic version.

@balr0g
Copy link
Contributor Author

balr0g commented Jul 12, 2014

The pre-processed magic files included with the libmagic source code for file-5.18 contain the offending format strings such as 'edition %lu'. These offending format strings are not present in the pre-processed magic files included with the libmagic source code for file-5.19.

libmagic chokes because it is provided the magic files in src/magic by binwalk. I can't figure out how these were derived, but they probably contain the offending format strings.

Further, the only processing that I see being done on the magic signatures are that they are "compiled" using the 'file -C' command. Even then, the Makefile specifically checks to see if the installed file utility is the same version as the one it expects, else it refuses to run the command; this suggests that the magic file format is closely tied to the file/libmagic version.

Are the files in src/magic processed this way, and by what criteria?

@devttys0
Copy link
Collaborator

In that case it would be better suited for binwalk to install these libs in a prefix (say /opt/binwalk/ or /usr/local/opt/binwalk) and then link to the version in that location.

Agreed, that's probably a cleaner option. I'll update the install to put the libs in /opt/binwalk.

This should be fixable by having the setup script modify the binwalk start script during install-time to declare the prefix location in the PYTHONPATH.

It would be preferable for this to be in a config file, rather than changing source code on the fly.

libmagic chokes because it is provided the magic files in src/magic by binwalk. I can't figure out how these were derived.

It sounds like libmagic chokes because it now doesn't like certain format strings which older versions of libmagic didn't mind (and, in fact, which older libmagic signatures themselves used). The file in /tmp that binwalk ultimately passes to libmagic is generated dynamically at runtime; some signatures are completely excluded depending on the include/exclude rules specified on the command line, but the signatures themselves are not modified.

Are the files in src/magic processed this way, and by what criteria?

No. The process that libmagic uses to "compile" signatures is not necessary and is only used for speeding up the loading of the magic file at runtime. Since binwalk dynamically generates its magic files based on runtime criteria, the "compiling" is not done.

@balr0g
Copy link
Contributor Author

balr0g commented Jul 12, 2014

Agreed, that's probably a cleaner option. I'll update the install to put the libs in /opt/binwalk.

Take a look at the changes to the python side of things that I made in this fork; they make it so that it looks for libs in the locations that were specified by LDFLAGS at compile time. In order to do this the setup.py creates a new .py file containing a list of LDFLAGS paths (there can be more than one).

I don't think the two libs should be bundled and installed by ./configure, but rather downloaded and installed by a separate script.

It would be preferable for this to be in a config file, rather than changing source code on the fly.

Where would a config file go? The purpose of this start script is to run the binwalk code located in the python prefix.
On the other hand it does seem that it's customary to have to specify PYTHONPATH before starting a python-based program that's been installed in a local prefix.

It sounds like libmagic chokes because it now doesn't like certain format strings which older versions of libmagic didn't mind (and, in fact, which older libmagic signatures themselves used).

I figured out as much.

No. The process that libmagic uses to "compile" signatures is not necessary and is only used for speeding up the loading of the magic file at runtime. Since binwalk dynamically generates its magic files based on runtime criteria, the "compiling" is not done.

I'm not talking about this in particular.

Rather, I'm asking where these signatures came from, since they're so differently organized from the ones that are packaged with libmagic.

Also, how does src/binwalk/magic/binwalk get generated from the files in src/magic? Does it even? There seem to be duplicate signatures between the two so I'm figuring it does some way.

@balr0g
Copy link
Contributor Author

balr0g commented Jul 12, 2014

Considering that is it customary to have to specify PYTHONPATH on the command line before starting a python-based program that's been installed in a local prefix, I think I'd rather leave the binwalk start script alone.

@devttys0
Copy link
Collaborator

Take a look at the changes to the python side of things that I made in this fork; they make it so that it looks for libs in the locations that were specified by LDFLAGS at compile time.

Thanks, I'll check them out!

Rather, I'm asking where these signatures came from

Some were written from scratch by me. Some were written by other users and contributed to the project. Some were copied from the original libmagic magic files, but most of those needed to be customized for things like false positive detection anyway.

Also, how does src/binwalk/magic/binwalk get generated from the files in src/magic? Does it even?

Yes, the files in src/magic are simply concatenated into a single file by setup.py.

@balr0g
Copy link
Contributor Author

balr0g commented Jul 12, 2014

I'm writing a prereqs.sh installer script.

Some were written from scratch by me. Some were written by other users and contributed to the project. Some were copied from the original libmagic magic files, but most of those needed to be customized for things like false positive detection anyway.

That's reasonable. Any reason bad false positives, and improvements in general shouldn't be contributed upstream? Having a format similar to upstream would make this much easier.

Yes, the files in src/magic are simply concatenated into a single file by setup.py.

If this is the case, why is src/binwal/magic/binwalk in the repo?

@balr0g balr0g mentioned this pull request Jul 12, 2014
@devttys0
Copy link
Collaborator

Any reason bad false positives, and improvements in general shouldn't be contributed upstream?

Binwalk adds "extensions" to the libmagic format, like false positive detection, that aren't supported by libmagic, and in the intended context of libmagic/file don't really add much.

If this is the case, why is src/binwal/magic/binwalk in the repo?

It shouldn't be. It's a hold-over from older binwalk versions which could update the signatures file (--update option), which would pull this file from the latest repository. Since this feature has since been removed, the src/binwalk/magic/binwalk file can be excluded from the repo.

@devttys0
Copy link
Collaborator

Ultimately, I still prefer having the libs included as they are, but you can opt not to build them using the --disable-libmagic and --disable-libfuzzy configure options if that's not what you want.

The --prefix should now be properly honored for both the install and uninstall makefile targets as well, so you can install binwalk wherever you see fit.

@devttys0 devttys0 closed this Jul 13, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants