lxml required, not in requirements or setup.py #19

Merged
merged 2 commits into from Aug 10, 2014

@deanmalmgren
Owner

deanmalmgren commented Aug 6, 2014

It will pip-install cleanly, but when you run it, it blows up.

On Ubuntu, required:
sudo apt-get install libxml2-dev libxslt1-dev
pip install lxml

I know the apt-get stuff is out of scope, but maybe it'll help someone if they do a search for this problem in the future.

@StevenMaude StevenMaude referenced this pull request Aug 4, 2014

Closed

Add lxml to requirements #22

@StevenMaude

StevenMaude commented Aug 4, 2014

pip install python-docx installs lxml too; just tested out in a virtualenv.

@deanmalmgren

Owner

deanmalmgren commented Aug 4, 2014

@ShawnMilo Can you explain the steps that you took to have it "blow up" and show the output? Was the problem that lxml wasn't installed or was it something else?

My understanding (as @StevenMaude points out) is that python-docx installs lxml through its requirements, so by listing python-docx in this package's requirements this should recursively install lxml by just running pip install textract. If you had a different experience, I'd love to get to the bottom of it. I've only really tested this on Ubuntu and in travis-ci and I'm sure there are some odds and ends to fix so this is more portable to other operating systems.

@clue

clue commented Aug 5, 2014

I tested this in a Docker container (using the minimal Ubuntu 14.04 base image), and libxml is in fact installed automatically. However, zlib is not:

$ pip install textract
…
x86_64-linux-gnu-gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -D_FORTIFY_SOURCE=2 -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security build/temp.linux-x86_64-2.7/src/lxml/lxml.etree.o -lxslt -lexslt -lxml2 -lz -lm -o build/lib.linux-x86_64-2.7/lxml/etree.so

/usr/bin/ld: cannot find -lz

collect2: error: ld returned 1 exit status

http://pastebin.com/M6jBbCwD

Not sure if this is the error you're running into (some logs would be helpful?), but this can easily be fixed by installing zlib:

$ sudo apt-get install zlib1g-dev
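
For anyone hitting a similar linker failure, here is a minimal sketch of turning the `cannot find -l<name>` lines in a build log into package suggestions. The function names are hypothetical, and the mapping is an assumption that only covers libraries named in the link command above; extend it as needed.

```shell
#!/bin/sh
# Sketch: map linker "cannot find -l<name>" errors to Debian/Ubuntu -dev
# packages. Function names and the mapping table are illustrative
# assumptions, limited to libraries that appear in the link command above.

# Print each library name that ld reported as missing, one per line.
missing_libs() {
    sed -n 's/.*cannot find -l\([A-Za-z0-9_.+-]*\).*/\1/p' | sort -u
}

# Translate a library name into the package that ships its development files.
dev_package_for() {
    case "$1" in
        z)    echo "zlib1g-dev" ;;
        xml2) echo "libxml2-dev" ;;
        xslt) echo "libxslt1-dev" ;;
        *)    echo "unknown (no mapping for -l$1)" ;;
    esac
}

# Read a build log on stdin and suggest packages to install.
suggest_packages() {
    missing_libs | while read -r lib; do
        printf '%s -> %s\n' "-l$lib" "$(dev_package_for "$lib")"
    done
}
```

One possible way to use it is to pipe the build output through the helper, for example `pip install textract 2>&1 | suggest_packages`.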

(The resulting installation instructions for my textract docker image can be found here)

@deanmalmgren

Owner

deanmalmgren commented Aug 5, 2014

@clue ++ for using docker. nice! I've never used it but I'm using a pretty bare bones Vagrant box myself (Ubuntu 12.04).

I'm a little surprised by your problem though and I suspect that it may have been due to the ambiguity of the installation instructions (?). I just merged some changes to the documentation in #21 that will hopefully clarify the necessary steps on Ubuntu. Can you verify that you first ran the sudo apt-get install python-dev libxml2-dev libxslt1-dev antiword poppler-utils before running pip install textract?

If this is still an issue for you, I'd be happy to hop on a Google Hangout to try and figure out what's going on.

@clue

clue commented Aug 5, 2014

Awesome, thanks for the update @deanmalmgren!

I've just confirmed that your install instructions work on an Ubuntu 12.04 LTS docker image. The only package missing from the list is python-pip. Also, zlib1g-dev is clearly being installed as part of the above instructions (I have yet to check which package adds the dependency).

However, the 14.04 LTS image does not install zlib1g-dev automatically and hence fails during the pip install textract phase. Installing this package manually fixes this issue.

It should be safe to explicitly add the packages to the install instructions even if they might already be installed implicitly. Putting this together, the complete install instructions that do work on both Ubuntu 12.04 and 14.04 are:

$ sudo apt-get install python-dev libxml2-dev libxslt1-dev antiword poppler-utils python-pip zlib1g-dev

Just in case you're new to Docker, you can use the following instructions to test this out in a temporary docker container that will be discarded automatically:

user $ docker run -it --rm ubuntu:12.04 bash # or 14.04 respectively
$ apt-get update
$ apt-get install python-dev libxml2-dev libxslt1-dev antiword poppler-utils python-pip zlib1g-dev
$ pip install textract
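
The same steps can also be run non-interactively, which makes it easier to compare releases. The following is only a sketch: `run`, `test_release`, and the `DRY_RUN` variable are hypothetical helpers introduced for illustration, and `-y` is added so that `apt-get` does not prompt.

```shell
#!/bin/sh
# Sketch: run the install steps above non-interactively in a throwaway
# container. `run`, `test_release` and DRY_RUN are hypothetical helpers,
# not part of textract or Docker.

PACKAGES="python-dev libxml2-dev libxslt1-dev antiword poppler-utils python-pip zlib1g-dev"

# Print the command instead of executing it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Install the system packages and textract inside ubuntu:<release>.
# The container is removed afterwards because of --rm.
test_release() {
    release="$1"
    run docker run --rm "ubuntu:$release" bash -c \
        "apt-get update && apt-get install -y $PACKAGES && pip install textract"
}
```

A loop such as `for r in 12.04 14.04; do test_release "$r" || echo "FAILED on $r"; done` would then exercise both releases; setting `DRY_RUN=1` first prints the commands without running them.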
@deanmalmgren

Owner

deanmalmgren commented Aug 6, 2014

@clue Thanks for looking into the differences between 12.04 and 14.04. Oy vey.

Adding zlib1g-dev to the dependencies sounds like a great idea. I'll attach a PR to this issue momentarily.

I'm inclined to not include python-pip in the same capacity as it isn't required for textract to work (but it is required to install the package). There are lots of ways to install pip—fabtools, for example, installs the most recent version of pip from this nifty script. python-pip is installed by default in the travis-ci machine and we handle it in the Vagrant development environment in the travis-mock.sh provisioning step.

@deanmalmgren

Owner

deanmalmgren commented Aug 6, 2014

Before I merge this in, @pr3d4t0r tweeted at me (sorry to rope you into the conversation). Is that zlib1g-dev requirement a bug with docker not having the right compression libraries installed or is it a general problem with Ubuntu 14.04?

From clicking through the dependencies of libxml2-dev on 14.04, it also looks like zlib1g should be installed. I guess I'm trying to decide if this should be a side note for people that are installing in docker or whether it's a general 14.04 issue.
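
One thing worth keeping in mind here is that zlib1g (the runtime library) and zlib1g-dev (the headers and the unversioned libz.so that `-lz` links against) are separate packages, so having the former does not satisfy the build. A rough sketch for checking this from the command line rather than by clicking through package pages follows; `lists_dependency` is a hypothetical helper that reads the output of `apt-cache depends` on stdin and only looks at direct dependencies.

```shell
#!/bin/sh
# Sketch: check whether a package lists a given direct dependency.
# `lists_dependency` is a hypothetical helper; it expects the output of
# `apt-cache depends <package>` on stdin and succeeds if <dep> appears
# on a Depends: or PreDepends: line.
lists_dependency() {
    awk -v dep="$1" '
        $1 ~ /^[|]?(Pre)?Depends:$/ && $2 == dep { found = 1 }
        END { exit !found }
    '
}

# Usage, on the release in question:
#   apt-cache depends libxml2-dev | lists_dependency zlib1g-dev \
#       && echo "pulled in automatically" \
#       || echo "install zlib1g-dev explicitly"
```

Running the usage line on both 12.04 and 14.04 should show whether the difference comes from the package metadata or from the container image.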

@pr3d4t0r

pr3d4t0r commented Aug 6, 2014

@deanmalmgren - The issue is that Ubuntu 14.04 LTS has a somewhat incompatible init system sequence that isn't quite ready for Docker containers, so instead it's recommended that you use the phusion/baseimage which strips out and describes a lot of Ubuntu start up stuff that may affect container behavior. I don't see how zlib1g-dev is removed, but that's what I have as background.

The fix, as you saw from my blog post, is easy. Just a little bit of snoop work sorting the dependency out.

I'm watching this repo now, I'll be happy to add any insights I can contribute.

Cheers!

@ShawnMilo

Contributor

ShawnMilo commented Aug 6, 2014

Docker works fine with Ubuntu 14.04 without any third-party tools using the instructions here:
http://docs.docker.com/installation/ubuntulinux/#ubuntu-trusty-1404-lts-64-bit

Search for: "If you'd like to try the latest version of Docker:"

It just has you add a new source to apt.

@pr3d4t0r

pr3d4t0r commented Aug 6, 2014

@deanmalmgren and @ShawnMilo - I venture that the issue manifests in boot2docker's stripped VM, not in the Linux version? The fact remains -- if zlib1g isn't installed, textract won't build in our environment. I will ask @ananelson to check if it builds without that; she runs Docker on a Linux workstation instead of OS X/boot2docker.

Wednesday Morning Mystery™ :)

@ananelson

ananelson commented Aug 6, 2014

I haven't tried installing textract on Docker yet, but where other packages have an lxml dependency I install that in advance because there tend to be problems.

Here's an example of how I have done it:
https://github.com/ananelson/oacensus/blob/develop/docker/Dockerfile#L37

@deanmalmgren deanmalmgren merged commit 34cc351 into master Aug 10, 2014

deanmalmgren added a commit that referenced this pull request Aug 10, 2014

@deanmalmgren

Owner

deanmalmgren commented Aug 10, 2014

Thanks for helping to clear this up, everyone! I ended up adding a note to the installation instructions for Ubuntu/Debian in the case that you're using Docker. I think this is the most appropriate way to handle this situation, but I'm open to other options.

For what it's worth, I also included zlib1g-dev in requirements/debian so that at least the provisioning should work on docker instances if you're so inclined to reuse those scripts.

Cheers!

@deanmalmgren deanmalmgren deleted the ubuntu-1404 branch Aug 10, 2014
