add pandas #1522
I'm only concerned about the ATLAS usage, which is a painful beast to Ciao, On 2 May 2015, at 19:12, Tai Sakuma wrote:
@ktf, OK. I will try without ATLAS.
@ktf, I updated the PR. It won't add ATLAS any more.
Thanks. @Degano can you try this out? I would say let's get this PR in and then let's have a
@ktf @TaiSakuma I'm testing it right now.
@ktf @TaiSakuma I found an issue while building the packages that depend on the ones modified by this PR: py2-pyfits was not building correctly.
Fix deletion of egg-info in pyfits.
@Degano, thank you for the test and fix. I merged your PR.
+1
This PR adds pandas.
Following the presentation at the RECO/AT meeting on Thursday, 30 April 2015, I tried writing spec files for pandas and its requirements.
This is my first PR to this repo, and I am not familiar with how it is developed, so I might not be doing things right.
First, this PR is made to the branch IB/CMSSW_7_5_X/stable. I am not sure if this is the right branch. Please let me know if this PR should be made to another branch.
pandas depends on three other packages. It also has recommended and optional dependencies. These are listed at:
This PR upgrades NumPy and python-dateutil to meet the requirement.
The website above states that recommended dependencies provide large speedups for large data sets, which might be very useful. But this PR doesn't install those recommended dependencies. I could try to add them as a separate PR.
This PR adds ATLAS as NumPy uses it. The compilation of ATLAS is probably the most complicated part of this PR.
Comments of packages
I will add comments below for several packages.
ATLAS
(This PR no longer adds ATLAS)
This PR adds ATLAS, which NumPy uses for fast linear algebra calculations. ATLAS is in the branch IB/CMSSW_4_4_X/stable. I copied the spec file from that branch and updated it to use the most recent version of ATLAS.
Tuning
The performance of ATLAS depends on how it is tuned at compile time. This PR uses only -b 64 for the tuning. This might not be the best option, and it might not work on OSX.
Netlib LAPACK
This PR uses the "--with-netlib-lapack-tarfile" option, which will compile netlib's LAPACK and install it.
However, cmsdist also has lapack.spec, so there will be two instances of Netlib's LAPACK.
Shared objects
I modified the Makefile so that shared objects are built in the way NumPy and other packages appear to expect.
The configuration option --shared creates two shared objects, libsatlas.so and libtatlas.so, which include all symbols for the serial and parallel APIs respectively. These are not the libraries that other packages using ATLAS expect and find by default. For example, see a stackoverflow post.
This PR builds those two shared objects but doesn't install them.
Instead, this PR builds seven shared objects, one for each archive file, and installs them. Those shared objects are libatlas.so, libf77blas.so, libcblas.so, liblapack.so, libptf77blas.so, libptcblas.so, and libptlapack.so.
NumPy
This website says that the shared objects for BLAS, LAPACK, and ATLAS can be specified by environment variables. However, this didn't work when I tried it. So instead, this PR uses site.cfg to specify them.
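As a rough illustration of the site.cfg approach, the sketch below renders a minimal file with an [atlas] section. The prefix path and the library list are placeholders, not the values used in this PR, and the key names (library_dirs, include_dirs, atlas_libs) are my reading of NumPy's site.cfg.example, so they should be checked against the NumPy version being built.

```python
import configparser
import io


def render_site_cfg(prefix, libraries):
    """Return site.cfg text with an [atlas] section rooted at `prefix`.

    `prefix` and `libraries` are hypothetical inputs for illustration only.
    """
    cfg = configparser.ConfigParser()
    cfg["atlas"] = {
        "library_dirs": prefix + "/lib",
        "include_dirs": prefix + "/include",
        # Assumed key name; verify against the site.cfg.example of your NumPy.
        "atlas_libs": ", ".join(libraries),
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Placeholder prefix; a spec file would substitute the real install root.
    print(render_site_cfg("/path/to/atlas", ["lapack", "f77blas", "cblas", "atlas"]))
```

The rendered text would be placed next to NumPy's setup.py as site.cfg before the build starts.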
pandas
I modified setup.py so that setuptools won't compile NumPy again. I think setuptools is supposed to check whether the requirements are installed and install them only if they are missing. However, it always tried to install NumPy, even when NumPy was already installed. I couldn't figure out why. So I just removed the lines in setup.py that declare the NumPy requirement.
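The effect of that edit can be sketched as filtering one project out of a requirement list. This is only an illustration of the idea, not the patch in this PR; the helper drop_requirement and the example lists are hypothetical and are not copied from pandas' setup.py.

```python
import re


def drop_requirement(requirements, project):
    """Return `requirements` without the entries that name `project`."""
    # Match the project name at the start of the entry, followed either by
    # the end of the string or by the start of a version specifier/marker,
    # so that "numpydoc" is not mistaken for "numpy".
    pattern = re.compile(
        r"^\s*" + re.escape(project) + r"\s*($|[<>=!~\[;])", re.IGNORECASE
    )
    return [req for req in requirements if not pattern.match(req)]


if __name__ == "__main__":
    # Hypothetical requirement lists, for illustration only.
    setup_requires = ["numpy >= 1.0"]
    install_requires = ["python-dateutil", "numpy >= 1.0"]
    print(drop_requirement(setup_requires, "numpy"))
    print(drop_requirement(install_requires, "numpy"))
```

With NumPy gone from both lists, setuptools has nothing to trigger a rebuild, and the build relies on the NumPy package that the spec file already depends on.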
I removed the NumPy requirement from both setup_requires and install_requires. If it is in setup_requires, setuptools would compile NumPy before compiling pandas. If it is not in setup_requires but in install_requires, setuptools would compile NumPy after compiling pandas.
Test the branch
I tested my branch with the following commands on lxpluscmsdev.
pandas is compiled.
I don't know how to enter the environment that was built, so I haven't tried importing pandas in a Python script.
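Once the built environment can be entered, a small smoke test along these lines could confirm that the packages import. This is a sketch only: try_import is a hypothetical helper, and the module names assume that python-dateutil is imported as dateutil.

```python
import importlib


def try_import(name):
    """Try to import `name`.

    Returns (True, version or None) on success and (False, error text)
    on failure, so that one missing package does not stop the check.
    """
    try:
        module = importlib.import_module(name)
    except ImportError as exc:
        return False, str(exc)
    return True, getattr(module, "__version__", None)


if __name__ == "__main__":
    # Packages touched by this PR; python-dateutil is imported as "dateutil".
    for name in ("numpy", "dateutil", "pandas"):
        ok, detail = try_import(name)
        print(name, "OK" if ok else "FAILED", detail)
```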
Any comments or suggestions are welcome.