Using pandas/numpy with Python language #525

Closed
ghost opened this issue Aug 11, 2016 · 27 comments

7 participants
@ghost

commented Aug 11, 2016

Is it possible to make the pydata stack available to QuantConnect (especially pandas, numpy & scipy)? Thanks

@AlexCatarino

Member

commented Aug 13, 2016

We will be working on that soon.
At the moment we are asking the QC community to vote for their favorite APIs:
Calling Python Experts - Proposed New QC Python API

@OpenTrading


commented Aug 29, 2016

I opened a page in the Wiki
https://github.com/QuantConnect/Lean/wiki/PythonApiCodingStyle

and broke the forum thread up into sub-pages: I find forum threads that get long are hard to follow, and threads get abandoned; it's better to pull substantive topics into the Wiki for knowledge capture.

I also contributed my own mixed naming suggestion, one that we use anyway in Python.
It would help keep the correspondence with the C# direct and clear:
https://github.com/QuantConnect/Lean/wiki/PythonApiCodingStyle3

Maybe you can post a note in the thread to refer to these wiki pages.

It would be good to get this resolved quickly so that we can progress on getting pandas/numpy/scipy usable, which will be a huge win as it's a prerequisite for most Python quants.

@jaredbroad

Member

commented Aug 29, 2016

@OpenTrading thanks for setting up the wiki. Hungarian notation gives me nightmares :)

The forum consensus seems to be that keeping the current notation, so that the Python and C# APIs are similar, would be more beneficial for learning than having a separate API.

We're going to complete the Python documentation using this style, then add pandas, numpy and scipy integration. Anyone who wants to jump-start this can follow the tutorial we found here (http://stackoverflow.com/questions/29397540/how-to-install-numpy-and-scipy-for-ironpython27-old-method-doenst-work), which is only 10 months old and should work.

@OpenTrading


commented Aug 29, 2016

Call it Canadian prefix notation and you'll sleep like a baby :-)

Starting something other than a class with a capital letter will give Pythonistas, at the very least, bad dreams - there's no other easy way around that. But for sure, keep the Python and C# APIs similar.

@OpenTrading


commented Aug 29, 2016

Thanks for the IronPython link, but I note that Enthought says: "IronPython repos are still available, but are no longer maintained." So this route may be problematic: it's a very old version, using an abandoned package manager: ironpkg.

It will probably be better to use the currently maintained IronPython:
http://ironpython.net/download/
which includes pip,
http://ironpython.net/blog/2014/12/07/pip-in-ironpython-275.html
or the nuget version,
https://www.nuget.org/packages/IronPython/
and try using pip to install numpy.

The numpy FAQ says that there are reports of success with numpy on IronPython
for 32bit systems using Ironclad:
https://code.google.com/archive/p/ironclad/
but that project is dead too.

@ghost

Author

commented Aug 29, 2016

Forum consensus? Where?

@AlexCatarino


@ghost

Author

commented Aug 29, 2016

I see 6 answers:
Current Style: 2 (Anony Mole, ChrisFg)
Proposed Style: 3 (Peter Newell, Alexis Petit, Lev Gorbunov)
Indifferent: 1 (Aaron Todd - prefers Proposed but ok with Current)

@jaredbroad

Member

commented Aug 29, 2016

+2 hidden votes for style which is easy to copy paste from C# algorithms. If you have feedback please weigh in on forums @agiap.

@ghost

Author

commented Aug 29, 2016

If you're talking about the votes on each comment, then here's what I see:

Current Style:

  • Anony Mole: +2
  • ChrisFg: +1

Proposed Style:

  • Alexandre Catarino: +3 (original msg for proposed style)
  • Peter Newell: +1
  • Alexis Petit: +2
  • Lev Gorbunov: +0

Indifferent:

  • Aaron Todd: +0

@jaredbroad

Member

commented Aug 29, 2016

@agiap: I'm talking about our internal team, who aren't publicly weighing in on the voting so as not to influence others' opinions. We take the voting to help guide our decision about the active users of LEAN and what they want, but ultimately we will make the decision we feel is best for the project.

Please keep this thread focused on topic (LEAN with pandas/numpy) & constructive.

@aajtodd


commented Sep 3, 2016

I've looked into this and unfortunately the numpy/scipy/pandas story on IronPython is a sad state of affairs at the moment.

From what I gather, there were essentially two major attempts in the past to get these modules, which are heavy C/Cython extensions, to work on IronPython. The approaches are detailed below.

Approach 1: Port Numpy/Scipy/Pandas

This is essentially the approach taken by Enthought in the tutorial Jared referenced.

I have followed that tutorial and was able to successfully install the packages referenced. I am including the steps for completeness but see my notes below.

  1. Install IronPython 2.7.5

  2. Add IronPython to your PATH

  3. Open a command prompt and verify it works: ipy -c "print('hello, world!')"

  4. Download ironpkg

    1. Open https://store.enthought.com/repo/.iron/
    2. Should see a page not found displayed
    3. Click the link in the top right corner to create an account. Follow the steps and register
    4. Once you are registered you should be able to download ironpkg-1.0.0.py from above link
  5. Run ipy ironpkg-1.0.0.py --install

  6. Download all of the eggs from the repo link above

  7. Edit the .ironpkg file. Mine was at C:\Users\Aaron\.ironpkg. Change the location to wherever you placed the downloaded eggs. e.g.
    IndexedRepos = ['file://C:\Users\Aaron\Documents\eggs']

  8. Run the following to install numpy and scipy

    ironpkg scipy
    ironpkg numpy
    
  9. Check whether the install worked

ipy -X:Frames
IronPython 2.7.5 (2.7.5.0) on .NET 4.0.30319.42000 (32-bit)
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.__version__
'2.0.0'
>>> import scipy
>>> scipy.__version__
'1.0.0.dev'
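
The same check can be scripted rather than typed into the REPL. Below is a minimal sketch, assuming only the standard library; the helper name `probe_versions` is made up for illustration. It catches `ImportError` only, so any other failure during import will still surface as a traceback.

```python
import importlib


def probe_versions(module_names):
    """Return {name: version string, or None if the import fails}."""
    found = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The module is not installed for this interpreter.
            found[name] = None
            continue
        # Not every module defines __version__; fall back to "unknown".
        found[name] = getattr(module, "__version__", "unknown")
    return found


if __name__ == "__main__":
    report = probe_versions(["numpy", "scipy", "pandas"])
    for name in sorted(report):
        version = report[name]
        print("%-8s %s" % (name, version if version is not None else "not importable"))
```

Running the script with each interpreter in turn makes it easy to compare which versions every installation actually picks up.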

Notes

This first approach was based on now-abandoned C# ports/refactors of numpy and scipy to work on .NET. This is fairly obvious just from the versions (the latest numpy and scipy versions for CPython are 1.11.1 and 0.18.0 respectively). The following articles indicate this as well.

http://blog.enthought.com/python/scipy-for-net/#.V8rZPvkrJD8
http://pytools.codeplex.com/wikipage?title=NumPy%20and%20SciPy%20for%20.Net

Likely sources for some of these tools/modules:
https://github.com/numpy/numpy-refactor
https://github.com/jasonmccampbell/scipy-refactor
https://github.com/ilanschnell/ironpkg

These haven't been touched since 2011 and as far as I can see don't include Pandas. This is not a viable approach in my mind and not likely to gain any support from the community at this point.

Approach 2: Stub CPython API

This second approach was taken by William Reade and Resolver Systems for a spreadsheet product that integrated tightly with Python. Essentially, they stub out the CPython API, and when the DLL is loaded they substitute IronPython C# objects and manage the lifetimes back and forth. This approach is interesting because theoretically you can talk to any C Python extension. Example: you install numpy (or whatever C extension) into a normal CPython installation, play around with import paths, and Ironclad takes care of the rest, detecting when an attempt is made to import a C extension. I'm sure I've glossed over the finer details a bit, but that's the gist of the approach, I believe.
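
To make the "manage the lifetimes back and forth" part a little more concrete, here is a toy sketch of the general idea: a table that hands out integer handles (stand-ins for the `PyObject*` pointers a C extension would hold) and keeps the managed object alive until the native side has released every reference. This is not Ironclad's code; the `HandleTable` class and its method names are invented purely for illustration.

```python
class HandleTable(object):
    """Toy map between integer handles and managed objects, with refcounts."""

    def __init__(self):
        self._objects = {}    # handle -> managed object
        self._refcounts = {}  # handle -> references held by the "native" side
        self._next_handle = 1

    def store(self, obj):
        """Register obj and return a new handle with a refcount of 1."""
        handle = self._next_handle
        self._next_handle += 1
        self._objects[handle] = obj
        self._refcounts[handle] = 1
        return handle

    def retrieve(self, handle):
        """Look up the managed object behind a handle."""
        return self._objects[handle]

    def incref(self, handle):
        self._refcounts[handle] += 1

    def decref(self, handle):
        """Drop one reference; forget the object once none remain."""
        self._refcounts[handle] -= 1
        if self._refcounts[handle] == 0:
            del self._objects[handle]
            del self._refcounts[handle]

    def __len__(self):
        return len(self._objects)
```

The real problem is much harder than this, because objects cross the boundary in both directions and the garbage collector on the .NET side knows nothing about native reference counts.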

I followed along with the build steps here. I ran into a few issues and targeted newer versions in some cases. The steps below reflect my changes.

  1. Download and install IronPython 2.7.6.3
    1. Add ironpython to your PATH. NOTE: If you followed along with approach 1 you'll want to remove IronPython 2.7.5 from your PATH.
    2. Install pip/setuptools: ipy -X:Frames -m ensurepip
  2. Download and Install CPython 2.7.12
  3. Download and Install SCons 2.5.0
  4. Install mingw
    1. install and start mingw-get
    2. Under Basic setup select mingw32-base, mingw32-gcc-g++, msys-base
    3. Under All Packages select mingw32-pexports
      1. NOTE: I also had to select mingw32-pthreads-w32
    4. Apply Changes (upper left "Installation" dropdown)
    5. Add C:\MinGW\bin to PATH
    6. If you have cygwin ensure PATH has no references to it or at least make sure it is after MinGW
  5. Download and install CMake 3.0.2
  6. Download and install GCCXML
    1. Get the source
      git clone https://github.com/gccxml/gccxml

    2. Create a new directory for building.

      mkdir gccxml-build
      cd gccxml-build
      
    3. Generate makefiles with cmake

      cmake -G "MinGW Makefiles" -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_FLAGS="-fgnu89-inline -std=gnu++0x" ..\gccxml

    4. Compile: mingw32-make.exe

      1. Note: On my system I ended up having to patch gccxml to get around strcasecmp undeclared in this scope:

        diff --git a/GCC_XML/KWSys/SystemTools.cxx b/GCC_XML/KWSys/SystemTools.cxx
        index 4d83293..1137539 100644
        --- a/GCC_XML/KWSys/SystemTools.cxx
        +++ b/GCC_XML/KWSys/SystemTools.cxx
        @@ -51,6 +51,7 @@
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>
        +#include <strings.h>
        #include <sys/stat.h>
        #include <time.h>
    5. Install: mingw32-make.exe install. This defaults to C:\Program Files (x86)\gccxml. Note: this might need administrative privileges.

  7. Install pyexpat
    1. Download pyexpat.py

    2. Rename it to expat.py and save it to your IronPython installation: C:\Program Files (x86)\IronPython-2.7.6.3\Lib\xml\parsers

    3. Copy the following expat files from CPython installation to IronPython installation

      copy "C:/Python27/lib/xml/dom/expatbuilder.py" "C:/Program Files (x86)/IronPython-2.7.6.3/Lib/xml/dom"
      copy "C:/Python27/lib/xml/sax/expatreader.py" "C:/Program Files (x86)/IronPython-2.7.6.3/Lib/xml/sax"
      
  8. pygccxml 1.6.2
    1. Download from https://pypi.python.org/packages/source/p/pygccxml/pygccxml-v1.6.2.tar.gz
    2. Extract with something like 7zip
    3. After unpacking install into ironpython site-packages
      cd C:\Users\Aaron\Downloads\pygccxml-v1.6.2
      ipy setup.py install --user
  9. Download and install nasm 2.11
    1. Once setup is complete, copy nasm.exe to MinGW/bin

      copy C:\Users\Aaron\AppData\Local\nasm\nasm.exe C:\MinGW\bin

  10. Build Ironclad
    1. Download IronClad

      git clone https://github.com/IronLanguages/ironclad.git

    2. Edit SConstruct

      1. Update paths that may differ on your system. Of particular note is MSVCR90_DLL
        Here is a diff of the changes I made
      diff --git a/SConstruct b/SConstruct
      index eb68ff5..4ea6275 100644
      --- a/SConstruct
      +++ b/SConstruct
      @@ -49,8 +49,8 @@ if WIN32:
      GCCXML_CC1PLUS = r'"C:\Program Files (x86)\gccxml\bin\gccxml_cc1plus.exe"'
      
          # standard location
      -    IPY = r'"C:\Program Files (x86)\IronPython 2.7\ipy.exe"'
      -    IPY_DIR = r'"C:\Program Files (x86)\IronPython 2.7"'
      +    IPY = r'"C:\Program Files (x86)\IronPython-2.7.6.3\ipy.exe"'
      +    IPY_DIR = r'"C:\Program Files (x86)\IronPython-2.7.6.3"'
      # private build
      # IPY = r'"C:\github\IronLanguages\bin\Debug\ipy.exe"'
      # IPY_DIR = r'"C:\github\IronLanguages\bin\Debug"'
      @@ -69,7 +69,7 @@ if WIN32:
      COPY_CMD = 'copy $SOURCE $TARGET'
      DLLTOOL_CMD = 'dlltool -D $NAME -d $SOURCE -l $TARGET'
      LINK_MSVCR90_FLAGS = '-specs=stub/use-msvcr90.spec'
      -    MSVCR90_DLL = r'C:\Windows\winsxs\x86_microsoft.vc90.crt_1fc8b3b9a1e18e3b_9.0.30729.6161_none_50934f2ebcb7eb57\msvcr90.dll'
      +    MSVCR90_DLL = r'C:\Windows\winsxs\x86_microsoft.vc90.crt_1fc8b3b9a1e18e3b_9.0.30729.8387_none_5094ca96bcb6b2bb\msvcr90.dll'
      PEXPORTS_CMD = 'pexports $SOURCE > $TARGET'
      RES_CMD = 'windres --input $SOURCE --output $TARGET --output-format=coff'
      
      @@ -77,7 +77,7 @@ if WIN32:
      MINGW_DIR = r'C:\MinGW'
      MINGW_LIB = os.path.join(MINGW_DIR, 'lib')
      MINGW_INCLUDE = os.path.join(MINGW_DIR, 'include')
      -    GCCXML_INSERT = '-isystem "%s" -isystem "%s"' % (MINGW_INCLUDE,          os.path.join(MINGW_LIB, 'gcc', 'mingw32', '4.8.1', 'include'))
      +    GCCXML_INSERT = '-isystem "%s" -isystem "%s"' % (MINGW_INCLUDE,         os.path.join(MINGW_LIB, 'gcc', 'mingw32', '5.3.0', 'include'))
      
           # Calculate DLLs dir of cpython - assume this is run from the cpython
           # If not, change to match your instalation, defaults to C:\Python27\DLLs
      
    3. Build using CPython C:\Python27\Scripts\scons.bat

      1. Clean with C:\Python27\Scripts\scons.bat -c
    4. Run the test suite
      Note: there are dependencies on pysvn and numpy for the test suite; you would install these into the CPython installation,
      e.g. C:\Python27\python.exe -m pip install numpy==1.11.1

      1. To run the full test suite: C:\Python27\Scripts\scons.bat test
      2. To run a subset (I haven't tried this)
      set IRONPYTHONPATH=.;C:\Python27\DLLs;C:\Python27\Lib\site-packages
      ipy runtest.py tests.functionalitytest.BZ2Test.testFunctionsWork
      
    5. Try to import a C extension

        set IRONPYTHONPATH=C:\Users\Aaron\Documents\ironclad\build;C:\Python27\DLLs;C:\Python27\Lib\site-packages
        ipy
        IronPython 2.7.6.3 (2.7.6.3) on .NET 4.0.30319.42000 (32-bit)
        Type "help", "copyright", "credits" or "license" for more information.
        >>> import ironclad
        >>> import numpy
        detected unsupported member type HAVE_INPLACEOPS; ignoring
        Error: PyFrozenSet_New is not yet implemented
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
          File "C:\Python27\Lib\site-packages\numpy\__init__.py", line 180, in <module>
          File "C:\Python27\Lib\site-packages\numpy\add_newdocs.py", line 13, in <module>
          File "C:\Python27\Lib\site-packages\numpy\lib\__init__.py", line 8, in <module>
          File "C:\Python27\Lib\site-packages\numpy\lib\type_check.py", line 11, in <module>
          File "C:\Python27\Lib\site-packages\numpy\core\__init__.py", line 58, in <module>
          File "C:\Python27\Lib\site-packages\numpy\testing\__init__.py", line 12, in <module>
          File "C:\Python27\Lib\site-packages\numpy\testing\decorators.py", line 21, in <module>
          File "C:\Python27\Lib\site-packages\numpy\testing\utils.py", line 15, in <module>
          File "C:\Program Files (x86)\IronPython-2.7.6.3\Lib\tempfile.py", line 35, in <module>
          File "C:\Program Files (x86)\IronPython-2.7.6.3\Lib\random.py", line 49, in <module>
          File "C:\Program Files (x86)\IronPython-2.7.6.3\Lib\hashlib.py", line 134, in <module>
          File "<string>", line 21, in load_module
          NotImplementedError: PyFrozenSet_New
        
        
        

This is about as far as I expected to get, considering the same build wiki indicates that numpy fails to import (likely with a different error, as the versions are different). One would likely need to install Python 2.7.8, as that looks like the last version of the C API that was updated, or try to update the source to 2.7.12.
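
When trying several extensions this way, it helps to separate "module not found" from "hit an unimplemented C API stub", since the traceback above ends in a `NotImplementedError` naming the missing function. A minimal sketch, assuming only the standard library (the helper name `diagnose_import` is made up for illustration):

```python
import importlib


def diagnose_import(module_name):
    """Try to import a module and return (ok, detail)."""
    try:
        importlib.import_module(module_name)
    except NotImplementedError as exc:
        # In the session above, a missing C API function surfaced this way.
        return False, "unimplemented C API call: %s" % exc
    except ImportError as exc:
        return False, "import error: %s" % exc
    return True, "imported"


if __name__ == "__main__":
    for name in ["numpy", "scipy", "pandas"]:
        ok, detail = diagnose_import(name)
        print("%-8s %s" % (name, detail))
```

Collecting the names reported this way would give a rough list of which C API functions still need implementing before a given package can load.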

Overall I think this approach has the most merit but it would be good to reach out to the IronPython maintainers to see how/if they plan to address C extensions going forward. You'll want to align with that approach.

http://www.voidspace.org.uk/python/weblog/arch_d7_2009_01_24.shtml#e1055 (Interesting read if you want to get an idea of the black magic they are pulling around lifetimes and crossing boundaries)
http://www.johndcook.com/blog/2009/03/19/ironclad-ironpytho/

@OpenTrading


commented Sep 6, 2016

Thanks aajtodd for that very detailed post with many hours of research and probably much pain!

Could I ask you to edit that post, copy the raw content of the above post, and dump it into a wiki page named IronPython, selecting Markdown as the wiki page format? Only you, the author, have access to the raw format of a comment by editing it, and that will preserve all of the nice formatting you've done. If you have any formatting issues, just dump it in and I'll try to help.

The Wiki is the best place to preserve important chunks of information like this.

@aajtodd


commented Sep 17, 2016

@ChrisFg


commented Sep 22, 2016

IMHO the Quantconnect techs made great technology calls when they designed the LEAN architecture.

Some interoperability challenges -- e.g. how do we (lean users) get access to the exact same python libraries (given their maturity, knowledge of their behaviour etc) as python users -- are inherent in that choice. Any call involves trade-offs etc, etc. And Msft has also made some:

A consequence of Msft's .Net Core direction is that AppDomain is "going away" -- https://blogs.msdn.microsoft.com/dotnet/2016/02/10/porting-to-net-core/ . Thus on .Net Core (i.e. runs on Windows, Linux, mac, raspberry pi :-) ) the mechanisms for interoperability have changed affecting e.g. MarshalByRefObject. Also, from a commercial perspective .Net Core provides higher VM density.

The evolution of the Roslyn compiler technology -- its implementation in VS and VS Code -- and .Net Native may provide interoperability mechanisms not available in earlier .Net generations.

My 2cents worth.

btw -- for those who track software designers: Miguel de Icaza > mono > Ximian > Xamarin, > Msft. I see https://en.wikipedia.org/wiki/Mono_%28software%29 is up to date.

@jaredbroad

Member

commented Nov 9, 2016

Just rethinking this a little; given the limitations of the other techniques; could we re-write the QCAlgorithm side of the stack in Python and use Python.Net to import it to LEAN?

The main reason Python.NET was discounted before was because we couldn't convert QCAlgorithm.cs to a python module -> use it to code in python with numpy -> then convert both of them back to a C# DLL.

Could Python.Net solve it if it were a one-way conversion? (i.e. a native QCAlgorithm.py implementation built in Python + the user's algorithm.py -> converted to a DLL for LEAN, and mapped/read with Impromptu Interface like we are now.) This might be the least-ugly solution!

Anyone familiar with Python.Net?

@jaredbroad

Member

commented Nov 21, 2016

Just an update: We did some initial testing and it looks like we can fully support Python & its libraries with Python.Net! There is still some research and testing to be done, but @AlexCatarino is looking into it now.

The pro/con depending on your point of view is that the interface will precisely match the QCAlgorithm naming conventions -- so any existing python algorithms with IronPython will continue to work with the Python.Net implementation of the engine.

Additionally, it's running in CPython, so it should be nice and snappy.

@ghost

Author

commented Nov 21, 2016

That is excellent news! Thank you so much for your work!

@Kyxsune


commented Nov 22, 2016

Excellent News indeed. Can't wait to see how this shakes out.

@ChrisFg


commented Nov 22, 2016

Thanks for putting the design and proofing effort into this challenge. Looking forward too, to see how the proof of concept works out.
Best wishes.

@jaredbroad

Member

commented Jan 17, 2017

If anyone would like to help beta test -- the new PythonNet branch of LEAN allows importing your favorite python libraries! https://github.com/QuantConnect/Lean/tree/pythonnet

We've also updated the LEAN docker container to install the background dependencies and a required symlink in linux. We're doing testing and integration now and it should be live on QuantConnect.com this month.

@AlexCatarino

Member

commented Feb 8, 2017

Hello, beta testers!
We have implemented Custom Data import in our new Python support. We would like to hear your opinion about it.
Like IronPython, pythonnet translates generic methods, e.g. AddData<T>(String), into self.AddData[T](str). So we have two options here:

1:
self.AddData[Weather]("KNYC", Resolution.Daily)
Or 2
self.AddData("Weather", "KNYC", Resolution.Daily)
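
For anyone unfamiliar with the indexer syntax, the two call styles can be mimicked in plain Python to get a feel for them. The sketch below is not how pythonnet or LEAN implement it; `GenericMethod`, `ToyAlgorithm`, the stub `Weather` class and the `"Daily"` string standing in for `Resolution.Daily` are all hypothetical names used only for illustration.

```python
class _BoundGeneric(object):
    """Callable bound to an instance that also supports m[T](...)."""

    def __init__(self, func, instance):
        self._func = func
        self._instance = instance

    def __getitem__(self, data_type):
        # Option 1: algo.AddData[Weather]("KNYC", resolution)
        def call(*args):
            return self._func(self._instance, data_type.__name__, *args)
        return call

    def __call__(self, type_name, *args):
        # Option 2: algo.AddData("Weather", "KNYC", resolution)
        return self._func(self._instance, type_name, *args)


class GenericMethod(object):
    """Descriptor that exposes a method in both call styles."""

    def __init__(self, func):
        self._func = func

    def __get__(self, instance, owner):
        return _BoundGeneric(self._func, instance)


class Weather(object):
    """Hypothetical custom data type."""


class ToyAlgorithm(object):
    def __init__(self):
        self.subscriptions = []

    @GenericMethod
    def AddData(self, type_name, symbol, resolution):
        # Record the subscription and return it so callers can inspect it.
        self.subscriptions.append((type_name, symbol, resolution))
        return self.subscriptions[-1]


if __name__ == "__main__":
    algo = ToyAlgorithm()
    print(algo.AddData[Weather]("KNYC", "Daily"))
    print(algo.AddData("Weather", "KNYC", "Daily"))
```

Option 1 passes the type itself, so a typo fails immediately with a `NameError`; option 2 passes a string that has to be resolved to a type later, which reads more like ordinary Python but defers the error.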

@tomhunter-gh

Contributor

commented Feb 20, 2017

Hi @AlexCatarino, I was just wondering - which version of Python are we targeting? 2.X or 3.X, or both? Thanks

@jaredbroad

Member

commented Feb 20, 2017

The pending PR targets 2.7
#710

This will be merged carefully into master soon.

@tomhunter-gh

Contributor

commented May 6, 2017

Hi Jared, any chance you might consider switching to Python 3 by default? Or would it be possible to allow easy switching when running locally via e.g. solution/project configurations or some other mechanism? Thanks

@jaredbroad

Member

commented Jun 10, 2017

Python integration with PythonNet completed and in production! Nice work everyone!

@jaredbroad jaredbroad closed this Jun 10, 2017

@ChrisFg


commented Jun 13, 2017
