Speeding things up using C/C++? #29

Open
rubdos opened this Issue Jul 12, 2015 · 32 comments

@rubdos (Contributor) commented Jul 12, 2015

Hi!

Some background: I'm working with a 270 MB TDMS file for debugging my programs. It takes about 4 seconds for my Python script to do all the work we need on that file, which is already a factor of 25 improvement over the previous implementation.

I'm thinking about reimplementing this library (or parts of it, mostly the loading/reading of the file) in C or C++. At first it will be an experiment (am I able to do this? Will it speed things up?).

Would you accept a C/C++ reimplementation, given that the API will stay exactly the same?

@adamreeve (Owner) commented Jul 14, 2015

I'd be happy to have a C or C++ implementation in npTDMS, but I'd like to keep the pure Python implementation available for when a user doesn't have a C compiler available (e.g. on Windows, installing C packages is a lot trickier). See the cElementTree vs ElementTree packages for an example where both a pure Python and a C version are available.

You might also want to consider writing a C library and then a Python wrapper on top of it using CFFI, rather than a Python-specific library, so that other languages could make use of the C library too.
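
For reference, a minimal sketch of what a CFFI wrapper over a C library can look like (purely illustrative; the library name libtdmspp.so and the function names are made up, not part of npTDMS):

from cffi import FFI

ffi = FFI()
# Declare the C functions we want to call
ffi.cdef("""
    void* tdms_file_open(const char* path);
    void tdms_file_close(void* handle);
""")
lib = ffi.dlopen("./libtdmspp.so")  # load the shared library in ABI mode

handle = lib.tdms_file_open(b"datafile.tdms")
lib.tdms_file_close(handle)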

I do wonder if there's a lot of speed-up that could be obtained by refactoring the Python implementation. It does two passes over the file: the first reads all the metadata, creates a whole lot of _TdmsSegment objects and then allocates space for all the arrays. The second pass actually fills in the data. It's that first pass that takes all the time. If we could avoid creating all the _TdmsSegment objects but still allocate the data arrays only once, then reading files should be a lot faster. Or maybe it would be faster to just grow the arrays as data is read rather than doing two passes.
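
As a toy illustration of the two strategies (a sketch only, not npTDMS code; the chunk iteration is a stand-in for reading segment data):

import numpy as np

def read_preallocated(chunks, total_length):
    # Two-pass style: allocate the full array once, then fill it chunk by chunk.
    data = np.empty(total_length)
    offset = 0
    for chunk in chunks:
        data[offset:offset + len(chunk)] = chunk
        offset += len(chunk)
    return data

def read_growing(chunks):
    # One-pass style: collect chunks as they are read and join them at the end.
    return np.concatenate([np.asarray(chunk) for chunk in chunks])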

I'm also keen to see how the speed compares under PyPy, but last time I checked their numpy support was still pretty minimal.

@rubdos (Contributor) commented Jul 15, 2015

I'll try to read and interpret a whole file in a separate C++ library. I'll let you know whether that works, and then we can see how to integrate it with npTDMS.

The only thing I can think of that could be annoying with that approach is the numpy integration. There is a C API for numpy that can be used to create the array objects you use.

@DwayneSmurdon commented Jul 15, 2015

I second having the ability to wrap it for other languages; I'd really love a Java interface.

@rubdos (Contributor) commented Jul 16, 2015

Would you guys prefer a SWIG interface, or have me code everything by hand?
Either way, I'll make a separate C++ library first to see whether it is faster than this implementation.

@adamreeve (Owner) commented Jul 21, 2015

A SWIG interface would be fine by me, use whatever works best for you.

@rubdos (Contributor) commented Jul 26, 2015

Good news: I finished reading the data in a separate C++ module.
There's still a lot to do:

  • Big endian is not implemented
  • String data is not implemented
  • Other tiny checks and bits
  • Some data types are not implemented
  • Exposure to Python

But the interesting part is:

[rsmet@s230ru build]$ time tdmsinfo datafile.tdms
[SNIP]
real    0m1.687s
user    0m1.436s
sys 0m0.253s
[rsmet@s230ru build]$ time tests/tdmsppinfo datafile.tdms
real    0m0.331s
user    0m0.203s
sys 0m0.129s

on a 270 MB TDMS file :-)
I think there are still a lot of performance improvements to be made, but this is already a factor of 5.3 improvement, excluding the Python bindings.

I will now continue working on the tdmsppinfo executable (which I've been using as a unit test for now) so that it prints exactly the same output as tdmsinfo. The -d flag already prints roughly the same as tdmsinfo.

I will keep reporting on this issue. I'll probably push to a new repository of mine in a few minutes.

Ninja edit: https://github.com/rubdos/TDMSpp

@rubdos (Contributor) commented Jul 26, 2015

I propose using a git submodule in this repo to bind TDMSpp to npTDMS, if you agree.

@rubdos (Contributor) commented Jul 27, 2015

Working on the Python bindings now, using CFFI and a git submodule.

It's already set up so that not having CFFI isn't a problem: if CFFI and everything else is found, the extension gets compiled; if not, it's neither compiled nor used.

I made proxy classes for TdmsFile and TdmsObject and added an optional constructor argument to TdmsFile to choose the implementation.
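
Roughly, the selection logic looks like this (a simplified sketch; names may differ slightly from my branch, and PurePythonTdmsFile stands in for the existing reader):

try:
    from nptdms import cnptdms  # CFFI-backed implementation, if it was built
    HAVE_CFFI = True
except ImportError:
    HAVE_CFFI = False

class TdmsFile(object):
    def __init__(self, filename, implementation="python"):
        if implementation == "cffi" and HAVE_CFFI:
            self.__implementation = cnptdms.TdmsFile(filename)
        else:
            # Stand-in name for the pure Python reader
            self.__implementation = PurePythonTdmsFile(filename)

    def __getattr__(self, name):
        # Anything not defined on the proxy is forwarded to the chosen backend
        return getattr(self.__implementation, name)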

I still see a ~factor 3 performance improvement on this single file.

The reduced gain comes from loading Python itself, so as the file size goes to infinity, the performance gain approaches 5.3, if I remember my mathematical limits correctly ;-)

Comments on my C++ code are welcome. I had to use some ugly tricks to make dynamic types work out (void*, in particular), but it's not as ugly as I expected. I hope you can live with C++11. I got so used to it that I couldn't do it in an older C++ anymore.

@rubdos (Contributor) commented Jul 28, 2015

You can have a look at my branch here https://github.com/adamreeve/npTDMS/compare/adamreeve:master...rubdos:C_impl?expand=1

Working on the Python bindings now. I still have to think of an efficient way to get timestamps from C++ to Python.
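
One option that comes to mind (just a sketch; it assumes the C++ side hands the raw timestamp fields over as plain arrays, with TDMS timestamps stored as int64 seconds since 1904-01-01 plus a uint64 count of 2**-64 second fractions):

import numpy as np

def tdms_timestamps_to_datetime64(seconds, fractions):
    # seconds: int64 seconds since 1904-01-01; fractions: uint64 units of 2**-64 s
    epoch = np.datetime64("1904-01-01T00:00:00", "us")
    micros = (fractions / 2.0 ** 64 * 1e6).astype("timedelta64[us]")
    return epoch + seconds.astype("timedelta64[s]") + micros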

EDIT: the latest patch contains what's needed to read the properties out of the .tdms file.

I'm checking regularly with valgrind. Python 3 itself has lots of memory leaks... gosh. But they're not mine.

@adamreeve (Owner) commented Jul 28, 2015

Cool, that looks like a nice approach to switching out the implementations. Using C++11 seems reasonable, and using a submodule also makes sense. 👍

@rubdos (Contributor) commented Jul 30, 2015

Rearranged your unit tests a little so they run against both implementations.
10 out of the 17 CFFI tests pass already (34 = 17*2: 17 tests in Python plus 17 with CFFI):

ERROR: check_incomplete_data (__main__.cTDMSTestClass)
Test incomplete last segment, eg. if LabView crashed
----------------------------------------------------------------------
Traceback (most recent call last):
  File "nptdms/test/tdms_test.py", line 706, in check_incomplete_data
    tdmsData = test_file.load(implementation=self.implementation)
  File "nptdms/test/tdms_test.py", line 77, in load
    return tdms.TdmsFile(self.file, *args, **kwargs)
  File "/home/rsmet/.local/lib/python3.4/site-packages/npTDMS-0.6.4-py3.4-linux-x86_64.egg/nptdms/tdms.py", line 50, in __init__
    self.__implementation = cnptdms.TdmsFile(filename)
  File "/home/rsmet/.local/lib/python3.4/site-packages/npTDMS-0.6.4-py3.4-linux-x86_64.egg/nptdms/cnptdms.py", line 75, in __init__
    handle_exception_if_null(self._ctdmsfile)
  File "/home/rsmet/.local/lib/python3.4/site-packages/npTDMS-0.6.4-py3.4-linux-x86_64.egg/nptdms/cnptdms.py", line 52, in handle_exception_if_null
    handle_exception()
  File "/home/rsmet/.local/lib/python3.4/site-packages/npTDMS-0.6.4-py3.4-linux-x86_64.egg/nptdms/cnptdms.py", line 48, in handle_exception
    raise Exception(ffi.string(what).decode('utf-8'))
Exception: Labview probably crashed, file is corrupt. Not attempting to read.

======================================================================
ERROR: check_interleaved (__main__.cTDMSTestClass)
Test reading interleaved data
----------------------------------------------------------------------
Traceback (most recent call last):
  File "nptdms/test/tdms_test.py", line 442, in check_interleaved
    tdmsData = test_file.load(implementation=self.implementation)
  File "nptdms/test/tdms_test.py", line 77, in load
    return tdms.TdmsFile(self.file, *args, **kwargs)
  File "/home/rsmet/.local/lib/python3.4/site-packages/npTDMS-0.6.4-py3.4-linux-x86_64.egg/nptdms/tdms.py", line 50, in __init__
    self.__implementation = cnptdms.TdmsFile(filename)
  File "/home/rsmet/.local/lib/python3.4/site-packages/npTDMS-0.6.4-py3.4-linux-x86_64.egg/nptdms/cnptdms.py", line 75, in __init__
    handle_exception_if_null(self._ctdmsfile)
  File "/home/rsmet/.local/lib/python3.4/site-packages/npTDMS-0.6.4-py3.4-linux-x86_64.egg/nptdms/cnptdms.py", line 52, in handle_exception_if_null
    handle_exception()
  File "/home/rsmet/.local/lib/python3.4/site-packages/npTDMS-0.6.4-py3.4-linux-x86_64.egg/nptdms/cnptdms.py", line 48, in handle_exception
    raise Exception(ffi.string(what).decode('utf-8'))
Exception: Reading inteleaved data not supported yet

======================================================================
ERROR: check_slash_and_space_in_name (__main__.cTDMSTestClass)
Test name like '01/02/03 something'
----------------------------------------------------------------------
Traceback (most recent call last):
  File "nptdms/test/tdms_test.py", line 818, in check_slash_and_space_in_name
    self.assertEqual(len(tdmsData.groups()), 2)
  File "/home/rsmet/.local/lib/python3.4/site-packages/npTDMS-0.6.4-py3.4-linux-x86_64.egg/nptdms/tdms.py", line 57, in __getattr__
    return getattr(self.__implementation, name)
AttributeError: 'TdmsFile' object has no attribute 'groups'

======================================================================
ERROR: check_string_data (__main__.cTDMSTestClass)
Test reading a file with string data
----------------------------------------------------------------------
Traceback (most recent call last):
  File "nptdms/test/tdms_test.py", line 755, in check_string_data
    tdmsData = test_file.load(implementation=self.implementation)
  File "nptdms/test/tdms_test.py", line 77, in load
    return tdms.TdmsFile(self.file, *args, **kwargs)
  File "/home/rsmet/.local/lib/python3.4/site-packages/npTDMS-0.6.4-py3.4-linux-x86_64.egg/nptdms/tdms.py", line 50, in __init__
    self.__implementation = cnptdms.TdmsFile(filename)
  File "/home/rsmet/.local/lib/python3.4/site-packages/npTDMS-0.6.4-py3.4-linux-x86_64.egg/nptdms/cnptdms.py", line 75, in __init__
    handle_exception_if_null(self._ctdmsfile)
  File "/home/rsmet/.local/lib/python3.4/site-packages/npTDMS-0.6.4-py3.4-linux-x86_64.egg/nptdms/cnptdms.py", line 52, in handle_exception_if_null
    handle_exception()
  File "/home/rsmet/.local/lib/python3.4/site-packages/npTDMS-0.6.4-py3.4-linux-x86_64.egg/nptdms/cnptdms.py", line 48, in handle_exception
    raise Exception(ffi.string(what).decode('utf-8'))
Exception: Reading string data not yet implemented

======================================================================
ERROR: check_timestamp_data (__main__.cTDMSTestClass)
Test reading contiguous and interleaved timestamp data,
----------------------------------------------------------------------
Traceback (most recent call last):
  File "nptdms/test/tdms_test.py", line 527, in check_timestamp_data
    channel_data = tdmsData.channel_data("Group", "TimeChannel1")
  File "/home/rsmet/.local/lib/python3.4/site-packages/npTDMS-0.6.4-py3.4-linux-x86_64.egg/nptdms/cnptdms.py", line 84, in channel_data
    return o.get_data()
  File "/home/rsmet/.local/lib/python3.4/site-packages/npTDMS-0.6.4-py3.4-linux-x86_64.egg/nptdms/cnptdms.py", line 148, in get_data
    self._load_data()
  File "/home/rsmet/.local/lib/python3.4/site-packages/npTDMS-0.6.4-py3.4-linux-x86_64.egg/nptdms/cnptdms.py", line 139, in _load_data
    dt=tdsDataTypes[self._data_type][1]
KeyError: 'tdsTypeTimeStamp'

======================================================================
FAIL: check_larger_channel (__main__.cTDMSTestClass)
In the second segment, increase the channel size
----------------------------------------------------------------------
Traceback (most recent call last):
  File "nptdms/test/tdms_test.py", line 372, in check_larger_channel
    self.assertTrue(all(data == [3, 4, 7, 8, 9, 10]))
AssertionError: False is not true

======================================================================
FAIL: check_no_data_section (__main__.cTDMSTestClass)
kTocRawData is set but data length is zero
----------------------------------------------------------------------
Traceback (most recent call last):
  File "nptdms/test/tdms_test.py", line 615, in check_no_data_section
    self.assertTrue(all(data == [1, 2]))
AssertionError: False is not true

----------------------------------------------------------------------
Ran 34 tests in 1.360s

FAILED (failures=2, errors=5)

Some of them are easier to fix than others, but I'm getting very close.

I still have some memory leaks when reading your (pretty tricky) tests too, so I'll have to check on those as well.

I also proxied most of the C++ exceptions to Python now, through CFFI.

A little bit about performance:

  • Loading Python modules is slow
  • CFFI seems to be slow too
  • On my 270 MB .tdms file, I'm seeing 0m1.713s with Python and 0m1.382s with CFFI, which is pretty terrible. I did some profiling yesterday, and I could probably gain some more in the C++ part. Most of the time was spent loading modules (sigh)
  • I still think that on bigger .tdms files, things will go a lot faster.

@adamreeve (Owner) commented Jul 30, 2015

Awesome, looks like some good progress! Hmm, yeah, the speed-up doesn't look great though; can you time the difference within Python and check what the times are without module loading?

@rubdos (Contributor) commented Jul 30, 2015

@adamreeve I will time them; I'm just implementing some useful things first. Just pushed groups() and group_channels() to my branch.

@rubdos (Contributor) commented Jul 30, 2015

Hi @adamreeve, you'll be happy to read this: I did some performance tweaks in my read_le_float code (removing unnecessary copying, things like that) and I measured:

INFO:__main__:TDMS: Took 1372.174 ms for Python
INFO:__main__:TDMS: Took 590.0530 ms for CFFI/C++

This is using your utils.Timer class; when I use the Unix time utility, I get

real 0m1.637s for Python
real 0m0.839s for CFFI

which doubles the performance in our use case. Yay.

I'm a happy man again. There's probably a lot more to improve, but read_le_float took 62% of the time
[screenshot: profiler output, 2015-07-30]
which I've now reduced to 8% (using memcpy instead of copying memory manually). I can probably do even better.

@rubdos (Contributor) commented Jul 30, 2015

Sorry for spamming your inbox. Just wanted to let you know that array reading has been made a little faster. Not joking: I get a factor of 3.5 in our use case, and if you don't count module loading and one-time Python overhead like that, there's a factor of 5.5 speed-up.

Now that I've got it down to 255 milliseconds, I can sleep again. That's more than a gigabyte per second (8.5 Gbit/s).

The caveats:

  • Works only on little-endian systems when reading LE data (marked "TODO" just about everywhere in my sources), or on big-endian systems when reading BE data. (All those cases should be handled; only LE<->LE is implemented at the moment.)
  • Only works on double and float for now, but can easily be extended to all other types except strings, timestamps and other things that need individual parsing.
  • Data loading into Python/numpy is still somewhat slow IMO; I'd rather have numpy read from my data without taking ownership, but I don't know if that's possible with the current implementations (see the sketch below). I'm doing lazy loading in Python at the moment.
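
For that last point, a minimal sketch of what I mean by numpy reading foreign data without owning it (an illustration only; it assumes the C++ side hands a buffer back through CFFI, and whatever owns that buffer has to outlive the array):

import numpy as np
from cffi import FFI

ffi = FFI()
n = 1000
buf = ffi.new("double[]", n)  # stand-in for memory owned by the C++ side
# View the foreign buffer as a float64 array without copying; the buffer must
# stay alive for as long as this array is in use.
data = np.frombuffer(ffi.buffer(buf, n * 8), dtype=np.float64)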

@adamreeve (Owner) commented Jul 30, 2015

Awesome, that sounds great. It's probably not necessary to completely implement everything (eg. maybe big endian isn't that widely used in TDMS files?) before getting something merged and providing this as an optional backend. And just supporting ASCII in strings rather than UTF-8 might be reasonable for an initial version.

You should be able to create a numpy array from a pointer to existing data; I've done something similar from a SWIG binding before, but I'm not sure how that would work with CFFI. The tricky bit is handling data ownership, as you say, so that numpy knows not to deallocate the data, but I think that's possible. You might need to look at implementing numpy's array interface: http://docs.scipy.org/doc/numpy/reference/arrays.interface.html
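
Something along these lines, for example (only a sketch of the array interface idea; RawChannelData is a made-up name, ptr would be the integer address of the C++-owned buffer, and owner is whatever object keeps that buffer alive):

import numpy as np

class RawChannelData(object):
    def __init__(self, ptr, length, owner):
        # The owner must keep the underlying memory alive while the view exists
        self._owner = owner
        self.__array_interface__ = {
            "shape": (length,),
            "typestr": "<f8",      # little-endian float64
            "data": (ptr, False),  # (address, read-only flag)
            "version": 3,
        }

# data = np.asarray(RawChannelData(ptr, length, owner))  # creates a view, no copy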

@rubdos (Contributor) commented Jul 31, 2015

It's probably not necessary to completely implement everything (eg. maybe big endian isn't that widely used in TDMS files?) before getting something merged and providing this as an optional backend.

I'd rather make things more robust first; I still know of one memory leak I have to struggle through. Your unit tests are pretty awesome for finding that kind of bug.

And just supporting ASCII in strings rather than UTF-8 might be reasonable for an initial version.
I do UTF-8 already, that's no problem. Receiving strings via CFFI generates a bytes object.

Still TODO: test the CFFI stuff in Python 2. More important: test whether Python 2 accepts my class proxying techniques.

That Array interface sounds interesting. I'll have a look at that.

@rubdos (Contributor) commented Aug 1, 2015

Another thing I'm thinking about now, while looking at PEP 3118, the Python buffer protocol.

How will I handle this case:

  • User creates TdmsFile
  • User asks for the numpy data of a channel
  • User del's the TdmsFile
  • My C++ backend will destroy/free() the data of the numpy channel and Python cannot read the array anymore.

Should I use some form of reference counting in my numpy interface? What would you suggest: should I attach a std::shared_ptr-style reference count to that memory block, so the C++ side only frees it once it's no longer referenced from Python?

@rubdos (Contributor) commented Aug 1, 2015

I just tried using __array_interface__. It speeds things up a little (100 ms on my 270 MB file, which is about 1.5 ns per data line (useless information) or 5% (perhaps useful information)).

It seems to me that pandas is being inefficient with their copying stuff, not me!

I'll check that.

EDIT: checked with cProfile; the bottleneck is the goddamn numpy.linspace function. I guess I've done my job now.

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       23    0.434    0.019    0.738    0.032 function_base.py:9(linspace)
        1    0.251    0.251    0.251    0.251 {built-in method tdms_read}
       47    0.160    0.003    0.160    0.003 {built-in method arange}
       24    0.145    0.006    0.145    0.006 {method 'astype' of 'numpy.ndarray' objects}
      326    0.054    0.000    0.054    0.000 {built-in method loads}
    48/46    0.027    0.001    0.057    0.001 {built-in method load_dynamic}
1164/1154    0.027    0.000    0.086    0.000 {built-in method __build_class__}
    421/1    0.019    0.000    1.526    1.526 {built-in method exec}
    11587    0.017    0.000    0.019    0.000 sre_parse.py:183(__next)
        3    0.016    0.005    0.016    0.005 {built-in method read}
     2689    0.015    0.000    0.015    0.000 {built-in method stat}
  607/146    0.015    0.000    0.048    0.000 sre_parse.py:429(_parse)
        3    0.013    0.004    0.013    0.004 {built-in method fork_exec}
      935    0.013    0.000    0.047    0.000 <frozen importlib._bootstrap>:2016(find_spec)
       12    0.013    0.001    0.013    0.001 ops.py:633(_bool_method_SERIES)

I can probably improve a little more on the C++ side, but I think I need a bigger file for that (I need more samples in gprof to be significant). 200 ms is not a lot.

If I remember correctly, the biggest bottleneck now is interpreting the metadata.

Before merging, I still need to fix that one memory leak. Perhaps my implementation doesn't need to be complete while it's optional, but I don't want people to run into too many segfaults when they're trying it out.

Another thing I want to discuss is documentation. Your documentation is on readthedocs and is (I suppose) automagically generated from the Python docstrings, which is great. But now that I'm using the proxy classes, I'm afraid a lot of documentation will get lost. How would you like me to solve that?

@rubdos (Contributor) commented Aug 1, 2015

@DwayneSmurdon If you're interested in a Java interface for what I have right now, please create a ticket on my https://github.com/rubdos/TDMSpp repository and we can handle it there. I suggest using SWIG for that; it will take less time than writing the JNI bindings by hand and is just as performant.

@rubdos (Contributor) commented Aug 1, 2015

Something else I did now: tdmsinfo has a -c/--cffi flag, so that it runs the CFFI backend instead of the Python backend. Optional, of course ;-)

@rubdos (Contributor) commented Aug 3, 2015

Today's small patch: added wercker.yml changes for the git submodule.

Sometime in the future I'll make a pull request to check whether wercker can handle my C++11 stuff... ;-)

@DwayneSmurdon commented Aug 3, 2015

I will do this sometime next week, when I'll have time to try it out. I'm currently on vacation. I'm glad to see so much progress being made by you two.


@adamreeve (Owner) commented Aug 3, 2015

Somewhere in the future I'll make a pull request to check whether wercker will be able to handle my C++11 stuff... ;-)

The cool thing about wercker is that it runs the tests in a Docker container and you can specify any image from Docker Hub to run them with, so I've made my own Docker image with Python and numpy installed. I might just need to update it to include gcc. The Dockerfile I'm using is at https://github.com/adamreeve/python-numpy-docker/blob/master/Dockerfile

@adamreeve (Owner) commented Aug 3, 2015

  • User creates TdmsFile
  • User asks for the numpy data of a channel
  • User del's the TdmsFile
  • My C++ backend will destroy/free() the data of the numpy channel and Python cannot read the array anymore.

Hmm, yeah, that's tricky. I wonder if it's possible to just transfer ownership to the numpy array somehow, so it knows to deallocate the data, without having to do full reference counting? Or get numpy to allocate the data in the first place and have the C++ code write into the memory location provided by numpy?
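
For that second option, the Python side could look something like this (a sketch under the assumption that the C library exposes a function that fills a caller-provided buffer; tdms_read_channel is a made-up name):

import numpy as np
from cffi import FFI

ffi = FFI()
n_samples = 100000
data = np.empty(n_samples, dtype=np.float64)  # numpy owns the memory
ptr = ffi.cast("double *", data.ctypes.data)  # raw pointer into numpy's buffer
# lib.tdms_read_channel(handle, b"/'Group'/'Channel1'", ptr, n_samples)  # hypothetical C call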

As for the documentation, I think the best way to handle this might be to have a single module that contains all of the public classes and methods and document that, with those classes just proxying through to either the pure Python version or the C++ version. That also ensures the public interface stays consistent whether you're using the Python or C++ backend. Is that already what you're doing, though?

The other option is to manually document everything, which probably isn't a large amount of work initially as the module isn't that big, but it's a bit more annoying to keep up to date.

@rubdos (Contributor) commented Aug 4, 2015

As for the documentation, I think the best way to handle this might be to have a single module that has all of the public classes and methods in it, and document that, but those just proxy through to either the pure Python version or the C++ version. Then that also ensures that the public interface is always consistent whether you're using the Python or C++ backend. Is that already what you're doing though?

Currently I'm using getattr and __getattr__ tricks, which automagically do what you proposed. But what you describe sounds like the sane way of doing things, so I'll stick with that.

I'm installing a Windows machine today and am not in the mood for programming (I've been a bit sick too). I'll see if I can fix my memory leak and the documentation this week.

@adamreeve (Owner) commented Aug 4, 2015

Cool, no worries, there's no big rush to get this done. Thanks for all your work on it!

@HaMF commented Sep 4, 2015

Thanks a lot for your work, guys! (We're also dealing with large TDMS files here and I'm looking forward to your efforts landing (-: )

@jshridha (Contributor) commented Dec 16, 2016

@rubdos Is there any chance you can provide some quick documentation on how to use this implementation?

@rubdos (Contributor) commented Dec 31, 2016

@jshridha It depends on how much information you want; I'd have to reread everything myself too. The company I did this for lost interest in the implementation. If you (or anyone else) has any commercial interest in it, I could finish it.

@jshridha (Contributor) commented Dec 31, 2016

@rubdos I don't need much; just a quick example of how to compile and use the C implementation would be great. I understand you've moved on from this, and I certainly don't need full-blown documentation.

@rubdos (Contributor) commented Dec 31, 2016

@jshridha Try your luck here: https://github.com/rubdos/npTDMS/tree/C_impl (my fork, on the C_impl branch). I think that if you check out with submodules, you can just run the regular setup.py.
