# Mathematics 199 Final report - Winter 2019
**by Damien Lefebvre**

## Introduction

In this report, we will outline a step-by-step process to design a Python interface that uses C++ code. The implementation uses Cython to allow the two languages to communicate. We will describe the example we designed, give an overview of the code in C++, Cython, and Python, explain the compilation process, go over the testing results, discuss some important pitfalls and how to avoid them, and give some concluding remarks.

## Outline
1. Example
2. C++ code
3. Cython code
4. Python code
5. Compilation
6. Testing
7. Pitfalls
8. Conclusion

## 1. Example

Our running example comes from the stock market. We will use Barrick Gold, the largest gold mining company in the world, as the example company. This particular company was chosen because it underwent a change in its ticker, which is a useful change to represent in code.

We will create a string for its ticker, a 1D array for its share price data on 2 January 2019, and a 2D array for its share price data on 31 December 2018 and 28 December 2018. We will intentionally make mistakes in our Python code, so that C++ can fix them. We will initialize the ticker to ABX so that the C++ code can update it to GOLD. We will write the Close share price in the price data as 1.310 instead of 13.10, so that the C++ code can multiply it by 10. Finally, we will write the Close share price for the oldest day of the historical price data as 1.312 instead of 13.12, so that the C++ code can multiply it by 10.

## 2. C++ code

### Motivation
The objective of this project is to use C++ code because it generally runs faster and more efficiently than equivalent Python code. For this example, we will write very basic C++ code that only highlights the main features Cython offers. We will create a simple C++ class called `Stock_c` that holds several member variables.

### Class member variables
-  a char* called `ticker`, for the company’s ticker. e.g. ABX for Barrick Gold on the New York Stock Exchange. It changed to GOLD on 2 January 2019.
-  a vector of doubles called `price_data`, for the company’s share price on a given day, using the typical Open, High, Low, Close (OHLC) representation. e.g. Barrick Gold traded at [13.64, 13.69, 13.05, 13.10] on 2 January 2019.
-  a vector of vectors of doubles called `historical_price_data`, for the company’s price data in the previous two trading days. e.g. Barrick Gold traded at [13.00, 13.59, 12.69, 13.54] on Monday, 31 December 2018 and [13.54, 13.62, 13.06, 13.12] on Friday, 28 December 2018. These are the two previous trading days because markets are closed on weekends and on 1 January for New Year’s Day.
-  two integers called `price_data_size` and `historical_price_data_size`, for the sizes of the respective variables.


### Class member functions
-  get ticker: return the class member variable ticker
-  change ticker: change the class member variable ticker to GOLD
-  change price data: multiply the Close share price of the price data ten-fold
-  change historical price data: multiply the Close share price of the oldest day in the historical_price_data ten-fold
-  get industry: return the industry
-  get historical volume: return the historical volume, in thousands of shares
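
To make the intended behaviour concrete, here is a minimal pure-Python stand-in for the class described above. It is only an illustration, not the C++ implementation: the name `StockSketch`, the method names, and the flat (row-by-row) layout of the historical data are assumptions made for this sketch, and a `bytearray` is used for the ticker because, unlike a fixed C buffer, it can grow to hold the fourth character. The industry and volume values are the ones reported in the test output later in this report.

```python
from typing import List


class StockSketch:
    """Pure-Python stand-in for the C++ class (hypothetical, for illustration)."""

    def __init__(self, ticker: bytearray, price_data: List[float],
                 historical_price_data: List[float]) -> None:
        # Keep references to the caller's objects so changes are visible outside.
        self.ticker = ticker
        self.price_data = price_data
        self.historical_price_data = historical_price_data
        self.price_data_size = len(price_data)
        self.historical_price_data_size = len(historical_price_data)

    def get_ticker(self) -> bytes:
        return bytes(self.ticker)

    def change_ticker(self) -> None:
        # Replace the contents in place; a bytearray resizes as needed.
        self.ticker[:] = b"GOLD"

    def change_price_data(self) -> None:
        # Multiply the Close price (last element) ten-fold.
        self.price_data[self.price_data_size - 1] *= 10

    def change_historical_price_data(self) -> None:
        # Multiply the Close price of the oldest day (last element) ten-fold.
        self.historical_price_data[self.historical_price_data_size - 1] *= 10

    def get_industry(self) -> bytes:
        return b"gold mining"

    def get_historical_volume(self) -> List[float]:
        return [22729.528, 38261.599]


ticker = bytearray(b"ABX")
price_data = [13.64, 13.69, 13.05, 1.310]
historical = [13.00, 13.59, 12.69, 13.54, 13.54, 13.62, 13.06, 1.312]

stock = StockSketch(ticker, price_data, historical)
stock.change_ticker()
stock.change_price_data()
stock.change_historical_price_data()
print(stock.get_ticker(), round(price_data[-1], 2), round(historical[-1], 2))
```

Because the object stores references rather than copies, the caller's `ticker`, `price_data`, and `historical` variables reflect the changes, which is the behaviour the real interface aims for.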

### Files
-  stock_h.h: a classic header file containing the declaration of the class
-  stock_cpp.cpp: a classic cpp file containing the implementation of the class

## 3. Cython code

### Motivation
The Cython code provides the interface between the C++ code and the Python code. The pyx file plays the role of a cpp file (implementation), and the pxd file plays the role of a header file (declarations).

### Files
-  stock_pyx.pyx: a Cython file that
  -  imports the C++ class through its declaration in the pxd file
  -  declares the functions used from Python that will call the C++ functions
-  stock_pxd.pxd: a Cython file that
  -  declares the cpp file to use
  -  declares the header file to use, along with declarations of the C++ functions

## 4. Python code

### Motivation
The Python code lets the user write only Python while using the classes and functions implemented in C++. Python also initiates the compilation process.

### Files
-  setup.py: declares the pyx file to cythonize
-  test.py: imports the module compiled from the pyx file and uses the functions declared in Cython

## 5. Compilation

### Motivation
An optional bash script that runs the Python setup program.

### Files
-  compile.sh:

In [None]:
python3 setup.py build_ext --inplace

## 6. Testing

### Motivation
Our test file, written in Python, prints each variable it creates together with its address. These variables are used to create the Barrick Gold `Stock_py` class instance. The C++ code also prints out the value of the variables and their addresses.

### Checking the C++ code 
We first check that the C++ code is working. We can write a simple main routine to create a `Stock_c` object and print its `ticker`. We will compile this using the GNU C++ compiler. Compile with `g++ stock_cpp.cpp`, then run the resulting `./a.out` and check the output:

In [None]:
C++: create BarrickGold
C++: overloaded constructor
C++: this->ticker: ABX
C++: this->price_data: 0x7fffca28d330
C++: this->price_data_size: 4
C++: this->historical_price_data: 0x7fffca28d310
C++: this->historical_price_data_size: 8
C++: get_ticker_c, return this->ticker
C++: get_ticker_c() returns: ABX

### Checking the Python code
Now that we know the C++ code is working, we will try to use it from Python. Run `./compile.sh`; the command line should output something like this:

In [None]:
[1/1] Cythonizing stock_pyx.pyx
/home/damienlefebvre/.local/lib/python3.5/site-packages/Cython/Compiler/Main.py:367: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /home/damienlefebvre/Math199/v5/stock_pyx.pyx
  tree = Parsing.p_module(s, pxd, full_module_name)
running build_ext
building 'stock_pyx' extension
creating build
creating build/temp.linux-x86_64-3.5
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I. -I/usr/include/python3.5m -c stock_pyx.cpp -o build/temp.linux-x86_64-3.5/stock_pyx.o
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.5/stock_pyx.o -o /home/damienlefebvre/Math199/v5/stock_pyx.cpython-35m-x86_64-linux-gnu.so

You'll notice the compiler emits a warning about `-Wstrict-prototypes`. This is a common issue caused by Python's build_ext, which reuses the C compiler flags Python was built with, including ones that do not apply to C++. Since this warning is harmless and there are no easy solutions currently available, it's best to ignore it.

### Outline of `test.py`
1. create the string `ticker`
2. create the 1D array `price_data`
3. create the 2D array `historical_price_data`
4. create the `Stock_py` instance `BarrickGold` using variables `ticker`, `price_data`, and `historical_price_data`
5. retrieve `ticker` from C++
6. change `ticker` using C++
7. check in Python that `ticker` changed by printing it
8. change `price_data` using C++
9. check in Python that `price_data` changed by printing it
10. change `historical_price_data` using C++
11. check in Python that `historical_price_data` changed by printing it
12. retrieve the industry from C++
13. retrieve the historical_volume from C++

### Output

In [None]:
Python: create string ticker
Python: ticker is b'ABX'
Python: the address of ticker is 0x7f94d8994b48

We create `ticker` as a bytes string, as shown by the `b` prefix.

In [None]:
Python: create numpy array price_data
Python: price_data is [13.64 13.69 13.05  1.31]
Python: the address of price_data is 0x7f94d89f85d0

We create `price_data` with the intentional typo for the Close share price.

In [None]:
Python: create numpy array historical_price_data
Python: historical_price_data is [[13.    13.59  12.69  13.54 ]
 [13.54  13.62  13.06   1.312]]
Python: the address of historical_price_data is 0x7f94d8a000d0

We create `historical_price_data` with the intentional typo for the Close share price of the oldest day.

In [None]:
Python: create Stock_py BarrickGold with ticker, price_data, and historical_price_data

We initialize a `Stock_py` class called `BarrickGold` with the three variables we declared.

In [None]:
C++: overloaded constructor
C++: this->ticker: ABX
C++: this->price_data: 0x26e2910
C++: this->price_data_size: 4
C++: this->historical_price_data: 0x28a8690
C++: this->historical_price_data_size: 8

C++ was called and initialized the class successfully.

In [None]:
Python: call get_ticker_py
C++: get_ticker_c, return this->ticker
Python: get_ticker_py returns: ABX

We use Python to retrieve the ticker successfully.

In [None]:
Python: call change_ticker_py
C++: this->ticker: ABX
C++: *(this->ticker): A
C++: &(this->ticker): 0x7f94cb4f34a0

We use Python to make C++ change `ticker`. In C++, we print the value of `ticker` and its address. Notice that the first 4 hex digits of the address are identical to those of the address in Python: `0x7f94`

In [None]:
C++: this->ticker[0]: A
C++: this->ticker[1]: B
C++: this->ticker[2]: X
C++: this->ticker[3]:
C++: this->ticker[4]:
C++: this->ticker[5]: <0x7f>
C++: this->ticker[6]:
C++: this->ticker[7]:
C++: this->ticker[8]: <0x01>
C++: this->ticker[9]:

We print in C++ the first 10 values at that address. We recognize the `A`, `B`, and `X` characters, followed by some garbage values.

In [None]:
C++: set this->ticker to GOLD, one character at a time
C++: this->ticker: GOLD<0x7f>

We set `ticker` to `GOLD` one character at a time in a loop, and print to check.

In [None]:
Python: ticker is b'GOL'
Python: the address of ticker is 0x7f94d8994b48

Back in Python, we print `ticker` to check the results. The characters were successfully changed, but the Python variable still has length 3. If we could print the character after that, we would get the missing `D`. The address hasn't changed.
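
The truncation can be reproduced in pure Python. The sketch below is an analogy, not the report's code: a `bytearray` stands in for the raw memory that C++ writes to, and the variable `recorded_length` stands in for the length Python stores when the bytes object is created. Both names are hypothetical.

```python
# Raw memory: the three characters of the ticker plus spare bytes after them.
memory = bytearray(b"ABX\x00\x00")
# Python records the length once, when the bytes object is created.
recorded_length = 3

# The C++ side overwrites the buffer one character at a time.
for i, ch in enumerate(b"GOLD"):
    memory[i] = ch

# Python still reads only `recorded_length` bytes from the same memory.
seen_by_python = bytes(memory[:recorded_length])
print(seen_by_python)                                       # b'GOL'
print(bytes(memory[recorded_length:recorded_length + 1]))   # b'D'
```

In real C++, writing past the end of a buffer is undefined behaviour, so a safer design is to reserve room for the longest expected ticker before handing the buffer to C++.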

In [None]:
Python: call change_price_data_py
C++: this->price_data: 0x26e2910
C++: *(this->price_data): 13.64
C++: &(this->price_data): 0x7f94cb4f34a8

We use Python to make C++ change `price_data`. In C++, we print the value of `price_data` and its address. Again, notice that the first 4 hex digits of the address are identical to those of the address in Python: `0x7f94`

In [None]:
C++: this->price_data[0]: 13.64
C++: this->price_data[1]: 13.69
C++: this->price_data[2]: 13.05
C++: this->price_data[3]: 1.31
C++: this->price_data[4]: 2.37152e-322
C++: this->price_data[5]: 4.03652e-321
C++: this->price_data[6]: 4.94066e-324
C++: this->price_data[7]: 1.58101e-322
C++: this->price_data[8]: 2.92073e-317
C++: this->price_data[9]: 4.94066e-323

We print in C++ the first 10 values at that address. We recognize the first 4 values, followed by some garbage values (although this time they have the same format, namely doubles).

In [None]:
C++: multiply this->price_data[this->price_data_size-1] ten-fold
C++: this->price_data[this->price_data_size-1]: 13.1

We fix the typo in C++ and print to check.

In [None]:
Python: price_data is [13.64 13.69 13.05 13.1 ]
Python: the address of price_data is 0x7f94d89f85d0

Back to Python, we print `price_data` to check the results. The double was successfully changed. The address hasn't changed.
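
The unchanged address is what in-place modification looks like from Python. As a minimal standard-library analogy (the report's `test.py` uses NumPy; `array.array` is swapped in here only to keep the sketch dependency-free), the buffer address reported by `buffer_info()` stays the same when one element is overwritten:

```python
from array import array

# A contiguous buffer of doubles, with the intentional typo in the Close price.
price_data = array("d", [13.64, 13.69, 13.05, 1.310])
address_before, length = price_data.buffer_info()

# Overwrite the last element in place, as the C++ code does through its pointer.
price_data[length - 1] *= 10

address_after, _ = price_data.buffer_info()
print(hex(address_before), hex(address_after))
print(address_before == address_after)  # True
```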

In [None]:
Python: call change_historical_price_data_py
C++: this->historical_price_data: 0x28a8690

We use Python to make C++ change `historical_price_data`. In C++, we print the value of `historical_price_data`. Its address is unavailable, so the comparison with the Python address made for `ticker` and `price_data` cannot be repeated here.

In [None]:
C++: this->historical_price_data[0]: 13
C++: this->historical_price_data[1]: 13.59
C++: this->historical_price_data[2]: 12.69
C++: this->historical_price_data[3]: 13.54
C++: this->historical_price_data[4]: 13.54
C++: this->historical_price_data[5]: 13.62
C++: this->historical_price_data[6]: 13.06
C++: this->historical_price_data[7]: 1.312
C++: this->historical_price_data[8]: 6.93062e-310
C++: this->historical_price_data[9]: 2.1783e-316

We print in C++ the first 10 values at that address. We recognize the first 8 values, followed by some garbage values.

In [None]:
C++: multiply this->historical_price_data[this->historical_price_data_size-1] ten-fold
C++: this->historical_price_data[this->historical_price_data_size-1]: 13.12

We fix the typo in C++ and print to check.
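
The flat index used above follows from storing the 2D array row by row. A short sketch, assuming row-major (C-order) storage, which is NumPy's default layout; the helper name `flat_index` is hypothetical:

```python
ROWS, COLS = 2, 4  # two trading days, four OHLC values each

historical_price_data = [
    [13.00, 13.59, 12.69, 13.54],   # 31 December 2018
    [13.54, 13.62, 13.06, 1.312],   # 28 December 2018, Close has the typo
]


def flat_index(row: int, col: int, cols: int = COLS) -> int:
    """Position of element (row, col) in a row-major flattened buffer."""
    return row * cols + col


# Flatten row by row, matching the order of the values printed by C++.
flat = [value for row in historical_price_data for value in row]

last = flat_index(ROWS - 1, COLS - 1)
print(last, flat[last])  # 7 1.312
```

This is why index `historical_price_data_size-1` (that is, 7) addresses the Close price of the oldest day.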

In [None]:
Python: historical_price_data is [[13.   13.59 12.69 13.54]
 [13.54 13.62 13.06 13.12]]
Python: the address of historical_price_data is 0x7f94d8a000d0

Back to Python, we print `historical_price_data` to check the results. The double was successfully changed. The address hasn't changed.

In [None]:
Python: call get_industry_py
C++: get_industry_c, return a char*
Python: get_industry_py returns: gold mining

We use Python to get the industry, and let C++ create a char*. We print the value in Python to check.

In [None]:
Python: call get_historical_volume_py
C++: get_historical_volume_c, return a vector of doubles
Python: get_historical_volume_py returns: [22729.528, 38261.599]

We use Python to get the historical volume, and let C++ create a vector of doubles. We print the value in Python to check.

## 7. Pitfalls

### Binary strings
Strings created in Python need to be encoded as bytes for C++ to read them. This can either be done by writing `b` before the string, e.g. `b'ABX'`, or by calling the default encode function, e.g. `'ABX'.encode()`<br>
However, if the code calls `.encode()` in the pyx file, this will create a copy of the string and pass that copy to the function. C++ then modifies the copy, and the original Python variable never sees the change.

Pitfall: never call `.encode()` in the pyx file, as this will break the reference <br>
Solution: always declare your strings as bytes strings directly in Python using `b`
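
The lost reference can be illustrated without Cython. In this hedged sketch, the two functions are hypothetical stand-ins for a wrapper in the pyx file: one converts its argument first (as `.encode()` would) and so modifies a copy, while the other works on the caller's buffer directly. A `bytearray` is used so that the in-place change is legal in pure Python.

```python
def change_with_copy(ticker: bytearray) -> None:
    # Converting first creates a new object; the caller's buffer is untouched.
    copy = bytearray(bytes(ticker))
    copy[0:3] = b"GOL"


def change_in_place(ticker: bytearray) -> None:
    # Writing through the original reference is visible to the caller.
    ticker[0:3] = b"GOL"


ticker = bytearray(b"ABX")
change_with_copy(ticker)
print(ticker)  # bytearray(b'ABX')
change_in_place(ticker)
print(ticker)  # bytearray(b'GOL')

# Both spellings produce equal bytes objects.
print(b"ABX" == "ABX".encode())  # True
```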

### File naming
Cython will take the name of the pyx file, and use it for the name of the cpp file it generates for compilation. If a cpp file already exists with that name, Cython will not overwrite it. No errors will be raised, and the compilation will seemingly succeed. But in the end, there is nothing usable to call from Python.

Pitfall: don’t give the cpp and pyx files the same base name<br>
Solution: we recommend using \_fileformat at the end of every file name, e.g. `stock_pyx.pyx`

### Cython syntax
In the pxd file, the C++ function signatures are declared. However, Cython is still a hybrid between C++ and Python. Thus some of the syntax is different from C++.

Pitfall: vectors are not declared using `<>` <br>
Solution: use `[]` to declare vectors, e.g. `vector[double]`

### Updating files
If the `stock_pyx.cpp` file created during a previous compilation is not removed, then any subsequent compilation will use that file without updating it. This will most likely lead to a failed compilation.

Pitfall: compiling with an old `*pyx.cpp` file will fail <br>
Solution: always call `rm *pyx.cpp` (or even `rm *x.c*`) before `./compile.sh`
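
The clean-then-build routine can also be scripted in Python. This is a sketch under assumptions: the helper names `remove_stale_sources` and `build_command` are hypothetical, the glob pattern mirrors the `rm *pyx.cpp` command above, and the script only prints the build command rather than running it.

```python
import sys
from pathlib import Path
from typing import List


def remove_stale_sources(directory: Path, pattern: str = "*pyx.cpp") -> List[str]:
    """Delete generated C++ files from earlier builds; return their names."""
    removed = []
    for path in sorted(directory.glob(pattern)):
        path.unlink()
        removed.append(path.name)
    return removed


def build_command() -> List[str]:
    """The command compile.sh runs, using the current interpreter (assumption)."""
    return [sys.executable, "setup.py", "build_ext", "--inplace"]


if __name__ == "__main__":
    print("removed:", remove_stale_sources(Path(".")))
    print("next step:", " ".join(build_command()))
```

The pattern matches the generated `stock_pyx.cpp` but leaves the hand-written `stock_cpp.cpp` alone, which is another benefit of the file naming convention recommended above.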

### Leaving comments
The top comment `# distutils: language = c++` in the pyx file is needed: it tells the build to generate and compile C++ rather than C. Removing it will cause the compilation to fail.

Pitfall: compiling without the comment will fail <br>
Solution: leave the comment

## 8. Conclusion

Using C++ through Python can be a challenging task, even through a robust interface such as Cython. Fortunately, the official Cython documentation offers useful examples, and forums such as Stack Exchange can also offer good advice.

Through careful variable and file naming, one can build a robust interface between C++ and Python, offering the best of both worlds.