## Code execution optimization, vectorization, Numba

Materials:
* Makrushin S.V. Lecture 3: Code execution optimization, vectorization, Numba
* IPython Cookbook, Second Edition (2018), chapter 4
* https://numba.pydata.org/numba-doc/latest/user/5minguide.html

In [171]:
import numpy as np
import pandas as pd
import numba
from numba import jit, njit

## Problems for joint discussion

1. Generate an array `A` of `N = 1 million` random integers on the interval from 0 to 1000. Let `B[i] = A[i] + 100`. Compute the mean of the array `B`.

In [2]:
A = np.random.randint(0, 1001, size=1000000)  # the upper bound is exclusive, so 1001 includes 1000
B = A+100
B.mean()

599.673194
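
As a small illustration of what vectorization replaces here, the sketch below computes the same mean with an explicit Python loop and with a single array expression. The seed, the array size, and the variable names are chosen for the example only and are not part of the original notebook.

```python
import numpy as np

rng = np.random.default_rng(42)          # fixed seed so the run is repeatable
A = rng.integers(0, 1001, size=10_000)   # integers from 0 to 1000 inclusive

# Explicit loop: every element is handled by the Python interpreter.
total = 0
for a in A:
    total += a + 100
loop_mean = total / len(A)

# Vectorized: the addition and the mean run inside NumPy's compiled code.
vec_mean = (A + 100).mean()

print(loop_mean, vec_mean)
```

Both values should agree; the vectorized form is typically much faster on large arrays.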

2. Create a table with 2 million rows and 4 columns filled with random numbers. Add a `key` column whose values are English letters. Select the subset of rows whose `key` value is one of the first 5 English letters.

In [3]:
df = pd.DataFrame(np.random.randint(0, 1000, size=(2000000, 4)),
                  columns=['col1', 'col2', 'col3', 'col4'])
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
df['key'] = np.random.choice(letters, 2000000, replace=True)
def g(df):
    letters = ['a', 'b', 'c', 'd', 'e']
    dfs = []
    for letter in letters:
        q = df[df['key']==letter]
        dfs.append(q)
    return pd.concat(dfs, axis=0)
g(df).head()

| | col1 | col2 | col3 | col4 | key |
|---|---|---|---|---|---|
| 5 | 482 | 452 | 422 | 378 | a |
| 8 | 427 | 160 | 68 | 417 | a |
| 15 | 118 | 17 | 196 | 993 | a |
| 24 | 185 | 608 | 109 | 895 | a |
| 35 | 210 | 512 | 179 | 228 | a |
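
The same selection can be sketched without a Python loop by using `Series.isin`, which builds one boolean mask for all five letters at once. The smaller `demo` frame and the seed are assumptions made to keep the example quick to run.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.integers(0, 1000, size=(1000, 4)),
                    columns=['col1', 'col2', 'col3', 'col4'])
demo['key'] = rng.choice(list('abcdefg'), size=len(demo))

# One vectorized membership test instead of five filters plus a concat.
subset = demo[demo['key'].isin(list('abcde'))]
print(subset.head())
```

Unlike the loop-and-`concat` version, this keeps the rows in their original order rather than grouping them by letter.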


## Lab 3

In [35]:
 #!pip install line_profiler
%reload_ext line_profiler

In [36]:
%reload_ext memory_profiler

1. The files `recipes_sample.csv` and `reviews_sample.csv` (__Lab 2__) contain information about recipes and reviews of those recipes, respectively. Load the data from the files as `pd.DataFrame` objects named `recipes` and `reviews`. Make sure the index column(s) are read correctly. Convert the columns to appropriate types.

Implement several versions of a function that computes the mean of the `rating` column of the `reviews` table for reviews left in 2010.

A. Using the `DataFrame.iterrows` method on the original table;

B. Using the `DataFrame.iterrows` method on a table that contains only the reviews from 2010;

C. Using the `Series.mean` method.

Check that the results of all the functions are correct and identical. Measure the execution time of each function.


In [2]:
recipes = pd.read_csv("recipes_sample.csv", sep=",", parse_dates=['submitted'])
reviews = pd.read_csv("reviews_sample.csv", sep=",", parse_dates=['date'], index_col=0)
reviews.reset_index(drop = True, inplace=True)

In [11]:
def mean_A(reviews):
    ratS = 0
    k = 0
    for i, r in reviews.iterrows():
        if r.date.year == 2010:
            ratS += r['rating']
            k += 1
    return ratS/k

In [12]:
def mean_B(reviews):
    rv = reviews[reviews.date.dt.year == 2010]
    ratS = 0
    for i, r in rv.iterrows():
        ratS += r['rating']
    return ratS/rv.shape[0]

In [13]:
def mean_C(reviews):
    return reviews[reviews.date.dt.year == 2010]['rating'].mean()

In [14]:
%%time
print(mean_A(reviews))



In [15]:
%%time
mean_B(reviews)



In [16]:
%%time
print(mean_C(reviews))

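
The task also asks to check that the variants agree. A minimal sketch of such a check, together with a simple wall-clock timing, is shown below. It runs on a synthetic `demo` frame, since the CSV files are not reproduced here; the helper `timed` and the function names are hypothetical.

```python
import math
import time

import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 2_000
# Synthetic stand-in for `reviews`: dates spread over 2009-2011, ratings 0-5.
demo = pd.DataFrame({
    'date': pd.to_datetime('2009-01-01')
            + pd.to_timedelta(rng.integers(0, 3 * 365, size=n), unit='D'),
    'rating': rng.integers(0, 6, size=n),
})

def mean_loop(df):
    """Row-by-row mean of `rating` for 2010, in the spirit of variant A."""
    total, count = 0, 0
    for _, row in df.iterrows():
        if row['date'].year == 2010:
            total += row['rating']
            count += 1
    return total / count

def mean_vectorized(df):
    """Boolean mask plus Series.mean, in the spirit of variant C."""
    return df.loc[df['date'].dt.year == 2010, 'rating'].mean()

def timed(func, df):
    """Return the function result and the elapsed wall-clock seconds."""
    start = time.perf_counter()
    value = func(df)
    return value, time.perf_counter() - start

loop_value, loop_seconds = timed(mean_loop, demo)
vec_value, vec_seconds = timed(mean_vectorized, demo)

assert math.isclose(loop_value, vec_value)
print(f"iterrows: {loop_seconds:.4f} s, Series.mean: {vec_seconds:.4f} s")
```

`math.isclose` is used instead of `==` because the two variants may accumulate floating-point values in a different order.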


2. Which of the functions is the slowest? What affects the execution speed the most? Use the `line_profiler` profiler to answer. Save the profiler output in a separate text cell and comment on it.

(*). Can you speed up function 1B by dropping `iterrows`, but without using the `mean` method?

In [17]:
%lprun -f mean_A mean_A(reviews)

Timer unit: 1e-07 s

Total time: 34.1637 s
File: <ipython-input-11-fb9491a545bc>
Function: mean_A at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
     1                                           def mean_A(reviews):
     2         1         15.0     15.0      0.0      ratS = 0
     3         1          6.0      6.0      0.0      k = 0
     4    126697  285276191.0   2251.6     83.5      for i, r in reviews.iterrows():
     5    126696   53535290.0    422.5     15.7          if r.date.year == 2010:
     6     12094    2750851.0    227.5      0.8              ratS += r['rating']
     7     12094      74385.0      6.2      0.0              k += 1
     8         1         15.0     15.0      0.0      return ratS/k

Timer unit: 1e-07 s

Total time: 33.5067 s
File: <ipython-input-131-c151a5cd4da1>
Function: mean_A at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           def mean_A(reviews):
     2         1         15.0     15.0      0.0      ratS = 0
     3         1          8.0      8.0      0.0      k = 0
     4    126697  279562694.0   2206.5     83.4      for i, r in reviews.iterrows():
     5    126696   52714967.0    416.1     15.7          if r.date.year == 2010:
     6     12094    2707164.0    223.8      0.8              ratS += r['rating']
     7     12094      82230.0      6.8      0.0              k += 1
     8         1         12.0     12.0      0.0      return ratS/k

Most of the time is spent on iterating over the rows with `iterrows` (about 83%) and on checking the condition (about 16%).

In [None]:
%lprun -f mean_B mean_B(reviews)

Timer unit: 1e-07 s

Total time: 2.83106 s
File: <ipython-input-132-2ada19111e66>
Function: mean_B at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           def mean_B(reviews):
     2         1     167908.0 167908.0      0.6      rv = reviews[reviews.date.dt.year == 2010]
     3         1         13.0     13.0      0.0      ratS = 0
     4     12095   24993617.0   2066.4     88.3      for i, r in rv.iterrows():
     5     12094    3148955.0    260.4     11.1          ratS += r['rating']
     6         1         81.0     81.0      0.0      return ratS/rv.shape[0]

Most of the time (about 88%) is spent on iterating over the rows with `iterrows`.
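
Much of the per-row cost of `iterrows` comes from building a `pandas.Series` object for every row. As a sketch of a cheaper way to keep an explicit loop, `DataFrame.itertuples` yields lightweight named tuples instead. The tiny `demo` frame is made up for illustration.

```python
import pandas as pd

demo = pd.DataFrame({'recipe_id': [1, 1, 2, 2],
                     'rating': [5, 4, 0, 3]})

total = 0
# Each row is a namedtuple, so columns are read as attributes.
for row in demo.itertuples(index=False):
    total += row.rating

mean_rating = total / len(demo)
print(mean_rating)
```

This keeps the loop structure of variant B while avoiding both `iterrows` and `mean`.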

In [None]:
%lprun -f mean_C mean_C(reviews)

Timer unit: 1e-07 s

Total time: 0.0171297 s
File: <ipython-input-133-35496f1ea080>
Function: mean_C at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           def mean_C(reviews):
     2         1     171297.0 171297.0    100.0      return reviews[reviews.date.dt.year == 2010]['rating'].mean()

In [18]:
def mean_B1(reviews):
    rv = reviews[reviews.date.dt.year == 2010]
    return rv['rating'].sum()/rv.shape[0]

In [19]:
mean_B1(reviews)

4.4544402182900615

In [None]:
%lprun -f mean_B1 mean_B1(reviews)

Timer unit: 1e-07 s

Total time: 0.0196799 s
File: <ipython-input-140-4f01dfa41a1e>
Function: mean_B1 at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           def mean_B1(reviews):
     2         1     184861.0 184861.0     93.9      rv = reviews[reviews.date.dt.year == 2010]
     3         1      11938.0  11938.0      6.1      return rv['rating'].sum()/rv.shape[0]

3. You are given a function that collects statistics on how many reviews contain each word. Measure its execution time. Can you find the bottlenecks in the code using the profiler? Write down (in words) what is implemented suboptimally in the existing code. Optimize the function and achieve a significant speedup (at least one order of magnitude).

In [37]:
def get_word_reviews_count(df):
    word_reviews = {}
    for _, row in df.dropna(subset=['review']).iterrows():
        recipe_id, review = row['recipe_id'], row['review']
        words = review.split(' ')
        for word in words:
            if word not in word_reviews:
                word_reviews[word] = []
            word_reviews[word].append(recipe_id)
    word_reviews_count = {}
    for _, row in df.dropna(subset=['review']).iterrows():
        review = row['review']
        words = review.split(' ')
        for word in words:
            word_reviews_count[word] = len(word_reviews[word])
    return word_reviews_count

In [40]:
%lprun -f get_word_reviews_count get_word_reviews_count(reviews)

Timer unit: 1e-07 s

Total time: 94.8331 s
File: <ipython-input-37-f2b5c45f390a>
Function: get_word_reviews_count at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           def get_word_reviews_count(df):
     2         1         44.0     44.0      0.0      word_reviews = {}
     3    126680  307372721.0   2426.4     32.4      for _, row in df.dropna(subset=['review']).iterrows():
     4    126679   65111357.0    514.0      6.9          recipe_id, review = row['recipe_id'], row['review']
     5    126679    7858419.0     62.0      0.8          words = review.split(' ')
     6   6918689   30722058.0      4.4      3.2          for word in words:
     7   6792010   46552338.0      6.9      4.9              if word not in word_reviews:
     8    174426    1295688.0      7.4      0.1                  word_reviews[word] = []
     9   6792010   50238494.0      7.4      5.3              word_reviews[word].append(recipe_id)
    10         1         24.0     24.0      0.0      word_reviews_count = {}
    11    126680  291349315.0   2299.9     30.7      for _, row in df.dropna(subset=['review']).iterrows():
    12    126679   37424069.0    295.4      3.9          review = row['review']
    13    126679    7654003.0     60.4      0.8          words = review.split(' ')
    14   6918689   31503737.0      4.6      3.3          for word in words:
    15   6792010   71249044.0     10.5      7.5              word_reviews_count[word] = len(word_reviews[word])
    16         1         18.0     18.0      0.0      return word_reviews_count

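First, the function is slowed down by two nearly identical `iterrows` loops, which together account for about 63% of the run time. Second, it stores lists of `recipe_id` values that are never needed, because only their lengths are used. Third, the function is not entirely correct: it counts every occurrence of a word, so a word repeated within one review is counted more than once, although the task asks for the number of reviews that contain the word.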

In [47]:
def get_word_reviews_count2(df):
    k = 0
    word_reviews = dict.fromkeys(" ".join(df['review'].dropna()).split(' '), 0)
    for row in df['review'].dropna():
        for word in set(row.split(' ')):
            word_reviews[word] += 1
    return word_reviews

In [46]:
%lprun -f get_word_reviews_count2 get_word_reviews_count2(reviews)

Timer unit: 1e-07 s

Total time: 9.2698 s
File: <ipython-input-45-c95051c92bb4>
Function: get_word_reviews_count2 at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           def get_word_reviews_count2(df):
     2         1         21.0     21.0      0.0      k = 0
     3         1   16691021.0 16691021.0     18.0      word_reviews = dict.fromkeys(" ".join(reviews['review'].dropna()).split(' '), 0)
     4    126680    1109065.0      8.8      1.2      for row in df['review'].dropna():
     5                                           
     6   5513986   34795738.0      6.3     37.5          for word in set(row.split(' ')):
     7   5387307   40102150.0      7.4     43.3              word_reviews[word] += 1
     8         1         30.0     30.0      0.0      return word_reviews
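
A possible further simplification, sketched here as an assumption rather than as part of the original solution, is to let `collections.Counter` create the keys on the fly. This removes the up-front `join`/`split` pass that takes about 18% of the time in the profile above. The function name and the sample data are hypothetical.

```python
from collections import Counter

import pandas as pd

def count_reviews_per_word(review_texts):
    """Map each word to the number of reviews that contain it."""
    counts = Counter()
    for text in review_texts.dropna():
        # set(...) makes a word count once per review, however often it repeats.
        counts.update(set(text.split(' ')))
    return dict(counts)

demo = pd.Series(['good good soup', 'good bread', None])
result = count_reviews_per_word(demo)
print(result)
```

It would be called on the real data as `count_reviews_per_word(reviews['review'])`.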

4. Write several versions of a `MAPE` function (see [MAPE](https://en.wikipedia.org/wiki/Mean_absolute_percentage_error)) that computes the mean absolute percentage deviation of a review's rating from the mean rating over all reviews of the same recipe.
    1. Without vectorized operations and `numpy` array methods, and without `numba`
    2. Without vectorized operations and `numpy` array methods, but with `numba`
    3. With vectorized operations and `numpy` array methods, but without `numba`
    4. With vectorized operations and `numpy` array methods, and with `numba`
    
Measure the execution time of each implementation.

Note: remove reviews with a zero rating from the sample.
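
One way to write down the quantity computed here, inferred from the `mape_1` code rather than stated in the assignment: for recipe $j$ with $n_j$ reviews, ratings $r_{ij}$ and mean rating $\bar r_j$ (notation introduced only for this illustration),

$$
\mathrm{MAPE}_j = \frac{100}{n_j} \sum_{i=1}^{n_j} \left| \frac{r_{ij} - \bar r_j}{r_{ij}} \right|,
\qquad
\bar r_j = \frac{1}{n_j} \sum_{i=1}^{n_j} r_{ij}.
$$

The division by $r_{ij}$ is the reason reviews with a zero rating have to be removed first.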


In [164]:
rv = reviews[reviews['rating'] != 0]

In [195]:
def mape_1(df):
    mn = {}
    for _, row in df.iterrows():
        if row['recipe_id'] not in mn.keys():
            mn[row['recipe_id']] = {'n': 0, 'sum': 0}
        mn[row['recipe_id']]['sum'] += row['rating']
        mn[row['recipe_id']]['n'] += 1
    for id in mn:
        mn[id]['mean'] = mn[id]['sum']/mn[id]['n']
    mape = {}
    for _, row in df.iterrows():
        if row['recipe_id'] not in mape.keys():
            mape[row['recipe_id']] = {'s': 0}
        mape[row['recipe_id']]['s'] += abs((row['rating'] - mn[row['recipe_id']]['mean']) / row['rating'])
    res = {}
    for id in mape:    
        res[id] = 100 / mn[id]['n'] * mape[id]['s']
    return pd.Series(res)

In [196]:
%lprun -f mape_1 mape_1(rv)

Timer unit: 1e-07 s

Total time: 77.7704 s
File: <ipython-input-195-f6f6b12fb719>
Function: mape_1 at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           def mape_1(df):
     2         1         21.0     21.0      0.0      mn = {}
     3    119892  263730001.0   2199.7     33.9      for _, row in df.iterrows():
     4    119891   33625050.0    280.5      4.3          if row['recipe_id'] not in mn.keys():
     5     27440    5924015.0    215.9      0.8              mn[row['recipe_id']] = {'n': 0, 'sum': 0}
     6    119891   48725256.0    406.4      6.3          mn[row['recipe_id']]['sum'] += row['rating']
     7    119891   23922929.0    199.5      3.1          mn[row['recipe_id']]['n'] += 1
     8     27441     144917.0      5.3      0.0      for id in mn:
     9     27440     265014.0      9.7      0.0          mn[id]['mean'] = mn[id]['sum']/mn[id]['n']
    10         1          8.0      8.0      0.0      mape = {}
    11    119892  264905758.0   2209.5     34.1      for _, row in df.iterrows():
    12    119891   33706174.0    281.1      4.3          if row['recipe_id'] not in mape.keys():
    13     27440    5925681.0    216.0      0.8              mape[row['recipe_id']] = {'s': 0}
    14    119891   96081272.0    801.4     12.4          mape[row['recipe_id']]['s'] += abs((row['rating'] - mn[row['recipe_id']]['mean']) / row['rating'])
    15         1          6.0      6.0      0.0      res = {}
    16     27441     150813.0      5.5      0.0      for id in mape:    
    17     27440     418903.0     15.3      0.1          res[id] = 100 / mn[id]['n'] * mape[id]['s']
    18         1     177891.0 177891.0      0.0      return pd.Series(res)

In [197]:
@njit
def mape_2(df):
    mn = {}
    for _, row in df.iterrows():
        if row['recipe_id'] not in mn.keys():
            mn[row['recipe_id']] = {'n': 0, 'sum': 0}
        mn[row['recipe_id']]['sum'] += row['rating']
        mn[row['recipe_id']]['n'] += 1
    for id in mn:
        mn[id]['mean'] = mn[id]['sum']/mn[id]['n']
    mape = {}
    for _, row in df.iterrows():
        if row['recipe_id'] not in mape.keys():
            mape[row['recipe_id']] = {'s': 0}
        mape[row['recipe_id']]['s'] += abs((row['rating'] - mn[row['recipe_id']]['mean']) / row['rating'])
    res = {}
    for id in mape:    
        res[id] = 100 / mn[id]['n'] * mape[id]['s']
    return pd.Series(res)

In [198]:
%lprun -f mape_2 mape_2(rv)

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
non-precise type pyobject
[1] During: typing of argument at <ipython-input-197-3256e46d1a5c> (3)

File "<ipython-input-197-3256e46d1a5c>", line 3:
def mape_2(df):
    mn = {}
    ^

This error may have been caused by the following argument(s):
- argument 0: cannot determine Numba type of <class 'pandas.core.frame.DataFrame'>

This is not usually a problem with Numba itself but instead often caused by
the use of unsupported features or an issue in resolving types.

To see Python/NumPy features supported by the latest release of Numba visit:
http://numba.pydata.org/numba-doc/dev/reference/pysupported.html
and
http://numba.pydata.org/numba-doc/dev/reference/numpysupported.html

For more information about typing errors and how to debug them visit:
http://numba.pydata.org/numba-doc/latest/user/troubleshoot.html#my-code-doesn-t-compile

If you think your code should work with Numba, please report the error message
and traceback, along with a minimal reproducer at:
https://github.com/numba/numba/issues/new
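
The error above arises because Numba's nopython mode cannot type a `pandas.DataFrame` argument. A minimal sketch of a loop-based variant that Numba can compile is given below, assuming the columns are first converted to NumPy arrays and the recipe ids are mapped to consecutive integer codes. The function name `mape_loops`, the sample data, and the pure-Python fallback for `njit` are assumptions made for this example.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch: fall back to plain Python if Numba is absent.
    def njit(func):
        return func

@njit
def mape_loops(codes, ratings, n_groups):
    """Per-group MAPE with explicit loops; codes[i] is the group of rating i."""
    sums = np.zeros(n_groups)
    counts = np.zeros(n_groups)
    for i in range(codes.shape[0]):
        sums[codes[i]] += ratings[i]
        counts[codes[i]] += 1.0
    err = np.zeros(n_groups)
    for i in range(codes.shape[0]):
        g = codes[i]
        mean = sums[g] / counts[g]
        err[g] += abs((ratings[i] - mean) / ratings[i])
    for g in range(n_groups):
        err[g] = 100.0 * err[g] / counts[g]
    return err

recipe_id = np.array([10, 10, 20, 20, 20])
rating = np.array([5.0, 4.0, 2.0, 4.0, 3.0])
# Map arbitrary ids to codes 0..n_groups-1 outside the compiled function.
ids, codes = np.unique(recipe_id, return_inverse=True)
result = mape_loops(codes, rating, ids.shape[0])
print(dict(zip(ids.tolist(), result.tolist())))
```

On the real data the inputs would be `rv['recipe_id'].to_numpy()` and `rv['rating'].to_numpy()`. The first call includes compilation time, so timing should be taken on a second call.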


In [199]:
def mape_3(df):
    mn = df.groupby('recipe_id')['rating'].mean()
    mape = {}
    for ind in mn.index:
        revs = df[df['recipe_id'] == ind]['rating']
        revs = abs((revs - mn[ind]) / revs)
        mape[ind] = 100 / revs.shape[0] * revs.sum()
    return pd.Series(mape)


In [200]:
%lprun -f mape_3 mape_3(rv)

Timer unit: 1e-07 s

Total time: 79.0099 s
File: <ipython-input-199-bf51e456d4c9>
Function: mape_3 at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           def mape_3(df):
     2         1     192706.0 192706.0      0.0      mn = df.groupby('recipe_id')['rating'].mean()
     3         1         17.0     17.0      0.0      mape = {}
     4     27441     568248.0     20.7      0.1      for ind in mn.index:
     5     27440  468965187.0  17090.6     59.4          revs = df[df['recipe_id'] == ind]['rating']
     6     27440  255551044.0   9313.1     32.3          revs = abs((revs - mn[ind]) / revs)
     7     27440   64627191.0   2355.2      8.2          mape[ind] = 100 / revs.shape[0] * revs.sum()
     8         1     194172.0 194172.0      0.0      return pd.Series(mape)
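
The profile shows that almost all of the time goes into re-filtering the whole table once per recipe. A sketch of a fully vectorized alternative, assuming the same `recipe_id` and `rating` columns, uses `groupby(...).transform('mean')` to attach the per-recipe mean to every row in a single pass. The function name and the `demo` frame are hypothetical.

```python
import pandas as pd

def mape_vectorized(df):
    """Per-recipe MAPE without a Python loop over recipes."""
    # Mean rating of the recipe, aligned with every review row.
    means = df.groupby('recipe_id')['rating'].transform('mean')
    ape = ((df['rating'] - means) / df['rating']).abs()
    # Averaging the absolute percentage errors per recipe gives sum / n.
    return 100 * ape.groupby(df['recipe_id']).mean()

demo = pd.DataFrame({'recipe_id': [10, 10, 20, 20, 20],
                     'rating': [5, 4, 2, 4, 3]})
result = mape_vectorized(demo)
print(result)
```

This avoids the per-recipe boolean mask, so the cost no longer grows with the number of recipes multiplied by the number of reviews.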

In [193]:
@njit
def mape_4(df):
    mn = df.groupby('recipe_id')['rating'].mean()
    mape = {}
    for ind in mn.index:
        revs = df[df['recipe_id'] == ind]['rating']
        revs = abs((revs - mn[ind]) / revs)
        mape[ind] = 100 / revs.shape[0] * revs.sum()
    return pd.Series(mape)

In [194]:
%lprun -f mape_4 mape_4(rv)

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
non-precise type pyobject
[1] During: typing of argument at <ipython-input-193-9e8348cf5a26> (3)

File "<ipython-input-193-9e8348cf5a26>", line 3:
def mape_4(df):
    mn = df.groupby('recipe_id')['rating'].mean()
    ^

This error may have been caused by the following argument(s):
- argument 0: cannot determine Numba type of <class 'pandas.core.frame.DataFrame'>

This is not usually a problem with Numba itself but instead often caused by
the use of unsupported features or an issue in resolving types.

To see Python/NumPy features supported by the latest release of Numba visit:
http://numba.pydata.org/numba-doc/dev/reference/pysupported.html
and
http://numba.pydata.org/numba-doc/dev/reference/numpysupported.html

For more information about typing errors and how to debug them visit:
http://numba.pydata.org/numba-doc/latest/user/troubleshoot.html#my-code-doesn-t-compile

If you think your code should work with Numba, please report the error message
and traceback, along with a minimal reproducer at:
https://github.com/numba/numba/issues/new
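
As with `mape_2`, the failure comes from passing a `DataFrame` into a nopython function. One possible shape for variant 4 is sketched below: it works on NumPy arrays and uses only array operations, with `np.bincount` doing the per-group sums. The name `mape_numpy`, the sample data, and the `njit` fallback are assumptions of this sketch. Since NumPy already runs compiled code here, the gain from Numba may be small.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch: fall back to plain Python if Numba is absent.
    def njit(func):
        return func

@njit
def mape_numpy(codes, ratings):
    """Per-group MAPE using vectorized NumPy operations only."""
    counts = np.bincount(codes)           # number of reviews per group
    sums = np.bincount(codes, ratings)    # sum of ratings per group
    means = sums / counts
    # means[codes] broadcasts each group's mean back to its reviews.
    ape = np.abs((ratings - means[codes]) / ratings)
    return 100.0 * np.bincount(codes, ape) / counts

recipe_id = np.array([10, 10, 20, 20, 20])
rating = np.array([5.0, 4.0, 2.0, 4.0, 3.0])
ids, codes = np.unique(recipe_id, return_inverse=True)
result = mape_numpy(codes, rating)
print(dict(zip(ids.tolist(), result.tolist())))
```

The result can be wrapped as `pd.Series(result, index=ids)` outside the compiled function if a labelled output is needed.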
