Others:

[How much of Numpy is in C?](https://stackoverflow.com/questions/1825857/how-much-of-numpy-and-scipy-is-in-c)

[Python Performance](http://scipy.github.io/old-wiki/pages/PerformancePython.html)

[Numpy vs Cython](https://stackoverflow.com/questions/7799977/numpy-vs-cython-speed?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa)

[performance of python cython and c on a vector](http://notes-on-cython.readthedocs.io/en/latest/std_dev.html)

[Which-is-faster-numpy-vectorized-code-or-hand-written-C-code](https://www.quora.com/Which-is-faster-numpy-vectorized-code-or-hand-written-C-code)

[Tensorflow vs Numpy](https://towardsdatascience.com/numpy-vs-tensorflow-speed-on-matrix-calculations-9cbff6b3ce04)

"The optimized C code numpy/scipy use behind the scenes consists of the Automatically Tuned Linear Algebra Software (ATLAS), BLAS (Basic Linear Algebra Subprograms) and LAPACK - Linear Algebra PACKage. These libraries have been developed and worked on for ages, are the gold standard of linear algebra vector and matrix computations, and their performance for a given machine are the benchmark used to determine which architecture is best for scientific computing—Intel and other CPU chip makers care very much about these benchmarks and make sure the code is as updated and optimal as humanly possible.

Essentially nobody writes their own linear algebra routines anymore. It just makes no sense in general, and when it does, it's only in very particular cases (and in these special cases it almost always involves careful assembly coding for a specialized or one-of-a-kind architecture).

So if for some reason you feel your code is too slow, please consider doing all the appropriate profiling runs to find out exactly why. Odds are the linear algebra algorithms aren't to blame, and writing your own won't improve anything (and might even reduce performance)."
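The advice in the quote above, to profile before rewriting anything, can be tried with the standard library's cProfile and pstats modules. This is a minimal sketch: `slow_part`, `fast_part` and `main` are hypothetical stand-ins for real code, not anything from the linked posts.

```python
import cProfile
import io
import pstats


def slow_part():
    # Hypothetical hot spot: a pure-Python loop.
    total = 0
    for i in range(200_000):
        total += i * i
    return total


def fast_part():
    # Hypothetical cheap step that delegates to a C-implemented builtin.
    return sum(range(1_000))


def main():
    return slow_part() + fast_part()


# Record every call made while main() runs.
profiler = cProfile.Profile()
profiler.enable()
result = main()
profiler.disable()

# Write the five entries with the highest cumulative time into a string.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Sorting by cumulative time shows which call tree dominates the run, which is usually the first thing to check before blaming a library routine.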

[why is list comprehension append faster than list.append?](https://www.quora.com/Why-are-list-comprehensions-faster-than-for-loops-in-Python)

List comprehensions are notably faster than for-loops that build a list by calling append on it.

The comprehension wins because it does not have to look up the append attribute on the list and call it as a function on every iteration. Instead, the compiler emits a specialized LIST_APPEND bytecode that appends directly onto the result list.

The source below has more detail and a fuller explanation.
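One way to check the bytecode claim is to compile both forms and inspect their opcodes. The sketch below assumes CPython. The helpers `opnames` and `referenced_names` are hypothetical names introduced for illustration. They recurse into nested code objects because some CPython versions compile a comprehension into its own code object.

```python
import dis
import types


def opnames(code):
    """Collect opcode names from a code object and any nested code objects."""
    names = {ins.opname for ins in dis.get_instructions(code)}
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names |= opnames(const)
    return names


def referenced_names(code):
    """Collect global and attribute names used by a code object, recursively."""
    names = set(code.co_names)
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names |= referenced_names(const)
    return names


loop_src = "result = []\nfor i in range(20):\n    result.append(i * 2)\n"
comp_src = "result = [i * 2 for i in range(20)]"

loop_code = compile(loop_src, "<loop>", "exec")
comp_code = compile(comp_src, "<comprehension>", "exec")

# The comprehension appends via a dedicated opcode ...
print("LIST_APPEND in comprehension:", "LIST_APPEND" in opnames(comp_code))
print("LIST_APPEND in loop:", "LIST_APPEND" in opnames(loop_code))

# ... while the loop has to look up the name 'append' on each iteration.
print("'append' referenced by loop:", "append" in referenced_names(loop_code))
print("'append' referenced by comprehension:", "append" in referenced_names(comp_code))
```

The opcodes used for the attribute lookup and the call differ between CPython versions, so the check relies only on LIST_APPEND and on the referenced names.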

----
Source: [Efficiency of list comprehensions](http://blog.cdleary.com/2010/04/efficiency-of-list-comprehensions/)

In [2]:
import dis
import inspect
import timeit


programs = dict(
    loop="""
result = []
for i in range(20):
    result.append(i * 2)
""",
    loop_faster="""
result = []
add = result.append
for i in range(20):
    add(i * 2)
""",
    comprehension='result = [i * 2 for i in range(20)]',
)

In [5]:
for name, text in programs.items():
    print(name, timeit.Timer(stmt=text).timeit())
    code = compile(text, '<string>', 'exec')
    dis.disassemble(code)

loop 1.9982350549980765
  2           0 BUILD_LIST               0
              2 STORE_NAME               0 (result)

  3           4 SETUP_LOOP              30 (to 36)
              6 LOAD_NAME                1 (range)
              8 LOAD_CONST               0 (20)
             10 CALL_FUNCTION            1
             12 GET_ITER
        >>   14 FOR_ITER                18 (to 34)
             16 STORE_NAME               2 (i)

  4          18 LOAD_NAME                0 (result)
             20 LOAD_ATTR                3 (append)
             22 LOAD_NAME                2 (i)
             24 LOAD_CONST               1 (2)
             26 BINARY_MULTIPLY
             28 CALL_FUNCTION            1
             30 POP_TOP
             32 JUMP_ABSOLUTE           14
        >>   34 POP_BLOCK
        >>   36 LOAD_CONST               2 (None)
             38 RETURN_VALUE
loop_faster 1.535684387999936
  2           0 BUILD_LIST               0
              2 STORE_NAME               0 (r

[Python loop optimization](https://nyu-cds.github.io/python-performance-tips/08-loops/)

The same loop can also be written as a list comprehension, which can replace many for and while blocks that build a list. A comprehension is usually faster because the interpreter compiles it to a tight loop that appends with a dedicated bytecode, rather than looking up and calling append on each pass. Besides the syntactic benefit, list comprehensions are often as fast as or faster than the equivalent use of map.

- in the linked example, not having to look up upper() and append() on every iteration saves a lot of time
- more concise syntax
- as fast as or faster than the equivalent use of map
- the interpreter handles comprehensions with optimized bytecode
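The points above can be compared side by side on a small uppercase example. This is a sketch, not the linked page's code: the `words` list and the three function names are assumptions made for illustration. Timings vary by machine and Python version, so none are quoted here.

```python
import timeit

# Hypothetical input data for the comparison.
words = ["alpha", "beta", "gamma"] * 1000


def with_loop(items):
    # Looks up out.append and w.upper on every iteration.
    out = []
    for w in items:
        out.append(w.upper())
    return out


def with_comprehension(items):
    # No append lookup; the result list is filled by the interpreter.
    return [w.upper() for w in items]


def with_map(items):
    # str.upper is resolved once and applied by map.
    return list(map(str.upper, items))


# All three forms must produce the same list before timing means anything.
expected = with_loop(words)
same = with_comprehension(words) == expected == with_map(words)
print("results identical:", same)

for fn in (with_loop, with_comprehension, with_map):
    # Take the best of three runs to reduce noise from other processes.
    best = min(timeit.repeat(lambda: fn(words), number=100, repeat=3))
    print(f"{fn.__name__:20s} {best:.4f} s")
```

Taking the minimum over several repeats is a common way to reduce measurement noise in such micro-benchmarks.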


In [7]:
import timeit

list_code = '''
out = []
for x in range(100000):
    out.append(x*2)
'''

list_comprehension_code = '''[x*2 for x in range(100000)]'''

In [8]:
timeit.timeit(stmt=list_code, number=10000)

126.24306294300186

In [9]:
timeit.timeit(stmt=list_comprehension_code, number=10000)

73.33117101399694

## Size of float, numpy item size, etc.

In [11]:
# A_np.itemsize

import sys
sys.float_info

sys.float_info(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308, min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15, mant_dig=53, epsilon=2.220446049250313e-16, radix=2, rounds=1)
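The fields printed above can be tied back to storage size with the standard library alone. The sketch below assumes CPython on a platform with IEEE 754 doubles. It compares the 8-byte raw double with the larger Python float object, and checks how epsilon relates to mant_dig.

```python
import struct
import sys
from array import array

info = sys.float_info

# A C double occupies 8 bytes; "d" is the struct/array type code for it.
double_bytes = struct.calcsize("d")
packed_itemsize = array("d", [1.0, 2.0, 3.0]).itemsize

# A Python float object is larger than the raw value because it also
# carries an object header; the exact size is implementation-specific.
object_bytes = sys.getsizeof(1.0)

# mant_dig is the number of significand bits, so epsilon is 2**(1 - mant_dig).
eps_matches = info.epsilon == 2.0 ** (1 - info.mant_dig)

# epsilon is the gap between 1.0 and the next representable float:
# adding it changes the value, adding half of it is rounded away.
full_eps_changes_value = 1.0 + info.epsilon != 1.0
half_eps_is_lost = 1.0 + info.epsilon / 2 == 1.0

print(double_bytes, packed_itemsize, object_bytes)
print(eps_matches, full_eps_changes_value, half_eps_is_lost)
```

The packed 8-byte item size is what a float64 array stores per element, which is why packed arrays use less memory than lists of float objects.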

In [13]:
# Numpy internal layout 

# https://docs.scipy.org/doc/numpy-1.14.1/reference/arrays.ndarray.html#internal-memory-layout-of-an-ndarray
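The layout described at the link above (one contiguous buffer plus shape, item size and strides) can be sketched without NumPy, since the standard library's memoryview exposes the same quantities. This is an illustration under that assumption, not NumPy's own code. The comments name the corresponding ndarray attributes, and `byte_offset` is a hypothetical helper.

```python
import struct
from array import array

# Six doubles in one contiguous buffer, viewed as a 2 x 3 row-major grid.
flat = array("d", range(6))
grid = memoryview(flat).cast("B").cast("d", shape=[2, 3])

itemsize = grid.itemsize   # bytes per element (ndarray.itemsize in NumPy)
shape = grid.shape         # extent of each axis (ndarray.shape)
strides = grid.strides     # bytes to step along each axis (ndarray.strides)


def byte_offset(i, j):
    # Element [i, j] starts this many bytes from the start of the buffer.
    return i * strides[0] + j * strides[1]


# Reading the raw bytes at the computed offset gives the same value
# as indexing the two-dimensional view.
raw = flat.tobytes()
value_via_offset = struct.unpack_from("d", raw, byte_offset(1, 2))[0]
value_via_index = grid[1, 2]

print(shape, strides, itemsize, grid.c_contiguous)
print(value_via_offset, value_via_index)
```

In a row-major layout the last axis has the smallest stride, here one item, so walking along a row touches adjacent memory.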