# NumPy
NumPy, short for Numerical Python, is the fundamental package for high-performance scientific computing and data analysis in Python. Several features make _numpy_ an important tool for data analysis:
* __ndarray__, a fast and space-efficient multidimensional array providing vectorized arithmetic operations and sophisticated _broadcasting_ capabilities.
* Standard mathematical functions for fast operations on entire arrays of data without looping
* Tools for reading / writing data 
* Linear algebra, random number generation, and Fourier transform capabilities
* Tools for integrating code written in C, C++, and Fortran. 

In [1]:
import numpy as np

## __ndarray__ Object
An __ndarray__ object internally consists of the following:
* A _pointer_ to data: a block of memory
* The *data type* or __dtype__
* A tuple indicating the array's _shape_
* A tuple of _strides_: integers indicating the number of bytes to __step__ in order to advance one element along a dimension.

At the C level, the structure looks roughly like this:

    typedef struct PyArrayObject{
        PyObject_HEAD
        
        char *data;
        
        PyArray_Descr *descr;
        
        int nd;
        npy_intp *dimensions;
        npy_intp *strides;
        
        PyObject *base;
        int flags;
        PyObject *weakreflist;
    } PyArrayObject;
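
To make the role of the strides concrete, here is a minimal sketch. The array __m__, its shape, and the helper __byte_offset__ are illustrative choices, not part of the original notebook.

```python
import numpy as np

# A 3x4 array of 8-byte integers stored in C (row-major) order.
m = np.arange(12, dtype=np.int64).reshape(3, 4)

# One row forward skips 4 items (32 bytes); one column forward skips 1 item (8 bytes).
print(m.strides)   # (32, 8)

# Transposing swaps shape and strides without copying the data.
t = m.T
print(t.strides)   # (8, 32)


def byte_offset(arr, i, j):
    """Byte offset of element (i, j) from the start of a 2D array's buffer."""
    return i * arr.strides[0] + j * arr.strides[1]


# Dividing by the item size gives the position in the flat buffer.
print(byte_offset(m, 2, 1) // m.itemsize)   # 9, which is the value of m[2, 1]
```

Because only the shape and strides change, operations such as transposing can return a view on the same block of memory instead of a copy.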

The block of memory can be accessed directly from an __ndarray__ through its __data__ attribute.

In [2]:
x = np.array([1, 2, 3])

In [3]:
x.data

<memory at 0x7f3bb4a3ab88>

In [4]:
str(x.data)

'<memory at 0x7f3bb4a3ac48>'

In [5]:
x.__array_interface__

{'data': (35624288, False),
 'descr': [('', '<i8')],
 'shape': (3,),
 'strides': None,
 'typestr': '<i8',
 'version': 3}

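The output above exposes the main pieces of the internal structure of the __ndarray__ object: the address of the data block, the data type, and the shape. Here 'strides' is None because the array is C-contiguous.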
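
As a rough cross-check, the entries of the interface dictionary can be compared with the ordinary array attributes. This is a sketch only; the names __iface__ and __view__ are made up for illustration.

```python
import numpy as np

x = np.array([1, 2, 3])
iface = x.__array_interface__

# 'data' is an (address, read_only) pair; the address matches x.ctypes.data.
address, read_only = iface['data']
print(address == x.ctypes.data)        # True

# 'typestr' mirrors dtype.str and 'shape' mirrors x.shape.
print(iface['typestr'] == x.dtype.str)  # True
print(iface['shape'] == x.shape)        # True

# A non-contiguous view reports its strides explicitly instead of None.
view = np.arange(12).reshape(3, 4)[:, ::2]
view_strides = view.__array_interface__['strides']
print(view_strides == view.strides)     # True
```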

In [6]:
x.flags

  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

Here __OWNDATA__ indicates whether the array owns its memory block or borrows it from another object, and __WRITEABLE__ indicates whether the data can be modified.
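
The two flags can be observed with a small experiment. This is a sketch under the assumption that a basic slice returns a view, which is NumPy's standard behavior; the variable names are illustrative.

```python
import numpy as np

x = np.array([1, 2, 3])
view = x[1:]       # a slice is a view that borrows x's memory
copy = x.copy()    # a copy allocates its own block

print(x.flags.owndata, view.flags.owndata, copy.flags.owndata)   # True False True
print(view.base is x)   # True

# Clearing WRITEABLE makes the array read-only; assignment then raises ValueError.
x.flags.writeable = False
try:
    x[0] = 99
    write_failed = False
except ValueError:
    write_failed = True
print(write_failed)     # True
```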

## Data Types
A _dtype_ describes a single item in the array:
* type: the __scalar type__ of the data
* itemsize: the __size__ in bytes of a single item
* byteorder: the __byte order__, one of big-endian '>', little-endian '<', native '=', or not applicable '|'
* fields: the sub-dtypes, if it is a structured data type
* shape: the shape of the sub-array, if the dtype describes one (otherwise an empty tuple)

Let's see them in action:

In [7]:
np.dtype(int).type

numpy.int64

In [8]:
np.dtype(int).itemsize

8

In [9]:
np.dtype(int).byteorder

'='
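
The remaining attributes, and the other byte-order codes, can be explored the same way. The dtypes below (__swapped__, __point__, __block__) are hypothetical examples chosen for illustration.

```python
import sys
import numpy as np

# Pick the byte order that is NOT native on this machine, so it is reported explicitly.
non_native = '>' if sys.byteorder == 'little' else '<'
swapped = np.dtype(non_native + 'i4')
print(swapped.byteorder, swapped.itemsize)

# Single-byte types have no byte order, reported as '|'.
print(np.dtype(np.uint8).byteorder)   # |

# A structured dtype exposes its sub-dtypes (and their byte offsets) through fields.
point = np.dtype([('x', np.float64), ('y', np.int32)])
print(point.names)      # ('x', 'y')
print(point.itemsize)   # 12, since the fields are packed without padding by default

# A sub-array dtype carries its own shape; plain dtypes report an empty tuple.
block = np.dtype((np.int32, (2, 3)))
print(block.shape, np.dtype(int).shape)   # (2, 3) ()
```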

Use the __dtype__ attribute to get the data type of an array, and __astype__ to _cast_ an array from one __dtype__ to another. By default, __astype__ creates a new array holding a copy of the data, even when the target __dtype__ is the same as the original.

In [27]:
arr = np.array([4,5,6,3])

In [28]:
arr.dtype

dtype('int64')

In [30]:
float_arr = arr.astype(np.float64)
float_arr.dtype

dtype('float64')
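
A short sketch of two consequences of casting: the result does not share memory with the original, and float-to-integer casts truncate rather than round. The sample values are made up for illustration.

```python
import numpy as np

arr = np.array([4, 5, 6, 3])

# Even a cast to the same dtype returns a fresh copy by default.
same = arr.astype(arr.dtype)
same[0] = -1
print(arr[0])                        # 4, the original is untouched
print(np.shares_memory(arr, same))   # False

# Casting floats to integers truncates toward zero instead of rounding.
truncated = np.array([1.7, -1.7, 2.5]).astype(np.int64)
print(truncated.tolist())            # [1, -1, 2]

# Numeric strings can be parsed by casting to a float dtype.
parsed = np.array(['1.5', '2', '-3.25']).astype(np.float64)
print(parsed.tolist())               # [1.5, 2.0, -3.25]
```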

Now let's turn to the __ndarray__ in general and see how much faster a vectorized calculation on an __ndarray__ is than the equivalent Python loop.

In [11]:
t = range(1000)

In [12]:
%timeit [i**2 for i in t]

1000 loops, best of 3: 285 µs per loop


In [13]:
a = np.arange(1000)

In [14]:
%timeit a**2

The slowest run took 101.30 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.67 µs per loop
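
Outside of IPython, where the %timeit magic is not available, the same comparison can be sketched with the standard __timeit__ module. The repetition count of 1000 is an arbitrary choice, and the absolute timings depend on the machine and the NumPy version.

```python
import timeit
import numpy as np

n = 1000
values = range(n)
a = np.arange(n)

# Both approaches compute the same squares; only the execution model differs.
loop_result = [i**2 for i in values]
vector_result = a**2
print(vector_result.tolist() == loop_result)   # True

# Total time for 1000 repetitions of each approach.
loop_time = timeit.timeit(lambda: [i**2 for i in values], number=1000)
vector_time = timeit.timeit(lambda: a**2, number=1000)
print(f"list comprehension: {loop_time:.4f} s, ndarray: {vector_time:.4f} s")
```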


## Creating ndarray
__Any sequence-like object__ can be passed to the _array_ function to generate an __ndarray__ object. Note that an __ndarray__ holds homogeneous data: all elements share a single __dtype__.
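
A minimal sketch of what homogeneous means in practice: when the input mixes integers and floats, a single common type is chosen for every element. The variable names are illustrative.

```python
import numpy as np

# Mixed ints and floats are upcast to a single floating-point dtype.
mixed = np.array([4, 54, 0.3, 4])
print(mixed.dtype)         # float64

# All-integer input keeps an integer dtype (its width is platform dependent).
ints = np.array([4, 54, 4])
print(ints.dtype.kind)     # i

# An explicit dtype overrides the inferred one.
as_float32 = np.array([4, 54, 4], dtype=np.float32)
print(as_float32.dtype)    # float32
```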

In [15]:
d1 = [4, 54, 0.3, 4]
arry1 = np.array(d1)

In [16]:
arry1.shape

(4,)

In [17]:
arry1.ndim

1

In [22]:
len(arry1)

4

__Pay attention__ to the **shape** of a __1D array__: it is a one-element tuple such as (4,), where the trailing comma is just Python tuple syntax and not an empty second dimension. The number of dimensions always equals the length of the **shape** tuple.
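
The relationship between **shape**, **ndim**, **len**, and **size** can be checked directly. This sketch reuses the sample data from this section; the names __one_d__, __two_d__, and __row__ are illustrative.

```python
import numpy as np

one_d = np.array([4, 54, 0.3, 4])
two_d = np.array([[45, 69, -4.94], [54.45, 3.94, 0.34]])

for arr in (one_d, two_d):
    # ndim is the length of the shape tuple.
    assert arr.ndim == len(arr.shape)
    # len() reports only the first dimension; size counts every element.
    print(arr.shape, arr.ndim, len(arr), arr.size)

# Prints:
# (4,) 1 4 4
# (2, 3) 2 2 6

# Reshaping the 1D array into a single-row 2D array gives it a real second dimension.
row = one_d.reshape(1, -1)
print(row.shape)   # (1, 4)
```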

In [18]:
d2 = [[45, 69, -4.94], [54.45, 3.94, 0.34]]
array2 = np.array(d2)

In [19]:
array2.shape

(2, 3)

In [20]:
array2.ndim

2

In [21]:
len(array2)

2

As shown above, the **len** function always returns the size of the first dimension. Several other functions can also be used to generate __ndarray__ objects.

In [23]:
np.empty((3, 4))

array([[  6.91173218e-310,   1.75115916e-316,   6.91173300e-310,
          6.91173300e-310],
       [  6.91173300e-310,   6.91173300e-310,   6.91173300e-310,
          6.91170205e-310],
       [  0.00000000e+000,   1.63041663e-322,   1.50796661e-316,
          1.75082241e-316]])

The __empty__ function creates an array of the given shape (a tuple) without initializing its values, so the contents are whatever happened to be in memory, __not zeros__.
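
For comparison, here is a sketch of the initializing constructors next to __empty__. The fill value 7.0 is arbitrary.

```python
import numpy as np

# empty() only allocates memory: shape and dtype are set, contents are arbitrary.
e = np.empty((3, 4))
print(e.shape, e.dtype)   # (3, 4) float64

# zeros() and ones() allocate and initialize in one step.
z = np.zeros((3, 4))
o = np.ones((3, 4), dtype=np.int64)

# A safe pattern is to fill an empty array before reading it; full() does both at once.
e.fill(7.0)
f = np.full((3, 4), 7.0)
print(np.array_equal(e, f))   # True
```

Using __empty__ is only worthwhile when every element will be overwritten anyway; otherwise __zeros__ or __full__ avoids reading uninitialized values.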

In [25]:
np.linspace(5, 20, 15)

array([  5.        ,   6.07142857,   7.14285714,   8.21428571,
         9.28571429,  10.35714286,  11.42857143,  12.5       ,
        13.57142857,  14.64285714,  15.71428571,  16.78571429,
        17.85714286,  18.92857143,  20.        ])
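
The __linspace__ call above returns 15 evenly spaced samples with both endpoints included. The sketch below shows how the spacing follows from that, and how it compares with __arange__. The use of retstep and endpoint here is an illustrative addition to the original call.

```python
import numpy as np

# 15 samples from 5 to 20 with both endpoints included, so there are 14 equal gaps.
samples, step = np.linspace(5, 20, 15, retstep=True)
print(np.isclose(step, 15 / 14))   # True

# endpoint=False drops the stop value, leaving 15 gaps of width 1.
open_samples = np.linspace(5, 20, 15, endpoint=False)

# arange takes a step instead of a count and also excludes the stop value.
by_step = np.arange(5, 20, 1.0)
print(np.allclose(open_samples, by_step))   # True
```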

In [26]:
np.diag(np.array([4, 5, 10, 7]))

array([[ 4,  0,  0,  0],
       [ 0,  5,  0,  0],
       [ 0,  0, 10,  0],
       [ 0,  0,  0,  7]])
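
The __diag__ function works in both directions: given a 1D array it builds a diagonal matrix, as above, and given a 2D array it extracts the diagonal. A minimal sketch follows; the offset k=1 and the comparison with __eye__ are illustrative additions.

```python
import numpy as np

v = np.array([4, 5, 10, 7])

# 1D input: build a square matrix with v on the main diagonal.
d = np.diag(v)

# 2D input: extract the diagonal back out as a 1D array.
back = np.diag(d)
print(np.array_equal(back, v))   # True

# k shifts the diagonal: k=1 places v on the first superdiagonal of a 5x5 matrix.
shifted = np.diag(v, k=1)
print(shifted.shape)             # (5, 5)

# eye() builds an identity matrix, which is a diagonal matrix of ones.
identity = np.eye(4, dtype=int)
print(np.array_equal(identity, np.diag(np.ones(4, dtype=int))))   # True
```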