<a href="https://colab.research.google.com/github/anidhyabhatnagar/sttp1/blob/scientific_computing/Scientific_Computing_in_Python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Notebook Authored By: <b>Anidhya Bhatnagar</b>
### Email: anidhya@gmail.com

# Introduction


*   Python was not designed specifically for scientific computing.
*   Its standard library has only limited built-in support for scientific computing (for example, the `math` module).
*   Instead, it relies on external libraries, typically written in faster languages such as C or Fortran.
*   The main libraries used for scientific computing in Python are NumPy, SciPy and Matplotlib.

## NumPy (Numerical Python)

*   The NumPy library provides efficient data structures for working with arrays.
*   NumPy is implemented largely in C and exposes high-level mathematical functions.
*   It works around the slowness of algorithms written in pure Python by providing multidimensional arrays and functions that operate on whole arrays.
*   Any algorithm that can be expressed as operations on arrays can therefore run quickly.
*   NumPy is part of the SciPy project, and is released as a separate library so people who only need the basic requirements can use it without installing the rest of SciPy.
*   Older NumPy releases were compatible with Python versions 2.4 through 2.7.2 and 3.1+; current releases require Python 3.
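
The effect of expressing an algorithm as an operation on arrays can be sketched as follows. This is only an illustration, assuming an arbitrary array of one million values; the variable names are hypothetical and the measured times depend on the machine.

```python
import time
import numpy as np

values = np.arange(1_000_000, dtype=np.float64)

# Pure-Python loop: one interpreted multiplication per element.
start = time.perf_counter()
loop_result = [v * 2.0 for v in values]
loop_seconds = time.perf_counter() - start

# Vectorized form: the loop runs inside NumPy's compiled code.
start = time.perf_counter()
vector_result = values * 2.0
vector_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.4f} s, vectorized: {vector_seconds:.4f} s")
print(np.allclose(loop_result, vector_result))  # both approaches give the same numbers
```

On most machines the vectorized line is expected to be considerably faster, although the exact ratio varies.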

## SciPy (Scientific Python)
*   SciPy is a library that builds on NumPy for its mathematical functions.
*   SciPy uses NumPy arrays as the basic data structure, and comes with modules for various commonly used tasks in scientific programming, including:
    *   Linear Algebra
    *   Integration (Calculus)
    *   Ordinary Differential Equation Solving
    *   Signal Processing

## Matplotlib


*   Matplotlib is a flexible plotting library for creating interactive 2D and 3D plots that can also be saved as manuscript-quality figures.
*   The API in many ways reflects that of MATLAB, easing transition of MATLAB users to Python.
*   Many examples, along with the source code to recreate them, are available in the matplotlib gallery.
---

# Basics of NumPy
*   The NumPy library provides data structures for representing a rich variety of arrays and methods and functions for operating on such arrays.
*   NumPy provides the numerical backend for nearly every scientific or technical library for Python.
*   It is a very important part of the scientific ecosystem in Python.

> *NumPy arrays bear some resemblance to Python's list data structure.*

*   The important difference is that Python lists are generic containers of objects, whereas NumPy arrays are homogeneous, typed arrays of fixed size.
*   Homogeneous means that all elements in the array have the same data type.
*   Fixed size means that an array cannot be resized (without creating a new array).
*   For these and some other reasons operations and functions acting on NumPy arrays can be much more efficient than those using Python lists.
*   NumPy also provides a large collection of basic operators and functions that act on these data structures, as well as submodules with higher-level algorithms such as linear algebra and the fast Fourier transform.
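
The difference between a generic list and a homogeneous, fixed-size array can be illustrated with a small sketch. The variable names are hypothetical, and `np.append` is used here only to show that "growing" an array produces a new one.

```python
import numpy as np

mixed_list = [1, "two", 3.0]   # a list may hold objects of any type
mixed_list.append(4)           # and it can grow in place

typed_array = np.array([1, 2, 3.5])  # the integers are upcast to a common type
print(typed_array.dtype)             # float64

# "Appending" does not resize the array; np.append returns a new array.
longer_array = np.append(typed_array, 4.5)
print(typed_array.size, longer_array.size)  # 3 4
```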


---



*In this section we will look at the basic NumPy data structure for arrays and at various ways to create NumPy arrays.*

In [2]:
import numpy as np

## The NumPy Array Object
*   The main data structure for multidimensional arrays in NumPy is the `ndarray` class.
*   In addition to the data stored in the array, this data structure also contains important metadata about the array, such as its shape, size, data type, and other attributes.

### Basic attributes of the `ndarray` class are:

Attribute | Description
--- | ---
`shape` | A tuple that contains the number of elements (i.e. the length) for each dimension (axis) of the array.
`size` | The total number of elements in the array.
`ndim` | The number of dimensions (axes).
`nbytes` | The number of bytes used to store the data.
`dtype` | The data type of the elements in the array.




Now let's look at some examples that demonstrate the above attributes.

First let's initialize two arrays and check their types.

In [3]:
simple_array = np.array([1, 2, 3, 4, 5])
type(simple_array)

numpy.ndarray

In [4]:
two_dim_array = np.array([[1, 2], [3, 4], [5, 6]])
type(two_dim_array)

numpy.ndarray

Now let's print the values of the arrays we declared.

In [5]:
simple_array

array([1, 2, 3, 4, 5])

In [6]:
two_dim_array

array([[1, 2],
       [3, 4],
       [5, 6]])

In [7]:
four_dim_array = np.array([
    [
        [[1, 2], [2, 3]],
        [[2, 3], [3, 4]],
    ],
    [
        [[5, 6], [6, 7]],
        [[7, 8], [8, 9]],
    ],
])

`ndim` gives the number of dimensions of the ndarray.

In [8]:
simple_array.ndim

1

In [9]:
two_dim_array.ndim

2

In [10]:
four_dim_array.ndim

4

`shape` gives the shape of the array.

In [11]:
simple_array.shape

(5,)

In [12]:
two_dim_array.shape

(3, 2)

In [13]:
four_dim_array.shape

(2, 2, 2, 2)

`size` gives you the number of elements in the array.

In [14]:
simple_array.size

5

In [15]:
two_dim_array.size

6

In [16]:
four_dim_array.size

16

`dtype` gives the data type of the elements of the array.

In [17]:
simple_array.dtype

dtype('int64')

In [18]:
two_dim_array.dtype

dtype('int64')

In [19]:
four_dim_array.dtype

dtype('int64')

`nbytes` gives the number of bytes occupied by the array.

In [20]:
simple_array.nbytes

40

In [21]:
two_dim_array.nbytes

48

In [22]:
four_dim_array.nbytes

128
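
The attributes shown above are related to one another, which the following sketch checks for the two-dimensional example. The `itemsize` attribute (bytes per element) does not appear in the table above and is introduced here only to explain `nbytes`; the `dtype` is set explicitly so the result does not depend on the platform.

```python
import numpy as np

two_dim_array = np.array([[1, 2], [3, 4], [5, 6]], dtype=np.int64)

# size is the product of the entries of shape
print(two_dim_array.size == np.prod(two_dim_array.shape))  # True

# ndim is the length of the shape tuple
print(two_dim_array.ndim == len(two_dim_array.shape))  # True

# nbytes is size times the number of bytes per element (itemsize)
print(two_dim_array.nbytes == two_dim_array.size * two_dim_array.itemsize)  # True
```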



---



# Data Types in NumPy
For scientific work, the most important data types are

*   `int` for integers
*   `float` for floating point numbers
*   `complex` for complex floating point numbers

*The following table lists the basic numerical data types available in NumPy.*

dtype | Variants | Description
--- | --- | ---
int | int8, int16, int32, int64 | Integers
uint | uint8, uint16, uint32, uint64 | Unsigned (non-negative) integers
bool | bool_ | Boolean value (True or False)
float | float16, float32, float64, float128 | Floating-point numbers (float128 is platform dependent)
complex | complex64, complex128, complex256 | Complex-valued floating-point numbers (complex256 is platform dependent)
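
The variants differ in how many bits they use per element, which determines the range of values they can represent. As a minimal sketch, `np.iinfo` and `np.finfo` (not mentioned in the table above) can be used to inspect these limits; the variable names are hypothetical.

```python
import numpy as np

int8_info = np.iinfo(np.int8)
uint8_info = np.iinfo(np.uint8)
print(int8_info.min, int8_info.max)    # -128 127
print(uint8_info.min, uint8_info.max)  # 0 255

# Machine epsilon: the gap between 1.0 and the next representable float32.
float32_eps = np.finfo(np.float32).eps
print(float32_eps)

# A complex128 value stores two float64 numbers, i.e. 16 bytes.
complex128_bytes = np.dtype(np.complex128).itemsize
print(complex128_bytes)  # 16
```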

*Let's see some examples of how you can use the `dtype` argument to create arrays of different data types.*



In [23]:
int_array = np.array([1, 2, 3], dtype=int)  # the np.int alias was removed in NumPy 1.24; use the built-in int
int_array.dtype

dtype('int64')

In [24]:
float_array = np.array([1, 2, 3], dtype=float)  # the np.float alias was removed in NumPy 1.24; use the built-in float
float_array

array([1., 2., 3.])

In [25]:
float_array.dtype

dtype('float64')

In [26]:
complex_array = np.array([1, 2, 3], dtype=complex)  # the np.complex alias was removed in NumPy 1.24; use the built-in complex
complex_array

array([1.+0.j, 2.+0.j, 3.+0.j])

In [27]:
complex_array.dtype

dtype('complex128')

# Data Type Conversion

*   Once a NumPy array is created, you cannot change the type of its elements in place.
*   To change the type, either pass the array to `np.array` with a new `dtype` (type casting) or use the `astype` method; both create a new copy of the array with the new data type.

*Let's see an example of both approaches.*

First, let's see how to use type casting.

In [28]:
int_array = np.array([1, 2, 3])
int_array.dtype

dtype('int64')

In [29]:
float_array = np.array(int_array, dtype=float)  # the np.float alias was removed in NumPy 1.24
float_array.dtype

dtype('float64')

Now let's see how you can use `astype`.

In [30]:
int_array.dtype

dtype('int64')

In [31]:
converted_array = int_array.astype(float)  # the np.float alias was removed in NumPy 1.24
converted_array

array([1., 2., 3.])

In [32]:
converted_array.dtype

dtype('float64')
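
Conversion in the other direction deserves some care. The following sketch, using hypothetical values, shows that `astype` returns a new array, leaves the original untouched, and discards the fractional part when converting floats to integers.

```python
import numpy as np

float_values = np.array([1.7, -2.3, 3.9])
int_values = float_values.astype(np.int64)  # new array; fractional parts are discarded

print(int_values.tolist())   # [1, -2, 3]
print(float_values.dtype)    # float64 (the original array is unchanged)

# The converted array has its own memory, so it is a copy and not a view.
print(np.shares_memory(float_values, int_values))  # False
```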



---



# Complex Numbers

Regardless of the value of the `dtype` attribute, all NumPy array instances have the attributes `real` and `imag` for extracting the real and imaginary parts of the array, respectively.

In [46]:
simple_array = np.array([1, 2, 3])
simple_array

array([1, 2, 3])

In [47]:
simple_array.real

array([1, 2, 3])

In [48]:
simple_array.imag

array([0, 0, 0])

In [49]:
complex_array = np.array([1, 2. + 6.j, 3], dtype=complex)
complex_array

array([1.+0.j, 2.+6.j, 3.+0.j])

In [51]:
complex_array.real

array([1., 2., 3.])

In [50]:
complex_array.imag

array([0., 6., 0.])
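
As a small sketch of how the two attributes fit together, a complex array can be rebuilt from its `real` and `imag` parts. The name `rebuilt` is hypothetical.

```python
import numpy as np

complex_array = np.array([1, 2. + 6.j, 3], dtype=complex)

# Real part plus 1j times the imaginary part gives back the original values.
rebuilt = complex_array.real + 1j * complex_array.imag
print(np.array_equal(complex_array, rebuilt))  # True

# Both parts of a complex128 array are float64 arrays.
print(complex_array.real.dtype, complex_array.imag.dtype)  # float64 float64
```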



---



# Data Generation in NumPy

Let's create some NumPy arrays using NumPy functions for generating arrays.



In [94]:
np.zeros((3, 3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [90]:
np.ones((2, 3))

array([[1., 1., 1.],
       [1., 1., 1.]])

In [91]:
5 * np.ones((2, 3))

array([[5., 5., 5.],
       [5., 5., 5.]])

In [93]:
np.arange(1, 10).reshape(3, 3)

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [86]:
arr = np.arange(1, 10).reshape(3, 3)  # the 3 x 3 array shown above
np.diag(arr)

array([1, 5, 9])

In [87]:
np.diag(arr, 1)

array([2, 6])

In [88]:
np.diag(arr, -1)

array([4, 8])

In [92]:
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
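
`np.diag` works in both directions, which the sketch below illustrates: given a two-dimensional array it extracts a diagonal, and given a one-dimensional array it builds a diagonal matrix. The variable names are hypothetical.

```python
import numpy as np

arr = np.arange(1, 10).reshape(3, 3)

main_diagonal = np.diag(arr)              # 2-D input: extract the main diagonal
diagonal_matrix = np.diag(main_diagonal)  # 1-D input: build a diagonal matrix

print(main_diagonal.tolist())    # [1, 5, 9]
print(diagonal_matrix.tolist())  # [[1, 0, 0], [0, 5, 0], [0, 0, 9]]

# The identity matrix is the diagonal matrix built from a vector of ones.
print(np.array_equal(np.eye(3), np.diag(np.ones(3))))  # True
```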

In [107]:
np.linspace(0, 100, num=10)

array([  0.        ,  11.11111111,  22.22222222,  33.33333333,
        44.44444444,  55.55555556,  66.66666667,  77.77777778,
        88.88888889, 100.        ])

In [110]:
np.logspace(0, 100, num=10)

array([1.00000000e+000, 1.29154967e+011, 1.66810054e+022, 2.15443469e+033,
       2.78255940e+044, 3.59381366e+055, 4.64158883e+066, 5.99484250e+077,
       7.74263683e+088, 1.00000000e+100])
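
The relation between the two functions can be sketched with smaller numbers: `np.linspace` spaces the values evenly, while `np.logspace` spaces the exponents evenly and, with its default base of 10, returns 10 raised to those exponents. Both include the end point by default. The variable names are hypothetical.

```python
import numpy as np

exponents = np.linspace(0, 3, num=4)  # 0, 1, 2, 3 (end point included)
powers = np.logspace(0, 3, num=4)     # 10 ** exponents, i.e. about 1, 10, 100, 1000

print(exponents.tolist())  # [0.0, 1.0, 2.0, 3.0]
print(np.allclose(powers, 10.0 ** exponents))  # True

# In contrast, np.arange excludes the stop value.
print(np.arange(0, 3).tolist())  # [0, 1, 2]
```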

## Now let's see how to generate some random data

In [111]:
np.random.random()

0.23620761233920506

In [114]:
np.random.random((2, 3))

array([[0.47978453, 0.01325121, 0.41059453],
       [0.02296356, 0.89098975, 0.33665389]])

In [115]:
np.random.randn(2, 3)

array([[-0.0632862 , -0.86981947, -1.31234289],
       [-1.15187453,  0.06676025,  0.14312978]])

Let's generate a large array of 10000 random numbers and calculate its mean.

In [120]:
r = np.random.randn(10000)
r.mean()

0.012831166823727029

We can also calculate the variance using the `var` method.

In [121]:
r.var()

0.9853850935250346

You can calculate the standard deviation using the `std` method.

In [122]:
r.std()

0.9926656504206411
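
How these three statistics relate to each other can be checked with a short sketch. The seed is an assumption made only so that the run is repeatable; the variable names are hypothetical.

```python
import numpy as np

np.random.seed(0)  # assumed seed, only to make the run repeatable
samples = np.random.randn(10000)

# By default var() is the mean squared deviation from the mean (divides by N).
manual_var = np.mean((samples - samples.mean()) ** 2)
print(np.isclose(samples.var(), manual_var))  # True

# The standard deviation is the square root of the variance.
print(np.isclose(samples.std(), np.sqrt(samples.var())))  # True

# Passing ddof=1 divides by N - 1 instead, giving the sample variance.
sample_var = samples.var(ddof=1)
print(sample_var > samples.var())  # True
```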

Now let's talk about a common scenario where you have to perform operations on matrices. First, let's create a matrix of size 10000 x 3 with random values.

In [124]:
r = np.random.randn(10000, 3)
r.shape

(10000, 3)

In [125]:
r.mean(axis=0)

array([-0.00435392,  0.01537047, -0.00886807])

In [127]:
r.mean(axis=1).shape  # one mean per row, so 10000 values

(10000,)
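
The meaning of the `axis` argument is easier to see on a small matrix. The following sketch uses a hypothetical 2 x 3 array: `axis=0` collapses the rows and gives one mean per column, while `axis=1` collapses the columns and gives one mean per row.

```python
import numpy as np

small_matrix = np.array([[1.0, 2.0, 3.0],
                         [4.0, 5.0, 6.0]])

column_means = small_matrix.mean(axis=0)
row_means = small_matrix.mean(axis=1)

print(column_means.tolist())  # [2.5, 3.5, 4.5]
print(row_means.tolist())     # [2.0, 5.0]
print(small_matrix.mean())    # 3.5 (mean over all elements)
```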

When you are working with data in machine learning, it is typically organized so that each row is a sample or an observation and each column is a specific measurement.

So in our case we have 10,000 observations and three measurements per observation.

In our 10000 x 3 matrix, each row is an observation and can be treated as a vector. For vector-valued observations, the analog of the variance is the covariance.

This leads us to a function `cov` in NumPy.

In [129]:
np.cov(r).shape  # by default every row is treated as a variable

(10000, 10000)

Here, the `cov` function by default treats each row as a variable and each column as a vector observation, which is why the result is 10000 x 10000. *By the way, this is not the convention in the rest of the NumPy stack.*

We can fix this by passing the transpose.

In [131]:
np.cov(r.T).shape

(3, 3)

Another way to do this is to set the `rowvar` argument to `False`.

In [132]:
np.cov(r, rowvar=False)

array([[ 1.00659868, -0.01199551,  0.0127825 ],
       [-0.01199551,  0.96802723,  0.0042557 ],
       [ 0.0127825 ,  0.0042557 ,  0.99599477]])
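
What `cov` computes can be sketched by hand: center each column, then take the matrix product of the centered data with itself and divide by N - 1, which is the default normalization used by `np.cov`. The seed and the smaller sample size are assumptions made only to keep the sketch quick and repeatable.

```python
import numpy as np

np.random.seed(0)  # assumed seed, only to make the run repeatable
r = np.random.randn(1000, 3)

# Subtract the column means, then form the 3 x 3 matrix of averaged products.
centered = r - r.mean(axis=0)
manual_cov = centered.T @ centered / (r.shape[0] - 1)

print(manual_cov.shape)  # (3, 3)
print(np.allclose(manual_cov, np.cov(r, rowvar=False)))   # True
print(np.allclose(np.cov(r.T), np.cov(r, rowvar=False)))  # True
```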

## Generating arrays with random integer values

In [134]:
np.random.randint(1, 20, size=(3, 3))

array([[ 5, 15,  9],
       [14, 11,  2],
       [18, 18,  2]])

Another useful function is `choice`, which randomly selects items from a one-dimensional input array. If an integer `n` is passed instead of an array, the items are drawn from `np.arange(n)`.

In [138]:
np.random.choice(10, size=(3, 3))

array([[4, 2, 5],
       [8, 0, 5],
       [6, 7, 8]])
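
A short sketch of `choice` applied to an actual array follows. The array of color names and the probabilities are hypothetical, and the seed is assumed only for repeatability. By default items are drawn with replacement; `replace=False` draws each item at most once, and `p` assigns a probability to each item.

```python
import numpy as np

np.random.seed(0)  # assumed seed, only to make the run repeatable
colors = np.array(["red", "green", "blue", "yellow"])

with_replacement = np.random.choice(colors, size=6)                  # repeats are possible
without_replacement = np.random.choice(colors, size=4, replace=False)  # a random permutation
weighted = np.random.choice(colors, size=5, p=[0.7, 0.1, 0.1, 0.1])  # "red" is favored

print(with_replacement.shape)  # (6,)
print(sorted(without_replacement.tolist()) == sorted(colors.tolist()))  # True
```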



---

