<a href="https://colab.research.google.com/github/Rohan-cherkar/DS-with-Python/blob/init/DS_PY_Lab/23-24/.ipynb_checkpoints/Experiment_01_numpy-checkpoint.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Experiment No :** 01

**Aim :** Learn the basics of the NumPy library for storing and efficiently processing external data in a Python execution pipeline.

**Theory :**  **NumPy** (short for Numerical Python) provides an efficient interface to store and operate on dense data buffers. In some ways, *NumPy arrays* are like Python's built-in *list type*, but NumPy arrays provide much more *efficient storage* and *data operations* as the arrays grow larger in size. NumPy arrays form the core of nearly the entire ecosystem of data science tools in Python.

NumPy is a Python library for working with arrays. It also provides functions for linear algebra, Fourier transforms, and matrix operations. NumPy was created in 2005 by Travis Oliphant. It is an open-source project and can be used freely.

Python lists can serve as arrays, but they are slow to process in bulk. NumPy provides an array object that is often cited as being up to 50x faster than Python lists for numerical work, although the actual speedup depends on the operation and the array size. The array object in NumPy is called *ndarray*, and it comes with many supporting functions that make it easy to work with.
Arrays are used very frequently in data science, where speed and memory use matter.

Unlike lists, NumPy arrays are stored in one contiguous block of memory, so processes can access and manipulate them very efficiently. This property is known as locality of reference in computer science, and it is a major reason why NumPy is faster than lists. NumPy's core routines are also compiled code optimized for modern CPU architectures.
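
The storage difference described above can be checked directly. The following is a minimal sketch, assuming CPython and a 64-bit integer array; the variable names are illustrative, and the list size is only a rough estimate (container plus the individual int objects).

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# A NumPy array keeps its elements in one contiguous block of memory.
is_contiguous = np_arr.flags['C_CONTIGUOUS']

# Raw data size of the array: n elements * 8 bytes each.
array_bytes = np_arr.nbytes

# A list stores pointers to separate int objects, so its footprint is the
# list container plus every individual int object (a rough estimate).
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

print("contiguous:", is_contiguous)
print("array data bytes:", array_bytes)
print("list bytes (container + items):", list_bytes)
```

The exact list figure varies between Python versions, but it is typically several times larger than the array's data buffer.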



**Working :**

In [None]:
import numpy
numpy.__version__

'1.26.4'

**A Python List Is More Than Just a List**

Let's consider now what happens when we use a Python data structure that holds many Python objects. The standard mutable multi-element container in Python is the list. We can create a list of integers as follows:

In [18]:
i=3
L = list(range(8+i))
L

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [None]:
type(L[0])

int

In [None]:
L2 = [str(c) for c in L]
L2

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']

In [None]:
print(L2[1])
type(L2[1])

1


str

In [None]:
L3 = [True, "2", 3.0, 4]
# for item in L3:
#  print(type(item) )
[type(item) for item in L3]

[bool, str, float, int]

## Fixed-Type Arrays in Python

Python offers several different options for storing data in efficient, fixed-type data buffers.
The built-in ``array`` module can be used to create dense arrays of a uniform type:

In [None]:
import array as arr
L = list(range(10))
A = arr.array("d",L)
print(A)
[type(item) for item in A] # it will show float

array('d', [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])


[float, float, float, float, float, float, float, float, float, float]


## Creating Arrays from Python Lists

First, we can use ``np.array`` to create arrays from Python lists:

In [None]:
import numpy as np
# a list mixing integers and a string: NumPy stores every element as a string
L=np.array([1, 4, "2", 5, 3])
print(type(L[2]))
print(type(L[1]))


<class 'numpy.str_'>
<class 'numpy.str_'>


Remember that, unlike a Python list, a NumPy array is constrained to hold elements of a single type. If the types do not match, NumPy will upcast if possible (here, integers are upcast to floating point):

In [None]:
A=np.array([3.14, 4, 2, 3])
type(A[1])

numpy.float64
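
The upcasting rule can be seen by inspecting the ``dtype`` of a few mixed inputs. This is a small sketch with made-up values; the variable names are illustrative only.

```python
import numpy as np

int_and_float = np.array([1, 2, 3.5])        # ints are promoted to float64
float_and_complex = np.array([1.0, 2 + 3j])  # floats are promoted to complex128
int_and_str = np.array([1, "2", 3])          # everything becomes a string

print(int_and_float.dtype)
print(float_and_complex.dtype)
print(int_and_str.dtype.kind)  # 'U' means a Unicode string type
```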

If we want to explicitly set the data type of the resulting array, we can use the ``dtype`` keyword:

In [None]:
L=np.array([1, 2, 3, 4], dtype='int')
type(L[0])

numpy.int64
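
Besides setting ``dtype`` at creation time, an existing array can be converted with the ``astype`` method. The sketch below uses made-up values and shows that converting floats to integers drops the fractional part rather than rounding.

```python
import numpy as np

as_float32 = np.array([1, 2, 3, 4], dtype=np.float32)  # request 32-bit floats
prices = np.array([1.7, 2.2, 3.9])
truncated = prices.astype(np.int64)  # conversion to int drops the fraction

print(as_float32.dtype)   # float32
print(truncated)          # [1 2 3]
```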

Finally, unlike Python lists, NumPy arrays can explicitly be multi-dimensional; here's one way of initializing a multidimensional array using a list of lists:

In [32]:
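# Extra practice (unrelated to the multi-dimensional array example below):
# building a list with a loop
# L = 4
# P = []
# for i in range(L):
#     P.append(i)
# print(type(P))
# range(start, stop, step): odd numbers from 1 to 13
for i in range(1,15,2):
  print(i)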

1
3
5
7
9
11
13


In [35]:
import numpy as np
# nested lists result in multi-dimensional arrays
np.array([range(i, i+4) for i in [2, 4, 6]])

array([[2, 3, 4, 5],
       [4, 5, 6, 7],
       [6, 7, 8, 9]])

## Creating Arrays from Scratch

Especially for larger arrays, it is more efficient to create arrays from scratch using routines built into NumPy.
Here are several examples:

In [41]:
# Create a length-10 integer array filled with ones
np.ones(10, dtype=int)

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

In [49]:
# Create a 3x5 array filled with ones, stored as strings (dtype=str)
np.ones((3, 5), dtype=str)

array([['1', '1', '1', '1', '1'],
       ['1', '1', '1', '1', '1'],
       ['1', '1', '1', '1', '1']], dtype='<U1')

In [51]:
i=3
j=5
# Create a 3x5 floating-point array filled with the value 3
np.full((i, j), 3, dtype=float)

array([[3., 3., 3., 3., 3.],
       [3., 3., 3., 3., 3.],
       [3., 3., 3., 3., 3.]])

In [57]:
# Create an array filled with a linear sequence
# Starting at 0, ending before 20, stepping by 5
# (this is similar to the built-in range() function)
np.arange(0, 20, 5)

array([ 0,  5, 10, 15])

In [59]:
# Create an array of five values evenly spaced between 0 and 1
np.linspace(0, 1, 5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

In [78]:
# Create a 3x3 array of uniformly distributed
# random values between 0 and 1
np.random.random((3, 3))

array([[0.59212179, 0.96256861, 0.76945622],
       [0.34117002, 0.32912445, 0.36310792],
       [0.07580666, 0.75054515, 0.26893642]])

In [79]:
# Create a 3x3 array of normally distributed random values
# with mean 0 and standard deviation 1
np.random.normal(0, 1, (3, 3))

array([[ 0.12645298, -2.39761519, -0.40132324],
       [-0.71885714, -1.85186399,  0.0545185 ],
       [ 2.69510547,  1.4297511 ,  1.63355059]])

In [101]:
# Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(0,10, (3,3))

array([[2, 1, 0],
       [6, 4, 1],
       [3, 9, 8]])

In [82]:
# Create a 3x3 identity matrix
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [103]:
# Create an uninitialized array of three floats (the default dtype)
# The values will be whatever happens to already exist at that memory location
np.empty(3)

array([1., 1., 1.])
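
The same family of routines also covers zero-filled arrays and non-integer fill values. In addition, newer NumPy code often draws random numbers from a ``Generator`` object created with ``np.random.default_rng``. The following is a minimal sketch of these variants; the seed value 42 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

zeros = np.zeros(10, dtype=int)          # length-10 integer array of zeros
pi_grid = np.full((3, 5), 3.14)          # 3x5 array where every value is 3.14

# Generator-based random API; the seed value 42 is an arbitrary choice.
rng = np.random.default_rng(42)
uniform = rng.random((3, 3))                 # uniform values in [0, 1)
normal = rng.normal(0, 1, size=(3, 3))       # mean 0, standard deviation 1
integers = rng.integers(0, 10, size=(3, 3))  # integers in [0, 10)

print(zeros)
print(pi_grid)
print(integers)
```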

## NumPy Standard Data Types

NumPy arrays contain values of a single type, so it is important to have detailed knowledge of those types and their limitations.
Because NumPy is built in C, the types will be familiar to users of C, Fortran, and other related languages.

The standard NumPy data types are listed in the following table.
Note that when constructing an array, they can be specified using a string:

```python
np.zeros(10, dtype='int16')
```

Or using the associated NumPy object:

```python
np.zeros(10, dtype=np.int16)
```
The following table lists the standard NumPy data types:

| Data type	    | Description |
|---------------|-------------|
| ``bool_``     | Boolean (True or False) stored as a byte |
| ``int_``      | Default integer type (same as C ``long``; normally either ``int64`` or ``int32``)|
| ``intc``      | Identical to C ``int`` (normally ``int32`` or ``int64``)|
| ``intp``      | Integer used for indexing (same as C ``ssize_t``; normally either ``int32`` or ``int64``)|
| ``int8``      | Byte (-128 to 127)|
| ``int16``     | Integer (-32768 to 32767)|
| ``int32``     | Integer (-2147483648 to 2147483647)|
| ``int64``     | Integer (-9223372036854775808 to 9223372036854775807)|
| ``uint8``     | Unsigned integer (0 to 255)|
| ``uint16``    | Unsigned integer (0 to 65535)|
| ``uint32``    | Unsigned integer (0 to 4294967295)|
| ``uint64``    | Unsigned integer (0 to 18446744073709551615)|
| ``float_``    | Shorthand for ``float64`` (this alias was removed in NumPy 2.0).|
| ``float16``   | Half precision float: sign bit, 5 bits exponent, 10 bits mantissa|
| ``float32``   | Single precision float: sign bit, 8 bits exponent, 23 bits mantissa|
| ``float64``   | Double precision float: sign bit, 11 bits exponent, 52 bits mantissa|
| ``complex_``  | Shorthand for ``complex128`` (this alias was removed in NumPy 2.0).|
| ``complex64`` | Complex number, represented by two 32-bit floats|
| ``complex128``| Complex number, represented by two 64-bit floats|
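
The ranges in the table can be queried at run time with ``np.iinfo`` (integer types) and ``np.finfo`` (floating-point types). The sketch below also illustrates one practical limitation of fixed-width integers: array arithmetic wraps around on overflow. The variable names are illustrative.

```python
import numpy as np

int8_info = np.iinfo(np.int8)
uint16_info = np.iinfo(np.uint16)
float32_info = np.finfo(np.float32)

print(int8_info.min, int8_info.max)   # -128 127
print(uint16_info.max)                # 65535
print(float32_info.bits)              # 32

# Fixed-width integers wrap around on overflow instead of growing like Python ints.
counter = np.array([250], dtype=np.uint8)
wrapped = counter + np.uint8(10)   # 260 does not fit in 8 bits: 260 % 256 = 4
print(wrapped)                     # [4]
```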

## NumPy Array Attributes

This section covers the most important attributes of NumPy array objects.

Each array has the attributes ``ndim`` (the number of dimensions), ``shape`` (the size of each dimension), ``size`` (the total number of elements), and ``dtype`` (the data type of the elements):

In [105]:
#Consider following sample arrays
np.random.seed(0)  # seed for reproducibility

x1 = np.random.randint(10, size=6)  # One-dimensional array
x2 = np.random.randint(10, size=(3, 4))  # Two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array

In [112]:
print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)
print("dtype:", x3.dtype)
print(x3)
print("x2 ndim", x2.ndim)

x3 ndim:  3
x3 shape: (3, 4, 5)
x3 size:  60
dtype: int64
[[[8 1 5 9 8]
  [9 4 3 0 3]
  [5 0 2 3 8]
  [1 3 3 3 7]]

 [[0 1 9 9 0]
  [4 7 3 2 7]
  [2 0 0 4 5]
  [5 6 8 4 1]]

 [[4 9 8 1 1]
  [7 9 9 3 6]
  [7 2 0 3 5]
  [9 4 4 6 4]]]
x2 ndim 2


Other attributes include ``itemsize``, which lists the size (in bytes) of each array element, and ``nbytes``, which lists the total size (in bytes) of the array:

In [108]:
print("itemsize:", x3.itemsize, "bytes")
print("nbytes:", x3.nbytes, "bytes")

itemsize: 8 bytes
nbytes: 480 bytes
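
These attributes are related: ``nbytes`` equals ``size`` multiplied by ``itemsize``. A short sketch, assuming two arrays of the same shape but different element types (the names ``a64`` and ``a16`` are illustrative):

```python
import numpy as np

a64 = np.zeros((3, 4, 5), dtype=np.int64)
a16 = a64.astype(np.int16)   # same shape, smaller element type

for name, arr in [("int64", a64), ("int16", a16)]:
    # nbytes is always size * itemsize
    print(name, arr.size, arr.itemsize, arr.nbytes)
```

Choosing a smaller element type reduces memory use by the same factor as the reduction in ``itemsize``, provided the values fit in the smaller range.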


## Array Indexing: Accessing Single Elements
Next, we look at how to access a single element of a NumPy array.
NumPy indexing follows Python's conventions: along each dimension, indices start at 0 and run up to length-1.

So ``x1[0]`` is the first element (index 0) and ``x1[5]`` is the sixth element of array ``x1``.

Negative indices count from the end of the array, so ``x1[-1]`` is the last element.

In a multi-dimensional array, items can be accessed using a comma-separated tuple of indices, as shown in the code cells below.

In [111]:
# accessing the second element of the third row
x2[2, 1]

6

In [119]:
# accessing the second-to-last element of the second row
x2[1, -2]

8

In [127]:
# modifying the value at a particular index
x2[1, 2] = 10
x2

array([[ 3,  5,  2,  4],
       [ 7,  6, 10,  8],
       [ 1,  6,  7,  7]])
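
The indexing rules above, including negative indices, can be sketched on small hand-written arrays. One point worth noting is that NumPy arrays have a fixed type, so assigning a floating-point value into an integer array silently truncates it. The arrays below are illustrative, not the notebook's own variables.

```python
import numpy as np

row = np.array([5, 0, 3, 3, 7, 9])
first, last = row[0], row[-1]       # 5 and 9

grid = np.array([[3, 5, 2, 4],
                 [7, 6, 8, 8],
                 [1, 6, 7, 7]])
corner = grid[-1, -1]               # last row, last column -> 7

# Assigning a float into an integer array silently truncates it.
row[0] = 3.14159
print(row)  # [3 0 3 3 7 9]
```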

## Array Slicing: Accessing Subarrays

Just as we can use square brackets to access individual array elements, we can also use them to access subarrays with the *slice* notation, marked by the colon (``:``) character.
The NumPy slicing syntax follows that of the standard Python list; to access a slice of an array ``x``, use this:
``` python
x[start:stop:step]
```
If any of these are unspecified, they default to the values ``start=0``, ``stop=``*``size of dimension``*, ``step=1``.
We'll take a look at accessing sub-arrays in one dimension and in multiple dimensions.

### One-dimensional subarrays

In [138]:
x = np.arange(15)
x

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [141]:
x[:5]  # first five elements

array([0, 1, 2, 3, 4])

In [140]:
x[5:]  # elements from index 5 onward

array([ 5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [134]:
x[4:7]  # sub-array of index 4, 5, 6

array([4, 5, 6])

In [139]:
x[::2]  # every other element

array([ 0,  2,  4,  6,  8, 10, 12, 14])

In [144]:
x[2::2]  # every other element starting at index 2

array([ 2,  4,  6,  8, 10, 12, 14])

In [147]:
x[2:11:2]  # every other element from index 2 up to (not including) index 11

array([ 2,  4,  6,  8, 10])

A potentially confusing case is when the ``step`` value is negative.
In this case, the defaults for ``start`` and ``stop`` are swapped.
This becomes a convenient way to reverse an array:

In [148]:
x[::-1]  # all elements, reversed

array([14, 13, 12, 11, 10,  9,  8,  7,  6,  5,  4,  3,  2,  1,  0])

In [149]:
x[3::-2]  # reversed every other from index 3

array([3, 1])
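
A few more negative-step cases may make the swapped defaults easier to see. This is a small sketch on a ten-element array; the built-in ``slice`` object is included only to show that the bracket notation is shorthand for a start/stop/step triple.

```python
import numpy as np

x = np.arange(10)

reversed_all = x[::-1]      # start defaults to the last index
from_five_back = x[5::-2]   # indices 5, 3, 1
back_to_five = x[:5:-1]     # indices 9, 8, 7, 6 (stops before index 5)

# A slice object stores the same start/stop/step triple explicitly.
every_third = x[slice(1, 8, 3)]   # indices 1, 4, 7

print(reversed_all, from_five_back, back_to_five, every_third)
```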

### Multi-dimensional subarrays

Multi-dimensional slices work in the same way, with multiple slices separated by commas.
For example:

In [None]:
x2[:2, :3]  # This is sub Array of x2 with first two rows and first three columns

In [None]:
# Reverse both the row order and the column order of the array
x2[::-1, ::-1]

#### Accessing array rows and columns

One commonly needed routine is accessing of single rows or columns of an array.
This can be done by combining indexing and slicing, using an empty slice marked by a single colon (``:``):

In [None]:
print(x2[:, 0])  # first column of x2

In [None]:
print(x2[0, :])  # first row of x2
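
For row access the trailing empty slice can be omitted, and a length-one slice keeps a column two-dimensional. A brief sketch on an illustrative 3x4 array (not the notebook's ``x2``):

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)

first_row = grid[0]          # shorthand for grid[0, :]
last_col = grid[:, -1]       # every row, last column -> 1-D array
col_2d = grid[:, 0:1]        # slicing instead of indexing keeps a 3x1 array

print(first_row)
print(last_col)
print(col_2d.shape)
```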

### Subarrays as no-copy views

One important–and extremely useful–thing to know about array slices is that they return *views* rather than *copies* of the array data.
This is one area in which NumPy array slicing differs from Python list slicing: in lists, slices will be copies.
Consider our two-dimensional array from before:

In [None]:
# Extract a 2x2 subarray from x2
x2_sub = x2[:2, :2]
print(x2_sub)

In [None]:
x2_sub[0, 0] = 99
# The statement above modifies not only the subarray but also the original array
print(x2_sub)
print(x2)

Despite the nice features of array views, it is sometimes useful to instead explicitly copy the data within an array or a subarray. This can be most easily done with the ``copy()`` method:

In [None]:
x2_sub_copy[0, 0] = 42
print(x2_sub_copy
print(x2)
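
Whether two arrays share data can be checked with ``np.shares_memory``. The sketch below contrasts a view, a copy, and an ordinary Python list slice; the arrays are illustrative stand-ins rather than the notebook's ``x2``.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)

view = grid[:2, :2]            # slice: shares memory with grid
copy = grid[:2, :2].copy()     # independent data

view[0, 0] = 99                # also changes grid[0, 0]
copy[0, 1] = -1                # grid is unaffected

print(np.shares_memory(grid, view))   # True
print(np.shares_memory(grid, copy))   # False

# Python lists behave differently: slicing a list makes a new list.
items = [0, 1, 2, 3]
items_slice = items[:2]
items_slice[0] = 99
print(items)   # [0, 1, 2, 3]
```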

## Array Concatenation and Splitting

All of the preceding routines worked on single arrays. It's also possible to combine multiple arrays into one, and to conversely split a single array into multiple arrays. We'll take a look at those operations here.


### Concatenation of arrays

Concatenation, or joining of two arrays in NumPy, is primarily accomplished using the routines ``np.concatenate``, ``np.vstack``, and ``np.hstack``.
``np.concatenate`` takes a tuple or list of arrays as its first argument, as we can see here:

In [None]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])

In [None]:
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

In [None]:
# concatenate along the first axis
np.concatenate([grid, grid])

In [None]:
# concatenate along the second axis (zero-indexed)
np.concatenate([grid, grid], axis=1)

When joining arrays of mixed dimensions, it can be clearer to use the ``np.vstack`` (vertical stack) and ``np.hstack`` (horizontal stack) functions as shown below:

In [None]:
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# vertically stack the arrays
np.vstack([x, grid])

In [None]:
# horizontally stack the arrays
y = np.array([[99],
              [99]])
np.hstack([grid, y])
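
It can help to track shapes when concatenating: the sizes along the joining axis add up, while every other axis must match. A minimal sketch with illustrative arrays, including the ``ValueError`` raised on a shape mismatch:

```python
import numpy as np

a = np.zeros((2, 3))
b = np.ones((2, 3))

rows = np.concatenate([a, b], axis=0)   # shape (4, 3): stacked on top of each other
cols = np.concatenate([a, b], axis=1)   # shape (2, 6): placed side by side

# Along the joining axis sizes may differ; every other axis must match.
try:
    np.concatenate([a, np.ones((3, 2))], axis=0)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(rows.shape, cols.shape, mismatch_raised)
```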

### Splitting of arrays

The opposite of concatenation is splitting, which is implemented by the functions ``np.split``, ``np.hsplit``, and ``np.vsplit``.  For each of these, we can pass a list of indices giving the split points:

In [None]:
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)

Notice that *N* split points lead to *N + 1* subarrays.
The related functions ``np.hsplit`` and ``np.vsplit`` are similar:

In [None]:
grid = np.arange(16).reshape((4, 4))
grid

In [None]:
upper, lower = np.vsplit(grid, [2])
print(upper)
print(lower)

In [None]:
left, right = np.hsplit(grid, [2])
print(left)
print(right)
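
Besides a list of split points, ``np.split`` also accepts an integer giving the number of equal sections. When the length does not divide evenly, ``np.array_split`` can be used instead. A short sketch with illustrative data:

```python
import numpy as np

x = np.arange(9)

thirds = np.split(x, 3)                    # integer: that many equal sections
head, middle, tail = np.split(x, [2, 6])   # list: split before indices 2 and 6

# np.array_split tolerates sizes that do not divide evenly.
uneven = np.array_split(np.arange(7), 3)   # section lengths 3, 2, 2

print(thirds)
print(head, middle, tail)
print(uneven)
```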

## Introducing UFuncs

For many types of operations, NumPy provides a convenient interface to statically typed, compiled routines. Operations that use this interface are known as *vectorized* operations.
A vectorized operation is performed by simply applying an operation to the array, which is then applied to each element.
This vectorized approach is designed to push the loop into the compiled layer that underlies NumPy, leading to much faster execution.

In [None]:
# Consider the following loop-based implementation that finds the reciprocal of each element of an array
np.random.seed(0)

def compute_reciprocals(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

big_array = np.random.randint(1, 100, size=1000000)
%timeit compute_reciprocals(big_array)

In [None]:
# Same operation using ufuncs: applying '/' to the whole array
%timeit (1.0 / big_array)

Vectorized operations in NumPy are implemented via *ufuncs*, whose main purpose is to quickly execute repeated operations on values in NumPy arrays. Ufuncs are extremely flexible: the example above operated between a scalar and an array, but ufuncs can also operate between two arrays, including multidimensional arrays.

Computations using vectorization through ufuncs are nearly always more efficient than their counterpart implemented using Python loops, especially as the arrays grow in size.
Any time you see such a loop in a Python script, you should consider whether it can be replaced with a vectorized expression.
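
As a sketch of such a replacement outside IPython (where the ``%timeit`` magic is unavailable), the standard ``timeit`` module can be used to compare a loop with its vectorized equivalent. The array size, seed, and function name below are arbitrary choices, and the measured times will vary from machine to machine.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)            # seed 0 is an arbitrary choice
values = rng.integers(1, 100, size=100_000)

def reciprocals_loop(arr):
    """Element-by-element loop executed by the Python interpreter."""
    out = np.empty(len(arr))
    for i in range(len(arr)):
        out[i] = 1.0 / arr[i]
    return out

loop_result = reciprocals_loop(values)
vector_result = 1.0 / values              # the loop runs inside compiled code

# Both approaches should give the same numbers.
same = np.allclose(loop_result, vector_result)

loop_time = timeit.timeit(lambda: reciprocals_loop(values), number=3)
vector_time = timeit.timeit(lambda: 1.0 / values, number=3)
print(f"loop: {loop_time:.4f}s  vectorized: {vector_time:.4f}s  same: {same}")
```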

## Exploring NumPy's UFuncs

Ufuncs exist in two flavors: *unary ufuncs*, which operate on a single input, and *binary ufuncs*, which operate on two inputs.
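
Every ufunc reports how many inputs it takes through its ``nin`` attribute, which gives a quick way to tell the two flavors apart. A small sketch with illustrative arrays:

```python
import numpy as np

print(np.negative.nin)   # 1 -> unary ufunc
print(np.add.nin)        # 2 -> binary ufunc

x = np.array([1, -2, 3])
y = np.array([10, 20, 30])

negated = np.negative(x)     # one input array
absolute = np.abs(x)         # another unary ufunc
summed = np.add(x, y)        # two input arrays, combined element by element

print(negated, absolute, summed)
```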


### Array arithmetic

NumPy's ufuncs feel very natural to use because they make use of Python's native arithmetic operators.
The standard addition, subtraction, multiplication, and division can all be used:

In [None]:
x = np.arange(4)
print("x     =", x)
print("x + 5 =", x + 5)
print("x - 5 =", x - 5)
print("x * 2 =", x * 2)
print("x / 2 =", x / 2)
print("x // 2 =", x // 2)  # floor division
# a more advanced arithmetic expression: -(x/2 + 1)^2, written with ** for exponentiation
-(0.5*x + 1) ** 2

The following table lists the arithmetic operators implemented in NumPy:

| Operator	    | Equivalent ufunc    | Description                           |
|---------------|---------------------|---------------------------------------|
|``+``          |``np.add``           |Addition (e.g., ``1 + 1 = 2``)         |
|``-``          |``np.subtract``      |Subtraction (e.g., ``3 - 2 = 1``)      |
|``-``          |``np.negative``      |Unary negation (e.g., ``-2``)          |
|``*``          |``np.multiply``      |Multiplication (e.g., ``2 * 3 = 6``)   |
|``/``          |``np.divide``        |Division (e.g., ``3 / 2 = 1.5``)       |
|``//``         |``np.floor_divide``  |Floor division (e.g., ``3 // 2 = 1``)  |
|``**``         |``np.power``         |Exponentiation (e.g., ``2 ** 3 = 8``)  |
|``%``          |``np.mod``           |Modulus/remainder (e.g., ``9 % 4 = 1``)|
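
The equivalence in the table can be verified directly. Calling the ufunc by name also gives access to extra arguments such as ``out``, which writes the result into an existing array. The following is a minimal sketch with illustrative variable names.

```python
import numpy as np

x = np.arange(1, 5)   # [1 2 3 4]

# Each operator is shorthand for the ufunc named in the table.
same_add = np.array_equal(x + 2, np.add(x, 2))
same_pow = np.array_equal(x ** 2, np.power(x, 2))
same_mod = np.array_equal(x % 3, np.mod(x, 3))

# Calling the ufunc directly allows writing into an existing array via `out`.
result = np.empty(4)
np.divide(x, 2, out=result)   # result now holds 0.5, 1.0, 1.5, 2.0

print(same_add, same_pow, same_mod)
print(result)
```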

### Specialized ufuncs

NumPy has many more ufuncs available, including hyperbolic trig functions, bitwise arithmetic, comparison operators, conversions from radians to degrees, rounding and remainders, and much more.
A look through the NumPy documentation reveals a lot of interesting functionality.

Another excellent source for more specialized and obscure ufuncs is the submodule ``scipy.special``.
If you want to compute some obscure mathematical function on your data, chances are it is implemented in ``scipy.special``.
There are far too many functions to list them all, but the following snippet shows a couple that might come up in a statistics context:

In [None]:
# import the special submodule from SciPy
from scipy import special

In [None]:
# Gamma functions (generalized factorials) and related functions
x = [1, 5, 10]
print("gamma(x)     =", special.gamma(x))
print("ln|gamma(x)| =", special.gammaln(x))
print("beta(x, 2)   =", special.beta(x, 2))

Many other special functions, such as the error function and the beta integral, can also be evaluated.

### Aggregates

For binary ufuncs, there are some interesting aggregates that can be computed directly from the object.
For example, if we'd like to *reduce* an array with a particular operation, we can use the ``reduce`` method of any ufunc.
A reduce repeatedly applies a given operation to the elements of an array until only a single result remains.

For example, calling ``reduce`` on the ``add`` ufunc returns the sum of all elements in the array:

In [None]:
x = np.arange(1, 6)
np.add.reduce(x)

In [None]:
np.multiply.reduce(x)

In [None]:
# accumulate keeps every intermediate result instead of only the final one
np.add.accumulate(x)
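
For the common cases, these ufunc methods give the same results as the dedicated NumPy functions ``np.sum``, ``np.prod``, and ``np.cumsum``. The sketch below also shows the ``outer`` method, which applies the operation to every pair of elements from two inputs.

```python
import numpy as np

x = np.arange(1, 6)   # [1 2 3 4 5]

total = np.add.reduce(x)               # 15, same as np.sum(x)
product = np.multiply.reduce(x)        # 120, same as np.prod(x)
running_sum = np.add.accumulate(x)     # same as np.cumsum(x)
running_prod = np.multiply.accumulate(x)

# outer applies the operation to every pair of elements from two inputs.
table = np.multiply.outer(x, x)        # 5x5 multiplication table

print(total, product)
print(running_sum, running_prod)
print(table)
```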

**Answer the Following Questions :** (Type each answer in a separate text cell or code cell, as the question requires)
          


1.   What are ufuncs in NumPy?
2.   What are the various attributes of NumPy array objects?
3.   If you have a 3-dimensional NumPy array object Obj, how do you identify its size, type, and dimensions?
4.   Suppose Obj has dimensions 3*4*2 and you want to convert it to a 4*6 shape. Explain how you would do it. Also create a code cell and demonstrate this using an example random-integer NumPy array of shape 3*4*2 with values between 1 and 50.
5.   Consider the 4*6 array above. Sample a 2*3 subarray from the bottom left of this array and store the result in the variable subObj.

**Conclusion :**  Thus we have learned the basics of the NumPy library for storing and efficiently processing external data in a Python execution pipeline.