# Introduction to NumPy  - MLWithAP

![image.png](attachment:image.png)

![image.png](attachment:image.png)

## Why do we need NumPy?

# Data comes in various forms: video, audio, pictures, text, CSV, database files, etc.

![image-3.png](attachment:image-3.png)


#### Everything needs to be represented in the form of numbers


#### A computer understands only bits (0 and 1), but machine learning models work only with numerical data. Every input has to be converted to a float or int type.


#### There are ways to convert pictures and audio files into numerical values, stored as vectors and matrices. NumPy is handy for this kind of numerical processing.

#### Text data is also converted into numerical data, using various encoding techniques.
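As a minimal sketch of one such encoding (an assumption chosen for illustration; real text pipelines usually use richer schemes such as token IDs or embeddings), the UTF-8 bytes of a string can be viewed as an array of small integers:

```python
import numpy as np

text = "NumPy"
# Encode the string to UTF-8 bytes, then view those bytes as unsigned 8-bit integers.
codes = np.frombuffer(text.encode("utf-8"), dtype=np.uint8)
print(codes)  # [ 78 117 109  80 121]

# The mapping is reversible: the integers decode back to the original text.
decoded = codes.tobytes().decode("utf-8")
print(decoded)  # NumPy
```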

#### Categorical data is converted into numerical data 
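One simple way to do this, sketched here with made-up color labels, is to give each distinct category an integer code. `np.unique` with `return_inverse=True` produces both the categories and the codes:

```python
import numpy as np

colors = np.array(["red", "green", "blue", "green", "red"])

# np.unique returns the sorted distinct categories; return_inverse gives, for each
# element, the index of its category in that sorted array.
categories, codes = np.unique(colors, return_inverse=True)
print(categories)  # ['blue' 'green' 'red']
print(codes)       # [2 1 0 1 2]
```

The integer codes imply an ordering that the categories may not have, so other encodings (for example one-hot) are often preferred for model inputs.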

#### Image data (bitmap (BMP), JPEG, ...): every pixel is denoted by the intensity of light at that pixel. For an 8-bit grayscale (black and white) image this is a single value in the range [0, 255]. A color image has 3 channels per pixel: Red, Green and Blue (RGB). These are the primary colors, and every other color is made by mixing them.
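The pixel layout described above maps directly onto array shapes. The tiny images below are made up for illustration, not loaded from a file:

```python
import numpy as np

# Hypothetical 2x3 grayscale image: one intensity per pixel, 0 (black) to 255 (white).
gray = np.array([[0, 128, 255],
                 [64, 192, 32]], dtype=np.uint8)
print(gray.shape)  # (2, 3) -> (height, width)

# Hypothetical 2x3 color image: three values (R, G, B) per pixel.
rgb = np.zeros((2, 3, 3), dtype=np.uint8)
rgb[0, 0] = [255, 0, 0]  # top-left pixel is pure red
print(rgb.shape)   # (2, 3, 3) -> (height, width, channels)
```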

#### Video is a sequence of pictures (frames), shown at a fixed number of frames per second and overlaid with sound data.
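Ignoring the sound track, a video clip can therefore be sketched as a stack of RGB frames. The frame rate and frame size here are assumed values, kept tiny so the array stays small:

```python
import numpy as np

fps = 24               # assumed frame rate
seconds = 2
height, width = 4, 6   # made-up frame size

# One RGB image per frame, stacked along the first axis.
video = np.zeros((fps * seconds, height, width, 3), dtype=np.uint8)
print(video.shape)     # (48, 4, 6, 3) -> (frames, height, width, channels)
print(video[0].shape)  # (4, 6, 3) -> a single frame
```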

#### Audio files: various packages are available to convert sound files into numerical time-series arrays.
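To show what such a time series looks like without depending on any audio package, this sketch synthesizes a pure tone instead of reading a file. The sample rate, duration and frequency are assumed values:

```python
import numpy as np

sample_rate = 8000   # samples per second (assumed)
duration = 0.5       # seconds
frequency = 440.0    # Hz

# Time stamp of each sample, then the amplitude of the tone at that instant.
t = np.arange(int(sample_rate * duration)) / sample_rate
signal = np.sin(2 * np.pi * frequency * t)
print(signal.shape)  # (4000,)
```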



![image.png](attachment:image.png)


### NumPy official definition

NumPy (Numerical Python) is an **open source Python library** that’s used in almost every field of science and engineering. It’s the universal standard for working with numerical data in Python, and it’s at the core of the scientific Python and PyData ecosystems. NumPy users include everyone from beginning coders to experienced researchers doing state-of-the-art scientific and industrial research and development. **The NumPy API is used extensively in Pandas, SciPy, Matplotlib, scikit-learn, scikit-image and most other data science and scientific Python packages.**

The NumPy library contains multidimensional array and matrix data structures (you’ll find more information about this in later sections). It provides ndarray, a homogeneous n-dimensional array object, with methods to efficiently operate on it. NumPy can be used to perform a wide variety of mathematical operations on arrays. It adds powerful data structures to Python that guarantee efficient calculations with arrays and matrices and it supplies an enormous library of high-level mathematical functions that operate on these arrays and matrices.
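A short sketch of the `ndarray` object described above, using a small made-up matrix to show its main attributes and a couple of the built-in mathematical operations:

```python
import numpy as np

matrix = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

print(matrix.ndim)         # 2 -> number of dimensions
print(matrix.shape)        # (2, 3) -> rows, columns
print(matrix.dtype)        # float64 -> the single type shared by all elements
print(matrix.sum(axis=0))  # [5. 7. 9.] -> column sums
print(matrix.mean())       # 3.5
```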

# Why NumPy when we have Python lists?

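**Homogeneous vs. heterogeneous:** a NumPy array stores elements of a single data type, whereas a Python list can hold objects of different types.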
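The difference is easy to see with a small made-up example: a list keeps each element's own type, while `np.array` picks one type that can hold every element.

```python
import numpy as np

mixed_list = [1, 2.5, "three"]  # a list keeps each element's own type
type_names = [type(x).__name__ for x in mixed_list]
print(type_names)               # ['int', 'float', 'str']

numbers = np.array([1, 2.5, 3])  # the ints are upcast so every element is a float
print(numbers.dtype)             # float64
```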

Python was not designed for heavy numerical processing. It is a **multi-purpose, high-level language** that is simple to learn and use.

People started using it for machine learning use cases. This necessitated Python libraries built specifically for numerical computation. Since ML and DL deal with large amounts of data, these packages need to be fast as well.


# How NumPy arrays are faster than Python lists


# 5 reasons

1. Vectorization: NumPy applies a mathematical operation to a whole array at once. The element-by-element loop runs inside compiled code instead of in Python, which significantly improves performance.

2. C implementation: The core functionality of NumPy is implemented in C, a compiled language that runs numerical loops much faster than interpreted Python. NumPy provides an interface to these C functions, making them accessible from Python.

3. Memory efficiency: NumPy uses contiguous blocks of memory for storing arrays, which makes it more memory-efficient than a Python list of separately allocated objects. This also improves performance, as NumPy can take advantage of hardware caching.

4. Broadcasting: NumPy has a broadcasting feature that allows it to perform operations on arrays of different but compatible shapes. This avoids building expanded copies of the smaller array, which can improve performance.

5. Pre-compiled routines: NumPy provides a set of pre-compiled routines for common mathematical operations, which are optimized for performance.
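The memory-efficiency point can be checked with a rough measurement. This is a sketch that assumes CPython and a 64-bit integer dtype, and the list figure is only an estimate (it counts the list's pointer table plus each integer object):

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# A list stores pointers; each int is a separate Python object with its own overhead.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# An ndarray stores the raw 8-byte values back to back in one block.
array_bytes = np_array.nbytes

print(array_bytes)                     # 8000
print(list_bytes > array_bytes)        # True on CPython
print(np_array.flags["C_CONTIGUOUS"])  # True
```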

#### Vectorization 

In [20]:
# Non-vectorized: explicit Python loop over two lists
a = [1, 2, 3]
b = [4, 5, 6]
result = []
for i in range(len(a)):
    result.append(a[i] + b[i])
result

[5, 7, 9]

In [21]:
# Vectorized: the same addition using ndarrays, with no explicit loop
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
result = a + b

result

array([5, 7, 9])

#### Broadcasting

In [24]:
import numpy as np

a = np.array([1, 2, 3])
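# Broadcasting: the scalar 2 is stretched across every element of a.
# This gives the same result as multiplying by np.array([2, 2, 2]), without building that array.
b = 2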

result = a * b
print(result)


[2 4 6]
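Broadcasting also applies between arrays of different shapes, as long as the shapes are compatible. A small sketch with made-up values, adding a row and then a column to every row or column of a matrix:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])      # shape (2, 3)
row = np.array([10, 20, 30])        # shape (3,)
column = np.array([[100], [200]])   # shape (2, 1)

# The row is reused for each of the 2 rows of the matrix.
plus_row = matrix + row
print(plus_row)
# [[11 22 33]
#  [14 25 36]]

# The column is reused for each of the 3 columns of the matrix.
plus_column = matrix + column
print(plus_column)
# [[101 102 103]
#  [204 205 206]]
```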


# Check the version

In [20]:
import numpy
numpy.__version__



'1.24.2'

# Installation 

In [2]:
!pip install numpy



In [21]:
import timeit
import numpy as np

# Build the same 100,000 integers as a Python list and as a NumPy array
mylist = list(range(100000))
myarray = np.arange(100000)

In [22]:
type(mylist)

list

In [23]:
myarray

array([    0,     1,     2, ..., 99997, 99998, 99999])

In [24]:
type(myarray)

numpy.ndarray

In [25]:

%%timeit
mynewlist = [i*2 for i in mylist]

3.95 ms ± 20.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [26]:
%%timeit
mynewarray = myarray * 2

29.3 µs ± 26.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
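The `%%timeit` magic only works inside IPython or Jupyter. As a sketch of the same comparison in a plain Python script, the standard-library `timeit` module can be used. The variable names here are illustrative, and the measured times will differ from machine to machine:

```python
import timeit
import numpy as np

data_list = list(range(100_000))
data_array = np.arange(100_000)

# Run each statement 100 times and report the total elapsed seconds.
list_time = timeit.timeit(lambda: [i * 2 for i in data_list], number=100)
array_time = timeit.timeit(lambda: data_array * 2, number=100)

print(f"list:  {list_time:.4f} s for 100 runs")
print(f"array: {array_time:.4f} s for 100 runs")
```

For reference, the two `%%timeit` results shown above (3.95 ms versus 29.3 µs) put the array version at roughly 135 times faster on that machine.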


In [27]:
# %%timeit only measures the statement; the variables assigned inside it do not persist, so create them here.

mynewlist = [i*2 for i in mylist]

mynewarray = myarray * 2


In [28]:
type(mynewlist)

list

In [29]:
type(mynewarray)

numpy.ndarray

### Let's check whether mynewlist and mynewarray hold the same values

In [30]:
mynewarrayfromlist = np.array(mynewlist)

In [31]:
type(mynewarrayfromlist)

numpy.ndarray

In [32]:
mynewarray == mynewarrayfromlist

array([ True,  True,  True, ...,  True,  True,  True])

In [33]:
# Test whether all elements are bool True

(mynewarray == mynewarrayfromlist).all()


True
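An alternative to chaining `==` with `.all()` is `np.array_equal`, which compares shape and contents and returns a single boolean. A self-contained sketch on a smaller made-up input:

```python
import numpy as np

doubled_from_list = np.array([i * 2 for i in range(5)])
doubled_vectorized = np.arange(5) * 2

# array_equal checks that the shapes match and that every element is equal.
same = np.array_equal(doubled_from_list, doubled_vectorized)
print(same)  # True
```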

#### NumPy is faster than a list because

1) It uses compiled C functions for the computationally heavy work.
   Why C is faster: it is a statically typed, compiled language, so element types are known in advance instead of being checked at run time.
   The data is also stored more efficiently, as raw values and not as references to separate Python objects.

2) Its constructs avoid explicit for loops: `array * 2` rather than multiplying every element by 2 in a for loop or list comprehension.

3) NumPy ndarrays are a C-compatible data structure, unlike lists, which are not efficient for numerical work.


Nothing comes for free:

There is a trade-off between flexibility and performance when choosing between Python and C. Dynamic typing offers flexibility, but C-compiled libraries offer performance.
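A small sketch of what that trade-off looks like in practice, using made-up values: a list accepts any object, while an integer array forces every assigned value into its fixed type.

```python
import numpy as np

flexible = [1, 2, 3]
flexible[0] = "one"     # a list accepts any object

fixed = np.array([1, 2, 3])
fixed[1] = 2.9          # the float is truncated to fit the integer dtype
print(fixed)            # [1 2 3]

rejected = False
try:
    fixed[0] = "one"    # a non-numeric string cannot be stored in an integer array
except ValueError as err:
    rejected = True
    print("ValueError:", err)
```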