
## Why NumPy is Faster Than Python Lists

NumPy arrays are significantly faster than Python lists due to several key factors:

### 1. Homogeneous Data Types:
* **NumPy arrays:** Store elements of the same data type, leading to efficient memory allocation and operations.
* **Python lists:** Can store elements of different data types, requiring more overhead for type checking and memory management.

### 2. Contiguous Memory Allocation:
* **NumPy arrays:** Elements are stored in a continuous block of memory, allowing for faster data access and manipulation.
* **Python lists:** Store references to objects that can be scattered across memory, so reaching each value means following a pointer, leading to slower access and processing.

### 3. Vectorized Operations:
* **NumPy:** Supports vectorized operations, performing calculations on entire arrays at once, leveraging CPU optimizations.
* **Python lists:** Require element-wise operations using loops, which is generally slower.

### 4. C-Based Implementation:
* **NumPy:** Many core operations are implemented in C, providing significant performance gains compared to Python's interpreted nature.
* **Python lists:** Rely on Python's interpreter for operations, which introduces overhead.

### 5. Optimized for Numerical Computations:
* **NumPy:** Specifically designed for numerical computations, offering a rich set of mathematical functions.
* **Python lists:** Are general-purpose data structures, not optimized for numerical calculations.


For operations on large numeric data, such as summing or squaring millions of values, the NumPy array will typically be significantly faster.

**In summary,** NumPy's specialized design, efficient memory layout, vectorized operations, and C-based implementation make it a superior choice for numerical computations compared to Python lists.
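The first two points can be checked directly on an array. The following is a minimal sketch, assuming NumPy is already installed; the variable names are only illustrative.

```python
import numpy as np

values = np.array([1, 2, 3, 4], dtype=np.int64)

# Every element shares one dtype and therefore one fixed size in bytes.
print(values.dtype)     # int64
print(values.itemsize)  # 8 bytes per element
print(values.nbytes)    # 32 bytes for the 4 elements

# The data sits in one contiguous block of memory.
print(values.flags["C_CONTIGUOUS"])  # True

# A list, by contrast, can mix types because it holds references to objects.
mixed = [1, 2.5, "three"]
print([type(item).__name__ for item in mixed])  # ['int', 'float', 'str']
```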


### Install numpy:

```
!pip install numpy  # in jupyter

pip install numpy # if you are installing via CLI
```

In [None]:
!pip install numpy



### Import numpy with alias as np

In [None]:
import numpy as np

### Convert List to Array

In [None]:
a = [1, 2, 3, 4]

type(a)

list

In [None]:
a_arr = np.array(a)
type(a_arr)

numpy.ndarray

In [None]:
a = [1, 2, 3, "4"]

type(a)

list

In [None]:
a_arr = np.array(a)
type(a_arr)

numpy.ndarray

In [None]:
a_arr

array(['1', '2', '3', '4'], dtype='<U21')

In [None]:
a = [1, 2, 3, "Bhargav"]
# automatic type conversion
a_arr = np.array(a)
a_arr

array(['1', '2', '3', 'Bhargav'], dtype='<U21')

In [None]:
a = [1, 2, 3, 4.0]
# automatic type conversion
a_arr = np.array(a)
a_arr

array([1., 2., 3., 4.])

### NumPy supports element-wise operations:

NOTE: Read about Vectorization of Code

In [None]:
# generate square of below list
list1 = list(range(10000000))
list1 = [i**2 for i in list1]
list1[:5]

[0, 1, 4, 9, 16]

Using NumPy: a plain list does not support `**`, so it has to be converted to an array first.

In [None]:
list1**2

TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'

In [None]:
list1 = list(range(10000000))
list1_arr = np.array(list1)

In [None]:
list1_arr**2 # element wise operations

array([             0,              1,              4, ...,
       99999940000009, 99999960000004, 99999980000001])

In [None]:
import time

start = time.time() # UNIX TIMESTAMP
start

1723861328.0014448

In [None]:
end = time.time() # UNIX TIMESTAMP
end

1723861342.500823

In [None]:
end - start

14.499378204345703

### Speed Comparisons:

Sum All Numbers in list/array

In [None]:
import numpy as np
import time

# Create a large list and NumPy array
list_data = list(range(10000000))
array_data = np.array(list_data)

# Time the sum operation: ListData
start_time = time.time()
sum_list = sum(list_data) # sum of all number
end_time = time.time()
list_time = end_time - start_time

# Time the sum operation: ArrayData
start_time = time.time()
sum_array = np.sum(array_data)
end_time = time.time()
array_time = end_time - start_time

print("List time:", list_time)
print("Array time:", array_time)
print("Speedup:", list_time / array_time)

List time: 0.06828856468200684
Array time: 0.007622957229614258
Speedup: 8.958277296468896


square all numbers of a list/array

In [None]:
import numpy as np
import time

# Create a large list and NumPy array
list_data = list(range(10000000))
array_data = np.array(list_data)

# Time the square operation: ListData
start_time = time.time()
sq_list = [i**2 for i in list_data]
end_time = time.time()
list_time = end_time - start_time

# Time the square operation: NumPy
start_time = time.time()
sq_array = array_data**2
end_time = time.time()
array_time = end_time - start_time

print("List time:", list_time)
print("Array time:", array_time)
print("Speedup:", list_time / array_time)

List time: 3.8075389862060547
Array time: 0.02696681022644043
Speedup: 141.19352471553483
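A single `time.time()` measurement can be noisy. As a hedged alternative, the sketch below repeats each measurement with the standard-library `timeit` module and keeps the fastest run. The array size and repeat counts are arbitrary choices, and the exact speedup will vary by machine.

```python
import timeit

import numpy as np

n = 1_000_000  # smaller than the 10,000,000 used above so the cell runs quickly
list_data = list(range(n))
array_data = np.arange(n)

# Time each statement once per trial, run 3 trials, and keep the fastest one.
list_time = min(timeit.repeat(lambda: [i**2 for i in list_data], number=1, repeat=3))
array_time = min(timeit.repeat(lambda: array_data**2, number=1, repeat=3))

print("List time:", list_time)
print("Array time:", array_time)
print("Speedup:", list_time / array_time)
```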


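Vectorization: the operation is applied to the whole array at once in compiled code instead of element by element in a Python loop, which lets NumPy use optimized CPU instructions that can process several elements at a time.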

In [None]:
# Create a large list and NumPy array
list_data = list(range(10000000))
array_data = np.array(list_data)

## Dimension of an array

In [None]:
array_data.ndim

1

shape of an array

In [None]:
array_data.shape

(10000000,)

In [None]:
arr2 = np.array([[1,2], [3,4], [6,7]])
arr2.ndim

2

In [None]:
arr2.shape # dimension wise shape

(3, 2)

In [None]:
len(arr2)

3

* `func(arr2)`: you are calling a function and passing `arr2` to it, for example `len(arr2)`.
* `arr2.func()`: you are calling a method of the array class using the `arr2` object.




In [None]:
a = np.array([1,2,3,4,5,6,7,8])
print(a.ndim, a.shape)

1 (8,)


### How do I get the total number of elements then?
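The `size` attribute answers this: it holds the total number of elements, which equals the product of the entries in `shape`. A small sketch using the same arrays as above:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8])
arr2 = np.array([[1, 2], [3, 4], [6, 7]])

print(a.size)     # 8
print(arr2.size)  # 6, i.e. 3 rows * 2 columns
print(len(arr2))  # 3, len() only counts along the first dimension
```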

### range function:

In [None]:
# Create a large list and NumPy array
list_data = list(range(10000000))
array_data = np.array(list_data)

### arange
```
np.arange(start, stop, step)  # stop itself is not included
```


In [None]:
array_data = np.arange(10000000)
array_data.shape

(10000000,)

In [None]:
array_data = np.arange(1, 5, 2)
array_data.shape

(2,)

In [None]:
array_data

array([1, 3])

#### Unlike range, arange can also take a float as the step size

In [None]:
list_data = list(range(1,5,0.5))
list_data

TypeError: 'float' object cannot be interpreted as an integer

In [None]:
array_data = np.arange(1, 5, 0.5)
array_data

array([1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5])

In [None]:
array_data = np.arange(1, 5, 2)

array_data ** 2

array([1, 9])

In [None]:
array_data

array([1, 3])

In [None]:
array_data + 2  # added to [1, 9] or to [1, 3]? array_data ** 2 did not modify array_data

array([3, 5])

In [None]:
# Can we overwrite multiple elements as well?

In [None]:
array_data = np.arange(1, 5)
array_data

array([1, 2, 3, 4])

In [None]:
array_data[2:4]

array([3, 4])

In [None]:
array_data[2:4] = -99

In [None]:
array_data

array([  1,   2, -99, -99])

In [None]:
array_data = np.arange(1, 10)
array_data[5:]

array([6, 7, 8, 9])

In [None]:
array_data[5:] + 10

array([16, 17, 18, 19])

In [None]:
array_data[5:] = 10
array_data[5:]

array([10, 10, 10, 10])

In [None]:
array_data

array([ 1,  2,  3,  4,  5, 10, 10, 10, 10])

### Data Conversion

In [58]:
arr4 = np.array([1, 2, 3, 4])
arr4.dtype

dtype('int64')

In [59]:
arr4 = np.array([1, 2, 3, 4.0])
arr4.dtype

dtype('float64')

Type conversion while creating an array:

In [62]:
arr4 = np.array([1, 2, 3, 4], dtype = "float")
print(arr4)
print(arr4.dtype)

[1. 2. 3. 4.]
float64


In [63]:
arr = np.array(["1", 2, 3, 4], dtype = "int")
print(arr)
print(arr.dtype)

[1 2 3 4]
int64


In [64]:
int("1")

1

In [65]:
arr = np.array(["a", 2, 3, 4], dtype = "int")
print(arr)
print(arr.dtype)

ValueError: invalid literal for int() with base 10: 'a'

In [66]:
arr = np.array(["1", 2, 3, 4])
print(arr)
print(arr.dtype)

['1' '2' '3' '4']
<U21


`astype`: an array method that can be used to convert the type after the array is created

In [67]:
arr.astype('int')

array([1, 2, 3, 4])
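Note that `astype` returns a new array and leaves the original untouched, so the result has to be assigned to a name to keep it. A brief sketch (the name `converted` is only illustrative):

```python
import numpy as np

arr = np.array(["1", "2", "3", "4"])
converted = arr.astype("int")

print(converted)             # [1 2 3 4]
print(arr)                   # ['1' '2' '3' '4'] (unchanged)
print(converted.dtype.kind)  # 'i', a signed integer type
```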

#### INDEXING

In [68]:
m1 = np.arange(12)
m1

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [69]:
m1[0]

0

In [70]:
m1[-1]

11

In [71]:
m1_list = list(range(12))
m1_list

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

In [72]:
m1_list[3], m1_list[5]

(3, 5)

In [73]:
m1[3], m1[5]

(3, 5)

Accessing multiple elements with indexing:

Create a list (or array) of all the indexes you want to access and pass it as the index.

In [74]:
m1[[3, 5]]

array([3, 5])

In [75]:
m1[[3, 5, 9]]

array([3, 5, 9])

In [76]:
m1[[3, 5, 3,5]]

array([3, 5, 3, 5])

In [77]:
m1 = np.array([100,200,300,400,500,600])
indexes = [2,3,4,2]
m1[indexes]

array([300, 400, 500, 300])

In [79]:
m1[3, 5, 3, 5]

IndexError: too many indices for array: array is 1-dimensional, but 4 were indexed

Slicing

In [80]:
m1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
#--->


In [81]:
m1[:3]

array([1, 2, 3])

In [82]:
m1[-5:]

array([ 6,  7,  8,  9, 10])

In [83]:
m1[-5:-2]

array([6, 7, 8])

In [84]:
m1[5:2]

array([], dtype=int64)

In [85]:
m1[500:20000]

array([], dtype=int64)

In [86]:
import numpy as np
a = np.array([0,1,2,3,4,5])
a[4:] = 10 # index 4 onwards will be replaced
print(a)

[ 0  1  2  3 10 10]


### Masking (Fancy Indexing):
Used for indexing via Boolean arrays: only the positions where the mask is `True` are kept.

In [87]:
m1 = np.array([1, 2, 3, 4])

In [94]:
bool_ar  = [True, False, True, True]

In [95]:
m1[[0,2,3]]

array([1, 3, 4])

In [96]:
m1[bool_ar]

array([1, 3, 4])

Extract all people who have a height greater than 150 cm

In [97]:
heights = np.array([100, 150, 180, 192])
heights

array([100, 150, 180, 192])

In [98]:
heights > 150

array([False, False,  True,  True])

In [99]:
heights[heights > 150]

array([180, 192])

In [100]:
m1 = np.array([1, 2, 3, 4])
# extract all elements which are even
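One possible way to do this, following the same pattern as the heights example, is to build the mask with the modulo operator. The name `is_even` is only illustrative.

```python
import numpy as np

m1 = np.array([1, 2, 3, 4])

# An element is even when its remainder after division by 2 is 0.
is_even = m1 % 2 == 0  # array([False,  True, False,  True])
print(m1[is_even])     # [2 4]
```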

Quiz

In [101]:
a = np.array([1,2,3,4,5])
b = np.array([8,7,6])
a[2:]

array([3, 4, 5])

In [102]:
b[::-1] # reverse of b

array([6, 7, 8])

In [103]:
a[2:] = b[::-1]
print(a)

[1 2 6 7 8]


In [104]:
a = np.array([1,2,3,4,5])
b = np.array([8,7,6])
a[3:] = b[::-2]
print(a)

[1 2 3 6 8]


USE-CASE

### NPS:
textfile:

In [105]:
url = 'https://drive.google.com/file/d/1c0ClC8SrPwJq5rrkyMKyPn80nyHcFikK/view?usp=sharing'

#### steps:
1) read file into numpy array

2) calculate count of detractors and promoters via Bool Index

3) calculate %

4) Calculate and print NPS
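The steps above can be sketched as follows. Because the survey file is not reproduced here, the sketch uses a small made-up `scores` array in place of step 1. It also assumes the common NPS convention that scores of 9 or 10 are promoters and scores of 0 to 6 are detractors. With a local text file holding one score per line, `np.loadtxt` could be used to fill the array instead.

```python
import numpy as np

# Step 1 (stand-in): hypothetical survey scores on a 0-10 scale.
scores = np.array([10, 9, 8, 7, 6, 3, 10, 9, 5, 10])

# Step 2: count each group with Boolean masks (True counts as 1 when summed).
promoters = (scores >= 9).sum()
detractors = (scores <= 6).sum()

# Step 3: convert the counts to percentages of all responses.
pct_promoters = 100 * promoters / scores.size
pct_detractors = 100 * detractors / scores.size

# Step 4: NPS is the percentage of promoters minus the percentage of detractors.
nps = pct_promoters - pct_detractors
print("NPS:", nps)
```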

## POSTREAD: Broadcasting in NumPy

**Broadcasting** is a powerful feature in NumPy that allows arithmetic operations on arrays of different shapes. Essentially, NumPy stretches the smaller array to match the shape of the larger array before performing the operation. This is done efficiently without creating unnecessary copies of data.

### Rules for Broadcasting

1. **Shape compatibility:** Arrays must be compatible in shape. Comparing the shapes from the last dimension backwards, each pair of dimensions must either be equal or one of them must be 1.
2. **Stretching:** Dimensions of size 1, and missing leading dimensions, are stretched to match the larger array by repeating elements.
3. **Resulting shape:** The resulting array has the maximum size along each dimension of the input arrays.
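To make the rules concrete, the sketch below checks a few shape combinations, including one that is incompatible. The arrays are arbitrary illustrations.

```python
import numpy as np

a = np.ones((3, 4))
row = np.arange(4)                # shape (4,), treated as (1, 4) and stretched to (3, 4)
col = np.arange(3).reshape(3, 1)  # shape (3, 1), stretched to (3, 4)

print((a + row).shape)    # (3, 4)
print((a + col).shape)    # (3, 4)
print((row + col).shape)  # (3, 4): both operands are stretched

# Incompatible: the trailing dimensions are 4 and 3, and neither is 1.
try:
    a + np.arange(3)
except ValueError as err:
    print("ValueError:", err)
```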



In [None]:
#### Example 1: Adding a scalar to an array
import numpy as np

a = np.array([1, 2, 3])
b = 2

# The scalar b is broadcast to match the shape of a,
# and then the addition is performed element-wise.
c = a + b  # [1 + 2, 2 + 2, 3 + 2]
print(c)

[3 4 5]


In [None]:

#### Example 2: Adding arrays of different shapes
a = np.array([[1, 2], [3, 4]])
b = np.array([10, 20])

# b has shape (2,), so it is broadcast across the rows of a:
# the same [10, 20] is added to every row.
c = a + b
print(c)


[[11 22]
 [13 24]]


In [None]:
#### Example 3: Multiplying arrays of different shapes
a = np.array([[1, 2], [3, 4]])
b = np.array([[1], [2]])

# b has shape (2, 1), so it is broadcast across the columns of a:
# the first row is multiplied by 1 and the second row by 2.
c = a * b
print(c)

[[1 2]
 [6 8]]



### Why Broadcasting is Useful

* **Efficiency:** Broadcasting often leads to significant performance improvements over explicit loops.
* **Conciseness:** It allows for writing cleaner and more readable code.
* **Flexibility:** It enables operations on arrays with different shapes, providing more flexibility.


By understanding broadcasting, you can write more efficient and expressive NumPy code.

In [None]:
np.arange(12)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [None]:
arr = np.array([-3,4,27,34,-2, 0, -45,-11,4, 0])
print(np.where(arr))  # with a single argument, np.where returns the indices of the non-zero elements


(array([0, 1, 2, 3, 4, 6, 7, 8]),)
