### **NumPy**

NumPy is a powerful library for numerical computing in Python, and it offers several advantages over Python's built-in lists or arrays for scientific and mathematical tasks.

**Why use NumPy?**

1. Performance and Efficiency
    - Faster Computations: NumPy arrays (ndarrays) are implemented in C and use contiguous memory allocation, which leads to faster operations compared to Python lists.
    - Memory Efficiency: NumPy arrays use less memory than Python lists because they store elements of the same data type and have a more compact memory layout.
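The memory claim above can be checked with a rough sketch like the one below. It assumes CPython and counts the list container plus each float object it points to; exact byte counts vary by Python version and platform, so treat the numbers as indicative only.

```python
import sys
import numpy as np

n = 1000

# A Python list stores pointers to separately allocated float objects
data_list = [float(i) for i in range(n)]
list_bytes = sys.getsizeof(data_list) + sum(sys.getsizeof(x) for x in data_list)

# A NumPy array stores the raw 8-byte float64 values side by side
data_array = np.arange(n, dtype=np.float64)
array_bytes = data_array.nbytes

print("List (container + float objects):", list_bytes, "bytes")
print("NumPy array data buffer:", array_bytes, "bytes")
```

The array's data buffer is exactly 8 bytes per element, while the list pays for a pointer and a full Python object per element.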

#### Task

The code below highlights one advantage of NumPy over Python lists for numerical computation: calculating the mean (average) of a large dataset.
Run the code in the IDE and notice the following!

The time taken by the list method is significantly higher than the time taken by the NumPy method.

In [1]:
import random
import time
import numpy as np

# Method 1: Using Lists
# Generate a large list of random numbers
data_list = [random.random() for _ in range(1000000)]

# Calculate the mean using Python lists
start = time.time()
mean_list = sum(data_list) / len(data_list)
end = time.time()
print("Mean (list):", mean_list)
print("Time taken (list):", end - start)

# Method 2: Using NumPy
# Generate a large NumPy array of random numbers
data_array = np.random.random(1000000)

# Calculate the mean using NumPy
start = time.time()
mean_array = np.mean(data_array)
end = time.time()
print("Mean (NumPy):", mean_array)
print("Time taken (NumPy):", end - start)

Mean (list): 0.49950573609749827
Time taken (list): 0.011039972305297852
Mean (NumPy): 0.5001312564278734
Time taken (NumPy): 0.0021440982818603516


2. Mathematical and Statistical Functions
    - Built-in Mathematical Functions: NumPy provides a wide range of mathematical functions for operations such as trigonometric functions, logarithms, and exponentials.
    - Statistical Functions: Functions like mean, median, standard deviation, variance, etc., are available out-of-the-box for efficient computation.
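As a small illustration of the functions listed above, the sketch below applies a few of them to whole arrays at once. The sample values are arbitrary and chosen only to make the results easy to verify by hand.

```python
import numpy as np

angles = np.array([0.0, np.pi / 2, np.pi])
values = np.array([1.0, 2.0, 4.0, 8.0])

sines = np.sin(angles)          # sine of every angle
logs = np.log2(values)          # base-2 logarithm of every value
exps = np.exp(np.array([0.0, 1.0]))  # e**0 and e**1
median = np.median(values)      # middle value of the data
variance = np.var(values)       # population variance

print(sines)
print(logs)
print(exps)
print(median, variance)
```

Each function works element by element (or over the whole array for the statistics), so no explicit loop is needed.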

#### Task
The code below computes the mean and standard deviation with and without NumPy.
Run the code in the IDE and notice the following!

The built-in statistical functions such as mean and std make the code much simpler.

In [2]:
import math
import random
import numpy as np

# Method 1: Using regular Python
# Generate a large list of random numbers
data_list = [random.random() for _ in range(1000)]

# Calculate the mean and standard deviation
mean_list = sum(data_list) / len(data_list)
variance_list = sum((x - mean_list) ** 2 for x in data_list) / len(data_list)
std_dev_list = math.sqrt(variance_list)
print("Mean (list):", mean_list)
print("Standard Deviation (list):", std_dev_list)

# Method 2: Using NumPy
# Generate a large NumPy array of random numbers
data_array = np.random.random(1000)

# Calculate the mean and standard deviation
mean_array = np.mean(data_array)
std_dev_array = np.std(data_array)
print("Mean (NumPy):", mean_array)
print("Standard Deviation (NumPy):", std_dev_array)


Mean (list): 0.5075377113332489
Standard Deviation (list): 0.2901329667888275
Mean (NumPy): 0.504493419252264
Standard Deviation (NumPy): 0.28555863483380456


3. Array Manipulation and Operations
    - Reshaping Arrays: NumPy allows you to easily reshape arrays, perform transpositions, and manage multidimensional data.
    - Broadcasting: NumPy supports broadcasting, which allows operations on arrays of different shapes in a way that is both intuitive and efficient.
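Broadcasting is easiest to see with a tiny example. The sketch below uses made-up values: a scalar and a 1D row are each combined with a 2x3 matrix without writing any loops.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])       # shape (2, 3)
row = np.array([10, 20, 30])         # shape (3,)

scaled = matrix * 2                  # the scalar is applied to every element
shifted = matrix + row               # the row is added to each row of the matrix

print(scaled)
print(shifted)
```

Here the row of shape (3,) is stretched across both rows of the (2, 3) matrix, because their trailing dimensions match.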

#### Task
The code below reshapes and transposes data with and without NumPy.
Run the code in the IDE and notice the following!

With Python lists, we need nested loops or nested list comprehensions to achieve the same results, which can be cumbersome and less efficient.

In [3]:
import numpy as np

# Method 1: Using Python Lists
# Create a 1D list
data_list = [1, 2, 3, 4, 5, 6]

# Reshape into a 2x3 "matrix"
data_matrix = [data_list[i:i + 3] for i in range(0, len(data_list), 3)]
print("Reshaped list (2x3):")
print(data_matrix)

# Transpose the "matrix"
transposed_matrix = [[data_matrix[j][i] for j in range(len(data_matrix))] for i in range(len(data_matrix[0]))]
print("Transposed list (3x2):")
print(transposed_matrix)

# Method 2: Using NumPy
# Create a 1D NumPy array
data_array = np.array([1, 2, 3, 4, 5, 6])

# Reshape into a 2x3 matrix
reshaped_array = data_array.reshape((2, 3))
print("Reshaped array (2x3):")
print(reshaped_array)

# Transpose the matrix
transposed_array = reshaped_array.T
print("Transposed array (3x2):")
print(transposed_array)


Reshaped list (2x3):
[[1, 2, 3], [4, 5, 6]]
Transposed list (3x2):
[[1, 4], [2, 5], [3, 6]]
Reshaped array (2x3):
[[1 2 3]
 [4 5 6]]
Transposed array (3x2):
[[1 4]
 [2 5]
 [3 6]]


#### Why use NumPy
To summarise, NumPy is used for the following main reasons:

1. Performance and Efficiency
2. Mathematical and Statistical Functions
3. Array Manipulation and Operations
4. Integration with Other Libraries
5. Handling Large Datasets

As a result, NumPy finds applications in

1. Machine Learning
2. Data Science
3. Image and Signal Processing and other fields

### NumPy
NumPy (short for "Numerical Python") is a library for the Python programming language. We import NumPy in Python using the import statement.

In [4]:
import numpy as np

- The code above imports the NumPy library under the alias np.
- Without the alias (that is, with a plain import numpy), every NumPy function call must be written with the full name numpy instead of np, for example numpy.array(...).

In [5]:
import numpy as np

# Create a 1D NumPy array
data_array = np.array([1, 2, 3, 4, 5, 6])

print(data_array)

[1 2 3 4 5 6]


#### NumPy array using a Python List
We can create a NumPy array by passing a Python list as the value.

In [6]:
# create a list named num
num = [2, 4, 6, 8]

# create numpy array using num
array1 = np.array(num)
array1

array([2, 4, 6, 8])
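The same np.array call also accepts nested lists. The sketch below is an illustrative extension of the example above: it builds a 2D array and shows how NumPy picks a single data type for all the elements.

```python
import numpy as np

# A list of lists becomes a 2D array
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(matrix.ndim)     # number of dimensions
print(matrix.shape)    # (rows, columns)

# Mixing integers and floats promotes every element to float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)
```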

#### Task
Create a NumPy array containing the first 6 whole numbers and print it to the console.

In [7]:
# Update your code below this line
import numpy as np
num = [0, 1, 2, 3, 4, 5]
array1 = np.array(num)
print(array1)

[0 1 2 3 4 5]


### NumPy array using randint
We can also create a NumPy array containing 'n' random integers using the np.random.randint function.

In [9]:
# Create a 1D array of 5 random integers from 0 to 9 (10 is excluded)
random_array = np.random.randint(0, 10, size=5)
random_array

array([1, 1, 8, 0, 8], dtype=int32)

np.random.randint(low, high, size): Generates random integers from low (inclusive) to high (exclusive).

- low: The lowest integer to be drawn from the distribution (inclusive).
- high: The highest integer to be drawn from the distribution (exclusive).
- size: The shape of the output array. In this case, size=5 means a 1D array with 5 elements.
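Because size describes a shape, it can also be a tuple. The sketch below is an illustration with arbitrary bounds: it draws a 2x3 grid and simulates dice rolls to show that high is never returned.

```python
import numpy as np

# size can be a tuple to request a multidimensional array
grid = np.random.randint(0, 10, size=(2, 3))
print(grid.shape)

# Simulate 1000 rolls of a six-sided die: low=1 is included, high=7 is excluded
rolls = np.random.randint(1, 7, size=1000)
print(rolls.min(), rolls.max())
```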

Note that the following syntax creates an array of 5 random floating-point numbers drawn uniformly from the interval [0, 1), not integers.

In [10]:
array1 = np.random.rand(5)
array1

array([0.2392105 , 0.01875225, 0.84951234, 0.5153592 , 0.77030685])

### Task

In [11]:
import numpy as np

# generate an array of 5 random floats in [0, 1)
array1 = np.random.rand(5)
print(array1)

# generate an array of 5 random integers from 0 to 9 (10 is excluded)
random_array = np.random.randint(0, 10, size=5)
print(random_array)


[0.73234566 0.95439108 0.13219009 0.53905332 0.56978744]
[7 8 2 5 6]


### NumPy empty array
We can create an uninitialised ("empty") NumPy array using the empty() function.

In [12]:
num = np.empty(5)
num

array([0.73234566, 0.95439108, 0.13219009, 0.53905332, 0.56978744])

The above code creates an array of length 5 without initialising its elements.

Note that empty() does not set the elements to any particular value; they hold whatever values were already present in the allocated memory. This is why the output above can look like leftover data from an earlier cell.
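Since the initial contents are arbitrary, empty() is typically used only when every element will be overwritten before it is read. A minimal sketch of that pattern, using a hypothetical squares array:

```python
import numpy as np

squares = np.empty(5)        # contents are arbitrary at this point

# Overwrite every slot before reading any of them
for i in range(5):
    squares[i] = i ** 2

print(squares)
```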

Similar functions are the zeros() and ones() functions

In [13]:
num0 = np.zeros(4)
print(num0)

num1 = np.ones(3)
print(num1)

[0. 0. 0. 0.]
[1. 1. 1.]


num0 will have 4 elements all initialised to 0. num1 will have 3 elements all initialised to 1.
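Both functions also accept a shape tuple and a dtype argument. The sketch below is an illustrative variation on the example above, not part of the original exercise.

```python
import numpy as np

# A 2x3 array of float zeros
zero_grid = np.zeros((2, 3))
print(zero_grid)

# Four integer ones instead of the default floats
int_ones = np.ones(4, dtype=int)
print(int_ones)
```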

In [14]:
import numpy as np

# create an array with 4 elements filled with zeros
num0 = np.zeros(4)

# create an array with 4 elements filled with ones
num1 = np.ones(4)

# create an uninitialised array of length 4
num = np.empty(4)

print(num0)
print(num1)
print(num)

[0. 0. 0. 0.]
[1. 1. 1. 1.]
[0.00000000e+000 1.04133248e-311 1.04133248e-311 1.04133248e-311]


### NumPy array using arange()
The np.arange() function in NumPy is used to create an array of evenly spaced values within a given range. It is similar to the built-in Python range() function but returns a NumPy array.

``` numpy.arange(start, stop, step, dtype=None)```

- start (optional): The starting value of the sequence. The default is 0.
- stop: The end value of the sequence. The sequence does not include this value.
- step (optional): The spacing between values. The default is 1.
- dtype (optional): The desired data type of the array.

In [16]:
# Basic Usage
arr = np.arange(10)
print(arr)      # Output: [0 1 2 3 4 5 6 7 8 9] since default start is 0 and default step is 1
# Create an array from 1 to 9 with a step of 2.
arr = np.arange(1, 11, 2)
print(arr)      # Output: [1 3 5 7 9] - note that 11 is not a part of this output
# Create an array from 0 to 5 with float data type.
arr = np.arange(0, 6, dtype=float)
print(arr)      # Output: [0. 1. 2. 3. 4. 5.] - note that 6 is not a part of this output

[0 1 2 3 4 5 6 7 8 9]
[1 3 5 7 9]
[0. 1. 2. 3. 4. 5.]
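Unlike range(), np.arange() also accepts a non-integer step, and like range() it can count downwards with a negative step. A short sketch with arbitrary values:

```python
import numpy as np

# Count down from 5 to 1; the stop value 0 is excluded
countdown = np.arange(5, 0, -1)
print(countdown)

# A fractional step, which range() does not allow
halves = np.arange(0, 2, 0.5)
print(halves)

# With integer arguments, arange matches range() value for value
same_as_range = np.array(list(range(1, 11, 2)))
print(np.array_equal(np.arange(1, 11, 2), same_as_range))
```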


#### Task
Print to the console a NumPy array containing all the even numbers from 2 to 10 (inclusive). Its datatype must be float.

In [17]:
import numpy as np 
num1 = np.arange(2, 11, 2, dtype=float)
print(num1)

[ 2.  4.  6.  8. 10.]


#### Common NumPy datatypes
NumPy provides built-in data types for working efficiently with numerical data.

- Integer
- Float
- Complex
- Boolean
- String
- Object

### Integer datatype in NumPy
NumPy supports various integer types such as int8, int16, int32, and int64.

In [18]:
arr_int = np.array([1, 2, 3], dtype=np.int32)
print(arr_int)    # Output: [1 2 3]

[1 2 3]


The difference between int8, int16, int32, and int64 in NumPy lies in the amount of memory each data type uses to store integers and the range of values they can represent.

- Memory Usage: The primary difference is the amount of memory each type uses. int8 uses the least memory (1 byte), while int64 uses the most (8 bytes). This affects the performance and memory footprint of your NumPy arrays, especially when working with large datasets.
- Range: Each type can represent a different range of integers. int8 can represent values from -128 to 127, which is much narrower than the range for int64.
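The sizes and ranges described above can be inspected directly with np.iinfo. The sketch below prints them for each integer type and compares the memory used by two arrays of 1000 elements (the array length is an arbitrary choice).

```python
import numpy as np

for dtype in (np.int8, np.int16, np.int32, np.int64):
    info = np.iinfo(dtype)
    print(dtype.__name__,
          "bytes per element:", np.dtype(dtype).itemsize,
          "range:", info.min, "to", info.max)

# Same number of elements, very different memory footprint
small = np.zeros(1000, dtype=np.int8)
large = np.zeros(1000, dtype=np.int64)
print(small.nbytes, large.nbytes)
```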

In [19]:
import numpy as np

arr_int = np.array([200, 201, 202], dtype=np.int16)
print(arr_int)

[200 201 202]


### Float and complex datatype in NumPy
NumPy supports float16, float32, and float64.

In [20]:
arr_float = np.array([1.0, 2.5, 3.8], dtype=np.float64)
print(arr_float)      # Output: [1.  2.5 3.8]

[1.  2.5 3.8]
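The float types trade memory for precision. As a rough sketch, np.finfo reports the approximate number of reliable decimal digits for each type, and dividing 1 by 3 shows the difference in practice.

```python
import numpy as np

precisions = {}
for dtype in (np.float16, np.float32, np.float64):
    info = np.finfo(dtype)
    precisions[dtype.__name__] = info.precision
    print(dtype.__name__,
          "bytes per element:", np.dtype(dtype).itemsize,
          "approx. decimal digits:", info.precision)

# The same fraction stored at two precisions
third32 = np.float32(1) / np.float32(3)
third64 = np.float64(1) / np.float64(3)
print(third32, third64)
```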


NumPy supports complex numbers with complex64 and complex128.

In [21]:
arr_complex = np.array([1+2j, 3+4j], dtype=np.complex128)
print(arr_complex)    # Output: [1.+2.j 3.+4.j]

[1.+2.j 3.+4.j]


#### Task
- Create and print a NumPy array with 5 floating-point numbers of type float32 as per the sample output given below. [0.1 0.2 0.3 0.4 0.5]
- Create and print a NumPy array with 3 complex numbers of type complex64 as per the sample output given below. [1.+1.j 2.+2.j 3.+3.j]

In [23]:
arr_float = np.array([0.1, 0.2, 0.3, 0.4, 0.5], dtype=np.float32)
print(arr_float)

arr_complex = np.array([1.+1.j, 2.+2.j, 3.+3.j], dtype=np.complex64)
print(arr_complex)

[0.1 0.2 0.3 0.4 0.5]
[1.+1.j 2.+2.j 3.+3.j]


### Boolean datatype in NumPy
NumPy arrays with the Boolean datatype are created using the following syntax:

In [24]:
arr_bool = np.array([True, False, True], dtype=np.bool_)
print(arr_bool)     # Output: [ True False  True]

[ True False  True]


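Notice the trailing underscore in np.bool_. Boolean arrays are also produced by comparisons on an array, as in the following example.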

In [25]:
import numpy as np

# Create an array of integers
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Create a boolean mask for even numbers
even_mask = arr % 2==0

# Print the result
print(even_mask)

[False  True False  True False  True False  True False  True]
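A boolean mask like the one above is most useful for selecting or counting elements. The following sketch reuses the same even-number mask to filter the array and count the matches.

```python
import numpy as np

arr = np.arange(1, 11)           # 1 to 10
even_mask = arr % 2 == 0

# Indexing with the mask keeps only the positions that are True
evens = arr[even_mask]
print(evens)

# True counts as 1 when summed, so this counts the even numbers
count = even_mask.sum()
print(count)
```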


### Strings and Object datatypes in Numpy
NumPy supports fixed-length byte strings with S followed by the maximum number of bytes.

In [26]:
arr_str = np.array(['a', 'bc', 'def'], dtype='S3')
print(arr_str)     # Output - [b'a' b'bc' b'def']

[b'a' b'bc' b'def']


- Byte strings in NumPy are denoted by a leading b

Alternatively, you can use the str dtype to let NumPy pick a fixed-width Unicode type that is long enough for the longest string in the array.

In [27]:
import numpy as np

# Use dtype=str so NumPy sizes the strings to fit the longest element
arr_exercise_str = np.array(['apple', 'banana', 'cherry'], dtype=str)
print(arr_exercise_str)    # Output - ['apple' 'banana' 'cherry']

['apple' 'banana' 'cherry']
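The width NumPy chose can be seen by inspecting the array's dtype. In the sketch below (the replacement word is an arbitrary example), the longest string has 6 characters, so the array gets a 6-character Unicode type, and a longer string assigned later is silently cut to fit.

```python
import numpy as np

fruits = np.array(['apple', 'banana', 'cherry'], dtype=str)

# The dtype records the fixed width: Unicode, 6 characters
print(fruits.dtype)

# Assigning a longer string truncates it to the array's width
fruits[0] = 'pineapple'
print(fruits[0])
```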


- NumPy also supports generic Python objects with datatype object.

- This flexibility means that the array can store mixed types and complex data structures that are not possible with standard NumPy dtypes.

- At our current level, we won't be using this datatype, so don't worry about it.

In [28]:
arr_obj = np.array([1, 'a', 3.5], dtype=object)
print(arr_obj)

[1 'a' 3.5]


#### Task

Check the difference when strings are declared using the datatype 'S5' versus 'str'.

In [29]:
import numpy as np

# Using S5 datatype
arr_exercise_str = np.array(['apple', 'banana', 'cherry'], dtype='S5')
print(arr_exercise_str)

# Use S6 to avoid truncation
arr_exercise_str = np.array(['apple', 'banana', 'cherry'], dtype='S6')
print(arr_exercise_str)

# Use str to auto detect the string length
arr_exercise_str = np.array(['apple', 'banana', 'cherry'], dtype='str')
print(arr_exercise_str)


[b'apple' b'banan' b'cherr']
[b'apple' b'banana' b'cherry']
['apple' 'banana' 'cherry']


#### Datatype conversion in NumPy
The astype method in NumPy is a powerful and flexible way to convert arrays from one data type to another.

Here is an example of converting an integer array to a string array:

In [30]:
# Create an array of integers
int_array = np.array([1, 2, 3, 4, 5])

# Convert the integer array to a string array
str_array = int_array.astype(str)

print(int_array)     # Output - [1 2 3 4 5]
print(str_array)     # Output - ['1' '2' '3' '4' '5']

[1 2 3 4 5]
['1' '2' '3' '4' '5']


Here is another example of converting an integer array to a float array:

In [31]:
# create an array of integers
int_array = np.array([1, 2, 3, 4])

# convert data type of int_array to float
float_array = int_array.astype('float')

# print both arrays
print(int_array)     # Output - [1 2 3 4] 
print(float_array)   # Output - [1. 2. 3. 4.]

[1 2 3 4]
[1. 2. 3. 4.]
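Conversions in the other direction deserve some care. As an illustrative sketch with made-up values, converting floats to integers drops the fractional part rather than rounding, numeric strings can be parsed into floats, and astype always returns a new array.

```python
import numpy as np

float_array = np.array([1.9, -1.9, 2.5])

# Float to int truncates toward zero; it does not round
int_array = float_array.astype(int)
print(int_array)

# Strings that look like numbers can be converted to floats
num_strings = np.array(['1.5', '2.25'])
parsed = num_strings.astype(float)
print(parsed)

# astype returns a copy, so the original array is unchanged
print(float_array)
```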


In [32]:
# Debug the code given below

import numpy as np

# Create an array of integers
int_array = np.array([10, 20, 30, 40, 50])

# Correct conversion to string array
str_array = int_array.astype('str')

# Print the integer array
print("Integer array:", int_array)

# Print the string array
print("String array:", str_array)


Integer array: [10 20 30 40 50]
String array: ['10' '20' '30' '40' '50']
