<a href="https://colab.research.google.com/github/Desmondonam/DS_Python/blob/main/working_with_numpy_arrays.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Here is a practical guide to NumPy: what it is and how to use it.

NumPy is a powerful library in Python for numerical computing, and it provides support for creating and manipulating arrays and matrices. NumPy arrays, known as ndarrays, are the fundamental data structures in NumPy, and they allow you to perform efficient numerical operations on large datasets.
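
As a rough illustration of what "efficient numerical operations" looks like in practice, the sketch below doubles a few numbers once with a plain Python list and once with an ndarray. The `prices` values are made up for this example.

```python
import numpy as np

# Hypothetical sample values, used only for illustration
prices = [1.5, 2.0, 3.25]

# Pure Python: loop over the list element by element
doubled_list = [p * 2 for p in prices]

# NumPy: a single vectorized expression applied to the whole array
doubled_arr = np.array(prices) * 2

print(doubled_list)  # [3.0, 4.0, 6.5]
print(doubled_arr)   # [3.  4.  6.5]
```

Both give the same numbers; the array version avoids an explicit Python loop, which is where most of the speed-up on large datasets comes from.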

In [2]:
## Import NumPy for use in the notebook
import numpy as np

### 2. Creating a NumPy array

In [3]:
## You can create an array from a Python list like this
my_list = [1, 2, 3, 4, 5]
arr = np.array(my_list)

In [5]:
arr

array([1, 2, 3, 4, 5])

In [4]:
## Using built-in functions
zeros_arr = np.zeros(5)       # Creates an array of zeros with 5 elements
ones_arr = np.ones((3, 4))    # Creates a 3x4 array of ones
random_arr = np.random.rand(3, 2)  # Creates a 3x2 array of uniform random values in [0, 1)

In [6]:
random_arr

array([[0.93017334, 0.63532656],
       [0.42242742, 0.66022189],
       [0.04095534, 0.49453441]])
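
The random values above change on every run. If reproducible numbers are needed, one option is NumPy's `Generator` API with a fixed seed. This is a minimal sketch; the seed `42` is an arbitrary choice, not something from the original notebook.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # seed value is an arbitrary choice
random_arr = rng.random((3, 2))        # 3x2 array of floats in [0, 1)

# Re-creating the generator with the same seed reproduces the same values
repeat = np.random.default_rng(seed=42).random((3, 2))
print(np.array_equal(random_arr, repeat))   # True
```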

In [7]:
## Array attributes (a notebook cell displays only the last expression, here dtype)
arr.shape       # Tuple of dimension sizes: (5,) for this 1D array, (rows, columns) for a 2D array
arr.ndim        # Number of array dimensions (1 for a 1D array, 2 for a 2D array, etc.)
arr.size        # Total number of elements in the array
arr.dtype       # Data type of the array elements

dtype('int64')
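
To see all four attributes at once, each can be printed explicitly. The sketch below uses a hypothetical 2D array named `grid`, which is not part of the original notebook.

```python
import numpy as np

# Hypothetical 2x3 array used only for illustration
grid = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

print(grid.shape)  # (2, 3): 2 rows, 3 columns
print(grid.ndim)   # 2
print(grid.size)   # 6
print(grid.dtype)  # float64
```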

In [14]:
## Array indexing and slicing
arr[0]            # Access the first element of the array
arr[1:4]          # Slice the elements at indices 1 to 3 (the stop index 4 is excluded)
arr[::2]          # Select every other element


array([1, 3, 5])
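
One detail worth knowing about basic slices is that they return a view on the original data, not a copy. A small sketch with a hypothetical `values` array:

```python
import numpy as np

# Hypothetical array used only for illustration
values = np.array([10, 20, 30, 40, 50])

window = values[1:4]        # view on the elements at indices 1, 2, 3
window[0] = 99              # writes through to values[1]
print(values)               # [10 99 30 40 50]

safe = values[1:4].copy()   # independent copy of the same elements
safe[0] = -1                # does not touch values
print(values)               # [10 99 30 40 50]
```

Calling `.copy()` is the usual way to get a slice that can be modified without affecting the source array.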

In [15]:
## Array operations
# We can perform mathematical operations on array elements
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

arr_sum = arr1 + arr2      # Element-wise addition
arr_product = arr1 * arr2  # Element-wise multiplication
arr_squared = arr1 ** 2    # Element-wise exponentiation

In [16]:
arr_sum

array([5, 7, 9])

In [17]:
arr_product

array([ 4, 10, 18])

In [18]:
arr_squared

array([1, 4, 9])

In [19]:
## Broadcasting
## NumPy can combine arrays of different shapes (here a 2x2 array and a scalar) through broadcasting
arr = np.array([[1, 2], [3, 4]])
scalar = 2

result = arr + scalar
# Output:
# array([[3, 4],
#        [5, 6]])

In [20]:
result

array([[3, 4],
       [5, 6]])
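
Broadcasting is not limited to scalars. Shapes are compared from the last dimension backwards, and a dimension of size 1 (or a missing one) is stretched to match. The arrays below are hypothetical and only sketch the rule.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])       # shape (2, 3)
row = np.array([10, 20, 30])         # shape (3,)
col = np.array([[100], [200]])       # shape (2, 1)

plus_row = matrix + row              # row is added to every row of matrix
plus_col = matrix + col              # col is added to every column of matrix

print(plus_row)
# [[11 22 33]
#  [14 25 36]]
print(plus_col)
# [[101 102 103]
#  [204 205 206]]
```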

In [21]:
## Array aggregation
# You can compute statistics and aggregate along axes
arr.mean()         # Compute the mean of all elements
arr.sum(axis=0)    # Sum down each column (collapses axis 0)
arr.min(axis=1)    # Find the minimum of each row (collapses axis 1)
arr.max()          # Find the maximum value in the entire array

4
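
Only the last expression (`arr.max()`) is displayed above. Printing each aggregate makes the effect of `axis` easier to see. The array is the same 2x2 one used in the broadcasting cell, named `table` here to keep the sketch self-contained.

```python
import numpy as np

table = np.array([[1, 2], [3, 4]])

overall_mean = table.mean()       # mean of all four elements
column_sums = table.sum(axis=0)   # one sum per column
row_mins = table.min(axis=1)      # one minimum per row
overall_max = table.max()         # maximum over the whole array

print(overall_mean)   # 2.5
print(column_sums)    # [4 6]
print(row_mins)       # [1 3]
print(overall_max)    # 4
```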

In [22]:
## Reshaping
# The reshape method gives an array a new shape without changing its data
arr = np.array([1, 2, 3, 4, 5, 6])
reshaped_arr = arr.reshape(2, 3)
# Output:
# array([[1, 2, 3],
#        [4, 5, 6]])

In [23]:
reshaped_arr

array([[1, 2, 3],
       [4, 5, 6]])

In [24]:
## Array concatenation and splitting
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6]])

np.concatenate((arr1, arr2), axis=0)
# Output:
# array([[1, 2],
#        [3, 4],
#        [5, 6]])

array([[1, 2],
       [3, 4],
       [5, 6]])
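
The cell above covers concatenation only. For the splitting half, `np.split` is one option. A minimal sketch, assuming we want to undo the concatenation at row index 2 (the names `stacked`, `top`, and `bottom` are illustrative):

```python
import numpy as np

stacked = np.array([[1, 2],
                    [3, 4],
                    [5, 6]])

# Split at row index 2: rows 0-1 go to `top`, the remaining row to `bottom`
top, bottom = np.split(stacked, [2], axis=0)

print(top)
# [[1 2]
#  [3 4]]
print(bottom)
# [[5 6]]
```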

In [25]:
## Array filtering
# You can use a boolean mask to filter an array
arr = np.array([1, 2, 3, 4, 5])
condition = arr > 2

filtered_arr = arr[condition]
# Output: array([3, 4, 5])

In [26]:
filtered_arr

array([3, 4, 5])

NumPy provides many more features for advanced numerical computing, such as matrix operations, linear algebra, and Fourier transforms. It is widely used in scientific and data analysis applications due to its performance and ease of use. For more detailed information, you can refer to the official [NumPy documentation](https://numpy.org/doc/stable/).
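
As a brief taste of the matrix and linear algebra side, the sketch below multiplies a matrix by itself and solves a small linear system. The matrix `A` and vector `b` are made-up values for illustration.

```python
import numpy as np

# Hypothetical 2x2 system A @ x = b
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

product = A @ A                # matrix multiplication (not element-wise)
x = np.linalg.solve(A, b)      # solves A @ x = b for x

print(product)
# [[ 5.  5.]
#  [ 5. 10.]]
print(np.allclose(A @ x, b))   # True
```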

# Practical Use with sample data

Let's explore further how NumPy is used in Python, using one of the datasets bundled with scikit-learn to demonstrate some common NumPy operations.

First, make sure you have NumPy installed, along with scikit-learn, which provides the Iris dataset used below:

In [27]:
!pip install numpy scikit-learn



### 1. Loading Data into NumPy Arrays:
We can load one of scikit-learn's bundled datasets and work with its features and labels as NumPy arrays.

In [28]:
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()
data = iris.data
target = iris.target

# load_iris already returns NumPy arrays; np.array() makes explicit copies here
import numpy as np
data_np = np.array(data)
target_np = np.array(target)

### 2. Array Arithmetic Operations:
NumPy allows us to perform element-wise arithmetic operations on arrays.

In [29]:
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Element-wise addition
result_add = a + b
# Output: array([5, 7, 9])

# Element-wise multiplication
result_mult = a * b
# Output: array([4, 10, 18])

In [30]:
result_add

array([5, 7, 9])

In [31]:
result_mult

array([ 4, 10, 18])

### 3. Array Indexing and Slicing:
We can use NumPy to perform indexing and slicing operations on arrays.

In [32]:
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Indexing
print(arr[0])    # Output: 1
print(arr[-1])   # Output: 5

# Slicing
print(arr[1:4])  # Output: [2 3 4]

1
5
[2 3 4]


### 4. Array Aggregation:
NumPy allows us to perform aggregation operations on arrays.

In [33]:
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Calculate the mean of the array
mean_value = np.mean(arr)
# Output: 3.0

# Calculate the sum of the array
sum_value = np.sum(arr)
# Output: 15

In [34]:
mean_value

3.0

In [35]:
sum_value

15

### 5. Array Reshaping:
We can use NumPy to reshape arrays.

In [36]:
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# Reshape the array to a 2x3 matrix
reshaped_arr = arr.reshape(2, 3)
# Output: array([[1, 2, 3],
#                [4, 5, 6]])

In [37]:
reshaped_arr

array([[1, 2, 3],
       [4, 5, 6]])

### 6. Broadcasting:
NumPy supports broadcasting, which allows us to perform operations on arrays with different shapes.

In [38]:
import numpy as np

arr = np.array([[1, 2], [3, 4]])
scalar = 2

result = arr + scalar
# Output: array([[3, 4],
#                [5, 6]])

In [39]:
result

array([[3, 4],
       [5, 6]])
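
A common practical use of broadcasting is centering each column of a data table on its mean. This is a sketch with a hypothetical `samples` array, not a step from the original notebook.

```python
import numpy as np

# Hypothetical (3 samples, 2 features) table
samples = np.array([[1.0, 10.0],
                    [2.0, 20.0],
                    [3.0, 30.0]])

column_means = samples.mean(axis=0)   # shape (2,): one mean per column
centered = samples - column_means     # (3, 2) minus (2,) broadcasts over the rows

print(centered)
# [[ -1. -10.]
#  [  0.   0.]
#  [  1.  10.]]
```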

### 7. Logical Operations and Filtering:
We can use NumPy for logical operations and filtering.

In [40]:
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Boolean indexing to filter elements
condition = arr > 2
filtered_arr = arr[condition]
# Output: array([3, 4, 5])
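
Conditions can also be combined. The sketch below uses the element-wise operators `&` (and) and `|` (or), plus `np.where` for choosing between two alternatives; the variable names are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Parentheses are required because & and | bind tighter than comparisons
between = arr[(arr > 1) & (arr < 5)]   # elements strictly between 1 and 5
outside = arr[(arr < 2) | (arr > 4)]   # elements below 2 or above 4

# np.where picks element-wise: 3 where the condition holds, else the original value
capped = np.where(arr > 3, 3, arr)

print(between)  # [2 3 4]
print(outside)  # [1 5]
print(capped)   # [1 2 3 3 3]
```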

# Using the Iris Dataset

NumPy offers a range of data manipulation techniques that apply directly to the Iris dataset. This section covers some of the most common ones.

### 1. Loading the Iris Dataset:

In [41]:
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()
data = iris.data
target = iris.target

### 2. Accessing basic information


In [42]:
import numpy as np

# Convert the data and target to NumPy arrays
data_np = np.array(data)
target_np = np.array(target)

# Get the shape and data type of the arrays
print("Shape of data array:", data_np.shape)
print("Data type of data array:", data_np.dtype)
print("Shape of target array:", target_np.shape)
print("Data type of target array:", target_np.dtype)

Shape of data array: (150, 4)
Data type of data array: float64
Shape of target array: (150,)
Data type of target array: int64


### 3. Filtering and selecting data

In [43]:
import numpy as np

# Filter rows where the target is 0 (Setosa)
setosa_data = data_np[target_np == 0]

# Select the first two features (columns) for Setosa flowers
setosa_features = setosa_data[:, :2]

### 4. Aggregating and summary statistics


In [44]:
import numpy as np

# Calculate mean, median, and standard deviation of sepal length
sepal_length = data_np[:, 0]
mean_sepal_length = np.mean(sepal_length)
median_sepal_length = np.median(sepal_length)
std_sepal_length = np.std(sepal_length)

print("Mean sepal length:", mean_sepal_length)
print("Median sepal length:", median_sepal_length)
print("Standard deviation of sepal length:", std_sepal_length)

Mean sepal length: 5.843333333333334
Median sepal length: 5.8
Standard deviation of sepal length: 0.8253012917851409
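
The same statistics can be computed for every feature at once by passing `axis=0`. Note that `np.std` divides by N by default; `ddof=1` gives the sample standard deviation instead. The sketch uses a small hypothetical `measurements` array as a stand-in for a (samples, features) array such as `data_np`.

```python
import numpy as np

# Hypothetical stand-in for a (samples, features) array
measurements = np.array([[5.0, 3.0],
                         [6.0, 2.0],
                         [7.0, 4.0]])

feature_means = measurements.mean(axis=0)       # one mean per column
population_std = measurements.std(axis=0)       # divides by N (NumPy's default)
sample_std = measurements.std(axis=0, ddof=1)   # divides by N - 1

print(feature_means)  # [6. 3.]
print(sample_std)     # [1. 1.]
```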


### 5. Data Concatenation


In [45]:
import numpy as np

# Concatenate data_np and target_np horizontally
concatenated_data = np.concatenate((data_np, target_np.reshape(-1, 1)), axis=1)
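
An alternative that avoids the explicit `reshape(-1, 1)` is `np.column_stack`, which treats 1D inputs as columns. A sketch with hypothetical `features` and `labels` arrays:

```python
import numpy as np

features = np.arange(6.0).reshape(3, 2)   # hypothetical (3, 2) float array
labels = np.array([0, 1, 1])              # hypothetical (3,) integer array

combined = np.column_stack((features, labels))

print(combined.shape)   # (3, 3)
print(combined.dtype)   # float64: the integer labels are upcast to match the features
```

Because an ndarray has a single dtype, the labels end up stored as floats in the combined array.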

### 6. Transposing the data

In [46]:
import numpy as np

# Transpose the data array
transposed_data = data_np.T

In [47]:
transposed_data

array([[5.1, 4.9, 4.7, 4.6, 5. , 5.4, 4.6, 5. , 4.4, 4.9, 5.4, 4.8, 4.8,
        4.3, 5.8, 5.7, 5.4, 5.1, 5.7, 5.1, 5.4, 5.1, 4.6, 5.1, 4.8, 5. ,
        5. , 5.2, 5.2, 4.7, 4.8, 5.4, 5.2, 5.5, 4.9, 5. , 5.5, 4.9, 4.4,
        5.1, 5. , 4.5, 4.4, 5. , 5.1, 4.8, 5.1, 4.6, 5.3, 5. , 7. , 6.4,
        6.9, 5.5, 6.5, 5.7, 6.3, 4.9, 6.6, 5.2, 5. , 5.9, 6. , 6.1, 5.6,
        6.7, 5.6, 5.8, 6.2, 5.6, 5.9, 6.1, 6.3, 6.1, 6.4, 6.6, 6.8, 6.7,
        6. , 5.7, 5.5, 5.5, 5.8, 6. , 5.4, 6. , 6.7, 6.3, 5.6, 5.5, 5.5,
        6.1, 5.8, 5. , 5.6, 5.7, 5.7, 6.2, 5.1, 5.7, 6.3, 5.8, 7.1, 6.3,
        6.5, 7.6, 4.9, 7.3, 6.7, 7.2, 6.5, 6.4, 6.8, 5.7, 5.8, 6.4, 6.5,
        7.7, 7.7, 6. , 6.9, 5.6, 7.7, 6.3, 6.7, 7.2, 6.2, 6.1, 6.4, 7.2,
        7.4, 7.9, 6.4, 6.3, 6.1, 7.7, 6.3, 6.4, 6. , 6.9, 6.7, 6.9, 5.8,
        6.8, 6.7, 6.7, 6.3, 6.5, 6.2, 5.9],
       [3.5, 3. , 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.4, 3. ,
        3. , 4. , 4.4, 3.9, 3.5, 3.8, 3.8, 3.4, 3.7, 3.6, 3.3, 3.4, 3. ,
       

### 7. Reshaping the data

In [48]:
import numpy as np

# data_np holds 150 * 4 = 600 values, so the new shape must also hold 600.
# reshape(5, 30) would need exactly 150 values and raises a ValueError here.
reshaped_data = data_np.reshape(20, 30)
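
A reshape only succeeds when the total number of elements stays the same. The sketch below uses a hypothetical 600-element array `flat` (the same element count as a (150, 4) array) to show a shape that fails and how `-1` lets NumPy infer a dimension.

```python
import numpy as np

flat = np.arange(600)             # hypothetical array with 600 elements

inferred = flat.reshape(-1, 30)   # NumPy infers 20 rows, since 600 / 30 = 20
print(inferred.shape)             # (20, 30)

try:
    flat.reshape(5, 30)           # 5 * 30 = 150, which does not match 600
except ValueError as err:
    print("reshape failed:", err)
```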

### 8. Sorting the data

In [49]:

import numpy as np

# Sort the rows of data_np by the first column (sepal length), ascending
sorted_data = data_np[data_np[:, 0].argsort()]

In [50]:
sorted_data

array([[4.3, 3. , 1.1, 0.1],
       [4.4, 3.2, 1.3, 0.2],
       [4.4, 3. , 1.3, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.5, 2.3, 1.3, 0.3],
       [4.6, 3.6, 1. , 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [4.6, 3.4, 1.4, 0.3],
       [4.6, 3.2, 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.7, 3.2, 1.6, 0.2],
       [4.8, 3. , 1.4, 0.1],
       [4.8, 3. , 1.4, 0.3],
       [4.8, 3.4, 1.9, 0.2],
       [4.8, 3.4, 1.6, 0.2],
       [4.8, 3.1, 1.6, 0.2],
       [4.9, 2.4, 3.3, 1. ],
       [4.9, 2.5, 4.5, 1.7],
       [4.9, 3.1, 1.5, 0.2],
       [4.9, 3.1, 1.5, 0.1],
       [4.9, 3.6, 1.4, 0.1],
       [4.9, 3. , 1.4, 0.2],
       [5. , 3.5, 1.3, 0.3],
       [5. , 3.4, 1.6, 0.4],
       [5. , 3.3, 1.4, 0.2],
       [5. , 3.2, 1.2, 0.2],
       [5. , 3.5, 1.6, 0.6],
       [5. , 2. , 3.5, 1. ],
       [5. , 3.4, 1.5, 0.2],
       [5. , 2.3, 3.3, 1. ],
       [5. , 3.6, 1.4, 0.2],
       [5. , 3. , 1.6, 0.2],
       [5.1, 3.8, 1.9, 0.4],
       [5.1, 3.8, 1.6, 0.2],
       [5.1, 2
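
Two common variations on this pattern are sorting in descending order and breaking ties with a second column. A sketch with a small hypothetical `rows` array:

```python
import numpy as np

# Hypothetical table: column 0 is the primary key, column 1 the tie-breaker
rows = np.array([[5.0, 3.5],
                 [4.4, 3.0],
                 [5.0, 2.0],
                 [4.4, 2.9]])

# Reverse the ascending argsort to get the largest values of column 0 first
descending = rows[rows[:, 0].argsort()[::-1]]

# np.lexsort treats the LAST key as the primary one, so the tie-breaker goes first
order = np.lexsort((rows[:, 1], rows[:, 0]))
by_both = rows[order]

print(by_both)
# [[4.4 2.9]
#  [4.4 3. ]
#  [5.  2. ]
#  [5.  3.5]]
```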

### 9. Removing duplicates

In [51]:
import numpy as np

# Create a sample array with duplicate rows
sample_array = np.array([[1, 2, 3],
                         [4, 5, 6],
                         [1, 2, 3]])

# Remove duplicate rows
unique_array = np.unique(sample_array, axis=0)

In [52]:
unique_array

array([[1, 2, 3],
       [4, 5, 6]])
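
`np.unique` can also report how often each distinct row occurred, via `return_counts=True`. A minimal sketch on the same sample array:

```python
import numpy as np

sample_array = np.array([[1, 2, 3],
                         [4, 5, 6],
                         [1, 2, 3]])

# unique_rows come back sorted; counts[i] is how many times unique_rows[i] appeared
unique_rows, counts = np.unique(sample_array, axis=0, return_counts=True)

print(unique_rows)
# [[1 2 3]
#  [4 5 6]]
print(counts)   # [2 1]
```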