# NumPy

GETTING FAMILIAR WITH NumPy :
 -

1. INSTALLING NumPy :

  If you haven't installed NumPy yet, you can do so using pip :
   - pip install numpy


2. IMPORTING NumPy :
   
To start using NumPy, we'll need to import it into our Python script or Jupyter Notebook. It's common to import NumPy with the alias np:

In [8]:
import numpy as np


3. CREATING ARRAYS :

NumPy arrays are the central data structure of the library. We can create arrays in several ways:

In [9]:
# FROM A LIST
array_from_list = np.array([1, 2, 3, 4, 5])
print(array_from_list)

# FROM A TUPLE
array_from_tuple = np.array((1, 2, 3, 4, 5))
print(array_from_tuple)


[1 2 3 4 5]
[1 2 3 4 5]


Creating arrays with specific values:
 

In [38]:
#zeroes array :
zeros_array = np.zeros((3, 3))  # 3x3 array of zeros
print(f"zeroes array : {zeros_array}")

#ones array :
ones_array = np.ones((2, 4))  # 2x4 array of ones
print(f"ones array : {ones_array}")

# Array with range of values :
range_array = np.arange(10)  # Array with values from 0 to 9
print(f"range array : {range_array}")

# Array with evenly spaced values:
spaced_array = np.linspace(0, 1, 5)  # 5 values evenly spaced between 0 and 1
print(f"evenly spaced array : {spaced_array}")

zeroes array : [[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
ones array : [[1. 1. 1. 1.]
 [1. 1. 1. 1.]]
range array : [0 1 2 3 4 5 6 7 8 9]
evenly spaced array : [0.   0.25 0.5  0.75 1.  ]


4. BASIC OPERATIONS :

- Basic operations include arithmetic operations, which are covered later under 'Mathematical Operations' in the "Data Manipulation" topic.
- COMPARISON OPERATIONS :

In [17]:
# Element-wise Comparisons :
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])
# Greater than comparison
greater_than = array1 > array2  # [False, False, False]

# Less than or equal comparison
less_than_equal = array1 <= 2  # [True, True, False]

print(greater_than)
print(less_than_equal)
# Array Equality :
equality = array1 == array2  # [False, False, False]
print(equality)


[False False False]
[ True  True False]
[False False False]
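
Element-wise `==` returns one boolean per element rather than a single answer. The following is a minimal sketch, assuming we want one True/False result for a whole array; the array names are illustrative and not part of the cells above.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = np.array([1, 2, 3])

# Element-wise comparison: one boolean per position
elementwise = a == c
print(elementwise)              # [ True  True  True]

# Reduce a boolean array to a single answer
all_less = (a < b).all()        # True only if every element of a is smaller
any_greater = (a > 2).any()     # True if at least one element exceeds 2
print(all_less, any_greater)    # True True

# Whole-array equality (same shape and same values)
same_ac = np.array_equal(a, c)
same_ab = np.array_equal(a, b)
print(same_ac, same_ab)         # True False
```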


- MATRIX OPERATIONS :

NumPy also allows you to perform matrix operations on 2D arrays (matrices).

a. Matrix Multiplication : Perform matrix multiplication (dot product) between two matrices.

b. Transpose : 
Transpose a matrix (swap rows and columns).

c. Inverse : 
Calculate the inverse of a square matrix.

In [15]:
# matrix multiplication
matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])

matrix_product = np.dot(matrix1, matrix2)  # [[19, 22], [43, 50]]
print(f"multiplication: {matrix_product}")

#transpose
matrix_transpose = np.transpose(matrix1)  # [[1, 3], [2, 4]]
print(f"transpose matrix: {matrix_transpose}")

#inverse
matrix_inverse = np.linalg.inv(matrix1)  # [[-2. ,  1. ], [ 1.5, -0.5]]
print(f"inverse matrix: {matrix_inverse}")



multiplication: [[19 22]
 [43 50]]
transpose matrix: [[1 3]
 [2 4]]
inverse matrix: [[-2.   1. ]
 [ 1.5 -0.5]]
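
As a hedged complement to the cell above, the sketch below uses the `@` operator, which performs the same matrix multiplication as `np.dot` for 2D arrays, contrasts it with element-wise `*`, and checks the inverse with `np.allclose`. The variable names are chosen for illustration.

```python
import numpy as np

matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])

# The @ operator performs matrix multiplication, like np.dot for 2D arrays
product = matrix1 @ matrix2
print(product)

# The * operator is element-wise, which is a different operation
elementwise = matrix1 * matrix2
print(elementwise)

# A matrix times its inverse should be (approximately) the identity;
# np.allclose tolerates small floating-point round-off
inverse = np.linalg.inv(matrix1)
is_identity = np.allclose(matrix1 @ inverse, np.eye(2))
print(is_identity)   # True
```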


- BROADCASTING :
   
Broadcasting allows NumPy to perform element-wise operations on arrays of different shapes.

a. Broadcasting with Scalars : 
We can add a scalar to an array, and the scalar will be broadcast across the entire array.

b. Broadcasting with Arrays of Different Shapes : 
We can perform operations on arrays of different shapes if one array can be broadcast to the shape of the other.

In [16]:
#broadcasting with scalars
broadcast_add = array1 + 10  # [11, 12, 13]
print(broadcast_add)

#broadcasting with arrays of different shapes
array3 = np.array([[1], [2], [3]])
broadcast_multiplication = array1 * array3  # [[1, 2, 3], [2, 4, 6], [3, 6, 9]]
print(broadcast_multiplication)


[11 12 13]
[[1 2 3]
 [2 4 6]
 [3 6 9]]
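
Whether two shapes can be broadcast follows a simple rule: shapes are compared from the rightmost dimension, and each pair of dimensions must be equal or one of them must be 1. The sketch below illustrates this, assuming NumPy 1.20 or newer for `np.broadcast_shapes`; the names `row` and `column` are illustrative.

```python
import numpy as np

row = np.array([1, 2, 3])            # shape (3,)
column = np.array([[1], [2], [3]])   # shape (3, 1)

# Shapes are compared from the right; a dimension of size 1 is stretched
result_shape = np.broadcast_shapes(row.shape, column.shape)
print(result_shape)                  # (3, 3)

# Shapes (3,) and (2,) cannot be broadcast: 3 != 2 and neither is 1
try:
    row + np.array([1, 2])
    compatible = True
except ValueError as err:
    compatible = False
    print(f"broadcast failed: {err}")
```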


5. ARRAY PROPERTIES :

Understanding the properties of NumPy arrays is crucial for effectively working with them. Here are some of the key properties and attributes of NumPy arrays:

1. Array Shape : 
The shape attribute of a NumPy array returns a tuple representing the dimensions of the array. This is useful for understanding the structure of the array.
2. Array Dimensions (ndim) : 
The ndim attribute returns the number of dimensions (axes) of the array.
3. Array Size : 
The size attribute returns the total number of elements in the array.
4. Array Data Type (dtype) : 
The dtype attribute returns the data type of the elements in the array. This is important for understanding how the data is stored and processed.
5. Array Item Size : 
The itemsize attribute returns the size (in bytes) of each element in the array.
6. Array Number of Bytes : 
The nbytes attribute returns the total number of bytes consumed by the array.
7. Array Transpose : 
The T attribute provides the transpose of the array (swapping rows with columns for 2D arrays).
8. Array Flat Iterator : 
The flat attribute returns an iterator that can be used to iterate over the array as if it were a flat, one-dimensional array.
9. Array Copy vs View : 
Basic slicing returns a view of the original array rather than a copy, so modifying the slice also modifies the original. To explicitly create an independent copy, we can use the copy() method.
10. Changing Data Type (astype) : 
We can change the data type of an array using the astype() method.
11. Checking for NaN or Infinite Values : 
We can check for NaN (Not a Number) or infinite values in an array using np.isnan() and np.isinf().


These attributes and methods describe the structure and memory footprint of an array, which helps when manipulating data and when tracking down shape or type errors.

In [21]:
#shape :
array = np.array([[1, 2, 3], [4, 5, 6]])
print(f"array shape : {array.shape}")  # Output: (2, 3)
#dimensions :
print(f"dimensions : {array.ndim}")  # Output: 2
#size :
print(f"size : {array.size}")  # Output: 6
#data type :
print(f"data type : {array.dtype}")  # Output: int64 (or another integer type depending on your system)
#item size :
print(f"item size : {array.itemsize}")  # Output: 8 (bytes per element for int64)
#no.of bytes :
print(f"no.of bytes : {array.nbytes}")  # Output: 48 (6 elements * 8 bytes per element)
#transpose :
print(f"transpose : {array.T}") # Output: [[1, 4],[2, 5],[3, 6]]
#flat iterator :
print("flat iterator : ")
for element in array.flat:
    print(element) # Output: 1 2 3 4 5 6
#copy vs view :
copy_array = array.copy()  # Creates a copy of the array
view_array = array[0, :]   # Creates a view (not a copy)
view_array[0] = 100        # Modifies the original array
print(f"copy and view :{array}")               # The original array is modified because view_array is a view
#changing datatype : 
float_array = array.astype(float)
print(float_array)
print(f"datatype changed to :{float_array.dtype}")  # Output: float64
#checking for NaN or infinite values :
nan_array = np.array([1, np.nan, 3])
is_nan = np.isnan(nan_array)  # [False  True  False]
is_inf = np.isinf(nan_array)  # [False  False  False]
print(f"NaN : {is_nan}")
print(f"infinite : {is_inf}")

array shape : (2, 3)
dimensions : 2
size : 6
data type : int64
item size : 8
no.of bytes : 48
transpose : [[1 4]
 [2 5]
 [3 6]]
flat iterator : 
1
2
3
4
5
6
copy and view :[[100   2   3]
 [  4   5   6]]
[[100.   2.   3.]
 [  4.   5.   6.]]
datatype changed to :float64
NaN : [False  True False]
infinite : [False False False]
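
To check whether an array is a view or a copy, one option is `np.shares_memory` together with the `.base` attribute. This is a small sketch with a fresh array, so it does not depend on the state left by the cell above.

```python
import numpy as np

array = np.array([[1, 2, 3], [4, 5, 6]])

view_array = array[0, :]          # basic slicing returns a view
copy_array = array[0, :].copy()   # an explicit copy owns its own data

# np.shares_memory reports whether two arrays overlap in memory
view_shares = np.shares_memory(array, view_array)
copy_shares = np.shares_memory(array, copy_array)
print(view_shares, copy_shares)   # True False

# A view's .base attribute points at the array that owns the data
print(view_array.base is array)   # True

# Writing through the copy leaves the original untouched
copy_array[0] = 100
print(array[0, 0])                # 1
```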


 DATA MANIPULATION :
  -

- CREATION OF NumPy ARRAY :

A. Creating Arrays with Ranges of Values : 
We can create arrays with ranges of values using the following functions:

1. Using arange() : 
Creates an array with a range of values, similar to Python's range() function but returns an array.
2. Using linspace() : 
Creates an array of evenly spaced values between a start and an end value.

B. Creating Random Arrays : 
NumPy offers functions to create arrays with random values:
    
1. Random Values Between 0 and 1 : 
Creates an array of random values between 0 and 1.
2. Random Values from a Standard Normal Distribution : 
Creates an array of random values sampled from a standard normal distribution (mean 0, variance 1).
3. Random Integers : 
Creates an array of random integers within a specified range.

C. Creating Arrays from Existing Arrays : 
You can also create new arrays by copying or transforming existing ones:

1. Copying an Array : 
Creates a new array that is a copy of an existing array.
2. Reshaping an Array : 
Changes the shape of an existing array without changing its data.

D. Creating Arrays with Special Values : 
NumPy also provides ways to create arrays with special values or patterns:

1. Diagonal Arrays : 
Creates a diagonal array with specified values on the diagonal.

E. Creating Arrays Using Meshgrid : 
Creates coordinate matrices from coordinate vectors using np.meshgrid().

These methods cover the most common ways to create arrays in NumPy. The versatility of array creation in NumPy is one of the reasons it's so widely used in scientific computing and data analysis.

In [23]:
#using arange :
range_array = np.arange(0, 10, 2)  # Array of values from 0 to 8 with a step of 2
print(f"range array : {range_array}")
#using linspace():
linspace_array = np.linspace(0, 1, 5)  # 5 values evenly spaced between 0 and 1
print(f"linspace array: {linspace_array}")
#USING RANDOM VALUES :
# 1. between 0 and 1:
random_array = np.random.rand(3, 2)  # 3x2 array of random values between 0 and 1
print(f"rndm array : {random_array}")
# 2. from a standard normal distribution :
random_normal_array = np.random.randn(3, 3)  # 3x3 array of random values from a normal distribution
print(f"rndm nrml array : {random_normal_array}")
# 3. random integers:
random_int_array = np.random.randint(1, 10, (3, 3))  # 3x3 array of random integers between 1 and 9
print(f"rndm int array :{random_int_array}")
#copying an array:
original_array = np.array([1, 2, 3])
copied_array = np.copy(original_array)
print(f"copied array : {copied_array}")
#reshaping array :
reshaped_array = np.array([1, 2, 3, 4, 5, 6]).reshape((2, 3))  # Reshape to a 2x3 array
print(f"reshaped array : {reshaped_array}")
#diagonal arrays :
diagonal_array = np.diag([1, 2, 3, 4])  # Diagonal array with the values 1, 2, 3, 4
print(f"diagonal array : {diagonal_array}")
#array using meshgrid :
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
X, Y = np.meshgrid(x, y)  # Creates grid coordinates based on x and y
print("array using meshgrid : ")
print(X)
print(Y)


range array : [0 2 4 6 8]
linspace array: [0.   0.25 0.5  0.75 1.  ]
rndm array : [[0.12782485 0.01830082]
 [0.63610326 0.11218235]
 [0.63424339 0.61329081]]
rndm nrml array : [[-1.00106145 -0.06375061  1.36543151]
 [ 0.60890251  0.20214452  1.24264718]
 [-0.96055855  0.03076896  0.35333376]]
rndm int array :[[2 3 7]
 [1 9 2]
 [4 9 5]]
copied array : [1 2 3]
reshaped array : [[1 2 3]
 [4 5 6]]
diagonal array : [[1 0 0 0]
 [0 2 0 0]
 [0 0 3 0]
 [0 0 0 4]]
array using meshgrid : 
[[1 2 3]
 [1 2 3]
 [1 2 3]]
[[4 4 4]
 [5 5 5]
 [6 6 6]]


-     ARRAY SLICING : Arrays can be sliced using the ':' operator to access a subset of the array.
Syntax : array[start:stop:step]

In [25]:
slice_array = array[0, :2]  # Accessing the first row, first two elements
print(slice_array)  # Output: [100 2] (array[0, 0] was set to 100 through the view earlier)

[100   2]
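
The `array[start:stop:step]` syntax can be sketched on a fresh array as follows. The names `values` and `grid` are illustrative; `stop` is always excluded, and a negative `step` walks backwards.

```python
import numpy as np

values = np.arange(10)            # [0 1 2 3 4 5 6 7 8 9]

middle = values[2:7]              # start=2, stop=7 (stop is excluded)
every_third = values[::3]         # step=3 over the whole array
reversed_values = values[::-1]    # negative step reverses the order
print(middle)                     # [2 3 4 5 6]
print(every_third)                # [0 3 6 9]
print(reversed_values)            # [9 8 7 6 5 4 3 2 1 0]

# For 2D arrays, give one slice per axis: rows 0-1, then every second column
grid = np.arange(12).reshape(3, 4)
sub = grid[:2, ::2]
print(sub)                        # [[0 2]
                                  #  [4 6]]
```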


-     ARRAY INDEXING :  NumPy arrays support various forms of indexing to access elements :

In [26]:
# BASIC INDEXING :
element = array[0, 1]  # Accessing the element at row 0, column 1
print(f"through basic indexing: {element}")  # Output: 2
# BOOLEAN INDEXING :
bool_index = array[array > 3]  # Select elements greater than 3
print(f"through boolean indexing: {bool_index}")  # Output: [100 4 5 6] (array[0, 0] is 100 after the earlier view assignment)
# FANCY INDEXING :
fancy_index = array[[0, 1], [1, 2]]  # Selecting elements at (0,1) and (1,2)
print(f"through fancy indexing: {fancy_index}")  # Output: [2 6]

through basic indexing: 2
through boolean indexing: [100   4   5   6]
through fancy indexing: [2 6]
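
Boolean indexing can also be used for assignment, and `np.where` offers a related way to choose values by condition. A minimal sketch on a fresh array (the name `scores` and the labels are illustrative):

```python
import numpy as np

scores = np.array([[1, 2, 3], [4, 5, 6]])

# A boolean mask has the same shape as the array it was built from
mask = scores > 3
print(mask)

# Assigning through a mask changes only the selected elements
capped = scores.copy()
capped[mask] = 3
print(capped)                    # [[1 2 3]
                                 #  [3 3 3]]

# np.where(condition, x, y) picks x where the condition holds, else y
labels = np.where(scores % 2 == 0, "even", "odd")
print(labels)
```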


-     MATHEMATICAL OPERATIONS :
  NumPy supports element-wise arithmetic operations, which means we can perform operations on arrays directly, and they will be applied to each element of the array.

In [28]:
#1 # ADDITION :
     # We can add two arrays together, or add a scalar to an array.
# Element-wise addition of two arrays
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])
sum_array = array1 + array2  # [5, 7, 9]
# Adding a scalar to an array
scalar_addition = array1 + 10  # [11, 12, 13]
print(f"element wise addition : {sum_array}")
print(f"adding a scalar : {scalar_addition}")
#2 # SUBTRACTION :
     #Subtract one array from another, or subtract a scalar from an array.
difference_array = array1 - array2  # [-3, -3, -3]
scalar_subtraction = array1 - 10    # [-9, -8, -7]
print(f"difference : {difference_array}")
print(f"subtracting a scalar : {scalar_subtraction}")
#3 # MULTIPLICATION :
     #Multiply arrays element-wise, or multiply an array by a scalar.
product_array = array1 * array2  # [4, 10, 18]
scalar_multiplication = array1 * 2  # [2, 4, 6]
print(f"Product : {product_array}")
print(f"scalar multiplied : {scalar_multiplication}")
#4 # DIVISION :
     #Divide arrays element-wise, or divide an array by a scalar.
division_array = array1 / array2  # [0.25, 0.4, 0.5]
scalar_division = array1 / 2      # [0.5, 1.0, 1.5]
print(f"division : {division_array}")
print(f"scalar division :{scalar_division}")
#5 # EXPONENTIATION :
     #Raise each element of an array to the power of another array or a scalar.
exponent_array = array1 ** 2  # [1, 4, 9]
array_exponentiation = array1 ** array2  # [1, 32, 729]
print(f"scalar exponent : {exponent_array}")
print(f"array exponent : {array_exponentiation}")

#TRIGONOMETRIC FUNCTIONS :
print("trigonometric functions: ")
print(f"sin(pi/2) : {np.sin(np.pi / 2)}")  # 1.0
print(f"cos(pi) : {np.cos(np.pi)}")        # -1.0


element wise addition : [5 7 9]
adding a scalar : [11 12 13]
difference : [-3 -3 -3]
subtracting a scalar : [-9 -8 -7]
Product : [ 4 10 18]
scalar multiplied : [2 4 6]
division : [0.25 0.4  0.5 ]
scalar division :[0.5 1.  1.5]
scalar exponent : [1 4 9]
array exponent : [  1  32 729]
trigonometric functions: 
sin(pi/2) : 1.0
cos(pi) : -1.0
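
Mathematical functions are applied element-wise in the same way as the arithmetic operators. The sketch below applies `np.sin` to a whole array of angles, assuming the input is given in degrees and converted with `np.deg2rad`; the variable names are illustrative.

```python
import numpy as np

degrees = np.array([0, 30, 90, 180])
radians = np.deg2rad(degrees)          # trigonometric functions expect radians

sines = np.sin(radians)
# Floating-point results are rarely exact, so round before displaying
print(np.round(sines, 3))

# Integer division and remainder are element-wise as well
quotient = np.array([7, 8, 9]) // 2    # [3 4 4]
remainder = np.array([7, 8, 9]) % 2    # [1 0 1]
print(quotient, remainder)
```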

DATA AGGREGATION : 
 -

In [34]:
my_array = np.array([10, 20, 30, 40, 50])
print(f"mean: {np.mean(my_array)}") # MEAN
print(f"median: {np.median(my_array)}") #MEDIAN
print(f"standard deviation: {np.std(my_array)}")  #STANDARD DEVIATION
print(f"sum: {np.sum(my_array)}")  # SUM
print(f"minimum: {np.min(my_array)}") # MIN
print(f"maximum: {np.max(my_array)}") # MAX

mean: 30.0
median: 30.0
standard deviation: 14.142135623730951
sum: 150
minimum: 10
maximum: 50
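
The same aggregation functions accept an `axis` argument for 2D arrays, and `np.std` accepts `ddof` to switch from the population formula (the default, used above) to the sample formula. A short sketch, with an illustrative array named `table`:

```python
import numpy as np

table = np.array([[10, 20, 30],
                  [40, 50, 60]])

total = np.sum(table)                # 210, all elements
col_sums = np.sum(table, axis=0)     # [50 70 90], collapses the rows
row_means = np.mean(table, axis=1)   # [20. 50.], collapses the columns

# np.std defaults to the population formula (ddof=0);
# ddof=1 gives the sample standard deviation instead
sample_std = np.std([10, 20, 30, 40, 50], ddof=1)
print(total, col_sums, row_means, sample_std)
```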


DATA ANALYSIS :
-

1. Correlation: np.corrcoef() computes the Pearson correlation coefficient matrix for two or more datasets.
2. Outliers: A common rule flags values that lie more than three standard deviations from the mean, which takes a single boolean-mask expression in NumPy.
3. Percentiles: np.percentile() returns the value below which a given percentage of the data falls.
4. Efficiency: NumPy's operations run in compiled code over whole arrays, making them significantly faster than equivalent pure-Python loops, especially with large datasets.

Together, these tools make NumPy well suited to statistical analysis and data manipulation on large datasets.

In [36]:
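# Finding Correlations
# NOTE: data1 and data2 are not defined anywhere in this notebook. The two lines
# below are an ASSUMED setup (two correlated standard-normal samples); the output
# shown after this cell came from the original data and will not match exactly.
data1 = np.random.randn(100_000)
data2 = data1 + 0.5 * np.random.randn(100_000)
correlation_matrix = np.corrcoef(data1, data2)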
print("Correlation Matrix:")
print(correlation_matrix)

# Identifying Outliers
mean = np.mean(data1)
std_dev = np.std(data1)
outliers = data1[np.abs(data1 - mean) > 3 * std_dev]
print(f"Number of outliers in data1: {outliers.size}")

# Calculating Percentiles
percentile_25 = np.percentile(data1, 25)
percentile_50 = np.percentile(data1, 50)
percentile_75 = np.percentile(data1, 75)
print(f"25th Percentile: {percentile_25}")
print(f"50th Percentile (Median): {percentile_50}")
print(f"75th Percentile: {percentile_75}")

Correlation Matrix:
[[1.         0.89425319]
 [0.89425319 1.        ]]
Number of outliers in data1: 286
25th Percentile: -0.6686416961814368
50th Percentile (Median): 0.0017202220855813645
75th Percentile: 0.6734387006684547
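
The three-standard-deviation rule and the percentile calculation can be tried on a small synthetic dataset where the outliers are known in advance. This is a sketch under stated assumptions: the data, the seed, and the three planted values are all hypothetical and unrelated to the data behind the output above.

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed, for repeatability
sample = rng.standard_normal(1000)        # hypothetical stand-in data
planted = np.array([50.0, -40.0, 60.0])   # three extreme values added on purpose
data = np.concatenate([sample, planted])

# 3-sigma rule: flag points more than 3 standard deviations from the mean
mean = np.mean(data)
std_dev = np.std(data)
outliers = data[np.abs(data - mean) > 3 * std_dev]
print(f"outliers found: {outliers.size}")

# Percentiles are far less sensitive to the planted values than mean and std
q25, q50, q75 = np.percentile(data, [25, 50, 75])
print(f"interquartile range: {q75 - q25:.2f}")

# np.corrcoef returns a symmetric matrix with ones on the diagonal
noisy = data + rng.standard_normal(data.size)
corr = np.corrcoef(data, noisy)
print(corr.shape)   # (2, 2)
```

Because the planted values inflate the standard deviation themselves, the 3-sigma rule can miss milder outliers; percentile-based thresholds are one common alternative.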


APPLICATION IN DATA SCIENCE :
 -

NumPy is a core tool for data science because it combines fast, vectorized array operations with a broad library of numerical functions. Its efficient handling of large datasets makes it a common choice in domains that require intensive numerical computation, such as machine learning, financial analysis, and scientific research. With NumPy, data scientists can perform data manipulation, statistical analysis, and simulation on large arrays without writing explicit Python loops.

In real-world applications:

- Machine Learning: NumPy is used for data preprocessing, feature scaling, and implementing algorithms.
- Financial Analysis: It aids in portfolio optimization and risk management by efficiently analyzing large arrays of financial data.
- Scientific Research: NumPy is crucial for running numerical simulations and analyzing large datasets in fields like genomics and physics.

Overall, NumPy's speed, functionality, and reliability make it a cornerstone of data science workflows.