## Datatypes
Every NumPy array is a grid of elements of the same type. NumPy provides a large set of numeric datatypes that you can use to construct arrays. NumPy tries to infer a datatype when you create an array, but functions that construct arrays usually also accept an optional argument to specify the datatype explicitly.

**Why Do We Have So Many Data Types in NumPy?**
- Memory Efficiency: Different data types require different amounts of memory. For example, an int8 uses 1 byte, while an int64 uses 8 bytes. By choosing the appropriate data type, you can save memory, especially when working with large datasets.

- Performance: Operations on smaller data types can be faster because they require less memory bandwidth and can fit more data into CPU caches. This can lead to improved performance in numerical computations.

- Precision: Different data types offer varying levels of precision. For example, float32 has less precision than float64. Depending on the requirements of your calculations, you may need to choose a data type that provides the necessary precision.
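
The memory and precision points above can be checked directly. The following is a minimal sketch; the array length `n` and the variable names are illustrative choices, not taken from the text above.

```python
import numpy as np

n = 1_000_000  # illustrative array length

small = np.zeros(n, dtype=np.int8)
large = np.zeros(n, dtype=np.int64)

# nbytes is the number of elements times the size of one element
print(small.nbytes)  # 1000000
print(large.nbytes)  # 8000000

# float32 cannot represent 0.1 as closely as float64 does,
# so the two stored values differ once both are widened to a Python float
as_f32 = np.float32(0.1)
as_f64 = np.float64(0.1)
print(float(as_f32) == float(as_f64))  # False
```

The same data takes eight times more memory as int64 than as int8, which is why the choice of datatype matters most for large arrays.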

Here is an example:

In [1]:
import numpy as np

x = np.array([1, 2])   # Let numpy choose the datatype
print(x.dtype)         # Prints "int64" on most platforms ("int32" on Windows with NumPy < 2)

x = np.array([1.0, 2.0])   # Let numpy choose the datatype
print(x.dtype)             # Prints "float64"

x = np.array([1, 2], dtype=np.int64)   # Force a particular datatype
print(x.dtype)                         # Prints "int64"

int64
float64
int64


In [2]:
# available data types
import numpy as np

# 1. Integer Types
# Available types: int8, int16, int32, int64
# Create an array of int32
int_array = np.array([1, 2, 3], dtype=np.int32)
print("Integer Array:", int_array)
print("Data Type:", int_array.dtype)

# 2. Unsigned Integer Types
# Available types: uint8, uint16, uint32, uint64
# Create an array of uint8
uint_array = np.array([1, 2, 3], dtype=np.uint8)
print("Unsigned Integer Array:", uint_array)
print("Data Type:", uint_array.dtype)

# 3. Floating Point Types
# Available types: float16, float32, float64
# Create an array of float64
float_array = np.array([1.0, 2.0, 3.0], dtype=np.float64)
print("Floating Point Array:", float_array)
print("Data Type:", float_array.dtype)

# 4. Complex Types
# Available types: complex64, complex128
# Create an array of complex128
complex_array = np.array([1+2j, 3+4j], dtype=np.complex128)
print("Complex Array:", complex_array)
print("Data Type:", complex_array.dtype)

# 5. Boolean Type
# Available type: bool_
# Create a boolean array
bool_array = np.array([True, False, True], dtype=np.bool_)
print("Boolean Array:", bool_array)
print("Data Type:", bool_array.dtype)

# 6. String Type
# Available type: str_
# Create an array of strings
string_array = np.array(['apple', 'banana', 'cherry'], dtype=np.str_)
print("String Array:", string_array)
print("Data Type:", string_array.dtype)

# 7. Object Type
# Available type: object_
# Create an array of objects
object_array = np.array([1, 'apple', 3.14], dtype=np.object_)
print("Object Array:", object_array)
print("Data Type:", object_array.dtype)

Integer Array: [1 2 3]
Data Type: int32
Unsigned Integer Array: [1 2 3]
Data Type: uint8
Floating Point Array: [1. 2. 3.]
Data Type: float64
Complex Array: [1.+2.j 3.+4.j]
Data Type: complex128
Boolean Array: [ True False  True]
Data Type: bool
String Array: ['apple' 'banana' 'cherry']
Data Type: <U6
Object Array: [1 'apple' 3.14]
Data Type: object
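
Each numeric type listed above can only hold a limited range of values. As a small sketch, `np.iinfo` and `np.finfo` report those limits; the variable names here are illustrative.

```python
import numpy as np

# Integer limits
int8_info = np.iinfo(np.int8)
uint8_info = np.iinfo(np.uint8)
print(int8_info.min, int8_info.max)    # -128 127
print(uint8_info.min, uint8_info.max)  # 0 255

# Approximate number of reliable decimal digits for floating point types
f32_info = np.finfo(np.float32)
f64_info = np.finfo(np.float64)
print(f32_info.precision)  # 6
print(f64_info.precision)  # 15
```

Checking these limits before picking a small type helps avoid values that do not fit.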


In [4]:
# size of data types
import numpy as np

# Demonstrating the size of different NumPy data types in bytes

# Integer Types
print("Integer Types:")
print("Size of int8:", np.dtype(np.int8).itemsize, "bytes")   # 1 byte
print("Size of int16:", np.dtype(np.int16).itemsize, "bytes") # 2 bytes
print("Size of int32:", np.dtype(np.int32).itemsize, "bytes") # 4 bytes
print("Size of int64:", np.dtype(np.int64).itemsize, "bytes") # 8 bytes

# Unsigned Integer Types
print("\nUnsigned Integer Types:")
print("Size of uint8:", np.dtype(np.uint8).itemsize, "bytes")   # 1 byte
print("Size of uint16:", np.dtype(np.uint16).itemsize, "bytes") # 2 bytes
print("Size of uint32:", np.dtype(np.uint32).itemsize, "bytes") # 4 bytes
print("Size of uint64:", np.dtype(np.uint64).itemsize, "bytes") # 8 bytes

# Floating Point Types
print("\nFloating Point Types:")
print("Size of float16:", np.dtype(np.float16).itemsize, "bytes") # 2 bytes
print("Size of float32:", np.dtype(np.float32).itemsize, "bytes") # 4 bytes
print("Size of float64:", np.dtype(np.float64).itemsize, "bytes") # 8 bytes

# Complex Types
print("\nComplex Types:")
print("Size of complex64:", np.dtype(np.complex64).itemsize, "bytes") # 8 bytes (2 x 4 bytes)
print("Size of complex128:", np.dtype(np.complex128).itemsize, "bytes") # 16 bytes (2 x 8 bytes)

# Boolean Type
print("\nBoolean Type:")
print("Size of bool:", np.dtype(np.bool_).itemsize, "bytes") # 1 byte

# Object Type
print("\nObject Type:")
print("Size of object:", np.dtype(np.object_).itemsize, "bytes") # Typically 8 bytes for pointer

Integer Types:
Size of int8: 1 bytes
Size of int16: 2 bytes
Size of int32: 4 bytes
Size of int64: 8 bytes

Unsigned Integer Types:
Size of uint8: 1 bytes
Size of uint16: 2 bytes
Size of uint32: 4 bytes
Size of uint64: 8 bytes

Floating Point Types:
Size of float16: 2 bytes
Size of float32: 4 bytes
Size of float64: 8 bytes

Complex Types:
Size of complex64: 8 bytes
Size of complex128: 16 bytes

Boolean Type:
Size of bool: 1 bytes

Object Type:
Size of object: 8 bytes


In [15]:
# change data types
import numpy as np

# Create a NumPy array with default data type (int)
array_int = np.array([1, 2, 3, 4, 5])
print("Original Array (int):")
print(array_int)
print("Data type:", array_int.dtype)

# Change data type to float
array_float = array_int.astype(float)
print("\nArray converted to float:")
print(array_float)
print("Data type:", array_float.dtype)

# Change data type to string
array_str = array_int.astype(str)
print("\nArray converted to string:")
print(array_str)
print("Data type:", array_str.dtype)

# Create a float array
array_float2 = np.array([1.1, 2.2, 3.3])
print("\nOriginal Array (float):")
print(array_float2)
print("Data type:", array_float2.dtype)

# Change data type to integer (will truncate the decimal part)
array_int2 = array_float2.astype(int)
print("\nArray converted to int:")
print(array_int2)
print("Data type:", array_int2.dtype)

# Change data type to a specific NumPy type (e.g., np.float32)
array_float32 = array_int.astype(np.float32)
print("\nArray converted to float32:")
print(array_float32)
print("Data type:", array_float32.dtype)

Original Array (int):
[1 2 3 4 5]
Data type: int64

Array converted to float:
[1. 2. 3. 4. 5.]
Data type: float64

Array converted to string:
['1' '2' '3' '4' '5']
Data type: <U21

Original Array (float):
[1.1 2.2 3.3]
Data type: float64

Array converted to int:
[1 2 3]
Data type: int64

Array converted to float32:
[1. 2. 3. 4. 5.]
Data type: float32
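
A few details of astype() are easy to miss in the cell above. The sketch below, with illustrative values and variable names, shows that astype() returns a new array, that float-to-int conversion truncates toward zero, and that casting to a narrower integer type wraps around rather than raising an error.

```python
import numpy as np

# astype() returns a copy, so the original array is untouched
original = np.array([1, 2, 3])
converted = original.astype(np.float64)
converted[0] = 99.0
print(original)  # [1 2 3]

# Float to int discards the fractional part (truncation toward zero)
signed = np.array([-1.7, 1.7]).astype(np.int64)
print(signed)  # [-1  1]

# Values outside the uint8 range (0 to 255) wrap around modulo 256
wrapped = np.array([300, 256, 255]).astype(np.uint8)
print(wrapped)  # [ 44   0 255]
```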


## Array math
Basic mathematical functions operate elementwise on arrays, and are available both as operator overloads and as functions in the numpy module:

In [6]:
import numpy as np

x = np.array([[1,2],
              [3,4]], dtype=np.float64
)
y = np.array([[5,6],
              [7,8]], dtype=np.float64
)

# Elementwise sum; both produce the array
# [[ 6.0  8.0]
#  [10.0 12.0]]
print(x + y)
print(np.add(x, y))

# Elementwise difference; both produce the array
# [[-4.0 -4.0]
#  [-4.0 -4.0]]
print(x - y)
print(np.subtract(x, y))

# Elementwise product; both produce the array
# [[ 5.0 12.0]
#  [21.0 32.0]]
print(x * y)
print(np.multiply(x, y))

# Elementwise division; both produce the array
# [[ 0.2         0.33333333]
#  [ 0.42857143  0.5       ]]
print(x / y)
print(np.divide(x, y))


[[ 6.  8.]
 [10. 12.]]
[[ 6.  8.]
 [10. 12.]]
[[-4. -4.]
 [-4. -4.]]
[[-4. -4.]
 [-4. -4.]]
[[ 5. 12.]
 [21. 32.]]
[[ 5. 12.]
 [21. 32.]]
[[0.2        0.33333333]
 [0.42857143 0.5       ]]
[[0.2        0.33333333]
 [0.42857143 0.5       ]]


In [9]:
import numpy as np

x = np.array([[1,2],
              [3,4]], dtype=np.float64
)

# Elementwise square root; produces the array
# [[ 1.          1.41421356]
#  [ 1.73205081  2.        ]]
print(np.sqrt(x))

# Power
power_result = np.power(x, 2)  # Raises each element to the power of 2
print("Power:", power_result)

# Exponential
exp_result = np.exp(x)  # Computes e^x for each element
print("Exponential:", exp_result)

# Logarithm
log_result = np.log(x)  # Computes the natural logarithm
print("Natural Logarithm:", log_result)

log10_result = np.log10(x)  # Computes the base-10 logarithm
print("Base-10 Logarithm:", log10_result)

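# Rounding Functions (x + 0.5 is [[1.5, 2.5], [3.5, 4.5]])
floor_result = np.floor(x + 0.5)  # Rounds down, toward negative infinity
print("Floor:", floor_result)

ceil_result = np.ceil(x + 0.5)    # Rounds up, toward positive infinity
print("Ceil:", ceil_result)

round_result = np.round(x + 0.5)   # Rounds to nearest integer; exact halves go to the nearest even value
print("Round:", round_result)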

[[1.         1.41421356]
 [1.73205081 2.        ]]
Power: [[ 1.  4.]
 [ 9. 16.]]
Exponential: [[ 2.71828183  7.3890561 ]
 [20.08553692 54.59815003]]
Natural Logarithm: [[0.         0.69314718]
 [1.09861229 1.38629436]]
Base-10 Logarithm: [[0.         0.30103   ]
 [0.47712125 0.60205999]]
Floor: [[1. 2.]
 [3. 4.]]
Ceil: [[2. 3.]
 [4. 5.]]
Round: [[2. 2.]
 [4. 4.]]
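
The "Round" output above may look surprising: 2.5 became 2 while 3.5 became 4. The sketch below uses an illustrative input array to compare np.round with np.floor and np.trunc on exact halves, including a negative value.

```python
import numpy as np

halves = np.array([0.5, 1.5, 2.5, -1.5])

rounded = np.round(halves)    # exact halves go to the nearest even integer: 0, 2, 2, -2
floored = np.floor(halves)    # toward negative infinity: 0, 1, 2, -2
truncated = np.trunc(halves)  # toward zero: 0, 1, 2, -1

print(rounded)
print(floored)
print(truncated)
```

np.floor and np.trunc agree for positive inputs and differ only for negative ones.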


### flatten / reshape
flatten() converts a multi-dimensional array into a one-dimensional array and returns a copy of the data. This is particularly useful when you want to simplify the structure of your data for processing or analysis.

The opposite operation is reshaping a one-dimensional array back into a multi-dimensional form with the reshape() method. reshape() lets you specify any new shape, as long as the total number of elements stays the same.

In [10]:
import numpy as np

# Create a 2D NumPy array (matrix)
array_2d = np.array([[1, 2, 3],
                     [4, 5, 6],
                     [7, 8, 9]])

print("Original 2D Array:")
print(array_2d)

# Flatten the 2D array into a 1D array
flattened_array = array_2d.flatten()

print("\nFlattened Array:")
print(flattened_array)

Original 2D Array:
[[1 2 3]
 [4 5 6]
 [7 8 9]]

Flattened Array:
[1 2 3 4 5 6 7 8 9]
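
Because flatten() returns a copy, changing the flattened array does not affect the original. The related ravel() method is not used in this notebook; the sketch below brings it in only for contrast, since it returns a view when the memory layout allows it. The array values and names are illustrative.

```python
import numpy as np

array_2d = np.array([[1, 2, 3],
                     [4, 5, 6]])

# flatten() copies the data
flat_copy = array_2d.flatten()
flat_copy[0] = 100
print(array_2d[0, 0])  # 1, the original is unchanged

# ravel() returns a view for this contiguous array
flat_view = array_2d.ravel()
print(np.shares_memory(array_2d, flat_view))  # True
flat_view[0] = 100
print(array_2d[0, 0])  # 100, the write went through to the original
```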


In [11]:
import numpy as np

# Create a flattened array of size 18
flattened_array = np.arange(18)  # This creates an array with values from 0 to 17
print("Flattened Array:")
print(flattened_array)

# Reshape into different sizes

# 1. Reshape to 2D array of shape (2, 9)
reshaped_2d_2x9 = flattened_array.reshape(2, 9)
print("\nReshaped Array (2x9):")
print(reshaped_2d_2x9)

# 2. Reshape to 2D array of shape (3, 6)
reshaped_2d_3x6 = flattened_array.reshape(3, 6)
print("\nReshaped Array (3x6):")
print(reshaped_2d_3x6)

# 3. Reshape to 2D array of shape (6, 3)
reshaped_2d_6x3 = flattened_array.reshape(6, 3)
print("\nReshaped Array (6x3):")
print(reshaped_2d_6x3)

# 4. Reshape to 3D array of shape (2, 3, 3)
reshaped_3d_2x3x3 = flattened_array.reshape(2, 3, 3)
print("\nReshaped Array (2x3x3):")
print(reshaped_3d_2x3x3)

# 5. Reshape to 3D array of shape (3, 2, 3)
reshaped_3d_3x2x3 = flattened_array.reshape(3, 2, 3)
print("\nReshaped Array (3x2x3):")
print(reshaped_3d_3x2x3)

# 6. Reshape to 1D array (keeping it the same)
reshaped_1d = flattened_array.reshape(18)
print("\nReshaped Array (1D, same as original):")
print(reshaped_1d)

Flattened Array:
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17]

Reshaped Array (2x9):
[[ 0  1  2  3  4  5  6  7  8]
 [ 9 10 11 12 13 14 15 16 17]]

Reshaped Array (3x6):
[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]]

Reshaped Array (6x3):
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]
 [12 13 14]
 [15 16 17]]

Reshaped Array (2x3x3):
[[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]]

 [[ 9 10 11]
  [12 13 14]
  [15 16 17]]]

Reshaped Array (3x2x3):
[[[ 0  1  2]
  [ 3  4  5]]

 [[ 6  7  8]
  [ 9 10 11]]

 [[12 13 14]
  [15 16 17]]]

Reshaped Array (1D, same as original):
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17]
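
Two further points about reshape() are worth a short sketch: one dimension can be given as -1 so that NumPy works it out from the total size, and a shape with the wrong number of elements raises a ValueError. The names below are illustrative.

```python
import numpy as np

flattened_array = np.arange(18)

# -1 asks NumPy to infer that dimension: 18 / 3 = 6
inferred = flattened_array.reshape(3, -1)
print(inferred.shape)  # (3, 6)

# reshape(-1) collapses any shape back to one dimension
round_trip = inferred.reshape(-1)
print(np.array_equal(round_trip, flattened_array))  # True

# 4 * 5 = 20 does not match 18 elements, so this fails
reshape_failed = False
try:
    flattened_array.reshape(4, 5)
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```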


In [12]:
import numpy as np

# Create a 2D NumPy array
array_2d = np.array([[1, 2, 3],
                     [4, 2, 6],
                     [1, 8, 3],
                     [4, 5, 6]])

print("Original 2D Array:")
print(array_2d)

# Find unique values across the entire array
unique_values = np.unique(array_2d)
print("\nUnique values in the entire array:")
print(unique_values)

# Find unique values in each column (axis=0)
unique_columns = [np.unique(array_2d[:, col]) for col in range(array_2d.shape[1])]
print("\nUnique values in each column:")
for i, unique in enumerate(unique_columns):
    print(f"Column {i}: {unique}")

# Find unique values in each row (axis=1)
unique_rows = [np.unique(array_2d[row, :]) for row in range(array_2d.shape[0])]
print("\nUnique values in each row:")
for i, unique in enumerate(unique_rows):
    print(f"Row {i}: {unique}")

Original 2D Array:
[[1 2 3]
 [4 2 6]
 [1 8 3]
 [4 5 6]]

Unique values in the entire array:
[1 2 3 4 5 6 8]

Unique values in each column:
Column 0: [1 4]
Column 1: [2 5 8]
Column 2: [3 6]

Unique values in each row:
Row 0: [1 2 3]
Row 1: [2 4 6]
Row 2: [1 3 8]
Row 3: [4 5 6]
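
np.unique can also report how often each value occurs, through its return_counts argument. Note that the function's own axis argument does something different from the per-column loop above: it treats whole rows (or columns) as single items. A minimal sketch on the same array, with illustrative variable names:

```python
import numpy as np

array_2d = np.array([[1, 2, 3],
                     [4, 2, 6],
                     [1, 8, 3],
                     [4, 5, 6]])

# Unique values together with the number of times each one appears
values, counts = np.unique(array_2d, return_counts=True)
print(values)  # [1 2 3 4 5 6 8]
print(counts)  # [2 2 2 2 1 2 1]

# axis=0 compares entire rows; all four rows here are distinct
distinct_rows = np.unique(array_2d, axis=0)
print(distinct_rows.shape)  # (4, 3)
```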


We use the dot function to compute inner products of vectors, to multiply a vector by a matrix, and to multiply matrices. dot is available both as a function in the numpy module and as an instance method of array objects:

In [3]:
import numpy as np

x = np.array([[1,2],[3,4]])
y = np.array([[5,6],[7,8]])

v = np.array([9,10])
w = np.array([11, 12])

# Inner product of vectors; both produce 219
print(v.dot(w))
print(np.dot(v, w))

# Matrix / vector product; both produce the rank 1 array [29 67]
print(x.dot(v))
print(np.dot(x, v))

# Matrix / matrix product; both produce the rank 2 array
# [[19 22]
#  [43 50]]
print(x.dot(y))
print(np.dot(x, y))

219
219
[29 67]
[29 67]
[[19 22]
 [43 50]]
[[19 22]
 [43 50]]
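
For one- and two-dimensional arrays like these, the @ operator and np.matmul give the same results as dot. The sketch below repeats the products from the cell above using that alternative spelling.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

v = np.array([9, 10])
w = np.array([11, 12])

inner = v @ w      # inner product of two vectors
mat_vec = x @ v    # matrix / vector product
mat_mat = x @ y    # matrix / matrix product

print(inner)    # 219
print(mat_vec)  # [29 67]
print(mat_mat)
print(np.array_equal(mat_mat, np.matmul(x, y)))  # True
```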


In [2]:
import numpy as np

# Inverse of a Matrix
from numpy.linalg import inv

matrix = np.array([[4, 7],
                   [2, 6]])
inverse_matrix = inv(matrix)
print("Inverse of Matrix:\n", inverse_matrix)

Inverse of Matrix:
 [[ 0.6 -0.7]
 [-0.2  0.4]]
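
One way to check an inverse is to multiply it by the original matrix and compare the result with the identity matrix. The sketch below does that, and also shows np.linalg.solve as an alternative when the goal is to solve a linear system; the right-hand side `b` is an illustrative value, not from the cell above.

```python
import numpy as np

matrix = np.array([[4.0, 7.0],
                   [2.0, 6.0]])
inverse_matrix = np.linalg.inv(matrix)

# matrix @ inverse should be the identity, up to floating point error
identity_check = matrix @ inverse_matrix
print(np.allclose(identity_check, np.eye(2)))  # True

# Solve matrix @ solution = b without forming the inverse explicitly
b = np.array([1.0, 2.0])  # illustrative right-hand side
solution = np.linalg.solve(matrix, b)
print(np.allclose(matrix @ solution, b))  # True
```

np.allclose is used instead of == because floating point results are rarely exact.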


Numpy provides many useful functions for performing computations on arrays; one of the most useful is sum:

In [4]:
import numpy as np

x = np.array([[1,2],[3,4]])

print(np.sum(x))  # Compute sum of all elements; prints "10"
print(np.sum(x, axis=0))  # Compute sum of each column; prints "[4 6]"
print(np.sum(x, axis=1))  # Compute sum of each row; prints "[3 7]"

10
[4 6]
[3 7]
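
Summing along an axis removes that axis from the result. Passing keepdims=True keeps it as a dimension of length 1, which is convenient when the sums are combined with the original array. A small sketch, with illustrative names, that scales each row so it sums to 1:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])

row_sums = np.sum(x, axis=1, keepdims=True)
print(row_sums.shape)  # (2, 1)

# The (2, 1) shape lets each row be divided by its own sum
normalized = x / row_sums
print(normalized.sum(axis=1))  # each entry is 1, up to floating point error
```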


Apart from computing mathematical functions using arrays, we frequently need to reshape or otherwise manipulate data in arrays. The simplest example of this type of operation is transposing a matrix; to transpose a matrix, simply use the T attribute of an array object:

In [5]:
import numpy as np

x = np.array([[1,2], [3,4]])
print(x)    # Prints "[[1 2]
            #          [3 4]]"
print(x.T)  # Prints "[[1 3]
            #          [2 4]]"

# Note that taking the transpose of a rank 1 array does nothing:
v = np.array([1,2,3])
print(v)    # Prints "[1 2 3]"
print(v.T)  # Prints "[1 2 3]"

[[1 2]
 [3 4]]
[[1 3]
 [2 4]]
[1 2 3]
[1 2 3]
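
Since transposing a rank 1 array does nothing, a column vector has to be created by adding a second axis. The sketch below shows one common way, using np.newaxis; the variable names are illustrative.

```python
import numpy as np

v = np.array([1, 2, 3])
print(v.T.shape)  # (3,)

# Add a second axis to get a 3x1 column
column = v[:, np.newaxis]
print(column.shape)  # (3, 1)

# A 1x3 row can be transposed as usual
row = v[np.newaxis, :]
print(row.shape)    # (1, 3)
print(row.T.shape)  # (3, 1)
```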


In [1]:
import numpy as np
from scipy import stats

# Create a sample NumPy array
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Calculate statistics
mean_value = np.mean(data)
std_deviation = np.std(data)   # Population standard deviation (ddof=0 by default)
variance = np.var(data)        # Population variance (ddof=0 by default)
median_value = np.median(data)
min_value = np.min(data)
max_value = np.max(data)
percentile_25 = np.percentile(data, 25)
percentile_75 = np.percentile(data, 75)
summary = stats.describe(data)  # Reports the sample variance (ddof=1 by default)

# Print results
print("Mean:", mean_value)
print("Standard Deviation:", std_deviation)
print("Variance:", variance)
print("Median:", median_value)
print("Minimum:", min_value)
print("Maximum:", max_value)
print("25th Percentile:", percentile_25)
print("75th Percentile:", percentile_75)
print("Summary Statistics:", summary)

Mean: 5.5
Standard Deviation: 2.8722813232690143
Variance: 8.25
Median: 5.5
Minimum: 1
Maximum: 10
25th Percentile: 3.25
75th Percentile: 7.75
Summary Statistics: DescribeResult(nobs=10, minmax=(np.int64(1), np.int64(10)), mean=np.float64(5.5), variance=np.float64(9.166666666666668), skewness=np.float64(0.0), kurtosis=np.float64(-1.2242424242424244))
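
The output above shows two different variances: 8.25 from np.var and about 9.17 from stats.describe. The difference comes from the divisor. np.var divides by n by default, while stats.describe divides by n - 1. The sketch below reproduces both values with NumPy alone through the ddof argument.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

population_var = np.var(data)      # divides by n
sample_var = np.var(data, ddof=1)  # divides by n - 1
sample_std = np.std(data, ddof=1)

print(population_var)  # 8.25
print(sample_var)      # approximately 9.1667
print(sample_std)
```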


In [13]:
import numpy as np

# Create a 3x4 NumPy array
array_2d = np.array([[10, 20, 30, 40],
                     [50, 60, 70, 80],
                     [90, 100, 110, 120]])

print("Original Array:")
print(array_2d)

# Calculate mean, standard deviation, and variance for each column (axis=0)
mean_columns = np.mean(array_2d, axis=0)
std_columns = np.std(array_2d, axis=0)
var_columns = np.var(array_2d, axis=0)

print("\nMean of each column:")
print(mean_columns)

print("\nStandard Deviation of each column:")
print(std_columns)

print("\nVariance of each column:")
print(var_columns)

# Calculate mean, standard deviation, and variance for each row (axis=1)
mean_rows = np.mean(array_2d, axis=1)
std_rows = np.std(array_2d, axis=1)
var_rows = np.var(array_2d, axis=1)

print("\nMean of each row:")
print(mean_rows)

print("\nStandard Deviation of each row:")
print(std_rows)

print("\nVariance of each row:")
print(var_rows)

Original Array:
[[ 10  20  30  40]
 [ 50  60  70  80]
 [ 90 100 110 120]]

Mean of each column:
[50. 60. 70. 80.]

Standard Deviation of each column:
[32.65986324 32.65986324 32.65986324 32.65986324]

Variance of each column:
[1066.66666667 1066.66666667 1066.66666667 1066.66666667]

Mean of each row:
[ 25.  65. 105.]

Standard Deviation of each row:
[11.18033989 11.18033989 11.18033989]

Variance of each row:
[125. 125. 125.]
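
Per-column means and standard deviations are often used together to standardize data, so that every column has mean 0 and standard deviation 1. This step is not part of the cell above; the sketch below is one possible use of those statistics, with illustrative variable names.

```python
import numpy as np

array_2d = np.array([[10, 20, 30, 40],
                     [50, 60, 70, 80],
                     [90, 100, 110, 120]], dtype=np.float64)

col_mean = np.mean(array_2d, axis=0)
col_std = np.std(array_2d, axis=0)

# Each row has the column statistics applied to it elementwise
standardized = (array_2d - col_mean) / col_std

print(standardized)
print(np.allclose(standardized.mean(axis=0), 0.0))  # True
print(np.allclose(standardized.std(axis=0), 1.0))   # True
```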


In [14]:
import numpy as np

# Create a 3x4 NumPy array
array_2d = np.array([[10, 20, 30, 40],
                     [50, 60, 70, 80],
                     [90, 100, 110, 120]])

print("Original Array:")
print(array_2d)

# Calculate sum for each column (axis=0)
sum_columns = np.sum(array_2d, axis=0)
print("\nSum of each column:")
print(sum_columns)

# Calculate minimum for each column (axis=0)
min_columns = np.min(array_2d, axis=0)
print("\nMinimum of each column:")
print(min_columns)

# Calculate maximum for each column (axis=0)
max_columns = np.max(array_2d, axis=0)
print("\nMaximum of each column:")
print(max_columns)

# Calculate median for each column (axis=0)
median_columns = np.median(array_2d, axis=0)
print("\nMedian of each column:")
print(median_columns)

# Calculate sum for each row (axis=1)
sum_rows = np.sum(array_2d, axis=1)
print("\nSum of each row:")
print(sum_rows)

# Calculate minimum for each row (axis=1)
min_rows = np.min(array_2d, axis=1)
print("\nMinimum of each row:")
print(min_rows)

# Calculate maximum for each row (axis=1)
max_rows = np.max(array_2d, axis=1)
print("\nMaximum of each row:")
print(max_rows)

# Calculate median for each row (axis=1)
median_rows = np.median(array_2d, axis=1)
print("\nMedian of each row:")
print(median_rows)

Original Array:
[[ 10  20  30  40]
 [ 50  60  70  80]
 [ 90 100 110 120]]

Sum of each column:
[150 180 210 240]

Minimum of each column:
[10 20 30 40]

Maximum of each column:
[ 90 100 110 120]

Median of each column:
[50. 60. 70. 80.]

Sum of each row:
[100 260 420]

Minimum of each row:
[10 50 90]

Maximum of each row:
[ 40  80 120]

Median of each row:
[ 25.  65. 105.]
