
In [None]:
# Pytest Overview in Python:

pytest is a powerful and easy-to-use testing framework for writing unit tests, functional tests, and even complex testing suites in Python.
It is widely adopted because of its simple syntax, rich plugin architecture, and advanced features like fixtures, parameterization,
and assertion introspection.

# Why Use Pytest?

    Simple syntax: Write test functions without needing to create classes.
    Auto-discovery: Automatically detects test files and test functions.
    Advanced assertion introspection: Better output when tests fail.
    Fixtures: Reusable components for setup/teardown logic.
    Plugins: Extensible via plugins (pytest-django, pytest-cov, etc.).
    Parameterized tests: Run the same test with multiple inputs easily.
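
As a rough sketch of the fixture and parameterization features listed above, the following test file shows both in a few lines. The file name and the names `sample_numbers`, `test_sum`, and `test_square` are illustrative assumptions, not part of the original notebook.

```python
# test_features.py (hypothetical file name)
import pytest


@pytest.fixture
def sample_numbers():
    # Setup logic: pytest calls this function and passes its result to any
    # test that lists `sample_numbers` as an argument.
    return [1, 2, 3]


def test_sum(sample_numbers):
    assert sum(sample_numbers) == 6


@pytest.mark.parametrize("value, expected", [(2, 4), (3, 9), (-4, 16)])
def test_square(value, expected):
    # pytest runs this test once per (value, expected) pair.
    assert value ** 2 == expected
```

Running `pytest test_features.py` should collect one `test_sum` case and three `test_square` cases.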

In [None]:
# Naming Conventions:

Test files must be named test_*.py or *_test.py.
Test functions must start with test_ (and test classes with Test).

# Assertions:

You can write assertions using Python's native assert statement:

def test_truth():
    assert True
    assert 5 > 2

Pytest gives detailed error messages when assertions fail.

In [None]:
# Basic Example

# test_sample.py

def add(a, b):
    return a + b

def test_add():
    assert add(2, 3) == 5
    assert add(-1, 1) == 0

# Run this test using the command:
pytest test_sample.py

In [None]:
Common Assertions Used with Pytest: Pytest does not need special assertion methods. You write plain Python assert statements,
    and pytest rewrites them to report the compared values when a test fails. Some of the most common forms include:

assert a == b: Checks if a is equal to b.
assert a != b: Checks if a is not equal to b.
assert a is b: Checks if a is the same object as b.
assert a is not b: Checks if a is not the same object as b.
assert a in b: Checks if a is a member of b.
assert a not in b: Checks if a is not a member of b.
assert a > b: Checks if a is greater than b.
assert a < b: Checks if a is less than b.
assert isinstance(a, type): Checks if a is an instance of type.
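
Two checks do not fit the plain `assert` forms above: expecting an exception, and comparing floats. A minimal sketch using `pytest.raises` and `pytest.approx` follows; the `divide` helper is a hypothetical function added only for illustration.

```python
import pytest


def divide(a, b):
    # Hypothetical helper used only for this example.
    return a / b


def test_divide_by_zero():
    # pytest.raises passes only if the block raises the given exception.
    with pytest.raises(ZeroDivisionError):
        divide(1, 0)


def test_float_sum():
    # pytest.approx compares floats within a small tolerance,
    # so 0.1 + 0.2 is accepted as equal to 0.3.
    assert 0.1 + 0.2 == pytest.approx(0.3)
```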

In [None]:
# Summary

**Testing is a key practice in software development that helps identify bugs early, ensures reliability, and improves code quality.**

Pytest is an easy-to-use, feature-rich testing framework in Python, enabling efficient test writing, running, and reporting.
With Pytest, tests are organized in separate files with a specific naming convention (e.g., test_*.py), and assertions are used to verify code behavior.

Automated testing with Pytest provides speed, reliability, and consistency to ensure the software works as intended.

In [None]:
# Numpy - An Introduction:

NumPy (Numerical Python) is a fundamental library for numerical computing in Python.

It provides support for working with large multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays.

It is fast, memory-efficient, and widely used in data science, machine learning, scientific computing, and more.

In [None]:
# Benefits of Using NumPy Over Standard Python Lists:

Efficiency: NumPy arrays are more efficient for numerical operations than Python lists.
They store elements of the same type, allowing them to take less memory and enable faster operations.
Vectorization: NumPy allows element-wise operations without the need for explicit loops, leading to cleaner and faster code.
Broad Functionality: NumPy provides a wide array of mathematical functions for linear algebra, statistical operations, Fourier transforms, and more.
Better Performance: NumPy's core routines are implemented in C, so numerical computations run much faster than equivalent Python loops over lists.
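
To make the vectorization point concrete, here is a small sketch that applies the same multiplication with a plain Python list and with a NumPy array. The variable names and the factor 1.5 are arbitrary choices for illustration.

```python
import numpy as np

prices = [10.0, 20.0, 30.0]

# Plain Python: an explicit loop (here a list comprehension) over every element.
scaled_list = [p * 1.5 for p in prices]

# NumPy: one vectorized expression applied to the whole array.
scaled_arr = np.array(prices) * 1.5

print(scaled_list)
print(scaled_arr)
```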

In [None]:
# Why Use NumPy?

| Feature       | Benefit                                                |
| ------------- | ------------------------------------------------------ |
| ndarray       | Efficient multi-dimensional arrays                     |
| Broadcasting  | Perform operations on arrays of different shapes       |
| Vectorization | Faster code, no need for explicit loops                |
| Integration   | Works well with Pandas, Matplotlib, Scikit-learn, etc. |

In [None]:
# NumPy Arrays:

**NumPy arrays are the core data structure in NumPy and they are used to store collections of data**
**Arrays in NumPy are much more powerful than Python lists and are the foundation for most numerical operations in Python.**

# Advantages of NumPy Arrays Over Lists:

**Fixed Type: NumPy arrays are homogeneous, meaning all elements must be of the same type, which increases efficiency.**

**Multidimensional: NumPy arrays are natively multidimensional (e.g., 2D arrays) with a shape and axis-aware operations, while Python lists can only emulate this by nesting lists inside lists.**

**Performance: NumPy arrays offer better performance for numerical operations, thanks to their implementation in C.**

In [None]:
| Feature   | Python List                                | NumPy Array                                  |
| --------- | ------------------------------------------ | -------------------------------------------- |
| Execution | Loop over each element                     | Vectorized C-level execution                 |
| Memory    | More memory overhead (stores full objects) | Compact memory layout                        |
| Speed     | Slower (interpreted loop)                  | Much faster (uses compiled C under the hood) |
| Operation | Element-wise manually                      | Element-wise automatic (vectorized)          |
| Result    | Must be collected manually (e.g., in a new list) | Returned directly as a new array       |

In [None]:
#  Why NumPy is Faster ?

    Written in C: NumPy operations are executed in compiled C code, not Python.

    Vectorized Operations: Entire arrays are processed without Python for loops.

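    Contiguous Memory: NumPy arrays store their elements in one contiguous block of memory, which is more compact and cache-friendly than a list of pointers to separate Python objects.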

    SIMD Optimization: NumPy can take advantage of CPU-level parallelism. (Single Instruction, Multiple Data.)
    It is a parallel processing technique used by CPUs (and some GPUs), where one instruction operates on multiple data points at once.
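
The memory claim can be checked roughly as follows. This is only a sketch: `sys.getsizeof` gives an approximation for the list, and the exact numbers depend on the Python build.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# The array stores raw 8-byte integers in one contiguous buffer.
print("Array data bytes:", np_arr.nbytes)
print("C-contiguous:", np_arr.flags["C_CONTIGUOUS"])

# The list stores pointers; each int is a separate Python object on top of that.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
print("Approximate list bytes:", list_bytes)
```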

In [None]:
# Conclusion: Why NumPy is Best?

If you're doing large-scale numerical computations, NumPy is vastly better.
Native Python lists are good for general-purpose use, but not optimized for numerical performance.

Always use NumPy for:
    Array math
    Data preprocessing
    Scientific computing
    Machine learning input pipelines

In [None]:
# Example - 1: Creating NumPy Arrays:

import numpy as np

arr_1d = np.array([1, 2, 3, 4, 5])
print("1D Array:")
print(arr_1d)

arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print("\n2D Array:")
print(arr_2d)

arr_3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print("\n3D Array:")
print(arr_3d)

print("\n1D Array Attributes:")
print("Shape:", arr_1d.shape)  # Shape of the array (number of elements along each axis)
print("Size:", arr_1d.size)    # Total number of elements in the array
print("Data Type:", arr_1d.dtype)  # Data type of the elements in the array


print("\n2D Array Attributes:")
print("Shape:", arr_2d.shape)  # Shape: (rows, columns)
print("Size:", arr_2d.size)    # Total number of elements
print("Data Type:", arr_2d.dtype)  # Data type of elements


print("\n3D Array Attributes:")
print("Shape:", arr_3d.shape)  # Shape: (depth, rows, columns)
print("Size:", arr_3d.size)    # Total number of elements
print("Data Type:", arr_3d.dtype)  # Data type of elements

In [None]:
# Other NumPy Functions:

Besides np.array(),
    --> NumPy provides several other functions for creating arrays with specific properties.

1. np.asarray(): Converts any sequence-like object (e.g., lists, tuples) to a NumPy array.
2. np.ones(): Creates an array filled with ones of a specified shape.
3. np.zeros(): Creates an array filled with zeros of a specified shape.
4. np.empty(): Creates an uninitialized array of a specified shape. The values are arbitrary (whatever was already in memory), as they are not initialized.

These functions provide an easy way to create arrays for specific use cases without having to manually specify each element.

In [None]:
# np.array() - Creates an array from a list or other array-like objects.


import numpy as np

arr = np.array([1, 2, 3, 4])
print(type(arr))
print("np.array:", arr)
print("Data Type:", arr.dtype)

# 2. np.asarray() - Similar to np.array(), but it does not copy the input if it is already an ndarray of the requested dtype (it returns the same array).

arr_list = [1, 2, 3]
arr = np.asarray(arr_list)
print("np.asarray:", arr)

# 3. np.ones() - Creates an array of ones with a specified shape.

ones_arr = np.ones((2, 3))  # 2x3 array of ones -- 2 => rows and 3 => columns
print("np.ones():", ones_arr)

# 4. np.zeros() - Creates an array of zeros with a specified shape.

zeros_arr = np.zeros((3, 2))  # 3 x 2 array of zeros -- 3 => rows and 2 => columns
print("np.zeros():", zeros_arr)

# 5. np.empty() - Creates an uninitialized array. The values in the array are arbitrary and depend on the memory content at the time of creation.

empty_arr = np.empty((2, 2))  # 2 x 2 uninitialized array
print("np.empty():", empty_arr)

# np.arange() - Creates an array with a range of values, similar to Python’s range(). You can specify the start, stop, and step.

range_arr = np.arange(0, 10, 2)  # values from 0 up to (but not including) 10 with a step size of 2
print("np.arange():", range_arr)

# np.linspace() - Creates an array with evenly spaced numbers over a specified range. You specify the start and stop, and the number of points (not the step size).

linspace_arr = np.linspace(0,1,5)    # 5 evenly spaced numbers from 0 to 1
print("np.linspace():", linspace_arr)

# np.eye() - Creates an identity matrix (a square matrix with ones on the diagonal and zeros elsewhere).

identity_matrix = np.eye(3)  # 3x3 identity matrix
print("np.eye():", identity_matrix)

In [None]:
# Example - 1: Putting it All Together in One Code Example:

import numpy as np

# 1. np.array()
arr = np.array([1, 2, 3, 4])
print("np.array():\n", arr)

# 2. np.asarray()
arr_list = [1, 2, 3]
arr_from_list = np.asarray(arr_list)
print("np.asarray():\n", arr_from_list)

# 3. np.ones()
ones_arr = np.ones((2, 3))
print("np.ones():\n", ones_arr)

# 4. np.zeros()
zeros_arr = np.zeros((3, 2))
print("np.zeros():\n", zeros_arr)

# 5. np.empty()
empty_arr = np.empty((2, 2))
print("np.empty():\n", empty_arr)

# 6. np.arange()
range_arr = np.arange(0, 10, 2)
print("np.arange():\n", range_arr)

# 7. np.linspace()
linspace_arr = np.linspace(0, 1, 5)
print("np.linspace():\n", linspace_arr)

# 8. np.eye()
identity_matrix = np.eye(3)
print("np.eye():\n", identity_matrix)

In [None]:
# Example - 2: Putting it All Together Creating Array and Accessing their attributes:

import numpy as np

# 1D array
arr = np.array([1, 2, 3])
print("1D array:", arr)

# 2D array
arr_2d = np.array([[1, 2], [3, 4]])
print("2D array:\n", arr_2d)

# Array attributes
print("Shape:", arr_2d.shape)
print("Size:", arr_2d.size)
print("Data type:", arr_2d.dtype)

# Using other functions
arr_from_list = np.asarray([5, 6, 7])
print("Array from list:", arr_from_list)

ones_arr = np.ones((3, 3))
print("3x3 ones array:\n", ones_arr)

zeros_arr = np.zeros((2, 2))
print("2x2 zeros array:\n", zeros_arr)

empty_arr = np.empty((2, 2))
print("2x2 empty array (uninitialized):\n", empty_arr)

In [None]:
# Explanation of Functions:

# np.array(): Converts a list or array-like object into a NumPy array.
# np.asarray(): Similar to np.array() but returns the input itself, without copying, if it is already an array of the requested dtype.
# np.ones(): Creates an array filled with ones of a given shape.
# np.zeros(): Creates an array filled with zeros of a given shape.
# np.empty(): Creates an array without initializing its values (it may contain arbitrary values).
# np.arange(): Generates an array with values from a given start to stop, with a specified step size.
# np.linspace(): Generates an array with a specified number of evenly spaced values between a given start and stop.
# np.eye(): Creates an identity matrix of size n x n, where the diagonal elements are 1, and the rest are 0.

These functions cover many common array creation patterns in NumPy and are essential when working with numerical data.
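
The difference between np.array() and np.asarray() only shows up when the input is already an array. A minimal sketch, with variable names chosen for illustration:

```python
import numpy as np

original = np.array([1, 2, 3])

same = np.asarray(original)   # already an ndarray: no copy is made
copied = np.array(original)   # np.array copies by default

same[0] = 99                  # also changes `original`
copied[1] = -1                # does not affect `original`

print(same is original)
print(original)
print(copied)
```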

In [None]:
# Day-2_Part-2: Deep Dive into Numpy Module

Let's break this down into the following sections:
--> Basic Array Operations
--> Broadcasting
--> Element-wise Operations and Their Efficiency

In [None]:
# Basic Array Operations: element-wise arithmetic on two arrays:

import numpy as np

# Creating two arrays
arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([5, 6, 7, 8])

# Element-wise addition
sum_arr = arr1 + arr2
print("Addition:", sum_arr)

# Element-wise subtraction
diff_arr = arr1 - arr2
print("Subtraction:", diff_arr)

# Element-wise multiplication
prod_arr = arr1 * arr2
print("Multiplication:", prod_arr)

# Element-wise division
div_arr = arr1 / arr2
print("Division:", div_arr)

# Element-wise power (arr1 to the power of arr2)
pow_arr = arr1 ** arr2
print("Power:", pow_arr)

In [None]:
# Compute the power of nonzero, unique elements in a NumPy array:

import numpy as np

# Create a NumPy array
arr = np.array([0, 2, 3, 2, 4, 0, 3, 5])

# Get unique nonzero elements
unique_nonzero = np.unique(arr[arr != 0])

# Apply power function (example: square each element)
powered_array = np.power(unique_nonzero, 2)

print("Unique Nonzero Elements:", unique_nonzero)
print("Powered Array:", powered_array)

In [None]:
import numpy as np

# Create a NumPy array
arr = np.array([0, 2, 3, 0, 4, 0, 5])

# Get indices of nonzero elements
nonzero_indices = np.nonzero(arr)

# Extract nonzero elements using these indices
nonzero_elements = arr[nonzero_indices]

print("Indices of nonzero elements:", nonzero_indices)
print("Nonzero elements:", nonzero_elements)



import numpy as np

# Create a 1D NumPy array
arr = np.array([0, 7, 0, 2, 3, 0, 4, 0, 5])

# Get indices of nonzero elements
nonzero_indices = np.nonzero(arr)

# Extract nonzero elements
nonzero_elements = arr[nonzero_indices]

print("Original Array:", arr)
print("Indices of Nonzero Elements:", nonzero_indices)
print("Nonzero Elements:", nonzero_elements)

In [None]:
# Array Operations and Broadcasting in NumPy


import numpy as np

# Creating two arrays
arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([5, 6, 7, 8])

# Element-wise addition
sum_arr = arr1 + arr2
print("Addition:", sum_arr)

# Element-wise subtraction
diff_arr = arr1 - arr2
print("Subtraction:", diff_arr)

# Element-wise multiplication
prod_arr = arr1 * arr2
print("Multiplication:", prod_arr)

# Element-wise division
div_arr = arr1 / arr2
print("Division:", div_arr)

# Element-wise power (arr1 to the power of arr2)
pow_arr = arr1 ** arr2
print("Power:", pow_arr)

print(np.power(arr1,arr2))

print(np.unique(arr1))

print(np.unique(arr2))

In [None]:
# Broadcasting:

Broadcasting is a powerful feature in NumPy that allows operations on arrays of different shapes.
In cases where the shapes are not the same, NumPy will attempt to "broadcast" the smaller array across the larger one so that they have compatible shapes.

import numpy as np

# Creating a 2D array and a 1D array
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
arr_1d = np.array([1, 0, -1])

# Adding the 1D array to the 2D array
broadcasted_sum = arr_2d + arr_1d
print("Broadcasting Addition:\n", broadcasted_sum)
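
The broadcasting rule can be sketched on a slightly less obvious case: shapes are compared from the right, and a dimension of size 1 is stretched to match the other array. The arrays `col` and `row` below are illustrative.

```python
import numpy as np

col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3, 4])        # shape (4,)

# (3, 1) and (4,) are compatible and broadcast to (3, 4).
grid = col + row
print(grid.shape)
print(grid)

# Shapes (2, 3) and (2,) do not line up (3 vs 2), so NumPy raises ValueError.
try:
    np.ones((2, 3)) + np.array([1, 2])
except ValueError as exc:
    print("Cannot broadcast:", exc)
```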


# Creating two large arrays

arr1 = np.random.rand(1000000)  # Array of 1 million random values
arr2 = np.random.rand(1000000)

# Element-wise addition (vectorized; the shapes match, so no broadcasting is needed)
result = arr1 + arr2

print("Element-wise addition of large arrays complete.")
print(result)



# Other Operations: Matrix Multiplication:

You can use the @ operator or np.dot() to perform matrix multiplication (dot product) or inner products.

# 2D array multiplication (dot product)
arr_2d_1 = np.array([[1, 2], [3, 4]])
arr_2d_2 = np.array([[5, 6], [7, 8]])

# Matrix multiplication
matrix_product = np.dot(arr_2d_1, arr_2d_2)
print("Matrix Multiplication:\n", matrix_product)
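
For comparison, here is a short sketch of the `@` operator next to element-wise `*`, using the same two matrices. The short names `a` and `b` are used only for this example.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

matrix_product = a @ b     # same result as np.dot(a, b) for 2D arrays
elementwise = a * b        # multiplies matching positions only

print("a @ b:\n", matrix_product)
print("a * b:\n", elementwise)
```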


In [None]:
# Indexing and Slicing in NumPy Arrays
Indexing:

NumPy arrays are indexed similarly to Python lists, but with support for multidimensional arrays. You can use integers or slices to index elements.

Example: Indexing a 1D Array

import numpy as np

# Create a 1D array
arr = np.array([10, 20, 30, 40, 50])

# Accessing elements by index
print("First Element:", arr[0])  # First element
print("Last Element:", arr[-1])  # Last element


# Create a 2D array
arr_2d = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Accessing elements by row and column indices
print("Element at (0, 0):", arr_2d[0, 0])  # First row, first column
print("Element at (2, 2):", arr_2d[2, 2])  # Third row, third column


# Slicing:
Slicing allows you to access a subarray or range of elements. The syntax for slicing is [start:stop:step].
# Example: Slicing a 1D Array

# Create a 1D array
arr = np.array([10, 20, 30, 40, 50])

# Slicing a 1D array
arr_slice = arr[1:4]  # Elements from index 1 to 3 (4 is not included)
print("Sliced Array:", arr_slice)


# Create a 2D array
arr_2d = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Slicing a 2D array
sub_arr = arr_2d[1:, 1:]  # Subarray from rows 1 onward, columns 1 onward
print("Sliced 2D Array:\n", sub_arr)

# Rows starting from index 1 (i.e., the second row and beyond).
# Columns starting from index 1 (i.e., the second column and beyond).

# Row 1 (from the original array) is [4, 5, 6], and after slicing columns from index 1 onward, we get [5, 6].
# Row 2 (from the original array) is [7, 8, 9], and after slicing columns from index 1 onward, we get [8, 9]
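
The examples above use only start and stop. A brief sketch of the step part of [start:stop:step], plus selecting a whole column, might look like this:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])
arr_2d = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

every_other = arr[::2]       # start and stop omitted, step of 2
reversed_arr = arr[::-1]     # negative step walks backwards
second_col = arr_2d[:, 1]    # all rows, column index 1
first_two_rows = arr_2d[:2]  # rows 0 and 1, all columns

print(every_other, reversed_arr, second_col)
print(first_two_rows)
```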

In [None]:
# Boolean Indexing:
Boolean indexing allows you to select elements based on conditions. You create a boolean array (True/False values) and use it to index the original array.

Example: Boolean Indexing

# Create a 1D array
arr = np.array([10, 20, 30, 40, 50])

# Boolean condition: Select elements greater than 20
condition = arr > 20
print("Condition (arr > 20):", condition)

# Apply the condition to select elements
filtered_arr = arr[condition]
print("Filtered Array:", filtered_arr)


# Create a 2D array
arr_2d = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Boolean condition: Select elements greater than 5
filtered_2d = arr_2d[arr_2d > 5]
print("Filtered 2D Array:", filtered_2d)


# Fancy Indexing:
Fancy indexing allows you to index an array using other arrays or lists of indices. This allows you to select multiple non-contiguous elements at once.

# Example: Fancy Indexing

# Create a 1D array
arr = np.array([10, 20, 30, 40, 50])

# Select multiple elements using a list of indices
fancy_arr = arr[[0, 2, 4]]
print("Fancy Indexed Array:", fancy_arr)

import numpy as np

# Create a 2D array
arr_2d = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Select specific rows and columns using fancy indexing
fancy_2d = arr_2d[[0, 2], [1, 2]]  # Index lists are paired: selects elements (0, 1) and (2, 2), not a 2x2 grid
print("Fancy Indexed 2D Array:", fancy_2d)


# Selecting rows and specific columns (without pairing elements):
# Create a 2D array
arr_2d = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

arr_2d[np.ix_([0, 2], [1, 2])]  # np.ix_() is useful when you need to select a grid of values, not just a few paired elements.
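
Placing the two forms side by side makes the difference between paired indices and np.ix_() easier to see. This is a small sketch on the same 3x3 array:

```python
import numpy as np

arr_2d = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

paired = arr_2d[[0, 2], [1, 2]]          # elements (0, 1) and (2, 2)
grid = arr_2d[np.ix_([0, 2], [1, 2])]    # rows 0 and 2 crossed with columns 1 and 2

print("Paired:", paired)
print("Grid:\n", grid)
```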

In [None]:
# Creating a Copy of ndarray:
In NumPy, when you slice an array, you are creating a view, not a copy.
This means that changes made to the view will affect the original array. If you want a separate copy, you can use the copy() method.

# Example: Creating a Copy

# Create a 1D array
arr = np.array([10, 20, 30, 40, 50])

# Create a view by slicing
view_arr = arr[1:4]
print("View of Array:", view_arr)

# Modify the view
view_arr[0] = 100
print("Modified View:", view_arr)
print("Original Array after modifying view:", arr)

# Create a copy
copy_arr = arr.copy()
print("Copy of Array:", copy_arr)

# Modify the copy
copy_arr[0] = 200
print("Modified Copy:", copy_arr)
print("Original Array after modifying copy:", arr)
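
One way to check whether a result is a view or a copy is np.shares_memory(). The sketch below also shows that fancy indexing, unlike basic slicing, returns a copy:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

view_arr = arr[1:4]          # basic slice: a view
copy_arr = arr[1:4].copy()   # explicit copy
fancy_arr = arr[[1, 2, 3]]   # fancy indexing returns a copy

print(np.shares_memory(arr, view_arr))
print(np.shares_memory(arr, copy_arr))
print(np.shares_memory(arr, fancy_arr))
print(view_arr.base is arr)  # a view keeps a reference to the array it came from
```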

In [None]:
# NumPy provides a variety of functions for manipulating arrays,  which are essential for preparing, modifying, and splitting datasets.

  Reshaping and Flattening Arrays
  Concatenating Arrays
  Stacking and Splitting Arrays
  Adding and Removing Elements

In [None]:
# Reshaping and Flattening Arrays
Reshaping Arrays: The reshape() method allows you to change the shape of an existing array without modifying its data. The total number of elements in the reshaped array must match the total number of elements in the original array.

# Example: Reshaping an Array:
import numpy as np

# Create a 1D array
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9,10])

# Reshape to a 2x5 matrix
reshaped_arr = arr.reshape(2,5)
print("Reshaped Array:\n", reshaped_arr)  # rows * columns must equal the number of elements (2 * 5 = 10)


import numpy as np

# Create a 1D array
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Reshape to 3x3 matrix
reshaped_arr = arr.reshape(3,3)
print("Reshaped Array:\n", reshaped_arr)  # rows * columns must equal the number of elements (3 * 3 = 9)


In [None]:
# Flattening Arrays:
Flattening an array means converting it into a 1D array. The flatten() method returns a new 1D array, while ravel() does the same but returns a flattened view of the array (if possible).

# Example: Flattening an Array

# Flatten the reshaped array
flattened_arr = reshaped_arr.flatten()
print("Flattened Array:", flattened_arr)

# Using ravel to get a flattened view
raveled_arr = reshaped_arr.ravel()
print("Raveled Array:", raveled_arr)
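
The copy-versus-view difference between flatten() and ravel() can be observed by writing to each result. A minimal sketch, assuming a freshly created (contiguous) array named `matrix`:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])

flat_copy = matrix.flatten()   # always a copy
flat_view = matrix.ravel()     # a view here, because `matrix` is contiguous

flat_copy[0] = 100             # `matrix` is unchanged
flat_view[1] = 200             # `matrix[0, 1]` changes too

print(matrix)
```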




In [None]:
# Concatenating Arrays
Concatenating arrays means joining two or more arrays along an existing axis (row-wise or column-wise). The np.concatenate() function is used for this purpose. You can specify the axis along which the arrays should be concatenated.

Example: Concatenating Arrays

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Concatenate along the first axis (axis=0)
concatenated_arr = np.concatenate((arr1, arr2))
print("Concatenated Array:", concatenated_arr)


arr_2d_1 = np.array([[1, 2],
                     [3, 4]])
arr_2d_2 = np.array([[5, 6],
                     [7, 8]])

# Concatenate along axis 0 (row-wise)
concat_rows = np.concatenate((arr_2d_1, arr_2d_2), axis=0)
print("Row-wise Concatenation:\n", concat_rows)

# Concatenate along axis 1 (column-wise)
concat_cols = np.concatenate((arr_2d_1, arr_2d_2), axis=1)
print("Column-wise Concatenation:\n", concat_cols)

In [None]:
# Stacking and Splitting Arrays
Stacking Arrays: Stacking arrays refers to combining multiple arrays into one. You can use np.vstack() for vertical stacking (row-wise), np.hstack() for horizontal stacking (column-wise), or np.dstack() for stacking along the third axis (depth-wise); np.stack() joins arrays along a new axis.

# Example: Stacking Arrays


arr_2d_1 = np.array([[1, 2],
                     [3, 4]])
arr_2d_2 = np.array([[5, 6],
                     [7, 8]])

# Vertical stacking (row-wise)
vstacked_arr = np.vstack((arr_2d_1, arr_2d_2))
print("Vertical Stacking:\n", vstacked_arr)

# Horizontal stacking (column-wise)
hstacked_arr = np.hstack((arr_2d_1, arr_2d_2))
print("Horizontal Stacking:\n", hstacked_arr)

# Depth stacking (along the third axis)
dstacked_arr = np.dstack((arr_2d_1, arr_2d_2))
print("Depth Stacking:\n", dstacked_arr)
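
Comparing result shapes is a quick way to see what each stacking function does. The sketch below also includes np.stack(), which inserts a new axis:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

v_shape = np.vstack((a, b)).shape   # rows are appended
h_shape = np.hstack((a, b)).shape   # columns are appended
d_shape = np.dstack((a, b)).shape   # paired along a third axis
stacked = np.stack((a, b))          # new leading axis: stacked[0] is a, stacked[1] is b

print(v_shape, h_shape, d_shape, stacked.shape)
```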

In [None]:
# Splitting Arrays:
You can split arrays into multiple sub-arrays using functions like np.split(), np.hsplit(), and np.vsplit(), depending on the axis along which you want to split.

#Example: Splitting Arrays:

import numpy as np
arr = np.array([1, 2,3, 4, 5, 6, 7, 8, 9])

arr_2d_1 = np.array([[1, 2,7],
                     [3, 4,8],[5,6,9]])

# Split the array into 3 equal parts
split_arr = np.split(arr, 3)
print("Split Array:", split_arr)

# Vertical splitting (along rows)
vsplit_arr = np.vsplit(arr_2d_1, 3)
print("Vertical Split:\n", vsplit_arr)

# Horizontal splitting (along columns)
hsplit_arr = np.hsplit(arr_2d_1, 3)
print("Horizontal Split:\n", hsplit_arr)

# hsplit also works on a 1D array (vsplit requires at least 2 dimensions)
hsplit_arr1 = np.hsplit(arr, 3)
print("Horizontal Split of 1D Array:\n", hsplit_arr1)
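
np.split() with an integer requires sections of equal size. Two alternatives, sketched here, are splitting at explicit indices and using np.array_split() for uneven sections:

```python
import numpy as np

arr = np.arange(1, 10)   # [1 2 3 4 5 6 7 8 9]

# Split at explicit indices: arr[:2], arr[2:5], arr[5:]
parts = np.split(arr, [2, 5])
print(parts)

# np.array_split allows sections of unequal size.
uneven = np.array_split(np.arange(10), 3)
print(uneven)
```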

In [None]:
# Adding and Removing Elements
NumPy offers methods to add and remove elements from arrays, although adding/removing elements is generally more efficient with Python lists. However, these methods are still useful in many cases.

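Adding Elements: Use np.append() to add values at the end and np.insert() to add values at a given index; np.delete() removes elements. Each function returns a new array and leaves the original unchanged.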

Example: Adding Elements


arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Append values to an array
arr_appended = np.append(arr, [10, 11, 12])
print("Appended Array:", arr_appended)

# Insert value at a specific position
arr_inserted = np.insert(arr, 2, [10, 11])
print("Array after Insertion:", arr_inserted)

In [None]:
# Removing Elements:
The np.delete() method allows you to remove elements from an array, but it returns a new array since NumPy arrays have a fixed size.

Example: Removing Elements

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Remove an element by index
arr_deleted = np.delete(arr, 2)  # Remove element at index 2
print("Array after Deletion:", arr_deleted)
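
np.delete() also accepts a list of indices. When elements should be dropped by value rather than by position, a boolean mask is a common alternative. A short sketch:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

without_some = np.delete(arr, [0, 2, 4])   # drop indices 0, 2 and 4
evens_only = arr[arr % 2 == 0]             # keep values by condition instead of by index

print(without_some)
print(evens_only)
print(arr)   # the original array is unchanged
```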

In [None]:
# Multidimensional arrays - Example: 2D NumPy Array - Append Row and Column:
import numpy as np

# Original 2D array
arr = np.array([[1, 2], [3, 4]])
print("Original:\n", arr)

# Add Elements
# 1. Add a row:
# Add a new row [5,6] at the end
new_arr = np.append(arr, [[5, 6]], axis=0)
print("After adding row:\n", new_arr)

# Add a column:
# Add a new column [7, 8, 9]; a column needs one value per row, so start from an array with 3 rows
arr = np.array([[1, 2], [3, 4], [5, 6]])
new_arr = np.append(arr, [[7], [8], [9]], axis=1)
print("After adding column:\n", new_arr)

In [None]:
# Multidimensional arrays - Example: 2D NumPy Array - Delete Row and Column:
import numpy as np

# Original 2D array
arr = np.array([[1, 2], [3, 4]])
print("Original:\n", arr)

# 1. Remove a row:
# Remove row at index 1
new_arr = np.delete(arr, 1, axis=0)
print("After removing row 1:\n", new_arr)

# Remove a column:
# Remove column at index 0
new_arr = np.delete(arr, 0, axis=1)
print("After removing column 0:\n", new_arr)

In [None]:
# Reshaping a 3D NumPy Array:

# In NumPy, reshaping a 3D array is done using numpy.reshape(). The 3D array has the shape:

# (shape) = (depth, rows, columns)
#         = (z, y, x)

# Syntax: np.reshape(array, new_shape)
# Rule: Total number of elements must remain the same: z × y × x must equal the product of all dimensions in new_shape

# Example 1: Reshape from 3D → 2D or 1D

import numpy as np

# Original 3D array: shape (2, 2, 3)
arr = np.array([[[1, 2, 3], [4, 5, 6]],
                [[7, 8, 9], [10, 11, 12]]])
print("Original shape:", arr.shape)

# Convert to 2D:
arr_2d = arr.reshape(4, 3)
print("Reshaped to 2D:\n", arr_2d)

# Convert to 1D:
arr_1d = arr.reshape(-1)
print("Reshaped to 1D:\n", arr_1d)

# Example 2: Reshape to another 3D shape
# Total elements = 2*2*3 = 12
# Reshape to shape (3, 2, 2)

arr_new = arr.reshape(3, 2, 2)
print("Reshaped to (3,2,2):\n", arr_new)

# Using -1 (Auto-calculate Dimension)
# Let NumPy calculate one dimension automatically
arr_auto = arr.reshape(2, -1, 2)  # 2 blocks, auto rows, 2 columns
print("Auto reshape:\n", arr_auto)

# Error if shape mismatch:
# 12 elements cannot be reshaped to 2×2×4 = 16, so NumPy raises a ValueError
try:
    arr_wrong = arr.reshape(2, 2, 4)
except ValueError as err:
    print("Reshape failed:", err)

In [None]:
# Pandas Series: Day-3: Deep Dive - Introduction to pandas
▪  Series - a one-dimensional labeled array.
▪  Creating Series from lists, arrays, and dictionaries.
▪  Explore Series attributes and methods for data manipulation.
A Pandas Series is a one-dimensional labeled array that can hold any data type (e.g., integers, strings, floats, etc.). It is similar to a list or a column in a DataFrame, but with added functionality like labeling of elements via an index.

# Key Points:
One-dimensional: A Series is a 1D structure, like a list or an array.
Labeled: Each element in the Series has a label (index), which can be customized.
Homogeneous: The data in a Series are typically of the same type; a Series can hold mixed types, but it then falls back to the generic object dtype.

In [None]:
# 1. From a list:
import pandas as pd

data = [10, 20, 30, 40, 50]
series = pd.Series(data)
print(series)

In [None]:
# 2. From a NumPy array:

import numpy as np
import pandas as pd

np_array = np.array([5.5, 10.5, 15.5, 20.5])
series = pd.Series(np_array)
print(series)

In [None]:
# 3. From a dictionary (keys become the index, values become the data):

data_dict = {'a': 1, 'b': 2, 'c': 3}
series = pd.Series(data_dict)
print(series)

In [None]:
# Attributes of a Series:

    index: Returns the index (labels) of the Series.
    values: Returns the data of the Series as a NumPy array.
    dtype: Returns the data type of the elements in the Series.
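
A minimal sketch of these three attributes on a small labeled Series (the data and labels are arbitrary):

```python
import pandas as pd

series = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

print(series.index)    # the labels
print(series.values)   # the underlying NumPy array
print(series.dtype)    # the element type
```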

In [None]:
# Common Methods of Series:

head(): Returns the first n elements.
tail(): Returns the last n elements.
describe(): Generates descriptive statistics of the Series.
sum(): Returns the sum of the elements.
mean(): Returns the mean of the elements.
map(): Apply a function to each element.

In [None]:
series = pd.Series([10, 20, 30, 40, 50])

print(series.head(3))     # First 3 elements
print(series.tail(2))     # Last 2 elements
print(series.describe())  # Descriptive stats
print(series.sum())       # Sum of elements
print(series.mean())      # Mean of elements
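
The map() method from the list above is not shown in that example. A brief sketch of it, using illustrative lambda functions, could be:

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50])

# map() applies the function to each element and returns a new Series.
doubled = series.map(lambda x: x * 2)
labels = series.map(lambda x: "high" if x >= 30 else "low")

print(doubled)
print(labels)
```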

In [None]:
Indexing and Slicing:
You can access elements by their index:

series = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])
print(series['c'])
# Output: 30
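
Slicing works on a Series as well. One detail worth a sketch: a label-based slice with .loc includes the end label, while a position-based slice with .iloc excludes the end position.

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

by_label = series.loc['b':'d']    # label slice: includes 'd'
by_position = series.iloc[1:3]    # position slice: excludes position 3

print(by_label)
print(by_position)
```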

In [None]:
# Example of Series Creation and Basic Operations:

import pandas as pd

# Creating a Series from a list
data = [100, 200, 300, 400, 500]
index = ['a', 'b', 'c', 'd', 'e']
series = pd.Series(data, index=index)

print("Series:")
print(series)

# Basic operations
print("\nSum of elements:", series.sum())
print("Mean of elements:", series.mean())
print("First 3 elements:")
print(series.head(3))

In [None]:
# Conclusion:

    Pandas Series is a powerful and flexible structure for handling one-dimensional data,
    with an intuitive interface for data manipulation, indexing, and statistical operations.

In [None]:
# Pandas DataFrame:

    o  Pandas DataFrames are two-dimensional labeled data structures.
    o  Creating DataFrames from dictionaries, lists, and external files.
    o  Explore DataFrame attributes, indexing, and basic operations.
    o  Read/write operations with CSV and Excel files.

A Pandas DataFrame is a two-dimensional labeled data structure with columns that can be of different data types.
It is one of the most commonly used structures in Pandas and is very similar to a table in a database, an Excel spreadsheet, or a data frame in R.

# Key Points:

    Two-dimensional: A DataFrame is essentially a table with rows and columns.
    Labeled axes: Rows and columns are both labeled (via index for rows and column names for columns).
    Heterogeneous data: Different columns can hold different data types (e.g., integers, strings, floats, etc.).

In [None]:
# Common Ways to Create a DataFrame:

# 1. From a Dictionary of Lists or Arrays

Each key-value pair in the dictionary becomes a column in the DataFrame,
where the key is the column name and the value is the data in that column.

import pandas as pd
data = {
    'Name': ['DK', 'BK', 'CK', 'AK'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}

df = pd.DataFrame(data)
print(df)

In [None]:
# 2. From a List of Lists (or Nested Lists)
Each sublist is treated as a row in the DataFrame, and you can specify column names.

data = [
    ['AK', 25, 'New York'],
    ['BK', 30, 'Los Angeles'],
    ['CK', 35, 'Chicago'],
    ['DK', 40, 'Houston']
]
df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])
print(df)

In [None]:
# 3. From External Files (CSV, Excel, etc.)
You can read data into a DataFrame from external data sources such as CSV files or Excel files.

# Reading from a CSV file:
import pandas as pd

# Reading a CSV file into a DataFrame
df = pd.read_csv(r'C:\Users\dines\Python_DF_Files\Emp_Data.csv')
print(df)

# Reading from an Excel file:
# Reading an Excel file into a DataFrame
df = pd.read_excel(r'C:\Users\dines\Python_DF_Files\Emp_Data.xlsx', sheet_name='Emp_Data')
print(df)
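
The paths above exist only on the author's machine. To try `read_csv` without any file on disk, the sketch below reads from in-memory text instead; the CSV content is a made-up stand-in for a file such as Emp_Data.csv.

```python
import io

import pandas as pd

# Hypothetical CSV content; the real Emp_Data.csv may have different columns.
csv_text = """Name,Age,City
AK,25,New York
BK,30,Los Angeles
"""

# read_csv accepts a file path or any file-like object.
emp = pd.read_csv(io.StringIO(csv_text))
print(emp)
print(emp.dtypes)
```

The header row becomes the column labels, and numeric columns are parsed as numbers automatically.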

In [None]:
# 4. From a NumPy Array (with Column Labels)
You can also create a DataFrame from a NumPy array, specifying column names.

import numpy as np

data = np.array([[1, 'AK', 25], [2, 'DK', 30], [3, 'CK', 35]])
df = pd.DataFrame(data, columns=['ID', 'Name', 'Age'])
print(df)
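
One caveat with the example above: a NumPy array holds a single dtype, so mixing integers and strings stores every value as a string. The sketch below shows this and one possible way to restore numeric columns with `astype`; the variable names `mixed` and `frame` are illustrative.

```python
import numpy as np
import pandas as pd

mixed = np.array([[1, 'AK', 25], [2, 'DK', 30], [3, 'CK', 35]])
print(mixed.dtype)  # a string dtype, not an integer one

frame = pd.DataFrame(mixed, columns=['ID', 'Name', 'Age'])

# Convert the columns that should be numeric back to integers.
frame = frame.astype({'ID': int, 'Age': int})
print(frame.dtypes)
print(frame['Age'].sum())
```

Without the conversion, arithmetic on the 'Age' column would operate on strings rather than numbers.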


In [None]:
# DataFrame Attributes:
  df.shape: Returns a tuple representing the dimensionality (number of rows, number of columns).
  df.columns: Returns the column labels.
  df.index: Returns the row labels (index).
  df.dtypes: Returns the data types of each column.
  df.values: Returns the data in the DataFrame as a NumPy array.

In [None]:
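# Rebuild the four-row DataFrame from example 2, because df was last
# assigned from the NumPy array (3 rows; columns ID, Name, Age).
df = pd.DataFrame(
    [['AK', 25, 'New York'], ['BK', 30, 'Los Angeles'],
     ['CK', 35, 'Chicago'], ['DK', 40, 'Houston']],
    columns=['Name', 'Age', 'City'])

print(df.shape)       # Output: (4, 3)
print(df.columns)     # Output: Index(['Name', 'Age', 'City'], dtype='object')
print(df.index)       # Output: RangeIndex(start=0, stop=4, step=1)
print(df.dtypes)      # Output: Name     object
                      #          Age      int64
                      #          City     object
                      #          dtype: object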


In [None]:
# Basic Operations on DataFrame
# Accessing Columns: You can access a column in a DataFrame by using its name

# .loc[] — Label-based Selection
# Use .loc[] when you're selecting rows/columns by names (labels).
# Example:

import pandas as pd

df = pd.DataFrame({
    'Name': ['AK', 'BK', 'CK'],
    'Age': [25, 30, 35]
}, index=['a', 'b', 'c'])

# Get row with index label 'b'
print(df.loc['b'])

# Get rows 'a' through 'b' (inclusive)
print(df.loc['a':'b'])

# Get Name column of rows 'a' and 'c'
print(df.loc[['a', 'c'], 'Name'])




# .iloc[] — Integer Position-based Selection
# Use .iloc[] when you're selecting rows/columns by numeric positions (like lists).
# Example:

# Get the 2nd row (index 1)
print(df.iloc[1])

# Get the first two rows
print(df.iloc[0:2])  # end excluded

# Get the value at a given row position and column position 1 (the 'Age' column)
print("Value at row 0, column 1:", df.iloc[0, 1])
print("Value at row 1, column 1:", df.iloc[1, 1])
print("Value at row 2, column 1:", df.iloc[2, 1])
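
The cell opens by mentioning column access by name but only demonstrates `.loc[]` and `.iloc[]`. The sketch below puts the three side by side on the same kind of data; `df_demo` is an illustrative name.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Name': ['AK', 'BK', 'CK'],
    'Age': [25, 30, 35]
}, index=['a', 'b', 'c'])

# A single column name returns a Series.
ages = df_demo['Age']

# A list of column names returns a DataFrame.
subset = df_demo[['Name', 'Age']]

# The same cell reached by label and by position.
by_label = df_demo.loc['b', 'Age']
by_position = df_demo.iloc[1, 1]

print(ages)
print(subset)
print(by_label, by_position)
```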



In [None]:
# Filtering: You can filter rows based on conditions.

import numpy as np
import pandas as pd

# Define the data
data = np.array([[1, 'AK', '25'], [2, 'BK', '30'], [3, 'CK', '35']])

# Create the DataFrame
df = pd.DataFrame(data, columns=['ID', 'Name', 'Age'])

# Convert the 'Age' column from strings to integers so numeric comparisons work
df['Age'] = df['Age'].astype(int)

# Print the original DataFrame
print(df)

# Filtering rows where Age > 30
filtered_df = df[df['Age'] > 30]
print("\nFiltered DataFrame (Age > 30):")
print(filtered_df)  # Only the rows where the condition is True are kept
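
Filters can also combine several conditions. A minimal sketch, assuming a small made-up `staff` table: conditions are joined with `&` (and) or `|` (or), and each condition needs its own parentheses.

```python
import pandas as pd

# Hypothetical sample data for illustration.
staff = pd.DataFrame({
    'Name': ['AK', 'BK', 'CK', 'DK'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']})

# Both conditions must hold: older than 25 AND located in one of two cities.
older_in_selected_cities = staff[(staff['Age'] > 25) &
                                 (staff['City'].isin(['Chicago', 'Houston']))]

# Either condition may hold: younger than 30 OR at least 40.
youngest_or_oldest = staff[(staff['Age'] < 30) | (staff['Age'] >= 40)]

print(older_in_selected_cities)
print(youngest_or_oldest)
```

Python's `and` and `or` keywords do not work element-wise on a Series, which is why the bitwise operators are used here.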

In [None]:
# Basic Aggregation: You can apply operations such as sum(), mean(), min(), and max() to columns.

print(df['Age'].mean())  # Mean of the Age column
print(df['Age'].sum())   # Sum of the Age column
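
The cell above shows `mean()` and `sum()`. The sketch below adds `min()` and `max()`, and uses `agg()` to compute several statistics in one call; the `ages` Series is an illustrative stand-in for the 'Age' column.

```python
import pandas as pd

ages = pd.Series([25, 30, 35], name='Age')

print(ages.min())  # smallest value
print(ages.max())  # largest value

# agg() with a list of function names returns one result per name.
summary = ages.agg(['min', 'max', 'mean', 'sum'])
print(summary)
```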

In [None]:
# Dropping Columns: To remove a column from a DataFrame, you can use drop().

df = df.drop(columns=['Age'])
print(df)
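
Note that `drop()` returns a new DataFrame and leaves the original untouched unless the result is assigned back, as the cell above does. A minimal sketch with an illustrative `df_demo`, which also shows dropping a row by index label:

```python
import pandas as pd

df_demo = pd.DataFrame({
    'ID': [1, 2, 3],
    'Name': ['AK', 'BK', 'CK'],
    'Age': [25, 30, 35]})

# Dropping a column returns a copy; df_demo still has 'Age'.
without_age = df_demo.drop(columns=['Age'])

# Dropping a row uses the index label instead.
without_first_row = df_demo.drop(index=[0])

print(without_age)
print(without_first_row)
print(list(df_demo.columns))
```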

In [None]:
# Reading and Writing DataFrames to External Files:

df.to_csv('output.csv', index=False)  # index=False prevents writing the index to the file
df.to_excel('output.xlsx', index=False)  # writing .xlsx needs an Excel engine such as openpyxl installed
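
To check that a written file reads back as expected, a round trip through a temporary directory is one option. This is a sketch under assumptions: the `original` data is made up, and `tempfile` plus `pathlib` are used only so the example does not depend on any particular folder existing.

```python
import tempfile
from pathlib import Path

import pandas as pd

original = pd.DataFrame({'Name': ['AK', 'BK'], 'Age': [25, 30]})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / 'output.csv'
    original.to_csv(csv_path, index=False)  # index=False: no extra index column
    restored = pd.read_csv(csv_path)        # read while the file still exists

print(restored)
```

Leaving out `index=False` would write the row index as an extra unnamed column, which then shows up when the file is read back.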

# Example Code: Working with DataFrames

import pandas as pd

# Create a DataFrame from a dictionary
data = {
    'Name': ['AK', 'BK', 'CK', 'DK'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}

df = pd.DataFrame(data)

# Display the DataFrame
print("Original DataFrame:")
print(df)

# Accessing specific columns
print("\nAge Column:")
print(df['Age'])

# Filtering rows where Age > 30
filtered_df = df[df['Age'] > 30]
print("\nFiltered DataFrame (Age > 30):")
print(filtered_df)

# Adding a new column for Salary
df['Salary'] = [50000, 60000, 70000, 80000]
print("\nDataFrame with Salary Column:")
print(df)

# Writing DataFrame to CSV
df.to_csv(r'C:\Users\dines\Python_DF_Files\LTIMB7\output_dk1.csv', index=False)
# df.to_csv(r'D:\07_LTIM_Batches_ 2023-24\output_dk1.csv', index=False)