# I/O Handling with NumPy
- I/O handling is the process of moving data into and out of a program.
- This includes reading data from sources such as files or databases and writing data to storage such as local disks or cloud storage.
- I/O handling is a crucial part of programming because it lets programs exchange data with the outside world.

NumPy provides several functions for I/O handling:
1. `numpy.loadtxt`: Loads data from a text file (including CSV, with `delimiter=','`) into a NumPy array. Every row must have the same number of values.
2. `numpy.genfromtxt`: Like `numpy.loadtxt`, but can also handle missing values, replacing them with a fill value such as `nan`.
3. `numpy.savetxt`: Saves a 1-D or 2-D NumPy array to a text file or a CSV file.
4. `numpy.save`: Saves a NumPy array to a binary file in `.npy` format. The data can be loaded later with `numpy.load`.
5. `numpy.load`: Loads data from a binary file created with `numpy.save`.

## 1. numpy.loadtxt

In [1]:
# Create a sample data.txt file
data = """1.0 2.0 3.0
4.0 5.0 6.0
7.0 8.0 9.0"""

with open('data.txt', 'w') as file:
    file.write(data)

print("demo data.txt file created.")

demo data.txt file created.


In [2]:
import numpy as np

# data.txt (created in the previous cell) contains:
# 1.0 2.0 3.0
# 4.0 5.0 6.0
# 7.0 8.0 9.0

# Load data from the text file
data = np.loadtxt('data.txt')

print(data)

[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]
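
For comma-separated files with a header row, `numpy.loadtxt` needs a few extra arguments. The following is a minimal sketch, assuming a hypothetical file `data.csv` with made-up column names `x`, `y`, `z`; it is written to a temporary directory so nothing is left behind.

```python
import os
import tempfile

import numpy as np

# Hypothetical CSV content with a header row (illustrative only)
csv_text = "x,y,z\n1.0,2.0,3.0\n4.0,5.0,6.0\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "data.csv")
    with open(csv_path, "w") as f:
        f.write(csv_text)

    # delimiter=',' splits on commas; skiprows=1 skips the header line
    csv_data = np.loadtxt(csv_path, delimiter=",", skiprows=1)

    # usecols keeps only the first and third columns
    xz_only = np.loadtxt(csv_path, delimiter=",", skiprows=1, usecols=(0, 2))

print(csv_data)
print(xz_only)
```

Without `delimiter=','`, `loadtxt` splits on whitespace and would fail to parse the comma-separated values.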


## 2. numpy.genfromtxt

In [13]:
import numpy as np

# Create a CSV file with missing values
data = """1.0, 2.0, 3.0
4.0, , 6.0
7.0, 8.0, 9.0"""

with open('data_with_missing.csv', 'w') as file:
    file.write(data)

print("demo data_with_missing.csv file created.")

demo data_with_missing.csv file created.


In [14]:
# Load data from the CSV file, filling missing values with np.nan
data_loaded = np.genfromtxt('data_with_missing.csv', delimiter=',', filling_values=np.nan)

print(data_loaded)

[[ 1.  2.  3.]
 [ 4. nan  6.]
 [ 7.  8.  9.]]
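
Missing fields do not have to become `nan`. As a rough sketch, the example below reads the same kind of data from an in-memory `io.StringIO` object (an assumption made here to avoid writing a file) and substitutes a sentinel value of `-1.0`, which is an arbitrary choice for illustration.

```python
import io

import numpy as np

# Same layout as the CSV above, with an empty field in the second row
csv_text = "1.0,2.0,3.0\n4.0,,6.0\n7.0,8.0,9.0\n"

# Default behaviour for float data: the empty field becomes nan
with_nan = np.genfromtxt(io.StringIO(csv_text), delimiter=",")

# filling_values replaces the empty field with a chosen sentinel instead
with_fill = np.genfromtxt(io.StringIO(csv_text), delimiter=",", filling_values=-1.0)

# Count how many entries were missing in the nan version
n_missing = int(np.isnan(with_nan).sum())

print(with_fill)
print("Missing entries:", n_missing)
```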


## 3. numpy.savetxt

In [15]:
import numpy as np

# Sample data
data = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

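# Save the data to a text file "output.txt" with one decimal place per value
# (the default format, '%.18e', would write scientific notation instead)
np.savetxt('output.txt', data, fmt='%.1f')

# This will create a text file with the content:
# 1.0 2.0 3.0
# 4.0 5.0 6.0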
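
To write a CSV file rather than space-separated text, `numpy.savetxt` accepts `delimiter`, `fmt`, and `header` arguments. The sketch below assumes hypothetical column names `a`, `b`, `c` and a temporary output file, then reads the file back to confirm the round trip.

```python
import os
import tempfile

import numpy as np

data = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "output.csv")

    # fmt controls the number format; comments="" stops savetxt from
    # prefixing the header line with "# "
    np.savetxt(out_path, data, fmt="%.2f", delimiter=",", header="a,b,c", comments="")

    with open(out_path) as f:
        file_lines = f.read().splitlines()

    # Read the values back, skipping the header row
    restored = np.loadtxt(out_path, delimiter=",", skiprows=1)

print(file_lines)
print(restored)
```

Text formats round numbers to whatever `fmt` specifies, so values with more decimal places than the format allows are not restored exactly.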

## 4. numpy.save

In [16]:
import numpy as np

# Sample data
data = np.array([1.0, 2.0, 3.0, 4.0])

# Save the data to a binary file "output.npy"
np.save('output.npy', data)

## 5. numpy.load

In [17]:
import numpy as np

# Load data from the binary file "output.npy"
data = np.load('output.npy')

print(data)

[1. 2. 3. 4.]
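
Unlike text files, the `.npy` format stores the array's dtype and shape along with the values. The following sketch illustrates this with a small `int16` matrix saved to a temporary file; the file name `matrix.npy` is just a placeholder.

```python
import os
import tempfile

import numpy as np

# A 2x3 matrix with a non-default integer dtype
original = np.arange(6, dtype=np.int16).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp_dir:
    npy_path = os.path.join(tmp_dir, "matrix.npy")
    np.save(npy_path, original)
    restored = np.load(npy_path)

print(restored)
print("dtype:", restored.dtype, "shape:", restored.shape)
```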


# Masking in NumPy
- Masking in NumPy is a powerful technique for filtering or manipulating specific elements of an array based on a condition. This is done by creating a Boolean mask, which is an array of the same shape as the original array. Each element of the mask is either True or False, depending on whether the condition is met.

-> The Boolean mask is then used to index the original array, allowing you to:
- Extract elements that satisfy a condition.
- Modify elements that satisfy a condition.

-> How Masking Works:
- Create a Mask: A mask is a Boolean array where each element is determined by applying a condition to the original array.
- Apply the Mask: You can use the mask to select, modify, or filter elements from the original array.
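
Masks can also be combined before they are applied. A brief sketch, using the element-wise operators `&` (and), `|` (or), and `~` (not); each comparison needs its own parentheses because these operators bind more tightly than `>` and `<`.

```python
import numpy as np

values = np.array([1, 3, 5, 7, 9, 2, 4, 6])

# Elements strictly between 2 and 8
in_range = (values > 2) & (values < 8)

# Elements below 2 or above 8
extremes = (values < 2) | (values > 8)

# Invert a mask to select everything it excludes
outside_range = ~in_range

print(values[in_range])
print(values[extremes])
print(values[outside_range])
```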

## Extracting Elements Based on a Condition

In [18]:
import numpy as np

# Create a NumPy array
arr = np.array([1, 3, 5, 7, 9, 2, 4, 6])

# Create a Boolean mask where values are greater than 5
mask = arr > 5

# Apply the mask to the array (select elements greater than 5)
filtered_elements = arr[mask]

print("Original Array:", arr)
print("Mask:", mask)
print("Filtered Elements (greater than 5):", filtered_elements)

Original Array: [1 3 5 7 9 2 4 6]
Mask: [False False False  True  True False False  True]
Filtered Elements (greater than 5): [7 9 6]


## Modifying Elements Using a Mask

In [21]:
import numpy as np

# Create a NumPy array
arr = np.array([1, 3, 5, 7, 9, 2, 4, 6])

# Create a Boolean mask where values are greater than 5
mask = arr > 5

# Modify the elements greater than 5 to be 10
arr[mask] = 10

print("Modified Array:", arr)

Modified Array: [ 1  3  5 10 10  2  4 10]
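
Assigning through a mask changes the array in place. When the original values should be kept, `np.where` is one alternative: it builds a new array, taking the replacement where the condition holds and the original value elsewhere. A minimal sketch:

```python
import numpy as np

arr = np.array([1, 3, 5, 7, 9, 2, 4, 6])

# New array: 10 where arr > 5, otherwise the original element
capped = np.where(arr > 5, 10, arr)

print("Original:", arr)
print("Capped:  ", capped)
```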


## Using Masking to Handle Missing Data

In [22]:
import numpy as np

# Create an array with NaN values
arr = np.array([1.0, 2.0, np.nan, 4.0, 5.0])

# Create a mask for NaN values
mask = np.isnan(arr)

# Replace NaN values with 0
arr[mask] = 0

print("Array after replacing NaN with 0:", arr)

Array after replacing NaN with 0: [1. 2. 0. 4. 5.]
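
Replacing NaN with 0 changes statistics such as the mean. As a sketch of an alternative, the inverted mask can be used to skip missing values instead; `np.nanmean` does the same thing in a single call. The variable name `readings` is illustrative.

```python
import numpy as np

readings = np.array([1.0, 2.0, np.nan, 4.0, 5.0])

# True where a value is present
valid_mask = ~np.isnan(readings)

# Mean over the valid entries only
mean_of_valid = readings[valid_mask].mean()

# Equivalent built-in that ignores NaN
nan_mean = np.nanmean(readings)

print("Valid values:", readings[valid_mask])
print("Mean ignoring NaN:", mean_of_valid, nan_mean)
```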


# Structured Array

- Key Points:
1. Heterogeneous Data: A structured array allows each field (column) to have a different data type (e.g., integers, strings, floats).
2. Named Fields: Each column in the array has a name, making it easy to access specific data.
3. Custom Data Types: You can define custom data types for fields (e.g., using `datetime64` or nested structured types).
4. Efficient Storage: Structured arrays are efficient in terms of both memory and data access, as they store data in a compact, organized way: each record occupies a fixed number of bytes in one contiguous buffer.

## Creating a Structured Array

In [26]:
import numpy as np

# Define the data type (dtype) for the structured array
dtype = [('name', 'U10'),  # 'U10' means Unicode string of up to 10 characters
         ('age', 'i4'),     # 'i4' means 4-byte integer
         ('height', 'f4')]  # 'f4' means 4-byte float

# Create the structured array
structured_array = np.array([('Alice', 25, 5.6), 
                             ('Bob', 30, 5.8)], dtype=dtype)

print(structured_array)

[('Alice', 25, 5.6) ('Bob', 30, 5.8)]


## Accessing Data by Field Name
You can access individual fields by their names:

In [27]:
# Access the 'age' field
ages = structured_array['age']
print(ages)  # Output: [25 30]

[25 30]
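
Field access combines naturally with indexing, masking, and sorting. The sketch below rebuilds the array with one extra, made-up record (`Carol`) so that filtering and sorting have something to show; the variable names are illustrative.

```python
import numpy as np

people_dtype = [("name", "U10"), ("age", "i4"), ("height", "f4")]
people = np.array(
    [("Alice", 25, 5.6), ("Bob", 30, 5.8), ("Carol", 22, 5.4)],  # Carol is hypothetical
    dtype=people_dtype,
)

# Field names and bytes per record (40 for 'U10' + 4 + 4)
print(people.dtype.names, people.itemsize)

# A single record, then one field of that record
first = people[0]
print(first["name"])

# Boolean mask built from one field selects whole records
older = people[people["age"] > 24]
print(older["name"])

# Sort records by a named field
by_age = np.sort(people, order="age")
names_by_age = by_age["name"].tolist()
print(names_by_age)
```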


## Creating Nested Structured Arrays
You can also have nested structures, where one field itself has a structured data type:

In [28]:
# Nested structured array
dtype_nested = [('name', 'U10'), 
                ('address', [('street', 'U10'), ('city', 'U10')])]

nested_array = np.array([('Alice', ('Main St', 'New York'))], dtype=dtype_nested)

print(nested_array)

[('Alice', ('Main St', 'New York'))]
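
Nested fields are reached by chaining field names. A short sketch, assuming the same dtype as above; the replacement city `Boston` is an arbitrary value used to show that field access returns a view that can be written through.

```python
import numpy as np

dtype_nested = [("name", "U10"),
                ("address", [("street", "U10"), ("city", "U10")])]
nested_array = np.array([("Alice", ("Main St", "New York"))], dtype=dtype_nested)

# Chain field names to reach a nested column (copied so it stays unchanged below)
cities = nested_array["address"]["city"].copy()
print(cities)

# Index a record first, then walk down its fields
first_street = nested_array[0]["address"]["street"]
print(first_street)

# Field access returns a view, so assignment updates the original array
nested_array["address"]["city"][0] = "Boston"
print(nested_array)
```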
