# **Chapter #6 NumPy Data Processing & Array I/O**

#### Reading a file from Google Drive

### **Mount Google Drive:**

In [None]:
from google.colab import drive
drive.mount("/content/drive", force_remount=True)

Mounted at /content/drive


In [None]:
!ls -ltr  "/content/drive/My Drive/Numpy_for_Numerical_Computing_&_Data_Analysis/data"

total 8
drwx------ 2 root root 4096 Jan 22 15:14 input
drwx------ 2 root root 4096 Jan 22 15:14 output


### **Reading the file from Google Drive**

In [None]:
# mounting the drive

from google.colab import drive
drive.mount("/content/drive", force_remount=True)

# reading the file

file_path = '/content/drive/My Drive/Numpy_for_Numerical_Computing_&_Data_Analysis/data/input/sample.txt'
with open(file_path, 'r') as file:
  file_content = file.read()

print(file_content)

Mounted at /content/drive
Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation.[31]

Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly procedural), object-oriented and functional programming. It is often described as a "batteries included" language due to its comprehensive standard library.[32][33]

Guido van Rossum began working on Python in the late 1980s as a successor to the ABC programming language and first released it in 1991 as Python 0.9.0.[34] Python 2.0 was released in 2000. Python 3.0, released in 2008, was a major revision not completely backward-compatible with earlier versions. Python 2.7.18, released in 2020, was the last release of Python 2.[35]


### **Unmount the drive**

In [None]:
# unmount the drive
# drive.flush_and_unmount()

In [None]:
# mounting the drive
from google.colab import drive
drive.mount("/content/drive", force_remount=True)

Mounted at /content/drive


In [None]:
# Case 1: Basic usage of np.loadtxt
import numpy as np
input_file_path = '/content/drive/My Drive/Numpy_for_Numerical_Computing_&_Data_Analysis/data/input/sample1.txt'

# Reading the data; np.loadtxt uses float as the default dtype when none is given
data = np.loadtxt(input_file_path, delimiter=',')
print("Loaded Data:", data)
print("Shape:", data.shape)
print("Data Type:", data.dtype)

Loaded Data: [[10. 20. 30.]
 [40. 50. 60.]
 [70. 80. 90.]]
Shape: (3, 3)
Data Type: float64


In [None]:
# Case 2: Specifying the data type with np.loadtxt
import numpy as np
input_file_path = '/content/drive/My Drive/Numpy_for_Numerical_Computing_&_Data_Analysis/data/input/sample1.txt'

# Reading the data with specified data type
data = np.loadtxt(input_file_path, delimiter=',',dtype=int)
print("Loaded Data:", data)
print("Shape:", data.shape)
print("Data Type:", data.dtype)

Loaded Data: [[10 20 30]
 [40 50 60]
 [70 80 90]]
Shape: (3, 3)
Data Type: int64
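
Cases 1 and 2 can be reproduced without Google Drive by reading from an in-memory buffer. The sketch below assumes the contents of `sample1.txt` match the output shown above; the `text` variable is a hypothetical stand-in for the file.

```python
from io import StringIO

import numpy as np

# Hypothetical in-memory stand-in for sample1.txt (contents assumed from the output above)
text = "10,20,30\n40,50,60\n70,80,90\n"

# Without a dtype argument, np.loadtxt parses every field as float
as_float = np.loadtxt(StringIO(text), delimiter=",")

# With dtype=int, the same fields are parsed as integers
as_int = np.loadtxt(StringIO(text), delimiter=",", dtype=int)

print("Default dtype:", as_float.dtype)
print("Requested dtype:", as_int.dtype)
```

`StringIO` behaves like an open text file, so the same call works on a real path once the drive is mounted.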


In [None]:
# Case 3: Handling missing values with a custom converter
import numpy as np
import re

input_file_path = '/content/drive/My Drive/Numpy_for_Numerical_Computing_&_Data_Analysis/data/input/sample2.txt'

# Define a custom converter function
def custom_converter(value):
    # Older NumPy versions pass bytes to converters; newer versions pass str
    if isinstance(value, bytes):
        value = value.decode()
    cleaned_value = re.sub(r'[^\d.Ee+-]', '', value)  # Strip characters that cannot appear in a number
    if cleaned_value == '':  # 'NA' and empty fields are empty after cleaning
        return np.nan
    return float(cleaned_value)

# Read data from the file, applying the custom converter to the third column
data = np.loadtxt(input_file_path, delimiter=',', converters={2: custom_converter})

print("Loaded Data with Handling of 'NA' Values:")
print(data)
print("Shape:", data.shape)
print("Data Type:", data.dtype)

Loaded Data with Handling of 'NA' Values:
[[10. 20. 30.]
 [40. 50. nan]
 [70. 80. 90.]]
Shape: (3, 3)
Data Type: float64
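
The converter approach can also be tried on an in-memory buffer. This is a minimal sketch, assuming a file shaped like the output above with one `NA` entry; `na_to_nan` and `text` are hypothetical names introduced for illustration.

```python
from io import StringIO

import numpy as np

# Hypothetical stand-in for sample2.txt: the third column contains one 'NA' marker
text = "10,20,30\n40,50,NA\n70,80,90\n"

def na_to_nan(value):
    """Map 'NA' or an empty field to NaN; parse anything else as float."""
    # Depending on the NumPy version, converters may receive bytes or str
    if isinstance(value, bytes):
        value = value.decode()
    value = value.strip()
    return np.nan if value in ("NA", "") else float(value)

# Apply the converter to column index 2 only
data = np.loadtxt(StringIO(text), delimiter=",", converters={2: na_to_nan})
print(data)
```

Because NaN is a floating-point value, the resulting array must keep a float dtype.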


In [None]:
# Case 4: Skipping the header row and reading selected columns with converters

# Define the input file path
input_file_path = '/content/drive/My Drive/Numpy_for_Numerical_Computing_&_Data_Analysis/data/input/emp_sample.txt'

# Use np.loadtxt with skiprows and usecols parameters
data = np.loadtxt(input_file_path, delimiter='|',skiprows=1, usecols=(0, 4),converters={0: int, 4: float})
print("Loaded Data:",data)
print("Shape:", data.shape)
print("Data Type:", data.dtype)


Loaded Data: [[ 1001. 60000.]
 [ 1002. 68000.]
 [ 1003. 75000.]
 [ 1004. 62000.]
 [ 1005. 70000.]]
Shape: (5, 2)
Data Type: float64
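
A self-contained sketch of `skiprows` and `usecols` follows. The layout of `emp_sample.txt` is not shown in this chapter, so the column names other than `emp_id` and `salary`, and all row values, are assumptions made for illustration.

```python
from io import StringIO

import numpy as np

# Hypothetical pipe-delimited data; only emp_id (column 0) and salary (column 4) are numeric
text = (
    "emp_id|name|dept|city|salary\n"
    "1001|Ann|HR|Pune|60000\n"
    "1002|Bob|IT|Delhi|68000\n"
)

# skiprows=1 drops the header line; usecols keeps only the two numeric columns,
# so the text columns are never converted to numbers
data = np.loadtxt(StringIO(text), delimiter="|", skiprows=1, usecols=(0, 4))
print(data)
print("Shape:", data.shape)
```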


### **Use of genfromtxt()**

In [None]:
# Example #1 - Loading data from a text file and handling missing values
import numpy as np
input_file_path = '/content/drive/My Drive/Numpy_for_Numerical_Computing_&_Data_Analysis/data/input/sample3.txt'

# Read data from the file with handling missing values
filled_with_constant_single_value = np.genfromtxt(input_file_path,delimiter=',',missing_values=['N/A','???',''],filling_values=np.nan)
filled_with_dictionary_values = np.genfromtxt(input_file_path,delimiter=',',missing_values=['N/A','???',''],filling_values={0:-1,1:np.nan,2:0})


print("Filled with Constant Single Value:", filled_with_constant_single_value)
print("Filled with Dictionary Values:", filled_with_dictionary_values)


Filled with Constant Single Value: [[10. 20. 30.]
 [40. nan nan]
 [nan 80. 90.]]
Filled with Dictionary Values: [[10. 20. 30.]
 [40. nan  0.]
 [-1. 80. 90.]]
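
The per-column filling behaviour can be checked on a small buffer. This sketch assumes a file with the same missing markers as above; the `text` contents are hypothetical, and `missing_values` is given here as a comma-separated string, which applies the markers to every column.

```python
from io import StringIO

import numpy as np

# Hypothetical stand-in for sample3.txt with three kinds of missing entries
text = "10,20,30\n40,N/A,???\n,80,90\n"

# filling_values as a dict maps a column index to the replacement used in that column
filled = np.genfromtxt(
    StringIO(text),
    delimiter=",",
    missing_values="N/A,???",
    filling_values={0: -1, 1: np.nan, 2: 0},
)
print(filled)
```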


In [None]:
# Example #2 - Loading data with specified column (using column name and column index)
import numpy as np
input_file_path = '/content/drive/My Drive/Numpy_for_Numerical_Computing_&_Data_Analysis/data/input/emp_sample.txt'

# Loading data with names=True and usecols with column names
data = np.genfromtxt(input_file_path,delimiter='|',names=True,usecols=('emp_id', 'salary'),dtype=(int, float))

print("Loaded Data:",data)
print("Shape:", data.shape)
print("Data Type:", data.dtype)


Loaded Data: [(1001, 60000) (1002, 68000) (1003, 75000) (1004, 62000) (1005, 70000)]
Shape: (5,)
Data Type: [('emp_id', '<i8'), ('salary', '<i8')]
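
An alternative worth knowing is `dtype=None`, which asks `genfromtxt` to pick a type for each column from the data. The sketch below uses hypothetical file contents and shows how fields of the resulting structured array are accessed by name.

```python
from io import StringIO

import numpy as np

# Hypothetical pipe-delimited data with a header row
text = (
    "emp_id|name|salary\n"
    "1001|Ann|60000.0\n"
    "1002|Bob|68000.0\n"
)

# names=True takes field names from the header; dtype=None infers one type per column
data = np.genfromtxt(
    StringIO(text),
    delimiter="|",
    names=True,
    usecols=("emp_id", "salary"),
    dtype=None,
)

print("Field names:", data.dtype.names)
print("Employee IDs:", data["emp_id"])
print("Salaries:", data["salary"])
```

Each field is a one-dimensional array, which is why the structured array itself has shape `(n,)` rather than `(n, 2)`.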


## **Exporting Data into files**

In [None]:
# Using function np.save(file, arr, allow_pickle=True) to save data
import numpy as np
output_file_path = '/content/drive/My Drive/Numpy_for_Numerical_Computing_&_Data_Analysis/data/output/fn_save.npy'

# Create an array
array = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])
print("Array:", array)

# Save the array
np.save(output_file_path, array)

# Read the array back
data = np.load(output_file_path)
print("\nLoaded Data from File:", data)


Array: [[10 20 30]
 [40 50 60]
 [70 80 90]]

Loaded Data from File: [[10 20 30]
 [40 50 60]
 [70 80 90]]
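
The round trip can also be tested locally with a temporary directory. This sketch additionally shows that `np.save` appends the `.npy` extension when the given name lacks one; the file name `demo` is hypothetical.

```python
import os
import tempfile

import numpy as np

array = np.arange(9).reshape(3, 3)

with tempfile.TemporaryDirectory() as tmp_dir:
    # The path deliberately has no extension
    path = os.path.join(tmp_dir, "demo")
    np.save(path, array)

    # np.save added ".npy" to the file name
    saved_files = os.listdir(tmp_dir)

    # Shape and dtype are restored from the header stored in the file
    restored = np.load(path + ".npy")

print("Files written:", saved_files)
print("Round trip equal:", np.array_equal(array, restored))
```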


In [None]:
# Viewing the raw file with cat: .npy is a binary format, so only the header is readable
!cat '/content/drive/My Drive/Numpy_for_Numerical_Computing_&_Data_Analysis/data/output/fn_save.npy'

�NUMPY v {'descr': '<i8', 'fortran_order': False, 'shape': (3, 3), }                                                          

                     (       2       <       F       P       Z       

In [None]:
# using np.savez(file, *args, **kwds) to save data
import numpy as np
output_file_path = '/content/drive/My Drive/Numpy_for_Numerical_Computing_&_Data_Analysis/data/output/student_marks.npz'

math_marks = np.array([90, 80, 92, 75, 85])
english_marks = np.array([96, 90, 85, 88, 95])
art_marks = np.array([98, 93, 82, 88, 90])

# Saving the arrays in a single file
np.savez(output_file_path, math = math_marks, english = english_marks, art = art_marks)
print("Saved Data to File:",output_file_path)

Saved Data to File: /content/drive/My Drive/Numpy_for_Numerical_Computing_&_Data_Analysis/data/output/student_marks.npz


In [None]:
!ls -ltr  '/content/drive/My Drive/Numpy_for_Numerical_Computing_&_Data_Analysis/data/output/'

total 2
-rw------- 1 root root 200 Jan 29 07:20 fn_save.npy
-rw------- 1 root root 866 Jan 29 08:19 student_marks.npz


In [None]:
import numpy as np

# Input file path
student_marks_file = '/content/drive/My Drive/Numpy_for_Numerical_Computing_&_Data_Analysis/data/output/student_marks.npz'

# Load the .npz file containing student marks
data = np.load(student_marks_file)

# Accessing individual arrays (marks in each subject) using their keywords
math_marks = data['math']
english_marks = data['english']
art_marks = data['art']

# Printing the arrays (marks)
print("Math Marks:", math_marks)
print("English Marks:", english_marks)
print("Art Marks:", art_marks)


Math Marks: [90 80 92 75 85]
English Marks: [96 90 85 88 95]
Art Marks: [98 93 82 88 90]
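
When the stored keywords are not known in advance, the object returned by `np.load` for a `.npz` file exposes them through its `files` attribute. A minimal sketch, using a temporary file and hypothetical marks:

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "marks.npz")
    np.savez(path, math=np.array([90, 80]), english=np.array([96, 90]))

    # Using the archive as a context manager closes the underlying file afterwards
    with np.load(path) as archive:
        names = sorted(archive.files)   # keywords used when saving
        math_marks = archive["math"]    # each array is read when it is accessed

print("Stored arrays:", names)
print("Math Marks:", math_marks)
```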


In [None]:
# Using np.savez_compressed(file, *args, **kwds) to save data
import numpy as np
output_file_path = '/content/drive/My Drive/Numpy_for_Numerical_Computing_&_Data_Analysis/data/output/student_marks_compressed.npz'

# Creating arrays for math, English, and art marks
math_marks = np.array([90, 80, 92, 75, 85])
english_marks = np.array([96, 90, 85, 88, 95])
art_marks = np.array([98, 93, 82, 88, 90])

# Saving the arrays in a single compressed file
np.savez_compressed(output_file_path, math=math_marks, english=english_marks, art=art_marks)
print("Saved Data to File:", output_file_path)


Saved Data to File: /content/drive/My Drive/Numpy_for_Numerical_Computing_&_Data_Analysis/data/output/student_marks_compressed.npz


In [None]:
import numpy as np

# Input file path
student_marks_file_compressed =  '/content/drive/My Drive/Numpy_for_Numerical_Computing_&_Data_Analysis/data/output/student_marks_compressed.npz'

# Load the .npz file containing student marks
data = np.load(student_marks_file_compressed)

# Accessing individual arrays (marks in each subject) using their keywords
math_marks = data['math']
english_marks = data['english']
art_marks = data['art']

# Printing the arrays (marks)
print("Math Marks:", math_marks)
print("English Marks:", english_marks)
print("Art Marks:", art_marks)


Math Marks: [90 80 92 75 85]
English Marks: [96 90 85 88 95]
Art Marks: [98 93 82 88 90]
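
To see what compression buys, the two functions can be compared on the same data. The sketch below uses a deliberately repetitive array, which is an assumption chosen to make the difference obvious; small or random arrays may shrink much less.

```python
import os
import tempfile

import numpy as np

# Highly repetitive data compresses very well
values = np.zeros(100_000, dtype=np.int64)

with tempfile.TemporaryDirectory() as tmp_dir:
    plain_path = os.path.join(tmp_dir, "plain.npz")
    packed_path = os.path.join(tmp_dir, "packed.npz")

    np.savez(plain_path, values=values)              # stored without compression
    np.savez_compressed(packed_path, values=values)  # stored with compression

    plain_size = os.path.getsize(plain_path)
    packed_size = os.path.getsize(packed_path)

    # Loading works the same way for both variants
    with np.load(packed_path) as archive:
        restored = archive["values"]

print("Uncompressed bytes:", plain_size)
print("Compressed bytes:", packed_size)
```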


In [None]:
# Using np.savetxt() to save data to a text file
import numpy as np

# Create a NumPy array
data = np.array([[1001, 50000.40],
                 [1002, 55000.00],
                 [1003, 60000.00],
                 [1004, 52000.00],
                 [1005, 58000.77],
                 [1006, 62000.00],
                 [1007, 54000.00],
                 [1008, 57000.98],
                 [1009, 61000.00],
                 [1011, 53000.00]])

# Define the file path and name
output_file = '/content/drive/My Drive/Numpy_for_Numerical_Computing_&_Data_Analysis/data/output/emp_data_export.csv'

# Customize formatting and headers
header = 'emp_id, salary'
footer = '######End of Data#######'
delimiter = ','  # Comma-separated values
newline = '\r\n'  # Carriage return and line feed as line terminator
fmt = ['%d', '%.2f']  # Integer emp_id; salary with two decimal places

# Save the array to a text file
np.savetxt(output_file, data, delimiter=delimiter, newline=newline, fmt=fmt, header=header, footer=footer)
print("Data has been saved at:", output_file)

Data has been saved at: /content/drive/My Drive/Numpy_for_Numerical_Computing_&_Data_Analysis/data/output/emp_data_export.csv
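
Writing to an in-memory buffer makes it easy to inspect exactly what `np.savetxt` produces. This is a sketch with two hypothetical rows; it uses a per-column `fmt` list as one possible formatting choice and reads the text back with `np.loadtxt`.

```python
from io import StringIO

import numpy as np

data = np.array([[1001, 50000.40],
                 [1002, 55000.00]])

buffer = StringIO()
np.savetxt(
    buffer,
    data,
    delimiter=",",
    fmt=["%d", "%.2f"],        # one format per column
    header="emp_id,salary",
    footer="End of data",
)

text = buffer.getvalue()
lines = text.splitlines()
print(text)

# Header and footer are written with the default "# " comment prefix,
# so np.loadtxt skips them when reading the text back
restored = np.loadtxt(StringIO(text), delimiter=",")
print(restored)
```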


In [None]:
from io import StringIO   # StringIO behaves like a file object
import numpy as np

# np.loadtxt can read from any file-like object, not only from a path
c = StringIO("0 1\n2 3")
print(np.loadtxt(c))

