# CPU bound

CPU bound means the program is bottlenecked by the CPU (Central Processing Unit). If your program spends most of its time executing instructions rather than waiting for I/O, it's CPU bound, and a faster CPU will make it run faster. In contrast, when your program is waiting for I/O (e.g., disk or network reads and writes), it's I/O bound: the CPU is free to do other tasks while your program is blocked, and the program's speed depends mostly on how fast that I/O can happen. To speed it up, you'll need to speed up the I/O.

In either case, the key to speeding up the program might not be to speed up the hardware but to optimize the program so it needs less I/O or less CPU time. Or you can have it do I/O while it also does CPU-intensive work. For a CPU-bound program, either upgrading the CPU or optimizing the code will improve overall performance.
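One rough way to tell the two cases apart is to compare CPU time with wall-clock time. The sketch below is an illustration, not a profiler: `cpu_fraction`, `busy_work`, and `fake_io` are hypothetical names, and `time.sleep` only stands in for real I/O.

```python
import time

def cpu_fraction(func):
    """Return CPU time divided by wall-clock time for one call of func."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used if wall_used > 0 else 0.0

def busy_work():
    # CPU-bound stand-in: pure computation, no waiting
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total

def fake_io():
    # I/O-bound stand-in (assumption): sleeping uses almost no CPU time
    time.sleep(0.2)

if __name__ == "__main__":
    print("busy_work: {:.2f}".format(cpu_fraction(busy_work)))
    print("fake_io:   {:.2f}".format(cpu_fraction(fake_io)))
```

A fraction close to 1 suggests the call was mostly computing (CPU bound); a fraction close to 0 suggests it was mostly waiting.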

**psutil (process and system utilities)** is a cross-platform Python library for retrieving information on running processes and system utilization (CPU, memory, disks, network, sensors). It's mainly useful for system monitoring, profiling, limiting process resources, and managing running processes. Install the psutil Python library using pip3:

pip3 install psutil # install psutil module\
python3 # open python interpreter\
**\>\>\> import psutil**\
**\>\>\> psutil.cpu_percent()**

This shows that overall CPU utilization is low. Because the CPU has multiple cores, one fully loaded CPU thread (virtual core) accounts for only about 1.2% of the total load. In other words, the script is using just one core, no matter how many cores are available.

After checking CPU utilization, you notice that it's nowhere near the limit.

So the script uses only a single core to run, while the rest of your server's cores sit idle. Since that one core is fully loaded, the task is CPU-bound.
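The single-core reasoning above can be sketched with per-core readings. This is a minimal sketch: `psutil.cpu_percent(interval=..., percpu=True)` is a real psutil call, while `sample_per_core`, `summarize`, `looks_single_core_bound`, and the 90% threshold are assumptions made for illustration.

```python
def sample_per_core(interval=1.0):
    """Sample per-core CPU usage (percent) over `interval` seconds using psutil."""
    import psutil  # third-party; imported here so the helpers below work without it
    return psutil.cpu_percent(interval=interval, percpu=True)

def summarize(per_core, busy_threshold=90.0):
    """Return (overall_percent, number_of_busy_cores) for per-core percentages."""
    overall = sum(per_core) / len(per_core)
    busy = sum(1 for pct in per_core if pct >= busy_threshold)
    return overall, busy

def looks_single_core_bound(per_core, busy_threshold=90.0):
    """True when exactly one of several cores is at or above the threshold."""
    _, busy = summarize(per_core, busy_threshold)
    return busy == 1 and len(per_core) > 1
```

With psutil installed, `looks_single_core_bound(sample_per_core())` gives a quick hint while the script is running. One saturated core out of n contributes roughly 100/n percent to the overall figure, which is why the total can look low.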

Now, using psutil.disk_io_counters() and psutil.net_io_counters(), you'll get the bytes read and written for disk I/O and the bytes received and sent for network I/O. To check disk and network I/O, you can use the following commands:

**psutil.disk_io_counters()**\
**psutil.net_io_counters()**
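These counters are cumulative, so a single reading says little about current bandwidth. A rough sketch of turning two readings into a rate follows; `bytes_per_second` and `sample_io_rates` are hypothetical helpers, while the field names (`read_bytes`, `write_bytes`, `bytes_recv`, `bytes_sent`) are the ones psutil reports.

```python
import time

def bytes_per_second(before, after, seconds, fields):
    """Per-field rate between two counter snapshots taken `seconds` apart."""
    return {
        name: (getattr(after, name) - getattr(before, name)) / seconds
        for name in fields
    }

def sample_io_rates(interval=1.0):
    """Take two psutil snapshots `interval` seconds apart and return (disk, net) rates."""
    import psutil  # third-party; imported lazily so bytes_per_second works without it
    disk_before = psutil.disk_io_counters()
    net_before = psutil.net_io_counters()
    time.sleep(interval)
    disk_after = psutil.disk_io_counters()
    net_after = psutil.net_io_counters()
    disk = bytes_per_second(disk_before, disk_after, interval,
                            ("read_bytes", "write_bytes"))
    net = bytes_per_second(net_before, net_after, interval,
                           ("bytes_recv", "bytes_sent"))
    return disk, net
```

If both rates stay well below what the disk and network can deliver while the backup runs, I/O is unlikely to be the bottleneck.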

# Basic rsync command

rsync (remote sync) is a utility for efficiently transferring and synchronizing files between a computer and an external hard drive, and across networked computers, by comparing the modification times and sizes of files. One of rsync's important features is its delta-transfer algorithm: it syncs or copies only the changes from the source to the destination instead of copying the whole file. This reduces the amount of data sent over the network.
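The size-and-time comparison mentioned above can be illustrated in a few lines of Python. This is only a sketch of the idea, not rsync's implementation, and it does not attempt delta transfer; `needs_sync` is a hypothetical helper.

```python
import os

def needs_sync(src, dest):
    """True if dest is missing or differs from src in size or modification time."""
    if not os.path.exists(dest):
        return True
    s, d = os.stat(src), os.stat(dest)
    # Compare whole seconds so sub-second filesystem differences are ignored
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)
```

A file whose size and modification time both match is skipped, which is what makes repeated backups of mostly unchanged data cheap.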

The basic syntax of the rsync command is below:

> rsync [Options] [Source-Files-Dir] [Destination]

1. Copy or sync files locally: \
  rsync -zvh [Source-Files-Dir] [Destination]

2. Copy or sync directory locally: \
  rsync -zavh [Source-Files-Dir] [Destination]

3. Copy files and directories recursively locally: \
  rsync -zrvh [Source-Files-Dir] [Destination]

**Example**\
import subprocess\
src = "\<source-path>" # replace \<source-path> with the source directory\
dest = "\<destination-path>" # replace \<destination-path> with the destination directory\

subprocess.call(["rsync", "-arq", src, dest])
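`subprocess.call` returns rsync's exit code, but the example above ignores it. A hedged variant that raises on failure might look like this; `build_rsync_command` and `run_rsync` are hypothetical helper names, and the default options simply mirror the example.

```python
import subprocess

def build_rsync_command(src, dest, options="-arq"):
    """Return the argument list for an rsync call (no shell involved)."""
    return ["rsync", options, src, dest]

def run_rsync(src, dest, options="-arq"):
    """Run rsync; raise subprocess.CalledProcessError on a non-zero exit code."""
    subprocess.run(build_rsync_command(src, dest, options), check=True)
```

Note that a trailing slash on the source (`data/prod/`) tells rsync to copy the directory's contents, while no trailing slash (`data/prod`) copies the directory itself into the destination.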

# Multiprocessing

Now, when you go through the hierarchy of subfolders under /data/prod, you'll see that the data comes from different projects (e.g., beta, gamma, kappa) that are independent of each other. So, to back them up efficiently in parallel, use multiprocessing to take advantage of the idle CPU cores. Initially, because the task is CPU bound on a single core, the backup takes more than 20 hours to finish, which isn't efficient for a daily backup. With multiprocessing, you can back up your data from the source to the destination in parallel across multiple CPU cores.

Now, you'll use the Python script **multisync.py** to practice and understand how multiprocessing works. It uses the Pool class from Python's multiprocessing module. First, define a run function that performs a task. Next, define a list of tasks. Then create a Pool object, passing the number of worker processes you want; the script uses the number of tasks, though a common choice is the number of CPUs your system has. Finally, start the tasks by calling the pool's map method with the run function and the list of tasks as arguments.

In [None]:
# multisync.py

#!/usr/bin/env python3
from multiprocessing import Pool

def run(task):
    # Do something with task here
    print("Handling {}".format(task))

if __name__ == "__main__":
    tasks = ['task1', 'task2', 'task3']
    # Create a pool with one worker process per task
    p = Pool(len(tasks))
    # Run each task in the pool; map blocks until all tasks finish
    p.map(run, tasks)

# run to see the output:
sudo chmod +x ~/scripts/multisync.py
./scripts/multisync.py
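The tasks in multisync.py only print a message, so the benefit of extra cores is hard to see. The sketch below swaps in a deliberately CPU-heavy function as an assumption for illustration; `count_primes` and `parallel_counts` are hypothetical names, and the limits are arbitrary.

```python
import os
from multiprocessing import Pool

def count_primes(limit):
    """Count primes below limit by trial division (deliberately CPU-heavy)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def parallel_counts(limits, workers=None):
    """Run count_primes on each limit in a pool; results keep the input order."""
    # workers=None lets Pool pick a size based on the available CPUs
    with Pool(processes=workers) as pool:
        return pool.map(count_primes, limits)

if __name__ == "__main__":
    limits = [20_000, 30_000, 40_000]
    print("cores available:", os.cpu_count())
    print(parallel_counts(limits))
```

Because `map` returns results in the same order as the inputs, the output list lines up with `limits` even though the workers may finish in a different order.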

# Backup script
**Back up all the files in one directory into another directory, using multiprocessing to speed up the process**

In [None]:


#!/usr/bin/env python3
import subprocess
import os
from multiprocessing import Pool  # Pool manages a group of worker processes

def backup(src):
  # Sync one source entry into the backup directory with rsync
  dest = os.getcwd() + "/data/prod_backup/"
  subprocess.call(["rsync", "-arq", src, dest])

if __name__ == "__main__":
  src_path = os.getcwd() + "/data/prod/"
  list_of_files = os.listdir(src_path)  # names of all entries in the source directory
  all_files = []

  for file_name in list_of_files:
    # Build the full path of each entry with os.path.join
    list_path = os.path.join(src_path, file_name)
    all_files.append(list_path)

  # One worker process per entry; map calls backup(path) for every path
  pool_of_CPU = Pool(len(all_files))
  pool_of_CPU.map(backup, all_files)  # pass the function itself, not the result of calling it
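Sizing the pool by the number of entries starts one process per entry, and `Pool` rejects a size of zero when the directory is empty. One possible refinement, offered as a sketch under assumptions rather than the script's actual method, is to list only subdirectories and cap the pool at the number of cores; `list_project_dirs` and `pool_size` are hypothetical helpers.

```python
import os

def list_project_dirs(src_path):
    """Return full paths of the immediate subdirectories of src_path, sorted."""
    with os.scandir(src_path) as entries:
        return sorted(entry.path for entry in entries if entry.is_dir())

def pool_size(n_tasks, cpus=None):
    """Pick a worker count: at least 1, at most the task count or the core count."""
    cpus = cpus or os.cpu_count() or 1
    return max(1, min(n_tasks, cpus))
```

In the script, this would amount to something like `Pool(pool_size(len(all_files)))`, which keeps the number of rsync processes in line with the cores actually available.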

