Author: Miao Cai [miao.cai@slu.edu](mailto:miao.cai@slu.edu)

# Storage on computers

| Unit     | Abbreviation | Size           |
| -------- | ------------ | -------------- |
| Byte     | B            | $2^3$ bits     |
| Kilobyte | KB           | $2^{10}$ bytes |
| Megabyte | MB           | $2^{20}$ bytes |
| Gigabyte | GB           | $2^{30}$ bytes |
| Terabyte | TB           | $2^{40}$ bytes |
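
The units in the table can be applied with a small conversion helper. The following is a minimal sketch, not part of the original notebook; the name `format_bytes` is hypothetical, and it assumes the binary (powers of 1024) convention used in the table.

```python
# Binary units from the table: each step up is a factor of 2**10 = 1024.
UNITS = ["B", "KB", "MB", "GB", "TB"]

def format_bytes(n_bytes):
    """Return n_bytes as a string in the largest unit that fits (hypothetical helper)."""
    size = float(n_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value drops below 1024,
        # or at TB, the largest unit in the table.
        if size < 1024 or unit == UNITS[-1]:
            return "{:.2f} {}".format(size, unit)
        size /= 1024

print(format_bytes(52428800))  # 50.00 MB
print(format_bytes(2 ** 30))   # 1.00 GB
```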

As data moves from RAM to disk to the cloud, processing time jumps from seconds to minutes to days.

In [6]:
import psutil, os
import pandas as pd
def memory_footprint():
    """Return the memory (in MB) used by the current Python process."""
    # rss (resident set size) is reported in bytes
    mem = psutil.Process(os.getpid()).memory_info().rss
    return mem / 1024**2

In [2]:
import numpy as np
before = memory_footprint()
N = (1024**2) // 8  # number of 8-byte floats that fill 1 MB
x = np.random.randn(50 * N)  # random array filling 50 MB
after = memory_footprint()

print('Memory before: {} MB'.format(before))
print('Memory after: {} MB'.format(after))

Memory before: 54.2578125 MB
Memory after: 104.28515625 MB
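
The increase of roughly 50 MB agrees with a back-of-the-envelope calculation. The sketch below uses only the standard library and assumes each element is a 64-bit (8-byte) float, which is the dtype `np.random.randn` returns.

```python
import struct

bytes_per_float = struct.calcsize("d")  # size of a C double: 8 bytes
N = (1024 ** 2) // bytes_per_float      # floats that fill 1 MB
n_elements = 50 * N                     # same length as x above

expected_bytes = n_elements * bytes_per_float
expected_mb = expected_bytes / 1024 ** 2

print(expected_bytes)  # 52428800
print(expected_mb)     # 50.0
```

The measured difference (about 50.03 MB) is slightly larger than this figure, since the process-level measurement also picks up allocations other than the array's data buffer.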


## Allocating memory for a computation

In [4]:
before = memory_footprint()
x ** 2  # compute, but do not bind the result to a variable
after = memory_footprint()
print('Extra memory obtained: {} MB'.format(after - before))

Extra memory obtained: 50.0 MB


In [5]:
print(x.nbytes)                 # memory footprint in bytes (B)
print(x.nbytes // (1024 ** 2))  # memory footprint in megabytes (MB)

52428800
50

In [10]:
df = pd.DataFrame(x)
df.info()
df.memory_usage(index=False)                 # per-column footprint in bytes (B)
df.memory_usage(index=False) // (1024 ** 2)  # per-column footprint in megabytes (MB)

0    50
dtype: int64