Assignment 1.1

Provide estimates for the size of various data items. Please explain how you arrived at the estimates for the size of each item by citing references or providing calculations.
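The two 1024x768 image entries can be sanity-checked by computing the raw pixel payload. This is a minimal sketch that assumes 24-bit RGB (3 bytes per pixel) for an uncompressed frame and 12 bits per pixel for single-channel RAW sensor data. Neither assumption is stated in the original. It prints each payload in decimal megabytes (the `MB` = 10^6 bytes used by `information_units`) and in binary mebibytes (2^20 bytes), because the two differ by about 5% at this scale.

```python
# Raw pixel payload of a 1024x768 image, in decimal and binary units.
# Assumptions (not from the original notebook): 24-bit RGB for an
# uncompressed frame, 12 bits per pixel for single-channel RAW sensor data.

MB = 10**6    # decimal megabyte, matches information_units["MB"]
MIB = 2**20   # binary mebibyte


def raw_image_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Return the size in bytes of uncompressed pixel data (no file headers)."""
    return width * height * bits_per_pixel // 8


rgb_bytes = raw_image_bytes(1024, 768, 24)    # 3 bytes per pixel
raw12_bytes = raw_image_bytes(1024, 768, 12)  # 1.5 bytes per pixel

for label, n in [("24-bit RGB", rgb_bytes), ("12-bit RAW", raw12_bytes)]:
    print(f"{label}: {n:,} B = {n / MB:.3f} MB = {n / MIB:.3f} MiB")
```

The payloads come to exactly 2.25 MiB and 1.125 MiB. These match the figures the table lists as MB. Under the notebook's decimal units, they are about 2.36 MB and 1.18 MB. A real PNG file is typically smaller than the RGB payload because PNG applies lossless compression.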

Assume all videos are 30 frames per second.

HEVC stands for High Efficiency Video Coding
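Compressed video size is usually estimated as bitrate × duration. The sketch below runs that relation backwards on the table's two HEVC entries. It shows the average bitrate each entry implies over 15 minutes (900 s), which is a quick plausibility check against typical encoder settings. The sizes are copied from the table and are read as decimal megabytes. The helper names are illustrative.

```python
# Average bitrate implied by a file size and a duration:
#   bitrate (bit/s) = size (bytes) * 8 / duration (s)

DURATION_S = 15 * 60  # 15-minute clips, as in the table
MB = 10**6            # decimal megabyte, matches information_units["MB"]


def implied_mbps(size_mb: float, duration_s: float) -> float:
    """Return the average bitrate in megabits per second."""
    return size_mb * MB * 8 / duration_s / 10**6


def size_mb(bitrate_mbps: float, duration_s: float) -> float:
    """Inverse: file size in decimal MB for a given average bitrate."""
    return bitrate_mbps * 10**6 * duration_s / 8 / MB


hevc_estimates_mb = {
    "HD (1080p) HEVC": 6668,
    "4K UHD HEVC": 26763.36,
}
for name, size in hevc_estimates_mb.items():
    print(f"{name}: {implied_mbps(size, DURATION_S):.1f} Mbit/s")
```

The HD entry works out to roughly 59 Mbit/s, which is likely on the high side for HEVC at 1080p. You can pass a different assumed bitrate to `size_mb` to get the corresponding file size.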

See the Wikipedia article on display resolution for information on HD (1080p) and 4K UHD resolutions.
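The uncompressed video entries follow from width × height × bytes per pixel × frames per second × seconds. This sketch uses the 1920x1080 (HD) and 3840x2160 (4K UHD) resolutions and the given 30 fps. It assumes 24-bit RGB (3 bytes per pixel) with no chroma subsampling or container overhead. That pixel format is an assumption, not something the assignment specifies.

```python
# Uncompressed video size = width * height * bytes per pixel * fps * seconds.
# Assumption (not stated in the original): 24-bit RGB, 3 bytes per pixel,
# no chroma subsampling, no container overhead.

FPS = 30               # given: all videos are 30 frames per second
DURATION_S = 15 * 60   # 15-minute clips
BYTES_PER_PIXEL = 3    # assumed 24-bit RGB
MB = 10**6             # decimal megabyte, matches information_units["MB"]

RESOLUTIONS = {
    "HD (1080p)": (1920, 1080),
    "4K UHD": (3840, 2160),
}


def uncompressed_video_bytes(width: int, height: int,
                             fps: int = FPS, seconds: int = DURATION_S,
                             bytes_per_pixel: int = BYTES_PER_PIXEL) -> int:
    """Return the raw size in bytes of a video with the given parameters."""
    return width * height * bytes_per_pixel * fps * seconds


for name, (w, h) in RESOLUTIONS.items():
    size = uncompressed_video_bytes(w, h)
    print(f"{name}: {size:,} B = {size / MB:,.1f} MB")
```

4K UHD has exactly four times the pixels of 1080p. At the same frame rate and pixel format, its uncompressed size is therefore four times larger.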

In [4]:
import pandas as pd
from collections import namedtuple
from dataclasses import dataclass

InformationUnit = namedtuple('InformationUnit', ['name', 'size'])
DataItem = namedtuple('DataItem', ['name', 'size', 'unit'])
LatencyItem = namedtuple('LatencyItem', ['name', 'time', 'unit', 'explanation'])

information_units = dict(
    B=InformationUnit("byte", 1),
    KB=InformationUnit("kilobyte", 1e3),
    MB=InformationUnit("megabyte", 1e6),
    GB=InformationUnit("gigabyte", 1e9),
    TB=InformationUnit("terabyte", 1e12),
    PB=InformationUnit("petabyte", 1e15),
    EB=InformationUnit("exabyte", 1e18),
    ZB=InformationUnit("zettabyte", 1e21),
    YB=InformationUnit("yottabyte", 1e24)
)

time_units = {
    "ms": "millisecond",
    "s": "second",
    "min": "minute"
}

def check_data_items(items):
    # Checks to see if data sizes and units are filled out correctly
    for item in items:
        assert item.size > 0, 'Size for "{}" should be greater than zero'.format(item.name)
        assert item.unit in information_units, 'Unit "{}" not in units dictionary'.format(item.unit)
        
def check_latency_items(items):
    # Checks to see if time sizes and units are filled out correctly
    for item in items:
        # assert item.time > 0, 'Time for "{}" should be greater than zero'.format(item.name)
        assert item.unit in time_units, 'Unit "{}" not in time units dictionary'.format(item.unit)
        assert item.explanation != "FILL IN THE EXPLANATION HERE", 'Fill in explanation for "{}"'.format(item.name)

In [5]:
items1_1 = [
    DataItem('1 Byte', 1, 'B'),
    DataItem("128 character message", 128, "B"),  # 1 byte per ASCII character
    DataItem("1024x768 PNG image", 2.25, "MB"),
    DataItem("1024x768 RAW image", 1.13, "MB"),
    DataItem("HD (1080p) HEVC Video (15 minutes)", 6668, "MB"),
    # 1920*1080 px * 3 B (24-bit RGB) * 30 fps * 900 s
    DataItem("HD (1080p) Uncompressed Video (15 minutes)", 167961.6, "MB"),
    DataItem("4K UHD HEVC Video (15 minutes)", 26763.36, "MB"),
    # 3840*2160 px * 3 B (24-bit RGB) * 30 fps * 900 s
    DataItem("4K UHD Uncompressed Video (15 minutes)", 671846.4, "MB"),
    DataItem("Human Genome (Uncompressed)", 0.75, "GB"),  # ~3e9 bases * 2 bits
]

# Checks if items properly updated
check_data_items(items1_1)

df1_1 = pd.DataFrame(items1_1)
df1_1.style.hide(axis="index")


name,size,unit
1 Byte,1.0,B
128 character message,128.0,B
1024x768 PNG image,2.25,MB
1024x768 RAW image,1.13,MB
HD (1080p) HEVC Video (15 minutes),6668.0,MB
HD (1080p) Uncompressed Video (15 minutes),167961.6,MB
4K UHD HEVC Video (15 minutes),26763.36,MB
4K UHD Uncompressed Video (15 minutes),671846.4,MB
Human Genome (Uncompressed),0.75,GB


In [6]:
df1_1['size'] = df1_1['size'].round(2)

In [7]:
df1_1

,name,size,unit
0,1 Byte,1.0,B
1,128 character message,128.0,B
2,1024x768 PNG image,2.25,MB
3,1024x768 RAW image,1.13,MB
4,HD (1080p) HEVC Video (15 minutes),6668.0,MB
5,HD (1080p) Uncompressed Video (15 minutes),167961.6,MB
6,4K UHD HEVC Video (15 minutes),26763.36,MB
7,4K UHD Uncompressed Video (15 minutes),671846.4,MB
8,Human Genome (Uncompressed),0.75,GB
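The genome entry can be reproduced with a short calculation. It assumes roughly 3 billion base pairs, a round figure for the haploid human genome, and 2 bits per base, because the four nucleotides (A, C, G, T) fit in 2 bits. Storing each base as one ASCII character instead would take four times the space. The base count and both encodings are assumptions of this sketch.

```python
# Human genome storage under two encodings.
# Assumptions: ~3e9 base pairs (haploid), no quality scores or metadata.

BASE_PAIRS = 3_000_000_000  # round estimate
GB = 10**9                  # decimal gigabyte, matches information_units["GB"]


def genome_bytes(bits_per_base: int, bases: int = BASE_PAIRS) -> int:
    """Return storage in bytes for `bases` nucleotides at `bits_per_base`."""
    return bases * bits_per_base // 8


packed = genome_bytes(2)             # A/C/G/T packed into 2 bits each
one_byte_per_base = genome_bytes(8)  # one ASCII character per base
print(f"2-bit packed: {packed / GB:.2f} GB")
print(f"ASCII text:   {one_byte_per_base / GB:.2f} GB")
```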


Assignment 1.2
Using the estimates for data sizes in the previous part, determine how much storage space you would need for the following items.

Twitter statistics estimate that 500 million tweets are sent each day. For simplicity, assume each tweet is 128 characters.
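The raw tweet volume is just tweet count × bytes per tweet, and the yearly figure multiplies that by 365. This sketch assumes one byte per character (ASCII) and ignores metadata. UTF-8 text containing non-Latin characters can use up to 4 bytes per character, which would raise the total.

```python
# Daily and yearly raw tweet storage.
# Given: 500 million tweets per day, 128 characters each.
# Assumption: 1 byte per character (ASCII), no metadata.

TWEETS_PER_DAY = 500_000_000
BYTES_PER_TWEET = 128
DAYS_PER_YEAR = 365
TB = 10**12
PB = 10**15

daily_bytes = TWEETS_PER_DAY * BYTES_PER_TWEET
yearly_bytes = daily_bytes * DAYS_PER_YEAR

print(f"daily:  {daily_bytes / TB:.3f} TB")
print(f"yearly: {yearly_bytes / TB:.2f} TB = {yearly_bytes / PB:.4f} PB")
```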

See the Snappy GitHub repository for estimates of Snappy's performance.
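Snappy's compression ratio depends on the input. The project README quotes typical ratios of about 1.5-1.7x for plain text. This sketch assumes the compressed output is 60% of the original size (about 1.67x) and applies that to the raw daily and yearly tweet totals. The ratio is an assumption, not a measurement. Very short inputs, such as a single 128-byte tweet compressed on its own, typically compress less well than large batches.

```python
# Snappy-compressed tweet storage under an assumed compression ratio.
# Assumption: compressed size = 60% of raw size (about 1.67x), inside the
# 1.5-1.7x range the Snappy README quotes for plain text.

RAW_DAILY_BYTES = 500_000_000 * 128  # 500M tweets x 128 bytes
DAYS_PER_YEAR = 365
COMPRESSED_FRACTION = 0.6
TB = 10**12


def compressed_bytes(raw_bytes: float, fraction: float = COMPRESSED_FRACTION) -> float:
    """Estimated compressed size for a raw size and an assumed size fraction."""
    return raw_bytes * fraction


daily = compressed_bytes(RAW_DAILY_BYTES)
yearly = compressed_bytes(RAW_DAILY_BYTES * DAYS_PER_YEAR)
print(f"daily:  {daily / TB:.4f} TB")
print(f"yearly: {yearly / TB:.2f} TB")
print(f"ratio:  {1 / COMPRESSED_FRACTION:.2f}x")
```

Compressed totals must always come out smaller than the uncompressed ones. A compressed estimate that exceeds the raw size signals a unit slip.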

Instagram statistics estimate that over 100 million videos and photos are uploaded to Instagram every day. Assume that 75% of those items are 1024x768 PNG photos.
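The photo storage estimate multiplies the daily photo count by the per-photo size from Assignment 1.1. This sketch takes that size to be the table's 2.25 MB PNG estimate and counts only the 75% photo share. The remaining uploads (videos) would add to the total.

```python
# Instagram photo storage.
# Given: 100 million photos and videos per day, 75% are 1024x768 PNG photos.
# 2.25 MB per photo is this notebook's PNG estimate from Assignment 1.1.

UPLOADS_PER_DAY = 100_000_000
PHOTO_SHARE = 0.75
PNG_MB = 2.25
DAYS_PER_YEAR = 365
MB, TB, PB = 10**6, 10**12, 10**15

photos_per_day = int(UPLOADS_PER_DAY * PHOTO_SHARE)
daily_bytes = photos_per_day * PNG_MB * MB
yearly_bytes = daily_bytes * DAYS_PER_YEAR

print(f"photos/day: {photos_per_day:,}")
print(f"daily:  {daily_bytes / TB:.2f} TB")
print(f"yearly: {yearly_bytes / PB:.2f} PB")
```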

YouTube statistics estimate that 500 hours of video are uploaded to YouTube every minute. For simplicity, assume all videos are HD quality, encoded using HEVC at 30 frames per second.
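The YouTube estimate scales the per-minute upload rate to a day, splits it into 15-minute clips, and multiplies by the per-clip HEVC size. This sketch takes that size to be the table's 6668 MB HD HEVC estimate. With a different bitrate assumption, the result scales proportionally.

```python
# YouTube upload storage.
# Given: 500 hours of video uploaded per minute, all HD HEVC at 30 fps.
# 6668 MB per 15 minutes is this notebook's HD HEVC estimate from 1.1.

HOURS_PER_MINUTE = 500
MINUTES_PER_DAY = 24 * 60
HEVC_MB_PER_15_MIN = 6668
DAYS_PER_YEAR = 365
MB, PB, EB = 10**6, 10**15, 10**18

hours_per_day = HOURS_PER_MINUTE * MINUTES_PER_DAY  # 720,000 hours
clips_per_day = hours_per_day * 4                   # 15-minute clips
daily_bytes = clips_per_day * HEVC_MB_PER_15_MIN * MB
yearly_bytes = daily_bytes * DAYS_PER_YEAR

print(f"hours/day: {hours_per_day:,}")
print(f"daily:  {daily_bytes / PB:.2f} PB")
print(f"yearly: {yearly_bytes / EB:.2f} EB")
```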

In [8]:
items1_2 = [
    # 500e6 tweets/day x 128 B = 64e9 B
    DataItem("Daily Twitter Tweets (Uncompressed)", 0.064, "TB"),
    # Snappy output assumed to be ~60% of the raw size (~1.67x)
    DataItem("Daily Twitter Tweets (Snappy Compressed)", 0.0384, "TB"),
    # 100e6 uploads x 75% photos x 2.25 MB per PNG
    DataItem("Daily Instagram Photos", 168.75, "TB"),
    # 500 h/min x 1440 min = 720,000 h/day = 2,880,000 15-min clips x 6668 MB
    DataItem("Daily YouTube Videos", 19.2, "PB"),
    # Yearly values = daily values x 365
    DataItem("Yearly Twitter Tweets (Uncompressed)", 23.36, "TB"),
    DataItem("Yearly Twitter Tweets (Snappy Compressed)", 14.02, "TB"),
    DataItem("Yearly Instagram Photos", 61.59, "PB"),
    DataItem("Yearly YouTube Videos", 7.01, "EB"),
]

# Checks if items properly updated
check_data_items(items1_2)

df1_2 = pd.DataFrame(items1_2)
df1_2.style.hide(axis="index")


name,size,unit
Daily Twitter Tweets (Uncompressed),0.064,TB
Daily Twitter Tweets (Snappy Compressed),0.0384,TB
Daily Instagram Photos,168.75,TB
Daily YouTube Videos,19.2,PB
Yearly Twitter Tweets (Uncompressed),23.36,TB
Yearly Twitter Tweets (Snappy Compressed),14.02,TB
Yearly Instagram Photos,61.59,PB
Yearly YouTube Videos,7.01,EB
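A quick consistency check can catch unit slips in a table like this one. Two rules should hold: yearly totals should be about 365 times the matching daily totals, and Snappy-compressed sizes should never exceed the uncompressed ones. This sketch converts each `DataItem` to bytes using the notebook's decimal units and checks both rules. `to_bytes` and `consistency_problems` are hypothetical helpers, not part of the assignment template.

```python
from collections import namedtuple

# Same shapes as the notebook's definitions, repeated so this cell runs alone.
DataItem = namedtuple("DataItem", ["name", "size", "unit"])
UNIT_BYTES = {"B": 1, "KB": 1e3, "MB": 1e6, "GB": 1e9, "TB": 1e12,
              "PB": 1e15, "EB": 1e18, "ZB": 1e21, "YB": 1e24}


def to_bytes(item) -> float:
    """Convert a DataItem's size to bytes using decimal units."""
    return item.size * UNIT_BYTES[item.unit]


def consistency_problems(items, days_per_year=365, rel_tol=0.01):
    """Return messages for rows that break the daily/yearly or
    compressed/uncompressed relationships (hypothetical helper)."""
    sizes = {item.name: to_bytes(item) for item in items}
    problems = []
    for name, size in sizes.items():
        if name.startswith("Daily "):
            yearly_name = "Yearly " + name[len("Daily "):]
            if yearly_name in sizes:
                expected = size * days_per_year
                if abs(sizes[yearly_name] - expected) > rel_tol * expected:
                    problems.append(f"{yearly_name} is not ~{days_per_year}x {name}")
        if "(Snappy Compressed)" in name:
            raw_name = name.replace("(Snappy Compressed)", "(Uncompressed)")
            if raw_name in sizes and size > sizes[raw_name]:
                problems.append(f"{name} is larger than {raw_name}")
    return problems


example = [
    DataItem("Daily Twitter Tweets (Uncompressed)", 0.064, "TB"),
    DataItem("Daily Twitter Tweets (Snappy Compressed)", 0.0384, "TB"),
    DataItem("Yearly Twitter Tweets (Uncompressed)", 23.36, "TB"),
    DataItem("Yearly Twitter Tweets (Snappy Compressed)", 14.02, "TB"),
]
print(consistency_problems(example))
```

The check only reads the `size` and `unit` attributes. In the notebook, `consistency_problems(items1_2)` would therefore list any rows that violate either rule.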


Assignment 1.3
Provide estimates of the one-way latency for each of the following items. Please explain how you arrived at the estimates for each item by citing references or providing calculations.
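Each latency row is a propagation delay: distance divided by signal speed. This sketch recomputes the rows from the distances in the explanations. It uses about 200,000 km/s for light in optical fiber (roughly two-thirds of c) and 299,792 km/s in vacuum. It ignores queuing, routing, and processing delays, so real measurements will be higher. The table's LEO row counts a ground-satellite-ground path (2 × altitude), while the GEO row counts only the uplink. The sketch therefore prints both paths for the geostationary case so the two rows can be compared on the same basis.

```python
# One-way propagation delay = distance / signal speed.
# Ignores queuing, routing and processing delays.

C_VACUUM_KM_S = 299_792  # speed of light in vacuum
C_FIBER_KM_S = 200_000   # approx. speed of light in optical fiber


def one_way_ms(distance_km: float, speed_km_s: float = C_VACUUM_KM_S) -> float:
    """Return one-way propagation delay in milliseconds."""
    return distance_km / speed_km_s * 1000


rows = {
    "Los Angeles to Amsterdam (fiber)": one_way_ms(8_000, C_FIBER_KM_S),
    "LEO, ground-satellite-ground": one_way_ms(2 * 500),
    "GEO, ground-satellite only": one_way_ms(35_786),
    "GEO, ground-satellite-ground": one_way_ms(2 * 35_786),
    "Earth to the Moon": one_way_ms(384_400),
}
for name, ms in rows.items():
    print(f"{name}: {ms:,.1f} ms")

mars_min = one_way_ms(225_000_000) / 1000 / 60  # distance used in the table
print(f"Earth to Mars: {mars_min:.1f} min")
```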

In [12]:
los_angeles_to_amsterdam_explanation = """
Latency = Distance/Approx Speed of Light in fiber-optic cables = 8000 km/(200000 km/s)
"""
low_earth_orbit_satellite_explanation = """
Latency = (2*altitude)/speed of light = (2*500 km)/(299792 km/s)
"""
geostationary_satellite_explanation = """
Latency = Distance/Speed of Light = 35786 km/(299792 km/s)
"""
earth_to_the_moon_explanation = """
Latency = Distance/Speed of Light = 384400 km/(299792 km/s)
"""
earth_to_mars_explanation = """
Latency = Distance/Speed of Light = (225000000 km/(299792 km/s))/60
"""


items1_3 = [
    LatencyItem(
        "Los Angeles to Amsterdam",
        40,
        "ms",
        los_angeles_to_amsterdam_explanation.strip()
    ),
    LatencyItem(
        "Low Earth Orbit Satellite",
        3.34,
        "ms",
        low_earth_orbit_satellite_explanation.strip()
    ),
    LatencyItem(
        "Geostationary Satellite",
        119,
        "ms",
        geostationary_satellite_explanation.strip()
    ),
    LatencyItem(
        "Earth to the Moon",
        1282,
        "ms",
        earth_to_the_moon_explanation.strip()
    ),
    LatencyItem(
        "Earth to Mars",
        12.5,
        "min",
        earth_to_mars_explanation.strip()
    ),
]
# Checks if items properly updated
check_latency_items(items1_3)

df1_3 = pd.DataFrame(items1_3)
df1_3.style.hide(axis="index")


name,time,unit,explanation
Los Angeles to Amsterdam,40.0,ms,Latency = Distance/Approx Speed of Light in fiber-optic cables = 8000 km/(200000 km/s)
Low Earth Orbit Satellite,3.34,ms,Latency = (2*altitude)/speed of light = (2*500 km)/(299792 km/s)
Geostationary Satellite,119.0,ms,Latency = Distance/Speed of Light = 35786 km/(299792 km/s)
Earth to the Moon,1282.0,ms,Latency = Distance/Speed of Light = 384400 km/(299792 km/s)
Earth to Mars,12.5,min,Latency = Distance/Speed of Light = (225000000 km/(299792 km/s))/60
