# HADOOP-18146: ABFS: Add changes for expect hundred continue header with append requests
**Problem description**:

Summary: Heavy load on Azure storage is constantly throttled, and the FE (frontend) node shows high resource (memory) utilization.

Problem: The payload of a throttled operation is still buffered at HTTP.sys, a Windows HTTP server.

Env: ABFS (Azure storage)

Trigger: Heavy load of append requests (presumably large requests)

Implication:
1. High memory pressure on the frontend node
2. A large number of TCP SYN packets (used for TCP connection establishment/handshake)

Fix:
1. Add a new parameter that enables the `Expect: 100-continue` header on all of the client's append requests.
2. When the header is included in a request, the server responds with `100 Continue` if the operation is not throttled. The client transfers the payload only after receiving that response.
3. Results of the patch: a) the ratio of TCP SYN packet counts with and without `Expect: 100-continue` enabled is 0.32 : 3 on average; b) ingress into the machine at the TCP level is almost 3 times lower with the header enabled, which implies a substantial bandwidth saving.

Jira issue: https://issues.apache.org/jira/browse/HADOOP-18146
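
The handshake described in the fix can be illustrated with a small, self-contained sketch. It is not the ABFS client or the Azure frontend: the toy server, the request path `/demo/file`, the helper names, and the use of `503` as the throttling status are all assumptions made for illustration. The point it shows is that a throttled request is rejected after the headers alone, so the payload never leaves the client.

```python
import socket
import threading


def read_head(sock):
    """Read from the socket until the end of an HTTP header block."""
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = sock.recv(4096)
        if not chunk:
            break
        data += chunk
    return data


def serve_once(listener, throttled, stats):
    """Toy server (hypothetical): handle one Expect: 100-continue request."""
    conn, _ = listener.accept()
    with conn:
        head, _, body = read_head(conn).partition(b"\r\n\r\n")
        length = 0
        for line in head.split(b"\r\n")[1:]:
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value)
        if throttled:
            # Reject on headers alone; no payload is ever received.
            conn.sendall(b"HTTP/1.1 503 Service Unavailable\r\n"
                         b"Content-Length: 0\r\nConnection: close\r\n\r\n")
        else:
            # Tell the client it may now send the payload.
            conn.sendall(b"HTTP/1.1 100 Continue\r\n\r\n")
            while len(body) < length:
                chunk = conn.recv(65536)
                if not chunk:
                    break
                body += chunk
            conn.sendall(b"HTTP/1.1 202 Accepted\r\n"
                         b"Content-Length: 0\r\nConnection: close\r\n\r\n")
        stats["payload_bytes"] = len(body)


def append(port, payload):
    """Toy client: send headers first, send the payload only after 100 Continue."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as sock:
        sock.sendall((
            "PUT /demo/file HTTP/1.1\r\n"
            "Host: localhost\r\n"
            f"Content-Length: {len(payload)}\r\n"
            "Expect: 100-continue\r\n"
            "\r\n"
        ).encode("ascii"))
        status = int(read_head(sock).split()[1])
        if status != 100:
            return status  # throttled: the payload stays on the client
        sock.sendall(payload)
        return int(read_head(sock).split()[1])


def run(throttled, payload=b"x" * 100_000):
    """Run one request and return (final status, payload bytes the server received)."""
    stats = {"payload_bytes": 0}
    with socket.socket() as listener:
        listener.bind(("127.0.0.1", 0))  # any free local port
        listener.listen(1)
        port = listener.getsockname()[1]
        worker = threading.Thread(target=serve_once, args=(listener, throttled, stats))
        worker.start()
        status = append(port, payload)
        worker.join()
    return status, stats["payload_bytes"]


print(run(throttled=False))  # (202, 100000)
print(run(throttled=True))   # (503, 0)
```

In the throttled case the server receives zero payload bytes, which is the behavior the patch relies on to reduce frontend memory pressure and ingress.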

## Initialize Jupyter notebook environment 

We select the Chameleon site and project to use, and import the packages (mainly python-chi) needed in the later experiments.

In [1]:
import chi
from chi import lease, server
import os
import keystoneauth1, blazarclient
import uuid

CLOUD_SITE = "CHI@UC"

PROJECT_ID = "CHI-231080"
chi.set("project_name", PROJECT_ID)
chi.use_site(CLOUD_SITE)
uid_this = str(uuid.uuid4())

print(f"UID for this experiment is {uid_this}")

Now using CHI@UC:
URL: https://chi.uc.chameleoncloud.org
Location: Argonne National Laboratory, Lemont, Illinois, USA
Support contact: help@chameleoncloud.org
UID for this experiment is 94813b23-7906-4956-ac87-7d2a7fc8a52a


## Make reservation 

We create a lease that reserves 1) a floating IP (i.e., the public IP we will connect to) and 2) a bare metal server.

In [2]:
reservations = []
try:
    print("Creating lease...")
    lease.add_fip_reservation(reservations, count=1)
    lease.add_node_reservation(
        reservations, count=1,
        resource_properties=["==","$node_type","compute_cascadelake_r"])

    start_date, end_date = lease.lease_duration(days=1)
    # start_date, end_date = lease.lease_duration(hours=4)

    l = lease.create_lease(
        f"HADOOP-18146-{uid_this}",
        reservations, 
        start_date=start_date,
        end_date=end_date)
    cloud_lease_id = l["id"]

    print("Waiting for lease to start ... This can take upto 1 min ...")
    lease.wait_for_active(cloud_lease_id)
    print("Lease is now active!")
except keystoneauth1.exceptions.http.Unauthorized as e:
    print("Unauthorized.\nDid you set your project name and run the code in the first cell?")
except blazarclient.exception.BlazarClientException as e:
    print("There is an issue making the reservation. Check the calendar to make sure a node is available.")
    print("https://chi.uc.chameleoncloud.org/project/leases/calendar/host/")
    print(e)
except Exception as e:
    print("An unexpected error happened.")
    print(e)


Creating lease...
Waiting for lease to start ... This can take upto 1 min ...
Lease is now active!


In [3]:
cloud_lease_id

'eb716356-9c12-4b51-a1c6-a8ed7abdbc93'

## Launch server

We now launch an instance on the bare metal node reserved in the cell above. This can take 10-15 minutes.

In [5]:
s = server.create_server(
    f"HADOOP-18146-{uid_this}", 
    image_name="CC-Ubuntu20.04",
    reservation_id=lease.get_node_reservation(cloud_lease_id))

print("Waiting for server to start ...")
server.wait_for_active(s.id)
print("Done")

# image = "CC-Ubuntu18.04"
# reservation_req_time = int(time.time())
# s = server.create_server(
#     f"{os.getenv('USER')}-HADOOP-18146-{reservation_req_time}", 
#     image_name=image,
#     reservation_id=lease.get_node_reservation(lease_id)
# )

# print("Waiting for server to start ...")
# server.wait_for_active(s.id)
# print("Done")

Waiting for server to start ...
Done


We attach the reserved floating IP to the server, then wait for its operating system to boot and accept SSH connections. This can take 5-10 minutes.

In [6]:
floating_ip = lease.get_reserved_floating_ips(cloud_lease_id)[0]
server.associate_floating_ip(s.id, floating_ip_address=floating_ip)

print(f"Waiting for SSH connectivity on {floating_ip} ...")
timeout = 60*2
import socket
import time
# Repeatedly try to connect to the SSH port until it answers or the timeout expires.
start_time = time.perf_counter()
while True:
    try:
        with socket.create_connection((floating_ip, 22), timeout=timeout):
            print("Connection successful")
            break
    except OSError:
        # Give up once the overall deadline has passed; otherwise wait and retry.
        if time.perf_counter() - start_time >= timeout:
            print(f"After {timeout} seconds, could not connect via SSH. Please try again.")
            break
        time.sleep(10)

Waiting for SSH connectivity on 192.5.86.238 ...
Connection successful


In [7]:
from chi import ssh

with ssh.Remote(floating_ip) as conn:
    # Generate an RSA key pair and authorize it for password-less SSH to this host
    conn.run("ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa ")
    conn.run("cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys")
    conn.run("chmod 0600 ~/.ssh/authorized_keys")



Generating public/private rsa key pair.
Your identification has been saved in /home/cc/.ssh/id_rsa
Your public key has been saved in /home/cc/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:wGt3K5sChs0ERXGjxNEWeWaS1caN9f0x+aT7JDeWbPk cc@hadoop-18146-94813b23-7906-4956-ac87-7d2a7fc8a52a
The key's randomart image is:
+---[RSA 3072]----+
|   +*++=.o +.    |
|  ...+*.+ = .. ..|
|   ...o= .    .+o|
|    .  o       o=|
|   =  o S .   . o|
|  . =. . . .  ..o|
|   . .  . .   oBo|
|      .  +    o=o|
|       .o       E|
+----[SHA256]-----+


## Experiment

### Preparation

We upload the preparation script `prepare.sh` to the server and run it to set up the JVM and clone the Hadoop code base.

In [8]:
from chi import ssh

with ssh.Remote(floating_ip) as conn:
    conn.put("./script/prepare.sh")
    conn.run("bash prepare.sh")

Setting up environment...


debconf: unable to initialize frontend: Dialog
debconf: (Dialog frontend will not work on a dumb terminal, an emacs shell buffer, or without a controlling terminal.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin: 


Cloning hadoop code base...


Cloning into 'hadoop'...
Updating files: 100% (15369/15369), done.


Done!


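### Run experiment

We upload the scripts `run.sh` and `run_test.sh` to the server and run them. Both cells depend on `floating_ip` from the server-launch cells above. The recorded outputs below show a `NameError`, most likely because the cells were executed in a fresh kernel (note the reset cell numbers) in which that variable had not been defined.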

In [2]:
from chi import ssh
with ssh.Remote(floating_ip) as conn:
    conn.put("./script/run.sh")
    conn.run("bash run.sh")

NameError: name 'floating_ip' is not defined

In [3]:
from chi import ssh
with ssh.Remote(floating_ip) as conn:
    conn.put("./script/run_test.sh")
    conn.run("bash run_test.sh")

NameError: name 'floating_ip' is not defined