AWS EBS volume support #5138

Merged 40 commits on Apr 10, 2015

Commits
edf0292
Add initial support for Volumes to AWS
justinsb Mar 6, 2015
8fde691
Fix tests
justinsb Mar 26, 2015
ffadd55
Fix AWS region vs zone
justinsb Mar 26, 2015
6a4153f
Always create volumes in the active k8s zone
justinsb Mar 27, 2015
2812936
Simplify logic of pd.go
justinsb Mar 27, 2015
8908990
Fix merge problems
justinsb Apr 2, 2015
743f2ed
Fix tests
justinsb Apr 2, 2015
e56e2ee
Strip leading slash from url path when parsing aws volume
justinsb Apr 2, 2015
49543ac
Update IAM permissions for minion, to allow EBS
justinsb Apr 2, 2015
86ddc0f
Fix mountpoint values
justinsb Apr 2, 2015
ee72fa4
Tolerate volume being already-attached
justinsb Apr 2, 2015
f2184e0
Fix comment
justinsb Apr 2, 2015
6c823db
Small clean-ups
justinsb Apr 2, 2015
b9fd560
Add safe_format_and_mount script to aws
justinsb Apr 2, 2015
3549b30
Add missing import
justinsb Apr 2, 2015
a366f9e
Create the /usr/share/google dir in salt
justinsb Apr 2, 2015
f0cedd7
More logging around error causes
justinsb Apr 3, 2015
cdc569a
Parse the pdName from the volume mount
justinsb Apr 3, 2015
21beabd
Default attachment status to detached
justinsb Apr 3, 2015
3689bf0
Fix pd name parse
justinsb Apr 3, 2015
0101bf2
Fix detached-check logic, warn on multiple attachments
justinsb Apr 3, 2015
aa60510
Add comment about EBS status being a bit slow through API
justinsb Apr 3, 2015
95b68ae
Rename pdName -> volumeId for AWS persistent volumes
justinsb Apr 7, 2015
2e91fdd
Fixup merge mistakes
justinsb Apr 7, 2015
b3666ed
Add AWSPersistentDisk to fuzzer
justinsb Apr 7, 2015
9711e77
Rename AWSPersistentDisk -> AWSElasticBlockStore, aws-pd -> aws-ebs
justinsb Apr 7, 2015
a20484b
Apply latest changes from copy-and-pasted gce_pd
justinsb Apr 7, 2015
c7c9695
Add missing conversion for v1beta2
justinsb Apr 8, 2015
4e17677
Make fetching the aws instance id optional, so we can use it on e2e
justinsb Apr 8, 2015
5a887e8
Remove now-unused instanceId parameter from newAwsCloud
justinsb Apr 8, 2015
034412a
Support multiple k8s clusters
justinsb Mar 25, 2015
9561366
Provide more output during a disk delete
justinsb Apr 8, 2015
7e758fe
Grammar fix: s/a AWS/an AWS/g
justinsb Apr 9, 2015
46f9c2c
Style: Ebs -> EBS
justinsb Apr 9, 2015
2afc184
Style: awsId -> awsID
justinsb Apr 9, 2015
98c9ebb
Style: Aws -> AWS
justinsb Apr 9, 2015
933cf60
Style: volumeId -> volumeID
justinsb Apr 9, 2015
503e19e
Rename aws_pd -> aws_ebs
justinsb Apr 9, 2015
9462471
Fix some mistaken volumeId -> volumeID changes
justinsb Apr 9, 2015
7626914
Rename aws_pd.go -> aws_ebs.go, aws_pd_test.go -> aws_ebs_test.go
justinsb Apr 10, 2015

1 change: 1 addition & 0 deletions cluster/aws/config-default.sh
@@ -27,6 +27,7 @@ NUM_MINIONS=${NUM_MINIONS:-4}
AWS_S3_REGION=${AWS_S3_REGION:-us-east-1}

INSTANCE_PREFIX="${KUBE_AWS_INSTANCE_PREFIX:-kubernetes}"
CLUSTER_ID=${INSTANCE_PREFIX}
AWS_SSH_KEY=${AWS_SSH_KEY:-$HOME/.ssh/kube_aws_rsa}
IAM_PROFILE_MASTER="kubernetes-master"
IAM_PROFILE_MINION="kubernetes-minion"
1 change: 1 addition & 0 deletions cluster/aws/config-test.sh
@@ -23,6 +23,7 @@ NUM_MINIONS=${NUM_MINIONS:-2}
AWS_S3_REGION=${AWS_S3_REGION:-us-east-1}

INSTANCE_PREFIX="${KUBE_AWS_INSTANCE_PREFIX:-e2e-test-${USER}}"
CLUSTER_ID=${INSTANCE_PREFIX}
AWS_SSH_KEY=${AWS_SSH_KEY:-$HOME/.ssh/kube_aws_rsa}
IAM_PROFILE_MASTER="kubernetes-master"
IAM_PROFILE_MINION="kubernetes-minion"
15 changes: 15 additions & 0 deletions cluster/aws/templates/iam/kubernetes-minion-policy.json
@@ -7,6 +7,21 @@
"Resource": [
"arn:aws:s3:::kubernetes-*"
]
},
{
"Effect": "Allow",
"Action": "ec2:Describe*",
"Resource": "*"
},
{
"Effect": "Allow",
"Action": "ec2:AttachVolume",
"Resource": "*"
},
{
"Effect": "Allow",
"Action": "ec2:DetachVolume",
"Resource": "*"
}
]
}
58 changes: 33 additions & 25 deletions cluster/aws/util.sh
@@ -36,12 +36,12 @@ function json_val {

# TODO (ayurchuk) Refactor the get_* functions to use filters
# TODO (bburns) Parameterize this for multiple cluster per project
function get_instance_ids {
python -c "import json,sys; lst = [str(instance['InstanceId']) for reservation in json.load(sys.stdin)['Reservations'] for instance in reservation['Instances'] for tag in instance.get('Tags', []) if tag['Value'].startswith('${MASTER_TAG}') or tag['Value'].startswith('${MINION_TAG}')]; print ' '.join(lst)"
}

function get_vpc_id {
python -c 'import json,sys; lst = [str(vpc["VpcId"]) for vpc in json.load(sys.stdin)["Vpcs"] for tag in vpc.get("Tags", []) if tag["Value"] == "kubernetes-vpc"]; print "".join(lst)'
$AWS_CMD --output text describe-vpcs \
--filters Name=tag:Name,Values=kubernetes-vpc \
Name=tag:KubernetesCluster,Values=${CLUSTER_ID} \
--query Vpcs[].VpcId
}

function get_subnet_id {
@@ -69,7 +69,9 @@ function expect_instance_states {
function get_instance_public_ip {
local tagName=$1
$AWS_CMD --output text describe-instances \
--filters Name=tag:Name,Values=${tagName} Name=instance-state-name,Values=running \
--filters Name=tag:Name,Values=${tagName} \
Name=instance-state-name,Values=running \
Name=tag:KubernetesCluster,Values=${CLUSTER_ID} \
--query Reservations[].Instances[].NetworkInterfaces[0].Association.PublicIp
}

@@ -371,14 +373,15 @@ function kube-up {

$AWS_CMD import-key-pair --key-name kubernetes --public-key-material "file://$AWS_SSH_KEY.pub" > $LOG 2>&1 || true

VPC_ID=$($AWS_CMD describe-vpcs | get_vpc_id)
VPC_ID=$(get_vpc_id)

if [[ -z "$VPC_ID" ]]; then
echo "Creating vpc."
VPC_ID=$($AWS_CMD create-vpc --cidr-block 172.20.0.0/16 | json_val '["Vpc"]["VpcId"]')
$AWS_CMD modify-vpc-attribute --vpc-id $VPC_ID --enable-dns-support '{"Value": true}' > $LOG
$AWS_CMD modify-vpc-attribute --vpc-id $VPC_ID --enable-dns-hostnames '{"Value": true}' > $LOG
add-tag $VPC_ID Name kubernetes-vpc
add-tag $VPC_ID KubernetesCluster ${CLUSTER_ID}
fi

echo "Using VPC $VPC_ID"
@@ -467,6 +470,7 @@ function kube-up {
--user-data file://${KUBE_TEMP}/master-start.sh | json_val '["Instances"][0]["InstanceId"]')
add-tag $master_id Name $MASTER_NAME
add-tag $master_id Role $MASTER_TAG
add-tag $master_id KubernetesCluster ${CLUSTER_ID}

echo "Waiting for master to be ready"

@@ -548,6 +552,7 @@ function kube-up {

add-tag $minion_id Name ${MINION_NAMES[$i]}
add-tag $minion_id Role $MINION_TAG
add-tag $minion_id KubernetesCluster ${CLUSTER_ID}

MINION_IDS[$i]=$minion_id
done
@@ -700,25 +705,7 @@ EOF
}

function kube-down {
instance_ids=$($AWS_CMD describe-instances | get_instance_ids)
if [[ -n ${instance_ids} ]]; then
$AWS_CMD terminate-instances --instance-ids $instance_ids > $LOG
echo "Waiting for instances deleted"
while true; do
instance_states=$($AWS_CMD describe-instances --instance-ids $instance_ids | expect_instance_states terminated)
if [[ "$instance_states" == "" ]]; then
echo "All instances terminated"
break
else
echo "Instances not yet terminated: $instance_states"
echo "Sleeping for 3 seconds..."
sleep 3
fi
done
fi

echo "Deleting VPC"
vpc_id=$($AWS_CMD describe-vpcs | get_vpc_id)
vpc_id=$(get_vpc_id)
if [[ -n "${vpc_id}" ]]; then
local elb_ids=$(get_elbs_in_vpc ${vpc_id})
if [[ -n ${elb_ids} ]]; then
@@ -741,6 +728,27 @@ function kube-down {
done
fi

echo "Deleting instances in VPC: ${vpc_id}"
instance_ids=$($AWS_CMD --output text describe-instances \
--filters Name=vpc-id,Values=${vpc_id} \
Name=tag:KubernetesCluster,Values=${CLUSTER_ID} \
--query Reservations[].Instances[].InstanceId)
if [[ -n ${instance_ids} ]]; then
$AWS_CMD terminate-instances --instance-ids $instance_ids > $LOG
echo "Waiting for instances to be deleted"
while true; do
instance_states=$($AWS_CMD describe-instances --instance-ids $instance_ids | expect_instance_states terminated)
if [[ "$instance_states" == "" ]]; then
echo "All instances deleted"
break
else
echo "Instances not yet deleted: $instance_states"
echo "Sleeping for 3 seconds..."
sleep 3
fi
done
fi

echo "Deleting VPC: ${vpc_id}"
default_sg_id=$($AWS_CMD --output text describe-security-groups \
--filters Name=vpc-id,Values=$vpc_id Name=group-name,Values=default \
14 changes: 14 additions & 0 deletions cluster/saltbase/salt/helpers/init.sls
@@ -0,0 +1,14 @@
{% if grains['cloud'] is defined and grains['cloud'] == 'aws' %}
/usr/share/google:
file.directory:
- user: root
- group: root
- dir_mode: 755

/usr/share/google/safe_format_and_mount:
file.managed:
- source: salt://helpers/safe_format_and_mount
- user: root
- group: root
- mode: 755
{% endif %}
145 changes: 145 additions & 0 deletions cluster/saltbase/salt/helpers/safe_format_and_mount
@@ -0,0 +1,145 @@
#! /bin/bash
# Copyright 2013 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Mount a disk, formatting it if necessary. If the disk looks like it may
# have been formatted before, we will not format it.
#
# This script uses blkid and file to search for magic "formatted" bytes
# at the beginning of the disk. Furthermore, it attempts to use fsck to
# repair the filesystem before formatting it.

FSCK=fsck.ext4
MOUNT_OPTIONS="discard,defaults"
MKFS="mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 -F"
if grep -q '6\..' /etc/redhat-release; then
# lazy_journal_init is not recognized in redhat 6
MKFS="mkfs.ext4 -E lazy_itable_init=0 -F"
elif grep -q '7\..' /etc/redhat-release; then
FSCK=fsck.xfs
MKFS=mkfs.xfs
fi

LOGTAG=safe_format_and_mount
LOGFACILITY=user

function log() {
local readonly severity=$1; shift;
logger -t ${LOGTAG} -p ${LOGFACILITY}.${severity} -s "$@"
}

function log_command() {
local readonly log_file=$(mktemp)
local readonly retcode
log info "Running: $*"
$* > ${log_file} 2>&1
retcode=$?
# only return the last 1000 lines of the logfile, just in case it's HUGE.
tail -1000 ${log_file} | logger -t ${LOGTAG} -p ${LOGFACILITY}.info -s
rm -f ${log_file}
return ${retcode}
}

function help() {
cat >&2 <<EOF
$0 [-f fsck_cmd] [-m mkfs_cmd] [-o mount_opts] <device> <mountpoint>
EOF
exit 0
}

while getopts ":hf:o:m:" opt; do
case $opt in
h) help;;
f) FSCK=$OPTARG;;
o) MOUNT_OPTIONS=$OPTARG;;
m) MKFS=$OPTARG;;
-) break;;
\?) log error "Invalid option: -${OPTARG}"; exit 1;;
:) log error "Option -${OPTARG} requires an argument."; exit 1;;
esac
done

shift $(($OPTIND - 1))
readonly DISK=$1
readonly MOUNTPOINT=$2

[[ -z ${DISK} ]] && help
[[ -z ${MOUNTPOINT} ]] && help

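# NOTE: despite its name, this returns 0 when the disk appears to be
# formatted (blkid or file recognizes a filesystem) and 1 otherwise.
function disk_looks_unformatted() {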
blkid ${DISK}
if [[ $? == 0 ]]; then
return 0
fi

local readonly file_type=$(file --special-files ${DISK})
case ${file_type} in
*filesystem*)
return 0;;
esac

return 1
}

function format_disk() {
log_command ${MKFS} ${DISK}
}

function try_repair_disk() {
log_command ${FSCK} -a ${DISK}
local readonly fsck_return=$?
if [[ ${fsck_return} -ge 8 ]]; then
log error "Fsck could not correct errors on ${DISK}"
return 1
fi
if [[ ${fsck_return} -gt 0 ]]; then
log warning "Fsck corrected errors on ${DISK}"
fi
return 0
}

function try_mount() {
local mount_retcode
try_repair_disk

log_command mount -o ${MOUNT_OPTIONS} ${DISK} ${MOUNTPOINT}
mount_retcode=$?
if [[ ${mount_retcode} == 0 ]]; then
return 0
fi

# Check to see if it looks like a filesystem before formatting it.
disk_looks_unformatted ${DISK}
if [[ $? == 0 ]]; then
log error "Disk ${DISK} looks formatted but won't mount. Giving up."
return ${mount_retcode}
fi

# The disk looks like it's not been formatted before.
format_disk
if [[ $? != 0 ]]; then
log error "Format of ${DISK} failed."
fi

log_command mount -o ${MOUNT_OPTIONS} ${DISK} ${MOUNTPOINT}
mount_retcode=$?
if [[ ${mount_retcode} == 0 ]]; then
return 0
fi
log error "Tried everything we could, but could not mount ${DISK}."
return ${mount_retcode}
}

try_mount
exit $?
1 change: 1 addition & 0 deletions cluster/saltbase/salt/top.sls
@@ -11,6 +11,7 @@ base:
{% else %}
- sdn
{% endif %}
- helpers
- cadvisor
- kubelet
- kube-proxy
32 changes: 27 additions & 5 deletions cmd/e2e/e2e.go
@@ -17,19 +17,22 @@ limitations under the License.
package main

import (
"fmt"
"os"
goruntime "runtime"
"strings"

"github.com/GoogleCloudPlatform/kubernetes/pkg/client/clientcmd"
"github.com/GoogleCloudPlatform/kubernetes/pkg/cloudprovider"
"github.com/GoogleCloudPlatform/kubernetes/pkg/util"
"github.com/GoogleCloudPlatform/kubernetes/test/e2e"
"github.com/golang/glog"
flag "github.com/spf13/pflag"
)

var (
context = &e2e.TestContextType{}
gceConfig = &context.GCEConfig
context = &e2e.TestContextType{}
cloudConfig = &context.CloudConfig

orderseed = flag.Int64("orderseed", 0, "If non-zero, seed of random test shuffle order. (Otherwise random.)")
reportDir = flag.String("report_dir", "", "Path to the directory where the JUnit XML reports should be saved. Default is empty, which doesn't generate these reports.")
@@ -47,9 +50,11 @@ func init() {
flag.StringVar(&context.Host, "host", "", "The host, or apiserver, to connect to")
flag.StringVar(&context.RepoRoot, "repo_root", "./", "Root directory of kubernetes repository, for finding test files. Default assumes working directory is repository root")
flag.StringVar(&context.Provider, "provider", "", "The name of the Kubernetes provider (gce, gke, local, vagrant, etc.)")
flag.StringVar(&gceConfig.MasterName, "kube_master", "", "Name of the kubernetes master. Only required if provider is gce or gke")
flag.StringVar(&gceConfig.ProjectID, "gce_project", "", "The GCE project being used, if applicable")
flag.StringVar(&gceConfig.Zone, "gce_zone", "", "GCE zone being used, if applicable")

// TODO: Flags per provider? Rename gce_project/gce_zone?
flag.StringVar(&cloudConfig.MasterName, "kube_master", "", "Name of the kubernetes master. Only required if provider is gce or gke")
flag.StringVar(&cloudConfig.ProjectID, "gce_project", "", "The GCE project being used, if applicable")
flag.StringVar(&cloudConfig.Zone, "gce_zone", "", "GCE zone being used, if applicable")
}

func main() {
@@ -63,5 +68,22 @@ func main() {
glog.Error("Invalid --times (negative or no testing requested)!")
os.Exit(1)
}

if context.Provider == "aws" {
awsConfig := "[Global]\n"
if cloudConfig.Zone == "" {
glog.Error("gce_zone must be specified for AWS")
Contributor:
Is "gce_zone" correct here?

Member Author:
This is tricky. I'm not sure whether we want flags for each provider (i.e. potentially duplicate kube_master, gce_project, gce_zone -> aws_kube_master, aws_gce_project, aws_gce_zone) or whether we should treat these as "cloud_project" & "cloud_zone". If the latter, a secondary question is whether we should rename the flags.

Edit: I don't much care either way. I'm guessing we'll end up duplicating (e.g. aws_zone).

Contributor:
You can only have one cloud provider at a time, right? Perhaps this is a good place for generic cloud patterns, so long as there is no plan to let a cluster span infrastructure providers... If that's the case, that's probably an easy small PR instead of adding more to this one.

Member:
As long as the concepts fundamentally match up, I'm in favor of generification.

Member:
I'm fine with not renaming in this PR, but please file a bug on this. We just need to figure out the ripple of renaming them.
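
The "generic" option discussed in this thread could look roughly like the sketch below. It is only an illustration and not part of this PR: the flag names `cloud_project` and `cloud_zone`, the helpers `registerCloudFlags` and `parseCloudFlags`, and the trimmed-down `CloudConfig` struct are all hypothetical, and it uses the standard library `flag` package instead of pflag so that it stays self-contained. The existing `gce_*` names are kept as aliases by binding them to the same variables.

```go
package main

import (
	"flag"
	"fmt"
)

// CloudConfig is a simplified stand-in for the struct the e2e flags populate.
type CloudConfig struct {
	MasterName string
	ProjectID  string
	Zone       string
}

// registerCloudFlags binds hypothetical provider-neutral flag names and keeps
// the existing gce_* names as aliases that write to the same fields.
func registerCloudFlags(fs *flag.FlagSet, cfg *CloudConfig) {
	fs.StringVar(&cfg.MasterName, "kube_master", "", "Name of the kubernetes master")
	fs.StringVar(&cfg.ProjectID, "cloud_project", "", "Cloud project being used, if applicable")
	fs.StringVar(&cfg.ProjectID, "gce_project", "", "Alias for cloud_project")
	fs.StringVar(&cfg.Zone, "cloud_zone", "", "Cloud zone being used, if applicable")
	fs.StringVar(&cfg.Zone, "gce_zone", "", "Alias for cloud_zone")
}

// parseCloudFlags parses args into a fresh CloudConfig.
func parseCloudFlags(args []string) (CloudConfig, error) {
	var cfg CloudConfig
	fs := flag.NewFlagSet("e2e", flag.ContinueOnError)
	registerCloudFlags(fs, &cfg)
	err := fs.Parse(args)
	return cfg, err
}

func main() {
	// The old flag name still works and fills the shared Zone field.
	cfg, err := parseCloudFlags([]string{"--gce_zone=us-east-1a"})
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("zone:", cfg.Zone)
}
```

Because both names write to one field, whichever flag appears last on the command line wins. That keeps existing scripts working while the rename ripples through, at the cost of two spellings for the same setting.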

os.Exit(1)
}
awsConfig += fmt.Sprintf("Zone=%s\n", cloudConfig.Zone)

var err error
cloudConfig.Provider, err = cloudprovider.GetCloudProvider(context.Provider, strings.NewReader(awsConfig))
if err != nil {
glog.Error("Error building AWS provider: ", err)
os.Exit(1)
}
}

e2e.RunE2ETests(context, *orderseed, *times, *reportDir, testList)
}
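
The AWS branch added to main() above builds a small INI-style config (a `[Global]` section with a `Zone` key) and passes it to the cloud provider as a reader. A minimal sketch of just the config-building step, pulled into a testable function, might look like this. The name `buildAWSConfig` is hypothetical, and the sketch returns an error where the diff logs and exits; it does not call the real cloudprovider package.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// buildAWSConfig returns the provider config text for the given zone,
// mirroring the string assembled in main(): a [Global] section followed
// by a single Zone key. An empty zone is rejected.
func buildAWSConfig(zone string) (string, error) {
	if zone == "" {
		return "", errors.New("zone must be specified for AWS")
	}
	var b strings.Builder
	b.WriteString("[Global]\n")
	fmt.Fprintf(&b, "Zone=%s\n", zone)
	return b.String(), nil
}

func main() {
	cfg, err := buildAWSConfig("us-east-1a")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The result can be wrapped with strings.NewReader before being
	// handed to whatever consumes the config.
	fmt.Print(cfg)
}
```

Returning an error instead of exiting inside the helper keeps the validation easy to unit test; the caller can still log and exit as the diff does.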