# EKS CSI FSx for Lustre Setup

Amazon FSx for Lustre is a high-performance file system optimized for compute-intensive workloads such as deep learning. When linked to an S3 bucket, it exposes the bucket's objects through a POSIX-compliant file system that multiple readers and writers can access simultaneously.

The Amazon FSx for Lustre Container Storage Interface (CSI) driver provides a CSI interface that allows Amazon EKS clusters to manage the lifecycle of Amazon FSx for Lustre file systems.

* https://docs.aws.amazon.com/eks/latest/userguide/fsx-csi.html
* https://github.com/kubernetes-sigs/aws-fsx-csi-driver

In [1]:
import boto3
import json
from botocore.exceptions import ClientError

iam = boto3.client('iam')
sts = boto3.client('sts')
cfn = boto3.client('cloudformation')
eks = boto3.client('eks')

region = boto3.Session().region_name
cluster_name = 'workshop'

# 1. Install the FSx CSI Driver for Kubernetes

## Create IAM Policy

Create an IAM policy and service account that allows the driver to make calls to AWS APIs on your behalf.
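
Before creating the policy, it can help to sanity-check the JSON document locally. The sketch below is not part of the original workflow: `check_policy_document` is a hypothetical helper that only verifies the basic shape of a policy (a `Version` plus statements carrying `Effect`, `Action` and `Resource`). It does not predict whether IAM will accept the document, and it deliberately ignores `NotAction` and `NotResource`.

```python
import json


def check_policy_document(document):
    """Return a list of problems found in an IAM policy document (a dict).

    Simplification: statements using NotAction/NotResource are reported
    as missing Action/Resource.
    """
    problems = []
    if document.get("Version") != "2012-10-17":
        problems.append("Version should be '2012-10-17'")

    statements = document.get("Statement", [])
    # IAM accepts a single statement object as well as a list of statements.
    if isinstance(statements, dict):
        statements = [statements]
    if not statements:
        problems.append("Statement is missing or empty")

    for index, statement in enumerate(statements):
        if statement.get("Effect") not in ("Allow", "Deny"):
            problems.append("Statement %d: Effect must be Allow or Deny" % index)
        for key in ("Action", "Resource"):
            if key not in statement:
                problems.append("Statement %d: missing %s" % (index, key))
    return problems


# A shortened sample based on the first statement of fsx/fsx-csi-driver.json.
sample = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "iam:CreateServiceLinkedRole",
        "iam:AttachRolePolicy",
        "iam:PutRolePolicy"
      ],
      "Resource": "arn:aws:iam::*:role/aws-service-role/s3.data-source.lustre.fsx.amazonaws.com/*"
    }
  ]
}
""")

print(check_policy_document(sample))
```

In the notebook, the same helper could be called on the `data` dictionary loaded from `fsx/fsx-csi-driver.json` before passing it to `iam.create_policy`.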

In [2]:
!pygmentize fsx/fsx-csi-driver.json

{
    "Version":"2012-10-17",
    "Statement":[
        {
            "Effect":"Allow",
            "Action":[
                "iam:CreateServiceLinkedRole",
                "iam:AttachRolePolicy",
                "iam:PutRolePolicy"
            ],
            "Resource":"arn:aws:iam::*:role/aws-service-role/s3.data-source.lustre.fsx.amazonaws.com/*"
        },
        {
            "Action":"iam:CreateServiceLinkedRole",
            "Effect":"Allow",
            "Resource":"*",
            "Condition":{
                "StringLike":{
                    "iam:AWSServiceName":[
                        "fsx.amazonaws.com"
                    ]
              

In [None]:
# !aws iam create-policy \
#     --policy-name Amazon_FSx_Lustre_CSI_Driver \
#     --policy-document file://fsx/fsx-csi-driver.json

In [3]:
with open('fsx/fsx-csi-driver.json') as json_file:
    data = json.load(json_file)
    policy = json.dumps(data)

try:
    response = iam.create_policy(
        PolicyName='Amazon_FSx_Lustre_CSI_Driver',
        PolicyDocument=policy
    )
    print("[OK] Policy created.")

except ClientError as e:
    if e.response['Error']['Code'] == 'EntityAlreadyExists':
        print("[OK] Policy already exists.")
    else:
        print("Error: %s" % e)

[OK] Policy created.


In [4]:
account_id = sts.get_caller_identity()['Account']
csi_policy_arn = 'arn:aws:iam::{}:policy/Amazon_FSx_Lustre_CSI_Driver'.format(account_id)
print(csi_policy_arn)

arn:aws:iam::665714183901:policy/Amazon_FSx_Lustre_CSI_Driver
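
The ARN above follows the pattern `arn:<partition>:iam::<account-id>:policy/<policy-name>`. As a small illustration, the sketch below wraps that pattern in a hypothetical helper, `policy_arn`, with the partition as a parameter (`aws` is the standard partition; others such as `aws-cn` exist). The account ID used here is a placeholder, not a real account.

```python
def policy_arn(account_id, policy_name, partition="aws"):
    """Build the ARN of a customer-managed IAM policy.

    IAM is a global service, so the region field of the ARN stays empty.
    """
    return "arn:{}:iam::{}:policy/{}".format(partition, account_id, policy_name)


# Placeholder account ID for illustration only.
print(policy_arn("123456789012", "Amazon_FSx_Lustre_CSI_Driver"))
```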


## Create Kubernetes IAM Service Account

Create a Kubernetes service account for the driver and attach the policy to it, using the policy ARN printed in the previous step.

## _The next cell runs for about 10 minutes. Please be patient._

In [5]:
!eksctl create iamserviceaccount \
     --region $region \
     --name fsx-csi-controller-sa \
     --namespace kube-system \
     --cluster $cluster_name \
     --attach-policy-arn $csi_policy_arn \
     --approve

[ℹ]  eksctl version 0.32.0
[ℹ]  using region us-west-2
[!]  retryable error (RequestError: send request failed
caused by: Put "http://169.254.169.254/latest/api/token": context deadline exceeded (Client.Timeout exceeded while awaiting headers)) from ec2metadata/GetToken - will retry after delay of 35.741093ms
[!]  retryable error (RequestError: send request failed
caused by: Put "http://169.254.169.254/latest/api/token": context deadline exceeded (Client.Timeout exceeded while awaiting headers)) from ec2metadata/GetToken - will retry after delay of 118.362288ms
[!]  retryable error (RequestError: send request failed
caused by: Put "http://169.254.169.254/latest/api/token": context deadline exceeded (Client.Timeout exceeded while awaiting headers)) from ec2metadata/GetToken - will retry after delay of 173.270868ms
[!]  retryable error (RequestError: send request failed
caused by: Put "http://169.254.169.254/latest/api/token": context dea

In [6]:
cf_stack_name = 'eksctl-{}-addon-iamserviceaccount-kube-system-fsx-csi-controller-sa'.format(cluster_name)
print(cf_stack_name)

eksctl-workshop-addon-iamserviceaccount-kube-system-fsx-csi-controller-sa


In [7]:
response = cfn.list_stack_resources(
    StackName=cf_stack_name
)
print(response)

{'StackResourceSummaries': [{'LogicalResourceId': 'Role1', 'PhysicalResourceId': 'eksctl-workshop-addon-iamserviceaccount-kube-Role1-7HVCFDTTQ2ZD', 'ResourceType': 'AWS::IAM::Role', 'LastUpdatedTimestamp': datetime.datetime(2020, 11, 22, 17, 7, 49, 200000, tzinfo=tzlocal()), 'ResourceStatus': 'CREATE_COMPLETE', 'DriftInformation': {'StackResourceDriftStatus': 'NOT_CHECKED'}}], 'ResponseMetadata': {'RequestId': '67b860cb-321c-49f6-8517-aa96a3c4d098', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '67b860cb-321c-49f6-8517-aa96a3c4d098', 'content-type': 'text/xml', 'content-length': '858', 'date': 'Sun, 22 Nov 2020 17:18:17 GMT'}, 'RetryAttempts': 0}}


In [8]:
iam_role_name = response['StackResourceSummaries'][0]['PhysicalResourceId']
print(iam_role_name)

eksctl-workshop-addon-iamserviceaccount-kube-Role1-7HVCFDTTQ2ZD
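
Indexing `StackResourceSummaries[0]` works here because the stack contains a single resource. A slightly more defensive variant is to pick the resource by its `ResourceType`. The sketch below shows that idea on a trimmed copy of the response printed above; `find_role_name` is a hypothetical helper name, not part of boto3.

```python
def find_role_name(stack_resources):
    """Return the physical ID of the first IAM role in a
    list_stack_resources response."""
    for summary in stack_resources.get("StackResourceSummaries", []):
        if summary.get("ResourceType") == "AWS::IAM::Role":
            return summary["PhysicalResourceId"]
    raise LookupError("no AWS::IAM::Role resource found in the stack")


# Trimmed copy of the response shown in the notebook output.
sample_response = {
    "StackResourceSummaries": [
        {
            "LogicalResourceId": "Role1",
            "PhysicalResourceId": "eksctl-workshop-addon-iamserviceaccount-kube-Role1-7HVCFDTTQ2ZD",
            "ResourceType": "AWS::IAM::Role",
            "ResourceStatus": "CREATE_COMPLETE",
        }
    ]
}

print(find_role_name(sample_response))
```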


In [9]:
iam_role_arn = iam.get_role(RoleName=iam_role_name)['Role']['Arn']
print(iam_role_arn)

arn:aws:iam::665714183901:role/eksctl-workshop-addon-iamserviceaccount-kube-Role1-7HVCFDTTQ2ZD


# Deploy CSI Driver

In [10]:
!kubectl apply -k "github.com/kubernetes-sigs/aws-fsx-csi-driver/deploy/kubernetes/overlays/stable/?ref=master"


serviceaccount/fsx-csi-controller-sa configured
clusterrole.rbac.authorization.k8s.io/fsx-csi-external-provisioner-role created
clusterrolebinding.rbac.authorization.k8s.io/fsx-csi-external-provisioner-binding created
deployment.apps/fsx-csi-controller created
daemonset.apps/fsx-csi-node created
csidriver.storage.k8s.io/fsx.csi.aws.com created


Annotate the driver's service account with the ARN of the IAM role created above, so that the CSI controller can assume that role.

In [11]:
!kubectl annotate serviceaccount -n kube-system fsx-csi-controller-sa \
 eks.amazonaws.com/role-arn=$iam_role_arn --overwrite=true

serviceaccount/fsx-csi-controller-sa annotated
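
The annotation simply adds one key under `metadata.annotations` of the service account. To make that concrete, the sketch below builds the equivalent patch document and an equivalent `kubectl patch` command line. The helper names and the role ARN are hypothetical placeholders; the command string is only printed, not executed.

```python
import json
import shlex


def role_annotation_patch(role_arn):
    """Patch document that sets the IAM role annotation on a service account."""
    return {"metadata": {"annotations": {"eks.amazonaws.com/role-arn": role_arn}}}


def kubectl_patch_command(name, namespace, role_arn):
    """Build (but do not run) an equivalent `kubectl patch` command line."""
    patch = json.dumps(role_annotation_patch(role_arn))
    return "kubectl patch serviceaccount {} -n {} -p {}".format(
        shlex.quote(name), shlex.quote(namespace), shlex.quote(patch)
    )


# Placeholder ARN for illustration; in the notebook this would be iam_role_arn.
example_arn = "arn:aws:iam::123456789012:role/example-role"
print(kubectl_patch_command("fsx-csi-controller-sa", "kube-system", example_arn))
```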


# Create and Check the S3 Bucket for FSx

In [12]:
bucket = 's3://fsx-container-demo'

In [13]:
!aws s3 mb $bucket

make_bucket: fsx-container-demo


In [14]:
!aws s3 ls $bucket

In [15]:
!aws s3 ls $bucket --recursive

# Download Storage Class Manifest

In [None]:
!curl -o storageclass.yaml https://raw.githubusercontent.com/kubernetes-sigs/aws-fsx-csi-driver/master/examples/kubernetes/dynamic_provisioning_s3/specs/storageclass.yaml
    

## Get VPC ID and Subnet ID

In [19]:
%%bash

source ~/.bash_profile

#### Get VPC ID
export VPC_ID=$(aws ec2 describe-vpcs --filters "Name=tag:Name,Values=eksctl-${AWS_CLUSTER_NAME}-cluster/VPC" --query "Vpcs[0].VpcId" --output text)
echo "export VPC_ID=${VPC_ID}" | tee -a ~/.bash_profile

#### Get Subnet ID
export SUBNET_ID=$(aws ec2 describe-subnets --filters "Name=vpc-id,Values=${VPC_ID}" --query "Subnets[0].SubnetId" --output text)
echo "export SUBNET_ID=${SUBNET_ID}" | tee -a ~/.bash_profile

export VPC_ID=vpc-08a961479e87b4848
export SUBNET_ID=subnet-069a722e18c7d27db


## Create Security Group

In [20]:
%%bash

source ~/.bash_profile

export SEC_GROUP_ID=$(aws ec2 create-security-group --group-name eks-fsx-security-group --vpc-id ${VPC_ID} --description "FSx for Lustre Security Group" --query "GroupId" --output text)
echo "export SEC_GROUP_ID=${SEC_GROUP_ID}" | tee -a ~/.bash_profile

export SEC_GROUP_ID=sg-00c62d7ceb2e27036


## Add an ingress rule that opens up port 988 from the 192.168.0.0/16 CIDR range

In [21]:
%%bash

source ~/.bash_profile

aws ec2 authorize-security-group-ingress --group-id ${SEC_GROUP_ID} --protocol tcp --port 988 --cidr 192.168.0.0/16
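
To illustrate what this rule admits, the sketch below checks whether a given source address and port would match it. It is a simplified model of a single ingress rule (TCP port 988 from `192.168.0.0/16`), not a call to the EC2 API; the sample IP addresses are made up.

```python
import ipaddress

LUSTRE_PORT = 988
ALLOWED_CIDR = ipaddress.ip_network("192.168.0.0/16")


def is_allowed(source_ip, port):
    """Return True if traffic from source_ip to port matches the ingress rule."""
    return port == LUSTRE_PORT and ipaddress.ip_address(source_ip) in ALLOWED_CIDR


print(is_allowed("192.168.12.34", 988))  # inside the CIDR, Lustre port
print(is_allowed("10.0.0.5", 988))       # outside the CIDR
print(is_allowed("192.168.12.34", 22))   # inside the CIDR, wrong port
```

If the cluster VPC uses a different CIDR range, the `--cidr` value in the command above would need to match it.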

## Update the environment variables in the `storageclass.yaml` file

In [None]:
!pygmentize fsx/storageclass.yaml
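
The contents of `fsx/storageclass.yaml` are not shown here, so the following is only a sketch of how the values collected earlier could be filled in. The parameter names (`subnetId`, `securityGroupIds`, `s3ImportPath`, `s3ExportPath`, `deploymentType`) are assumed from the upstream `dynamic_provisioning_s3` example and may differ between driver versions; `render_storage_class` is a hypothetical helper.

```python
from string import Template

# Assumed layout, modelled on the upstream dynamic_provisioning_s3 example.
STORAGE_CLASS_TEMPLATE = Template("""\
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fsx-sc
provisioner: fsx.csi.aws.com
parameters:
  subnetId: ${subnet_id}
  securityGroupIds: ${security_group_id}
  s3ImportPath: ${s3_import_path}
  s3ExportPath: ${s3_import_path}/export
  deploymentType: SCRATCH_2
""")


def render_storage_class(subnet_id, security_group_id, s3_import_path):
    """Fill the subnet, security group and bucket into the manifest text."""
    return STORAGE_CLASS_TEMPLATE.substitute(
        subnet_id=subnet_id,
        security_group_id=security_group_id,
        s3_import_path=s3_import_path,
    )


# Values taken from the outputs earlier in this notebook.
manifest = render_storage_class(
    "subnet-069a722e18c7d27db",
    "sg-00c62d7ceb2e27036",
    "s3://fsx-container-demo",
)
print(manifest)
```

The rendered text could then be written to `fsx/storageclass.yaml` before running `kubectl create`.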

# Create FSX Storage Class

In [None]:
#!kubectl delete -f fsx/storageclass.yaml

In [22]:
!kubectl create -f fsx/storageclass.yaml

storageclass.storage.k8s.io/fsx-sc created


In [23]:
!kubectl get sc

NAME            PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
fsx-sc          fsx.csi.aws.com         Delete          Immediate              false                  2s
gp2 (default)   kubernetes.io/aws-ebs   Delete          WaitForFirstConsumer   false                  41m


# Create Claim

In [None]:
#!curl -o claim.yaml https://raw.githubusercontent.com/kubernetes-sigs/aws-fsx-csi-driver/master/examples/kubernetes/dynamic_provisioning_s3/specs/claim.yaml

In [None]:
!pygmentize fsx/claim.yaml

In [None]:
#!kubectl delete -f fsx/claim.yaml

In [24]:
!kubectl apply -f fsx/claim.yaml

persistentvolumeclaim/fsx-claim created


In [29]:
!kubectl get pvc fsx-claim

NAME        STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
fsx-claim   Pending                                      fsx-sc         8m12s


In [30]:
!kubectl describe pvc fsx-claim

Name:          fsx-claim
Namespace:     default
StorageClass:  fsx-sc
Status:        Bound
Volume:        pvc-630b5a8d-c36d-4040-b0c6-2c5425c1352f
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: fsx.csi.aws.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      1200Gi
Access Modes:  RWX
VolumeMode:    Filesystem
Mounted By:    <none>
Events:
  Type     Reason                Age                    From                                                                                      Message
  ----     ------                ----                   ----                                                                                      -------
  Normal   Provisioning          3m29s (x3 over 13m)    fsx.csi.aws.com_fsx-csi-controller-55bcb55d5d-bpfww_c2f64f0d-2ce6-11eb-b5df-66867e10f3a0  External provisioner is provisioning volume for

## _Wait for status == Bound_
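
Rather than re-running the cell by hand, the wait can be scripted. The sketch below is one possible polling loop: `wait_for_phase` takes any function that returns the current phase, so it can be exercised here with a fake sequence. `pvc_phase` shows how the phase might be read with `kubectl`; it is defined but not called, and the timeout and interval values are arbitrary assumptions.

```python
import subprocess
import time


def pvc_phase(name="fsx-claim"):
    """Read the PVC phase via kubectl (requires cluster access; not called below)."""
    result = subprocess.run(
        ["kubectl", "get", "pvc", name, "-o", "jsonpath={.status.phase}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def wait_for_phase(get_phase, wanted="Bound", timeout_s=900, interval_s=15,
                   sleep=time.sleep):
    """Poll get_phase() until it returns `wanted` or the timeout elapses."""
    waited = 0
    while True:
        if get_phase() == wanted:
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s


# Demonstration with a fake phase sequence and no real sleeping.
phases = iter(["Pending", "Pending", "Bound"])
became_bound = wait_for_phase(lambda: next(phases), sleep=lambda seconds: None)
print(became_bound)
```

In the notebook, `wait_for_phase(pvc_phase)` would block until the claim is bound or the timeout is reached.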

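## Update FSx to `autoImportPolicy: NEW_CHANGED`

With `NEW_CHANGED`, FSx automatically imports listings for objects that are added to or changed in the linked S3 bucket after the file system is created.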

In [None]:
fsx = boto3.client('fsx')

In [None]:
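# Note: this takes the first file system returned, which assumes the
# file system provisioned by the claim is the only one in this account and region.
response = fsx.describe_file_systems()
fsx_id = response['FileSystems'][0]['FileSystemId']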
print(fsx_id)
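
If the account holds more than one file system, taking the first entry may select the wrong one. The sketch below filters a `describe_file_systems` response by file system type and, optionally, by the S3 import path. `find_lustre_file_system` is a hypothetical helper, the sample IDs are made up, and pagination (`NextToken`) is not handled.

```python
def find_lustre_file_system(response, s3_import_path=None):
    """Return the ID of the first Lustre file system, optionally matching
    the given S3 import path."""
    for file_system in response.get("FileSystems", []):
        if file_system.get("FileSystemType") != "LUSTRE":
            continue
        import_path = (
            file_system.get("LustreConfiguration", {})
            .get("DataRepositoryConfiguration", {})
            .get("ImportPath")
        )
        if s3_import_path is None or import_path == s3_import_path:
            return file_system["FileSystemId"]
    raise LookupError("no matching FSx for Lustre file system found")


# Made-up response with the same shape as describe_file_systems output.
sample_response = {
    "FileSystems": [
        {"FileSystemId": "fs-00000000000000000", "FileSystemType": "WINDOWS"},
        {
            "FileSystemId": "fs-0123456789abcdef0",
            "FileSystemType": "LUSTRE",
            "LustreConfiguration": {
                "DataRepositoryConfiguration": {"ImportPath": "s3://fsx-container-demo"}
            },
        },
    ]
}

print(find_lustre_file_system(sample_response, "s3://fsx-container-demo"))
```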

In [None]:
response = fsx.update_file_system(
    FileSystemId=fsx_id,
    LustreConfiguration={
        'AutoImportPolicy': 'NEW_CHANGED'
    }
)
print(response)