S3 - Simple Storage Service


Amazon Simple Storage Service (Amazon S3) is a scalable, secure, and highly durable object storage service, offering industry-leading scalability, data availability, security, and performance. It is designed to store and retrieve any amount of data from anywhere on the web, and it exposes a simple web services interface that developers can easily integrate into their applications. Storage containers are called buckets, and the files stored in them are called objects. S3 is reliable, secure, and durable because it keeps multiple copies of the same data, so your data will not be lost. (Retrieving files from within the same region does not incur data transfer charges.)


Further reading: AWS - S3, S3 article.

Do not share credentials - there will be a fine.

Here are some of the common use cases for AWS S3:

  • Data Backup and Recovery: S3 is often used as a backup solution for enterprises and individuals. It provides a reliable and cost-effective way to store backups of critical data. S3's durability and redundancy features ensure that your data is protected against hardware failures and can be easily restored when needed.

  • Static Website Hosting: S3 can be used to host static websites. You can store HTML, CSS, JavaScript, and other web assets in S3 buckets and configure them for static website hosting. S3 integrates with Amazon CloudFront, a content delivery network (CDN), to deliver the website content globally with low latency.

  • Data Archiving: S3's cost-effective storage options make it suitable for long-term data archiving. You can store infrequently accessed data, such as log files, historical records, or regulatory compliance data, in S3 Glacier or S3 Glacier Deep Archive. These storage classes provide low-cost archival storage with retrieval options optimized for long-term retention.

  • Content Distribution: S3 integrates seamlessly with Amazon CloudFront, allowing you to distribute your content globally with low latency and high data transfer speeds. This is especially useful for delivering large files, such as media files, software downloads, or game assets, to users around the world.

  • Data Lakes and Analytics: S3 is often used as a data lake, where organizations can store large amounts of structured and unstructured data for analytics purposes. Data can be directly ingested into S3 from various sources, and analytics services like Amazon Athena, Amazon Redshift, or Apache Spark can process and analyze the data stored in S3.

  • Application Data Storage: S3 can be used as a primary data store for applications. It provides a reliable and scalable storage solution for various types of data, including user files, documents, images, videos, and more. Many applications leverage S3's features for storing and retrieving user-generated content.

  • Disaster Recovery: S3 can be used as part of a disaster recovery strategy. By replicating data across multiple regions or using cross-region replication, organizations can ensure that their data is protected and available in case of a regional outage or disaster.

By leveraging the following mechanisms, AWS S3 provides a highly available and reliable storage solution for storing and accessing your data in the cloud.

  • Data Replication: AWS S3 automatically replicates data across multiple devices and multiple facilities within a region. This replication ensures that even if one device or facility fails, your data remains accessible from alternative storage locations. You can additionally configure cross-region replication (CRR) to copy objects to a bucket in a different region, providing extra redundancy and disaster recovery capability.
  • Redundancy: S3 stores multiple copies of your data within a single region by default. Each object is redundantly stored across different servers and multiple storage devices, which helps protect against hardware failures. The redundancy ensures that if one server or device fails, your data remains available from another server or device.
  • Regional Availability: AWS S3 is designed to be highly available within an AWS region. Regions are composed of multiple availability zones (AZs), which are physically separate data centers. S3 stores data across multiple AZs within a region, ensuring that your data remains accessible even if an entire AZ becomes unavailable.
  • Scalability: S3 is designed to scale automatically to accommodate increased demand and storage requirements. It can handle high levels of request traffic and automatically distribute the load across multiple servers. This ensures that your applications can continue to access data stored in S3 without interruption, even during periods of high demand.
  • Service Level Agreement (SLA): AWS provides a service level agreement for S3. S3 Standard is designed for 99.99% availability within a region, and the SLA commits AWS to service credits if monthly availability drops below the committed threshold, so AWS takes responsibility for maintaining availability.
  • Durability: S3 offers high durability, meaning your data is designed to be highly protected against loss. S3 achieves this by automatically storing multiple copies of your data across different devices and facilities, as mentioned earlier. The durability level is extremely high: S3 is designed for 99.999999999% (11 nines) durability of objects, which roughly means that if you store 10,000,000 objects you can expect to lose a single object on average once every 10,000 years.

The sections below walk through CRUD operations (C - Create, R - Read, U - Update, D - Delete) on S3, first using the AWS CLI (Command Line Interface) and then using Python with boto3.

Log in with Access Key

  1. Open a terminal in PyCharm.
  2. cd into the directory you want to work from (e.g. the folder containing the files you will upload).
  3. pip install awscli
  4. aws configure, then enter the details it prompts for:
     • Access Key ID (just press Enter if it is already set).
     • Secret Access Key (just press Enter if it is already set).
     • Default region name, e.g. 'eu-west-1' (just press Enter if it is already set).
     • Default output format, e.g. json (just press Enter if it is already set).

Check that the login succeeded with aws s3 ls, which should return a list of the available buckets.

Note: the CLI commands below use 'tech230-esther-bucket' as the bucket name and 'sampletext.txt' as the file being uploaded to, downloaded from, or deleted from a bucket.

To create a bucket use aws s3 mb s3://tech230-esther-bucket, returns 'make_bucket: tech230-esther-bucket'.

To upload a file to a bucket use aws s3 cp sampletext.txt s3://tech230-esther-bucket ('cp' = copy; specify the full path to the file if you are not in the folder containing 'sampletext.txt'), which returns 'upload: .\sampletext.txt to s3://tech230-esther-bucket/sampletext.txt'.

To download the contents of a bucket use aws s3 sync s3://tech230-luke-bucket s3-downloads, which returns 'download: s3://tech230-luke-bucket/sampletext.txt to s3-downloads\sampletext.txt'; you can then navigate to the specified directory ('s3-downloads') to find the downloaded files.

Note that you cannot delete a bucket if it is not empty.

Remove an individual file with aws s3 rm s3://tech230-esther-bucket/sampletext.txt, returns 'delete: s3://tech230-esther-bucket/sampletext.txt'.

Caution: Remove everything from a bucket: aws s3 rm s3://tech230-esther-bucket --recursive.

Delete bucket: aws s3 rb s3://tech230-esther-bucket, returns 'remove_bucket: tech230-esther-bucket'.

Python CRUD

Install the boto3 Python library with pip install boto3. Useful references: AWS - boto3 client vs resource (a short comparison is sketched after the first snippet below), AWS - boto3, Real Python - boto3.

To see all buckets:

import boto3
# connect to s3
s3 = boto3.resource("s3")
# lists buckets
for bucket in s3.buckets.all():
  print(bucket.name)
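For comparison, the 'boto3 client vs resource' link above covers boto3's two interfaces: the object-oriented resource interface used in most snippets here, and the lower-level client whose calls map directly onto the S3 API and return plain dictionaries. A minimal sketch of the same bucket listing using the client (assuming your credentials are already configured with aws configure):

import boto3
# connect to s3 using the low-level client
s3 = boto3.client("s3")
# list_buckets returns a dictionary rather than resource objects
for bucket in s3.list_buckets()["Buckets"]:
    print(bucket["Name"])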
To create a bucket:

import boto3
# connect to s3
s3 = boto3.client("s3")
# create a bucket on s3
bucket_name = s3.create_bucket(Bucket="tech230-esther-boto", CreateBucketConfiguration={"LocationConstraint": "eu-west-1"})
To upload a file to a bucket:

import boto3
# connect to s3
s3 = boto3.resource("s3")
# open the file we want to send and store it in a variable called data
data = open("sampletext.txt", "rb")
# specify the bucket we are sending the file to; .put_object names the file (Key) and sends its contents (Body)
s3.Bucket("tech230-esther-boto").put_object(Key="sampletext.txt", Body=data)
To download a file from a bucket:

import boto3
# connect to s3
s3 = boto3.client("s3")
# download 'sampletext.txt' from the bucket and save it locally as 'sampletext1.txt'
s3.download_file("tech230-esther-boto", "sampletext.txt", "sampletext1.txt")
To delete a bucket (it must be empty):

import boto3
# connect to s3
s3 = boto3.resource("s3")
# bucket variable
bucket = s3.Bucket("tech230-esther-boto")
# deletes the bucket (fails if it still contains objects)
response = bucket.delete()
To delete a file in a particular bucket:

import boto3
# connect to s3
s3 = boto3.resource("s3")
# delete 'sampletext.txt' from the bucket
s3.Object("tech230-esther-boto", "sampletext.txt").delete()
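Not shown above but useful for verifying each step: listing the objects in a bucket, roughly the boto3 equivalent of aws s3 ls s3://bucket-name. A minimal sketch, reusing the same example bucket name:

import boto3
# connect to s3
s3 = boto3.resource("s3")
# print the key (name) of every object in the bucket
for obj in s3.Bucket("tech230-esther-boto").objects.all():
    print(obj.key)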
Installations and set up

  1. Create an Ubuntu EC2 instance.

Use the following commands to set it up (for Ubuntu 20.04):

#!/bin/bash
# Update the sources list
sudo apt-get update -y

# upgrade any packages available
sudo apt-get upgrade -y

# gets required updates for following installations
sudo apt update

# install python
sudo apt install python3 -y

# install pip
sudo apt install python3-pip -y

# install awscli
#pip install --upgrade awscli
sudo pip3 install awscli

To check required installations:

# checks python is installed + version
python3 --version
# checks if aws cli is installed + version
aws --version
# check aws configurations
aws configure list
  2. Configure the AWS CLI with aws configure and enter the required information (see the Log in with Access Key section above if you are stuck).
  3. Install boto3 with pip install boto3.
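Optionally, you can confirm from Python that both boto3 and your credentials are working on the instance. A minimal sketch using STS (get_caller_identity just returns the account and ARN behind whatever credentials boto3 picks up; an error here usually means aws configure was not completed):

import boto3
# ask STS which account/identity the configured credentials belong to;
# an exception here means boto3 could not find valid credentials
identity = boto3.client("sts").get_caller_identity()
print(identity["Account"], identity["Arn"])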
To perform CRUD operations with the AWS CLI on the instance:

  1. Ensure the dependencies and installations from steps 1 and 2 above (Installations and set up) are complete.
  2. Create a bucket with: aws s3 mb s3://tech230-esther-bucket --region eu-west-1. (You may check this with aws s3 ls.)


  3. Upload a file to a bucket with: aws s3 cp ~/sampletext.txt s3://tech230-esther-bucket; verify the upload with aws s3 ls s3://tech230-esther-bucket.


  4. Download a file from a bucket with: aws s3 cp s3://tech230-esther-bucket/sampletext.txt S3_downloads; verify on your system with ls.


  5. Delete a file from a bucket with: aws s3 rm s3://tech230-esther-bucket/sampletext.txt, or delete multiple files with: aws s3 rm s3://tech230-esther-bucket --recursive. Verify with aws s3 ls s3://tech230-esther-bucket.


  6. Delete a bucket with: aws s3 rb s3://tech230-esther-bucket; verify with aws s3 ls.


To perform CRUD operations with Python (boto3) on the instance:

  1. Ensure the dependencies and installations from steps 1 to 3 above (Installations and set up) are complete. Alternatively, set up a Python virtual environment:
# set up and activate a python virtual environment
sudo apt install python3.8-venv
python3 -m venv <env-directory>
source ~/<env-directory>/bin/activate
# install pip
sudo apt-get install python3-pip
# install boto3
pip install boto3
# run the python interpreter
python3
  2. Create a Python script with touch file-name.py (e.g. create-bucket.py), then open it with nano file-name.py and edit it to contain the appropriate Python code for what you want to accomplish, as found above in the Python CRUD section (a minimal example is sketched after these steps). Save and exit the file. (Optionally, make the file executable with chmod u+x file-name.py.)
  3. Run the script with python3 file-name.py to execute the commands you want.
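For illustration, a minimal sketch of what a script such as create-bucket.py might contain, reusing the example bucket name and region from the sections above (adjust them to your own):

import boto3
# connect to s3 with the low-level client
s3 = boto3.client("s3")
# create a bucket in eu-west-1 (example name/region from above)
s3.create_bucket(Bucket="tech230-esther-boto", CreateBucketConfiguration={"LocationConstraint": "eu-west-1"})
print("bucket created")

Run it with python3 create-bucket.py.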