# AWS Assignment

* Understanding the difference between AWS Regions, Availability Zones, and Edge Locations is crucial for designing performant, reliable, and cost-effective cloud-based applications—especially for data analysis and latency-sensitive workloads.

1. AWS Regions
Definition: A Region is a geographic area that contains multiple, physically separated and isolated Availability Zones.

Examples: us-east-1 (N. Virginia), eu-west-1 (Ireland), ap-southeast-1 (Singapore).

Purpose: Allows users to deploy applications close to their users or to meet legal and compliance requirements regarding data residency.

2. Availability Zones (AZs)
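Definition: Each Region consists of multiple AZs (AWS documents a minimum of three per Region). An AZ is one or more discrete data centers with redundant power, networking, and connectivity, physically separate from the other AZs in the same Region.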

Purpose: Designed for high availability and fault tolerance. If one AZ goes down (e.g., power outage), others can take over.

Use Case: Deploying services across multiple AZs improves resilience (e.g., RDS Multi-AZ deployments, EC2 auto-scaling groups).
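
As a rough illustration of why spreading a service across AZs improves resilience, the sketch below computes the chance that at least one AZ is up. It assumes AZ failures are independent, which is a simplification, and the per-AZ availability of 0.99 is a made-up placeholder, not an AWS figure.

```python
def combined_availability(per_az_availability: float, az_count: int) -> float:
    """Probability that at least one of `az_count` AZs is up.

    Assumes each AZ fails independently of the others (a simplification).
    """
    if not 0.0 <= per_az_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if az_count < 1:
        raise ValueError("az_count must be at least 1")
    # The service is down only if every AZ is down at the same time.
    return 1.0 - (1.0 - per_az_availability) ** az_count


# 0.99 is a hypothetical per-AZ availability used only for illustration.
for n in (1, 2, 3):
    print(n, round(combined_availability(0.99, n), 6))
```

Under these assumptions each extra AZ multiplies the chance of a total outage by the per-AZ failure probability, which is the intuition behind Multi-AZ deployments.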

3. Edge Locations
Definition: Edge Locations are data centers in major cities around the world used by AWS services like CloudFront (CDN) and Route 53 (DNS).

Purpose: Deliver content and services closer to users, improving performance by reducing latency.

Use Case: Caching static assets (images, videos, etc.), accelerating API responses with CloudFront.

Why This Matters for Data Analysis & Latency-Sensitive Applications
Data Analysis
Region Choice: Select Regions where data is collected or where compliance rules apply.

AZ Use: For high availability in data pipelines (e.g., EMR clusters, Redshift, Glue).

Data Transfer Costs: Transferring data across Regions is billed per GB and adds network latency, so keeping storage and compute in the same Region reduces both cost and latency.

Latency-Sensitive Applications
Edge Locations: Serve cached content through the CDN from the edge location nearest the user, which avoids a round trip to the origin Region.

Regions: Choose a region geographically close to your users for real-time systems (e.g., live analytics, gaming).

AZs: Deploy across AZs so that an outage in one AZ does not take the application offline.
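
The Region-selection guidance above can be sketched as a small helper: filter Regions by a compliance constraint, then pick the one with the lowest measured latency. The function name `pick_region`, the latency numbers, and the EU-only rule are all hypothetical and exist only to illustrate the trade-off.

```python
def pick_region(latency_ms: dict[str, float], allowed: set[str]) -> str:
    """Return the allowed Region with the lowest measured latency."""
    candidates = {region: ms for region, ms in latency_ms.items() if region in allowed}
    if not candidates:
        raise ValueError("no measured Region satisfies the compliance constraint")
    return min(candidates, key=candidates.get)


# Hypothetical round-trip times (ms) measured from a European user base.
measured = {
    "us-east-1": 95.0,
    "eu-west-1": 22.0,
    "eu-central-1": 28.0,
    "ap-southeast-1": 180.0,
}
# Hypothetical data-residency rule: data must stay in EU Regions.
eu_only = {"eu-west-1", "eu-central-1"}

print(pick_region(measured, eu_only))
```

Applying the compliance filter first reflects the point made above: residency rules narrow the candidate set before latency is considered.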

* aws ec2 describe-regions --all-regions --query "Regions[*].RegionName" --output table

Explanation:
aws ec2 describe-regions: Lists the Regions available to your account (by default, only the ones that are enabled).

--all-regions: Includes all regions, even those not enabled by default in your account.

--query "Regions[*].RegionName": Filters the output to only show the region names.

--output table: Formats the output as a readable table (you can also use json or text).

---------------------------------
|       DescribeRegions         |
+-------------------------------+
|  af-south-1                   |
|  ap-east-1                    |
|  ap-northeast-1               |
|  ap-northeast-2               |
|  ap-northeast-3               |
|  ap-south-1                   |
|  ap-south-2                   |
|  ap-southeast-1               |
|  ap-southeast-2               |
|  ap-southeast-3               |
|  ca-central-1                 |
|  eu-central-1                 |
|  eu-central-2                 |
|  eu-north-1                   |
|  eu-south-1                   |
|  eu-south-2                   |
|  eu-west-1                    |
|  eu-west-2                    |
|  eu-west-3                    |
|  me-central-1                 |
|  me-south-1                   |
|  sa-east-1                    |
|  us-east-1                    |
|  us-east-2                    |
|  us-west-1                    |
|  us-west-2                    |
+-------------------------------+
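
The `--query` expression is a JMESPath projection. If the command is run with `--output json` instead, the same projection can be done in Python. The sketch below works on a trimmed, illustrative payload that mirrors the shape of the response; the three entries and their opt-in statuses are sample data, not output captured from a real account.

```python
import json

# Trimmed, illustrative sample of `aws ec2 describe-regions --all-regions --output json`.
sample_output = """
{
  "Regions": [
    {"Endpoint": "ec2.us-east-1.amazonaws.com", "RegionName": "us-east-1",
     "OptInStatus": "opt-in-not-required"},
    {"Endpoint": "ec2.af-south-1.amazonaws.com", "RegionName": "af-south-1",
     "OptInStatus": "not-opted-in"},
    {"Endpoint": "ec2.eu-west-1.amazonaws.com", "RegionName": "eu-west-1",
     "OptInStatus": "opt-in-not-required"}
  ]
}
"""


def region_names(raw_json: str) -> list[str]:
    """Python equivalent of the JMESPath query Regions[*].RegionName."""
    return [region["RegionName"] for region in json.loads(raw_json)["Regions"]]


def enabled_region_names(raw_json: str) -> list[str]:
    """Keep only Regions that the account can use right now."""
    return [
        region["RegionName"]
        for region in json.loads(raw_json)["Regions"]
        if region["OptInStatus"] != "not-opted-in"
    ]


print(sorted(region_names(sample_output)))
print(sorted(enabled_region_names(sample_output)))
```

Filtering on `OptInStatus` is one way to separate Regions that `--all-regions` lists from those the account has actually enabled.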


* Step-by-Step: IAM User with Least-Privilege S3 Access (Using AWS CLI)
Create IAM User:

aws iam create-user --user-name s3-read-write-user


Attach Inline Policy (Least Privilege):
Here's a JSON policy that grants:

Read and write access to only a specific bucket (example-bucket)

No access to other services or buckets

Least Privilege S3 Access Policy (JSON):





{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::example-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": "arn:aws:s3:::example-bucket"
    }
  ]
}



Attach the Policy:
Save the JSON policy above in a file (e.g., s3-policy.json) and run:

aws iam put-user-policy --user-name s3-read-write-user --policy-name S3LeastPrivilegePolicy --policy-document file://s3-policy.json
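
If the same policy is needed for several buckets, the JSON file can be generated instead of edited by hand. The following is a minimal sketch that reproduces the policy above for a given bucket name; `build_s3_policy` and `write_policy` are hypothetical helper names, and the demo writes to a temporary directory rather than the working directory.

```python
import json
import os
import tempfile


def build_s3_policy(bucket: str) -> dict:
    """Build the least-privilege policy shown above for a single bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level actions apply to keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                # Listing is a bucket-level action, so the ARN has no /* suffix.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


def write_policy(bucket: str, path: str) -> None:
    """Write the policy as JSON, ready for `--policy-document file://...`."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(build_s3_policy(bucket), fh, indent=2)


# Demo: write to a temporary directory and print the result.
with tempfile.TemporaryDirectory() as tmp:
    policy_path = os.path.join(tmp, "s3-policy.json")
    write_policy("example-bucket", policy_path)
    with open(policy_path, encoding="utf-8") as fh:
        print(fh.read())
```

Keeping object-level and bucket-level actions in separate statements matters because they match different resource ARNs.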





S3 Storage Classes: When to Use in Data Analytics Workflows
1. S3 Standard
Use when: You are actively using the data for ETL jobs, querying (e.g., Athena/Redshift Spectrum), or machine learning.

Examples:

Data lakes used for daily dashboards

Streaming pipeline output

Training data for ML

2. S3 Intelligent-Tiering
Use when: Access patterns are unpredictable—S3 automatically moves objects between tiers to optimize cost.

Examples:

Data exploration environments

Ad hoc analytics queries

Variable reporting workloads

3. S3 Glacier Instant Retrieval
Use when: You need to archive rarely accessed data but still need millisecond retrieval for analytics or audits.

Examples:

Archived transaction logs

Old campaign data needed for comparison or back-testing

4. S3 Glacier Flexible Retrieval
Use when: Data is rarely needed, and retrieval speed isn't critical (minutes to hours is acceptable).

Examples:

Compliance logs

Quarterly backups of processed datasets

5. S3 Glacier Deep Archive
Use when: Data is almost never accessed but must be stored for compliance or historical purposes.

Examples:

Regulatory archives

Long-term raw sensor data from IoT
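
The rules of thumb above can be condensed into a small decision helper. This is only a sketch: the category names (`frequent`, `unpredictable`, `rare`, `almost_never`) and the function `suggest_storage_class` are hypothetical, while the returned strings are the storage class identifiers used by the S3 API.

```python
def suggest_storage_class(access_pattern: str, needs_fast_retrieval: bool = True) -> str:
    """Map the guidance above to an S3 storage class identifier."""
    if access_pattern == "frequent":
        return "STANDARD"
    if access_pattern == "unpredictable":
        return "INTELLIGENT_TIERING"
    if access_pattern == "rare":
        # Archived data that may still be queried: choose by retrieval urgency.
        return "GLACIER_IR" if needs_fast_retrieval else "GLACIER"
    if access_pattern == "almost_never":
        return "DEEP_ARCHIVE"
    raise ValueError(f"unknown access pattern: {access_pattern!r}")


examples = [
    ("daily dashboard data lake", "frequent", True),
    ("ad hoc analytics queries", "unpredictable", True),
    ("archived transaction logs", "rare", True),
    ("compliance logs", "rare", False),
    ("regulatory archives", "almost_never", False),
]
for label, pattern, fast in examples:
    print(f"{label}: {suggest_storage_class(pattern, fast)}")
```

A real decision would also weigh minimum storage durations and retrieval fees, which this sketch leaves out.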

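Step-by-Step: S3 Bucket with Versioning and File Versions
1. Create an S3 Bucket

aws s3api create-bucket --bucket my-sample-data-bucket-123456 \
    --region us-east-1
Replace my-sample-data-bucket-123456 with a globally unique name. For us-east-1, omit --create-bucket-configuration, because specifying LocationConstraint=us-east-1 is rejected. For any other Region, add --create-bucket-configuration with LocationConstraint set to that Region (for example, LocationConstraint=eu-west-1).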

2. Enable Versioning

aws s3api put-bucket-versioning --bucket my-sample-data-bucket-123456 \
    --versioning-configuration Status=Enabled
3. Create and Upload a Sample CSV File (Version 1)
Create a file:

printf 'id,name,value\n1,Alice,100\n' > data.csv
Upload it:


aws s3 cp data.csv s3://my-sample-data-bucket-123456/data.csv
4. Modify and Upload the Same File (Version 2)
Update the file:


printf 'id,name,value\n1,Alice,100\n2,Bob,200\n' > data.csv
Upload again:

aws s3 cp data.csv s3://my-sample-data-bucket-123456/data.csv
5. List All Versions of the File
aws s3api list-object-versions --bucket my-sample-data-bucket-123456 --prefix data.csv
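
To work with the version list programmatically, the JSON response can be parsed and sorted. The sketch below uses a hypothetical, trimmed response: the version IDs and timestamps are invented, and real output carries further fields such as ETag, Size, and Owner.

```python
import json
from datetime import datetime

# Hypothetical, trimmed response of `aws s3api list-object-versions`.
sample_response = """
{
  "Versions": [
    {"Key": "data.csv", "VersionId": "example-version-id-2", "IsLatest": true,
     "LastModified": "2024-01-01T10:05:00+00:00"},
    {"Key": "data.csv", "VersionId": "example-version-id-1", "IsLatest": false,
     "LastModified": "2024-01-01T10:00:00+00:00"}
  ]
}
"""


def versions_newest_first(raw_json: str, key: str) -> list[dict]:
    """Return the versions of `key`, most recent upload first."""
    versions = [v for v in json.loads(raw_json).get("Versions", []) if v["Key"] == key]
    return sorted(
        versions,
        key=lambda v: datetime.fromisoformat(v["LastModified"]),
        reverse=True,
    )


history = versions_newest_first(sample_response, "data.csv")
for version in history:
    marker = "current" if version["IsLatest"] else "previous"
    print(version["VersionId"], version["LastModified"], marker)
```

The exact-key filter is deliberate: `--prefix data.csv` also matches any other key that merely starts with `data.csv`.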

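S3 Lifecycle Policy (JSON): move objects to S3 Glacier Flexible Retrieval after 30 days and delete them after 90 days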
{
  "Rules": [
    {
      "ID": "MoveToGlacierAndDelete",
      "Prefix": "",
      "Status": "Enabled",
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "GLACIER"
        }
      ],
      "Expiration": {
        "Days": 90
      }
    }
  ]
}
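
To make the effect of this rule concrete, the sketch below evaluates where an object would be at a given age. It is a simplified model, not how S3 itself evaluates rules: it assumes the object was uploaded as STANDARD, looks only at the first rule, and ignores that S3 applies lifecycle actions asynchronously. `state_at_age` is a hypothetical helper name.

```python
import json

LIFECYCLE_JSON = """
{
  "Rules": [
    {
      "ID": "MoveToGlacierAndDelete",
      "Prefix": "",
      "Status": "Enabled",
      "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
      "Expiration": {"Days": 90}
    }
  ]
}
"""


def state_at_age(config: dict, age_days: int, initial_class: str = "STANDARD") -> str:
    """Return the storage class at `age_days`, or "EXPIRED" once the rule deletes it."""
    rule = config["Rules"][0]
    if rule["Status"] != "Enabled":
        return initial_class
    expiration_days = rule.get("Expiration", {}).get("Days")
    if expiration_days is not None and age_days >= expiration_days:
        return "EXPIRED"
    current = initial_class
    # Apply transitions in order of age; the last one reached wins.
    for transition in sorted(rule.get("Transitions", []), key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


config = json.loads(LIFECYCLE_JSON)
for age in (10, 30, 89, 90):
    print(age, state_at_age(config, age))
```

On a bucket with versioning enabled, expiring the current version adds a delete marker and leaves older versions in place, which this sketch does not model.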



Comparison Table: RDS vs DynamoDB vs Redshift
| Feature / Service | Amazon RDS | Amazon DynamoDB | Amazon Redshift |
| --- | --- | --- | --- |
| Type | Relational database (SQL) | NoSQL (key-value / document store) | Analytical data warehouse |
| Best for | OLTP (transactional workloads) | Low-latency, high-throughput key-value access | OLAP (complex analytics & reporting) |
| Scaling | Vertical (limited horizontal with read replicas) | Horizontal (auto scaling) | Massively parallel (MPP) |
| Schema | Fixed schema (SQL-based) | Schema-less or flexible schema | Structured schema (columnar) |
| Latency | Milliseconds | Single-digit milliseconds | Seconds to minutes (batch queries) |
| Data Size | GBs to TBs | KBs to TBs (scales well) | TBs to PBs |
| Cost Model | Instance-based | On-demand or provisioned throughput | Storage + compute (separate) |

When to Use in a Data Pipeline
| Stage | Service | Use Case |
| --- | --- | --- |
| Ingestion + Processing | DynamoDB | Real-time ingestion of IoT device data for later analysis |
| Transactional Storage | RDS | Storing cleaned and structured customer/order data from a web application |
| Analytics + BI | Redshift | Running daily/monthly dashboards and reports on aggregated sales data |

Use Case Details
Amazon RDS
Use Case: Store transactional data from a web or mobile app (e.g., customer orders).

Why: Supports SQL, ACID compliance, joins, and complex relationships.

Example: MySQL RDS stores user profiles, orders, and payment logs.

Amazon DynamoDB
Use Case: Capture real-time telemetry or clickstream data.

Why: High throughput, low latency, and fully managed with built-in scaling.

Example: DynamoDB stores millions of IoT sensor readings per minute for a smart factory.

Amazon Redshift
Use Case: Analyze historical sales trends or product performance.

Why: Optimized for large-scale, complex queries over structured data.

Example: Redshift aggregates product sales from the last 5 years for executive dashboards.

