# Architecture Documentation for Snowflake Data Access Solution

## Overview
This architecture describes a solution for secure, scalable access to Snowflake data built on AWS cloud services. It covers data ingestion, access control, data cataloging, and querying.


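```python
# Display the architecture diagram; assumes build_streamlit_app.png sits next to the notebook.
from IPython.display import Image, display

display(Image(filename="build_streamlit_app.png"))
```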

## 1. Components

### **1.1 User Interaction and Data Upload**
- **Upload CSV**: Users can upload CSV files, which are stored in **S3**.
- **Easy IDA**: Provides user authentication and identity management, with **Route 53** providing DNS for the application endpoint.

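The upload path can be sketched as follows. This is a minimal sketch, not the application's actual code: it assumes the Streamlit app hands the raw file bytes to a helper, that all uploads go to one bucket, and it uses a hypothetical Hive-style key layout (`uploads/user_id=<user>/dt=<date>/<file>`). The S3 client is passed in, so `boto3.client("s3")` can be supplied in production and a stub in tests. `put_object` is the standard boto3 S3 client call.

```python
import csv
import io
from datetime import datetime, timezone
from typing import List, Optional

# Hypothetical prefix for user uploads; Hive-style key=value segments follow it.
UPLOAD_PREFIX = "uploads"


def build_upload_key(user_id: str, filename: str, now: Optional[datetime] = None) -> str:
    """Build an S3 key partitioned by user and upload date (hypothetical layout)."""
    now = now or datetime.now(timezone.utc)
    safe_name = filename.replace("/", "_")  # stop the file name from adding extra prefixes
    return f"{UPLOAD_PREFIX}/user_id={user_id}/dt={now:%Y-%m-%d}/{safe_name}"


def validate_csv(data: bytes) -> List[str]:
    """Return the header row, or raise ValueError if the payload has no usable header."""
    reader = csv.reader(io.StringIO(data.decode("utf-8")))
    header = next(reader, None)
    if not header or not any(cell.strip() for cell in header):
        raise ValueError("CSV file is empty or has no header row")
    return header


def upload_csv(s3_client, bucket: str, user_id: str, filename: str, data: bytes) -> str:
    """Validate a CSV payload, store it in S3 with put_object, and return its key."""
    validate_csv(data)
    key = build_upload_key(user_id, filename)
    s3_client.put_object(Bucket=bucket, Key=key, Body=data, ContentType="text/csv")
    return key
```

The Hive-style `key=value` prefixes are a deliberate choice: Glue crawlers recognize them and expose `user_id` and `dt` as partition columns, which Athena can then use to prune what it scans.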
### **1.2 Data Ingestion and Cataloging**
- **S3**: Stores CSV files uploaded by users.
- **Crawlers**: AWS Glue crawlers automatically discover and catalog the data stored in S3.
- **Data Catalog**: The AWS Glue Data Catalog stores metadata about the data (schemas, partitions, locations), providing a centralized view of available datasets.

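Crawlers can run on a schedule, but the application may also trigger one after an upload. A minimal sketch, assuming a crawler already exists and points at the upload prefix (its name is supplied by the caller), using the boto3 Glue client calls `start_crawler` and `get_crawler`. The `sleep` parameter is injected only so the polling loop can be tested without waiting.

```python
import time
from typing import Callable


def run_crawler(glue_client, crawler_name: str, poll_seconds: float = 15.0,
                timeout_seconds: float = 1800.0,
                sleep: Callable[[float], None] = time.sleep) -> str:
    """Start a Glue crawler, wait until it is READY again, and return the last crawl status."""
    glue_client.start_crawler(Name=crawler_name)
    waited = 0.0
    while waited < timeout_seconds:
        # Sleep before polling to reduce the chance that the first poll sees
        # READY before the new run has actually started.
        sleep(poll_seconds)
        waited += poll_seconds
        crawler = glue_client.get_crawler(Name=crawler_name)["Crawler"]
        if crawler["State"] == "READY":
            # LastCrawl.Status is SUCCEEDED, FAILED, or CANCELLED.
            return crawler.get("LastCrawl", {}).get("Status", "UNKNOWN")
    raise TimeoutError(f"crawler {crawler_name!r} did not finish within {timeout_seconds}s")
```

`start_crawler` raises `CrawlerRunningException` if the crawler is already running, so a production version would need to handle concurrent uploads, for example by relying on the crawler's schedule instead of triggering it per upload.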
### **1.3 AWS Cloud/Application Layer**
- **Control Plane API**: Manages access control and permissions.
- **Streamlit Application**: A web application providing a user interface to interact with the data.
- **Entitlement API**: Controls data access based on user entitlements.
- **boto3 SDK**: The AWS SDK for Python, used to call AWS services from the application.

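The Entitlement API's contract is not described here, so the following is a hypothetical model of the check it performs: entitlements are assumed to be (dataset pattern, allowed actions) pairs, and access is denied unless an entitlement explicitly grants it. The `Entitlement` type and the function names are illustrative only.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Iterable, List


@dataclass(frozen=True)
class Entitlement:
    """Hypothetical entitlement record, as an Entitlement API might return it."""
    dataset_pattern: str  # shell-style wildcard, e.g. "sales.*"
    actions: frozenset    # e.g. frozenset({"read"})


def is_entitled(entitlements: Iterable[Entitlement], dataset: str, action: str) -> bool:
    """Return True only if some entitlement grants `action` on `dataset` (default deny)."""
    return any(
        action in ent.actions and fnmatchcase(dataset, ent.dataset_pattern)
        for ent in entitlements
    )


def visible_datasets(entitlements: List[Entitlement], catalog_datasets: Iterable[str]) -> List[str]:
    """Filter a catalog listing down to the datasets the user may read."""
    return sorted(d for d in catalog_datasets if is_entitled(entitlements, d, "read"))
```

Filtering the catalog listing with `visible_datasets` before rendering it in Streamlit keeps users from seeing datasets they cannot query; the same check should still run server-side before each query.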
### **1.4 Snowflake Connectivity/Access (FID Based)**
- **Snowflake DB**: The Snowflake data warehouse.
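- **IAM Anywhere**: Provides identity and access management for workloads across environments (AWS IAM Roles Anywhere lets workloads running outside AWS obtain temporary AWS credentials using X.509 certificates).
- **Amazon Aurora**: Used as the data lake.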

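Assuming "FID based" means the application connects to Snowflake as a functional (service) ID rather than as each end user, the connection settings can be sketched as below. This assumes key-pair authentication with the `snowflake-connector-python` package, whose documented key-pair example passes the DER-encoded private key bytes as `private_key` to `snowflake.connector.connect()`. The environment variable names are hypothetical, and loading the key (for example with the `cryptography` package) is left out.

```python
import os
from typing import Any, Dict, Mapping, Optional

# Hypothetical environment variable names for the FID's connection settings.
REQUIRED_SETTINGS = ("SNOWFLAKE_ACCOUNT", "SNOWFLAKE_FID_USER", "SNOWFLAKE_ROLE", "SNOWFLAKE_WAREHOUSE")


def snowflake_fid_connect_kwargs(private_key_der: bytes,
                                 env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Build keyword arguments for snowflake.connector.connect() for a functional ID."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_SETTINGS if not env.get(name)]
    if missing:
        raise ValueError(f"missing Snowflake settings: {', '.join(missing)}")
    kwargs: Dict[str, Any] = {
        "account": env["SNOWFLAKE_ACCOUNT"],
        "user": env["SNOWFLAKE_FID_USER"],
        "role": env["SNOWFLAKE_ROLE"],
        "warehouse": env["SNOWFLAKE_WAREHOUSE"],
        # Key-pair auth: DER-encoded PKCS#8 private key bytes, so no password is stored.
        "private_key": private_key_der,
    }
    # Optional session defaults.
    for setting, arg in (("SNOWFLAKE_DATABASE", "database"), ("SNOWFLAKE_SCHEMA", "schema")):
        if env.get(setting):
            kwargs[arg] = env[setting]
    return kwargs
```

The result would be passed as `snowflake.connector.connect(**kwargs)`. Keeping key loading outside this function lets the key come from whatever secrets store the deployment uses.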
### **1.5 Data Querying**
- **Athena SQL**: Enables querying data stored in S3 using SQL.

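Querying through Athena is asynchronous: a query is started, polled until it reaches a terminal state, and its results are then read page by page. A minimal sketch using the boto3 Athena client (`start_query_execution`, `get_query_execution`, `get_query_results`); the database name and the S3 output location are supplied by the caller and are assumptions about how the deployment is configured.

```python
import time
from typing import Callable, List, Optional


def run_athena_query(athena_client, sql: str, database: str, output_location: str,
                     poll_seconds: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> List[List[Optional[str]]]:
    """Run a SQL query on Athena and return all result rows as lists of strings."""
    query_id = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},             # Glue Data Catalog database
        ResultConfiguration={"OutputLocation": output_location},  # s3:// prefix for results
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        status = athena_client.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]
        state = status["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"Athena query {query_id} {state}: {status.get('StateChangeReason', '')}")
        sleep(poll_seconds)  # still QUEUED or RUNNING

    # Results are paginated; follow NextToken until every page has been read.
    rows: List[List[Optional[str]]] = []
    request = {"QueryExecutionId": query_id}
    while True:
        page = athena_client.get_query_results(**request)
        for row in page["ResultSet"]["Rows"]:
            rows.append([cell.get("VarCharValue") for cell in row["Data"]])
        token = page.get("NextToken")
        if not token:
            return rows
        request["NextToken"] = token
```

For a `SELECT`, the first row returned is the column header. Cells holding SQL `NULL` carry no `VarCharValue`, so they come back as `None`.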


## 2. Workflow

1.  **User Upload**: Users authenticate with Easy IDA and upload CSV files through the web interface (the Streamlit Application).
2.  **Data Storage**: Uploaded CSV files are stored in S3 buckets.
3.  **Data Cataloging**: AWS Glue crawlers scan the data in S3 and update the Data Catalog with its metadata.
4.  **Access Control**: The Streamlit Application, Control Plane API, and Entitlement API ensure that users have the necessary permissions to access specific datasets in Snowflake.
5.  **Data Querying**: Users query the data in Snowflake through Athena SQL. The boto3 SDK is used to interact with the underlying AWS services.

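The five workflow steps can be read as a single request path in which the entitlement check gates the query. A minimal sketch, assuming each step is a callable provided by the application (the `steps` keys are hypothetical names). It only illustrates the ordering and the deny-by-default check, not the real services.

```python
from typing import Callable, Dict, List


def run_workflow(user: str, dataset: str, csv_bytes: bytes,
                 steps: Dict[str, Callable]) -> List[list]:
    """Run upload -> catalog -> authorize -> query for one request (hypothetical step names)."""
    key = steps["upload"](user, csv_bytes)      # steps 1-2: authenticated upload stored in S3
    steps["catalog"](key)                       # step 3: crawler refreshes the Data Catalog
    if not steps["is_entitled"](user, dataset):  # step 4: entitlement check, deny by default
        raise PermissionError(f"{user} is not entitled to {dataset}")
    return steps["query"](dataset)              # step 5: query through Athena SQL
```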

## 3. Key Technologies

| Component | Technology |
| --------- | ---------- |
| Authentication | Easy IDA, Route 53 (DNS) |
| Data Storage | S3 |
| Data Warehouse | Snowflake DB |
| Querying | Athena SQL |
| IAM | IAM Anywhere |
| Data Lake | Amazon Aurora |
| Data Catalog | AWS Glue Data Catalog |
| Web Application | Streamlit Application |


## 4. Benefits

- **Scalability**: Leverages AWS cloud services for scalable data storage and querying.
- **Security**: Access to data is controlled through Easy IDA authentication, the Entitlement API, and IAM.
- **Centralized Data Catalog**: Provides a unified view of data available for querying.
- **Flexibility**: Supports querying data using standard SQL (Athena SQL).
