
Configurable storage provider #1937

Open
sainak opened this issue Mar 2, 2024 · 7 comments
Labels
good first issue Good for newcomers P2

Comments

@sainak
Member

sainak commented Mar 2, 2024

Currently, we only support S3-compatible storage providers for storing and retrieving files; we need to make the storage provider more configurable.

We currently have two buckets:

  • for serving facility cover images (public)
  • for serving sensitive patient files (private)

The solution is to create a configurable storage provider class that can support S3 / Azure Blob Storage / local storage to serve files.

Requirements:

  • there should be strong access control to prevent public access to patient/internal files
  • we should be able to serve static files and cover images through the public bucket
  • storage providers should expose signed upload URLs for patient files
  • storage providers should expose patient files only with signed URLs that can expire after a certain amount of time
  • have proper checks for file MIME types
  • add proper test cases to handle the above requirements

Share your implementation plan before working on this issue
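The requirements above can be sketched as a small provider interface. This is a hypothetical illustration, not the project's actual API: the class and method names (`StorageProvider`, `signed_upload_url`, `signed_download_url`, `LocalStorageProvider`) are invented here, and the HMAC-signed local backend only stands in for real S3/Azure presigned URLs.

```python
import abc
import enum
import hashlib
import hmac
import mimetypes
import time


class BucketType(enum.Enum):
    PUBLIC = "public"    # facility cover images, static files
    PRIVATE = "private"  # sensitive patient files


class StorageProvider(abc.ABC):
    """Common interface; concrete subclasses would wrap S3, Azure Blob, or local disk."""

    @abc.abstractmethod
    def signed_upload_url(self, bucket: BucketType, key: str, expires_in: int) -> str:
        ...

    @abc.abstractmethod
    def signed_download_url(self, bucket: BucketType, key: str, expires_in: int) -> str:
        ...

    def check_mime_type(self, key: str, allowed: set) -> bool:
        # guess the MIME type from the file name and reject anything not allowed
        guessed, _ = mimetypes.guess_type(key)
        return guessed in allowed


class LocalStorageProvider(StorageProvider):
    """Toy local backend: URLs carry an expiry timestamp and an HMAC signature."""

    def __init__(self, secret: bytes):
        self.secret = secret

    def _sign(self, bucket: BucketType, key: str, expires_at: int) -> str:
        msg = f"{bucket.value}/{key}/{expires_at}".encode()
        return hmac.new(self.secret, msg, hashlib.sha256).hexdigest()

    def signed_upload_url(self, bucket, key, expires_in):
        expires_at = int(time.time()) + expires_in
        sig = self._sign(bucket, key, expires_at)
        return f"/upload/{bucket.value}/{key}?expires={expires_at}&sig={sig}"

    def signed_download_url(self, bucket, key, expires_in):
        expires_at = int(time.time()) + expires_in
        sig = self._sign(bucket, key, expires_at)
        return f"/files/{bucket.value}/{key}?expires={expires_at}&sig={sig}"
```

A real S3 backend would implement the same two methods by delegating to boto3's `generate_presigned_url`, and an Azure backend to SAS tokens, so the rest of the codebase never touches a provider SDK directly.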

@sainak sainak added good first issue Good for newcomers P2 labels Mar 2, 2024
@ab1123

ab1123 commented Mar 2, 2024

Can I work on this issue?

@balaji-sivasakthi

@sainak Will you provide test secrets and credentials to work on this issue?

@sainak
Member Author

sainak commented Mar 3, 2024

@balaji-sivasakthi you will have to use local development tools like LocalStack and Azurite

@balaji-sivasakthi

balaji-sivasakthi commented Mar 3, 2024

@sainak Are these tools FOSS?

@sainak
Member Author

sainak commented Mar 4, 2024

@sainak Are these tools FOSS?

For our use case, yes

@rash-27
Contributor

rash-27 commented Mar 4, 2024

  • The security checks for this are already made in the serializers.
  • So my idea is to create a class and use an instance of it wherever the S3 bucket is presently used (i.e. the model methods that return URLs etc., in the case of file storage), so that multiple storage providers can be accessed via it.
  • The basic idea of the class:
    class StorageProvider:
        def __init__(self, provider, bucket_type):
            if provider == "s3":
                self.client = boto3.client("s3", **config)
            # similarly we can provide the options for the remaining storage providers
            # here we implement the required methods, like signed_url, put_object, get_object, etc.
  • Regarding private and public buckets: we can decide with the help of the bucket type and change the upload URL accordingly.
    @sainak This is the basic idea of my implementation; mention if I am wrong somewhere or suggest any changes that need to be done.
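Filled out and made runnable, this single-class idea could look like the sketch below. The `"local"` branch, `signed_url` body, and the fallback URL format are hypothetical additions so the example works without cloud credentials; a real deployment would take the S3 branch and resolve the actual bucket name from the bucket type.

```python
class StorageProvider:
    """One class; the backend is chosen by the provider string at construction."""

    def __init__(self, provider: str, bucket_type: str, **config):
        self.provider = provider
        self.bucket_type = bucket_type  # "public" or "private"
        if provider == "s3":
            import boto3  # only imported when the S3 backend is actually used
            self.client = boto3.client("s3", **config)
        elif provider == "local":
            self.client = None  # hypothetical dev backend with no SDK client
        else:
            raise ValueError(f"unsupported provider: {provider}")

    def signed_url(self, object_key: str, expires_in: int = 3600) -> str:
        if self.provider == "s3":
            # in practice the bucket name would be resolved from bucket_type
            return self.client.generate_presigned_url(
                "get_object",
                Params={"Bucket": self.bucket_type, "Key": object_key},
                ExpiresIn=expires_in,
            )
        # dev stand-in: an unsigned local path (illustrative only)
        return f"/local/{self.bucket_type}/{object_key}?expires_in={expires_in}"
```

The dispatch-by-string approach keeps call sites unchanged, though an abstract base class with one subclass per provider would avoid the `if` chains growing inside every method.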

@balaji-sivasakthi

balaji-sivasakthi commented Mar 5, 2024

Approach

To decouple the storage provider from specific use cases like patient and facility, we can introduce more flexibility by allowing the specification of bucket types and access controls through parameters. We'll utilize enums for access control and provide the bucket name as a parameter.

import enum
from typing import TypedDict

class ClientConfig(TypedDict):
    region_name: str
    aws_access_key_id: str
    aws_secret_access_key: str
    endpoint_url: str

class CSProvider(enum.Enum):
    AWS = "AWS"
    GCP = "GCP"
    AZURE = "AZURE"
    DOCKER = "DOCKER"  # localstack in docker
    LOCAL = "LOCAL"  # localstack on host

class AccessControl(enum.Enum):
    PUBLIC = "PUBLIC"
    PRIVATE = "PRIVATE"

class BucketType(enum.Enum):
    PATIENT = "PATIENT"
    FACILITY = "FACILITY"

Then we will create a factory method for storages based on the CSProvider enum, and also initialise the ClientConfig and bucket name for getting the signed URL.

I will add a class method named from_provider inside the ConfigurableStorageProvider class. It acts as a factory method that creates ConfigurableStorageProvider instances based on the specified CSProvider. When calling from_provider, you provide the desired CSProvider, account_url, and credential; it internally creates a ClientConfig object and initialises the ConfigurableStorageProvider instance accordingly. This makes the class easier to instantiate by encapsulating the logic of building the ClientConfig for the specified provider.

LLD

def __init__(self, provider: CSProvider, client_config: ClientConfig):
def from_provider(cls, provider: CSProvider, account_url: str, credential: str) -> 'ConfigurableStorageProvider':
def get_signed_url(self, bucket_type: BucketType, object_key: str, expiration: int, access_control: AccessControl) -> str:
def check_mime_type(self, file_path: str, expected_mime_type: str) -> bool:
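The LLD above could be wired together roughly as follows. This is a sketch under the assumptions in this comment: the `CSProvider` members are trimmed, `ClientConfig` is simplified to the two fields `from_provider` receives, and only the factory plumbing is shown, not the actual client calls.

```python
import enum
from dataclasses import dataclass


class CSProvider(enum.Enum):
    AWS = "AWS"
    AZURE = "AZURE"
    LOCAL = "LOCAL"  # localstack on host


@dataclass
class ClientConfig:
    endpoint_url: str
    credential: str


class ConfigurableStorageProvider:
    def __init__(self, provider: CSProvider, client_config: ClientConfig):
        self.provider = provider
        self.client_config = client_config

    @classmethod
    def from_provider(cls, provider: CSProvider, account_url: str,
                      credential: str) -> "ConfigurableStorageProvider":
        # encapsulate ClientConfig construction so callers pass plain strings
        config = ClientConfig(endpoint_url=account_url, credential=credential)
        return cls(provider, config)
```

Usage would be a single call like `ConfigurableStorageProvider.from_provider(CSProvider.LOCAL, "http://localhost:4566", "test")`, keeping SDK-specific configuration out of the call sites.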

cc @sainak @rash-27
