# Notebook to Test Loading Results from S3 for Paper Analysis and Plotting

## Data Format in S3 Buckets

- Bucket: ndmg-data
- Files: 
    - Graph (.png)
    - Adjacency Matrix (.csv)
    - ? Streamlines ? (.trk)
    
    

## Download Data

- Use AWS CLI config for keys
- Adapted from M2G boto3 code s3_get_data
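
Before pulling anything, it can help to confirm that the AWS CLI has actually written usable keys. boto3 reads the shared credentials file that `aws configure` creates (by default `~/.aws/credentials`). The sketch below only inspects that file; `profiles_with_keys` is a hypothetical helper, not part of M2G or boto3, and it ignores other credential sources such as environment variables.

```python
import configparser
import os


def profiles_with_keys(path=os.path.expanduser("~/.aws/credentials")):
    """Return the profiles in an AWS shared credentials file that define both keys.

    Hypothetical helper for this notebook; it only reads the INI file and
    never contacts AWS.
    """
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file is silently ignored and yields no sections
    required = ("aws_access_key_id", "aws_secret_access_key")
    return [
        name
        for name in parser.sections()
        if all(parser.has_option(name, key) for key in required)
    ]


if __name__ == "__main__":
    print("Profiles with keys:", profiles_with_keys())
```

An empty list suggests that `aws configure` has not been run for this user, or that the keys are supplied some other way.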

Goal: extract the files relevant to the paper analysis.

In [None]:
import os

# NOTE: s3_client and get_matching_s3_objects are helpers from the M2G boto3 code
# that this function was adapted from; import or define them before running this cell.


def s3_get__relevant_data(bucket, remote, local, info="", file_types=(".csv", ".png"), force=False):
    """Given an S3 directory, copy the relevant files in that directory (and its subdirectories) to local.

    Parameters
    ----------
    bucket : str
        S3 bucket you are accessing data from
    remote : str
        The path to the data on your S3 bucket. The data will be
        downloaded to the provided local directory on your machine.
    local : str
        Local directory where you want the files copied to
    info : str, optional
        Relevant subject and session information in the form of sub-#/ses-#
    file_types : tuple of str, optional
        Extensions of the file types to download, by default ('.csv', '.png')
    force : bool, optional
        Whether to pull data even if the local directory already exists and is non-empty
        (files already present are still skipped), by default False
    """
    if info == "sub-":
        print("Subject not specified, comparing input folder to remote directory...")
    else:
        if os.path.exists(os.path.join(local, info)) and not force:
            if os.listdir(os.path.join(local, info)):
                print(
                    f"Local directory: {os.path.join(local,info)} already exists. Not pulling s3 data. Delete contents to re-download data."
                )
                return  # TODO: make sure this doesn't append None a bunch of times to a list in a loop on this function
    # get client with credentials if they exist
    client = s3_client(service="s3")
    # check that bucket exists
    bkts = [bk["Name"] for bk in client.list_buckets()["Buckets"]]
    if bucket not in bkts:
        raise ValueError(
            "Error: could not locate bucket. Available buckets: " + ", ".join(bkts)
        )
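    # build the S3 prefix; an empty info matches everything under remote
    # (the original produced "remote//" in that case, which matches nothing)
    info = info.strip("/")
    prefix = f"{remote}/{info}/" if info else f"{remote}/"
    bpath = get_matching_s3_objects(bucket, prefix)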
    
    # go through all folders inside of remote directory and download relevant files
    for obj in bpath:
        bdir, data = os.path.split(obj)
        localpath = os.path.join(local, bdir.replace(f"{remote}/", ""))
        
        #check if the file is correct type 
        if obj.endswith(file_types):
            # Make directory for data if it doesn't exist
            if not os.path.exists(localpath):
                os.makedirs(localpath)
            if not os.path.exists(f"{localpath}/{data}"):
                print(f"Downloading {bdir}/{data} from {bucket} s3 bucket...")
                # Download file
                client.download_file(bucket, f"{bdir}/{data}", f"{localpath}/{data}")
                if os.path.exists(f"{localpath}/{data}"):
                    print("Success!")
                else:
                    print("Error: File not downloaded")
            else:
                print(f"File {data} already exists at {localpath}/{data}")
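
The filtering and path-mapping step in the loop above can be tried without touching S3. The sketch below is a dry run under stated assumptions: `plan_downloads` is a hypothetical helper (not part of M2G), and the example keys are made up for illustration. It keeps only keys with the wanted extensions and maps each one to the local path it would be written to.

```python
import os


def plan_downloads(keys, remote, local, file_types=(".csv", ".png")):
    """Return (s3_key, local_path) pairs for the keys that would be downloaded.

    Hypothetical dry-run helper: it performs no network access and writes no files.
    """
    prefix = remote.rstrip("/") + "/"
    plan = []
    for key in keys:
        # str.endswith accepts a tuple, so one call checks every extension
        if not key.endswith(tuple(file_types)):
            continue
        # drop the remote prefix so the local tree starts at sub-#/ses-#
        relative = key[len(prefix):] if key.startswith(prefix) else key
        plan.append((key, os.path.join(local, *relative.split("/"))))
    return plan


if __name__ == "__main__":
    # made-up keys, for illustration only
    example_keys = [
        "outputs/sub-01/ses-1/connectome.csv",
        "outputs/sub-01/ses-1/streamlines.trk",
        "outputs/sub-01/ses-1/connectome.png",
    ]
    for key, path in plan_downloads(example_keys, "outputs", "data"):
        print(f"{key} -> {path}")
```

Checking the plan first makes it easy to see whether the `.trk` streamline files are being skipped as intended before any large transfer starts.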

## Load and Analyze Data

- Extract metadata about patients and subjects from the folder, e.g. general demographic distributions (age, gender)
- Discriminability


- ? Mean-level analysis on all the connectome data, as in the bioRxiv preprint?
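
For the discriminability item, a minimal sketch of one common formulation follows: for every pair of scans from the same subject, compute the fraction of scans from other subjects that lie farther away than the repeat scan, then average. This is an assumption about the intended statistic, not the notebook's final method. The function name `discriminability` and its inputs are hypothetical; each vector stands for a flattened adjacency matrix, so the Euclidean distance equals the Frobenius norm of the matrix difference.

```python
import math


def discriminability(vectors, subjects):
    """Estimate discriminability from feature vectors and their subject labels.

    vectors  : list of equal-length numeric sequences (e.g. flattened adjacency matrices)
    subjects : list of subject labels, one per vector
    Ties between within- and between-subject distances count as 0.5.
    """
    n = len(vectors)
    scores = []
    for a in range(n):
        # distances from scan a to every scan of a different subject
        between = [
            math.dist(vectors[a], vectors[c])
            for c in range(n)
            if subjects[c] != subjects[a]
        ]
        if not between:
            continue
        for b in range(n):
            if a == b or subjects[a] != subjects[b]:
                continue
            within = math.dist(vectors[a], vectors[b])
            wins = sum(1.0 if d > within else 0.5 if d == within else 0.0 for d in between)
            scores.append(wins / len(between))
    if not scores:
        raise ValueError("need repeat scans for at least one subject and at least two subjects")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # toy data: two subjects, two scans each, clearly separated
    toy_vectors = [[0, 0], [0, 1], [10, 10], [10, 11]]
    toy_subjects = ["sub-01", "sub-01", "sub-02", "sub-02"]
    print(discriminability(toy_vectors, toy_subjects))
```

A value near 1 means repeat scans of a subject are closer to each other than to scans of other subjects. This double loop is only practical for small datasets.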