<a href="https://colab.research.google.com/github/Olalekan-Ojo/Deeplearning/blob/main/How_to_import_dataset_from_S3_Bucket.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



# Install Libraries
Boto3 is the AWS SDK for Python. It lets you configure and interact with Amazon Web Services (AWS) from your Python code.

tqdm is a library for displaying progress bars.



In [None]:
!pip install boto3
!pip install tqdm

In [None]:
import boto3
from pathlib import Path
from botocore import UNSIGNED
from botocore.client import Config
from tqdm.notebook import tqdm

- File and folder names are retrieved from the S3 bucket using the `list_objects_v2` method of the boto3 client.

`bucket_name` is the name of the S3 bucket you want to retrieve data from.

- The loop iterates over the results of `list_objects_v2` and checks whether each key represents a file or a folder, based on the presence of a trailing slash in the key.

- Each response holds at most one page of results (up to 1,000 keys), so the request is repeated with the returned continuation token until all keys have been checked.



In [None]:
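def get_file_folders(s3_client, bucket_name, prefix=""):
    """Return (file_names, folders) for every key under prefix in the bucket."""
    file_names = []
    folders = []

    default_kwargs = {
        "Bucket": bucket_name,
        "Prefix": prefix
    }
    next_token = ""

    # Keep requesting pages until S3 stops returning a continuation token
    while next_token is not None:
        updated_kwargs = default_kwargs.copy()
        if next_token != "":
            updated_kwargs["ContinuationToken"] = next_token

        response = s3_client.list_objects_v2(**updated_kwargs)
        # "Contents" is absent when no keys match, so default to an empty list
        contents = response.get("Contents", [])

        for result in contents:
            key = result.get("Key")
            # Keys ending in "/" are folder placeholders; everything else is a file
            if key.endswith("/"):
                folders.append(key)
            else:
                file_names.append(key)

        next_token = response.get("NextContinuationToken")

    return file_names, folders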
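
As an alternative to tracking `ContinuationToken` by hand, boto3 clients expose paginators that handle the same bookkeeping. The sketch below is one possible variant, not the method this notebook uses. The name `list_keys_with_paginator` is hypothetical, and the function assumes `s3_client` is a boto3 S3 client.

```python
def list_keys_with_paginator(s3_client, bucket_name, prefix=""):
    """Hypothetical helper: same result as get_file_folders, using a paginator."""
    file_names, folders = [], []

    # get_paginator wraps list_objects_v2 and follows continuation tokens for us
    paginator = s3_client.get_paginator("list_objects_v2")

    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        # A page has no "Contents" entry when nothing matches the prefix
        for result in page.get("Contents", []):
            key = result["Key"]
            if key.endswith("/"):
                folders.append(key)
            else:
                file_names.append(key)

    return file_names, folders
```

It can be called the same way as `get_file_folders`, passing the client and the bucket name.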

The `download_files` function downloads the listed files from the S3 bucket to a local directory, recreating the bucket's folder structure along the way.

In [None]:
def download_files(s3_client, bucket_name, local_path, file_names, folders):
    local_path = Path(local_path)

    for folder in tqdm(folders):
        folder_path = local_path / folder
        # Create all folders in the path
        folder_path.mkdir(parents=True, exist_ok=True)

    for file_name in tqdm(file_names):
        file_path = local_path / file_name
        # Create the parent directory if it does not exist yet
        file_path.parent.mkdir(parents=True, exist_ok=True)
        s3_client.download_file(
            bucket_name,
            file_name,
            str(file_path)
        )
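
To see how an S3 key turns into a local file path, here is a small illustration using only `pathlib`. The helper name `local_target` and the sample key are made up for this example. `PurePosixPath` is used so the printed result is the same on every operating system.

```python
from pathlib import PurePosixPath

def local_target(local_root, key):
    """Hypothetical helper: join a local root directory with an S3 key."""
    return PurePosixPath(local_root) / key

target = local_target("data", "images/train/cat.jpg")
print(target)         # data/images/train/cat.jpg
print(target.parent)  # data/images/train
```

The parent directory shown on the last line is the one that `mkdir(parents=True, exist_ok=True)` creates before each download.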

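Here the dataset, stored in the bucket as a zipped file, is downloaded using the `get_file_folders` and `download_files` functions.

The local directory for the downloaded dataset is also defined.

The client is created with `signature_version=UNSIGNED`, so requests are sent without AWS credentials. This works only for publicly readable buckets.

*PLEASE NOTE*
The name of the S3 bucket should be entered exactly as it appears in AWS.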



In [None]:
client = boto3.client('s3', config=Config(signature_version=UNSIGNED))
file_names, folders = get_file_folders(client, 'INSERT NAME OF THE DATASET')

In [None]:
download_files(
    client,
    'INSERT NAME OF THE DATASET',
    "LOCAL DIRECTORY TO DOWNLOAD DATASET",
    file_names,
    folders
)

Unzip the downloaded dataset to explore its contents.

In [None]:
import zipfile
# specify the path of the zip file (adjust it to where the file was downloaded)
zip_file_path = "images.zip"
# create a ZipFile object
with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
    # extract all files to the current directory
    zip_ref.extractall()
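
If you would rather extract into a dedicated folder and see what the archive contains, a variant along these lines may help. It is only a sketch: `extract_zip` is a hypothetical helper name, and the target directory is whatever you choose.

```python
import zipfile
from pathlib import Path

def extract_zip(zip_path, target_dir):
    """Hypothetical helper: extract zip_path into target_dir and return member names."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        names = zip_ref.namelist()  # names of every member in the archive
        zip_ref.extractall(target_dir)
    return names
```

For example, `extract_zip("images.zip", "images")` would extract into an `images` folder (an assumed name) and return the list of extracted members.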