
# Loading Datasets

One of the first things to learn when getting started with EdgeSimPy is how to load datasets. Once you understand how EdgeSimPy loads data, you can use existing datasets or build your own simulated scenarios to prototype resource management strategies. This tutorial covers three ways of loading data supported by EdgeSimPy: public URLs, local files, and Python dictionaries.

Before exploring EdgeSimPy's dataset-loading features, we must import the simulator modules. The cell below does that, installing EdgeSimPy first if it is not already available:

In [1]:
try:
    # Importing EdgeSimPy components
    from edge_sim_py import *
    import networkx as nx
    import msgpack

    # Importing Matplotlib, Pandas, and NumPy for logs parsing and visualization
    import matplotlib.pyplot as plt
    import pandas as pd
    import numpy as np

except ModuleNotFoundError:
    # Installing EdgeSimPy from GitHub (the "-q" flag suppresses pip's output; remove it to see the full logs)
    %pip install -q git+https://github.com/EdgeSimPy/EdgeSimPy.git

    # Downloading Pandas, NumPy, and Matplotlib (these are not directly used here, but they can be useful for logs parsing and visualization)
    %pip install -q pandas==1.3.5
    %pip install -q numpy==1.26.4
    %pip install -q matplotlib==3.5.0

    # Importing EdgeSimPy components and its built-in libraries (NetworkX and MessagePack)
    from edge_sim_py import *
    import networkx as nx
    import msgpack

    # Importing Matplotlib, Pandas, and NumPy for logs parsing and visualization
    import matplotlib.pyplot as plt
    import pandas as pd
    import numpy as np
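
The cell above repeats its import list in both branches of the `try`/`except`. As an alternative sketch, the standard library's `importlib.util.find_spec` can report whether a module is installed before it is imported, so the imports only need to be written once. The helper name `is_installed` is hypothetical and is not part of EdgeSimPy.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found (hypothetical helper)."""
    return importlib.util.find_spec(module_name) is not None


# Module names taken from the import cell above
required = ["edge_sim_py", "networkx", "msgpack"]
missing = [name for name in required if not is_installed(name)]
print(f"Missing modules: {missing}")
```

If `missing` is not empty, the `%pip install` lines from the cell above can be run before the imports.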

## Loading Datasets from URLs

With the rise of open science and reproducibility, researchers increasingly publish the artifacts of their papers online. With that in mind, EdgeSimPy can load datasets directly from public URLs.

To load an external dataset into EdgeSimPy, we call the `initialize()` method and pass the dataset's URL as the `input_file` argument, as shown below.


In [2]:
# Creating a Simulator object
simulator = Simulator()

# Loading the dataset from an external JSON file
simulator.initialize(input_file="https://raw.githubusercontent.com/EdgeSimPy/edgesimpy-tutorials/master/datasets/sample_dataset1.json")

# Displaying some of the objects loaded from the dataset
for user in User.all():
    print(f"{user}. Coordinates: {user.coordinates}")

User_1. Coordinates: [6, 0]
User_2. Coordinates: [3, 1]
User_3. Coordinates: [2, 2]
User_4. Coordinates: [6, 0]
User_5. Coordinates: [4, 2]
User_6. Coordinates: [3, 3]
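
Because `input_file` accepts both URLs and file paths as strings, it can be useful to check which kind of string you are about to pass. The sketch below uses the standard library's `urllib.parse` for that. The helper name `looks_like_url` is hypothetical, and this is not necessarily how EdgeSimPy itself tells the two apart.

```python
from urllib.parse import urlparse


def looks_like_url(source):
    """Return True if the string has an http(s) scheme and a host (hypothetical helper)."""
    parsed = urlparse(source)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


candidates = [
    "https://raw.githubusercontent.com/EdgeSimPy/edgesimpy-tutorials/master/datasets/sample_dataset1.json",
    "/home/user/my_research/dataset.json",
    "dataset.json",
]

for candidate in candidates:
    kind = "URL" if looks_like_url(candidate) else "local path"
    print(f"{kind}: {candidate}")
```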


## Loading Datasets from Local Files

EdgeSimPy can also load data from local dataset files. In this case, we call the `initialize()` method and pass the location of the local dataset file as the `input_file` argument.

EdgeSimPy accepts both absolute paths (e.g., `/home/user/my_research/dataset.json`) and relative paths (e.g., `my_research/dataset.json`). In the code below, EdgeSimPy loads a dataset from a local file called `dataset.json`.

Note that the `dataset.json` file must be downloaded before calling the `initialize()` method; otherwise, the call will fail. The cell below downloads it with `curl` first.


In [3]:
!curl https://raw.githubusercontent.com/EdgeSimPy/edgesimpy-tutorials/master/datasets/sample_dataset1.json --output dataset.json

# Creating a Simulator object
simulator = Simulator()

# Loading the dataset from the local "dataset.json" file
simulator.initialize(input_file="dataset.json")

# Displaying some of the objects loaded from the dataset
for edge_server in EdgeServer.all():
    print(f"{edge_server}. CPU Capacity: {edge_server.cpu} cores")

EdgeServer_1. CPU Capacity: 8 cores
EdgeServer_2. CPU Capacity: 8 cores
EdgeServer_3. CPU Capacity: 8 cores
EdgeServer_4. CPU Capacity: 8 cores
EdgeServer_5. CPU Capacity: 12 cores
EdgeServer_6. CPU Capacity: 12 cores
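
Since loading fails when the file is missing, a small check before calling `initialize()` can give a clearer error message. The sketch below uses `pathlib` to turn a relative or absolute path into an absolute one and to confirm the file exists. The helper name `resolve_dataset_path` is hypothetical and not part of EdgeSimPy; the temporary file only exists so the example runs anywhere.

```python
import tempfile
from pathlib import Path


def resolve_dataset_path(path_str):
    """Return an absolute path to an existing dataset file (hypothetical helper)."""
    path = Path(path_str).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"Dataset file not found: {path}")
    return path


# Demonstration with a throwaway file standing in for "dataset.json"
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_file = Path(tmp_dir) / "dataset.json"
    sample_file.write_text("{}")

    resolved = resolve_dataset_path(str(sample_file))
    print(f"Found dataset file: {resolved.name}")

    try:
        resolve_dataset_path(str(Path(tmp_dir) / "missing.json"))
    except FileNotFoundError as error:
        print(f"Caught: {error}")
```

The returned path could then be converted with `str()` and passed as `input_file`, in the same way as the `"dataset.json"` string in the cell above.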


## Loading Datasets from Python Dictionaries

In addition to loading datasets from external and local JSON files, EdgeSimPy also reads datasets encoded as Python dictionaries. To use this feature, we pass a valid Python dictionary as the `input_file` argument of the `initialize()` method. In the example below, EdgeSimPy reads a dataset from a Python dictionary containing four users. For simplicity, each user has only two attributes, `id` and `coordinates`; regular `User` objects would have other attributes.


In [4]:
# Creating a Python dictionary representing a sample dataset with a couple of users
my_simplified_dataset = {
    "User": [
        {"attributes": {"id": 1, "coordinates": [1, 1]}, "relationships": {}},
        {"attributes": {"id": 2, "coordinates": [3, 3]}, "relationships": {}},
        {"attributes": {"id": 3, "coordinates": [5, 1]}, "relationships": {}},
        {"attributes": {"id": 4, "coordinates": [0, 0]}, "relationships": {}},
    ]
}

# Creating a Simulator object
simulator = Simulator()

# Loading the dataset from the dictionary "my_simplified_dataset"
simulator.initialize(input_file=my_simplified_dataset)

# Displaying the objects loaded from the dataset
for user in User.all():
    print(f"{user}. Coordinates: {user.coordinates}")

User_1. Coordinates: [1, 1]
User_2. Coordinates: [3, 3]
User_3. Coordinates: [5, 1]
User_4. Coordinates: [0, 0]
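
Writing such dictionaries by hand becomes tedious as scenarios grow. The sketch below builds the same four-user structure programmatically and checks that it survives a round trip through JSON text. The helper name `make_user_entry` is hypothetical, and the sketch assumes that only the `id` and `coordinates` attributes are needed, as in the simplified example above.

```python
import json


def make_user_entry(user_id, coordinates):
    """Build one simplified "User" entry (hypothetical helper)."""
    return {
        "attributes": {"id": user_id, "coordinates": list(coordinates)},
        "relationships": {},
    }


# Same positions as in the hand-written dictionary above
positions = [(1, 1), (3, 3), (5, 1), (0, 0)]

generated_dataset = {
    "User": [make_user_entry(user_id, xy) for user_id, xy in enumerate(positions, start=1)]
}

# Serializing to JSON text and parsing it back yields an identical dictionary
encoded = json.dumps(generated_dataset, indent=4)
round_tripped = json.loads(encoded)
print(round_tripped == generated_dataset)
```

The resulting `generated_dataset` has the same shape as `my_simplified_dataset`, so it could be passed as `input_file` in the same way.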
