# Parsing .sb3 files using Python

## Extracting information from a .sb3 file
Let's assume that we have a simple `hello_world.sb3` file and that we need to extract some information from it. The first step is to understand how a `.sb3` file works behind the scenes, identify its main entities and produce simple text lists and reports. A `.sb3` file is a zip archive, so we copy it to a `.zip` file, unzip it and access its contents (NOTE: This can easily be done in 1-2 lines using Python modules but, for now, let's do it the old-fashioned way so that we have a good understanding of every step).

In [1]:
!rm -rf ./hello_world/
!cp ../sb3_files/hello_world.sb3 ./hello_world.zip
!unzip ./hello_world.zip -d ./hello_world/
!rm -rf ./hello_world.zip

Archive:  ./hello_world.zip
  inflating: ./hello_world/project.json  
  inflating: ./hello_world/83a9787d4cb6f3b7632b4ddfebf74367.wav  
  inflating: ./hello_world/83c36d806dc92327b9e7049a565c6bff.wav  
  inflating: ./hello_world/8f05636838ccb48ecaaa50fa33e286e1.svg  
  inflating: ./hello_world/bcf454acf82e4504149f7ffe07081dbc.svg  
  inflating: ./hello_world/0fb9be3e8397c983338cb71dc84d0b25.svg  


After unzipping, we take a look at the produced files. The `.svg` files correspond to the costumes of the project's sprites (and the backdrops of the stage) and the `.wav` files correspond to their sounds. Finally, the `project.json` file contains all of the important information that we need, such as the sprites, each sprite's blocks, metadata etc.
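
The same steps can be sketched in Python with the standard `zipfile` module, which opens a `.sb3` file directly without renaming it. The helper names `load_project_json` and `extract_sb3` are made up for this illustration and are not used elsewhere in this notebook.

```python
import json
import zipfile


def load_project_json(sb3_path):
    """Read and parse project.json straight from the .sb3 archive."""
    with zipfile.ZipFile(sb3_path) as archive:
        with archive.open("project.json") as f:
            return json.load(f)


def extract_sb3(sb3_path, out_dir):
    """Unpack every file of the archive into out_dir and return the file names."""
    with zipfile.ZipFile(sb3_path) as archive:
        archive.extractall(out_dir)
        return archive.namelist()
```

For example, `load_project_json("../sb3_files/hello_world.sb3")` should return the same dictionary that the following sections load from the unzipped `project.json`.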

## The `project.json` file
Let's take a deeper dive into the `project.json` file. The file is in JSON format, which Python loads as a dictionary, and is composed of the following top-level keys:
1. **targets:** This key corresponds to a list of targets. Each target is essentially another dictionary that represents a Scratch sprite/object (including the stage itself) and contains useful information such as the sprite's name, blocks, costumes, sounds, etc. As you might have expected, the length of the list is equal to the number of sprites in the Scratch project plus one for the stage.
2. **monitors:** This section describes data that is visible on the stage, such as variable watchers, list viewers etc.  
3. **extensions:** This key corresponds to a list of extensions that the current Scratch program uses (e.g. pen, music, etc.).
4. **meta:** This final key corresponds to a dictionary that contains various metadata about the project, such as Scratch version, agent and virtual machine. 

In [2]:
import json

# Load the json file
project_json = None
with open('./hello_world/project.json', 'r', encoding='utf-8') as f:
    project_json = json.load(f)

# Print its contents
for key, value in project_json.items():
    print(f"{key}:")
    print(f"\t{value}")

targets:
	[{'isStage': True, 'name': 'Stage', 'variables': {'`jEk@4|i[#Fk?(8x)AV.-my variable': ['my variable', 0]}, 'lists': {}, 'broadcasts': {}, 'blocks': {}, 'comments': {}, 'currentCostume': 0, 'costumes': [{'assetId': '8f05636838ccb48ecaaa50fa33e286e1', 'name': 'backdrop2', 'bitmapResolution': 1, 'md5ext': '8f05636838ccb48ecaaa50fa33e286e1.svg', 'dataFormat': 'svg', 'rotationCenterX': 188.829980484665, 'rotationCenterY': 130.15914249041822}], 'sounds': [{'assetId': '83a9787d4cb6f3b7632b4ddfebf74367', 'name': 'pop', 'dataFormat': 'wav', 'format': '', 'rate': 48000, 'sampleCount': 1123, 'md5ext': '83a9787d4cb6f3b7632b4ddfebf74367.wav'}], 'volume': 100, 'layerOrder': 0, 'tempo': 60, 'videoTransparency': 50, 'videoState': 'on', 'textToSpeechLanguage': None}, {'isStage': False, 'name': 'Sprite1', 'variables': {}, 'lists': {}, 'broadcasts': {}, 'blocks': {'ncBy$Ax_kKb|E9$f9tAv': {'opcode': 'event_whenflagclicked', 'next': 'DKJj)mi[,`O}sSW|:]lt', 'parent': None, 'inputs': {}, 'fields': 
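
Dumping the raw values gets long very quickly. As a rough alternative, the sketch below reports only the type and size of each top-level value. The function name `summarize_top_level` is hypothetical.

```python
def summarize_top_level(project_json):
    """Return one short line per top-level key: its type and size, or its value."""
    lines = []
    for key, value in project_json.items():
        if isinstance(value, (list, dict)):
            # Containers are summarized instead of printed in full
            lines.append(f"{key}: {type(value).__name__} with {len(value)} item(s)")
        else:
            lines.append(f"{key}: {value!r}")
    return lines


# Usage with the dictionary loaded above:
# for line in summarize_top_level(project_json):
#     print(line)
```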

## Targets
Having codeOrama and eCodeOrama in mind, I think that we should focus on understanding how targets are implemented, how we can edit them and, most importantly, how we can extract information from them.

As I mentioned before, a target is nothing more than a dictionary depicting a Scratch object. More specifically, a target contains the following key properties:
* **isStage:** A boolean value to check if an object is the stage or not.
* **name:** The name of the object.
* **variables:** A dictionary of variables belonging to the object. The keys are unique internal IDs and the values are lists in the form of `[variable name, value]`.
* **lists:** A dictionary of lists belonging to the object. The keys are unique internal IDs and the values are lists in the form of `[list name, list of items]`.
* **broadcasts:** A dictionary of the broadcast messages used for message passing between objects. The keys are unique internal IDs and the values are the message names.
* **blocks:** A dictionary where all the code is stored. We'll look into this further in the next section.
* **comments:** A dictionary containing Scratch comments.
* **currentCostume:** Index of the currently displayed costume.
* **costumes:** A list of the object's costumes. Each costume is depicted as a dictionary.
* **sounds:** A list of the object's sounds. Each sound is depicted as a dictionary.
* **volume:** The object's current volume level (integer between 0 and 100).
* **layerOrder:** Z-index for rendering. The higher the number the more in front it is.
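
To make the `variables` and `lists` layout concrete, here is a minimal sketch that flattens them into rows. It assumes only the `[name, value]` shape described above, and the function name `list_variables` is made up for this example.

```python
def list_variables(targets):
    """Return (owner, kind, name, value) rows for every variable and list."""
    rows = []
    for target in targets:
        for kind in ("variables", "lists"):
            # Each entry maps an internal ID to [name, value]; only the
            # first two elements are read, the ID itself is ignored.
            for entry in target.get(kind, {}).values():
                rows.append((target["name"], kind, entry[0], entry[1]))
    return rows


# Usage with the dictionary loaded earlier:
# for row in list_variables(project_json["targets"]):
#     print(row)
```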

### Stage-specific fields
If `isStage` is `True`, then these additional keys are present:
* **tempo:** Default BPM for music blocks.
* **videoTransparency:** Opacity of webcam feed.
* **videoState:** `on`, `off`, or `on-flipped` depending on the video state.
* **textToSpeechLanguage:** Text to speech language code (if used)

### Sprite-specific fields
If `isStage` is `False`, then these additional keys are present:
* **visible:** Boolean value to show whether the sprite is currently shown or not.
* **x, y:** Position on the stage.
* **size:** Scaling percentage (100 is the original size).
* **direction:** Rotation angle in degrees.
* **draggable:** Whether the user can drag it during runtime or not.
* **rotationStyle:** One of `all around`, `left-right`, or `don't rotate`.
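
The split between stage-specific and sprite-specific keys can be sketched as follows. The key names come from the two lists above, while the constants and the function `specific_fields` are hypothetical names chosen for this example.

```python
STAGE_ONLY_KEYS = ("tempo", "videoTransparency", "videoState", "textToSpeechLanguage")
SPRITE_ONLY_KEYS = ("visible", "x", "y", "size", "direction", "draggable", "rotationStyle")


def specific_fields(target):
    """Return only the keys that are specific to the stage or to a sprite."""
    keys = STAGE_ONLY_KEYS if target["isStage"] else SPRITE_ONLY_KEYS
    # Keys that are missing from the target are skipped instead of raising
    return {key: target[key] for key in keys if key in target}
```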

In [3]:
import json

# Load the json file
project_json = None
with open('./hello_world/project.json', 'r', encoding='utf-8') as f:
    project_json = json.load(f)

# Get the list of objects
object_list = project_json["targets"]

# Print all objects (the loop variable is named `target` to avoid shadowing the built-in `object`)
for target in object_list:
    name = target["name"]
    print(f"-----Object '{name}'-----")
    for key, value in target.items():
        print(f"{key}:")
        print(f"\t{value}")
    print("-------------------------\n\n")

-----Object 'Stage'-----
isStage:
	True
name:
	Stage
variables:
	{'`jEk@4|i[#Fk?(8x)AV.-my variable': ['my variable', 0]}
lists:
	{}
broadcasts:
	{}
blocks:
	{}
comments:
	{}
currentCostume:
	0
costumes:
	[{'assetId': '8f05636838ccb48ecaaa50fa33e286e1', 'name': 'backdrop2', 'bitmapResolution': 1, 'md5ext': '8f05636838ccb48ecaaa50fa33e286e1.svg', 'dataFormat': 'svg', 'rotationCenterX': 188.829980484665, 'rotationCenterY': 130.15914249041822}]
sounds:
	[{'assetId': '83a9787d4cb6f3b7632b4ddfebf74367', 'name': 'pop', 'dataFormat': 'wav', 'format': '', 'rate': 48000, 'sampleCount': 1123, 'md5ext': '83a9787d4cb6f3b7632b4ddfebf74367.wav'}]
volume:
	100
layerOrder:
	0
tempo:
	60
videoTransparency:
	50
videoState:
	on
textToSpeechLanguage:
	None
-------------------------


-----Object 'Sprite1'-----
isStage:
	False
name:
	Sprite1
variables:
	{}
lists:
	{}
broadcasts:
	{}
blocks:
	{'ncBy$Ax_kKb|E9$f9tAv': {'opcode': 'event_whenflagclicked', 'next': 'DKJj)mi[,`O}sSW|:]lt', 'parent': None, 'inputs

## Target blocks
It is essential to understand how commands are actually stored inside the `blocks` field of each target. Each block has a unique id that is used as a key in the `blocks` dictionary. That unique id maps to another dictionary that contains important information regarding the block. Let's go over each field of that dictionary so that we have a better grasp of how information is stored:
* **opcode:** A string that identifies the type of block.
* **next:** The unique id of the block that succeeds this one, or `None` if it is the last block of its script.
* **parent:** The unique id of the block that precedes (or contains) this one, or `None` if it is the first block of its script.
* **inputs:** A dictionary of the input parameters that the block needs.
* **fields:** A dictionary of dropdowns or fixed fields that the block has.
* **shadow:** A boolean value that shows whether the block is itself a shadow block or not. Shadow blocks are placeholder blocks that contain default values, such as the `10` in the `move 10 steps` block.
* **topLevel:** A boolean value that shows whether a block is a top-level block (the first block of a script, with no parent) or is found inside a script.
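
The `next` and `topLevel` fields are enough to rebuild the order of the blocks in each script. The following is a minimal sketch under a few assumptions: it only follows `next` links, so blocks nested inside other blocks (for example the body of a loop) are not visited, and it skips any entry of `blocks` that is not a dictionary. The function name `scripts_as_opcodes` is made up for this example.

```python
def scripts_as_opcodes(blocks):
    """Return one list of opcodes per script, in execution order."""
    scripts = []
    for block_id, block in blocks.items():
        # Start only from top-level block dictionaries
        if not isinstance(block, dict) or not block.get("topLevel"):
            continue
        opcodes = []
        current_id = block_id
        # Walk the chain until `next` is None
        while current_id is not None:
            current = blocks[current_id]
            opcodes.append(current["opcode"])
            current_id = current["next"]
        scripts.append(opcodes)
    return scripts


# Usage with a target from project_json["targets"]:
# print(scripts_as_opcodes(target["blocks"]))
```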

### Block-specific fields
Some blocks might contain extra information. For example, top-level blocks such as `when green flag clicked` also contain `x` and `y` parameters that indicate the position of the block (and the script attached to it) in the Scratch editor workspace.

In [4]:
import json

# Load the json file
project_json = None
with open('./hello_world/project.json', 'r', encoding='utf-8') as f:
    project_json = json.load(f)

# Get the list of objects
object_list = project_json["targets"]

# Print all blocks (the loop variable is named `target` to avoid shadowing the built-in `object`)
for target in object_list:
    name = target["name"]
    print(f"-----Object '{name}'-----")

    # Get current object's blocks
    blocks_dict = target["blocks"]

    # Print each block
    for block_id, data in blocks_dict.items():
        block_code = data["opcode"]
        print(f"Block with id = '{block_id}' and code = '{block_code}':")
        for key, value in data.items():
            print(f"\t{key}\t{value}")
        print()
    print("------------------------\n\n")

-----Object 'Stage'-----
------------------------


-----Object 'Sprite1'-----
Block with id = 'ncBy$Ax_kKb|E9$f9tAv' and code = 'event_whenflagclicked':
	opcode	event_whenflagclicked
	next	DKJj)mi[,`O}sSW|:]lt
	parent	None
	inputs	{}
	fields	{}
	shadow	False
	topLevel	True
	x	464
	y	277

Block with id = 'DKJj)mi[,`O}sSW|:]lt' and code = 'looks_sayforsecs':
	opcode	looks_sayforsecs
	next	None
	parent	ncBy$Ax_kKb|E9$f9tAv
	inputs	{'MESSAGE': [1, [10, 'Hello World!']], 'SECS': [1, [4, '2']]}
	fields	{}
	shadow	False
	topLevel	False

------------------------
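
The `inputs` entry of the `looks_sayforsecs` block above suggests that each input is a list whose second element is itself a short list of the form `[type code, value]`. The sketch below relies on that observed shape only and is not a complete decoder: inputs that point to other blocks are skipped, and the meaning of the type codes is not interpreted. The function name `literal_inputs` is hypothetical.

```python
def literal_inputs(block):
    """Return {input name: value} for inputs stored inline as [type code, value]."""
    literals = {}
    for name, spec in block.get("inputs", {}).items():
        payload = spec[1] if len(spec) > 1 else None
        # Assumption: a list payload holds the value at index 1;
        # any other payload (e.g. the id of another block) is skipped.
        if isinstance(payload, list) and len(payload) >= 2:
            literals[name] = payload[1]
    return literals


say_block = {
    "opcode": "looks_sayforsecs",
    "inputs": {"MESSAGE": [1, [10, "Hello World!"]], "SECS": [1, [4, "2"]]},
}
print(literal_inputs(say_block))
```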




## Parsing
I will now show a script that completely parses the contents of a .sb3 file and prints them clearly on the terminal:

In [5]:
import json

# Load the JSON file
project_json = None
with open('./hello_world/project.json', 'r', encoding='utf-8') as f:
    project_json = json.load(f)

object_list = project_json["targets"]

# Get some statistics
object_count = len(object_list)
costume_count = 0
sound_count = 0
variable_count = 0
list_count = 0
for target in object_list:
    costume_count += len(target["costumes"])
    sound_count += len(target["sounds"])
    variable_count += len(target["variables"])
    list_count += len(target["lists"])
print("This Scratch project consists of:")
print(f"{object_count} objects")
print(f"{costume_count} costumes")
print(f"{sound_count} sounds")
print(f"{variable_count} variables")
print(f"{list_count} lists\n\n")

# Print all objects and blocks in detail
print("Printing all objects and blocks in detail...")
for target in object_list:
    name = target["name"]
    print(f"-----Object '{name}'-----")

    # Print current object's fields
    for key, value in target.items():
        print(f"{key}:")

        # Print all fields except blocks normally
        if key != "blocks":
            print(f"\t{value}")
            continue

        # Print blocks (distinct names so the outer key/value are not overwritten)
        for block_id, data in value.items():
            block_code = data["opcode"]
            print(f"\tBlock with id = '{block_id}' and code = '{block_code}':")
            for field, field_value in data.items():
                print(f"\t\t{field}\t{field_value}")
            print()

    print("-------------------------\n\n")

This Scratch project consists of:
2 objects
3 costumes
2 sounds
1 variables
0 lists


Printing all objects and blocks in detail...
-----Object 'Stage'-----
isStage:
	True
name:
	Stage
variables:
	{'`jEk@4|i[#Fk?(8x)AV.-my variable': ['my variable', 0]}
lists:
	{}
broadcasts:
	{}
blocks:
comments:
	{}
currentCostume:
	0
costumes:
	[{'assetId': '8f05636838ccb48ecaaa50fa33e286e1', 'name': 'backdrop2', 'bitmapResolution': 1, 'md5ext': '8f05636838ccb48ecaaa50fa33e286e1.svg', 'dataFormat': 'svg', 'rotationCenterX': 188.829980484665, 'rotationCenterY': 130.15914249041822}]
sounds:
	[{'assetId': '83a9787d4cb6f3b7632b4ddfebf74367', 'name': 'pop', 'dataFormat': 'wav', 'format': '', 'rate': 48000, 'sampleCount': 1123, 'md5ext': '83a9787d4cb6f3b7632b4ddfebf74367.wav'}]
volume:
	100
layerOrder:
	0
tempo:
	60
videoTransparency:
	50
videoState:
	on
textToSpeechLanguage:
	None
-------------------------


-----Object 'Sprite1'-----
isStage:
	False
name:
	Sprite1
variables:
	{}
lists:
	{}
broadcasts:
	{