<a href="https://colab.research.google.com/github/catafest/colab_google/blob/master/catafest_070.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Understanding the `.proto` File Definition

The **<span style="color:red;">`.proto` file</span>** definition serves as a blueprint or schema for defining the structure of your data in Protocol Buffers. It plays a crucial role in ensuring data consistency and enabling efficient serialization and deserialization across different programming languages and systems.

Here's a breakdown of its key functions:

1.  **Defines Message Structure:**
    *   It specifies the fields within a message.
    *   It defines the data type of each field (e.g., `string`, `int32`, `bool`, or another message type) and whether the field is `repeated`.
    *   It assigns a unique number (tag) to each field, which is used for identifying fields in the binary format.

2.  **Ensures Data Consistency:**
    *   By providing a clear and language-agnostic definition of your data structure, it helps maintain consistency when exchanging data between systems or applications written in different programming languages.

3.  **Used for Code Generation:**
    *   The `.proto` file is the input for the Protocol Buffer compiler (`protoc`).
    *   `protoc` generates code in various programming languages (like Python, Java, C++, Go, etc.) based on the `.proto` definition.
    *   The generated code includes classes and functions that make it easy to serialize (convert data into a binary format for storage or transmission) and deserialize (convert the binary data back into language-specific objects) your defined messages.

In short, a `.proto` file, written in the Protocol Buffers language, is the single definition of your message schema: the generated serialization and deserialization code in every target language is derived from it.

In [25]:
%%writefile simple.proto
syntax = "proto3";

message SimpleMessage {
  string name = 1;
  int32 id = 2;
  bool is_valid = 3;
}

Overwriting simple.proto


In [26]:
# Compile the proto file to generate Python code
!protoc --python_out=. simple.proto

In [27]:
# Import the generated Python code
import simple_pb2

# Create a SimpleMessage object
message = simple_pb2.SimpleMessage()
message.name = "Test Message"
message.id = 123
message.is_valid = True

# Serialize the message to a byte string
serialized_message = message.SerializeToString()
print("Serialized message:")
print(serialized_message)

# Deserialize the message from the byte string
new_message = simple_pb2.SimpleMessage()
new_message.ParseFromString(serialized_message)

# Print the deserialized message
print("\nDeserialized message:")
print(f"Name: {new_message.name}")
print(f"ID: {new_message.id}")
print(f"Is Valid: {new_message.is_valid}")

Serialized message:
b'\n\x0cTest Message\x10{\x18\x01'

Deserialized message:
Name: Test Message
ID: 123
Is Valid: True
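
To see how the field numbers from `simple.proto` show up in the binary format, the serialized bytes printed above can be decoded by hand. The following is a minimal sketch, not the protobuf library's own decoder: `read_varint` and `decode_fields` are hypothetical helper names, and only wire types 0 (varint) and 2 (length-delimited) are handled, which is all this message uses.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry data
        if not (byte & 0x80):             # high bit clear: last byte
            return result, pos
        shift += 7


def decode_fields(buf):
    """Return (field_number, wire_type, value) tuples for wire types 0 and 2."""
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:    # varint (int32, bool, ...)
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited (string, bytes, nested message)
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
        fields.append((field_number, wire_type, value))
    return fields


# The bytes printed by SerializeToString() in the cell above
data = b'\n\x0cTest Message\x10{\x18\x01'
for number, wire_type, value in decode_fields(data):
    print(f"field {number}, wire type {wire_type}: {value!r}")
# field 1, wire type 2: b'Test Message'
# field 2, wire type 0: 123
# field 3, wire type 0: 1
```

Each key byte packs the field number and wire type as `(field_number << 3) | wire_type`, which is why field 1 (a `string`) starts with `\n` (0x0A) and field 2 (an `int32`) starts with `\x10`.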


In [28]:
# Display the content of the generated Python file
!cat simple_pb2.py

# -*- coding: utf-8 -*-
# Generated by the protocol buffer compiler.  DO NOT EDIT!
# source: simple.proto

from google.protobuf import descriptor as _descriptor
from google.protobuf import message as _message
from google.protobuf import reflection as _reflection
from google.protobuf import symbol_database as _symbol_database
# @@protoc_insertion_point(imports)

_sym_db = _symbol_database.Default()




DESCRIPTOR = _descriptor.FileDescriptor(
  name='simple.proto',
  package='',
  syntax='proto3',
  serialized_options=None,
  create_key=_descriptor._internal_create_key,
  serialized_pb=b'\n\x0csimple.proto\";\n\rSimpleMessage\x12\x0c\n\x04name\x18\x01 \x01(\t\x12\n\n\x02id\x18\x02 \x01(\x05\x12\x10\n\x08is_valid\x18\x03 \x01(\x08\x62\x06proto3'
)




_SIMPLEMESSAGE = _descriptor.Descriptor(
  name='SimpleMessage',
  full_name='SimpleMessage',
  filename=None,
  file=DESCRIPTOR,
  containing_type=None,
  create_key=_descriptor._internal_create_key,
  fields=[
    _descriptor.FieldDescrip

This does not work: `--python_out` expects an output directory, not a file name.

In [29]:
# Compile the proto file to generate Python code with a custom name
!protoc --python_out=./catafest_001_test.py simple.proto

./catafest_001_test.py/: Not a directory
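
Because `protoc` chooses the output file name itself, it helps to know which name to expect. The sketch below encodes the naming rule as understood here: `expected_pb2_path` is a hypothetical helper, it assumes the `.proto` file sits in the current directory, and the hyphen-to-underscore replacement is an assumption about the Python generator rather than something shown in this notebook.

```python
from pathlib import PurePosixPath


def expected_pb2_path(proto_file, out_dir="."):
    """Guess where `protoc --python_out=<out_dir> <proto_file>` writes its output."""
    stem = PurePosixPath(proto_file).stem      # "simple.proto" -> "simple"
    module = stem.replace("-", "_") + "_pb2"   # assumption: hyphens become underscores
    return str(PurePosixPath(out_dir) / f"{module}.py")


print(expected_pb2_path("simple.proto"))               # simple_pb2.py
print(expected_pb2_path("simple.proto", "proto_out"))  # proto_out/simple_pb2.py
```

To end up with a different module name, one option is to rename the `.proto` file before compiling; another is to alias the import, for example `import simple_pb2 as catafest_001_test`.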


# Task
Generate multiple Python files from "simple.proto" using the best practice for organizing the output.

## Create output directory

### Subtask:
Create a directory where the generated Python files will be stored (e.g., `proto_out`).


**Reasoning**:
Create a directory to store the generated Python files using the `os` module.



In [33]:
import os

output_dir = "proto_out"
os.makedirs(output_dir, exist_ok=True)

## Compile to directory

### Subtask:
Compile the `simple.proto` file and direct the output to the created directory.


**Reasoning**:
Compile the proto file and output the generated Python code to the `proto_out` directory.



In [41]:
!protoc --python_out={output_dir} simple.proto
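
When several `.proto` files need compiling, the same command can be driven from Python instead of a `!` shell line. This is a minimal sketch, assuming `protoc` is available on `PATH`; `build_protoc_command` and `compile_protos` are hypothetical helper names, and `--proto_path` and `--python_out` are the only compiler flags used.

```python
import os
import shutil
import subprocess


def build_protoc_command(proto_files, out_dir, proto_path="."):
    """Build the argument list for one protoc call covering all given files."""
    return ["protoc", f"--proto_path={proto_path}", f"--python_out={out_dir}", *proto_files]


def compile_protos(proto_files, out_dir="proto_out"):
    """Run protoc; return True on success, False if protoc is missing or fails."""
    if shutil.which("protoc") is None:
        print("protoc not found on PATH; skipping compilation")
        return False
    os.makedirs(out_dir, exist_ok=True)  # protoc does not create the directory
    result = subprocess.run(
        build_protoc_command(proto_files, out_dir),
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        print(result.stderr)
    return result.returncode == 0


# Show the command that would be run for two schema files
print(build_protoc_command(["simple.proto", "complex_001.proto"], "proto_out"))
# To actually compile: compile_protos(["simple.proto", "complex_001.proto"])
```

Passing every file in a single call keeps all generated modules in one directory, which is what the `proto_out` layout in this notebook relies on.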

In [44]:
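import sys

# Put the output directory at the front of the module search path so that
# proto_out/simple_pb2.py takes precedence over the copy in the current directory
output_dir = "proto_out"
if output_dir not in sys.path:
    sys.path.insert(0, output_dir)

# Import the generated module. If simple_pb2 was already imported earlier in
# this session, Python reuses the cached module instead of loading it again.
import simple_pb2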

# Use the generated code as before
message = simple_pb2.SimpleMessage()
message.name = "Message from proto_out"
message.id = 456
message.is_valid = False

# Serialize and deserialize (optional, but demonstrates functionality)
serialized_message = message.SerializeToString()
new_message = simple_pb2.SimpleMessage()
new_message.ParseFromString(serialized_message)

print("Using message from proto_out:")
print(f"Name: {new_message.name}")
print(f"ID: {new_message.id}")
print(f"Is Valid: {new_message.is_valid}")

# Clean up the path (optional, good practice if not permanently needed)
# sys.path.remove(output_dir)

Using message from proto_out:
Name: Message from proto_out
ID: 456
Is Valid: False
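
Editing `sys.path` affects every later import in the session. An alternative is to load one generated file by its explicit path with `importlib`. The following is a sketch under stated assumptions: `load_module_from_path` is a hypothetical helper, and it only suits generated modules that do not import other generated modules, since those imports would still be resolved through `sys.path`.

```python
import importlib.util
import sys
from pathlib import Path


def load_module_from_path(module_name, file_path):
    """Load a module from an explicit file path without touching sys.path."""
    if module_name in sys.modules:
        # Reuse an already-loaded module rather than executing the file twice
        return sys.modules[module_name]
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load {module_name} from {file_path}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module


pb2_path = Path("proto_out") / "simple_pb2.py"
if pb2_path.exists():
    simple_pb2 = load_module_from_path("simple_pb2", pb2_path)
    print(simple_pb2.SimpleMessage(name="loaded by path"))
else:
    print(f"{pb2_path} not found; compile simple.proto first")
```

The early return on `sys.modules` matters in a notebook, where `simple_pb2` may already have been imported by an earlier cell.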


Next, a more complex example that uses nested messages and repeated fields:

In [45]:
%%writefile complex_001.proto
syntax = "proto3";

message NestedMessage {
  string description = 1;
  int32 value = 2;
}

message ComplexMessage {
  string main_name = 1;
  repeated int32 id_list = 2;
  NestedMessage details = 3;
  repeated NestedMessage nested_list = 4;
}

Writing complex_001.proto


In [47]:
# Compile the complex proto file and output to the created directory
!protoc --python_out={output_dir} complex_001.proto

In [48]:
import sys
import os

# Ensure the output directory is in the system path
output_dir = "proto_out"
if output_dir not in sys.path:
    sys.path.append(output_dir)

# Import the generated Python code for complex_001.proto
import complex_001_pb2

# Create a ComplexMessage object
complex_message = complex_001_pb2.ComplexMessage()
complex_message.main_name = "Main Complex Message"

# Add values to the repeated field
complex_message.id_list.extend([101, 102, 103])

# Create and populate the nested message
complex_message.details.description = "Details for the complex message"
complex_message.details.value = 99

# Add multiple nested messages to the repeated nested field
nested1 = complex_001_pb2.NestedMessage()
nested1.description = "First nested item"
nested1.value = 1

nested2 = complex_001_pb2.NestedMessage()
nested2.description = "Second nested item"
nested2.value = 2

complex_message.nested_list.extend([nested1, nested2])

# Serialize the message
serialized_complex_message = complex_message.SerializeToString()
print("Serialized complex message:")
print(serialized_complex_message)

# Deserialize the message
new_complex_message = complex_001_pb2.ComplexMessage()
new_complex_message.ParseFromString(serialized_complex_message)

# Print the deserialized message
print("\nDeserialized complex message:")
print(f"Main Name: {new_complex_message.main_name}")
print(f"ID List: {list(new_complex_message.id_list)}") # Convert repeated field to list for printing
print(f"Details Description: {new_complex_message.details.description}")
print(f"Details Value: {new_complex_message.details.value}")
print("Nested List:")
for item in new_complex_message.nested_list:
    print(f"  Description: {item.description}, Value: {item.value}")

# Clean up the path (optional)
# if output_dir in sys.path:
#     sys.path.remove(output_dir)

Serialized complex message:
b'\n\x14Main Complex Message\x12\x03efg\x1a#\n\x1fDetails for the complex message\x10c"\x15\n\x11First nested item\x10\x01"\x16\n\x12Second nested item\x10\x02'

Deserialized complex message:
Main Name: Main Complex Message
ID List: [101, 102, 103]
Details Description: Details for the complex message
Details Value: 99
Nested List:
  Description: First nested item, Value: 1
  Description: Second nested item, Value: 2
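
The `\x12\x03efg` fragment in the serialized output above is the `id_list` field. In proto3, repeated scalar numeric fields are packed by default, so the three values share one length-delimited record (key `\x12`, length `\x03`) instead of three separate ones. A minimal sketch of unpacking such a payload by hand, with `unpack_varints` as a hypothetical helper name:

```python
def unpack_varints(payload):
    """Decode a packed repeated field: varints laid out back to back."""
    values = []
    current = 0
    shift = 0
    for byte in payload:
        current |= (byte & 0x7F) << shift  # low 7 bits carry data
        if byte & 0x80:                    # high bit set: more bytes follow
            shift += 7
        else:                              # high bit clear: value complete
            values.append(current)
            current = 0
            shift = 0
    return values


# The three payload bytes after the key (0x12) and length (0x03)
print(unpack_varints(b"efg"))  # [101, 102, 103]
```

The ASCII codes of `e`, `f` and `g` are 101, 102 and 103, which is why the ID list looks like text in the byte string. The sketch treats every value as unsigned, so it does not reproduce how negative `int32` values are encoded.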


# Task
Generate a Python script that uses the `simple_pb2` module located in the `proto_out` directory and demonstrate how to execute this script from a shell command within a notebook cell.

## Create a python script

### Subtask:
Write a Python script that imports and uses the generated `simple_pb2` module from the `proto_out` directory.


**Reasoning**:
Write a Python script that imports and uses the `simple_pb2` module from the `proto_out` directory, then save it to a file.



In [50]:
script_content = """
import sys
import os

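# Add the proto_out directory next to this script to the front of the module
# search path, so its simple_pb2 is found first regardless of the working directory
output_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), "proto_out")
if output_dir not in sys.path:
    sys.path.insert(0, output_dir)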

try:
    import simple_pb2

    # Create a SimpleMessage object
    message = simple_pb2.SimpleMessage()
    message.name = "Message from script"
    message.id = 789
    message.is_valid = False

    # Print the message details
    print("Using SimpleMessage from script:")
    print(f"Name: {message.name}")
    print(f"ID: {message.id}")
    print(f"Is Valid: {message.is_valid}")

except ImportError as e:
    print(f"Error importing simple_pb2: {e}")
    print("Please ensure that simple.proto has been compiled and the 'proto_out' directory exists and is in the system path.")

finally:
    # Optional: Clean up the path if not permanently needed
    if output_dir in sys.path:
        sys.path.remove(output_dir)
"""

with open("use_simple_proto.py", "w") as f:
    f.write(script_content)

print("Python script 'use_simple_proto.py' created.")

Python script 'use_simple_proto.py' created.


## Execute the script in shell

### Subtask:
Use a shell command (`!python`) in a notebook cell to run the Python script created in the previous step.


**Reasoning**:
Use a shell command to execute the generated Python script.



In [52]:
!python use_simple_proto.py

Using SimpleMessage from script:
Name: Message from script
ID: 789
Is Valid: False


## Explain the process

### Subtask:
Briefly explain what was done and how it allows using the generated code from a shell context.


**Reasoning**:
Explain how the previous steps created a Python script that uses the generated Protocol Buffers code and how it was executed using a shell command within the notebook.



In [54]:
%%markdown
This step involved creating a Python script named `use_simple_proto.py`. This script was designed to demonstrate how to import and utilize the `simple_pb2` module, which was generated from the `simple.proto` file and placed in the `proto_out` directory during previous steps.

To make the generated module available to the script, the `proto_out` directory was added to Python's system path (`sys.path`) within the script itself.

We then executed this script directly from a notebook cell using the `!python use_simple_proto.py` shell command. This command invokes the Python interpreter to run the specified script, and its output is displayed within the notebook, effectively demonstrating how to leverage generated code from a separate script and execute it within the notebook environment using shell commands.


## Summary:

### Key Findings

*   A Python script named `use_simple_proto.py` was created to import and use the `simple_pb2` module generated from a Protocol Buffers definition.
*   The script adds the `proto_out` directory to `sys.path` to allow Python to locate the generated module.
*   The script demonstrates creating an instance of the `SimpleMessage` class from the `simple_pb2` module, setting its fields (`name`, `id`, `is_valid`), and printing these values.
*   Error handling for `ImportError` is included in the script to inform the user if the `simple_pb2` module is not found.
*   The `use_simple_proto.py` script was successfully executed directly from a notebook cell using the `!python use_simple_proto.py` shell command.
*   Executing the script via the shell command in the notebook displayed the script's output, confirming its successful operation and the ability to use generated code from a separate script executed in this manner.

### Insights or Next Steps

*   This process provides a clear method for integrating generated code (like Protocol Buffers stubs) into workflows within a notebook environment, allowing for testing or utilization of such code without embedding the entire logic directly in the notebook cells.
*   Consider including error handling in the notebook cell itself to catch potential issues during the execution of the external script, providing more robust feedback to the user.
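
The error handling suggested in the last point could look like the following sketch, which replaces the `!python` line with `subprocess.run` so that the exit code and error output can be inspected. `run_command` is a hypothetical helper name, and the script name is the one created earlier in this notebook.

```python
import subprocess
import sys


def run_command(cmd):
    """Run a command, print its output, and report a non-zero exit code."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        print(f"Command failed with exit code {result.returncode}")
        print(result.stderr, end="")
    else:
        print(result.stdout, end="")
    return result


# sys.executable is the interpreter running this notebook kernel
run_command([sys.executable, "use_simple_proto.py"])
```

Using `sys.executable` instead of a bare `python` keeps the script on the same interpreter, and therefore the same installed `protobuf` package, as the notebook itself.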
