# Chapter 6: String Mastery for VLSI Professionals 🔤

## 🎯 **Learning Objectives**
By the end of this chapter, you will master Python strings for professional VLSI automation:

### **Core String Concepts**
- String creation, immutability, and memory management
- Unicode handling for international design teams
- String indexing, slicing, and iteration patterns

### **Essential String Methods (25+ methods)**
- **Creation & Conversion**: `str()`, `.encode()`, `.decode()`
- **Search & Find**: `.find()`, `.rfind()`, `.index()`, `.count()`
- **Validation**: `.startswith()`, `.endswith()`, `.isdigit()`, `.isalpha()`
- **Transformation**: `.upper()`, `.lower()`, `.title()`, `.swapcase()`
- **Cleaning**: `.strip()`, `.lstrip()`, `.rstrip()`, `.replace()`
- **Splitting & Joining**: `.split()`, `.rsplit()`, `.join()`, `.partition()`
- **Formatting**: `.format()`, f-strings, `.center()`, `.ljust()`, `.rjust()`

### **Advanced VLSI Applications**
- **Hierarchical Path Processing**: Module and signal name parsing
- **Report Analysis**: Timing, power, and area report extraction
- **Script Generation**: TCL/SDC/UPF command automation
- **File Management**: Design database organization
- **Performance Optimization**: Large-scale string processing

### **Professional Techniques**
- Regular expressions for complex pattern matching
- String performance optimization and memory management
- Input validation and security for production environments
- Error handling and robust parsing strategies

---

## 🔧 **Why Strings Matter in VLSI**

In VLSI design automation, strings are everywhere:
- **Design Hierarchies**: `cpu_core/alu_unit/adder_inst/sum[31:0]`
- **File Paths**: `/designs/cpu_core/implementation/place_route/cpu_core.def`
- **Tool Commands**: `create_clock -period 10.0 [get_ports clk]`
- **Report Parsing**: Extracting timing violations from million-line reports
- **Configuration Files**: Technology libraries and design constraints

Mastering strings enables powerful automation that saves hours of manual work daily.

In [None]:
# STRING FUNDAMENTALS AND CREATION METHODS
# =========================================
# Deep dive into string creation, properties, and memory behavior

print("🔤 STRING FUNDAMENTALS AND CREATION METHODS")
print("=" * 50)

# =============================================================================
# STRING CREATION METHODS - COMPREHENSIVE COVERAGE
# =============================================================================

print("\n📝 STRING CREATION METHODS:")

# Method 1: String literals (most common)
design_name = "cpu_core"  # Double quotes
technology = 'tsmc28'     # Single quotes (equivalent to double quotes)
print(f"   Literal creation: '{design_name}' and '{technology}'")

# Method 2: Triple quotes for multi-line strings
verilog_module = """
module cpu_core (
    input wire clk,
    input wire reset_n,
    input wire [31:0] instruction,
    output reg [31:0] result
);
    // Module implementation
endmodule
"""
print(f"   Multi-line string length: {len(verilog_module)} characters")

# Method 3: str() constructor
instance_count = 125000
count_string = str(instance_count)
binary_string = str(bin(255))  # bin() already returns the string '0b11111111', so str() is a no-op here
print(f"   str() constructor: '{count_string}' and '{binary_string}'")

# Method 4: Raw strings (no escape processing)
file_path_windows = r"C:\designs\cpu_core\netlist\cpu_core.v"
regex_pattern = r"([a-zA-Z_]\w*)/(\w+)\[(\d+):(\d+)\]"
print(f"   Raw strings: {file_path_windows}")
print(f"   Regex pattern: {regex_pattern}")

# Method 5: Formatted strings (f-strings, format, %)
corner = "ss_0p72v_125c"
temperature = 125
voltage = 0.72

# f-string (Python 3.6+) - preferred method
netlist_f = f"/designs/{design_name}/netlist/{design_name}_{corner}.v"

# .format() method - compatible with older Python
sdc_format = "/designs/{design}/constraints/{design}_{corner}.sdc".format(
    design=design_name, corner=corner
)

# % formatting (legacy, but still used)
report_percent = "/designs/%s/reports/timing_%s_T%.0fC.rpt" % (design_name, corner, temperature)

print(f"   f-string: {netlist_f}")
print(f"   .format(): {sdc_format}")
print(f"   % format: {report_percent}")

# =============================================================================
# STRING PROPERTIES AND IMMUTABILITY
# =============================================================================

print(f"\n🔍 STRING PROPERTIES AND IMMUTABILITY:")

# Basic string properties
signal_path = "cpu_core/alu_unit/adder_inst/sum[31:0]"

print(f"   Signal path: '{signal_path}'")
print(f"   Length: {len(signal_path)} characters")
print(f"   Memory size: {signal_path.__sizeof__()} bytes")
print(f"   Type: {type(signal_path).__name__}")
print(f"   ID (memory location): {id(signal_path)}")

# String immutability demonstration
original_signal = "cpu_core/reg_file"
print(f"\n   Original signal: '{original_signal}' (ID: {id(original_signal)})")

# Strings are immutable - operations create new strings
modified_signal = original_signal.replace("reg_file", "cache")
print(f"   Modified signal: '{modified_signal}' (ID: {id(modified_signal)})")
print(f"   Original unchanged: '{original_signal}' (ID: {id(original_signal)})")

# This would cause an error (uncomment to see):
# original_signal[0] = 'X'  # TypeError: 'str' object does not support item assignment

# Memory efficiency - CPython typically interns short, identifier-like string literals
small_str1 = "clk"
small_str2 = "clk"
print(f"\n   String interning:")
print(f"   small_str1 ID: {id(small_str1)}")
print(f"   small_str2 ID: {id(small_str2)}")
# Usually True in CPython, but this is an implementation detail: compare strings with ==, not 'is'
print(f"   Same object: {small_str1 is small_str2}")

# =============================================================================
# STRING INDEXING AND SLICING
# =============================================================================

print(f"\n🎯 STRING INDEXING AND SLICING:")

# Signal name with bus notation
bus_signal = "data_bus[31:0]"
print(f"   Bus signal: '{bus_signal}'")

# Positive indexing (0-based)
print(f"   First char: bus_signal[0] = '{bus_signal[0]}'")
print(f"   Fifth char: bus_signal[4] = '{bus_signal[4]}'")
print(f"   Last char: bus_signal[{len(bus_signal)-1}] = '{bus_signal[len(bus_signal)-1]}'")

# Negative indexing (from end)
print(f"   Last char: bus_signal[-1] = '{bus_signal[-1]}'")
print(f"   Second last: bus_signal[-2] = '{bus_signal[-2]}'")

# Slicing [start:end:step]
print(f"\n   Slicing examples:")
print(f"   First 4 chars: bus_signal[:4] = '{bus_signal[:4]}'")
print(f"   Last 5 chars: bus_signal[-5:] = '{bus_signal[-5:]}'")
print(f"   Middle section: bus_signal[5:8] = '{bus_signal[5:8]}'")
print(f"   Every 2nd char: bus_signal[::2] = '{bus_signal[::2]}'")
print(f"   Reverse string: bus_signal[::-1] = '{bus_signal[::-1]}'")

# Practical VLSI slicing
hierarchy_path = "cpu_core/alu_unit/adder_inst/carry_out"
print(f"\n   Hierarchy path: '{hierarchy_path}'")
print(f"   Top module: '{hierarchy_path.split('/')[0]}'")
print(f"   Signal name: '{hierarchy_path.split('/')[-1]}'")
print(f"   Parent path: '{'/'.join(hierarchy_path.split('/')[:-1])}'")

# Extract bus width from signal name
def extract_bus_width(signal_name):
    """Return the bus width of a signal like 'data[31:0]' (1 for scalar signals)."""
    if '[' in signal_name and ']' in signal_name:
        start_bracket = signal_name.find('[')
        end_bracket = signal_name.find(']')
        bus_range = signal_name[start_bracket+1:end_bracket]
        if ':' in bus_range:
            msb, lsb = bus_range.split(':')
            # abs() handles both descending [31:0] and ascending [0:31] ranges
            return abs(int(msb) - int(lsb)) + 1
    return 1

test_signals = ["clk", "data[31:0]", "addr[15:0]", "ctrl[7:0]", "enable"]
print(f"\n   Bus width extraction:")
for signal in test_signals:
    width = extract_bus_width(signal)
    print(f"     {signal:12} → {width:2d} bits")

# =============================================================================
# STRING ITERATION AND MEMBERSHIP
# =============================================================================

print(f"\n🔄 STRING ITERATION AND MEMBERSHIP:")

# Character iteration
module_name = "CPU_CORE"
print(f"   Module name: '{module_name}'")
print("   Character iteration:", end=" ")
for char in module_name:
    print(f"'{char}'", end=" ")
print()

# Membership testing
test_chars = ['C', 'P', 'U', '_', 'x', '1']
print(f"   Membership testing in '{module_name}':")
for char in test_chars:
    result = char in module_name
    print(f"     '{char}' in module_name: {result}")

# Substring membership
test_substrings = ["CPU", "CORE", "ALU", "_", "cpu"]
print(f"\n   Substring testing in '{module_name}':")
for substring in test_substrings:
    result = substring in module_name
    print(f"     '{substring}' in module_name: {result}")

# Practical example: Check file extensions
design_files = [
    "cpu_core.v", "cpu_core.sdc", "cpu_core.lib",
    "memory.db", "io_ring.lef", "timing.rpt"
]

valid_extensions = ['.v', '.sv', '.sdc', '.lib', '.db', '.lef']
print(f"\n   File extension validation:")
for filename in design_files:
    is_valid = any(filename.endswith(ext) for ext in valid_extensions)
    status = "✅" if is_valid else "❌"
    print(f"     {filename:15} {status}")

## 🛠️ **Comprehensive String Methods for VLSI Automation**

Python strings provide 25+ essential methods for VLSI automation. Master these for professional development:

### **Search and Find Methods**
- `.find(sub)` / `.rfind(sub)`: Find substring position (returns -1 if not found)
- `.index(sub)` / `.rindex(sub)`: Find substring position (raises exception if not found)  
- `.count(sub)`: Count non-overlapping occurrences
- `.startswith(prefix)` / `.endswith(suffix)`: Check string boundaries
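
As a minimal sketch of how the optional `start` argument of `.find()` can locate every occurrence of a substring, consider the helper below. The name `find_all` is hypothetical (it is not a built-in string method), and the sample path is only illustrative.

```python
from typing import List


def find_all(text: str, sub: str) -> List[int]:
    """Return the start index of every non-overlapping occurrence of sub."""
    if not sub:
        raise ValueError("sub must be a non-empty string")
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Resume the search after the current match so matches never overlap
        start = text.find(sub, start + len(sub))
    return positions


path = "cpu_core/alu_unit/adder_inst/sum[31:0]"
slash_positions = find_all(path, "/")
print(slash_positions)                          # [8, 17, 28]
print(len(slash_positions) == path.count("/"))  # True
```

Because the loop skips past each match, the number of positions found agrees with `.count()`, which also counts non-overlapping occurrences.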

### **Validation and Testing Methods**
- `.isdigit()`, `.isalpha()`, `.isalnum()`: Character type validation
- `.islower()`, `.isupper()`, `.istitle()`: Case checking
- `.isspace()`, `.isdecimal()`, `.isnumeric()`: Specialized validation
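
The three numeric predicates differ mainly on Unicode input, and none of them accepts a sign, a decimal point, or an exponent. The sketch below shows the difference and a small fallback helper for timing values. The name `parse_timing_value` is an assumption made for illustration.

```python
from typing import Optional

# isdecimal() is the strictest test, isnumeric() the most permissive
samples = ["42", "²", "½", "-7", "1e-9", "0.72"]
for s in samples:
    print(f"{s!r:<8} isdecimal={s.isdecimal()!s:<5} "
          f"isdigit={s.isdigit()!s:<5} isnumeric={s.isnumeric()!s:<5}")


def parse_timing_value(text: str) -> Optional[float]:
    """Return the value as a float, or None if float() rejects the text.

    Note: float() also accepts 'nan' and 'inf'; filter those separately
    if they are not meaningful for your reports.
    """
    try:
        return float(text.strip())
    except ValueError:
        return None


print(parse_timing_value("1e-9"))     # 1e-09
print(parse_timing_value("invalid"))  # None
```

For values read from reports, attempting the conversion and catching `ValueError` is usually more reliable than character-level checks.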

### **Case Conversion Methods**
- `.upper()`, `.lower()`: Convert case
- `.title()`, `.capitalize()`: Title and sentence case
- `.swapcase()`: Invert case of all characters

### **Cleaning and Trimming Methods**  
- `.strip()`, `.lstrip()`, `.rstrip()`: Remove whitespace/characters
- `.replace(old, new)`: Replace substrings
- `.translate()`: Character-level translation
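
One pitfall worth illustrating: the optional argument to `.strip()`, `.lstrip()`, and `.rstrip()` is treated as a set of characters, not as a prefix or suffix. The sketch below shows the problem and one possible workaround. `strip_suffix` is a hypothetical helper; on Python 3.9+ the built-in `str.removesuffix()` does the same job.

```python
# rstrip(".v") removes any trailing run of '.' and 'v' characters
print("tb_adder.v".rstrip(".v"))  # 'tb_adder' (happens to look right)
print("div.v".rstrip(".v"))       # 'di' (the 'v' of 'div' is stripped too)


def strip_suffix(name: str, suffix: str) -> str:
    """Remove suffix from name only when name actually ends with it."""
    if suffix and name.endswith(suffix):
        return name[:-len(suffix)]
    return name


print(strip_suffix("div.v", ".v"))       # 'div'
print(strip_suffix("cpu_core.sv", ".v")) # 'cpu_core.sv' (unchanged)
```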

### **Splitting and Joining Methods**
- `.split(sep)`, `.rsplit(sep)`: Split string into list
- `.partition(sep)`, `.rpartition(sep)`: Split into 3-tuple
- `.join(iterable)`: Join sequence into string
- `.splitlines()`: Split on line boundaries
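
Two behaviours are easy to overlook here: `.split()` with no argument collapses runs of whitespace, while `.split(" ")` does not, and `.partition()` always returns exactly three parts. The following sketch illustrates both. The `key = value` line format and the helper name `parse_setting` are assumptions for illustration only.

```python
from typing import Optional, Tuple

line = "  slack   (VIOLATED)    -0.123  "
print(line.split())      # ['slack', '(VIOLATED)', '-0.123']
print(line.split(" "))   # contains empty strings for every extra space


def parse_setting(text: str) -> Optional[Tuple[str, str]]:
    """Split 'key = value' at the first '='; return None if '=' is missing."""
    key, sep, value = text.partition("=")
    if not sep:
        return None
    return key.strip(), value.strip()


print(parse_setting("clock_period = 10.0"))  # ('clock_period', '10.0')
print(parse_setting("expr = a=b"))           # ('expr', 'a=b')
print(parse_setting("no separator here"))    # None
```

Because `.partition()` never raises and never returns a variable number of parts, it avoids the unpacking errors that `.split('=')` can cause when the separator is missing or repeated.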

### **Formatting and Alignment Methods**
- `.format()`: Template-based formatting
- `.center(width)`, `.ljust(width)`, `.rjust(width)`: Text alignment  
- `.zfill(width)`: Zero-pad numbers
- `.expandtabs()`: Convert tabs to spaces

**💡 Pro Tip**: Strings are immutable, so every method that transforms text returns a new string and leaves the original unchanged. This makes calls easy to chain: `text.strip().lower().replace('_', '')`

In [None]:
# COMPREHENSIVE STRING METHODS FOR VLSI AUTOMATION
# =================================================
# Master all essential string methods with VLSI examples

print("🛠️ COMPREHENSIVE STRING METHODS FOR VLSI AUTOMATION")
print("=" * 55)

# =============================================================================
# SEARCH AND FIND METHODS
# =============================================================================

print("\n🔍 SEARCH AND FIND METHODS:")

# Sample timing violation log line
log_line = "ERROR: Setup violation on path cpu_core/reg_a to cpu_core/reg_b, slack = -0.123ns at corner ss_0p72v_125c"

print(f"Log line: {log_line}")
print(f"Length: {len(log_line)} characters")

# .find() vs .index() methods
print(f"\n   find() vs index() comparison:")
print(f"   log_line.find('ERROR'): {log_line.find('ERROR')}")           # Returns position
print(f"   log_line.find('WARNING'): {log_line.find('WARNING')}")       # Returns -1 if not found
print(f"   log_line.rfind('_'): {log_line.rfind('_')}")                 # Find from right

try:
    print(f"   log_line.index('ERROR'): {log_line.index('ERROR')}")      # Returns position
    print(f"   log_line.index('WARNING'): ", end="")
    print(log_line.index('WARNING'))  # This will raise ValueError
except ValueError as e:
    print(f"ValueError - {e}")

# .count() method
print(f"\n   count() method examples:")
print(f"   Underscores in path: log_line.count('_') = {log_line.count('_')}")
print(f"   Letter 'r' count: log_line.count('r') = {log_line.count('r')}")
print(f"   'reg' occurrences: log_line.count('reg') = {log_line.count('reg')}")

# .startswith() and .endswith() methods
test_files = [
    "cpu_core.v", "memory_ctrl.sv", "io_ring.sdc",
    "tech_lib.lib", "timing_report.txt", "layout.def"
]

print(f"\n   File type classification:")
verilog_extensions = ('.v', '.sv')
constraint_extensions = ('.sdc', '.upf')
library_extensions = ('.lib', '.db')

for filename in test_files:
    if filename.endswith(verilog_extensions):
        file_type = "Verilog"
    elif filename.endswith(constraint_extensions):
        file_type = "Constraints"
    elif filename.endswith(library_extensions):
        file_type = "Library"
    else:
        file_type = "Other"
    print(f"     {filename:18} → {file_type}")

# =============================================================================
# VALIDATION AND TESTING METHODS
# =============================================================================

print(f"\n✅ VALIDATION AND TESTING METHODS:")

# Signal name validation examples
signal_names = [
    "clk", "reset_n", "data_in_32", "123invalid",
    "VALID_SIGNAL", "signal-with-dash", "signal_with_space ", "signal123"
]

print(f"   Signal name validation:")
print(f"   {'Signal Name':<20} {'Alpha':<6} {'Alnum':<6} {'Digit':<6} {'Lower':<6} {'Upper':<6}")
print(f"   {'-'*20:<20} {'-'*5:<6} {'-'*5:<6} {'-'*5:<6} {'-'*5:<6} {'-'*5:<6}")

for signal in signal_names:
    clean_signal = signal.strip()  # Remove trailing spaces
    # Remove bus notation for testing
    base_signal = clean_signal.split('[')[0].replace('_', 'a')  # Replace _ for alpha test

    print(f"   {signal:<20} {base_signal.isalpha()!s:<6} {clean_signal.replace('_','').isalnum()!s:<6} "
          f"{clean_signal.isdigit()!s:<6} {clean_signal.islower()!s:<6} {clean_signal.isupper()!s:<6}")

# Numeric validation for timing values
timing_values = ["10.5", "0.123", "-2.45", "1e-9", "invalid", "123", ""]

print(f"\n   Timing value validation:")
for value in timing_values:
    # Rough character-level checks: remove sign, decimal point, and exponent markers first
    is_decimal = value.replace('.', '').replace('-', '').replace('e', '').replace('+', '').isdigit() if value else False
    is_numeric = value.replace('.', '').replace('-', '').isdigit() if value else False

    try:
        float(value)  # float("") raises ValueError, so empty strings are rejected too
        is_convertible = True
    except ValueError:
        is_convertible = False

    print(f"     {value:<10} → Numeric: {is_numeric!s:<5} Decimal: {is_decimal!s:<5} Convertible: {is_convertible!s:<5}")

# =============================================================================
# CASE CONVERSION METHODS
# =============================================================================

print(f"\n🔤 CASE CONVERSION METHODS:")

# Module names from different sources
module_names = ["CPU_Core", "alu_unit", "MEMORY_CTRL", "io_Ring", "cache_L1"]

print(f"   Module name standardization:")
print(f"   {'Original':<12} {'lower()':<12} {'upper()':<12} {'title()':<12} {'capitalize()':<12} {'swapcase()':<12}")
print(f"   {'-'*11:<12} {'-'*11:<12} {'-'*11:<12} {'-'*11:<12} {'-'*11:<12} {'-'*11:<12}")

for name in module_names:
    print(f"   {name:<12} {name.lower():<12} {name.upper():<12} {name.title():<12} "
          f"{name.capitalize():<12} {name.swapcase():<12}")

# Corner name formatting
corner_raw = "ss_0p72v_125c"
print(f"\n   Corner name formatting:")
print(f"   Raw corner: '{corner_raw}'")
print(f"   Title case: '{corner_raw.title()}'")
print(f"   Formatted:  '{corner_raw.replace('p', '.').replace('_', ' ').title()}'")

# =============================================================================
# CLEANING AND TRIMMING METHODS
# =============================================================================

print(f"\n🧹 CLEANING AND TRIMMING METHODS:")

# Parse messy signal names from tool outputs
messy_signals = [
    "  signal_name  ", "\tclk\n", "reset_n\r\n",
    "  data[31:0]  \t", "enable   ", "\n\n  addr_bus  \n"
]

print(f"   Signal cleaning:")
for signal in messy_signals:
    original_repr = repr(signal)
    cleaned = signal.strip()
    print(f"     {original_repr:<20} → '{cleaned}'")

# Character replacement for file path normalization
file_paths = [
    "cpu-core/alu-unit", "memory ctrl/cache", "io ring\\layout",
    "design@version2", "module#temp.v"
]

print(f"\n   File path normalization:")
for path in file_paths:
    # Replace problematic characters
    normalized = (path.replace('-', '_')
                      .replace(' ', '_')
                      .replace('\\', '/')
                      .replace('@', '_v')
                      .replace('#', '_'))
    print(f"     '{path}' → '{normalized}'")

# Advanced cleaning with .translate()
print(f"\n   Advanced character removal with translate():")
signal_with_special = "signal@#$%^&*()name_123"
# Create a translation table that deletes the listed special characters
remove_chars = "!@#$%^&*()"
translator = str.maketrans('', '', remove_chars)
cleaned_signal = signal_with_special.translate(translator)
print(f"     Original: '{signal_with_special}'")
print(f"     Cleaned:  '{cleaned_signal}'")

# =============================================================================
# SPLITTING AND JOINING METHODS
# =============================================================================

print(f"\n✂️ SPLITTING AND JOINING METHODS:")

# Hierarchical path processing
hierarchy_path = "cpu_core/alu_unit/adder_inst/sum_output"

print(f"   Hierarchy path: '{hierarchy_path}'")
print(f"   .split('/'):     {hierarchy_path.split('/')}")
print(f"   .rsplit('/', 1): {hierarchy_path.rsplit('/', 1)}")  # Split only once from right

# .partition() vs .split() for simple splits
signal_with_bus = "data_bus[31:0]"
print(f"\n   Signal: '{signal_with_bus}'")
print(f"   .partition('['):  {signal_with_bus.partition('[')}")
print(f"   .rpartition('['): {signal_with_bus.rpartition('[')}")

# Multiple separators using replace + split
mixed_separators = "cpu_core.alu_unit/adder-inst"
print(f"\n   Mixed separators: '{mixed_separators}'")
normalized = mixed_separators.replace('.', '/').replace('-', '_')
parts = normalized.split('/')
print(f"   Normalized and split: {parts}")

# .join() method for path reconstruction
path_parts = ["designs", "cpu_core", "implementation", "route"]
unix_path = "/".join(path_parts)
windows_path = "\\".join(path_parts)
dot_notation = ".".join(path_parts)

print(f"\n   Path reconstruction:")
print(f"   Parts: {path_parts}")
print(f"   Unix:    '{unix_path}'")
print(f"   Windows: '{windows_path}'")
print(f"   Dot:     '{dot_notation}'")

# .splitlines() for multi-line report processing
timing_report_sample = """Path 1: cpu_core/reg_a to cpu_core/reg_b
slack (VIOLATED) -0.123
Path 2: cpu_core/reg_c to cpu_core/reg_d
slack (MET) 0.456"""

print(f"\n   Report line processing:")
lines = timing_report_sample.splitlines()
for i, line in enumerate(lines):
    if line.strip():  # Skip empty lines
        print(f"     Line {i+1}: '{line.strip()}'")

## 📐 **String Formatting and Alignment Methods**

Professional VLSI reports require precise formatting and alignment:

### **Template-Based Formatting**
- **f-strings** (Python 3.6+): `f"Design: {name}, Area: {area:.2f}µm²"`
- **.format() method**: `"Design: {}, Area: {:.2f}µm²".format(name, area)`
- **% formatting**: `"Design: %s, Area: %.2f µm²" % (name, area)`

### **Alignment and Padding Methods**
- `.center(width, fillchar)`: Center text within specified width
- `.ljust(width, fillchar)`: Left-justify text
- `.rjust(width, fillchar)`: Right-justify text  
- `.zfill(width)`: Zero-pad numbers from left

### **Advanced Formatting**
- `.expandtabs(tabsize)`: Convert tabs to spaces
- Format specifications: `{:>10.2f}` (right-align, width 10, 2 decimals)
- Thousands separators: `{:,}` for numbers like `1,234,567`
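
Format specifications can themselves contain replacement fields, which lets the column width and precision be computed at run time. The sketch below assumes a small list of path/slack pairs. The helper name `format_row` and the sample data are illustrative only.

```python
rows = [
    ("cpu_core/reg_a", -0.123),
    ("memory_ctrl/reg_c", -0.089),
    ("io_ring/reg_d", 0.234),
]

# Size the first column to the longest name instead of hard-coding a width
name_width = max(len(name) for name, _ in rows)


def format_row(name: str, slack: float, width: int, precision: int) -> str:
    """Left-align the name and right-align the signed slack in 8 columns."""
    return f"{name:<{width}}  {slack:>+8.{precision}f}"


for name, slack in rows:
    print(format_row(name, slack, name_width, 3))

print(f"{1234567:,}")  # 1,234,567
```

Computing `name_width` from the data keeps the table aligned even when a longer hierarchy path appears in a later run.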

### **VLSI Report Formatting Applications**
- **Timing Reports**: Align path delays and slack values
- **Area Reports**: Format instance counts and areas
- **Power Reports**: Display power consumption with units
- **Pin Maps**: Create structured signal listings

In [None]:
# STRING FORMATTING AND ALIGNMENT FOR PROFESSIONAL REPORTS
# =========================================================
# Master formatting methods for high-quality VLSI reports

print("📐 STRING FORMATTING AND ALIGNMENT FOR PROFESSIONAL REPORTS")
print("=" * 65)

# =============================================================================
# TEMPLATE-BASED FORMATTING METHODS
# =============================================================================

print("\n📋 TEMPLATE-BASED FORMATTING METHODS:")

# Design data for formatting examples
design_data = {
    'name': 'cpu_core',
    'instances': 125000,
    'area': 1250.45,
    'power': 0.0825,
    'frequency': 1000.0,
    'corners': ['ss_0p72v_125c', 'tt_0p8v_25c', 'ff_0p88v_m40c']
}

# Method 1: f-strings (Python 3.6+) - Recommended
print("   f-string formatting (Python 3.6+):")
area_report_f = f"Design: {design_data['name']}, Instances: {design_data['instances']:,}, Area: {design_data['area']:.2f}µm²"
power_report_f = f"Power: {design_data['power']:.4f}W @ {design_data['frequency']:.1f}MHz"
print(f"     {area_report_f}")
print(f"     {power_report_f}")

# Method 2: .format() method - Universal compatibility
print("\n   .format() method:")
area_report_fmt = "Design: {name}, Instances: {instances:,}, Area: {area:.2f}µm²".format(**design_data)
power_report_fmt = "Power: {power:.4f}W @ {frequency:.1f}MHz".format(**design_data)
print(f"     {area_report_fmt}")
print(f"     {power_report_fmt}")

# Method 3: % formatting - Legacy but still used
print("\n   % formatting (legacy):")
area_report_pct = "Design: %s, Instances: %s, Area: %.2fµm²" % (
    design_data['name'], f"{design_data['instances']:,}", design_data['area']
)
print(f"     {area_report_pct}")

# Advanced f-string formatting
print("\n   Advanced f-string formatting:")
for corner in design_data['corners']:
    # Extract voltage and temperature from corner name (e.g. 'ff_0p88v_m40c')
    parts = corner.split('_')
    voltage = parts[1].rstrip('v').replace('p', '.')  # '0p88v' -> '0.88'
    temp = parts[2].rstrip('c').replace('m', '-')     # 'm40c'  -> '-40'

    formatted_corner = f"{corner:>15} → Voltage: {voltage:>5}V, Temperature: {temp:>4}°C"
    print(f"     {formatted_corner}")

# =============================================================================
# ALIGNMENT AND PADDING METHODS
# =============================================================================

print(f"\n📏 ALIGNMENT AND PADDING METHODS:")

# Sample timing paths for report formatting
timing_paths = [
    {'path': 'cpu_core/reg_a', 'slack': -0.123, 'delay': 2.45},
    {'path': 'cpu_core/reg_b', 'slack': 0.456, 'delay': 1.23},
    {'path': 'memory_ctrl/reg_c', 'slack': -0.089, 'delay': 3.78},
    {'path': 'io_ring/reg_d', 'slack': 0.234, 'delay': 0.95}
]

# Create aligned timing report
print("   Aligned timing report:")
print("     " + "Path Name".ljust(20) + "Slack".rjust(10) + "Delay".rjust(10))
print("     " + "-" * 20 + "-" * 10 + "-" * 10)

for path_data in timing_paths:
    path_name = path_data['path'].ljust(20)
    slack_str = f"{path_data['slack']:+.3f}".rjust(10)
    delay_str = f"{path_data['delay']:.3f}".rjust(10)
    print(f"     {path_name}{slack_str}{delay_str}")

# .center(), .ljust(), .rjust() demonstrations
print(f"\n   Text alignment examples:")
module_name = "CPU_CORE"
print(f"     Original: '{module_name}'")
print(f"     .center(20, '-'): '{module_name.center(20, '-')}'")
print(f"     .ljust(20, '.'): '{module_name.ljust(20, '.')}'")
print(f"     .rjust(20, '*'): '{module_name.rjust(20, '*')}'")

# .zfill() for instance numbering
print(f"\n   Instance numbering with .zfill():")
instance_numbers = [1, 25, 456, 7890, 12345]
for num in instance_numbers:
    padded = str(num).zfill(6)
    instance_name = f"inst_{padded}"
    print(f"     {num:5d} ‚Üí '{instance_name}'")

# =============================================================================
# ADVANCED FORMATTING SPECIFICATIONS
# =============================================================================

print(f"\n🎯 ADVANCED FORMATTING SPECIFICATIONS:")

# Format specification syntax: {[field_name]:[format_spec]}
# format_spec: [[fill]align][sign][#][0][width][,][.precision][type]

metrics_data = [
    {'metric': 'Area', 'value': 1250.456789, 'unit': 'µm²'},
    {'metric': 'Power', 'value': 0.082567, 'unit': 'W'},
    {'metric': 'Frequency', 'value': 1000.0, 'unit': 'MHz'},
    {'metric': 'Instances', 'value': 125000, 'unit': 'count'}
]

print("   Advanced format specifications:")
print("     " + "Metric".ljust(12) + "Value".rjust(15) + "Unit".rjust(8))
print("     " + "-" * 12 + "-" * 15 + "-" * 8)

for data in metrics_data:
    metric = data['metric'].ljust(12)

    # Different format specifications based on data type
    # Width 15 matches the "Value" header column so the table stays aligned
    if data['metric'] == 'Instances':
        value = f"{data['value']:>15,}"  # Thousands separator
    elif data['metric'] == 'Frequency':
        value = f"{data['value']:>15.1f}"  # 1 decimal place
    else:
        value = f"{data['value']:>15.6f}"  # 6 decimal places

    unit = data['unit'].rjust(8)
    print(f"     {metric}{value}{unit}")

# Binary and hex formatting for register values
print(f"\n   Number base formatting:")
register_values = [255, 1024, 65535, 4095]
print("     " + "Decimal".rjust(8) + "Binary".rjust(20) + "Hex".rjust(8))
print("     " + "-" * 8 + "-" * 20 + "-" * 8)

for value in register_values:
    decimal = f"{value:>8d}"
    binary = f"{value:>20b}"
    hexadecimal = f"{value:>8X}"
    print(f"     {decimal}{binary}{hexadecimal}")

# =============================================================================
# MULTI-LINE STRING FORMATTING
# =============================================================================

print(f"\n📄 MULTI-LINE STRING FORMATTING:")

# Create a formatted SDC constraint file
sdc_template = """
# SDC Constraints for {design_name}
# Generated automatically for {technology} technology

# Clock Definition
create_clock -period {period:.2f} -name {clock_name} [get_ports {clock_port}]
set_clock_uncertainty {uncertainty:.3f} [get_clocks {clock_name}]

# Input/Output Delays
set_input_delay {input_delay:.2f} -clock {clock_name} [all_inputs]
set_output_delay {output_delay:.2f} -clock {clock_name} [all_outputs]

# Operating Conditions
set_operating_conditions {corner}
set_voltage {voltage:.2f} [get_ports VDD]
set_temperature {temperature}

# Load and Drive Strength
set_load {load_cap:.3f} [all_outputs]
set_driving_cell -lib_cell {drive_cell} [all_inputs]
""".strip()

# Format the template with actual values
sdc_content = sdc_template.format(
    design_name="cpu_core",
    technology="TSMC28",
    period=10.0,
    clock_name="clk",
    clock_port="clk",
    uncertainty=0.1,
    input_delay=2.0,
    output_delay=1.5,
    corner="ss_0p72v_125c",
    voltage=0.72,
    temperature=125,
    load_cap=0.01,
    drive_cell="BUFX4"
)

print("   Generated SDC file:")
print("   " + "\n   ".join(sdc_content.split('\n')[:10]))  # Show first 10 lines
print("   ... (truncated)")

# Table formatting for pin mapping
print(f"\n   Pin mapping table:")
pin_data = [
    ('clk', 'input', 1, 'A1'),
    ('reset_n', 'input', 1, 'A2'),
    ('data_in', 'input', 32, 'B1-B32'),
    ('data_out', 'output', 32, 'C1-C32'),
    ('valid', 'output', 1, 'D1')
]

# Create formatted table
header = f"{'Pin Name':<12} {'Direction':<10} {'Width':<6} {'Location':<10}"
separator = "-" * len(header)
print(f"     {header}")
print(f"     {separator}")

for pin_name, direction, width, location in pin_data:
    row = f"{pin_name:<12} {direction:<10} {width:<6} {location:<10}"
    print(f"     {row}")

print(f"\n🏆 STRING FORMATTING BENEFITS:")
print("✅ **Professional Reports**: Clean, aligned output for presentations")
print("✅ **Automated Scripts**: Template-based file generation")
print("✅ **Data Visualization**: Structured display of metrics and results")
print("✅ **Documentation**: Consistent formatting across all outputs")
print("✅ **Debugging**: Clear formatting aids in troubleshooting")
print("✅ **Standards Compliance**: Meet industry formatting requirements")

## 🚀 **Advanced String Operations and Performance**

Professional VLSI automation demands sophisticated string handling:

### **Regular Expressions for Complex Parsing**
- **Pattern Matching**: Extract data from timing reports, netlists, and logs
- **Validation**: Verify signal names, file formats, and constraint syntax
- **Substitution**: Bulk text replacements in large design files
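
The three uses listed above can be sketched with the standard `re` module. The naming rule encoded in `SIGNAL_NAME` and the `reg_` to `ff_` rename are assumptions chosen for illustration, not rules taken from any particular tool.

```python
import re

# Assumed naming rule: a letter or underscore, then letters, digits, or underscores
SIGNAL_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Named groups make the extracted fields self-describing
BUS_SIGNAL = re.compile(r"(?P<name>[A-Za-z_]\w*)\[(?P<msb>\d+):(?P<lsb>\d+)\]")


def is_valid_signal_name(name: str) -> bool:
    """Validation: the whole string must match, not just a prefix."""
    return SIGNAL_NAME.fullmatch(name) is not None


# Pattern matching: pull the bus bounds out of a signal name
match = BUS_SIGNAL.fullmatch("data_bus[31:0]")
width = abs(int(match["msb"]) - int(match["lsb"])) + 1
print(match["name"], width)  # data_bus 32

# Substitution: rename every 'reg_<x>' to 'ff_<x>' in one pass
renamed = re.sub(r"\breg_(\w+)", r"ff_\1", "cpu_core/reg_a to cpu_core/reg_b")
print(renamed)  # cpu_core/ff_a to cpu_core/ff_b

print(is_valid_signal_name("reset_n"), is_valid_signal_name("123invalid"))  # True False
```

Using `fullmatch()` rather than `match()` for validation avoids accepting strings that merely start with a valid name.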

### **Performance Optimization Techniques**
- **Efficient Concatenation**: Use `.join()` instead of `+=` for large datasets
- **Memory Management**: Stream processing for multi-gigabyte files
- **Caching**: Store frequently used string operations
- **Generator Functions**: Process large files without loading into memory
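
A minimal sketch of two of these techniques, assuming a report in which violating paths are marked with the word `VIOLATED`: a generator yields matching lines one at a time, and the summary is assembled with a single `.join()` call. The function names are hypothetical, and `io.StringIO` stands in for a large file opened with `open(path, encoding="utf-8")`.

```python
import io
from typing import Iterable, Iterator


def iter_violations(lines: Iterable[str]) -> Iterator[str]:
    """Yield stripped lines containing 'VIOLATED' without storing the whole input."""
    for line in lines:
        if "VIOLATED" in line:
            yield line.strip()


def build_summary(violations: Iterable[str]) -> str:
    """Number each violation and join once instead of concatenating with +=."""
    return "\n".join(f"{i}: {text}" for i, text in enumerate(violations, start=1))


# File objects are iterated line by line, so only one line is held at a time
report = io.StringIO(
    "slack (VIOLATED) -0.123\n"
    "slack (MET) 0.456\n"
    "slack (VIOLATED) -0.089\n"
)
summary = build_summary(iter_violations(report))
print(summary)
```

Repeated `+=` on strings may copy the accumulated text on each step, whereas `.join()` builds the result in one pass. The difference tends to matter only for large inputs.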

### **Professional Error Handling**
- **Input Validation**: Sanitize user inputs and file paths
- **Encoding Handling**: Support UTF-8 for international teams
- **Fallback Strategies**: Graceful handling of malformed data
- **Security**: Prevent injection attacks in generated scripts
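
As a hedged sketch of the validation, encoding, and security points above, the helpers below check a port name against an allowlist before it is placed into a generated command and decode report bytes with a fallback. The allowlist pattern and all function names are assumptions for illustration. Real projects may need to permit bus brackets or escaped identifiers.

```python
import re

# Assumed allowlist: plain identifiers only, so Tcl metacharacters never get through
_SAFE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def validate_port_name(name: str) -> str:
    """Return name unchanged if it matches the allowlist, else raise ValueError."""
    if not _SAFE_NAME.fullmatch(name):
        raise ValueError(f"unsafe port name: {name!r}")
    return name


def make_create_clock(port: str, period_ns: float) -> str:
    """Build a create_clock command from validated inputs."""
    period = float(period_ns)  # raises ValueError for non-numeric strings
    if period <= 0:
        raise ValueError(f"period must be positive, got {period}")
    return f"create_clock -period {period:.3f} [get_ports {validate_port_name(port)}]"


def decode_report(raw: bytes) -> str:
    """Decode as UTF-8, replacing undecodable bytes with U+FFFD instead of failing."""
    return raw.decode("utf-8", errors="replace")


print(make_create_clock("clk", 10))     # create_clock -period 10.000 [get_ports clk]
print(decode_report(b"clk\xff"))        # 'clk' followed by the replacement character
```

Rejecting unexpected input outright is generally safer than trying to escape it when the output is a script that another tool will execute.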

### **Real-World Applications**
- **Tool Integration**: Parse outputs from 20+ EDA tools
- **Report Generation**: Create professional documentation
- **Script Automation**: Generate TCL/Python/Shell scripts
- **Database Processing**: Handle millions of design objects efficiently

In [None]:
# ADVANCED STRING OPERATIONS AND PROFESSIONAL TECHNIQUES
# =======================================================
# Pattern matching, performance optimization, and production-ready code

print("🚀 ADVANCED STRING OPERATIONS AND PROFESSIONAL TECHNIQUES")
print("=" * 60)

# =============================================================================
# REGULAR EXPRESSIONS FOR COMPLEX PARSING
# =============================================================================

print("\n🎯 REGULAR EXPRESSIONS FOR COMPLEX PARSING:")

import re
import time
from typing import Dict, List, Optional, Tuple

# Complex timing report parsing with regex
timing_report_complex = """
Information: Timing report generated at Fri Sep 05 14:30:25 2025
Design: cpu_core
Technology: tsmc28nm_hpc
Corner: ss_0p72v_125c

Startpoint: cpu_core/alu_unit/reg_a (rising edge-triggered flip-flop clocked by clk)
Endpoint: cpu_core/mem_unit/reg_b (rising edge-triggered flip-flop clocked by clk)
Path Group: clk
Path Type: max

  Point                                    Incr       Path
  --------------------------------------------------------
  clock clk (rise edge)                    0.00       0.00
  clock network delay (ideal)              0.15       0.15
  cpu_core/alu_unit/reg_a/CK (DFFX1)       0.00       0.15 r
  cpu_core/alu_unit/reg_a/Q (DFFX1)        0.45       0.60 f
  cpu_core/alu_unit/add_inst/sum[0] (ADD32) 0.85     1.45 f
  cpu_core/mem_unit/reg_b/D (DFFX1)        0.05       1.50 f
  data arrival time                                   1.50

  clock clk (rise edge)                    1.00       1.00
  clock network delay (ideal)              0.15       1.15
  cpu_core/mem_unit/reg_b/CK (DFFX1)       0.00       1.15 r
  library setup time                      -0.25       0.90
  data required time                                  0.90
  --------------------------------------------------------
  data required time                                  0.90
  data arrival time                                  -1.50
  --------------------------------------------------------
  slack (VIOLATED)                                    -0.60
"""

class TimingReportParser:
    """Professional timing report parser using regex"""

    def __init__(self):
        # Compile regex patterns for efficiency
        self.patterns = {
            'header_info': re.compile(r'^(\w+):\s+(.+)$', re.MULTILINE),
            'startpoint': re.compile(r'Startpoint:\s+([^\s]+)'),
            'endpoint': re.compile(r'Endpoint:\s+([^\s]+)'),
            'slack': re.compile(r'slack\s+\(([^)]+)\)\s+([-+]?\d*\.?\d+)'),
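            # Match only "<instance/pin> (<CELL>) <incr> <path> [r|f]" rows.
            # [ \t] and [^)\n] keep each match on one line; \s and [^()] would
            # let a match swallow several preceding lines of the report.
            'path_point': re.compile(r'^[ \t]*(\S+)[ \t]+\(([^)\n]+)\)[ \t]+([-+]?\d*\.?\d+)[ \t]+([-+]?\d*\.?\d+)[ \t]*([rf]?)$', re.MULTILINE),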
            'clock_period': re.compile(r'clock\s+(\w+)\s+\(rise edge\)\s+([-+]?\d*\.?\d+)'),
            'library_cell': re.compile(r'/([A-Z0-9_]+)\)'),
        }

    def parse_report(self, report_text: str) -> Dict:
        """Parse timing report and extract all relevant information"""
        result = {
            'header': {},
            'startpoint': None,
            'endpoint': None,
            'slack_status': None,
            'slack_value': None,
            'path_points': [],
            'clock_period': None,
            'violations': []
        }

        # Parse header information
        for match in self.patterns['header_info'].finditer(report_text):
            key = match.group(1).lower()
            value = match.group(2).strip()
            result['header'][key] = value

        # Parse startpoint and endpoint
        start_match = self.patterns['startpoint'].search(report_text)
        if start_match:
            result['startpoint'] = start_match.group(1)

        end_match = self.patterns['endpoint'].search(report_text)
        if end_match:
            result['endpoint'] = end_match.group(1)

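        # Parse slack information
        slack_match = self.patterns['slack'].search(report_text)
        if slack_match:
            result['slack_status'] = slack_match.group(1)
            result['slack_value'] = float(slack_match.group(2))
            # Record the path as a violation when the slack is negative
            if result['slack_value'] < 0:
                result['violations'].append({
                    'startpoint': result['startpoint'],
                    'endpoint': result['endpoint'],
                    'slack': result['slack_value'],
                })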

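        # Parse clock period: the first "rise edge" row is the launch edge (0.00)
        # and the last is the capture edge, so the period is their difference.
        # This assumes a single-cycle path between two rise edges of one clock.
        clock_edges = self.patterns['clock_period'].findall(report_text)
        if len(clock_edges) >= 2:
            result['clock_period'] = float(clock_edges[-1][1]) - float(clock_edges[0][1])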

        # Parse path points (simplified for demonstration)
        path_matches = self.patterns['path_point'].finditer(report_text)
        for match in path_matches:
            point_info = {
                'instance': match.group(1).strip(),
                'cell_type': match.group(2),
                'incremental_delay': float(match.group(3)),
                'path_delay': float(match.group(4)),
                'transition': match.group(5) if match.group(5) else 'n/a'
            }
            result['path_points'].append(point_info)

        return result

# Demonstrate advanced parsing
parser = TimingReportParser()
parsed_data = parser.parse_report(timing_report_complex)

print("   Parsed timing report:")
print(f"     Design: {parsed_data['header'].get('design', 'Unknown')}")
print(f"     Technology: {parsed_data['header'].get('technology', 'Unknown')}")
print(f"     Corner: {parsed_data['header'].get('corner', 'Unknown')}")
print(f"     Startpoint: {parsed_data['startpoint']}")
print(f"     Endpoint: {parsed_data['endpoint']}")
print(f"     Slack: {parsed_data['slack_value']:.3f}ns ({parsed_data['slack_status']})")
print(f"     Path points: {len(parsed_data['path_points'])}")

# Signal name validation with complex regex
print(f"\n   Advanced signal validation:")
signal_validation_pattern = re.compile(r'^[a-zA-Z_][a-zA-Z0-9_]*(?:\[[0-9]+(?::[0-9]+)?\])?$')

test_signals = [
    "clk", "reset_n", "data_bus[31:0]", "addr[15]",
    "123invalid", "valid_signal_name", "signal-invalid", "signal[invalid]"
]

for signal in test_signals:
    is_valid = bool(signal_validation_pattern.match(signal))
    status = "✅" if is_valid else "❌"
    print(f"     {signal:<20} {status}")

# =============================================================================
# PERFORMANCE OPTIMIZATION TECHNIQUES
# =============================================================================

print(f"\n⚡ PERFORMANCE OPTIMIZATION TECHNIQUES:")

# Benchmark different concatenation methods
def benchmark_string_operations():
    """Comprehensive string performance benchmarks"""
    results = {}
    test_sizes = [100, 1000, 10000]

    for size in test_sizes:
        test_data = [f"instance_{i:06d}" for i in range(size)]

        # Method 1: += operator (quadratic in the worst case; CPython can often
        # extend the string in place, so the measured gap may be small)
        start_time = time.perf_counter()
        result1 = ""
        for item in test_data:
            result1 += item + "\n"
        time1 = time.perf_counter() - start_time

        # Method 2: join() method (efficient)
        start_time = time.perf_counter()
        result2 = "\n".join(test_data) + "\n"
        time2 = time.perf_counter() - start_time

        # Method 3: List comprehension + join (builds a redundant copy of the list,
        # so it is not faster than Method 2)
        start_time = time.perf_counter()
        result3 = "\n".join([item for item in test_data]) + "\n"
        time3 = time.perf_counter() - start_time

        # Method 4: Generator + join (join() still collects the items internally,
        # so expect no memory savings over a list)
        start_time = time.perf_counter()
        result4 = "\n".join(item for item in test_data) + "\n"
        time4 = time.perf_counter() - start_time

        results[size] = {
            'plus_equals': time1,
            'join': time2,
            'list_join': time3,
            'generator_join': time4
        }

    return results

performance_results = benchmark_string_operations()

print("   String concatenation performance comparison:")
print("     " + "Size".ljust(8) + "+= Method".rjust(12) + "join()".rjust(12) +
      "list+join".rjust(12) + "gen+join".rjust(12) + "Speedup".rjust(10))
print("     " + "-" * 8 + "-" * 12 + "-" * 12 + "-" * 12 + "-" * 12 + "-" * 10)

for size, times in performance_results.items():
    speedup = times['plus_equals'] / times['join']
    row = (f"{size:<8} {times['plus_equals']:>11.6f} {times['join']:>11.6f} "
           f"{times['list_join']:>11.6f} {times['generator_join']:>11.6f} {speedup:>8.1f}x")
    print(f"     {row}")

# Memory-efficient large file processing
def process_large_design_file_simulation(num_instances: int = 100000):
    """Simulate memory-efficient processing of large design files"""

    def generate_netlist_content():
        """Generator function for memory-efficient processing"""
        yield "// Auto-generated netlist"
        yield f"module large_design();"

        for i in range(num_instances):
            yield f"  NAND2X1 inst_{i:06d} (.A(net_{i}), .B(enable), .Y(out_{i}));"
            if i % 10000 == 0 and i > 0:
                yield f"  // Processed {i:,} instances"

        yield "endmodule"

    # Process without loading entire content into memory
    start_time = time.perf_counter()
    line_count = 0
    instance_count = 0

    for line in generate_netlist_content():
        line_count += 1
        if "NAND2X1" in line:
            instance_count += 1

    process_time = time.perf_counter() - start_time

    return {
        'lines_processed': line_count,
        'instances_found': instance_count,
        'processing_time': process_time,
        'memory_efficient': True
    }

# Demonstrate large file processing
large_file_stats = process_large_design_file_simulation(50000)
print(f"\n   Large design file processing:")
print(f"     Lines processed: {large_file_stats['lines_processed']:,}")
print(f"     Instances found: {large_file_stats['instances_found']:,}")
print(f"     Processing time: {large_file_stats['processing_time']:.3f} seconds")
print(f"     Memory efficient: {large_file_stats['memory_efficient']}")
print(f"     Rate: {large_file_stats['lines_processed']/large_file_stats['processing_time']:,.0f} lines/sec")

# =============================================================================
# PROFESSIONAL ERROR HANDLING AND VALIDATION
# =============================================================================

print(f"\n🛡️ PROFESSIONAL ERROR HANDLING AND VALIDATION:")

class VLSIStringProcessor:
    """Production-ready string processor with comprehensive error handling"""

    @staticmethod
    def validate_design_hierarchy(hierarchy_path: str) -> Tuple[bool, str, List[str]]:
        """Validate and parse design hierarchy path"""
        if not isinstance(hierarchy_path, str):
            return False, f"Path must be string, got {type(hierarchy_path).__name__}", []

        if not hierarchy_path.strip():
            return False, "Path cannot be empty", []

        # Clean and validate path
        clean_path = hierarchy_path.strip()

        # Check for invalid characters
        invalid_chars = set(clean_path) - set('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_/')
        if invalid_chars:
            return False, f"Invalid characters found: {invalid_chars}", []

        # Split and validate each level
        levels = clean_path.split('/')
        for level in levels:
            if not level:
                return False, "Empty hierarchy level found", []
            if level[0].isdigit():
                return False, f"Hierarchy level '{level}' cannot start with digit", []
            if not level.replace('_', '').isalnum():
                return False, f"Hierarchy level '{level}' contains invalid characters", []

        return True, "Valid hierarchy path", levels

    @staticmethod
    def safe_float_conversion(value_str: str, default: float = 0.0) -> Tuple[float, bool, str]:
        """Safely convert string to float with error handling"""
        if not isinstance(value_str, str):
            return default, False, f"Input must be string, got {type(value_str).__name__}"

        clean_value = value_str.strip()
        if not clean_value:
            return default, False, "Empty string cannot be converted to float"

        try:
            result = float(clean_value)
            return result, True, "Successful conversion"
        except ValueError:
            # Try to handle common formatting issues
            try:
                # Remove common unit suffixes. 'mV' must be tested before 'V',
                # otherwise "100mV" is cut to "100m" and fails to convert.
                for suffix in ['ns', 'ps', 'ms', 'us', 'MHz', 'GHz', 'mV', 'V']:
                    if clean_value.endswith(suffix):
                        clean_value = clean_value[:-len(suffix)].strip()
                        break

                # The unit is dropped, not scaled: "100MHz" becomes 100.0
                result = float(clean_value)
                return result, True, "Converted after removing suffix"
            except ValueError:
                return default, False, f"Cannot convert '{value_str}' to float"

    @staticmethod
    def sanitize_filename(filename: str, max_length: int = 255) -> str:
        """Sanitize filename for cross-platform compatibility"""
        if not isinstance(filename, str):
            raise TypeError(f"Filename must be string, got {type(filename).__name__}")

        # Remove problematic characters
        invalid_chars = '<>:"/\\|?*'
        clean_name = filename
        for char in invalid_chars:
            clean_name = clean_name.replace(char, '_')

        # Remove control characters
        clean_name = ''.join(char for char in clean_name if ord(char) >= 32)

        # Trim whitespace and dots
        clean_name = clean_name.strip('. ')

        # Ensure reasonable length
        if len(clean_name) > max_length:
            name_part, ext = clean_name.rsplit('.', 1) if '.' in clean_name else (clean_name, '')
            available_length = max_length - len(ext) - 1 if ext else max_length
            clean_name = name_part[:available_length] + ('.' + ext if ext else '')

        # Ensure not empty
        if not clean_name:
            clean_name = "untitled"

        return clean_name

# Test professional validation
processor = VLSIStringProcessor()

print("   Hierarchy validation:")
test_hierarchies = [
    "cpu_core/alu_unit/adder_inst",
    "cpu_core//empty_level",
    "123invalid/start_with_digit",
    "valid_path/with_underscores",
    ""
]

for hierarchy in test_hierarchies:
    valid, message, levels = processor.validate_design_hierarchy(hierarchy)
    status = "✅" if valid else "❌"
    print(f"     {hierarchy:<30} {status} {message}")

print(f"\n   Safe float conversion:")
test_values = ["10.5", "0.123ns", "-2.45V", "invalid", "", "1e-9", "100MHz"]

for value in test_values:
    result, success, message = processor.safe_float_conversion(value)
    status = "✅" if success else "❌"
    print(f"     {value:<10} → {result:8.3f} {status} {message}")

print(f"\n   Filename sanitization:")
test_filenames = [
    "design<>file.v", "path/with/slashes.sdc", "file:with:colons.lib",
    "very_long_filename_that_exceeds_normal_limits_and_should_be_truncated.v",
    "   file_with_spaces   .txt"
]

for filename in test_filenames:
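    # A small max_length so the long test name is actually truncated;
    # with the default of 255 none of these names would be shortened
    sanitized = processor.sanitize_filename(filename, max_length=40)
    print(f"     '{filename}' → '{sanitized}'")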

print(f"\n🏆 ADVANCED STRING OPERATION BENEFITS:")
print("✅ **Pattern Matching**: Extract structured data from tool reports with compiled regexes")
print("✅ **Performance**: Build large strings with join() instead of repeated +=")
print("✅ **Error Handling**: Validation returns clear messages instead of crashing scripts")
print("✅ **Memory Efficiency**: Stream large inputs line by line with generators")
print("✅ **Professional Quality**: Typed, documented helpers with explicit failure paths")
print("✅ **Cross-Platform**: Compatible filename and path handling")

## 💪 **Practice Exercises: String Mastery for VLSI**

Test your string manipulation skills with these hands-on VLSI exercises:

### **🎯 Exercise 1: Signal Name Parser**
Write a function that parses signal names and extracts:
- Base name (e.g., "data_bus" from "data_bus[31:0]")
- Bus width (e.g., 32 from "data_bus[31:0]")  
- MSB and LSB indices
- Whether it's a single bit or bus

**Test Cases:**
```python
parse_signal("clk")           # → base="clk", width=1, msb=None, lsb=None
parse_signal("data[31:0]")    # → base="data", width=32, msb=31, lsb=0
parse_signal("addr[15:0]")    # → base="addr", width=16, msb=15, lsb=0
```
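
If you want a starting point, the sketch below shows one possible shape for Exercise 1. The `SignalInfo` type and its `is_bus` field are hypothetical names introduced here for illustration, and the regex assumes plain Verilog-style identifiers with an optional `[msb:lsb]` or `[bit]` suffix.

```python
import re
from typing import NamedTuple, Optional


class SignalInfo(NamedTuple):
    """Hypothetical result type; the exercise only fixes the field meanings."""
    base: str
    width: int
    msb: Optional[int]
    lsb: Optional[int]
    is_bus: bool


# Base name, then an optional [msb:lsb] range or a single [bit] select
_SIGNAL_RE = re.compile(r'^([A-Za-z_][A-Za-z0-9_]*)(?:\[(\d+)(?::(\d+))?\])?$')


def parse_signal(name: str) -> SignalInfo:
    """Split a signal name into base name, width, and bit indices."""
    match = _SIGNAL_RE.match(name.strip())
    if match is None:
        raise ValueError(f"Not a valid signal name: {name!r}")

    base, msb_text, lsb_text = match.groups()
    if msb_text is None:
        # Plain scalar such as "clk": one bit wide, no indices
        return SignalInfo(base, 1, None, None, False)

    msb = int(msb_text)
    # A single select like "addr[15]" is treated as one bit (assumption)
    lsb = int(lsb_text) if lsb_text is not None else msb
    width = abs(msb - lsb) + 1  # abs() also covers ascending ranges like [0:7]
    return SignalInfo(base, width, msb, lsb, lsb_text is not None)


if __name__ == "__main__":
    for text in ["clk", "data[31:0]", "addr[15]"]:
        print(text, parse_signal(text))
```

Raising `ValueError` for malformed names is one design choice; returning a status tuple, as `VLSIStringProcessor` does earlier in this chapter, would work equally well.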

### **🎯 Exercise 2: Timing Report Extractor**
Create a function that extracts timing violations from a report string:
- Find all paths with negative slack
- Extract startpoint, endpoint, and slack value
- Count total violations and worst slack
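
One way to approach this is a single pass over the lines that remembers the most recent startpoint and endpoint until a slack line closes the path. The sketch below assumes the `Startpoint:` / `Endpoint:` / `slack (...)` layout used in this chapter's sample report; the function name `extract_violations` and the result keys are hypothetical.

```python
import re
from typing import Dict, List, Optional

_START_RE = re.compile(r'^\s*Startpoint:\s+(\S+)')
_END_RE = re.compile(r'^\s*Endpoint:\s+(\S+)')
_SLACK_RE = re.compile(r'^\s*slack\s+\(([^)]+)\)\s+([-+]?\d*\.?\d+)')


def extract_violations(report_text: str) -> Dict:
    """Collect every path whose slack is negative."""
    violations: List[Dict] = []
    startpoint: Optional[str] = None
    endpoint: Optional[str] = None

    for line in report_text.splitlines():
        start_match = _START_RE.match(line)
        if start_match:
            startpoint = start_match.group(1)
            continue
        end_match = _END_RE.match(line)
        if end_match:
            endpoint = end_match.group(1)
            continue
        slack_match = _SLACK_RE.match(line)
        if slack_match:
            slack = float(slack_match.group(2))
            if slack < 0:
                violations.append(
                    {'startpoint': startpoint, 'endpoint': endpoint, 'slack': slack}
                )
            # The slack line ends the current path section
            startpoint = endpoint = None

    worst = min((v['slack'] for v in violations), default=None)
    return {'count': len(violations), 'worst_slack': worst, 'violations': violations}


if __name__ == "__main__":
    sample = """
Startpoint: a/reg1 (rising edge-triggered flip-flop clocked by clk)
Endpoint: b/reg2 (rising edge-triggered flip-flop clocked by clk)
  slack (VIOLATED)                                    -0.60
Startpoint: c/reg3 (rising edge-triggered flip-flop clocked by clk)
Endpoint: d/reg4 (rising edge-triggered flip-flop clocked by clk)
  slack (MET)                                          0.25
"""
    print(extract_violations(sample))
```

Because it works line by line, the same loop can be fed an open file handle instead of `report_text.splitlines()` when the report is too large to hold in memory.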

### **🎯 Exercise 3: File Path Normalizer**
Build a robust file path normalizer that:
- Converts Windows paths to Unix format
- Handles relative paths (../, ./)
- Validates file extensions
- Creates backup filenames with timestamps
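
As a hint, most of this exercise can lean on the standard library's `posixpath` module rather than hand-written string surgery. The sketch below is a minimal starting point; the function names and the `ALLOWED_EXTENSIONS` set are assumptions made for illustration, not a fixed list.

```python
import posixpath
from datetime import datetime
from typing import Optional

# Hypothetical whitelist; adjust to the file types your flow uses
ALLOWED_EXTENSIONS = {'.v', '.sv', '.sdc', '.lib', '.def', '.tcl'}


def normalize_design_path(path: str) -> str:
    """Convert backslashes to '/' and collapse './' and '../' segments."""
    unix_style = path.strip().replace('\\', '/')
    return posixpath.normpath(unix_style)


def has_allowed_extension(path: str) -> bool:
    """Check the extension case-insensitively against the whitelist."""
    return posixpath.splitext(path)[1].lower() in ALLOWED_EXTENSIONS


def backup_name(path: str, when: Optional[datetime] = None) -> str:
    """Insert a YYYYMMDD_HHMMSS timestamp before the extension."""
    when = when or datetime.now()
    root, ext = posixpath.splitext(path)
    return f"{root}_{when:%Y%m%d_%H%M%S}{ext}"


if __name__ == "__main__":
    raw = r"designs\cpu_core\..\cpu_core\.\rtl\top.v"
    clean = normalize_design_path(raw)
    print(clean)
    print(has_allowed_extension(clean))
    print(backup_name(clean, datetime(2025, 9, 5, 14, 30, 25)))
```

Note that `normpath` works purely on the text of the path: it does not touch the file system, so it will not resolve symbolic links or check that the file exists.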

### **🎯 Exercise 4: TCL Script Generator**
Write a template-based TCL script generator for:
- Clock definitions with period and uncertainty
- Input/output delay constraints
- Operating conditions setup
- Load and drive cell assignments

### **🎯 Exercise 5: Netlist Instance Counter**
Create a memory-efficient function that:
- Counts instances of each cell type in a large netlist
- Handles streaming input (doesn't load entire file)
- Reports statistics (total instances, unique cell types)
- Identifies the most frequently used cells
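
The requirements above map naturally onto `collections.Counter` fed by any iterable of lines. The following sketch assumes a simplified netlist with one instance per line in the form `<CELL> <instance> (...);`, as in the generated netlist earlier in this chapter; real netlists with multi-line instantiations would need a proper parser. The function name and keyword set are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Assumed format: "<CELL> <instance_name> (" at the start of the line
_INSTANCE_RE = re.compile(r'^\s*([A-Za-z_]\w*)\s+([A-Za-z_]\w*)\s*\(')

# Verilog keywords that can look like a cell name under this simple pattern
_KEYWORDS = {'module', 'input', 'output', 'inout', 'wire', 'assign'}


def count_cell_types(lines: Iterable[str], top_n: int = 3) -> Dict:
    """Count instances per cell type, consuming one line at a time."""
    counts: Counter = Counter()
    for line in lines:
        match = _INSTANCE_RE.match(line)
        if match and match.group(1) not in _KEYWORDS:
            counts[match.group(1)] += 1

    return {
        'total_instances': sum(counts.values()),
        'unique_cell_types': len(counts),
        'most_common': counts.most_common(top_n),
    }


if __name__ == "__main__":
    sample = [
        "module top();",
        "  NAND2X1 u1 (.A(a), .B(b), .Y(n1));",
        "  NAND2X1 u2 (.A(n1), .B(b), .Y(n2));",
        "  INVX1 u3 (.A(n2), .Y(y));",
        "  // comment line",
        "endmodule",
    ]
    print(count_cell_types(sample))
```

Because the function only iterates, you can pass an open file object (`with open(path) as handle: count_cell_types(handle)`) and memory use stays proportional to the number of distinct cell types, not the file size.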

### **🎯 Exercise 6: Design Hierarchy Validator**
Implement a comprehensive validator that:
- Checks hierarchy path syntax
- Validates naming conventions
- Detects circular references
- Suggests corrections for invalid names

---

## 🏆 **Chapter Summary: String Mastery Achieved**

Congratulations! You've mastered Python strings for professional VLSI automation:

### **✅ Core Concepts Mastered**
- **String Creation**: 5+ methods including f-strings, raw strings, multi-line
- **Properties**: Immutability, indexing, slicing, iteration, membership
- **Essential Methods**: 25+ string methods for all common operations

### **✅ VLSI Applications Learned**
- **Path Processing**: Hierarchical signal and file path manipulation
- **Report Parsing**: Extract data from timing, power, and area reports
- **Script Generation**: Create TCL, SDC, and configuration files
- **Data Validation**: Robust input checking and sanitization

### **✅ Professional Techniques**
- **Performance**: Efficient concatenation and memory management
- **Error Handling**: Graceful failure recovery and validation
- **Security**: Safe string processing and injection prevention
- **Formatting**: Professional report generation and alignment

### **✅ Advanced Skills**
- **Regular Expressions**: Complex pattern matching and extraction
- **Large File Processing**: Memory-efficient streaming techniques
- **Production Code**: Comprehensive error handling and validation

### **🚀 Next Steps**
You can now apply these techniques to the string-processing tasks that dominate VLSI automation: parsing reports, validating names and paths, and generating tool scripts.

**Recommended Practice**: Apply these concepts to your current VLSI projects by parsing actual tool reports, generating real scripts, and building professional automation tools!