## `split()`: turns a string into a list of strings, splitting on whatever delimiter you choose
* The idea here is that you can take a string and turn it into a list
* Then you can isolate individual items by indexing into the list

In [1]:
text = "apple,banana,grape"
print(text.split(','))
text.split(',')[1]

['apple', 'banana', 'grape']


'banana'
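
Indexing is not the only way to pull items out of the list. As a small sketch on the same `text` string (the names `fruits`, `first`, `second`, and `third` are just illustrative), negative indexes and unpacking work too:

```python
text = "apple,banana,grape"

fruits = text.split(",")
print(fruits[-1])   # grape  (negative indexes count from the end)

# When the number of pieces is known in advance, unpack them into names
first, second, third = fruits
print(second)       # banana
```

Unpacking raises a `ValueError` if the number of pieces does not match the number of names, so it suits inputs with a fixed shape.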

In [2]:
data = "name:John;age:30;city:New York"
fields = data.split(";")  # list of "key:value" strings: ['name:John', 'age:30', 'city:New York']
# Build the dictionary: text before the colon is the key, text after it is the value
structured_data = {item.split(":")[0]: item.split(":")[1] for item in fields}
print(structured_data)





{'name': 'John', 'age': '30', 'city': 'New York'}
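
One thing to watch with this pattern: if a value contains the separator itself, `item.split(":")[1]` keeps only the piece before the next colon. A minimal sketch, assuming a made-up `time` field whose value contains colons, uses the `maxsplit` argument so that only the first colon splits:

```python
# Hypothetical input: the "time" value itself contains colons
data = "name:John;time:12:30:00;city:New York"

fields = data.split(";")

# maxsplit=1 splits only at the first colon, so the value stays intact
structured = dict(item.split(":", 1) for item in fields)
print(structured)
# {'name': 'John', 'time': '12:30:00', 'city': 'New York'}
```

`dict()` accepts an iterable of two-item sequences, which is exactly what `split(":", 1)` returns for each field here.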


## `join()`
* This method is used to concatenate elements of an iterable (like a list) into a string.
* It's the opposite of `split()`: `split()` goes from a string to a list, `join()` goes from a list back to a string

In [3]:
words = ["Python", "is", "awesome"]
sentence = " ".join(words)  # Joining with space
print(sentence)


Python is awesome


In [4]:
row = ["Alice", "Data Engineer", "San Francisco"]
csv_line = ",".join(row)  # Convert list to CSV row
print(csv_line)


Alice,Data Engineer,San Francisco
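
To see the "opposite of `split()`" idea concretely, here is a small round-trip sketch (the variable names are illustrative): splitting on a separator and joining with the same separator gives back the original string.

```python
line = "Alice,Data Engineer,San Francisco"

parts = line.split(",")      # string -> list
rebuilt = ",".join(parts)    # list -> string

print(parts)            # ['Alice', 'Data Engineer', 'San Francisco']
print(rebuilt == line)  # True
```

This only holds when an explicit separator is used on both sides; `split()` with no argument collapses whitespace, so it does not round-trip.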


## Removing Extra Spaces from a String

This cell demonstrates a common string manipulation pattern:

1. **`split()`** breaks the string into a list of words (splitting on whitespace)
2. **`join()`** combines those words back into a single string with a single space between each word

This effectively removes:
- Leading spaces
- Trailing spaces  
- Multiple consecutive spaces between words

### Example Breakdown:


```python
text = "  Python  is   powerful!  "
# After split(): ["Python", "is", "powerful!"]
# After join with " ": "Python is powerful!"
```


In [None]:
text = "  Python  is   powerful!  "
cleaned_text = " ".join(text.split())  # Removes extra spaces
print(cleaned_text)
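
The trick depends on calling `split()` with no argument. A quick comparison, sketched on the same `text`, shows why `split(" ")` would not work here:

```python
text = "  Python  is   powerful!  "

# An explicit " " separator treats every single space as a boundary,
# so runs of spaces produce empty strings
print(text.split(" "))
# ['', '', 'Python', '', 'is', '', '', 'powerful!', '', '']

# No argument: runs of whitespace count as one separator, ends are trimmed
print(text.split())
# ['Python', 'is', 'powerful!']
```

Joining the first list with `" "` would simply rebuild the original messy string, which is why the no-argument form is the one to use.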



### An iterable can also be a dictionary


**Explanation:**

This cell demonstrates how to use Python's `join()` method with a dictionary to create a formatted string. Here's what's happening:

**Dictionary as an Iterable:**
- When iterating over a dictionary using `.items()`, you get key-value pairs
- You can format each pair as strings (e.g., "key=value")
- The `join()` method concatenates these formatted strings with a separator

**Example:**

```python
data = {"name": "Alice", "role": "Data Engineer", "city": "SF"}
result = ",".join(f"{k}={v}" for k, v in data.items())
# Output: "name=Alice,role=Data Engineer,city=SF"
```


In [None]:

**Key Points:**
- `.items()` returns key-value pairs from the dictionary
- Generator expressions create formatted strings on the fly
- `join()` takes any iterable of strings, including generators
- This pattern is useful for logging, CSV formatting, and API parameters


```python
data = {"name": "Alice", "role": "Data Engineer", "city": "SF"}
```

The generator expression `(f"{k}={v}" for k, v in data.items())` creates a generator that:
- Iterates through each key-value pair using `.items()`
- Formats each pair as `"key=value"` using an f-string
- Yields strings like: `"name=Alice"`, `"role=Data Engineer"`, `"city=SF"`

**Join Operation:**

The `",".join(...)` call:
- Takes all the formatted strings from the generator
- Concatenates them with commas (`,`) as separators
- Returns a single string: `"name=Alice,role=Data Engineer,city=SF"`

**Output:** `name=Alice,role=Data Engineer,city=SF`




**Key Concepts:**
1. **`.items()`** returns key-value pairs from a dictionary
2. **Generator expressions** produce their items one at a time, on the fly, instead of building a list first
3. **`join()`** expects an iterable of strings, which our generator expression provides
4. **f-strings** allow embedding variables directly into string literals

This pattern is commonly used in data engineering for:
- Creating log entries
- Formatting API parameters
- Building CSV/TSV output
- Serializing data for transmission


### .join() with a Dictionary – Quick Licks

In [26]:
d = {'a': 1, 'b': 2, 'c': 3}
result = ''.join(d)
print(result)
# Same as: ''.join(d.keys())



abc


#### Join the values (must convert to str first):

In [22]:
''.join(str(v) for v in d.values())


'123'
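
Why the conversion matters: `join()` only accepts strings. A short sketch on the same dictionary shows the failure and an equivalent fix with `map(str, ...)` (the name `joined` is illustrative):

```python
d = {'a': 1, 'b': 2, 'c': 3}

try:
    ''.join(d.values())  # the values are ints, not strings
except TypeError as err:
    print(f"TypeError: {err}")

# map(str, ...) does the same job as the generator expression
joined = ''.join(map(str, d.values()))
print(joined)  # 123
```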

You can also format the key-value pairs however you like:

In [23]:
' '.join(f'{v}:{k}' for k, v in d.items())
# Output: '1:a 2:b 3:c' (value first, because the f-string is '{v}:{k}')


'1:a 2:b 3:c'

In [24]:
text = "  Python  is   powerful!  " # starting off with a string
cleaned_text = " ".join(text.split())  # split into a list of words, then rejoin with single spaces
print(cleaned_text)  # Output: "Python is powerful!"


Python is powerful!


In [25]:
log_entry = "ERROR|2025-03-07|Server Down"
# desired output: "ERROR - 2025-03-07 - Server Down"
formatted_output = " - ".join(log_entry.split("|"))
print(formatted_output)

ERROR - 2025-03-07 - Server Down


## Performance Considerations
### `split()` Performance:
* The built-in `.split(delimiter)` is generally faster than `re.split()` when the delimiter is a fixed string.
* `split()` with no argument (same as `split(None)`) splits on runs of any whitespace and drops empty strings at the ends.
### `join()` Performance:
* `"".join(iterable)` is usually the *fastest* way to concatenate many strings.
* Prefer `"".join(list)` over `+=` inside loops: repeated `+=` may copy the growing string on each iteration.
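
A rough way to check the `join()` claim on your own machine is `timeit`. This is only a sketch: the sample data and the helper names `concat_with_plus` and `concat_with_join` are made up for illustration, and the timings will vary by machine and Python version.

```python
import timeit

words = ["word"] * 10_000  # arbitrary sample data

def concat_with_plus(items):
    result = ""
    for item in items:
        result += item  # may copy the growing string on each pass
    return result

def concat_with_join(items):
    return "".join(items)  # hands the whole iterable to join() at once

# Both approaches produce the same string
assert concat_with_plus(words) == concat_with_join(words)

plus_time = timeit.timeit(lambda: concat_with_plus(words), number=100)
join_time = timeit.timeit(lambda: concat_with_join(words), number=100)
print(f"+= loop: {plus_time:.4f}s   join: {join_time:.4f}s")
```

No expected numbers are shown on purpose; the point is to measure rather than assume.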

## Tricky problem: We're gonna need the `replace()` function
### The Basics of `replace()`
The `replace()` function in Python is like a find-and-swap tool for strings. It lets you take a piece of text and swap out certain words, letters, or symbols for something else, kinda like remixing a track with different beats.

**Syntax**

`string.replace(old, new, count)`
* `old` → What you wanna swap out
* `new` → What you wanna replace it with
* `count` (optional) → How many times you wanna do the swap (default: all)

Example:
```
text = "Man, Python is difficult!"
new_text = text.replace("difficult", "🔥")
print(new_text)
# Output: Man, Python is 🔥!
```
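
The optional `count` argument is easiest to see side by side. A small sketch with a made-up string:

```python
text = "la la la la"

print(text.replace("la", "da"))     # da da da da  (all occurrences)
print(text.replace("la", "da", 2))  # da da la la  (first two only)

# Strings are immutable: replace() returns a new string
print(text)                         # la la la la
```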

**Processing API Responses (JSON-like strings)**

This problem is trickier than it seems at first, so pay extra attention to the desired output.

In [12]:
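response = '{"name": "Alice", "age": "30", "city": "NY"}'
# Strip braces and quotes, then split on ", " (comma + space) so keys carry no leading space
key_value_pairs = response.replace("{", "").replace("}", "").replace('"', '').split(", ")
# Split each pair on ": " (colon + space) so values carry no leading space
structured_dict = dict(pair.split(": ", 1) for pair in key_value_pairs)
print(structured_dict)



{'name': 'Alice', 'age': '30', 'city': 'NY'}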
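
Manual string surgery like this is good practice, but it breaks as soon as a value contains a comma or a colon. If the response really is valid JSON (an assumption; the notes only call it "JSON-like"), the standard library's `json` module is a sturdier option. A minimal sketch:

```python
import json

response = '{"name": "Alice", "age": "30", "city": "NY"}'

# json.loads parses the whole string, handling quotes and spacing for us
parsed = json.loads(response)
print(parsed)
# {'name': 'Alice', 'age': '30', 'city': 'NY'}
```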


In [13]:
filepath = "/home/user/documents/report.csv"
# Desired output: "/home/user/documents"
"/".join(filepath.split("/")[:-1])  # drop the last piece (the file name), whatever the depth


'/home/user/documents'
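
The same result can be had without rejoining anything. As a sketch (the names `directory` and `filename` are illustrative), `rsplit()` with `maxsplit=1` splits once, starting from the right:

```python
filepath = "/home/user/documents/report.csv"

# Split only at the last "/", giving the directory and the file name
directory, filename = filepath.rsplit("/", 1)
print(directory)  # /home/user/documents
print(filename)   # report.csv
```

This assumes a POSIX-style path with at least one `/` in it.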

Expected output for the log-parsing problem:
```
[
    {'level': 'INFO', 'timestamp': '2025-03-07', 'message': 'User login: john_doe'},
    {'level': 'ERROR', 'timestamp': '2025-03-07', 'message': 'Database timeout'},
    {'level': 'WARNING', 'timestamp': '2025-03-07', 'message': 'Low disk space'}
]

```

In [14]:
logs = [
    "[INFO] 2025-03-07 12:00:01 - User login: john_doe",
    "[ERROR] 2025-03-07 12:02:15 - Database timeout",
    "[WARNING] 2025-03-07 12:05:42 - Low disk space"
]

for log in logs:
    # Split once at the first space to separate [LEVEL] from the rest
    level_part, rest = log.split(" ", 1)


    # Clean brackets from level
    level = level_part[1:-1]

    # Now split rest into timestamp and message
    timestamp, message = rest.split(" - ", 1)

    print(f"Level: {level}")
    print(f"Timestamp: {timestamp}")
    print(f"Message: {message}")
    print("---")





Level: INFO
Timestamp: 2025-03-07 12:00:01
Message: User login: john_doe
---
Level: ERROR
Timestamp: 2025-03-07 12:02:15
Message: Database timeout
---
Level: WARNING
Timestamp: 2025-03-07 12:05:42
Message: Low disk space
---


In [15]:
logs = [
    "[INFO] 2025-03-07 12:00:01 - User login: john_doe",
    "[ERROR] 2025-03-07 12:02:15 - Database timeout",
    "[WARNING] 2025-03-07 12:05:42 - Low disk space"
]

def parse_logs(logs):
    parsed = []
    for log in logs:
        header_message = log.split(" - ", 1)
        header = header_message[0]
        message = header_message[1] if len(header_message) > 1 else ""
        tokens = header.split()
        level = tokens[0].strip("[]") if tokens else ""
        timestamp = tokens[1] if len(tokens) > 1 else ""
        parsed.append({"level": level, "timestamp": timestamp, "message": message})
    return parsed

structured_logs = parse_logs(logs)
print(structured_logs)
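
A more compact variant of the same idea, sketched under the assumption that every line follows the `[LEVEL] date time - message` layout exactly (unlike `parse_logs`, it has no fallbacks for malformed lines). The helper name `parse_line` is made up for illustration:

```python
logs = [
    "[INFO] 2025-03-07 12:00:01 - User login: john_doe",
    "[ERROR] 2025-03-07 12:02:15 - Database timeout",
    "[WARNING] 2025-03-07 12:05:42 - Low disk space",
]

def parse_line(log):
    # Split once on " - " so colons or dashes in the message are left alone
    header, message = log.split(" - ", 1)
    # The header has exactly three whitespace-separated parts
    level, date, _time = header.split()
    return {"level": level.strip("[]"), "timestamp": date, "message": message}

structured = [parse_line(log) for log in logs]
for entry in structured:
    print(entry)
# {'level': 'INFO', 'timestamp': '2025-03-07', 'message': 'User login: john_doe'}
# {'level': 'ERROR', 'timestamp': '2025-03-07', 'message': 'Database timeout'}
# {'level': 'WARNING', 'timestamp': '2025-03-07', 'message': 'Low disk space'}
```

The trade-off is strictness: a line that does not match the layout raises a `ValueError` here instead of quietly producing empty fields.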


