## `split()`: turns a string into a list of strings, splitting on whatever delimiter you choose
* The idea here is that you can take a string and turn it into a list
* THEN, you can pull out individual items by indexing into that list

In [6]:
text = "apple,banana,grape"
text.split(',')[0]

'apple'
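`split()` also takes an optional `maxsplit` argument that caps how many splits happen, and negative indexes count from the end of the resulting list. A small sketch reusing the string above:

```python
text = "apple,banana,grape"

# Negative indexing grabs items from the end of the list
last = text.split(",")[-1]
print(last)  # grape

# maxsplit=1 splits only at the first comma; the rest stays together
first, rest = text.split(",", 1)
print(first, "|", rest)  # apple | banana,grape

# rsplit() with maxsplit=1 splits only at the last comma
head, tail = text.rsplit(",", 1)
print(head, "|", tail)  # apple,banana | grape
```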

In [16]:
data = "name:John;age:30;city:New York"
fields = data.split(";")  # list of "key:value" strings: ['name:John', 'age:30', 'city:New York']
structured_data = {key: value for key, value in (item.split(":", 1) for item in fields)}  # split each field at its first ":" to build the dictionary
print(structured_data)

{'name': 'John', 'age': '30', 'city': 'New York'}
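Splitting each field on every `":"` silently truncates any value that itself contains a colon. A sketch with a hypothetical URL field shows the difference `maxsplit=1` makes:

```python
data = "name:John;site:https://example.com"  # hypothetical record whose value contains ':'

# Naive approach: [1] keeps only the piece between the first and second colon
naive = {item.split(":")[0]: item.split(":")[1] for item in data.split(";")}
print(naive)  # {'name': 'John', 'site': 'https'}

# maxsplit=1 splits each field only at its first colon, so the URL survives intact
safe = dict(item.split(":", 1) for item in data.split(";"))
print(safe)  # {'name': 'John', 'site': 'https://example.com'}
```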


## `join()`
* This method concatenates the elements of an iterable (like a list) into a single string, placing the separator it's called on between each pair of elements.
* It's the inverse of `split()`.
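One catch: `join()` only accepts strings. Mixing in numbers raises a `TypeError`, so convert the elements first. A minimal sketch with a made-up list:

```python
values = ["id", 42, 3.5]  # mixed types, for illustration

try:
    ",".join(values)
except TypeError as exc:
    print("TypeError:", exc)  # join() refuses non-string elements

# Convert every element to a string first
line = ",".join(map(str, values))
print(line)  # id,42,3.5
```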

In [17]:
words = ["Python", "is", "awesome"]
sentence = " ".join(words)  # Joining with space
print(sentence)  # Output: "Python is awesome"


Python is awesome


In [18]:
row = ["Alice", "Data Engineer", "San Francisco"]
csv_line = ",".join(row)  # Convert list to CSV row
print(csv_line)  # Output: "Alice,Data Engineer,San Francisco"


Alice,Data Engineer,San Francisco
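`",".join()` works for simple rows, but it does no quoting: a field that contains a comma turns into an extra column. The standard-library `csv` module handles that case. A sketch, assuming a hypothetical city field with a comma in it:

```python
import csv
import io

row = ["Alice", "Data Engineer", "San Francisco, CA"]  # hypothetical field containing a comma

naive = ",".join(row)
print(naive)  # Alice,Data Engineer,San Francisco, CA  (now reads as 4 fields)

buffer = io.StringIO()
csv.writer(buffer).writerow(row)  # quotes any field that contains the delimiter
print(buffer.getvalue().strip())  # Alice,Data Engineer,"San Francisco, CA"
```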


### Building the iterable from a dictionary

In [19]:
data = {"name": "Alice", "role": "Data Engineer", "city": "SF"}
csv_line = ",".join(f"{k}={v}" for k, v in data.items())
print(csv_line)


name=Alice,role=Data Engineer,city=SF
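To see what happens when you pass the dictionary itself, or only its values, compare these two calls on the same `data`:

```python
data = {"name": "Alice", "role": "Data Engineer", "city": "SF"}

# Iterating a dict yields its keys, so join() sees only the keys
print(",".join(data))  # name,role,city

# .values() works too, as long as every value is a string
print(",".join(data.values()))  # Alice,Data Engineer,SF
```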


## Combining `split()` and `join()` for Data Cleaning
Example: Normalizing Text Data

In [21]:
text = "  Python  is   powerful!  " # starting off with a string
cleaned_text = " ".join(text.split())  # split() drops leading/trailing whitespace and collapses runs of spaces; join() rejoins with single spaces
print(cleaned_text)  # Output: "Python is powerful!"


Python is powerful!


In [24]:
log_entry = "ERROR|2025-03-07|Server Down"
# desired output: "ERROR - 2025-03-07 - Server Down"
formatted_output = " - ".join(log_entry.split("|"))
print(formatted_output)

ERROR - 2025-03-07 - Server Down
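With the same separator on both sides, `join()` exactly undoes `split()`, and swapping separators this way gives the same result as `str.replace()` on the delimiter. A quick check using the log line above:

```python
log_entry = "ERROR|2025-03-07|Server Down"

# Same separator on both sides: the original string comes back unchanged
round_trip = "|".join(log_entry.split("|"))
print(round_trip == log_entry)  # True

# Swapping separators via split/join matches replace()
swapped = " - ".join(log_entry.split("|"))
print(swapped)  # ERROR - 2025-03-07 - Server Down
print(swapped == log_entry.replace("|", " - "))  # True
```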


## Performance Considerations
### `split()` Performance:
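* For a fixed delimiter, native `str.split(delimiter)` is generally faster than `re.split()`; reach for `re.split()` only when you need a pattern.
* `split()` with no argument (same as `split(None)`) splits on runs of any whitespace and ignores leading and trailing whitespace, so it never returns empty strings, whereas an explicit delimiter keeps empty fields.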
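The difference between the default whitespace mode and an explicit delimiter is easy to see on a messy string (made up for illustration):

```python
text = "  a  b\tc\n"

print(text.split())       # ['a', 'b', 'c']: runs of any whitespace, ends trimmed
print(text.split(" "))    # ['', '', 'a', '', 'b\tc\n']: every single space is a boundary
print("a,,b".split(","))  # ['a', '', 'b']: explicit delimiters keep empty fields
```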
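### `join()` Performance:
* `"".join(iterable)` is generally the most efficient way to concatenate many strings.
* Prefer collecting pieces in a list and calling `"".join(pieces)` once over using `+=` inside loops: strings are immutable, so each `+=` may create a brand-new string, which can make the loop quadratic.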
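Both patterns produce the same string; only the cost differs. A sketch with a made-up word list (wrap each version in `timeit` if you want to compare speeds on your machine):

```python
words = ["data", "engineering", "is", "fun"] * 1000  # sample input

# Pattern to avoid: each += may build a new string
slow = ""
for word in words:
    slow += word + " "
slow = slow.rstrip()  # drop the trailing space

# Preferred: collect the pieces, then join once
fast = " ".join(words)

print(slow == fast)  # True
```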

## Tricky problem: We're gonna need the `replace()` function
### The Basics of `replace()`
The `replace()` method in Python is like a find-and-swap tool for strings. It lets you take a piece of text and swap out certain words, letters, or symbols for something else, kinda like remixing a track with different beats. Because strings are immutable, `replace()` returns a new string and leaves the original unchanged.
**Syntax**

`string.replace(old, new[, count])`
* `old` → What you wanna swap out
* `new` → What you wanna replace it with
* `count` (optional) → How many times you wanna do the swap (default: all)

example:
```
text = "Man, Python is difficult!"
new_text = text.replace("difficult", "🔥")
print(new_text)  # Man, Python is 🔥!
```
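The optional `count` argument limits how many matches get swapped, and the original string never changes. A small sketch with a made-up sentence:

```python
text = "one fish, two fish, red fish"

print(text.replace("fish", "cat", 1))  # one cat, two fish, red fish (only the first match)
print(text.replace("fish", "cat"))     # one cat, two cat, red cat
print(text)                            # one fish, two fish, red fish (original is unchanged)
```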

**Processing API Responses (JSON-like strings)**
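This problem is trickier than it seems at first, so pay extra attention to the desired output. After removing the braces and quotes, splitting on `","` keeps the space that followed each comma (and splitting on `":"` keeps the one after each colon), so every key and value has to be stripped or you end up with entries like `' age': ' 30'`.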

In [49]:
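response = '{"name": "Alice", "age": "30", "city": "NY"}'
# Drop the braces and quotes, then split into "key: value" chunks
key_value_pairs = response.replace("{", "").replace("}", "").replace('"', '').split(",")
# Split each chunk at its first ":" and strip the leftover spaces from key and value
structured_dict = {k.strip(): v.strip() for k, v in (pair.split(":", 1) for pair in key_value_pairs)}
print(structured_dict)

{'name': 'Alice', 'age': '30', 'city': 'NY'}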
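When the response is valid JSON, as this one is, the standard-library `json` module does the parsing for you. It also copes with commas and colons inside quoted values, which would break the replace/split approach. A sketch:

```python
import json

response = '{"name": "Alice", "age": "30", "city": "NY"}'
structured_dict = json.loads(response)  # parses the JSON object into a dict
print(structured_dict)  # {'name': 'Alice', 'age': '30', 'city': 'NY'}

# A value containing a comma and a colon would trip up split(","), but not json
tricky = json.loads('{"note": "a, b: c"}')
print(tricky["note"])  # a, b: c
```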


In [55]:
filepath = "/home/user/documents/report.csv"
# Goal: "/home/user/documents" (the parent directory)
"/".join(filepath.split("/")[:-1])  # drop the last component, the filename, at any depth


'/home/user/documents'
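`rsplit()` with `maxsplit=1` does the same job in one step, and the standard library also has path-aware helpers. A sketch for the same POSIX-style path:

```python
import os
from pathlib import PurePosixPath

filepath = "/home/user/documents/report.csv"

# Split only at the last "/", whatever the depth
print(filepath.rsplit("/", 1)[0])     # /home/user/documents

# Path-aware alternatives
print(os.path.dirname(filepath))      # /home/user/documents
print(PurePosixPath(filepath).parent) # /home/user/documents
```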

**Parsing Log Lines**
Goal: turn each log line in the list below into a dictionary. Expected output:
```
[
    {'level': 'INFO', 'timestamp': '2025-03-07', 'message': 'User login: john_doe'},
    {'level': 'ERROR', 'timestamp': '2025-03-07', 'message': 'Database timeout'},
    {'level': 'WARNING', 'timestamp': '2025-03-07', 'message': 'Low disk space'}
]

```

In [69]:
logs = [
    "[INFO] 2025-03-07 12:00:01 - User login: john_doe",
    "[ERROR] 2025-03-07 12:02:15 - Database timeout",
    "[WARNING] 2025-03-07 12:05:42 - Low disk space"
]

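for log in logs:
    parts = log.split(" ", 1)  # ['[INFO]', '2025-03-07 12:00:01 - User login: john_doe']
    # Bug: parts[0] is only '[INFO]', so parts[0].split(" ") has a single element and [1] fails
    level, timestamp = parts[0][1:-1], parts[0].split(" ")[1]  # Extract log level & time
    print()

IndexError: list index out of range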

In [70]:
logs = [
    "[INFO] 2025-03-07 12:00:01 - User login: john_doe",
    "[ERROR] 2025-03-07 12:02:15 - Database timeout",
    "[WARNING] 2025-03-07 12:05:42 - Low disk space"
]

def parse_logs(logs):
    parsed = []
    for log in logs:
        parts = log.split(" - ", 1)  # Split only at the first " - ": header and message
        header = parts[0].split(" ")  # e.g. ['[INFO]', '2025-03-07', '12:00:01']
        level, timestamp = header[0][1:-1], header[1]  # Strip the brackets from the level; keep only the date
        message = parts[1] if len(parts) > 1 else ""
        parsed.append({"level": level, "timestamp": timestamp, "message": message})
    return parsed

structured_logs = parse_logs(logs)
print(structured_logs)
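As a cross-check, here is a sketch of the same task built on `str.partition()`, which always returns three parts, so no `len(parts) > 1` guard is needed. The function name `parse_logs_partition` is hypothetical, and the result is compared against the expected output listed earlier:

```python
logs = [
    "[INFO] 2025-03-07 12:00:01 - User login: john_doe",
    "[ERROR] 2025-03-07 12:02:15 - Database timeout",
    "[WARNING] 2025-03-07 12:05:42 - Low disk space",
]

def parse_logs_partition(logs):
    parsed = []
    for log in logs:
        header, _, message = log.partition(" - ")  # split at the first " - " only
        level, date, _time = header.split(" ")     # '[INFO]', '2025-03-07', '12:00:01'
        parsed.append({"level": level.strip("[]"), "timestamp": date, "message": message})
    return parsed

expected = [
    {'level': 'INFO', 'timestamp': '2025-03-07', 'message': 'User login: john_doe'},
    {'level': 'ERROR', 'timestamp': '2025-03-07', 'message': 'Database timeout'},
    {'level': 'WARNING', 'timestamp': '2025-03-07', 'message': 'Low disk space'},
]
print(parse_logs_partition(logs) == expected)  # True
```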


