# Big Data Analytics - Lab Week 1


Some of the Python exercises are adapted from the course:

https://github.com/rajathkumarmp/Python-Lectures where you can find more exercises if you wish to review, and from

https://www.practicepython.org/ a great resource for simple exercises and solutions.
#### Books
If you want a quick revision of Python, I highly recommend [this book](https://www.oreilly.com/library/view/learning-python-5th/9781449355722/) or [this](https://www.oreilly.com/library/view/python-in-a/9781098113544/). If you want a more advanced revision of Python, I recommend [this](https://www.oreilly.com/library/view/fluent-python-2nd/9781492056348/), and if you are looking to brush up on Python for data analysis, I recommend [this](https://www.oreilly.com/library/view/python-for-data/9781491957653/).

This notebook covers foundational Python skills needed to work with Spark, MLlib, and Databricks workflows.
We will review:
- Variables and operators (arithmetic, relational, logical, and bitwise), including the difference between the logical `and` and the bitwise `&`
- Data types and type conversion
- Built-in functions
- Data structures: lists, tuples, dictionaries, sets, and strings
- Indexing, slicing, and unpacking
- The importance of understanding views vs. copies (a common beginner pitfall)
- Using f-strings for formatting
- Some advanced topics.

Note: Each code cell is paired with a markdown explanation. Some cells are left as "Your code here" for you to test your understanding with minimal interruptions.

> If you haven't used a notebook like this before, you can type code into the cells below and then execute it (Shift+Enter or the ▶ button in the top-right corner). Typically, I will provide some example code for you to run and add exercises for you to apply your understanding and solve problems.

## Basics

### Variables

#### Defining and Using Variables

In Python, a variable is simply a name that refers to a value. In our Friends-themed example, we assign ages and a name to some of the characters. Variables do not need a type declaration; they are created when you assign a value.


In [1]:
# Assign ages (in years) to Friends characters
ross_age = 29
rachel_age = 28
monica_age = 30
joey_age = 31
ross_name = 'Ross'
# You can inspect these variables by evaluating them:
ross_age, rachel_age, monica_age, joey_age

(29, 28, 30, 31)

> In the cell above, we have created four integer variables (`ross_age`, `rachel_age`, `monica_age`, and `joey_age`) and one string variable (`ross_name`); Python determines their types automatically. Notice that simply writing the variable names (or a tuple of them) displays their values in a notebook cell.
> You can use the `type()` function to determine the type of the value a variable refers to. Try it.

**Important: Strings are immutable!**

Try running the code below.

In [None]:
ross_name[0] = 'B'  # expect a TypeError: strings do not support item assignment

### Operators

#### Arithmetic Operators

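Arithmetic operators perform standard mathematical operations. They follow the usual order of operations (BODMAS): `**` is evaluated first, then `*`, `/`, `//`, and `%`, then `+` and `-`; use parentheses to override it.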

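| Symbol | Task Performed |
|--------|----------------|
| `+` | Addition |
| `-` | Subtraction |
| `/` | Division (always returns a float) |
| `%` | Modulo (remainder) |
| `*` | Multiplication |
| `//` | Floor division |
| `**` | Exponentiation (to the power of) |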
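A quick sketch of the operators in the table, using two arbitrary integers chosen only for illustration. The last lines show precedence, and how floor division and `%` behave with a negative operand, which often surprises people.

```python
a, b = 7, 2  # arbitrary example values

print(a + b)   # 9
print(a - b)   # 5
print(a * b)   # 14
print(a / b)   # 3.5: true division always returns a float
print(a // b)  # 3: floor division
print(a % b)   # 1: remainder
print(a ** b)  # 49: 7 to the power of 2

# Order of operations: ** first, then * / // %, then + and -
print(2 + 3 * 4)    # 14
print((2 + 3) * 4)  # 20: parentheses override precedence

# Floor division rounds toward negative infinity, and % is consistent with it
print(-7 // 2)  # -4
print(-7 % 2)   # 1
```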

#### Relational and Logical Operators

Relational operators compare values and return Boolean results (`True` or `False`). Logical operators (`and`, `or`, `not`) combine Boolean values. Note that Python’s logical operator `and` is used for Boolean logic, whereas the bitwise operator `&` works on integers at the binary level.

| Symbol | Task Performed |
|--------|----------------|
| `==` | True if equal to |
| `!=` | True if not equal to |
| `<` | True if less than |
| `>` | True if greater than |
| `<=` | True if less than or equal to |
| `>=` | True if greater than or equal to |

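> **Additional Detail:**  
> The logical operator `and` works on truth values and short-circuits: it evaluates the left operand and, only if that is truthy, evaluates and returns the right operand; otherwise it returns the left operand. With Boolean operands this means it returns `True` only if both are `True`. In contrast, the bitwise operator `&` always evaluates both operands and combines integers bit by bit.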
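The snippet below is a small sketch, reusing the ages from earlier, that contrasts these operators. The last two lines show a precedence trap: `&` binds more tightly than `>`, so comparisons combined with `&` must be parenthesised. This matters once you move to pandas and PySpark, which combine column conditions with `&` and `|` (the keywords `and`/`or` cannot be overloaded), so an expression such as `(df.a > 1) & (df.b < 2)` for hypothetical columns `a` and `b` needs those parentheses.

```python
ross_age, rachel_age = 29, 28

# Relational operators return booleans
print(ross_age > rachel_age)   # True
print(ross_age == rachel_age)  # False

# Logical operators combine truth values
print(ross_age > 25 and rachel_age > 25)  # True
print(ross_age > 30 or rachel_age > 30)   # False
print(not ross_age > 30)                  # True

# `and` short-circuits and returns one of its operands, not necessarily a bool
print(3 and 5)  # 5: 3 is truthy, so the right operand is returned
print(0 and 5)  # 0: 0 is falsy, so the right operand is never evaluated

# `&` combines integers bit by bit: 0b011 & 0b101 == 0b001
print(3 & 5)  # 1

# Precedence trap: without parentheses this parses as
# ross_age > (25 & rachel_age) > 25, i.e. the chained comparison 29 > 24 > 25
print(ross_age > 25 & rachel_age > 25)      # False
print((ross_age > 25) & (rachel_age > 25))  # True
```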

#### Bitwise Operators

Bitwise operators work directly on the binary representations of integers. These are particularly useful in low-level data processing.
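A minimal sketch of Python's bitwise operators (`&`, `|`, `^`, `~`, `<<`, `>>`) on two small integers; `bin()` shows the binary form so you can check each result by hand.

```python
x, y = 5, 3  # 0b101 and 0b011

print(bin(x), bin(y))  # 0b101 0b11
print(x & y)   # 1:  AND, bits set in both        (0b001)
print(x | y)   # 7:  OR, bits set in either       (0b111)
print(x ^ y)   # 6:  XOR, bits set in exactly one (0b110)
print(~x)      # -6: NOT, equal to -(x + 1) for Python integers
print(x << 1)  # 10: shift left by one bit (multiply by 2)
print(x >> 1)  # 2:  shift right by one bit (floor-divide by 2)
```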

### Built-in Functions

Python provides a rich set of built-in functions that simplify data manipulation. In this section, we demonstrate several categories.

#### Type Conversion
You can use `int()`, `float()`, `str()`, etc. to cast (convert) values between types.
Try converting an integer to a string, and check the type of the result using the `type()` function.
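A short sketch of common conversions. It deliberately converts in the opposite direction to the exercise (text to numbers), which is what you typically need when values arrive as text, for example from a CSV file.

```python
age_text = "29"          # a number stored as text
age = int(age_text)      # str -> int
print(type(age_text), type(age))  # <class 'str'> <class 'int'>

rating = float("8.5")    # str -> float
print(int(rating))       # 8: int() truncates toward zero; it does not round
print(bool(0), bool(""), bool("Joey"))  # False False True

# Text that does not look like a number raises ValueError
try:
    int("twenty-nine")
except ValueError as err:
    print("ValueError:", err)
```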


#### **Important: Python's Dynamic Typing and Shared References**



Consider these three statements. Do they change the printed value of A?
```python
A = ["spam"]
B = A
B[0] = "shrubbery"
```


How about these—is A changed now?
```python
A = ["spam"]
B = A[:]
B[0] = "shrubbery"
```
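Once you have made your predictions, a sketch like the following lets you check them. It renames the variables (`A1`, `B1`, `A2`, `B2`, chosen here only to keep both cases side by side) and uses the `is` operator, which tests whether two names refer to the same object.

```python
# Case 1: plain assignment
A1 = ["spam"]
B1 = A1
B1[0] = "shrubbery"
print("Case 1:", A1, "| A1 is B1:", A1 is B1)

# Case 2: assignment from a slice
A2 = ["spam"]
B2 = A2[:]
B2[0] = "shrubbery"
print("Case 2:", A2, "| A2 is B2:", A2 is B2)
```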

#### Other Useful Built-in Functions

`round()` rounds a number to a specified number of decimal places, or to the nearest integer if no number of digits is given.

Syntax: `round(number, ndigits)`


In [19]:
print(round(5.6231)) 

print(round(4.55892, 2))

6
4.56
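Two behaviours of `round()` are worth knowing before you trust it with data: exact halves are rounded to the nearest even number ("banker's rounding"), and because floats are stored in binary, some values that look like halves are not exactly halves. A short sketch:

```python
# Exact halves go to the nearest even integer
print(round(0.5), round(1.5), round(2.5))  # 0 2 2

# 2.675 is stored as a binary fraction slightly below 2.675
print(round(2.675, 2))  # 2.67

# A negative ndigits rounds to tens, hundreds, ...
print(round(1234, -2))  # 1200
```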


`isinstance()` returns `True` if the first argument is an instance of the given class (or of one of its subclasses). Pass a tuple of classes to check several at once.

Syntax: `isinstance(object, classinfo)`

In [20]:
print( isinstance(1, int))

print(isinstance(1.0,int))

print(isinstance(1.0,(int,float)))

True
False
True


`range()` generates the integers from `start` up to, but not including, `stop`, in increments of `step` (the difference between consecutive numbers). In Python 3 it returns a lazy `range` object rather than a list; wrap it in `list()` if you need an actual list (lists are discussed in detail later).

Syntax: `range(start, stop, step)`


In [21]:
a=range(0,10,2)

for n in a:

    print(n)

0
2
4
6
8
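A quick sketch showing that `range()` is lazy and how its arguments behave:

```python
evens = range(0, 10, 2)
print(evens)                 # range(0, 10, 2): a lazy object, not a list
print(list(evens))           # [0, 2, 4, 6, 8]
print(len(evens), 8 in evens, 10 in evens)  # 5 True False: stop is excluded

print(list(range(3)))        # [0, 1, 2]: start defaults to 0, step to 1
print(list(range(5, 0, -1)))  # [5, 4, 3, 2, 1]: a negative step counts down
```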


### Data Structures

We now cover the main data structures in Python using Friends-related examples.

#### Lists

Lists are mutable sequences. Here we define a list of Friends characters.


In [6]:
# List of Friends characters
friends = ["Rachel", "Monica", "Phoebe", "Joey", "Chandler", "Ross"]

# Evaluating the list will show its contents
friends

['Rachel', 'Monica', 'Phoebe', 'Joey', 'Chandler', 'Ross']

##### List Built-in Functions

Python provides many built-in functions for working with lists.


In [None]:

# Finding the number of friends
num_friends = len(friends)

# Alphabetically first and last friend
first_friend = min(friends)  
last_friend = max(friends)  

# Sorting
sorted_friends = sorted(friends)

num_friends, first_friend, last_friend, sorted_friends


##### Modifying Lists

Lists are mutable, meaning you can modify them after creation.

In [None]:
# Add Gunther to the group
friends.append("Gunther")

# Insert Janice at the second position
friends.insert(1, "Janice")

# Remove the last friend (Gunther, who got added)
friends.pop()

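# Replace Chandler with Mike (modify an existing item).
# After the insert, Chandler is no longer at index 4, so look up his position instead.
friends[friends.index("Chandler")] = "Mike"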

friends


The `in` and `not in` operators test membership in containers such as lists, tuples, sets, strings, and dictionaries (for a dictionary, `in` checks the keys). They are commonly used to search within a collection.

In [None]:
# List of Friends characters
friends = ["Ross", "Rachel", "Monica", "Chandler", "Joey", "Phoebe"]

# Checking membership
is_ross_in_friends = "Ross" in friends
is_gunther_in_friends = "Gunther" in friends
is_gunther_not_in_friends = "Gunther" not in friends

is_ross_in_friends, is_gunther_in_friends, is_gunther_not_in_friends
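A sketch of how `in` behaves on other containers: on a dictionary it looks only at the keys, and on a string it tests for a substring. Sets and dictionaries use hashing, so membership tests on them are fast on average, whereas a list is scanned element by element. The `catchphrases` and `cast` names are illustrative.

```python
catchphrases = {"Joey": "How you doin'?", "Ross": "We were on a break!"}

print("Joey" in catchphrases)                     # True: `in` on a dict checks the keys
print("How you doin'?" in catchphrases)           # False: values are not searched
print("How you doin'?" in catchphrases.values())  # True: search the values explicitly

print("break" in "We were on a break!")           # True: substring test on strings

cast = {"Ross", "Rachel", "Monica", "Chandler", "Joey", "Phoebe"}
print("Gunther" not in cast)                      # True: hash-based set lookup
```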


#### Tuples

Tuples are immutable sequences. They can serve as records. For example, a tuple can represent a Friends character’s basic info.

> Here, the tuple `rachel_info` represents a record for Rachel. Since tuples are immutable, their contents cannot be changed after creation.

In [7]:
# Tuple representing a Friends character record: (name, age, occupation)
rachel_info = ("Rachel", 28, "Fashion Executive")
# Evaluating the tuple shows its immutable content
rachel_info

('Rachel', 28, 'Fashion Executive')
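To see the immutability in action, the sketch below tries to change Rachel's age in the tuple and then shows the usual workaround: building a new tuple. The age `29` is just an illustrative value.

```python
rachel_info = ("Rachel", 28, "Fashion Executive")

# Tuples do not support item assignment
try:
    rachel_info[1] = 29
except TypeError as err:
    print("TypeError:", err)

# "Changing" a tuple means building a new one from slices
rachel_older = rachel_info[:1] + (29,) + rachel_info[2:]
print(rachel_older)  # ('Rachel', 29, 'Fashion Executive')

# A one-element tuple needs a trailing comma; parentheses alone do not make a tuple
print(type((29,)), type((29)))  # <class 'tuple'> <class 'int'>
```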


#### Dictionaries

Dictionaries store key-value pairs. For instance, we can map a character’s name to their catchphrase.

In [9]:
# Dictionary mapping Friends characters to a famous catchphrase
friends_catchphrases = {
    "Chandler": "Could I BE any more sarcastic?",
    "Joey": "How you doin'?",
    "Monica": "I know!"
}
# Getting a catchphrase safely
joey_phrase = friends_catchphrases.get("Joey", "No catchphrase found")


# Adding a new entry
friends_catchphrases["Ross"] = "We were on a break!"

# Removing an entry
friends_catchphrases.pop("Monica")

# Checking all keys and values
all_keys = friends_catchphrases.keys()
all_values = friends_catchphrases.values()

joey_phrase, all_keys, all_values



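The `keys()`, `values()`, and `items()` methods used above return *views*, not copies: they reflect later changes to the dictionary, the same idea as the views-vs-copies pitfall discussed later in this notebook. A minimal sketch, using a separate dictionary `catchphrases` so that `friends_catchphrases` is left untouched:

```python
catchphrases = {"Chandler": "Could I BE any more sarcastic?", "Joey": "How you doin'?"}

keys_view = catchphrases.keys()        # a live view of the keys
keys_copy = list(catchphrases.keys())  # list() takes a snapshot copy

catchphrases["Ross"] = "We were on a break!"
catchphrases.pop("Chandler")

print(list(keys_view))  # ['Joey', 'Ross']: the view reflects both changes
print(keys_copy)        # ['Chandler', 'Joey']: the snapshot does not

# items() is also a view, and the usual way to loop over keys and values together
for name, phrase in catchphrases.items():
    print(f"{name}: {phrase}")
```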
In [None]:
# Nested dictionary with character details
friends_info = {
    "Ross": {"age": 29, "job": "Paleontologist"},
    "Monica": {"age": 30, "job": "Chef"},
    "Chandler": {"age": 31, "job": "Statistical Analyst"}
}

# Accessing nested values
ross_job = friends_info["Ross"]["job"]
chandler_age = friends_info["Chandler"]["age"]

ross_job, chandler_age


#### Sets

Sets are unordered collections of unique elements. For example, we can use a set to list the unique filming locations of Friends episodes.

> The set `locations` contains unique items: even though "NYU" appears twice in the literal, it appears only once in the set.

In [10]:
# Set of unique filming locations
locations = {"Central Perk", "NYU","NYU",  "Vegas", "London"}
# Evaluating the set shows that duplicates are removed
locations

{'Central Perk', 'London', 'NYU', 'Vegas'}
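Sets also support the usual set algebra. The sketch below uses two small hypothetical sets of places each character spends time in (made up for illustration), then shows two ways to deduplicate a list. Note that `|`, `&`, and `^` are the same symbols as the bitwise operators, here applied to sets; `sorted()` is used only to make the printed order predictable.

```python
ross_places = {"Central Perk", "NYU", "Museum"}
rachel_places = {"Central Perk", "Museum", "Ralph Lauren"}

print(sorted(ross_places | rachel_places))  # union: in either set
print(sorted(ross_places & rachel_places))  # ['Central Perk', 'Museum']: in both
print(sorted(ross_places - rachel_places))  # ['NYU']: only in ross_places
print(sorted(ross_places ^ rachel_places))  # ['NYU', 'Ralph Lauren']: in exactly one

speakers = ["Ross", "Rachel", "Ross", "Joey", "Rachel"]
print(sorted(set(speakers)))          # ['Joey', 'Rachel', 'Ross']: original order is lost
print(list(dict.fromkeys(speakers)))  # ['Ross', 'Rachel', 'Joey']: first-seen order kept
```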

### Indexing, Slicing, and Unpacking

#### Indexing and Slicing

Indexing accesses a single element; slicing extracts a subsequence. 

> We use square brackets `[]` for both indexing and slicing. Positive indices count from the left, starting at 0; negative indices count back from the right, so `-1` refers to the last element.

In [None]:
friends[len(friends)-1]

> Notice that we can use an arbitrary expression in the square brackets, not just a hard-coded number literal: anywhere Python expects a value, we can use a literal, a variable, or any expression we wish. Python’s syntax is completely general in this way.

In [11]:
# List indexing and slicing with our friends list
first_friend = friends[0]        # "Rachel"
last_friend = friends[-1]        # "Ross"
middle_friends = friends[1:4]      # ["Monica", "Phoebe", "Joey"]

first_friend, last_friend, middle_friends

('Rachel', 'Ross', ['Monica', 'Phoebe', 'Joey'])

Probably the easiest way to think of slices is as a way to extract an entire section of a sequence (for example, a run of characters from a string) in a single step. Their general form, `X[I:J]`, means “give me everything in `X` from offset `I` up to but not including offset `J`.” The result is returned in a new object.
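A few more slicing rules, sketched on a copy of the character list stored under a separate name (`cast`) so the notebook's `friends` list is not overwritten: omitted bounds default to the start or end, negative bounds count from the right, and out-of-range slices return an empty result instead of raising an error.

```python
cast = ["Rachel", "Monica", "Phoebe", "Joey", "Chandler", "Ross"]

print(cast[:2])        # ['Rachel', 'Monica']: omitted start means 0
print(cast[4:])        # ['Chandler', 'Ross']: omitted stop means "to the end"
print(cast[-2:])       # ['Chandler', 'Ross']: the last two elements
print(cast[10:])       # []: slices never raise IndexError
print("Chandler"[:4])  # Chan: strings slice the same way

# The slice is a new list, so changing it leaves the original alone
first_two = cast[:2]
first_two[0] = "Janice"
print(cast[0])         # Rachel
```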

##### Unpacking

Unpacking assigns elements of a sequence to variables. This helps avoid using indices.


In [12]:
# Unpack the friends list into individual variables
rachel, monica, phoebe, joey, chandler, ross = friends
# Evaluating the variables (each will display its value)
rachel, monica, phoebe, joey, chandler, ross

('Rachel', 'Monica', 'Phoebe', 'Joey', 'Chandler', 'Ross')
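Two common uses of unpacking, sketched with illustrative variable names: swapping values without a temporary variable, and the `ValueError` you get when the number of names does not match the number of elements.

```python
# Swap two values in one step (the right side is packed into a tuple, then unpacked)
first_pick, second_pick = "Ross", "Rachel"
first_pick, second_pick = second_pick, first_pick
print(first_pick, second_pick)  # Rachel Ross

# Unpacking works on any iterable, such as a tuple or a string
person, years, job = ("Rachel", 28, "Fashion Executive")
c1, c2, c3, c4 = "Ross"

# The number of names must match the number of elements
try:
    x1, x2 = ("Ross", "Rachel", "Monica")
except ValueError as err:
    print("ValueError:", err)  # too many values to unpack
```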

##### Extended Unpacking

The * operator lets you capture multiple values in a single variable.
> Here, `first` gets the first element, `last` gets the last element, and `middle` captures the remaining elements as a list.

In [13]:
first, *middle, last = friends
first, middle, last

('Rachel', ['Monica', 'Phoebe', 'Joey', 'Chandler'], 'Ross')

### Understanding Views vs. Copies

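One common pitfall is misunderstanding when an operation returns a new copy and when it returns a view (another name for the same underlying data). With lists, slicing creates a new, shallow copy: the outer list is new, but its elements are shared with the original. (NumPy array slices, by contrast, return views.)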

In [14]:
# Create a list of Friends ages
ages = [ross_age, rachel_age, monica_age, joey_age]
ages_copy = ages[:]  # This creates a new copy

# Modify the copy by appending a new age (e.g., Chandler's age)
ages_copy.append(32)

# Evaluating both lists shows that the original remains unchanged
ages, ages_copy

([29, 28, 30, 31], [29, 28, 30, 31, 32])
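Because a slice is a *shallow* copy, the pitfall returns as soon as a list contains other mutable objects, such as lists. A sketch using `copy.deepcopy` from the standard library; the `couples` data is illustrative only. (`list(x)` and `x.copy()` are shallow copies too, just like `x[:]`.)

```python
import copy

couples = [["Ross", "Rachel"], ["Chandler", "Monica"]]

shallow = couples[:]           # new outer list, but the SAME inner lists
deep = copy.deepcopy(couples)  # new outer list AND new inner lists

shallow[0].append("Emily")     # changes an inner list shared with `couples`
deep[1].append("Janice")       # changes an inner list that only `deep` owns

print(couples)  # [['Ross', 'Rachel', 'Emily'], ['Chandler', 'Monica']]
print(shallow is couples, shallow[0] is couples[0], deep[0] is couples[0])  # False True False
```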

### Strings

Strings are immutable sequences of characters. They can be sliced, indexed, and formatted.
By and large, strings are fairly easy to use in Python. Perhaps the most complicated thing about them is that there are so many ways to write them in your code:
- Single quotes: 'spa"m'
- Double quotes: "spa'm"
- Triple quotes: '''... spam ...''', """... spam ..."""
- Escape sequences: "s\tp\na\0m"
- Raw strings: r"C:\new\test.spm" (handy when defining Windows file paths, because backslashes are not treated as escape characters)
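To see why raw strings help with paths, compare how the same path literal is read with and without the `r` prefix (the path itself is just an example):

```python
normal = "C:\new\test.spm"  # \n becomes a newline and \t a tab
raw = r"C:\new\test.spm"    # backslashes are kept literally

print(len(normal), len(raw))        # 13 15
print(raw)                          # C:\new\test.spm
print("\n" in normal, "\n" in raw)  # True False

print(len("s\tp\na\0m"))  # 7: each escape sequence is a single character
```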



In [15]:
# A Friends-themed string
theme_song = "I'll be there for you"
# Evaluating a slice: the first 10 characters
theme_slice = theme_song[:10]

theme_song, theme_slice

("I'll be there for you", "I'll be th")

#### Basic String Operations

In [None]:
tv_guide = 'chanandler' + 'bong'
print(len(tv_guide))
pivot = 'Pivot!' * 10
pivot

#### Extended Slicing
Python allows slicing with three parameters: `[start:stop:step]`.

In [None]:
# Reverse a Friends quote
quote = "We were on a break!"
reversed_quote = quote[::-1]  # Reverse the string

# Extract every second letter
every_second_letter = quote[::2]  

reversed_quote, every_second_letter


#### String Methods
Python also supports a number of built-in string methods.

In [None]:
phrase = "Friends is the best show ever!"

# Convert to uppercase
upper_phrase = phrase.upper()

# Replace words
replaced_phrase = phrase.replace("best", "worst")

# Check if phrase starts with a word
starts_with_friends = phrase.startswith("Friends")

upper_phrase, replaced_phrase, starts_with_friends


> Again, despite the names of these string methods, we are not changing the original strings here, but creating new strings as the results—because strings are immutable, this is the only way this can work. 
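A few other string methods you will use constantly when cleaning text data, sketched on one of the quotes above. As with `upper()` and `replace()`, each returns a new object and leaves `quote` unchanged.

```python
quote = "We were on a break!"

words = quote.split()             # split on whitespace into a list
print(words)                      # ['We', 'were', 'on', 'a', 'break!']
print(" ".join(reversed(words)))  # break! a on were We
print(quote.find("break"))        # 13: index of the first match (-1 if not found)
print(quote.count("e"))           # 4
print(quote.title())              # We Were On A Break!
print("  Pivot!  ".strip())       # Pivot!: surrounding whitespace removed
```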



#### f-String Formatting

f-strings are a powerful way to embed expressions in string literals. They make formatting clear and concise.

In [18]:
# Create an f-string using Friends data
character = "Joey"
catchphrase = friends_catchphrases.get(character, "No catchphrase")
message = f"{character} is known for saying: {catchphrase}"

# Evaluating the f-string shows the formatted message
message

"Joey is known for saying: How you doin'?"

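f-strings also accept a *format specification* after a colon, which controls decimals, percentages, separators, and alignment. A sketch with illustrative numbers (not taken from the datasets):

```python
viewers = 24.46  # illustrative value only
rating = 0.8567
name = "Joey"

print(f"{viewers:.1f} million")    # 24.5 million: one decimal place
print(f"{rating:.1%}")             # 85.7%: multiply by 100 and add %
print(f"{1234567:,}")              # 1,234,567: thousands separator
print(f"[{name:>6}] [{name:<6}]")  # [  Joey] [Joey  ]: right/left align in 6 chars
print(f"{name=}")                  # name='Joey' (the = specifier needs Python 3.8+)
```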
### Loops
- Recap: If Statements, For Loops and While Loops.

In [None]:
# Checking if Ross is older than Monica
if friends_info["Ross"]["age"] > friends_info["Monica"]["age"]:
    outcome = "Ross is older than Monica."
else:
    outcome = "Ross is not older than Monica."

outcome


In [None]:
# Looping through friends
friend_ages = {"Ross": 29, "Monica": 30, "Chandler": 31}

for friend, age in friend_ages.items():
    sentence = f"{friend} is {age} years old."
    print(sentence)


In [None]:
# Counting down from 5
counter = 5
while counter > 0:
    print(f"Counting down: {counter}")
    counter -= 1


##### Break, Continue, and Pass (Explained with Examples)

Each of these keywords changes how a loop executes:

- **break**: Immediately stops the loop.
- **continue**: Skips the current iteration and moves to the next.
- **pass**: Does nothing, acting as a placeholder.

###### Break

In [None]:
# Searching for Chandler in the Friends group
friends = ["Ross", "Rachel", "Monica", "Chandler", "Joey", "Phoebe"]

for friend in friends:
    if friend == "Chandler":
        break  # Stops when Chandler is found
    print(f"{friend} is not Chandler.")  


###### Continue

In [None]:
# The Friends characters are ordering coffee, except Joey, who already has one
for friend in friends:
    if friend == "Joey":
        continue  # Skip Joey
    print(f"{friend} orders coffee.")  


###### Pass

In [None]:
# Placeholder function for a future feature
def friend_introduction(friend):
    if friend == "Janice":
        pass  # Placeholder for Janice's introduction
    else:
        print(f"{friend} says hello!")

friend_introduction("Ross")  
friend_introduction("Janice")  


#### Nested Loops
Nested loops occur when one loop runs inside another.

In [None]:
# Friends meeting locations
locations = ["Central Perk", "Monica's Apartment"]
people = ["Ross", "Rachel", "Joey"]

# Nested loop: Each person visits each location
for location in locations:
    for person in people:
        print(f"{person} is at {location}.")


## Some Advanced Topics
### Understanding Iterables, Iteration, and List Comprehensions  

Before we dive into code, let’s first understand the fundamental concepts of iterables, iteration, and list comprehensions—key building blocks for working with data efficiently in Python.

#### What is an Iterable?  
An iterable is any Python object that can return its elements one at a time. Lists, tuples, dictionaries, sets, and even strings are iterables because they contain multiple elements that can be accessed sequentially. Think of an iterable like a script of a Friends episode—each line is stored in order, and you can go through it one by one.  

#### What is Iteration?  
Iteration is the process of going through an iterable, accessing each element in turn. This is done using loops or special functions that retrieve elements automatically. For example, when we loop through a list of Friends characters and print each name, we are iterating over that list.  

Iteration is crucial in programming because it allows us to process data dynamically—whether that means counting the number of lines Joey has in Season 2, finding the most common words in dialogues, or filtering episodes with high IMDb ratings.  

#### What is an Iterator?  
An iterator is a special type of iterable that remembers its position while iterating. It follows the iterator protocol, meaning it implements two methods:  
- `__iter__()` → Returns the iterator itself.  
- `__next__()` → Returns the next item in the sequence.  

Unlike regular iterables, iterators don’t restart when exhausted—they must be recreated. You can think of this like watching an episode of Friends: if you pause and resume later, you continue from where you left off.  
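The iterator protocol can be driven by hand with the built-ins `iter()` (which calls `__iter__()`) and `next()` (which calls `__next__()`). This sketch shows an iterator remembering its position and then staying exhausted:

```python
cast = ["Ross", "Rachel", "Monica"]

it = iter(cast)  # cast.__iter__() returns a fresh list iterator
print(next(it))  # Ross      (it.__next__())
print(next(it))  # Rachel
print(list(it))  # ['Monica']: whatever is left
print(list(it))  # []: exhausted, and it does not restart

try:
    next(it)
except StopIteration:
    print("StopIteration: the iterator is exhausted")

print(iter(it) is it)            # True: an iterator's __iter__() returns itself
print(iter(cast) is iter(cast))  # False: the list hands out a new iterator each time
```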

#### What is a List Comprehension?  
A list comprehension is a concise and readable way to create lists by applying an operation to each element in an iterable. Instead of writing a loop to filter or transform data, we can express it in a single line.  

For example, instead of writing multiple lines to collect all dialogue lines spoken by Ross, we can use a list comprehension to do it in one compact expression. This makes code more readable, efficient, and Pythonic.  

#### Why Use List Comprehensions?  
List comprehensions have several advantages:  
- Concise: Reduces the number of lines needed for simple operations.  
- Efficient: Runs faster than using explicit loops in many cases.  
- Readable: Expresses filtering and transformations in an intuitive way.  

For example, if we wanted to get all episodes with more than 10 million US viewers, a loop would take multiple lines, while a list comprehension could do it in just one.  
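A sketch of that viewers example, assuming a few hypothetical episode records (titles and numbers are made up, not taken from the datasets). The loop and the comprehension produce the same list; the last line shows that the same syntax with curly braces and `key: value` builds a dictionary.

```python
# Hypothetical records: titles and viewer numbers are made up for illustration
episodes = [
    {"title": "Episode A", "us_views_millions": 12.3},
    {"title": "Episode B", "us_views_millions": 8.7},
    {"title": "Episode C", "us_views_millions": 10.1},
]

# Loop version
popular_loop = []
for ep in episodes:
    if ep["us_views_millions"] > 10:
        popular_loop.append(ep["title"])

# Comprehension version: [expression for item in iterable if condition]
popular_comp = [ep["title"] for ep in episodes if ep["us_views_millions"] > 10]
print(popular_comp)  # ['Episode A', 'Episode C']

# Curly braces with key: value build a dictionary instead
views_by_title = {ep["title"]: ep["us_views_millions"] for ep in episodes}
```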



### Step 1. Load the Data
We have two CSV files:
- **friends.csv**: Contains dialogue data with columns such as `text`, `speaker`, `season`, `episode`, etc.
- **friends_info.csv**: Contains episode metadata with columns such as `season`, `episode`, `title`, `directed_by`, `written_by`, `air_date`, `us_views_millions`, and `imdb_rating`.

We’ll use Pandas just to load the data and then convert them into plain Python data structures.

In [None]:
import pandas as pd

# Load the CSV files using Pandas
friends_df = pd.read_csv(r"datasets/friends.csv")
friends_info_df = pd.read_csv(r"datasets/friends_info.csv")

# Convert to lists of dictionaries for plain Python processing
friends_data = friends_df.to_dict(orient="records")
friends_info_data = friends_info_df.to_dict(orient="records")

### Step 2. Basic Iteration

Let’s start by iterating over the list of dialogue records (from friends_data) to, for example, collect all lines spoken by **Rachel**.

In [None]:
# Basic iteration: Extract dialogue lines by Rachel
rachel_dialogues = []  # We'll collect these in a list
for record in friends_data:
    if record["speaker"] == "Rachel":
        # Each record is a dictionary; we add the dialogue text to our list.
        rachel_dialogues.append(record["text"])

# Show a few of Rachel's dialogue lines (first 3 for brevity)
rachel_dialogues[:3]

### Step 3. Basic List Comprehensions

Now we can rewrite the above loop as a list comprehension. This single line builds a list by filtering and mapping at once.

In [None]:
# Using a list comprehension to extract Rachel's dialogues in one line
rachel_dialogues_comp = [record["text"] 
                         for record in friends_data 
                         if record["speaker"] == "Rachel"]

rachel_dialogues_comp[:3]

> This list comprehension iterates over every record in `friends_data`, includes only those where `speaker` is `"Rachel"`, and extracts the `text` field. The result is functionally identical to the previous loop but in a more compact form.

### Step 4. Extended Examples: Filtering with Multiple Conditions

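Let’s extend our example to work with both datasets. Suppose we want to extract all dialogue lines from episodes with a high IMDb rating (above 8.0). To do this efficiently, we first build a lookup dictionary for the episode metadata keyed by `(season, episode)`, so that each dialogue record can find its episode with a single dictionary lookup instead of scanning the whole metadata list.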

In [None]:
# Create a lookup dictionary from friends_info_data keyed by (season, episode)
info_lookup = {(record["season"], record["episode"]): record 
               for record in friends_info_data}

# Now, iterate over friends_data and select dialogues from episodes with IMDb rating > 8.0
high_rating_dialogues = []
for record in friends_data:
    key = (record["season"], record["episode"])
    # Get the episode info if available
    episode_info = info_lookup.get(key)
    if episode_info and episode_info["imdb_rating"] > 8.0:
        # Append a tuple: (episode title, speaker, dialogue)
        high_rating_dialogues.append((episode_info["title"], record["speaker"], record["text"]))

high_rating_dialogues[:3]

### Step 5. Extended List Comprehensions

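We can achieve the same result as above with a single list comprehension that filters records and builds the tuples in one expression. This version is more concise but denser, and it looks up each key in `info_lookup` up to three times per record.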


In [None]:
# List comprehension version: Extract (title, speaker, dialogue) for high-rated episodes
high_rating_dialogues_comp = [
    (info_lookup[(record["season"], record["episode"])]["title"], 
     record["speaker"], 
     record["text"])
    for record in friends_data
    if (record["season"], record["episode"]) in info_lookup 
       and info_lookup[(record["season"], record["episode"])]["imdb_rating"] > 8.0
]

high_rating_dialogues_comp[:3]

> - This comprehension does the same filtering as before: it checks if the episode exists in `info_lookup` and whether its IMDb rating exceeds 8.0.
> - For each valid record, it builds a tuple with the episode title, speaker, and dialogue.
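The comprehension above repeats the `info_lookup[(season, episode)]` lookup. On Python 3.8+, one option is an assignment expression (`:=`) in the filter, so each record is looked up once with `.get()`. A sketch on tiny hypothetical stand-ins for `friends_data` and `info_lookup`, named `sample_dialogues` and `sample_lookup` so the real data is not overwritten:

```python
# Hypothetical stand-ins for friends_data and info_lookup
sample_dialogues = [
    {"season": 1, "episode": 1, "speaker": "Monica", "text": "Line one"},
    {"season": 1, "episode": 2, "speaker": "Ross", "text": "Line two"},
    {"season": 9, "episode": 9, "speaker": "Joey", "text": "Line three"},  # no metadata
]
sample_lookup = {
    (1, 1): {"title": "Title A", "imdb_rating": 8.3},
    (1, 2): {"title": "Title B", "imdb_rating": 7.9},
}

# (info := ...) stores the lookup result so the condition and the output can share it
high_rated = [
    (info["title"], rec["speaker"], rec["text"])
    for rec in sample_dialogues
    if (info := sample_lookup.get((rec["season"], rec["episode"]))) is not None
    and info["imdb_rating"] > 8.0
]
print(high_rated)  # [('Title A', 'Monica', 'Line one')]
```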

### Step 6. Advanced Examples: Nested and Conditional Comprehensions

Now let’s look at a more advanced example that combines a loop-built nested dictionary with a comprehension that sorts its contents. Suppose we want to build a summary of each episode in season 3 that lists every speaker and the number of lines they spoke.

#### Step 6.1: Build a Dictionary of Episodes to Speaker Counts

First, we create a dictionary that maps each episode (season, episode) to a sub‑dictionary of speaker counts.

In [None]:
# Advanced: Build a nested dictionary for season 3 episodes
episode_speaker_counts = {}

for record in friends_data:
    if record["season"] == 3:
        key = (record["season"], record["episode"])
        speaker = record["speaker"]
        # Initialize the nested dictionary if needed
        if key not in episode_speaker_counts:
            episode_speaker_counts[key] = {}
        # Count the dialogue line for the speaker
        episode_speaker_counts[key][speaker] = episode_speaker_counts[key].get(speaker, 0) + 1

episode_speaker_counts

> We iterate over all dialogue records from season 3.
> For each record, we use the tuple `(season, episode)` as a key.
> We then count how many lines each speaker delivers in that episode.
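The standard library's `collections` module offers a more compact way to build the same kind of nested counts: a `defaultdict` creates a missing inner counter automatically, and `Counter.most_common()` returns speaker/count pairs sorted by count, highest first. A sketch on hypothetical records (`toy_records`), kept separate from `friends_data`:

```python
from collections import Counter, defaultdict

# Hypothetical records in the same shape as friends_data
toy_records = [
    {"season": 3, "episode": 1, "speaker": "Ross"},
    {"season": 3, "episode": 1, "speaker": "Rachel"},
    {"season": 3, "episode": 1, "speaker": "Ross"},
    {"season": 3, "episode": 2, "speaker": "Joey"},
    {"season": 2, "episode": 1, "speaker": "Monica"},  # filtered out: not season 3
]

# A missing (season, episode) key gets a fresh, empty Counter automatically
counts = defaultdict(Counter)
for record in toy_records:
    if record["season"] == 3:
        counts[(record["season"], record["episode"])][record["speaker"]] += 1

print(dict(counts))
print(counts[(3, 1)].most_common())  # [('Ross', 2), ('Rachel', 1)]
```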


#### Step 6.2: Use a List Comprehension to Summarize the Data

Now we can use a list comprehension to produce a summary list. Each element will be a tuple of two items: the episode key `(season, episode)` and a list of `(speaker, line_count)` pairs sorted by line count, highest first.

In [None]:
# Advanced list comprehension: Summarize season 3 episodes with speaker counts
episode_summary = [
    ((season, episode), 
     sorted(speaker_counts.items(), key=lambda x: x[1], reverse=True))
    for (season, episode), speaker_counts in episode_speaker_counts.items()
]

episode_summary[:3]

> This comprehension iterates over each episode key and its corresponding speaker counts.
> For each episode, it creates a tuple: the episode identifier and a sorted list (by number of lines, descending) of speaker/count pairs.
> This gives a clear summary of which Friends character dominated each episode in season 3.

### Step 7. Iteration with Enumerate and Zip

Lastly, let’s illustrate using built-in functions like `enumerate()` and `zip()` to process our lists further. For instance, we can label the first 5 dialogue lines from a specific episode and then pair them with the speaker names.

In [None]:
# Get dialogue lines from season 1, episode 1
episode_1_dialogues = [record for record in friends_data 
                       if record["season"] == 1 and record["episode"] == 1]

# Use enumerate to number each dialogue line (starting from 1)
numbered_dialogues = [(i, record["speaker"], record["text"]) 
                      for i, record in enumerate(episode_1_dialogues, start=1)]
print(numbered_dialogues[:5])  # print explicitly: only a cell's last expression is displayed

# Suppose we extract two separate lists: one for speakers and one for dialogues
speakers = [record["speaker"] for record in episode_1_dialogues[:5]]
dialogues = [record["text"] for record in episode_1_dialogues[:5]]

# Use zip() to combine the speakers and dialogues into a formatted string
paired_lines = [f"{speaker}: {dialogue}" for speaker, dialogue in zip(speakers, dialogues)]
paired_lines

> The first block uses a list comprehension with an if‑condition to filter dialogue lines for season 1, episode 1.
> Then, `enumerate()` is used to assign a line number to each dialogue.
> Finally, we use `zip()` to pair the first five speakers with their dialogues, demonstrating how you can combine two lists element‑by‑element.

# Exercises!

Below are two exercises that challenge you to combine everything we’ve learned—from iteration and list methods to comprehensions, dictionary manipulations, string methods, type conversion, f‑strings, and careful handling of copies versus views. Use the two Friends datasets (friends.csv and friends_info.csv) that were loaded earlier (and converted into lists of dictionaries) to complete the following tasks.



### Exercise 1: Episode Dialogue Summary

**Task:**  
Using the `friends_data` (dialogues) and `friends_info_data` (episode metadata), create a summary for each episode in Season 3. For each episode, compute:
- **Episode Title:** (from friends_info_data)
- **Total Number of Dialogue Lines:** Count how many dialogue records occur in that episode.
- **Average Dialogue Length:** Calculate the average number of characters per dialogue line (convert numbers to integers as needed).
- **Unique Speakers:** A sorted (alphabetically) list of all speakers in that episode.

Your final result should be a list of dictionaries (one dictionary per episode), with keys: `"season"`, `"episode"`, `"title"`, `"dialogue_count"`, `"avg_length"`, and `"unique_speakers"`.

*Hint:*  
- First, build a lookup (dictionary) keyed by `(season, episode)` for the episode metadata.  
- Use iteration and list comprehensions to filter and process dialogue records from `friends_data`.  
- Use list methods and string methods where necessary.


In [None]:

# Your solution here:
# 1. Create a lookup dictionary for episodes in Season 3 using friends_info_data.
# 2. Iterate over friends_data to group dialogue records by (season, episode) for Season 3.
# 3. For each episode, compute the dialogue count, average dialogue length, and a sorted list of unique speakers.
# 4. Build and return the final list of dictionaries summarizing the episodes.



### Exercise 2: High-Rated Episode Dialogue Transformation

**Task:**  
Merge the two datasets by matching on season and episode. Then, for all dialogue records from episodes with an IMDb rating above 8.5, produce a formatted string for each record with the following rules:
- Use an f‑string to format the output as:  
  `"<Title> | <Speaker>: <Dialogue>"`
- If the dialogue text is longer than 50 characters, convert it to uppercase; otherwise, convert it to title case.
- Ensure that the extraction of dialogue lines is done using a method that creates a copy (so that modifying the new list does not affect the original data).

Your final output should be a list of these formatted strings.

*Hint:*  
- Build a lookup dictionary from `friends_info_data` keyed by `(season, episode)`.  
- Use a list comprehension (or nested comprehensions) to filter dialogue records from `friends_data` whose corresponding episode has an IMDb rating greater than 8.5.  
- Use conditional expressions inside your comprehension to decide whether to use `.upper()` or `.title()` on the dialogue text.



In [None]:
# Your solution here:
# 1. Build a lookup dictionary from friends_info_data keyed by (season, episode).
# 2. Filter the dialogues in friends_data to only include those from episodes with imdb_rating > 8.5.
# 3. For each such dialogue, create a formatted string: "<title> | <speaker>: <dialogue>".
#    - Convert the dialogue text to uppercase if its length > 50, else title-case it.
# 4. Ensure that you create a copy of the dialogue data before modification (e.g., via slicing).
# 5. Return the list of formatted strings.