In [149]:
import pandas as pd

# Big Data Analytics - Lab Week 1


Some of the Python exercises are adapted from the course:

https://github.com/rajathkumarmp/Python-Lectures where you can find more exercises if you wish to review, and from

https://www.practicepython.org/ a great resource for simple exercises and solutions.
#### Books
If you want a quick revision of Python, I highly recommend [Learning Python](https://www.oreilly.com/library/view/learning-python-5th/9781449355722/) or [Python in a Nutshell](https://www.oreilly.com/library/view/python-in-a/9781098113544/). If you want a more advanced revision of Python, I recommend [Fluent Python](https://www.oreilly.com/library/view/fluent-python-2nd/9781492056348/), and if you are looking to brush up on Python for data analysis, I recommend [Python for Data Analysis](https://www.oreilly.com/library/view/python-for-data/9781491957653/).

This notebook covers foundational Python skills needed to work with Spark, MLlib, and Databricks workflows.
We will review:
- Variables, operators (arithmetic, relational, bitwise; and the difference between logical `and` and bitwise `&`)
- Data types and type conversion
- Built-in functions
- Data structures: lists, tuples, dictionaries, sets, and strings
- Indexing, slicing, and unpacking
- The importance of understanding views vs. copies (a common beginner pitfall)
- Using f-strings for formatting
- Some advanced topics.

Note: Each code cell is paired with a markdown explanation. Some cells are left as "Your code here" for you to test your understanding with minimal interruptions.

> If you haven't used a notebook like this before, you can type code into the cells below and then execute it (Shift+Enter or the ▶ sign in the top-right corner). Typically, I will provide some example code for you to run and add exercises for you to apply your understanding and solve problems.

## Basics

### Variables

#### Defining and Using Variables

In Python, a variable is simply a name that refers to a value. In our Friends-themed example, we assign ages to some characters and other attributes. Variables do not need a type declaration; they are created when you assign a value.


In [33]:
# Assign ages (in years) to Friends characters
ross_age = 29
rachel_age = 28
monica_age = 30
joey_age = 31
ross_name = 'Ross'
# You can inspect these variables by evaluating them:
ross_age, rachel_age, monica_age, joey_age

(29, 28, 30, 31)

> In the cell above, we created four integer variables (`ross_age`, `rachel_age`, `monica_age`, and `joey_age`) and one string variable (`ross_name`). Python determines each type automatically from the assigned value. Notice that simply writing the variable names (or a tuple of them) on the last line of a notebook cell displays their values.
> You can use the `type()` function to check the type assigned to a variable. Try it.

**Important: Strings are immutable!**

Try running the code below.
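
A minimal sketch of what immutability means in practice. The variable names `name` and `new_name` are illustrative and not part of the original lab; the point is only that item assignment on a string raises a `TypeError`, so you build a new string instead.

```python
# Illustrative example: strings cannot be modified in place
name = "Ross"

try:
    name[0] = "B"  # item assignment is not supported for str
except TypeError as err:
    print(f"TypeError: {err}")

# Build a new string instead; the original object is unchanged
new_name = "B" + name[1:]
print(name, new_name)
```

The original string is still intact after the failed assignment, and `new_name` is a separate object.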

### Operators

#### Arithmetic Operators

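Arithmetic operators perform standard mathematical operations. Operator precedence follows the usual BODMAS order (brackets, orders, division and multiplication, addition and subtraction).

| Symbol | Task Performed |
|--------|----------------|
| `+` | Addition |
| `-` | Subtraction |
| `/` | Division |
| `%` | Modulo (remainder) |
| `*` | Multiplication |
| `//` | Floor division |
| `**` | To the power of |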
#### Relational and Logical Operators

Relational operators compare values and return Boolean results (True or False). Logical operators combine Boolean values. Note that Python’s logical operator `and` is used for Boolean logic, whereas the bitwise operator `&` works on integers at the binary level.

| Symbol | Task Performed |
|--------|----------------|
| `==` | True if equal |
| `!=` | True if not equal |
| `<` | Less than |
| `>` | Greater than |
| `<=` | Less than or equal to |
| `>=` | Greater than or equal to |

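> **Additional Detail:**  
> The logical operator `and` treats each operand as a Boolean and gives a truthy result only if both operands are truthy. It short-circuits, so the right-hand side is skipped when the left-hand side is already falsy. In contrast, the bitwise operator `&` operates on the individual bits of integers. It also binds more tightly than the comparison operators, so `num1>3 & num2<10` is parsed as `num1 > (3 & num2) < 10`. That is why the two cells below print different results.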

In [81]:
num1 = 5
num2 = 12
if num1>3 and num2<10:
  print("both are correct")
else:
  print ("one is wrong")


one is wrong


In [147]:
num1 = 5
num2 = 12
if num1>3 & num2<10:
  print("both are correct")
else:
  print ("one is wrong")


both are correct
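
The surprising result above comes from operator precedence. The sketch below breaks the expression into steps; the helper names (`step`, `equivalent`, `parenthesised`, `logical`) are illustrative only.

```python
num1 = 5
num2 = 12

# `&` is evaluated before the comparisons, so this is a chained comparison:
# num1 > (3 & num2) < 10
unparenthesised = num1 > 3 & num2 < 10

step = 3 & num2                             # 0b0011 & 0b1100 == 0
equivalent = (num1 > step) and (step < 10)  # what Python actually evaluated

# Parenthesising each comparison restores the intended meaning
parenthesised = (num1 > 3) & (num2 < 10)
logical = num1 > 3 and num2 < 10

print(step, unparenthesised, equivalent, parenthesised, logical)
```

For plain Python Booleans, prefer `and`. The parenthesised `&` form becomes relevant with pandas and PySpark column expressions, where each comparison needs its own brackets.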


#### Bitwise Operators

Bitwise operators work directly on the binary representations of integers. These are particularly useful in low-level data processing.
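
A short sketch of the standard bitwise operators on two small integers. The values `a` and `b` are arbitrary and chosen so the bit patterns are easy to read.

```python
a = 0b1100   # 12
b = 0b1010   # 10

print(bin(a & b))   # AND: bits set in both
print(bin(a | b))   # OR: bits set in either
print(bin(a ^ b))   # XOR: bits set in exactly one
print(a << 1)       # left shift by one bit (multiplies by 2)
print(a >> 2)       # right shift by two bits (floor-divides by 4)
print(~a)           # bitwise NOT, equal to -(a + 1)
```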

### Built-in Functions

Python provides a rich set of built-in functions that simplify data manipulation. In this section, we demonstrate several categories.

#### Type Conversion
You can use `int()`, `str()`, `float()`, and similar functions to convert between types.
Try converting an integer to a string, and check the result using the `type()` function.


In [82]:
type(int('5'))

int

#### **Important: Python's Dynamic Typing**



Consider these three statements. Do they change the printed value of A?
```python
A = ["spam"]
B = A
B[0] = "shrubbery"
```


How about these—is A changed now?
```python
A = ["spam"]
B = A[:]
B[0] = "shrubbery"
```
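
One way to check your answers to the two questions above is to run both cases and compare object identity with `is`. The names `first_case`, `second_case`, `same_object`, and `different_objects` are illustrative.

```python
# Case 1: B is a second name for the same list object
A = ["spam"]
B = A
B[0] = "shrubbery"
first_case = A[0]
same_object = A is B

# Case 2: slicing builds a new list, so A is left alone
A = ["spam"]
B = A[:]
B[0] = "shrubbery"
second_case = A[0]
different_objects = A is not B

print(first_case, same_object)
print(second_case, different_objects)
```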

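The `round()` function rounds the input value to a specified number of decimal places, or to the nearest integer if no number of places is given. Note that exact halves are rounded to the nearest even value, so `round(2.5)` gives `2`.

Syntax: `round(number, ndigits)`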


In [35]:
print(round(5.6231)) 

print(round(4.55892, 2))

6
4.56


`isinstance()` returns `True` if the first argument is an instance of the class given as the second argument. Several classes can be checked at once by passing a tuple of classes.

Syntax: `isinstance(object, type)`

In [36]:
print( isinstance(1, int))

print(isinstance(1.0,int))

print(isinstance(1.0,(int,float)))

True
False
True


The `range()` function produces the integers in the specified range. It can also generate an arithmetic series by specifying a step between consecutive numbers. In Python 3 the result is a lazy `range` object rather than a list; wrap it in `list()` if you need an actual list (lists are discussed in detail later).

Syntax: `range(start, stop, step)`


In [158]:
for i in range(6):
    print(i)

0
1
2
3
4
5


In [37]:
a = range(0, 10, 2)

for n in a:
    print(n)

0
2
4
6
8
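
A small sketch showing that `range()` gives back a `range` object that can be converted to a list when needed. The names `evens` and `countdown` are illustrative.

```python
evens = range(0, 10, 2)

print(type(evens))        # a range object, not a list
print(list(evens))        # materialise the values as a list
print(len(evens), evens[1], 4 in evens)

# A negative step counts downwards; the stop value is still excluded
countdown = list(range(5, 0, -1))
print(countdown)
```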


### Data Structures

We now cover the main data structures in Python using Friends-related examples.

#### Lists

Lists are mutable sequences. In our theme, a list of Friends characters is defined.


In [159]:
# List of Friends characters
friends = ["Rachel", "Monica", "Phoebe", "Joey", "Chandler", "Ross"]
# Type conversion: a pandas Series converted back to a plain list
alist = list(pd.Series(friends))
print(type(alist))
# Evaluating the list will show its contents
friends

<class 'list'>


['Rachel', 'Monica', 'Phoebe', 'Joey', 'Chandler', 'Ross']

##### List Built-in Functions

Python provides many built-in functions for working with lists.


In [None]:

# Finding the number of friends
num_friends = len(friends)

# Alphabetically first and last friend
first_friend = min(friends)  
last_friend = max(friends)  

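# Sorting (returns a new list; `friends` itself is unchanged)
sorted_friends = sorted(friends)

num_friends, first_friend, last_friend, sorted_friends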


(6,
 'Chandler',
 'Ross',
 ['Chandler', 'Joey', 'Monica', 'Phoebe', 'Rachel', 'Ross'])

##### Modifying Lists

Lists are mutable, meaning you can modify them after creation.

In [40]:
# Add Gunther to the group
friends.append("Gunther")

# Insert Janice at the second position
friends.insert(1, "Janice")

# Remove the last friend (Gunther, who got added)
friends.pop()

# Replace Joey with Mike (after the insert above, Joey sits at index 4)
friends[4] = "Mike"

friends


['Rachel', 'Janice', 'Monica', 'Phoebe', 'Mike', 'Chandler', 'Ross']

In [164]:
help(friends.append)

Help on built-in function append:

append(object, /) method of builtins.list instance
    Append object to the end of the list.



The `in` and `not in` operators check for membership in collections such as lists, tuples, sets, and strings; for dictionaries they check the keys. These operators are commonly used for searching within a collection.

In [83]:
# List of Friends characters
friends = ["Ross", "Rachel", "Monica", "Chandler", "Joey", "Phoebe"]

# Checking membership
is_ross_in_friends = "Ross" in friends
is_gunther_in_friends = "Gunther" in friends
is_gunther_not_in_friends = "Gunther" not in friends

is_ross_in_friends, is_gunther_in_friends, is_gunther_not_in_friends


(True, False, True)

#### Exercise

In [165]:
# Your solution here:
list_characters = ['Joey Tribbiani', 'Chandler Bing', 'Chandler Bing', 'Scene Directions', 'Rachel Green', 'Chandler Bing', 'Chandler Bing', 'Monica Geller',
 'Rachel Green',
 'Ross Geller',
 'Joey Tribbiani',
 'Joey Tribbiani',
 'Scene Directions',
 'Chandler Bing',
 'Phoebe Buffay',
 'Chandler Bing',
 'Chandler Bing',
 'Chandler Bing',
 'Ross Geller',
 'Joey Tribbiani',
 'Ross Geller',
 'Rachel Green',
 'Phoebe Buffay',
 'Chandler Bing',
 'Ross Geller',
 'Scene Directions',
 'Chandler Bing',
 'Rachel Green',
 'Phoebe Buffay',
 'Joey Tribbiani',
 'Phoebe Buffay',
 'Janice Litman Goralnik',
 'Monica Geller',
 'Chandler Bing',
 'Phoebe Buffay',
 'Joey Tribbiani', 'Ross Geller', 'Rachel Green' , 'Ross Geller', 'Ursula Buffay', 'Scene Directions', 'Monica Geller',
 'Jack Geller',
 'Rachel Green',
 'Chandler Bing',
 'Joey Tribbiani',
 'Phoebe Buffay',
 'Ross Geller', 'Ronni Rapalono', 'Monica Geller']
# 2. There's a new member in the group: add them to the list using .extend().
# 3. There's been a breakup in the group: remove a specific character using .remove().
# 4. Ross and Rachel are on another break: use .pop() with a specific index to remove one of them from the middle.
# 5. Could there be any more Chandlers? Count how many times a character appears using .count().
# 6. When did Monica join the list? Find the index of a character using .index().
# 7. Reverse the list using .reverse() and then clear the list with .clear().


In [167]:
#2
list_characters.extend(['Dhruv', 'giovanni'])
print(list_characters)
list_characters.remove('Chandler Bing')
print(list_characters)
print(list_characters.count('Chandler Bing'))

print(list_characters.index('Monica Geller'))

list_characters.reverse()
list_characters

['Dhruv', 'Monica Geller', 'Ronni Rapalono', 'Ross Geller', 'Phoebe Buffay', 'Joey Tribbiani', 'Chandler Bing', 'Rachel Green', 'Jack Geller', 'Monica Geller', 'Scene Directions', 'Ursula Buffay', 'Ross Geller', 'Rachel Green', 'Ross Geller', 'Joey Tribbiani', 'Phoebe Buffay', 'Chandler Bing', 'Monica Geller', 'Janice Litman Goralnik', 'Phoebe Buffay', 'Joey Tribbiani', 'Phoebe Buffay', 'Rachel Green', 'Chandler Bing', 'Scene Directions', 'Ross Geller', 'Chandler Bing', 'Phoebe Buffay', 'Rachel Green', 'Ross Geller', 'Joey Tribbiani', 'Ross Geller', 'Chandler Bing', 'Chandler Bing', 'Chandler Bing', 'Phoebe Buffay', 'Chandler Bing', 'Scene Directions', 'Joey Tribbiani', 'Joey Tribbiani', 'Ross Geller', 'Rachel Green', 'Monica Geller', 'Chandler Bing', 'Chandler Bing', 'Rachel Green', 'Scene Directions', 'Chandler Bing', 'Joey Tribbiani', 'Dhruv', 'giovanni']
['Dhruv', 'Monica Geller', 'Ronni Rapalono', 'Ross Geller', 'Phoebe Buffay', 'Joey Tribbiani', 'Rachel Green', 'Jack Geller', 'Mo

['giovanni',
 'Dhruv',
 'Joey Tribbiani',
 'Chandler Bing',
 'Scene Directions',
 'Rachel Green',
 'Chandler Bing',
 'Chandler Bing',
 'Monica Geller',
 'Rachel Green',
 'Ross Geller',
 'Joey Tribbiani',
 'Joey Tribbiani',
 'Scene Directions',
 'Chandler Bing',
 'Phoebe Buffay',
 'Chandler Bing',
 'Chandler Bing',
 'Chandler Bing',
 'Ross Geller',
 'Joey Tribbiani',
 'Ross Geller',
 'Rachel Green',
 'Phoebe Buffay',
 'Chandler Bing',
 'Ross Geller',
 'Scene Directions',
 'Chandler Bing',
 'Rachel Green',
 'Phoebe Buffay',
 'Joey Tribbiani',
 'Phoebe Buffay',
 'Janice Litman Goralnik',
 'Monica Geller',
 'Chandler Bing',
 'Phoebe Buffay',
 'Joey Tribbiani',
 'Ross Geller',
 'Rachel Green',
 'Ross Geller',
 'Ursula Buffay',
 'Scene Directions',
 'Monica Geller',
 'Jack Geller',
 'Rachel Green',
 'Joey Tribbiani',
 'Phoebe Buffay',
 'Ross Geller',
 'Ronni Rapalono',
 'Monica Geller',
 'Dhruv']

In [88]:
list_characters.reverse()

**Notice: you don't need to reassign the list after calling `reverse()`; it modifies the list in place and returns `None`.**

In [89]:
print(list_characters)

['Dhruv', 'Monica Geller', 'Ronni Rapalono', 'Ross Geller', 'Phoebe Buffay', 'Joey Tribbiani', 'Chandler Bing', 'Rachel Green', 'Jack Geller', 'Monica Geller', 'Scene Directions', 'Ursula Buffay', 'Ross Geller', 'Rachel Green', 'Ross Geller', 'Joey Tribbiani', 'Phoebe Buffay', 'Chandler Bing', 'Monica Geller', 'Janice Litman Goralnik', 'Phoebe Buffay', 'Joey Tribbiani', 'Phoebe Buffay', 'Rachel Green', 'Chandler Bing', 'Scene Directions', 'Ross Geller', 'Chandler Bing', 'Phoebe Buffay', 'Rachel Green', 'Ross Geller', 'Joey Tribbiani', 'Ross Geller', 'Chandler Bing', 'Chandler Bing', 'Chandler Bing', 'Phoebe Buffay', 'Chandler Bing', 'Scene Directions', 'Joey Tribbiani', 'Joey Tribbiani', 'Ross Geller', 'Rachel Green', 'Monica Geller', 'Chandler Bing', 'Chandler Bing', 'Rachel Green', 'Scene Directions', 'Chandler Bing', 'Joey Tribbiani']


#### Tuples

Tuples are immutable sequences. They can serve as records. For example, a tuple can represent a Friends character’s basic info.

> Here, the tuple `rachel_info` represents a record for Rachel. Since tuples are immutable, their contents cannot be changed after creation.

In [43]:
# Tuple representing a Friends character record: (name, age, occupation)
rachel_info = ("Rachel", 28, "Fashion Executive")
# Evaluating the tuple shows its immutable content
rachel_info

('Rachel', 28, 'Fashion Executive')

In [90]:
ntuple = ((1, 2), (3, 4))
print(ntuple)

((1, 2), (3, 4))
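
A minimal sketch of tuple immutability and of using a tuple as a record. The names `rachel_next_year`, `name`, `age`, and `occupation` are illustrative additions.

```python
rachel_info = ("Rachel", 28, "Fashion Executive")

try:
    rachel_info[1] = 29  # tuples do not support item assignment
except TypeError as err:
    print(f"TypeError: {err}")

# To "change" a record, build a new tuple from the pieces of the old one
rachel_next_year = rachel_info[:1] + (rachel_info[1] + 1,) + rachel_info[2:]

# Records unpack naturally into named variables
name, age, occupation = rachel_info
print(rachel_next_year, name, age, occupation)
```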


In [44]:
# A set of filming locations (sets are covered in the Sets section below)
locations = {"Central Perk", "NYU", "Vegas", "London"}
# Evaluating the set displays its elements; sets are unordered
locations

{'Central Perk', 'London', 'NYU', 'Vegas'}

#### Dictionaries

Dictionaries store key-value pairs. For instance, we can map a character’s name to their catchphrase.

In [91]:
friends_catchphrases = {
    "Chandler": "Could I BE any more sarcastic?",
    "Joey": "How you doin'?",
    "Monica": "I know!"
}

In [93]:
friends_catchphrases["Ross"] = "We were on a break!"
friends_catchphrases['List of catch'] = ['How you  doin', 'Pivot']


In [None]:
# Getting a catchphrase safely: .get() returns the default if the key is missing
joey_phrase = friends_catchphrases.get("Joey", "No catchphrase found")

# Adding a new entry was done in the previous cell by assigning to a new key

# Removing an entry
friends_catchphrases.pop("Monica")

# Checking all keys and values
all_keys = friends_catchphrases.keys()
all_values = friends_catchphrases.values()

joey_phrase, all_keys, all_values


("How you doin'?",
 dict_keys(['Chandler', 'Joey', 'Ross']),
 dict_values(['Could I BE any more sarcastic?', "How you doin'?", 'We were on a break!']))

In [96]:
# Nested dictionary with character details
friends_info = {
    "Ross": {"age": 29, "job": "Paleontologist"},
    "Monica": {"age": 30, "job": "Chef"},
    "Chandler": {"age": 31, "job": "Statistical Analyst"}
}

In [98]:
friends_info['Ross']['job']

'Paleontologist'

In [None]:


# Accessing nested values
ross_job = friends_info["Ross"]["job"]
chandler_age = friends_info["Chandler"]["age"]

ross_job, chandler_age


('Paleontologist', 31)

#### Sets

Sets are unordered collections of unique elements. For example, we can use a set to list the unique filming locations of Friends episodes.

> The set `locations` contains unique items. Even though "NYU" is repeated in the set literal, it appears only once in the set.

In [47]:
# Set of unique filming locations
locations = {"Central Perk", "NYU","NYU",  "Vegas", "London"}
# Evaluating the set shows that duplicates are removed
locations

{'Central Perk', 'London', 'NYU', 'Vegas'}

### Indexing, Slicing, and Unpacking

#### Indexing and Slicing

Indexing accesses a single element; slicing extracts a subsequence.

> We use square brackets `[]` for indexing and slicing. Positive indexes count from the left, starting at 0, and negative indexes count back from the right, so `-1` is the last element.

In [101]:
list_characters[-2]

'Chandler Bing'

In [156]:
list_characters[len(friends) - 3]

'Ross Geller'

> Notice that we can use an arbitrary expression in the square brackets, not just a hard-coded number literal. Anywhere that Python expects a value, we can use a literal, a variable, or any expression we wish. Python’s syntax is completely general this way.


In [None]:
length = len(friends)
list_characters[length - 3]

In [49]:
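# List indexing and slicing with our friends list
# (comments assume friends = ["Ross", "Rachel", "Monica", "Chandler", "Joey", "Phoebe"], as in the output below)
first_friend = friends[0]        # "Ross"
last_friend = friends[-1]        # "Phoebe"
middle_friends = friends[1:4]    # ["Rachel", "Monica", "Chandler"]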

first_friend, last_friend, middle_friends

('Ross', 'Phoebe', ['Rachel', 'Monica', 'Chandler'])

In [168]:
friends[1:]

['Monica', 'Phoebe', 'Joey', 'Chandler', 'Ross']

Probably the easiest way to think of slices is that they are a way to extract an entire section of a sequence in a single step. Their general form, `X[I:J]`, means “give me everything in X from offset I up to but not including offset J.” The result is returned in a new object.

##### Unpacking

Unpacking assigns elements of a sequence to variables. This helps avoid using indices.


In [177]:
# Unpack the friends list into individual variables
rachel, monica, phoebe, joey, chandler, ross = friends
# Each variable now holds one name; check the type of one of them
type(rachel)

str

##### Extended Unpacking

The `*` operator lets you capture multiple values in a single variable.

> In the cells below, `first` gets the first element, `last` gets the last element, and `middle` captures the remaining elements as a list.

In [173]:
friends

['Rachel', 'Monica', 'Phoebe', 'Joey', 'Chandler', 'Ross']

In [174]:
first, *middle, last = friends


In [175]:
middle

['Monica', 'Phoebe', 'Joey', 'Chandler']

### Understanding Views vs. Copies

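One common pitfall is misunderstanding when you get a new copy and when you get another reference (or view) to the same data. Plain assignment such as `ages_copy = ages` does not copy anything: both names refer to the same list, so a change made through one is visible through the other. With lists, slicing (`ages[:]`) and `.copy()` create a new (shallow) copy. Other libraries behave differently; NumPy basic slicing, for example, returns a view, so always check which one you are getting.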

In [179]:
ages = [ross_age, rachel_age, monica_age, joey_age]
ages_copy = ages  # Not a copy: both names refer to the same list

In [180]:
ages

[29, 28, 30, 31]

In [181]:
ages_copy[0] = 42
ages

[42, 28, 30, 31]

In [182]:
# Create a list of Friends ages
ages = [ross_age, rachel_age, monica_age, joey_age]

ages_copy = ages[:]  # This creates a new copy
ages_copy[0] = 42
ages

[29, 28, 30, 31]

In [183]:
# Create a list of Friends ages
ages = [ross_age, rachel_age, monica_age, joey_age]

ages_copy = ages.copy()  # This creates a new copy
ages_copy[0] = 42
ages

[29, 28, 30, 31]
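
The three cases above can be summarised with identity checks. This is a sketch using made-up names (`sample_ages`, `alias`, `sliced`, `copied`, `nested`, `shallow`); it also shows why these copies are called shallow.

```python
sample_ages = [29, 28, 30, 31]

alias = sample_ages            # same object under a second name
sliced = sample_ages[:]        # new list with equal contents
copied = sample_ages.copy()    # new list with equal contents

print(alias is sample_ages, sliced is sample_ages, copied is sample_ages)
print(sliced == sample_ages, copied == sample_ages)

# Shallow copy: the outer list is new, but nested lists are still shared
nested = [[29, 28], [30, 31]]
shallow = nested[:]
shallow[0][0] = 42
print(nested[0][0])
```

If nested data must be fully independent, `copy.deepcopy` from the standard library is the usual tool.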

### Exercise: Episode Ratings Analysis

Given the following dictionary (episode titles mapped to their IMDb ratings):

**Tasks:**

1. **Calculate the Average Rating:**  
   Compute the average IMDb rating for all episodes in the dictionary.

2. **Find the Best Episode(s):**  
   Identify the episode title(s) with the highest rating.

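3. **Group Episodes by Rating Band:**  
   Create a dictionary that groups episodes into "Excellent" (>= 9.0), "Good" (>= 8.0 and < 9.0), and "Average" (< 8.0).

4. **Round Ratings and Count Occurrences:**  
   Create a dictionary that rounds each episode’s rating to the nearest 0.5 (using the `round()` function) and maps each rounded rating to the number of episodes that received that rating.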
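
As a hint for the rounding task, one possible approach is to double the value, round it, and halve it again. The sketch below uses made-up ratings and a hypothetical helper `round_to_half`; it is not the only way to solve the exercise.

```python
from collections import Counter

# Hypothetical ratings, not taken from the exercise dictionary
sample_ratings = {"Episode A": 8.3, "Episode B": 8.1, "Episode C": 8.6, "Episode D": 9.4}

def round_to_half(value):
    """Round to the nearest 0.5 by rounding twice the value, then halving."""
    return round(value * 2) / 2

# Map each title to its rounded rating, then count how often each rating occurs
rounded = {title: round_to_half(rating) for title, rating in sample_ratings.items()}
counts = dict(Counter(rounded.values()))

print(rounded)
print(counts)
```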


In [54]:
# Your solution here:

episode_ratings = {
 'The Pilot': 8.3, 'The One with the Sonogram at the End': 8.1, 'The One with the Thumb': 8.2, 'The One with George Stephanopoulos': 8.1, 'The One with the East German Laundry Detergent': 8.5, 'The One with the Butt': 8.1, 'The One with the Blackout': 9.0, 'The One Where Nana Dies Twice': 8.1, 'The One Where Underdog Gets Away': 8.2, 'The One with the Monkey': 8.1, 'The One with Mrs. Bing': 8.2, 'The One with the Dozen Lasagnas': 8.2, 'The One with the Boobies': 8.7,
 'The One with the Candy Hearts': 8.3 , 'The One with the Stoned Guy': 8.2, 'The One with Two Parts: Part 1': 8.2, 'The One with Two Parts: Part 2': 8.5, 'The One with All the Poker': 8.8,  'The One with Five Steaks and an Eggplant': 8.3, 'The One with the Baby on the Bus': 8.6, 'The One Where Ross Finds Out': 9.0, 'The One with the List': 8.5, "The One with Phoebe's Dad": 8.0, 'The One with Russ': 8.0, 'The One with the Lesbian Wedding': 8.1, 'The One After the Superbowl': 8.8 ,'The One with the Prom Video': 9.4, 'The One Where Ross and Rachel...You Know': 8.9, 'The One Where Joey Moves Out': 8.6, 'The One Where Eddie Moves In': 8.3, "The One Where Eddie Won't Go": 8.6, 'The One Where Old Yeller Dies': 8.2, 'The One with the Bullies': 8.2, 'The One with the Two Parties': 9.0, 'The One with the Chicken Pox': 8.1, "The One with Barry and Mindy's Wedding": 8.2, 'The One with the Princess Leia Fantasy': 8.4, "The One Where No One's Ready": 9.0, 'The One with the Jam': 8.1, 'The One with the Metaphorical Tunnel': 8.1, 'The One with Frank Jr.': 8.1, 'The One with the Flashback': 9.1, 'The One with the Race Car Bed': 8.3, 'The One with the Giant Poking Device': 8.4, 'The One with the Football': 9.0, 'The One Where Rachel Quits': 8.1, "The One Where Chandler Can't Remember Which Sister": 8.6, 'The One with All the Jealousy': 8.2, 'The One Where Monica and Richard Are Just Friends': 8.2, "The One with Phoebe's Ex-Partner": 7.9, 'The One Where Ross and Rachel Take a Break': 8.6, 'The One with the Morning After': 9.1, 'The One Without the Ski Trip': 8.3, 'The One with the Hypnosis Tape': 8.4, 'The One with the Tiny T-Shirt': 8.2, 'The One with the Dollhouse': 8.1, 'The One with a Chick and a Duck': 8.7, 'The One with the Screamer': 8.3,
 "The One with Ross's Thing": 8.1, 'The One with the Ultimate Fighting Champion': 8.1, 'The One at the Beach': 8.8, 'The One with the Jellyfish': 9.1,
 'The One with the Cat': 8.1,
 'The One with the Cuffs': 8.5,
 'The One with the Ballroom Dancing': 8.2,
 "The One with Joey's New Girlfriend": 8.4,
 'The One with the Dirty Girl': 8.5,
 'The One Where Chandler Crosses the Line': 8.7,
 'The One with Chandler in a Box': 9.1,
 "The One Where They're Going to Party!": 7.9,
 'The One with the Girl from Poughkeepsie': 8.1,
 "The One with Phoebe's Uterus": 8.5,
 'The One with the Embryos': 9.5,
 "The One with Rachel's Crush": 8.2,
 "The One with Joey's Dirty Day": 8.2,
 'The One with All the Rugby': 8.5,
 'The One with the Fake Party': 8.2,
 'The One with the Free Porn': 8.6,
 "The One with Rachel's New Dress": 8.3,
 'The One with All the Haste': 8.7,
 'The One with All the Wedding Dresses': 8.5,
 'The One with the Invitation': 7.2,
 'The One with the Worst Best Man Ever': 8.5,
 "The One with Ross' Wedding": 9.2,
 'The One After Ross Says Rachel': 8.9,
 'The One with All the Kissing': 9.0,
 'The One Hundredth': 8.8,
 'The One Where Phoebe Hates PBS': 8.3,
 'The One with the Kips': 8.8,
 'The One with the Yeti': 8.1,
 'The One Where Ross Moves In': 8.4,
 'The One with All the Thanksgivings': 9.2,
 "The One with Ross's Sandwich": 9.1,
 'The One with the Inappropriate Sister': 8.2,
 'The One with All the Resolutions': 9.1,
 "The One with Chandler's Work Laugh": 8.3,
 "The One with Joey's Bag": 8.1,
 'The One Where Everybody Finds Out': 9.7,
 'The One with the Girl Who Hits Joey': 8.5,
 'The One with the Cop': 8.6,
 "The One with Rachel's Inadvertent Kiss": 8.5,
 'The One Where Rachel Smokes': 8.0,
 "The One Where Ross Can't Flirt": 8.7,
 'The One with the Ride-Along': 8.3
}


# 1. Calculate the average IMDb rating for all episodes.
# 2. Identify the episode(s) with the highest rating.
# 3. Create a dictionary that groups episodes into "Excellent" (>=9.0), "Good" (>=8.0 and <9.0), and "Average" (<8.0).
# 4. Create a dictionary that rounds each rating to the nearest 0.5 and counts how many episodes have that rounded rating.

### Strings

Strings are immutable sequences of characters. They can be sliced, indexed, and formatted.
By and large, strings are fairly easy to use in Python. Perhaps the most complicated thing about them is that there are so many ways to write them in your code:
- Single quotes: 'spa"m'
- Double quotes: "spa'm"
- Triple quotes: '''... spam ...''', """... spam ..."""
- Escape sequences: "s\tp\na\0m"
- Raw strings: r"C:\new\test.spm" $\rightarrow$ this can come in handy when defining file paths
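
The forms listed above can be compared side by side. This is a small sketch with illustrative variable names; note how the raw string keeps its backslashes while the ordinary string turns `\n` and `\t` into single characters.

```python
single = 'spa"m'              # double quote inside single quotes
double = "spa'm"              # single quote inside double quotes
triple = """line one
line two"""                   # triple quotes may span several lines
escaped = "s\tp\na"           # \t is a tab, \n is a newline

raw = r"C:\new\test.spm"      # backslashes are kept literally
not_raw = "C:\new\test.spm"   # \n and \t are interpreted as escapes

print(len(escaped), len(raw), len(not_raw))
print(raw)
```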



In [55]:
# A Friends-themed string
theme_song = "I'll be there for you"
# Evaluating a slice: the first 10 characters
theme_slice = theme_song[:10]

theme_song, theme_slice

("I'll be there for you", "I'll be th")

#### Some Basic String Stuff

In [56]:
tv_guide = 'chanandler' + 'bong'
print(len(tv_guide))
pivot = 'Pivot!' * 10
pivot

14


'Pivot!Pivot!Pivot!Pivot!Pivot!Pivot!Pivot!Pivot!Pivot!Pivot!'

#### Extended Slicing
Python allows slicing with three parameters: `[start:stop:step]`.

In [169]:
quote = "We were on a break!"

# Extract every second letter
every_second_letter = quote[::2]  
every_second_letter

'W eeo  ra!'

In [None]:
# Reverse a Friends quote
reversed_quote = quote[::-1]  # A step of -1 walks the string backwards

reversed_quote, every_second_letter


('!kaerb a no erew eW', 'W eeo  ra!')

#### String Methods
Python also supports a number of built-in string methods.

In [58]:
phrase = "Friends is the best show ever!"

# Convert to uppercase
upper_phrase = phrase.upper()

# Replace words
replaced_phrase = phrase.replace("best", "worst")

# Check if phrase starts with a word
starts_with_friends = phrase.startswith("Friends")

upper_phrase, replaced_phrase, starts_with_friends


('FRIENDS IS THE BEST SHOW EVER!', 'Friends is the worst show ever!', True)

> Again, despite the names of these string methods, we are not changing the original strings here, but creating new strings as the results—because strings are immutable, this is the only way this can work. 



#### f-String Formatting

f-strings are a powerful way to embed expressions in string literals. They make formatting clear and concise.
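
A brief sketch of what embedding expressions can look like, including format specifiers after a colon. The names and values (`name`, `age`, `rating`) are illustrative only.

```python
name = "Ross"
age = 29
rating = 8.456

next_year = f"{name} will be {age + 1} next year."   # any expression is allowed
one_decimal = f"Rating: {rating:.1f}"                 # fixed number of decimals
aligned = f"|{name:>8}|{name:<8}|"                    # right and left alignment
debug = f"{age=}"                                     # Python 3.8+: name and value

print(next_year)
print(one_decimal)
print(aligned)
print(debug)
```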

In [59]:
# Create an f-string using Friends data
character = "Joey"
catchphrase = friends_catchphrases.get(character, "No catchphrase")
message = f"{character} is known for saying: {catchphrase}"

# Evaluating the f-string shows the formatted message
message

"Joey is known for saying: How you doin'?"

### Loops
- Recap: If Statements, For Loops and While Loops.

In [60]:
# Checking if Ross is older than Monica
if friends_info["Ross"]["age"] > friends_info["Monica"]["age"]:
    outcome = "Ross is older than Monica."
else:
    outcome = "Ross is not older than Monica."

outcome


'Ross is not older than Monica.'

In [61]:
# Looping through friends
friend_ages = {"Ross": 29, "Monica": 30, "Chandler": 31}

for friend, age in friend_ages.items():
    sentence = f"{friend} is {age} years old."
    print(sentence)


Ross is 29 years old.
Monica is 30 years old.
Chandler is 31 years old.


In [62]:
# Counting down from 5
counter = 5
while counter > 0:
    print(f"Counting down: {counter}")
    counter -= 1


Counting down: 5
Counting down: 4
Counting down: 3
Counting down: 2
Counting down: 1


##### Break, Continue, and Pass (Explained with Examples)

Each of these keywords changes how a loop executes:

- **break**: Immediately stops the loop.
- **continue**: Skips the current iteration and moves to the next.
- **pass**: Does nothing, acting as a placeholder.

###### Break

In [63]:
# Searching for Chandler in the Friends group
friends = ["Ross", "Rachel", "Monica", "Chandler", "Joey", "Phoebe"]

for friend in friends:
    if friend == "Chandler":
        break  # Stops when Chandler is found
    print(f"{friend} is not Chandler.")  


Ross is not Chandler.
Rachel is not Chandler.
Monica is not Chandler.


###### Continue

In [64]:
# The Friends characters are ordering coffee, except Joey, who already has one
for friend in friends:
    if friend == "Joey":
        continue  # Skip Joey
    print(f"{friend} orders coffee.")  


Ross orders coffee.
Rachel orders coffee.
Monica orders coffee.
Chandler orders coffee.
Phoebe orders coffee.


###### Pass

In [65]:
# Placeholder function for a future feature
def friend_introduction(friend):
    if friend == "Janice":
        pass  # Placeholder for Janice's introduction
    else:
        print(f"{friend} says hello!")

friend_introduction("Ross")  
friend_introduction("Janice")  


Ross says hello!


#### Nested Loops
Nested loops occur when one loop runs inside another.

In [66]:
# Friends meeting locations
locations = ["Central Perk", "Monica's Apartment"]
people = ["Ross", "Rachel", "Joey"]

# Nested loop: Each person visits each location
for location in locations:
    for person in people:
        print(f"{person} is at {location}.")


Ross is at Central Perk.
Rachel is at Central Perk.
Joey is at Central Perk.
Ross is at Monica's Apartment.
Rachel is at Monica's Apartment.
Joey is at Monica's Apartment.


In [150]:

# Load the CSV files using Pandas
friends_df = pd.read_csv(r"datasets/friends.csv")
friends_info_df = pd.read_csv(r"datasets/friends_info.csv")

# Convert to lists of dictionaries for plain Python processing
friends_data = friends_df.to_dict(orient="records")
friends_info_data = friends_info_df.to_dict(orient="records")

## Some Advanced Topics
### Understanding Iterables, Iteration, and List Comprehensions  

Before we dive into code, let’s first understand the fundamental concepts of iterables, iteration, and list comprehensions—key building blocks for working with data efficiently in Python.

#### What is an Iterable?  
An iterable is any Python object that can return its elements one at a time. Lists, tuples, dictionaries, sets, and even strings are iterables because they contain multiple elements that can be accessed sequentially. Think of an iterable like a script of a Friends episode—each line is stored in order, and you can go through it one by one.  

#### What is Iteration?  
Iteration is the process of going through an iterable, accessing each element in turn. This is done using loops or special functions that retrieve elements automatically. For example, when we loop through a list of Friends characters and print each name, we are iterating over that list.  

Iteration is crucial in programming because it allows us to process data dynamically—whether that means counting the number of lines Joey has in Season 2, finding the most common words in dialogues, or filtering episodes with high IMDb ratings.  

#### What is an Iterator?  
An iterator is a special type of iterable that remembers its position while iterating. It follows the iterator protocol, meaning it implements two methods:  
- `__iter__()` → Returns the iterator itself.  
- `__next__()` → Returns the next item in the sequence.  

Unlike regular iterables, iterators don’t restart when exhausted—they must be recreated. You can think of this like watching an episode of Friends: if you pause and resume later, you continue from where you left off.  
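
The iterator protocol can be seen directly with the built-in `iter()` and `next()` functions. This is a minimal sketch; the list `cast` and the other names are illustrative.

```python
cast = ["Ross", "Rachel", "Monica"]

iterator = iter(cast)              # ask the list for an iterator
first = next(iterator)             # "Ross"
second = next(iterator)            # "Rachel"
remaining = list(iterator)         # consumes whatever is left

# The iterator is now exhausted; a default avoids a StopIteration error
exhausted = next(iterator, None)

# The list itself can be iterated again by creating a fresh iterator
fresh = list(iter(cast))

print(first, second, remaining, exhausted, fresh)
```

A `for` loop performs the same steps behind the scenes: it calls `iter()` once and then `next()` repeatedly until the iterator is exhausted.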

#### What is a List Comprehension?  
A list comprehension is a concise and readable way to create lists by applying an operation to each element in an iterable. Instead of writing a loop to filter or transform data, we can express it in a single line.  

For example, instead of writing multiple lines to collect all dialogue lines spoken by Ross, we can use a list comprehension to do it in one compact expression. This makes code more readable, efficient, and Pythonic.  

#### Why Use List Comprehensions?  
List comprehensions have several advantages:  
- Concise: Reduces the number of lines needed for simple operations.  
- Efficient: Runs faster than using explicit loops in many cases.  
- Readable: Expresses filtering and transformations in an intuitive way.  

For example, if we wanted to get all episodes with more than 10 million US viewers, a loop would take multiple lines, while a list comprehension could do it in just one.  



### Step 2. Basic Iteration

Let’s start by iterating over the list of dialogue records (from friends_data) to, for example, collect all lines spoken by **Rachel**.

In [136]:
# Basic iteration: Extract dialogue lines by Rachel
rachel_dialogues = []  # We'll collect these in a list
for record in friends_data:
    if record["speaker"] == "Rachel Green":
        # Each record is a dictionary; we add the dialogue text to our list.
        rachel_dialogues.append(record["text"])

# Show a few of Rachel's dialogue lines (first 3 for brevity)
rachel_dialogues[:3]

["Oh God Monica hi! Thank God! I just went to your building and you weren't there and then this guy with a big hammer said you might be here and you are, you are!",
 'Hi, sure!',
 "Oh God... well, it started about a half hour before the wedding. I was in the room where we were keeping all the presents, and I was looking at this gravy boat. This really gorgeous Lamauge gravy boat. When all of a sudden- Sweet 'n' Lo?- I realized that I was more turned on by this gravy boat than by Barry! And then I got really freaked out, and that's when it hit me: how much Barry looks like Mr. Potato Head. Y'know, I mean, I always knew looked familiar, but... Anyway, I just had to get out of there, and I started wondering 'Why am I doing this, and who am I doing this for?'. So anyway I just didn't know where to go, and I know that you and I have kinda drifted apart, but you're the only person I knew who lived here in the city."]

### Step 3. Basic List Comprehensions

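Now we can rewrite the above loop as a list comprehension, which builds a list by filtering and mapping in a single expression. We build up to it in stages: first we inspect `friends_data`, then we extract every `text` field without a filter, and finally we add the filter on the speaker.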

In [139]:
friends_data

[{'text': "There's nothing to tell! He's just some guy I work with!",
  'speaker': 'Monica Geller',
  'season': 1,
  'episode': 1,
  'scene': 1,
  'utterance': 1},
 {'text': "C'mon, you're going out with the guy! There's gotta be something wrong with him!",
  'speaker': 'Joey Tribbiani',
  'season': 1,
  'episode': 1,
  'scene': 1,
  'utterance': 2},
 {'text': 'All right Joey, be nice. So does he have a hump? A hump and a hairpiece?',
  'speaker': 'Chandler Bing',
  'season': 1,
  'episode': 1,
  'scene': 1,
  'utterance': 3},
 {'text': 'Wait, does he eat chalk?',
  'speaker': 'Phoebe Buffay',
  'season': 1,
  'episode': 1,
  'scene': 1,
  'utterance': 4},
 {'text': '(They all stare, bemused.)',
  'speaker': 'Scene Directions',
  'season': 1,
  'episode': 1,
  'scene': 1,
  'utterance': 5},
 {'text': "Just, 'cause, I don't want her to go through what I went through with Carl- oh!",
  'speaker': 'Phoebe Buffay',
  'season': 1,
  'episode': 1,
  'scene': 1,
  'utterance': 6},
 {'text': "

In [140]:
list_comprehension = [i['text'] for i in friends_data]
list_comprehension

["There's nothing to tell! He's just some guy I work with!",
 "C'mon, you're going out with the guy! There's gotta be something wrong with him!",
 'All right Joey, be nice. So does he have a hump? A hump and a hairpiece?',
 'Wait, does he eat chalk?',
 '(They all stare, bemused.)',
 "Just, 'cause, I don't want her to go through what I went through with Carl- oh!",
 "Okay, everybody relax. This is not even a date. It's just two people going out to dinner and- not having sex.",
 'Sounds like a date to me.',
 '[Time Lapse]',
 "Alright, so I'm back in high school, I'm standing in the middle of the cafeteria, and I realize I am totally naked.",
 'Oh, yeah. Had that dream.',
 "Then I look down, and I realize there's a phone... there.",
 'Instead of...?',
 "That's right.",
 'Never had that dream.',
 'No.',
 "All of a sudden, the phone starts to ring. Now I don't know what to do, everybody starts looking at me.",
 "And they weren't looking at you before?!",
 "Finally, I figure I'd better answe

In [184]:
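# Same extraction with a more descriptive loop variable.
# There is no filter yet, so despite its name this list holds every speaker's lines.
rachel_dialogues_comp = [record["text"] for record in friends_data]
rachel_dialogues_comp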

["There's nothing to tell! He's just some guy I work with!",
 "C'mon, you're going out with the guy! There's gotta be something wrong with him!",
 'All right Joey, be nice. So does he have a hump? A hump and a hairpiece?',
 'Wait, does he eat chalk?',
 '(They all stare, bemused.)',
 "Just, 'cause, I don't want her to go through what I went through with Carl- oh!",
 "Okay, everybody relax. This is not even a date. It's just two people going out to dinner and- not having sex.",
 'Sounds like a date to me.',
 '[Time Lapse]',
 "Alright, so I'm back in high school, I'm standing in the middle of the cafeteria, and I realize I am totally naked.",
 'Oh, yeah. Had that dream.',
 "Then I look down, and I realize there's a phone... there.",
 'Instead of...?',
 "That's right.",
 'Never had that dream.',
 'No.',
 "All of a sudden, the phone starts to ring. Now I don't know what to do, everybody starts looking at me.",
 "And they weren't looking at you before?!",
 "Finally, I figure I'd better answe

In [143]:
# Using a list comprehension to extract Rachel's dialogues in one line.
# The trailing [:3] keeps only the first three matches so the output stays short.
rachel_dialogues_comp = [record["text"] for record in friends_data if record["speaker"] == "Rachel Green"][:3]

rachel_dialogues_comp

["Oh God Monica hi! Thank God! I just went to your building and you weren't there and then this guy with a big hammer said you might be here and you are, you are!",
 'Hi, sure!',
 "Oh God... well, it started about a half hour before the wedding. I was in the room where we were keeping all the presents, and I was looking at this gravy boat. This really gorgeous Lamauge gravy boat. When all of a sudden- Sweet 'n' Lo?- I realized that I was more turned on by this gravy boat than by Barry! And then I got really freaked out, and that's when it hit me: how much Barry looks like Mr. Potato Head. Y'know, I mean, I always knew looked familiar, but... Anyway, I just had to get out of there, and I started wondering 'Why am I doing this, and who am I doing this for?'. So anyway I just didn't know where to go, and I know that you and I have kinda drifted apart, but you're the only person I knew who lived here in the city."]

> This list comprehension iterates over every record in `friends_data`, includes only those where `speaker` is `"Rachel Green"`, and extracts the `text` field. Without the trailing `[:3]` slice, the result would be identical to the `rachel_dialogues` list built by the earlier loop; with the slice, it holds just the first three lines.
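The filter in a comprehension is an ordinary Boolean expression, so `==` requires an exact match with the value stored in the data. The sketch below illustrates this on four records copied from the `friends_data` output shown earlier (only the `text` and `speaker` fields are kept); the name `sample_records` is introduced here for illustration.

```python
# Four records taken from the friends_data output above, trimmed to two fields.
sample_records = [
    {"text": "There's nothing to tell! He's just some guy I work with!", "speaker": "Monica Geller"},
    {"text": "Wait, does he eat chalk?", "speaker": "Phoebe Buffay"},
    {"text": "(They all stare, bemused.)", "speaker": "Scene Directions"},
    {"text": "Just, 'cause, I don't want her to go through what I went through with Carl- oh!", "speaker": "Phoebe Buffay"},
]

# General shape: [expression for item in iterable if condition]
exact_short = [r["text"] for r in sample_records if r["speaker"] == "Phoebe"]
exact_full = [r["text"] for r in sample_records if r["speaker"] == "Phoebe Buffay"]
by_prefix = [r["text"] for r in sample_records if r["speaker"].startswith("Phoebe")]

print(len(exact_short), len(exact_full), len(by_prefix))  # 0 2 2
```

Comparing against the first name alone silently returns an empty list, so it is worth checking the exact speaker strings before filtering.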

#### Exercise: Exploring String Methods and Formatting with List Comprehension

You are given a dictionary `quote_dict` where keys are Friends quotes and values are the characters who said them.

Your task is to manipulate and format these quotes using string methods and formatting techniques. You should not modify the dictionary structure, only work with string transformations.

#### Tasks:

1. Normalize Quotes:  
   - Convert all quotes to lowercase.
   - Remove any leading or trailing whitespace.
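   - Replace "Ross" with `"Dr. Geller"` wherever it appears in a quote. (`str.replace()` is case-sensitive: either do this before lowercasing or search for `"ross"`, otherwise nothing will match.)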

2. Modify and Extract Text:  
   - Count how many times "no" appears in all quotes (case-insensitive).
   - Extract the first 5 words of each quote and store them in a new list.

3. Format Output Nicely:  
   - Convert all quotes into title case (first letter of each word capitalized).
   - Use f-strings to print each quote in the format:  
     `"Character: <Character Name> | Quote: "<Formatted Quote>" (Length: X words)"`  
     (Hint: Use `len()` and `.split()` to count words in a quote.)
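As a warm-up rather than a solution, the sketch below runs the relevant string methods on a single made-up string. The quote text and the speaker name are assumptions for illustration and do not come from `quote_dict`.

```python
# Made-up sample string and speaker (not taken from quote_dict).
raw_quote = "   Ross said NO, no way.  "
speaker = "Sample Speaker"

cleaned = raw_quote.strip()                      # drop leading/trailing whitespace
renamed = cleaned.replace("Ross", "Dr. Geller")  # case-sensitive, so run it before lower()
lowered = renamed.lower()

# str.count() counts substrings, so it would also match the "no" inside "know".
no_hits = lowered.count("no")

first_three_words = lowered.split()[:3]

# title() capitalizes after every non-letter, so "he's".title() gives "He'S".
titled = lowered.title()
word_count = len(titled.split())

print(f'Character: {speaker} | Quote: "{titled}" (Length: {word_count} words)')
# Character: Sample Speaker | Quote: "Dr. Geller Said No, No Way." (Length: 6 words)
```

In the exercise itself, the same calls go inside a list or dictionary comprehension over `quote_dict.items()`.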


In [70]:
# YOUR CODE HERE
# Given dictionary of Friends quotes
quote_dict = {
    "There's nothing to tell! He's just some guy I work with!": 'Monica Geller',
    "C'mon, you're going out with the guy! There's gotta be something wrong with him!": 'Joey Tribbiani',
    'All right Joey, be nice. So does he have a hump? A hump and a hairpiece?': 'Chandler Bing',
    'Wait, does he eat chalk?': 'Phoebe Buffay',
    "Just, 'cause, I don't want her to go through what I went through with Carl- oh!": 'Phoebe Buffay',
    "Okay, everybody relax. This is not even a date. It's just two people going out to dinner and- not having sex.": 'Monica Geller',
    'Sounds like a date to me.': 'Chandler Bing',
    "Alright, so I'm back in high school, I'm standing in the middle of the cafeteria, and I realize I am totally naked.": 'Chandler Bing',
    "Then I look down, and I realize there's a phone... there.": 'Chandler Bing',
    'Instead of...?': 'Joey Tribbiani',
    "That's right.": 'Monica Geller',
    'Never had that dream.': 'Joey Tribbiani',
    'No.': 'Monica Geller',
    "All of a sudden, the phone starts to ring. Now I don't know what to do, everybody starts looking at me.": 'Chandler Bing',
    "And they weren't looking at you before?!": 'Monica Geller',
    "Finally, I figure I'd better answer it, and it turns out it's my mother, which is very-very weird, because- she never calls me!": 'Chandler Bing',
    'Hi.': 'Rachel Green',
    'This guy says hello, I wanna kill myself.': 'Joey Tribbiani',
    'Are you okay, sweetie?': 'Monica Geller',
    'I just feel like someone reached down my throat, grabbed my small intestine, pulled it out of my mouth and tied it around my neck...': 'Ross Geller',
    'Cookie?': 'Chandler Bing',
    'Carol moved her stuff out today.': 'Monica Geller',
    'Ohh.': 'Monica Geller',
    'Let me get you some coffee.': 'Monica Geller',
    'Thanks.': 'Monica Geller',
    'Ooh! Oh!': 'Phoebe Buffay',
    "No, no don't! Stop cleansing my aura! No, just leave my aura alone, okay?": 'Ross Geller',
    'Fine! Be murky!': 'Phoebe Buffay',
    "I'll be fine, alright? Really, everyone. I hope she'll be very happy.": 'Ross Geller',
    "No you don't.": 'Rachel Green',
    "No I don't, to hell with her, she left me!": 'Ross Geller',
    'And you never knew she was a lesbian...': 'Joey Tribbiani',
    "No!! Okay?! Why does everyone keep fixating on that? She didn't know, how should I know?": 'Ross Geller',
    'Sometimes I wish I was a lesbian... Did I say that out loud?': 'Chandler Bing',
    'I told mom and dad last night, they seemed to take it pretty well.': 'Ross Geller',
    "Oh really, so that hysterical phone call I got from a woman at sobbing 3:00 A.M., \"I'll never have grandchildren, I'll never have grandchildren.\" was what? A wrong number?": 'Monica Geller',
    'Sorry.': 'Chandler Bing',
}

# 1. Normalize Quotes: Convert to lowercase, remove whitespace, replace "Ross" with "Dr. Geller"
normalized_quotes = None  # DO SOMETHING

# 2. Count occurrences of "no" in all quotes (case-insensitive)
no_count = None  # DO SOMETHING

# 3. Extract first 5 words from each quote
first_5_words = None  # DO SOMETHING

# 4. Convert all quotes to title case
title_case_quotes = {}  # DO SOMETHING (keep the quote -> speaker mapping)

# 5. Format and print each quote with f-string formatting
for quote, speaker in title_case_quotes.items():
    pass  # DO SOMETHING

# Print additional statistics
# DO SOMETHING


### Step 4. Extended Examples: Filtering with Multiple Conditions

Let’s extend our example to work with both datasets. Suppose we want to extract all dialogue lines from episodes with a high IMDb rating (above 8.0). To do this efficiently, we first build a lookup dictionary for episode metadata keyed by `(season, episode)`.

In [145]:
# Create a lookup dictionary from friends_info_data keyed by (season, episode)
info_lookup = {(record["season"], record["episode"]): record['title'] 
               for record in friends_info_data}
info_lookup

{(1, 1): 'The Pilot',
 (1, 2): 'The One with the Sonogram at the End',
 (1, 3): 'The One with the Thumb',
 (1, 4): 'The One with George Stephanopoulos',
 (1, 5): 'The One with the East German Laundry Detergent',
 (1, 6): 'The One with the Butt',
 (1, 7): 'The One with the Blackout',
 (1, 8): 'The One Where Nana Dies Twice',
 (1, 9): 'The One Where Underdog Gets Away',
 (1, 10): 'The One with the Monkey',
 (1, 11): 'The One with Mrs. Bing',
 (1, 12): 'The One with the Dozen Lasagnas',
 (1, 13): 'The One with the Boobies',
 (1, 14): 'The One with the Candy Hearts',
 (1, 15): 'The One with the Stoned Guy',
 (1, 16): 'The One with Two Parts: Part 1',
 (1, 17): 'The One with Two Parts: Part 2',
 (1, 18): 'The One with All the Poker',
 (1, 19): 'The One Where the Monkey Gets Away',
 (1, 20): 'The One with the Evil Orthodontist',
 (1, 21): 'The One with the Fake Monica',
 (1, 22): 'The One with the Ick Factor',
 (1, 23): 'The One with the Birth',
 (1, 24): 'The One Where Rachel Finds Out',
 (

In [None]:


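# The filter below needs each episode's rating as well as its title, so rebuild
# the lookup to map (season, episode) to the full metadata record.
info_lookup = {(record["season"], record["episode"]): record
               for record in friends_info_data}

# Now, iterate over friends_data and select dialogues from episodes with IMDb rating > 8.0
high_rating_dialogues = []
for record in friends_data:
    key = (record["season"], record["episode"])
    # Get the episode info if available (None when the key is missing)
    episode_info = info_lookup.get(key)
    if episode_info and episode_info["imdb_rating"] > 8.0:
        # Append a tuple: (episode title, speaker, dialogue)
        high_rating_dialogues.append((episode_info["title"], record["speaker"], record["text"]))

high_rating_dialogues[:3]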

[('The Pilot',
  'Monica Geller',
  "There's nothing to tell! He's just some guy I work with!"),
 ('The Pilot',
  'Joey Tribbiani',
  "C'mon, you're going out with the guy! There's gotta be something wrong with him!"),
 ('The Pilot',
  'Chandler Bing',
  'All right Joey, be nice. So does he have a hump? A hump and a hairpiece?')]
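The loop relies on two dictionary features: tuples are hashable, so `(season, episode)` can serve as a key, and `dict.get()` returns `None` for an unknown key instead of raising `KeyError`. Here is a minimal sketch on toy metadata, where the titles, the ratings, and the name `episode_lookup` are made up for illustration. Note that each key must map to the whole record for `["imdb_rating"]` to be available.

```python
# Toy metadata records; titles and ratings are made-up placeholders.
sample_info = [
    {"season": 1, "episode": 1, "title": "Sample Episode A", "imdb_rating": 8.4},
    {"season": 1, "episode": 2, "title": "Sample Episode B", "imdb_rating": 7.6},
]

# Map each (season, episode) tuple to the full record, not just the title.
episode_lookup = {(ep["season"], ep["episode"]): ep for ep in sample_info}

found = episode_lookup.get((1, 1))
missing = episode_lookup.get((9, 99))   # key not present

print(found["title"], found["imdb_rating"])  # Sample Episode A 8.4
print(missing)                               # None
```

A dictionary lookup takes roughly constant time, whereas scanning the metadata list for every dialogue record would repeat the same search thousands of times.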

### Step 5. Extended List Comprehensions

We can achieve the same result with a single list comprehension that has a compound `if` condition (it is not nested: there is only one `for` clause). This version is more concise but denser to read.


In [72]:
# List comprehension version: Extract (title, speaker, dialogue) for high-rated episodes
high_rating_dialogues_comp = [
    (info_lookup[(record["season"], record["episode"])]["title"], record["speaker"], record["text"])
    for record in friends_data
    if (record["season"], record["episode"]) in info_lookup
    and info_lookup[(record["season"], record["episode"])]["imdb_rating"] > 8.0
]

high_rating_dialogues_comp[:3]

[('The Pilot',
  'Monica Geller',
  "There's nothing to tell! He's just some guy I work with!"),
 ('The Pilot',
  'Joey Tribbiani',
  "C'mon, you're going out with the guy! There's gotta be something wrong with him!"),
 ('The Pilot',
  'Chandler Bing',
  'All right Joey, be nice. So does he have a hump? A hump and a hairpiece?')]

> - This comprehension does the same filtering as before: it checks if the episode exists in `info_lookup` and whether its IMDb rating exceeds 8.0.
> - For each valid record, it builds a tuple with the episode title, speaker, and dialogue.
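The comprehension above looks up the same key three times. One possible way to avoid the repetition, assuming Python 3.8 or later, is an assignment expression (`:=`) in the condition. The sketch below uses toy data, and all names and values in it are made up for illustration.

```python
# Toy lookup and dialogue records; every value here is a made-up placeholder.
sample_lookup = {
    (1, 1): {"title": "Sample Episode A", "imdb_rating": 8.4},
    (1, 2): {"title": "Sample Episode B", "imdb_rating": 7.6},
}
sample_dialogues = [
    {"season": 1, "episode": 1, "speaker": "Speaker X", "text": "First line"},
    {"season": 1, "episode": 2, "speaker": "Speaker Y", "text": "Second line"},
    {"season": 2, "episode": 1, "speaker": "Speaker Z", "text": "Third line"},  # no metadata
]

# `ep` is bound once per record in the condition and reused in the output tuple.
high_rated = [
    (ep["title"], rec["speaker"], rec["text"])
    for rec in sample_dialogues
    if (ep := sample_lookup.get((rec["season"], rec["episode"]))) is not None
    and ep["imdb_rating"] > 8.0
]

print(high_rated)  # [('Sample Episode A', 'Speaker X', 'First line')]
```

Whether this reads better than the explicit loop is a matter of taste; when a comprehension gets this dense, the loop is often the clearer choice.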

### Step 6. Iteration with Enumerate and Zip

Lastly, let’s illustrate using built-in functions like `enumerate()` and `zip()` to process our lists further. For instance, we can label the first 5 dialogue lines from a specific episode and then pair them with the speaker names.

In [73]:
# Get dialogue lines from season 1, episode 1
episode_1_dialogues = [record for record in friends_data 
                       if record["season"] == 1 and record["episode"] == 1]

# Use enumerate to number each dialogue line (starting from 1)
numbered_dialogues = [(i, record["speaker"], record["text"]) 
                      for i, record in enumerate(episode_1_dialogues, start=1)]
numbered_dialogues[:5]  # not displayed: a notebook shows only the last expression in a cell; wrap it in print() to see it

# Suppose we extract two separate lists: one for speakers and one for dialogues
speakers = [record["speaker"] for record in episode_1_dialogues][:5]
dialogues = [record["text"] for record in episode_1_dialogues][:5]

# Use zip() to combine the speakers and dialogues into a formatted string
paired_lines = [f"{speaker}: {dialogue}" for speaker, dialogue in zip(speakers, dialogues)]
paired_lines

["Monica Geller: There's nothing to tell! He's just some guy I work with!",
 "Joey Tribbiani: C'mon, you're going out with the guy! There's gotta be something wrong with him!",
 'Chandler Bing: All right Joey, be nice. So does he have a hump? A hump and a hairpiece?',
 'Phoebe Buffay: Wait, does he eat chalk?',
 'Scene Directions: (They all stare, bemused.)']

> - The first block uses a list comprehension with an if-condition to filter dialogue lines for season 1, episode 1.
> - Then, `enumerate()` is used to assign a line number to each dialogue.
> - Finally, we use `zip()` to pair the first five speakers with their dialogues, demonstrating how you can combine two lists element-by-element.
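One detail worth knowing when pairing lists: `zip()` stops as soon as its shortest input runs out, without raising an error. The short sketch below shows this on two hand-made lists of unequal length; the `sample_*` names are introduced here for illustration.

```python
sample_speakers = ["Monica Geller", "Joey Tribbiani", "Chandler Bing"]
sample_lines = ["That's right.", "Never had that dream."]  # one item fewer

# zip() pairs items position by position and stops at the shorter list,
# so "Chandler Bing" is silently dropped.
pairs = list(zip(sample_speakers, sample_lines))
print(len(pairs))  # 2

# enumerate(..., start=1) yields (1, item), (2, item), ... instead of starting at 0.
numbered = [f"{i}. {speaker}" for i, speaker in enumerate(sample_speakers, start=1)]
print(numbered)  # ['1. Monica Geller', '2. Joey Tribbiani', '3. Chandler Bing']
```

If the two lists are expected to have the same length, comparing `len()` of both before zipping is a cheap sanity check.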

# Exercises!

Below are two exercises that challenge you to combine everything we’ve learned: iteration and list methods, comprehensions, dictionary manipulation, string methods, type conversion, f-strings, and careful handling of copies versus views. Use the two Friends datasets (`friends.csv` and `friends_info.csv`) that were loaded earlier and converted into lists of dictionaries to complete the following tasks.



### Exercise 1: Episode Dialogue Summary

**Task:**  
Using the `friends_data` (dialogues) and `friends_info_data` (episode metadata), create a summary for each episode in Season 3. For each episode, compute:
- **Episode Title:** (from friends_info_data)
- **Total Number of Dialogue Lines:** Count how many dialogue records occur in that episode.
- **Average Dialogue Length:** Calculate the average number of characters per dialogue line (round or convert the result to an integer if you want a whole number).
- **Unique Speakers:** An alphabetically sorted list of all speakers in that episode.

Your final result should be a list of dictionaries (one dictionary per episode), with keys: `"season"`, `"episode"`, `"title"`, `"dialogue_count"`, `"avg_length"`, and `"unique_speakers"`.

*Hint:*  
- First, build a lookup (dictionary) keyed by `(season, episode)` for the episode metadata.  
- Use iteration and list comprehensions to filter and process dialogue records from `friends_data`.  
- Use list methods and string methods where necessary.
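The grouping step in the hints can be sketched on toy data as follows. This shows only the pattern, not the full solution, and the records, speaker names, and variable names are made up for illustration.

```python
# Toy dialogue records; all values are made-up placeholders.
toy_records = [
    {"season": 3, "episode": 1, "speaker": "Speaker B", "text": "Hello there"},
    {"season": 3, "episode": 1, "speaker": "Speaker A", "text": "Hi"},
    {"season": 3, "episode": 2, "speaker": "Speaker A", "text": "Okay"},
    {"season": 3, "episode": 1, "speaker": "Speaker B", "text": "Bye"},
]

# Group records by (season, episode); setdefault creates the list on first use.
grouped = {}
for rec in toy_records:
    key = (rec["season"], rec["episode"])
    grouped.setdefault(key, []).append(rec)

episode_lines = grouped[(3, 1)]
dialogue_count = len(episode_lines)

# A set comprehension removes duplicates; sorted() returns an alphabetical list.
unique_speakers = sorted({rec["speaker"] for rec in episode_lines})

print(dialogue_count, unique_speakers)  # 3 ['Speaker A', 'Speaker B']
```

The average length and the final summary dictionaries are left for you to build on top of `grouped`.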


In [74]:

# Your solution here:
# 1. Create a lookup dictionary for episodes in Season 3 using friends_info_data.
# 2. Iterate over friends_data to group dialogue records by (season, episode) for Season 3.
# 3. For each episode, compute the dialogue count, average dialogue length, and a sorted list of unique speakers.
# 4. Build and return the final list of dictionaries summarizing the episodes.



### Exercise 2: Character Dialogue Analysis & Transformation

#### Task:
Using the friends_data (dialogues dataset), perform the following transformations and analyses:
- Filter and Analyze Character Dialogues:
  - Create a dictionary where the keys are character names and the values are the number of lines spoken by each character.
  - Find the character who spoke the most lines.

- Format and Transform the Dialogue Data:
  - Use f-strings to format dialogue records as:
    - `"<Speaker>: <Dialogue> (Words: X)"`
  - Convert dialogues into different formats based on their length:
    - If a dialogue contains more than 12 words, convert it to uppercase.
    - If a dialogue contains between 6 and 12 words, convert it to title case.
    - Otherwise, leave it as is.
- Extract the Most Frequent Words:
  - Create a list of all words spoken in the dataset.
  - Convert them to lowercase and remove punctuation (if any).
  - Find and print the five most common words in Friends dialogues.

#### Hints

- Use dictionary methods to count dialogue occurrences per speaker.
- Use `.split()` and `len()` to determine the number of words per dialogue.
- Use conditional expressions inside a list comprehension for text transformation.
- Use `collections.Counter` to find the most common words.
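Two of these hints are illustrated below on a tiny made-up list: a conditional expression inside a comprehension, and `collections.Counter` for word frequencies. The sample lines, the two-word threshold, and the use of `str.translate()` to strip punctuation are choices made for this sketch, not requirements of the exercise.

```python
from collections import Counter
import string

# Made-up sample lines (not taken from the dataset).
toy_lines = ["No, no, no!", "Oh no.", "Yes"]

# Conditional expression: `A if condition else B` chooses the output per item.
shaped = [line.upper() if len(line.split()) > 2 else line for line in toy_lines]
print(shaped)  # ['NO, NO, NO!', 'Oh no.', 'Yes']

# Translation table that deletes every ASCII punctuation character.
strip_punct = str.maketrans("", "", string.punctuation)

# Two for-clauses flatten the lines into one list of cleaned, lowercase words.
words = [w.translate(strip_punct).lower() for line in toy_lines for w in line.split()]

print(Counter(words).most_common(1))  # [('no', 4)]
```

On real dialogue, a token made only of punctuation becomes an empty string after cleaning, so you may want to filter those out before counting.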


In [75]:

# Your solution here:

# 1. Create a dictionary that counts how many lines each character has.
# 2. Find the character with the most spoken lines.
# 3. Format each dialogue as "<Speaker>: <Dialogue> (Words: X)".
# 4. Convert dialogue text based on word count:
#    - >12 words → UPPERCASE
#    - 6-12 words → Title Case
#    - Otherwise → Leave as is.
# 5. Extract all words from dialogues, clean them (lowercase, no punctuation), and find the 5 most common words.

