![](https://i1.wp.com/learnenglishfunway.com/wp-content/uploads/2020/06/Learn-English-with-Friends.jpg?resize=620%2C465&ssl=1)

**Friends** is an American situation comedy about six friends in their 20s and 30s living in the New York City borough of Manhattan. It was created by David Crane and Marta Kauffman and premiered on NBC on September 22, 1994.

Some information about Friends:

**Format**: Sitcom

**Episode Count**: 236

**No. of Seasons**: 10

**Run time**: 20–22 minutes per episode (edited); up to 30 minutes per episode (uncut)

**Network(s)**: NBC (original network)

**First Aired:** September 22, 1994

**Last Aired:** May 6, 2004

**Characters**:

Jennifer Aniston as **Rachel Green**

Courteney Cox as **Monica Geller**

Lisa Kudrow as **Phoebe Buffay**

Matt LeBlanc as **Joey Tribbiani**

Matthew Perry as **Chandler Bing**

David Schwimmer as **Ross Geller**


Friends received positive reviews throughout its run and became one of the most popular sitcoms of its time. The series won many awards and was nominated for 63 Primetime Emmy Awards. It was also very successful in the ratings, consistently ranking in the top ten of the final primetime ratings.



**In this notebook, we will perform exploratory data analysis (EDA) on the Friends episode dataset.**

**Load the required libraries**





In [None]:
import pandas as pd
import numpy as np 
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

import warnings
warnings.filterwarnings('ignore')


**Load the dataset**

In [None]:
friends = pd.read_csv("../input/friends-series-dataset/friends_episodes_v2.csv")

**Check the first 5 and last 5 records of the dataset.**

In [None]:
friends.head(5)

In [None]:
friends.tail(5)

**Look at records 233 and 234: both episodes have the same title. We rename them to avoid duplicate titles.**

In [None]:
friends.loc[233,'Episode_Title'] = "The Last One I"
friends.loc[234,"Episode_Title"] = "The Last One II"
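The two rows above are fixed by hand, which works because only one title is repeated. A minimal sketch of a more general approach, assuming repeated titles are the only problem, flags every title that occurs more than once and appends a part number with `groupby().cumcount()`. The toy DataFrame and the `" Part N"` suffix format are hypothetical and only for illustration.

```python
import pandas as pd

# Toy data (hypothetical): the last two rows share a title, like records 233 and 234
episodes = pd.DataFrame({
    "Episode_Title": ["Some Other Episode", "The Last One", "The Last One"],
})

# keep=False marks every occurrence of a repeated title, not only the later ones
dup_mask = episodes["Episode_Title"].duplicated(keep=False)

# Number the occurrences of each repeated title 1, 2, ... in row order
part_no = episodes[dup_mask].groupby("Episode_Title").cumcount() + 1

# Append the suffix only to repeated titles; index alignment keeps rows matched
episodes.loc[dup_mask, "Episode_Title"] = (
    episodes.loc[dup_mask, "Episode_Title"] + " Part " + part_no.astype(str)
)
print(episodes["Episode_Title"].tolist())
```

Unique titles are left untouched, so the same code would also handle any other repeated titles without hard-coding row numbers.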

In [None]:
friends.tail(5)

In [None]:
friends.duplicated().sum()


**So, there are no duplicated rows in the dataset.**
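Note that `duplicated()` without arguments compares entire rows, so two episodes that share only a title would not be counted. A small sketch on toy data (hypothetical values) contrasting the full-row check with a check restricted to `Episode_Title` via the `subset` parameter:

```python
import pandas as pd

# Toy data (hypothetical): same title, but the rows differ in another column
df = pd.DataFrame({
    "Episode_Title": ["The Last One", "The Last One", "Pilot"],
    "Episode_Number": [1, 2, 3],
})

# Full-row check: no row is an exact copy of another
row_dups = df.duplicated().sum()

# Title-only check: the second "The Last One" is flagged
title_dups = df.duplicated(subset=["Episode_Title"]).sum()

print(row_dups, title_dups)
```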

In [None]:
friends.info()

**Change the data type of Season to object, so that it is treated as a label rather than a numeric value.**

In [None]:
friends['Season'] = friends['Season'].astype("object")
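An alternative worth considering, and not what this notebook uses, is an ordered `category` dtype built with `pd.CategoricalDtype`. It still treats seasons as labels but keeps them in numeric order for sorting and comparisons. A sketch on toy data (hypothetical values), assuming seasons run from 1 to 10:

```python
import pandas as pd

# Toy data (hypothetical): a few episodes from unsorted seasons
df = pd.DataFrame({"Season": [10, 2, 1, 2]})

# Ordered categorical: seasons are labels, but 1 < 2 < ... < 10 is preserved
season_type = pd.CategoricalDtype(categories=list(range(1, 11)), ordered=True)
df["Season"] = df["Season"].astype(season_type)

print(df["Season"].dtype)
print(df["Season"].sort_values().tolist())
print(df["Season"].min(), df["Season"].max())
```

One trade-off: `value_counts()` and `groupby()` on a categorical also report seasons that have no rows, unless `observed=True` is passed to `groupby`.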


In [None]:
friends.info()


In [None]:
friends.shape


**So, there are 235 records in 8 columns, and none of the columns contain null values.**
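`info()` shows the non-null count per column. A more direct check, sketched here on a toy frame with deliberately missing values (hypothetical data), is `isna().sum()` for per-column counts and `isna().any().any()` for a single yes/no answer:

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical) with one missing value in each column
df = pd.DataFrame({
    "Episode_Title": ["Episode 1", "Episode 2", None],
    "Stars": [8.3, np.nan, 8.5],
})

# Number of missing values in each column
missing_per_column = df.isna().sum()

# True if any cell in the whole frame is missing
has_missing = df.isna().any().any()

print(df.shape)
print(missing_per_column)
print(has_missing)
```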

# **Exploratory Data Analysis (EDA)**

In [None]:
print("The show started in {} and ended in {}".format(min(friends['Year_of_prod']),max(friends['Year_of_prod'])))

In [None]:
friends['Season'].value_counts().sort_index()


In [None]:
plt.figure(figsize=(10,5))
plt.xlabel("Season")
plt.title("Count of episodes")
sns.countplot(x = "Season", data = friends,palette='inferno')

**From the table and plot above, Seasons 3 and 6 have the most episodes (25 each), while Season 10 has the fewest (18).**
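Reading the maximum off the table works here, but `idxmax()` would report only Season 3 because it returns the first match. A sketch on toy data (hypothetical season labels) that keeps ties by comparing each count against `max()` and `min()`:

```python
import pandas as pd

# Toy data (hypothetical): the season of each episode
seasons = pd.Series([1, 1, 2, 3, 3, 4, 4, 5])

counts = seasons.value_counts().sort_index()

# Boolean masks keep every season that ties for the extreme value
most = counts[counts == counts.max()].index.tolist()
fewest = counts[counts == counts.min()].index.tolist()

print(most, fewest)
```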



In [None]:
season_duration = friends.groupby('Season').Duration.sum().to_frame().reset_index()
season_duration

In [None]:
plt.figure(figsize=(10,5))
sns.barplot(x=season_duration.Season, y=season_duration.Duration, palette='mako')
plt.title('Total Duration of each Season', fontsize=15)
plt.xlabel('Season')
plt.ylabel('Duration (minutes)')
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.ylim(300, 600)

**From the table and plot above, Season 6 is the longest at 582 minutes and Season 10 is the shortest at 421 minutes.**
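Total duration depends on how many episodes a season has, so it largely mirrors the episode counts. A sketch on toy data (hypothetical values) that uses named aggregation to compute the total, the episode count and the mean minutes per episode in one `agg` call, separating season length from episode length:

```python
import pandas as pd

# Toy data (hypothetical): duration in minutes of each episode
df = pd.DataFrame({
    "Season": [1, 1, 1, 2, 2],
    "Duration": [22, 22, 22, 23, 24],
})

# Named aggregation: output column name = aggregation function
season_stats = (
    df.groupby("Season")["Duration"]
    .agg(total="sum", episodes="count", mean_minutes="mean")
    .reset_index()
)
print(season_stats)
```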

In [None]:
# Select Stars before averaging so non-numeric columns (titles, directors) are not aggregated
season_stars = friends.groupby('Season')['Stars'].mean().reset_index()
season_stars.columns = ['Season','Average Stars']
season_stars = season_stars.sort_values('Average Stars', ascending=False)
season_stars


In [None]:
plt.figure(figsize=(10,5))
sns.barplot(y=season_stars.Season, x=season_stars['Average Stars'], palette='magma', orient='h')
plt.title('Avg Stars of each Season', fontsize=15)
plt.xlabel('Avg Stars')
plt.ylabel('Season')
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.xlim(8, 9)

**From the table and plot above, Season 1 has the lowest average star rating, while the final season (Season 10) has the highest.**
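A season average can hide a wide range of episode ratings. A sketch on toy data (hypothetical ratings) that reports min, max and the number of rated episodes next to the mean, then picks the lowest- and highest-rated seasons with `idxmin()` and `idxmax()` (which return the first season on a tie):

```python
import pandas as pd

# Toy data (hypothetical): star rating of each episode
df = pd.DataFrame({
    "Season": [1, 1, 2, 2, 2],
    "Stars": [8.0, 8.4, 8.5, 9.0, 8.6],
})

# Spread of ratings per season alongside the average
rating_stats = df.groupby("Season")["Stars"].agg(["mean", "min", "max", "count"])

lowest = rating_stats["mean"].idxmin()
highest = rating_stats["mean"].idxmax()

print(rating_stats)
print(lowest, highest)
```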

In [None]:
season_pop_epi = friends[['Episode_Title', 'Stars']].sort_values('Stars', ascending=False).head(10).reset_index(drop=True)
season_pop_epi


In [None]:
plt.figure(figsize=(10,5))
sns.barplot(y=season_pop_epi.Episode_Title, x=season_pop_epi.Stars, palette='twilight', orient='h')
plt.title('Top 10 High-Rated Episodes', fontsize=15)
plt.xlabel('IMDB Stars')
plt.ylabel('Episodes')
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.xlim(9, 10)

**From the table and plot above, the series finale ("The Last One") and "The One Where Everybody Finds Out" have the highest star ratings.**
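`sort_values(...).head(10)` stops at exactly ten rows, even if the tenth and eleventh episodes share a rating. `DataFrame.nlargest` with `keep='all'` keeps every row tied at the cutoff. A sketch on toy data (hypothetical titles and ratings), using a top 2 instead of a top 10:

```python
import pandas as pd

# Toy data (hypothetical): B and C tie for second place
df = pd.DataFrame({
    "Episode_Title": ["A", "B", "C", "D"],
    "Stars": [9.7, 9.5, 9.5, 9.1],
})

# Default keep='first': exactly n rows, ties broken by original order
top_strict = df.nlargest(2, "Stars")

# keep='all': every row tied with the n-th value is kept
top_with_ties = df.nlargest(2, "Stars", keep="all")

print(top_strict["Episode_Title"].tolist())
print(top_with_ties["Episode_Title"].tolist())
```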

In [None]:
director_count = friends.groupby("Director").Episode_Title.count().sort_values(ascending=False)
director_count

In [None]:
director_count.count()

In [None]:
top10_dir = director_count.head(10).reset_index()
top10_dir

In [None]:
plt.figure(figsize=(18,5))
sns.barplot(x=top10_dir['Director'], y=top10_dir['Episode_Title'], palette='Oranges')
plt.title('Top 10 Directors by Number of Episodes', fontsize=15)
plt.xlabel('Directed By')
plt.ylabel('No Of Episodes')

**Gary Halvorson and Kevin Bright have directed the most episodes, 54 each.**

**The following directors directed only one episode:**



In [None]:
director_1 = director_count.reset_index()
director_1.columns= ["Director","Episode Count"]

director_1 = director_1[director_1['Episode Count'] == 1]
director_1
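The director counts above can also be obtained with `Series.value_counts()`, which counts rows per value and sorts them in descending order in one call. The one-episode directors then come from a boolean mask on the same Series. A sketch on toy data (hypothetical director names), assuming every row has a title so that counting titles and counting rows agree:

```python
import pandas as pd

# Toy data (hypothetical): one row per episode
df = pd.DataFrame({
    "Director": ["Dir A", "Dir B", "Dir A", "Dir C", "Dir A", "Dir B"],
    "Episode_Title": ["e1", "e2", "e3", "e4", "e5", "e6"],
})

# Same counts as the groupby approach used above
via_groupby = df.groupby("Director")["Episode_Title"].count().sort_values(ascending=False)
via_value_counts = df["Director"].value_counts()

# Top directors and directors with a single episode
top2 = via_value_counts.head(2)
one_episode = via_value_counts[via_value_counts == 1].index.tolist()

print(top2)
print(one_episode)
```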

In [None]:
Director_Rating = friends.groupby("Director").Stars.agg(['count','mean']).sort_values(by="mean")
print(Director_Rating)

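**From this, we can observe that Todd Holland has the lowest average rating and Joe Regalbuto the highest. Keep in mind that these averages rest on different numbers of episodes (the `count` column), so a director with only one or two episodes can land at either extreme on the strength of a single rating.**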
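One way to make the director ranking less sensitive to single episodes is to keep only directors with a minimum number of episodes before sorting. A sketch on toy data (hypothetical names and ratings); the threshold `min_episodes` is an arbitrary illustrative choice, not something taken from the dataset:

```python
import pandas as pd

# Toy data (hypothetical): Dir B has a single, very high rating
df = pd.DataFrame({
    "Director": ["Dir A", "Dir A", "Dir A", "Dir B", "Dir C", "Dir C"],
    "Stars": [8.5, 8.7, 8.6, 9.6, 8.0, 8.2],
})

min_episodes = 2  # hypothetical threshold

director_rating = df.groupby("Director")["Stars"].agg(["count", "mean"])

# Drop directors with too few episodes, then rank by average rating
reliable = director_rating[director_rating["count"] >= min_episodes].sort_values("mean")

print(reliable)
```

With the threshold applied, Dir B's single high rating no longer puts it at the top of the ranking.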

![](https://i.gifer.com/1Xzq.gif)