# Netflix Movies and TV Shows Dataset Analysis
<br>
<br>

### Dataset Source
* https://www.kaggle.com/datasets/shivamb/netflix-shows
<br>
<br>

### Dataset Overview
* Netflix Movies and TV Shows

```
About this Dataset: Netflix is one of the most popular media and video streaming platforms. They have over 8000 movies or tv shows available on their platform, as of mid-2021, they have over 200M Subscribers globally. This tabular dataset consists of listings of all the movies and tv shows available on Netflix, along with details such as - cast, directors, ratings, release year, duration, etc.
```

```
* Netflix is one of the most popular media and video streaming platforms.
* The dataset lists the movies and TV shows on Netflix, along with details such as cast, director, rating, release year, and duration.
```
<br>
<br>

### Learning Goals
* Practice the data analysis techniques learned so far.
* Analyze the Netflix Movies and TV Shows data to explore trends on Netflix.

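## What should we check in the data?
* Are there more Movies or more TV Shows?
* Which production country (`country`) appears most often? Where does South Korea rank?
* What range do the dates added (`date_added`) cover?
* Which month has the most additions?
* Which genres are most common?
* Which rating is most common?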

### 01. Import Libraries

In [11]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
import seaborn as sns

#### Check Library Versions

In [12]:
print("pandas version : ", pd.__version__)
print("NumPy version : ", np.__version__)
print("matplotlib version : ", matplotlib.__version__)

pandas version :  1.1.3
NumPy version :  1.18.5
matplotlib version :  3.3.2


### 02. Load the Data

In [13]:
netflix = pd.read_csv("./data/netflix/netflix_titles.csv")
netflix.shape

(8807, 12)

#### Preview the Data (Inspect Basic Information)

In [14]:
netflix.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...


In [15]:
netflix.tail()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
8802,s8803,Movie,Zodiac,David Fincher,"Mark Ruffalo, Jake Gyllenhaal, Robert Downey J...",United States,"November 20, 2019",2007,R,158 min,"Cult Movies, Dramas, Thrillers","A political cartoonist, a crime reporter and a..."
8803,s8804,TV Show,Zombie Dumb,,,,"July 1, 2019",2018,TV-Y7,2 Seasons,"Kids' TV, Korean TV Shows, TV Comedies","While living alone in a spooky town, a young g..."
8804,s8805,Movie,Zombieland,Ruben Fleischer,"Jesse Eisenberg, Woody Harrelson, Emma Stone, ...",United States,"November 1, 2019",2009,R,88 min,"Comedies, Horror Movies",Looking to survive in a world taken over by zo...
8805,s8806,Movie,Zoom,Peter Hewitt,"Tim Allen, Courteney Cox, Chevy Chase, Kate Ma...",United States,"January 11, 2020",2006,PG,88 min,"Children & Family Movies, Comedies","Dragged from civilian life, a former superhero..."
8806,s8807,Movie,Zubaan,Mozez Singh,"Vicky Kaushal, Sarah-Jane Dias, Raaghav Chanan...",India,"March 2, 2019",2015,TV-14,111 min,"Dramas, International Movies, Music & Musicals",A scrappy but poor boy worms his way into a ty...


In [16]:
netflix.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8807 entries, 0 to 8806
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       8807 non-null   object
 1   type          8807 non-null   object
 2   title         8807 non-null   object
 3   director      6173 non-null   object
 4   cast          7982 non-null   object
 5   country       7976 non-null   object
 6   date_added    8797 non-null   object
 7   release_year  8807 non-null   int64 
 8   rating        8803 non-null   object
 9   duration      8804 non-null   object
 10  listed_in     8807 non-null   object
 11  description   8807 non-null   object
dtypes: int64(1), object(11)
memory usage: 825.8+ KB


#### Column Descriptions
|No.|Column|Description|
|:---:|:---:|:---:|
|0|show_id|ID|
|1|type|TV Show or Movie|
|2|title|Title|
|3|director|Director|
|4|cast|Cast|
|5|country|Production country|
|6|date_added|Date added to Netflix|
|7|release_year|Release year|
|8|rating|Maturity rating|
|9|duration|Length (minutes or seasons)|
|10|listed_in|Genre / category|
|11|description|Synopsis|

In [17]:
netflix.describe()

Unnamed: 0,release_year
count,8807.0
mean,2014.180198
std,8.819312
min,1925.0
25%,2013.0
50%,2017.0
75%,2019.0
max,2021.0


In [18]:
netflix.describe(include = np.object_)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,rating,duration,listed_in,description
count,8807,8807,8807,6173,7982,7976,8797,8803,8804,8807,8807
unique,8807,2,8807,4528,7692,748,1767,17,220,514,8775
top,s7046,Movie,Dolly Parton’s Christmas on the Square,Rajiv Chilaka,David Attenborough,United States,"January 1, 2020",TV-MA,1 Season,"Dramas, International Movies","Paranormal activity at a lush, abandoned prope..."
freq,1,6131,1,19,19,2818,109,3207,1793,362,4


In [19]:
print(netflix.isnull().sum())

show_id            0
type               0
title              0
director        2634
cast             825
country          831
date_added        10
release_year       0
rating             4
duration           3
listed_in          0
description        0
dtype: int64


### 03. Data Preprocessing

#### 1) Check for Duplicate Rows

In [21]:
netflix.duplicated().sum()

0

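#### 2) Drop Unused Columns
* The `director` and `cast` columns will be dropped: their missing values cannot be replaced with a mean or an arbitrary value, and both have many missing entries.

#### 3) Handle Missing Values
* Remove columns that have too many missing values: drop()
* Replace missing values with another value: fillna()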
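The two approaches listed above can be sketched on a small DataFrame. This is only an illustration: the `toy` frame, its values, and the `"Unknown"` placeholder are hypothetical and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the values are illustrative only.
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "director": [np.nan, "Someone", np.nan],
    "country": ["India", np.nan, "Japan"],
})

# Remove a column whose missing values cannot be sensibly repaired.
dropped = toy.drop(columns=["director"])

# Replace the remaining missing values with a placeholder label.
filled = dropped.assign(country=dropped["country"].fillna("Unknown"))

print(filled)
print(filled.isnull().sum())
```

Note that `drop()` removes rows or columns by label, whereas `dropna()` removes them based on whether they contain missing values.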

In [22]:
# Handle missing values: drop columns
netfl = netflix.drop(["director", "cast"], axis = 1)
print(netfl.shape)

# Handle missing values: replace
netfl["country"] = netfl["country"].fillna("Unknown")
netfl["country"].isnull().sum()

(8807, 10)


0

#### 4) Preprocess the 'country' Column
* Some entries list several countries, so let's create a new column that keeps only the first country name.

In [23]:
netfl["country"].unique()

array(['United States', 'South Africa', 'Unknown', 'India',
       'United States, Ghana, Burkina Faso, United Kingdom, Germany, Ethiopia',
       'United Kingdom', 'Germany, Czech Republic', 'Mexico', 'Turkey',
       'Australia', 'United States, India, France', 'Finland',
       'China, Canada, United States',
       'South Africa, United States, Japan', 'Nigeria', 'Japan',
       'Spain, United States', 'France', 'Belgium',
       'United Kingdom, United States', 'United States, United Kingdom',
       'France, United States', 'South Korea', 'Spain',
       'United States, Singapore', 'United Kingdom, Australia, France',
       'United Kingdom, Australia, France, United States',
       'United States, Canada', 'Germany, United States',
       'South Africa, United States', 'United States, Mexico',
       'United States, Italy, France, Japan',
       'United States, Italy, Romania, United Kingdom',
       'Australia, United States', 'Argentina, Venezuela',
       'United States, Unit

In [24]:
netfl["country_n"] = netfl["country"].apply(lambda x: x.split(',')[0])

print(netfl.shape)
netfl.head()

(8807, 11)


Unnamed: 0,show_id,type,title,country,date_added,release_year,rating,duration,listed_in,description,country_n
0,s1,Movie,Dick Johnson Is Dead,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",United States
1,s2,TV Show,Blood & Water,South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t...",South Africa
2,s3,TV Show,Ganglands,Unknown,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...,Unknown
3,s4,TV Show,Jailbirds New Orleans,Unknown,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo...",Unknown
4,s5,TV Show,Kota Factory,India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...,India


* Inside the lambda, `split` divides each string on commas (`,`).
* Keeping the first element (`[0]`) leaves only the first country name.
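As a rough alternative to `apply` with a lambda, the same first-country extraction can be sketched with pandas' vectorized string methods. The `sample` Series below is hypothetical, and the extra `str.strip()` is an assumption added here to guard against stray whitespace; it is not part of the original cell.

```python
import pandas as pd

# Hypothetical sample values in the same comma-separated format as `country`.
sample = pd.Series([
    "United States",
    "United Kingdom, Australia, France",
    "Germany, Czech Republic",
])

# Approach used in the notebook: split on commas and keep the first piece.
first_lambda = sample.apply(lambda x: x.split(",")[0])

# Vectorized sketch: same idea, plus stripping surrounding whitespace.
first_vectorized = sample.str.split(",").str[0].str.strip()

print(first_vectorized.tolist())
```

If any value happened to begin with a comma, both approaches would return an empty string for it, so it may be worth checking for empty results after the split.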

#### (Extra) Use lambda and split to separate date_added into a year and a date as well.
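One possible way to approach this exercise is sketched below. It assumes the "Month day, year" format seen in the preview and uses a hypothetical `dates` Series rather than the real column. The `isinstance` check is an assumption added because `date_added` contains missing values, which would otherwise make `split` fail.

```python
import numpy as np
import pandas as pd

# Hypothetical sample in the "Month day, year" format, including one missing value.
dates = pd.Series(["September 25, 2021", "July 1, 2019", np.nan])

# Split on ", " and keep the part after it: the year (as a string).
year_added = dates.apply(
    lambda x: x.strip().split(", ")[1] if isinstance(x, str) else None
)

# Keep the part before ", ": the month and day.
day_added = dates.apply(
    lambda x: x.strip().split(", ")[0] if isinstance(x, str) else None
)

print(year_added.tolist())
print(day_added.tolist())
```

Parsing the column with `pd.to_datetime` and reading `.dt.year` or `.dt.month` would be another option when numeric values are needed.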

### 04. Visualization

In [34]:
import matplotlib.pyplot as plt
import matplotlib
import seaborn as sns
print(matplotlib.__version__)

3.3.2


#### Visualize `type`

In [28]:
n_type = netfl["type"].value_counts()

In [29]:
n_type

Movie      6131
TV Show    2676
Name: type, dtype: int64
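A minimal sketch of how these counts might be drawn as a bar chart follows. The counts are copied from the output above so that the block runs on its own; the figure size, title, and axis labels are arbitrary choices, not the notebook's.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Counts copied from the value_counts() output above.
n_type = pd.Series({"Movie": 6131, "TV Show": 2676}, name="type")

# Draw one bar per content type.
fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(n_type.index, n_type.values)
ax.set_title("Number of titles by type")
ax.set_xlabel("type")
ax.set_ylabel("count")

# Share of each type, in percent, rounded to one decimal place.
share = (n_type / n_type.sum() * 100).round(1)
print(share)
```

In a notebook, `plt.show()` (or simply ending the cell with the figure) displays the chart. With the real data, `n_type` from `netfl["type"].value_counts()` can be passed in directly.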