# Data Cleaning for movies data

This notebook cleans the movies dataset (`movies.csv`).
The cleaning steps are as follows:

## Clean columns
- Split `released` into `released_date` and `released_place`
- Check for duplicated rows

## Fill in null values

1. **Fill object columns with "Others"**: rating, released_place, writer, star, country, company
2. **Fill float64 columns with 0**: score, votes, runtime
3. **Fill with the column mean (rounded)**: budget, gross
4. **Fill with the placeholder "0000-00-00"**: released_date

Finally, save the cleaned data as a CSV file.

This notebook is based on [DAKKATA's notebook](https://www.kaggle.com/code/dakkatadigvijayreddy/exploratory-data-analysis-in-the-movie-industry/notebook#5.Data-Cleaning)

In [None]:
# Core libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
import datetime as dt
%matplotlib inline

# Suppress FutureWarnings
warnings.filterwarnings("ignore", category=FutureWarning)

# Show up to 20 columns when displaying a DataFrame
pd.set_option("display.max_columns", 20)

# The 'seaborn' style was renamed 'seaborn-v0_8' in matplotlib 3.6
plt.style.use('seaborn-v0_8' if 'seaborn-v0_8' in plt.style.available else 'seaborn')

# Forward slashes work on Windows, macOS and Linux
df = pd.read_csv('Data/movies.csv')

In [None]:
df.head()

In [None]:
# 1. Create a working copy so the raw data stays untouched
df2 = df.copy()
df2

In [None]:
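# 2. Split 'released' ("date (place)") into 'released_date' and 'released_place'
parts = df2['released'].str.replace(")", "", regex=False).str.split("(", n=1, expand=True)
y = parts.rename(columns={0: 'released_date', 1: 'released_place'})
y['released_date'] = y['released_date'].str.strip()  # drop the space left before "("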

In [None]:
# 3. Insert 'released_date' and 'released_place' as columns 4 and 5 of df2
df2.insert(4,'released_date',y['released_date'])
df2.insert(5,'released_place',y['released_place'])

In [None]:
# 4. Drop 'released' column
df2.drop(['released'],axis=1,inplace=True)
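
Steps 2 to 4 can be tried on a tiny made-up frame before touching the real data. A minimal sketch, assuming every `released` value follows the "date (place)" pattern; the rows are invented for illustration, and `get_loc` is used instead of the hard-coded positions 4 and 5 so the new columns land wherever `released` sits.

```python
import pandas as pd

# Made-up rows that mimic the "date (place)" pattern of 'released'
toy = pd.DataFrame({
    "name": ["Film A", "Film B"],
    "released": ["June 13, 1980 (United States)", "July 2, 1980 (France)"],
})

# Remove ")" literally, split once on "(", then trim stray whitespace
parts = (
    toy["released"]
    .str.replace(")", "", regex=False)
    .str.split("(", n=1, expand=True)
    .rename(columns={0: "released_date", 1: "released_place"})
)
parts = parts.apply(lambda col: col.str.strip())

# Put the new columns where 'released' was, then drop the original column
pos = toy.columns.get_loc("released")
toy.insert(pos, "released_date", parts["released_date"])
toy.insert(pos + 1, "released_place", parts["released_place"])
toy = toy.drop(columns="released")
print(toy)
```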

In [None]:
# 5. Convert released_date to datetime; unparseable values become NaT
df2['released_date'] = pd.to_datetime(df2['released_date'], errors='coerce')
df2.head(2)
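
`pd.to_datetime` is the usual way to turn such strings into timestamps. A small sketch on made-up values shows what `errors="coerce"` does: anything it cannot parse, and any missing value, becomes `NaT`, which `isna()` then counts as a null instead of the conversion raising an error.

```python
import pandas as pd

raw = pd.Series(["June 13, 1980", "July 2, 1980", "not a date", None])

# errors="coerce" turns unparseable strings (and None) into NaT
parsed = pd.to_datetime(raw, errors="coerce")

print(parsed)
print(parsed.isna().sum())  # NaT values are counted as nulls
```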

In [None]:
# 6. Check duplicated rows
df2.duplicated().value_counts()  # True = the row repeats an earlier row
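
To read the `value_counts()` output correctly, it helps to know which rows `duplicated()` flags. A sketch on a made-up frame: with the default `keep="first"`, only the repeats are `True`, while `keep=False` flags every copy, which is handy for inspecting them.

```python
import pandas as pd

toy = pd.DataFrame({
    "name": ["Film A", "Film B", "Film A"],
    "year": [1980, 1981, 1980],
})

# Default keep="first": the first occurrence stays False, later copies are True
flags = toy.duplicated()
print(flags.value_counts())

# keep=False marks all copies of a duplicated row
print(toy[toy.duplicated(keep=False)])
```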

# Filling null values

In [None]:
# 7. Check null values
df2.isna().sum().sort_values(ascending=False)

In [None]:
# 8. Review non-null counts and dtypes for every column
df2.info()
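
The Non-Null Count and Dtype figures reported by `info()` can also be collected into a DataFrame, which is easier to sort or export than printed text. A sketch on a made-up frame; the summary column names `non_null`, `nulls` and `dtype` are arbitrary choices, not pandas conventions.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "name": ["Film A", "Film B", "Film C"],
    "budget": [1e6, np.nan, 3e6],
    "rating": ["R", None, "PG"],
})

# One row per column: non-null count, null count and dtype
summary = pd.DataFrame({
    "non_null": toy.notna().sum(),
    "nulls": toy.isna().sum(),
    "dtype": toy.dtypes.astype(str),
}).sort_values("nulls", ascending=False)
print(summary)
```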

# Tactics for filling null values
| # | Column | Non-Null Count | Dtype | Fill nulls with |
|---|--------|----------------|-------|-----------------|
| 0 | name | 7668 | object | (no nulls) |
| 1 | rating | 7591 | object | "Others" |
| 2 | genre | 7668 | object | (no nulls) |
| 3 | year | 7668 | int64 | (no nulls) |
| 4 | released_date | 7666 | datetime64[ns] | "0000-00-00" |
| 5 | released_place | 7666 | object | "Others" |
| 6 | score | 7665 | float64 | 0 |
| 7 | votes | 7665 | float64 | 0 |
| 8 | director | 7668 | object | (no nulls) |
| 9 | writer | 7665 | object | "Others" |
| 10 | star | 7667 | object | "Others" |
| 11 | country | 7665 | object | "Others" |
| 12 | budget | 5497 | float64 | column mean (rounded) |
| 13 | gross | 7479 | float64 | column mean (rounded) |
| 14 | company | 7651 | object | "Others" |
| 15 | runtime | 7664 | float64 | 0 |

## To sum up
1. Fill object columns with "Others": rating, released_place, writer, star, country, company
2. Fill float64 columns with 0: score, votes, runtime
3. Fill with the column mean (rounded): budget, gross
4. Fill with the placeholder "0000-00-00": released_date
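
Rules 1 to 3 can also be written as a single mapping passed to `DataFrame.fillna`, which accepts a dict of column to fill value. A minimal sketch on a made-up frame with one column of each kind; in the notebook the dict would list every column named above. It also avoids calling `inplace=True` on individual columns.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "rating": ["R", None, "PG"],
    "score": [7.5, np.nan, 6.0],
    "budget": [10.0, np.nan, 30.0],
})

# One fill value per column, following the rules above
fill_values = {
    "rating": "Others",                      # object column
    "score": 0,                              # float64 column
    "budget": round(toy["budget"].mean()),   # mean skips NaN: (10 + 30) / 2 = 20
}
cleaned = toy.fillna(value=fill_values)
print(cleaned)
```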

In [None]:
# 9. Fill object columns with "Others"
# Assign the result back: fillna(inplace=True) on df2[col] is chained assignment and may not update df2
columns_1 = ['rating','company','writer','country','released_place','star']
df2[columns_1] = df2[columns_1].fillna("Others")

df2.isna().sum().sort_values(ascending=False)

In [None]:
# 10. Fill float64 columns with 0
columns_2 = ['runtime','score','votes']
df2[columns_2] = df2[columns_2].fillna(0)

df2.isna().sum().sort_values(ascending=False)

In [None]:
# 11. Fill 'budget' and 'gross' with their rounded mean (mean() skips NaN)
columns_3 = ['budget','gross']
for k in columns_3:
    df2[k] = df2[k].fillna(round(df2[k].mean()))

df2.isna().sum().sort_values(ascending=False)

In [None]:
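# 12. Fill 'released_date' with the placeholder "0000-00-00"
# "0000-00-00" is not a valid date, so the column becomes object dtype
df2['released_date'] = df2['released_date'].fillna('0000-00-00')

df2.isna().sum().sort_values(ascending=False)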
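
Because "0000-00-00" cannot be stored as a timestamp, filling a datetime column with it upcasts the column to `object`, mixing `Timestamp` values with strings. A small sketch on made-up dates shows the dtype change; if a true datetime column is needed downstream, leaving the gaps as `NaT` is one alternative.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["1980-06-13", None]))
print(dates.dtype)   # a datetime64 dtype

# The string placeholder forces pandas to fall back to object dtype
filled = dates.fillna("0000-00-00")
print(filled.dtype)  # object
print(filled.tolist())
```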

In [None]:
# 13. Recheck null values for every column
df2.isna().sum().sort_values(ascending=False).rename_axis("columns").reset_index(name="Null_Values")
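
Instead of reading the counts by eye, the recheck can fail loudly if a null slipped through. A sketch; `columns_with_nulls` is a hypothetical helper, not part of pandas, and the two frames are made up.

```python
import numpy as np
import pandas as pd

def columns_with_nulls(df: pd.DataFrame) -> dict:
    """Return {column: null count} for every column that still has nulls."""
    counts = df.isna().sum()
    return {col: int(n) for col, n in counts.items() if n > 0}

cleaned = pd.DataFrame({"score": [7.5, 0.0], "rating": ["R", "Others"]})
dirty = pd.DataFrame({"score": [7.5, np.nan], "rating": ["R", None]})

print(columns_with_nulls(dirty))   # {'score': 1, 'rating': 1}

# Raises AssertionError (listing the offending columns) if any null remains
assert not columns_with_nulls(cleaned), columns_with_nulls(cleaned)
```

In the notebook the last line would read `assert not columns_with_nulls(df2)`.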

In [None]:
# 14. Save the cleaned dataset to CSV (index=False avoids an extra unnamed index column)
df2.to_csv("Cleaned_Movie_industry.csv", index=False)
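
By default `to_csv` also writes the row index as an unnamed first column, which comes back as `Unnamed: 0` when the file is read again. A small round-trip sketch, using an in-memory `io.StringIO` buffer instead of a real file and a made-up frame, shows the difference `index=False` makes.

```python
import io

import pandas as pd

toy = pd.DataFrame({"name": ["Film A", "Film B"], "score": [7.5, 6.0]})

# to_csv() with no path returns the CSV text as a string
with_index = pd.read_csv(io.StringIO(toy.to_csv()))
without_index = pd.read_csv(io.StringIO(toy.to_csv(index=False)))

print(with_index.columns.tolist())     # ['Unnamed: 0', 'name', 'score']
print(without_index.columns.tolist())  # ['name', 'score']
```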