<h3 align='center'>Modified Z Score Tutorial</h3>

I sampled a few rows from the following file to demonstrate the modified Z score.

Data source: https://www.kaggle.com/tmdb/tmdb-movie-metadata/version/2?select=tmdb_5000_movies.csv

In [2]:
import pandas as pd
import numpy as np

In [3]:
df = pd.read_csv("movie_revenues.csv")
df.head()

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",12/10/2009,2787965087,162,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800
1,54000000,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 10749, ""...",http://www.youmeanddupree.com/,1819,"[{""id"": 1253, ""name"": ""roommate""}, {""id"": 2038...",en,"You, Me and Dupree",After standing in as best man for his longtime...,18.600367,"[{""name"": ""Universal Pictures"", ""id"": 33}, {""n...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",7/14/2006,130431368,108,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,Two's company. Dupree's a crowd.,"You, Me and Dupree",5.4,407
2,21000000,"[{""id"": 18, ""name"": ""Drama""}, {""id"": 53, ""name...",,2575,"[{""id"": 246, ""name"": ""dancing""}, {""id"": 470, ""...",en,The Tailor of Panama,A British spy is banished to Panama after havi...,7.047975,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...","[{""iso_3166_1"": ""IE"", ""name"": ""Ireland""}, {""is...",2/11/2001,28008462,109,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,"In a place this treacherous, what a good spy n...",The Tailor of Panama,6.2,92
3,1000000,"[{""id"": 80, ""name"": ""Crime""}, {""id"": 18, ""name...",,26791,[],en,Brigham City,Wes Clayton is a lawman and a bishop in a Morm...,0.280083,"[{""name"": ""Main Street Movie Company"", ""id"": 6...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",3/30/2001,852206,119,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,,Brigham City,7.3,4
4,100000000,"[{""id"": 18, ""name"": ""Drama""}, {""id"": 878, ""nam...",,644,"[{""id"": 310, ""name"": ""artificial intelligence""...",en,A.I. Artificial Intelligence,"A robotic boy, the first programmed to love, D...",34.035114,"[{""name"": ""DreamWorks SKG"", ""id"": 27}, {""name""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",6/29/2001,235926552,146,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,David is 11 years old. He weighs 60 pounds. He...,A.I. Artificial Intelligence,6.8,1974


In [184]:
df.revenue.describe()

count    4.600000e+01
mean     1.879289e+08
std      4.551144e+08
min      8.522060e+05
25%      2.866957e+07
50%      8.381714e+07
75%      1.382135e+08
max      2.787965e+09
Name: revenue, dtype: float64

Here revenue is in absolute dollars. To keep the numbers readable, we add a new column with revenue in millions of dollars.

In [185]:
df['revenue_mln'] = df['revenue'] / 1_000_000
df.revenue_mln.describe()

count      46.000000
mean      187.928898
std       455.114423
min         0.852206
25%        28.669569
50%        83.817142
75%       138.213502
max      2787.965087
Name: revenue_mln, dtype: float64

In [186]:
# pandas' std() is the sample standard deviation, the same value describe() reports
mean, std = df.revenue_mln.mean(), df.revenue_mln.std()

In [187]:
mean

187.92889841304347

In [188]:
std

455.1144234195408

<h3>Outlier detection using Z score</h3>

In [199]:
def get_z_score(value, mean, std):
    return (value - mean)/std

In [200]:
df['z_score'] = get_z_score(df.revenue_mln, mean, std)
df.head(3)

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,...,runtime,spoken_languages,status,tagline,title,vote_average,vote_count,revenue_mln,mod_z_score,z_score
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...",...,162,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800,2787.965087,32.339762,5.712929
1,54000000,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 10749, ""...",http://www.youmeanddupree.com/,1819,"[{""id"": 1253, ""name"": ""roommate""}, {""id"": 2038...",en,"You, Me and Dupree",After standing in as best man for his longtime...,18.600367,"[{""name"": ""Universal Pictures"", ""id"": 33}, {""n...",...,108,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,Two's company. Dupree's a crowd.,"You, Me and Dupree",5.4,407,130.431368,0.557474,-0.126336
2,21000000,"[{""id"": 18, ""name"": ""Drama""}, {""id"": 53, ""name...",,2575,"[{""id"": 246, ""name"": ""dancing""}, {""id"": 470, ""...",en,The Tailor of Panama,A British spy is banished to Panama after havi...,7.047975,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...",...,109,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,"In a place this treacherous, what a good spy n...",The Tailor of Panama,6.2,92,28.008462,-0.667434,-0.351385


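**General guideline: treat anything whose z score is more than 3 in absolute value as an outlier. Revenue is skewed to the right (the minimum is only about 0.4 standard deviations below the mean), so here we only check the upper side**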

In [201]:
df[df.z_score>3]

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,...,runtime,spoken_languages,status,tagline,title,vote_average,vote_count,revenue_mln,mod_z_score,z_score
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...",...,162,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800,2787.965087,32.339762,5.712929
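
The filter above only looks at the upper side (`z_score > 3`), which is enough for revenue. For data that can be extreme in both directions, the same rule is usually applied to the absolute value. A minimal sketch in plain Python, using a made-up list of numbers (not the movie data) and a hypothetical helper name `z_score_outliers`; `statistics.stdev` is the sample standard deviation, which is also what pandas' `describe()` reports:

```python
import statistics

def z_score_outliers(values, threshold=3.0):
    """Return the values whose absolute z score exceeds the threshold."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return [v for v in values if abs((v - mean) / std) > threshold]

# Made-up sample: eleven ordinary values and one extreme one
sample = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 100]
print(z_score_outliers(sample))
```

Only the value 100 is flagged in this sample.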


<h3>Outlier detection using modified Z score</h3>

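So we got only 1 outlier using the simple Z score. The mean and standard deviation are themselves pulled up by extreme values such as Avatar's revenue, which can hide other outliers. The modified Z score uses the median and the median absolute deviation (MAD) instead, which are much less sensitive to extreme values, so depending on the analysis it can be the better choice. On this data it flags more outliers.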
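
The effect of a single extreme value on the mean and standard deviation, compared with the median and MAD, can be seen on a small made-up list. The numbers and the helper name `mad` below are illustrative assumptions, not taken from the movie data:

```python
import statistics

def mad(values):
    """Median absolute deviation: median of |x - median(values)|."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])

# Made-up sample, with and without one extreme value
base = [20, 25, 30, 35, 40, 45, 50]
with_extreme = base + [1000]

for label, data in [("without extreme value", base), ("with extreme value", with_extreme)]:
    print(label)
    print("  mean  :", round(statistics.mean(data), 1), " std:", round(statistics.stdev(data), 1))
    print("  median:", statistics.median(data), " MAD:", mad(data))
```

Adding the single extreme value changes the mean and standard deviation a lot, while the median moves only slightly and the MAD stays the same.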

In [191]:
def get_mad(s):
    """Median absolute deviation: the median of |x - median(s)|."""
    median = np.median(s)
    diff = abs(s - median)
    MAD = np.median(diff)
    return MAD

In [192]:
MAD = get_mad(df.revenue_mln)
median = np.median(df.revenue_mln)
MAD, median

(56.399542499999995, 83.8171415)

**General guideline for the modified Z score is to use 3.5 as a threshold, i.e. anything with a modified z score above 3.5 is treated as an outlier**

In [195]:
def get_modified_z_score(x, median, MAD):
    return 0.6745*(x-median)/MAD
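
The constant 0.6745 is approximately the 75th percentile of the standard normal distribution. For normally distributed data, MAD is about 0.6745 times the standard deviation, so this scaling puts the modified z score on roughly the same scale as the ordinary z score. A quick check using Python's standard library (`statistics.NormalDist`, available from Python 3.8):

```python
from statistics import NormalDist

# 75th percentile (third quartile) of the standard normal distribution
constant = NormalDist().inv_cdf(0.75)
print(round(constant, 4))  # 0.6745
```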

**Test the modified z score function on a sample data point**

In [196]:
get_modified_z_score(2711, median, MAD)

31.41931227648256

In [197]:
df['mod_z_score'] = get_modified_z_score(df.revenue_mln, median, MAD)
df.head(3)

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,...,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count,revenue_mln,mod_z_score
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...",...,2787965087,162,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800,2787.965087,32.339762
1,54000000,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 10749, ""...",http://www.youmeanddupree.com/,1819,"[{""id"": 1253, ""name"": ""roommate""}, {""id"": 2038...",en,"You, Me and Dupree",After standing in as best man for his longtime...,18.600367,"[{""name"": ""Universal Pictures"", ""id"": 33}, {""n...",...,130431368,108,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,Two's company. Dupree's a crowd.,"You, Me and Dupree",5.4,407,130.431368,0.557474
2,21000000,"[{""id"": 18, ""name"": ""Drama""}, {""id"": 53, ""name...",,2575,"[{""id"": 246, ""name"": ""dancing""}, {""id"": 470, ""...",en,The Tailor of Panama,A British spy is banished to Panama after havi...,7.047975,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...",...,28008462,109,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,"In a place this treacherous, what a good spy n...",The Tailor of Panama,6.2,92,28.008462,-0.667434


**Using the 3.5 threshold, we treat anything with a modified z score above 3.5 as an outlier**

In [198]:
df[df.mod_z_score>3.5]

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,...,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count,revenue_mln,mod_z_score
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...",...,2787965087,162,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800,2787.965087,32.339762
7,150000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.jurassicworld.com/,135397,"[{""id"": 1299, ""name"": ""monster""}, {""id"": 1718,...",en,Jurassic World,Twenty-two years after the events of Jurassic ...,418.708552,"[{""name"": ""Universal Studios"", ""id"": 13}, {""na...",...,1513528810,124,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,The park is open.,Jurassic World,6.5,8662,1513.52881,17.098375
12,25000000,"[{""id"": 53, ""name"": ""Thriller""}, {""id"": 28, ""n...",,619,"[{""id"": 1156, ""name"": ""sister sister relations...",en,The Bodyguard,A former Secret Service agent grudgingly takes...,26.576385,"[{""name"": ""Tig Productions"", ""id"": 335}, {""nam...",...,411006740,129,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,Never let her out of your sight. Never let you...,The Bodyguard,6.1,661,411.00674,3.912964
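
One practical caveat: if more than half of the values are identical, MAD is 0 and the formula divides by zero. A minimal sketch of a guarded version in plain Python follows. The helper name `modified_z_scores` and the choice to raise an error are assumptions for illustration, not part of the notebook above:

```python
import statistics

def modified_z_scores(values):
    """Return 0.6745 * (x - median) / MAD for every x in values."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # More than half of the values equal the median, so the score is undefined
        raise ValueError("MAD is 0; modified z score is undefined for this data")
    return [0.6745 * (v - med) / mad for v in values]

# Made-up sample, not the movie data
sample = [20, 25, 30, 35, 40, 45, 50, 1000]
scores = modified_z_scores(sample)
outliers = [v for v, s in zip(sample, scores) if abs(s) > 3.5]
print(outliers)
```

Depending on the analysis, falling back to another spread measure instead of raising may be more appropriate.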
