In [1]:
%matplotlib inline


# Sign test
In this tutorial we demonstrate how to use the sign test to check whether two kinds of outcome occur equally often. As an example,
we check whether Heung-Min Son takes the same number of shots with each foot.


In [2]:
import pandas as pd
import numpy as np
import json
# plotting
import matplotlib.pyplot as plt
#opening data
import os
import pathlib
import warnings

pd.options.mode.chained_assignment = None
warnings.filterwarnings('ignore')

## Opening the dataset

First we open the data. For this example we will use Wyscout data from the 2017/18 Premier League season.


In [3]:
#open event data
repo_path = pathlib.Path().resolve().parent
path = os.path.join(str(repo_path), 'data', 'Wyscout', 'events', 'events_England.json')
with open(path) as f:
    data = json.load(f) 
train = pd.DataFrame(data)

In [4]:
#path to data
path = os.path.join(str(repo_path), 'data', 'Wyscout', 'players.json')  
#open data
with open(path) as f:
    data = json.load(f)
#save it in a dataframe  
player_df = pd.DataFrame(data)

## Preparing the dataset

First, we filter the events to keep only shots. Then we look up Son's id in the player database and keep the shots taken by him.
Next, we select the shots taken with his left foot (tagged with *id* = 401) and with his right foot (tagged with *id* = 402).
Finally, we create a list with a 1 for each left-foot shot and a -1 for each right-foot shot.
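
To make the tag check concrete, here is a minimal sketch on a few hand-written records. The records and the helper `has_tag` are hypothetical and only imitate the layout of the `tags` column (a list of `{'id': ...}` dictionaries); they are not taken from the Wyscout files. The expression `{'id': 401} in tags` is an ordinary list membership test that compares dictionaries by value.

```python
# Hypothetical shot records imitating the "tags" layout (not real Wyscout data)
toy_shots = [
    {"playerId": 1, "tags": [{"id": 401}, {"id": 1801}]},  # has the left-foot tag
    {"playerId": 1, "tags": [{"id": 402}, {"id": 1802}]},  # has the right-foot tag
    {"playerId": 1, "tags": [{"id": 402}, {"id": 101}]},   # has the right-foot tag
    {"playerId": 1, "tags": [{"id": 403}, {"id": 1802}]},  # has neither foot tag
]

LEFT_FOOT, RIGHT_FOOT = 401, 402


def has_tag(shot, tag_id):
    """Return True if the shot's tag list contains {'id': tag_id}."""
    return {"id": tag_id} in shot["tags"]


# Count shots per foot; shots with neither tag are ignored
n_left = sum(has_tag(s, LEFT_FOOT) for s in toy_shots)
n_right = sum(has_tag(s, RIGHT_FOOT) for s in toy_shots)

# Build the list of signs: +1 per left-foot shot, -1 per right-foot shot
signs = [1] * n_left + [-1] * n_right
print(n_left, n_right, signs)  # 1 2 [1, -1, -1]
```

Only the number of 1's and -1's matters for the sign test, so the order of the entries in the list is irrelevant.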

In [5]:
#take shots only
shots = train.loc[train['subEventName'] == 'Shot']
#look for Son's id
son_id = player_df.loc[player_df["shortName"] == "Son Heung-Min"]["wyId"].iloc[0]
#get Son's shots
son_shots = shots.loc[shots["playerId"] == son_id]

#left foot shots (tag id 401)
lefty_shots = son_shots.loc[son_shots.apply(lambda x: {'id': 401} in x.tags, axis=1)]
#right foot shots (tag id 402)
righty_shots = son_shots.loc[son_shots.apply(lambda x: {'id': 402} in x.tags, axis=1)]

#create list with 1 for each left foot shot and -1 for each right foot shot
l = [1] * len(lefty_shots)
l.extend([-1] * len(righty_shots))

## Testing the hypothesis

Now we can test the hypothesis that Heung-Min Son is indeed ambidextrous. To do so, a [sign test](https://en.wikipedia.org/wiki/Sign_test) is used.
We set the significance level at 0.05. 

The null hypothesis is that Son is equally likely to shoot with either foot, so that the numbers of shots taken with his right and left foot differ only by chance. This would imply that he is ambidextrous.

After running the test, we find no significant evidence to reject the null hypothesis. The data are therefore consistent with Son shooting equally often with his right and left foot.
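
To see what the sign test computes, note that under the null hypothesis each shot is a fair coin flip between left and right, so the number of left-foot shots follows a Binomial(n, 0.5) distribution. The sketch below computes the two-sided p-value directly from that distribution using only the standard library. The function name `sign_test_pvalue` and the counts in the usage line are hypothetical, and the result should agree with a two-sided binomial test at probability 0.5; it is an illustration, not the implementation used by statsmodels.

```python
from math import comb


def sign_test_pvalue(n_pos, n_neg):
    """Two-sided sign test p-value for n_pos positive and n_neg negative signs.

    Under the null hypothesis the number of positive signs is Binomial(n, 0.5).
    Because that distribution is symmetric, the two-sided p-value is twice the
    probability of the smaller tail, capped at 1.
    """
    n = n_pos + n_neg
    k = min(n_pos, n_neg)
    # P(X <= k) for X ~ Binomial(n, 0.5)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts: 2 shots with one foot and 8 with the other
print(sign_test_pvalue(2, 8))  # 0.109375
```

With these made-up counts the p-value is above 0.05, so even an 8 to 2 split would not be enough to reject the null hypothesis at this significance level. This shows how little power the test has for small samples.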

In [6]:
from statsmodels.stats.descriptivestats import sign_test
test = sign_test(l, mu0 = 0)
pvalue = test[1]

if pvalue < 0.05:
    print("P-value amounts to", str(pvalue)[:5], "- We reject the null hypothesis - Heung-Min Son is not ambidextrous")
else:
    print("P-value amounts to", str(pvalue)[:5], "- We do not reject the null hypothesis - Heung-Min Son is ambidextrous")

P-value amounts to 0.142 - We do not reject the null hypothesis - Heung-Min Son is ambidextrous
