Let's create the first sample

In [17]:
import pandas as pd
data = pd.read_csv('Data.csv')
data = data[data['Name'] == 'Tim Duncan']
first_selection = data['FieldGoalsPercentage'][~data['FieldGoalsPercentage'].isin([0, 100])]
first_selection

7        40.0
26       60.9
42       33.3
57       61.5
72       61.5
         ... 
15570    57.1
15585    12.5
15598    25.0
15629    16.7
15637    50.0
Name: FieldGoalsPercentage, Length: 527, dtype: float64
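The first sample keeps only games whose field-goal percentage is neither 0 nor 100. Both samples come from the same 537 rows, so this filter accounts for the first sample being 10 values shorter than the second. A minimal sketch on made-up values (not taken from `Data.csv`) showing how the negated `isin` mask behaves:

```python
import pandas as pd

# Toy shooting percentages, invented for illustration only
pct = pd.Series([0.0, 40.0, 100.0, 60.9, 0.0, 33.3])

# isin() is True where the value equals 0 or 100; ~ inverts the mask,
# so only values strictly between the two extremes are kept
mask = ~pct.isin([0, 100])
filtered = pct[mask]
print(filtered.tolist())  # [40.0, 60.9, 33.3]
```

The original index labels (1, 3, 5 here) survive the filter, which is why the notebook output above shows non-consecutive row numbers.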

And the second one


In [24]:
second_selection = data['FieldGoalsAttempted']
second_selection

7        10
26       23
42        6
57       13
72       13
         ..
15585     8
15598     4
15612     0
15629     6
15637    14
Name: FieldGoalsAttempted, Length: 537, dtype: int64

Now let's build the interval variation series (relative frequency per bin) for the first sample

In [36]:
def variation_series(selection, step):
    """Relative frequencies of the sample over half-open bins [a, a + step)."""
    length = len(selection)
    maximum = selection.max()
    current = 0
    series = {}
    # '<=' (not '<'): a maximum lying exactly on a bin edge must still get a bin,
    # otherwise it is dropped and the frequencies do not sum to 1
    while current <= maximum:
        key = '%s - %s' % (current, current + step)
        in_bin = (current <= selection) & (selection < current + step)
        series[key] = [in_bin.sum() / length]
        current += step
    return pd.DataFrame(series)

variation_series(first_selection, 10)

     0 - 10   10 - 20   20 - 30   30 - 40   40 - 50   50 - 60   60 - 70   70 - 80   80 - 90  90 - 100
0  0.001898  0.020873  0.041746  0.148008  0.227704  0.301708  0.184061  0.049336  0.022770  0.001898
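As a cross-check, the same table can be produced with `pd.cut` and `value_counts(normalize=True)`. This is a sketch under assumptions: `relative_frequencies` is a hypothetical helper name, `step` is assumed to be an integer, and the bins start at 0 as in `variation_series`.

```python
import pandas as pd

def relative_frequencies(selection, step):
    """Relative frequencies over half-open bins [a, a + step) starting at 0."""
    # First multiple of `step` strictly above the maximum, so the maximum is always binned
    upper = int((selection.max() // step + 1) * step)
    edges = list(range(0, upper + 1, step))
    bins = pd.cut(selection, bins=edges, right=False)  # right=False -> intervals [a, b)
    # sort=False keeps empty bins and category order; sort_index() makes the order explicit
    return bins.value_counts(normalize=True, sort=False).sort_index()

# Usage in the notebook (assumes first_selection from above):
# relative_frequencies(first_selection, 10)
```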



And for the second one

In [37]:
variation_series(second_selection, 5)

      0 - 5    5 - 10   10 - 15   15 - 20   20 - 25   25 - 30
0  0.027933  0.243948  0.435754  0.223464  0.063315  0.005587



Sample mean of the first sample

In [38]:
first_selection.mean()

49.92390891840608


And of the second one

In [39]:
second_selection.mean()

12.327746741154563

Sample variance of the first sample

In [47]:
import math

def central_moment(selection, k):
    """k-th sample central moment: m_k = (1/n) * sum((x_i - mean)^k)."""
    mean = selection.mean()
    total = 0.0
    for item in selection:
        total += math.pow(item - mean, k)
    return total / len(selection)

central_moment(first_selection, 2)

198.06443785128724


And of the second one

In [48]:
central_moment(second_selection, 2)

20.261297157461467
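The loop in `central_moment` can be written as one vectorized pandas expression. A sketch (`central_moment_vectorized` is a hypothetical name) using the first five values of the second sample; the second central moment coincides with pandas' biased variance `var(ddof=0)`:

```python
import pandas as pd

def central_moment_vectorized(selection, k):
    """k-th central moment m_k = mean((x_i - mean)^k), computed without a Python loop."""
    return ((selection - selection.mean()) ** k).mean()

s = pd.Series([10, 23, 6, 13, 13])  # first five values of the second sample
# m_2 is the biased (ddof=0) variance
print(central_moment_vectorized(s, 2), s.var(ddof=0))
```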


Corrected (unbiased) variance of the first sample

In [49]:
def corrected_variance(selection):
    """Unbiased variance: n / (n - 1) * m_2 (Bessel's correction)."""
    length = len(selection)
    return (length / (length - 1)) * central_moment(selection, 2)

corrected_variance(first_selection)

198.44098621222125


And of the second one

In [50]:
corrected_variance(second_selection)

20.299098084994046
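pandas' `Series.var()` uses `ddof=1` by default, so it already returns the corrected variance. A small sketch on the first five values of the second sample checking that the n / (n - 1) rescaling of m_2 gives the same number:

```python
import pandas as pd

s = pd.Series([10, 23, 6, 13, 13])  # first five values of the second sample
n = len(s)
biased = ((s - s.mean()) ** 2).mean()  # m_2, the biased variance
corrected = n / (n - 1) * biased       # Bessel's correction
print(corrected, s.var())              # Series.var() defaults to ddof=1
```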

Skewness coefficient of the first sample

In [51]:
def asymmetry_coefficient(selection):
    """Sample skewness: m_3 / m_2^(3/2)."""
    moment3 = central_moment(selection, 3)
    moment2 = central_moment(selection, 2)
    return moment3 / math.pow(moment2, 3 / 2)

asymmetry_coefficient(first_selection)

-0.043344497921975284

And of the second one

In [52]:
asymmetry_coefficient(second_selection)

0.20449439855961962
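Note that `asymmetry_coefficient` is the biased coefficient g1 = m_3 / m_2^(3/2), while pandas' `Series.skew()` returns the bias-adjusted version G1 = g1 * sqrt(n(n - 1)) / (n - 2), so the two differ slightly for finite n. A sketch of that relation (`skewness_biased` is a hypothetical helper) on the first five values of the second sample:

```python
import math
import pandas as pd

def skewness_biased(selection):
    """g1 = m_3 / m_2^(3/2), the same quantity as asymmetry_coefficient."""
    dev = selection - selection.mean()
    m2 = (dev ** 2).mean()
    m3 = (dev ** 3).mean()
    return m3 / m2 ** 1.5

s = pd.Series([10, 23, 6, 13, 13])  # first five values of the second sample
n = len(s)
g1 = skewness_biased(s)
G1 = g1 * math.sqrt(n * (n - 1)) / (n - 2)  # adjusted Fisher-Pearson coefficient
print(g1, G1, s.skew())
```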

Excess kurtosis of the first sample

In [53]:
def excess(selection):
    """Sample excess kurtosis: m_4 / m_2^2 - 3 (0 for a normal distribution)."""
    moment4 = central_moment(selection, 4)
    moment2 = central_moment(selection, 2)
    return (moment4 / math.pow(moment2, 2)) - 3

excess(first_selection)

0.15686705819937075

And of the second one

In [54]:
excess(second_selection)

-0.14326343821126164
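Likewise, `excess` computes the biased excess kurtosis g2 = m_4 / m_2^2 - 3, while pandas' `Series.kurt()` returns the bias-adjusted G2 = (n - 1) / ((n - 2)(n - 3)) * ((n + 1) g2 + 6), which needs at least four values. A sketch (`excess_kurtosis_biased` is a hypothetical helper) on the first five values of the second sample:

```python
import pandas as pd

def excess_kurtosis_biased(selection):
    """g2 = m_4 / m_2^2 - 3, the same quantity as excess()."""
    dev = selection - selection.mean()
    m2 = (dev ** 2).mean()
    m4 = (dev ** 4).mean()
    return m4 / m2 ** 2 - 3

s = pd.Series([10, 23, 6, 13, 13])  # first five values of the second sample
n = len(s)
g2 = excess_kurtosis_biased(s)
G2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)  # bias-adjusted estimate
print(g2, G2, s.kurt())
```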

Range of the first sample

In [55]:
def selection_range(selection):
    return selection.max() - selection.min()

selection_range(first_selection)

81.7

Of the second one

In [56]:
selection_range(second_selection)

26

Median of the first sample

In [57]:
first_selection.median()

50.0

Of the second one

In [58]:
second_selection.median()

12.0

Quartiles and the quantile of level 1/3 of the first sample

In [66]:
print('q = 1/4; Z = %s' % first_selection.quantile(.25))
print('q = 1/2; Z = %s' % first_selection.median())
print('q = 3/4; Z = %s' % first_selection.quantile(.75))
print('q = 1/3; Z = %s' % first_selection.quantile(1/3))

q = 1/4; Z = 40.0
q = 1/2; Z = 50.0
q = 3/4; Z = 60.0
q = 1/3; Z = 43.8


Of the second one

In [67]:
print('q = 1/4; Z = %s' % second_selection.quantile(.25))
print('q = 1/2; Z = %s' % second_selection.median())
print('q = 3/4; Z = %s' % second_selection.quantile(.75))
print('q = 1/3; Z = %s' % second_selection.quantile(1/3))

q = 1/4; Z = 9.0
q = 1/2; Z = 12.0
q = 3/4; Z = 15.0
q = 1/3; Z = 10.0
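By default `Series.quantile` uses linear interpolation: the q-quantile sits at position (n - 1) * q of the sorted sample (0-based) and is interpolated between the two neighbouring order statistics. A hand-rolled sketch of that rule (`linear_quantile` is a hypothetical helper) compared against pandas on the first five values of the second sample:

```python
import pandas as pd

def linear_quantile(selection, q):
    """Quantile by linear interpolation at 0-based position (n - 1) * q of the sorted data."""
    values = sorted(selection)
    pos = (len(values) - 1) * q
    lo = int(pos)                       # lower neighbour
    hi = min(lo + 1, len(values) - 1)   # upper neighbour (clamped at the last value)
    return values[lo] + (values[hi] - values[lo]) * (pos - lo)

s = pd.Series([10, 23, 6, 13, 13])  # sorted: 6, 10, 13, 13, 23
for q in (0.25, 0.5, 0.75, 1 / 3):
    print(q, linear_quantile(s, q), s.quantile(q))
```

Here the 1/3-quantile falls at position 4/3, one third of the way from 10 to 13, giving 11.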


Histogram, frequency polygon, and probability density of the normal distribution for the first sample

In [None]:
import matplotlib.pyplot as plt

%matplotlib inline
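One possible way to draw the three curves on one set of axes, as a sketch only. Assumptions: the data are non-negative, bins are half-open [a, a + step) starting at 0 as in `variation_series`, and the normal law uses the sample mean and the corrected standard deviation (`Series.std()`, ddof=1) as its parameters. The helper names `normal_pdf`, `polygon_points`, and `plot_histogram_polygon_normal` are hypothetical. Polygon heights are relative frequency divided by bin width, so they are on the same density scale as `hist(density=True)` and the normal curve.

```python
import math
import pandas as pd

def normal_pdf(x, mean, std):
    """Density of N(mean, std^2) at x."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

def polygon_points(selection, step):
    """Bin midpoints and density heights (relative frequency / step) for bins [a, a + step)."""
    upper = int((selection.max() // step + 1) * step)
    edges = list(range(0, upper + 1, step))
    counts = pd.cut(selection, bins=edges, right=False).value_counts(sort=False).sort_index()
    mids = [a + step / 2 for a in edges[:-1]]
    heights = [c / (len(selection) * step) for c in counts]
    return mids, heights, edges

def plot_histogram_polygon_normal(selection, step):
    import matplotlib.pyplot as plt  # imported here so the helpers above work without matplotlib
    mids, heights, edges = polygon_points(selection, step)
    mean, std = selection.mean(), selection.std()  # std() uses ddof=1 (corrected)
    fig, ax = plt.subplots()
    ax.hist(selection, bins=edges, density=True, alpha=0.4, label='histogram')
    ax.plot(mids, heights, marker='o', label='frequency polygon')
    # 201 evenly spaced points across the binned range for a smooth density curve
    xs = [edges[0] + i * (edges[-1] - edges[0]) / 200 for i in range(201)]
    ax.plot(xs, [normal_pdf(x, mean, std) for x in xs], label='normal density')
    ax.legend()
    return fig

# Usage in the notebook (assumes first_selection from above):
# plot_histogram_polygon_normal(first_selection, 10)
```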


