1. Importing the libraries:

In [1]:
import pandas as pd
import scipy.stats as st
import numpy as np
from bstrap import bootstrap

2. Importing Kinopoisk data:

In [11]:
data = pd.read_csv('kinopoisk rating.csv', sep=';', encoding='windows-1251')

In [3]:
data.head()

Unnamed: 0,num,name_rus,rating_new,origin,genre,rating_old,qty_views
0,1,Зеленая миля,9.1,США,фэнтези/ драма,8.9,692418
1,2,Побег из Шоушенка,9.1,США,драма,8.9,784326
2,3,Властелин колец: Возвращение короля,8.6,Новая Зеландия/ США,фэнтези/ приключения,8.8,481829
3,4,Властелин колец: Две крепости,8.6,Новая Зеландия/ США,фэнтези/ приключения,8.8,467607
4,5,Властелин колец: Братство Кольца,8.6,Новая Зеландия/ США,фэнтези/ приключения,8.8,516856


3. Applying the Mann-Whitney U test:

- it does not require the data to be normally distributed
- it also works when the sample size is small

 H0 - there is no statistically significant difference between rating_new and rating_old
 Ha - there is a difference between rating_new and rating_old
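To make the test statistic less of a black box, here is a minimal sketch of how U can be computed by counting pairs. The function name mann_whitney_u and the toy ratings are hypothetical and are not taken from the Kinopoisk file. In recent SciPy versions the statistic reported by st.mannwhitneyu corresponds to U for the first sample; the p-value calculation is not reproduced here.

```python
from itertools import product


def mann_whitney_u(x, y):
    """U for sample x: number of pairs (xi, yj) with xi > yj, ties count as 0.5."""
    u = 0.0
    for xi, yj in product(x, y):
        if xi > yj:
            u += 1.0
        elif xi == yj:
            u += 0.5
    return u


# Hypothetical toy ratings, for illustration only
old = [8.9, 8.8, 8.7, 8.5]
new = [9.1, 8.6, 8.4, 8.5]

u_old = mann_whitney_u(old, new)
u_new = mann_whitney_u(new, old)

# The two statistics always add up to len(old) * len(new)
print(u_old, u_new)
```

A value of U close to half of len(old) * len(new) means neither sample tends to rank above the other, which is what H0 predicts.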

In [4]:
st.mannwhitneyu(data.rating_old, data.rating_new)

MannwhitneyuResult(statistic=31324.5, pvalue=0.9629567921262221)

4. Conclusion: for all movies, the p-value (0.963) is not less than 0.05, so we cannot reject H0: the difference between the two groups is not statistically significant. There is no evidence that the old rating differs from the new rating.

5. Let's run the same check on a narrower group of films with the same origin and genre (e.g. US action movies):

In [5]:
data_USA = data[data.origin.str.contains('США') & data.genre.str.contains('боевик')]
data_USA.shape

(33, 7)

In [6]:
st.mannwhitneyu(data_USA.rating_old, data_USA.rating_new)

MannwhitneyuResult(statistic=699.0, pvalue=0.042728150663774264)

6. Conclusion: for US action movies, the p-value (0.043) is less than 0.05, so the difference between the two groups is statistically significant. The old rating does differ from the new rating.

7. Let's try applying the bootstrap:
- we choose the mean as our metric
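As a rough illustration of what bootstrapping the mean involves, the sketch below resamples a list with replacement and reads off a 95% percentile interval. It is a generic percentile bootstrap written with the standard library, not necessarily what bstrap does internally. The function name bootstrap_mean_ci and the toy ratings are hypothetical.

```python
import random
from statistics import fmean, quantiles


def bootstrap_mean_ci(values, n_runs=1000, seed=0):
    """Return (average bootstrap mean, lower bound, upper bound) of a 95% interval."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Each run draws n values with replacement and records their mean
    means = [fmean(rng.choices(values, k=n)) for _ in range(n_runs)]
    # Cutting into 40 equal parts puts the first cut at 2.5% and the last at 97.5%
    cuts = quantiles(means, n=40)
    return fmean(means), cuts[0], cuts[-1]


# Hypothetical toy ratings, for illustration only
ratings = [9.1, 8.9, 8.6, 8.6, 8.4, 8.2, 8.1, 8.0]
avg, lower, upper = bootstrap_mean_ci(ratings)
print(avg, lower, upper)
```

The three returned numbers play the same role as avg_metric, metric_ci_lb and metric_ci_ub in the bstrap output.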

In [7]:
stats_rt_old, stats_rt_new, p_value = bootstrap(np.mean, data.rating_old, data.rating_new, nbr_runs=1000)
print(stats_rt_old)
print(stats_rt_new)
print(p_value)

{'avg_metric': 8.179111599999999, 'metric_ci_lb': 8.16, 'metric_ci_ub': 8.2}
{'avg_metric': 8.1837548, 'metric_ci_lb': 8.15398, 'metric_ci_ub': 8.212399999999999}
0.836


8. Conclusion: once again, for all movies the p-value (0.836) is not less than 0.05, so the difference between the two groups is not statistically significant. There is no evidence that the old rating differs from the new rating.

In [8]:
stats_rt_old_USA, stats_rt_new_USA, p_value = bootstrap(np.mean, data_USA.rating_old, data_USA.rating_new, nbr_runs=1000)
print(stats_rt_old_USA)
print(stats_rt_new_USA)
print(p_value)

{'avg_metric': 8.162427272727273, 'metric_ci_lb': 8.118181818181817, 'metric_ci_ub': 8.212121212121213}
{'avg_metric': 8.08840909090909, 'metric_ci_lb': 8.012121212121212, 'metric_ci_ub': 8.166666666666666}
0.171


9. Conclusion: unfortunately, the bootstrap did not confirm the Mann-Whitney result for US action movies: the p-value (0.171) is not less than 0.05. The subset is small (only 33 rows), so when we resample it with the bootstrap, the difference between the two groups does not come out as statistically significant. By this test, there is no evidence that the old rating differs from the new rating.
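Since each row holds both ratings for the same film, one possible follow-up for a small subset is to bootstrap the per-film differences instead of the two columns separately. The sketch below is an assumption about how such a check could look, not part of the analysis above. The function name paired_bootstrap_diff and the toy ratings are hypothetical.

```python
import random
from statistics import fmean, quantiles


def paired_bootstrap_diff(old, new, n_runs=1000, seed=0):
    """Return (mean difference old - new, lower, upper) with a 95% percentile interval."""
    if len(old) != len(new):
        raise ValueError("paired samples must have the same length")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = [o - w for o, w in zip(old, new)]
    k = len(diffs)
    # Resample whole films (i.e. their differences) with replacement
    boot = [fmean(rng.choices(diffs, k=k)) for _ in range(n_runs)]
    cuts = quantiles(boot, n=40)  # first cut is 2.5%, last cut is 97.5%
    return fmean(diffs), cuts[0], cuts[-1]


# Hypothetical toy ratings where the old rating is always slightly higher
old = [8.9, 8.8, 8.7, 8.5, 8.3, 8.6]
new = [8.7, 8.7, 8.5, 8.4, 8.1, 8.5]

mean_diff, lower, upper = paired_bootstrap_diff(old, new)
print(mean_diff, lower, upper)
```

If the interval for the mean difference excludes 0, the pairing suggests a systematic shift between the two ratings; if it contains 0, the data do not support one.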