In [36]:
import pandas as pd
from scipy.stats import mannwhitneyu
import numpy as np

In [5]:
t = pd.read_csv('2022-09-21 17-47-27.csv', sep = ';', encoding = 'cp1251')
t.head()

   num                             name_rus  rating_new               origin                 genre  rating_old  qty_views
0    1                         Зеленая миля         9.1                  США        фэнтези/ драма         8.9     692418
1    2                    Побег из Шоушенка         9.1                  США                 драма         8.9     784326
2    3  Властелин колец: Возвращение короля         8.6  Новая Зеландия/ США  фэнтези/ приключения         8.8     481829
3    4        Властелин колец: Две крепости         8.6  Новая Зеландия/ США  фэнтези/ приключения         8.8     467607
4    5     Властелин колец: Братство Кольца         8.6  Новая Зеландия/ США  фэнтези/ приключения         8.8     516856


In [10]:
t.describe()

             num  rating_new  rating_old  qty_views
count      250.0       250.0       250.0      250.0
mean       125.5      8.1844      8.1796   289378.3
std    72.312977    0.272643    0.193114   186828.4
min          1.0         7.6         8.0    20056.0
25%        63.25         8.0         8.0   145021.2
50%        125.5         8.1         8.1   251629.5
75%       187.75         8.3         8.3   405379.0
max        250.0         9.1         8.9  1303016.0


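According to the descriptive statistics, the ratings have barely changed over time: the new and old ratings have almost the same mean (8.18) and the same median (8.1), although the new ratings are more spread out (standard deviation 0.27 vs 0.19; range 7.6 to 9.1 vs 8.0 to 8.9). Let's use the Mann-Whitney U test to compare the distributions of the new and old ratings across all films.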
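The test compares ranks rather than raw values: both samples are pooled and ranked (tied values get the average rank), and the rank sum R₁ of the first sample is turned into U = R₁ - n₁(n₁ + 1)/2. As a minimal sketch on made-up ratings (not the dataset; the helper name `mann_whitney_u` is hypothetical), the statistic can be recomputed by hand and compared with SciPy, which in recent versions reports U for the first sample `x`:

```python
import numpy as np
from scipy.stats import mannwhitneyu, rankdata


def mann_whitney_u(x, y):
    """Return the Mann-Whitney U statistic for sample x against sample y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Rank the pooled values; tied values receive the average of their ranks
    ranks = rankdata(np.concatenate([x, y]))
    # Sum of the ranks that belong to x (the first len(x) positions)
    rank_sum_x = ranks[:len(x)].sum()
    return rank_sum_x - len(x) * (len(x) + 1) / 2


# Hypothetical ratings for five films, not taken from the dataset
new = [9.1, 8.6, 8.1, 7.9, 8.3]
old = [8.9, 8.8, 8.1, 8.0, 8.2]

print(mann_whitney_u(new, old))                                   # 12.5
print(mannwhitneyu(new, old, alternative='two-sided').statistic)  # 12.5
```

Because only the ranks enter U, a few extreme ratings cannot dominate the result, and no normality assumption is needed.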

In [8]:
u1, p = mannwhitneyu(x = t.rating_new, y = t.rating_old, alternative = 'two-sided')
u1, p

(31175.5, 0.9629567921262221)

The p-value (about 0.96) is far above the conventional 0.05 threshold, so there is no statistically significant difference between the new and old ratings overall. Let's break the data down by country of origin and by genre:

In [15]:
t.groupby('origin').count().sort_values(by = 'num', ascending = False).head()

                     num  name_rus  rating_new  genre  rating_old  qty_views
origin
США                  110       110         110    110         110        110
СССР                  31        31          31     31          31         31
Великобритания/ США   17        17          17     17          17         17
США/ Германия         10        10          10     10          10         10
Япония                 9         9           9      9           9          9


The two most common origins are the USA (США, 110 films) and the USSR (СССР, 31 films). Let's check whether the ratings for each of them changed over time.
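The same call is repeated below for each country, and later for a genre. As a sketch, the per-group tests could be wrapped in a small helper; `compare_by_group` and the tiny demo frame are hypothetical, and only the column names match `t`:

```python
import pandas as pd
from scipy.stats import mannwhitneyu


def compare_by_group(df, column, values):
    """Two-sided Mann-Whitney U test of rating_new vs rating_old
    for every value of `column` listed in `values`."""
    rows = []
    for value in values:
        subset = df.loc[df[column] == value]
        # In recent SciPy versions U refers to rating_new (the first sample)
        u, p = mannwhitneyu(subset['rating_new'], subset['rating_old'],
                            alternative='two-sided')
        rows.append({column: value, 'n': len(subset), 'U': u, 'p_value': p})
    return pd.DataFrame(rows)


# Made-up ratings; only the column names are taken from `t`
demo = pd.DataFrame({
    'origin': ['США', 'США', 'США', 'СССР', 'СССР', 'СССР'],
    'rating_new': [8.1, 8.3, 7.9, 8.5, 8.4, 8.6],
    'rating_old': [8.0, 8.2, 8.1, 8.2, 8.1, 8.3],
})
print(compare_by_group(demo, 'origin', ['США', 'СССР']))
```

On the real data the calls would be `compare_by_group(t, 'origin', ['США', 'СССР'])` and `compare_by_group(t, 'genre', ['фантастика/ боевик'])`, which puts the group sizes and p-values side by side in one table.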

In [16]:
u1, p = mannwhitneyu(x = t.loc[t.origin == 'США', 'rating_new'],
                     y = t.loc[t.origin == 'США', 'rating_old'],
                     alternative = 'two-sided')
u1, p

(5634.0, 0.3718029670954922)

In [17]:
u1, p = mannwhitneyu(x = t.loc[t.origin == 'СССР', 'rating_new'],
                     y = t.loc[t.origin == 'СССР', 'rating_old'],
                     alternative = 'two-sided')
u1, p

(669.0, 0.007358118927667391)

In [19]:
t.loc[t.origin == 'СССР', 'rating_new'].shape

(31,)

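For US films the difference is not statistically significant (p ≈ 0.37), but for Soviet films it is (p ≈ 0.007). The USSR group has only 31 films; a small sample mainly reduces the test's power to detect a difference, so the fact that the Mann-Whitney U test still finds one suggests the change is real. Let's now look at genres.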

In [21]:
t.groupby('genre').count().sort_values(by = 'num', ascending = False).head()

                     num  name_rus  rating_new  origin  rating_old  qty_views
genre
фантастика/ боевик    19        19          19      19          19         19
мультфильм/ фэнтези   13        13          13      13          13         13
драма/ мелодрама      13        13          13      13          13         13
драма                 12        12          12      12          12         12
триллер/ драма        11        11          11      11          11         11


In [22]:
u1, p = mannwhitneyu(x = t.loc[t.genre == 'фантастика/ боевик', 'rating_new'],
                     y = t.loc[t.genre == 'фантастика/ боевик', 'rating_old'],
                     alternative = 'two-sided')
u1, p

(121.5, 0.0787049093768963)

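The most common genre is sci-fi/action (фантастика/ боевик, 19 films). Its p-value of about 0.079 is above the conventional 0.05 threshold, so the difference is not statistically significant, although it is borderline. Let's repeat these comparisons with a bootstrap to check the results.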
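`bstrap` hides the resampling itself. To show the general idea, here is a minimal sketch of a two-sample bootstrap test for a difference in means using only NumPy. The function name `bootstrap_mean_diff`, the 95% percentile intervals and the shift-to-common-mean null hypothesis are assumptions of this sketch, not necessarily what `bstrap` does internally, so its numbers will not match the outputs below exactly.

```python
import numpy as np


def bootstrap_mean_diff(a, b, n_runs=1000, seed=0):
    """Bootstrap comparison of the means of two samples (illustrative sketch).

    Returns percentile 95% CIs for mean(a) and mean(b) and a two-sided
    p-value for H0: mean(a) == mean(b). Under H0 both samples are first
    shifted to the pooled mean, then resampled with replacement.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = b.mean() - a.mean()

    # Null-hypothesis versions of the samples: same spread, common mean
    pooled_mean = np.concatenate([a, b]).mean()
    a_null = a - a.mean() + pooled_mean
    b_null = b - b.mean() + pooled_mean

    means_a = np.empty(n_runs)
    means_b = np.empty(n_runs)
    null_diffs = np.empty(n_runs)
    for i in range(n_runs):
        # Resample the original data for the confidence intervals
        means_a[i] = rng.choice(a, size=a.size).mean()
        means_b[i] = rng.choice(b, size=b.size).mean()
        # Resample the shifted data to build the null distribution
        null_diffs[i] = (rng.choice(b_null, size=b.size).mean()
                         - rng.choice(a_null, size=a.size).mean())

    # Share of null differences at least as extreme as the observed one
    p_value = np.mean(np.abs(null_diffs) >= abs(observed))
    ci_a = np.percentile(means_a, [2.5, 97.5])
    ci_b = np.percentile(means_b, [2.5, 97.5])
    return ci_a, ci_b, p_value


# Hypothetical ratings standing in for one country's old and new scores
demo_rng = np.random.default_rng(1)
old = np.round(demo_rng.normal(8.2, 0.15, size=31), 1)
new = np.round(demo_rng.normal(8.4, 0.15, size=31), 1)
ci_old, ci_new, p = bootstrap_mean_diff(old, new)
print(ci_old, ci_new, p)
```

On the real data, a call such as `bootstrap_mean_diff(t.rating_old, t.rating_new)` would play the role of the `bootstrap(np.mean, ...)` calls. Shifting both samples to the pooled mean keeps each sample's spread while making the null hypothesis of equal means true, which is the distribution the observed difference has to be compared against.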

In [34]:
from bstrap import bootstrap

In [38]:
old_stats, new_stats, p_value = bootstrap(np.mean, t.rating_old, t.rating_new, nbr_runs=1000)
old_stats, new_stats, p_value

({'avg_metric': 8.179429999999991,
  'metric_ci_lb': 8.160399999999992,
  'metric_ci_ub': 8.19881999999999},
 {'avg_metric': 8.184998399999996,
  'metric_ci_lb': 8.156799999999995,
  'metric_ci_ub': 8.215199999999998},
 0.806)

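The bootstrap p-value (0.806) is high and the confidence intervals for the two means overlap almost completely, so the overall difference between old and new ratings is still not significant. Let's repeat the comparison for the two countries and the genre: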

In [39]:
old_stats, new_stats, p_value = bootstrap(np.mean, t.loc[t.origin == 'США', 'rating_old'], t.loc[t.origin == 'США', 'rating_new'], nbr_runs=1000)
old_stats, new_stats, p_value

({'avg_metric': 8.177637272727278,
  'metric_ci_lb': 8.149090909090916,
  'metric_ci_ub': 8.206363636363644},
 {'avg_metric': 8.15514181818182,
  'metric_ci_lb': 8.111772727272728,
  'metric_ci_ub': 8.20181818181818},
 0.519)

In [40]:
old_stats, new_stats, p_value = bootstrap(np.mean, t.loc[t.origin == 'СССР', 'rating_old'], t.loc[t.origin == 'СССР', 'rating_new'], nbr_runs=1000)
old_stats, new_stats, p_value

({'avg_metric': 8.237703225806452,
  'metric_ci_lb': 8.183870967741935,
  'metric_ci_ub': 8.290483870967744},
 {'avg_metric': 8.378712903225807,
  'metric_ci_lb': 8.325806451612904,
  'metric_ci_ub': 8.43225806451613},
 0.003)

In [41]:
old_stats, new_stats, p_value = bootstrap(np.mean, t.loc[t.genre == 'фантастика/ боевик', 'rating_old'], 
                                          t.loc[t.genre == 'фантастика/ боевик', 'rating_new'], nbr_runs=1000)
old_stats, new_stats, p_value

({'avg_metric': 8.16688947368421,
  'metric_ci_lb': 8.110526315789471,
  'metric_ci_ub': 8.231578947368419},
 {'avg_metric': 8.06256842105263,
  'metric_ci_lb': 7.957894736842104,
  'metric_ci_ub': 8.173684210526313},
 0.142)

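The bootstrap results agree with the Mann-Whitney U tests: only the USSR ratings changed significantly over time (p = 0.003; the mean rose from about 8.24 to 8.38, and the confidence intervals do not overlap), while the differences for US films (p = 0.519) and sci-fi/action films (p = 0.142) are not statistically significant.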
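The Mann-Whitney U test treats the old and new ratings as two independent samples, yet each film contributes one value to each column, so the data are paired. As an additional check that is not part of the analysis above, a paired test such as the Wilcoxon signed-rank test could be applied to the per-film differences. A minimal sketch with `scipy.stats.wilcoxon`; the helper `paired_rating_test` and the demo ratings are hypothetical:

```python
import numpy as np
from scipy.stats import wilcoxon


def paired_rating_test(rating_new, rating_old):
    """Two-sided Wilcoxon signed-rank test on per-film rating changes."""
    new = np.asarray(rating_new, dtype=float)
    old = np.asarray(rating_old, dtype=float)
    # Ratings have one decimal place; rounding removes floating-point noise
    # so that equal changes (e.g. +0.1) are ranked as true ties
    diff = np.round(new - old, 1)
    # Films with an unchanged rating (diff == 0) are dropped by the
    # default zero_method='wilcox'
    return wilcoxon(diff, alternative='two-sided')


# Hypothetical paired ratings for ten films (not from the dataset)
old = np.array([8.0, 8.1, 8.2, 8.0, 8.3, 8.1, 8.4, 8.2, 8.0, 8.5])
new = old + np.arange(1, 11) / 10   # every film gains 0.1, 0.2, ..., 1.0
result = paired_rating_test(new, old)
print(result.statistic, result.pvalue)
```

On the real data the call would be `paired_rating_test(t.rating_new, t.rating_old)`, optionally restricted to one country with the same `.loc` filters used above. Because films whose rating did not change are discarded, the test only weighs the films whose rating actually moved.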
