In [1]:
!pip install bstrap
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu
from bstrap import bootstrap

Collecting bstrap
  Downloading bstrap-0.0.9-py3-none-any.whl (6.6 kB)
Installing collected packages: bstrap
Successfully installed bstrap-0.0.9


In [2]:
df = pd.read_csv('/Users/vladislavcesnokov/Desktop/rating.csv', sep=';', encoding='cp1251')

In [3]:
df.head()

Unnamed: 0,num,name_rus,rating_new,origin,genre,rating_old,qty_views
0,1,Зеленая миля,9.1,США,фэнтези/ драма,8.9,692418
1,2,Побег из Шоушенка,9.1,США,драма,8.9,784326
2,3,Властелин колец: Возвращение короля,8.6,Новая Зеландия/ США,фэнтези/ приключения,8.8,481829
3,4,Властелин колец: Две крепости,8.6,Новая Зеландия/ США,фэнтези/ приключения,8.8,467607
4,5,Властелин колец: Братство Кольца,8.6,Новая Зеландия/ США,фэнтези/ приключения,8.8,516856


Let's check the data for null values and confirm that each column has the expected dtype.

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 250 entries, 0 to 249
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   num         250 non-null    int64  
 1   name_rus    250 non-null    object 
 2   rating_new  250 non-null    float64
 3   origin      250 non-null    object 
 4   genre       250 non-null    object 
 5   rating_old  250 non-null    float64
 6   qty_views   250 non-null    int64  
dtypes: float64(2), int64(2), object(3)
memory usage: 13.8+ KB


Next, we look at the summary statistics.

In [5]:
df.describe()

Unnamed: 0,num,rating_new,rating_old,qty_views
count,250.0,250.0,250.0,250.0
mean,125.5,8.1844,8.1796,289378.3
std,72.312977,0.272643,0.193114,186828.4
min,1.0,7.6,8.0,20056.0
25%,63.25,8.0,8.0,145021.2
50%,125.5,8.1,8.1,251629.5
75%,187.75,8.3,8.3,405379.0
max,250.0,9.1,8.9,1303016.0


The means (8.18 vs. 8.18) and medians (8.1 vs. 8.1) of the new and old ratings are nearly identical, although the new ratings are more spread out (std 0.27 vs. 0.19).

## Mann – Whitney U-test for total sample

Now we conduct a Mann – Whitney U-test to find out whether the rank sums of the two types of ratings differ. <br>We reject the null hypothesis when the p-value is below 0.05 (a 5% significance level).

In [23]:
mannwhitneyu(df['rating_new'],df['rating_old'])

MannwhitneyuResult(statistic=31175.5, pvalue=0.9629567921262221)

The p-value is well above 0.05, so we fail to reject the null hypothesis: the test finds no evidence that the new and old ratings differ across the full top-250 sample.
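To make the rank-sum idea behind the test concrete, here is a minimal standard-library sketch of how a U statistic can be computed by hand. The helper names `midranks` and `u_statistic` and the toy ratings are hypothetical and are not taken from `rating.csv`; this is an illustration of the textbook definition, not SciPy's implementation.

```python
def midranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def u_statistic(x, y):
    """U for sample x: its rank sum in the pooled data minus n_x * (n_x + 1) / 2."""
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    return rank_sum_x - len(x) * (len(x) + 1) / 2


# Hypothetical toy ratings (assumption, for illustration only).
new = [8.1, 8.3, 8.6, 9.1]
old = [8.0, 8.1, 8.2, 8.4]
u_new = u_statistic(new, old)  # 12.5
u_old = u_statistic(old, new)  # 3.5
print(u_new, u_old, u_new + u_old == len(new) * len(old))
```

The two U values always add up to n1 * n2 (here 16), so a value above the midpoint n1 * n2 / 2 suggests that the corresponding sample tends to rank higher. As far as we know, recent SciPy versions report U for the first sample passed to `mannwhitneyu`, so the same reading should apply to the statistics printed in this notebook.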

## Mann – Whitney U-test for different genres and countries

Now we conduct the same test for individual countries and genres. First, we select the top two countries and genres by number of records, then compare the new and old ratings within each group.

In [8]:
top_countries = dict(df['origin'].value_counts().head(2))
top_countries 

{'США': 110, 'СССР': 31}

In [9]:
top_genres = dict(df['genre'].value_counts().head(2))
top_genres

{'фантастика/ боевик': 19, 'мультфильм/ фэнтези': 13}

<br>The largest shares of the top-250 films were made in the USA (110) and the USSR (31). The most common genre labels are sci-fi/action (19) and animation/fantasy (13). <br> Now we analyze the difference between rank sums for USA and USSR films.

In [54]:
for country in top_countries.keys():
    U, p = mannwhitneyu(df.loc[df['origin'] == country, 'rating_new'],
                        df.loc[df['origin'] == country, 'rating_old'])
    print(f'p.value for {country} is {p}, U is {U}\n')

p.value for США is 0.3718029670954922, U is 5634.0

p.value for СССР is 0.007358118927667391, U is 669.0



In [55]:
for genre in top_genres.keys():
    U, p = mannwhitneyu(df.loc[df['genre'] == genre, 'rating_new'],
                        df.loc[df['genre'] == genre, 'rating_old'])
    print(f'p.value for {genre} is {p}, U is {U}\n')

p.value for фантастика/ боевик is 0.0787049093768963, U is 121.5

p.value for мультфильм/ фэнтези is 0.25230756055185943, U is 62.0



<br>There is no significant difference in ratings for films made in the USA (p = 0.37), but there is one for films made in the USSR (p = 0.007). Since the new rating approach does not take previous votes into account, we can assume that the top-250 becomes more dependent on new reviews, which are likely to be written by newly registered users. We don't know the age of these users, but they are plausibly a younger generation. The shift may therefore reflect new generations that perceive films made in the USSR differently from older viewers, for whom these films may carry nostalgia. This is a hypothesis; the data here cannot confirm it. <br><br>However, we see no significant difference for the sci-fi/action (p = 0.08) or animation/fantasy (p = 0.25) genres.

## Bootstrapping

We now use the bootstrap to check whether these findings also hold for the means. The mean rating is the metric compared between the old and new rating systems.
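As a rough illustration of what a bootstrap comparison of means does, here is a minimal standard-library sketch. It is a generic percentile bootstrap that resamples each sample independently; it is not claimed to reproduce the internals of `bstrap`. The function name `bootstrap_mean_diff`, its parameters, and the toy ratings are hypothetical.

```python
import random
from statistics import mean


def bootstrap_mean_diff(x, y, n_runs=2000, seed=0):
    """Percentile bootstrap for mean(x) - mean(y).

    Each run resamples x and y with replacement (independently) and
    records the difference of the resampled means.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_runs):
        xs = rng.choices(x, k=len(x))  # resample with replacement
        ys = rng.choices(y, k=len(y))
        diffs.append(mean(xs) - mean(ys))
    diffs.sort()
    lower = diffs[n_runs * 25 // 1000]       # about the 2.5th percentile
    upper = diffs[n_runs * 975 // 1000 - 1]  # about the 97.5th percentile
    return mean(diffs), lower, upper


# Hypothetical toy ratings (assumption, not taken from rating.csv).
new = [8.4, 8.5, 8.3, 8.6, 8.4, 8.5, 8.2, 8.6]
old = [8.1, 8.2, 8.0, 8.3, 8.2, 8.1, 8.3, 8.2]
avg_diff, ci_lb, ci_ub = bootstrap_mean_diff(new, old)
print(f"mean difference {avg_diff:.3f}, 95% CI [{ci_lb:.3f}, {ci_ub:.3f}]")
```

An interval that excludes 0 suggests that the two means differ; an interval that contains 0 is compatible with no difference.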

General sample:

In [53]:
# bootstrap returns the statistics in argument order: rating_new first, then rating_old
stats_new, stats_old, p = bootstrap(np.mean, df['rating_new'],
                                    df['rating_old'], nbr_runs=10**4)
print(f'p.value for difference in means for general sample is {p}\n old stats: \n{stats_old} \n new stats: \n{stats_new} \n\n')

p.value for difference in means for general sample is 0.818
 old stats: 
{'avg_metric': 8.17955832, 'metric_ci_lb': 8.159999999999998, 'metric_ci_ub': 8.1996} 
 new stats: 
{'avg_metric': 8.184381839999999, 'metric_ci_lb': 8.156799999999999, 'metric_ci_ub': 8.2128} 




USSR and USA films:

In [52]:
for country in top_countries.keys():
    # bootstrap returns the statistics in argument order: rating_new first, then rating_old
    stats_new, stats_old, p = bootstrap(np.mean, df.loc[df['origin'] == country, 'rating_new'],
                                        df.loc[df['origin'] == country, 'rating_old'], nbr_runs=10**4)
    print(f'p.value for difference in means for {country} is {p}\n old stats: \n{stats_old} \n new stats: \n{stats_new} \n\n')

p.value for difference in means for США is 0.5209
 old stats: 
{'avg_metric': 8.177100363636365, 'metric_ci_lb': 8.148181818181818, 'metric_ci_ub': 8.207272727272727} 
 new stats: 
{'avg_metric': 8.15625009090909, 'metric_ci_lb': 8.11090909090909, 'metric_ci_ub': 8.201818181818183} 


p.value for difference in means for СССР is 0.0035
 old stats: 
{'avg_metric': 8.238636451612903, 'metric_ci_lb': 8.187096774193547, 'metric_ci_ub': 8.290322580645164} 
 new stats: 
{'avg_metric': 8.377486451612903, 'metric_ci_lb': 8.325806451612904, 'metric_ci_ub': 8.432258064516128} 




Sci-fi/action and animation/fantasy films:

In [51]:
for genre in top_genres.keys():
    # bootstrap returns the statistics in argument order: rating_new first, then rating_old
    stats_new, stats_old, p = bootstrap(np.mean, df.loc[df['genre'] == genre, 'rating_new'],
                                        df.loc[df['genre'] == genre, 'rating_old'], nbr_runs=10**4)
    print(f'p.value for difference in means for {genre} is {p}\n old stats: \n{stats_old} \n new stats: \n{stats_new} \n\n')

p.value for difference in means for фантастика/ боевик is 0.1656
 old stats: 
{'avg_metric': 8.168303684210526, 'metric_ci_lb': 8.110526315789473, 'metric_ci_ub': 8.23157894736842} 
 new stats: 
{'avg_metric': 8.06397052631579, 'metric_ci_lb': 7.963157894736841, 'metric_ci_ub': 8.173684210526314} 


p.value for difference in means for мультфильм/ фэнтези is 0.2687
 old stats: 
{'avg_metric': 8.199968461538461, 'metric_ci_lb': 8.12307692307692, 'metric_ci_ub': 8.284615384615384} 
 new stats: 
{'avg_metric': 8.108414615384614, 'metric_ci_lb': 8.007692307692308, 'metric_ci_ub': 8.215384615384615} 




The bootstrap results are consistent with the Mann – Whitney U-test: only the USSR films show a significant difference (p = 0.0035), which suggests that the findings do not depend on the choice of test.

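1. USA films show no significant difference between the new and old ratings, while USSR films do: their mean under the new rating is about 8.38 versus 8.24 under the old one (`bootstrap` returns its statistics in the order the samples are passed, and `rating_new` is passed first). This points to a change in users' perception of USSR films in 2022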
2. The sci-fi/action and animation/fantasy genres show no statistically significant difference between the old and new rating systems