# What is the Relationship Between Cooking Complexity and Average Rating?

**Name(s)**: Aman Kar, Daniel Mathew

**Website Link**: https://akar247.github.io/RecipesDurationAnalysis/

## Code

In [1]:
import pandas as pd
import numpy as np
import os

import plotly.express as px
pd.options.plotting.backend = 'plotly'

### Cleaning and EDA

Cleaning
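1. Left-merged the recipes dataset with the interactions dataset on the recipe ID
2. Replaced ratings of 0 with NaN values, since a 0 appears to mark a review left without a star rating
3. Computed each recipe's average rating and merged it back into the dataframe
4. Converted the tags, nutrition, steps, and ingredients columns from strings to lists (with float values for nutrition)
5. Converted the submitted and date columns to datetime
6. Dropped recipes that take more than one day (1440 minutes)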
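Replacing the 0 ratings has to happen before the per-recipe averages are computed, and in the code the replacement does run first. In this dataset a 0 appears to mark a review left without a star rating (for example the review "Just an observation, so I will not rate."), so averaging first would count it as a real score. A minimal sketch on made-up ratings:

```python
import numpy as np
import pandas as pd

# Made-up reviews; 0 stands for "review without a star rating".
toy = pd.DataFrame({
    'name': ['soup', 'soup', 'soup', 'cake', 'cake'],
    'rating': [5, 4, 0, 3, 0],
})

# Averaging first counts each 0 as a real (very low) score.
naive_avg = toy.groupby('name')['rating'].mean()

# Replacing 0 with NaN first: mean() skips NaN by default.
toy['rating'] = toy['rating'].replace(0, np.nan)
clean_avg = toy.groupby('name')['rating'].mean()

print(naive_avg.to_dict())  # {'cake': 1.5, 'soup': 3.0}
print(clean_avg.to_dict())  # {'cake': 3.0, 'soup': 4.5}
```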

In [2]:
interactions_fp = os.path.join('food_data', 'RAW_interactions.csv')
recipes_fp = os.path.join('food_data', 'RAW_recipes.csv')
raw_interactions = pd.read_csv(interactions_fp)
raw_recipes = pd.read_csv(recipes_fp)
display(raw_interactions, raw_recipes)

,user_id,recipe_id,date,rating,review
0,1293707,40893,2011-12-21,5,"So simple, so delicious! Great for chilly fall..."
1,126440,85009,2010-02-27,5,I made the Mexican topping and took it to bunk...
2,57222,85009,2011-10-01,5,"Made the cheddar bacon topping, adding a sprin..."
3,124416,120345,2011-08-06,0,"Just an observation, so I will not rate. I fo..."
4,2000192946,120345,2015-05-10,2,This recipe was OVERLY too sweet. I would sta...
...,...,...,...,...,...
731922,2002357020,82303,2018-12-05,5,Delicious quick thick chocolate sauce with ing...
731923,583662,386618,2009-09-29,5,These were so delicious! My husband and I tru...
731924,157126,78003,2008-06-23,5,WOW! Sometimes I don't take the time to rate ...
731925,53932,78003,2009-01-11,4,Very good! I used regular port as well. The ...


,name,id,minutes,contributor_id,submitted,tags,nutrition,n_steps,steps,description,ingredients,n_ingredients
0,1 brownies in the world best ever,333281,40,985201,2008-10-27,"['60-minutes-or-less', 'time-to-make', 'course...","[138.4, 10.0, 50.0, 3.0, 3.0, 19.0, 6.0]",10,['heat the oven to 350f and arrange the rack i...,"these are the most; chocolatey, moist, rich, d...","['bittersweet chocolate', 'unsalted butter', '...",9
1,1 in canada chocolate chip cookies,453467,45,1848091,2011-04-11,"['60-minutes-or-less', 'time-to-make', 'cuisin...","[595.1, 46.0, 211.0, 22.0, 13.0, 51.0, 26.0]",12,"['pre-heat oven the 350 degrees f', 'in a mixi...",this is the recipe that we use at my school ca...,"['white sugar', 'brown sugar', 'salt', 'margar...",11
2,412 broccoli casserole,306168,40,50969,2008-05-30,"['60-minutes-or-less', 'time-to-make', 'course...","[194.8, 20.0, 6.0, 32.0, 22.0, 36.0, 3.0]",6,"['preheat oven to 350 degrees', 'spray a 2 qua...",since there are already 411 recipes for brocco...,"['frozen broccoli cuts', 'cream of chicken sou...",9
3,millionaire pound cake,286009,120,461724,2008-02-12,"['time-to-make', 'course', 'cuisine', 'prepara...","[878.3, 63.0, 326.0, 13.0, 20.0, 123.0, 39.0]",7,"['freheat the oven to 300 degrees', 'grease a ...",why a millionaire pound cake? because it's su...,"['butter', 'sugar', 'eggs', 'all-purpose flour...",7
4,2000 meatloaf,475785,90,2202916,2012-03-06,"['time-to-make', 'course', 'main-ingredient', ...","[267.0, 30.0, 12.0, 12.0, 29.0, 48.0, 2.0]",17,"['pan fry bacon , and set aside on a paper tow...","ready, set, cook! special edition contest entr...","['meatloaf mixture', 'unsmoked bacon', 'goat c...",13
...,...,...,...,...,...,...,...,...,...,...,...,...
83777,zydeco soup,486161,60,227978,2012-08-29,"['ham', '60-minutes-or-less', 'time-to-make', ...","[415.2, 26.0, 34.0, 26.0, 44.0, 21.0, 15.0]",7,"['heat oil in a 4-quart dutch oven', 'add cele...",this is a delicious soup that i originally fou...,"['celery', 'onion', 'green sweet pepper', 'gar...",22
83778,zydeco spice mix,493372,5,1500678,2013-01-09,"['15-minutes-or-less', 'time-to-make', 'course...","[14.8, 0.0, 2.0, 58.0, 1.0, 0.0, 1.0]",1,['mix all ingredients together thoroughly'],this spice mix will make your taste buds dance!,"['paprika', 'salt', 'garlic powder', 'onion po...",13
83779,zydeco ya ya deviled eggs,308080,40,37779,2008-06-07,"['60-minutes-or-less', 'time-to-make', 'course...","[59.2, 6.0, 2.0, 3.0, 6.0, 5.0, 0.0]",7,"['in a bowl , combine the mashed yolks and may...","deviled eggs, cajun-style","['hard-cooked eggs', 'mayonnaise', 'dijon must...",8
83780,cookies by design cookies on a stick,298512,29,506822,2008-04-15,"['30-minutes-or-less', 'time-to-make', 'course...","[188.0, 11.0, 57.0, 11.0, 7.0, 21.0, 9.0]",9,['place melted butter in a large mixing bowl a...,"i've heard of the 'cookies by design' company,...","['butter', 'eagle brand condensed milk', 'ligh...",10


In [3]:
reviews = raw_recipes.merge(raw_interactions, left_on='id', right_on='recipe_id', how='left')
reviews

,name,id,minutes,contributor_id,submitted,tags,nutrition,n_steps,steps,description,ingredients,n_ingredients,user_id,recipe_id,date,rating,review
0,1 brownies in the world best ever,333281,40,985201,2008-10-27,"['60-minutes-or-less', 'time-to-make', 'course...","[138.4, 10.0, 50.0, 3.0, 3.0, 19.0, 6.0]",10,['heat the oven to 350f and arrange the rack i...,"these are the most; chocolatey, moist, rich, d...","['bittersweet chocolate', 'unsalted butter', '...",9,3.865850e+05,333281.0,2008-11-19,4.0,"These were pretty good, but took forever to ba..."
1,1 in canada chocolate chip cookies,453467,45,1848091,2011-04-11,"['60-minutes-or-less', 'time-to-make', 'cuisin...","[595.1, 46.0, 211.0, 22.0, 13.0, 51.0, 26.0]",12,"['pre-heat oven the 350 degrees f', 'in a mixi...",this is the recipe that we use at my school ca...,"['white sugar', 'brown sugar', 'salt', 'margar...",11,4.246800e+05,453467.0,2012-01-26,5.0,Originally I was gonna cut the recipe in half ...
2,412 broccoli casserole,306168,40,50969,2008-05-30,"['60-minutes-or-less', 'time-to-make', 'course...","[194.8, 20.0, 6.0, 32.0, 22.0, 36.0, 3.0]",6,"['preheat oven to 350 degrees', 'spray a 2 qua...",since there are already 411 recipes for brocco...,"['frozen broccoli cuts', 'cream of chicken sou...",9,2.978200e+04,306168.0,2008-12-31,5.0,This was one of the best broccoli casseroles t...
3,412 broccoli casserole,306168,40,50969,2008-05-30,"['60-minutes-or-less', 'time-to-make', 'course...","[194.8, 20.0, 6.0, 32.0, 22.0, 36.0, 3.0]",6,"['preheat oven to 350 degrees', 'spray a 2 qua...",since there are already 411 recipes for brocco...,"['frozen broccoli cuts', 'cream of chicken sou...",9,1.196280e+06,306168.0,2009-04-13,5.0,I made this for my son's first birthday party ...
4,412 broccoli casserole,306168,40,50969,2008-05-30,"['60-minutes-or-less', 'time-to-make', 'course...","[194.8, 20.0, 6.0, 32.0, 22.0, 36.0, 3.0]",6,"['preheat oven to 350 degrees', 'spray a 2 qua...",since there are already 411 recipes for brocco...,"['frozen broccoli cuts', 'cream of chicken sou...",9,7.688280e+05,306168.0,2013-08-02,5.0,Loved this. Be sure to completely thaw the br...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
234424,zydeco ya ya deviled eggs,308080,40,37779,2008-06-07,"['60-minutes-or-less', 'time-to-make', 'course...","[59.2, 6.0, 2.0, 3.0, 6.0, 5.0, 0.0]",7,"['in a bowl , combine the mashed yolks and may...","deviled eggs, cajun-style","['hard-cooked eggs', 'mayonnaise', 'dijon must...",8,8.445540e+05,308080.0,2009-10-14,5.0,These were very good. I meant to add some jala...
234425,cookies by design cookies on a stick,298512,29,506822,2008-04-15,"['30-minutes-or-less', 'time-to-make', 'course...","[188.0, 11.0, 57.0, 11.0, 7.0, 21.0, 9.0]",9,['place melted butter in a large mixing bowl a...,"i've heard of the 'cookies by design' company,...","['butter', 'eagle brand condensed milk', 'ligh...",10,8.042340e+05,298512.0,2008-05-02,1.0,I would rate this a zero if I could. I followe...
234426,cookies by design sugar shortbread cookies,298509,20,506822,2008-04-15,"['30-minutes-or-less', 'time-to-make', 'course...","[174.9, 14.0, 33.0, 4.0, 4.0, 11.0, 6.0]",5,"['whip sugar and shortening in a large bowl , ...","i've heard of the 'cookies by design' company,...","['granulated sugar', 'shortening', 'eggs', 'fl...",7,8.666510e+05,298509.0,2008-06-19,1.0,This recipe tastes nothing like the Cookies by...
234427,cookies by design sugar shortbread cookies,298509,20,506822,2008-04-15,"['30-minutes-or-less', 'time-to-make', 'course...","[174.9, 14.0, 33.0, 4.0, 4.0, 11.0, 6.0]",5,"['whip sugar and shortening in a large bowl , ...","i've heard of the 'cookies by design' company,...","['granulated sugar', 'shortening', 'eggs', 'fl...",7,1.546277e+06,298509.0,2010-02-08,5.0,"yummy cookies, i love this recipe me and my sm..."
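Because the merge uses `how='left'`, every recipe is kept even if it has no interactions; such rows get NaN in the interaction columns, which is why user_id and recipe_id are displayed as floats (for example 3.865850e+05 and 333281.0) in the output above. A minimal sketch with made-up frames; `indicator=True` is a standard `DataFrame.merge` option that adds a `_merge` column labelling where each row came from, which makes recipes without reviews easy to count.

```python
import pandas as pd

# Made-up stand-ins for RAW_recipes and RAW_interactions.
recipes = pd.DataFrame({'id': [1, 2, 3], 'name': ['soup', 'cake', 'pie']})
interactions = pd.DataFrame({
    'user_id': [10, 11, 12],
    'recipe_id': [1, 1, 3],
    'rating': [5, 4, 0],
})

# indicator=True adds a '_merge' column: 'both' or 'left_only' here.
merged = recipes.merge(interactions, left_on='id', right_on='recipe_id',
                       how='left', indicator=True)

print((merged['_merge'] == 'left_only').sum())  # 1 (the cake has no reviews)
print(merged['user_id'].dtype)                  # float64, because of the NaN
```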


In [4]:
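# a rating of 0 appears to mean no star rating was given, so treat it as missing
# (np.nan: the np.NaN alias was removed in NumPy 2.0)
reviews['rating'] = reviews['rating'].replace(0, np.nan)
# average rating per recipe (mean() skips NaN), attached to every review row
recipe_ratings = reviews.groupby('name')['rating'].mean().to_frame()
final_reviews = reviews.merge(recipe_ratings, left_on='name', right_index=True, suffixes=('_individual', '_average'))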
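The groupby-then-merge above attaches each recipe's mean rating to every one of its review rows. A sketch of a one-step alternative, using `groupby(...).transform('mean')`, which returns a result aligned with the original rows; the data are made up, and the rename only mimics the `_individual` suffix produced by the merge.

```python
import numpy as np
import pandas as pd

# Made-up merged reviews: one row per (recipe, review).
reviews = pd.DataFrame({
    'name': ['soup', 'soup', 'cake', 'cake', 'pie'],
    'rating': [5.0, 4.0, 3.0, np.nan, np.nan],
})

# transform('mean') returns one value per original row, so no merge is needed.
reviews['rating_average'] = reviews.groupby('name')['rating'].transform('mean')
reviews = reviews.rename(columns={'rating': 'rating_individual'})

print(reviews)
```

A recipe whose ratings are all missing (the pie) gets a NaN average, just as with the merge approach.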

In [5]:
# Column conversions:
#    tags, nutrition, steps, ingredients: strings -> lists
#    submitted, date: strings -> datetime
#    user_id, recipe_id, rating_individual stay as floats: they contain NaN,
#    and integer types are not needed for this analysis
def convert_column(ser):
    # strip the surrounding brackets, drop single quotes, and split on ', '
    return ser.str.slice(start=1, stop=-1).str.replace("'", '').str.split(', ')
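`convert_column` deletes every single quote and then splits on `', '`, which can break entries that themselves contain `', '` (steps such as `in a bowl , combine ...` in the recipes output above) or apostrophes (which Python writes inside double quotes). Since the raw cells look like Python list literals, one alternative is `ast.literal_eval` from the standard library. A sketch comparing the two on made-up cells in the same format:

```python
import ast


def convert_naive(s):
    # Same string surgery as convert_column, applied to a single cell.
    return s[1:-1].replace("'", '').split(', ')


# Made-up cells in the same format as RAW_recipes.csv (Python list reprs).
steps_cell = "['mix flour, sugar and butter', 'bake']"
ingredients_cell = '''["baker's chocolate", 'salt']'''

print(convert_naive(steps_cell))           # ['mix flour', 'sugar and butter', 'bake']
print(ast.literal_eval(steps_cell))        # ['mix flour, sugar and butter', 'bake']
print(convert_naive(ingredients_cell))     # ['"bakers chocolate"', 'salt']
print(ast.literal_eval(ingredients_cell))  # ["baker's chocolate", 'salt']
```

Applied to a column this would be `final_reviews['steps'].apply(ast.literal_eval)`, assuming no cell in the column is missing; for nutrition it also yields floats directly.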

In [6]:
final_reviews[['tags', 'nutrition', 'steps', 'ingredients']] = final_reviews[['tags', 'nutrition', 'steps', 'ingredients']].apply(convert_column)
# nutrition entries are numbers; convert each list's values to float
final_reviews['nutrition'] = final_reviews['nutrition'].apply(lambda lst: list(map(float, lst)))

In [139]:
final_reviews['submitted'] = pd.to_datetime(final_reviews['submitted'])
final_reviews['date'] = pd.to_datetime(final_reviews['date'])
final_reviews = final_reviews[final_reviews['minutes'] <= 1440]

In [141]:
relevant_columns = final_reviews[['name', 'minutes', 'n_steps', 'n_ingredients', 'rating_average']]
# final_reviews is already limited to recipes of at most one day (1440 minutes)
grouped_data = relevant_columns.groupby('name').mean()
grouped_data

name,minutes,n_steps,n_ingredients,rating_average
0 carb 0 cal gummy worms,45.0,15.0,3.0,4.750000
0 point ice cream only 1 ingredient,125.0,5.0,3.0,5.000000
0 point soup ww,55.0,5.0,14.0,4.777778
0 point soup crock pot,305.0,2.0,11.0,5.000000
007 martini,5.0,4.0,4.0,5.000000
...,...,...,...,...
zydeco salad,5.0,4.0,4.0,5.000000
zydeco sauce,15.0,3.0,6.0,5.000000
zydeco soup,60.0,7.0,22.0,5.000000
zydeco spice mix,5.0,1.0,13.0,5.000000


In [77]:
fig = px.histogram(grouped_data, x='minutes', nbins=400, title='Distribution of Cooking Times')
fig

In [74]:
fig = px.histogram(grouped_data, x='n_steps', nbins=20, title='Distribution of Number of Steps in Recipes')
fig

In [105]:
fig = px.histogram(grouped_data, x='n_ingredients', nbins=10, title='Distribution of Number of Ingredients in Recipes')
fig

In [70]:
fig = px.histogram(grouped_data, x='rating_average', nbins=5, title='Distribution of Average Ratings for Recipes')
fig

In [98]:
# average cooking time by number of steps
minutes_and_steps = grouped_data.groupby('minutes').mean().reset_index()
fig = px.histogram(minutes_and_steps, x='n_steps', y='minutes', nbins=25, histfunc='avg')
fig

In [107]:
# average cooking time by number of ingredients
minutes_and_ingredients = grouped_data.groupby('minutes').mean().reset_index()
fig = px.histogram(minutes_and_ingredients, x='n_ingredients', y='minutes', nbins=10, histfunc='avg')
fig

In [122]:
# split recipes into deciles of cooking time and of number of steps
grouped_data['minute_intervals'] = pd.qcut(grouped_data['minutes'], q=10)
grouped_data['step_intervals'] = pd.qcut(grouped_data['n_steps'], q=10)
# number of recipes in each (time decile, step decile) cell
minute_step_count_pt = grouped_data.reset_index().pivot_table(index='minute_intervals', columns='step_intervals', values='name', aggfunc='count')

In [123]:
# mean of the per-recipe average ratings in each (time decile, step decile) cell
minute_step_rating_pt = grouped_data.reset_index().pivot_table(index='minute_intervals', columns='step_intervals', values='rating_average', aggfunc='mean')
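The two pivot tables above bin recipes by cooking-time decile and step-count decile, then count recipes or average their ratings per cell. A self-contained sketch of the same idea on synthetic data, using quartiles instead of deciles to keep the output small. `duplicates='drop'` is a real `pd.qcut` option that avoids an error when many tied values (common for integer counts such as n_steps) make two bin edges equal, and `observed=False` keeps every bin combination in the pivot table.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic per-recipe table (hypothetical values, not the Food.com data).
toy = pd.DataFrame({
    'minutes': rng.integers(5, 240, size=500),
    'n_steps': rng.integers(1, 30, size=500),
    'rating_average': rng.uniform(1, 5, size=500),
})

# Quantile bins; duplicates='drop' tolerates repeated bin edges.
toy['minute_bin'] = pd.qcut(toy['minutes'], q=4, duplicates='drop')
toy['step_bin'] = pd.qcut(toy['n_steps'], q=4, duplicates='drop')

# Rows: time quartile, columns: step quartile, cells: mean rating.
rating_grid = toy.pivot_table(index='minute_bin', columns='step_bin',
                              values='rating_average', aggfunc='mean',
                              observed=False)
# Rows/columns as above, cells: number of recipes.
count_grid = pd.crosstab(toy['minute_bin'], toy['step_bin'])

print(rating_grid.round(2))
print(count_grid)
```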

### Assessment of Missingness

The NMAR assessment is presented only on the website; there is no code for it in this notebook.

We believe the "description" column is most likely NMAR. A description could be missing because the recipe's name is self-explanatory, which would make its missingness depend on another column. However, many existing descriptions have nothing to do with the name or the dish itself, which suggests that whether a description is written is not driven by the other columns. Instead, the missingness likely depends on the description itself: a contributor who felt there was nothing important to say about the recipe simply did not write one. Because the missingness depends on the unobserved value of the description, the column is NMAR.

In [142]:
# use rating_individual as the column with missing values
# hypothesis: the missingness of rating_individual depends on minutes
# and does not depend on name
final_reviews.head()

,name,id,minutes,contributor_id,submitted,tags,nutrition,n_steps,steps,description,ingredients,n_ingredients,user_id,recipe_id,date,rating_individual,review,rating_average
0,1 brownies in the world best ever,333281,40,985201,2008-10-27,"[60-minutes-or-less, time-to-make, course, mai...","[138.4, 10.0, 50.0, 3.0, 3.0, 19.0, 6.0]",10,[heat the oven to 350f and arrange the rack in...,"these are the most; chocolatey, moist, rich, d...","[bittersweet chocolate, unsalted butter, eggs,...",9,386585.0,333281.0,2008-11-19,4.0,"These were pretty good, but took forever to ba...",4.0
1,1 in canada chocolate chip cookies,453467,45,1848091,2011-04-11,"[60-minutes-or-less, time-to-make, cuisine, pr...","[595.1, 46.0, 211.0, 22.0, 13.0, 51.0, 26.0]",12,"[pre-heat oven the 350 degrees f, in a mixing ...",this is the recipe that we use at my school ca...,"[white sugar, brown sugar, salt, margarine, eg...",11,424680.0,453467.0,2012-01-26,5.0,Originally I was gonna cut the recipe in half ...,5.0
2,412 broccoli casserole,306168,40,50969,2008-05-30,"[60-minutes-or-less, time-to-make, course, mai...","[194.8, 20.0, 6.0, 32.0, 22.0, 36.0, 3.0]",6,"[preheat oven to 350 degrees, spray a 2 quart ...",since there are already 411 recipes for brocco...,"[frozen broccoli cuts, cream of chicken soup, ...",9,29782.0,306168.0,2008-12-31,5.0,This was one of the best broccoli casseroles t...,5.0
3,412 broccoli casserole,306168,40,50969,2008-05-30,"[60-minutes-or-less, time-to-make, course, mai...","[194.8, 20.0, 6.0, 32.0, 22.0, 36.0, 3.0]",6,"[preheat oven to 350 degrees, spray a 2 quart ...",since there are already 411 recipes for brocco...,"[frozen broccoli cuts, cream of chicken soup, ...",9,1196280.0,306168.0,2009-04-13,5.0,I made this for my son's first birthday party ...,5.0
4,412 broccoli casserole,306168,40,50969,2008-05-30,"[60-minutes-or-less, time-to-make, course, mai...","[194.8, 20.0, 6.0, 32.0, 22.0, 36.0, 3.0]",6,"[preheat oven to 350 degrees, spray a 2 quart ...",since there are already 411 recipes for brocco...,"[frozen broccoli cuts, cream of chicken soup, ...",9,768828.0,306168.0,2013-08-02,5.0,Loved this. Be sure to completely thaw the br...,5.0


In [182]:
missingness1 = final_reviews[['minutes', 'rating_individual']]
rating_missing_1 = missingness1[missingness1['rating_individual'].isna()]
fig = px.histogram(rating_missing_1, 'minutes')
rating_not_missing_1 = missingness1[~missingness1['rating_individual'].isna()]
fig2 = px.histogram(rating_not_missing_1, 'minutes')
# fig.show(), fig2.show()
ratings_missing_mean = rating_missing_1['minutes'].mean()
ratings_not_missing_mean = rating_not_missing_1['minutes'].mean()
observed = abs(ratings_missing_mean - ratings_not_missing_mean)

In [183]:
reps = 100
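def run_perm(df, N):
    # shuffle rating_individual N times; after each shuffle, record the absolute
    # difference in mean minutes between rows with and without a missing rating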
    df_copy = df.copy()
    diffs = []
    for _ in range(N):
        df_copy['rating_individual'] = np.random.permutation(df_copy['rating_individual'])
        missing = df_copy[['minutes', 'rating_individual']]
        ratings_missing_mean = missing[missing['rating_individual'].isna()]['minutes'].mean()
        ratings_not_missing_mean = missing[~missing['rating_individual'].isna()]['minutes'].mean()
        diffs.append(abs(ratings_missing_mean - ratings_not_missing_mean))
    return diffs
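`run_perm` permutes the rating column and re-filters the DataFrame on every repetition. Since the statistic only depends on which rows have a missing rating, a lighter sketch shuffles a boolean missingness mask instead and returns the p-value directly. The function name `missingness_perm_test` is hypothetical, and the usage below runs on synthetic data in which missingness is made to depend on minutes.

```python
import numpy as np
import pandas as pd


def missingness_perm_test(df, missing_col, other_col, n_reps=1000, seed=None):
    """Permutation test for whether the mean of other_col differs between
    rows where missing_col is missing and rows where it is present."""
    rng = np.random.default_rng(seed)
    is_missing = df[missing_col].isna().to_numpy()
    values = df[other_col].to_numpy(dtype=float)

    def abs_mean_diff(mask):
        return abs(values[mask].mean() - values[~mask].mean())

    observed = abs_mean_diff(is_missing)
    # Permuting the mask has the same effect as permuting missing_col itself,
    # because the statistic only depends on which rows are missing.
    simulated = np.array([abs_mean_diff(rng.permutation(is_missing))
                          for _ in range(n_reps)])
    p_value = (simulated >= observed).mean()
    return observed, p_value


# Synthetic example: longer recipes are more likely to have no rating.
rng = np.random.default_rng(42)
minutes = rng.exponential(60, size=2000)
rating = np.where(rng.random(2000) < minutes / minutes.max(), np.nan, 5.0)
toy = pd.DataFrame({'minutes': minutes, 'rating_individual': rating})

observed, p_value = missingness_perm_test(toy, 'rating_individual', 'minutes',
                                          n_reps=500, seed=1)
print(observed, p_value)
```

On the notebook's data the equivalent call would be `missingness_perm_test(final_reviews, 'rating_individual', 'minutes', n_reps=reps)`.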

In [184]:
arr = run_perm(final_reviews, reps)

In [185]:
(np.array(arr) >= observed).mean()

0.0

In [209]:
def season(date):
    month = date.month
    if month < 4:
        return 'Q1'
    elif month < 7:
        return 'Q2'
    elif month < 10:
        return 'Q3'
    else:
        return 'Q4'
missingness2 = final_reviews[['date', 'rating_individual']].copy()
missingness2['quarter'] = missingness2['date'].apply(season)
missingness2

,date,rating_individual,quarter
0,2008-11-19,4.0,Q4
1,2012-01-26,5.0,Q1
2,2008-12-31,5.0,Q4
3,2009-04-13,5.0,Q2
4,2013-08-02,5.0,Q3
...,...,...,...
234424,2009-10-14,5.0,Q4
234425,2008-05-02,1.0,Q2
234426,2008-06-19,1.0,Q2
234427,2010-02-08,5.0,Q1
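`season()` returns calendar quarters, which pandas can also compute directly with the `.dt.quarter` accessor. One behavioral difference: for a missing date (NaT) the month is NaN, every comparison in `season()` is False, and the row is labelled 'Q4', whereas the sketch below leaves it missing. The first four dates are taken from the table above; the missing date is made up.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(
    ['2008-11-19', '2012-01-26', '2009-04-13', '2013-08-02', None]))

# dt.quarter is 1-4 for valid dates and NaN for NaT; na_action='ignore'
# keeps the NaN instead of passing it to the formatting function.
quarter = dates.dt.quarter.map(lambda q: f'Q{int(q)}', na_action='ignore')

print(quarter.tolist())
```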


In [217]:
fig = px.histogram(missingness2[['rating_individual', 'quarter']], 'quarter', 'rating_individual', nbins=10, histfunc='count')
fig

pt = missingness2.pivot_table(index='quarter', columns='rating_individual', values='date', aggfunc='count')
pt / pt.sum()

quarter / rating_individual,1.0,2.0,3.0,4.0,5.0
Q1,0.27046,0.281143,0.276727,0.255856,0.254955
Q2,0.236038,0.21971,0.252067,0.283937,0.27774
Q3,0.23112,0.247014,0.239877,0.242986,0.250785
Q4,0.262381,0.252133,0.23133,0.21722,0.216521
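`pt / pt.sum()` divides each count by its column total, so each rating column shows how that rating's reviews are spread across quarters and sums to 1. `pd.crosstab` with `normalize='columns'` computes the same table in one call; the sketch below checks this on made-up data. Like the pivot table, `crosstab` leaves out rows whose rating is NaN.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for missingness2 (one rating is missing).
toy = pd.DataFrame({
    'quarter': ['Q1', 'Q1', 'Q2', 'Q3', 'Q4', 'Q4', 'Q2', 'Q1'],
    'rating_individual': [5.0, 4.0, 5.0, 5.0, 4.0, 5.0, np.nan, 4.0],
})

# Counts per (quarter, rating), then divide each column by its total.
counts = pd.crosstab(toy['quarter'], toy['rating_individual'])
by_hand = counts / counts.sum()

# One-step equivalent.
shares = pd.crosstab(toy['quarter'], toy['rating_individual'],
                     normalize='columns')

print(shares)
```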


In [236]:
# distribution of quarters among reviews with and without a missing rating
rating_missing_2 = missingness2[missingness2['rating_individual'].isna()]['quarter'].value_counts(normalize=True)
rating_not_missing_2 = missingness2[~missingness2['rating_individual'].isna()]['quarter'].value_counts(normalize=True)
# total variation distance (TVD) between the two distributions
obs = (rating_missing_2 - rating_not_missing_2).abs().sum() / 2
obs

(rating_missing_2 - rating_not_missing_2)

Q1    0.023709
Q2   -0.048004
Q3   -0.015030
Q4    0.039325
Name: quarter, dtype: float64
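Subtracting two `value_counts(normalize=True)` results aligns them by category; a category that appears in only one group produces NaN, which `.sum()` then skips, so the TVD is understated. All four quarters appear in both groups above, so this result is unaffected, but the same pattern in `run_perm_2` could be applied to columns with rarer categories. A sketch of a guarded version using `Series.sub` with `fill_value=0`; the helper name `tvd` is hypothetical.

```python
import pandas as pd


def tvd(a, b):
    """Total variation distance between the category distributions of a and b."""
    p = a.value_counts(normalize=True)
    q = b.value_counts(normalize=True)
    # fill_value=0 treats a category missing from one sample as probability 0.
    return p.sub(q, fill_value=0).abs().sum() / 2


a = pd.Series(['Q1', 'Q1', 'Q2', 'Q3'])
b = pd.Series(['Q1', 'Q2', 'Q2', 'Q4'])

# Unguarded version: Q3 and Q4 become NaN and are silently dropped.
naive = (a.value_counts(normalize=True) - b.value_counts(normalize=True)).abs().sum() / 2
print(naive, tvd(a, b))  # 0.25 0.5
```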

In [295]:
tag = final_reviews[['tags', 'rating_individual']].copy()
tag['num_tags'] = tag['tags'].apply(len)

# observed absolute difference in mean number of tags
rm_2 = tag[tag['rating_individual'].isna()]['num_tags'].mean()
rnm_2 = tag[~tag['rating_individual'].isna()]['num_tags'].mean()
obs = abs(rm_2 - rnm_2)
obs

# the same statistic after one shuffle of rating_individual
tag_copy = tag.copy()
tag_copy['rating_individual'] = np.random.permutation(tag_copy['rating_individual'])
rm_2 = tag_copy[tag_copy['rating_individual'].isna()]['num_tags'].mean()
rnm_2 = tag_copy[~tag_copy['rating_individual'].isna()]['num_tags'].mean()
abs(rm_2 - rnm_2)

0.07154283967334862

In [265]:
crazy = final_reviews[['user_id', 'rating_individual']]

rm3 = crazy[crazy['rating_individual'].isna()]['user_id'].mean()
rnm3 = crazy[~crazy['rating_individual'].isna()]['user_id'].mean()
cra_obs = abs(rm3 - rnm3)
cra_obs

rm3, rnm3, cra_obs

(747072538.7172711, 184960276.60085762, 562112262.1164135)

In [251]:
def run_perm_2(df, col, N):
    df_copy = df.copy()
    tvds = []
    for _ in range(N):
        df_copy['rating_individual'] = np.random.permutation(df_copy['rating_individual'])
        rating_missing_2 = df_copy[df_copy['rating_individual'].isna()][col].value_counts(normalize=True)
        rating_not_missing_2 = df_copy[~df_copy['rating_individual'].isna()][col].value_counts(normalize=True)
        tvd = (rating_missing_2 - rating_not_missing_2).abs().sum() / 2
        tvds.append(tvd)
    return np.array(tvds)

In [267]:
def run_means_perm(df, col, N):
    df_copy = df.copy()
    difs = []
    for _ in range(N):
        df_copy['rating_individual'] = np.random.permutation(df_copy['rating_individual'])
        rm2 = df_copy[df_copy['rating_individual'].isna()][col].mean()
        rnm2 = df_copy[~df_copy['rating_individual'].isna()][col].mean()
        difs.append(abs(rm2 - rnm2))
    return np.array(difs)

In [253]:
arr = run_perm_2(missingness2, 'quarter', 100)
arr[:10]

array([0.00644059, 0.01040609, 0.00520604, 0.00888839, 0.00514468,
       0.00699992, 0.00608659, 0.00701653, 0.0062966 , 0.00671193])

In [269]:
arr = run_means_perm(tag, 'num_tags', 10)
arr[:10]

array([0.04922459, 0.05495126, 0.02896129, 0.00156432, 0.05260832,
       0.0097388 , 0.04231313, 0.00117147, 0.1001246 , 0.04454495])

In [270]:
arr = run_means_perm(crazy, 'user_id', 100)
arr[:10]

array([ 1280612.29159659,  2904599.15142015,  5885271.25472808,
        8347305.01902491,  4621879.40283355,  7964938.60647932,
        3380358.86648571, 10454819.49724305,  9724441.45184547,
         983665.13043779])

In [271]:
(arr >= cra_obs).mean()

### Hypothesis Testing

Null: The average rating of a recipe is not related to the cooking duration of that recipe.

Alternate: The average rating of a recipe decreases as the cooking duration of that recipe increases.

Test Statistic: The absolute difference in mean cooking duration between low-rated recipes (average rating in [1, 3.5)) and high-rated recipes (average rating in [3.5, 5]).
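The alternate hypothesis is one-sided (longer recipes get lower ratings), but an absolute difference counts a gap in either direction as evidence against the null. A sketch of a one-sided version, assuming the signed statistic mean(minutes | Low) minus mean(minutes | High), so that only large positive values support the alternate. It also drops recipes whose average rating is NaN first: in the `High or Low` labelling, `x >= 3.5` is False for NaN, so those recipes would otherwise count as Low. The function name and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd


def one_sided_perm_test(minutes, is_low, n_reps=1000, seed=None):
    """p-value for H1: low-rated recipes take longer on average.
    Statistic: mean(minutes | Low) - mean(minutes | High)."""
    rng = np.random.default_rng(seed)
    minutes = np.asarray(minutes, dtype=float)
    is_low = np.asarray(is_low, dtype=bool)

    def signed_diff(mask):
        return minutes[mask].mean() - minutes[~mask].mean()

    observed = signed_diff(is_low)
    simulated = np.array([signed_diff(rng.permutation(is_low))
                          for _ in range(n_reps)])
    # Only differences at least as large in the positive direction count.
    return observed, (simulated >= observed).mean()


# Synthetic per-recipe data in which low-rated recipes take about an hour longer.
rng = np.random.default_rng(0)
rating_average = rng.uniform(1, 5, size=400)
rating_average[:10] = np.nan  # a few recipes without any rating
minutes = rng.exponential(45, size=400) + 60 * (rating_average < 3.5)
toy = pd.DataFrame({'minutes': minutes, 'rating_average': rating_average})

toy = toy.dropna(subset=['rating_average'])
observed, p_value = one_sided_perm_test(toy['minutes'],
                                        toy['rating_average'] < 3.5,
                                        seed=1)
print(round(observed, 1), p_value)
```

On the notebook's data, this would be called on `grouped` after `dropna(subset=['rating_average'])`, with `grouped['rating_average'] < 3.5` as the Low mask.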

In [309]:
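# one row per recipe: cooking time and average rating
grouped = final_reviews.groupby('name')[['minutes', 'rating_average']].mean()
# recipes with an average rating of at least 3.5 are High; all others are Low
grouped['High or Low'] = grouped['rating_average'].transform(lambda x: 'High' if x >= 3.5 else 'Low')
# observed statistic: absolute difference in mean minutes between High and Low
obs = grouped.groupby('High or Low').mean().diff().abs().iloc[-1]['minutes']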

In [317]:
diffs = []
for _ in range(1000):
    grouped_copy = grouped.copy()
    grouped_copy['High or Low'] = np.random.permutation(grouped_copy['High or Low'])
    diff = grouped_copy.groupby('High or Low').mean().diff().abs().iloc[-1]['minutes']
    diffs.append(diff)
diffs = np.array(diffs)

In [319]:
(diffs >= obs).mean()

0.0

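Since we got a p-value of 0.0 (none of the 1000 permuted differences was at least as large as the observed difference), we reject the null hypothesis. This is evidence that a recipe's average rating is related to its cooking duration. Because the test statistic is an absolute difference, the direction stated in the alternate hypothesis (ratings decrease as cooking time increases) holds only if the low-rated recipes have the longer mean cooking time.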