# **Test task for a code reviewer** 🤠

Hi **Practicum**, I'm David Bautista, @BautistaDavid 👽 on Github and my Lnkeding is [David Felipe Bautista Bernal](https://www.linkedin.com/in/david-felipe-bautista-bernal/). In this Notebook i´m going to develop the test task for a code reviewer, thanks for your time guys!!



## **Task 1: Working with data**

 **Import modules and data**

In [35]:
# Run this cell to import all the modules used in this notebook
import pandas as pd
import warnings
warnings.filterwarnings("ignore")

**1.1** Download the data set movie_metadata.csv, which contains data about films from IMDb (Internet Movie Database).

In [36]:
df = pd.read_csv('movie_metadata.csv') # import data from movie_metada.csv using .read_csv method of pandas.

In [37]:
# Use df.columns to know what are the columns of the DataFrame
df.columns 

Index(['color', 'director_name', 'num_critic_for_reviews', 'duration',
       'director_facebook_likes', 'actor_3_facebook_likes', 'actor_2_name',
       'actor_1_facebook_likes', 'gross', 'genres', 'actor_1_name',
       'movie_title', 'num_voted_users', 'cast_total_facebook_likes',
       'actor_3_name', 'facenumber_in_poster', 'plot_keywords',
       'movie_imdb_link', 'num_user_for_reviews', 'language', 'country',
       'content_rating', 'budget', 'title_year', 'actor_2_facebook_likes',
       'imdb_score', 'aspect_ratio', 'movie_facebook_likes'],
      dtype='object')

**1.2** The duration column contains data on the film length. How many missing values are there in this column?

we can get an array of booleans indicating whether a record is null (True) or not (False) using the `isnull()` method, then we can add the values ​​that are true (Remember that in python a boolean true is equivalent to the integer 1)

In [38]:
df['duration'].isnull().sum() # First we use isnull method and then sum method. 

15

**Answer**: There are 15 null values in the duration column. 

**1.3** Replace the missing values in the duration column with the median value for this column.

To replace missing values ​​with the median value, it is first important to know that we can calculate the mean of a panda series using the `.median()` method. Knowing that, we just have to use the pandas .fillna() method and give it the value we want to use to replace the null values, in this case the median of the same column. If we want to save the changues in the original DataFrame we have two option, use the argument `inplace=True` or save the results as the new duration column.

In [39]:
# Replae null values with the median 
df['duration'].fillna(value=df['duration'].median(), inplace=True)

**1.4** What is the average film length? Give the answer as a floating-point figure rounded to
two decimal places.

We can use the pandas` mean()` method to find the average of a series of pandas. Also using the `round()` method we can approximate any floating point value to the number of decimal places we want.


In [40]:
df['duration'].mean().round(2)

107.19

**Answer**: The average length of the films is 107.19 minutes. Approximately 1.78 hours

**1.5** Create a movie_duration_category column, which will contain three categories
depending on the film length:
- Category "1. <90" if the film is less than 90 minutes long
-  Category "2. 90–120" if the film is between 90 minutes and two hours long (inclusively)
- Category "3. >120" if the film is more than two hours long

To create these categories we can use the pd.cut() method, for this we have to indicate a list of interval limits (those are inclusive), so for example a list with the values ​​[0,89,120,190] will create three intervals, [0-89], [90-120], [121-190]. Then we can grant a list with names that will be the labels granted to each category.

Taking into account the above, we simply have to create a new column using the column referencing with the square brackets.

In [41]:
max_duration = max(df['duration'])
df['movie_duration_category'] = pd.cut(df['duration'],[0,89,120,max_duration],labels=['<90','90-120','>120'])

In [42]:
# Let's see some examples of the categories that were created
df[['duration','movie_duration_category']].head()

Unnamed: 0,duration,movie_duration_category
0,178.0,>120
1,169.0,>120
2,148.0,>120
3,164.0,>120
4,103.0,90-120


**1.6** Build a summary table for films released after 2000 (inclusively), to list the numbers of
films:

* Table rows: year
* Table columns: movie duration category ("<90", "90–120", ">120")
* The year of release should be displayed in the YYYY format.

We use the crosstab() method to structure a table showing the frequency of movies categorized by year and duration. But first we have to modify the `title_year` column.

In [43]:
# Let's solve this challenge in parts so we can see everything that happens

# First we can convert title_year column from float to int and then covert to datetime year format

df['title_year'] = df['title_year'].fillna(1111) # we can replace nulls by a random value that we can control after
df['title_year'] = df['title_year'].astype('int') # Use .astype method to convert  a float serie to integer serie 


In [44]:
# Having ready the column that contains the information of the year of the movie we can build a table.
# first we need to filter the DataFrame by the given condition
data_filter = df[df['title_year']>=2000]

table = pd.crosstab(data_filter['title_year'],data_filter['movie_duration_category'])
table

movie_duration_category,<90,90-120,>120
title_year,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
2000,25,112,34
2001,29,120,39
2002,36,146,27
2003,31,108,30
2004,30,142,42
2005,31,142,48
2006,40,146,53
2007,31,130,43
2008,29,160,36
2009,42,178,40


This table allows us to conclude about the frequency of films given the categories of year of the film and duration category. For example, it can be seen that the category of movies with the most frequency are those of the year 2009 that last between 90 minutes and 2 hours.

**1.7** How many films between 90 minutes and two hours long were released in 2008?
<br><br>
We can access the records in the table above to identify the number of films between 90 minutes and two hours. 

In [45]:
table.loc[2008,'90-120']

160

**Answer**: There are 160 films between 90 minutes and two hours long were released in 2008.

**1.8** The plot_keywords column holds keywords characterizing the film's plot. Using the data
in this column, create a column called movie_plot_category, to contain four categories
depending on the key words in the column:
* Category "love_and_death" if the keywords include both "love" and "death"
* Category "love" if the keywords include the word "love"
* Category "death" if the keywords include the word "death"
* Category "other" if the keywords do not meet the conditions above
<br><br>

To create this category we first can use pandas string methods to split words by the character '|', after we just have to identify resistor by register if it has one of the conditions and save the result in a list to estructure a columns after the bucle 

In [46]:
# Create a list of values split by | in each register
df['plot_keywords'] = df['plot_keywords'].str.split('|')
# replace null values with a default word to can interate in all registers without errors
df['plot_keywords'] = df['plot_keywords'].fillna('Null')

# Create a empty list to save the categorys for all register
categorys = []

# Use a for loop to catch which category each record value has
for i in df['plot_keywords']:
  if 'love' and 'death' in i:
    categorys.append('love_and_death')
  elif 'love' in i:
    categorys.append('love')
  elif 'death' in i:
    categorys.append('death')
  else:
    categorys.append('other')

# Create a column in the DataFrame that contains all the values saved in the created list
df['movie_plot_category'] = categorys

In [47]:
# let's see the frequencies of new categories
df['movie_plot_category'].value_counts()

other             4722
love               189
love_and_death     132
Name: movie_plot_category, dtype: int64

**1.9** The imdb_score column shows a viewer rating for the film. Build a table to reflect the
average rating of films depending on which movie_plot_category category they belong to.

To display a table with the average movie rating depending on which movie_plot_category you can use a clustered table to display the averages by category. 

In [48]:
# Use a groypby() method with a mean() method to calculate the viewer rating averages depend movie_plot_category
table2 = df[['movie_plot_category','imdb_score']].groupby(['movie_plot_category']).mean()
table2

Unnamed: 0_level_0,imdb_score
movie_plot_category,Unnamed: 1_level_1
love,6.533862
love_and_death,6.519697
other,6.436298


**1.10** What is the average rating of films in the "love" category? Give the answer as a floating point figure rounded to two decimal places.

We can access the records in the table above to identify the average rating of the movies in the "love" category. then use the round method to round the float number to whatever decimal places you want

In [49]:
# use pd.loc to access a group of rows and columns that you need. 
table2.loc['love','imdb_score'].round(2)

6.53

**1.11** The budget column contains the film's budget. What is the median budget for all the films listed? Give the answer as an integer.

We can use the median method to calculate this value, however these columns are hardcoded as object data type column, so we need to replace the $ symbol using pandas string methods and then convert the strings to float using method astype()

In [50]:
# change the data type from strings to floats (we need to remove some symbols from the strings)
df['budget'] = df['budget'].str.replace('$','').astype('float')

int(df['budget'].median())

15000000

**Answer:** the median budget for all the films listed is 15000000$

## **Task 2: Problem Solving**

**2.1** Download the event_data.csv dataset, which contains data on the use of the mobile
application of users who registered from July 29 to September 1, 2019:
* user_id - user identifier;
* event_date - time of the event;
* event_type - type of event: registration - registration in the application; 
simple_event - click event in the application; purchase - an event of purchase within the application; purchase_amount - purchase amount.

In [51]:
df2 = pd.read_csv('event_data.csv')
df2.columns

Index(['user_id', 'event_date', 'event_type', 'purchase_amount'], dtype='object')

**2.2** Highlight user cohorts based on the week of registration in the application. The cohort
identifier should be the week ordinal (for example, the week from July 29 to August 4 should have identifier 31)

In [52]:
df2['event_date'] = pd.to_datetime(df2['event_date']) 

# Create a new dataframe to calculate the register week of all users
df2_register_week = pd.DataFrame()
# lets filter only the registers to create the columns of the new dataframe 
df2_register_week = df2[df2['event_type']=='registration'][['user_id','event_date']]
# use .dt pandas methods to get the number of week of all registers 
df2_register_week['cohort_register_week'] = df2_register_week['event_date'].dt.isocalendar().week
df2_register_week.head()


Unnamed: 0,user_id,event_date,cohort_register_week
0,c40e6a,2019-07-29 00:02:15,31
1,a2b682,2019-07-29 00:04:46,31
2,9ac888,2019-07-29 00:13:22,31
3,93ff22,2019-07-29 00:16:47,31
4,65ef85,2019-07-29 00:19:23,31


In [53]:
# now we can merge this info to the original table
df2 = df2.merge(df2_register_week[['user_id','cohort_register_week']],how='left',on='user_id')
# Create the number of week of all events 
df2['event_week'] = df2['event_date'].dt.isocalendar().week
df2.head()

Unnamed: 0,user_id,event_date,event_type,purchase_amount,cohort_register_week,event_week
0,c40e6a,2019-07-29 00:02:15,registration,,31,31
1,a2b682,2019-07-29 00:04:46,registration,,31,31
2,9ac888,2019-07-29 00:13:22,registration,,31,31
3,93ff22,2019-07-29 00:16:47,registration,,31,31
4,65ef85,2019-07-29 00:19:23,registration,,31,31


**2.3** How many unique users in the cohort with ID 33?

In [54]:
df2[df2['cohort_register_week'] == 33]['user_id'].unique().shape[0]

2045

**2.4** For each event, highlight the indicator lifetime - the weekly lifetime of the cohort. The
lifetime indicator is calculated based on the serial number of the week in which the event
is committed, relative to the week of registration. For example, an event committed on
August 3 by a user from a cohort of registrants at 31 weeks will be committed on the zero
week of lifetime, and an event committed by the same user on August 5 will be committed
on the first week of lifetime).

In [55]:
df2['lifetime'] = df2['event_week'] - df2['cohort_register_week']
df2.head()

Unnamed: 0,user_id,event_date,event_type,purchase_amount,cohort_register_week,event_week,lifetime
0,c40e6a,2019-07-29 00:02:15,registration,,31,31,0
1,a2b682,2019-07-29 00:04:46,registration,,31,31,0
2,9ac888,2019-07-29 00:13:22,registration,,31,31,0
3,93ff22,2019-07-29 00:16:47,registration,,31,31,0
4,65ef85,2019-07-29 00:19:23,registration,,31,31,0


**5.** Build a summary table of changes in the Retention Rate for cohorts depending on lifetime.

Let's create a pivot table that calculates the number of users subject to cohorot_register_week and the time to live. This is to be able to put together a table where the previous information is related to the number of new clients for each week of registration. Now with this structure you can build a pivot table that gives evidence of  changes in the Retention Rate for cohorts depending on lifetime.


$Retention_{}Rate = \frac{UserPeriod}{NewUser *Week}$


In [56]:
# Calculate the user in all the periods 
sumary_table = df2.pivot_table(index = ['cohort_register_week','lifetime'], aggfunc = {'user_id':'nunique'}).reset_index()
sumary_table.head(10)

# Create a table to calculate the New Users by week

new_users = sumary_table[sumary_table['lifetime'] == 0]
new_users = new_users[['cohort_register_week','user_id']]
new_users

# Merge the two above table and calculate the retetion rate dividing user_id / users_cohort
new_users = new_users.rename(columns={'user_id':'users_cohort'})
sumary_table = sumary_table.merge(new_users,how='left',on='cohort_register_week')

sumary_table['Retetion_Rate'] = sumary_table['user_id'] / sumary_table['users_cohort']
sumary_table

Unnamed: 0,cohort_register_week,lifetime,user_id,users_cohort,Retetion_Rate
0,31,0,1975,1975,1.0
1,31,1,1832,1975,0.927595
2,31,2,1243,1975,0.629367
3,31,3,705,1975,0.356962
4,31,4,297,1975,0.15038
5,32,0,1952,1952,1.0
6,32,1,1814,1952,0.929303
7,32,2,1265,1952,0.648053
8,32,3,705,1952,0.361168
9,33,0,2045,2045,1.0


In [57]:
# Use pivot_table() Method to contrast the retetion_rate values
cohorot_pivot = pd.pivot_table(data=sumary_table,index='cohort_register_week',
                                values='Retetion_Rate',columns='lifetime') 
cohorot_pivot


lifetime,0,1,2,3,4
cohort_register_week,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
31,1.0,0.927595,0.629367,0.356962,0.15038
32,1.0,0.929303,0.648053,0.361168,
33,1.0,0.924205,0.661125,,
34,1.0,0.929078,,,
35,1.0,,,,


**2.6** What is the 3 week retention rate for a cohort with ID 32? Give the answer in percent, rounded to 2 decimal places, inclusive.

iterates through the previous table to obtain the requested data

In [58]:
print(f'{(cohorot_pivot.loc[32,3] * 100).round(2)} %')

36.12 %


**Answer:** The retention rate for a cohorot with value 32 and 3 weeks of lifetime is 36.12 %

**2.7** Build a summary table of changes in the indicator ARPPU (Average Revenue Per Paying User) for cohorts depending on lifetime.

As in numeral 5, the dynamic to solve this problem is the same, build a table that allows comparing both the lifetime and coherent categories with the average purchase amount. Then a pivot table is made to see how the consistency of this value changes.

$APPRU=\frac{SumPAmount}{NewUser*Week}$

In [59]:
pivot2 = pd.pivot_table(data=df2,index=['cohort_register_week','lifetime'], 
                        aggfunc = {'purchase_amount':'sum'}).reset_index() 

APPRU_table = sumary_table.merge(pivot2,how='left',on=['cohort_register_week','lifetime'])
APPRU_table['APPRU'] = APPRU_table['purchase_amount']/APPRU_table['user_id']  
APPRU_table

Unnamed: 0,cohort_register_week,lifetime,user_id,users_cohort,Retetion_Rate,purchase_amount,APPRU
0,31,0,1975,1975,1.0,8890.0,4.501266
1,31,1,1832,1975,0.927595,20540.0,11.21179
2,31,2,1243,1975,0.629367,12210.0,9.823009
3,31,3,705,1975,0.356962,6120.0,8.680851
4,31,4,297,1975,0.15038,2010.0,6.767677
5,32,0,1952,1952,1.0,10850.0,5.558402
6,32,1,1814,1952,0.929303,21050.0,11.60419
7,32,2,1265,1952,0.648053,12600.0,9.960474
8,32,3,705,1952,0.361168,6260.0,8.879433
9,33,0,2045,2045,1.0,9790.0,4.787286


In [60]:
ARPPU_pivot = APPRU_table.pivot_table(index='lifetime',columns='cohort_register_week',
                                      values='APPRU',aggfunc='mean') 
ARPPU_pivot

cohort_register_week,31,32,33,34,35
lifetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
0,4.501266,5.558402,4.787286,4.817629,5.604878
1,11.21179,11.60419,11.497354,10.708833,
2,9.823009,9.960474,10.162722,,
3,8.680851,8.879433,,,
4,6.767677,,,,


**2.8.** What is the 3-week ARPPU of a cohort with ID 31? Give the answer with a floating point number, rounded to 2 decimal places, inclusive.

Iterate over the previous table to obtain the requested value

In [61]:
ARPPU_pivot.loc[3,31].round(2)

8.68

**Answer:** The 3 Week ARPPU value of a cohort wiht ID 31 is 8.68

**2.9** What is the median time between user registration and first purchase? Give the answer
in seconds (!) As an integer.

We are going to build two data frames, one that relates users to their registration date and another that relates users to the date of their first purchase, with this we calculate the time difference for each customer to then average the results.

In [62]:
#Lets construct two dataframes
# datafrme with register info
df_register = df2[df2['event_type']=='registration'][['user_id','event_date']]
df_register.columns = ['user_id','date_register']

In [63]:
# dataframe wiht first purchase info

df_first_purchase = df2[df2['event_type'] == 'purchase']

# count the number of purchase by user 
df_first_purchase['n_purchase'] = df_first_purchase.groupby('user_id').cumcount()+1

df_first_purchase = df_first_purchase[df_first_purchase['n_purchase'] == 1]
df_first_purchase = df_first_purchase[['user_id', 'event_date']]

df_first_purchase.columns = ['user_id', 'date_first_purchase']

In [64]:
# Merge the created dataframes with the original dataframe 
df2 = pd.merge(df2,df_first_purchase,how='left',on='user_id')
df2 = pd.merge(df2,df_register,how='left',on='user_id')

In [65]:
# Let´s see how looks the DataFrame
df2.head()

Unnamed: 0,user_id,event_date,event_type,purchase_amount,cohort_register_week,event_week,lifetime,date_first_purchase,date_register
0,c40e6a,2019-07-29 00:02:15,registration,,31,31,0,2019-08-10 11:40:06,2019-07-29 00:02:15
1,a2b682,2019-07-29 00:04:46,registration,,31,31,0,2019-08-06 23:18:39,2019-07-29 00:04:46
2,9ac888,2019-07-29 00:13:22,registration,,31,31,0,2019-08-02 02:07:01,2019-07-29 00:13:22
3,93ff22,2019-07-29 00:16:47,registration,,31,31,0,2019-07-31 11:43:09,2019-07-29 00:16:47
4,65ef85,2019-07-29 00:19:23,registration,,31,31,0,2019-08-06 11:55:49,2019-07-29 00:19:23


In [66]:
# calculate the time diference between date first purchase and date register 

df2['delta_time'] = df2['date_first_purchase'] - df2['date_register']
df2

Unnamed: 0,user_id,event_date,event_type,purchase_amount,cohort_register_week,event_week,lifetime,date_first_purchase,date_register,delta_time
0,c40e6a,2019-07-29 00:02:15,registration,,31,31,0,2019-08-10 11:40:06,2019-07-29 00:02:15,12 days 11:37:51
1,a2b682,2019-07-29 00:04:46,registration,,31,31,0,2019-08-06 23:18:39,2019-07-29 00:04:46,8 days 23:13:53
2,9ac888,2019-07-29 00:13:22,registration,,31,31,0,2019-08-02 02:07:01,2019-07-29 00:13:22,4 days 01:53:39
3,93ff22,2019-07-29 00:16:47,registration,,31,31,0,2019-07-31 11:43:09,2019-07-29 00:16:47,2 days 11:26:22
4,65ef85,2019-07-29 00:19:23,registration,,31,31,0,2019-08-06 11:55:49,2019-07-29 00:19:23,8 days 11:36:26
...,...,...,...,...,...,...,...,...,...,...
79737,930c23,2019-09-01 23:57:41,simple_event,,32,35,3,2019-08-18 06:53:13,2019-08-07 15:58:32,10 days 14:54:41
79738,a84999,2019-09-01 23:57:50,simple_event,,33,35,2,NaT,2019-08-14 05:08:15,NaT
79739,175e4d,2019-09-01 23:59:40,simple_event,,32,35,3,2019-08-18 01:35:56,2019-08-10 00:31:21,8 days 01:04:35
79740,1c2210,2019-09-01 23:59:51,simple_event,,33,35,2,2019-08-20 07:57:11,2019-08-17 09:45:33,2 days 22:11:38


In [67]:
median_seconds = int(df2['delta_time'].dt.seconds.median())
print('--Answer--\n',)
print(f'The median time between user registration and first purchase is {median_seconds} seconds ')

--Answer--

The median time between user registration and first purchase is 43356 seconds 



**Curious fact:**
There is a data type for pandas specialized in recording time differences

## **Task 3: Answering Students Questions**

How would you answer the student's question below? Your task is to get your message across in such a way that a beginner can understand your explanation. You can do this any way you want (pictures, GIFs, metaphors, anything) so long as it makes your explanation clear. Indicate how much time you spent completing this task.

> **What is the difference between DataFrame and Series?**

**Answer:**
<br><br>
The main strategy that I would use to explain the difference between these two concepts is a graphic aid so that the student can associate the structure of these two terms with a mental image, which will help them much more to understand the logic of each of these. 
<br><br>
In this image/infographic what most seeks to highlight the difference between a series and a dataframe in relation to their dimensions.
<br><br>

![alt text](https://raw.githubusercontent.com/BautistaDavid/Twitter_Posts/main/series_df.png)

☝ Own authorship

<br><br>
Another important element is trying to capture the student's attention if the answer to their question is given within some type of content such as an article. Therefore we can always resort to humor and more if it is a subject related to the pandas library. There is a perfect play on words.
<br><br>
![alt text](https://miro.medium.com/v2/resize:fit:720/format:webp/1*DadyHI0auADUxl5-ft4uSQ.jpeg)

image taken from: [https://miro.medium.com/v2/resize:fit:720/format:webp/1*DadyHI0auADUxl5-ft4uSQ.jpeg](https://)

<br><br>
**How much time i spent?**
<br><br>
Spend about 30-45 minutes thinking about and creating the infographic layout.

## **Task 4:** 

You are given two random variables X and Y.

$E(X) = 0.5 \hspace{2cm} Var(X) = 2\\
\\
\\
E(Y) = 7 \hspace{2cm}  Var(X) = 3.5\\
$

$
cov(X,Y) = -0.8
$

* Find the variance of the variable $Z= 2X - 3Y$

$Var(aX - bY) = a^2Var(X) + b^2Var(Y) + 2ab*cov(X,Y)\\ 
Var(2X-3Y) = a^2Var(X) + b^2Var(Y) + 2ab*cov(X,Y)\\
Var(2X-3Y) = 2^2 * 2 + (-3)^2 * 3.5+2(2)(-3)*-0.8\\
Var(2X-3Y) = 8 + 31.5+9.6\\
Var(2X-3Y) = 49.1$



## **Task 5**

Omer trained a linear regression model and tested its performance on a test sample of 500 objects. On 400 of those, the model returned a prediction higher than expected by 0.5, and on the remaining 100, the model returned a prediction lower than expected by 0.7.

What is the MSE for his model?

Limor claims that the linear regression model wasn't trained correctly, and we can do improve it by changing all the answers by a constant value. What will be her MSE?

You can assume that Limor found the smallest error under her constraints.

**Return two values - Omer's and Limor's MSE.**
<br><br>
---------------------

First of all lets define the MSE:

$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y -̂y)^2 $$

Bearing in mind that the MSE can be seen as the sum of the difference between the real and estimated values ​​raised to the square. (That divided by the number of data). It is easy to calculate the MSE by means of the data that we are given for the case of omer.
<br><br>

$$MSE = \frac{(0.5^2 * 400) + (-0.7^2 * 100)}{500}\\
\\
MSE = 0.298\\
$$








In [68]:
MSE = (((0.5**2) * 400) + ((0.7**2) * 100)) * (1/500)
print(f'Omar´s Case MSE: {MSE}')

Omar´s Case MSE: 0.298


We can start from the following facts to analyze this case
<br><br>
$$\hat{y}_{new} = \hat{y} + c \hspace{2 cm} MSE= \frac{1}{n} \sum(y-\hat{y})^2$$
<br><br>
Now it can be argued that
<br><br>
$$MSE_{new} = \frac{1}{n} \sum(y-\hat{y}_{new})^2 \\
= \frac{1}{n} \sum(y-\hat{y} + c)^2 \\
=  \frac{1}{n} \sum (c^2 - 2c(y-\hat{y}) + (y-\hat{y})^2)\\
= c^2 + \frac{2c}{n}\sum(y - \hat{y}) + MSE$$
<br><br>
Differentiating with respect to 0 we have
<br><br>
$$\frac{dMSE_{new}}{dc} = 2c + \frac{2}{n}\sum(y - \hat{y}) = 0\\
c = -\frac{1}{n}\sum(y - \hat{y})$$
<br><br>
Replacing c in MSE New we have that
<br><br>

$$MSE_{new} = MSE - (\frac{1}{n})^2 * \sum(y - \hat{y})$$


**Answer:**
<br><br>
In conclusion, in the case of Limor, the metric calculated after making the change in the constant has no differences with the metric calculated in the case of Omer (0.298).