## Library Imports

In [46]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
from statsmodels.formula.api import ols
from scipy.stats import chi2_contingency

## III. Professional Suites

In [30]:
suites = pd.read_excel("Shruti Sharada S-FinalExam.xlsx", sheet_name = "Suites")

In [31]:
suites.head()

Unnamed: 0,City,Room Type,Revenue (Lakhs)
0,Mumbai,Standard,50
1,Mumbai,Deluxe,55
2,Mumbai,Super Deluxe,60
3,Delhi,Standard,30
4,Delhi,Deluxe,45


In [32]:
suites["Room_type"] = suites["Room Type"]
suites["Revenue"] = suites["Revenue (Lakhs)"]

In [33]:
suites = suites.drop(['Room Type', "Revenue (Lakhs)"], axis = 1)
suites

Unnamed: 0,City,Room_type,Revenue
0,Mumbai,Standard,50
1,Mumbai,Deluxe,55
2,Mumbai,Super Deluxe,60
3,Delhi,Standard,30
4,Delhi,Deluxe,45
5,Delhi,Super Deluxe,45
6,Bangalore,Standard,40
7,Bangalore,Deluxe,45
8,Bangalore,Super Deluxe,50
9,Chennai,Standard,30


In [20]:
suites.groupby(['City','Room_type'])["Revenue"].mean()

City       Room_type   
Bangalore  Deluxe          45.0
           Standard        40.0
           Super Deluxe    50.0
Chennai    Deluxe          30.0
           Standard        30.0
           Super Deluxe    40.0
Delhi      Deluxe          45.0
           Standard        30.0
           Super Deluxe    45.0
Mumbai     Deluxe          55.0
           Standard        50.0
           Super Deluxe    60.0
Name: Revenue, dtype: float64

## Two-way ANOVA:

In [22]:
suites_model = ols('Revenue ~ City+ Room_type', data = suites).fit()

In [35]:
# anova_lm takes `typ`, not `axis`; the Type I table is what is shown below
result = sm.stats.anova_lm(suites_model, typ = 1)
result

Unnamed: 0,df,sum_sq,mean_sq,F,PR(>F)
City,3.0,750.0,250.0,24.0,0.000966
Room_type,2.0,254.166667,127.083333,12.2,0.007688
Residual,6.0,62.5,10.416667,,
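
The F statistics and p-values in the table above can be reproduced by hand, which makes the decision rule explicit. The following is a minimal sketch that copies the mean squares and degrees of freedom from the table and assumes `scipy` is available. The variable names are illustrative only.

```python
from scipy.stats import f

# Values copied from the ANOVA table above
ms_city, df_city = 250.0, 3
ms_room, df_room = 127.083333, 2
ss_resid, df_resid = 62.5, 6

# Residual mean square is the denominator of both F ratios
ms_resid = ss_resid / df_resid

f_city = ms_city / ms_resid
f_room = ms_room / ms_resid

# The p-value is the upper-tail probability of the F distribution
p_city = f.sf(f_city, df_city, df_resid)
p_room = f.sf(f_room, df_room, df_resid)

alpha = 0.05
for name, f_val, p_val in [("City", f_city, p_city), ("Room_type", f_room, p_room)]:
    decision = "reject H0" if p_val < alpha else "fail to reject H0"
    print(f"{name}: F = {f_val:.2f}, p = {p_val:.6f} -> {decision}")
```

A factor is significant at the 5% level only when its p-value falls below 0.05.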


**Inference:** At a `significance level of 0.05`, the `p-value for City (0.000966) is below the threshold, so we reject H0` and conclude that there is a significant difference in average revenue between the cities.
The `p-value for 'Room_type' (0.007688) is also below 0.05, so we reject H0` here as well and conclude that there is a significant difference in average revenue between the room types.

Inferring from the data, `Chennai and Mumbai` appear to drive most of the difference between cities: Mumbai has the highest revenue in every room type (50 to 60 lakhs), while Chennai has the lowest overall (30 to 40 lakhs).
Post-hoc tests such as Tukey's HSD or pairwise t-tests can be performed to confirm this.
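
One way to follow up on the city effect is Tukey's HSD via `statsmodels`. The sketch below rebuilds the twelve revenue values from the group means shown earlier (one observation per City and Room type cell), so the `data` frame is a stand-in for `suites`. Note that `pairwise_tukeyhsd` runs a one-way comparison on City alone and ignores Room type, so its error variance is larger than the two-way residual and the results are conservative.

```python
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Revenue values taken from the City x Room_type means shown earlier
data = pd.DataFrame({
    "City": ["Mumbai"] * 3 + ["Delhi"] * 3 + ["Bangalore"] * 3 + ["Chennai"] * 3,
    "Room_type": ["Standard", "Deluxe", "Super Deluxe"] * 4,
    "Revenue": [50, 55, 60, 30, 45, 45, 40, 45, 50, 30, 30, 40],
})

# Compare every pair of cities at a 5% family-wise error rate
tukey = pairwise_tukeyhsd(endog=data["Revenue"], groups=data["City"], alpha=0.05)
print(tukey.summary())
```

Each row of the summary reports the mean difference for one pair of cities and whether the null hypothesis of equal means is rejected for that pair.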

## IV. Generational Differences

In [37]:
gen = pd.read_excel("Shruti Sharada S-FinalExam.xlsx", sheet_name = "Generations")
gen.head()

Unnamed: 0,Generation,Leave Job For More Money?
0,Gen Z,No
1,Gen X,No
2,Gen Z,Yes
3,Gen Z,No
4,Gen Z,No


In [42]:
gens = gen.pivot_table(index='Leave Job For More Money?', columns='Generation', aggfunc='size', fill_value=0)
gens

Leave Job For More Money?,Gen X,Gen Z,Millenial
No,208,171,182
Yes,129,164,152


## Chi-Square Analysis:

In [48]:
chi2, p, dof, expected = chi2_contingency(gens)
print(f"Chi2 Statistic value: {chi2}")
print(f"P-value: {p}")
print(f"Degrees of Freedom: {dof}")

Chi2 Statistic value: 8.092012900555744
P-value: 0.01749209087386266
Degrees of Freedom: 2


**Inference:** At a `significance level of 5%`, the p-value (0.0175) is below the threshold, so we reject H0 and conclude that the `interest in leaving a current job for more money is not independent of employee generation.`
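
To make the test decision easier to check, the statistic can be recomputed from the contingency table directly. This is a minimal sketch that hard-codes the observed counts from the pivot table above and assumes `numpy` and `scipy` are available. The variable names are illustrative only.

```python
import numpy as np
from scipy import stats

# Observed counts: rows are No / Yes, columns are Gen X / Gen Z / Millenial
observed = np.array([[208, 171, 182],
                     [129, 164, 152]])

# Expected counts under independence: (row total * column total) / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals @ col_totals / observed.sum()

# Pearson chi-square statistic and its degrees of freedom
chi2_stat = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Upper-tail probability of the chi-square distribution gives the p-value
p_value = stats.chi2.sf(chi2_stat, dof)

alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"chi2 = {chi2_stat:.4f}, dof = {dof}, p = {p_value:.4f} -> {decision}")
```

The largest gaps between observed and expected counts are in the Gen X column, where more respondents answered "No" than independence would predict.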