The Sales Department manager of a service company has been studying computer reports (regarding customer service by telephone) over a period of several weeks. One of the characteristics of particular interest to her is the time required by each clerk to receive orders and book shipments. The manager is interested in knowing if the amount of order processing time on the phone is associated with the clerk receiving the call. Choose and calculate the value of an appropriate statistic to help her answer the research question. Hint: “Is there a relationship between processing time and clerk?”

The data collected by the manager are presented in the data file named Sales.dat. Use a Type I error rate of α=0.05.

What conclusion did you make regarding the null hypothesis?

In [1]:
import pandas as pd
import scipy.stats as stats

# Load the data from the provided file
file_path = 'Sales.dat'
data = pd.read_csv(file_path, sep=r'\s+')  # whitespace-delimited; delim_whitespace is deprecated in recent pandas

# Display the first few rows of the data to understand its structure
data.head()


   clerk  time
0      1   473
1      1   189
2      1   140
3      1   125
4      1    46


In [2]:
# Perform a one-way ANOVA test
anova_result = stats.f_oneway(
    *[data[data['clerk'] == clerk]['time'] for clerk in data['clerk'].unique()]
)

# Extract the F-statistic and p-value
f_statistic = anova_result.statistic
p_value = anova_result.pvalue

f_statistic, p_value


(9.440441746187224, 0.004691003478840473)
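The null hypothesis for the one-way ANOVA is that the mean processing time is the same for every clerk. The sketch below applies the α = 0.05 decision rule to the F-statistic and p-value reported above. The helper `one_way_f` and its toy groups are hypothetical: they only illustrate how an F-ratio is formed and are not the Sales.dat data.

```python
def one_way_f(groups):
    """Return the one-way ANOVA F-ratio for a list of numeric groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group and within-group sums of squares
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical toy groups (NOT the Sales.dat data), for illustration only
toy_f = one_way_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"Toy F-ratio: {toy_f:.3f}")

# Decision rule applied to the values reported by stats.f_oneway above
alpha = 0.05
f_statistic = 9.440441746187224
p_value = 0.004691003478840473
reject_null = p_value < alpha

if reject_null:
    print(f"F = {f_statistic:.3f}, p = {p_value:.4f} < {alpha}: reject the null hypothesis.")
else:
    print(f"F = {f_statistic:.3f}, p = {p_value:.4f} >= {alpha}: fail to reject the null hypothesis.")
```

Because p ≈ 0.0047 is below α = 0.05, the null hypothesis of equal mean processing times is rejected, which suggests that order processing time is associated with the clerk receiving the call.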

Problem 5

A beverage company’s Marketing Department recently distributed a consumer survey questionnaire regarding beverage containers. One of the questions requested the consumer to identify their preference of beverage container type (ct), either glass, aluminum, or plastic. The survey questionnaires were distributed nationwide to each of four geographical regions (region): Northwest (NW), Northeast (NE), Southwest (SW), and Southeast (SE). The beverage company was interested in knowing if there was a relationship between the type of beverage container preferred by consumers and the geographical region in which they lived. One of the marketing employees organized the data as shown below. The value in each cell represents the frequency of consumers preferring that type of container within the corresponding geographical region. Use an appropriate test to assist the Marketing Department in answering the research question. These data are also in the file named Beverage.dat. Use a significance level α=0.05.

In [3]:
import pandas as pd
import scipy.stats as stats

# Load the data from the .dat file
file_path = 'Beverage.dat'
data = pd.read_csv(file_path, sep=r'\s+', header=None)  # whitespace-delimited; delim_whitespace is deprecated in recent pandas

# Set up the table with appropriate column names
data.columns = ['Container_Type', 'Region', 'Count']

# Convert the data to numeric, coercing any non-numeric values to NaN (then we can drop them)
data['Count'] = pd.to_numeric(data['Count'], errors='coerce')

# Drop any rows with NaN values that may have resulted from non-numeric entries
data = data.dropna()

# Reshape the data into a pivot table (contingency table)
pivot_table = data.pivot(index='Container_Type', columns='Region', values='Count')

# Note: .iloc[:-1, :-1] drops the last row (container type) and the last column (region)
# of the pivot table, leaving the 2 x 3 table shown in the output
pivot_table = pivot_table.iloc[:-1, :-1]

pivot_table


Region              1     2      3
Container_Type
1               165.0  77.0   58.0
2                73.0  98.0  131.0


In [4]:
# Perform the Chi-Square Test for Independence
chi2, p_value, dof, expected = stats.chi2_contingency(pivot_table)

# Display the results
print(f"Chi-Square Statistic: {chi2}")
print(f"p-value: {p_value}")
print(f"Degrees of Freedom: {dof}")
print("Expected Frequencies:")
print(expected)

# Interpretation based on p-value
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis. There is a significant relationship between container type and region.")
else:
    print("Fail to reject the null hypothesis. There is not enough evidence of a relationship between container type and region.")

Chi-Square Statistic: 66.27287936823981
p-value: 4.064686876130899e-15
Degrees of Freedom: 2
Expected Frequencies:
[[118.60465116  87.20930233  94.18604651]
 [119.39534884  87.79069767  94.81395349]]
Reject the null hypothesis. There is a significant relationship between container type and region.
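As a cross-check, the statistic above can be reproduced by hand from the 2 x 3 table of observed counts. Each expected frequency is (row total × column total) / grand total, and the statistic sums (observed − expected)² / expected over all cells. This is a minimal sketch in plain Python rather than the SciPy implementation. The closed-form p-value `exp(-x/2)` used here holds only for exactly 2 degrees of freedom.

```python
import math

# Observed counts copied from the pivot table above
# (rows: container type, columns: region)
observed = [[165, 77, 58],
            [73, 98, 131]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson chi-square statistic
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))

# Degrees of freedom: (rows - 1) * (columns - 1)
dof = (len(observed) - 1) * (len(observed[0]) - 1)

# For exactly 2 degrees of freedom the chi-square upper-tail probability is exp(-x/2)
if dof != 2:
    raise ValueError("The closed-form p-value below is valid only for dof == 2")
p_value = math.exp(-chi2 / 2)

print(f"Chi-square: {chi2:.4f}, dof: {dof}, p-value: {p_value:.3e}")
for row in expected:
    print([round(e, 4) for e in row])
```

The hand-computed statistic, degrees of freedom, and expected frequencies agree with the `stats.chi2_contingency` output for this table.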
