## **Correlation Task**

### Task:

Is the beef consumption in Argentina significantly different from that in Bangladesh?

H₀: Mean beef consumption (Argentina) = Mean beef consumption (Bangladesh)

H₁: Mean beef consumption (Argentina) ≠ Mean beef consumption (Bangladesh)

In [2]:
# Load the dataset from Google Drive
# %pip install gdown   (if not installed)
import gdown

In [3]:
file_id = '1yB5qSBOLl96Y563nIewKOU8RN_gsY3dO'  # Make sure it's a string

gdown.download(f'https://drive.google.com/uc?id={file_id}', 'data.csv', quiet=False)


Downloading...
From: https://drive.google.com/uc?id=1yB5qSBOLl96Y563nIewKOU8RN_gsY3dO
To: c:\Users\Welcome Sir\Desktop\abioye_olajide_tasks\week_nine\data.csv
100%|██████████| 52.0k/52.0k [00:00<00:00, 8.74MB/s]


'data.csv'

In [4]:
# Importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

In [5]:
# Reading in the downloaded CSV file
df = pd.read_csv('data.csv')
df.head()

,Unnamed: 0,country,food_category,consumption,co2_emission
0,1,Argentina,pork,10.51,37.2
1,2,Argentina,poultry,38.66,41.53
2,3,Argentina,beef,55.48,1712.0
3,4,Argentina,lamb_goat,1.56,54.63
4,5,Argentina,fish,4.36,6.96


In [6]:
df

,Unnamed: 0,country,food_category,consumption,co2_emission
0,1,Argentina,pork,10.51,37.20
1,2,Argentina,poultry,38.66,41.53
2,3,Argentina,beef,55.48,1712.00
3,4,Argentina,lamb_goat,1.56,54.63
4,5,Argentina,fish,4.36,6.96
...,...,...,...,...,...
1425,1426,Bangladesh,dairy,21.91,31.21
1426,1427,Bangladesh,wheat,17.47,3.33
1427,1428,Bangladesh,rice,171.73,219.76
1428,1429,Bangladesh,soybeans,0.61,0.27


In [7]:
# The 'Unnamed: 0' column is not needed for the analysis, so drop it
df.drop('Unnamed: 0', axis=1, inplace=True)
df

,country,food_category,consumption,co2_emission
0,Argentina,pork,10.51,37.20
1,Argentina,poultry,38.66,41.53
2,Argentina,beef,55.48,1712.00
3,Argentina,lamb_goat,1.56,54.63
4,Argentina,fish,4.36,6.96
...,...,...,...,...
1425,Bangladesh,dairy,21.91,31.21
1426,Bangladesh,wheat,17.47,3.33
1427,Bangladesh,rice,171.73,219.76
1428,Bangladesh,soybeans,0.61,0.27


In [8]:
# Select the beef rows for Argentina and Bangladesh, then keep the consumption column
arg_beef = df[(df["country"]=="Argentina") & (df["food_category"]=="beef")]

bang_beef = df[(df["country"] == "Bangladesh") & (df['food_category'] == "beef")]

arg_consump = arg_beef["consumption"]

bang_consump = bang_beef["consumption"]
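
Before testing, it helps to check how many beef observations each country actually has, because a t-test needs more than one value per group. The sketch below uses a hypothetical `toy_df` built from a few values visible in the notebook outputs, not the full dataset. The Bangladesh beef value of 1.28 is the mean printed later in the notebook, assumed here to come from a single row.

```python
import pandas as pd

# Toy frame with a few values taken from the notebook outputs (not the full dataset).
# Assumption: Bangladesh has one beef row with consumption 1.28.
toy_df = pd.DataFrame(
    {
        "country": ["Argentina", "Argentina", "Bangladesh", "Bangladesh"],
        "food_category": ["pork", "beef", "beef", "dairy"],
        "consumption": [10.51, 55.48, 1.28, 21.91],
    }
)

# Count beef rows per country; a count of 1 means there is no within-group variance.
beef_counts = (
    toy_df[toy_df["food_category"] == "beef"]
    .groupby("country")["consumption"]
    .count()
)
print(beef_counts)
```

On the real data, running the same `groupby` on `df` would show whether the selection above returns one row or several per country.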

In [10]:
# Compute the mean beef consumption for each country
arg_consump_mean = arg_consump.mean()

bang_consump_mean = bang_consump.mean()

In [14]:
print("Average consumption of Beef in Argentina:", arg_consump_mean)

print("\nAverage consumption of Beef in Bangladesh:", bang_consump_mean)

Average consumption of Beef in Argentina: 55.48

Average consumption of Beef in Bangladesh: 1.28


In [26]:
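# The data appear to hold a single beef value per country, so the t-test cannot run on them directly.
# Simulate 30 observations per country around each mean; the standard deviation of 5 is an assumed value.
# No random seed is set, so the samples (and the test output) change on every run.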

arg_beef_samples = np.random.normal(loc=arg_consump_mean, scale=5, size=30)

bang_beef_samples = np.random.normal(loc=bang_consump_mean, scale=5, size=30)
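
The draws above differ on every run. A minimal sketch of a reproducible version, assuming NumPy's `default_rng` generator with an arbitrary seed of 42; the names `arg_sim`, `bang_sim`, and `assumed_sd` are hypothetical and not part of the notebook:

```python
import numpy as np

rng = np.random.default_rng(42)  # seed 42 is an arbitrary, illustrative choice

arg_mean, bang_mean = 55.48, 1.28  # means reported in the notebook
assumed_sd = 5                     # same assumed spread as the cell above
n = 30                             # same sample size as the cell above

# Simulated observations; with a fixed seed these are identical on every run.
arg_sim = rng.normal(loc=arg_mean, scale=assumed_sd, size=n)
bang_sim = rng.normal(loc=bang_mean, scale=assumed_sd, size=n)

print(arg_sim.mean(), bang_sim.mean())
```

Note that a normal distribution centred on 1.28 with a spread of 5 can produce negative consumption values, which is a limitation of this simulation choice.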

#### **Calculating the two-tailed test**

In [31]:
# two-sample t-test
t_stat, p_val = stats.ttest_ind(arg_beef_samples, bang_beef_samples, alternative='two-sided')


# Printing out the value of t_stat
print("T-statistic:", t_stat)


# Printing out the value of p_val
print("P-value:", p_val)

T-statistic: 46.50314030705993
P-value: 1.3061955812730002e-47
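
To see where the statistic comes from, the pooled-variance t-statistic can be computed by hand and compared with `stats.ttest_ind`, which assumes equal variances by default. This is a sketch on freshly simulated data (seed 0, assumed standard deviation 5), so the numbers will not match the output above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)          # arbitrary seed, for illustration only
a = rng.normal(55.48, 5, 30)            # simulated Argentina sample
b = rng.normal(1.28, 5, 30)             # simulated Bangladesh sample

n_a, n_b = len(a), len(b)
dof = n_a + n_b - 2

# Pooled sample variance (equal-variance assumption).
pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / dof

# t = difference in means divided by its standard error.
t_manual = (a.mean() - b.mean()) / np.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# Two-sided p-value from the t distribution's survival function.
p_manual = 2 * stats.t.sf(abs(t_manual), dof)

t_scipy, p_scipy = stats.ttest_ind(a, b)
print(t_manual, t_scipy)
```

If equal variances cannot be assumed, passing `equal_var=False` to `stats.ttest_ind` runs Welch's t-test instead.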


#### **Interpretation of Hypothesis**

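If p < 0.05, we reject $H_0$.

If p ≥ 0.05, we fail to reject $H_0$.

The two-tailed test gives `p_val` ≈ 1.31e-47, far below `0.05`, so we `reject` the null hypothesis $(H_0)$ and conclude that mean beef consumption in Argentina differs from mean beef consumption in Bangladesh. Because the samples were simulated around each country's reported value with an assumed standard deviation of 5, this conclusion reflects that assumption rather than observed variation in the data.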
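
The decision rule can be written as a small helper. This is a sketch; the function name `decide` and the default `alpha=0.05` are illustrative choices, with the significance level taken from the rule stated in this section.

```python
def decide(p_value, alpha=0.05):
    """Return the hypothesis-test decision for a given p-value and significance level."""
    # Reject only when p is strictly below alpha; otherwise fail to reject.
    return "reject H0" if p_value < alpha else "fail to reject H0"


# p-value reported by the notebook's t-test output
print(decide(1.3061955812730002e-47))
```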