### **Data Ingestion**

In [27]:
# Import the necessary packages
import gdown
import pandas as pd
import numpy as np
import scipy.stats as stat
import seaborn as sns
import matplotlib.pyplot as plt

In [28]:
# load the data

file_id = '1yB5qSBOLl96Y563nIewKOU8RN_gsY3dO'  # Make sure it's a string
gdown.download(f'https://drive.google.com/uc?id={file_id}', 'data.csv', quiet=False)

Downloading...
From: https://drive.google.com/uc?id=1yB5qSBOLl96Y563nIewKOU8RN_gsY3dO
To: c:\Users\ncc\Desktop\task\Week_9\data.csv
100%|██████████| 52.0k/52.0k [00:00<00:00, 47.5MB/s]


'data.csv'

In [29]:
file = pd.read_csv('data.csv')

### **Data Cleaning**

In [30]:
file.isna().sum()

Unnamed: 0       0
country          0
food_category    0
consumption      0
co2_emission     0
dtype: int64

In [31]:
# drop unnecessary columns
file.drop('Unnamed: 0', axis = 1, inplace = True)
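
The `Unnamed: 0` column usually appears when a CSV was saved with its index and an empty header cell. A hedged alternative, assuming that is the case for `data.csv`, is to read that column as the index at load time so nothing has to be dropped afterwards. The in-memory CSV below is a stand-in that reuses two rows displayed in this notebook; it is not the full file.

```python
import io
import pandas as pd

# Stand-in for data.csv: same column layout, two rows taken from this notebook's output
csv_text = """,country,food_category,consumption,co2_emission
2,Argentina,beef,55.48,1712.0
1421,Bangladesh,beef,1.28,39.5
"""

# index_col=0 treats the unnamed first column as the row index
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(df.columns.tolist())
print(df.isna().sum().sum())  # total number of missing values
```

With the real file, the equivalent call would be `pd.read_csv('data.csv', index_col=0)`.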

### **Statistical Analysis:** One-sample t-test

#### **Question**

Is the beef consumption in Argentina significantly different from that in Bangladesh?


- H₀: Mean beef consumption (Argentina) = Mean beef consumption (Bangladesh)

- H₁: Mean beef consumption (Argentina) ≠ Mean beef consumption (Bangladesh)

#### **Solution:**

To determine whether beef consumption in Argentina is significantly different from that in Bangladesh, we compare the mean consumption of the two countries.
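
For reference, a one-sample t-test compares a sample mean with a fixed reference value. The notation below is the standard textbook form and does not come from the notebook: $\bar{x}$ is the sample mean, $s$ the sample standard deviation, $n$ the sample size, and $\mu_0$ the reference value.

$$
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad \text{df} = n - 1
$$

A large $|t|$ relative to the t distribution with $n - 1$ degrees of freedom gives a small p-value, which is evidence against H₀.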


In [32]:
# Select Argentina beef consumption
arg_beef = file[(file["country"]=="Argentina") & (file["food_category"]=="beef")]
arg_beef

     country food_category  consumption  co2_emission
2  Argentina          beef        55.48        1712.0


In [33]:
# Select Bangladesh beef consumption
bang_beef = file[(file['country'] == 'Bangladesh') & (file['food_category'] == 'beef')]
bang_beef

         country food_category  consumption  co2_emission
1421  Bangladesh          beef         1.28          39.5


In [34]:
# Extract the consumption values for both countries
arg_consump = arg_beef["consumption"]
bang_consump = bang_beef['consumption']

In [35]:
# Get the consumption mean of both countries
arg_consump_mean = arg_consump.mean()
bang_consump_mean = bang_consump.mean()

means = [arg_consump_mean, bang_consump_mean]
print(f'The mean consumption for Argentina is: {arg_consump_mean}\nThe mean consumption for Bangladesh is : {bang_consump_mean}')
print(means)

The mean consumption for Argentina is: 55.48
The mean consumption for Bangladesh is : 1.28
[np.float64(55.48), np.float64(1.28)]
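
Each filter above returns a single row, so each "mean" is simply that one value and the data carry no information about spread. A minimal sketch of why a t-test cannot be run directly on one observation; the Series is a stand-in built from the Argentina value printed above.

```python
import pandas as pd

# Stand-in for arg_consump: the single Argentina beef value shown above
single_obs = pd.Series([55.48], name="consumption")

n_obs = len(single_obs)
mean_value = single_obs.mean()
sd_value = single_obs.std()  # sample standard deviation (ddof=1) is undefined for n = 1

print(n_obs, mean_value, sd_value)
```

Because the standard deviation is undefined for a single observation, the t statistic cannot be computed from the raw rows alone, which is presumably why the next cell simulates a sample.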


In [None]:
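# Simulate 30 beef consumption values for Argentina, centered on the observed mean.
# The standard deviation of 5 is an assumed value; no seed is set, so the values change on every run.
rg_beef_samples = np.random.normal(loc=arg_consump_mean, scale=5, size=30)
rg_beef_samples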

array([57.78370376, 60.79162572, 43.71737736, 51.27589084, 57.03403163,
       48.00765905, 53.27769892, 60.63669328, 48.97079492, 61.92798236,
       58.2298846 , 51.87700784, 55.78206889, 55.32808195, 48.45544564,
       59.61449217, 55.0677338 , 67.92614119, 63.5006471 , 57.10472195,
       51.1487727 , 59.65200217, 56.8072683 , 53.23372801, 60.19556237,
       56.17873064, 53.68912798, 57.43836239, 60.22344152, 60.51831921])
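
Since the sample is drawn without a seed, rerunning the cell produces different values and a different t-statistic. A minimal sketch of a reproducible draw using NumPy's `Generator` API; the seed value and the variable names are illustrative choices, and the standard deviation of 5 is the same assumption used in the cell above.

```python
import numpy as np

arg_mean = 55.48   # observed Argentina beef consumption from the dataset
assumed_sd = 5     # assumed spread, matching scale=5 in the cell above

# A fixed seed makes the simulated sample identical on every run
rng = np.random.default_rng(seed=42)
arg_samples = rng.normal(loc=arg_mean, scale=assumed_sd, size=30)

print(arg_samples.shape)
print(arg_samples.mean())
```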

In [None]:
# One-sample t-test of the simulated Argentina sample against a reference mean of 50
t_stat, p_val = stat.ttest_1samp(rg_beef_samples, 50)

In [None]:
print("T-statistic:", t_stat)

T-statistic: 6.533891460294526


In [None]:
print('P-value: ', p_val)



P-value:  3.7235383278763725e-07
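
As a cross-check, the statistic can be recomputed by hand from the 30 simulated values printed above. This is a sketch rather than part of the original analysis; the variable names are illustrative, and the reference value of 50 is the one passed to `ttest_1samp` in the notebook.

```python
import numpy as np
from scipy import stats

# The 30 simulated values copied from the notebook output above
samples = np.array([
    57.78370376, 60.79162572, 43.71737736, 51.27589084, 57.03403163,
    48.00765905, 53.27769892, 60.63669328, 48.97079492, 61.92798236,
    58.2298846, 51.87700784, 55.78206889, 55.32808195, 48.45544564,
    59.61449217, 55.0677338, 67.92614119, 63.5006471, 57.10472195,
    51.1487727, 59.65200217, 56.8072683, 53.23372801, 60.19556237,
    56.17873064, 53.68912798, 57.43836239, 60.22344152, 60.51831921,
])
popmean = 50  # reference value used in the notebook's t-test

n = samples.size
standard_error = samples.std(ddof=1) / np.sqrt(n)  # s / sqrt(n)
t_manual = (samples.mean() - popmean) / standard_error

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

t_scipy, p_scipy = stats.ttest_1samp(samples, popmean)
print(t_manual, t_scipy)
print(p_manual, p_scipy)
```

Both routes should agree with the T-statistic and P-value printed by the notebook, up to the rounding of the displayed sample values.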


#### **Interpretation**

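Since the p-value (≈ 3.72e-07) is less than alpha (0.05), we reject H₀.

Note that the test above compares the simulated Argentina sample with a reference value of 50, not with Bangladesh's observed mean of 1.28. Because the sample mean (about 56) is even further from 1.28 than from 50, the same decision holds: the mean beef consumption in Argentina is not the same as the mean beef consumption in Bangladesh. This conclusion rests on simulated data with an assumed standard deviation of 5, since the dataset contains only one beef observation per country.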
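
To line the test up with the stated hypotheses, the reference value can be set to Bangladesh's observed consumption instead of 50. The sketch below is an illustration under the notebook's own assumptions (a simulated sample of 30 with a standard deviation of 5); the seed and variable names are arbitrary choices, so the exact numbers will not match the notebook's run.

```python
import numpy as np
from scipy import stats

arg_mean = 55.48    # Argentina beef consumption from the dataset
bang_mean = 1.28    # Bangladesh beef consumption from the dataset
assumed_sd = 5      # assumption carried over from the notebook's scale=5

rng = np.random.default_rng(seed=0)  # arbitrary seed for reproducibility
arg_samples = rng.normal(loc=arg_mean, scale=assumed_sd, size=30)

# One-sample t-test: does the simulated Argentina mean differ from 1.28?
t_stat, p_val = stats.ttest_1samp(arg_samples, popmean=bang_mean)

alpha = 0.05
reject_h0 = p_val < alpha
print(f"t = {t_stat:.2f}, p = {p_val:.3g}, reject H0: {reject_h0}")
```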