### **Unlocking Trade Value: An Analytical Approach to Imports, Exports, and Supply Chain Efficiency**
 Student Names: Krishnendu Adhikary and Shefali Pujara

Enrollment Numbers: 055022, 055044

Group Number: 11


### **Description of DATA**

This dataset provides detailed information on international trade transactions, capturing both import and export activities. It includes comprehensive data on various aspects of trade, making it a valuable resource for business analysis, economic research, and financial modeling.

**Data Source:** https://www.kaggle.com/datasets/chakilamvishwas/imports-exports-15000

**Data Size:** 2001 rows × 16 columns

**Data Type:** Cross-sectional

**Data Dimension:** 16 Variables and 2001 Observations

**Data Variable Types:** Text(11), Integer(3), Decimal(2)

**Data Variable Category:** 

Index: Transaction_ID, Invoice_Number, Customs_Code

Nominal Categorical: Country, Product, Import_Export, Category, Port, Shipping_Method, Supplier, Customer

Ordinal Categorical: Payment_Terms

Non-Categorical: Quantity, Value, Weight, Date



### **OBJECTIVES**


This project aims to:

**1. Analyze international trade transactions:**
Focus on countries, products, and shipping methods to understand value distribution.

**2. Identify key drivers of transaction value:**
Correlate product weight, transaction value, and customs codes to reveal patterns in market demand and supplier performance.

**3. Evaluate the cost-effectiveness of logistics:**
Compare shipping methods and ports to optimize logistics and reduce operational costs.

**4. Strengthen supplier relationships:**
Assess supplier performance based on transaction value and payment terms to improve cash flow and negotiate favorable terms.

**5. Enhance supply chain efficiency:**
Provide data-driven insights to maximize profitability through strategic decision-making on logistics, supplier engagement, and financial management.

This structured approach ensures a balanced focus on operational excellence, cost reduction, and financial optimization.

### **PROBLEM STATEMENTS & ANALYSIS**

This section works through 43 problem statements, each answered with basic descriptive, mathematical, or statistical analysis.

(Although there are two separate sections for Observations and Managerial insights in the later part, we have provided answers stating these two components for each of the Problem Statements as well.)

In [394]:
#Importing all required libraries
import os
import pandas as pd
import numpy as np

#Loading the dataset
dataset=pd.read_csv("D:\\MBA materials\\Trimester 1\\DEVP(AMitra)\\Imports_Exports_Dataset.csv")

In [395]:
#Taking the sample dataset
dtsample=dataset.sample(n=2001 , random_state=2244)
dtsample.head(10)

Unnamed: 0,Transaction_ID,Country,Product,Import_Export,Quantity,Value,Date,Category,Port,Customs_Code,Weight,Shipping_Method,Supplier,Customer,Invoice_Number,Payment_Terms
7345,b28b7985-9865-4611-a189-1f6154999009,Germany,woman,Export,4402,2506.89,19-04-2020,Electronics,Lake Ericachester,590842,1862.12,Sea,"Schneider, Hicks and Potter",Taylor Christian,70000162,Net 60
205,6238d0a0-cf5e-4ea3-a258-4ab368777bab,Costa Rica,day,Import,4638,2119.37,25-09-2023,Machinery,Amandatown,856709,1457.77,Land,"Herrera, Chang and Farrell",Brittany Lee,88203466,Prepaid
5146,0d0c61fd-4700-4eb8-b408-850e0fdc26f4,Chile,group,Import,1696,795.01,18-12-2021,Machinery,South Kennethburgh,531811,3899.05,Sea,Pratt Ltd,Thomas Hartman DVM,24718885,Net 30
5540,0c4a3950-345d-4816-80ed-b229b6b54cde,Ghana,want,Import,846,3943.25,20-12-2021,Toys,Tylershire,903558,686.79,Sea,Hughes-Sweeney,Michelle Kirby,93019204,Net 30
4499,f89d4123-1718-490a-80f6-2b31c5b0bb0d,Bhutan,finish,Import,6191,4624.25,09-06-2020,Furniture,Stephanietown,202687,906.43,Sea,Salazar Group,Anthony Wiley,31583917,Prepaid
4775,3c6a7d4a-384f-4130-a288-137e2889d205,Botswana,eat,Export,1215,2537.15,17-04-2021,Machinery,Jonshire,730560,3733.84,Land,"Moore, Blake and Li",Pamela Diaz,96813919,Cash on Delivery
13628,0bdf0b4d-5a1e-4543-8e41-4af89e36eb88,Korea,series,Export,8544,4153.0,24-06-2024,Electronics,North Zachary,140552,1755.24,Land,Patel Group,Sara Erickson,51300134,Cash on Delivery
4111,7d5a1b03-d8ac-4d06-8428-3ad0df038544,Singapore,whom,Import,9844,7277.94,12-04-2024,Electronics,Craighaven,853364,447.79,Air,Morris-Boyd,Rachael Warner,23648012,Net 60
1858,5eabd8f7-1782-44e8-8cb7-71b1e922ba0a,Oman,data,Export,514,5667.46,28-05-2024,Furniture,Hillville,117188,1359.34,Sea,"Barrett, Hughes and Hudson",Austin Lewis,75486163,Net 60
9521,12f671b0-2f22-40b6-89fc-7de9d7e7d52f,Faroe Islands,consumer,Export,2560,9474.11,08-10-2022,Clothing,Port Peterfurt,473488,1236.42,Sea,Arnold Inc,Mr. Thomas Mcbride DVM,60059429,Prepaid


**1. What is the minimum quantity shipped in any transaction?**

In [397]:
Min= dtsample['Quantity'].min()
print('The minimum quantity shipped in any transaction:', Min)

The minimum quantity shipped in any transaction: 1


**o Observation:** The minimum quantity shipped in any transaction is 1 unit.

**o	Managerial Insight:** The company should assess whether low-volume shipments are cost-effective, especially for exports. Small orders may not justify the shipping and operational costs, especially when using costly shipping methods like air freight.


**2. What is the maximum weight of goods shipped across all transactions?**

In [400]:
Max= dtsample['Weight'].max()
print('The maximum weight of goods shipped across all transactions:', Max)

The maximum weight of goods shipped across all transactions: 4995.7


**o	Observation:** The maximum weight is around 4995.7 kg.

**o Managerial Insight:** Understanding the upper limit of shipment weight is essential for logistics planning. Heavy shipments may require special transport arrangements (e.g., sea freight) or contracts with shipping companies to reduce costs.


**3. What is the average value of exports and imports?**

In [403]:
avg= dtsample['Value'].mean()
print('The average value of exports and imports:', avg)

The average value of exports and imports: 5015.6328035982015


**o Observation:** The average transaction value is approximately $5,015.63.

**o Managerial Insight:** The average transaction value helps management assess revenue expectations per transaction. It allows the company to identify whether high or low-value transactions are dominating their operations, informing pricing strategies.

**4. What is the median quantity of products in the transactions?**

In [406]:
med= dtsample['Quantity'].median()
print('The median quantity of products in the transactions:', med)

The median quantity of products in the transactions: 4871.0


**o Observation:** The median quantity is 4,871 units.

**o Managerial Insight:** The median quantity helps in understanding the typical shipment size, helping managers better predict warehouse and transport needs. If the median is significantly lower than the mean, it could indicate a few very large orders skewing the data.

**5. What are the most common transaction values in this dataset?**

In [409]:
mode= dtsample['Value'].mode()
print('The most common transaction values in this dataset:')
print(mode)

The most common transaction values in this dataset:
0     241.83
1    4338.20
2    9295.66
Name: Value, dtype: float64


**o Observation:** The sample has three modes: 241.83, 4338.20, and 9295.66.

**o Managerial Insight:** For a continuous variable such as Value, the mode marks amounts that happen to recur rather than a typical transaction size, so it is of limited use on its own. Recurring amounts can still hint at standard pricing tiers or discounts, which may inform inventory and financial planning.
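Since exact repeats are rare for a continuous variable, a binned view may show more clearly where transaction values concentrate. The sketch below uses a small made-up series rather than `dtsample`, and the 2,500-wide bands are an arbitrary assumption.

```python
import pandas as pd

# Hypothetical transaction values, for illustration only (not taken from the dataset)
values = pd.Series([241.83, 241.83, 1520.10, 4338.20, 4338.20, 4710.55, 4999.00, 9295.66])

# Group values into 2,500-wide bands (the band width is an arbitrary choice)
bins = [0, 2500, 5000, 7500, 10000]
bands = pd.cut(values, bins=bins)

# Count transactions per band and pick the most populated one
band_counts = bands.value_counts().sort_index()
modal_band = band_counts.idxmax()

print(band_counts)
print('Most populated value band:', modal_band)
```

Applied to the notebook's data, `values` would be replaced by `dtsample['Value']`, with band edges chosen to suit the business question.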


**6. What is the 75th percentile of transaction values in this dataset?**

In [412]:
q3= dtsample['Value'].quantile(0.75)
print('The 75th percentile of transaction values in this dataset:', q3)

The 75th percentile of transaction values in this dataset: 7459.23


**o Observation:** The 75th percentile value is $7459.23.

**o Managerial Insight:** This helps to identify the high-value transactions that constitute the top 25% of the dataset. Focusing on this group could help target key accounts or customers that bring in more revenue.
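As a rough illustration of how the top 25% of transactions could be isolated, the sketch below filters rows above the 75th percentile and measures their share of total value. The data frame is made up; only the column names follow the dataset.

```python
import pandas as pd

# Hypothetical mini-sample; column names follow the dataset, values are made up
df = pd.DataFrame({
    'Customer': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
    'Value': [1000.0, 2000.0, 3000.0, 4000.0, 5000.0, 6000.0, 7000.0, 8000.0],
})

# 75th percentile of transaction value
q3 = df['Value'].quantile(0.75)

# Transactions strictly above the 75th percentile
top_quartile = df[df['Value'] > q3]

# Share of total value contributed by these transactions
share = top_quartile['Value'].sum() / df['Value'].sum()

print('Q3:', q3)
print(top_quartile)
print(f'Share of total value from the top quartile: {share:.1%}')
```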

**7. What is the range of product weights in the dataset?**

In [415]:
# Use a name other than `range` so the Python built-in is not shadowed
weight_range = dtsample['Weight'].max() - dtsample['Weight'].min()
print('The range of product weights in the dataset:', weight_range)

The range of product weights in the dataset: 4994.179999999999


**o Observation:** The weight ranges from 1.52 kg to 4995.7 kg.

**o Managerial Insight:** A wide range suggests a diverse product portfolio, and different shipping methods might be more suitable for different products. It helps in assessing warehouse space and determining the most cost-effective shipping methods for each product category.

**8. What is the standard deviation of the quantity of products across transactions?**

In [418]:
sd= dtsample['Quantity'].std()
print('The standard deviation of the quantity of products across transactions:', sd)

The standard deviation of the quantity of products across transactions: 2846.1893173333433


**o Observation:** A high standard deviation indicates a large variation in product quantities across transactions.

**o Managerial Insight:** A high variation in order sizes implies unpredictable demand patterns, making it difficult to plan inventory and logistics. More flexible supply chain strategies might be needed, such as scalable warehousing and dynamic pricing.

**9. Which countries and products together generate the highest revenue?**

In [421]:
country_product_value = dtsample.groupby(['Country', 'Product'])['Value'].sum().sort_values(ascending=False).head(10)
country_product_value

Country               Product
Latvia                look       10481.26
Niger                 argue      10410.22
Tokelau               save        9999.13
Israel                friend      9998.07
Ecuador               kitchen     9995.06
Isle of Man           least       9989.33
Syrian Arab Republic  can         9988.26
Niue                  high        9982.69
Lithuania             young       9976.84
Isle of Man           firm        9972.51
Name: Value, dtype: float64

**o Observation:** The country and product pairs with the highest total value are Latvia (look) at 10,481.26 and Niger (argue) at 10,410.22.

**o Managerial Insight:** It indicates that these products and regions should be prioritized for marketing and supply chain optimization to sustain or increase revenue.

**10. How do product categories perform in terms of import vs. export, and what is their contribution to total value?**

In [424]:
category_import_export_value = dtsample.groupby(['Category', 'Import_Export'])['Value'].sum().sort_values(ascending=False).head(10)
category_import_export_value

Category     Import_Export
Machinery    Import           1175730.04
Furniture    Import           1134090.33
Electronics  Export           1059104.99
Toys         Export           1013269.54
Clothing     Import            996468.45
Toys         Import            947462.72
Furniture    Export            939251.02
Electronics  Import            934850.42
Clothing     Export            933651.20
Machinery    Export            902402.53
Name: Value, dtype: float64

**o Observation:** By contribution to total value, the top import category is Machinery and the top export category is Electronics.

**o Managerial Insight:** Understanding how different product categories perform in imports versus exports can guide strategic decisions on sourcing, market expansion, and supply chain optimization. If certain categories perform better in exports, businesses may focus on strengthening export-related operations for those products.

**11. How does shipping method affect the relationship between weight and value? Are heavier shipments always more valuable?**

In [448]:
shipping_weight_value = dtsample.groupby('Shipping_Method')[['Weight', 'Value']].sum()
shipping_weight_value


Shipping_Method,Weight,Value
Air,1665850.53,3322385.22
Land,1604332.2,3315174.74
Sea,1721334.66,3398721.28


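**o Observation:** Sea has both the highest total weight (1,721,334.66) and the highest total value (3,398,721.28) of the three shipping methods, with Air second and Land third on both measures. These are group totals, which depend on how many shipments each method carries as well as on shipment size, so they do not show whether a heavier individual shipment is more valuable.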

**o Managerial Insight:** This insight can help optimize shipping methods by comparing which methods handle heavy, high-value shipments most effectively. It may reveal opportunities to cut costs by choosing more cost-effective shipping methods for high-weight, low-value shipments.
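To ask whether heavier shipments tend to be more valuable within each shipping method, one option is a per-method correlation between Weight and Value instead of group totals. This is a minimal sketch on made-up data; only the column names follow the dataset.

```python
import pandas as pd

# Hypothetical mini-sample; column names follow the dataset, values are made up
df = pd.DataFrame({
    'Shipping_Method': ['Air', 'Air', 'Air', 'Sea', 'Sea', 'Sea', 'Land', 'Land', 'Land'],
    'Weight': [100.0, 200.0, 300.0, 1000.0, 2000.0, 3000.0, 500.0, 600.0, 700.0],
    'Value': [900.0, 500.0, 100.0, 2000.0, 4000.0, 6000.0, 300.0, 800.0, 300.0],
})

# Pearson correlation between Weight and Value within each shipping method
weight_value_corr = {
    method: group['Weight'].corr(group['Value'])
    for method, group in df.groupby('Shipping_Method')
}
per_method = pd.Series(weight_value_corr, name='weight_value_corr')
print(per_method)
```

In this toy sample the relationship is negative for Air, absent for Land, and positive for Sea, which is the kind of difference that group totals cannot reveal.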

**12. Which ports handle the largest quantities of specific products?**

In [None]:
port_product_quantity = dtsample.groupby(['Port', 'Product'])['Quantity'].sum().sort_values(ascending=False).head(10)
port_product_quantity

**o Observation:** The two port and product pairs with the largest quantities are Port Patriciafort (finish) and Lake Brandonborough (walk).

**o Managerial Insight:** This helps identify ports that are most efficient or popular for handling specific products. Managers can use this information to optimize supply chain routes, choose better ports, and improve delivery efficiency for key products.

**13.  Which suppliers contribute the most to transaction value, and how do payment terms vary across them?**

In [None]:
supplier_payment_value = dtsample.groupby(['Supplier', 'Payment_Terms'])['Value'].sum().sort_values(ascending=False).head(10)
supplier_payment_value

**o Observation:** By total transaction value, the highest-ranked supplier is Hicks PLC, which trades on Net 60 payment terms.

**o Managerial Insight:** Identifying high-value suppliers along with their payment terms helps in negotiating better deals and ensuring favorable terms for the business. If high-value suppliers offer deferred payment terms, managers may need to ensure cash flow is well-managed.

**14. Do heavier shipments from specific countries correlate with higher transaction values?**

In [None]:
country_weight_value = dtsample.groupby('Country')[['Weight', 'Value']].sum().sort_values(by='Value', ascending=False).head(10)
country_weight_value

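**o Observation:** In the country-level totals, Congo has both the highest total value and the highest total weight. Because Congo also has the most transactions, these totals may reflect transaction count rather than a per-shipment link between weight and value.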

**o Managerial Insight:** This analysis can reveal if certain countries consistently send heavier, higher-value shipments. This can influence decisions regarding shipping logistics, such as negotiating better rates or choosing specific suppliers from certain countries.
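Country totals grow with the number of transactions, so it can help to report counts and per-transaction averages alongside the sums. The sketch below does this on made-up data; the country names are reused purely as labels and the figures are not from the dataset.

```python
import pandas as pd

# Hypothetical mini-sample; column names follow the dataset, values are made up
df = pd.DataFrame({
    'Country': ['Congo', 'Congo', 'Congo', 'Congo', 'Oman', 'Oman'],
    'Weight': [100.0, 200.0, 100.0, 200.0, 900.0, 1100.0],
    'Value': [1000.0, 2000.0, 1000.0, 2000.0, 2500.0, 2500.0],
})

# Report count, totals, and per-transaction averages side by side
summary = df.groupby('Country').agg(
    transactions=('Value', 'size'),
    total_weight=('Weight', 'sum'),
    total_value=('Value', 'sum'),
    mean_weight=('Weight', 'mean'),
    mean_value=('Value', 'mean'),
).sort_values('total_value', ascending=False)
print(summary)
```

Here the first country leads on total value only because it has more transactions, while the second has the higher average value per shipment.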

**15. How does the customs code for specific products affect the transaction value?**

In [None]:
product_customs_value = dtsample.groupby(['Product', 'Customs_Code'])['Value'].sum().sort_values(ascending=False).head(10)
product_customs_value

**o Observation:** Ranked by total value in descending order, the top product is save, with customs code 426550.

**o Managerial Insight:**  Analyzing product types along with customs codes can highlight whether certain product groups face higher customs duties, impacting the total value. Managers can use this information to reduce costs by sourcing alternative products or negotiating better terms with customs authorities.

**16. Which categories are most profitable based on the quantity sold, and how does quantity relate to value for different categories?**

In [None]:
# Group by Category, then sum the Quantity and Value
category_quantity_value = dtsample.groupby('Category')[['Quantity', 'Value']].sum().sort_values(by='Value', ascending=False)
category_quantity_value

**o Observation:** The top 5 categories based on Quantity and Value are Machinery, Furniture, Electronics, Toys and Clothing.

**o Managerial Insights:** This can help managers assess the profitability of different product categories and determine if increasing quantity leads to significantly higher revenue. If certain categories perform well in both quantity and value, they may become a focus for expansion efforts.

**17. How many unique transactions are recorded?**

In [None]:
# Count unique transactions
unique_transactions = dtsample['Transaction_ID'].nunique()
unique_transactions

**o Observation:** There are 2001 unique transactions.

**o Managerial Insights:** Understanding the total number of unique transactions helps managers monitor the scale of trade operations over time. A high volume of unique transactions may indicate a high level of business activity.

**18. Which countries are involved the most in transactions, and what is their contribution to the total value?**

In [None]:
# Top countries by number of transactions and their total value contribution
country_count = dtsample['Country'].value_counts().head(10)
country_value = dtsample.groupby('Country')['Value'].sum().sort_values(ascending=False).head(10)
country_count, country_value

**o Observation:** The top 2 countries satisfying both the conditions of having most transactions and highest value contributions are Congo and New Zealand.

**o Managerial Insights:** Identifying the top trading countries can help focus resources on maintaining strong relationships with key markets and optimizing supply chain strategies in those regions.

**19. Which products generate the most revenue, and how does quantity relate to value for different product types?**

In [None]:
# Top products by total value and quantity vs value comparison
top_products_value = dtsample.groupby('Product')['Value'].sum().sort_values(ascending=False).head(10)
quantity_vs_value = dtsample.groupby('Product')[['Quantity', 'Value']].sum()
top_products_value, quantity_vs_value

**o Observation:** The two products that generate the highest revenue are pay and size.

**o Managerial Insights:** Knowing which products are most valuable can help managers prioritize high-margin goods. A mismatch between quantity and value (high quantity, low value) may indicate inefficiencies or potential for optimizing pricing strategies.

**20. Which ports handle the most transactions, and do specific ports correlate with higher transaction values?**

In [None]:
# Top ports by number of transactions and total value
top_ports = dtsample['Port'].value_counts().head(10)
port_value = dtsample.groupby('Port')['Value'].sum().sort_values(ascending=False).head(10)
top_ports, port_value

**o Observation:** The two ports with the highest total transaction values are Michaelmouth and East Sean.

**o Managerial Insights:** Recognizing the most frequently used and highest-value ports helps managers optimize logistics, reduce shipping costs, and improve delivery times by focusing on efficient ports.

**21. How many unique customs codes are there, and do specific customs codes lead to higher transaction values?**

In [None]:
# Number of unique customs codes and their total value contribution
unique_customs_codes = dtsample['Customs_Code'].nunique()
customs_code_value = dtsample.groupby('Customs_Code')['Value'].sum().sort_values(ascending=False).head(10)
unique_customs_codes, customs_code_value

**o Observation:** The top two customs codes by total value are 426550 and 313699.

**o Managerial Insights:** Understanding the diversity of customs codes used and their associated transaction values can assist in compliance management and identifying product types that face higher trade regulations or tariffs.

**22. Which suppliers and customers have the largest transaction values, and how consistent are they?**

In [None]:
# Top suppliers and customers by transaction value
top_suppliers = dtsample.groupby('Supplier')['Value'].sum().sort_values(ascending=False).head(10)
top_customers = dtsample.groupby('Customer')['Value'].sum().sort_values(ascending=False).head(10)
top_suppliers, top_customers

**o Observation:** Based on total transaction value, the top supplier is Smith & Sons and the top customer is John Williams.

**o Managerial Insights:** Understanding which suppliers and customers contribute the most to revenue allows managers to strengthen relationships with key business partners and potentially negotiate better terms or bulk deals.

**23. Is the distribution of the total transaction value skewed?**

In [None]:
skew= dtsample['Value'].skew()
if skew>0:
    print('The value of skewness:', skew)
    print('Yes. The distribution of the total transaction value is positively skewed.')
elif skew==0:
    print('The value of skewness:', skew)
    print('No. The distribution of the total transaction value is not skewed.')
else:
    print('The value of skewness:', skew)
    print('Yes. The distribution of the total transaction value is negatively skewed.')

**o Observation:** If the skewness is positive, most transactions are smaller, with a few large ones driving the overall value. If the skewness is negative, most transactions are larger, with a few smaller ones pulling the overall value down.

**o Managerial Insight:** Skewed distributions suggest that most customers make small orders, with a few placing large orders. Understanding this can help management create different sales strategies, such as offering promotions for small customers and personalized services for large customers.

**24. How peaked or flat is the distribution of the transaction values?**

In [None]:
# pandas returns excess kurtosis (Fisher's definition), for which a normal distribution scores 0
kurt = dtsample['Value'].kurtosis()
print('The value of excess Kurtosis:', kurt)
if kurt > 0:
    print('Type of Kurtosis: Leptokurtic')
elif kurt == 0:
    print('Type of Kurtosis: Mesokurtic')
else:
    print('Type of Kurtosis: Platykurtic')

**o Observation:** A platykurtic distribution (low kurtosis) indicates that transaction values are more spread out, with fewer values close to the mean and a wider range of transactions.

**o Managerial Insight:** A high kurtosis indicates a focus on a specific order size, which can help in standardizing operations. Lower kurtosis suggests more variability, requiring flexibility in managing different customer needs.
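When classifying kurtosis, the reference value depends on the definition in use: pandas' `Series.kurtosis()` reports excess kurtosis (Fisher's definition), so a normal distribution scores about 0 rather than 3. The sketch below illustrates this on synthetic data; the samples and seed are assumptions made for the demonstration.

```python
import numpy as np
import pandas as pd

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
normal_sample = pd.Series(rng.normal(size=100_000))
uniform_sample = pd.Series(rng.uniform(size=100_000))

# pandas reports excess kurtosis, so a normal sample lands near 0
normal_excess = normal_sample.kurtosis()
uniform_excess = uniform_sample.kurtosis()

print('Excess kurtosis, normal sample:', round(normal_excess, 2))
# The theoretical excess kurtosis of a uniform distribution is -1.2
print('Excess kurtosis, uniform sample:', round(uniform_excess, 2))

# Adding 3 converts excess kurtosis to the Pearson scale, where the normal value is 3
print('Pearson kurtosis, normal sample:', round(normal_excess + 3, 2))
```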

**25. Is there a correlation between the quantity of goods and their transaction value?**

In [None]:
corr = dtsample['Quantity'].corr(dtsample['Value'])
if corr>0:
    print('The value of correlation:', corr)
    print('Type of Correlation: Positive')
elif corr==0:
    print('The value of correlation:', corr)
    print('Type of Correlation: No Correlation')
else:
    print('The value of correlation:', corr)
    print('Type of Correlation: Negative')

**o Observation:** A negative correlation would indicate that as the quantity increases, the transaction value tends to decrease.

**o Managerial Insight:** A negative correlation suggests that customers may be purchasing larger quantities at lower prices, potentially due to bulk discounts or lower-cost items. Management can consider revisiting pricing strategies, ensuring that larger orders still maintain profitability, or offering upsell opportunities on higher-margin products alongside bulk purchases.

**26. What is the coefficient of variation for transaction values?**

In [None]:
coef = dtsample['Value'].std() / dtsample['Value'].mean()
print('Coefficient of Variation of Transaction Values:', coef)

**o Observation:** A high coefficient of variation means high variability in transaction values relative to the mean.

**o Managerial Insight:** High variability suggests unpredictable revenue streams, which may require more robust financial planning. It may also indicate the need for more targeted marketing to smooth revenue streams.

**27. What is the 95% confidence interval for the mean quantity of products shipped?**

In [None]:
import scipy.stats as stats

# Calculating the 95% confidence interval for the mean of quantity
confidence_level = 0.95
sample_size = len(dtsample['Quantity'])
sample_mean = dtsample['Quantity'].mean()
sample_std = dtsample['Quantity'].std()

z_critical = stats.norm.ppf(q = 0.975)  # Z-value for 95% confidence interval
margin_of_error = z_critical * (sample_std / (sample_size**0.5))

confidence_interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)
print('95% Confidence Interval for Mean Quantity:' , confidence_interval)

**o Observation:** The 95% confidence interval provides a range around the mean, e.g., between 4803 and 5052 units.

**o Managerial Insight:** This gives management an estimate of the reliability of the mean. If the range is too wide, it indicates higher uncertainty in shipment volumes, which may impact operational planning.
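As a worked check of the interval width, the margin of error can be recomputed from the standard deviation reported in problem 8 and the sample size of 2001. This sketch uses only the Python standard library and assumes the normal approximation used above.

```python
from math import sqrt
from statistics import NormalDist

# Figures reported earlier in this notebook for Quantity
sample_std = 2846.1893173333433   # standard deviation from problem 8
sample_size = 2001                # number of sampled transactions

# Two-sided 95% critical value of the standard normal distribution
z_critical = NormalDist().inv_cdf(0.975)

# Standard error of the mean and the resulting margin of error
standard_error = sample_std / sqrt(sample_size)
margin_of_error = z_critical * standard_error

print('z critical value:', round(z_critical, 4))
print('Standard error:', round(standard_error, 2))
print('Margin of error:', round(margin_of_error, 2))
```

The margin comes to roughly 125 units on either side of the mean, which matches the width of the interval quoted in the observation.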

**28. How many unique countries are involved in these transactions?**

In [None]:
unique_countries = dtsample['Country'].nunique()
print('Number of Unique Countries:' , unique_countries)

**o Observation:** There are 243 unique countries.

**o Managerial Insight:** This helps the company understand its market reach. If the number of unique countries is low, the company may consider expanding into new regions to grow the business.

**29. Which product is traded most frequently?**

In [None]:
most_frequent_product = dtsample['Product'].mode()[0]
print('Most Frequent Product:', most_frequent_product)

**o Observation:** Bank is the most frequently traded product.

**o Managerial Insight:** Knowing the most frequently traded product helps in optimizing inventory and production planning. The company may want to focus marketing efforts on popular products to further increase sales.

**30. What proportion of the transactions are exports vs. imports?**

In [None]:
proportion_exports = (dtsample['Import_Export'] == 'Export').mean()
proportion_imports = (dtsample['Import_Export'] == 'Import').mean()
# Format as percentages with one decimal place
print(f"Proportion of Exports: {proportion_exports:.1%}")
print(f"Proportion of Imports: {proportion_imports:.1%}")

**o Observation:** 48.5% are exports and 51.5% are imports.

**o Managerial Insight:** The split is close to even, with imports slightly ahead (51.5% vs. 48.5%), so the data do not point to a strong trade imbalance. Management could still explore boosting export opportunities through enhanced marketing, competitive pricing, or product adaptation to international markets, and could review import reliance by exploring local sourcing alternatives or optimizing import costs to improve profitability.

**31. Which shipping method was used the least across all transactions?**

In [None]:
least_used_shipping = dtsample['Shipping_Method'].value_counts().idxmin()
print('Least Used Shipping Method:', least_used_shipping)

**o Observation:** Land is the least used shipping method.

**o Managerial Insight:** If certain shipping methods are underutilized, the company might explore cost-benefit analyses for these methods or discontinue them if they aren't efficient. 

**32. Which country has the highest number of transactions?**

In [None]:
country_with_max_transactions = dtsample['Country'].value_counts().idxmax()
print('Country with the Most Transactions:', country_with_max_transactions)

**o Observation:** Congo has the most transactions.

**o Managerial Insight:** Knowing which country has the highest transaction volume helps target marketing and sales efforts. Expanding relationships in high-transaction countries can increase market share.

**33. What is the most common shipping method used for transactions?**

In [None]:
most_common_shipping = dtsample['Shipping_Method'].mode()[0]
print('Most Common Shipping Method:', most_common_shipping)

**o Observation:** Air, the fastest of the three modes, is the most common shipping method.

**o Managerial Insight:** Understanding the most common shipping method helps the company negotiate better rates and improve logistical efficiency for that mode of transport.

**34. Rank the Top 10 countries based on the total value of goods traded.**

In [None]:
ranked_countries = dtsample.groupby('Country')['Value'].sum().sort_values(ascending=False)
top10= ranked_countries.head(10)
print(f"Top 10 Ranked Countries by Total Value: \n{top10}")

**o Observation:** Congo ranks highest, followed by New Zealand.

**o Managerial Insight:** This ranking can guide the allocation of resources. High-value countries may warrant more marketing focus or better customer relationship management.

**35. Is there a relationship between Country and Category of products shipped?**

In [None]:
from scipy.stats import chi2_contingency

# Create a contingency table
contingency_table = pd.crosstab(dtsample['Country'], dtsample['Category'])

# Perform Chi-Square Test of Independence
chi2_stat, p_value, dof, expected = chi2_contingency(contingency_table)

if p_value < 0.05:
    print(f"p-value: {p_value}")
    print("Reject the null hypothesis: There is a significant relationship between country and product category.")
else:
    print(f"p-value: {p_value}")
    print("Fail to reject the null hypothesis: No significant relationship between country and product category.")

**o Observation:** The p-value of 0.191 is above 0.05, so the test finds no significant relationship between country and product category. With 243 countries and 5 categories, however, most cells of the contingency table have very small expected counts, so the chi-square approximation should be treated with caution.

**o Managerial Insight:** A standardized product strategy across countries is feasible, reducing the need for country-specific product adjustments and enabling more efficient global operations.
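A chi-square test of independence relies on reasonably large expected counts, and spreading 2001 transactions over 243 × 5 = 1215 cells leaves an average of about 1.65 per cell. The sketch below shows one way to check how sparse a contingency table is; the data and the threshold of 5 (a common rule of thumb) are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-sample with several sparse countries; values are made up
df = pd.DataFrame({
    'Country': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'C', 'D'],
    'Category': ['Toys', 'Toys', 'Toys', 'Clothing', 'Clothing', 'Clothing',
                 'Toys', 'Clothing', 'Toys', 'Clothing'],
})

observed = pd.crosstab(df['Country'], df['Category'])

# Expected count under independence: row total * column total / grand total
row_totals = observed.sum(axis=1).to_numpy()
col_totals = observed.sum(axis=0).to_numpy()
expected = np.outer(row_totals, col_totals) / observed.to_numpy().sum()

# Share of cells with an expected count below 5
sparse_share = (expected < 5).mean()

print(pd.DataFrame(expected, index=observed.index, columns=observed.columns))
print(f'Cells with expected count below 5: {sparse_share:.0%}')
```

If most cells fall below the threshold, grouping countries into regions before testing is one possible remedy.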

**36. Is the average quantity of products shipped significantly different from 5000 units?**

In [None]:
from scipy import stats

# Define the null hypothesis: mean quantity = 5000
t_stat, p_value = stats.ttest_1samp(dtsample['Quantity'], 5000)

if p_value < 0.05:
    print(f"p-value: {p_value}")
    print("Reject the null hypothesis: The average quantity is significantly different from 5000 units.")
else:
    print(f"p-value: {p_value}")
    print("Fail to reject the null hypothesis: The average quantity is not significantly different from 5000 units.")

**o Observation:** The t-test fails to reject the null hypothesis: the mean quantity is not significantly different from 5000 units.

**o Managerial Insight:** If the mean differs significantly, it could suggest that the company should adjust production and inventory levels based on actual demand rather than assumptions.

**37. Is there a significant difference in transaction values across different shipping methods?**

In [None]:
# Perform one-way ANOVA across different shipping methods
anova_stat, anova_p_value = stats.f_oneway(
    dtsample[dtsample['Shipping_Method'] == 'Sea']['Value'],
    dtsample[dtsample['Shipping_Method'] == 'Land']['Value'],
    dtsample[dtsample['Shipping_Method'] == 'Air']['Value']
)

if anova_p_value < 0.05:
    print(f"p-value: {anova_p_value}")
    print("Reject the null hypothesis: There is a significant difference in transaction values across shipping methods.")
else:
    print(f"p-value: {anova_p_value}")
    print("Fail to reject the null hypothesis: No significant difference in transaction values across shipping methods.")

**o Observation:** The ANOVA fails to reject the null hypothesis: no significant difference in transaction values is seen across shipping methods.

**o Managerial Insight:** If shipping methods influence transaction values, optimizing or selecting the right shipping method could increase profitability.

**38. Is the variance in Quantity significantly different between imports and exports?**

In [None]:
from scipy.stats import levene

# Separate data into imports and exports
import_data = dtsample[dtsample['Import_Export'] == 'Import']['Quantity']
export_data = dtsample[dtsample['Import_Export'] == 'Export']['Quantity']

# Perform Levene's test for equal variances
levene_stat, p_value = levene(import_data, export_data)

# Evaluate result
if p_value < 0.05:
    print(f"The p-value is {p_value}. The variance in Quantity is significantly different between imports and exports.")
else:
    print(f"The p-value is {p_value}. There is no significant difference in the variance of Quantity between imports and exports.")

**o Observation:** If the test shows a significant difference, it indicates higher variability in one group (e.g., exports). In this case, there is no significant difference observed.

**o Managerial Insight:** The company may need different strategies for managing imports and exports, like separate inventory management systems.
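A minimal sketch of how the variance comparison could be reported together with the sample variances themselves, so that the direction of any difference is visible. The two arrays are synthetic stand-ins for the import and export Quantity columns, and their values are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import levene

# Synthetic stand-ins for import and export quantities (illustrative only)
rng = np.random.default_rng(1)
import_qty = rng.integers(1, 10_000, size=200)
export_qty = rng.integers(1, 10_000, size=200)

# Sample variances show which group, if any, is more variable
var_import = np.var(import_qty, ddof=1)
var_export = np.var(export_qty, ddof=1)

# center="median" (SciPy's default) is the Brown-Forsythe variant, robust to skew
stat, p_value = levene(import_qty, export_qty, center="median")

print(f"Import variance: {var_import:.1f}, Export variance: {var_export:.1f}")
print(f"Levene statistic: {stat:.3f}, p-value: {p_value:.4f}")
```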

**39. Is there a significant difference in the proportion of imports vs. exports?**

In [None]:
from statsmodels.stats.proportion import proportions_ztest

# Count imports and exports (complementary shares of the same sample)
import_count = (dtsample['Import_Export'] == 'Import').sum()
export_count = (dtsample['Import_Export'] == 'Export').sum()
total_count = len(dtsample)

# One-sample z-test: does the import share differ from 50%?
# (A two-sample test would wrongly treat the two shares as independent samples.)
stat, p_value = proportions_ztest(import_count, total_count, value=0.5)

# Evaluate result
if p_value < 0.05:
    print(f"The p-value is {p_value}. There is a significant difference in the proportion of imports vs. exports.")
else:
    print(f"The p-value is {p_value}. There is no significant difference in the proportion of imports vs. exports.")

**o Observation:** A significant result would indicate that one direction of trade (for example, exports) dominates. In our case, there is no significant difference.

**o Managerial Insight:** A significantly higher export proportion might indicate untapped potential for imports, or a focus on growing international markets.
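Imports and exports are complementary shares of the same sample, so the question can be framed as a one-sample test of whether the import share differs from 50%. The sketch below works through that z-test by hand using only the standard library. The counts are hypothetical and chosen for illustration, and the standard error uses the null proportion, so the result can differ slightly from a library routine that uses the sample proportion instead.

```python
import math

# Hypothetical counts for illustration (not taken from the dataset)
import_count = 1030
export_count = 970
n = import_count + export_count

p_hat = import_count / n           # observed import share
p0 = 0.5                           # null hypothesis: equal shares
se = math.sqrt(p0 * (1 - p0) / n)  # standard error under the null
z = (p_hat - p0) / se

# Two-sided p-value from the standard normal distribution
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"Import share: {p_hat:.3f}, z = {z:.3f}, p-value = {p_value:.4f}")
```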

**40. Is the correlation between Quantity and Value significant?**

In [None]:
from scipy.stats import pearsonr

# Calculate Pearson correlation between Quantity and Value
correlation, p_value = pearsonr(dtsample['Quantity'], dtsample['Value'])

# Evaluate result
if p_value < 0.05:
    print(f"The p-value is {p_value}. The correlation between Quantity and Value is significant.")
else:
    print(f"The p-value is {p_value}. The correlation between Quantity and Value is not significant.")

**o Observation:** A significant positive correlation would suggest that transaction value increases as quantity increases. Here, there is no significant correlation between Quantity and Value.

**o Managerial Insight:** The company could incentivize bulk purchases, offering discounts for larger quantities, as it aligns with higher transaction values.
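Pearson's coefficient measures linear association only. As a hedged sketch, Spearman's rank correlation can be computed alongside it to check for a monotonic relationship that is not linear. The arrays below are synthetic stand-ins for the Quantity and Value columns and are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Synthetic stand-ins for Quantity and Value (illustrative only)
rng = np.random.default_rng(2)
quantity = rng.integers(1, 10_000, size=300)
value = rng.uniform(100, 10_000, size=300)

r, r_p = pearsonr(quantity, value)       # linear association
rho, rho_p = spearmanr(quantity, value)  # monotonic (rank-based) association

print(f"Pearson r = {r:.3f} (p = {r_p:.4f})")
print(f"Spearman rho = {rho:.3f} (p = {rho_p:.4f})")
```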

**41. Is the distribution of Value normal?**

In [None]:
from scipy.stats import shapiro

# Perform Shapiro-Wilk test for normality on Value
stat, p_value = shapiro(dtsample['Value'])

# Evaluate result
if p_value < 0.05:
    print(f"The p-value is {p_value}. The distribution of Value is not normal.")
else:
    print(f"The p-value is {p_value}. There is no evidence that the distribution of Value departs from normality.")

**o Observation:** If the data is non-normal, the company might need to use non-parametric methods for analysis.

**o Managerial Insight:** This insight ensures accurate decision-making based on appropriate statistical methods, guiding strategy formulation based on reliable data.
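When a normality test rejects, it helps to see how skewed the data are and whether a log transform brings them closer to normal. The sketch below illustrates this on synthetic, right-skewed data. The lognormal sample is an assumption chosen for illustration and says nothing about the actual distribution of Value.

```python
import numpy as np
from scipy.stats import shapiro, skew

# Synthetic right-skewed values (illustrative only)
rng = np.random.default_rng(3)
value = rng.lognormal(mean=8.0, sigma=1.0, size=500)

raw_skew = skew(value)
log_skew = skew(np.log(value))

raw_stat, raw_p = shapiro(value)          # test on the raw scale
log_stat, log_p = shapiro(np.log(value))  # test after a log transform

print(f"Raw: skewness = {raw_skew:.2f}, Shapiro p-value = {raw_p:.4g}")
print(f"Log: skewness = {log_skew:.2f}, Shapiro p-value = {log_p:.4g}")
```

If the log-transformed values look closer to normal, parametric tests can be run on that scale; otherwise rank-based methods are the safer choice.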

**42. How well can we predict Value based on Quantity and Weight?**

In [None]:
import statsmodels.api as sm

# Define dependent (y) and independent variables (X)
X = dtsample[['Quantity', 'Weight']]
y = dtsample['Value']

# Add constant to the model (intercept)
X = sm.add_constant(X)

# Fit the linear regression model
model = sm.OLS(y, X).fit()

# Evaluate result based on p-values
if model.pvalues['Quantity'] < 0.05 and model.pvalues['Weight'] < 0.05:
    print("Both Quantity and Weight are significant predictors of Value.")
elif model.pvalues['Quantity'] < 0.05:
    print("Only Quantity is a significant predictor of Value.")
elif model.pvalues['Weight'] < 0.05:
    print("Only Weight is a significant predictor of Value.")
else:
    print("Neither Quantity nor Weight are significant predictors of Value.")

**o Observation:** A significant regression model would explain a meaningful share of the variance in Value (for example, 70%). Here, the p-values of both Quantity and Weight are greater than 0.05, so neither Quantity nor Weight is a significant predictor of Value.

**o Managerial Insight:** Predicting transaction values based on key variables helps in pricing decisions and optimizing sales strategies for high-value transactions.
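The question of how well the model predicts is answered by R-squared, the share of variance in Value that the fitted model explains, rather than by the coefficient p-values alone. The sketch below computes it from its definition with NumPy on synthetic data. The three arrays are generated independently of each other as an illustrative assumption, so a value near zero is expected by construction. A fitted statsmodels OLS result reports the same quantity as `rsquared`.

```python
import numpy as np

# Synthetic stand-ins for the three columns (illustrative only)
rng = np.random.default_rng(4)
n = 500
quantity = rng.uniform(1, 10_000, size=n)
weight = rng.uniform(1, 5_000, size=n)
value = rng.uniform(100, 10_000, size=n)  # independent of the predictors

# Design matrix with an intercept column, then ordinary least squares
X = np.column_stack([np.ones(n), quantity, weight])
coef, *_ = np.linalg.lstsq(X, value, rcond=None)

# R-squared = 1 - (residual sum of squares / total sum of squares)
fitted = X @ coef
ss_res = np.sum((value - fitted) ** 2)
ss_tot = np.sum((value - value.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"R-squared: {r_squared:.4f}")
```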

**43. Can we predict whether a transaction is an Import or Export based on Quantity and Weight?**

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Convert 'Import_Export' to binary (0 = Import, 1 = Export)
dtsample['Import_Export_binary'] = dtsample['Import_Export'].map({'Import': 0, 'Export': 1})

# Define independent (X) and dependent variables (y)
X = dtsample[['Quantity', 'Weight']]
y = dtsample['Import_Export_binary']

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the logistic regression model
logreg = LogisticRegression()
logreg.fit(X_train, y_train)

# Make predictions
y_pred = logreg.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

# Evaluate result based on accuracy
if accuracy > 0.7:
    print(f"The logistic regression model predicts Import/Export with {accuracy*100:.2f}% accuracy, indicating a good model.")
else:
    print(f"The logistic regression model has an accuracy of {accuracy*100:.2f}%, indicating it may not be a strong predictor.")

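**o Observation:** The logistic regression model has an accuracy of 49.63%, close to the roughly 50% expected from guessing on a near-balanced Import/Export split, so Quantity and Weight alone are not strong predictors of trade direction.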

**o Managerial Insight:** Understanding the factors driving import or export transactions allows the company to forecast future business and tailor logistical strategies accordingly.
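An accuracy figure is easier to judge against a baseline. The sketch below computes the accuracy of always predicting the most common class, which any useful classifier should beat. The label counts are hypothetical and chosen for illustration.

```python
from collections import Counter

# Hypothetical test-set labels (0 = Import, 1 = Export), for illustration only
y_test = [0] * 103 + [1] * 97

# Accuracy obtained by always predicting the most frequent class
counts = Counter(y_test)
majority_class, majority_count = counts.most_common(1)[0]
baseline_accuracy = majority_count / len(y_test)

print(f"Majority class: {majority_class}")
print(f"Baseline accuracy: {baseline_accuracy * 100:.2f}%")
```

A model whose accuracy sits at or below this baseline adds no predictive value over guessing the majority class.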

### **OBSERVATIONS**

*(In this section, some important Observations are highlighted from all of the above.)*

1. The top supplier and top customer by total transaction value are Smith & Sons and John Williams, respectively.

2. 48.5% of transactions are exports, and 51.5% are imports.

3. The top import category is Machinery, and the top export category is Electronics based on value.

4. There are 2001 unique transactions recorded.

5. In the regression of Value on Quantity and Weight, both p-values exceed 0.05, so neither Quantity nor Weight is a significant predictor of Value.

6. Top products by value are "pay" and "size."

7. The top two customs codes by value are 426550 and 313699.

8. The top 5 categories by Quantity and Value are Machinery, Furniture, Electronics, Toys, and Clothing.

9. The top two ports handling the largest quantities of specific products are Port Patriciafort (product: finish) and Lake Brandonborough (product: walk).

10. The top two countries with the most transactions and the highest value contributions are Congo and New Zealand.

11. Exports from Latvia (product: look) and Niger (product: argue) show high transaction values.

### **Managerial Insights**

*(In this section, some important Managerial Insights are highlighted from all of the above.)*

1. Understanding key suppliers and customers helps managers strengthen business relationships and negotiate better terms or bulk deals.

2. A slight trade imbalance suggests exploring opportunities to boost exports through better marketing, pricing, or product adaptation, while assessing high import reliance.

3. Knowing which product categories perform well in imports versus exports guides sourcing, market expansion, and supply chain decisions.

4. Monitoring unique transactions over time helps track the scale of trade operations.

5. Predicting transaction values aids in pricing decisions and optimizing sales strategies.

6. Identifying high-margin products helps managers prioritize profitable goods. A mismatch between quantity and value can highlight inefficiencies in pricing strategies.

7. Awareness of customs codes and their associated values supports compliance management and understanding of product regulations.

8. Understanding port efficiency can help in optimizing logistics and focusing on specific products.

9. Maintaining strong relationships with key trading countries helps optimize supply chain strategies.

10. High-value exports from specific countries (e.g., Latvia, Niger) should be prioritized for supply chain optimization and marketing efforts.

### **CONCLUSION**

This project provides valuable insights into global trade dynamics, with 2001 transactions analyzed across key product categories, shipping methods, and countries. The data reveals a slight import dominance, with Machinery and Electronics leading in value. A few large transactions drive the overall trade value, highlighting the importance of focusing on high-margin deals.

Regression analysis found that neither Quantity nor Weight significantly predicts Value, and the skewed distribution of transaction values calls for non-parametric methods. Key ports and shipping methods were identified as critical to improving logistics efficiency.

The managerial insights emphasize strengthening key supplier and customer relationships, optimizing shipping based on weight-to-value ratios, and focusing on high-value products and markets. This data-driven approach enables better trade strategies, enhances profitability, and optimizes supply chain operations.