# Data Visualization (Dashboard) Project 2

#### IMPORT/EXPORT ANALYSIS OF VARIOUS PRODUCTS

##### Submitted to Prof. Amarnath Mitra by Dimple (055009) 

This project aims to derive relevant managerial insights from the given dataset through thorough data visualization using Python.

##### Data Source

dataset_source = "https://www.kaggle.com/datasets/chakilamvishwas/imports-exports-15000"


## Description of the Data
This dataset provides detailed information on international trade transactions, capturing both import and export activities. It includes comprehensive data on various aspects of trade, making it a valuable resource for business analysis, economic research, and financial modeling.

Features:

- Transaction_ID: Unique identifier for each trade transaction.
- Country: Country of origin or destination for the trade.
- Product: Product being traded.
- Import_Export: Indicates whether the transaction is an import or export.
- Quantity: Amount of the product traded.
- Value: Monetary value of the product in USD.
- Date: Date of the transaction.
- Category: Category of the product (e.g., Electronics, Clothing, Machinery).
- Port: Port of entry or departure.
- Customs_Code: Customs or HS code for product classification.
- Weight: Weight of the product in kilograms.
- Shipping_Method: Method used for shipping (e.g., Air, Sea, Land).
- Supplier: Name of the supplier or manufacturer.
- Customer: Name of the customer or recipient.
- Invoice_Number: Unique invoice number for the transaction.
- Payment_Terms: Terms of payment (e.g., Net 30, Net 60, Cash on Delivery).

Usage: This dataset can be used for various analyses, including:

- Trade Analysis: Understanding trade patterns and trends.
- Economic Research: Studying the impact of trade on different economies.
- Financial Modeling: Evaluating financial performance and risks associated with trade.
- Supply Chain Management: Analyzing logistics and shipping methods.

Data Format: The dataset is provided in CSV format, making it easy to load into data analysis tools and frameworks.

Additional Notes:

- The dataset consists of 15,000 rows, ensuring a substantial sample size for meaningful analysis.
- Data is generated with realistic variability, simulating actual trade transactions.

## Project Objectives and Problem Statements:

In [45]:
#1. What is the total value of transactions by country?
#2. Which product categories contribute the most to transaction value?
#3. How has the transaction value changed over time?
#4. What is the average transaction value by country?
#5. What is the average transaction value by category?
#6. How does shipping method affect the transaction value?
#7. Which payment terms account for the highest number of transactions?
#8. Which ports are handling the highest number of transactions?
#9. Who are the top suppliers and customers based on transaction value?
#10. How does weight correlate with transaction value?
#11. What is the transaction frequency by product category? 
#12. How is the import/export split by country?
#13. Which country has the highest export value?
#14. How does transaction value vary across months?
#15. What are the most frequently used customs codes?
#16. Does the port of origin/destination affect transaction value?
#17. What is the correlation between the data points?

# Observations
The dataset offers a comprehensive view of the company's import and export activities, highlighting key trends, customer behaviors, and operational dynamics. Below is a synthesized analysis based on the insights derived from various queries:

**1. Trade Balance**

**Imports vs. Exports:** The dataset shows a nearly even split between imports and exports: approximately 50.12% of transactions are imports and the remaining 49.88% are exports. This indicates that the company has a robust trading strategy, leveraging both foreign and local markets to maximize opportunities.
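
The split reported here can be reproduced with a normalized value count. The following is a minimal sketch, assuming a small hypothetical DataFrame `demo` in place of `my_sample`; only the column name `Import_Export` comes from the dataset, and the rows are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for my_sample; the rows are illustrative only.
demo = pd.DataFrame(
    {"Import_Export": ["Import", "Export", "Import", "Export", "Import"]}
)

# Share of each transaction type as a percentage of all rows.
trade_share = (
    demo["Import_Export"].value_counts(normalize=True).mul(100).round(2)
)
print(trade_share)
```

Running the same two lines on `my_sample` should give the import and export percentages for the actual sample.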

**2. Product Preferences**

**Popular Categories:** The largest imported category is Furniture, while Electronics leads in exports. This insight suggests a clear differentiation in product offerings based on market demand. The focus on high-demand categories reflects effective market alignment but also raises questions about potential over-reliance on specific segments.

**Most Common Product:** The most frequently traded product is "hotel", which may indicate a specialized market segment that could be explored or expanded further.

**3. Customer Behavior and Payment Preferences**

**Payment Terms:** The predominance of Cash on Delivery (COD) as the preferred payment method reflects customer preferences for security and trust in transactions. This could impact cash flow management and operational efficiency, suggesting a need to explore alternative payment options.

**Customer Segmentation:** The identification of Richard Allen as the top customer and the frequency of transactions with Korea highlight the importance of understanding customer behaviors and preferences.

**4. Geographical Insights**

**Country Engagement:** Korea is the most engaged country in terms of transaction volume, while Chad ranks as the least engaged. This discrepancy underscores the importance of focusing efforts on strong markets while exploring opportunities in underrepresented regions.

**Export and Import Ports:** Michaelmouth is the most common port for exports, while Lake Michael is the primary port for imports. Understanding these logistical hubs can help optimize supply chain strategies and enhance operational efficiencies.

**5. Financial Considerations**

**Transaction Values:** The total values of imports and exports are relatively close, with $5,209,639.50 in imports and $4,953,436.43 in exports. The largest single transaction reaching nearly $10,000 points to significant sales opportunities but also highlights the need for robust cash flow management strategies.

**Average Weight of Imports:** The average weight of imported goods, 2,532.47 kg, suggests that heavy items may predominate in trade. This insight can guide inventory management and logistics planning.
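
Figures of this kind (total value per direction, largest single transaction, average weight) can be obtained from one grouped aggregation. This is a rough sketch under the assumption of a tiny hypothetical DataFrame `demo` standing in for `my_sample`; the column names match the dataset, but the numbers are invented and the names `summary` and `largest_transaction` are illustrative.

```python
import pandas as pd

# Hypothetical stand-in for my_sample; values are illustrative only.
demo = pd.DataFrame(
    {
        "Import_Export": ["Import", "Export", "Import", "Export"],
        "Value": [100.0, 250.0, 300.0, 50.0],
        "Weight": [10.0, 20.0, 30.0, 40.0],
    }
)

# Total value and average weight for imports and exports separately.
summary = demo.groupby("Import_Export").agg(
    total_value=("Value", "sum"),
    avg_weight=("Weight", "mean"),
)

# Largest single transaction across both directions.
largest_transaction = demo["Value"].max()

print(summary)
print(largest_transaction)
```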

**6. Operational Considerations**

**Logistics and Supply Chain:** The high volume of air imports (1,714,941 units) indicates a reliance on fast shipping methods, potentially for high-demand products. This reliance could lead to higher shipping costs and needs evaluation for cost-effective alternatives.
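
A unit count per shipping method for imports can be sketched as a filter followed by a grouped sum. The DataFrame `demo` below is a hypothetical stand-in for `my_sample` with invented rows; only the column names are taken from the dataset.

```python
import pandas as pd

# Hypothetical stand-in for my_sample; rows are illustrative only.
demo = pd.DataFrame(
    {
        "Import_Export": ["Import", "Import", "Export", "Import"],
        "Shipping_Method": ["Air", "Sea", "Air", "Air"],
        "Quantity": [100, 40, 70, 25],
    }
)

# Keep import rows only, then total the units shipped by each method.
imports = demo[demo["Import_Export"] == "Import"]
units_by_method = imports.groupby("Shipping_Method")["Quantity"].sum()

# Units imported by air.
air_import_units = units_by_method["Air"]
print(units_by_method)
```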

**Common Customs Code:** The prominence of customs code 239621 suggests a significant product category that may require focused compliance and risk management strategies.

# Strategic Recommendations
Based on the observations drawn from the dataset, here are several strategic recommendations:

**Diversification of Supplier and Market Base:**
Enhance relationships with suppliers in high-performing markets like Korea while exploring underrepresented regions such as Chad for potential growth opportunities.

**Cash Flow Management Improvements:**
Consider promoting alternative payment methods to reduce reliance on COD, thus improving cash flow and operational efficiency.

**Focused Marketing Strategies:**
Tailor marketing efforts to high-demand product categories, particularly in Furniture and Electronics, while analyzing customer preferences in the hotel segment to further strengthen market positioning.

**Logistics Optimization:**
Optimize logistics and supply chain management by leveraging insights from common ports and shipping methods, ensuring cost-effective operations without sacrificing service quality.

**Risk Management Framework:**
Establish a risk management framework focused on compliance with customs regulations and payment term complexities to ensure smooth operations across international markets.

**Market Analysis for Expansion:**
Conduct thorough market analysis in potential growth regions to identify barriers and opportunities, enhancing the overall market footprint.


##### Importing Relevant Libraries

In [46]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import streamlit as st

##### Loading the Project's Data

In [47]:
file_path = "C:\\Users\\batra\\Downloads\\dimpi\\Imports_Exports_Dataset.csv"  # Path to the project dataset (CSV)
my_project = pd.read_csv(file_path)  # read_csv already returns a DataFrame
my_project

Unnamed: 0,Transaction_ID,Country,Product,Import_Export,Quantity,Value,Date,Category,Port,Customs_Code,Weight,Shipping_Method,Supplier,Customer,Invoice_Number,Payment_Terms
0,e3e70682-c209-4cac-a29f-6fbed82c07cd,Colombia,describe,Export,1979,9506.57,07-12-2023,Machinery,Robertbury,620537,4248.65,Air,"Garrison, Hubbard and Hendricks",Seth Hall,21000294,Cash on Delivery
1,f728b4fa-4248-4e3a-8a5d-2f346baa9455,Chile,president,Export,5763,7100.91,04-04-2023,Clothing,Shahport,927600,4340.81,Air,Webb-Mack,Kimberly Ryan,88738432,Prepaid
2,eb1167b3-67a9-4378-bc65-c1e582e2e662,Turkey,far,Import,5740,2450.34,21-08-2024,Electronics,South Joshuatown,299258,4538.41,Air,"Mendez, Jones and Johnson",Ryan Silva,89922099,Prepaid
3,f7c1bd87-4da5-4709-9471-3d60c8a70639,Christmas Island,agency,Export,2592,7226.42,09-05-2022,Furniture,Adamfort,996084,4886.14,Air,Schroeder-Smith,Jacob Gray,63216265,Net 60
4,e443df78-9558-467f-9ba9-1faf7a024204,Finland,policy,Export,2622,2817.29,03-04-2023,Electronics,Juliebury,555981,4406.74,Air,Zimmerman LLC,Amy Stephens,94600248,Cash on Delivery
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
14995,48df15a8-0823-4964-8c16-eddf2756f382,Marshall Islands,not,Export,2860,2055.19,09-07-2024,Furniture,South Karenfort,393463,4120.35,Land,Smith-Lewis,Darlene Davis,29605073,Net 60
14996,31106617-94a6-4646-a001-5e7bd45abc26,Bermuda,air,Export,2443,6407.06,18-06-2024,Furniture,Jeffreyside,484143,1832.71,Air,Jones Group,John Ramos,39044695,Cash on Delivery
14997,ee485839-fbde-4ced-af18-d98f5e863081,Tanzania,show,Export,1702,9918.29,30-04-2020,Toys,North Meganborough,354935,4203.52,Land,Barnes-Romero,Rebecca Phelps,78492040,Net 30
14998,5acd54aa-ec8c-4055-be8b-a447861a471c,Tuvalu,TV,Export,8108,9288.57,29-04-2021,Clothing,Villafurt,234296,1597.72,Land,"Smith, Allison and Bennett",Scott Yates,20799602,Cash on Delivery


In [48]:
my_project.shape

(15000, 16)

##### Creating a Unique Sample of 3001 Records using Student Roll Number as Random State.

In [49]:
my_sample=my_project.sample(n=3001, random_state=55009) 
rows, columns = my_sample.shape
print(f"Number of rows: {rows}, Number of columns: {columns}")

Number of rows: 3001, Number of columns: 16


##### Data Type:

In [50]:
my_sample.dtypes

Transaction_ID      object
Country             object
Product             object
Import_Export       object
Quantity             int64
Value              float64
Date                object
Category            object
Port                object
Customs_Code         int64
Weight             float64
Shipping_Method     object
Supplier            object
Customer            object
Invoice_Number       int64
Payment_Terms       object
dtype: object

##### Data Dimension:

In [51]:
my_sample.shape

(3001, 16)

##### Data Variable Type:

In [52]:
my_sample.info()  # info is a method; without parentheses only the bound-method repr is displayed

##### Sample Information:

In [53]:
#Display the First 03 Records of the Sample Data.
my_sample.head(3)

Unnamed: 0,Transaction_ID,Country,Product,Import_Export,Quantity,Value,Date,Category,Port,Customs_Code,Weight,Shipping_Method,Supplier,Customer,Invoice_Number,Payment_Terms
5803,087d9b9c-202c-440d-bbfe-de43b337ed27,Northern Mariana Islands,crime,Import,4969,2116.77,03-03-2020,Electronics,Harristown,860509,4730.89,Air,"Gonzalez, Miller and Yates",William Best,96061230,Cash on Delivery
11118,01a67882-f685-4a4a-914e-611199031ed2,Chile,PM,Export,6882,7945.13,08-03-2020,Electronics,South Sarahtown,276664,1875.84,Sea,"King, Cuevas and Evans",Karen Anderson,7874322,Net 30
6494,8f9136af-ce07-4b1e-8cee-420cc95bc7ae,Ireland,someone,Export,2067,1963.88,21-08-2023,Furniture,Robinsonhaven,985683,1198.32,Sea,Jordan PLC,Gary Spears,78942059,Cash on Delivery


In [54]:
#Obtain the column names of the pandas DataFrame
my_sample.columns.tolist()

['Transaction_ID',
 'Country',
 'Product',
 'Import_Export',
 'Quantity',
 'Value',
 'Date',
 'Category',
 'Port',
 'Customs_Code',
 'Weight',
 'Shipping_Method',
 'Supplier',
 'Customer',
 'Invoice_Number',
 'Payment_Terms']

##### Data Variable Category:

In [55]:
#Identify & Display the List of the following Variables:
# 1. Index Variables[Transaction_ID]
# 2. Categorical Variables - Nominal Type [Country, Product, Import_Export, Category, Port, Customs_Code, Shipping_Method, Supplier, Customer, Invoice_Number]
# 3. Categorical Variables - Ordinal Type. [Payment_Terms]
# 4. Non-Categorical Variables [Quantity, Value, Date, Weight]
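
The classification in the comments above can also be written as explicit lists, which makes it easy to check that every column is assigned to exactly one group. This is a minimal sketch; the list names (`index_vars`, `nominal_vars`, `ordinal_vars`, `non_categorical_vars`) are illustrative, and `all_columns` is copied from the column listing shown earlier rather than read from the DataFrame.

```python
# Variable groups, following the classification stated in the comments above.
index_vars = ["Transaction_ID"]
nominal_vars = [
    "Country", "Product", "Import_Export", "Category", "Port",
    "Customs_Code", "Shipping_Method", "Supplier", "Customer",
    "Invoice_Number",
]
ordinal_vars = ["Payment_Terms"]
non_categorical_vars = ["Quantity", "Value", "Date", "Weight"]

# Column names as listed by my_sample.columns.tolist().
all_columns = [
    "Transaction_ID", "Country", "Product", "Import_Export", "Quantity",
    "Value", "Date", "Category", "Port", "Customs_Code", "Weight",
    "Shipping_Method", "Supplier", "Customer", "Invoice_Number",
    "Payment_Terms",
]

# Sanity check: every column should fall into exactly one group.
classified = index_vars + nominal_vars + ordinal_vars + non_categorical_vars
unclassified = sorted(set(all_columns) - set(classified))

print("Index:", index_vars)
print("Nominal:", nominal_vars)
print("Ordinal:", ordinal_vars)
print("Non-categorical:", non_categorical_vars)
print("Unclassified:", unclassified)
```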

#### Subsetting Categorical & Non-Categorical Variables and Finding Descriptive Statistics of Both

In [56]:
#Subset and Display the Non-Categorical Variables.
non_categorical = my_sample[['Quantity', 'Value', 'Date', 'Weight']]
non_categorical

Unnamed: 0,Quantity,Value,Date,Weight
5803,4969,2116.77,03-03-2020,4730.89
11118,6882,7945.13,08-03-2020,1875.84
6494,2067,1963.88,21-08-2023,1198.32
6230,132,6951.83,18-04-2021,2469.92
6482,7688,1287.15,17-07-2023,3465.36
...,...,...,...,...
10769,2235,4981.43,01-06-2023,3952.41
2779,5422,6449.47,14-05-2023,1971.12
3987,9020,9835.71,25-09-2023,775.43
6264,8589,1597.58,27-04-2023,3599.34


In [57]:
#Display the Descriptive Statistics of the Non-Categorical Subset.
non_categorical.describe()

Unnamed: 0,Quantity,Value,Weight
count,3001.0,3001.0,3001.0
mean,4965.52949,5001.621586,2456.484319
std,2818.092517,2861.995494,1447.73689
min,9.0,104.49,1.64
25%,2606.0,2471.43,1206.15
50%,4925.0,5004.07,2416.9
75%,7273.0,7463.4,3707.36
max,9988.0,9995.06,4998.73


In [58]:
#Subset and Display the Categorical Variables.
my_categorical = my_sample[['Country', 'Product', 'Import_Export', 'Category', 'Port', 'Customs_Code', 'Shipping_Method', 'Supplier', 'Customer', 'Invoice_Number']]
my_categorical

Unnamed: 0,Country,Product,Import_Export,Category,Port,Customs_Code,Shipping_Method,Supplier,Customer,Invoice_Number
5803,Northern Mariana Islands,crime,Import,Electronics,Harristown,860509,Air,"Gonzalez, Miller and Yates",William Best,96061230
11118,Chile,PM,Export,Electronics,South Sarahtown,276664,Sea,"King, Cuevas and Evans",Karen Anderson,7874322
6494,Ireland,someone,Export,Furniture,Robinsonhaven,985683,Sea,Jordan PLC,Gary Spears,78942059
6230,Sudan,dark,Import,Toys,Veronicahaven,954611,Land,Jackson PLC,Marissa Murphy,66821650
6482,Israel,stand,Import,Clothing,Debbieside,270601,Sea,"Byrd, Tran and Mitchell",John Payne,40186283
...,...,...,...,...,...,...,...,...,...,...
10769,Gambia,Mrs,Export,Machinery,Lisaland,159456,Sea,Jenkins Group,Kimberly Hernandez,59379361
2779,Armenia,hard,Import,Electronics,West Brian,695490,Air,Bird Group,Julie Williams,76071071
3987,Iraq,recently,Export,Furniture,North Sandraton,597804,Land,"Patterson, Cross and Hernandez",Tiffany Cook,92405669
6264,Portugal,wall,Import,Furniture,Howeview,182615,Sea,Clark PLC,Allison Alexander,21909272


In [59]:
#Display the Descriptive Statistics of any 02 Categorical Variables.
my_categorical02 = my_sample[['Country','Product']]
my_categorical02.value_counts()

Country                 Product    
Tonga                   bar            2
American Samoa          maybe          2
Dominica                billion        2
Cyprus                  clear          2
Bosnia and Herzegovina  term           2
                                      ..
Georgia                 information    1
                        quickly        1
                        role           1
                        scene          1
Zimbabwe                very           1
Name: count, Length: 2978, dtype: int64
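
As a complement to the pairwise counts, `describe()` on text columns reports the count, the number of unique values, the most frequent value, and its frequency for each column. The sketch below assumes a hypothetical three-row DataFrame `demo` in place of `my_categorical02`; the values are invented.

```python
import pandas as pd

# Hypothetical stand-in for my_categorical02; rows are illustrative only.
demo = pd.DataFrame(
    {
        "Country": ["Korea", "Korea", "Chad"],
        "Product": ["hotel", "bar", "hotel"],
    }
)

# For non-numeric columns, describe() returns count, unique, top and freq.
stats = demo.describe()
print(stats)
```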

## Data Visualisation:

#1. What is the total value of transactions by country?

In [60]:
# Total value of transactions by country
country_value = my_sample.groupby('Country')['Value'].sum()  # Sum of Value per country
country_value_sorted = country_value.sort_values(ascending=False)

# Visualization
plt.figure(figsize=(12, 6))
country_value_sorted.plot(kind='bar', color='skyblue')
plt.title('Total Transaction Value by Country')
plt.xlabel('Country')
plt.ylabel('Total Value')
plt.xticks(ticks=range(0, len(country_value_sorted), 10), labels=country_value_sorted.index[::10], rotation=45)  # Show labels at every 10th position
plt.tight_layout()
plt.show()

#2. Which product categories contribute the most to transaction value?

In [None]:
# Sum of transaction value by product category
category_value = my_sample.groupby('Category')['Value'].sum()

# Visualization
plt.figure(figsize=(8, 8))
category_value.plot(kind='pie', autopct='%1.1f%%', startangle=140, cmap='viridis')
plt.title('Contribution of Product Categories to Transaction Value')
plt.ylabel('')
plt.tight_layout()
plt.show()

#3. How has the transaction value changed over time?

In [None]:
# Convert 'Date' to datetime with correct format and parameters
my_sample['Date'] = pd.to_datetime(my_sample['Date'], dayfirst=True, format='%d-%m-%Y')

# Monthly trend in transaction values using the correct resample parameter
monthly_trend = my_sample.resample('ME', on='Date')['Value'].sum()

# Visualization
plt.figure(figsize=(12, 6))
plt.plot(monthly_trend.index, monthly_trend.values, marker='o', color='green')
plt.title('Monthly Trend in Total Transaction Values')
plt.xlabel('Month')
plt.ylabel('Total Value')
plt.grid(True)
plt.tight_layout()
plt.show()


#4. What is the average transaction value by country?

In [None]:
# Average transaction value by country (top 15 countries)
average_value = my_sample.groupby('Country')['Value'].mean()  # Mean of Value per country
top_avg_value = average_value.sort_values(ascending=False).head(15)

# Visualization
plt.figure(figsize=(10, 6))
top_avg_value.plot(kind='bar', color='orange')
plt.title('Average Transaction Value by Top 15 Countries')
plt.xlabel('Country')
plt.ylabel('Average Value')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

#5. What is the average transaction value by category?

In [None]:
# Calculate average transaction value by category
average_value_by_category = my_sample.groupby('Category')['Value'].mean().sort_values(ascending=False)

# Visualization
plt.figure(figsize=(12, 6))
average_value_by_category.plot(kind='bar', color='purple')
plt.title('Average Transaction Value by Category')
plt.xlabel('Category')
plt.ylabel('Average Value')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

#6. How does shipping method affect the transaction value?

In [None]:
# Box plot of transaction value by shipping method
fig, ax = plt.subplots(figsize=(10, 6))
my_sample.boxplot(column='Value', by='Shipping_Method', grid=False, ax=ax)  # Pass ax so the plot is drawn on this figure
plt.title('Shipping Method vs. Transaction Value')
plt.suptitle('')
plt.xlabel('Shipping Method')
plt.ylabel('Transaction Value')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

#7. Which payment terms account for the highest number of transactions?

In [None]:
# Number of transactions by payment terms
payment_terms_counts = my_sample['Payment_Terms'].value_counts()

# Visualization
plt.figure(figsize=(10, 6))
payment_terms_counts.plot(kind='bar', color='purple')
plt.title('Frequency of Payment Terms Used')
plt.xlabel('Payment Terms')
plt.ylabel('Number of Transactions')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()


#8. Which ports are handling the highest number of transactions?

In [None]:
# Number of transactions by port
port_counts = my_sample['Port'].value_counts()

# Visualization
plt.figure(figsize=(10, 6))
port_counts.sort_values(ascending=False).head(10).plot(kind='bar', color='teal')
plt.title('Top 10 Ports by Number of Transactions')
plt.xlabel('Port')
plt.ylabel('Number of Transactions')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()


#9. Who are the top suppliers and customers based on transaction value?

In [None]:
# Sum of transaction value by supplier
supplier_value = my_sample.groupby('Supplier')['Value'].sum().sort_values(ascending=False).head(10)

# Visualization
plt.figure(figsize=(10, 6))
supplier_value.plot(kind='bar', color='red')
plt.title('Top 10 Suppliers by Transaction Value')
plt.xlabel('Supplier')
plt.ylabel('Total Value')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
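
The cell above ranks suppliers only. The customer side of the question can be handled with the same group-and-sum pattern. Here is a minimal sketch, assuming a small hypothetical DataFrame `demo` in place of `my_sample`; the customer labels and values are invented, and `customer_value` is an illustrative name.

```python
import pandas as pd

# Hypothetical stand-in for my_sample; rows are illustrative only.
demo = pd.DataFrame(
    {
        "Customer": ["A", "B", "A", "C"],
        "Value": [100.0, 250.0, 300.0, 50.0],
    }
)

top_n = 2  # Number of customers to keep

# Total transaction value per customer, largest first.
customer_value = demo.groupby("Customer")["Value"].sum().nlargest(top_n)
print(customer_value)
```

On the real sample, `customer_value.plot(kind='bar')` would give a chart comparable to the supplier chart.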


#10. How does weight correlate with transaction value?

In [None]:
import seaborn as sns
import numpy as np
my_sample['Weight'] = my_sample['Weight'].astype(float)  
my_sample['Value'] = my_sample['Value'].astype(float)    

# Creating a scatter plot with regression line
plt.figure(figsize=(12, 8))
sns.scatterplot(data=my_sample, x='Weight', y='Value', hue='Shipping_Method', size='Quantity', sizes=(20, 200), alpha=0.6, palette='viridis')

# Adding a regression line
sns.regplot(data=my_sample, x='Weight', y='Value', scatter=False, color='black', ci=None)

plt.title('Weight vs. Transaction Value with Shipping Method and Quantity')
plt.xlabel('Weight')
plt.ylabel('Transaction Value')
plt.grid(True)
plt.tight_layout()
plt.legend(title='Shipping Method', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()
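
The scatter plot shows the relationship visually; a single number can be attached to it with the Pearson correlation coefficient, which is what `Series.corr` computes by default. The sketch below uses a hypothetical DataFrame `demo` with perfectly proportional values, so the coefficient comes out at 1; the real sample would be expected to give a different value.

```python
import pandas as pd

# Hypothetical stand-in for my_sample; Value is exactly 10 x Weight here.
demo = pd.DataFrame(
    {
        "Weight": [10.0, 20.0, 30.0, 40.0],
        "Value": [100.0, 200.0, 300.0, 400.0],
    }
)

# Pearson correlation between weight and transaction value.
r = demo["Weight"].corr(demo["Value"])
print(f"Pearson r = {r:.2f}")
```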


#11. What is the transaction frequency by product category?

In [None]:
# Transaction frequency by product category
category_freq = my_sample['Category'].value_counts()

# Visualization
plt.figure(figsize=(10, 6))
category_freq.plot(kind='bar', color='grey')
plt.title('Transaction Frequency by Product Category')
plt.xlabel('Product Category')
plt.ylabel('Number of Transactions')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()


#12. How is the import/export split by country?

In [None]:
# Calculate total import/export value by country
import_export_split = my_sample.groupby(['Country', 'Import_Export'])['Value'].sum().unstack().fillna(0)

# Sum across all countries to determine top N countries
top_n = 10  # Change this number to display more or fewer countries
top_countries = import_export_split.sum(axis=1).nlargest(top_n).index
import_export_split = import_export_split.loc[top_countries]

# Visualization (DataFrame.plot creates its own figure, so figsize is passed to plot directly)
ax = import_export_split.plot(kind='barh', stacked=True, color=['orange', 'green'], edgecolor='black', figsize=(12, 8))

# Adding data labels
for container in ax.containers:
    ax.bar_label(container, fmt='%.0f', label_type='center', color='white', fontsize=10)

# Titles and labels
plt.title('Top Import vs. Export Value per Country')
plt.xlabel('Total Value')
plt.ylabel('Country')
plt.tight_layout()
plt.legend(title='Import/Export', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()


#13. Which country has the highest export value?

In [None]:
# Filter data for exports only
export_data = my_sample[my_sample['Import_Export'] == 'Export']
country_export_value = export_data.groupby('Country')['Value'].sum()

# Sort values and select top N countries
top_n = 10  # Change this number to display more or fewer countries
top_export_countries = country_export_value.nlargest(top_n)

# Calculate each country's share of the combined top-N export value
total_export_value = top_export_countries.sum()  # Sum over the top N countries only
percentage_contribution = (top_export_countries / total_export_value) * 100

# Visualization
plt.figure(figsize=(12, 8))
ax = top_export_countries.sort_values(ascending=True).plot(kind='barh', color='lime', edgecolor='black')

# Adding annotations for values and percentages (share computed from each bar's own value so labels match the sorted order)
for i, (country, value) in enumerate(top_export_countries.sort_values(ascending=True).items()):
    ax.text(value + 0.02 * total_export_value, i, f'{value:,.0f} ({value / total_export_value * 100:.1f}%)', color='black', va='center')

# Titles and labels
plt.title('Top Export Value by Country')
plt.xlabel('Total Export Value')
plt.ylabel('Country')
plt.axvline(x=total_export_value, color='red', linestyle='--', label=f'Total Export Value (Top {top_n} Countries)')  # Optional line
plt.legend()
plt.tight_layout()
plt.show()


#14. How does transaction value vary across months?

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Convert 'Date' column to datetime format
my_sample['Date'] = pd.to_datetime(my_sample['Date'], dayfirst=True)

# Resample to calculate total transaction value by month
monthly_value = my_sample.resample('ME', on='Date')['Value'].sum()  # Use 'ME' for month-end

# Visualization
plt.figure(figsize=(12, 6))
plt.plot(monthly_value.index, monthly_value.values, marker='o', linestyle='-', color='blue')
plt.title('Total Monthly Transaction Value')
plt.xlabel('Month')
plt.ylabel('Total Value')
plt.xticks(rotation=45)
plt.grid(True)
plt.tight_layout()
plt.show()


#15. What are the most frequently used customs codes?

In [None]:
# Frequency of customs codes
customs_code_counts = my_sample['Customs_Code'].value_counts().head(10)

# Visualization
plt.figure(figsize=(10, 6))
customs_code_counts.plot(kind='bar', color='darkblue')
plt.title('Top 10 Customs Codes by Frequency')
plt.xlabel('Customs Code')
plt.ylabel('Number of Transactions')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

#16. Does the port of origin/destination affect transaction value?

In [None]:
import seaborn as sns

# Calculate the top N ports based on total transaction value
top_n = 10  # Change this number as needed
top_ports = my_sample.groupby('Port')['Value'].sum().nlargest(top_n).index

# Filter the dataset for top ports
filtered_data = my_sample[my_sample['Port'].isin(top_ports)]

# Create a combined box plot and violin plot
plt.figure(figsize=(12, 8))
# Update the scale parameter to density_norm
sns.violinplot(x='Port', y='Value', data=filtered_data, inner=None, color='lightgray', density_norm='count')
sns.boxplot(x='Port', y='Value', data=filtered_data, color='lightblue', width=0.4)

# Add title and labels
plt.title('Transaction Value by Port of Origin/Destination (Top Ports)')
plt.xlabel('Port')
plt.ylabel('Transaction Value')
plt.xticks(rotation=45)
plt.grid(True)
plt.tight_layout()

# Adding statistical annotations
for port in top_ports:
    data_port = filtered_data[filtered_data['Port'] == port]
    median_val = data_port['Value'].median()
    plt.annotate(f'Median: {median_val:,.0f}', 
                 xy=(port, median_val), 
                 xytext=(port, median_val + 5000),  # Adjust vertical position as needed
                 arrowprops=dict(facecolor='black', arrowstyle='->'),
                 ha='center')

plt.show()


#17. What is the correlation between the data points?

In [None]:
# Compute the correlation matrix of the numeric columns
corr_matrix = my_sample.corr(numeric_only=True)

# Set up the matplotlib figure
plt.figure(figsize=(10, 8))

# Create a heatmap
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', linewidths=0.5, fmt='.2f')

# Add labels and title
plt.title('Correlation Matrix of Numeric Features')
plt.show()

## Problem statements for drawing insights based on this data:

#### Insights:

#### Project Objectives:

- Understanding Key Trading Patterns
 By analyzing the most and least preferred countries, frequently used shipping methods, and top customers and suppliers, the objective is to identify high-value trade relationships and optimize supply chain strategies.
- Optimize Logistics and Shipping
 Questions related to shipping methods, average weight of goods, and specific shipping modes (air, sea, land) help in understanding logistical preferences, which can optimize cost, efficiency, and lead to better freight management.
- Financial and Trade Balance Analysis
 Determining the total value of imported and exported products and the balance between the two helps in monitoring the financial health of the company and understanding whether it is a net importer or exporter. This also aids in formulating pricing and profitability strategies.
- Market Insights and Demand Analysis
 Identifying the largest product categories for imports and exports, most common products, and specific country-wise trade volumes offers insights into market demand, allowing the business to focus on high-demand products and markets.
- Payment and Transaction Trends
 Analyzing payment terms, total transactions by payment methods, and the average transaction values provides insights into customer behavior and payment preferences. This allows the company to tailor payment options and policies to better suit their customers' needs.
- Customer and Supplier Relationship Management
 Understanding which customers make the most transactions and which suppliers have the highest number of transactions can guide relationship management strategies, ensuring that key partners and clients receive priority service and engagement.
- Inventory and Capacity Management
 Analyzing transaction quantities and product weights helps in planning inventory and logistics more effectively, ensuring that the company has sufficient stock and transport capacity to meet demand without over-investing in warehousing or shipping resources.
- Cost and Performance Analysis
 Questions related to transaction values, such as the largest and smallest transactions, average values for specific categories, and top customers, provide the opportunity to assess profitability and identify the most valuable customers and products, allowing for targeted marketing and sales strategies.
- Risk and Compliance Monitoring
 Understanding the most common customs codes, import/export preferences, and transaction patterns by payment terms aids in compliance with international trade regulations and helps manage risks associated with foreign trade.
- Strategic Planning and Forecasting


## Managerial Insights

Inferences Drawn from These Problem Statements:
1. Identifying trends in shipping methods and payment terms can improve logistical planning and financial management.
2. Understanding the average weight and value of imports/exports can assist in negotiating better freight and insurance deals.
3. Recognizing the top contributing product categories and countries helps optimize inventory, focus on key markets, and make strategic trade decisions.
4. Comparing the value of imported versus exported goods provides insights into trade imbalances or opportunities for growth in specific sectors.

## Recommendations


Based on the analysis and visualizations performed on this dataset, the following recommendations are suggested:

- Focus on high-value products and markets: Identify the products and countries that generate the highest transaction values and prioritize them in your business strategy. Consider expanding operations or marketing efforts in those areas.
- Optimize shipping and logistics: Analyze the distribution of shipping methods and their associated costs. Explore opportunities to optimize shipping routes, negotiate better rates with carriers, or consider alternative transportation methods to reduce costs and improve delivery times.
- Tailor payment options: Analyze the preferred payment terms of your customers and ensure you offer a variety of options to cater to their needs. Consider offering incentives for early payments or exploring more efficient payment processing methods.
- Manage customer and supplier relationships: Identify your top customers and suppliers and develop strategies to nurture those relationships. Consider offering loyalty programs, personalized services, or exclusive deals to strengthen these partnerships.
- Monitor and mitigate risks: Analyze customs codes and import/export regulations to ensure compliance and minimize potential risks associated with international trade. Stay informed about changes in regulations and implement appropriate measures to mitigate any potential disruptions.
- Continuously analyze data: Regularly analyze your transaction data to identify emerging trends, potential issues, and opportunities for improvement. Use data-driven insights to inform your business decisions and optimize your operations.