# Data Exploration with Python using Pandas & NumPy Libraries
---

## Project Contents:

1. Project Information
2. Description of Data
3. Project Objectives
4. Data Analysis
5. Observations / Findings
6. Managerial Insights / Recommendations
---

## 1. Project Information
* Title: Data Exploration with Python using Pandas & NumPy Libraries
* Students: **Shefali Dhingra (055043), Mohit Agarwal (055024)**
* Group Number: 25

---

## 2. Description of Data

- Data Source: [Dataset link](https://www.kaggle.com/datasets/chakilamvishwas/imports-exports-15000)

### Data Columns Description:

1. Transaction_ID: Unique identifier for each trade transaction.
2. Country: Country of origin or destination for the trade.
3. Product: Product being traded.
4. Import_Export: Indicates whether the transaction is an import or export.
5. Quantity: Amount of the product traded.
6. Value: Monetary value of the product in USD.
7. Date: Date of the transaction.
8. Category: Category of the product (e.g., Electronics, Clothing, Machinery).
9. Port: Port of entry or departure.
10. Customs_Code: Customs or HS code for product classification.
11. Weight: Weight of the product in kilograms.
12. Shipping_Method: Method used for shipping (e.g., Air, Sea, Land).
13. Supplier: Name of the supplier or manufacturer.
14. Customer: Name of the customer or recipient.
15. Invoice_Number: Unique invoice number for the transaction.
16. Payment_Terms: Terms of payment (e.g., Net 30, Net 60, Cash on Delivery).

Data Type: Because the dataset covers multiple entities (countries) and records transactions over time, it is treated here as **Panel Data** (also called longitudinal data).

In [45]:
# Import Relevant Python Libraries

import pandas as pd
import numpy as np

In [47]:
# Load the Test Data

my_df = pd.read_csv("Imports_Exports_Dataset.csv",index_col="Transaction_ID")

In [39]:
# Data Dimensions
print("The dimensions of the data are: ",my_df.shape)

The dimensions of the data are:  (15000, 15)


In [41]:
# Data Variable Type
my_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 15000 entries, e3e70682-c209-4cac-a29f-6fbed82c07cd to 5cc039d0-a052-41fd-bfbb-c9f60c4565ac
Data columns (total 15 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Country          15000 non-null  object 
 1   Product          15000 non-null  object 
 2   Import_Export    15000 non-null  object 
 3   Quantity         15000 non-null  int64  
 4   Value            15000 non-null  float64
 5   Date             15000 non-null  object 
 6   Category         15000 non-null  object 
 7   Port             15000 non-null  object 
 8   Customs_Code     15000 non-null  int64  
 9   Weight           15000 non-null  float64
 10  Shipping_Method  15000 non-null  object 
 11  Supplier         15000 non-null  object 
 12  Customer         15000 non-null  object 
 13  Invoice_Number   15000 non-null  int64  
 14  Payment_Terms    15000 non-null  object 
dtypes: float64(2), int64(3), object(10)
memory 

### Data Variable Type
As observed, the dataset contains:
1. **All** non-null Variables
2. Numbers:
    1. Integer Variables: **3**  (*Quantity, Customs_Code, Invoice_Number*)
    2. Float (Decimal) Variables: **2**  (*Value, Weight*)
3. Text: **9**  (*Country, Product, Import_Export, Category, Port, Shipping_Method, Supplier, Customer, Payment_Terms*)
4. DateTime: **1**   (*Date*, which `info()` reports as `object` because it is still stored as text and has not yet been parsed as a date)
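
Since *Date* is loaded as text, it would need converting before any time-based analysis. The following is a minimal sketch, not the authors' code: it uses three date values copied from the sample records shown later in this document and assumes a day-month-year layout, because a value such as `22-04-2020` cannot be month-first. The name `demo` is illustrative only.

```python
import pandas as pd

# Three rows taken from the sample records displayed in this document
# (only the columns needed for this illustration).
demo = pd.DataFrame({
    "Country": ["Jordan", "Ethiopia", "Suriname"],
    "Date": ["09-04-2022", "22-04-2020", "22-12-2020"],
})

# Assumption: dates follow the day-month-year layout.
demo["Date"] = pd.to_datetime(demo["Date"], format="%d-%m-%Y")

# The column is now a datetime dtype, so date parts can be extracted.
print(demo.dtypes)
print(demo["Date"].dt.year.tolist())
```

Passing an explicit `format` avoids silent day/month swaps that can occur when the parser has to guess.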

### Data Variable Category

In [48]:
# Bifurcating the Variables into Index, Categorical (Nominal, Ordinal) and Non-Categorical Variables

# Names match the DataFrame's column labels exactly (underscores, not spaces or slashes)
Index_variables=['Transaction_ID']
Nominal_Variables=['Country','Product','Import_Export','Category','Port','Shipping_Method','Supplier','Customer']
Ordinal_variables=['Payment_Terms']
Non_Categorical_Variables=['Quantity','Value','Date','Customs_Code','Weight','Invoice_Number']
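
The nominal/ordinal split can also be encoded directly in pandas dtypes. The sketch below is illustrative rather than part of the original notebook: it uses a tiny hand-built frame named `demo`, and the ranking in `payment_order` (ordered by how long payment is deferred) is an assumption made for the example, not something the dataset defines.

```python
import pandas as pd

# Small illustrative frame using values that appear in this document.
demo = pd.DataFrame({
    "Shipping_Method": ["Land", "Land", "Land"],
    "Payment_Terms": ["Prepaid", "Net 30", "Net 60"],
})

# Nominal variable: an unordered category.
demo["Shipping_Method"] = demo["Shipping_Method"].astype("category")

# Ordinal variable: an ordered category.
# Assumption: this ranking is hypothetical and chosen only for illustration.
payment_order = ["Prepaid", "Cash on Delivery", "Net 30", "Net 60"]
demo["Payment_Terms"] = pd.Categorical(
    demo["Payment_Terms"], categories=payment_order, ordered=True
)

# Ordered categories support comparisons and min/max.
print(demo["Payment_Terms"].min())
print(demo["Payment_Terms"].cat.codes.tolist())
```

An ordered categorical lets later steps sort or compare payment terms meaningfully, which a plain `object` column cannot do.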

---

## 3. Project Objectives: Trade Analysis

The objective of this project is to analyze **international trade patterns by identifying key trends in imports and exports across countries, products, and time periods.**
The study aims to **uncover trade flow dynamics, assess the most frequently traded goods, and highlight seasonal or regional variations, providing actionable insights into global trade behaviors.**
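
As a rough illustration of how such objectives might translate into pandas operations, the sketch below aggregates three records copied from the sample shown in this document. The frame name `demo` and the two summaries are assumptions chosen for the example. They are not the analysis performed in this project.

```python
import pandas as pd

# Three records copied from the sample displayed in this document.
demo = pd.DataFrame({
    "Import_Export": ["Import", "Export", "Import"],
    "Category": ["Furniture", "Electronics", "Electronics"],
    "Value": [4880.35, 4322.95, 1913.46],
    "Date": pd.to_datetime(
        ["09-04-2022", "22-04-2020", "22-12-2020"], format="%d-%m-%Y"
    ),
})

# Total trade value by direction (import vs export).
value_by_direction = demo.groupby("Import_Export")["Value"].sum()

# Number of transactions per calendar year.
transactions_per_year = demo.groupby(demo["Date"].dt.year).size()

print(value_by_direction)
print(transactions_per_year)
```

The same `groupby` pattern extends to `Country`, `Product`, or month-level keys when looking for regional or seasonal variation.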

---

In [14]:
# A3. Create a Unique Sample of 2001 Records
# reset_index() keeps Transaction_ID as a column, because ignore_index=True discards the index
my_sample = my_df.reset_index().sample(n=2001, random_state=55043, ignore_index=True)

In [15]:
# A4. Display the Dimensions of Sample Data.
my_sample.shape

(2001, 16)
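
The sampling step relies on two properties of `DataFrame.sample`: a fixed `random_state` makes the draw reproducible, and the default `replace=False` means no row is drawn twice. The sketch below checks both on a hypothetical 100-row stand-in named `population`; it does not use the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for the full dataset: 100 rows with unique IDs.
population = pd.DataFrame({
    "Transaction_ID": [f"id-{i}" for i in range(100)],
    "Quantity": range(100),
})

# Two draws with the same seed.
sample_a = population.sample(n=20, random_state=55043, ignore_index=True)
sample_b = population.sample(n=20, random_state=55043, ignore_index=True)

# Same seed gives identical rows in the same order.
print(sample_a.equals(sample_b))

# Sampling without replacement (the default) never repeats a row.
print(sample_a["Transaction_ID"].is_unique)
```

Using a fixed seed means anyone re-running the notebook on the same file gets the same 2001 records.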

In [17]:
# A6. Display the First 3 Records of the Sample Data.
my_sample.head(3)

|  | Transaction_ID | Country | Product | Import_Export | Quantity | Value | Date | Category | Port | Customs_Code | Weight | Shipping_Method | Supplier | Customer | Invoice_Number | Payment_Terms |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0e4bbca8-986c-4bc7-ab91-5f36014340b1 | Jordan | station | Import | 3587 | 4880.35 | 09-04-2022 | Furniture | Paulland | 143626 | 2920.1 | Land | Cross, Vargas and Brown | Sylvia Jenkins | 43155550 | Prepaid |
| 1 | 57647dd7-9194-4745-9b93-822bcec854aa | Ethiopia | whatever | Export | 5297 | 4322.95 | 22-04-2020 | Electronics | West Pamela | 588424 | 2116.66 | Land | Cruz-Beard | Shannon Cooper | 2187604 | Net 30 |
| 2 | 88229f9e-6b3c-45fb-ab94-a0c52a483c47 | Suriname | per | Import | 7937 | 1913.46 | 22-12-2020 | Electronics | North Madisonborough | 530712 | 3118.5 | Land | Cowan, Hoffman and Yu | Andrea Bray | 45149516 | Net 60 |


In [19]:
# B1. Display the List of Names of the Variables.
variables=list(my_sample.columns)
print("The variables are:", variables)

The variables are: ['Transaction_ID', 'Country', 'Product', 'Import_Export', 'Quantity', 'Value', 'Date', 'Category', 'Port', 'Customs_Code', 'Weight', 'Shipping_Method', 'Supplier', 'Customer', 'Invoice_Number', 'Payment_Terms']
