
# **India’s Imports: Trade Insights from Asian Countries Using Python (2023–2025)**

# **Domain:**
Finance (International Trade Analytics)

# **Objective:**
•	Maintain and organize import trade data from Asian countries (countries, commodities, quantity, import value, regions, and dates) in a structured format.

•	Reduce data duplication, inconsistencies, and missing values through proper data cleaning and preprocessing techniques.

•	Monitor total import value, quantity, and trade trends over time to understand import patterns.

•	Evaluate import performance by country, commodity, region, and time period to identify key contributors and trends.

•	Analyze trade relationships and identify high-value commodities and major exporting countries.

•	Generate meaningful insights using data analysis and visualization to support better understanding of international trade patterns.


# **Dataset Information:**

**Source**: Indian Open Data Portal

**Location**: Asia

**Available Timeline**: 2015 to June 2025

**Selected Timeline for Analysis**: 2023 to June 2025

The dataset contains import trade records from 2015 to June 2025. To keep the analysis focused and computationally light, only the records from 2023 to June 2025 are used in this project.
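The selection of a time window described above can be sketched as a date filter. This is a minimal illustration on a tiny made-up frame, not the project's actual selection step; the helper name `select_period`, the sample rows, and the day-month-year date format are assumptions.

```python
import pandas as pd

# Tiny illustrative frame (hypothetical rows, not taken from the portal data).
sample = pd.DataFrame({
    "Date (date)": ["01-06-2015", "01-01-2023", "01-12-2024", "01-06-2025", "01-07-2025"],
    "Value of commodity quantity in US Dollars (value_dl)": [1.0, 2.0, 3.0, 4.0, 5.0],
})

def select_period(frame, start="2023-01-01", end="2025-06-30", date_col="Date (date)"):
    """Return the rows whose date falls inside [start, end], both ends included."""
    # Assumption: dates are stored as day-month-year strings such as 01-12-2024.
    dates = pd.to_datetime(frame[date_col], format="%d-%m-%Y")
    mask = dates.between(pd.Timestamp(start), pd.Timestamp(end))
    return frame.loc[mask].reset_index(drop=True)

selected = select_period(sample)
print(selected)
```

Filtering on parsed dates rather than on the raw strings avoids surprises from string comparison, since day-first text does not sort chronologically.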

# **Stage 1**

**Problem Definition and Data Selection**

In [1]:
# Import the libraries used in this notebook

import pandas as pd                  # Data manipulation, cleaning, and analysis library
import matplotlib.pyplot as plt      # Basic plotting and visualization library
import seaborn as sns               # Advanced statistical visualization library built on matplotlib
import plotly.express as px         # Interactive and dynamic visualization library

In [2]:
# URL of the online dataset (raw CSV file hosted on GitHub)
url ="https://raw.githubusercontent.com/Divya-DataAnalystPortfolio/India-s-Imports-Trade-Insights-from-Asian-Countries-Using-Python/refs/heads/main/Imports%20from%20Asian%20Countries%202023-2024.csv"

In [3]:
# Load the CSV from the URL into a DataFrame
df = pd.read_csv(url)

In [4]:
df

Index,Date (date),Country Name (country_name),ISO Alpha 3 Code (alpha_3_code),Country Code (country_code),Region Name (region),Region Code (region_code),Sub-Region Name (sub_region),Sub-Region Code (sub_region_code),Harmonized System Code (hs_code),Commodity Name (commodity),Unit of Quantity (unit),Quantity of commodity (value_qt),Value of commodity quantity in INR (value_rs),Value of commodity quantity in US Dollars (value_dl)
0,01-01-2023,China,CHN,156,Asia,142,Eastern Asia,30,68029900,Other Stone,Kgs,0.32,0.48,0.0
1,01-01-2023,China,CHN,156,Asia,142,Eastern Asia,30,83062990,Oters,Kgs,96.87,191.91,0.23
2,01-01-2023,China,CHN,156,Asia,142,Eastern Asia,30,64069010,Of Wood,Kgs,5.73,87.77,0.11
3,01-01-2023,China,CHN,156,Asia,142,Eastern Asia,30,85171190,Others,Nos,15.16,182.07,0.22
4,01-01-2023,China,CHN,156,Asia,142,Eastern Asia,30,85241900,*Other,Nos,82.5,907.26,1.11
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
628082,01-12-2024,Malaysia,MYS,458,Asia,142,South-eastern Asia,35,63039990,Othr Curtains Etc Other Than Handloom Of Other...,Kgs,138,0.02,0
628083,01-12-2024,Malaysia,MYS,458,Asia,142,South-eastern Asia,35,63041100,"Bedspreads,Knitted Or Crocheted",Nos,1020,0.07,0.01
628084,01-12-2024,Malaysia,MYS,458,Asia,142,South-eastern Asia,35,63041910,Bedsheets And Bed Covers Of Cotton,Nos,6148,1.13,0.13
628085,01-12-2024,Malaysia,MYS,458,Asia,142,South-eastern Asia,35,63041930,Bedsheets And Bed Covers Of Man-Made Fibres,Nos,48,0.01,0


**Dataset description**

Initial EDA (head, info, describe, shape, null checks, duplicate check)
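The checks listed above, including the duplicate check, can be gathered into one small helper. The sketch below is only an illustration on a made-up frame; `quick_overview` and the `demo` data are hypothetical names, not part of the original notebook.

```python
import pandas as pd

def quick_overview(frame):
    """Collect the basic EDA numbers for a DataFrame in one dictionary."""
    return {
        "rows": frame.shape[0],
        "columns": frame.shape[1],
        # Total number of missing cells across all columns.
        "missing_cells": int(frame.isnull().sum().sum()),
        # Rows that exactly repeat an earlier row.
        "duplicate_rows": int(frame.duplicated().sum()),
    }

# Hypothetical demo data with one repeated row and one missing value.
demo = pd.DataFrame({
    "country": ["China", "China", "Malaysia", None],
    "value": [1.0, 1.0, 2.0, 3.0],
})

overview = quick_overview(demo)
print(overview)
```

On the real data the same call would be `quick_overview(df)`.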

In [5]:
# First 5 rows of the Dataframe
df.head()

Index,Date (date),Country Name (country_name),ISO Alpha 3 Code (alpha_3_code),Country Code (country_code),Region Name (region),Region Code (region_code),Sub-Region Name (sub_region),Sub-Region Code (sub_region_code),Harmonized System Code (hs_code),Commodity Name (commodity),Unit of Quantity (unit),Quantity of commodity (value_qt),Value of commodity quantity in INR (value_rs),Value of commodity quantity in US Dollars (value_dl)
0,01-01-2023,China,CHN,156,Asia,142,Eastern Asia,30,68029900,Other Stone,Kgs,0.32,0.48,0.0
1,01-01-2023,China,CHN,156,Asia,142,Eastern Asia,30,83062990,Oters,Kgs,96.87,191.91,0.23
2,01-01-2023,China,CHN,156,Asia,142,Eastern Asia,30,64069010,Of Wood,Kgs,5.73,87.77,0.11
3,01-01-2023,China,CHN,156,Asia,142,Eastern Asia,30,85171190,Others,Nos,15.16,182.07,0.22
4,01-01-2023,China,CHN,156,Asia,142,Eastern Asia,30,85241900,*Other,Nos,82.5,907.26,1.11


In [6]:
# Last 5 rows of the DataFrame
df.tail()

Index,Date (date),Country Name (country_name),ISO Alpha 3 Code (alpha_3_code),Country Code (country_code),Region Name (region),Region Code (region_code),Sub-Region Name (sub_region),Sub-Region Code (sub_region_code),Harmonized System Code (hs_code),Commodity Name (commodity),Unit of Quantity (unit),Quantity of commodity (value_qt),Value of commodity quantity in INR (value_rs),Value of commodity quantity in US Dollars (value_dl)
628082,01-12-2024,Malaysia,MYS,458,Asia,142,South-eastern Asia,35,63039990,Othr Curtains Etc Other Than Handloom Of Other...,Kgs,138,0.02,0.0
628083,01-12-2024,Malaysia,MYS,458,Asia,142,South-eastern Asia,35,63041100,"Bedspreads,Knitted Or Crocheted",Nos,1020,0.07,0.01
628084,01-12-2024,Malaysia,MYS,458,Asia,142,South-eastern Asia,35,63041910,Bedsheets And Bed Covers Of Cotton,Nos,6148,1.13,0.13
628085,01-12-2024,Malaysia,MYS,458,Asia,142,South-eastern Asia,35,63041930,Bedsheets And Bed Covers Of Man-Made Fibres,Nos,48,0.01,0.0
628086,01-12-2024,Malaysia,MYS,458,Asia,142,South-eastern Asia,35,63049190,Others,Nos,516,0.02,0.0


In [7]:
# Number of rows and columns in the DataFrame
df.shape

(628087, 14)

In [8]:
# Summary of the DataFrame including column types and count of non-null values
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 628087 entries, 0 to 628086
Data columns (total 14 columns):
 #   Column                                                Non-Null Count   Dtype 
---  ------                                                --------------   ----- 
 0   Date (date)                                           628087 non-null  object
 1   Country Name (country_name)                           628087 non-null  object
 2   ISO Alpha 3 Code (alpha_3_code)                       628087 non-null  object
 3   Country Code (country_code)                           628087 non-null  int64 
 4   Region Name (region)                                  628087 non-null  object
 5   Region Code (region_code)                             628087 non-null  int64 
 6   Sub-Region Name (sub_region)                          628087 non-null  object
 7   Sub-Region Code (sub_region_code)                     628087 non-null  int64 
 8   Harmonized System Code (hs_code)                      

In [9]:
# Get the list of column names in the DataFrame
df.columns

Index(['Date (date)', 'Country Name (country_name)',
       'ISO Alpha 3 Code (alpha_3_code)', 'Country Code (country_code)',
       'Region Name (region)', 'Region Code (region_code)',
       'Sub-Region Name (sub_region)', 'Sub-Region Code (sub_region_code)',
       'Harmonized System Code (hs_code)', 'Commodity Name (commodity)',
       'Unit of Quantity (unit)', 'Quantity of commodity (value_qt)',
       'Value of commodity quantity in INR (value_rs)',
       'Value of commodity quantity in US Dollars (value_dl)'],
      dtype='object')
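Each column label above carries a short machine-friendly name in parentheses, for example `Date (date)`. One possible way to make later code shorter is to rename every column to that short name. This is a sketch under that assumption; `short_name` is a hypothetical helper and the original notebook may keep the long labels.

```python
import re
import pandas as pd

def short_name(label):
    """Return the text inside the trailing parentheses, or the label unchanged."""
    match = re.search(r"\(([^()]+)\)\s*$", label)
    return match.group(1) if match else label

# Empty demo frame that reuses three of the column labels listed above.
demo = pd.DataFrame(columns=[
    "Date (date)",
    "Country Name (country_name)",
    "Value of commodity quantity in US Dollars (value_dl)",
])

renamed = demo.rename(columns=short_name)
print(list(renamed.columns))
```

Labels without a parenthesised suffix pass through unchanged, so the helper is safe to apply to every column.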

In [10]:
# Display the data types of each column in the DataFrame
df.dtypes

Column,Dtype
Date (date),object
Country Name (country_name),object
ISO Alpha 3 Code (alpha_3_code),object
Country Code (country_code),int64
Region Name (region),object
Region Code (region_code),int64
Sub-Region Name (sub_region),object
Sub-Region Code (sub_region_code),int64
Harmonized System Code (hs_code),int64
Commodity Name (commodity),object
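`Date (date)` is stored as `object`, so time-based grouping needs a real datetime column first. A minimal sketch, assuming the strings are day-month-year (values such as `01-12-2024` appear in the preview); the sample values below are illustrative only.

```python
import pandas as pd

# Illustrative date strings in the same shape as the preview rows.
dates = pd.Series(["01-01-2023", "01-10-2024", "01-12-2024"], name="Date (date)")

# An explicit format avoids pandas guessing month-first for ambiguous values.
parsed = pd.to_datetime(dates, format="%d-%m-%Y")

years = parsed.dt.year.tolist()
months = parsed.dt.month.tolist()
print(parsed.dtype, years, months)
```

Applied to the real frame, the equivalent line would be `df["Date (date)"] = pd.to_datetime(df["Date (date)"], format="%d-%m-%Y")`.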


In [11]:
# Summary statistics for numerical columns in the DataFrame
df.describe()

Statistic,Country Code (country_code),Region Code (region_code),Sub-Region Code (sub_region_code),Harmonized System Code (hs_code)
count,628087.0,628087.0,628087.0,628087.0
mean,425.549766,142.0,48.915359,63337610.0
std,238.006436,0.0,40.392935,23759310.0
min,4.0,142.0,30.0,1012990.0
25%,156.0,142.0,30.0,39259090.0
50%,392.0,142.0,30.0,71131960.0
75%,702.0,142.0,35.0,84772000.0
max,887.0,142.0,145.0,99930020.0
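The four columns summarised above are identifier codes, so their mean and standard deviation carry little analytical meaning. The minimum HS code shown (1012990) has seven digits, which suggests a leading zero was lost when the column was read as `int64`. A minimal sketch, assuming HS codes are meant to be eight digits wide and that the first two digits give the chapter; the sample values are taken from the min, 25% and max rows of the summary.

```python
import pandas as pd

# HS codes as they appear after being read as integers.
hs = pd.Series([1012990, 39259090, 99930020], name="hs_code")

# Assumption: codes are 8 digits wide, so pad on the left with zeros.
hs_text = hs.astype(str).str.zfill(8)

# Assumption: the first two digits identify the HS chapter.
chapter = hs_text.str[:2]

print(hs_text.tolist())
print(chapter.tolist())
```

Keeping the codes as text also stops `describe()` from reporting averages over identifiers.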


In [12]:
# Summary statistics for categorical columns
df.describe(include='object')

Statistic,Date (date),Country Name (country_name),ISO Alpha 3 Code (alpha_3_code),Region Name (region),Sub-Region Name (sub_region),Commodity Name (commodity),Unit of Quantity (unit),Quantity of commodity (value_qt),Value of commodity quantity in INR (value_rs),Value of commodity quantity in US Dollars (value_dl)
count,628087,628087,628087,628087,628087,628087,628078,628087,628087,628087
unique,24,50,50,1,5,8736,84,114495,118548,7423
top,01-10-2024,China,CHN,Asia,Eastern Asia,Others,Kgs,0,0,0
freq,28522,130259,130259,628087,316164,43523,378692,40638,8291,116149
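In the output above, the quantity and value columns appear in the object summary rather than the numeric one, which indicates that pandas read them as text or mixed types. Totals and averages need numeric columns, so a conversion step is likely required. The sketch below shows one way to do it with `pd.to_numeric`; the sample values, including the unparseable entry, are hypothetical.

```python
import pandas as pd

# Hypothetical mixed-type column: numeric strings, a real number, and bad text.
raw = pd.Series(["0.32", "96.87", 138, "0", "not available"], dtype="object")

# errors="coerce" turns anything that cannot be parsed into NaN.
converted = pd.to_numeric(raw, errors="coerce")

# Values that were present but failed to convert, worth inspecting by hand.
failed = raw[converted.isna() & raw.notna()]

print(converted.dtype)
print(failed.tolist())
```

Inspecting `failed` before dropping or imputing anything shows why a column was not numeric in the first place.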


In [13]:
# Summary statistics for all columns, including numeric and categorical
df.describe(include='all')

Statistic,Date (date),Country Name (country_name),ISO Alpha 3 Code (alpha_3_code),Country Code (country_code),Region Name (region),Region Code (region_code),Sub-Region Name (sub_region),Sub-Region Code (sub_region_code),Harmonized System Code (hs_code),Commodity Name (commodity),Unit of Quantity (unit),Quantity of commodity (value_qt),Value of commodity quantity in INR (value_rs),Value of commodity quantity in US Dollars (value_dl)
count,628087,628087,628087,628087.0,628087,628087.0,628087,628087.0,628087.0,628087,628078,628087.0,628087.0,628087.0
unique,24,50,50,,1,,5,,,8736,84,114495.0,118548.0,7423.0
top,01-10-2024,China,CHN,,Asia,,Eastern Asia,,,Others,Kgs,0.0,0.0,0.0
freq,28522,130259,130259,,628087,,316164,,,43523,378692,40638.0,8291.0,116149.0
mean,,,,425.549766,,142.0,,48.915359,63337610.0,,,,,
std,,,,238.006436,,0.0,,40.392935,23759310.0,,,,,
min,,,,4.0,,142.0,,30.0,1012990.0,,,,,
25%,,,,156.0,,142.0,,30.0,39259090.0,,,,,
50%,,,,392.0,,142.0,,30.0,71131960.0,,,,,
75%,,,,702.0,,142.0,,35.0,84772000.0,,,,,


In [14]:
# Count the number of non-null values in each column of the DataFrame
df.count()

Column,Non-null count
Date (date),628087
Country Name (country_name),628087
ISO Alpha 3 Code (alpha_3_code),628087
Country Code (country_code),628087
Region Name (region),628087
Region Code (region_code),628087
Sub-Region Name (sub_region),628087
Sub-Region Code (sub_region_code),628087
Harmonized System Code (hs_code),628087
Commodity Name (commodity),628087


In [15]:
# Total null values in each column
df.isnull().sum()

Column,Null count
Date (date),0
Country Name (country_name),0
ISO Alpha 3 Code (alpha_3_code),0
Country Code (country_code),0
Region Name (region),0
Region Code (region_code),0
Sub-Region Name (sub_region),0
Sub-Region Code (sub_region_code),0
Harmonized System Code (hs_code),0
Commodity Name (commodity),0
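The listing above is cut off before the last columns. The `describe` output reports a count of 628078 for `Unit of Quantity (unit)` against 628087 rows, which would imply 9 missing values in that column. A compact report that shows only the columns with gaps can make this easier to spot. This is a sketch on made-up data; `missing_report` is a hypothetical helper.

```python
import pandas as pd

def missing_report(frame):
    """List only the columns that contain nulls, with counts and percentages."""
    counts = frame.isnull().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(4),
    })
    return report[report["missing"] > 0].sort_values("missing", ascending=False)

# Hypothetical demo frame with one missing unit.
demo = pd.DataFrame({
    "commodity": ["Others", "Of Wood", "Other Stone", "Others"],
    "unit": ["Kgs", None, "Nos", "Kgs"],
})

report = missing_report(demo)
print(report)
```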
