### Exploratory Data Analysis

This notebook presents the exploratory data analysis (EDA) conducted on the\
processed dataset located in the **`data/processed`** folder.

The data preparation steps are documented in the following notebooks:
- **`data_cleaning_primary.ipynb`** – primary data cleaning and preprocessing
- **`data_cleaning_secondary.ipynb`** – secondary data cleaning and preprocessing
- **`combine_primary_secondary.ipynb`** – combination of the primary and secondary datasets

The final output, **`combined_primary_secondary.csv`**,\
serves as the input dataset for all analyses in this notebook.

In [None]:
# import all the required modules and functions
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import shapiro
from data_visualizations import (
    plot_line, plot_scatter, plot_dumbbell, plot_bar,
    plot_line_grid, plot_animated_scatter, plot_two_histograms,
)

### Read the processed data

In [None]:
# define root folder
ROOT_FOLDER = "."

In [None]:
# First read the combined primary-secondary data csv file into a dataframe
combined_df = pd.read_csv(ROOT_FOLDER + '/data/processed/combined_primary_secondary.csv')
combined_df.head(5)

### What are the overall exports and imports of the USA over the years?

Let’s explore how imports and exports have changed over time,\
identify which is growing faster, and analyze how the trade balance has evolved.


In [None]:
# aggregate import and export data to yearly level
yearly_df = combined_df.groupby('year').agg({
    'import_value': 'sum',
    'export_value': 'sum'
}).reset_index()

# create a new column for trade balance
yearly_df['trade_balance'] = yearly_df['export_value'] - yearly_df['import_value']

# rename columns for better readability
yearly_df.rename(columns={
    'import_value': 'Import',
    'export_value': 'Export',
    'trade_balance': 'Trade Balance',
    'year':'Year'
}, inplace=True)
yearly_df.head()


In [None]:
# Scale to billions
yearly_df[['Import', 'Export', 'Trade Balance']] /= 1e9

# Plot the overall import and export values of USA over the years
# call the function from data_visualizations.py
fig = plot_line(
    df=yearly_df,
    x='Year',
    y=['Import', 'Export', 'Trade Balance'],
    title='U.S. Trade Trends: Imports, Exports & Balance',
    y_label='Value (USD Billions)',
    x_label='Year',
    legend_label=None,
    markers=True
)

# show the plot
fig.show()

The blue line shows a steady upward trajectory for overall automotive imports into the US. Exports,\
shown by the red line, also trend upward but remain consistently lower than imports.\
The trade balance (green line) is persistently negative, indicating a trade deficit for the US:\
the US imports more automotive goods than it exports every year.
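You can quantify which series grows faster with the compound annual growth rate (CAGR), `(end / start) ** (1 / years) - 1`. The sketch below applies a hypothetical helper `cagr` to made-up yearly totals in USD billions. These are not values from `combined_primary_secondary.csv`. On the real data, the same helper could be applied to the `Import` and `Export` columns of `yearly_df`, which is already sorted by year.

```python
import pandas as pd

def cagr(series: pd.Series, years: pd.Series) -> float:
    """Compound annual growth rate between the first and last observation.

    Assumes the rows are sorted by year.
    """
    n_years = years.iloc[-1] - years.iloc[0]
    return (series.iloc[-1] / series.iloc[0]) ** (1 / n_years) - 1

# Illustrative numbers only (USD billions), not taken from the dataset
toy = pd.DataFrame({
    "Year": [2010, 2015, 2020],
    "Import": [200.0, 280.0, 320.0],
    "Export": [100.0, 130.0, 125.0],
})

for col in ["Import", "Export"]:
    print(f"{col} CAGR: {cagr(toy[col], toy['Year']):.2%}")
```

CAGR depends only on the endpoints, so you should read it alongside the line chart, which shows the path in between.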

### What are the overall exports and imports of the US over the years across different categories?

Let's see how these trends differ across automotive categories.\
Specifically, which categories exhibit significant growth,\
and which show stable trade volumes over time?

In [None]:
# aggregate data to yearly level by category
yearly_cat_df = combined_df.groupby(['year', 'category']).agg({
    'import_value': 'sum',
    'export_value': 'sum'
}).reset_index()

# create a new column for trade balance
yearly_cat_df['trade_balance'] = yearly_cat_df['export_value'] - yearly_cat_df['import_value']

# rename columns for better readability
yearly_cat_df.rename(columns={
    'year':'Year',
    'import_value': 'Import',
    'export_value': 'Export',
    'trade_balance': 'Trade Balance'
}, inplace=True)

yearly_cat_df.head(5)

In [None]:
# Scale to billions
yearly_cat_df[['Import', 'Export', 'Trade Balance']] /= 1e9

# Plot the overall import and export values of USA over the years for each category
fig = plot_line(
    df=yearly_cat_df,
    x='Year',
    y=['Import', 'Export', 'Trade Balance'],
    title='USA Trade Trends by Category: Imports, Exports, and Trade Balance',
    y_label='Value (USD Billions)',
    x_label='Year',
    markers=True,
    # facet by category
    facet_col='category',
    facet_col_spacing=0.07
)

# simplify facet titles by removing "category="
fig.for_each_annotation(lambda a: a.update(text=a.text.split("=")[-1]))

# show different y axis for different plots
fig.update_yaxes(matches=None, showticklabels=True)

# add margin between title and plots
fig.update_layout(
    margin=dict(t=100)  # increases top margin (pixels)
)

# show the plot
fig.show()


Passenger vehicle imports are consistently higher than exports, reflecting strong consumer\
demand for foreign brands. Parts and trucks also show a widening gap between imports and exports\
over time. All categories show a dip in import and export values in 2009 and 2020, possibly due\
to the recession and the COVID-19 pandemic, respectively.
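One way to confirm dips like those in 2009 and 2020 is to compute year-over-year percentage changes per category and flag the negative ones. The sketch below runs on a small invented frame shaped like `yearly_cat_df` (columns `Year`, `category`, `Import`). The numbers are illustrative only.

```python
import pandas as pd

# Invented values shaped like yearly_cat_df (USD billions); not real trade data
toy = pd.DataFrame({
    "Year": [2008, 2009, 2010, 2008, 2009, 2010],
    "category": ["Parts"] * 3 + ["Trucks"] * 3,
    "Import": [100.0, 70.0, 95.0, 40.0, 30.0, 38.0],
})

# Sort so pct_change compares each year with the previous year of the same category
toy = toy.sort_values(["category", "Year"])
toy["Import YoY %"] = toy.groupby("category")["Import"].pct_change() * 100

# Rows where imports fell relative to the previous year
dips = toy[toy["Import YoY %"] < 0]
print(dips[["category", "Year", "Import YoY %"]])
```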

### Who are the top trading partners of the USA?

Let's see who the top trading partners (importers and exporters) of the USA are.

In [None]:
# aggregate data to country level and select top_n export values
country_col = 'country'
value_col = 'export_value'
top_n = 20
country_export_df = combined_df.groupby(country_col)[value_col].sum().nlargest(top_n).reset_index()
country_export_df.head(5)

In [None]:
# Scale to billions
country_export_df[['export_value']] /= 1e9

# Plot Export Trade by Country
fig = plot_bar(
    df=country_export_df,
    x=value_col,
    y=country_col,
    title='Exports by Country from 2008 to 2022 (Top 20)',
    x_label='Value (USD Billions)',
    # y_label='Country',
)

# show the plot
fig.show()


Canada dominates as the top destination for US automotive exports, with a total value nearing USD 800 billion.\
This reflects deep trade ties, geographic proximity and integrated supply chains between the US and Canada.\
Mexico ranks second at slightly above USD 200 billion, likely supported by the USMCA trade agreement.\
China, Germany and Saudi Arabia follow, representing major global markets with strong demand for American automotive brands.

In [None]:
# aggregate data to country level and select top_n import values
country_col = 'country'
value_col = 'import_value'
top_n = 20
country_import_df = combined_df.groupby(country_col)[value_col].sum().nlargest(top_n).reset_index()
country_import_df.head(5)

In [None]:
# Scale to billions
country_import_df[['import_value']] /= 1e9

# Plot Import Trade by Country
fig = plot_bar(
    df=country_import_df,
    x=value_col,
    y=country_col,
    title='Imports by Country from 2008 to 2022 (Top 20)',
    x_label='Value (USD Billions)',
    # y_label='Country',
)

# show the plot
fig.show()

Mexico leads with almost USD 1,350 billion in exports to the US. Canada is second with about USD 700 billion,\
and Japan ranks third, reflecting long-standing industrial ties and technology collaboration. China's position\
is lower than expected, possibly due to trade tensions, tariffs and a shift toward nearshoring/reshoring.

### US exports and imports by country, 2022
This section analyzes the import and export dynamics between the US and\
its top trading partners (top exporters or top importers) in 2022.
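Building the list of partners that are a top exporter *or* a top importer calls for a set union. A common pitfall is writing `set(a) or set(b)`. Python's `or` returns its first truthy operand, so with two non-empty sets it yields only the first one. The country lists below are placeholders, not the actual top-20 lists.

```python
top_exporters = ["Canada", "Mexico", "China"]   # placeholder lists for illustration
top_importers = ["Mexico", "Canada", "Japan"]

# `or` short-circuits: a non-empty set is truthy, so this is just set(top_exporters)
wrong = set(top_exporters) or set(top_importers)

# `|` (set union) keeps countries that appear in either list
union = set(top_exporters) | set(top_importers)

print(sorted(wrong))   # Japan is missing
print(sorted(union))   # Japan is included
```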

In [None]:
# aggregate total import and export values per country over all years
df = combined_df.groupby('country').agg({
    'import_value': 'sum',
    'export_value': 'sum'
}).reset_index()

# find top 20 countries by export value
top_exporters = df.nlargest(20, 'export_value')["country"].tolist()

# find top 20 countries by import value
top_importers = df.nlargest(20, 'import_value')["country"].tolist()

# union: countries that are a top exporter, a top importer, or both
top_countries = list(set(top_exporters) | set(top_importers))

# select data for top countries and year 2022
top_countries_df = combined_df[(combined_df['country'].isin(top_countries))
                               & (combined_df['year'] == 2022)]

# aggregate import and export data for top countries
top_countries_df = top_countries_df.groupby('country').agg({
    'import_value': 'sum',
    'export_value': 'sum'
}).reset_index().sort_values(by='import_value', ascending=True)

# rename columns for better readability
top_countries_df.rename(columns={
    'import_value': 'Import',
    'export_value': 'Export',
    'country':'Country'
}, inplace=True)

top_countries_df.head(5)

In [None]:
# Scale to billions
top_countries_df[['Import', 'Export']] /= 1e9

#  Plot dumbbell plot for top countries
fig = plot_dumbbell(
    df=top_countries_df,
    y='Country',
    x1='Export',
    x2='Import',
    x_label="Value (USD Billions)",
    title="U.S. Exports and Imports with Top Trade Partners in 2022",
)

# increase height
fig.update_layout(
    height=1000
)

# set axes to log scale
# fig.update_xaxes(type="log")
# fig.update_yaxes(type="log")

fig.show()

Mexico shows a massive import value into the US, while US exports to Mexico trail significantly.\
This gap potentially reflects the composition of trade, with US exports to Mexico consisting largely of parts.\
Canada is also a top trading partner, but unlike Mexico, US exports to Canada exceed imports from it.\
Apart from these two close allies, Japan shows much higher imports into the US than exports from the US.\
This trend is likely dominated by vehicle and parts imports, especially from manufacturers like Toyota and Honda\
and some Tier-1 automotive parts suppliers. Other notable countries, such as the UK, Italy, Brazil, Belgium,\
France, Chile and Australia, show relatively balanced trade.
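You can make a label like "balanced trade" explicit with a normalized trade balance, `(exports - imports) / (exports + imports)`. It ranges from -1 (imports only) to +1 (exports only). The ±0.2 cut-off and the figures below are hypothetical choices for illustration. They are not derived from this notebook.

```python
import pandas as pd

# Hypothetical values in USD billions, shaped like top_countries_df after scaling
toy = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Export": [30.0, 10.0, 12.0],
    "Import": [90.0, 4.0, 11.0],
})

# Normalized balance in [-1, 1]; 0 means exports exactly equal imports
toy["Balance Index"] = (toy["Export"] - toy["Import"]) / (toy["Export"] + toy["Import"])

THRESHOLD = 0.2  # hypothetical cut-off for calling trade "balanced"

def classify(index: float) -> str:
    """Label a normalized balance from the US point of view."""
    if index > THRESHOLD:
        return "US surplus"
    if index < -THRESHOLD:
        return "US deficit"
    return "balanced"

toy["Status"] = toy["Balance Index"].apply(classify)
print(toy)
```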


### Exploring GDP and Exports of Trading Partners

Let's examine how the GDP of the US's trading partners relates to the value of US exports to them.

In [None]:
# NOTE:
# Since interpolation requires existing data points to estimate missing values,
# the cleaning step did not fill nulls in gdp and mfn fields for countries with insufficient data.
# These remaining nulls correspond to countries that are not major trading partners of the USA.
# We will drop those rows from the dataset.


# Drop rows where gdp or mfn has NaN values
combined_df_clean = combined_df.dropna(subset=['mfn_by_us_simple_avg',
                                    'mfn_by_us_weighted_avg',
                                    'mfn_on_us_simple_avg',
                                    'mfn_on_us_weighted_avg',
                                    'gdp',
                                    'gdp_2015_adj'], 
                                how='any')
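The NOTE above can be illustrated on a toy panel. Linear interpolation within a country fills a gap between two known values. A country with no GDP observations at all has nothing to interpolate from, so its rows stay NaN and `dropna` removes them. The countries, years and values are invented. This is a sketch of the general behavior, not the exact procedure used in the cleaning notebooks.

```python
import numpy as np
import pandas as pd

# Toy GDP panel: country "A" has one interior gap, country "B" has no GDP data at all
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [2020, 2021, 2022, 2020, 2021, 2022],
    "gdp": [1.0, np.nan, 3.0, np.nan, np.nan, np.nan],
})

# Interpolate separately within each country; a gap needs known values around it
toy["gdp"] = toy.groupby("country")["gdp"].transform(lambda s: s.interpolate())
print(toy)        # A's 2021 value becomes 2.0; B stays all NaN

# Rows that could not be filled are dropped, as in the cell above
toy_clean = toy.dropna(subset=["gdp"], how="any")
print(toy_clean)  # only country A remains
```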

#### Comparing GDP and Exports: USMCA vs. Other Countries

Let’s explore how GDP and export values differ between USMCA (The U.S.-Mexico-Canada Agreement)\
members and other trading partners.

In [None]:
# make a copy of original cleaned df
df = combined_df_clean.copy()

# rename columns for better readability
df.rename(columns={
    'import_value': 'Import',
    'export_value': 'Export',
    'gdp': 'Nominal GDP',
    'gdp_2015_adj': 'GDP',
    'country':'Country'
}, inplace=True)

# Add a column to identify USMCA members
df["Trade bloc"] = df["Country"].apply(
    lambda x: "USMCA" if x in ["USA", "Mexico", "Canada"] else "Other"
)

# # Scale to billions / trillions
# df[['Export']] /= 1e9
# df[['GDP']] /= 1e9


# Keep only rows with positive Export Value as log scale cannot have 0
df = df[df["Export"] > 0]

# Create scatter plot
fig = plot_scatter(
    df=df,
    x='GDP',
    y='Export',
    color="Trade bloc",
    symbol="Trade bloc",
    facet_col='category',
    animation_frame="year",
    hover_data=['Country', 'year', 'GDP', 'Export'],
    x_label='GDP (USD)',
    y_label='Export (USD)',
    title='U.S. Exports vs Country GDP (USMCA Highlighted)'
)

# Compute min/max avoiding zero
y_min = df['Export'].min()
y_max = df['Export'].max()
x_min = df['GDP'].min()
x_max = df['GDP'].max()

# Convert to log10 scale for Plotly range
x_range = [np.log10(x_min*0.9), np.log10(x_max*1.5)]  # optional padding
y_range = [np.log10(y_min*0.9), np.log10(y_max*1.5)]  # optional padding

# set axes to log scale
fig.update_xaxes(dict(type='log', range=x_range))
fig.update_yaxes(dict(type='log', range=y_range))

# simplify facet titles by removing "category="
fig.for_each_annotation(lambda a: a.update(text=a.text.split("=")[-1]))

# add margin between titles of plot and facet plots
fig.update_layout(
    margin=dict(t=100)  # increases top margin (pixels)
)

# plot
fig.show()

The animated scatter plot clearly highlights Canada and Mexico as extreme outliers,\
with exceptionally high export values that could dominate any trend analysis.

Because percentiles don't assume the data is normally distributed,\
they're a good way to detect outliers in skewed data like exports.
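A percentile rank depends only on the ordering of values. An extreme value therefore does not shift anyone else's percentile, unlike a mean and standard deviation rule. The sketch below uses invented, right-skewed totals for ten hypothetical partners. The 85 cut-off is chosen for this toy example only, because with ten items each rank step is 10 percentile points.

```python
import pandas as pd

# Invented, right-skewed totals (USD billions) for 10 hypothetical partners
exports = pd.Series(
    [800.0, 210.0, 60.0, 45.0, 30.0, 20.0, 12.0, 8.0, 5.0, 2.0],
    index=list("ABCDEFGHIJ"),
)

# Percentile rank: share of partners with a value <= this one (no ties here), scaled to 0-100
pct = exports.rank(pct=True) * 100

CUTOFF = 85  # illustrative threshold for this small example
outliers = pct[pct > CUTOFF].index.tolist()
print(outliers)
```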

In [None]:
# make a copy of original cleaned df
df = combined_df_clean.copy()

# calculate exports for each country by summing all years exports
df = df.groupby('country').agg(
    {'export_value': 'sum',
    'gdp':'mean'}
).reset_index()

# plot a histogram of total exports per country to check for skewness
df['export_value'].hist()

In [None]:
# create a percentile column (0 to 100)
df['export_percentile'] = df['export_value'].rank(pct=True) * 100

# Sort by percentile (highest first)
df = df.sort_values(by='export_percentile', ascending=False)

# Preview
df.head()

#### Comparing GDP and Exports: Non-USMCA Countries

Canada and Mexico fall above the 99th percentile of export values,\
meaning fewer than 1% of all countries have exports that high.\
Because these extreme values can dominate the analysis and obscure\
overall patterns, we excluded them to better capture the main trends\
among non-USMCA countries.

We are limiting our analysis to the **top 20 non-USMCA countries**.\
This allows for clearer visualizations and more interpretable trends,\
avoiding clutter and distortion from smaller-volume countries.
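An alternative to taking the top 22 and then removing Canada and Mexico is to drop the excluded countries first and then take the top 20. This does not depend on both countries being inside the top 22. A minimal sketch on invented totals, with N = 3 for brevity:

```python
import pandas as pd

# Invented export totals per country (USD billions)
totals = pd.Series(
    {"Canada": 800.0, "Mexico": 210.0, "China": 150.0, "Germany": 90.0,
     "Japan": 80.0, "Korea": 40.0},
    name="export_value",
)

EXCLUDE = ["USA", "Mexico", "Canada"]
TOP_N = 3  # the notebook uses 20

# errors="ignore" skips labels (e.g. "USA") that are not present in the index
top_non_usmca = totals.drop(EXCLUDE, errors="ignore").nlargest(TOP_N).index.tolist()
print(top_non_usmca)
```

On the real data, `totals` would be `combined_df.groupby('country')['export_value'].sum()`.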

In [None]:
# find the top 22 countries by export value, since Canada and Mexico
# will be excluded from the analysis below
df = combined_df.copy()
df = df.groupby('country').agg({
    'import_value': 'sum',
    'export_value': 'sum'
}).reset_index()

df = df.sort_values(by='export_value', ascending=False).head(22)
top_exporters = df["country"].tolist()


# make a copy of original cleaned df
df = combined_df_clean.copy()

# rename columns for better readability
df.rename(columns={
    'import_value': 'Import',
    'export_value': 'Export',
    'gdp': 'Nominal GDP',
    'gdp_2015_adj': 'GDP',
    'country':'Country',
    'year':'Year'
}, inplace=True)


# top non-USMCA country only
df_non_USMCA = df[
    (df["Country"].isin(top_exporters))
    & (~df["Country"].isin(["USA", "Mexico", "Canada"]))
]

In [None]:
# let's see how GDP and Export are distributed
plot_two_histograms(df_non_USMCA, 'Export', 'GDP')

In [None]:
# Shapiro-Wilk test for normality on the raw and log-transformed values
for col in ['Export', 'GDP']:
    data = df_non_USMCA[col].dropna()  # remove missing values
    stat, p = shapiro(data)

    print(f'{col}: p-value = {p:.3g}')
    if p > 0.05:
        print(f'{col} looks normally distributed')
    else:
        print(f'{col} does NOT look normally distributed')

    # log1p computes log(1 + x), which stays finite when x is 0
    log_data = np.log1p(data)
    stat, p = shapiro(log_data)

    print(f'log {col}: p-value = {p:.3g}')
    if p > 0.05:
        print(f'log {col} looks normally distributed')
    else:
        print(f'log {col} does NOT look normally distributed')
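The log transform is tested because values like trade and GDP tend to be right-skewed. The sketch below draws a synthetic log-normal sample to show the pattern the test looks for. The log-normal data is an assumption used only for illustration. The raw values are strongly non-normal, while their logarithm is normal by construction. As in the cell above, `np.log1p` is used.

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(seed=0)
# Synthetic right-skewed sample; mean=10 keeps values large, so log1p(x) is close to log(x)
sample = rng.lognormal(mean=10, sigma=1, size=500)

_, p_raw = shapiro(sample)              # the raw sample should be rejected as non-normal
_, p_log = shapiro(np.log1p(sample))    # tests the log-transformed sample

print(f"raw p-value: {p_raw:.3g}")
print(f"log p-value: {p_log:.3g}")
```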

In [None]:
# Histograms of export and GDP confirmed non-normal distributions.
# So, using Spearman correlation to measure the strength of monotonic relationships.
for cat in df_non_USMCA['category'].unique():
    subset = df_non_USMCA[df_non_USMCA['category'] == cat]
    corr = subset['GDP'].corr(subset['Export'], method='spearman')
    print(f"Correlation for {cat}: {corr:.2f}")

These results indicate that GDP is most strongly associated with U.S. exports\
of **parts** (**0.68**), moderately with **passenger vehicles** (**0.29**),\
and shows no clear relationship with **trucks** (**0.06**) among non-USMCA countries.
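Spearman correlation is the Pearson correlation computed on ranks. It therefore measures how consistently one variable increases with the other, whatever the shape of the curve. The synthetic example below (not trade data) has a perfectly monotonic but highly non-linear relationship. Spearman is 1 while Pearson is much lower.

```python
import numpy as np
import pandas as pd

# Synthetic monotonic but strongly non-linear relationship
gdp = pd.Series(np.arange(1, 21, dtype=float))
exports = np.exp(gdp)  # always increasing in gdp, but far from a straight line

pearson = gdp.corr(exports, method="pearson")
spearman = gdp.corr(exports, method="spearman")

# Spearman is Pearson computed on the ranks of each variable
spearman_via_ranks = gdp.rank().corr(exports.rank(), method="pearson")

print(f"Pearson:  {pearson:.2f}")
print(f"Spearman: {spearman:.2f}")  # 1.00 because the ordering is perfectly preserved
```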

##### To further illustrate this relationship, we plot trendlines fitted on log-transformed values.

In [None]:
# Create scatter plot
fig = plot_scatter(
    df=df_non_USMCA,
    x='GDP',
    y='Export',
    x_label='GDP (USD)',
    y_label='Export (USD)',
    facet_col='category',
    facet_col_spacing=0.07,
    trendline="ols",
    trendline_options=dict(log_x=True, log_y=True),
    hover_data=['Country', 'Year', 'GDP', 'Export'],
    title='U.S. Exports vs GDP of Top 20 Non-USMCA Countries (2008–2022)'
)

# set axes to log scale
fig.update_xaxes(type="log", dtick=1)
fig.update_yaxes(type="log", dtick=1)

# add margin between titles of plot and facet plots
fig.update_layout(
    margin=dict(t=100)  # increases top margin (pixels)
)

# simplify facet titles by removing "category="
fig.for_each_annotation(lambda a: a.update(text=a.text.split("=")[-1]))

# show different y axis for different plots
fig.update_yaxes(matches=None, showticklabels=True)

# plot
fig.show()

The trendlines on log-scaled GDP and export values show an upward slope\
for Parts and Passenger Vehicles, indicating that higher-GDP countries\
generally import more from the US, consistent with the Spearman correlations.
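With `log_x` and `log_y` enabled, the OLS trendline is fit on log-transformed values, roughly `log(Export) = a + b * log(GDP)`. The slope `b` can then be read as an elasticity: 1% higher GDP is associated with roughly `b`% higher exports. The sketch below recovers `b` from synthetic data generated with a known exponent of 0.5. This is an illustrative value, not an estimate from this dataset. It uses `numpy.polyfit` rather than Plotly's own fit.

```python
import numpy as np

# Synthetic power law: exports = 3 * gdp**0.5, so the log-log slope should be 0.5
gdp = np.logspace(10, 13, num=30)       # 1e10 to 1e13 USD
exports = 3.0 * gdp ** 0.5

# Fit a straight line (degree-1 least squares) to the log10-transformed values
slope, intercept = np.polyfit(np.log10(gdp), np.log10(exports), deg=1)

print(f"slope (elasticity): {slope:.3f}")
print(f"intercept: {intercept:.3f}")  # equals log10(3)
```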

#### Explore the relationship between export value and tariff for the top 8 countries

To examine how U.S. trade flows relate to most-favored-nation (MFN) tariff rates, we focused on the top 8 trading partners in each year. For each country and year, we summed export and import values across categories and averaged the MFN tariff rate.
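Keeping the top partners *per year*, rather than over the whole period, relies on sorting within each year and taking the first rows of each group. The cell below does this with `groupby('year').head(8)`. As a result, the set of countries can change from one year to the next. A toy example with invented values and N = 2:

```python
import pandas as pd

# Invented country-year totals (USD billions)
toy = pd.DataFrame({
    "country": ["A", "B", "C", "A", "B", "C"],
    "year": [2021, 2021, 2021, 2022, 2022, 2022],
    "export_value": [50.0, 80.0, 20.0, 90.0, 40.0, 60.0],
})

TOP_N = 2  # the notebook keeps 8

# Sort descending within each year, then keep the first TOP_N rows of each year group
top_per_year = (
    toy.sort_values(["year", "export_value"], ascending=[True, False])
       .groupby("year")
       .head(TOP_N)
       .reset_index(drop=True)
)
print(top_per_year)
```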

In [None]:
# build a new dataframe from combined_df_clean with only country, year,
# category, export_value and mfn_on_us_simple_avg
export_tariff_df = combined_df_clean[
    ['country', 'year','category', 'export_value', 'mfn_on_us_simple_avg']
].copy()

# aggregate across categories for each country and year
export_tariff_df = export_tariff_df.groupby(['country', 'year']).agg({
    'mfn_on_us_simple_avg': 'mean',
    'export_value': 'sum',
}).reset_index()

# sort new dataframe by year and export_value in descending order
export_tariff_df = export_tariff_df.sort_values(by=['year', 'export_value'],
                                                ascending=[True, False])

# Keep only top 8 countries by export_value for each year
export_tariff_df = export_tariff_df.groupby('year').head(8).reset_index(drop=True)
export_tariff_df.head()

In [None]:
# rename columns for better readability
export_tariff_df.rename(columns={
    'country': 'Country',
    'year':'Year',
    'mfn_on_us_simple_avg': 'Tariff Rate',
    'export_value': 'Export Value'}, 
    inplace=True)


# Plot the animated scatter using the new dataframe
fig = plot_animated_scatter(
        export_tariff_df,
        x_col='Export Value',
        y_col='Tariff Rate',
        size_col='Export Value',
        color_col='Country',
        animation_col='Year',
        title='Correlation between Export Value and Tariff Rate (Top 8 Countries by Export Value)'
    )

# Compute min/max
x_min = export_tariff_df['Export Value'].min()
x_max = export_tariff_df['Export Value'].max()
y_min = export_tariff_df['Tariff Rate'].min()
y_max = export_tariff_df['Tariff Rate'].max()

# Convert to log10 scale for Plotly range
x_range = [np.log10(x_min*0.9), np.log10(x_max*1.5)]  # optional padding
y_range = [y_min-5, y_max+5]  # optional padding

# Keep log x-axis if desired (optional) and style
fig.update_layout(
    xaxis=dict(type='log', title='Export Value (log scale)', range=x_range),
    yaxis=dict(title='Tariff Rate (%)', range=y_range)
)

fig.show()


We visualized the trends using an animated scatter plot, where bubble color represents country\
and size reflects export value. Given the wide range of export volumes, we used a log-scaled x-axis\
to compress the range and improve comparability across countries. As in the GDP analysis,\
the MFN plots reveal Canada and Mexico as clear outliers, driven by disproportionately high trade volumes.
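For log-type axes, Plotly expects `range` in log10 units, which is why the cells above pass `np.log10(...)` of the padded minimum and maximum. A small helper like the one below could replace the repeated boilerplate. Its name, `log_axis_range`, is hypothetical, and its default padding factors mirror the 0.9 and 1.5 used above.

```python
import numpy as np

def log_axis_range(values, low_pad: float = 0.9, high_pad: float = 1.5) -> list:
    """Return [log10(min * low_pad), log10(max * high_pad)] for a Plotly log axis.

    Non-positive values are ignored because log10 is undefined for them.
    """
    arr = np.asarray(values, dtype=float)
    arr = arr[arr > 0]
    if arr.size == 0:
        raise ValueError("log axis range needs at least one positive value")
    return [float(np.log10(arr.min() * low_pad)), float(np.log10(arr.max() * high_pad))]

print(log_axis_range([1e6, 5e8, 0.0, 2e10]))
```

In a cell above, it could be used as `fig.update_xaxes(type="log", range=log_axis_range(export_tariff_df["Export Value"]))`.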

In [None]:
# build a new dataframe from combined_df_clean with only country, year,
# category, import_value and mfn_by_us_simple_avg
import_tariff_df = combined_df_clean[
    ["country", "year", "category", "import_value", "mfn_by_us_simple_avg"]
].copy()

# aggregate across categories for each country and year
import_tariff_df = (
    import_tariff_df.groupby(["country", "year"])
    .agg(
        {
            "mfn_by_us_simple_avg": "mean",
            "import_value": "sum",
        }
    )
    .reset_index()
)

# sort new dataframe by year and import_value in descending order
import_tariff_df = import_tariff_df.sort_values(
    by=["year", "import_value"], ascending=[True, False]
)

# Keep only top 8 countries by import_value for each year
import_tariff_df = import_tariff_df.groupby("year").head(8).reset_index(drop=True)

# rename columns for better readability
import_tariff_df.rename(
    columns={
        "country": "Country",
        "year": "Year",
        "mfn_by_us_simple_avg": "Tariff Rate",
        "import_value": "Import Value",
    },
    inplace=True,
)

# plot the animation
fig = plot_animated_scatter(
    import_tariff_df,
    x_col="Import Value",
    y_col="Tariff Rate",
    size_col="Import Value",
    color_col="Country",
    animation_col="Year",
    title="Correlation between Import Value and Tariff Rate (Top 8 Countries by Import Value)",
)

# Compute min/max
x_min = import_tariff_df["Import Value"].min()
x_max = import_tariff_df["Import Value"].max()
y_min = import_tariff_df["Tariff Rate"].min()
y_max = import_tariff_df["Tariff Rate"].max()

# Convert to log10 scale for Plotly range
x_range = [np.log10(x_min * 0.9), np.log10(x_max * 1.5)]  # optional padding
y_range = [y_min - 5, y_max + 5]  # optional padding

# Keep log x-axis if desired (optional) and style
fig.update_layout(
    xaxis=dict(type="log", title="Import Value (log scale)", range=x_range),
    yaxis=dict(title="Tariff Rate (%)", range=y_range),
)

fig.show()

We visualized the trends using an animated scatter plot, where bubble color represents country\
and size reflects import value. Given the wide range of import volumes, we used a log-scaled x-axis\
to compress the range and improve comparability across countries. As in the GDP analysis,\
the MFN plots reveal Canada and Mexico as clear outliers, driven by disproportionately high trade volumes.

#### Visualize the trend between tariffs and imports/exports for the next top 10 partners
Note: Canada and Mexico are the top 2 partners for both exports and imports,\
so we filter them out.

In [None]:
# build a new dataframe from combined_df_clean with columns for country, category, year,
# export value and mfn_on_us
export_tariff_df1 = combined_df_clean[
    ["country", "category", "year", "export_value", "mfn_on_us_simple_avg"]
].copy()

# exclude Canada and Mexico
filtered_df = export_tariff_df1[
    ~export_tariff_df1["country"].isin(["Canada", "Mexico"])
]

# Keep only rows where category is 'Passenger Vehicles'
filtered_df = filtered_df[filtered_df["category"] == "Passenger Vehicles"]

# Aggregate over all years: total export value and mean tariff per country
filtered_df = (
    filtered_df.groupby(["country"])
    .agg({"export_value": "sum", "mfn_on_us_simple_avg": "mean"})
    .reset_index()
)

# Sort by total export value in descending order and keep the top 10 countries
filtered_df = filtered_df.sort_values("export_value", ascending=False).head(10)
filtered_df.head()

In [None]:
# filtered_df gives top 10 countries by export value for passenger vehicles
# excluding Canada and Mexico. Take the top 10 countries from this dataframe in a list.
top_countries = filtered_df["country"].unique().tolist()[:10]

# Now create a new dataframe from combined_df where the country is in top_countries
# and category is 'Passenger Vehicles'
top_countries_df = combined_df[
    (combined_df["country"].isin(top_countries))
    & (combined_df["category"] == "Passenger Vehicles")
].copy()

# Keep columns country, year, export_value and mfn_on_us_simple_avg
top_countries_df = top_countries_df[
    ["country", "year", "export_value", "mfn_on_us_simple_avg"]
].copy()

# rename for better readability
top_countries_df.rename(
    columns={
        "country": "Country",
        "year": "Year",
        "export_value": "Export",
        "mfn_on_us_simple_avg": "Tariff on US",
    },
    inplace=True,
)

top_countries_df.head(10)

In [None]:
# plot a grid of line charts
plot_line_grid(top_countries_df, x='Year', y1='Export',
               y2='Tariff on US', group_col='Country', groups=top_countries,
               title="Export vs Tariff by Country (Top 10)")

- US exports to China show a strong upward trend until 2018, followed by a sharp decline in 2019 and 2020.\
This coincides with a tariff spike in 2018, suggesting an inverse relationship.
- Germany, on the other hand, shows fluctuating trends in both exports from the US and tariff rates.\
Trade between Germany and the US is most likely influenced by demand cycles.
- US exports to Saudi Arabia and the United Arab Emirates peaked around 2012, followed by a steady decline.\
A sharp increase in tariff rates in 2018 may have contributed to the decline of US exports to these Gulf countries.
- South Korea and Australia show a steady rise in their imports from the US. Tariff rates do not appear to have\
a significant influence on the trade value.
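These visual readings ("inverse relationship" versus "no significant influence") could be backed by a per-country rank correlation between tariff rate and export value over the years. The sketch below uses invented series for two hypothetical countries. With only 15 yearly points per country, such correlations are noisy, and correlation alone does not show that tariffs caused a change.

```python
import pandas as pd

# Invented yearly series for two hypothetical partners
toy = pd.DataFrame({
    "Country": ["X"] * 5 + ["Y"] * 5,
    "Year": list(range(2016, 2021)) * 2,
    "Export": [10, 12, 14, 9, 7, 5, 6, 7, 8, 9],
    "Tariff on US": [2.0, 2.0, 2.5, 6.0, 7.0, 3.0, 3.1, 2.9, 3.0, 3.2],
})

# Spearman correlation between tariff and exports, computed separately for each country
per_country = toy.groupby("Country")[["Export", "Tariff on US"]].apply(
    lambda g: g["Export"].corr(g["Tariff on US"], method="spearman")
)
print(per_country)
```

On the notebook data, `toy` would be replaced by `top_countries_df`.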


#### Create a similar grid plot for imports into the USA against tariffs imposed by the USA <br> on the top 10 trading partners

In [None]:
# build a new dataframe from combined_df_clean with columns for country, category, year,
# import value and mfn_by_us
import_tariff_df1 = combined_df_clean[
    ["country", "category", "year", "import_value", "mfn_by_us_simple_avg"]
].copy()

# Filter out Canada and Mexico
filtered_df1 = import_tariff_df1[
    ~import_tariff_df1["country"].isin(["Canada", "Mexico"])
]

# Keep only rows where category is 'Passenger Vehicles'
filtered_df1 = filtered_df1[filtered_df1["category"] == "Passenger Vehicles"]

# Aggregate over all years: total import value and mean tariff per country
filtered_df1 = (
    filtered_df1.groupby(["country"])
    .agg({"import_value": "sum", "mfn_by_us_simple_avg": "mean"})
    .reset_index()
)

# Sort the dataframe by import value in descending order and take top 10 countries by import value
filtered_df1 = filtered_df1.sort_values("import_value", ascending=False).head(10)

filtered_df1.head(10)

In [None]:
top_10_import = filtered_df1["country"].unique().tolist()[:10]

# Now create a new dataframe from combined_df where the country is in top_10_import
# and category is 'Passenger Vehicles'
top_import_countries_df = combined_df[
    (combined_df["country"].isin(top_10_import))
    & (combined_df["category"] == "Passenger Vehicles")
].copy()

# Keep columns country, year, import_value and mfn_by_us_simple_avg
top_import_countries_df = top_import_countries_df[
    ["country", "year", "import_value", "mfn_by_us_simple_avg"]
].copy()

# rename for readability
top_import_countries_df.rename(
    columns={
        "country": "Country",
        "year": "Year",
        "mfn_by_us_simple_avg": "Tariff by US",
        "import_value": "Import",
    },
    inplace=True,
)


In [None]:
# plot a grid of line charts
plot_line_grid(top_import_countries_df, x='Year', y1='Import',
               y2='Tariff by US', group_col='Country', groups=top_10_import,
               title="Import vs Tariff by Country (Top 10)")

- Imports from Japan show steady growth until 2018, then a dip in 2019 and a further decline in 2020.
- Imports from Germany rise steadily until 2014, followed by a gradual decline.
- Imports from South Korea and the other countries exporting to the US also grow consistently until 2018,\
followed by a decline in subsequent years.
- Tariffs imposed by the US follow a common pattern across most countries: the rate stayed between\
3.5% and 3.8% until 2018, spiked to over 4.5% in 2018, and then fell back to the pre-2018 range.
- This sudden, short-term policy change, followed by the COVID-19 pandemic in 2020, coincides with the decline in US imports of passenger vehicles.
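A tariff move from about 3.8% to 4.5% is a change of 0.7 percentage points. Relative to the starting rate, that is an increase of roughly 18%. Keeping the two measures apart avoids misreading the size of the 2018 spike. The rates below are the approximate values quoted above and are used only for arithmetic.

```python
def tariff_change(old_rate: float, new_rate: float) -> tuple:
    """Return (percentage-point change, relative change in percent)."""
    pp_change = new_rate - old_rate
    relative_pct = pp_change / old_rate * 100
    return pp_change, relative_pct

# Approximate rates quoted in the text (percent)
for old in (3.5, 3.8):
    pp, rel = tariff_change(old, 4.5)
    print(f"{old}% -> 4.5%: +{pp:.1f} pp, +{rel:.1f}% relative")
```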