# Test Analysis 
Use the 2015 and 2016 data to test the method in `Misclassifications_Analysis`.

[The main 2017-2018 data analysis is here.](https://github.com/GateHouseMedia/Chinese-Imports/blob/master/data/Missclassifications_Analysis.ipynb)

Note: I used the same method but replaced the 2018 data with the 2016 data, and replaced the 2017 data with the 2015 data.


In [1]:
# import 
import pandas as pd
import re
import matplotlib.pyplot as plt

In [2]:
# dataframe setting
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_colwidth', 500)

# 1. Data Acquisition

In [3]:
# read in the 2015 and 2016 data
df_2016 = pd.read_csv('2016.csv')
df_2015 = pd.read_csv('2015.csv')

In [4]:
df_2016.head()

Unnamed: 0,Commodity,Time,Country,Calculated Duty ($US),Dutiable Value ($US),Customs Value (Cons) ($US),Customs Value (Gen) ($US),Unnamed: 7
0,"0101290090 Horses, Live, Nesoi (no)",July 2016,China,,,30000,30000,
1,"0106110000 Primates, Live (no)",January 2016,China,,,3445800,3445800,
2,"0106110000 Primates, Live (no)",February 2016,China,,,3370800,3370800,
3,"0106110000 Primates, Live (no)",March 2016,China,,,4142395,4142395,
4,"0106110000 Primates, Live (no)",April 2016,China,,,4430830,4430830,


### Some definitions:


(Cons)	-	"Imports for consumption." Measures the total of merchandise that has physically cleared through Customs either entering consumption channels immediately or entering after withdrawal for consumption from bonded warehouses or Foreign Trade Zones under U.S. Customs and Border Protection (CBP) custody.

(Gen)	-	"General Imports." This measures all merchandise imported from foreign countries, whether such merchandise enters consumption channels immediately or is entered into bonded warehouses or Foreign Trade Zones under Customs custody.
 		
Customs Value (Gen)  ($US)	-	The value of goods imported as appraised by U.S. Customs and Border Protection. This value is generally defined as the price actually paid or payable for merchandise when sold for exportation to the U.S. It excludes U.S. import duties, freight, insurance, and other charges incurred in bringing the merchandise to the U.S. (General Imports)

Customs Value (Cons) ($US)	-	The value of goods imported as appraised by U.S. Customs and Border Protection. Excludes freight and duties. (Imports for consumption)

Dutiable Value ($US)	-	The customs value of imported goods subject to duties. (Imports for consumption)

Calculated Duty ($US)	-	Estimates of calculated duty do not necessarily reflect amounts of duty paid and should, therefore, be used with caution. The inclusion in the figures of some U.S. products returned after processing and assembly abroad, for which a portion of the value is eligible for duty-free consideration, may cause these duty figures to be somewhat overstated as a result. In cases where articles are dutiable at various or special rates, a dutiable value is shown but no duty is calculated. Thus, there is an understatement in the estimates of calculated duty to the extent that these situations exist.




In [5]:
len(df_2016)

127024

In [6]:
df_2016.dtypes

Commodity                       object
Time                            object
Country                         object
Calculated Duty ($US)           object
Dutiable Value ($US)            object
Customs  Value (Cons) ($US)     object
Customs Value (Gen) ($US)       object
Unnamed: 7                     float64
dtype: object

# 2. Data Preparation
Clean `2016.csv`

In [7]:
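# drop the empty 'Unnamed: 7' column in place
df_2016.drop('Unnamed: 7', axis=1, inplace=True)
# left disabled: the double-space column name 'Customs  Value (Cons) ($US)' is kept as-is
# df_2016.rename(columns={'Customs  Value (Cons) ($US)': 'Customs Value (Cons) ($US)'})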
df_2016.head()

Unnamed: 0,Commodity,Time,Country,Calculated Duty ($US),Dutiable Value ($US),Customs Value (Cons) ($US),Customs Value (Gen) ($US)
0,"0101290090 Horses, Live, Nesoi (no)",July 2016,China,,,30000,30000
1,"0106110000 Primates, Live (no)",January 2016,China,,,3445800,3445800
2,"0106110000 Primates, Live (no)",February 2016,China,,,3370800,3370800
3,"0106110000 Primates, Live (no)",March 2016,China,,,4142395,4142395
4,"0106110000 Primates, Live (no)",April 2016,China,,,4430830,4430830


In [8]:
# this function removes the commas and returns a float
def clean_dollar_values(number_string):
    return float(str(number_string).replace(",", ""))
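
A quick check of how `clean_dollar_values` behaves can make the later `.fillna(0)` step easier to follow. The sketch below is illustrative only: the sample inputs are made up, and the NaN entry stands in for an empty CSV cell, which pandas reads as NaN.

```python
import math


def clean_dollar_values(number_string):
    """Strip thousands separators and return a float."""
    return float(str(number_string).replace(",", ""))


# Hypothetical sample inputs: a comma-formatted string, a plain string,
# and NaN (what pandas produces for an empty cell).
samples = ["3,445,800", "30000", float("nan")]
cleaned = [clean_dollar_values(value) for value in samples]

# str(nan) is "nan", which float() parses back to NaN, so missing cells
# survive the cleaning step as NaN instead of raising an error.
print(cleaned)
```

Because missing values come out as NaN, a `.fillna(0)` is still needed before any conversion to integers.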

In [9]:
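# Why there is an additional customs column: the source header
# 'Customs  Value (Cons) ($US)' contains a double space, so the cleaned values
# are written to a new, single-space 'Customs Value (Cons) ($US)' column.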

df_2016['Customs Value (Gen) ($US)'] = df_2016['Customs Value (Gen) ($US)']\
    .apply(clean_dollar_values)\
    .fillna(0)\
    .astype(int)

df_2016['Customs Value (Cons) ($US)'] = df_2016['Customs  Value (Cons) ($US)']\
    .apply(clean_dollar_values)\
    .fillna(0)\
    .astype(int)

df_2016['Calculated Duty ($US)'] = df_2016['Calculated Duty ($US)']\
    .apply(clean_dollar_values)\
    .fillna(0)\
    .astype(int)

df_2016['Dutiable Value ($US)'] = df_2016['Dutiable Value ($US)']\
    .apply(clean_dollar_values)\
    .fillna(0)\
    .astype(int)
    
df_2016.dtypes

Commodity                      object
Time                           object
Country                        object
Calculated Duty ($US)           int64
Dutiable Value ($US)            int64
Customs  Value (Cons) ($US)    object
Customs Value (Gen) ($US)       int64
Customs Value (Cons) ($US)      int64
dtype: object
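
The duplicate column in the output above comes from the double space in the source header. One possible alternative, sketched here on a made-up two-row frame rather than the real file, is to normalize the header whitespace first and then clean every dollar column in a loop. This is an assumption about how the step could be restructured, not what the notebook does.

```python
import pandas as pd


def clean_dollar_values(number_string):
    """Strip thousands separators and return a float."""
    return float(str(number_string).replace(",", ""))


# Toy frame with made-up values; NaN stands in for an empty CSV cell.
df = pd.DataFrame({
    "Customs  Value (Cons) ($US)": ["3,445,800", "30000"],
    "Calculated Duty ($US)": [float("nan"), "1,500"],
})

# Collapse runs of whitespace in every column name.
df = df.rename(columns=lambda name: " ".join(name.split()))

# Clean each dollar column with the same three steps used in the notebook.
for column in ["Customs Value (Cons) ($US)", "Calculated Duty ($US)"]:
    df[column] = df[column].apply(clean_dollar_values).fillna(0).astype(int)

print(df.dtypes)
```

With the header normalized up front, the cleaned values overwrite the original column instead of creating a second one.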

In [10]:
# add a "code" column: the 10-digit HTS code at the start of "Commodity"
df_2016['code'] = df_2016['Commodity'].str.extract(r"^(\d{10}) ", expand=False)

df_2016.tail()

Unnamed: 0,Commodity,Time,Country,Calculated Duty ($US),Dutiable Value ($US),Customs Value (Cons) ($US),Customs Value (Gen) ($US),Customs Value (Cons) ($US).1,code
127019,9999950000 Estimated Imports Of Low Valued Transactions (x),August 2016,China,0,282189072,282189072,282189072,282189072,9999950000
127020,9999950000 Estimated Imports Of Low Valued Transactions (x),September 2016,China,0,262277782,262277782,262277782,262277782,9999950000
127021,9999950000 Estimated Imports Of Low Valued Transactions (x),October 2016,China,0,260432509,260432509,260432509,260432509,9999950000
127022,9999950000 Estimated Imports Of Low Valued Transactions (x),November 2016,China,0,262633855,262633855,262633855,262633855,9999950000
127023,9999950000 Estimated Imports Of Low Valued Transactions (x),December 2016,China,0,259496947,259496947,259496947,259496947,9999950000


In [11]:
# add a "year" column
df_2016['year'] = df_2016['Time'].str.extract(r"(\d\d\d\d)", expand=False)
df_2016.head()

Unnamed: 0,Commodity,Time,Country,Calculated Duty ($US),Dutiable Value ($US),Customs Value (Cons) ($US),Customs Value (Gen) ($US),Customs Value (Cons) ($US).1,code,year
0,"0101290090 Horses, Live, Nesoi (no)",July 2016,China,0,0,30000,30000,30000,101290090,2016
1,"0106110000 Primates, Live (no)",January 2016,China,0,0,3445800,3445800,3445800,106110000,2016
2,"0106110000 Primates, Live (no)",February 2016,China,0,0,3370800,3370800,3370800,106110000,2016
3,"0106110000 Primates, Live (no)",March 2016,China,0,0,4142395,4142395,4142395,106110000,2016
4,"0106110000 Primates, Live (no)",April 2016,China,0,0,4430830,4430830,4430830,106110000,2016


In [12]:
# add a "month" column 
df_2016['month'] = df_2016['Time'].str.extract(r"(.*) \d\d\d\d", expand=False)
df_2016.tail()

Unnamed: 0,Commodity,Time,Country,Calculated Duty ($US),Dutiable Value ($US),Customs Value (Cons) ($US),Customs Value (Gen) ($US),Customs Value (Cons) ($US).1,code,year,month
127019,9999950000 Estimated Imports Of Low Valued Transactions (x),August 2016,China,0,282189072,282189072,282189072,282189072,9999950000,2016,August
127020,9999950000 Estimated Imports Of Low Valued Transactions (x),September 2016,China,0,262277782,262277782,262277782,262277782,9999950000,2016,September
127021,9999950000 Estimated Imports Of Low Valued Transactions (x),October 2016,China,0,260432509,260432509,260432509,260432509,9999950000,2016,October
127022,9999950000 Estimated Imports Of Low Valued Transactions (x),November 2016,China,0,262633855,262633855,262633855,262633855,9999950000,2016,November
127023,9999950000 Estimated Imports Of Low Valued Transactions (x),December 2016,China,0,259496947,259496947,259496947,259496947,9999950000,2016,December
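
The three extraction steps above can be pictured on a single row. The following is a minimal sketch using the standard `re` module instead of pandas; the helper name `split_row` and the two compiled patterns are hypothetical and only mirror the regular expressions used in the cells above.

```python
import re

# Hypothetical patterns mirroring the notebook's extraction rules.
COMMODITY_PATTERN = re.compile(r"^(\d{10}) ")
TIME_PATTERN = re.compile(r"^(?P<month>[A-Za-z]+) (?P<year>\d{4})$")


def split_row(commodity, time):
    """Return (code, year, month) for one row of the trade data."""
    code = COMMODITY_PATTERN.match(commodity).group(1)
    parts = TIME_PATTERN.match(time)
    return code, parts.group("year"), parts.group("month")


row = split_row("0106110000 Primates, Live (no)", "January 2016")
print(row)
```

The code is kept as a string, so the leading zero of `0106110000` is preserved.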


# 3. Tariff Rate Calculation

In [13]:
# add estimated tariff column

df_2016["tariff"] = df_2016['Calculated Duty ($US)'] / df_2016['Customs Value (Cons) ($US)']  * 100
df_2016.tail()

Unnamed: 0,Commodity,Time,Country,Calculated Duty ($US),Dutiable Value ($US),Customs Value (Cons) ($US),Customs Value (Gen) ($US),Customs Value (Cons) ($US).1,code,year,month,tariff
127019,9999950000 Estimated Imports Of Low Valued Transactions (x),August 2016,China,0,282189072,282189072,282189072,282189072,9999950000,2016,August,0.0
127020,9999950000 Estimated Imports Of Low Valued Transactions (x),September 2016,China,0,262277782,262277782,262277782,262277782,9999950000,2016,September,0.0
127021,9999950000 Estimated Imports Of Low Valued Transactions (x),October 2016,China,0,260432509,260432509,260432509,260432509,9999950000,2016,October,0.0
127022,9999950000 Estimated Imports Of Low Valued Transactions (x),November 2016,China,0,262633855,262633855,262633855,262633855,9999950000,2016,November,0.0
127023,9999950000 Estimated Imports Of Low Valued Transactions (x),December 2016,China,0,259496947,259496947,259496947,259496947,9999950000,2016,December,0.0


In [14]:
print("The highest calculated tariff in 2016 was {:.2f}% and the lowest was {:.2f}%."\
          .format(
            df_2016.tariff.max(),
            df_2016.tariff.min()
))

The highest calculated tariff in 2016 was 66.49% and the lowest was 0.00%.
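
The estimated tariff is the calculated duty as a percentage of the customs value for consumption. As a small worked sketch with made-up numbers, the helper below (a hypothetical name, not part of the notebook) applies the same formula to one row and returns NaN when the customs value is zero. In the vectorized pandas version above, a zero denominator would instead produce `inf` or NaN silently.

```python
import math


def estimated_tariff_rate(calculated_duty, customs_value):
    """Calculated duty as a percentage of customs value; NaN if the value is zero."""
    if customs_value == 0:
        return float("nan")
    return calculated_duty / customs_value * 100


# Made-up example: $7,500 of duty on $100,000 of goods.
rate = estimated_tariff_rate(7500, 100000)
print("{:.2f}%".format(rate))
```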


# 4. Data Combination

### Dutiable 2016 DataFrame:

In [15]:
df_2016_dutiable = df_2016[['Commodity','code','year','month','Dutiable Value ($US)']].copy()
df_2016_dutiable.head()

Unnamed: 0,Commodity,code,year,month,Dutiable Value ($US)
0,"0101290090 Horses, Live, Nesoi (no)",101290090,2016,July,0
1,"0106110000 Primates, Live (no)",106110000,2016,January,0
2,"0106110000 Primates, Live (no)",106110000,2016,February,0
3,"0106110000 Primates, Live (no)",106110000,2016,March,0
4,"0106110000 Primates, Live (no)",106110000,2016,April,0


In [16]:
# rename Dutiable Value ($US)
df_2016_dutiable.rename(columns = {'Dutiable Value ($US)':'dutiable'}, inplace = True)
df_2016_dutiable.tail()

Unnamed: 0,Commodity,code,year,month,dutiable
127019,9999950000 Estimated Imports Of Low Valued Transactions (x),9999950000,2016,August,282189072
127020,9999950000 Estimated Imports Of Low Valued Transactions (x),9999950000,2016,September,262277782
127021,9999950000 Estimated Imports Of Low Valued Transactions (x),9999950000,2016,October,260432509
127022,9999950000 Estimated Imports Of Low Valued Transactions (x),9999950000,2016,November,262633855
127023,9999950000 Estimated Imports Of Low Valued Transactions (x),9999950000,2016,December,259496947


In [17]:
# pivot so that each month becomes a column (one row per commodity)
df_2016_dutiable = df_2016_dutiable.pivot_table('dutiable', ['Commodity', 'code', 'year'], 'month')
df_2016_dutiable.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,month,April,August,December,February,January,July,June,March,May,November,October,September
Commodity,code,year,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
"0101290090 Horses, Live, Nesoi (no)",101290090,2016,,,,,,0.0,,,,,,
"0106110000 Primates, Live (no)",106110000,2016,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"0106199195 Mammals, Live, Nesoi (no)",106199195,2016,,,,,0.0,,,0.0,,0.0,,0.0
"0106900110 Worms, Live (x)",106900110,2016,,,,,,0.0,,,,,,
"0206100000 Offal Of Bovines, Edible, Fresh Or Chilled (kg)",206100000,2016,,,,,,0.0,,,,,,


In [18]:
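df_2016_dutiable.reset_index( drop=False, inplace=True )
# NOTE: reindex() returns a new DataFrame and the result is not assigned here,
# so this call has no effect; the columns keep their alphabetical order below.
df_2016_dutiable.reindex(
    [ 
        'January', 'February', 'March', 
        'April', 'May', 'June', 'July', 
        'August', 'September', 'October' 
    ], axis=1
)
df_2016_dutiable.tail()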

month,Commodity,code,year,April,August,December,February,January,July,June,March,May,November,October,September
14565,9817004400 Motion Pict Film On Which Pict/soud Recd Devlp/not (x),9817004400,2016,,,,,,,,,,,0.0,
14566,"9817005000 Agricultural/horticultural Mach,equip&implements (x)",9817005000,2016,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
14567,"9817006000 Pts Used In Articles In 8432,8433,8434& 8436 (x)",9817006000,2016,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
14568,"9818000700 Equip/pts Repaired In Foreign Cty On Vessel, Nesoi (x)",9818000700,2016,2545054.0,3433928.0,21533.0,27869.0,9188.0,24885.0,35917.0,12298451.0,6973200.0,,110375.0,1484666.0
14569,9999950000 Estimated Imports Of Low Valued Transactions (x),9999950000,2016,237356439.0,282189072.0,259496947.0,250859404.0,253917504.0,253658918.0,255685479.0,239033484.0,252942614.0,262633855.0,260432509.0,262277782.0
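
The month columns in the output above are in alphabetical order. If calendar order is wanted, one option is to select the month columns explicitly and assign the result back. The sketch below uses a made-up three-row frame; `MONTHS` and `present` are hypothetical names introduced for illustration.

```python
import pandas as pd

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

# Toy long-format data with made-up dutiable values.
long_df = pd.DataFrame({
    "code": ["0106110000", "0106110000", "9818000700"],
    "month": ["March", "January", "January"],
    "dutiable": [10, 20, 30],
})

# One row per code, one column per month (pivot_table sorts columns alphabetically).
wide = long_df.pivot_table(values="dutiable", index="code", columns="month")

# Keep only the months that occur, in calendar order, and assign the result.
present = [month for month in MONTHS if month in wide.columns]
wide = wide[present].reset_index()
print(wide)
```

Selecting before `reset_index` keeps the identifier columns out of the reordering step, so they are not dropped.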


### Tariff 2016 DataFrame:

In [19]:
df_2016_tariff = df_2016[['Commodity','code','year','month','tariff']].copy()
df_2016_tariff.head()

Unnamed: 0,Commodity,code,year,month,tariff
0,"0101290090 Horses, Live, Nesoi (no)",101290090,2016,July,0.0
1,"0106110000 Primates, Live (no)",106110000,2016,January,0.0
2,"0106110000 Primates, Live (no)",106110000,2016,February,0.0
3,"0106110000 Primates, Live (no)",106110000,2016,March,0.0
4,"0106110000 Primates, Live (no)",106110000,2016,April,0.0


In [20]:
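# pivot so that each month becomes a column (one row per commodity)
df_2016_tariff=df_2016_tariff.pivot_table('tariff', ['Commodity', 'code', 'year'], 'month')

df_2016_tariff.reset_index( drop=False, inplace=True )
# NOTE: reindex() returns a new DataFrame and the result is not assigned here,
# so this call has no effect; the columns keep their alphabetical order below.
df_2016_tariff.reindex(
    [ 
        'January', 'February', 'March', 
        'April', 'May', 'June', 'July', 
        'August', 'September', 'October' 
    ], axis=1
)

df_2016_tariff.tail()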

month,Commodity,code,year,April,August,December,February,January,July,June,March,May,November,October,September
14565,9817004400 Motion Pict Film On Which Pict/soud Recd Devlp/not (x),9817004400,2016,,,,,,,,,,,0.0,
14566,"9817005000 Agricultural/horticultural Mach,equip&implements (x)",9817005000,2016,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
14567,"9817006000 Pts Used In Articles In 8432,8433,8434& 8436 (x)",9817006000,2016,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
14568,"9818000700 Equip/pts Repaired In Foreign Cty On Vessel, Nesoi (x)",9818000700,2016,50.000039,50.000058,50.006966,50.001794,50.0,50.002009,50.001392,50.000012,50.0,,50.000453,50.000067
14569,9999950000 Estimated Imports Of Low Valued Transactions (x),9999950000,2016,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### Combine the Dutiable 2016 and Tariff 2016 DataFrames:

In [21]:
tariff_dutiable_2016 = pd.merge(
    df_2016_dutiable,
    df_2016_tariff,
    on="code",
    how="left",
    suffixes=["_dutiable", "_tariff"])

tariff_dutiable_2016.head()

month,Commodity_dutiable,code,year_dutiable,April_dutiable,August_dutiable,December_dutiable,February_dutiable,January_dutiable,July_dutiable,June_dutiable,March_dutiable,May_dutiable,November_dutiable,October_dutiable,September_dutiable,Commodity_tariff,year_tariff,April_tariff,August_tariff,December_tariff,February_tariff,January_tariff,July_tariff,June_tariff,March_tariff,May_tariff,November_tariff,October_tariff,September_tariff
0,"0101290090 Horses, Live, Nesoi (no)",101290090,2016,,,,,,0.0,,,,,,,"0101290090 Horses, Live, Nesoi (no)",2016,,,,,,0.0,,,,,,
1,"0106110000 Primates, Live (no)",106110000,2016,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,"0106110000 Primates, Live (no)",2016,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"0106199195 Mammals, Live, Nesoi (no)",106199195,2016,,,,,0.0,,,0.0,,0.0,,0.0,"0106199195 Mammals, Live, Nesoi (no)",2016,,,,,0.0,,,0.0,,0.0,,0.0
3,"0106900110 Worms, Live (x)",106900110,2016,,,,,,0.0,,,,,,,"0106900110 Worms, Live (x)",2016,,,,,,0.0,,,,,,
4,"0206100000 Offal Of Bovines, Edible, Fresh Or Chilled (kg)",206100000,2016,,,,,,0.0,,,,,,,"0206100000 Offal Of Bovines, Edible, Fresh Or Chilled (kg)",2016,,,,,,0.0,,,,,,
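
The merge above carries `Commodity` and `year` over from both frames, which is why they appear twice with suffixes. A possible variation, shown here as a sketch rather than the notebook's method, drops the repeated identifier from the right-hand frame and asks pandas to verify that each code appears once on each side. The two rows reuse the October values for codes 9817006000 and 9818000700 from the outputs above.

```python
import pandas as pd

dutiable = pd.DataFrame({
    "code": ["9817006000", "9818000700"],
    "year": ["2016", "2016"],
    "October": [0.0, 110375.0],
})
tariff = pd.DataFrame({
    "code": ["9817006000", "9818000700"],
    "year": ["2016", "2016"],
    "October": [0.0, 50.000453],
})

merged = pd.merge(
    dutiable,
    tariff.drop(columns=["year"]),    # avoid a duplicated year column
    on="code",
    how="left",
    suffixes=("_dutiable", "_tariff"),
    validate="one_to_one",            # raises MergeError if a code repeats
)
print(merged)
```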


### Clean `2015.csv` as well and create a dutiable 2015 dataframe

In [22]:
df_2015.head()

Unnamed: 0,Commodity,Time,Country,Calculated Duty ($US),Dutiable Value ($US),Customs Value (Cons) ($US),Customs Value (Gen) ($US),Unnamed: 7
0,"0101290090 Horses, Live, Nesoi (no)",February 2015,China,,,92000,92000,
1,"0101290090 Horses, Live, Nesoi (no)",August 2015,China,,,128000,128000,
2,"0101290090 Horses, Live, Nesoi (no)",December 2015,China,,,20000,20000,
3,"0106110000 Primates, Live (no)",January 2015,China,,,781000,781000,
4,"0106110000 Primates, Live (no)",February 2015,China,,,519000,519000,


In [23]:
df_2015['Customs Value (Gen) ($US)'] = df_2015['Customs Value (Gen) ($US)']\
    .apply(clean_dollar_values)\
    .fillna(0)\
    .astype(int)

df_2015['Customs Value (Cons) ($US)'] = df_2015['Customs  Value (Cons) ($US)']\
    .apply(clean_dollar_values)\
    .fillna(0)\
    .astype(int)

df_2015['Calculated Duty ($US)'] = df_2015['Calculated Duty ($US)']\
    .apply(clean_dollar_values)\
    .fillna(0)\
    .astype(int)

df_2015['Dutiable Value ($US)'] = df_2015['Dutiable Value ($US)']\
    .apply(clean_dollar_values)\
    .fillna(0)\
    .astype(int)
    
df_2015.dtypes

Commodity                       object
Time                            object
Country                         object
Calculated Duty ($US)            int64
Dutiable Value ($US)             int64
Customs  Value (Cons) ($US)     object
Customs Value (Gen) ($US)        int64
Unnamed: 7                     float64
Customs Value (Cons) ($US)       int64
dtype: object

In [24]:
# add a "code" column
df_2015['code']=df_2015['Commodity'].str.extract(r"(^\d\d\d\d\d\d\d\d\d\d) .*", expand=False)

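# add a "year" column
# NOTE: (\d\d) captures only the first two digits, so "year" is "20" here
# (see the output below); the 2016 cell uses (\d\d\d\d) for the full year.
df_2015['year']=df_2015['Time'].str.extract(r"(\d\d)", expand=False)

# add a "month" column
# NOTE: this captures the three-letter abbreviation ("Feb"), not the full
# month name; the abbreviations are mapped to full names further down.
df_2015['month']=df_2015['Time'].str.extract(r"([A-Z][a-z][a-z])", expand=False)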

df_2015.head()

Unnamed: 0,Commodity,Time,Country,Calculated Duty ($US),Dutiable Value ($US),Customs Value (Cons) ($US),Customs Value (Gen) ($US),Unnamed: 7,Customs Value (Cons) ($US).1,code,year,month
0,"0101290090 Horses, Live, Nesoi (no)",February 2015,China,0,0,92000,92000,,92000,101290090,20,Feb
1,"0101290090 Horses, Live, Nesoi (no)",August 2015,China,0,0,128000,128000,,128000,101290090,20,Aug
2,"0101290090 Horses, Live, Nesoi (no)",December 2015,China,0,0,20000,20000,,20000,101290090,20,Dec
3,"0106110000 Primates, Live (no)",January 2015,China,0,0,781000,781000,,781000,106110000,20,Jan
4,"0106110000 Primates, Live (no)",February 2015,China,0,0,519000,519000,,519000,106110000,20,Feb


In [25]:
# select columns from df_2015 for the new dutiable 2015 dataframe
df_2015_dutiable = df_2015[['Commodity','code','year','month','Dutiable Value ($US)']].copy()

# change column name
df_2015_dutiable.rename(columns = {'Dutiable Value ($US)':'dutiable'}, inplace = True)


# pivot so that each month becomes a column (one row per commodity)
df_2015_dutiable = df_2015_dutiable.pivot_table('dutiable', ['Commodity', 'code', 'year'], 'month')

df_2015_dutiable.reset_index( drop=False, inplace=True )
# NOTE: reindex() returns a new DataFrame and the result is not assigned here,
# so this call has no effect; the columns keep their alphabetical order below.
df_2015_dutiable.reindex(
    [ 
        'January', 'February', 'March', 
        'April', 'May', 'June', 'July', 
        'August', 'September', 'October' 
    ], axis=1
)

df_2015_dutiable.head()

month,Commodity,code,year,Apr,Aug,Dec,Feb,Jan,Jul,Jun,Mar,May,Nov,Oct,Sep
0,"0101290090 Horses, Live, Nesoi (no)",101290090,20,,0.0,0.0,0.0,,,,,,,,
1,"0106110000 Primates, Live (no)",106110000,20,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"0106199195 Mammals, Live, Nesoi (no)",106199195,20,,,,,0.0,,0.0,0.0,,,,
3,"0106200000 Reptiles (including Snakes And Turtles), Live (no)",106200000,20,,0.0,,,,0.0,,,,,,0.0
4,"0106900180 Animals, Live, Nesoi (x)",106900180,20,0.0,,,,,,,,,,,


In [26]:
# expand the month abbreviations and add the "_dutiable" suffix to the 2015 month columns
keep_same = ['Commodity', 'code', 'year']
month_dict = {
    "Apr": "April",
    "Aug": "August",
    "Dec": "December",
    "Feb": "February",
    "Jan": "January",
    "Jul": "July",
    "Jun": "June",
    "Mar": "March",
    "May": "May",
    "Nov": "November",
    "Oct": "October",
    "Sep": "September"
}
df_2015_dutiable.columns = ['{}{}'\
                                .format(
                                    c if c in keep_same else month_dict[c], 
                                    '' if c in keep_same else '_dutiable'
                                )
               for c in df_2015_dutiable.columns]
df_2015_dutiable.tail()

Unnamed: 0,Commodity,code,year,April_dutiable,August_dutiable,December_dutiable,February_dutiable,January_dutiable,July_dutiable,June_dutiable,March_dutiable,May_dutiable,November_dutiable,October_dutiable,September_dutiable
13787,"9817005000 Agricultural/horticultural Mach,equip&implements (x)",9817005000,20,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
13788,"9817006000 Pts Used In Articles In 8432,8433,8434& 8436 (x)",9817006000,20,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
13789,"9817007000 Animal, Game, Imported To Be Liberated In The U.S. (no)",9817007000,20,,,,,0.0,,,0.0,,,,
13790,"9818000700 Equip/pts Repaired In Foreign Cty On Vessel, Nesoi (x)",9818000700,20,7498597.0,2518076.0,155224.0,2500000.0,14756.0,4646667.0,11060.0,11096.0,6631013.0,2232456.0,257838.0,67920.0
13791,9999950000 Estimated Imports Of Low Valued Transactions (x),9999950000,20,255723929.0,286378975.0,272857709.0,219431227.0,252327033.0,297775157.0,276534619.0,277290487.0,273449300.0,266477846.0,266546240.0,281601617.0
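
The abbreviation-to-name mapping does not have to be typed by hand. As a sketch, assuming the default English locale, the standard `calendar` module can build the same dictionary; `rename_column` is a hypothetical helper that reproduces the renaming rule used above.

```python
import calendar

keep_same = ["Commodity", "code", "year"]

# Build {"Jan": "January", ...}; the names depend on the active locale,
# which is English unless the program changes it.
month_dict = {calendar.month_abbr[i]: calendar.month_name[i] for i in range(1, 13)}


def rename_column(name, suffix="_dutiable"):
    """Leave identifier columns alone; expand and suffix month columns."""
    if name in keep_same:
        return name
    return month_dict[name] + suffix


columns = ["Commodity", "code", "year", "Apr", "Aug", "Dec"]
renamed = [rename_column(column) for column in columns]
print(renamed)
```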


### Combine 2015 and 2016 data

In [27]:
df_2015_2016 = pd.merge(
    df_2015_dutiable,
    tariff_dutiable_2016,
    on="code",
    how="left",
    suffixes=["_2015", "_2016"]
)

df_2015_2016.head()

Unnamed: 0,Commodity,code,year,April_dutiable_2015,August_dutiable_2015,December_dutiable_2015,February_dutiable_2015,January_dutiable_2015,July_dutiable_2015,June_dutiable_2015,March_dutiable_2015,May_dutiable_2015,November_dutiable_2015,October_dutiable_2015,September_dutiable_2015,Commodity_dutiable,year_dutiable,April_dutiable_2016,August_dutiable_2016,December_dutiable_2016,February_dutiable_2016,January_dutiable_2016,July_dutiable_2016,June_dutiable_2016,March_dutiable_2016,May_dutiable_2016,November_dutiable_2016,October_dutiable_2016,September_dutiable_2016,Commodity_tariff,year_tariff,April_tariff,August_tariff,December_tariff,February_tariff,January_tariff,July_tariff,June_tariff,March_tariff,May_tariff,November_tariff,October_tariff,September_tariff
0,"0101290090 Horses, Live, Nesoi (no)",101290090,20,,0.0,0.0,0.0,,,,,,,,,"0101290090 Horses, Live, Nesoi (no)",2016.0,,,,,,0.0,,,,,,,"0101290090 Horses, Live, Nesoi (no)",2016.0,,,,,,0.0,,,,,,
1,"0106110000 Primates, Live (no)",106110000,20,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,"0106110000 Primates, Live (no)",2016.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,"0106110000 Primates, Live (no)",2016.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"0106199195 Mammals, Live, Nesoi (no)",106199195,20,,,,,0.0,,0.0,0.0,,,,,"0106199195 Mammals, Live, Nesoi (no)",2016.0,,,,,0.0,,,0.0,,0.0,,0.0,"0106199195 Mammals, Live, Nesoi (no)",2016.0,,,,,0.0,,,0.0,,0.0,,0.0
3,"0106200000 Reptiles (including Snakes And Turtles), Live (no)",106200000,20,,0.0,,,,0.0,,,,,,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,"0106900180 Animals, Live, Nesoi (x)",106900180,20,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
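
Rows 3 and 4 above have no 2016 values because a left merge keeps 2015 codes that do not occur in the 2016 data. To count such codes, `pd.merge` accepts `indicator=True`, which adds a `_merge` column. The sketch below uses two codes from the output above with made-up October values.

```python
import pandas as pd

# Made-up values; 0106200000 is present only on the 2015 side.
left_2015 = pd.DataFrame({
    "code": ["0101290090", "0106200000"],
    "October_dutiable": [5.0, 7.0],
})
right_2016 = pd.DataFrame({
    "code": ["0101290090"],
    "October_dutiable": [6.0],
})

merged = pd.merge(
    left_2015,
    right_2016,
    on="code",
    how="left",
    suffixes=("_2015", "_2016"),
    indicator=True,   # adds a "_merge" column: "both" or "left_only"
)

unmatched = int((merged["_merge"] == "left_only").sum())
print(merged)
print("codes without 2016 data:", unmatched)
```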


# 5. Comparison and Match & 6. Find Tariff Changes

### The estimated tariff rate doesn't change and the dutiable value increases

In [28]:
# how many HTS codes have a tariff value for October 2016
df_2015_2016["October_tariff"].count()

9913

In [29]:
# In 2018, the tariff didn't change in January and August.
# The $50 billion worth of tariffs on Chinese goods went into effect in August.
# We are trying to capture the full effect of the tariff,
# so we're looking at October, which would capture a full month's worth of tariffed goods.

# Let's apply the same comparison to 2015 and 2016.

df_2015_2016["oct_jan_tariff_change"] = df_2015_2016["October_tariff"] - df_2015_2016["January_tariff"]




In [30]:
df_2015_2016["oct_jan_tariff_change"].describe()

count    8951.000000
mean       -0.035572
std         0.680888
min       -17.045555
25%        -0.000238
50%         0.000000
75%         0.000140
max        33.043478
Name: oct_jan_tariff_change, dtype: float64

In [31]:
# compare 2015 and 2016's dutiable value
df_2015_2016["oct_change_in_dutiable"] = (df_2015_2016["October_dutiable_2016"] - \
                                          df_2015_2016["October_dutiable_2015"]) \
                                          / df_2015_2016["October_dutiable_2015"] * 100

In [32]:
df_2015_2016["oct_change_in_dutiable"].describe()

count    5702.000000
mean             inf
std              NaN
min      -100.000000
25%       -35.579650
50%         0.563595
75%        56.145792
max              inf
Name: oct_change_in_dutiable, dtype: float64
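
The `inf` mean and maximum above come from rows where the 2015 value is 0 and the 2016 value is positive: dividing by zero gives `inf`, and a single `inf` makes the mean infinite. A minimal sketch of one way to handle this, using made-up values, is to replace infinite values with NaN before summarizing. This is an illustration, not a step the notebook performs.

```python
import pandas as pd

# Made-up October values for four goods.
before = pd.Series([100.0, 0.0, 0.0, 50.0])
after = pd.Series([150.0, 20.0, 0.0, 0.0])

pct_change = (after - before) / before * 100
# Row 1 is a positive value over 0 (inf); row 2 is 0 over 0 (NaN).

finite = pct_change.replace([float("inf"), float("-inf")], float("nan"))
print(finite.describe())
```

Whether such rows should be dropped or reported separately is a judgment call, since they represent goods with no dutiable value in the earlier period.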

In [33]:
df_2015_2016["change_from_right_before_in_dutiable"] = (df_2015_2016["October_dutiable_2016"] - \
                                          df_2015_2016["May_dutiable_2016"]) \
                                        / df_2015_2016["May_dutiable_2016"] * 100

In [34]:
df_2015_2016["change_from_right_before_in_dutiable"].describe()

count    5722.000000
mean             inf
std              NaN
min      -100.000000
25%       -41.001553
50%        -2.985488
75%        63.464462
max              inf
Name: change_from_right_before_in_dutiable, dtype: float64

In [35]:
tariff_no_change_increased_dutiable = df_2015_2016[
    (df_2015_2016["oct_jan_tariff_change"] < 0.5) &
    (df_2015_2016["oct_jan_tariff_change"] > -0.5) &
    (df_2015_2016["oct_change_in_dutiable"] > 0)
].copy()

In [36]:
len(tariff_no_change_increased_dutiable)

2570

### The estimated tariff rate doesn't increase and the dutiable value increases

In [37]:
# There are 5,409 goods for us to analyze (rows where both change columns are non-null).
# ~ means "not"

len(df_2015_2016[
        (~df_2015_2016["oct_jan_tariff_change"].isnull()) &
        (~df_2015_2016["oct_change_in_dutiable"].isnull())
])

5409

In [38]:
# "no increase" covers both a decrease and no change (any change below +0.5 points)

tariff_no_increase_increased_dutiable = df_2015_2016[
    (df_2015_2016["oct_jan_tariff_change"] < 0.5) &
    (df_2015_2016["oct_change_in_dutiable"] > 0)
].copy()

len(tariff_no_increase_increased_dutiable)

2661
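
The filters in this section split the October minus January tariff change into three bands around a 0.5-point threshold. The sketch below restates that rule as a plain function; `classify_tariff_change` is a hypothetical name, and the sample values are tariff changes that appear in the outputs of this notebook.

```python
def classify_tariff_change(change, threshold=0.5):
    """Bucket a tariff change (in percentage points) using the notebook's cutoffs."""
    if change >= threshold:
        return "increase"
    if change > -threshold:
        return "no change"
    return "decrease"


# Sample changes taken from the output tables in this notebook.
labels = [classify_tariff_change(value) for value in (0.0004, -10.381721, 1.438846)]
print(labels)
```

Under this rule, "no increase" is the union of the "no change" and "decrease" buckets.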

In [39]:
# save to csv
#tariff_no_increase_increased_dutiable.to_csv('tariff_no_increase_increased_dutiable.csv')

In [40]:
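# NOTE: the two change columns are October-based, while the value columns shown here are for January and August
tariff_no_increase_increased_dutiable[[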
        "Commodity", "code", 
        "August_dutiable_2016", "August_dutiable_2015", 
        "January_tariff", "August_tariff",
        "oct_jan_tariff_change", "oct_change_in_dutiable"
    ]].head()

Unnamed: 0,Commodity,code,August_dutiable_2016,August_dutiable_2015,January_tariff,August_tariff,oct_jan_tariff_change,oct_change_in_dutiable
166,"0306142000 Crabmeat, Frozen (kg)",306142000,326262.0,335616.0,7.499547,7.500107,0.0004,1704.40173
216,"0408190000 Egg Yolks, Frsh, Frzn, Cooked By Water, Molded Etc (kg)",408190000,467994.0,153014.0,11.116762,0.99638,-10.381721,251.79798
222,"0502100000 Pigs, Hogs, Boars Bristles & Hair & Waste Thereof (kg)",502100000,493278.0,251104.0,0.030159,0.028179,-0.001278,65.150318
243,"0511994070 Animal Products Nesoi, Dead Animals Ch 1, Inedible (kg)",511994070,86284.0,45294.0,1.100258,1.099856,-0.000599,0.006755
248,"0602100000 Unrooted Cuttings And Slips Of Plants, Nesoi (no)",602100000,341906.0,140871.0,4.799654,4.800442,0.001455,49.579598


### The estimated tariff rate increases and the dutiable value decreases

In [41]:
tariff_up_decreased_dutiable = df_2015_2016[
    (df_2015_2016["oct_jan_tariff_change"] >= 0.5) &
    (df_2015_2016["oct_change_in_dutiable"] < 0)
].copy()

len(tariff_up_decreased_dutiable)

54

In [42]:
tariff_up_decreased_dutiable[[
        "Commodity", "code", 
        "August_dutiable_2015", "August_dutiable_2016", 
        "January_tariff", "August_tariff",
        "oct_jan_tariff_change", "oct_change_in_dutiable"
    ]]

Unnamed: 0,Commodity,code,August_dutiable_2015,August_dutiable_2016,January_tariff,August_tariff,oct_jan_tariff_change,oct_change_in_dutiable
279,"0709510100 Mushrooms, Of The Genus Agaricus, Fresh Or Chilled (kg)",709510100,192948.0,225574.0,24.213872,25.534858,1.438846,-21.222371
299,"0710802000 Mushrooms Raw/cooked By Steam/boiling In Water, Fz (kg)",710802000,38306.0,189927.0,10.190226,10.289743,1.111378,-20.481336
366,"0713394160 White Beans Ex Seed, 9/1-4/30, Dried Shelled,nesoi (kg)",713394160,,,0.480769,,0.62368,-8.664324
510,"1006309055 Rice, Long Grain, Semi- Or Wholly Milled, Nesoi (kg)",1006309055,110153.0,27995.0,0.597199,0.710841,0.554188,-44.999897
863,2003908010 Straw Mushroom Prep/pres Ex By Vinegar/acetic Acid (kg),2003908010,404437.0,64285.0,9.920635,12.397916,0.63744,-5.349341
988,"2009898039 Juice Of Single Vegetable Nesoi, Unfermentd, Nesoi (l)",2009898039,2644785.0,607149.0,0.208562,0.317385,0.585129,-87.555727
1402,2835240000 Potassium Phosphate (kg),2835240000,590202.0,576727.0,2.541811,3.1006,0.558096,-5.852358
1656,"2909491000 Aromatic Ethers, Etc Of Prod In U.S. Note3 Sec6 (kg)",2909491000,,66866.0,4.223555,5.500553,1.276671,-90.333979
1788,2916313000 Benzoic Acid Estrs Of Prod In U.S. Note 3 Sect 6 (kg),2916313000,42919.0,74490.0,3.518582,3.270583,0.54241,-20.136788
1956,2922197000 Other Arom Amino-alc;etc(ex Prod In U.S.NT3 Sec 6) (kg),2922197000,0.0,0.0,0.0,0.0,5.099201,-88.5154


### Check overlap between datasets:

The first four digits of an HTS code are called the heading. A heading is a subdivision of a chapter, and headings vary in scope and breadth depending on the chapter. The first two digits of the code identify the chapter; there are 96 chapters.
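
As a small illustration of that structure, the sketch below splits a code into its chapter and heading. `split_hts_code` is a hypothetical helper; it also strips the dots used in the dotted HTS notation (for example `0101.21.00.10`).

```python
def split_hts_code(code):
    """Return the chapter (2 digits) and heading (4 digits) of an HTS code string."""
    digits = code.replace(".", "")
    return {"chapter": digits[:2], "heading": digits[:4]}


# A 10-digit code from the trade data and a dotted HTS number.
mushrooms = split_hts_code("0709510100")
horses = split_hts_code("0101.21.00.10")
print(mushrooms, horses)
```

The code has to stay a string for this to work, otherwise leading zeros such as the one in chapter `07` are lost.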

In [43]:
tariff_up_decreased_dutiable.dtypes

Commodity                                object
code                                     object
year                                     object
April_dutiable_2015                     float64
August_dutiable_2015                    float64
December_dutiable_2015                  float64
February_dutiable_2015                  float64
January_dutiable_2015                   float64
July_dutiable_2015                      float64
June_dutiable_2015                      float64
March_dutiable_2015                     float64
May_dutiable_2015                       float64
November_dutiable_2015                  float64
October_dutiable_2015                   float64
September_dutiable_2015                 float64
Commodity_dutiable                       object
year_dutiable                            object
April_dutiable_2016                     float64
August_dutiable_2016                    float64
December_dutiable_2016                  float64
February_dutiable_2016                  

In [44]:
# Tariff increase and dutiable value decrease
tariff_up_decreased_dutiable["class"] = tariff_up_decreased_dutiable["code"].apply(lambda x: x[:4])

# "No tariff increase" means the tariff either did not change or decreased
tariff_no_increase_increased_dutiable["class"] = tariff_no_increase_increased_dutiable["code"].apply(lambda x: x[:4])

In [45]:
tariff_up_decreased_dutiable["class"].value_counts().to_frame().reset_index().head()

Unnamed: 0,index,class
0,9102,5
1,6402,4
2,9109,3
3,2932,3
4,3913,2


In [46]:
tariff_no_increase_increased_dutiable["class"].value_counts().to_frame().reset_index().head()

Unnamed: 0,index,class
0,6404,46
1,6302,45
2,6110,40
3,6204,31
4,6403,31


In [47]:
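# NOTE: merging on "index" relies on value_counts().to_frame().reset_index()
# naming its columns "index" and "class", as in the outputs above (pandas < 2.0).
find_classes_with_changes = pd.merge(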
    tariff_up_decreased_dutiable["class"].value_counts().to_frame().reset_index(),
    tariff_no_increase_increased_dutiable["class"].value_counts().to_frame().reset_index(),
    on="index",
    how="outer",
    suffixes=["_up_decreased", "_down_increased"]
).fillna(0)

find_classes_with_changes["class_up_decreased"] = find_classes_with_changes["class_up_decreased"].astype(int)
find_classes_with_changes["class_down_increased"] = find_classes_with_changes["class_down_increased"].astype(int)

In [48]:
find_classes_with_changes.head()

Unnamed: 0,index,class_up_decreased,class_down_increased
0,9102,5,17
1,6402,4,21
2,9109,3,1
3,2932,3,3
4,3913,2,1


### What do these codes represent?

In [49]:
# Load the full list of HTS codes, downloaded from the USITC HTS website:
# https://hts.usitc.gov/export

all_hts_code = pd.read_csv('hts_all_code.csv')
all_hts_code.head()

Unnamed: 0,HTS Number,Indent,Description,Unit of Quantity,General Rate of Duty,Special Rate of Duty,Column 2 Rate of Duty,Quota Quantity,Additional Duties
0,0101,0,"Live horses, asses, mules and hinnies:",,,,,,
1,,1,Horses:,,,,,,
2,0101.21.00,2,Purebred breeding animals,,Free,,Free,,
3,0101.21.00.10,3,Males,"[""No.""]",,,,,
4,0101.21.00.20,3,Females,"[""No.""]",,,,,


In [50]:
# Rename 'HTS Number' to 'index' so it matches the merge key in find_classes_with_changes
all_hts_code = all_hts_code.rename(columns={'HTS Number': 'index'})
all_hts_code.head()

Unnamed: 0,index,Indent,Description,Unit of Quantity,General Rate of Duty,Special Rate of Duty,Column 2 Rate of Duty,Quota Quantity,Additional Duties
0,0101,0,"Live horses, asses, mules and hinnies:",,,,,,
1,,1,Horses:,,,,,,
2,0101.21.00,2,Purebred breeding animals,,Free,,Free,,
3,0101.21.00.10,3,Males,"[""No.""]",,,,,
4,0101.21.00.20,3,Females,"[""No.""]",,,,,


In [51]:
all_hts_code.keys()

Index(['index', 'Indent', 'Description', 'Unit of Quantity',
       'General Rate of Duty', 'Special Rate of Duty', 'Column 2 Rate of Duty',
       'Quota Quantity', 'Additional Duties'],
      dtype='object')

In [52]:
find_classes_with_changes_with_description = pd.merge(
    find_classes_with_changes,
    all_hts_code,
    on="index",
    how="left")
find_classes_with_changes_with_description.head()

Unnamed: 0,index,class_up_decreased,class_down_increased,Indent,Description,Unit of Quantity,General Rate of Duty,Special Rate of Duty,Column 2 Rate of Duty,Quota Quantity,Additional Duties
0,9102,5,17,0.0,"Wrist watches, pocket watches and other watches, including stop watches, other than those of heading 9101:",,,,,,
1,6402,4,21,0.0,Other footwear with outer soles and uppers of rubber or plastics:,,,,,,
2,9109,3,1,0.0,"Clock movements, complete and assembled:",,,,,,
3,2932,3,3,0.0,Heterocyclic compounds with oxygen hetero-atom(s) only:,,,,,,
4,3913,2,1,0.0,"Natural polymers (for example, alginic acid) and modified natural polymers (for example, hardened proteins, chemical derivatives of natural rubber), not elsewhere specified or included, in primary forms:",,,,,,
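
The left merge works because each 4-digit class matches the heading row of the HTS table, which has `Indent` 0. A slightly more defensive variant, sketched below on a tiny made-up copy of the tables, keeps only those heading rows and the `Description` column before merging, and asks pandas to check that each heading appears only once. The assumption that all 4-digit headings sit at `Indent == 0` comes from the rows shown above and is not verified for the whole file.

```python
import pandas as pd

# Toy miniature of the HTS export, after renaming 'HTS Number' to 'index'
hts = pd.DataFrame({
    "index": ["0101", None, "0101.21.00", "9102"],
    "Indent": [0, 1, 2, 0],
    "Description": [
        "Live horses, asses, mules and hinnies:",
        "Horses:",
        "Purebred breeding animals",
        "Wrist watches, pocket watches and other watches, including stop watches, other than those of heading 9101:",
    ],
})

# Toy class counts (the 0101 row is made up for illustration)
counts = pd.DataFrame({
    "index": ["9102", "0101"],
    "class_up_decreased": [5, 1],
    "class_down_increased": [17, 0],
})

# Keep only heading rows and the one column we need
headings = hts.loc[hts["Indent"] == 0, ["index", "Description"]]

# validate raises MergeError if a heading appears more than once on the right
described = pd.merge(counts, headings, on="index", how="left", validate="many_to_one")
print(described)
```

This also removes the need to drop the unused rate and quota columns afterwards.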


In [53]:
# Keep only the class, the two counts and the description
find_classes_with_changes_with_description.drop(
    ['Indent', 'Unit of Quantity', 'General Rate of Duty', 'Special Rate of Duty',
     'Column 2 Rate of Duty', 'Quota Quantity', 'Additional Duties'],
    axis=1, inplace=True)
find_classes_with_changes_with_description


Unnamed: 0,index,class_up_decreased,class_down_increased,Description
0,9102,5,17,"Wrist watches, pocket watches and other watches, including stop watches, other than those of heading 9101:"
1,6402,4,21,Other footwear with outer soles and uppers of rubber or plastics:
2,9109,3,1,"Clock movements, complete and assembled:"
3,2932,3,3,Heterocyclic compounds with oxygen hetero-atom(s) only:
4,3913,2,1,"Natural polymers (for example, alginic acid) and modified natural polymers (for example, hardened proteins, chemical derivatives of natural rubber), not elsewhere specified or included, in primary forms:"
5,2922,2,8,Oxygen-function amino-compounds:
6,6108,2,17,"Women's or girls' slips, petticoats, briefs, panties, night dresses, pajamas, negligees, bathrobes, dressing gowns and similar articles, knitted or crocheted:"
7,6404,2,46,"Footwear with outer soles of rubber, plastics, leather or composition leather and uppers of textile materials:"
8,6204,1,31,"Women's or girls' suits, ensembles, suit-type jackets, blazers, dresses, skirts, divided skirts, trousers, bib and brace overalls, breeches and shorts (other than swimwear):"
9,8473,1,1,"Parts and accessories (other than covers, carrying cases and the like) suitable for use solely or principally with machines of headings 8470 to 8472:"


# Final Result:

In [54]:
# Classes with many HTS codes in both groups are worth exploring further.
# Each count is the number of HTS codes in that class.

# To find categories with substantial changes, I keep classes with at least 4 codes in both groups

find_classes_with_changes_with_description[
    (find_classes_with_changes_with_description["class_up_decreased"] >= 4) &
    (find_classes_with_changes_with_description["class_down_increased"] >= 4)
]

Unnamed: 0,index,class_up_decreased,class_down_increased,Description
0,9102,5,17,"Wrist watches, pocket watches and other watches, including stop watches, other than those of heading 9101:"
1,6402,4,21,Other footwear with outer soles and uppers of rubber or plastics:
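
The cutoff of 4 is a judgment call, so it can help to see how the result moves with the threshold. Below is a small sketch that wraps the same filter in a function. `classes_on_both_sides` and `min_codes` are hypothetical names, and the toy table copies the first five rows of the counts shown earlier, with descriptions omitted.

```python
import pandas as pd

def classes_on_both_sides(df: pd.DataFrame, min_codes: int = 4) -> pd.DataFrame:
    """Return classes with at least `min_codes` HTS codes in both groups."""
    mask = (
        (df["class_up_decreased"] >= min_codes)
        & (df["class_down_increased"] >= min_codes)
    )
    return df[mask]

# First five rows of the class counts, descriptions omitted
table = pd.DataFrame({
    "index": ["9102", "6402", "9109", "2932", "3913"],
    "class_up_decreased": [5, 4, 3, 3, 2],
    "class_down_increased": [17, 21, 1, 3, 1],
})

print(classes_on_both_sides(table)["index"].tolist())     # ['9102', '6402']
print(classes_on_both_sides(table, 3)["index"].tolist())  # ['9102', '6402', '2932']
```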


---
