## **Problem Statement**

Develop a recommendation engine that helps firms across the world diversify their imports and exports.

**Data Sources**

1. WTO - Bilateral trade data for the past 17 years
2. CEPII - Distance and gravity data

**Approach**

1. Run t-SNE to identify clusters within the high-dimensional data
2. Run a clustering algorithm (DBSCAN) to identify clusters - currently running for one product
3. Build a neural network to uncover relationships between the features and generate recommendations
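
The first two steps can be sketched end to end on synthetic data. This is a minimal illustration, not the notebook's actual pipeline: the three Gaussian blobs stand in for the trade features, and the `perplexity`, `eps`, and `min_samples` values are assumptions chosen to suit this toy data.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the trade features: three well-separated
# blobs of 50 points each in four dimensions.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0], [10, 10, 0, 0], [0, 10, 10, 10]], dtype=float)
features = np.vstack([c + rng.normal(scale=0.5, size=(50, 4)) for c in centers])

# Preliminary: put every feature on a comparable scale.
scaled = StandardScaler().fit_transform(features)

# Step 1: t-SNE, used here only to inspect the structure in two dimensions.
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(scaled)

# Step 2: DBSCAN on the scaled features; the label -1 marks noise points.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(scaled)
n_clusters = len(set(labels) - {-1})
print(embedding.shape, n_clusters)
```

Clustering runs on the scaled features rather than on the t-SNE embedding, matching the notebook; t-SNE distorts distances, so its output is better suited to plotting than to density estimates.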

**Features**
1. GDP
2. Distance between countries
3. Trading routes - currently not incorporated
4. Output capacity of suppliers
5. Number of trading partners
6. Products
7. Year
8. Gravity between countries - currently not incorporated
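
The gravity feature is not yet built. One possible definition, borrowed from the textbook gravity model of trade, is the product of the two economies' GDPs divided by a power of the distance between them. The sketch below assumes that definition; the function name `gravity_score`, the exponent, and the sample figures are illustrative and are not taken from the CEPII data.

```python
import numpy as np
import pandas as pd


def gravity_score(gdp_origin, gdp_destination, distance_km, distance_exponent=1.0):
    """Textbook gravity proxy: GDP_i * GDP_j / distance ** exponent (an assumed definition)."""
    distance = np.asarray(distance_km, dtype=float)
    # A non-positive distance has no meaningful score, so map it to NaN.
    safe_distance = np.where(distance > 0, distance, np.nan)
    gdp_product = np.asarray(gdp_origin, dtype=float) * np.asarray(gdp_destination, dtype=float)
    return gdp_product / safe_distance ** distance_exponent


# Hypothetical country pairs; the last row has a missing (zero) distance.
pairs = pd.DataFrame({
    "gdp_origin": [2.0e12, 2.0e12, 3.0e11],
    "gdp_destination": [1.0e12, 5.0e11, 1.0e12],
    "distance_km": [1000.0, 8000.0, 0.0],
})
pairs["gravity"] = gravity_score(
    pairs["gdp_origin"], pairs["gdp_destination"], pairs["distance_km"]
)
# The raw score spans many orders of magnitude, so a log transform is
# usually applied before scaling.
pairs["log_gravity"] = np.log(pairs["gravity"])
print(pairs)
```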

In [0]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler
from sklearn.manifold import TSNE
import seaborn as sns
%matplotlib inline

In [0]:
from google.colab import drive
drive.mount('/datathon')

In [0]:
data = pd.read_csv('/Data.csv',encoding='iso-8859-1')

Analyze a particular product group: Apparel in this case.

In [0]:
# Flag rows whose product/sector code falls in the 50-67 range (the apparel group analysed here).
data['Product Group'] = (data['Product/Sector Code'] >= 50) & (data['Product/Sector Code'] <= 67)

In [0]:
le = preprocessing.LabelEncoder()
data['Partner Economy Label'] = le.fit_transform(data['Partner Economy'])
data['Product Sector Label'] = le.fit_transform(data['Product/Sector'])
data['Product Group Label'] = le.fit_transform(data['Product Group'])
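
Refitting the same `LabelEncoder` for each column works for encoding, but after the last `fit_transform` the encoder only remembers the classes of the final column, so earlier integer labels can no longer be mapped back to names. A minimal sketch of one way around this keeps one encoder per column. It assumes a small hypothetical frame that reuses the notebook's column names; the `encoders` dictionary is an illustrative helper.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample rows; the real data comes from Data.csv.
sample = pd.DataFrame({
    "Partner Economy": ["India", "Bangladesh", "India", "Viet Nam"],
    "Product/Sector": ["Clothing", "Textiles", "Clothing", "Clothing"],
})

# Map each source column to the label column it produces.
label_columns = {
    "Partner Economy": "Partner Economy Label",
    "Product/Sector": "Product Sector Label",
}

# One fitted encoder per column, kept so labels can be decoded later.
encoders = {}
for source, target in label_columns.items():
    encoders[source] = LabelEncoder()
    sample[target] = encoders[source].fit_transform(sample[source])

# Recover the economy names from their integer labels.
decoded = encoders["Partner Economy"].inverse_transform(sample["Partner Economy Label"])
print(list(decoded))
```

With the encoders kept around, a hard-coded label such as `24` can be looked up with `encoders["Partner Economy"].inverse_transform([24])` instead of being remembered by hand.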

In [0]:
# Keep only the rows flagged as belonging to the selected product group.
data_filter = data[data['Product Group Label'] == 1]
#data_filter = data_filter[data_filter['Value'] < 100000000]
data_filter = data_filter.fillna(0)

Features selected for clustering: output of the origin firm, number of trading partners, distance between origin and destination country, GDP of the countries, whether they share a colonial past, and presence of HSBC.

In [0]:
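# .copy() gives an independent frame, so later column assignments do not trigger SettingWithCopyWarning.
final_df = data_filter[['Partner Economy Label','Product Sector Label','Value','Origin output','#trading partners','distance between origin and destination','colonised(1/0)','gdp of source','hsbc presence in destination']].copy()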

In [0]:
final_df['Value']=final_df['Value'].astype(float)
final_df['gdp of source']=final_df['gdp of source'].astype(float)

In [0]:
X = final_df[['Origin output','gdp of source','Value','#trading partners','distance between origin and destination','hsbc presence in destination']]

#X['Scaled Origin output'] = StandardScaler().fit_transform(X[['Origin output']])
#X['Scaled gdp of source'] = StandardScaler().fit_transform(X[['gdp of source']])
#X['Scaled Value'] = StandardScaler().fit_transform(X[['Value']])

#X = X[['Scaled Origin output','Scaled gdp of source','Scaled Value','#trading partners','distance between origin and destination','hsbc presence in destination']]

In [0]:
final_df = data[['Value','Supplier Output','#Trading Partners - Supplier','Distance','Importing Economy GDP','Supplier Economy GDP','FTA','import_duty']]

final_df=final_df.fillna(0)
final_df['Scaled Origin Output'] = StandardScaler().fit_transform(final_df[['Supplier Output']])
final_df['scaled distance between origin and destination'] = StandardScaler().fit_transform(final_df[['Distance']])
final_df['Scaled Reporting Economy GDP'] = StandardScaler().fit_transform(final_df[['Importing Economy GDP']])
final_df['Scaled Partner Economy GDP'] = StandardScaler().fit_transform(final_df[['Supplier Economy GDP']])
final_df['Scaled Value'] = StandardScaler().fit_transform(final_df[['Value']])
#final_df['Scaled #trading partners'] = StandardScaler().fit_transform(final_df[['#Trading Partners - Supplier']])
#final_df['Scaled import_duty'] = StandardScaler().fit_transform(final_df[['import_duty']])
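
The scaling above fits a fresh `StandardScaler` for every column. Because `StandardScaler` standardizes each column independently, the same result can be obtained with a single fit over all columns, which also leaves one fitted scaler that can be reused on new data. The sketch below assumes a small hypothetical frame with a subset of the notebook's column names and a simplified `Scaled <name>` naming scheme.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical values; the real columns come from Data.csv.
sample = pd.DataFrame({
    "Supplier Output": [100.0, 250.0, 400.0, 50.0],
    "Distance": [1200.0, 8000.0, 300.0, 5600.0],
    "Value": [1.0e6, 3.5e6, 2.0e5, 9.0e5],
})

columns = ["Supplier Output", "Distance", "Value"]
scaler = StandardScaler()

# One fit_transform call standardizes every listed column separately.
scaled = pd.DataFrame(
    scaler.fit_transform(sample[columns]),
    columns=["Scaled " + name for name in columns],
    index=sample.index,
)
sample = sample.join(scaled)
print(sample.filter(like="Scaled").round(3))
```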


In [0]:
X = final_df[['Scaled Origin Output','scaled distance between origin and destination','Scaled Reporting Economy GDP','Scaled Partner Economy GDP']]
              #,'Scaled #trading partners']]
              #,'FTA','Scaled import_duty']]

In [0]:
X_embedded = TSNE(n_components=2).fit_transform(X)
df_tsne = pd.DataFrame()
df_tsne['oned'] = X_embedded[:,0]
df_tsne['twod'] = X_embedded[:,1]

In [0]:
plt.figure(figsize=(16,10))
sns.scatterplot(x=df_tsne['oned'],y=df_tsne['twod'],data=df_tsne,alpha=0.3)

In [0]:
cluster = DBSCAN(eps=0.1,min_samples=3).fit_predict(X)
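
The `eps=0.1` value above is a fixed guess. A common heuristic for choosing `eps` is to sort every point's distance to its `min_samples`-th nearest neighbour and look for the elbow of that curve. The sketch below applies the idea to synthetic two-dimensional data and, as a crude assumption in place of reading the elbow off a plot, takes the 95th percentile of those distances.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

# Hypothetical data: two dense blobs of 100 points each.
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(100, 2)),
    rng.normal(loc=5.0, scale=0.3, size=(100, 2)),
])

min_samples = 3

# Distance from each point to its min_samples-th nearest neighbour.
# Querying the training points returns each point as its own first
# neighbour, matching DBSCAN's convention of counting the point itself.
neighbours = NearestNeighbors(n_neighbors=min_samples).fit(points)
distances, _ = neighbours.kneighbors(points)
k_distances = np.sort(distances[:, -1])

# Stand-in for the elbow: a high percentile of the k-distance curve.
eps = float(np.percentile(k_distances, 95))

labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
n_clusters = len(set(labels) - {-1})
print(round(eps, 3), n_clusters)
```

Plotting `k_distances` with `plt.plot(k_distances)` shows the curve itself, which is the more usual way to pick the value.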

In [0]:
plt.subplot(3,1,1)
plt.scatter(X['scaled distance between origin and destination'],X['Scaled Origin Output'],c=cluster,cmap="plasma")

'''
plt.subplot(3,1,2)
plt.scatter(final_df['Origin output'],final_df['Product Sector Label'],c=cluster,cmap="plasma")

plt.subplot(3,1,3)
plt.scatter(final_df['#trading partners'],final_df['Product Sector Label'],c=cluster,cmap="plasma")
'''

plt.show()

In [0]:
final_df['color'] = final_df['Partner Economy Label'].apply(lambda x : 'Green' if x == 24 else 'Red')

In [0]:
final_us = final_df[final_df['Partner Economy Label']==24]
final_ban = final_df[final_df['Partner Economy Label']==12]


In [0]:
plt.subplot(2,1,1)
plt.scatter(final_us['Origin output'],final_us['#trading partners'],c=final_us['color'],cmap="plasma")

plt.subplot(2,1,2)
plt.scatter(final_ban['Origin output'],final_ban['#trading partners'],c=final_ban['color'],cmap="plasma")


plt.show()

In [0]:
final_df[final_df['Partner Economy Label'] == 26]

In [0]:
data_filter.reset_index(drop=True,inplace=True)


In [0]:
data_filter['cluster'] = pd.Series(cluster)
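
`pd.Series(cluster)` carries a fresh `0..n-1` index, and pandas aligns on index labels when a Series is assigned as a column, which is why the frame's index is reset before the assignment. The labels are only meaningful if they were computed from the same rows, in the same order, as the frame receiving them. A small sketch with hypothetical data shows what happens when the index has not been reset, and how assigning the raw array after a length check avoids the problem.

```python
import numpy as np
import pandas as pd

# Hypothetical filtered frame whose index still has gaps from filtering.
filtered = pd.DataFrame({"Value": [10.0, 20.0, 30.0]}, index=[4, 7, 9])
labels = np.array([0, 1, -1])

# Series assignment aligns on index: 0, 1, 2 never match 4, 7, 9.
misaligned = filtered.copy()
misaligned["cluster"] = pd.Series(labels)

# Array assignment is positional, so only the length has to match.
aligned = filtered.copy()
assert len(labels) == len(aligned)
aligned["cluster"] = labels

print(misaligned)
print(aligned)
```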

In [0]:
data_filter.to_excel('/content/output.xlsx')