# Hey everybody!
This is my first kernel, and today we will take a first look at animal trade in 2016. I plan to explore this dataset further and try new things as I get better. Please comment if you want me to elaborate more on the code or anything else. So let's just dive right in :)

First we want to make sure we have all the tools we need. I like to use **pathlib**, even when I'm just loading one file, because I don't have to worry about escapes and encodings later on, no matter what I want to do.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib
import numpy as np
import seaborn as sns
from pathlib import Path
plt.style.use('seaborn-colorblind')  # on matplotlib >= 3.6 this style is named 'seaborn-v0_8-colorblind'

If you didn't know: the command below lists the style sheets you can pass to ``plt.style.use()``. A style sets the color palette, background and grid for all your figures.

In [None]:
print(plt.style.available) # it's nice to see what's available :)
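
Style names differ between matplotlib versions (newer releases renamed the seaborn styles with a `seaborn-v0_8-` prefix), so it can help to check the list before applying a style. A minimal sketch; `pick_style` is a hypothetical helper name, not part of matplotlib:

```python
import matplotlib.pyplot as plt

def pick_style(candidates, available=None):
    """Return the first candidate style name that is available, else 'default'."""
    available = plt.style.available if available is None else available
    for name in candidates:
        if name in available:
            return name
    return "default"

# Try the old name first, then the renamed one used by newer matplotlib.
style = pick_style(["seaborn-colorblind", "seaborn-v0_8-colorblind"])
plt.style.use(style)
print(style)
```

Falling back to `'default'` keeps the notebook running even if neither name exists in the installed version.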

* Next we are going to **load all the data** into a pandas DataFrame

In [None]:
path = Path(r'D:\Coding/comptab_2018-01-29 16_00_comma_separated.csv')
# path = Path('../input/comptab_2018-01-29 16_00_comma_separated.csv') #  kaggle path
data = pd.read_csv(path)
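
Since the cell above switches between a local path and the Kaggle path by commenting lines in and out, here is a small sketch of letting pathlib pick whichever file exists. `first_existing` is a hypothetical helper; the two paths are the ones from the cell above:

```python
from pathlib import Path

def first_existing(candidates):
    """Return the first candidate path that exists on disk, or None."""
    for candidate in candidates:
        path = Path(candidate)
        if path.exists():
            return path
    return None

candidates = [
    Path(r'D:\Coding/comptab_2018-01-29 16_00_comma_separated.csv'),  # local copy
    Path('../input/comptab_2018-01-29 16_00_comma_separated.csv'),    # kaggle path
]
path = first_existing(candidates)
print(path)  # None if neither file is present on this machine
```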

Take a quick **look at the data**. The first 5 rows will be enough, just to get a feel for what we are looking at.

In [None]:
data.head(5)

Okay, so what can we see? There are different columns for animal/plant taxonomy, shipping info, data about the parts that are shipped and the purpose why they were shipped. This seems quite interesting! Let us first start by looking at what animals/plants are actually shipped and **which of these are dominating the dataset**.

Most of my figures will have normalized values, because I think they are easier to grasp and look at. I also like to use bigger figures in general.

**IMPORTANT!**  
**It needs to be clear that most of the following plots are not weighted against the actual traded amounts. Every trade has the same "value", regardless of the trade volume. This may lead to skewed results.**

So, let's have a look at how the traded quantities are distributed. I have omitted the outliers with ``showfliers=False``. It seems **~50% of all trades contain no more than 80 items of that good, and a further ~25% contain up to 200 items per trade**.

In [None]:
plt.figure(figsize=(20,10))
plt.subplot(121)
data.boxplot(column='Importer reported quantity', showfliers=False)
plt.title('Amount of imported goods per trade', fontsize=20)

plt.subplot(122)
data.boxplot(column='Exporter reported quantity', showfliers=False)
plt.title('Amount of exported goods per trade', fontsize=20)
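
The median line and the box edges in a boxplot are quartiles, so the same numbers can also be read off directly with `quantile`. A small sketch on made-up quantities (the `toy` values are for illustration only and are not taken from the CITES data):

```python
import pandas as pd

# Toy quantities for illustration only; missing values are skipped by quantile().
toy = pd.DataFrame(
    {"Importer reported quantity": [1, 2, 5, 10, 80, 200, 5000, None]}
)

# 25th, 50th (median) and 75th percentile of the reported quantities
quartiles = toy["Importer reported quantity"].quantile([0.25, 0.5, 0.75])
print(quartiles)
```

On the real data, the same call on ``data['Importer reported quantity']`` should give the values behind the box in the plot above.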

In [None]:
class_distribution = data['Class'].value_counts(normalize=True)
plt.figure(figsize=(10,10))
class_distribution.head(5).plot(kind='bar', fontsize=18)
plt.title('5 most traded animal classes', fontsize=20)
plt.xticks(rotation=40)

- The **5 most traded** animal classes **make up ~97%** of all trades
- Almost 40% of all trades involve **reptiles**
- ['Anthozoa is a class of **marine invertebrates** which includes the sea anemones, stony corals, soft corals and gorgonians.'](https://en.wikipedia.org/wiki/Anthozoa)
- Aves are **birds**
- Actinopteri are **bony fish**
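
The shares above count every trade record once. To see how much that matters, one could weight each class by the reported quantity instead. A sketch on made-up numbers (the `toy` frame is hypothetical, and in the real data the quantities may be reported in different units, so summing them would need care):

```python
import pandas as pd

# Toy trades for illustration only
toy = pd.DataFrame({
    "Class": ["Reptilia", "Reptilia", "Aves", "Anthozoa"],
    "Exporter reported quantity": [10, 30, 5, 955],
})

# Unweighted: every record counts the same
share_by_records = toy["Class"].value_counts(normalize=True)

# Weighted: sum the quantities per class, then normalize
qty = toy.groupby("Class")["Exporter reported quantity"].sum()
share_by_quantity = (qty / qty.sum()).sort_values(ascending=False)

print(share_by_records)
print(share_by_quantity)
```

In this toy example reptiles lead by record count, but the single large coral shipment dominates once quantities are taken into account.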

Excerpt from the [CITES Guidelines](https://trade.cites.org/cites_trade_guidelines/en-CITES_Trade_Database_Guide.pdf):
#  Trading Purposes:

- **B** Breeding in captivity or artificial propagation
- **E** Educational
- **G** Botanical garden
- **H** Hunting trophy
- **L** Law enforcement / judicial / forensic
- **M** Medical (including biomedical research)
- **N** Reintroduction or introduction into the wild
- **P** Personal
- **Q** Circus or travelling exhibition
- **S** Scientific
- **T** Commercial
- **Z** Zoo

# Source of the traded goods:

**A** 
Plants that are artificially propagated in accordance with Resolution Conf. 11.11 (Rev.
CoP15), as well as parts and derivatives thereof, exported under the provisions of
Article VII, paragraph 5, of the Convention (specimens of species included in Appendix I
that have been propagated artificially for non-commercial purposes and specimens of
species included in Appendices II and III).

**C** 
Animals bred in captivity in accordance with Resolution Conf. 10.16 (Rev.), as well as
parts and derivatives thereof, exported under the provisions of Article VII, paragraph 5,
of the Convention.

**D** 
Appendix-I animals bred in captivity for commercial purposes in operations included in
the Secretariat's Register, in accordance with Resolution Conf. 12.10 (Rev. CoP15), and
Appendix-I plants artificially propagated for commercial purposes, as well as parts and
derivatives thereof, exported under the provisions of Article VII, paragraph 4, of the
Convention.

**F** 
Animals born in captivity (F1 or subsequent generations) that do not fulfil the definition
of 'bred in captivity' in Resolution Conf. 10.16 (Rev.), as well as parts and derivatives
thereof.

**I** 
Confiscated or seized specimens

**O** 
Pre-Convention specimens

**R** 
Ranched specimens: specimens of animals reared in a controlled environment, taken as
eggs or juveniles from the wild, where they would otherwise have had a very low
probability of surviving to adulthood.

**U** 
Source unknown.

**W** 
Specimens taken from the wild.

**X** 
Specimens taken in "the marine environment not under the jurisdiction of any State"

In [None]:
data['Source'].value_counts(normalize=True).plot(kind='bar', fontsize=18)
plt.title('Relative amounts of different sources of traded animal goods', fontsize=20)
plt.xticks(rotation=0) #  rotation=0 keeps the x-axis labels horizontal (pandas rotates them by 90° by default)

It seems like there are multiple categories concerning captivity (C, D, F and R). Let's have a **closer look at the categories W, A and C**. For this I wanted to use **subplots**. ``plt.subplot(yxi)`` creates a grid in the figure, where ``y`` is the number of rows, ``x`` is the number of columns and ``i`` is the index of the plot, counting from 1. I split the grid into 5 rows and only use rows 1, 3 and 5 to get more vertical space between the plots.

In [None]:
plt.figure(figsize=(15,10))
plt.subplot(511)
data['Class'].loc[data['Source']=='W'].value_counts(normalize=True).head(5).plot.barh(fontsize=18)
plt.title('Top five traded animal classes caught in the wild', fontsize=20)
plt.subplot(513)
data['Order'].loc[data['Source']=='A'].value_counts(normalize=True).head(5).plot.barh(fontsize=18)
plt.title('Top five traded plant orders propagated artificially', fontsize=20)
plt.subplot(515)
data['Class'].loc[data['Source']=='C'].value_counts(normalize=True).head(5).plot.barh(fontsize=18)
plt.title('Top five traded animal classes that were bred in captivity', fontsize=20)
#  when it comes to captivity and breeding, things get a little more complicated and blurred.
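
Leaving grid rows empty is one way to get spacing. Another option is ``plt.subplots`` together with ``tight_layout``, which adjusts the spacing automatically. A sketch on a made-up frame (the `toy` data is hypothetical; the column names follow the dataset):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy records for illustration only
toy = pd.DataFrame({
    "Class": ["Reptilia", "Aves", "Reptilia", "Mammalia", "Aves", "Reptilia"],
    "Source": ["W", "W", "C", "C", "W", "W"],
})

panels = [("W", "taken from the wild"), ("C", "bred in captivity")]
fig, axes = plt.subplots(nrows=len(panels), ncols=1, figsize=(15, 8))

for ax, (code, label) in zip(axes, panels):
    # Share of each class among the records with this source code
    shares = toy.loc[toy["Source"] == code, "Class"].value_counts(normalize=True).head(5)
    shares.plot.barh(ax=ax, fontsize=18)
    ax.set_title(f"Top traded classes {label}", fontsize=20)

fig.tight_layout()  # adds enough space so titles and tick labels do not overlap
```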


Now we have seen some information about the different organisms that are traded and which of these are the most significant ones, depending on their source. But I wonder, what is actually traded? For this I wanted to try a stacked bar chart, since it would also give us information about the **amount of each animal class corresponding to the traded good**. For this I *think* I needed to create a new DataFrame ``stacked_data``, which I filled with the counted values of the goods ('Term') for all the classes in ``animal_classes``. The 15 most traded goods are then plotted as a horizontal bar chart with the ``stacked=True`` keyword argument. Notice that I used a **logarithmic x-axis**. This makes it easier to see the smaller bars when one of them is bigger by a factor of ten or more.

In [None]:
animal_classes = ['Reptilia', 'Aves', 'Actinopteri', 'Mammalia', 'Anthozoa']
stacked_data = pd.DataFrame()
for i in animal_classes:
    stacked_data[i] = data['Term'].loc[data['Class']==i].value_counts()
stacked_data.head(15).plot.barh(figsize=(20,10), fontsize=24, stacked=True)
plt.xscale('log')
plt.legend(fontsize=24)
plt.title('Amount of 15 most traded animal goods', fontsize=24)
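
One thing to watch out for, as far as I can tell: when columns are added one by one to an empty DataFrame, the first column (here Reptilia) sets the index, so terms that never occur for reptiles are dropped and ``head(15)`` ranks by the reptile counts only. A possible alternative is ``pd.crosstab``, sketched here on made-up records (the `toy` frame is hypothetical):

```python
import pandas as pd

# Toy records for illustration only
toy = pd.DataFrame({
    "Class": ["Reptilia", "Reptilia", "Aves", "Aves", "Aves", "Mammalia"],
    "Term": ["skins", "live", "live", "live", "feathers", "trophies"],
})
animal_classes = ["Reptilia", "Aves", "Mammalia"]

subset = toy[toy["Class"].isin(animal_classes)]

# Rows: traded good, columns: animal class, cells: number of records
counts = pd.crosstab(subset["Term"], subset["Class"])

# Rank the goods by their total over all classes, then keep the top 15
top_terms = counts.sum(axis=1).sort_values(ascending=False).head(15).index
stacked = counts.loc[top_terms].reindex(columns=animal_classes, fill_value=0)
print(stacked)
```

``stacked.plot.barh(stacked=True)`` would then draw the same kind of chart, with goods like feathers included even though no reptile record has them.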

When we look at the quantities reported by importer and exporter, we can see that most of the time one of the two values is missing. Actually **only ~5% of trades have matching import/export values**. So I was thinking: 'Is it possible that missing values are unevenly distributed among the three appendices?' Remember that animals from Appendix I are the most endangered, so illegal trade in them might be more lucrative. So let's plot it. My idea was to compute the normalized share of each appendix among all trades and subtract its normalized share among the trades where import and export quantities match. The plot shows **~6% less unknown traded quantities for App. I animals**, which seems good: it looks like trades in highly endangered species are recorded more carefully. On the other hand, App. II and III seem to have more trades where only one side reported an amount, but it kind of has to be this way: both sets of normalized values sum to 1, so the differences must always sum to 0. Is this skewing the analysis? I'm not sure... please comment below :)

In [None]:
all_appendices = data['App.'].value_counts(normalize=True)
known_amounts = data.loc[data['Importer reported quantity']==data['Exporter reported quantity'], 'App.'].value_counts(normalize=True)
y = all_appendices - known_amounts
y.plot.barh(figsize=(20,5), fontsize=24)
plt.title('Relative difference in known trade amounts compared to sum of all trades', fontsize=20)
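
One way around the "differences must sum to 0" issue might be to compute, for each appendix separately, the share of its trades with matching quantities. These rates are independent of each other and do not have to sum to anything. A sketch on made-up records (the `toy` frame is hypothetical; the column names follow the dataset):

```python
import pandas as pd

# Toy records for illustration only; None marks a missing report
toy = pd.DataFrame({
    "App.": ["I", "I", "II", "II", "II", "III"],
    "Importer reported quantity": [5, None, 10, 3, None, 7],
    "Exporter reported quantity": [5, 4, 10, None, 2, 8],
})

# True where both sides reported the same quantity (missing values never match)
matches = toy["Importer reported quantity"] == toy["Exporter reported quantity"]

# Mean of a boolean column per group = share of matching trades per appendix
match_rate = matches.groupby(toy["App."]).mean()
print(match_rate)
```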

Next I want to look at the countries. Just a few plots to get a feeling for things. The database uses two-letter codes to identify the countries.

In [None]:
plt.figure(figsize=(20,10))

plt.subplot(121)
toptrader_imp = data['Importer'].value_counts(normalize=True)
toptrader_imp.head(5).plot(kind='bar', fontsize=18)
plt.title('Top 5 importing countries', fontsize=20)
plt.xticks(rotation=0)

plt.subplot(122)
toptrader_exp = data['Exporter'].value_counts(normalize=True)
toptrader_exp.head(5).plot(kind='bar', fontsize=18)
plt.title('Top 5 exporting countries', fontsize=20)
plt.xticks(rotation=0)

Okay, the top 5 importing countries are the **US, Japan, Germany, France and Hong Kong**.
On the other side, the top exporters are the **Netherlands, Indonesia, Italy, the US and France**.
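
To make the bar labels easier to read, the two-letter codes could be replaced by names before plotting. A sketch assuming the codes follow ISO 3166-1 alpha-2; the lookup below is hand-written, covers only the countries named above, and the `toy_importers` values are made up:

```python
import pandas as pd

# Hand-written lookup, assuming ISO 3166-1 alpha-2 codes
code_to_name = {
    "US": "United States", "JP": "Japan", "DE": "Germany", "FR": "France",
    "HK": "Hong Kong", "NL": "Netherlands", "ID": "Indonesia", "IT": "Italy",
}

# Toy importer column for illustration only
toy_importers = pd.Series(["US", "US", "JP", "DE", "XX"])

shares = toy_importers.value_counts(normalize=True)
# Codes that are not in the lookup are kept unchanged
labelled = shares.rename(index=lambda code: code_to_name.get(code, code))
print(labelled)
```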