# Sheep vs Goats

Because this is a very large dataset, I'm only going to analyze two commodities: live sheep and live goats.
![](https://richardbahrcomsite.files.wordpress.com/2016/06/sheep-goat.jpg?w=863)

In [None]:
#   Processing
import sys
import pandas as pd
import numpy as np
# Print arrays in full; recent NumPy versions reject threshold=np.nan
np.set_printoptions(threshold=sys.maxsize)
import re
#   Visuals
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize']=(20,10)

Firstly, we need to read the dataset:

In [None]:
df = pd.read_csv("../input/commodity_trade_statistics_data.csv", na_values=["No Quantity",0.0,''],sep=',')

In [None]:
df.head()

# Data cleaning

With this dataset we need to deal with missing values.

In [None]:
df.count()

In [None]:
df.isnull().sum()

Since there are a lot of rows, I can remove every row that has any missing value without losing much information.

In [None]:
df = df.dropna(how='any').reset_index(drop=True)  
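One way to check how much is lost is to compare row counts before and after dropping. The sketch below runs on a small made-up frame rather than the real `df`; the `toy` data and the helper name `retained_fraction` are illustrative assumptions, not part of the dataset.

```python
import numpy as np
import pandas as pd


def retained_fraction(frame: pd.DataFrame) -> float:
    """Share of rows that survive dropna(how='any')."""
    if len(frame) == 0:
        return 0.0
    return len(frame.dropna(how="any")) / len(frame)


# Hypothetical toy data standing in for the real dataset.
toy = pd.DataFrame({
    "commodity": ["Sheep, live", "Goats, live", "Sheep, live", "Goats, live"],
    "weight_kg": [100.0, np.nan, 250.0, 40.0],
    "quantity": [10.0, 5.0, np.nan, 4.0],
})

print(f"Rows kept: {retained_fraction(toy):.0%}")
```

Calling `retained_fraction(df)` before the `dropna` cell would give the corresponding figure for the real data.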

In [None]:
df.isnull().sum()

In [None]:
df['commodity'].unique()[:5]

# Getting sheep & goats

I make two dataframes, one for sheep and another for goats, in order to analyse them separately.

In [None]:
dfSheeps = df[df['commodity']=='Sheep, live'].reset_index(drop=True)  
dfGoats = df[df['commodity']=='Goats, live'].reset_index(drop=True)  

In [None]:
dfSheeps.head()

# Plotting the imported weight (kg) of sheep & goats

In [None]:
dfSheepsGrouped = pd.DataFrame({'weight_kg' : dfSheeps.groupby( ["year","flow","commodity"] )["weight_kg"].sum()}).reset_index()
dfGoatsGrouped = pd.DataFrame({'weight_kg' : dfGoats.groupby( ["year","flow","commodity"] )["weight_kg"].sum()}).reset_index()
dfSheepsGrouped.head()
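The grouping pattern in this cell can probably be written more compactly with `as_index=False`, which keeps the group keys as ordinary columns. This is a minimal sketch on invented rows (the `toy` frame is hypothetical) that builds both forms so they can be compared.

```python
import pandas as pd

# Hypothetical rows, not taken from the real dataset.
toy = pd.DataFrame({
    "year": [2015, 2015, 2015, 2016],
    "flow": ["Import", "Import", "Export", "Import"],
    "commodity": ["Sheep, live"] * 4,
    "weight_kg": [100.0, 50.0, 30.0, 70.0],
})

keys = ["year", "flow", "commodity"]

# Pattern used in this notebook: wrap the grouped Series in a DataFrame.
verbose = pd.DataFrame(
    {"weight_kg": toy.groupby(keys)["weight_kg"].sum()}
).reset_index()

# Shorter form: keep the group keys as regular columns.
compact = toy.groupby(keys, as_index=False)["weight_kg"].sum()

print(compact)
```

Both frames hold one row per (year, flow, commodity) combination with the summed weight.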

In [None]:
f, ax = plt.subplots(1, 1)
dfgr = pd.concat([dfSheepsGrouped,dfGoatsGrouped])
ax = sns.pointplot(ax=ax,x="year",y="weight_kg",data=dfgr[dfgr['flow']=='Import'],hue='commodity')
_ = ax.set_title('Global imports of kgs by animal')

As we can see, the total weight of imported sheep is much larger than that of imported goats worldwide.
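To put a number on the gap rather than reading it off the plot, the yearly totals can be pivoted into one column per commodity and divided. The figures below are invented for illustration and the column name `sheep_to_goat_ratio` is an assumption; with the real data, the import rows of `dfgr` would take the place of `toy`.

```python
import pandas as pd

# Hypothetical yearly import totals, not real figures.
toy = pd.DataFrame({
    "year": [2014, 2014, 2015, 2015],
    "commodity": ["Sheep, live", "Goats, live", "Sheep, live", "Goats, live"],
    "weight_kg": [900.0, 300.0, 1000.0, 250.0],
})

# One row per year, one column per commodity.
wide = toy.pivot(index="year", columns="commodity", values="weight_kg")

# Ratio of sheep weight to goat weight for each year.
wide["sheep_to_goat_ratio"] = wide["Sheep, live"] / wide["Goats, live"]
print(wide)
```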

# Analysis of sheep

In [None]:
dfSheeps.head()

In [None]:
dfSheepsGrouped = pd.DataFrame({'weight_kg' : dfSheeps.groupby( ["country_or_area","flow","commodity"] )["weight_kg"].sum()}).reset_index()
dfSheepsGrouped.head()

In [None]:
sheepsImportsCountry = dfSheepsGrouped[dfSheepsGrouped['flow']=='Import']
sheepsExportsCountry = dfSheepsGrouped[dfSheepsGrouped['flow']=='Export']
sheepsImportsCountry.head()

In [None]:
ax = sns.barplot(x="weight_kg", y="country_or_area", data=sheepsImportsCountry.sort_values('weight_kg',ascending=False)[:15])
_ = ax.set(xlabel='Kgs', ylabel='Country or area',title = "Countries or areas that imported the most kgs of sheep")

The countries that import the most kgs of sheep are Saudi Arabia, Italy and Kuwait.

In [None]:
ax = sns.barplot(x="weight_kg", y="country_or_area", data=sheepsExportsCountry.sort_values('weight_kg',ascending=False)[:15])
_ = ax.set(xlabel='Kgs', ylabel='Country or area',title = "Countries or areas that exported the most kgs of sheep")

The countries that export the most kgs of sheep are Australia, Romania and Sudan.
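The top-15 selections in this section sort the whole frame and then slice it. `DataFrame.nlargest` expresses the same idea in one call; the sketch below compares the two on a made-up frame (the `toy` data and country labels are hypothetical).

```python
import pandas as pd

# Hypothetical totals per country, not real figures.
toy = pd.DataFrame({
    "country_or_area": ["A", "B", "C", "D"],
    "weight_kg": [10.0, 40.0, 30.0, 20.0],
})

# Approach used in this notebook: sort descending, then slice.
top_sorted = toy.sort_values("weight_kg", ascending=False)[:2]

# Alternative: ask directly for the n largest rows.
top_nlargest = toy.nlargest(2, "weight_kg")

print(top_nlargest["country_or_area"].tolist())
```

With the real data, `sheepsExportsCountry.nlargest(15, "weight_kg")` could be passed straight to `sns.barplot`.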