# Global Oilseed Supply, 2012 to 2022

This data was pulled from the United States Department of Agriculture Foreign Agricultural Service (USDA FAS) Production, Supply and Distribution (PSD) online database. A custom query produced an Excel file, which was then converted to a CSV file. The database is available at https://apps.fas.usda.gov/psdonline/app/index.html#/app/home.

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

Reading the CSV file and displaying it shows 5,555 rows (indexed 0 to 5554) and 9 columns, covering 11 marketing years from 2012/2013 through 2022/2023.

In [4]:
df = pd.read_csv("global_oilseed_2012-2022.csv")
df

| | Attribute | Year | Country | Oilseed, Peanut | Oilseed, Rapeseed | Oilseed, Soybean | Oilseed, Soybean (Local) | Oilseed, Sunflowerseed | Unit Description |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Beginning Stocks | 2012/2013 | Argentina | 269 | 0 | 14338 | 2679 | 369 | (1000 MT) |
| 1 | Beginning Stocks | 2012/2013 | Australia | 0 | 572 | 3 | 0 | 0 | (1000 MT) |
| 2 | Beginning Stocks | 2012/2013 | Bangladesh | 0 | 152 | 30 | 0 | 0 | (1000 MT) |
| 3 | Beginning Stocks | 2012/2013 | Barbados | 0 | 0 | 5 | 0 | 0 | (1000 MT) |
| 4 | Beginning Stocks | 2012/2013 | Belarus | 0 | 0 | 0 | 0 | 0 | (1000 MT) |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5551 | Ending Stocks | 2022/2023 | Uzbekistan | 0 | 0 | 3 | 0 | 19 | (1000 MT) |
| 5552 | Ending Stocks | 2022/2023 | Venezuela | 0 | 0 | 2 | 0 | 0 | (1000 MT) |
| 5553 | Ending Stocks | 2022/2023 | Vietnam | 39 | 0 | 392 | 0 | 0 | (1000 MT) |
| 5554 | Ending Stocks | 2022/2023 | Zambia | 8 | 0 | 32 | 0 | 0 | (1000 MT) |
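
Rather than reading the size off the displayed frame, `df.shape` returns (rows, columns) directly and `df["Year"].unique()` lists the marketing years covered. The sketch below is self-contained: in place of the real CSV file it reads a few rows copied from the output above through `io.StringIO`, so its counts describe that sample only. On the full file the same calls should report 5555 rows, 9 columns and 11 marketing years.

```python
import io

import pandas as pd

# A few rows copied from the output above, standing in for
# "global_oilseed_2012-2022.csv" so this example runs on its own.
sample_csv = """Attribute,Year,Country,"Oilseed, Peanut","Oilseed, Rapeseed","Oilseed, Soybean","Oilseed, Soybean (Local)","Oilseed, Sunflowerseed",Unit Description
Beginning Stocks,2012/2013,Argentina,269,0,14338,2679,369,(1000 MT)
Beginning Stocks,2012/2013,Australia,0,572,3,0,0,(1000 MT)
Ending Stocks,2022/2023,Vietnam,39,0,392,0,0,(1000 MT)
Ending Stocks,2022/2023,Zambia,8,0,32,0,0,(1000 MT)
"""

df = pd.read_csv(io.StringIO(sample_csv))

n_rows, n_cols = df.shape  # (number of rows, number of columns)
years = sorted(df["Year"].unique())  # distinct marketing years, in order

print(f"{n_rows} rows x {n_cols} columns")
print(f"{len(years)} marketing years: {years[0]} to {years[-1]}")
```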


Since every oilseed value is reported in 1000 metric tons (MT), the "Unit Description" column adds no information and can be dropped.

In [10]:
df = df.drop(columns="Unit Description")
df.head()

| | Attribute | Year | Country | Oilseed, Peanut | Oilseed, Rapeseed | Oilseed, Soybean | Oilseed, Soybean (Local) | Oilseed, Sunflowerseed |
|---|---|---|---|---|---|---|---|---|
| 0 | Beginning Stocks | 2012/2013 | Argentina | 269 | 0 | 14338 | 2679 | 369 |
| 1 | Beginning Stocks | 2012/2013 | Australia | 0 | 572 | 3 | 0 | 0 |
| 2 | Beginning Stocks | 2012/2013 | Bangladesh | 0 | 152 | 30 | 0 | 0 |
| 3 | Beginning Stocks | 2012/2013 | Barbados | 0 | 0 | 5 | 0 | 0 |
| 4 | Beginning Stocks | 2012/2013 | Belarus | 0 | 0 | 0 | 0 | 0 |
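
The claim that every value is in 1000 MT is cheap to confirm before dropping the column: if "Unit Description" holds a single distinct value, it carries no information. A minimal sketch, using a small hypothetical frame built from two of the rows shown above (only the soybean column is kept for brevity):

```python
import pandas as pd

# Two rows copied from the output above, standing in for the full dataset.
df = pd.DataFrame({
    "Attribute": ["Beginning Stocks", "Ending Stocks"],
    "Year": ["2012/2013", "2022/2023"],
    "Country": ["Argentina", "Zambia"],
    "Oilseed, Soybean": [14338, 32],
    "Unit Description": ["(1000 MT)", "(1000 MT)"],
})

# Drop the column only if every row really uses the same unit.
units = df["Unit Description"].unique()
if len(units) == 1:
    print(f"All rows are in {units[0]}; dropping 'Unit Description'")
    df = df.drop(columns="Unit Description")
else:
    print(f"Mixed units found, keeping the column: {list(units)}")
```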


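Next we can look at the number of unique values in each column using the nunique function. This shows 6 attributes, 11 marketing years and 85 countries in the dataset. The soybean column has the most distinct values (1169), which hints that soybean figures vary the most across countries and years. Note that nunique counts how many different values a column holds, not how often a crop is reported or how much of it is produced, so on its own it does not show that soybeans are the most common oilseed.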

In [11]:
unique_values = df.nunique()  # distinct values per column (axis=0 is the default)
unique_values

Attribute                      6
Year                          11
Country                       85
Oilseed, Peanut              667
Oilseed, Rapeseed            621
Oilseed, Soybean            1169
Oilseed, Soybean (Local)     111
Oilseed, Sunflowerseed       514
dtype: int64
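
To compare how often and how much each oilseed is reported, as opposed to how many different values it takes, one option is to count non-zero entries and sum volumes per crop. The sketch below uses the five "Beginning Stocks, 2012/2013" rows shown earlier as a stand-in for the full data. It assumes the oilseed columns already hold numbers; at this point in the notebook they are still "object", so they would need converting first.

```python
import pandas as pd

# The five "Beginning Stocks, 2012/2013" rows shown in the output above.
df = pd.DataFrame({
    "Country": ["Argentina", "Australia", "Bangladesh", "Barbados", "Belarus"],
    "Oilseed, Peanut": [269, 0, 0, 0, 0],
    "Oilseed, Rapeseed": [0, 572, 152, 0, 0],
    "Oilseed, Soybean": [14338, 3, 30, 5, 0],
    "Oilseed, Soybean (Local)": [2679, 0, 0, 0, 0],
    "Oilseed, Sunflowerseed": [369, 0, 0, 0, 0],
})
oilseed_cols = [c for c in df.columns if c.startswith("Oilseed")]

# nunique(): how many *different* values each column holds.
distinct = df[oilseed_cols].nunique()

# How many rows actually report the crop (non-zero values).
nonzero = (df[oilseed_cols] != 0).sum()

# Total volume per crop, in 1000 MT.
totals = df[oilseed_cols].sum()

print(pd.DataFrame({"distinct": distinct, "nonzero rows": nonzero, "total (1000 MT)": totals}))
```

On this sample, soybeans lead on all three measures, but the counts answer different questions: distinct values measure variety, non-zero rows measure how widely a crop is reported, and totals measure volume.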

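Next we should check which data types we have before manipulating the data. Every column has the "object" data type, which here means the values are stored as strings (text). To work with the numbers, the oilseed columns need to be converted to a numeric type. The years need care as well: values like 2012/2013 are marketing years rather than calendar dates, so they cannot be parsed directly as datetimes, and keeping the starting year as an integer may be simpler.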

In [16]:
df.dtypes

Attribute                   object
Year                        object
Country                     object
Oilseed, Peanut             object
Oilseed, Rapeseed           object
Oilseed, Soybean            object
Oilseed, Soybean (Local)    object
Oilseed, Sunflowerseed      object
dtype: object

In [18]:
df = df.infer_objects()
df.dtypes

Attribute                   object
Year                        object
Country                     object
Oilseed, Peanut             object
Oilseed, Rapeseed           object
Oilseed, Soybean            object
Oilseed, Soybean (Local)    object
Oilseed, Sunflowerseed      object
dtype: object
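
infer_objects leaves every column unchanged because it only performs a soft conversion of object columns that already hold Python numbers; it does not parse strings into numbers. An explicit conversion with pd.to_numeric is one way forward. The output above does not show why the oilseed columns were read as text, so the sample below is hypothetical: the thousands separator in "14,338" and the "-" placeholder are assumptions chosen to illustrate entries that block a clean parse. Passing errors="coerce" turns anything unparseable into NaN, which also makes the offending rows easy to find.

```python
import pandas as pd

# Hypothetical sample: oilseed values stored as strings, like the
# "object" columns above. "14,338" and "-" are assumed problem entries.
df = pd.DataFrame({
    "Attribute": ["Beginning Stocks", "Beginning Stocks", "Ending Stocks"],
    "Year": ["2012/2013", "2012/2013", "2022/2023"],
    "Country": ["Argentina", "Australia", "Zambia"],
    "Oilseed, Soybean": ["14,338", "3", "-"],
    "Oilseed, Rapeseed": ["0", "572", "0"],
})

oilseed_cols = [c for c in df.columns if c.startswith("Oilseed")]
for col in oilseed_cols:
    # Remove thousands separators, then parse; failures become NaN.
    cleaned = df[col].astype(str).str.replace(",", "", regex=False)
    df[col] = pd.to_numeric(cleaned, errors="coerce")

# "2012/2013" is a marketing year, not a date, so keep the starting
# year as an integer in a new (hypothetical) "Start Year" column.
df["Start Year"] = df["Year"].str.split("/").str[0].astype(int)

print(df.dtypes)
print(df[df[oilseed_cols].isna().any(axis=1)])  # rows that failed to parse
```

If commas do turn out to be the cause, passing thousands="," to pd.read_csv handles them at load time instead. Columns that contain NaN after coercion become float rather than integer, so decide how to treat missing values before casting to int.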