# Global Oilseed Supply from 2012 to 2022

This data was pulled from the United States Department of Agriculture Foreign Agricultural Service (USDA FAS) Production, Supply and Distribution (PSD) online database. A custom query produced an Excel document, which was then converted to a CSV file. The data can be found at the USDA FAS PSD site: https://apps.fas.usda.gov/psdonline/app/index.html#/app/home.

In [61]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

Reading the CSV file and displaying it shows that there are over 5,000 rows and 9 columns covering the 11 marketing years from 2012/2013 to 2022/2023.

In [62]:
df = pd.read_csv("global_oilseed_2012-2022.csv")
df

Unnamed: 0,Attribute,Year,Country,"Oilseed, Peanut","Oilseed, Rapeseed","Oilseed, Soybean","Oilseed, Soybean (Local)","Oilseed, Sunflowerseed",Unit Description
0,Beginning Stocks,2012/2013,Argentina,269,0,14338,2679,369,(1000 MT)
1,Beginning Stocks,2012/2013,Australia,0,572,3,0,0,(1000 MT)
2,Beginning Stocks,2012/2013,Bangladesh,0,152,30,0,0,(1000 MT)
3,Beginning Stocks,2012/2013,Barbados,0,0,5,0,0,(1000 MT)
4,Beginning Stocks,2012/2013,Belarus,0,0,0,0,0,(1000 MT)
...,...,...,...,...,...,...,...,...,...
5551,Ending Stocks,2022/2023,Uzbekistan,0,0,3,0,19,(1000 MT)
5552,Ending Stocks,2022/2023,Venezuela,0,0,2,0,0,(1000 MT)
5553,Ending Stocks,2022/2023,Vietnam,39,0,392,0,0,(1000 MT)
5554,Ending Stocks,2022/2023,Zambia,8,0,32,0,0,(1000 MT)
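
The row, column, and year counts can also be read off directly rather than from the printed table. The sketch below uses a small in-memory CSV (a hypothetical stand-in built from a few rows shown above, with only two of the oilseed columns) so that it runs without the real file; the names `csv_text` and `sample` are illustrative only.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: three rows copied from the output above,
# with only two of the five oilseed columns kept for brevity.
csv_text = """Attribute,Year,Country,"Oilseed, Peanut","Oilseed, Soybean",Unit Description
Beginning Stocks,2012/2013,Argentina,269,14338,(1000 MT)
Beginning Stocks,2012/2013,Australia,0,3,(1000 MT)
Ending Stocks,2022/2023,Zambia,8,32,(1000 MT)
"""

sample = pd.read_csv(io.StringIO(csv_text))

# shape gives (rows, columns) without having to print the whole frame.
n_rows, n_cols = sample.shape

# Number of distinct marketing years present in the sample.
n_years = sample["Year"].nunique()

print(n_rows, n_cols, n_years)
```

Applied to the full DataFrame, the same two calls would be `df.shape` and `df["Year"].nunique()`.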


Every oilseed quantity is reported in units of 1,000 metric tons (MT), so the Unit Description column carries no additional information and can be dropped.

In [63]:
df = df.drop(columns="Unit Description")
df.head()

Unnamed: 0,Attribute,Year,Country,"Oilseed, Peanut","Oilseed, Rapeseed","Oilseed, Soybean","Oilseed, Soybean (Local)","Oilseed, Sunflowerseed"
0,Beginning Stocks,2012/2013,Argentina,269,0,14338,2679,369
1,Beginning Stocks,2012/2013,Australia,0,572,3,0,0
2,Beginning Stocks,2012/2013,Bangladesh,0,152,30,0,0
3,Beginning Stocks,2012/2013,Barbados,0,0,5,0,0
4,Beginning Stocks,2012/2013,Belarus,0,0,0,0,0
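
Before dropping a column on the grounds that it is constant, it may be worth confirming that it really holds a single value. The following is a minimal sketch on a toy DataFrame (the name `sample` and its three rows are illustrative, taken from the first rows shown above), not a check that was run in the original notebook.

```python
import pandas as pd

# Toy stand-in for the real data; values are copied from the first rows shown above.
sample = pd.DataFrame({
    "Country": ["Argentina", "Australia", "Bangladesh"],
    "Oilseed, Soybean": [14338, 3, 30],
    "Unit Description": ["(1000 MT)", "(1000 MT)", "(1000 MT)"],
})

# A column with exactly one distinct value adds no information to the analysis.
n_units = sample["Unit Description"].nunique()

# Only drop the column when it is confirmed to be constant.
if n_units == 1:
    sample = sample.drop(columns="Unit Description")

print(list(sample.columns))
```

If the count were greater than one, the quantities would be in mixed units and would need converting before the column could safely be removed.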


Next we can look at the number of unique values in each column using the nunique function. This shows that there are 6 attributes, 11 marketing years, and 85 countries represented in the dataset. The soybean column has the most distinct values (1,169), which reflects how varied the reported soybean quantities are rather than how often soybeans appear.

In [64]:
unique_values = df.nunique(axis=0)
unique_values

Attribute                      6
Year                          11
Country                       85
Oilseed, Peanut              667
Oilseed, Rapeseed            621
Oilseed, Soybean            1169
Oilseed, Soybean (Local)     111
Oilseed, Sunflowerseed       514
dtype: int64
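
Because nunique counts distinct values, it is not a measure of how often a crop is reported. As a rough illustration, the sketch below compares the distinct-value count with a count of nonzero rows on a toy DataFrame; the name `toy` is hypothetical and its values are the first five peanut and soybean rows shown earlier.

```python
import pandas as pd

# Toy data: the first five peanut and soybean values from the table above.
toy = pd.DataFrame({
    "Oilseed, Peanut": [269, 0, 0, 0, 0],
    "Oilseed, Soybean": [14338, 3, 30, 5, 0],
})

# Number of different values in each column (what nunique reports).
distinct_counts = toy.nunique()

# Number of rows in which each crop has a nonzero quantity,
# one possible way to gauge how widely a crop is reported.
nonzero_counts = (toy != 0).sum()

print(distinct_counts)
print(nonzero_counts)
```

Which measure is appropriate depends on the question being asked; the nonzero count is only one possible choice.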

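Next we should check what data types we have before manipulating the data. The Attribute, Year, and Country columns have the "object" data type, which means they hold strings, while the five oilseed columns are already stored as int64. To use the data properly, the Year column, which holds marketing years such as 2012/2013, needs to be reduced to a single year that can be treated as a number or a date. The cells after the type check split each value on "/" and keep the starting year.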

In [65]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5556 entries, 0 to 5555
Data columns (total 8 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   Attribute                 5556 non-null   object
 1   Year                      5556 non-null   object
 2   Country                   5556 non-null   object
 3   Oilseed, Peanut           5556 non-null   int64 
 4   Oilseed, Rapeseed         5556 non-null   int64 
 5   Oilseed, Soybean          5556 non-null   int64 
 6   Oilseed, Soybean (Local)  5556 non-null   int64 
 7   Oilseed, Sunflowerseed    5556 non-null   int64 
dtypes: int64(5), object(3)
memory usage: 347.4+ KB
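
The split between text and numeric columns can also be obtained programmatically instead of by reading the info() output. This is a small sketch on a toy DataFrame (the name `toy` and its two rows are illustrative), using `select_dtypes` to separate numeric columns from everything else.

```python
import pandas as pd

# Toy frame with the same kinds of columns as the real data.
toy = pd.DataFrame({
    "Attribute": ["Beginning Stocks", "Beginning Stocks"],
    "Year": ["2012/2013", "2012/2013"],
    "Country": ["Argentina", "Australia"],
    "Oilseed, Peanut": [269, 0],
})

# Columns pandas already treats as numbers.
numeric_columns = toy.select_dtypes(include="number").columns.tolist()

# Everything else (here, the text columns) may need converting before analysis.
other_columns = toy.select_dtypes(exclude="number").columns.tolist()

print(numeric_columns)
print(other_columns)
```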


In [66]:
# Split each marketing year (e.g. "2012/2013") into its start year (Year) and end year (Year2)
df[["Year", "Year2"]] = df["Year"].astype(str).str.split("/", expand=True)
df.head()

Unnamed: 0,Attribute,Year,Country,"Oilseed, Peanut","Oilseed, Rapeseed","Oilseed, Soybean","Oilseed, Soybean (Local)","Oilseed, Sunflowerseed",Year2
0,Beginning Stocks,2012,Argentina,269,0,14338,2679,369,2013
1,Beginning Stocks,2012,Australia,0,572,3,0,0,2013
2,Beginning Stocks,2012,Bangladesh,0,152,30,0,0,2013
3,Beginning Stocks,2012,Barbados,0,0,5,0,0,2013
4,Beginning Stocks,2012,Belarus,0,0,0,0,0,2013


In [67]:
# The end year is always the start year plus one, so it is redundant
df = df.drop(columns="Year2")
df.head()

Unnamed: 0,Attribute,Year,Country,"Oilseed, Peanut","Oilseed, Rapeseed","Oilseed, Soybean","Oilseed, Soybean (Local)","Oilseed, Sunflowerseed"
0,Beginning Stocks,2012,Argentina,269,0,14338,2679,369
1,Beginning Stocks,2012,Australia,0,572,3,0,0
2,Beginning Stocks,2012,Bangladesh,0,152,30,0,0
3,Beginning Stocks,2012,Barbados,0,0,5,0,0
4,Beginning Stocks,2012,Belarus,0,0,0,0,0
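
Note that splitting a string column still leaves strings, so Year remains text after the steps above. One possible way to finish the conversion is sketched here on a toy DataFrame (the names `toy` and `as_dates` are hypothetical); it keeps the part before the slash, casts it to an integer, and optionally builds a datetime from it.

```python
import pandas as pd

# Toy column of marketing years in the same format as the original data.
toy = pd.DataFrame({"Year": ["2012/2013", "2013/2014", "2022/2023"]})

# Keep the part before the slash and convert it to an integer.
toy["Year"] = toy["Year"].str.split("/").str[0].astype(int)

# Optional: represent the start year as a datetime (January 1 of that year).
as_dates = pd.to_datetime(toy["Year"].astype(str), format="%Y")

print(toy["Year"].tolist())
print(as_dates.dt.year.tolist())
```

An integer year is usually enough for grouping and plotting annual totals; the datetime form is mainly useful if the data will later be combined with other time series.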
