# Load Datamine Data Locally

This workbook demonstrates how to use the datamine package to download all of your Datamine data from CME Group and save it into local folders.

It is a working reference example and a starting point for more advanced use cases. The workbook walks through the following workflow:

1. Authenticating with Datamine
2. Retrieving your data catalog
3. Downloading your data items to your local directories

The package takes some basic shortcuts to keep things simple. These may be improved over time:

1. The package does not cache your data catalog locally. It downloads the catalog each time; if you have a lot of data items (i.e. more than 10,000), this may take some time.
2. The package always downloads all data from Datamine for a given data collection and overwrites any local copies of the data.
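
The first shortcut can be worked around outside the package. The following is a minimal sketch, not part of the datamine package: `load_or_fetch_catalog` and `fetch` are hypothetical names, and the catalog is assumed to be a JSON-serializable dict. It stores the catalog in a local JSON file and only calls the fetch function when no cached copy exists or a refresh is requested.

```python
import json
from pathlib import Path


def load_or_fetch_catalog(cache_file, fetch, refresh=False):
    """Return a catalog dict, reading a local JSON cache when one exists.

    `fetch` is a hypothetical zero-argument callable that downloads the
    catalog; it is only called when the cache is missing or refresh=True.
    """
    cache_file = Path(cache_file)
    if cache_file.exists() and not refresh:
        with cache_file.open() as fh:
            return json.load(fh)
    catalog = fetch()
    # Create the cache folder if needed, then store the fresh catalog.
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    with cache_file.open("w") as fh:
        json.dump(catalog, fh)
    return catalog
```

With the connection object used in this workbook, `fetch` could be a small function that calls `myDatamine.get_catalog()` and returns `myDatamine.data_catalog`, assuming the catalog entries serialize cleanly to JSON.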



In [None]:
import datamine.io as dm
import pandas as pd
import warnings
warnings.filterwarnings("ignore")
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib import style
style.use('fivethirtyeight')

%matplotlib inline

In [None]:
#Establish an object to interact with CME Datamine.
#Supply Credentials per Documentation: http://www.cmegroup.com/market-data/datamine-api.html
myDatamine = dm.DatamineCon(username='', password='^', path='./data/')

## Get Your Catalog Of Data

Datamine hosts all of your subscriptions for download. You can also browse them in the GUI at https://datamine.cmegroup.com/

The following code downloads a catalog of all your items. We then turn it into a pandas DataFrame to see what data the account contains.


In [None]:
#Get the first 1,000 items for Cryptocurrencies
myDatamine.get_catalog(dataset='CRYPTOCURRENCY', limit=1000)

In [None]:
myDatamine.get_catalog()

In [None]:
# Review one of the data catalog items, supplied in dict format.
# next(iter(...)) peeks at an item; popitem() would also remove it from the catalog.
next(iter(myDatamine.data_catalog.items()))

In [None]:
# The data catalog is easier to view as a pandas DataFrame
dataCatalogDF = pd.DataFrame.from_dict(myDatamine.data_catalog).T
dataCatalogDF.head()

In [None]:
# We can see how many data products we can access
dataCatalogDF.dataset.value_counts()
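
The same count can be read straight off the catalog dict without pandas. This is a minimal sketch, assuming the catalog is a dict of dicts keyed by item ID in which each entry carries a `dataset` field, as the DataFrame conversion above suggests. The sample entries and their IDs are invented for illustration.

```python
from collections import Counter

# Invented sample in the assumed shape: {item_id: {"dataset": ..., ...}}
sample_catalog = {
    "item-001": {"dataset": "CRYPTOCURRENCY"},
    "item-002": {"dataset": "CRYPTOCURRENCY"},
    "item-003": {"dataset": "TICK"},
}


def count_by_dataset(catalog):
    """Count catalog items per dataset name."""
    return Counter(entry["dataset"] for entry in catalog.values())


counts = count_by_dataset(sample_catalog)
print(counts.most_common())  # [('CRYPTOCURRENCY', 2), ('TICK', 1)]
```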

## Using The Data 

The following examples show how the datamine Python module can be used to copy data from the cloud to your local computer.

This routine downloads the data to the /data/* folder for each specific data set. It copies everything down and overwrites anything stored locally, so it will take time depending on the amount of cloud data that you're trying to pull down. The pulls are multithreaded to speed them up; you can tune this by adjusting the processes attribute of the myDatamine object.

```myDatamine.processes = 4```
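
To illustrate what a worker count like this controls, here is a generic sketch of a pooled download loop built on the standard library. It is not the datamine package's implementation: `download_all` and `fake_fetch` are hypothetical names, and the stand-in fetch function only simulates a download.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(items, fetch_one, processes=4):
    """Run fetch_one over items with at most `processes` concurrent workers.

    Results come back in the same order as `items`.
    """
    with ThreadPoolExecutor(max_workers=processes) as pool:
        return list(pool.map(fetch_one, items))


def fake_fetch(name):
    # Stand-in for a real download; returns the path it would have written.
    return f"./data/{name}"


paths = download_all(["a.gz", "b.gz", "c.gz"], fake_fetch, processes=2)
print(paths)  # ['./data/a.gz', './data/b.gz', './data/c.gz']
```

A higher worker count generally helps while the network is the bottleneck; beyond that point extra workers tend to add little.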

### Bitcoin & Crypto Currencies

In [None]:
# Load the bitcoin data from the Datamine cloud into the DataFrame myDatamine.crypto_DF.
# Will return 0 if successful
myDatamine.crypto_load(False)

In [None]:
#look at the data frame
myDatamine.crypto_DF.head()

In [None]:
#There are many values in the file; 'BRTI' is published every second and 'BRR' is published daily
myDatamine.crypto_DF.symbol.value_counts()

### Bitcoin Index Rate

In [None]:
myDatamine.crypto_DF.loc[myDatamine.crypto_DF['symbol'] == 'BRTI', 'mdEntryPx'].plot(figsize=[15,5])
plt.title('Historical Bitcoin Intraday Reference Rate')
plt.xlabel('Date')
plt.ylabel('USD/BTC')
plt.style.use('fivethirtyeight')
plt.show()

### Bitcoin End of Day Value

In [None]:
myDatamine.crypto_DF.loc[myDatamine.crypto_DF['symbol'] =='BRR','mdEntryPx'].plot(figsize=[15,5])
plt.title('Historical Bitcoin Daily Value')
plt.xlabel('Date')
plt.ylabel('USD/BTC')
plt.style.use('fivethirtyeight')
plt.show()

## Tick Data / Time and Sales

Tick data is also known as Time and Sales. Each record represents the time at which a specific product was traded between two parties at a given price. The following cells download the data locally and load it into a pandas DataFrame for analysis.
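
As a rough picture of what such records contain, the sketch below models a few trades and totals the traded quantity per symbol. The `ticker_symbol` and `trade_quantity` names match the columns used later in this workbook; the `trade_time` and `price` fields and all sample values are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Trade:
    trade_time: str       # hypothetical field name
    ticker_symbol: str
    price: float          # hypothetical field name
    trade_quantity: int


# Invented sample records, not real market data
trades = [
    Trade("2019-01-02T09:30:00", "CL", 46.10, 3),
    Trade("2019-01-02T09:30:01", "CL", 46.11, 1),
    Trade("2019-01-02T09:30:01", "ES", 2500.25, 5),
]


def quantity_by_symbol(records):
    """Sum trade_quantity for each ticker_symbol."""
    totals = defaultdict(int)
    for trade in records:
        totals[trade.ticker_symbol] += trade.trade_quantity
    return dict(totals)


print(quantity_by_symbol(trades))  # {'CL': 4, 'ES': 5}
```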

In [None]:
#update my catalog with Tick Data
myDatamine.get_catalog(dataset='TICK', limit=1000, refresh=True)

In [None]:
#download and load my data
#Tick Data can be a lot of files and can take time to load into a Pandas Dataframe...
myDatamine.time_sales_load(False)

In [None]:
ts = myDatamine.time_sales_DF
ts.head()

In [None]:
#Review the Symbols we have loaded.
ts.ticker_symbol.value_counts()

In [None]:
#Plot Histogram of lot size of a given trade for Crude Oil
ts.loc[ts.ticker_symbol =='CL','trade_quantity'].hist(bins=20).set_yscale('log')

## Orbital Insights



In [None]:
#Get some Orbital Insights Data
myDatamine.get_catalog(dataset='ORBITALINSIGHT', limit=1000, refresh=True)


In [None]:
myDatamine.orbital_insights_load(False)

In [None]:
myDatamine.orbital_insights_DF.head()

In [None]:
myDatamine.orbital_insights_DF.loc[myDatamine.orbital_insights_DF['location'] 
                                   == 'USA_permian','storage_capacity_estimate'
                                  ].dropna().plot(figsize=[15,5])
plt.title('Historical Permian Basin Estimated Storage Capacity')
plt.xlabel('Date')
plt.ylabel('Barrels (M)')
plt.style.use('fivethirtyeight')
plt.show()

## Tellus Labs

In [None]:
myDatamine.get_catalog(dataset='TELLUSLABS', limit=5000, refresh=True)
myDatamine.tellus_labs_load(False)

In [None]:
myDatamine.tellus_labs_DF.head()

In [None]:
#There are two measures for Tellus: SATTELNDVI and TELLUSCHIN
myDatamine.tellus_labs_DF.measure.value_counts()

In [None]:
myDatamine.tellus_labs_DF.loc[(myDatamine.tellus_labs_DF['geo_display_name']  == 'IOWA')
                              & (myDatamine.tellus_labs_DF['measure']  == 'TELLUSCHIN')
                              & (myDatamine.tellus_labs_DF['crop']=='corn')
                              & (myDatamine.tellus_labs_DF['geo_level'] == 'level_2')
                              ,'value'
                                  ].dropna().plot(figsize=[15,5])
plt.title('Tellus Historical Iowa Corn TELLUSCHIN Measure')
plt.xlabel('Date')
plt.ylabel('TELLUSCHIN')
plt.style.use('fivethirtyeight')
plt.show()