# Download files from remote data services

This notebook demonstrates a few ways to download files from NCI's THREDDS server.

* download a single file
    - requests.get
    - urllib
    - wget
    
* download multiple files
    - requests.get
    - urllib

This notebook is licensed under the [Creative Commons Attribution 4.0 International license](https://creativecommons.org/licenses/by/4.0/).

### Single file download

First, let's define a THREDDS endpoint URL:

In [None]:
url = 'https://thredds.nci.org.au/thredds/fileServer/iv65/Geoscience_Australia_Geophysics_Reference_Data_Collection/national_geophysical_compilations/Gravmap2016/Gravmap2016-grid-grv_ir.nc'

1. `requests.get`

In [None]:
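import os
import requests

def download_file(in_filename, out_filename):
    """Download in_filename to out_filename unless the file already exists."""
    if not os.path.exists(out_filename):
        print("Downloading", in_filename)
        response = requests.get(in_filename)
        # Fail loudly on HTTP errors instead of saving an error page to disk
        response.raise_for_status()
        with open(out_filename, 'wb') as f:
            f.write(response.content)

# Create the output directory if it does not already exist
outdir = './output'
os.makedirs(outdir, exist_ok=True)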

output_path = os.path.join(outdir, 'IR1.nc')

download_file(url, output_path)
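
The cell above hard-codes the output name `IR1.nc`. As a minimal sketch, the local file name could instead be derived from the URL itself. The helper `local_path_for` and the `example.org` URL are hypothetical and used only for illustration; they are not part of `requests` or of this notebook.

```python
import os
import posixpath
from urllib.parse import unquote, urlsplit


def local_path_for(url, outdir="./output"):
    """Return outdir joined with the last path component of url."""
    # urlsplit separates the path from any query string or fragment
    path = urlsplit(url).path
    name = unquote(posixpath.basename(path))
    if not name:
        raise ValueError(f"URL has no file name component: {url}")
    return os.path.join(outdir, name)


# Hypothetical URL, for illustration only
print(local_path_for("https://example.org/thredds/fileServer/some/dir/data.nc"))
```

The result can be passed as the second argument of `download_file`, so that each downloaded file keeps the name it has on the server.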

2. `urllib`

In [None]:
from urllib import request

output_path_urllib = os.path.join(outdir, 'IR2.nc')

request.urlretrieve(url, output_path_urllib)
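
The Python documentation lists `urlretrieve` under the legacy interface of `urllib.request`. A rough equivalent built on `urlopen` and `shutil.copyfileobj` is sketched below; the function name `download_streaming` and the 1 MiB default chunk size are assumptions made for this example.

```python
import shutil
from urllib import request


def download_streaming(url, out_path, chunk_size=1024 * 1024):
    """Copy the response body to out_path, chunk_size bytes at a time."""
    with request.urlopen(url) as response, open(out_path, "wb") as f:
        # copyfileobj reads and writes in chunks, so the whole file
        # is never held in memory at once
        shutil.copyfileobj(response, f, chunk_size)
    return out_path
```

With the variables defined earlier, a call would look like `download_streaming(url, output_path_urllib)`.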

3. `wget`

In [None]:
!wget $url -O ./output/IR3.nc
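
The `!wget` line relies on IPython's shell escape, so it only works inside a notebook or IPython session. In a plain Python script, one possible approach is to call `wget` through `subprocess`, as sketched here. The helper names `wget_command` and `run_wget` are hypothetical, and the sketch assumes `wget` is installed on the system.

```python
import shutil
import subprocess


def wget_command(url, out_path):
    """Build the argument list equivalent to `wget URL -O OUT_PATH`."""
    return ["wget", url, "-O", out_path]


def run_wget(url, out_path):
    """Run wget if it is available; raise CalledProcessError if it fails."""
    if shutil.which("wget") is None:
        raise RuntimeError("wget is not installed or not on PATH")
    # Passing a list (not a string) avoids shell quoting problems
    subprocess.run(wget_command(url, out_path), check=True)
```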

### Bulk download

1. `requests.get`

First, use `siphon` to list the names of all the files in the catalog.

In [None]:
from siphon.catalog import TDSCatalog
url='https://thredds.nci.org.au/thredds/catalog/fj9/http/Cobar_NSW/Cobar_3D_MGA55/Cobar_Local_Data/Gocad_objects_mga55/catalog.xml'
cat = TDSCatalog(url)
print("\n".join(cat.datasets.keys()))
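
The catalog lists file names only; to download a file you need its HTTP download URL. On a standard THREDDS layout, files listed under `/thredds/catalog/...` are typically served from the matching `/thredds/fileServer/...` path, as in the single-file URL earlier in this notebook. The sketch below builds that URL under this assumption. The helper `fileserver_url` is hypothetical, and the result is worth checking against the `access_urls` mapping that `siphon` attaches to each dataset.

```python
from urllib.parse import quote


def fileserver_url(catalog_url, filename):
    """Map a THREDDS catalog.xml URL and a file name to a fileServer URL."""
    marker = "/thredds/catalog/"
    if marker not in catalog_url:
        raise ValueError(f"not a THREDDS catalog URL: {catalog_url}")
    # Swap the service prefix, then drop the trailing catalog.xml
    base = catalog_url.replace(marker, "/thredds/fileServer/", 1)
    base = base.rsplit("/", 1)[0]
    # Percent-encode the file name in case it contains spaces
    return f"{base}/{quote(filename)}"
```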

In [None]:
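import requests

# Files are served from the fileServer path, not the catalog path
base_url = 'https://thredds.nci.org.au/thredds/fileServer/fj9/http/Cobar_NSW/Cobar_3D_MGA55/Cobar_Local_Data/Gocad_objects_mga55/'
for filename in cat.datasets.keys():
    if filename.endswith('.gp'):
        r = requests.get(base_url + filename, allow_redirects=True)
        r.raise_for_status()
        # Use a context manager so the file is closed after writing
        with open('./output/' + filename, 'wb') as f:
            f.write(r.content)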

Alternatively, you can use `thredds_crawler` to collect the endpoints of every dataset in the catalog.

In [None]:
from thredds_crawler.crawl import Crawl
url= 'https://thredds.nci.org.au/thredds/catalog/fj9/http/Cobar_NSW/Cobar_3D_MGA55/Cobar_Local_Data/Gocad_objects_mga55/catalog.xml'
c = Crawl(url)
c.datasets[:5]

In [None]:
urls_download = [s.get("url") for d in c.datasets for s in d.services if s.get("service").lower() == "httpserver"]
urls_download[:5]

In [None]:
import requests

for url in urls_download:
    if url.endswith('.gp'):
        r = requests.get(url, allow_redirects=True)
        # Stop on HTTP errors instead of saving an error page to disk
        r.raise_for_status()
        filename = url.split('/')[-1]
        with open('./output/' + filename, 'wb') as f:
            f.write(r.content)
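
The loop above downloads one file at a time. For catalogs with many small files, downloading a few files concurrently may be faster. The following is a minimal sketch using only the standard library; the names `fetch` and `fetch_all` and the default of four workers are assumptions for this example, and a polite worker count for NCI's server is not something this notebook specifies.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor
from urllib import request


def fetch(url, outdir):
    """Download one URL into outdir and return the local path."""
    out_path = os.path.join(outdir, url.rsplit("/", 1)[-1])
    with request.urlopen(url) as response, open(out_path, "wb") as f:
        shutil.copyfileobj(response, f)
    return out_path


def fetch_all(urls, outdir, max_workers=4):
    """Download urls concurrently; results keep the order of urls."""
    os.makedirs(outdir, exist_ok=True)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map re-raises the first download error when consumed
        return list(pool.map(lambda u: fetch(u, outdir), urls))
```

A call such as `fetch_all([u for u in urls_download if u.endswith('.gp')], './output')` would mirror the filtering done in the loop above.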

2. `urllib`

In [None]:
from urllib import request

# Files are served from the fileServer path, not the catalog path
base_url = 'https://thredds.nci.org.au/thredds/fileServer/fj9/http/Cobar_NSW/Cobar_3D_MGA55/Cobar_Local_Data/Gocad_objects_mga55/'
for filename in cat.datasets.keys():
    request.urlretrieve(base_url + filename, './output/' + filename)
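
In a bulk download, a single missing or unreachable file raises an exception and stops the whole loop. One possible way to keep going is to record failures and report them at the end, as in this sketch. The function name `download_all` and the shape of its return value are assumptions made for illustration.

```python
import os
from urllib import request


def download_all(urls, outdir):
    """Try every URL; return (saved_paths, failures)."""
    os.makedirs(outdir, exist_ok=True)
    saved, failed = [], []
    for url in urls:
        out_path = os.path.join(outdir, url.rsplit("/", 1)[-1])
        try:
            request.urlretrieve(url, out_path)
        except OSError as exc:
            # URLError and HTTPError are both subclasses of OSError
            failed.append((url, str(exc)))
        else:
            saved.append(out_path)
    return saved, failed
```

The `failed` list holds `(url, message)` pairs, so the failed URLs can be retried later or inspected by hand.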