# 2. Downloading Gaia DR1 data from Gaia's site

Gaia DR1 data is available at  
`http://1016243957.rsc.cdn77.org/Gaia/gaia_source/`.  
The files are provided in three formats: CSV, FITS, and VOTable.
In the following, we use the data files in CSV format.

There are 5,231 CSV files in total!
Unfortunately, the FTP site does not seem to be available at present.
Do we have to download them one by one, clicking each link on the above site by hand?
Fortunately, we can do it automatically with Python.

The modules used in this section are `urllib`, `sys`, and `os`.
Enter the following code in the cell below and run the cell to import them.

```
import urllib.request
import urllib.error  # download_file() below catches urllib.error.HTTPError
import sys
import os.path
```

Let's save the downloaded Gaia DR1 files in the local folder `./orig_data`.

Before downloading them, make sure that the folder exists.
If it doesn't exist, create it.
To do this, we can use the functions `os.path.isdir()` and `os.mkdir()`.
The function `os.path.isdir(dir)` checks whether the directory '`dir`' exists.
(Here, we use the word 'directory' with the same meaning as 'folder'.)
The function `os.mkdir(dir)` creates the directory '`dir`'.

The following code first checks whether the directory '`dest_dir`' already exists,
and creates it if it is absent. Enter the following code in the cell below and run it.
```
dest_dir = './orig_data'
if not os.path.isdir(dest_dir):
    os.mkdir(dest_dir)
```
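
As an aside, the same check-and-create step can also be written as a single call. The following is a minimal sketch, assuming Python 3.2 or later, where `os.makedirs()` accepts `exist_ok=True`. The helper name `ensure_dir` and the demo folder are hypothetical and are not used elsewhere in this tutorial.

```python
import os
import tempfile

def ensure_dir(path):
    """Create `path` (and any missing parent folders) if it does not exist yet."""
    os.makedirs(path, exist_ok=True)
    return path

# Demonstration in a temporary location, so ./orig_data is not touched.
demo_dir = os.path.join(tempfile.mkdtemp(), 'orig_data_demo')
ensure_dir(demo_dir)
ensure_dir(demo_dir)  # calling it a second time is harmless
print(os.path.isdir(demo_dir))
```

Unlike `os.mkdir()`, `os.makedirs()` also creates missing parent folders, which may or may not be what you want.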

Here is the main part of this section.
To download the file at the URL '`url`' and save it locally under
the file name '`fn`',
we can use the function `urllib.request.urlretrieve(url, fn)`, as already seen in the previous section.
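
As a quick reminder of how `urlretrieve` behaves, here is a small sketch that can be run without network access. It assumes a `file://` URL pointing at a temporary file as a stand-in for a real web address; the file names `source.txt` and `copy.txt` are made up for the demonstration.

```python
import os
import pathlib
import tempfile
import urllib.request

# Create a small local file that plays the role of the "remote" resource.
work_dir = tempfile.mkdtemp()
src = os.path.join(work_dir, 'source.txt')
with open(src, 'w') as f:
    f.write('hello Gaia\n')

# A file:// URL lets us try urlretrieve without touching the network.
url = pathlib.Path(src).as_uri()
fn = os.path.join(work_dir, 'copy.txt')

# urlretrieve returns the local file name and the response headers.
local_name, headers = urllib.request.urlretrieve(url, fn)
print(local_name)
with open(fn) as f:
    print(f.read(), end='')
```

With an `http://` URL the call looks exactly the same; only the value of `url` changes.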

Here we download the Gaia DR1 files in CSV format.
The CSV files are actually compressed with gzip,
and their file names follow the format  
`GaiaSource_000-XXX-YYY.csv.gz`,  
where XXX is the first index, ranging from 0 to 20,
and YYY is the second index, ranging from 0 to 255 (both zero-padded to three digits).  
The gzipped CSV files are located at `http://1016243957.rsc.cdn77.org/Gaia/gaia_source/csv/`.
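
To make the naming scheme concrete, the short sketch below builds a file name from the two indices with `str.format()`, where `03d` pads each index with zeros to three digits. The helper name `gaia_csv_name` is hypothetical and only serves this illustration.

```python
def gaia_csv_name(i1, i2):
    """Return the gzipped CSV file name for first index i1 and second index i2."""
    return 'GaiaSource_000-{0:03d}-{1:03d}.csv.gz'.format(i1, i2)

print(gaia_csv_name(0, 0))   # GaiaSource_000-000-000.csv.gz
print(gaia_csv_name(3, 42))  # GaiaSource_000-003-042.csv.gz
```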

Let's define a function that downloads the data file with first index `i1` and second index `i2`.
Enter the following code in the cell below and run it.
```
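def download_file(i1, i2):
    url = 'http://1016243957.rsc.cdn77.org/Gaia/gaia_source/csv/'
    fn = 'GaiaSource_000-{0:03d}-{1:03d}.csv.gz'.format(i1, i2)
    print(fn, end=' ')
    fno = "{0}/{1}".format(dest_dir, fn)
    # Skip files that are already present in the destination folder.
    if os.path.exists(fno):
        print('Skipping...')
        return
    print('Downloading...', end='')

    # Retry until the download succeeds or the server reports an HTTP error.
    while True:
        try:
            urllib.request.urlretrieve(url+fn, fno)
            print('Completed')
            return
        except urllib.error.HTTPError:
            # The server answered with an error status (e.g. no such file).
            print('File not found')
            return
        except Exception:
            # Any other failure (e.g. a dropped connection): try again.
            # Unlike a bare `except:`, this still lets you interrupt the loop.
            pass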
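
The loop above keeps retrying for as long as non-HTTP errors occur. If you would rather give up after a few attempts, a bounded retry loop is one option. The following is only a sketch and not part of the workshop code: the function name `fetch_with_retries` and its parameters `max_tries`, `wait`, and `retrieve` are hypothetical, and `retrieve` exists only so that the demo can run without network access. It also assumes that network-level failures surface as `OSError` (which `urllib.error.URLError` derives from); other exception types are not handled here.

```python
import os
import tempfile
import time
import urllib.error
import urllib.request

def fetch_with_retries(url, dest, max_tries=3, wait=1.0,
                       retrieve=urllib.request.urlretrieve):
    """Try to download `url` to `dest`; return True on success, False otherwise."""
    for attempt in range(1, max_tries + 1):
        try:
            retrieve(url, dest)
            return True
        except urllib.error.HTTPError:
            # The server answered with an error status; retrying will not help.
            return False
        except OSError:
            # Remove a possibly incomplete file so that it is not mistaken
            # for a finished download later.
            if os.path.exists(dest):
                os.remove(dest)
            if attempt < max_tries:
                time.sleep(wait)
    return False

# Demo without network: a stand-in retriever that fails twice, then succeeds.
attempts = []

def flaky_retrieve(url, dest):
    attempts.append(url)
    if len(attempts) < 3:
        raise OSError('simulated network failure')
    with open(dest, 'wb') as f:
        f.write(b'data')

demo_dest = os.path.join(tempfile.mkdtemp(), 'demo.csv.gz')
ok = fetch_with_retries('http://example.invalid/demo.csv.gz', demo_dest,
                        wait=0, retrieve=flaky_retrieve)
print(ok, len(attempts))
```

Deleting the partial file matters here because `download_file` skips any file that already exists, so a truncated file left behind would never be downloaded again.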

As a test, let's use this function to download the file with `i1=0` and `i2=0`.
Enter the following code in the cell below and run it. It may take a few minutes, depending on the network conditions.
```
download_file(0,0)
```

Has the file been downloaded successfully?
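
One way to check is to open the gzipped file and read its first line. The sketch below assumes the first line of each CSV file is a header row. The helper name `peek_gzip_header` is hypothetical, and the demo uses a small stand-in file with made-up columns so that it runs without the real data.

```python
import gzip
import os
import tempfile

def peek_gzip_header(path):
    """Return the first line of a gzip-compressed text file, without the newline."""
    with gzip.open(path, 'rt') as f:
        return f.readline().rstrip('\n')

# Stand-in file with made-up columns (NOT the real Gaia column names).
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.csv.gz')
with gzip.open(demo_path, 'wt') as f:
    f.write('col_a,col_b\n1,2\n')

print(os.path.getsize(demo_path) > 0)
print(peek_gzip_header(demo_path))
```

For the real download you would call `peek_gzip_header('./orig_data/GaiaSource_000-000-000.csv.gz')`. Note that reading only the first line does not prove that the whole file arrived intact.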

Finally, download five data files for the next section.
Enter the following code in the cell below and run it.
Here, we choose `i1=0`.
(You can choose another value of `i1` afterward.)
```
i1 = 0
for i2 in range(5):
    download_file(i1,i2)
```
It will take a few minutes to download all five files.

Check that the files have been downloaded successfully to your destination folder.
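
If you prefer to check from Python rather than from a file browser, a short listing of names and sizes is one possibility. This is a sketch only: the helper name `list_downloads` is hypothetical, and the demo fills a temporary folder with tiny placeholder files instead of real Gaia data.

```python
import glob
import os
import tempfile

def list_downloads(folder):
    """Return (file name, size in bytes) for each GaiaSource_*.csv.gz in `folder`."""
    pattern = os.path.join(folder, 'GaiaSource_*.csv.gz')
    return [(os.path.basename(p), os.path.getsize(p))
            for p in sorted(glob.glob(pattern))]

# Demo with placeholder files of 1, 2, and 3 bytes.
demo_folder = tempfile.mkdtemp()
for i2 in range(3):
    name = 'GaiaSource_000-000-{0:03d}.csv.gz'.format(i2)
    with open(os.path.join(demo_folder, name), 'wb') as f:
        f.write(b'x' * (i2 + 1))

for name, size in list_downloads(demo_folder):
    print(name, size)
```

For the workshop data you would call `list_downloads('./orig_data')`; a file with a size of zero or a size far smaller than the others is worth downloading again.
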
That's all for this section.

To download all the data files of Gaia DR1, the following code can be used:
```
def download_all():
    for i1 in range(21):
        for i2 in range(256):
            download_file(i1, i2)
```

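However, it will take a very long time to complete, because there are 5,231 files.
Note also that the two loops try 21 × 256 = 5,376 index combinations, more than the number of files, so some calls will end with 'File not found'.
Thus, we don't do it in this workshop.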