<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Preparation" data-toc-modified-id="Preparation-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Preparation</a></span><ul class="toc-item"><li><span><a href="#Parameters-(user-input)" data-toc-modified-id="Parameters-(user-input)-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Parameters (user input)</a></span></li><li><span><a href="#Function" data-toc-modified-id="Function-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Function</a></span></li></ul></li><li><span><a href="#School-locations" data-toc-modified-id="School-locations-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>School locations</a></span></li><li><span><a href="#Crimes" data-toc-modified-id="Crimes-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Crimes</a></span></li><li><span><a href="#Safe-Passage-routes" data-toc-modified-id="Safe-Passage-routes-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Safe Passage routes</a></span></li><li><span><a href="#Census-block-boundaries" data-toc-modified-id="Census-block-boundaries-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Census block boundaries</a></span></li><li><span><a href="#Implementation-of-Safe-Passage-program" data-toc-modified-id="Implementation-of-Safe-Passage-program-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Implementation of Safe Passage program</a></span></li></ul></div>

**Description**: This notebook can be used to download all data files used in this project that are available on the Data Portal of the City of Chicago. All of these datasets except the crime data are already provided in the `data/raw` folder of the GitHub repository. The crime dataset is over 1.5 GB and therefore too large to host on GitHub. Since the records for the years used in this analysis (2009-2016) should not change much, downloading them from the original source should work.

By default, this notebook therefore only downloads the crime data. After running it, you will have all the data needed, in the correct folders, to continue with the other notebooks.

For a more detailed description of the data used in this project, see the section "Data" in the [Appendix](https://github.com/binste/chicago_safepassage_evaluation/tree/master/reports/appendix/Appendix.pdf).

---

In [1]:
import math
import zipfile
from pathlib import Path

import requests
from tqdm import tqdm_notebook

# Preparation

## Parameters (user input)

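If `re_download` is set to `True`, all data sources (except the one obtained via the FOIA request) are downloaded. Whether existing files are replaced is controlled by the `force_download` argument of `download_file`. If `re_download` is set to `False`, these files are not downloaded and no existing files are replaced.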

In [2]:
re_download = False

## Function

The following code cell contains the function used for downloading the individual files.

In [3]:
def download_file(url,
                  filename,
                  data_raw_path='../../data/raw',
                  sub_path='',
                  force_download=False):
    """Download a file from a URL and save it to the specified location.

    If the file does not already exist (or force_download is True), it is
    downloaded from the given URL and saved to the location built from
    data_raw_path, the optional sub_path, and the filename.

    Parameters
    ----------
    url : str
        URL from which the file should be downloaded.

    filename : str, format = "filename.fileending"
        Filename under which the file should be saved (including the file ending).

    data_raw_path : str, optional (default='../../data/raw')
        Path to the raw data folder.

    sub_path : str, optional (default='')
        By default, the file is saved directly in data_raw_path. If a
        subfolder should be used, specify its name with this argument.

    force_download : bool, optional (default=False)
        If True, downloads the file even if it already exists and replaces it.

    Returns
    -------
    bool
        True if the file was downloaded (can be used for follow-up actions
        such as unzipping), False if it already existed and was skipped.

    Example
    -------
    >>> download_file('https://test.com/school_location_1415.csv', 'sch_location_1415.csv',
    ...               sub_path='school_location')

    Notes
    -----
    This function heavily leans on the following two Stack Overflow threads:
    * https://stackoverflow.com/questions/45978295/saving-a-downloaded-csv-file-using-python
    * https://stackoverflow.com/questions/37573483/progress-bar-while-download-file-over-http-with-requests/37573701
    """
    # Prepare the full path to the file
    file_path = Path(data_raw_path) / sub_path / filename
    # Skip the download if the file already exists and no re-download is forced
    if file_path.is_file() and not force_download:
        print(
            'File already exists. Nothing was downloaded. Set force_download=True',
            'to download the file anyway and replace the existing one.')
        return False

    print(f'Downloading {filename}')
    # Make sure the target folder exists
    file_path.parent.mkdir(parents=True, exist_ok=True)
    with requests.get(url, stream=True) as r:
        # Check if the response is OK (200)
        if r.status_code != 200:
            raise Exception(f'Could not download file. Request status code: {r.status_code}')
        # Get the total size in bytes. If the server compresses the response,
        # this is the compressed size and the file on disk will be larger,
        # so the completeness check below is skipped in that case.
        total_size = int(r.headers.get('content-length', 0))
        compressed = r.headers.get('content-encoding', 'identity') != 'identity'
        block_size = 1024
        wrote = 0
        # Open the file and write the content block by block
        with open(file_path, 'wb') as f:
            for block in tqdm_notebook(
                    r.iter_content(block_size),
                    total=math.ceil(total_size / block_size),
                    unit='KB'):
                wrote += len(block)
                f.write(block)
        if total_size != 0 and not compressed and wrote != total_size:
            raise Exception(f'Could only download {wrote}/{total_size} bytes')
    return True
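
The size check at the end of `download_file` compares the number of bytes written with the `Content-Length` header, and the progress bar total is derived from the same header. The sketch below isolates both calculations in two hypothetical helpers (`expected_blocks` and `is_download_complete`, not part of the original notebook) so they can be checked without a network connection. It assumes, as the code comment warns, that `Content-Length` counts the compressed body whenever the server sets a `Content-Encoding` other than `identity`, so the bytes on disk cannot be compared with it in that case.

```python
import math


def expected_blocks(total_size: int, block_size: int = 1024) -> int:
    """Number of block_size chunks needed to hold total_size bytes."""
    # True division before ceil: a trailing partial block still counts.
    return math.ceil(total_size / block_size)


def is_download_complete(wrote: int, total_size: int, content_encoding: str = '') -> bool:
    """Hypothetical helper: decide whether a download finished completely.

    wrote: number of (decoded) bytes written to disk.
    total_size: value of the Content-Length header (0 if missing).
    content_encoding: value of the Content-Encoding header ('' if missing).
    """
    if total_size == 0:
        # No Content-Length header: nothing to compare against.
        return True
    if content_encoding and content_encoding != 'identity':
        # Content-Length counts compressed bytes, but requests decompresses
        # the body, so the byte count on disk can legitimately differ.
        return True
    return wrote == total_size


print(expected_blocks(2500))                      # 3 (1024 + 1024 + 452 bytes)
print(math.ceil(2500 // 1024))                    # 2: floor division drops the partial block
print(is_download_complete(2500, 2500))           # True
print(is_download_complete(1000, 2500))           # False
print(is_download_complete(9000, 2500, 'gzip'))   # True
```

The second line shows why the progress bar total should use `/` rather than `//` before rounding up.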

# School locations
| Name of file | Description | Date of download | Source | Application process | Costs |
| --- | --- | --- | --- | --- | --- |
| See code below | Contains the location and additional information of all Chicago Public Schools for a given school year. | 2018-07-12 | See code below | None, can be downloaded without registration. | None |

A list of all available school years can be found on the [official data portal of the City of Chicago](https://data.cityofchicago.org/browse?q=school+locations&sortBy=relevance).
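
All download URLs in this notebook follow one of two patterns, differing only in the dataset identifier (e.g. `mntu-576c`): CSV exports under `/api/views/<id>/rows.csv` and shapefile exports under `/api/geospatial/<id>`. If you want to add another school year from the portal, a small sketch like the one below could build the URL from the identifier. The helper names `csv_export_url` and `shapefile_export_url` are hypothetical, and the patterns are only inferred from the URLs used here, not from official portal documentation.

```python
from urllib.parse import urlencode

# Base URL of the City of Chicago Data Portal, as used throughout this notebook
PORTAL = 'https://data.cityofchicago.org'


def csv_export_url(dataset_id: str, portal: str = PORTAL) -> str:
    """Build a CSV export URL following the pattern of the school location URLs."""
    query = urlencode({'accessType': 'DOWNLOAD'})
    return f'{portal}/api/views/{dataset_id}/rows.csv?{query}'


def shapefile_export_url(dataset_id: str, portal: str = PORTAL) -> str:
    """Build a shapefile export URL following the pattern of the routes and census URLs."""
    query = urlencode({'method': 'export', 'format': 'Shapefile'})
    return f'{portal}/api/geospatial/{dataset_id}?{query}'


print(csv_export_url('mntu-576c'))
print(shapefile_export_url('mfzt-js4n'))
```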

In [4]:
school_loc_sources = {
    # SY1314
    'Units2013_14.csv':
    'https://data.cityofchicago.org/api/views/dgq3-i7xm/rows.csv?accessType=DOWNLOAD',
    # SY1415
    'CPS_School_Locations_SY1415.csv':
    'https://data.cityofchicago.org/api/views/mntu-576c/rows.csv?accessType=DOWNLOAD',
    # SY1516
    'CPS_School_Locations_SY1516.csv':
    'https://data.cityofchicago.org/api/views/mb74-gx3g/rows.csv?accessType=DOWNLOAD',
}

In [5]:
if re_download:
    for f_name, f_url in school_loc_sources.items():
        download_file(f_url, f_name, sub_path='school_locations')

File already exists. Nothing was downloaded. Set force_download=True to download the file anyway and replace the existing one.
File already exists. Nothing was downloaded. Set force_download=True to download the file anyway and replace the existing one.
File already exists. Nothing was downloaded. Set force_download=True to download the file anyway and replace the existing one.


# Crimes

| Name of file | Description | Date of download | Source | Application process | Costs |
| --- | --- | --- | --- | --- | --- |
| See code below | *"This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days."* - [Chicago Data Portal](https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2) | 2018-01-25 | See code below | None, can be downloaded without registration. | None |

In [6]:
download_file('https://data.cityofchicago.org/api/views/ijzp-q8t2/rows.csv?accessType=DOWNLOAD',
             'Crimes_-_2001_to_present.csv')

File already exists. Nothing was downloaded. Set force_download=True to download the file anyway and replace the existing one.


False
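
The full crime export is over 1.5 GB, while the analysis only uses the years 2009-2016. If disk space or memory is a concern, one option (not part of the original workflow) is to stream the CSV row by row and keep only the relevant years. The sketch below assumes the export has a `Year` column; the function name `filter_by_year` is hypothetical. A small in-memory sample stands in for the real file.

```python
import csv
import io
from typing import Iterator, TextIO


def filter_by_year(source: TextIO, first_year: int = 2009, last_year: int = 2016,
                   year_column: str = 'Year') -> Iterator[dict]:
    """Yield only the rows whose year lies in [first_year, last_year].

    Reads the CSV lazily, so the full file never has to fit in memory.
    The name of year_column is an assumption about the export's header.
    """
    for row in csv.DictReader(source):
        year = row.get(year_column) or ''
        if year.isdigit() and first_year <= int(year) <= last_year:
            yield row


# Tiny stand-in for the crime CSV
sample = io.StringIO('ID,Year\n1,2008\n2,2009\n3,2016\n4,2017\n')
kept_ids = [row['ID'] for row in filter_by_year(sample)]
print(kept_ids)  # ['2', '3']
```

For the real file, open it with `open(path, newline='')` and pass the file object to `filter_by_year`, writing the yielded rows with `csv.DictWriter`.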

# Safe Passage routes
| Name of file | Description | Date of download | Source | Application process | Costs |
| --- | --- | --- | --- | --- | --- |
| See code below | Contains the location and additional information on Safe Passage routes for a given school year. | 2018-01-25 | See code below | None, can be downloaded without registration. | None |


The unzipped shapefiles for the Safe Passage routes have a unique name for each download request. Therefore, you might need to adjust the file names in the notebook `../1_prepare_data/1.0-binste-routes.ipynb` if you re-download the files below.
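
Instead of hard-coding the generated shapefile names, a notebook could look up the `.shp` file inside each extracted folder by its extension. The sketch below shows one way to do this; `find_shapefile` is a hypothetical helper, and the file name `geo_export_1234` in the demo is made up for illustration.

```python
import tempfile
from pathlib import Path


def find_shapefile(folder) -> Path:
    """Return the single .shp file inside folder (searched recursively).

    The shapefile name differs per download request, so it is located by
    its extension instead of being hard-coded.
    """
    matches = sorted(Path(folder).rglob('*.shp'))
    if len(matches) != 1:
        raise FileNotFoundError(
            f'Expected exactly one .shp file in {folder}, found {len(matches)}')
    return matches[0]


# Demo: simulate an extracted export with a generated file name
with tempfile.TemporaryDirectory() as tmp:
    for suffix in ('.shp', '.shx', '.dbf', '.prj'):
        (Path(tmp) / f'geo_export_1234{suffix}').touch()
    found = find_shapefile(tmp).name
print(found)  # geo_export_1234.shp
```

Raising an error when zero or several `.shp` files are found makes an unexpected folder layout fail loudly rather than silently picking the wrong file.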

In [7]:
routes_sources = {
    'Chicago Public Schools - Safe Passage Routes SY1314.zip':
    'https://data.cityofchicago.org/api/geospatial/b4yy-ytgy?method=export&format=Shapefile',
    'Chicago Public Schools - Safe Passage Routes SY1415.zip':
    'https://data.cityofchicago.org/api/geospatial/4s9i-vyw7?method=export&format=Shapefile',
    'Chicago Public Schools - Safe Passage Routes SY1516.zip':
    'https://data.cityofchicago.org/api/geospatial/adhw-m4zi?method=export&format=Shapefile',
}

In [8]:
if re_download:
    for f_name, f_url in routes_sources.items():
        success_flag = download_file(f_url, f_name, sub_path='routes')
        if success_flag:
            zip_path = Path('../../data/raw/routes/')
            with zipfile.ZipFile(zip_path / f_name, 'r') as zip_ref:
                zip_ref.extractall(zip_path / Path(f_name).stem)

File already exists. Nothing was downloaded. Set force_download=True to download the file anyway and replace the existing one.
File already exists. Nothing was downloaded. Set force_download=True to download the file anyway and replace the existing one.
File already exists. Nothing was downloaded. Set force_download=True to download the file anyway and replace the existing one.


# Census block boundaries
| Name of file | Description | Date of download | Source | Application process | Costs |
| --- | --- | --- | --- | --- | --- |
| See code below | Contains the 2010 Census block boundaries. | 2018-04-09 | See code below | None, can be downloaded without registration. | None |

In [9]:
if re_download:
    f_name = 'Boundaries - Census Blocks - 2010.zip'
    success_flag = download_file(
        'https://data.cityofchicago.org/api/geospatial/mfzt-js4n?method=export&format=Shapefile',
        f_name)
    if success_flag:
        zip_path = Path('../../data/raw/')
        with zipfile.ZipFile(zip_path / f_name, 'r') as zip_ref:
            zip_ref.extractall(zip_path / Path(f_name).stem)
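
The routes cell and the census block cell both extract a downloaded archive into a sibling folder named after the archive (its `stem`). That shared step could be factored into a helper like the one below. `extract_to_stem_folder` is a hypothetical name, and the demo uses a small placeholder archive instead of a real download.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_to_stem_folder(zip_file) -> Path:
    """Extract zip_file into a sibling folder named after the archive.

    For example, 'raw/Boundaries - Census Blocks - 2010.zip' is extracted
    into 'raw/Boundaries - Census Blocks - 2010/'.
    """
    zip_file = Path(zip_file)
    target = zip_file.parent / zip_file.stem
    with zipfile.ZipFile(zip_file, 'r') as zip_ref:
        zip_ref.extractall(target)
    return target


# Demo with a placeholder archive containing one empty file
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / 'Boundaries - Census Blocks - 2010.zip'
    with zipfile.ZipFile(archive, 'w') as zf:
        zf.writestr('blocks.shp', b'')
    extracted = extract_to_stem_folder(archive)
    extracted_name = extracted.name
    extracted_files = sorted(p.name for p in extracted.iterdir())
print(extracted_name, extracted_files)  # Boundaries - Census Blocks - 2010 ['blocks.shp']
```

With such a helper, both cells could call `extract_to_stem_folder(zip_path / f_name)` once `download_file` returns `True`.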

File already exists. Nothing was downloaded. Set force_download=True to download the file anyway and replace the existing one.


# Implementation of Safe Passage program
| Name of file | Description | Date of download | Source | Application process | Costs |
| --- | --- | --- | --- | --- | --- |
| `Safe_Passage_Schools_By_Implementation_Year_8.12.16.xlsx` | Contains the school year in which the Safe Passage program was implemented at each school. | 2018-05-01 | Chicago Public Schools | Obtained through a Freedom of Information Act (FOIA) request to Chicago Public Schools (CPS). The initial request can be found on the [official CPS FOIA website](https://cps.edu/About_CPS/Departments/Law/Pages/FOIARequest.aspx) under "Archive" by searching for my name, Stefan Binder. | None |
