# Download Recent Data from Open Data NYC

The following Jupyter notebook downloads data from [Open Data NYC](https://opendata.cityofnewyork.us/).

The notebook downloads two data sets, both of which are updated daily:

(a) [NYC Open Data Rat Sightings](https://data.cityofnewyork.us/Social-Services/Rat-Sightings/3q43-55fe/about_data)

(b) [NYC Open Data on Rat Inspections](https://data.cityofnewyork.us/Health/Rodent-Inspection/p937-wjvj/about_data)

The Rat Sightings data is saved to the folder "rat_sightings_data" as the CSV file "Rat_Sightings_NYC.csv".

The Rat Inspections data is saved to the folder "split_up_rat_inspection_data" and is split into .csv files of at most 15 MB each, named "rodent_inspection_*.csv", where * is a whole number starting at 1.

The final cell also downloads catch basin data, which is saved to the folder "storm_catch_basin_data" as "NYC_catch_basin_citywide.csv".
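Because the inspection data ends up in several numbered files, later analysis has to read them back in order. The following is a minimal sketch of one way to do that with the standard library. The function name `iter_split_rows` and its `prefix` parameter are hypothetical and not part of this notebook. The sketch assumes every part begins with the same header row and that no record was cut across two files.

```python
import csv
import re
from pathlib import Path


def iter_split_rows(folder, prefix="rodent_inspection_"):
    """Yield each data row (as a dict) from '<prefix><n>.csv' files in folder."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)\.csv")

    # Collect (part number, path) pairs and skip unrelated files.
    numbered = []
    for path in Path(folder).iterdir():
        match = pattern.fullmatch(path.name)
        if match:
            numbered.append((int(match.group(1)), path))

    # Sort by the numeric suffix so that part 10 comes after part 2.
    numbered.sort(key=lambda pair: pair[0])

    for _, path in numbered:
        with open(path, newline="", encoding="utf-8") as f:
            # DictReader consumes the header row of each part.
            yield from csv.DictReader(f)
```

Sorting on the numeric suffix matters because a plain alphabetical sort would place "rodent_inspection_10.csv" before "rodent_inspection_2.csv". The generator yields one row at a time, so the full data set never has to be held in memory at once.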

In [1]:
import requests 
import os
import shutil
from pathlib import Path


In [2]:
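# Download the rat sightings data to the "rat_sightings_data" folder.

url = "https://data.cityofnewyork.us/api/views/3q43-55fe/rows.csv?accessType=DOWNLOAD"

response = requests.get(url)
response.raise_for_status()

# Create the output folder if it does not exist yet; open() would fail otherwise.
sightings_dir = Path("rat_sightings_data")
sightings_dir.mkdir(parents=True, exist_ok=True)

with open(sightings_dir / "Rat_Sightings_NYC.csv", "wb") as f:
    f.write(response.content)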

In [3]:
# Download the rat inspection data to the "split_up_rat_inspection_data" folder.
# The data is split into parts of at most 15 MB so that no single file is too large.

URL = "https://data.cityofnewyork.us/api/views/p937-wjvj/rows.csv?accessType=DOWNLOAD"
OUT_DIR = "split_up_rat_inspection_data"
MAX_BYTES = 15 * 1024 * 1024  # 15 MB

# Start from an empty folder so that parts from an earlier run do not linger.
out_dir = Path(OUT_DIR)
if out_dir.exists():
    shutil.rmtree(out_dir)
out_dir.mkdir(parents=True, exist_ok=True)

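with requests.get(URL, stream=True) as r:
    r.raise_for_status()
    # Stream the response line by line instead of loading it all into memory.
    # Note: a quoted CSV field that contains a line break spans several lines
    # here, so such a record could be cut across two parts.
    lines = r.iter_lines(decode_unicode=True)

    # Every part starts with the header row.
    header = next(lines) + "\n"
    header_bytes = len(header.encode())

    part = 1
    f = open(f"{OUT_DIR}/rodent_inspection_{part}.csv", "w", encoding="utf-8")
    f.write(header)
    size = header_bytes

    for line in lines:
        line = line + "\n"
        line_bytes = len(line.encode())

        # Start a new part when this line would push the file past the limit.
        if size + line_bytes > MAX_BYTES:
            f.close()
            part += 1
            f = open(f"{OUT_DIR}/rodent_inspection_{part}.csv", "w", encoding="utf-8")
            f.write(header)
            size = header_bytes

        f.write(line)
        size += line_bytes

    f.close()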
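After the split, it can be worth confirming that each part respects the size cap and starts with the same header row. The following is a rough sketch of such a check. The helper name `check_parts` and its parameters are hypothetical and not part of this notebook. It only inspects file sizes and first lines and does not validate the CSV contents.

```python
from pathlib import Path

MAX_BYTES = 15 * 1024 * 1024  # same cap as the download cell


def check_parts(folder, max_bytes=MAX_BYTES, pattern="rodent_inspection_*.csv"):
    """Return a list of problems found; an empty list means every part passed."""
    paths = sorted(Path(folder).glob(pattern))
    if not paths:
        return [f"no files matching {pattern} in {folder}"]

    # Use the first line of the first file as the reference header.
    with open(paths[0], "rb") as f:
        expected_header = f.readline()

    problems = []
    for path in paths:
        size = path.stat().st_size
        if size > max_bytes:
            problems.append(f"{path.name}: {size} bytes exceeds {max_bytes}")
        with open(path, "rb") as f:
            if f.readline() != expected_header:
                problems.append(f"{path.name}: header differs from {paths[0].name}")
    return problems
```

Calling `check_parts("split_up_rat_inspection_data")` after the download cell should return an empty list if the split behaved as intended. On platforms where text mode translates line endings on write, the files on disk can come out slightly larger than the byte count tracked during the download, which this check would report.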

In [4]:
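# Download the catch basin data to the "storm_catch_basin_data" folder.

url = "https://data.cityofnewyork.us/api/views/2w2g-fk3i/rows.csv?accessType=DOWNLOAD"

response = requests.get(url)
response.raise_for_status()

# Create the output folder if it does not exist yet; open() would fail otherwise.
basin_dir = Path("storm_catch_basin_data")
basin_dir.mkdir(parents=True, exist_ok=True)

with open(basin_dir / "NYC_catch_basin_citywide.csv", "wb") as f:
    f.write(response.content)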