## Getting the data

The PLUTO dataset is distributed online by the New York City Department of City Planning as part of its "Bytes of the Big Apple" platform. You can visit the page [here](http://www1.nyc.gov/site/planning/data-maps/open-data.page), or download the data directly [here](http://www1.nyc.gov/assets/planning/download/zip/data-maps/open-data/nyc_pluto_16v1.zip). Then unzip the data locally; to keep things tidy I did this in the `data/` subdirectory.

In [None]:
%ls data

The `CSV` files here, one for each borough, are the components of the PLUTO dataset. The `PDF` files are support files, so-called data dictionaries that explain how the data was generated and what it means:

In [None]:
from IPython.display import Image

Image("figures/PLUTO Data Dictionary Screen Grab.png")

`CSV` is short for "comma-separated values". It is a popular and very simple format for storing data: each line is one record, and the fields within a record are separated by commas.
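To make the format concrete, here is a minimal sketch that parses a few lines of CSV text with Python's built-in `csv` module. The sample rows are made up for illustration and are not taken from the PLUTO files; only the column names are borrowed from the dataset. Note how a field that itself contains a comma is wrapped in double quotes.

```python
import csv
import io

# Hypothetical two-row sample; these values are invented for illustration
# and do not come from the PLUTO files.
sample_text = (
    "Address,NumBldgs,NumFloors\n"
    "1 EXAMPLE STREET,1,4\n"
    '"2 EXAMPLE AVENUE, REAR",2,12\n'
)

# DictReader treats the first line as the header and yields one dict per row.
rows = list(csv.DictReader(io.StringIO(sample_text)))

for row in rows:
    print(row["Address"], row["NumBldgs"], row["NumFloors"])
```

Every value comes back as a string here; converting columns to numbers is one of the chores a library like `pandas` handles for you.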

Every programming language has a utility for reading `CSV` data. In scientific Python that's the `pandas` library (most often shortened to `pd` in practice):

In [None]:
import pandas as pd

manhattan = pd.read_csv("data/MN.csv")

In [None]:
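# Optional: download and unpack the archive from Python instead of by hand.
# Assumes the CSV files sit at the top level of the zip; check ar.namelist() first.
# import io, zipfile, requests
#
# r = requests.get('http://www1.nyc.gov/assets/planning/download/zip/data-maps/open-data/nyc_pluto_16v1.zip')
# pluto_filenames = ['MN.csv', 'BK.csv', 'BX.csv', 'SI.csv', 'QN.csv']
#
# with zipfile.ZipFile(io.BytesIO(r.content)) as ar:
#     for filename in pluto_filenames:
#         # ar.read() returns bytes, so the output file is opened in binary mode
#         with open("data/" + filename, "wb") as f:
#             f.write(ar.read(filename))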
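The unpacking step can be tried without touching the network. The sketch below builds a tiny zip archive in memory to stand in for the downloaded bytes, then extracts it with `zipfile`. The archive contents and the temporary output directory are assumptions made for the demonstration; the layout of the real PLUTO archive may differ, which is why the member names are listed before extracting.

```python
import io
import tempfile
import zipfile
from pathlib import Path

# Build a tiny zip archive in memory as a stand-in for the downloaded bytes.
# The single made-up row below is for illustration only.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("MN.csv", "Address,NumBldgs\n1 EXAMPLE STREET,1\n")
zip_bytes = buffer.getvalue()

# Open the archive straight from bytes and write each member to disk.
out_dir = Path(tempfile.mkdtemp())
with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
    member_names = archive.namelist()  # inspect the layout before extracting
    for name in member_names:
        # read() returns bytes, so the file is written in binary form
        (out_dir / name).write_bytes(archive.read(name))

print(member_names)
```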

In [None]:
manhattan.head(5)

Next, raise a display option so that more columns are shown:

In [None]:
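# Use the full option name; the short form "max_columns" can be ambiguous in newer pandas versions.
pd.set_option("display.max_columns", 100)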
manhattan.sample(5)

Every column in the dataset is something you can study!

In [None]:
manhattan.columns

## Exercise: Counting buildings

As a simple exercise, let's take a look at how many buildings each borough has.

Renowned Manhattan Project scientist Enrico Fermi famously loved to ask his physics students at UChicago to count seemingly nonsensical things, an exercise in creative thinking now best known as a "Fermi approximation problem" (heads up, Microsoft *loves* to ask these questions in interviews!). One particularly famous variant asked: how many piano tuners are there in the city of Chicago?

We challenge you to try it out yourself. Without any outside help, try to guess the answer to the following question: how many buildings are there in Manhattan? Group up and think about it!

Bonus question: which borough has the most buildings?

### Answer

This information is encoded at the lot level in the `NumBldgs` attribute of the PLUTO dataset; if we extract this column and sum across it we can get an answer!

In [None]:
manhattan['NumBldgs']

In [None]:
sum(manhattan['NumBldgs'])
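A side note on summing: Python's built-in `sum` and the pandas `.sum()` method agree when a column has no gaps, but they differ when values are missing. The sketch below uses a made-up column to show this; whether the real `NumBldgs` column contains missing values is not something this walkthrough checks, so treat it as a precaution rather than a correction.

```python
import pandas as pd

# Hypothetical lot-level data with one missing entry; values are invented.
lots = pd.DataFrame({"NumBldgs": [1, 3, None, 2]})

builtin_total = sum(lots["NumBldgs"])   # NaN propagates through the built-in sum
pandas_total = lots["NumBldgs"].sum()   # pandas skips missing values by default

print(builtin_total, pandas_total)
```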

In [None]:
brooklyn = pd.read_csv("data/BK.csv")
queens = pd.read_csv("data/QN.csv")
bronx = pd.read_csv("data/BX.csv")
staten_island = pd.read_csv("data/SI.csv")

In [None]:
sum(brooklyn['NumBldgs'])

In [None]:
sum(queens['NumBldgs'])

In [None]:
sum(bronx['NumBldgs'])

In [None]:
sum(staten_island['NumBldgs'])

With a little bit more work we can get this into a neater representation, and then, from there, into a plot:

In [None]:
borough_counts = pd.Series({
                    'Manhattan': sum(manhattan['NumBldgs']),
                    'Brooklyn': sum(brooklyn['NumBldgs']),
                    'Bronx': sum(bronx['NumBldgs']),
                    'Staten Island': sum(staten_island['NumBldgs']),
                    'Queens': sum(queens['NumBldgs'])
                 })

In [None]:
borough_counts

In [None]:
%matplotlib inline 

borough_counts.plot(kind='bar')
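If you would rather not repeat the same expression five times, one possible variation keeps the DataFrames in a dictionary and builds the totals with a comprehension. The sketch below uses two tiny made-up frames (`toy_frames` is a hypothetical name) in place of the real borough data so that it runs on its own.

```python
import pandas as pd

# Hypothetical stand-ins for the per-borough DataFrames; values are invented.
toy_frames = {
    "Manhattan": pd.DataFrame({"NumBldgs": [1, 2, 1]}),
    "Brooklyn": pd.DataFrame({"NumBldgs": [3, 2]}),
}

# One total per borough, keyed by borough name.
toy_counts = pd.Series(
    {name: frame["NumBldgs"].sum() for name, frame in toy_frames.items()}
)

print(toy_counts)
```

With the real data, the dictionary would map each borough name to the DataFrame loaded earlier, and the resulting Series could be plotted the same way.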

## What about number of floors per building, for example?

In literally one line of code:

In [None]:
manhattan['NumFloors'].astype(int).value_counts().sort_index().plot(kind='barh', figsize=(10, 24), fontsize=16)
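That one-liner chains several steps together. Here is a sketch of the same chain, minus the plot, unpacked on a made-up Series so each intermediate result can be inspected. The floor values are hypothetical. Note that `astype(int)` truncates fractional values and raises an error if the column contains missing values.

```python
import pandas as pd

# Hypothetical floor counts; values are invented for illustration.
floors = pd.Series([4.0, 12.0, 4.5, 5.0, 12.0], name="NumFloors")

as_int = floors.astype(int)      # truncates toward zero, so 4.5 becomes 4
counts = as_int.value_counts()   # how many lots have each floor count
by_floor = counts.sort_index()   # order by floor count rather than by frequency

print(by_floor)
```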

In [None]:
# manhattan[manhattan['NumFloors'].isin(manhattan['NumFloors'].sort_values(ascending=False)[:5])][['Address', 'OwnerName', 'NumFloors']]

### Going further: a look at more advanced geospatial uses of this data

* Andrew Hill from the civic tech community put together a [whirlwind tour](http://andrewxhill.com/cartodb-examples/scroll-story/pluto/#0).