# The Urbanizational Development of New York City 

***The City That Never Sleeps - From Seed To Apple***


# Table of Contents

1. [Motivation](#Motivation)
2. [Basic Statistics](#base_stats)
    - [Dataset 1: Land Use in New York City](#Dataset_of_the_buildings)
    - [Dataset 2: The population evolution of New York City](#Dataset_of_the_population)
    - [Dataset 3: The Racial Population in New York City Neighborhoods](#Dataset_of_the_segregation)
3. [Data Analysis](#Data_Analysis)
4. [Genre](#Genre)
5. [Visualization](#Visualization)
    - [Figure 1: Geographical Distribution of Residential and Commercial Construction over Time](#Figur_1)
    - [Figure 2: Constructed Square Feet by Decade](#Figur_2)
    - [Figure 3: Constructed Square Feet by Decade and Borough](#Figur_3)
    - [Figure 4: Population by Borough](#Figur_4) 
    - [Figure 5: Foreign Born Population by Borough from 1900-2010](#Figur_5) 
    - [Figure 6: An Interactive Look at New York City's Racial Composition](#Figur_6)
    - [Figure 7: Racial Composition By Census Tract](#Figur_7)
6. [Discussion](#Discussion)
    - [Findings](#Findings)
    - [For Further Studies](#For_Further_Studies)
7. [Contribution](#Controbution)
8. [References](#References)

The notebook is a behind-the-scenes look at the data wrangling behind the story on *The Urbanizational Evolution of New York City - From Seed to Apple* presented on https://esbenbl.github.io/ass_b/

To run this notebook you first need to load the packages in the next section (see `requirements.txt` in the Github-repo for dependencies).

In [68]:
# Standard library
import json
import warnings

# Data handling
import numpy as np
import pandas as pd
import geopandas as gpd
import requests

# Plotting
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import seaborn as sns
import plotly.graph_objs as go
%matplotlib inline
plt.rcParams["font.family"] = "Garamond"
plt.rcParams['axes.facecolor'] = "#FFF6E9"

# Maps
import folium
from folium import plugins
from folium.map import Layer
from jinja2 import Template
from branca.element import MacroElement, Figure

warnings.filterwarnings("ignore")

# **1 Motivation** <a id="Motivation"></a>


<div>
    <p>
    Let's paint the picture! A boat filled with tired but hopeful families glides across the still morning waters. They have all left their old homes for the promises and opportunities of the new world. In the distance, through the mist, a statue the size of a skyscraper breaks through the morning fog. The mood among the seaworn travellers brightens, realizing that they have finally arrived at their destination. Soon after, a skyline of actual skyscrapers paints the horizon and the bustling sounds of city life fill the air. Sounds familiar? If not, here's a picture to help!
    <p>
<div style="text-align:center;">
    <h4>A Hazy View of Travellers Greeting the Statue of Liberty</h4>
    <img src="https://lavocedinewyork.com/wp-content/uploads/2018/04/italiani-emigrati-negli-usa.png">
</div>
<br>
<div>
    <p><b>New York City</b> is undeniably one of the most influential and renowned cities in Western society, if not <i>the</i> most influential. In many ways, the city encapsulates the idea of The American Dream, with its buzzing streets, opportune business life, and rich cultural scene, spawning sayings like <i>"if you can make it here, you can make it anywhere"</i>. Pop-cultural references to New York City are omnipresent in Western movies, music, and art, and each year <a href="https://en.wikipedia.org/wiki/List_of_cities_by_international_visitors" target="_blank">millions of tourists</a> from all over the world flock to the city to see all the <i>"familiar"</i> places in real life. <i>The City That Never Sleeps, the Capital of the World, the Big Apple</i> - New York City has earned itself many nicknames. The city has the <a href="http://www.citymayors.com/statistics/richest-cities-2020.html" target="_blank">second highest GDP in the world</a>, and has been <a href="https://www.newyorkfed.org/medialibrary/media/research/epr/05v11n2/0512glae.pdf" target="_blank">the largest city in the US for over 200 years</a>, with a population of around 8.5 million people today. Professor of economics at Harvard University, Edward L. Glaeser, states that <i>"While Boston's history is one of ongoing crises and reinvention (Glaeser 2005), New York's is one of almost unbroken triumph"</i> (Glaeser 2005:1). But what could be the explanation for New York City's urbanizational triumph over the past 200 years?
    <br><br>
    Answering this question is a complex task - way too complex for this short data story. Rather, what we seek is to gain a better understanding of New York City's so-called <i>unbroken triumph</i> as a city by mapping the urbanizational development in New York City's journey from a small seed to the Big Apple using open data, visualizations and statistics. First, we'll examine the history of how New York City took the physical form that we know today, with its bustling streets and 100+ story skyscrapers. Then, we turn to those who built the skyscrapers and for whom they were built, namely the inhabitants of New York City. How did New York City come to be the continuously most populous city in the US, and what role did immigration play in this part of New York City's triumph? Subsequently, we take a contemporary look at New York City and its neighborhoods from the perspective of racial segregation. In a multicultural melting pot such as New York City, which geographical patterns emerge when one observes the city through a racial lens? Ultimately, we'll synthesise the insights that we gained throughout the three above-mentioned sections and discuss where to go from here. Let's dive right in!

The above introduction is also the introduction on our website before we *"dive right in"* to our data narrative. In this notebook, instead, we explain our thoughts behind the project, our datasets, the data analysis, and our visualizations of choice. We start by introducing some `Basic Statistics` about the three datasets that we constructed for our analysis, which are:
 1. [The Primary Land Use Tax Lot Output (PLUTO)](https://www.nyc.gov/site/planning/data-maps/open-data/dwn-pluto-mappluto.page) dataset, containing information about the residential versus commercial use of land lots in New York City since 1652. We use this dataset for Figure 1, Figure 2 and Figure 3 on the website. 
 2. [The Population](https://www.nyc.gov/site/planning/planning-level/nyc-population/historical-population.page) dataset, containing information about the development of New York City's total and foreign born population from 1900-2040 (including projections). We use this dataset for Figure 4 and Figure 5 on the website. 
 3. [The American Community Survey 5-Year Estimates](https://www.census.gov/data/developers/data-sets/acs-5year.html) dataset, containing information about racial characteristics of residents within New York City neighborhoods on US census tract level in 2021. We use this dataset for Figure 6 (and Figure 7).

Then, in `Data Analysis`, we go over some of the things we learned about these datasets from the analysis presented on the website. Subsequently, we reflect upon the narrative tools we used to tell our data story in the section `Genre`. Thereafter, we explain the narrative role of each of the visualisations used in the analysis in the section `Visualizations`. Finally, we conclude with a `Discussion`, where we present concluding thoughts on the findings of our project, as well as recommendations for further studies.




# **2 Basic Statistics** <a id="base_stats"></a>

In this project, we use multiple datasets - briefly presented in the above introduction - to get the most thorough picture of New York City's urbanizational development over the previous 200+ years. In this section, we present the three datasets that we have constructed for the project as well as the data processing behind our analysis.

First thing on the agenda, however, is to load in this geojson file from a [Github Repo](https://github.com/codeforgermany/click_that_hood/blob/main/public/data/new-york-city-boroughs.geojson) containing the geometry of the five boroughs of New York City.

In [69]:
# NYC Borough GeoJson
new_york_boroughs_map = gpd.read_file("https://raw.githubusercontent.com/codeforgermany/click_that_hood/main/public/data/new-york-city-boroughs.geojson")

### **Dataset 1: Land Use in New York <a id="Dataset_of_the_buildings"></a>**

An essential dataset for this project is the Primary Land Use Tax Lot Output (PLUTO) data from <a href="https://www.nyc.gov/site/planning/data-maps/open-data/dwn-pluto-mappluto.page" target="_blank">NYC Planning</a>, containing information about the primary use of land lots in New York City since 1652. With this dataset, we'll be able to tell when a building was constructed (`yearbuilt`), where the building was constructed (`latitude` and `longitude`), what the primary use of the building was (`landuse`), and how big the building was (`bldgarea`). We will also be able to discern between how much of an individual building was used for commercial space (`comarea`) and how much was used for residential space (`resarea`). Disregarding the very early years due to data uncertainty and building inactivity, we are thus able to map the constructional development of New York City from 1800 all the way to 2020.

In the following code, we load, clean, and process the PLUTO dataset, making it ready for analysis. The dataset is too big for a Github Repo, so in order to replicate our findings, one would have to download the dataset from <a href="https://www.nyc.gov/site/planning/data-maps/open-data/dwn-pluto-mappluto.page" target="_blank">this link</a>.

In [88]:
# PLUTO Data from https://www.nyc.gov/site/planning/data-maps/open-data/dwn-pluto-mappluto.page
columns_subset = ["borough","cd", "latitude", "longitude", "yearbuilt", 'landuse', "assesstot", "numbldgs",
                  "numfloors", "unitstotal", "bldgarea", "comarea", "resarea"]

land_use_dataaset = pd.read_csv("Exam_datasets/pluto_22v3_1.csv")[columns_subset]
print(f"Observations in PLUTO before cleaning: {land_use_dataaset.shape[0]}")

Observations in PLUTO before cleaning: 858619


Uncleaned, the PLUTO dataset contains 858,619 observations on land lot use. 

Next, we'll clean it a bit: removing nan-values, converting to appropriate dtypes, grouping observations into decades, applying labels to the data values, and removing observations before 1800 (uncertainty and inactivity) and after 2019 (the decade of the 2020s is naturally incomplete).

In [89]:
# drop nan for PLUTO
land_use_dataaset = land_use_dataaset.dropna().copy() # drop na 
land_use_dataaset.yearbuilt = land_use_dataaset.yearbuilt.astype(int) # from float to int 
land_use_dataaset_trimmed = land_use_dataaset.query("yearbuilt >= 1800 & yearbuilt < 2020").copy() # removed before 1800 due to uncertainty/inactivity and after 2019 because decade is not finished

# Construct decade bins 
ten_year_bins = [0]+[year for year in range(1899,2029,10)]
ten_year_labels = ["Before 1900"] + [str(year)+"s" for year in range(1900,2020,10)]

# Into Bins
land_use_dataaset_trimmed["yearbuilt_intervals"] = pd.cut(land_use_dataaset_trimmed.yearbuilt, bins = ten_year_bins, labels = ten_year_labels)

# Dicts to convert values  
landuse_key = {1:"One & Two Family Building",
                2:"Multi-Family Walk-Up Buildings",
                3:"Multi-Family Elevator Buildings",
                4:"Mixed Residential & Commercial Buildings",
                5:"Commerical & Office Buildings",
                6:"Industrial & Manufacturing Buildings",
                7:"Transportation & Utility",
                8:"Public Facilities & Institutions",
                9:"Open Space & Outdoor Recreation",
                10: "Parking Facilities",
                11:"Vacant Land"}

borough_key = {"BK":"Brooklyn",
               "QN":"Queens",
               "MN":"Manhattan",
               "BX":"Bronx",
               "SI":"Staten Island"}

# Map dicts
land_use_dataaset_trimmed.borough = land_use_dataaset_trimmed.borough.apply(lambda x: borough_key[x])
land_use_dataaset_trimmed["landuse_label"] =  land_use_dataaset_trimmed.landuse.apply(lambda x: landuse_key[x])

print(f"We end with a final dataset with a total of {land_use_dataaset_trimmed.shape[0]} observations")

We end with a final dataset with a total of 805354 observations


After cleaning and processing, our land use dataset consists of 805,354 land lots. This means we have filtered out roughly 53,000 observations. 
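To make the decade binning concrete, here is a minimal sketch that reuses the same bin edges and labels on a handful of made-up construction years (the series `toy_years` is an assumption for illustration, not PLUTO data). It shows that `pd.cut` uses right-inclusive intervals by default, so 1899 falls in "Before 1900" while 1900 opens the "1900s" bin.

```python
import pandas as pd

# Same bin edges and labels as in the cleaning step above
ten_year_bins = [0] + list(range(1899, 2029, 10))
ten_year_labels = ["Before 1900"] + [f"{year}s" for year in range(1900, 2020, 10)]

# Hypothetical construction years, chosen to sit on the bin boundaries
toy_years = pd.Series([1850, 1899, 1900, 1909, 1910, 2019])

# Intervals are right-inclusive: (0, 1899], (1899, 1909], ..., (2009, 2019]
toy_decades = pd.cut(toy_years, bins=ten_year_bins, labels=ten_year_labels)

print(pd.DataFrame({"yearbuilt": toy_years, "decade": toy_decades}))
```

Shifting the edges to 1899, 1909, and so on is what lets a right-inclusive interval line up with calendar decades.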

The dataset is now ready for Figure 2 and Figure 3. However, Figure 1 is going to be a temporal heatmap of the geographical distribution of construction in New York City including a distinction between residential land use and commercial/non-residential land use. Below, we wrangle the PLUTO data further to allow for such visualization.

First, we classify buildings labelled as `Mixed Residential & Commercial Buildings` as either `Mostly Residential` or `Mostly Commercial` depending on whether most of its area is used for residential (`resarea`) or commercial (`comarea`) use. 

In [90]:
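# For mixed land use, classify as either Mostly Commercial or Mostly Residential based on square feet.
# NOTE: the label assigned here is spelled "Mostly Commerical", whereas the commercial subset further
# down filters on "Mostly Commercial", so lots labelled here are not picked up by that filter as written.
land_use_dataaset_trimmed["land_use_for_heatmap"] = np.where((land_use_dataaset_trimmed.landuse_label == "Mixed Residential & Commercial Buildings") &\
                                                                 (land_use_dataaset_trimmed.comarea > land_use_dataaset_trimmed.resarea),
                                                                 "Mostly Commerical",
                                                                  land_use_dataaset_trimmed.landuse_label)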

land_use_dataaset_trimmed["land_use_for_heatmap"] = land_use_dataaset_trimmed["land_use_for_heatmap"].replace("Mixed Residential & Commercial Buildings", "Mostly Residential")
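The two-step relabelling of mixed-use lots can be sketched on a toy frame. The three rows and their areas below are made up for illustration, and the constant `MIXED` is a helper name introduced here, not part of the notebook.

```python
import numpy as np
import pandas as pd

MIXED = "Mixed Residential & Commercial Buildings"

# Hypothetical lots: two mixed-use lots and one lot with another land use
toy_lots = pd.DataFrame({
    "landuse_label": [MIXED, MIXED, "Vacant Land"],
    "comarea": [900, 100, 0],
    "resarea": [100, 900, 0],
})

# Step 1: mixed lots with more commercial than residential area get their own label
mostly_commercial = (toy_lots.landuse_label == MIXED) & (toy_lots.comarea > toy_lots.resarea)
toy_lots["land_use_for_heatmap"] = np.where(mostly_commercial, "Mostly Commercial", toy_lots.landuse_label)

# Step 2: every mixed lot that is still unlabelled is treated as mostly residential
toy_lots["land_use_for_heatmap"] = toy_lots["land_use_for_heatmap"].replace(MIXED, "Mostly Residential")

print(toy_lots)
```

Because the comparison is strict (`>`), a mixed lot with equal commercial and residential area ends up as "Mostly Residential" under this scheme.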

The exact year of construction is associated with a certain degree of uncertainty in the PLUTO dataset. We will talk more about this later in `Thoughts on Dataset 1`. For now, to even out the uncertainty, we group the construction years into 5-year bins, structured like this: 1800-1804, 1805-1809, 1810-1814, etc. 

In [91]:
# group year to the nearest multiple of 5 (rounding down, e.g. 1997->1995)
land_use_dataaset_trimmed['yearbuilt_grouped'] = land_use_dataaset_trimmed['yearbuilt'] // 5 * 5
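As a quick check of the rounding rule, the sketch below applies the same floor-division expression to a few made-up years (`toy_years` is an assumption for illustration).

```python
import pandas as pd

# Hypothetical construction years
toy_years = pd.Series([1800, 1804, 1805, 1997, 2019])

# Integer-divide by 5 and multiply back: each year maps to the start of its 5-year bin
toy_grouped = toy_years // 5 * 5

print(toy_grouped.tolist())
```

Each bin is labelled by its first year, so 1800 stands for 1800-1804 and 2015 for 2015-2019.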

We split PLUTO into a residential subset and a commercial/non-residential subset based on primary land use.

In [92]:
## Subset based on residential and non-residential land-use 
residential_buildings_and_landuse = land_use_dataaset_trimmed[land_use_dataaset_trimmed.land_use_for_heatmap.isin(['One & Two Family Building',
                                                                                                                    "Mostly Residential",
                                                                                                                    'Multi-Family Walk-Up Buildings',
                                                                                                                    'Multi-Family Elevator Buildings',
                                                                                                                    ])].copy()

commercial_buildings_and_landuse = land_use_dataaset_trimmed[land_use_dataaset_trimmed.land_use_for_heatmap.isin(['Mostly Commercial',
                                                                                                                    "Commerical & Office Buildings",
                                                                                                                    'Industrial & Manufacturing Buildings',
                                                                                                                    'Transportation & Utility',
                                                                                                                    "Parking Facilities",
                                                                                                                    "Public Facilities & Institutions"
                                                                                                                    ])].copy()

In [93]:
print(f"Lots used primarily for commercial/industrial purposes: {commercial_buildings_and_landuse.shape[0]}")
print(f"Lots used primarily for residential purposes: {residential_buildings_and_landuse.shape[0]}")

Lots used primarily for commercial/industrial purposes: 46818
Lots used primarily for residential purposes: 750402


Simply counting lots, we see that there has been far more residential construction (~750,000 lots) than commercial/industrial construction (~47,000 lots). Let's see how the relationship looks when we instead compare the number of constructed square feet. 

In [94]:
print(f"Billion square feet for commercial/industrial use: {commercial_buildings_and_landuse.bldgarea.sum() / 1_000_000_000}")
print(f"Billion square feet for residential use: {residential_buildings_and_landuse.bldgarea.sum() / 1_000_000_000}")

Billion square feet for commercial/industrial use: 1.553761395
Billion square feet for residential use: 3.710573608


Judging by square feet, we see that there is about 2.4 times as much residential space (3.71 billion square feet) as commercial/industrial space (1.55 billion square feet). 

Next, we need to divide the observations into their associated 5-year time bin. 

In [95]:
# Define data range from the 5-year bin labels themselves, so every step of the range matches a bin
data_range = range(min(residential_buildings_and_landuse.yearbuilt_grouped.min(), commercial_buildings_and_landuse.yearbuilt_grouped.min()),
                   max(residential_buildings_and_landuse.yearbuilt_grouped.max(), commercial_buildings_and_landuse.yearbuilt_grouped.max())+1, 5)

# List of lists with coordinate - the indices of the outer-lists represents 5-year bins
residential_list_of_lists_with_coordinates = [residential_buildings_and_landuse.query(f"yearbuilt_grouped == {year}")[["latitude", "longitude"]].values.tolist() for year in data_range]
commercial_list_of_lists_with_coordinates = [commercial_buildings_and_landuse.query(f"yearbuilt_grouped == {year}")[["latitude", "longitude"]].values.tolist() for year in data_range]

The PLUTO data has now been divided based on primary land use and converted into a list of lists with coordinates, where the indices of the outer lists represent 5-year bins between 1800 and 2019. The data is now fit for the temporal heatmap in Figure 1. 
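The nested structure is easier to see on a small example. The sketch below builds the same kind of list of lists from three made-up lots (coordinates and years are assumptions for illustration). It uses boolean indexing instead of `query`, which is an equivalent way to select one bin at a time.

```python
import pandas as pd

# Hypothetical lots: two in the 1800 bin, none in 1805, one in 1810
toy_lots = pd.DataFrame({
    "latitude": [40.71, 40.72, 40.65],
    "longitude": [-74.00, -73.99, -73.95],
    "yearbuilt_grouped": [1800, 1800, 1810],
})

toy_range = range(1800, 1815, 5)  # 1800, 1805, 1810

# One inner list of [lat, lon] pairs per 5-year bin; bins without lots become empty lists
toy_frames = [
    toy_lots.loc[toy_lots.yearbuilt_grouped == year, ["latitude", "longitude"]].values.tolist()
    for year in toy_range
]

for year, points in zip(toy_range, toy_frames):
    print(year, points)
```

Keeping the empty bins in place matters: the position in the outer list is the only thing that ties a set of points to its time step, which is the layout a time-indexed heatmap such as folium's `HeatMapWithTime` plugin is assumed to expect.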

### **Thoughts on Dataset 1**

The PLUTO dataset is not without its limitations. One limitation is that the dataset only contains information on buildings and land lots that still exist today. Thus, we do not have information on, for example, demolished buildings or whether a lot was once residential and later changed to commercial. A second limitation concerns the variable `yearbuilt`. The PLUTO documentation states that the relationship between the actual year of construction and the reported year of construction is associated with a certain degree of uncertainty (Pluto Documentation: 35). In Figure 1, we noticed a markedly higher degree of reported construction in "round years" such as 1900, 1910, and 1920 or 1935, 1945, and 1955, especially during the first half of the 20th century. As previously mentioned, we try to account for this by binning the construction years into 5-year groups. Acknowledging these limitations, we still argue that the data conveys an adequately accurate representation of New York City's urbanizational development over time.  
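One simple way to quantify this kind of clustering on round years is to compare the share of reported years ending in 0 or 5 with the 20% one would expect if construction were spread evenly across years. The sketch below does this on made-up values (`toy_years` is an assumption and not taken from PLUTO). It is an illustrative check, not the analysis behind Figure 1.

```python
import pandas as pd

# Hypothetical reported construction years with visible clustering on round years
toy_years = pd.Series([1900, 1900, 1900, 1901, 1903, 1905, 1910, 1910, 1917, 1920])

# True for years ending in 0 or 5
ends_in_0_or_5 = toy_years % 5 == 0

share_round = ends_in_0_or_5.mean()   # observed share of round years
expected_share = 2 / 10               # two of every ten final digits are 0 or 5

print(f"Observed: {share_round:.0%}, expected if evenly spread: {expected_share:.0%}")
```

A share far above 20% suggests that some years were estimated or rounded when recorded, which is the pattern the 5-year bins are meant to absorb.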

### **Dataset 2: Population Development and Foreign Born Population of New York<a id="Dataset_of_the_population"></a>**

The second dataset used in our analysis concerns the development in the population of New York City from 1900 to 2040. To construct this dataset, we have used data from <a href="https://www.nyc.gov/site/planning/planning-level/nyc-population/historical-population.page" target="_blank">the official government website of New York City</a> on the total and foreign born population of New York City and of its five boroughs for each decade between 1900 and 2010 (Figure 4 and 5). For data on the population in 2020, we used the <a href="https://www.census.gov/data/developers/data-sets/decennial-census.html" target="_blank">US Census Bureau's Decennial Census</a> (Figure 4). Lastly, for the population projections for 2030 and 2040, we used <a href="https://www.nyc.gov/assets/planning/download/pdf/planning-level/nyc-population/projections_report_2010_2040.pdf" target="_blank">NYC Planning's Population estimates for 2030-2040</a> (Figure 4). 

Let's go over the data cleaning and processing for dataset 2.

### **Dataset 2.1: Total Population**

Loading and preparing the datasets

In [96]:
population_by_borough = pd.read_excel("https://www.nyc.gov/assets/planning/download/office/planning-level/nyc-population/historical-population/nyc_total_pop_1900-2010.xlsx", skiprows = 3, index_col = 0)
population_by_borough = population_by_borough.dropna().reset_index(names = ["decade"])

We add population data for 2020 from <a href="https://www.census.gov/data/developers/data-sets/decennial-census.html" target="_blank">US Census Bureau's Decennial Census</a>. To retrieve the data, we use their API.

In [97]:
# US Census parameters 
API_KEY = "ffe97aa3a40b95750950c76a41624538483d4731"

NY_STATE = ','.join(["36"])

COUNTIES = ','.join(["047", "061", "005", "081", "085"])

COUNTY_TO_BOROUGH = {"081": "Queens",
                     "085": "Staten Island",
                     "047": "Brooklyn",
                     "005": "Bronx",
                     "061": "Manhattan"}

POP_LABEL = {"P1_001N":"population"}

Call US Census API

In [99]:
url = f"https://api.census.gov/data/2020/dec/pl?get=NAME,P1_001N&for=county:{COUNTIES}&in=state:{NY_STATE}&key={API_KEY}"
resp = requests.get(url).json()

NYC_pop_2020 = pd.DataFrame(resp[1:], columns = resp[0]) # resp to df

# Clean data 
NYC_pop_2020["BOROUGH"] = NYC_pop_2020.county.map(COUNTY_TO_BOROUGH) # map borough labels
NYC_pop_2020 = NYC_pop_2020.rename(columns = POP_LABEL) # rename columns
NYC_pop_2020["population"] = NYC_pop_2020["population"].astype(float)  # to float
NYC_pop_2020 = NYC_pop_2020.drop(["NAME", "state", "county"], axis = 1) # drop needless columns
NYC_pop_2020["decade"] = 2020 # define decade
NYC_pop_2020 = NYC_pop_2020.pivot(index = "decade", values= "population", columns = "BOROUGH") # pivot df
NYC_pop_2020["New York City"] = NYC_pop_2020.values.sum() # calculate total pop in NYC
NYC_pop_2020 = NYC_pop_2020.reset_index()
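The reshaping above relies on the API returning a list of lists whose first element is the header row. The sketch below replays the same steps on a hard-coded, made-up response, so it runs without a network call or an API key. The counts, the county names, and all `toy_*` names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical response in the list-of-lists layout: header row first, then one row per county
toy_resp = [
    ["NAME", "P1_001N", "state", "county"],
    ["County A", "100", "36", "005"],
    ["County B", "250", "36", "047"],
]
toy_county_to_borough = {"005": "Bronx", "047": "Brooklyn"}

toy_df = pd.DataFrame(toy_resp[1:], columns=toy_resp[0])          # header row becomes column names
toy_df["BOROUGH"] = toy_df["county"].map(toy_county_to_borough)   # county FIPS code -> borough name
toy_df["population"] = toy_df["P1_001N"].astype(float)            # values arrive as strings
toy_df["decade"] = 2020

# One row per decade, one column per borough
toy_wide = toy_df.pivot(index="decade", columns="BOROUGH", values="population")
toy_wide["New York City"] = toy_wide.sum(axis=1)                  # row-wise total across boroughs

print(toy_wide)
```

The sketch sums with `sum(axis=1)` rather than `.values.sum()`. The two agree for a single-row frame like the 2020 data, but the row-wise version would also stay correct if more decades were added.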

Concatenate 2020 data on 1900-2010 data

In [100]:
population_by_borough = pd.concat([population_by_borough,NYC_pop_2020]).set_index("decade")

Get population estimates from NYC Planning. We manually extracted them [from this PDF](https://www.nyc.gov/assets/planning/download/pdf/planning-level/nyc-population/projections_report_2010_2040.pdf).

In [101]:
NYC_pop_projection = {"New York City":{2030:8_821_027,
                                       2040:9_025_145},
                      "Bronx":{2030:1_518_998, 
                               2040:1_579_245},
                      "Brooklyn":{2030:2_754_009,
                                  2040:2_840_525},
                      "Manhattan":{2030:1_676_720,
                                   2040:1_691_617},
                      "Queens":{2030:2_373_551,
                                2040:2_412_649},
                      "Staten Island":{2030:497_749,
                                       2040:501_109}}

NYC_pop_projection = pd.DataFrame(NYC_pop_projection)

# Concat 2020 data to make the lines in the plot have the same offset 
NYC_pop_2020 = population_by_borough.loc[2020:]
NYC_pop_projection = pd.concat([NYC_pop_2020, NYC_pop_projection])

We now have all we need to construct Figure 4.

### **Dataset 2.2: Foreign Born Population of New York**  

Load data on foreign born population

In [103]:
foreign_born_pop = pd.read_excel("https://www.nyc.gov/assets/planning/download/office/planning-level/nyc-population/historical-population/nyc_fb_pop_1900-2010.xlsx", skiprows = 3, index_col = 0)
foreign_born_pop = foreign_born_pop.dropna().reset_index(names = ["decade"])
foreign_born_pop.decade = foreign_born_pop.decade.apply(lambda x: str(x).replace("*","")).astype(int) # remove "*" and convert to int 
foreign_born_pop = foreign_born_pop.set_index("decade")

Calculate the share of foreign born population within each borough since 1900

In [104]:
share_of_foreign_born_by_decade = foreign_born_pop / population_by_borough.loc[1900:2010] * 100
share_of_foreign_born_by_decade = share_of_foreign_born_by_decade.drop("New York City", axis =1 )
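Dividing one DataFrame by another works here because pandas aligns the two frames on both the index (decade) and the column names (borough) before dividing. A minimal sketch with made-up numbers (all `toy_*` names and values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical totals for three decades
toy_total = pd.DataFrame(
    {"Bronx": [200.0, 400.0, 500.0], "Queens": [100.0, 200.0, 300.0]},
    index=[1900, 1910, 1920],
)

# Hypothetical foreign born counts: fewer decades and a different column order
toy_foreign = pd.DataFrame(
    {"Queens": [25.0, 40.0], "Bronx": [50.0, 100.0]},
    index=[1900, 1910],
)

# Label-based slicing is inclusive at both ends, so this keeps 1900 and 1910
toy_share = toy_foreign / toy_total.loc[1900:1910] * 100

# Without the slice, decades missing from one frame come out as NaN rather than raising an error
toy_share_unsliced = toy_foreign / toy_total * 100

print(toy_share)
```

The column order of the inputs does not matter, since values are matched by label rather than by position.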

We now have all we need to construct Figure 5.

### **Thoughts on Dataset 2**

Because of the simplicity of both datasets there is not much cleaning or preparation to be done. 

Looking into the documentation, the construction of these datasets seems to have been a rather complicated endeavour with data originating from multiple sources. For the purpose of this project, we trust the New York City government officials and the quality of the data. If one would like to further scrutinise the construction of the population datasets, this could be their point of departure: <a href="https://www.nyc.gov/assets/planning/download/pdf/planning-level/nyc-population/historical-population/nyc_total_pop_1900-2010.pdf" target="_blank">Documentation for population development</a> and <a href="https://www.nyc.gov/assets/planning/download/pdf/planning-level/nyc-population/historical-population/nyc_fb_pop_1900-2010.pdf" target="_blank">Documentation for foreign born population development</a>.

A smaller limitation of `Dataset 2.1` is that the population projections for 2030 and 2040 were made in 2013, which makes them more prone to inaccuracy than more recent projections would be. We keep this in mind, but do not consider it a critical issue. 




### **Dataset 3: The Racial Population in New York City Neighborhoods**<a id="Dataset_of_the_segregation"></a>

The third and final dataset contains information about racial characteristics of residents within New York City neighborhoods at US census tract level in 2021. The dataset is from <a href="https://www.census.gov/data/developers/data-sets.html" target="_blank">the United States Census Bureau</a> and gathered as part of the <a href="https://www.census.gov/programs-surveys/acs" target="_blank">American Community Survey (ACS)</a>, a survey used to help and inform government officials. The ACS has four survey-based datasets: <a href="https://www.census.gov/programs-surveys/acs/guidance/estimates.html" target="_blank">1-year estimates, 1-year supplemental estimates, 3-year estimates, and 5-year estimates</a>. For this project, we use the <a href="https://www.census.gov/data/developers/data-sets/acs-5year.html" target="_blank">5-year estimates</a>, as they give the most reliable and granular data. The 5-year estimates dataset is a very large survey covering the entirety of the United States, while at the same time being very granular. It contains residential information at US census tract level, which we will use to investigate neighborhood segregation in contemporary New York City. We are specifically interested in the `race` variables of the dataset.

The following code presents how we attained and processed this dataset. 

### **Dataset 3.1: Data on Race**

We start by defining the overall variables that we use to construct the dataset. The available variables of the API and their labels are found [here](https://api.census.gov/data/2021/acs/acs5/subject/variables.html).

In [118]:
# US Census Bureau API parameters. 
API_KEY = "ffe97aa3a40b95750950c76a41624538483d4731"

NY_STATE = ','.join(["36"])

COUNTIES = ','.join(["047", "061", "005", "081", "085"])

COLUMN_LABELS = {"DP05_0001E": "total_pop",
                 "DP05_0037PE": "pct_white_one_race",
                 "DP05_0038PE": "pct_black_one_race",
                 "DP05_0044PE": "pct_asian_one_race",
                 "DP05_0071PE": "pct_hispanic_or_latino_any"}

COUNTY_TO_BOROUGH = {"081": "Queens",
                     "085": "Staten Island",
                     "047": "Brooklyn",
                     "005": "Bronx",
                     "061": "Manhattan"}



POP_VAR = ["DP05_0001E"]
RACE_VAR = ["DP05_0037PE", "DP05_0038PE", "DP05_0044PE", "DP05_0071PE"]

QUERY = ",".join(POP_VAR+RACE_VAR)

Call the API and process response.

In [119]:
url = f"https://api.census.gov/data/2021/acs/acs5/profile?get=NAME,{QUERY}&for=tract:*&in=county:{COUNTIES}&in=state:{NY_STATE}&key={API_KEY}"
resp = requests.get(url).json()
us_census_df = pd.DataFrame(resp[1:], columns = resp[0]) # resp to df 

# Clean response 
us_census_df = us_census_df.rename(columns = COLUMN_LABELS) # map column names 
us_census_df["BOROUGH"] = us_census_df.county.map(COUNTY_TO_BOROUGH) # map boroughs

us_census_df = us_census_df.query("tract!='990100'").copy() # Remove "erroneous" tracts (copy avoids chained-assignment warnings)
us_census_df[list(COLUMN_LABELS.values())] = us_census_df[list(COLUMN_LABELS.values())].astype(float) # Convert to float
us_census_df = us_census_df.replace(-666666666, np.nan) # Convert the -666666666 placeholder to NaN

### In areas where no one lives (based on the total population), replace NaN values in the other columns with 0.
for col in list(COLUMN_LABELS.values()):
    # Parentheses around the comparison are required because & binds tighter than ==
    us_census_df[col] = np.where((us_census_df["total_pop"] == 0) & pd.isna(us_census_df[col]),
                                 0,
                                 us_census_df[col])
    
print(f"Neighborhoods/census tracts in New York City: {us_census_df.shape[0]}")

Neighborhoods/census tracts in New York City: 2324


We now have information on the racial composition of all 2,324 "neighborhoods"/census tracts in New York City in 2021. Next, we need to add some geographical information.  
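Combining a comparison with `&` in pandas is sensitive to operator precedence, because `&` binds more tightly than `==`. A minimal sketch on made-up values (the frame `toy` and its numbers are assumptions, not ACS data) compares the two ways of writing the zero-population mask:

```python
import numpy as np
import pandas as pd

# Hypothetical tracts: empty with missing share, empty with a reported share, populated with missing share
toy = pd.DataFrame({"total_pop": [0.0, 0.0, 120.0], "pct": [np.nan, 5.0, np.nan]})

# Without parentheses this is parsed as: total_pop == (0 & pct.isna())
unparenthesised = toy["total_pop"] == 0 & toy["pct"].isna()

# With parentheses both conditions must hold: the tract is empty AND the share is missing
parenthesised = (toy["total_pop"] == 0) & toy["pct"].isna()

print(pd.DataFrame({"unparenthesised": unparenthesised, "parenthesised": parenthesised}))
```

The two masks differ on the second row, where the population is zero but a share is present. Only the parenthesised version leaves that value untouched when it is used to fill in zeros.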

### **Dataset 3.2: Geodata on Census Tract Level**

Load GeoJson with Census Tract geometry from [arcgis](https://www.arcgis.com/index.html).

In [120]:
census_tracts_geo = gpd.read_file("https://services5.arcgis.com/GfwWNkhOj9bNBqoJ/arcgis/rest/services/NYC_Census_Tracts_for_2020_US_Census/FeatureServer/0/query?where=1=1&outFields=*&outSR=4326&f=pgeojson")

Align with the dataset from the US Census Bureau

In [121]:
## Match US Census data tract IDs to the census tract geo data IDs
US_CENSUS_KEY = census_tracts_geo[["BoroName", "BoroCode"]].drop_duplicates().set_index("BoroName").to_dict()["BoroCode"]
us_census_df["borough_id"] = us_census_df.BOROUGH.map(US_CENSUS_KEY).astype("str")
us_census_df["tract_id"] = us_census_df["borough_id"] + us_census_df["tract"]

## Subset columns and rename 
census_tracts_geo = census_tracts_geo[["NTAName", "CDTANAME", "BoroCT2020", "geometry"]].rename(columns={"BoroCT2020":"tract_id"}).copy()
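The join key is simply the one-digit borough code followed by the six-digit tract code. The sketch below rebuilds it for two rows. The mapping `toy_boro_code` is written out by hand here as an assumption, whereas the notebook derives it from the geodata.

```python
import pandas as pd

# Hypothetical borough-name-to-code mapping (hand-written for this sketch)
toy_boro_code = {"Manhattan": 1, "Staten Island": 5}

# Tract codes must stay zero-padded strings, otherwise leading zeros are lost
toy_tracts = pd.DataFrame({
    "BOROUGH": ["Manhattan", "Staten Island"],
    "tract": ["000100", "990100"],
})

# Borough code + tract code gives the same layout as the BoroCT2020 column in the geodata
toy_tracts["tract_id"] = toy_tracts["BOROUGH"].map(toy_boro_code).astype(str) + toy_tracts["tract"]

print(toy_tracts)
```

Since both sides of the merge hold these ids as strings, a quick dtype check on the two key columns before merging is a cheap way to catch mismatches.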

Merge geometry on Census data

In [122]:
census_tract_data = pd.merge(census_tracts_geo, us_census_df, on = "tract_id", how = "outer", indicator = True)

Sanity Check on merge

In [123]:
census_tract_data._merge.value_counts() # All tracts with demographic data are in the merge 

both          2324
left_only        1
right_only       0
Name: _merge, dtype: int64
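The same sanity check can be reproduced on toy data. In the sketch below, the geometry side has one tract with no demographic match (all ids and values are made up for illustration). The `validate="one_to_one"` argument is an optional extra that is not used in the notebook: it makes pandas raise an error if a tract id is duplicated on either side.

```python
import pandas as pd

# Hypothetical geometry side: three tracts
toy_geo = pd.DataFrame({"tract_id": ["1000100", "1000200", "5990100"],
                        "NTAName": ["A", "B", "C"]})

# Hypothetical demographic side: only two of the tracts
toy_demo = pd.DataFrame({"tract_id": ["1000100", "1000200"],
                         "total_pop": [1200.0, 800.0]})

# indicator=True adds a _merge column showing where each row came from
toy_merged = pd.merge(toy_geo, toy_demo, on="tract_id", how="outer",
                      indicator=True, validate="one_to_one")

toy_counts = toy_merged["_merge"].value_counts()
print(toy_counts)
```

An outer merge keeps unmatched rows from both sides, so nothing is dropped silently before the counts have been inspected.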

Examine the tract that could not be merged

In [124]:
census_tract_data.query("_merge == 'left_only'")

| | NTAName | CDTANAME | tract_id | geometry | NAME | total_pop | pct_white_one_race | pct_black_one_race | pct_asian_one_race | pct_hispanic_or_latino_any | state | county | tract | BOROUGH | borough_id | _merge |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1391 | Hoffman & Swinburne Islands | SI95 Great Kills Park-Fort Wadsworth (JIA 95 A... | 5990100 | MULTIPOLYGON (((-74.05314 40.57771, -74.05406 ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | left_only |


`Hoffman & Swinburne Islands` are two small and uninhabited islands off the coast of Staten Island. Thus, filtering these out will not affect the remaining demographic analysis.  

Keep only observations that appear in both datasets

In [125]:
census_tract_data = census_tract_data.query("_merge == 'both'")
print(f"The final shape of our third dataset is {census_tract_data.shape[0]} rows and {census_tract_data.shape[1]} columns")

The final shape of our third dataset is 2324 rows and 16 columns


We now have our dataset with racial characteristics of residents within New York City's 2324 neighborhoods at the US census tract level in 2021. This will be used to plot Figure 6 (and 7).

### **Thoughts on Dataset 3**
An important consideration with regard to dataset 3 is that it consists of self-reported observations. When asking people about their racial attributes, one should keep in mind two general weaknesses of surveys. The first is differential item functioning, which concerns whether respondents interpret the survey questions in the same way as other respondents or as us, the researchers using the survey data (Jæger 2006:62). Self-reported racial affiliation could be influenced by such mechanisms. The second weakness is social desirability bias, which implies that people choose the survey answer they think is the most desirable (Krumpal 2011:2026). This is especially a problem when asking people controversial questions on topics such as stereotypes, sexual preferences, extreme opinions or race. Lastly, the ACS allows respondents to choose multiple racial affiliations to accommodate mixed-race citizens. Thus, the reported shares can in principle sum to more than 100% within a neighborhood.
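
To illustrate the last point, the sketch below counts group shares in a hypothetical tract of 100 respondents where some report more than one group. The respondent data is entirely made up and only serves to show why the shares can exceed 100% in total.

```python
# Hypothetical tract with 100 respondents; each may report several groups
respondents = (
    [{"white"}] * 50
    + [{"black"}] * 30
    + [{"white", "asian"}] * 20   # multi-group respondents count in both groups
)

groups = ["white", "black", "asian"]

# Share of respondents (in percent) who report each group
shares = {
    g: 100 * sum(g in r for r in respondents) / len(respondents)
    for g in groups
}
total = sum(shares.values())
print(shares, total)
```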

The above considerations are intrinsic problems in surveys. Nevertheless, as the purpose of this project is not about strict statistical causality but rather descriptive story-telling, we trust in the ACS's ability to give an indicative impression of the residential distribution of racial groups within neighborhoods of New York City.


### **Summary of Basic Statistics**

In this section we have presented the construction of the three datasets used in our analysis. These are:
 1. [The Primary Land Use Tax Output (PLUTO)](https://www.nyc.gov/site/planning/data-maps/open-data/dwn-pluto-mappluto.page) dataset, containing information about the residential versus commercial use of land lots in New York City since 1652. We use this dataset for Figure 1, Figure 2 and Figure 3 on the website. 
 2. [The Population](https://www.nyc.gov/site/planning/planning-level/nyc-population/historical-population.page) dataset, containing information about the development of New York City's total and foreign born population from 1900-2040 (including projections). We use this dataset for Figure 4 and Figure 5 on the website. 
 3. [The American Community Survey 5-Year Estimates](https://www.census.gov/data/developers/data-sets/acs-5year.html) dataset, containing information about racial characteristics of residents within New York City neighborhoods on US census tract level in 2021. We use this dataset for Figure 6 (and Figure 7).

# **3 Data Analysis <a id="Data_Analysis"></a>**

In this section we describe our data and what we learned about the data and New York City. The full data story is on the website; in this section we only give the main parts of the analysis and what we have learned. We do this by commenting on each visualization. 

In <b>`Figure 1`</b>, we plot a temporal heatmap of the geographical development of New York City's constructional body from 1800 to 2019. The red markers represent construction for <b>primarily residential use</b> while the blue markers indicate <b>commercial/industrial construction</b>. We have also outlined the five boroughs that jointly constitute New York City: <b>Manhattan, Brooklyn, Bronx, Queens, and Staten Island</b>.
<br><br>
Activating both the residential- and commercial <i>"layer"</i> in the map and pressing play, one can see how New York City's journey from seed to apple started with residential- and industrial life making their way <b>from Manhattan out into Brooklyn</b> during the 19th century. Quite quickly, starting around 1900, the map becomes rather cluttered and at times one can hardly spot any of the underlying map simply due to the excessive degree of construction. Both residential- and industrial life spread rapidly <b>out into Queens, Bronx, and Staten Island</b> throughout the early 20th century, ultimately taking up nearly all available space in all five boroughs. 

<b>`Figure 2`</b> displays the accumulated number of square feet constructed within each decade between 1800-2020, with all construction before 1900 grouped together. 

The relation between residential and commercial construction seems to be quite steady over time, with the highest share of commercial construction during the 1990s where approximately half of all construction was meant for commercial use. In the opposite end, the 1940s have the highest share of residential construction compared to commercial. 
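
The decade grouping and the commercial share discussed above can be sketched as follows. The frame `lots` and all its values are made-up stand-ins for PLUTO rows, and the column names are assumptions chosen for illustration.

```python
import pandas as pd

# Toy lots: construction year, floor area, and primary use (all made up)
lots = pd.DataFrame({
    "year_built": [1875, 1899, 1924, 1927, 1995, 1998],
    "sqft": [1000, 2000, 5000, 3000, 4000, 4000],
    "use": ["residential", "commercial", "residential",
            "commercial", "residential", "commercial"],
})

# Floor each year to its decade; collapse everything before 1900 into one bin
decade = (lots["year_built"] // 10) * 10
lots["decade"] = decade.astype(str) + "s"
lots.loc[lots["year_built"] < 1900, "decade"] = "pre-1900"

# Total square feet per decade and use, then the commercial share
by_decade = lots.pivot_table(index="decade", columns="use",
                             values="sqft", aggfunc="sum", fill_value=0)
by_decade["commercial_share"] = by_decade["commercial"] / by_decade.sum(axis=1)
print(by_decade)
```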

The most immediately notable decade with regard to overall construction is the <b>1920s</b> - a decade appropriately known as "the Roaring Twenties" or <a href="https://www.encyclopedia.com/history/encyclopedias-almanacs-transcripts-and-maps/prosperity-decade-1921-1928-overview" target="_blank">"the Prosperity Decade"</a>. More than 800 million square feet (roughly 74 square kilometers) of building space was constructed in New York City during that time. Nearing the end of a decade characterised by wealth, progression, and prosperity in the wake of World War I, the Roaring Twenties concluded with <a href="https://en.wikipedia.org/wiki/Wall_Street_Crash_of_1929" target="_blank">the Wall Street Crash in October of 1929</a>, hurling not just New York City's economy, but the global economy as a whole into an economic depression called <a href="https://en.wikipedia.org/wiki/Great_Depression" target="_blank">the Great Depression</a>. As this was a time of recession, it might be part of the explanation for the relatively lower degree of construction in the <b>1930s</b> compared to its preceding decade. Looking at the <b>1940s</b> (one of the decades with the least construction in 200 years) and the noticeably lower amount of construction compared to its preceding and succeeding decades, World War II (1939-1945) seems the obvious main reason behind this decrease. Moving on to the <b>1950s and 1960s</b>, both decades are rather productive with regard to construction in New York City. Business was once again booming during <a href="https://www.localize.city/blog/what-are-the-most-diverse-neighborhoods-in-nyc/" target="_blank">"the prosperous 50s"</a> and <a href="https://www.venstre.dk/partiet/skoleweb/politisk-historie/velfaerd-og-vaelgerskred/de-glade-tressere" target="_blank">"the happy 60s"</a> (a Danish nickname for the decade (Venstre 2023)), and both periods are characterised by economic growth. 
Old habits die hard and - once again - a period of growth is followed by a period of recession. From a constructional standpoint, the <b>1970s, 1980s, and 1990s</b> stand out as being among the least productive in shaping the physical fabric of New York City - especially with regard to residential construction. From a broader, historical standpoint, the 1970s hosted a <a href="https://en.wikipedia.org/wiki/1970s_energy_crisis" target="_blank"> global energy crisis</a> while both the 1970s and 1980s mark a particularly challenging period in New York City's history with regard to social challenges and crime prevention (Sterbenz 2013). The resemblance between New York City and Batman's crime-ridden city of Gotham does not come out of nowhere. However, New York City was not the only metropolis with challenges, and almost every other large city fared even worse than New York City during this period (Glaeser 2005:20).  
                
All in all, the constructional history of New York City tells a story about the world around it. Economic prosperity/depression, social tensions, energy crises, and world wars are all manifested in Figure 2. Now, let's nuance the picture even further and examine the history of residential- and commercial construction within each of the five boroughs of New York City. 

<b>`Figure 3`</b> presents the accumulated number of square feet constructed within each decade - just like Figure 2 - but subdivided into the boroughs of New York City. 
<br>
    <p>
<b>Brooklyn and Manhattan</b> stand out as the boroughs with the highest amount of construction during a single decade. For Manhattan, that decade is unsurprisingly the Roaring Twenties, while Brooklyn actually peaks during the Great Depression of the 1930s. Compared to the other boroughs, commercial construction has clearly been prioritized in <b>Manhattan</b> over the years. Today, Midtown Manhattan houses New York City's <a href="https://en.wikipedia.org/wiki/Central_business_district" target="_blank">Central Business District</a> (CBD), which also happens to be the largest CBD in the world. Based on Figure 3, the foundation for this epicenter of capitalism was primarily laid during the 1920s, a period defined by commercial transformation in Manhattan (Miller 2014).

From Figure 3, we find that not all boroughs are affected equally by the abovementioned social, economic, and political events - or at least that the events spark different reactions. For instance, while Brooklyn, Manhattan, Bronx, and - to some extent - Staten Island all seem to be noticeably affected by World War II during the 1940s, <b>Queens</b> retains a rather high degree of residential construction during this time period.  

Lastly, it is apparent that <b>Staten Island</b> has considerably fewer constructed square feet compared to the other boroughs. Contrary to the others, Staten Island experienced its most constructive decades during the latter half of the 20th century. This is probably related to the other boroughs becoming more and more cramped for space, while Staten Island became better connected to the rest of New York City. For instance, the renowned <a href="https://en.wikipedia.org/wiki/Verrazzano-Narrows_Bridge" target="_blank">Verrazzano-Narrows Bridge</a> was built between Brooklyn and Staten Island in 1964, possibly allowing for easier commuting between Staten Island and the many square feet of commercial space in Manhattan. This highlights the importance of critical infrastructure in managing urban space and opening up new areas for residential inflow. 

<b>`Figure 4`</b> displays the development in the population of New York City from 1900 to 2020 within each of the five boroughs and in New York City in total. Population projections for the estimated population in 2030 and 2040 are presented with dotted lines.

In Figure 4, we see that the population of New York City grew most rapidly during the early 20th century, roughly <b>doubling in size from 1900 to 1950</b>. Subsequently, however, the population growth stagnated and the population even <b>decreased between 1970 and 1980</b>. Since then, the overall population has been steadily increasing for each decade, and based on the population projection, this growth will continue in the coming decades. 
        <br><br>
Looking at the individual boroughs, <b>Brooklyn and Queens</b> have undergone the most significant population growth, with Brooklyn overtaking Manhattan as the most populous borough during the 1920s, and Queens moving into second place sometime during the 1950s. Recalling <b>Figure 3</b>, we learned that construction in Brooklyn and Queens was mostly focused on residential buildings, whereas Manhattan to a higher degree prioritized commercial construction. We now see that while <b>Manhattan</b> started off as the most populous borough in 1900 - housing nearly half of New York City's population at the time - its population has steadily decreased throughout the 20th century and will continue to do so judging by the population projections. Soon, <b>Bronx</b> will probably also overtake Manhattan. The decreasing population in Manhattan is most likely a direct result of the borough's evolution into being the <a href="https://en.wikipedia.org/wiki/Central_business_district" target="_blank">Central Business District</a> of the world, thus <i>pushing</i> citizens out into the other boroughs. 

Furthermore, we also recall from Figure 3 that construction generally decreased significantly during the 1940s in all boroughs aside from <b>Queens</b>. In Figure 4, we see that while Brooklyn, Manhattan, Bronx, and Staten Island hardly had any population inflow, the population in Queens actually increased quite noticeably during this time. We also see that the decrease in New York City's overall population during the challenging 1970s is primarily caused by a decrease in the population in Brooklyn and Bronx, as well as a stagnation in Queens's population growth. Though Glaeser (2003) argues that New York City did relatively well during this period of time, living in New York City in the 1970s was not <i>"all beer and skittles"</i>. To make matters worse, the already ongoing <a href="https://en.wikipedia.org/wiki/History_of_New_York_City_(1946%E2%80%931977)" target="_blank">fiscal crisis</a> was amplified by a large movement of middle-class residents out of the city into the suburbs, leaving a hole in the city's tax revenue reserve. This movement is apparent in Figure 4. 

Also in <b>Staten Island</b>, we see that the population is tightly linked to the constructional activity of the borough. Staten Island has a relatively small population compared to the other boroughs, but it has been steadily increasing - mostly so during the latter half of the 20th century. As mentioned in connection with Figure 3, this is probably related to the construction of critical infrastructure, such as bridges, connecting Staten Island to the rest of New York City. 

<b>`Figure 5`</b> consists of two parts: on top, it presents the foreign-born population of New York City and its boroughs from 1900 to 2010. Below, the share of foreign born relative to the total population is plotted. 
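
The relationship between the two panels can be sketched with a small calculation. The counts below are made up for a single hypothetical borough and are not taken from the population dataset.

```python
# Made-up counts for one hypothetical borough (not the real figures)
total_pop = {1930: 1_000_000, 1970: 1_200_000}
foreign_born = {1930: 400_000, 1970: 240_000}

# Share of foreign born = foreign-born count / total population, in percent
share = {year: 100 * foreign_born[year] / total_pop[year] for year in total_pop}
print(share)
```

In this toy case the foreign-born count and its share both fall while the total population grows, so the growth must come from residents born in the country.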

In <b>Figure 5</b>, it is apparent that especially <b>Manhattan</b> had a rather large share of foreign-born residents at the start of the 20th century, with almost half (corresponding to roughly 1 million people) of Manhattan's population being born in a different country in <b>1910</b>. However, throughout the following decades, all the way up <b>until 1970</b>, Manhattan saw a constant decrease both in the total number of foreign-born residents and in their relative share of the total population. In <b>1970</b>, only about 20% - or 500,000 people - of Manhattan's population were foreign born. In fact, all of the boroughs - aside from Queens - saw a decrease in both the total and relative number of foreign-born residents from around <b>1930 to 1970</b>. Recall from <b>Figure 4</b> that New York City generally experienced population growth during this period. We now know that this growth cannot be attributed to an inflow of foreign-born migrants. The drop in both the total number and the share indicates that the decrease in the foreign-born population is not due to a general migration out of the boroughs of New York City, but rather to the foreign-born population specifically either moving out or - well - dying. In <b>1921</b>, the United States enacted a new immigration policy called the <a href="https://en.wikipedia.org/wiki/Emergency_Quota_Act" target="_blank">Emergency Quota Act</a>, which <b>restricted immigration quite heavily</b> based on the nationalities of those already residing in the US. The Emergency Quota Act is a central part of US immigration history and is inevitably reflected in Figure 5. 

From <b>1970 and onward</b>, all boroughs have undergone a steady increase in foreign-born residents - both in total and in the share of the population. In 1965, the Immigration and Nationality Act of 1965 was passed, abolishing the Emergency Quota Act of 1921 in favor of a less restrictive - and less discriminating - immigration policy (Cohn 2015). The effects of this law are obvious in Figure 5. The increasing foreign-born population could also have something to do with the abovementioned movement of middle-class residents out of the city and into the suburbs, leaving space for foreign-born residents to take up. Especially <b>Queens</b> has had a noticeable inflow of foreign-born residents. Almost inversely mirroring the development in Manhattan, Queens has gone from housing hardly any foreign-born residents in 1900 to having roughly 1 million foreign-born inhabitants in 2010, constituting nearly half of the borough's population. 

<b>`Figure 6`</b> shows the respective share of white, black, asian and hispanic/latino residents within each <a href="https://en.wikipedia.org/wiki/Census_tract#United_States" target="_blank">neighborhood</a> of New York City. <i><b>Switch between racial groups using the drop-down menu</b></i> in the plot. The data is based on self-reported racial affiliation in the <a href="https://www.census.gov/programs-surveys/acs" target="_blank">American Community Survey</a>, where respondents are able to choose multiple races to accommodate mixed-race citizens. Thus, the reported shares can in principle sum to more than 100% within a neighborhood.
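
One common way to build such a drop-down in Plotly is to add one trace per group and let each menu entry toggle trace visibility. The sketch below only builds the menu specification as plain dictionaries; it assumes one trace per group has been added to a figure in the listed order, and it is not necessarily how the figure on the website is implemented.

```python
# One trace per group is assumed to have been added to the figure, in this order
groups = ["White", "Black", "Asian", "Hispanic/Latino"]

# Each button shows exactly one trace and hides the others
buttons = [
    {
        "label": group,
        "method": "update",
        "args": [{"visible": [g == group for g in groups]}],
    }
    for group in groups
]

# With Plotly, this could be applied via fig.update_layout(updatemenus=updatemenus)
updatemenus = [{"type": "dropdown", "buttons": buttons}]
```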

Overall, <b>Figure 6</b> shows that the neighborhoods of New York City are highly segregated with regard to racial characteristics. Some neighborhoods are inhabited almost entirely by residents belonging to the same single racial group. <b>The white population</b> is the most prevalent in New York City, mainly clustered on Manhattan, Brooklyn and Staten Island. <b>The asian population</b> seems to live rather close to the white population. Notice for instance the small enclave in the southern tip of Manhattan (Chinatown) or the one in the western part of Brooklyn. Other than that, the asian population is mostly concentrated in the northern part of Queens. <b>The black population</b> is more prevalent than the asian population and is mainly concentrated over two larger areas in the southern parts of Brooklyn and Queens. Bronx also houses a noticeable share of the black population. This borough, however, is predominantly inhabited by <b>the hispanic/latino population</b>, who also appear to have a community in the northern part of Queens. For a side-by-side comparison of the distribution of all four racial groups, click <a href="https://raw.githubusercontent.com/EsbenBL/EsbenBL.github.io/main/assets/exam_assets/race_dist_by_tract.png" target="_blank">this link</a>.

Zooming in on a particular neighborhood, Jackson Heights in the north-western part of Queens has been described as an ideal example of the diversified <i>"melting pot"</i> of people and cultures that New York City is often praised for being (localize 2020). The neighborhood has a high share of foreign-born residents, 100+ languages spoken, and allegedly affordable housing prices - at least compared to Manhattan. In Figure 6, we see that Jackson Heights does contain a rather diverse mix of white, asian and hispanic/latino residents. However, the neighborhood only has a modest share of black citizens, ranging between 1 and 3 percent.

Maybe the idea of New York City as a highly diversified melting pot is not exactly the most fitting concept in relation to contemporary residential patterns of the different racial groups in New York City - definitely not generally speaking. Rather, Figure 6 displays a city characterized by a rather remarkable degree of racial segregation, with various enclaved communities scattered across the city and a noticeable predominance of white residents on Staten Island and Manhattan. Recalling the question posed in the introduction to this section, a general characteristic of the contemporary Manhattanite would be that he/she is, generally speaking, white. Nearing the end of our data-driven story about New York City's <i>"triumphant"</i> journey from seed to apple, we invite the reader to interact with Figure 6, exploring specific neighborhoods and their racial composition on their own.  

`Concluding Thoughts and What's Next`

So here we are, at the end of our tour de force through New York City's journey from small seed to Big Apple. What have we learned about New York City's urbanizational triumph over the past 200 years? 

The buildings in New York City tell a story about economic prosperity/depression, social tensions, energy crises, and war (Figure 2). We see the clear and influential consequences of national immigration legislation on the population of the so-called "city built by immigrants".

An exhaustive mapping of all the numerous factors that have played a part in creating New York City is too extensive and complex for this format. We do have some ideas, though, on where one could continue.

For further investigation of how New York City came to be a renowned metropolis, we find it interesting to look at the social consequences of the construction of critical infrastructure over time. We have already touched upon it a bit with the construction of the Verrazzano-Narrows Bridge or the story of overpasses unfit for buses, but it could be interesting to dive even deeper into how single structures reflect the social climate of a given time. 
Another point of departure for future examinations could be to look at the development in housing prices and the social consequences thereof.

# **4 Genre <a id="Genre"></a>**

When constructing a narrative, visualization-based story it is essential to have a well-thought-out plan, and to do this we take inspiration from Segel and Heer (2010). Segel & Heer themselves draw on the work of artists and journalists to further their own understanding of narrative visualizations and storytelling through visualizations (Segel & Heer 2010:2). Segel & Heer's paper is mostly focused on stories told through a single visualization, which differs somewhat from this project, as we use multiple figures spanning multiple genres. But if one were to use Segel & Heer's terminology for the whole project, we would be closest to the magazine visualization genre, because we have chosen to use quite a lot of text and still figures (Segel & Heer 2010:7). As we are seeking to make a not too uptight but still linear and scientific data blog post, we concluded that the magazine genre presented us with the clearest way to formulate a coherent scientific argument using our visualizations. On a final note, we also take to heart one of their key findings regarding narrative structure and messaging: they state that the articles show a pattern of under-utilization of common narrative messaging techniques, such as commentaries, repetition, multimessaging, and annotations to emphasize key observations (Segel & Heer 2010:8). They hypothesize that this under-utilization makes the visualizations feel more like a “story” and less like a data tool. Keeping this in mind, we have tried to walk a fine line, keeping the website understandable and short while also thoroughly explaining key points and findings. In the following, we go through the figures and give our thoughts on the visual and narrative structure of each.

`Figure 1` is an interactive plot. In Segel & Heer's terms, one would categorize this figure as within the film/video/animation genre, with some annotated graph/map elements. Its visual narrative elements include a timebar, which helps the reader understand the overall historical angle of this project. Figure 1 has multiple interactive elements for the reader: zooming, motion, an interactive timebar, and feature distinction. The narrative structure of this visualization is linear, but it still has some elements that allow open exploration, as the reader can look further into their own areas and time periods of interest.  

`Figure 2, 3, 4, and 5` are all stills and within the magazine genre. They all have quite similarly structured visual narratives, as we present more concrete numbers and findings for the reader. This results in a somewhat heavier narrative structure, but we have taken measures to keep it as accessible as possible for the readers. One way we do this is by keeping the timebar, which supports the overall visual narrative of the article as a project that investigates the historical development of NYC. Another is by using familiar objects between the graphs, so Figures 2 & 3 share characteristics and Figures 4 & 5 share characteristics. We also keep each borough the same color throughout the project, so that it is easier for the reader to recognize.  

`Figure 6` is an interactive map, which we chose to follow the broader and more linear narrative structure of the first five figures. We now want to open up the data story by giving the reader the opportunity to look further into what they find interesting. Therefore, we also introduce some of the interactivity options in narrative structures defined by Segel & Heer, such as hover highlighting and filtering. 

`Figure 7` is a visualization consisting of four still maps and within the magazine genre. We have chosen not to place it directly within the data story, but instead to offer it as an extra if the reader wishes to compare the four maps of racial composition side by side. We have chosen to keep this figure simple with a minimal amount of interaction, as this figure's purpose is to give an overall perspective on the highly segregated areas of NYC and the five boroughs. 

# **5 Visualizations<a id="Visualizations"></a>**

In this section, we plot and explain the seven visualizations we have chosen to include in our data story. For each visualization, we start by presenting the code needed for the given visualization, which naturally concludes with plotting the visualization. Then, we present our reasoning behind our choice of visualization and its contribution to the overall narrative. 

### **Visualization 1: The Temporal and Geographical Development of New York City's Residential and Commercial Body<a id="Figur_1"></a>**

First, we define a function that creates a colormap that is compatible with folium's `HeatMapWithTime`. Our goal with this is to distinguish between residential and commercial/industrial construction using colors.

In [None]:
def color_gradient_heatmap(hexcode, n_gradients):
    ''' Create colormap from 'White' to the specified hexcode with n_gradients ''' 
    # Linear colormap running from white to the requested color
    _cmap = mcolors.LinearSegmentedColormap.from_list(
        "Custom", ["#FFFFFF", hexcode], N=n_gradients
    )

    # Map each stop in [0, 1] to the hex code of the corresponding color
    gradient_dict = {i / n_gradients: mcolors.rgb2hex(_cmap(i)) for i in range(n_gradients + 1)}

    return gradient_dict
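
To make the shape of the returned dictionary concrete, the sketch below builds a similar stop-to-color mapping in pure Python. It swaps the matplotlib colormap for plain linear interpolation of the RGB channels, so the intermediate colors may differ slightly from the function above; the helper names `hex_to_rgb` and `white_to_color_stops` are hypothetical.

```python
def hex_to_rgb(hexcode):
    """Convert '#RRGGBB' to a tuple of three integers in 0..255."""
    hexcode = hexcode.lstrip("#")
    return tuple(int(hexcode[i:i + 2], 16) for i in (0, 2, 4))


def white_to_color_stops(hexcode, n_gradients):
    """Map stops in [0, 1] to hex colors blended linearly from white to hexcode."""
    target = hex_to_rgb(hexcode)
    stops = {}
    for i in range(n_gradients + 1):
        t = i / n_gradients
        # Move each channel from 255 (white) towards the target channel
        rgb = [round(255 + (c - 255) * t) for c in target]
        stops[t] = "#{:02x}{:02x}{:02x}".format(*rgb)
    return stops


stops = white_to_color_stops("#0097FF", 4)
print(stops)
```

The keys run from 0 to 1 and the values go from white to the chosen color, which is the form a heatmap `gradient` argument expects.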

As the `HeatMapWithTime` module does not have a built-in method for distinguishing between two categories/groups of datapoints (residential vs. non-residential), we have to tweak the module a bit. Below, we define the class `HeatMapWithTimeAdditional`, which allows us to add an extra layer to the `HeatMapWithTime`, where both the main and additional layer share the same control bar / slider. Thanks to @Conengmo for providing a solution to this in the following GitHub issue: https://github.com/python-visualization/folium/issues/1062

In [None]:
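# Imports needed by the class below (not loaded in the package cell above)
from folium.map import Layer
from jinja2 import Template

class HeatMapWithTimeAdditional(Layer):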
    _template = Template("""
        {% macro script(this, kwargs) %}
            var {{this.get_name()}} = new TDHeatmap({{ this.data }},
                {heatmapOptions: {
                    radius: {{this.radius}},
                    minOpacity: {{this.min_opacity}},
                    maxOpacity: {{this.max_opacity}},
                    scaleRadius: {{this.scale_radius}},
                    useLocalExtrema: {{this.use_local_extrema}},
                    defaultWeight: 1,
                    {% if this.gradient %}gradient: {{ this.gradient }}{% endif %}
                }
            }).addTo({{ this._parent.get_name() }});
        {% endmacro %}
    """)

    def __init__(self, data, name=None, radius=15,
                 min_opacity=0, max_opacity=0.6,
                 scale_radius=False, gradient=None, use_local_extrema=False,
                 overlay=True, control=True, show=True):
        super(HeatMapWithTimeAdditional, self).__init__(
            name=name, overlay=overlay, control=control, show=show
        )
        self._name = 'HeatMap'
        self.data = data

        # Heatmap settings.
        self.radius = radius
        self.min_opacity = min_opacity
        self.max_opacity = max_opacity
        self.scale_radius = 'true' if scale_radius else 'false'
        self.use_local_extrema = 'true' if use_local_extrema else 'false'
        self.gradient = gradient

The tweaking is not over yet, though. Aside from `HeatMapWithTimeAdditional`, we also need to do some HTML/CSS tweaking to add a legend to the plot. Thanks to the TileMill documentation here: https://tilemill-project.github.io/tilemill/docs/guides/advanced-legends/

In [None]:
legend_template = """
{% macro html(this, kwargs) %}

<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>jQuery UI Draggable - Default functionality</title>
  <link rel="stylesheet" href="//code.jquery.com/ui/1.12.1/themes/base/jquery-ui.css">

  <script src="https://code.jquery.com/jquery-1.12.4.js"></script>
  <script src="https://code.jquery.com/ui/1.12.1/jquery-ui.js"></script>
  
  <script>
  $( function() {
    $( "#maplegend" ).draggable({
                    start: function (event, ui) {
                        $(this).css({
                            right: "auto",
                            top: "auto",
                            bottom: "auto"
                        });
                    }
                });
});

  </script>
</head>
<body>

 
<div id='maplegend' class='maplegend' 
    style='position: absolute; z-index:9999; border:2px solid grey; background-color:rgba(255, 255, 255, 0.8);
     border-radius:6px; padding: 10px; font-size:14px; left: 730px; top: 430px;'>
     
<div class='legend-title'>Primary Lot Use</div>
<div class='legend-scale'>
  <ul class='legend-labels'>
    <li><span style='background:#FF2B2B;opacity:0.7;height:15px;width:15px;border-radius:50%;display:block;float:left;align-items:center;'></span>Residential Construction</li>
    <li><span style='background:#0097FF;opacity:0.7;height:15px;width:15px;border-radius:50%;display:block;float:left;align-items:center;'></span>Commercial/Industrial Construction</li>
    <li><strong>To Start:</strong> Turn on the layers you want to display through <br> the layer-control in the top right corner and press play to<br>initiate the temporal animation.</li>
  </ul>
</div>
</div>
 
</body>
</html>

<style type='text/css'>
  .maplegend .legend-title {
    text-align: left;
    margin-bottom: 5px;
    font-weight: bold;
    font-size: 90%;
    }
  .maplegend .legend-scale ul {
    margin: 0;
    margin-bottom: 5px;
    padding: 0;
    float: left;
    list-style: none;
    }
  .maplegend .legend-scale ul li {
    font-size: 80%;
    list-style: none;
    margin-left: 0;
    line-height: 18px;
    margin-bottom: 2px;
    }
  .maplegend ul.legend-labels li span {
    display: block;
    float: left;
    height: 16px;
    width: 30px;
    margin-right: 5px;
    margin-left: 0;
    border: 1px solid #999;
    }
  .maplegend .legend-source {
    font-size: 80%;
    color: #777;
    clear: both;
    }
  .maplegend a {
    color: #777;
    }
</style>
{% endmacro %}"""

Finally, we are mostly set up. Below, we plot the geographical and temporal development of New York City's residential and commercial/industrial body. The red markers represent residential construction while the blue markers are commercial/industrial construction. We have also outlined the five boroughs of New York City: Manhattan, Brooklyn, Bronx, Queens, and Staten Island. 

**Please Note**: By default, the plot is paused and the `layers` controlling whether the residential and/or commercial construction is visible are both turned off. Turn the layers on through the layer control in the top right corner and press play to initiate the temporal animation. The division into `layers` allows the reader to explore the geographical construction of residential and commercial buildings over time, either separately or combined.
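
Both heatmap layers take their data as a list with one inner list of `[lat, lon]` points per time step. The sketch below shows one way to build that structure, assuming five-year steps as the `year+4` labels in the plotting code suggest. The `records`, `step`, and `frames` names and all coordinates are made up for illustration.

```python
# Made-up (year_built, latitude, longitude) records, not real PLUTO rows
records = [
    (1801, 40.71, -74.00),
    (1803, 40.72, -74.01),
    (1807, 40.70, -73.99),
]

step = 5                               # width of each time step in years
data_range = range(1800, 1810, step)   # start year of each time step

# One inner list of [lat, lon] pairs per time step, in chronological order
frames = [
    [[lat, lon] for year, lat, lon in records if start <= year < start + step]
    for start in data_range
]

# Matching slider labels, one per time step
labels = [f"Years: {start}-{start + step - 1}" for start in data_range]
print(labels, frames)
```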

In [None]:
## Params 
radius = 5
opacity = 0.8
blur = 0.8
show = False
overlay = True
width = 1050
height = 600

# Base fig
fig = Figure(width=width, height=height)

# Create NYC map 
NY_coord = [40.690610, -73.935242]
NY_map = folium.Map(NY_coord,
                    width=width, 
                    height=height,
                    tiles =None,
                    zoom_start = 10,
                    min_zoom = 10,
                    max_lat = 41,
                    min_lat = 40.3,
                    max_lon = -73.3,
                    min_lon = -74.5,
                    max_bounds = True)

# Name base layer
folium.TileLayer('cartodbpositron', name='NYC Buildings').add_to(NY_map)

# Colour gradients for the commercial and residential heatmaps
commercial_gradient = color_gradient_heatmap("#0097FF", 100)
residential_gradient = color_gradient_heatmap("#FF2B2B", 100)

## Residential Heatmap
plugins.HeatMapWithTime(residential_list_of_lists_with_coordinates,
                        index = [f"Years: {year}-{year+4}" for year in data_range],
                        auto_play = False,
                        max_opacity=opacity,
                        radius = radius,
                        blur = blur,
                        overlay = overlay,
                        show = show,
                        name = "Residential Buildings",
                        gradient = residential_gradient,
                        ).add_to(NY_map)

## Add Commercial Heatmap 
HeatMapWithTimeAdditional(commercial_list_of_lists_with_coordinates,
                            radius = radius,
                            max_opacity=1,
                            overlay = overlay,
                            show = show, 
                            name = "Commercial Buildings",
                            gradient = commercial_gradient).add_to(NY_map)



# Add borough outlines
folium.GeoJson(new_york_boroughs_map[["geometry", "name"]],
                name = "Borough",
                tooltip=folium.GeoJsonTooltip(fields=['name'], aliases=['Borough:']),
                style_function =  lambda x: {"fillColor":"#18B406" if x["properties"]["name"]=="Staten Island" else \
                                                            "#A15C03" if x["properties"]["name"]=="Queens" else \
                                                            "#FFAA00" if x["properties"]["name"]=="Brooklyn" else \
                                                            "#FF0000" if x["properties"]["name"]=="Manhattan" else \
                                                            "#2BAAFF" if x["properties"]["name"]=="Bronx" else "",
                                                            "color":"black",
                                                            "weight":0.5}).add_to(NY_map)

folium.LayerControl(collapsed=False).add_to(NY_map)


# add map to fig 
fig.add_child(NY_map)


# Add legend from template
macro = MacroElement()
macro._template = Template(legend_template)
fig.get_root().add_child(macro)

# Show plot 
fig

In [None]:
fig.save("Plots/construction_heatmap.html") # Saving the figure for use on the website

### **Thoughts on Figure 1**
The heatmap displays the geographical distribution of residential vs. commercial/industrial construction over time, which is meant to give a historical introduction to how New York City has developed physically over the past 200-odd years. Furthermore, it provides an overall geographical understanding of New York City and the positional relation between its five boroughs. Again, **remember to turn the layers on** through the layer control in the top right corner and press play to start the animation.
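
A time-indexed heatmap of this kind takes one list of `[latitude, longitude]` pairs per time step. The following is a minimal, dependency-free sketch of how such a list of lists could be assembled. The record layout, the helper name `group_coordinates_by_period`, and the toy coordinates are assumptions made for illustration, and the five-year bin width is chosen only to match the `Years: {year}-{year+4}` labels used above. It is not necessarily how the lists in this notebook were built.

```python
from collections import defaultdict


def group_coordinates_by_period(records, start_year, end_year, step=5):
    """Return (period_starts, list_of_lists) for a time-indexed heatmap.

    `records` is an iterable of (year_built, latitude, longitude) tuples.
    Hypothetical helper: names and record layout are illustrative only.
    """
    period_starts = list(range(start_year, end_year + 1, step))
    buckets = defaultdict(list)
    for year, lat, lon in records:
        if start_year <= year < period_starts[-1] + step:
            # Snap the year down to the start of its bin.
            bin_start = start_year + ((year - start_year) // step) * step
            buckets[bin_start].append([lat, lon])
    # Keep empty periods so the data stays aligned with the time index.
    return period_starts, [buckets[p] for p in period_starts]


# Toy records (made-up coordinates, not taken from the dataset).
toy_records = [
    (1901, 40.71, -74.00),
    (1903, 40.72, -73.99),
    (1907, 40.65, -73.95),
    (1915, 40.80, -73.92),
]
periods, coords = group_coordinates_by_period(toy_records, 1900, 1915)
labels = [f"Years: {year}-{year + 4}" for year in periods]
```

Keeping empty periods in the output means the number of time steps always equals the number of index labels, so the animation does not silently skip quiet periods.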

From a more meta-plot perspective, we are using two different types of encodings, namely `position` where longitudinal and latitudinal coordinates are used to locate the building lots on a map, and `color` to distinguish between the two categories, residential and commercial/industrial construction. We are aware that these encodings, especially color, are not best suited for quantitative and comparative impressions of the data. However, for geographical exploration and for overall introductory purposes, we find the temporal heatmap to be really well suited.

Activating both the residential and the commercial layer and pressing play, one can see how New York City's journey from seed to apple started with residential and industrial life making their way from Manhattan out into Brooklyn during the 19th century, rapidly spreading throughout the early 20th century, and ultimately taking up all available space in the five boroughs. Quite quickly, starting around 1900, the heatmap becomes rather cluttered, and at times one can hardly spot any of the underlying map simply due to the excessive degree of construction. The plot is interactive, inviting the reader to zoom in and explore the development in areas of their own interest, switching between residential and commercial construction (or both jointly) in various periods of time.

One has to keep in mind the limitations of this dataset (see Section: [*Dataset 1: Land Use in New York*](#Dataset_of_the_buildings)). The data for Figure 1 carries some uncertainty regarding the apparent "boom" in construction starting in 1900, which may be caused either by faulty records prior to this point or by uncertain build years having been recorded as a round number, e.g. 1900.
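
One rough way to probe the round-number concern is to check how large a share of the recorded build years fall on exact multiples of ten. The sketch below does this on invented toy data. The helper name `share_of_round_years` and the list of years are hypothetical, and the heuristic is only an indication, not a test of data quality.

```python
from collections import Counter


def share_of_round_years(years, multiple=10):
    """Fraction of build years that are exact multiples of `multiple`."""
    years = list(years)
    if not years:
        return 0.0
    round_count = sum(1 for year in years if year % multiple == 0)
    return round_count / len(years)


# Toy build years (invented for illustration, not dataset values).
toy_years = [1899, 1900, 1900, 1900, 1901, 1910, 1913, 1920, 1925, 1931]

most_common_year, occurrences = Counter(toy_years).most_common(1)[0]
round_share = share_of_round_years(toy_years)
```

If build years were spread evenly, roughly one in ten would end in zero, so a much larger share, or a single year that dominates the counts, would hint at placeholder values.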


### **Figure 2: The Amount of Construction Per Decade<a id="Figur_2"></a>**

In [None]:
fig, ax  = plt.subplots(figsize = (10,6))
fig.patch.set_facecolor("#FFF6E9")

residential_squarefeet_by_decade = land_use_dataaset_trimmed.groupby("yearbuilt_intervals").resarea.sum().sort_index()
commercial_squarefeet_by_decade = land_use_dataaset_trimmed.groupby("yearbuilt_intervals").comarea.sum().sort_index()

residential_squarefeet_by_decade.plot.bar(ax=ax, color = "#E9655C",  alpha = 0.8, width = 0.8)
commercial_squarefeet_by_decade.plot.bar(ax=ax, bottom = residential_squarefeet_by_decade, alpha = 0.8, color = "skyblue", width = 0.8)

ax.yaxis.get_offset_text().set_visible(False) # Removes the scientific notation
#ax.set_title("Constructed Square Feet by Decade", size = 20)
ax.set_ylabel("Square Feet (in hundred millions)", size = 20)
ax.set_xlabel("", size = 20)
ax.legend(["Residential", "Commercial"],loc='upper center', 
          bbox_to_anchor=(0.5, -0.15),
          ncol = 2,
          fontsize="xx-large")

plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.tight_layout()  # apply the layout before saving so the saved file matches what is shown

plt.savefig("Plots/squarefeet_by_decade.png", dpi = 300, bbox_inches = "tight")
plt.show()

### **Thoughts on Figure 2**

Having provided a geographical introduction to the physical construction of New York City over time in Figure 1, we now want to get a more quantified and comparative understanding of how New York City grew to look as we know it today. In terms of encodings, we therefore switch to `length` rather than `position` in Figure 2, visualizing the sum of square feet constructed within each decade in a stacked barplot that distinguishes between residential and commercial construction. We stack the bars to make it easier to compare the total amount of construction between the decades. We are still at the beginning of our data narrative, so the scope of the plot is still rather broad, i.e. spanning 100+ years and not distinguishing between boroughs, neighborhoods or census tracts.
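
Since the y-axis label states the unit ("in hundred millions"), the tick labels need to be expressed in that same unit. As a small sketch of one way to make this explicit, the function below converts a raw square-feet value into a label in units of 100 million. The name `hundred_millions_label` is hypothetical; its `(value, pos)` signature is chosen so that it could be wrapped in `matplotlib.ticker.FuncFormatter`.

```python
def hundred_millions_label(value, pos=None):
    """Format a raw square-feet value in units of 100 million.

    The (value, pos) signature matches what a matplotlib FuncFormatter passes in.
    """
    return f"{value / 100_000_000:g}"


# Example raw axis values in square feet.
example_ticks = [0, 250_000_000, 500_000_000, 1_000_000_000]
example_labels = [hundred_millions_label(tick) for tick in example_ticks]
```

With matplotlib, this could be attached via `ax.yaxis.set_major_formatter(FuncFormatter(hundred_millions_label))`, which ties the tick labels to the stated unit regardless of which offset matplotlib would otherwise pick.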

### **Figure 3: Decadal Construction Within the Boroughs<a id="Figur_3"></a>**

We start by making a dictionary of the different colours for the boroughs. This is also to make sure the boroughs have the same colours as in Figure 1 throughout the project. 

In [None]:
BOROUGH_COLORS = {"Bronx":"#2BAAFF",
                  "Brooklyn":"#FFAA00",
                  "Manhattan":"#FF0000",
                  "Queens":"#A15C03",
                  "Staten Island":"#18B406"}

In [None]:
import matplotlib.pyplot as plt
import numpy as np

fig, axs = plt.subplots(nrows=1, ncols=5, figsize=(20, 5), sharey=True)
fig.patch.set_facecolor("#FFF6E9")
#fig.suptitle('Constructed Square Feet by Decade and Borough', fontsize=22, y=1.05)

categories = land_use_dataaset_trimmed['borough'].unique()

for i, category in enumerate(categories):
    ax = axs[i]
 
    residential_squarefeet_by_decade = land_use_dataaset_trimmed[land_use_dataaset_trimmed['borough'] == category].groupby("yearbuilt_intervals").resarea.sum().sort_index()
    commercial_squarefeet_by_decade = land_use_dataaset_trimmed[land_use_dataaset_trimmed['borough'] == category].groupby("yearbuilt_intervals").comarea.sum().sort_index()

    ax.bar(residential_squarefeet_by_decade.index, residential_squarefeet_by_decade.values, color="#E9655C", alpha=0.8, width=0.8)
    ax.bar(commercial_squarefeet_by_decade.index, commercial_squarefeet_by_decade.values, bottom=residential_squarefeet_by_decade.values, color="skyblue", alpha=0.8, width=0.8)
    ax.yaxis.get_offset_text().set_visible(False) # Removes the scientific notation

    ax.set_title(category, color = BOROUGH_COLORS[category], size=25)
    ax.set_xlabel("")
    ax.set_xticks(np.arange(0, len(residential_squarefeet_by_decade.index), 1))
    ax.set_xticklabels(residential_squarefeet_by_decade.index, rotation=90, fontsize=15)

    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)


    if i==0:
        ax.set_ylabel("Square Feet (in hundred million)", size=20)
        ax.set_yticks(ticks = ax.get_yticks(), labels = ax.get_yticks()/100_000_000, size=15)
    else:
        ax.tick_params(
            axis='y',      
            which='both',      # both major and minor ticks are affected
            left=False,        # hide the tick marks on the left edge
            labelright=False   # keep tick labels off the right edge
        )

fig.legend(["Residential", "Commercial"], loc="upper center", ncol=2, fontsize="xx-large", bbox_to_anchor=(0.5, 0))

plt.tight_layout()

plt.savefig("Plots/squarefeet_by_borough.png", dpi = 300, bbox_inches = "tight")
plt.show()


### **Thoughts on Figure 3**
Recalling the geographical awareness that the reader gained in Figure 1 and the general impression of New York City's physical evolution over the decades in Figure 2, we now go a step deeper in Figure 3. Here, we examine the constructional development of each of the five boroughs in New York City over time. We use the same stacked barplot presentation as in Figure 2 to help the reader intuitively understand that we are still analysing the same topic of residential/commercial land lot use. Furthermore, the five boroughs share the same y-axis, making them more visually comparable.

### **Figure 4: The Population of New York City and Its Boroughs<a id="Figur_4"></a>**

In [None]:
fig, ax = plt.subplots(figsize = (10,6))
fig.patch.set_facecolor("#FFF6E9")

colors = ["#000000", "#2BAAFF", "#FFAA00", "#FF0000", "#A15C03", "#18B406"] # The colors we use for each borough

population_by_borough.plot(color= colors, ax=ax, legend = False, alpha = 0.7)
NYC_pop_projection.plot(color = colors, style="--", legend = False, ax=ax, alpha = 0.7)

#ax.set_title("Population by Borough", size = 20)
ax.set_ylabel("Population (in millions)", size = 18)
ax.set_xlabel("Year", size = 15)
plt.xticks(list(population_by_borough.index) + [2030, 2040], fontsize=15)
plt.yticks(fontsize=15)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.axvline(2020, linestyle = "--", linewidth = 1, c="gray", alpha = 0.5)

# Change y-ticks
ax.set_yticklabels([int(tick/1000000) for tick in ax.get_yticks()])

# Annotate
ax.text(2040, NYC_pop_projection.loc[2040,'New York City'], 'New York\n     City', size=12, color="#000000")
ax.text(2041, NYC_pop_projection.loc[2040,"Bronx"]-150_000, 'Bronx', size=12, color="#2BAAFF")
ax.text(2041, NYC_pop_projection.loc[2040,"Brooklyn"], 'Brooklyn', size=12, color="#FFAA00")
ax.text(2041, NYC_pop_projection.loc[2040,"Manhattan"]+150_000, 'Manhattan', size=12, color="#FF0000")
ax.text(2041, NYC_pop_projection.loc[2040,"Queens"], 'Queens', size=12, color="#A15C03")
ax.text(2041, NYC_pop_projection.loc[2040,"Staten Island"]-200_000, 'Staten\nIsland', size=12, color="#18B406")
# Text and Arrow for population projection 
ax.text(2023, 5_000_000, 'Population\nprojection', size=12, color='gray')
ax.annotate("", xy=(2038, 4_700_000), xytext=(2022.5, 4_700_000),
            arrowprops=dict(arrowstyle="->", color ="gray", alpha=0.7))

plt.tight_layout()  # apply the layout before saving so the saved file matches what is shown

plt.savefig("Plots/population_by_borough.png", dpi = 300, bbox_inches = "tight")
plt.show()

### **Thoughts on Figure 4**

Having examined the constructional evolution of New York City and its five boroughs from a residential versus commercial perspective, `Figure 4` turns to the development in inhabitants to gain a better understanding of the people who turned the city from a British settlement into the *Capital of the World*. The insights from `Figure 1-3` will be actively included to attain deeper knowledge about how New York City's inhabitants are affected by the physical construction of the city, or maybe vice versa. For instance, what happens when a city prioritizes building commercial structures rather than residential ones, as is the case with Manhattan? We plot a simple lineplot of the development in inhabitants from 1900 to 2020 within each of the five boroughs and in total. We have also visualised population projections for 2030 and 2040 with dashed lines, to make it easier for the reader to discern between observed and estimated data.

### **Figure 5: New York City's Foreign Born Population<a id="Figur_5"></a>**

In [None]:
import matplotlib.pyplot as plt

fig, axs = plt.subplots(nrows=2, ncols=1, figsize=(10, 10))
fig.patch.set_facecolor("#FFF6E9")

#fig.suptitle("Foreign Born Population by Borough from 1900-2010", size=20)
## Foreign born population
colors = ["#000000", "#2BAAFF", "#FFAA00", "#FF0000", "#A15C03", "#18B406"]
foreign_born_pop.plot(color=colors, ax=axs[0], legend=False, alpha=0.7)
axs[0].set_ylabel("Foreign Born Population (in millions)", size=18)
axs[0].set_xticks(share_of_foreign_born_by_decade.index)
axs[0].set_xlabel("", size=15)
axs[0].yaxis.get_offset_text().set_visible(False) # Removes the scientific notation
#axs[0].set_yticklabels(['%.1f' % float(tick/1000000)+ ' mil.' for tick in axs[0].get_yticks()])
axs[0].set_xticklabels("")
axs[0].tick_params(axis='both', which='major', labelsize=15)
axs[0].spines['top'].set_visible(False)
axs[0].spines['right'].set_visible(False)

# Annotate
axs[0].text(2011, foreign_born_pop.loc[2010,'New York City'], 'New York\n     City', size=12, color=colors[0])
axs[0].text(2011, foreign_born_pop.loc[2010,"Bronx"]+50_000, 'Bronx', size=12, color=colors[1])
axs[0].text(2011, foreign_born_pop.loc[2010,"Brooklyn"]-30_000, 'Brooklyn', size=12, color=colors[2])
axs[0].text(2011, foreign_born_pop.loc[2010,"Manhattan"]-50_000, 'Manhattan', size=12, color=colors[3])
axs[0].text(2011, foreign_born_pop.loc[2010,"Queens"], 'Queens', size=12, color=colors[4])
axs[0].text(2011, foreign_born_pop.loc[2010,"Staten Island"]-50_000, 'Staten\nIsland', size=12, color=colors[5])

## Share of foreignborn in population
colors = ["#2BAAFF", "#FFAA00", "#FF0000", "#A15C03", "#18B406"]
share_of_foreign_born_by_decade.plot(color=colors, ax=axs[1], legend=False, alpha=0.7)
axs[1].set_title("", size=20)
axs[1].set_ylabel("Share of Foreign Born Population (%)", size=18)
axs[1].set_xlabel("Year", size=15)
axs[1].set_xticks(share_of_foreign_born_by_decade.index)
axs[1].tick_params(axis='both', which='major', labelsize=15)
axs[1].spines['top'].set_visible(False)
axs[1].spines['right'].set_visible(False)

# Annotate
axs[1].text(2011, share_of_foreign_born_by_decade.loc[2010,"Bronx"], 'Bronx', size=12, color=colors[1-1])
axs[1].text(2011, share_of_foreign_born_by_decade.loc[2010,"Brooklyn"], 'Brooklyn', size=12, color=colors[2-1])
axs[1].text(2011, share_of_foreign_born_by_decade.loc[2010,"Manhattan"], 'Manhattan', size=12, color=colors[3-1])
axs[1].text(2011, share_of_foreign_born_by_decade.loc[2010,"Queens"], 'Queens', size=12, color=colors[4-1])
axs[1].text(2011, share_of_foreign_born_by_decade.loc[2010,"Staten Island"], 'Staten\nIsland', size=12, color=colors[5-1])

plt.tight_layout()  # apply the layout before saving so the saved file matches what is shown
plt.savefig("Plots/foreign_And_share_population_by_borough.png", dpi = 300, bbox_inches = "tight")
plt.show()


### **Thoughts on Figure 5**
Having examined the general evolution of New York City's population over time, we want to take a closer look at a cornerstone in the history of not just how New York City came to be, but how all of the United States managed to become the world's third most populous country, namely immigration. Thus, in `Figure 5` we visualize the number of foreign born residents in New York City across time (the top plot) as well as the share that the foreign born inhabitants make up of the total population within each borough (the bottom plot). We do this by stacking two lineplots vertically in one figure, where both plots cover the same years on the x-axis. This makes it easier to compare the absolute and relative degree of foreign born immigration at given points in time. Furthermore, we have chosen to keep the same style for the lineplots in `Figure 5` as in `Figure 4`, to give the reader an intuitive understanding that we are still examining the population of NYC. We continue to use the same colors to discern between the boroughs as in all the preceding plots.
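
The relation between the two panels is simply the foreign born count divided by the total population in the same year. The following sketch illustrates this with made-up numbers for a single hypothetical borough. The helper name `foreign_born_share` and all values are assumptions for illustration and do not come from the datasets used here.

```python
def foreign_born_share(foreign_born, total_population):
    """Return the foreign born share in percent for each year present in both inputs."""
    shares = {}
    for year, foreign in foreign_born.items():
        total = total_population.get(year)
        if total:  # skip missing or zero totals
            shares[year] = 100 * foreign / total
    return shares


# Made-up numbers for a single hypothetical borough.
toy_foreign_born = {1900: 200_000, 1950: 300_000, 2000: 450_000}
toy_population = {1900: 500_000, 1950: 1_500_000, 2000: 1_500_000}
toy_share = foreign_born_share(toy_foreign_born, toy_population)
```

In this toy example the absolute count rises between 1900 and 1950 while the share falls, because the total population grows faster. This is the kind of divergence that showing both the absolute and the relative plot is meant to reveal.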

### **Figure 6: An Interactive Look at New York City's Racial Composition<a id="Figur_6"></a>**

We start by building the hover text, stored as a new `hovertext` column, which we use in Figure 6.

In [None]:
census_tract_data["hovertext"] = "<br>Neighborhood: " + census_tract_data.NTAName +\
                                 "<br>Borough: " + census_tract_data.BOROUGH +\
                                 "<br>Total population: " + census_tract_data.total_pop.apply(round).astype(str)

For the colorscale of the plot, we define a helper function.

In [None]:
def cmap_with_alpha(alpha=0.7, n_colors=100, plt_cmap='YlOrBr'):
    cmap_rgba_vals = plt.get_cmap(plt_cmap, n_colors) # get rgba values from the plt_cmap
    cmap_rgb_vals = [tuple(map(lambda x: x*255, cmap_rgba_vals(i)[:3])) for i in range(n_colors)][5:] # convert to rgb + slice to exclude white as lower color
    GEOMAP_CMAP = [f"rgba{rgb+tuple([alpha])}" for rgb in cmap_rgb_vals] # set alpha and convert to plotly-compatible format
    return GEOMAP_CMAP
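
The function above multiplies each colour channel by 255 without rounding, so the resulting strings carry float channel values. As a hedged variant, the sketch below rounds the channels to integers, which is the more conventional CSS-style `rgba(...)` form. The helper name `rgb_to_plotly_rgba` and the two example colours are illustrative assumptions, not the exact `YlOrBr` values.

```python
def rgb_to_plotly_rgba(rgb, alpha=0.7):
    """Convert an (r, g, b) tuple with channels in [0, 1] to an 'rgba(...)' string."""
    red, green, blue = (round(channel * 255) for channel in rgb[:3])
    return f"rgba({red}, {green}, {blue}, {alpha})"


# Two illustrative colours (a light yellow and a dark brown).
example_colors = [(1.0, 1.0, 0.8), (0.4, 0.2, 0.0)]
example_scale = [rgb_to_plotly_rgba(color) for color in example_colors]
```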

Now we are ready to code and present the visualisation

In [None]:
label_dict = {'pct_white_one_race':"Pct. White Residents",
              'pct_black_one_race':"Pct. Black Residents",
              'pct_asian_one_race':"Pct. Asian Residents",
              'pct_hispanic_or_latino_any':"Pct. Hispanic & Latino Residents"}

# we need to add this to select which trace 
# is going to be visible
visible = np.array(list(label_dict.keys()))

# define traces and buttons at once
traces = []
buttons = []
for col, label in label_dict.items():

    traces.append(go.Choroplethmapbox(geojson=json.loads(census_tract_data.to_json()),
                                    z = census_tract_data[col],
                                    locations = census_tract_data.index,
                                    colorscale=cmap_with_alpha(),
                                    #colorbar_title = label,
                                    colorbar=dict(
                                        title=label,
                                        titleside='top',
                                        len = 0.8,
                                        orientation = "h",
                                        y = -0.17,
                                        tickfont = dict(size=15) # font size of the colorbar tick labels
                                    ),
                                    marker_line_width=0.1,
                                    visible= True if col==list(label_dict.keys())[0] else False,
                                    text = census_tract_data["hovertext"],
                                    hovertemplate= f"{label}: " + "%{z}%"+
                                                    "%{text}<extra></extra>"
                                    ))

    buttons.append(dict(label=label,
                        method="update",
                        args=[{"visible":list(visible==col)},
                              {"title":""}]))

updatemenus = [{"active":0,
                "buttons":buttons,
                "direction":'down',
                "showactive":True,
                "x":0.02,
                "y":0.98,
                "xanchor":"left",
                "yanchor":"top"}]


# Show figure
fig = go.Figure(data=traces,
                layout=dict(updatemenus=updatemenus))

fig.layout['title']['text'] = ""
                  


fig.update_layout(mapbox_style="carto-positron",
                    height = 600,
                    width = 1000,
                    autosize=True,
                    paper_bgcolor= "#FFF6E9",
                    margin={"r":0,"t":0,"l":0,"b":0},
                    mapbox=dict(center={"lat": 40.690610, "lon": -73.935242},zoom=9),
                    font_family = "Garamond",
                    font_size = 18
                    )

fig.show()

### **Thoughts on Figure 6**
With `Figure 1-5`, we now have a broad but solid impression of both the physical and social development of New York City all the way from 1800 up until today. We now wish to take a more contemporary look at New York City and how its physical and social development manifests itself in the city's residential composition today. Concretely, we want to take a look at neighborhood segregation in New York City from a racial perspective. In a multicultural melting pot such as New York City, which geographical patterns emerge when one observes it through a racial lens? For instance, how is the relatively high degree of commercial construction in Manhattan during the 20th century reflected in the characteristics of contemporary Manhattanites?

To examine New York City's contemporary racial composition, we once again opt for a map visualization in `Figure 6`. More specifically, we have chosen a choropleth map of the respective share of white, black, Asian, and Hispanic/Latino residents in the given neighborhoods. Thus, we make use of the same encodings as in `Figure 1`, i.e. `position` and `color`. As we want to utilize the granularity of the data and allow the reader to interact with it, we make the choropleth map interactive, allowing the reader to zoom in on specific neighborhoods and to switch between racial categories with a drop-down menu. Furthermore, we have added hover text presenting the exact share of residents belonging to a given racial category in a neighborhood, the total number of residents in the neighborhood, the name of the neighborhood, and the borough within which the neighborhood lies. This information appears when hovering the cursor over a particular neighborhood. To some degree, this also addresses the weakness of using color as an encoding, which we also discussed for `Figure 1`.
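
The drop-down works by giving each button a boolean mask over the traces, so that selecting a category shows exactly one trace and hides the rest. The following is a minimal plain-Python sketch of that idea. The helper name `visibility_masks` is hypothetical, and the code cell above builds the same masks with a NumPy comparison instead.

```python
def visibility_masks(trace_keys):
    """Map each key to a list of booleans that shows only that key's trace."""
    return {
        key: [other == key for other in trace_keys]
        for key in trace_keys
    }


# The four column names used for the traces above.
columns = [
    "pct_white_one_race",
    "pct_black_one_race",
    "pct_asian_one_race",
    "pct_hispanic_or_latino_any",
]
masks = visibility_masks(columns)
```

Each mask has one entry per trace, in the order the traces were added to the figure, which is why the buttons and traces are built in the same loop.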

### **Figure 7: New York City's Racial Composition Today<a id="Figur_7"></a>**

In [None]:
fig, ax = plt.subplots(2,2, figsize = (10,10))
fig.patch.set_facecolor("#FFF6E9")

# White Residents
census_tract_data.plot(column='pct_white_one_race', cmap='YlOrBr', edgecolor = "grey", linewidth=0.0, ax = ax[0][0])
new_york_boroughs_map.plot(facecolor = "none", edgecolor = "black", linewidth = 0.1, ax = ax[0][0])
ax[0][0].set_axis_off()

# Add a colorbar
cbar = ax[0][0].get_figure().colorbar(ax[0][0].collections[0], shrink = 0.5, location='bottom', pad = 0)
cbar.set_label('Pct. White Residents', size=15, labelpad = 5)

# Black Residents 
census_tract_data.plot(column='pct_black_one_race', cmap='YlOrBr', ax = ax[0][1])
new_york_boroughs_map.plot(facecolor = "none", edgecolor = "black", linewidth = 0.1, ax = ax[0][1])
ax[0][1].set_axis_off()
# Add a colorbar
cbar = ax[0][1].get_figure().colorbar(ax[0][1].collections[0], shrink = 0.5, location='bottom', pad = 0)
cbar.set_label('Pct. Black Residents', size=15, labelpad = 5)

# Asian Residents 
census_tract_data.plot(column='pct_asian_one_race', cmap='YlOrBr', ax = ax[1][0])
new_york_boroughs_map.plot(facecolor = "none", edgecolor = "black", linewidth = 0.1, ax = ax[1][0])
ax[1][0].set_axis_off()
# Add a colorbar
cbar = ax[1][0].get_figure().colorbar(ax[1][0].collections[0], shrink = 0.5, location='bottom', pad = 0)
cbar.set_label('Pct. Asian Residents', size=15, labelpad = 5)

# Hispanic and Latino Residents 
census_tract_data.plot(column='pct_hispanic_or_latino_any', cmap='YlOrBr', ax = ax[1][1])
new_york_boroughs_map.plot(facecolor = "none", edgecolor = "black", linewidth = 0.1, ax = ax[1][1])
ax[1][1].set_axis_off()
# Add a colorbar
cbar = ax[1][1].get_figure().colorbar(ax[1][1].collections[0], shrink = 0.5, location='bottom', pad = 0)
cbar.set_label('Pct. Hispanic/Latino Residents', size=15, labelpad = 5)

# Show the map
plt.tight_layout()

# Save 
plt.savefig("Plots/race_dist_by_tract.png", dpi = 300, bbox_inches = "tight")

plt.show()

### **Thoughts on Figure 7**
In `Figure 7`, we plot the same data as in `Figure 6` using the same style of visualisation (a choropleth map) and thus the same encodings, `position` and `color`. In `Figure 7` we use color intensity to distinguish between the neighborhoods' share of residents with a specific racial characteristic. Though color intensity can make it difficult to convey an immediately accurate impression of the relative magnitudes of the data points, as mentioned earlier, we think it does really well at making the highly racially concentrated neighborhoods stand out in the choropleth map. By showing all four maps of New York City alongside each other in the figure, it is easier for the reader to quickly gain a comparative macro understanding of the racial segregation between neighborhoods or even broader areas.

Though `Figure 6` and `Figure 7` are generally similar, they convey two different messages. In `Figure 7`, the geographical distribution of all four racial categories is presented simultaneously, giving the reader an immediate comparative understanding of how residential composition is highly clustered between neighborhoods as well as which parts of the city are mainly inhabited by which racial groups. `Figure 6` displays only a single racial category at a time and is meant for more detailed exploration of the individual areas/neighborhoods and racial categories, inviting the reader to examine and unveil their own insights. Having had a quite broad scope throughout `Figure 1-5`, spanning multiple decades, millions of people, and hundreds of thousands of buildings, we want `Figure 6` to allow for a higher degree of granularity.

# **6 Discussion<a id="Discussion"></a>**

### **Findings**<a id="Findings"></a>

### **For Further Studies**<a id="For_Further_Studies"></a>

In this section, we present some considerations for further studies and ways to improve on this project.

Early in this project we cited Ayn Rand who said: “The Greatest Sunset for a Sight of New York’s Skyline”. For further studies, one could look at what is behind and underneath the skyline. We argue it would be relevant to look more in depth at the “clockwork” of New York City. We found that Staten Island’s construction boom in the 1970s came around the same time the borough was connected to the other boroughs by bridges, highlighting the importance of critical infrastructure in managing urban space and opening up new areas for residential inflow. In conjunction with this, we also considered the role of buses in the history of racial segregation, which shows how infrastructure has historically also closed off areas to certain groups. It would be highly interesting and relevant to look further into transportation and the infrastructural development of New York City. A relevant method for this would be a network science approach, examining the infrastructural connections between neighborhoods while comparing population development through time.

The share of foreign born in New York City plays an essential role in our analysis. However, this also results in a broad analysis, as *foreign born* is an umbrella term for the many different immigrant cultures which have arrived in New York City throughout time. If it were possible to acquire data on a more granular scale, it would be possible to combine Figure 1 with Figure 6 and thus see the immigrational development in combination with the urbanisational development on a granular scale.

The PLUTO dataset presents some exceptional opportunities, but it also has some limitations, which we explained in the ["Basic Statistics"](#Dataset_of_the_buildings) section. For further studies, it would be highly relevant to construct a dataset without the limitations of the PLUTO dataset. If one were to find records on demolished buildings in NYC and put these into a usable dataset, it would be interesting to see which buildings were demolished and which buildings were allowed to stay. We concluded that the rise in construction in the 1920s was a result of the booming twenties, but the high amount of construction in the dataset may also arise because the buildings from the 1920s are simply the buildings which have been allowed to stay throughout time.

# **7 Contributions<a id="Contributions"></a>**
1. [Motivation](#Motivation)
2. [Basic Statistics](#Base_stats)
    - [Dataset 1: Landuse in New York City](#Dataset_of_the_buildings)
    - [Dataset 2: The population evolution of New York City](#Dataset_of_the_population)
    - [Dataset 3: The Racial Population in New York City Neighborhoods](#Dataset_of_the_segregation)
3. [Data Analysis](#Data_Analysis)
4. [Genre](#Genre)
5. [Visualization](#Visualization)
    - [Figure 1: Geographical Distribution of Residential and Commercial Construction over Time](#Figur_1)
    - [Figure 2: Constructed Square Feet by Decade](#Figur_2)
    - [Figure 3: Constructed Square Feet by Decade and Borough](#Figur_3)
    - [Figure 4: Population by Borough](#Figur_4) 
    - [Figure 5: Foreign Born Population by Borough from 1900-2010](#Figur_5) 
    - [Figure 6: New York City's Racial Composition](#Figur_6)
    - [Figure 7: Racial Composition By Census Tract](#Figur_7)
6. [Discussion](#Discussion)
    - [Findings](#Findings)
    - [For Further Studies](#For_Further_Studies)
7. [Contribution](#Contributions)
8. [References](#References)


# **8 References <a id="References"></a>**
In this section we present all the references which we have used for either the website or the explainer notebook. 


#### Articles
- Campanella, Thomas J. 2017: "Robert Moses and His Racist Parkway, Explained", Bloomberg, Available at: <a href="https://www.bloomberg.com/news/articles/2017-07-09/robert-moses-and-his-racist-parkway-explained" target="_blank">https://www.bloomberg.com/news/articles/2017-07-09/robert-moses-and-his-racist-parkway-explained</a> 

- citymayors 2020: "The 150 richest cities in the world by GDP in 2020", Available at: <a href="http://www.citymayors.com/statistics/richest-cities-2020.html" target="_blank">http://www.citymayors.com/statistics/richest-cities-2020.html</a> 

- Cohn, D'Vera 2015: "How U.S. immigration laws and rules have changed through history", pewresearch.org, Available at: <a href="https://www.pewresearch.org/short-reads/2015/09/30/how-u-s-immigration-laws-and-rules-have-changed-through-history/" target="_blank">https://www.pewresearch.org/short-reads/2015/09/30/how-u-s-immigration-laws-and-rules-have-changed-through-history/</a> 

- DiNapoli, Thomas P. and Kenneth B. Bleiwas 2016: "The Role of Immigrants in the New York City Economy", Report 7-2016, State of New York Comptroller, Available at: <a href="https://www.osc.state.ny.us/files/reports/osdc/pdf/report-7-2016.pdf" target="_blank">https://www.osc.state.ny.us/files/reports/osdc/pdf/report-7-2016.pdf</a>

- exploros 2023: "Economy in the 1950s", Available at: <a href="https://www.exploros.com/summary/Economy-in-the-1950s" target="_blank">https://www.exploros.com/summary/Economy-in-the-1950s</a>

- Glaeser, Edward L. 2005: "Urban Colossus: Why Is New York America's Largest City?", Economic Policy Review, Available at: <a href="https://www.newyorkfed.org/medialibrary/media/research/epr/05v11n2/0512glae.pdf" target="_blank">https://www.newyorkfed.org/medialibrary/media/research/epr/05v11n2/0512glae.pdf</a>

- Jæger, M.M. 2006: "Description as Choice", Oxford Economic Papers, Vol. 32(3)

- Krumpal, Ivar 2011: "Determinants of social desirability bias in sensitive surveys: a literature review", Qual Quant (2013), 2025-2047, Springer Science+Business Media

- localize 2020: "What are the most diverse neighborhoods in NYC?", Available at: <a href="https://www.localize.city/blog/what-are-the-most-diverse-neighborhoods-in-nyc/" target="_blank">https://www.localize.city/blog/what-are-the-most-diverse-neighborhoods-in-nyc/</a>

- Miller, Donald 2014: "Built for Business: Midtown Manhattan in the 1920s", entrepreneur, Available at: <a href="https://www.entrepreneur.com/growing-a-business/built-for-business-midtown-manhattan-in-the-1920s/239257" target="_blank">https://www.entrepreneur.com/growing-a-business/built-for-business-midtown-manhattan-in-the-1920s/239257</a>

- Segel, Edward and Jeffrey Heer 2010: "Narrative Visualization: Telling Stories with Data", Stanford University, Available at: <a href="http://vis.stanford.edu/files/2010-Narrative-InfoVis.pdf" target="_blank">http://vis.stanford.edu/files/2010-Narrative-InfoVis.pdf</a> 

- Sterbenz, Christina 2013: "New York City Used To Be A Terrifying Place", businessinsider, Available at: <a href="https://www.businessinsider.com/new-york-city-used-to-be-a-terrifying-place-photos-2013-7?r=US&IR=T" target="_blank">https://www.businessinsider.com/new-york-city-used-to-be-a-terrifying-place-photos-2013-7?r=US&IR=T</a> 

- Venstre 2023: "De Glade Tressere", Available at: <a href="https://www.venstre.dk/partiet/skoleweb/politisk-historie/velfaerd-og-vaelgerskred/de-glade-tressere" target="_blank">https://www.venstre.dk/partiet/skoleweb/politisk-historie/velfaerd-og-vaelgerskred/de-glade-tressere</a>



#### Wikipedia pages
- <a href="https://en.wikipedia.org/wiki/Central_business_district" target="_blank">Central Business District</a> 
- <a href="https://en.wikipedia.org/wiki/Emergency_Quota_Act" target="_blank">Emergency Quota Act</a>
- <a href="https://en.wikipedia.org/wiki/History_of_New_York_City_(1946%E2%80%931977)" target="_blank">Fiscal crisis in the 1970s New York City</a>
- <a href="https://en.wikipedia.org/wiki/List_of_cities_by_international_visitors" target="_blank">Millions of tourists visit New York City</a>
- <a href="https://en.wikipedia.org/wiki/Emergency_Quota_Act" target="_blank">National restriction law</a>
- <a href="https://en.wikipedia.org/wiki/Census_tract#United_States" target="_blank">Neighborhood for Census Tract</a>
- <a href="https://en.wikipedia.org/wiki/1970s_energy_crisis" target="_blank">The global energy crisis in the 1970s</a>
- <a href="https://en.wikipedia.org/wiki/Great_Depression" target="_blank">The Great Depression</a>
- <a href="https://www.encyclopedia.com/history/encyclopedias-almanacs-transcripts-and-maps/prosperity-decade-1921-1928-overview" target="_blank">The Prosperity Decade</a>
- <a href="https://en.wikipedia.org/wiki/Wall_Street_Crash_of_1929" target="_blank">The Wall Street Crash in October of 1929</a>
- <a href="https://en.wikipedia.org/wiki/Verrazzano-Narrows_Bridge" target="_blank">Verrazzano-Narrows Bridge</a>




#### Special thanks to these GitHub resources:
- https://github.com/python-visualization/folium/issues/1062

- https://tilemill-project.github.io/tilemill/docs/guides/advanced-legends/


#### Datasets

Dataset 1: Land Use in New York (Figures 1, 2 and 3)
- PLUTO <a href="https://www.nyc.gov/site/planning/data-maps/open-data/dwn-pluto-mappluto.page" target="_blank">Dataset</a>
- PLUTO <a href="https://s-media.nyc.gov/agencies/dcp/assets/files/pdf/data-tools/bytes/PLUTODD.pdf" target="_blank">Documentation</a>

Dataset 2: Population Development and Foreign Born Population of New York (Figures 4 and 5)
- Data on the <a href="https://www.nyc.gov/assets/planning/download/office/planning-level/nyc-population/historical-population/nyc_total_pop_1900-2010.xlsx" target="_blank">NYC population</a>
- Data on the <a href="https://www.nyc.gov/assets/planning/download/office/planning-level/nyc-population/historical-population/nyc_total_pop_1900-2010.xlsx" target="_blank">foreign share of the NYC population</a> 
- Population estimates from NYC for 2030-2040, manually extracted <a href="https://www.nyc.gov/assets/planning/download/pdf/planning-level/nyc-population/projections_report_2010_2040.pdf" target="_blank">from this PDF</a>

Dataset 3: The Racial Population in New York City Neighborhoods (Figure 6)
- <a href="https://www.census.gov/data/developers/data-sets.html" target="_blank">API Documentation</a>
- <a href="https://www.census.gov/programs-surveys/acs" target="_blank">American Community Survey (ACS)</a>
- <a href="https://www.census.gov/data/developers/data-sets/acs-5year.html" target="_blank">American Community Survey 5-Year Data Documentation</a>
- <a href="https://www.census.gov/topics/population/race.html" target="_blank">Documentation regarding the *race* variable</a>