# Data Description

This dataset provides information about school projects currently under construction in New York City, including new schools (Capacity) and Capital Improvement Projects (CIP).

The data is collected and maintained by the School Construction Authority (SCA) and is updated quarterly. It has been publicly available since October 9, 2011.

# Column Dictionary

| **Column Name**       | **Description**                                              | **API Field Name**      | **Data Type**      |
|----------------------|----------------------------------------------------------|----------------------|-------------------|
| **School Name**           | Name of the school                                      | `name`                 | Text             |
| **BoroughCode**           | Borough code where the school is located              | `boro`                 | Text             |
| **Geographical District** | District where the school is located                  | `geo_dist`             | Number           |
| **Project Description**   | Description of the construction work                  | `projdesc`             | Text             |
| **Construction Award**    | Value of the prime construction contract              | `award`                | Number           |
| **Project Type**         | Identifies whether the project is **CIP** or **Capacity** | `constype`             | Text             |
| **Building ID**           | Unique identifier of the building                     | `buildingid`           | Text             |
| **Building Address**      | Address of the building under construction            | `building_address`     | Text             |
| **City**                 | City where the project is located                      | `city`                 | Text             |
| **Borough**              | Name of the borough where the school is located       | `borough`              | Text             |
| **Latitude**             | Latitude of the site location                         | `latitude`             | Number           |
| **Longitude**            | Longitude of the site location                        | `longitude`            | Number           |
| **Community Board**      | NYC community district associated with the site      | `community_board`      | Number           |
| **Council District**     | NYC City Council district where the site is located  | `community_council`    | Number           |
| **BIN**                 | Building Identification Number (BIN)                  | `bin`                  | Number           |
| **BBL**                 | Borough, Block, and Lot number (BBL)                  | `bbl`                  | Number           |
| **Census Tract (2020)**  | Census tract where the site is located (Census 2020) | `census_tract`         | Number           |
| **Neighborhood Tabulation Area (NTA) (2020)** | NYC Neighborhood Tabulation Area (Census 2020) | `nta`                  | Text             |
| **Location 1**           | System-generated column for mapping representation   | `location_1`           | Location         |
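
The API field names in the table are the identifiers a SoQL query refers to. As a minimal sketch, the snippet below builds a request URL from a list of those names using only the standard library. The helper `build_query_url` is hypothetical (it is not part of any library), the endpoint is the one used in the code cells of this notebook, and the `award IS NOT NULL` filter is only an illustrative assumption.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint of this dataset, as used in the notebook's code cells.
BASE_URL = "https://data.cityofnewyork.us/resource/8586-3zfm.csv"


def build_query_url(fields, where=None, limit=None):
    """Build a SoQL request URL from API field names (hypothetical helper)."""
    query = "SELECT " + ", ".join(fields)
    if where:
        query += f" WHERE {where}"
    if limit is not None:
        query += f" LIMIT {int(limit)}"
    # urlencode percent-encodes spaces, commas and the "$" in "$query".
    return f"{BASE_URL}?{urlencode({'$query': query})}"


url = build_query_url(
    ["name", "borough", "constype", "award"],
    where="award IS NOT NULL",  # illustrative filter, an assumption
    limit=10,
)
print(url)

# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["$query"][0]
print(decoded)
```

Encoding the query this way avoids sending raw spaces or newlines in the URL, which some HTTP clients handle inconsistently.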




In [None]:
import pandas as pd
import requests
from io import StringIO
from concurrent.futures import ThreadPoolExecutor

In [None]:
# Base parameters
BASE_URL = "https://data.cityofnewyork.us/resource/8586-3zfm.csv"
QUERY_TEMPLATE = (
    "SELECT name AS school_name, latitude, longitude "
    "WHERE latitude IS NOT NULL AND longitude IS NOT NULL "
    "ORDER BY school_name ASC"
)
LIMIT = 5000      # Number of records per request
MAX_WORKERS = 10  # Number of pages requested in parallel per round

# Function to fetch one page of data
def fetch_data(offset):
    query = f"{QUERY_TEMPLATE} LIMIT {LIMIT} OFFSET {offset}"

    try:
        # Pass the query through params so that requests URL-encodes it
        response = requests.get(BASE_URL, params={"$query": query}, timeout=30)
        response.raise_for_status()
        df = pd.read_csv(StringIO(response.text))
        return df if not df.empty else None
    except Exception as e:
        print(f"Error fetching data at offset {offset}: {e}")
        return None

# Main function to retrieve and consolidate the data
def get_all_data():
    print("Fetching **ALL** available school construction projects data...")

    # Fetch the first batch to verify if there is data available
    initial_df = fetch_data(0)
    if initial_df is None:
        print("No data found in the dataset.")
        return None

    # List to store results
    all_data = [initial_df]
    print(f"First batch retrieved: {len(initial_df)} records.")

    # Only a full first page can be followed by more rows. Fetch further
    # pages in parallel rounds and stop at the first empty, failed, or
    # short page instead of always requesting up to 1 million records.
    offset = LIMIT
    more = len(initial_df) == LIMIT
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        while more:
            offsets = range(offset, offset + MAX_WORKERS * LIMIT, LIMIT)
            for result in executor.map(fetch_data, offsets):
                if result is None:
                    more = False
                    break
                all_data.append(result)
                if len(result) < LIMIT:
                    more = False
                    break
            offset += MAX_WORKERS * LIMIT

    # Combine all data into a single DataFrame
    data = pd.concat(all_data, ignore_index=True)
    print(f"Total records retrieved: {len(data)}")

    return data

# Fetch ALL available data
all_data = get_all_data()

Fetching **ALL** available school construction projects data...
First batch retrieved: 1138 records.
Total records retrieved: 1138
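
The output above shows that the first batch (1138 records) is smaller than `LIMIT`, so a single request already covers the whole dataset. As a sketch of how the list of offsets could be sized when the total row count is known up front, the hypothetical helper `page_offsets` below computes exactly the offsets needed. How the total is obtained is left open here; a separate `SELECT count(*)` query is one possibility, stated as an assumption rather than something the notebook does.

```python
import math


def page_offsets(total_rows, limit):
    """Return the OFFSET values needed to page through total_rows rows."""
    if total_rows < 0 or limit <= 0:
        raise ValueError("total_rows must be >= 0 and limit must be > 0")
    n_pages = math.ceil(total_rows / limit)
    return [page * limit for page in range(n_pages)]


# With the 1138 rows reported above and a limit of 5000, one request suffices.
print(page_offsets(1138, 5000))   # [0]

# A hypothetical larger dataset of 12,000 rows would need three requests.
print(page_offsets(12000, 5000))  # [0, 5000, 10000]
```

A list built this way can be passed to `executor.map` in place of a fixed upper bound, so no request is sent for pages that cannot exist.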


In [None]:
all_data

Unnamed: 0,school_name,latitude,longitude
0,3K CENTER @ 1010 THIRD AVENUE - MANHATTAN,40.762278,-73.966071
1,3K CENTER @ 104-72 ROOSEVELT AVENUE - QUEENS,40.750445,-73.859660
2,3K CENTER @ 10 ELM STREET - STATEN ISLAND,40.641761,-74.114913
3,3K CENTER @ 129 VAN BRUNT STREET - BROOKLYN,40.684170,-74.005560
4,3K CENTER @ 1450 PLIMPTON AVENUE - BRONX,40.843998,-73.922008
...,...,...,...
1133,X352 SPED - BRONX,40.846711,-73.889815
1134,X352 SPED - BRONX,40.822719,-73.889435
1135,X721 SPED - BRONX,40.842289,-73.838949
1136,X721 SPED - BRONX,40.842289,-73.838949


In [None]:
# Save the retrieved records to a CSV file
all_data.to_csv("school.csv", index=False)

In [None]:
import folium

# Create a map centered on an average location of the data
map_center = [all_data['latitude'].mean(), all_data['longitude'].mean()]
mymap = folium.Map(location=map_center, zoom_start=12)

# Add a marker for each row in all_data
for _, row in all_data.iterrows():
    folium.Marker(
        location=[row['latitude'], row['longitude']],
        popup=row['school_name'],
        tooltip=row['school_name']
    ).add_to(mymap)

# Display the map
mymap
