# Dataset Description

This dataset provides information on 311 service requests in New York City from 2010 to the present. It contains details on each request, including the creation date, complaint type, location, responsible agency, and request status. It is updated daily and provided by the New York City 311 agency.

# Dictionary

| **Column Name**       | **Description**                                              | **API Field Name**            | **Data Type**             |
|----------------------|----------------------------------------------------------|---------------------------|------------------------|
| **Unique Key**         | Unique identifier of a service request                | `unique_key`               | Text                   |
| **Created Date**       | Date the service request was created                   | `created_date`             | Floating Timestamp     |
| **Closed Date**        | Date the service request was closed                    | `closed_date`              | Floating Timestamp     |
| **Agency**            | Acronym of the responding government agency             | `agency`                   | Text                   |
| **Agency Name**       | Full name of the government agency                      | `agency_name`              | Text                   |
| **Complaint Type**    | Type of complaint or reported issue                     | `complaint_type`           | Text                   |
| **Descriptor**        | Additional details about the type of complaint          | `descriptor`               | Text                   |
| **Incident Zip**      | Incident location ZIP code                              | `incident_zip`             | Text                   |
| **Incident Address**  | Address of the reported incident                        | `incident_address`         | Text                   |
| **Street Name**       | Street name where the incident occurred                 | `street_name`              | Text                   |
| **Cross Street 1**    | First nearby cross street                               | `cross_street_1`           | Text                   |
| **Cross Street 2**    | Second nearby cross street                              | `cross_street_2`           | Text                   |
| **City**             | City where the incident occurred                        | `city`                     | Text                   |
| **Borough**          | Borough where the incident occurred                     | `borough`                  | Text                   |
| **Latitude**         | Latitude based on the geographic location               | `latitude`                 | Number                 |
| **Longitude**        | Longitude based on the geographic location              | `longitude`                | Number                 |
| **Status**           | Current status of the service request                   | `status`                   | Text                   |
| **Resolution Action Updated Date** | Date the responding agency last updated the request's resolution | `resolution_action_updated_date` | Floating Timestamp |
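
The data types in the dictionary determine how raw CSV values should be parsed. The following is a minimal sketch, assuming that Floating Timestamp values arrive as ISO 8601 strings without a time zone and that missing values arrive as empty strings. `FIELD_TYPES`, `parse_value`, and `parse_row` are illustrative names, not part of any API.

```python
from datetime import datetime

# Subset of the dictionary above: API field name -> data type.
FIELD_TYPES = {
    "unique_key": "Text",
    "created_date": "Floating Timestamp",
    "closed_date": "Floating Timestamp",
    "complaint_type": "Text",
    "latitude": "Number",
    "longitude": "Number",
}

def parse_value(raw, data_type):
    """Convert one raw CSV string according to its dictionary data type."""
    if raw is None or raw == "":
        return None  # assumption: missing values are empty strings
    if data_type == "Floating Timestamp":
        # No time zone is attached, so the result is a naive datetime.
        return datetime.fromisoformat(raw)
    if data_type == "Number":
        return float(raw)
    return raw  # Text is kept as-is

def parse_row(row):
    """Parse every field of a row; fields not in FIELD_TYPES are treated as Text."""
    return {
        name: parse_value(value, FIELD_TYPES.get(name, "Text"))
        for name, value in row.items()
    }

example = parse_row({
    "created_date": "2025-03-05T23:18:51.000",
    "closed_date": "",
    "complaint_type": "Street Condition",
    "latitude": "40.734116",
    "longitude": "-73.954914",
})
print(example["created_date"], example["latitude"], example["closed_date"])
```

Because a Floating Timestamp carries no time zone, the parsed value is a naive `datetime`; any conversion to a specific zone would be a separate, explicit step.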

In [1]:
import requests
import pandas as pd
from io import StringIO
from concurrent.futures import ThreadPoolExecutor

# Base URL of the API
BASE_URL = "https://data.cityofnewyork.us/resource/fed5-ydvq.csv?$query="

# SoQL query that selects six columns and filters by creation date
QUERY_TEMPLATE = """SELECT created_date, complaint_type, descriptor, incident_address, longitude, latitude
WHERE created_date BETWEEN '{start_date}' AND '{end_date}'
ORDER BY created_date DESC"""

LIMIT = 5000  # Number of records per request

# Fetch one page of results for the given offset and date range
def fetch_data(offset, start_date, end_date):
    query = QUERY_TEMPLATE.format(start_date=start_date, end_date=end_date)
    url = f"{BASE_URL}{query} LIMIT {LIMIT} OFFSET {offset}"

    try:
        # The timeout keeps a stalled request from blocking a worker thread indefinitely
        response = requests.get(url, timeout=60)
        response.raise_for_status()
        df = pd.read_csv(StringIO(response.text))
        return df if not df.empty else None
    except Exception as e:
        print(f"Error fetching data at offset {offset}: {e}")
        return None

# Main function: fetch every page in the date range and combine them
def get_filtered_data(start_date, end_date):
    print(f"Fetching data from {start_date} to {end_date}...")

    # Fetch the first batch
    initial_df = fetch_data(0, start_date, end_date)
    if initial_df is None:
        print("No data found in the specified date range.")
        return None

    # List that accumulates the result batches
    all_data = [initial_df]
    total_records = len(initial_df)

    print(f"First batch retrieved: {total_records} records.")

    # A first batch shorter than LIMIT means there are no further pages to request
    if total_records == LIMIT:
        # Offsets for the remaining pages (up to 1 million records)
        offsets = list(range(LIMIT, 1000000, LIMIT))

        # Fetch the remaining pages in parallel
        with ThreadPoolExecutor(max_workers=10) as executor:
            results = list(executor.map(lambda offset: fetch_data(offset, start_date, end_date), offsets))

        # Keep only the results that contain data
        all_data.extend(result for result in results if result is not None)

    # Combine all batches into a single DataFrame
    data = pd.concat(all_data, ignore_index=True)
    print(f"Total records retrieved: {len(data)}")

    return data

# Usage with the specified date range
start_date = "2025-02-01"
end_date = "2025-03-06"
filtered_data = get_filtered_data(start_date, end_date)

Fetching data from 2025-02-01 to 2025-03-06...
First batch retrieved: 4780 records.
Total records retrieved: 4780
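
The cell above pages through results by offset. As an alternative, the sketch below pages sequentially and stops at the first page that comes back shorter than the page size, so no requests are made past the end of the data. `paginate` and `fetch_page` are hypothetical names: `fetch_page(offset, limit)` stands for any function that returns one page of rows, and here it is replaced by an in-memory stand-in so the example runs without network access.

```python
def paginate(fetch_page, limit):
    """Collect rows page by page, stopping at the first page shorter than `limit`."""
    rows = []
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        rows.extend(page)
        if len(page) < limit:
            break  # a short (or empty) page means the data is exhausted
        offset += limit
    return rows

# In-memory stand-in for the API: 12 fake records, not real 311 data.
FAKE_RECORDS = [{"id": i} for i in range(12)]
calls = []

def fake_fetch(offset, limit):
    calls.append(offset)  # record which offsets were requested
    return FAKE_RECORDS[offset:offset + limit]

all_rows = paginate(fake_fetch, limit=5)
print(len(all_rows), calls)
```

The trade-off is speed: sequential paging issues one request at a time, whereas the thread pool overlaps requests but has to decide its offsets in advance.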


In [2]:
filtered_data

| | created_date | complaint_type | descriptor | incident_address | longitude | latitude |
|---|---|---|---|---|---|---|
| 0 | 2025-03-05T23:59:32.000 | Street Condition | Pothole | | -73.962643 | 40.696087 |
| 1 | 2025-03-05T23:18:51.000 | Street Condition | Pothole | 1031 MANHATTAN AVENUE | -73.954914 | 40.734116 |
| 2 | 2025-03-05T23:17:40.000 | Street Condition | Pothole | 905 MANHATTAN AVENUE | -73.954303 | 40.730440 |
| 3 | 2025-03-05T23:16:41.000 | Street Condition | Pothole | 598 MANHATTAN AVENUE | -73.950365 | 40.723064 |
| 4 | 2025-03-05T23:15:23.000 | Street Condition | Pothole | | -73.957561 | 40.714500 |
| ... | ... | ... | ... | ... | ... | ... |
| 4775 | 2025-02-01T08:26:17.000 | Street Condition | Pothole | | -73.898441 | 40.826350 |
| 4776 | 2025-02-01T08:25:25.000 | Street Condition | Pothole | | -73.898835 | 40.826358 |
| 4777 | 2025-02-01T08:24:33.000 | Street Condition | Pothole | | -73.914474 | 40.811651 |
| 4778 | 2025-02-01T05:52:33.000 | Street Condition | Pothole | 86 STREET | | |


In [3]:
# Convert the 'created_date' column to datetime format if it is not already
filtered_data['created_date'] = pd.to_datetime(filtered_data['created_date'])

# Keep only requests created in March 2025, with the columns needed for mapping
filtered_data_march = filtered_data[(filtered_data['created_date'].dt.year == 2025) &
                     (filtered_data['created_date'].dt.month == 3)][['complaint_type', 'descriptor', 'incident_address', 'latitude', 'longitude']]

# Remove rows where latitude or longitude are NaN
filtered_data_march_clean = filtered_data_march.dropna(subset=['latitude', 'longitude'])

filtered_data_march_clean

| | complaint_type | descriptor | incident_address | latitude | longitude |
|---|---|---|---|---|---|
| 0 | Street Condition | Pothole | | 40.696087 | -73.962643 |
| 1 | Street Condition | Pothole | 1031 MANHATTAN AVENUE | 40.734116 | -73.954914 |
| 2 | Street Condition | Pothole | 905 MANHATTAN AVENUE | 40.730440 | -73.954303 |
| 3 | Street Condition | Pothole | 598 MANHATTAN AVENUE | 40.723064 | -73.950365 |
| 4 | Street Condition | Pothole | | 40.714500 | -73.957561 |
| ... | ... | ... | ... | ... | ... |
| 866 | Street Condition | Pothole | | 40.849419 | -73.911122 |
| 867 | Street Condition | Pothole | | 40.848639 | -73.911679 |
| 868 | Street Condition | Pothole | | 40.804813 | -73.884899 |
| 869 | Street Condition | Pothole | 831 HUNT'S POINT AVENUE | 40.817525 | -73.888530 |
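
The month filter above compares the year and month components separately. An equivalent way to express it, which also generalizes to arbitrary date ranges, is a half-open interval from the first day of the month up to (but not including) the first day of the next month. The sketch below uses only the standard library; `month_bounds` and `in_month_with_coords` are illustrative names, and the second sample row is hypothetical, added only to show a March request being dropped for missing coordinates.

```python
from datetime import datetime

def month_bounds(year, month):
    """Return (start, end) such that start <= t < end covers exactly one month."""
    start = datetime(year, month, 1)
    end = datetime(year + 1, 1, 1) if month == 12 else datetime(year, month + 1, 1)
    return start, end

def in_month_with_coords(rows, year, month):
    """Keep rows created in the given month that have both coordinates."""
    start, end = month_bounds(year, month)
    kept = []
    for row in rows:
        created = datetime.fromisoformat(row["created_date"])
        has_coords = row["latitude"] is not None and row["longitude"] is not None
        if start <= created < end and has_coords:
            kept.append(row)
    return kept

sample = [
    {"created_date": "2025-03-05T23:59:32.000", "latitude": 40.696087, "longitude": -73.962643},
    # Hypothetical row: created in March but without coordinates.
    {"created_date": "2025-03-01T00:00:00.000", "latitude": None, "longitude": None},
    {"created_date": "2025-02-01T08:26:17.000", "latitude": 40.826350, "longitude": -73.898441},
    {"created_date": "2025-02-01T05:52:33.000", "latitude": None, "longitude": None},
]

march = in_month_with_coords(sample, 2025, 3)
print(len(march))
```

The half-open form avoids having to know the last day of the month and does not miss timestamps that fall late on that day.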


In [5]:
# Save the cleaned March 2025 data to a CSV file, without the index column
filtered_data_march_clean.to_csv("pothole.csv", index=False)

In [None]:
import folium

# Create a map centered on an average location of the data
map_center = [filtered_data_march_clean['latitude'].mean(), filtered_data_march_clean['longitude'].mean()]
mymap = folium.Map(location=map_center, zoom_start=12)

# Add a marker for each row in filtered_data_march_clean
for _, row in filtered_data_march_clean.iterrows():
    folium.Marker(
        location=[row['latitude'], row['longitude']],
        popup=row['descriptor'],
        tooltip=row['descriptor']
    ).add_to(mymap)

# Display the map
mymap
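
The map above is centered on the mean latitude and longitude with a fixed zoom level. The sketch below computes the same center together with the south-west and north-east corners of the data, which could be used to fit the view to the markers instead of guessing a zoom. `center_and_bounds` is an illustrative helper, not a folium function, and it assumes the points contain no missing values (as is the case after `dropna`). The three points are taken from the output shown earlier.

```python
def center_and_bounds(points):
    """Return ([mean_lat, mean_lon], [[south, west], [north, east]]) for (lat, lon) pairs."""
    points = list(points)  # allow generators, since the data is traversed twice
    if not points:
        raise ValueError("at least one point is required")
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    center = [sum(lats) / len(lats), sum(lons) / len(lons)]
    bounds = [[min(lats), min(lons)], [max(lats), max(lons)]]
    return center, bounds

points = [
    (40.696087, -73.962643),
    (40.734116, -73.954914),
    (40.849419, -73.911122),
]

center, bounds = center_and_bounds(points)
print(center, bounds)
```

With folium, the corners can be passed to `mymap.fit_bounds(bounds)` after the markers are added, so that the initial view covers all of them.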