# Notebook Title

## Setup Python and R environment
You can ignore this section.

In [5]:
%load_ext rpy2.ipython
%load_ext autoreload
%autoreload 2

%matplotlib inline  
from matplotlib import rcParams
rcParams['figure.figsize'] = (16, 100)

import warnings
from rpy2.rinterface import RRuntimeWarning
warnings.filterwarnings("ignore")  # Ignore all warnings
# warnings.filterwarnings("ignore", category=RRuntimeWarning)  # Alternative: ignore only R runtime warnings

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display, HTML

In [6]:
%%javascript
// Disable auto-scrolling
IPython.OutputArea.prototype._should_scroll = function(lines) {
    return false;
}


In [7]:
%%R

# My commonly used R imports

require('tidyverse')

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors


Loading required package: tidyverse


## 👉 download your data

Write code here to download your dataset. If you already have it, leave the source URL in a comment and load the file into a pandas or R dataframe (or both).

In [8]:
%%R

# Import the CSV file 311_dogs_data_2023.csv
df <- read.csv('311_dogs_data_2023.csv')

# Show the first few rows of the dataframe
head(df)


  Unique.Key           Created.Date            Closed.Date Agency
1   59892199 12/31/2023 11:38:00 PM 01/05/2024 07:45:00 AM    DEP
2   59889491 12/31/2023 11:17:00 PM 01/05/2024 07:30:00 AM    DEP
3   59893098 12/31/2023 11:08:00 PM 01/05/2024 07:32:00 AM    DEP
4   59886784 12/31/2023 10:53:00 PM 01/03/2024 09:24:00 AM    DEP
5   59887730 12/31/2023 10:38:00 PM 01/11/2024 08:56:00 AM    DEP
6   59890331 12/31/2023 10:26:00 PM 01/17/2024 08:45:00 PM    DEP
                             Agency.Name Complaint.Type
1 Department of Environmental Protection          Noise
2 Department of Environmental Protection          Noise
3 Department of Environmental Protection          Noise
4 Department of Environmental Protection          Noise
5 Department of Environmental Protection          Noise
6 Department of Environmental Protection          Noise
                Descriptor Location.Type Incident.Zip          Incident.Address
1 Noise, Barking Dog (NR5)            NA        10011      200 WES

In [9]:
%%R

library(tigris)
library(sf)
library(dplyr)


To enable caching of data, set `options(tigris_use_cache = TRUE)`
in your R script or .Rprofile.
Linking to GEOS 3.12.1, GDAL 3.8.3, PROJ 9.3.1; sf_use_s2() is TRUE


#The following code runs on Python 3.10.3, but we couldn't figure out how to switch this environment to 3.10.3, so Aishi ran it on her system and gave me the resulting file.

import pandas as pd
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

import requests_cache

# Cache responses on disk so repeated runs do not re-query the geocoder
cache = requests_cache.CachedSession("geocode_cache", backend="filesystem")

def geocode(lat, lng):
    try:
        url = "https://geocoding.geo.census.gov/geocoder/geographies/coordinates"
        params = {
            "x": lng,
            "y": lat,
            "benchmark": "Public_AR_Census2020",
            "vintage": "Census2020_Census2020",
            "format": "json"
        }
        response = cache.get(url, params=params)
        response.raise_for_status()
        data = response.json()
        census = data['result']['geographies']['Census Blocks'][0]
        return census
    except Exception as e:
        print(f"Error geocoding ({lat}, {lng}): {e}")
        return None

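def bulk_geocode(latitudes, longitudes):
    """
    Geocode a list of latitudes and longitudes in parallel (for speed).
    """

    with ThreadPoolExecutor() as tpe:
        # Use the arguments passed in rather than the global df
        mapped_results = tpe.map(geocode, latitudes, longitudes)
        data = list(tqdm(mapped_results, total=len(latitudes)))

    # geocode() returns None on failure; use an empty dict so the row becomes NaN
    # and the result stays aligned with the input rows
    data = [row if row is not None else {} for row in data]
    return pd.DataFrame(data)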

census_geos_df = bulk_geocode(df['Latitude'], df['Longitude']) 

to_keep = ['GEOID', 'STATE', 'COUNTY', 'TRACT', 'BLOCK']
census_geos_df = census_geos_df[to_keep]
census_geos_df

df_with_geos = pd.concat(
    [ 
        df.reset_index(drop=True),
        census_geos_df.reset_index(drop=True)
    ], 
    axis=1)


In [15]:
%%R

#Import the processed csv file 
df <- read.csv('311_processed.csv')

In [16]:
%%R

head(df)

  Unique.Key           Created.Date            Closed.Date Agency
1   59892199 12/31/2023 11:38:00 PM 01/05/2024 07:45:00 AM    DEP
2   59889491 12/31/2023 11:17:00 PM 01/05/2024 07:30:00 AM    DEP
3   59893098 12/31/2023 11:08:00 PM 01/05/2024 07:32:00 AM    DEP
4   59886784 12/31/2023 10:53:00 PM 01/03/2024 09:24:00 AM    DEP
5   59887730 12/31/2023 10:38:00 PM 01/11/2024 08:56:00 AM    DEP
6   59890331 12/31/2023 10:26:00 PM 01/17/2024 08:45:00 PM    DEP
                             Agency.Name Complaint.Type
1 Department of Environmental Protection          Noise
2 Department of Environmental Protection          Noise
3 Department of Environmental Protection          Noise
4 Department of Environmental Protection          Noise
5 Department of Environmental Protection          Noise
6 Department of Environmental Protection          Noise
                Descriptor Location.Type Incident.Zip          Incident.Address
1 Noise, Barking Dog (NR5)            NA        10011      200 WEST   20 STREET
2 Noise, Barking Dog (NR5)            NA        11105           20-48 31 STREET
3 Noise, Barking Dog (NR5)            NA        11209           9101 SHORE ROAD
4 Noise, Barking Dog (NR5)          

## 👉 convert addresses --> lat/long 

See the [census-examples](https://github.com/data4news/census-examples) repository for examples. If you need help, ask in the class Slack channel. Someone else in the class is probably struggling with the same problem, so we might as well all learn together there.
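
As a rough illustration of this step, here is a minimal sketch that geocodes a one-line address with the Census Geocoder's `onelineaddress` endpoint, using only the standard library. The helper names (`build_geocode_url`, `extract_lat_lng`, `geocode_address`) are hypothetical and not taken from the census-examples repository, and the benchmark is assumed to match the one used elsewhere in this notebook.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GEOCODER_URL = "https://geocoding.geo.census.gov/geocoder/locations/onelineaddress"


def build_geocode_url(address, benchmark="Public_AR_Census2020"):
    """Build the request URL for a single one-line address (hypothetical helper)."""
    params = {"address": address, "benchmark": benchmark, "format": "json"}
    return f"{GEOCODER_URL}?{urlencode(params)}"


def extract_lat_lng(payload):
    """Return (lat, lng) from a geocoder response, or None if nothing matched."""
    matches = payload.get("result", {}).get("addressMatches", [])
    if not matches:
        return None
    coords = matches[0]["coordinates"]
    # The geocoder reports x = longitude and y = latitude
    return coords["y"], coords["x"]


def geocode_address(address):
    """Query the geocoder over the network and return (lat, lng) or None."""
    with urlopen(build_geocode_url(address), timeout=30) as response:
        return extract_lat_lng(json.load(response))


# Example call (requires network access), assuming address and zip are joined:
# geocode_address("9101 SHORE ROAD, 11209")
```

Keeping the response parsing separate from the network call makes it easy to check the parsing against a saved response before geocoding the whole dataset.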

## 👉 convert lat/long to census geography codes 

(like 'GEOID', 'STATE', 'COUNTY', 'TRACT', 'BLOCK', etc...)

As above, see the [census-examples](https://github.com/data4news/census-examples) repository for examples, or ask in the class Slack channel if you get stuck.
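
To show how these codes relate to each other, here is a small sketch that splits a 15-digit census block GEOID into its parts: 2 digits of state, 3 of county, 6 of tract, and 4 of block. The function names are hypothetical, and the GEOID in the usage comment is made up for illustration (36 is New York State and 061 is New York County).

```python
def split_block_geoid(geoid):
    """Split a 15-digit block GEOID into STATE, COUNTY, TRACT, and BLOCK strings."""
    geoid = str(geoid)
    if len(geoid) != 15 or not geoid.isdigit():
        raise ValueError(f"Expected a 15-digit block GEOID, got {geoid!r}")
    return {
        "STATE": geoid[:2],     # 2-digit state FIPS code
        "COUNTY": geoid[2:5],   # 3-digit county FIPS code
        "TRACT": geoid[5:11],   # 6-digit census tract code
        "BLOCK": geoid[11:],    # 4-digit census block code
    }


def tract_geoid(block_geoid):
    """Return the 11-digit tract GEOID (state + county + tract) for a block GEOID."""
    parts = split_block_geoid(block_geoid)
    return parts["STATE"] + parts["COUNTY"] + parts["TRACT"]


# Illustrative, made-up GEOID:
# split_block_geoid("360610081001000")
```

Treating the codes as strings throughout keeps leading zeros intact, which matters when joining against tract-level census tables.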

## 👉 Output Data

Output your dataframe containing your data and the Census connector codes (such as tract and block).
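
One possible way to write the output, sketched with only the standard library so it is self-contained: write the rows as CSV and read them back as text, so codes such as county `061` keep their leading zeros. The `complaint_id` column and the sample row are hypothetical placeholders, not values from the 311 dataset.

```python
import csv
import io

CENSUS_COLUMNS = ["GEOID", "STATE", "COUNTY", "TRACT", "BLOCK"]


def write_rows(rows, handle, fieldnames):
    """Write a list of dict rows to an open text handle as CSV with a header."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


def read_rows(handle):
    """Read CSV rows back as dicts; every value stays a string."""
    return [dict(row) for row in csv.DictReader(handle)]


# Hypothetical sample row; a real run would use the geocoded dataframe's rows
rows = [
    {
        "complaint_id": "1",
        "GEOID": "360610081001000",
        "STATE": "36",
        "COUNTY": "061",
        "TRACT": "008100",
        "BLOCK": "1000",
    }
]

buffer = io.StringIO()  # stands in for a file opened with open(path, "w", newline="")
write_rows(rows, buffer, ["complaint_id"] + CENSUS_COLUMNS)
buffer.seek(0)
round_tripped = read_rows(buffer)
```

With pandas, the same idea is to write with `to_csv(index=False)` and to read the code columns back as strings by passing a `dtype` mapping to `read_csv`.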