# What's In My Neighborhood

This notebook gets data from the Minnesota Pollution Control Agency's [What's in My Neighborhood Sites](https://www.pca.state.mn.us/data/whats-my-neighborhood) database, published to the [Minnesota Geospatial Commons](https://gisdata.mn.gov/dataset/env-my-neighborhood), and clips it to the Minneapolis boundary. The plan is to also edit the entries for clarity (removing missing or mislabeled data and extraneous information) and save the result as a GeoJSON; for now the clipped data is exported as a shapefile and a CSV.

**Downloaded Data Info:**

<!-- CRS: WGS84 - epsg:4326 -->

Size: 

**Saved Data Info:**

<!-- CRS: NAD83, UTM zone 15N -- epsg:26915
    
Size: 4.1mb -->

Source: https://gisdata.mn.gov/dataset/env-my-neighborhood

In [2]:
### Import Libraries

# File manipulation

import os # For working with Operating System
from sys import platform # Diagnose operating system
import urllib.request # For accessing websites (urlopen lives in the request submodule)
import zipfile # For extracting from Zipfiles
from io import BytesIO # For reading bytes objects

# Analysis

import numpy as np # For working with Arrays
import pandas as pd # Data Manipulation
import geopandas as gpd # Spatial Data Manipulation

# Visualization

from pprint import pprint # Pretty Printing
import matplotlib.pyplot as plt # Basic Plotting
import contextily # Base Map Visualization

import warnings
warnings.filterwarnings('ignore') # Suppresses all warnings

In [3]:
### Definitions

files_before = os.listdir() # Get filenames in working directory so they aren't deleted at the end.
cwd = os.getcwd() # Current Working Directory

# Path separator for filepaths ('/' on Linux and macOS, '\\' on Windows)

slash = os.sep

def extract_zip_from_url(urls=None):
    '''Extract a zipfile from the internet and unpack it in working directory.
    Takes a single url (string) or a list of urls.'''
    
    if isinstance(urls, str): # Single url
        urls = [urls]

    if not isinstance(urls, list): # Neither a string nor a list
        print('Error Extracting: Invalid Input')
        return

    for url in urls:
        response = urllib.request.urlopen(url) # Get a response
        with zipfile.ZipFile(BytesIO(response.read())) as zip_folder: # Read Response
            zip_folder.extractall() # Extract files

def clip_to_extent(gdf):
    '''This function returns the dataset clipped to the boundaries of Minneapolis and the boundary itself.
    Warning: This function will access the geojson of Minneapolis from GitHub if it's not in the current working directory or local Boundary folder.
    See this link for more info: https://github.com/RwHendrickson/MappingGZ/blob/main/Prototype/Notebooks/CleaningData/Boundary/DefineBoundary.ipynb'''
    
    # Look for mpls_boundary

    cwd = os.getcwd() # Current working directory
    parent = os.path.dirname(cwd) # One directory up

    # Full paths are used so the working directory never has to change
    candidates = [os.path.join(cwd, 'mpls_boundary.shp'), # Shapefile in the current working directory
                  os.path.join(cwd, 'mpls_boundary.geojson'), # GeoJSON in the current working directory
                  os.path.join(parent, 'Boundary', 'mpls_boundary.geojson')] # GeoJSON in the neighboring Boundary folder

    for path in candidates:
        if os.path.exists(path):
            mpls_boundary = gpd.read_file(path) # Load extent as GeoDataFrame
            break

    else: # No local copy found
        print('''Can't find mpls_boundary.geojson. Accessing from GitHub.\n 
        See this link for more info: \n\nhttps://github.com/RwHendrickson/MappingGZ/blob/main/Prototype/Notebooks/CleaningData/Boundary/DefineBoundary.ipynb''')
        url = 'https://raw.githubusercontent.com/RwHendrickson/MappingGZ/main/Prototype/Notebooks/CleaningData/Boundary/mpls_boundary.geojson'
        mpls_boundary = gpd.read_file(url) # Load extent as GeoDataFrame
    
    if gdf.crs != 'EPSG:26915': # Ensures gdf is in the same CRS
        gdf = gdf.to_crs('EPSG:26915')
    
    clipped = gpd.clip(gdf, mpls_boundary) # Clip
    
    return clipped, mpls_boundary
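
The in-memory unzip step inside `extract_zip_from_url` can be tried without a network connection. The sketch below is a minimal illustration, not the notebook's own code: `extract_zip_bytes` and the `demo.csv` archive member are hypothetical names, and the archive is built locally so nothing is downloaded.

```python
import os
import tempfile
import zipfile
from io import BytesIO


def extract_zip_bytes(payload, target_dir):
    """Unpack a zip archive held in memory (bytes) into target_dir.

    Hypothetical helper; returns the names of the archive members.
    """
    with zipfile.ZipFile(BytesIO(payload)) as zip_folder:
        zip_folder.extractall(target_dir)
        return zip_folder.namelist()


# Build a small archive in memory to stand in for a downloaded response body
buffer = BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("demo.csv", "site_id,latitude,longitude\n1,44.98,-93.27\n")

with tempfile.TemporaryDirectory() as tmp:
    extracted = extract_zip_bytes(buffer.getvalue(), tmp)
    unpacked_ok = os.path.exists(os.path.join(tmp, "demo.csv"))

print(extracted, unpacked_ok)
```

Passing an explicit target directory, rather than extracting into the working directory, keeps test files away from the notebook's own data.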

In [4]:
### Load Data

## What's In My Neighborhood

if 'my_neighborhood_sites.csv' not in os.listdir(): # Download only if the extracted csv isn't already here
    print('''Downloaded current state data from MPCA (~40.6 mb)\n
    format: .csv''')
    url = 'https://resources.gisdata.mn.gov/pub/gdrs/data/pub/us_mn_state_pca/env_my_neighborhood/csv_env_my_neighborhood.zip'
    extract_zip_from_url(url)

else:
    print('You already have my_neighborhood_sites.csv. Loading the local copy.')

my_nabe_df = pd.read_csv("my_neighborhood_sites.csv") # Load as DataFrame

Downloaded current state data from MPCA (~40.6 mb)

    format: .csv
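
The load cell follows a "download only if the file is missing" pattern. A minimal sketch of that pattern is shown here under stated assumptions: `ensure_file` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for the real call to `extract_zip_from_url` so that nothing is downloaded.

```python
import os
import tempfile


def ensure_file(path, fetch):
    """Call fetch() only when path does not exist yet.

    Returns True if a fetch happened, False if the local copy was reused.
    """
    if os.path.exists(path):
        return False
    fetch()
    if not os.path.exists(path):
        raise FileNotFoundError(f"fetch() did not create {path}")
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "my_neighborhood_sites.csv")

    def fake_fetch():
        # Stand-in for the real download; writes a one-line placeholder file
        with open(target, "w") as handle:
            handle.write("placeholder\n")

    first_call = ensure_file(target, fake_fetch)   # file missing, so fetch runs
    second_call = ensure_file(target, fake_fetch)  # file present, so fetch is skipped

print(first_call, second_call)
```

Checking for the extracted file, rather than the zip archive, matters here because the archive is unpacked from memory and never written to disk.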


In [5]:
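# Cast DataFrame as GeoDataFrame, building point geometry from the longitude and latitude columns
my_nabe_gdf = gpd.GeoDataFrame(my_nabe_df, geometry=gpd.points_from_xy(my_nabe_df.longitude, my_nabe_df.latitude))

# Set CRS to WGS 84 (EPSG:4326); clip_to_extent reprojects to NAD83 / UTM zone 15N (EPSG:26915)
my_nabe_gdf = my_nabe_gdf.set_crs('EPSG:4326')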

# Check in
my_nabe_gdf.head()
my_nabe_gdf.crs


<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

* The original DataFrame did not contain CRS information but does include shape information, so it is unclear whether treating it as point data in a GeoDataFrame is best practice.
* The coordinates also appear to come from many different collection methods, likely with varying degrees of accuracy.
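
Given the accuracy concern in the notes above, one cheap sanity check is to flag records that fall outside a rough bounding box for Minnesota before trusting the coordinates. The sketch below is an illustration under stated assumptions: the bounding box values are approximate and rounded outward, `outside_bbox` is a hypothetical helper, and the sample coordinates are made up rather than taken from the dataset.

```python
# Approximate bounding box for Minnesota in WGS 84 degrees (assumed, rounded outward)
MN_BBOX = {"min_lon": -97.5, "max_lon": -89.0, "min_lat": 43.0, "max_lat": 49.5}


def outside_bbox(records, bbox=MN_BBOX):
    """Return the (longitude, latitude) pairs that fall outside bbox."""
    return [
        (lon, lat)
        for lon, lat in records
        if not (bbox["min_lon"] <= lon <= bbox["max_lon"]
                and bbox["min_lat"] <= lat <= bbox["max_lat"])
    ]


# Made-up sample points: one plausible, one with swapped axes, one at (0, 0)
sample = [(-93.27, 44.98), (44.98, -93.27), (0.0, 0.0)]
suspect = outside_bbox(sample)
print(suspect)
```

A check like this catches gross errors such as swapped axes or zero-filled coordinates; it says nothing about the finer accuracy differences between collection methods.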

In [6]:
#clip data set to MPLS boundary

my_nabe_clp, mpls_boundary = clip_to_extent(my_nabe_gdf)


In [11]:
#export clipped data to shapefile
my_nabe_clp.to_file('minneapolis_my_neighborhood_sites.shp')


In [12]:
# export clipped data to csv
my_nabe_clp.to_csv('minneapolis_my_neighborhood_sites.csv', index_label = 'index')