In [1]:
!jupyter nbconvert --to html prelim_MAPC.ipynb

[NbConvertApp] Converting notebook prelim_MAPC.ipynb to html
[NbConvertApp] Writing 616064 bytes to prelim_MAPC.html


## Deliverable 1 analysis for MAPC

Some quick notes
- MAPC is **super expansive**
- Some datasets are harder to acquire than others (scraping?)
- Some datasets are missing entirely, some download links are broken, and some have no easy CSV export
- There is no bulk download, so we might have to click through the links for each dataset individually
- MAPC already has some spatial and demographic data, which might be good gut checks when we get to it
- Layer files?
- What is the coords column?

**What datasets are important?**
- Food retailers
- Emergency food providers
    - This data is particularly hard to obtain because there is no easy export to CSV or SHP. Scrape?
- Schools
- MBTA rapid transit and bus routes


**What datasets can be useful later to validate our analysis?**
- Food access index
- Demographic data
    - Poverty, food access indexes, race statistics
    

**The categorization of the datasets is relatively easy**
- Food retailers are under food access/maybe commercial?
- Farmers markets are under food access
- Schools are education

**Where we might want more data**
- Hospitals and city centers

**Big questions/limitations:**
- Is there any overlap between these datasets and the BU Spark dataset? If so, what's an easy way to deal with it?
- We should try to standardize the structure of our data to make future merging easier
- Some of the data has no easy CSV export option, so we will likely need to scrape it
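
One way the standardization point above could look in practice is a per-dataset column map onto a shared schema. This is only a sketch: the source column names (`NAME`, `ADDR_1`, `TOWN`, `name`, `municipal`, and so on) come from the cells further down, while the target schema, the `source` column, and the helper `standardize` are assumptions made up for illustration. The two tiny frames are placeholder data, not real rows.

```python
import pandas as pd

# Hypothetical shared schema; these target names are our own choice.
COMMON_COLUMNS = ["name", "address", "municipal", "latitude", "longitude", "source"]

# Map each dataset's original column names onto the shared schema.
FARMERS_MAP = {"NAME": "name", "ADDR_1": "address", "TOWN": "municipal",
               "LATITUDE": "latitude", "LONGITUDE": "longitude"}
RETAILERS_MAP = {}  # the food retailers CSV already uses the lowercase names

def standardize(df, column_map, source):
    """Rename columns to the shared schema and tag each row with its origin."""
    out = df.rename(columns=column_map).assign(source=source)
    return out[COMMON_COLUMNS].reset_index(drop=True)

# Placeholder rows standing in for the real datasets.
farmers = pd.DataFrame({"NAME": ["Market A"], "ADDR_1": ["1 Main St."],
                        "TOWN": ["Boston"], "LATITUDE": [42.3],
                        "LONGITUDE": [-71.1], "COUPONS": [None]})
retailers = pd.DataFrame({"name": ["Store B"], "address": ["2 Elm St."],
                          "municipal": ["Boston"], "latitude": [42.4],
                          "longitude": [-71.0], "zipcode": ["02118"]})

combined = pd.concat(
    [standardize(farmers, FARMERS_MAP, "farmers_markets"),
     standardize(retailers, RETAILERS_MAP, "food_retailers")],
    ignore_index=True,
)
print(combined)
```

Keeping a `source` column means a merged frame can always be split back into its original datasets.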


## Next steps

- Merge dataframes that are similar
    - e.g., farmers markets and food retailers
- Do a more thorough analysis of the BU Spark dataset to see if overlap exists
- Scrape the harder-to-reach datasets
- Decide which datasets are important and which are not
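
For the overlap check, a rough starting point could be to compare normalized name and address keys. This is a sketch under assumptions: we have not yet looked at the BU Spark columns, so the rows are represented as plain `(name, address)` tuples, and `match_key` and `find_overlap` are hypothetical helpers. The sample rows are made up.

```python
import re

def match_key(name, address):
    """Build a crude comparison key: lowercase, drop punctuation, collapse spaces."""
    text = f"{name} {address}".lower()
    text = re.sub(r"[^a-z0-9 ]", " ", text)
    return " ".join(text.split())

def find_overlap(records_a, records_b):
    """Return the rows of records_a whose key also appears in records_b."""
    keys_b = {match_key(name, address) for name, address in records_b}
    return [(name, address) for name, address in records_a
            if match_key(name, address) in keys_b]

# Made-up rows for illustration only.
mapc_rows = [("Example Market", "12 Main St."), ("Corner Grocer", "5 Elm Street")]
spark_rows = [("EXAMPLE MARKET", "12 Main St"), ("Other Store", "99 Oak Ave")]

overlap = find_overlap(mapc_rows, spark_rows)
print(overlap)
```

Exact key matching will miss variants such as "St." versus "Street", so this would only give a lower bound on the overlap.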

***

## Code

In [2]:
import shapefile
import pandas as pd
import os

In [3]:
#https://stackoverflow.com/questions/55112771/read-shapefiles-into-dataframe
def read_shapefile(sf_shape):
    """
    Read a shapefile into a Pandas dataframe with a 'coords'
    column holding the geometry information. This uses the pyshp
    package.
    """

    # Skip the first field, which is pyshp's internal DeletionFlag marker
    fields = [x[0] for x in sf_shape.fields][1:]
    # Copy each record into a plain list of attribute values
    records = [y[:] for y in sf_shape.records()]
    # One list of points per shape
    shps = [s.points for s in sf_shape.shapes()]
    df = pd.DataFrame(columns=fields, data=records)
    df = df.assign(coords=shps)
    return df
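
On the "what is the coords column" question: for point shapefiles like the ones used here, each `coords` value is a list holding a single `[x, y]` pair. Judging by the sample output further down, the values are not in degrees like `LONGITUDE`/`LATITUDE`, so they are presumably in the shapefile's projected coordinate system (an assumption we have not verified). A small sketch for unpacking them into numeric columns, where `split_point_coords` is a hypothetical helper and the example value is copied from the first row of the farmers market output:

```python
import pandas as pd

def split_point_coords(df):
    """Expand single-point 'coords' lists into separate x and y columns."""
    xs = [pts[0][0] for pts in df["coords"]]
    ys = [pts[0][1] for pts in df["coords"]]
    return df.assign(x=xs, y=ys)

# One row with the coords value shown for the first farmers market record.
example = pd.DataFrame({
    "NAME": ["Sustainable Nantucket/Wednesday"],
    "coords": [[[317843.0842999965, 781295.2232000008]]],
})

unpacked = split_point_coords(example)
print(unpacked[["NAME", "x", "y"]])
```

This only makes sense for point layers; line or polygon shapes would have many points per row.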

In [4]:
data = {}
for folder in os.listdir('data'):
    for item in os.listdir(f'data/{folder}'):
        item_path = f'data/{folder}/{item}'
        if item.endswith('.csv'):
            print(item)
            data[item] = pd.read_csv(item_path)

        # Match only real .shp files; sidecar metadata such as .shp.xml is skipped
        elif item.endswith('.shp'):
            print(item)
            print(item_path)
            shape = shapefile.Reader(item_path)
            data[item] = read_shapefile(shape)

FARMERSMARKETS_PT.shp
data/Farmers_market/FARMERSMARKETS_PT.shp
mapc.food_retailers_2017_pt.csv
SCHOOLS_PT.shp
data/Schools/SCHOOLS_PT.shp


In [5]:
for key, value in data.items():
    print(key)  

FARMERSMARKETS_PT.shp
mapc.food_retailers_2017_pt.csv
SCHOOLS_PT.shp


### Farmers markets

In [6]:
df = data['FARMERSMARKETS_PT.shp']
# Take an explicit copy so adding a column does not raise SettingWithCopyWarning
df = df[['NAME','ADDR_1','COUPONS','LONGITUDE','LATITUDE','coords','TOWN']].copy()
df['buisness_type'] = 'Food access'

#directory = os.path.dirname(os.path.abspath('dataset_clean'))
#os.path.join(directory,'farmers_sanitized.csv')
df[df['TOWN'] == 'Boston'].to_csv('farmer_sanitized.csv')


### Food retailers

In [7]:
df = data['mapc.food_retailers_2017_pt.csv']
df['buisness_type'] = 'Food access'
df = df[['name','municipal','zipcode','address','latitude','longitude','buisness_type']]
df[df['municipal'] == 'Boston'].to_csv('food_retailers_sanitized.csv')

In [8]:
directory = os.path.dirname(os.path.abspath('CS506Final'))
os.path.join(directory,'farmers_sanitized.csv')

'C:\\Users\\Darren Liu\\Dropbox\\BU\\CS506 Comp Tools for Data Sci\\Final Workspace\\farmers_sanitized.csv'

### Schools

In [9]:
df = data['SCHOOLS_PT.shp']
# Take an explicit copy of the filtered rows so adding a column does not raise SettingWithCopyWarning
df = df[df['TOWN'] == 'BOSTON'].copy()
df['buisness_type'] = 'education'
df[['NAME','ZIPCODE','coords','buisness_type']].to_csv('schools_sanitized.csv')


## Don't worry about anything below this

In [10]:
shape = shapefile.Reader("data/Farmers_market/FARMERSMARKETS_PT.shp")
read_shapefile(shape)

Unnamed: 0,MARKET_ID,NAME,TYPE,ADDR_1,ADDR_2,TOWN,ZIP_CODE,DAY_TIME,DATES,UPDATE_DAT,YEAR_START,WEBSITE,EBT,WIC_CVV,COUPONS,LONGITUDE,LATITUDE,coords
0,606.0,Sustainable Nantucket/Wednesday,Farmers Markets,113 Pleasant St.,Next to Glidden's Seafood,Nantucket,02554,"Wednesday, 3:30 pm - 6:30 pm",July 6 to September 14,2016,2016,http://www.sustainablenantucket.org,EBT-SNAP Accepted,,WIC & Senior Coupons Accepted,-70.09361,41.27301,"[[317843.0842999965, 781295.2232000008]]"
1,607.0,Holden/Market on Main at Jed's,Farmers Markets,450 Main St.,Jed's Hardware and Garden,Holden,01520,"Friday, 3:30 pm - 7:00 pm",May 20 to October 28,2016,2016,,,,,-71.83454,42.33213,"[[172428.67920000106, 898013.9431999996]]"
2,535.0,West Newton,Farmers Markets,Elm Street,off Washington Street,West Newton,02465,"Saturday, 10:00 am - 2:00 pm",June 18 to October 8,2016,2014,http://www.newtonma.gov/gov/parks,EBT-SNAP Accepted,,WIC & Senior Coupons Accepted,-71.22932,42.34906,"[[222302.31149999797, 899875.7833000012]]"
3,538.0,Plainville,Farmers Markets,200 South Street,Old Wood School,Plainville,02762,"Sunday, 10:00 am - 2:00 pm",May 25 to November 2,2015,2014,http://www.plainvillefarmersmarket.com/,EBT-SNAP Accepted,,WIC & Senior Coupons Accepted,-71.33843,42.00941,"[[213383.81689999998, 862127.1283]]"
4,539.0,Worcester Art Museum,Farmers Markets,Lancaster St.,Worcester Art Museum,Worcester,01609,"Saturdays, 10:00 am - 1:00 pm",July 11 to August 29,2015,2014,,,,WIC & Senior Coupons Accepted,-71.80202,42.27328,"[[175085.67759999633, 891467.1953999996]]"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
296,578.0,Dorchester/Carney Hospital,Winter Markets,2100 Dorchester Ave.,"Carney Hospital, by Seton Building Entrance",Boston,02124,"Wednesday, 8:00 am - Noon",Year- Round,2015,2015,,EBT-SNAP Accepted,,,-71.06634,42.27898,"[[235770.3096999973, 892147.1444000006]]"
297,590.0,Mattapoisett,Farmers Markets,57 Fairhaven Rd.,Knights of Columbus Hall,Mattapoisett,02739,"Wednesday, 3:00 pm - 7:00 pm",Year-Round,2015,2015,,,,WIC & Senior Coupons Accepted,-70.82973,41.65840,"[[255828.3222000003, 823347.6334000006]]"
298,591.0,Mattapoisett Winter,Winter Markets,57 Fairhaven Rd.,Knights of Columbus Hall,Mattapoisett,02739,"Wednesday, 3:00 pm - 7:00 pm",Year-Round,2015,2015,,,,,-70.82973,41.65840,"[[255828.3222000003, 823347.6334000006]]"
299,592.0,Southbridge/Big Bunny,Farmers Markets,942 Main St.,Big Bunny Market,Worcester,01550,"Saturday, 9:00 am - 2:00 pm",May 28 to October 8,2016,2015,http://www.facebook.com/Big-Bunny-Farmers-Mark...,EBT-SNAP Accepted,,WIC & Senior Coupons Accepted,-72.04777,42.08311,"[[154677.76370000094, 870445.8894000016]]"


In [11]:
# Raw string so the backslash is not treated as an escape sequence
shape = shapefile.Reader(os.path.abspath(r"Farmers_market\FARMERSMARKETS_PT.shp"))

ShapefileException: Unable to open C:\Users\Darren Liu\Dropbox\BU\CS506 Comp Tools for Data Sci\Final Workspace\Farmers_market\FARMERSMARKETS_PT.dbf or C:\Users\Darren Liu\Dropbox\BU\CS506 Comp Tools for Data Sci\Final Workspace\Farmers_market\FARMERSMARKETS_PT.shp.

In [None]:
os.path.abspath(r"Farmers_market\FARMERSMARKETS_PT.shp")

In [None]:
open("Farmers_market/FARMERSMARKETS_PT.shp")
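
The failing calls above resolve `Farmers_market` directly against the working directory, while the call that worked in `In [10]` reads from `data/Farmers_market/`, so the missing `data` folder is the likely cause. A small sketch of building the path in a platform-independent way and checking it before opening; `shapefile_path` is a hypothetical helper and the `data` root is assumed from the earlier cells:

```python
from pathlib import Path

def shapefile_path(folder, stem, root="data"):
    """Build a platform-independent path to a .shp file under the data folder."""
    return Path(root) / folder / f"{stem}.shp"

path = shapefile_path("Farmers_market", "FARMERSMARKETS_PT")
print(path.as_posix())

# Check before handing the path to shapefile.Reader or open()
if not path.exists():
    print(f"Not found: {path.resolve()}")
```

Printing the resolved path when the file is missing makes it easy to see which directory the notebook is actually running from.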