## Wrangling cultural spaces data

### Goals of the Task


Each row in the cultural spaces data set is a Seattle venue such as a cinema, art gallery, theatre or music venue.  

The aim of this task is to remove cultural spaces that have no location information, plus any spaces that closed before the period covered by our cycle hire data sample. Then, using the range of longitude and latitude found in the cycle hire data, you will eliminate cultural spaces that are not located near cycle hire stations.

After this you will use an aggregation method in pandas to count the cultural spaces in your remaining data set, summarising these venues by neighbourhood and cultural purpose.

#### Step 1: use pandas to read the cultural spaces CSV as a data frame
- import pandas as pd 
- use pandas read_csv to create a culturalspaces data frame 
- ensure you are pointing at the correct file path for the data source (you may have to navigate in your notebook!) 


In [1]:
import pandas as pd

In [20]:
df = pd.read_csv('Seattle_Cultural_Space_Inventory.csv')

In [21]:
df

Unnamed: 0,Name,Phone,URL,Square Feet Total,Neighborhood,Organization Type,Dominant Discipline,Year of Occupation,Rent vs Own,Age of Current Building,...,Constituency over 50% one race,Specific Demographics and Community,Organization Leadership,Organization Artists,Closed Date,Closed?,Address,Location,Latitude,Longitude
0,10 Degrees,,www.10degreesseattle.com,750.0,Capitol Hill,N,Multi-use,2011.0,O,1970.0,...,,,,,,0.0,1312 E Union St,"(47.612905, -122.31477)",47.612905,-122.314770
1,Alki Community Center,,,,Alki,,Community Center,,,,...,,,,,,0.0,5817 SW Stevens St Seattle WA 98116,"(47.577271, -122.407364)",47.577271,-122.407364
2,Calypte Gallery,,,,,,Visual,,,,...,,,,,,0.0,1107 East Denny Way #A2 Seattle WA 98122,"(47.618587, -122.317818)",47.618587,-122.317818
3,Cassandria Blackmore Studio,,,,,,Visual,,,,...,,,,,,0.0,1115 East Pike St Seattle WA 98122,"(47.6139358, -122.3174302)",47.613936,-122.317430
4,Dynamic Sound Service,,,,,,Studios,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
939,Illumination Learning Studio,(206) 789-ARTS,www.ILSKids.com,2500.0,Phinney Ridge,N,Arts/Cultural Training or Education,2013.0,R,1990.0,...,,,,,,0.0,7720 Greenwood Ave N #101 Seattle WA 98103,"(47.68528, -122.354912)",47.685280,-122.354912
940,Serafina,,,,,,Music,,,,...,,,,,,0.0,"2043 Eastlake Ave E Seattle, WA 98102","(47.63803, -122.325924)",47.638030,-122.325924
941,Eight and Sand Gallery,(206) 304-0394,www.eightandsand.com,132.0,Georgetown,N,Visual,2013.0,R,1901.0,...,,,,,,0.0,5840 Airport Way S Suite 212 Seattle WA 98108,"(47.550962, -122.318612)",47.550962,-122.318612
942,University of Washington Mary Gates Hall,,,,,,Arts/Cultural Training or Education,,,,...,,,,,,0.0,Seattle WA 98195,"(47.657048, -122.305271)",47.657048,-122.305271


In [43]:
df = df.dropna(axis=0, subset=['Latitude', 'Longitude'])

In [44]:
df[df['Closed?'] ==1.0]

Unnamed: 0,Name,Phone,URL,Square Feet Total,Neighborhood,Organization Type,Dominant Discipline,Year of Occupation,Rent vs Own,Age of Current Building,...,Constituency over 50% one race,Specific Demographics and Community,Organization Leadership,Organization Artists,Closed Date,Closed?,Address,Location,Latitude,Longitude
48,Bop Street Records,,,,,,Music,,,,...,,,,,2020-06-01,1.0,2220 NW Market St Seattle WA 98107,"(47.66909, -122.385796)",47.66909,-122.385796
82,Heartland (CLOSED),,,,,,Music,,,,...,,,,,NaT,1.0,5306 Roosevelt Way NE Seattle WA 98105,"(47.66739015, -122.3170984)",47.66739,-122.317098
182,Barnes & Noble - Downtown,,,,Downtown,,Literary,,,,...,,,,,NaT,1.0,600 Pine St Suite 107 Seattle WA 98101,"(47.61287415, -122.3352762)",47.612874,-122.335276
207,Landmark Guild 45th,,,,,,Cinema,,,,...,,,,,2017-06-01,1.0,2115 N 45th St Seattle WA 98103,"(47.66112015, -122.3330128)",47.66112,-122.333013
276,Teatro ZinZanni,(206) 802-0015,www.zinzanni.com/seattle,28000.0,Uptown,Y,Performance,2006.0,R,2006.0,...,,,,,2017-03-05,1.0,222 Mercer St Seattle WA 98109,"(47.6249259, -122.3518443)",47.624926,-122.351844
283,Bell & Reed,,,,Pioneer Square,,Service/Supply,,,,...,,,,,NaT,1.0,15 Prefontaine Pl. S Suite 510 Seattle WA 98104,"(47.601719, -122.330269)",47.601719,-122.330269
297,Macha Monkey Productions,,,,,,Performance,,,,...,,,,,2017-05-31,1.0,2222 2nd Ave #200 Seattle WA 98121,"(47.613483, -122.344756)",47.613483,-122.344756
342,Fetherston Gallery (CLOSED),,,,,,Visual,,,,...,,,,,2016-03-14,1.0,"5701 6th Ave S, Seattle WA 98108","(47.5514664, -122.3283134)",47.551466,-122.328313
346,Eclectic Theater Company,(206) 679-3271,www.eclectictheatercompany.org,900.0,Capitol Hill,Y,Performance,2006.0,R,1910.0,...,,,,,2017-10-31,1.0,1214 10th Ave Seattle WA 98122,"(47.6126561, -122.3191571)",47.612656,-122.319157
372,Seattle Weekly,,,,Pioneer Square,,Service/Supply,,,,...,,,,,2017-10-25,1.0,307 3rd Ave S Second Floor Seattle WA 98104,"(47.599701, -122.33065)",47.599701,-122.33065


In [45]:
df['Closed Date']=pd.to_datetime(df['Closed Date'])

In [46]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 903 entries, 0 to 943
Data columns (total 37 columns):
 #   Column                                                       Non-Null Count  Dtype         
---  ------                                                       --------------  -----         
 0   Name                                                         903 non-null    object        
 1   Phone                                                        457 non-null    object        
 2   URL                                                          460 non-null    object        
 3   Square Feet Total                                            444 non-null    float64       
 4   Neighborhood                                                 493 non-null    object        
 5   Organization Type                                            481 non-null    object        
 6   Dominant Discipline                                          864 non-null    object        
 7   Year of Occupatio


In [59]:
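# Rows closed before 1 Oct 2014, i.e. before the cycle hire period (NaT rows compare False and are kept)
closed_places = df[df['Closed Date'] < '2014-10-01'].index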

In [60]:
df.drop(closed_places, inplace=True)

In [62]:
# Reset the index after dropping rows; drop=True discards the old index instead of adding an 'index' column
df = df.reset_index(drop=True)

In [65]:
df

Unnamed: 0,Name,Phone,URL,Square Feet Total,Neighborhood,Organization Type,Dominant Discipline,Year of Occupation,Rent vs Own,Age of Current Building,...,Constituency over 50% one race,Specific Demographics and Community,Organization Leadership,Organization Artists,Closed Date,Closed?,Address,Location,Latitude,Longitude
0,10 Degrees,,www.10degreesseattle.com,750.0,Capitol Hill,N,Multi-use,2011.0,O,1970.0,...,,,,,NaT,0.0,1312 E Union St,"(47.612905, -122.31477)",47.612905,-122.314770
1,Alki Community Center,,,,Alki,,Community Center,,,,...,,,,,NaT,0.0,5817 SW Stevens St Seattle WA 98116,"(47.577271, -122.407364)",47.577271,-122.407364
2,Calypte Gallery,,,,,,Visual,,,,...,,,,,NaT,0.0,1107 East Denny Way #A2 Seattle WA 98122,"(47.618587, -122.317818)",47.618587,-122.317818
3,Cassandria Blackmore Studio,,,,,,Visual,,,,...,,,,,NaT,0.0,1115 East Pike St Seattle WA 98122,"(47.6139358, -122.3174302)",47.613936,-122.317430
4,Eritrean Association in Greater Seattle,,,,,,Community Center,,,,...,,,,,NaT,0.0,1528 Valentine Pl S Seattle WA 98144,"(47.58879805, -122.3070771)",47.588798,-122.307077
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
898,Illumination Learning Studio,(206) 789-ARTS,www.ILSKids.com,2500.0,Phinney Ridge,N,Arts/Cultural Training or Education,2013.0,R,1990.0,...,,,,,NaT,0.0,7720 Greenwood Ave N #101 Seattle WA 98103,"(47.68528, -122.354912)",47.685280,-122.354912
899,Serafina,,,,,,Music,,,,...,,,,,NaT,0.0,"2043 Eastlake Ave E Seattle, WA 98102","(47.63803, -122.325924)",47.638030,-122.325924
900,Eight and Sand Gallery,(206) 304-0394,www.eightandsand.com,132.0,Georgetown,N,Visual,2013.0,R,1901.0,...,,,,,NaT,0.0,5840 Airport Way S Suite 212 Seattle WA 98108,"(47.550962, -122.318612)",47.550962,-122.318612
901,University of Washington Mary Gates Hall,,,,,,Arts/Cultural Training or Education,,,,...,,,,,NaT,0.0,Seattle WA 98195,"(47.657048, -122.305271)",47.657048,-122.305271


#### Step 2: drop unusable rows

- first, using the pandas dropna() method, remove from the culturalspaces data frame any rows which have null in either the Longitude or Latitude columns 
- next, drop rows of culturalspaces which were closed prior to the first date in our cycle hire data (Oct 2014) using any pandas filter approach 
- reset the index of your culturalspaces data frame after dropping rows 
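
The three bullets above can be sketched on a small stand-in frame. This is a minimal illustration rather than the notebook's own solution: the `toy` frame and its values are invented, and the 1 October 2014 cutoff is an assumption based on the "Oct 2014" start mentioned above.

```python
import pandas as pd

# Toy stand-in for the cultural spaces data (all values invented for illustration)
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Closed Date": ["2013-05-01", None, "2017-03-05", None],
    "Latitude": [47.61, None, 47.62, 47.60],
    "Longitude": [-122.31, -122.33, -122.35, -122.33],
})

# Parse the dates first; missing values become NaT
toy["Closed Date"] = pd.to_datetime(toy["Closed Date"])

# 1. Drop rows that are missing either coordinate
toy = toy.dropna(subset=["Latitude", "Longitude"])

# 2. Keep venues that never closed (NaT) or closed on/after the cutoff
cutoff = pd.Timestamp("2014-10-01")  # assumed start of the cycle hire period
keep = toy["Closed Date"].isna() | (toy["Closed Date"] >= cutoff)

# 3. Filter and reset the index in one go
toy = toy[keep].reset_index(drop=True)
```

Writing the filter as a "keep" mask makes the treatment of `NaT` explicit: venues with no closing date stay in the data.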


https://www.geeksforgeeks.org/ways-to-filter-pandas-dataframe-by-column-values/

#### Step 3: create a data frame from the cycle hire station data to get the ranges of longitude and latitude

- use pandas read_csv to create a second data frame from the Seattle_cycles_station.csv file
- use max() and min() to identify the range of longitude and latitude values in the cycle station data
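
As an alternative to calling `min()` and `max()` four times, the ranges can be collected in one step with `agg`. The sketch below uses a hypothetical three-row `stations` frame built from rows of the station table, so the bounds it prints are not those of the full file.

```python
import pandas as pd

# Three rows copied from the station table; the frame name is hypothetical
stations = pd.DataFrame({
    "station_id": ["BT-01", "BT-03", "CBD-06"],
    "lat": [47.618418, 47.615829, 47.60595],
    "long": [-122.350964, -122.348564, -122.335768],
})

# One call returns a small table: rows 'min' and 'max', columns 'lat' and 'long'
bounds = stations[["lat", "long"]].agg(["min", "max"])

lat_min, lat_max = bounds.loc["min", "lat"], bounds.loc["max", "lat"]
long_min, long_max = bounds.loc["min", "long"], bounds.loc["max", "long"]
print(bounds)
```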

In [66]:
stn=pd.read_csv('Seattle_cycles_station.csv')

In [69]:
stn.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 58 entries, 0 to 57
Data columns (total 9 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   station_id         58 non-null     object 
 1   name               58 non-null     object 
 2   lat                58 non-null     float64
 3   long               58 non-null     float64
 4   install_date       58 non-null     object 
 5   install_dockcount  58 non-null     int64  
 6   modification_date  17 non-null     object 
 7   current_dockcount  58 non-null     int64  
 8   decommission_date  4 non-null      object 
dtypes: float64(2), int64(2), object(5)
memory usage: 4.2+ KB


In [71]:
stn.describe()

Unnamed: 0,lat,long,install_dockcount,current_dockcount
count,58.0,58.0,58.0,58.0
mean,47.624796,-122.327242,17.586207,16.517241
std,0.019066,0.014957,3.060985,5.117021
min,47.598488,-122.35523,12.0,0.0
25%,47.613239,-122.338735,16.0,16.0
50%,47.618591,-122.328206,18.0,18.0
75%,47.627712,-122.316691,18.0,18.0
max,47.666145,-122.284119,30.0,26.0


In [89]:
stn['lat'].max()


47.666145

In [90]:
stn['lat'].min()

47.598488

In [91]:
stn['long'].max()

-122.284119

In [92]:
stn['long'].min()

-122.35523

In [93]:
stn

Unnamed: 0,station_id,name,lat,long,install_date,install_dockcount,modification_date,current_dockcount,decommission_date
0,BT-01,3rd Ave & Broad St,47.618418,-122.350964,10/13/2014,18,,18,
1,BT-03,2nd Ave & Vine St,47.615829,-122.348564,10/13/2014,16,,16,
2,BT-04,6th Ave & Blanchard St,47.616094,-122.341102,10/13/2014,16,,16,
3,BT-05,2nd Ave & Blanchard St,47.61311,-122.344208,10/13/2014,14,,14,
4,CBD-03,7th Ave & Union St,47.610731,-122.332447,10/13/2014,20,,20,
5,CBD-04,Union St & 4th Ave,47.609221,-122.335596,7/27/2015,18,,18,
6,CBD-05,1st Ave & Marion St,47.604058,-122.3358,10/13/2014,20,,20,
7,CBD-06,2nd Ave & Spring St,47.60595,-122.335768,10/13/2014,20,11/9/2015,18,
8,CBD-07,City Hall / 4th Ave & James St,47.603509,-122.330409,10/13/2014,20,,20,
9,CBD-13,2nd Ave & Pine St,47.610185,-122.339641,10/13/2014,18,,18,


In [102]:
lower_bound_long = stn['long'].min()
upper_bound_long = stn['long'].max()
lower_bound_lat = stn['lat'].min()
upper_bound_lat = stn['lat'].max()
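# Keep only venues inside the bounding box of the cycle hire stations
cultural_spaces_inrange = df[(df['Longitude'] >= lower_bound_long) &
                             (df['Longitude'] <= upper_bound_long) &
                             (df['Latitude'] >= lower_bound_lat) &
                             (df['Latitude'] <= upper_bound_lat)]
# Assign the result rather than resetting in place on a filtered slice
cultural_spaces_inrange = cultural_spaces_inrange.reset_index(drop=True)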




In [105]:
cultural_spaces_inrange.describe()

Unnamed: 0,Square Feet Total,Year of Occupation,Age of Current Building,Year Organization Founded,Number of Past Facilities,Stages and Theaters,Stage & Theater Seats,Gallery Square Feet,Available Parking,"Stability Index (5=very stable, 1=very uncertain)","Control Index (5=very in control, 1 = very out of control)",Closed?,Latitude,Longitude
count,233.0,238.0,224.0,236.0,233.0,233.0,200.0,192.0,193.0,239.0,16.0,432.0,488.0,488.0
mean,21240.690987,1995.504202,1939.540179,1983.792373,1.154506,1.0,186.18,2734.380208,22.487047,4.037657,3.75,0.05787,47.62064,-122.327429
std,57319.955515,24.169889,35.459295,30.911185,1.362107,6.334089,480.320985,16158.339501,104.593962,1.185807,1.238278,0.233769,0.020541,0.01457
min,90.0,1891.0,1889.0,1861.0,0.0,0.0,0.0,0.0,0.0,1.0,2.0,0.0,47.598572,-122.355169
25%,1500.0,1992.0,1910.0,1973.0,0.0,0.0,0.0,0.0,0.0,3.0,2.75,0.0,47.605016,-122.337033
50%,4000.0,2004.5,1927.0,1993.0,1.0,0.0,0.0,100.0,0.0,4.0,4.0,0.0,47.61414,-122.329938
75%,11300.0,2010.0,1962.0,2006.0,2.0,1.0,160.0,900.0,2.0,5.0,5.0,0.0,47.624036,-122.317245
max,400000.0,2020.0,2014.0,2020.0,7.0,96.0,3500.0,210000.0,1000.0,5.0,5.0,1.0,47.665938,-122.28545


#### Step 4: drop rows which are outside of the cycle hire data range

- use any pandas filter method to retain rows in the culturalspaces dataframe which are inside the relevant long/lat range of the cycle hire stations 
- reset the index of your culturalspaces dataframe after dropping rows 
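
One possible filter approach is `Series.between`, which is inclusive at both ends by default. The sketch below is an illustration under assumptions: the bounds are the `stn` minimum and maximum values shown in the notebook output, and the `venues` frame is a hypothetical three-row extract of the cultural spaces table.

```python
import pandas as pd

# Bounds taken from the stn min()/max() outputs in the notebook
LAT_MIN, LAT_MAX = 47.598488, 47.666145
LONG_MIN, LONG_MAX = -122.35523, -122.284119

# Three venues copied from the cultural spaces output; the frame name is hypothetical
venues = pd.DataFrame({
    "Name": ["10 Degrees", "Alki Community Center", "Eight and Sand Gallery"],
    "Latitude": [47.612905, 47.577271, 47.550962],
    "Longitude": [-122.314770, -122.407364, -122.318612],
})

# True only where both coordinates fall inside the station bounding box
in_range = (
    venues["Latitude"].between(LAT_MIN, LAT_MAX)
    & venues["Longitude"].between(LONG_MIN, LONG_MAX)
)
venues_inrange = venues[in_range].reset_index(drop=True)
```

Note that this keeps venues inside a rectangle around the stations. It does not measure the distance from each venue to its nearest station.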

#### Step 5: simplify and fill gaps in location using Python text analysis methods

This data is quite text-heavy, so you will now use methods you encountered in Topic 8 to process it.

- simplify the cultural space category by using any pandas string method to keep the first word of the Dominant Discipline column, creating a new column 'purpose'
(e.g. 'Arts/Cultural Training or Education' will become simply 'Arts' in the purpose column)

- using a similar method, simplify the Neighborhood column, creating a neighbourhood_clean column, and amend any typos you see
(e.g. 'University District and Laurelhurst/Sand Point' will be classed as 'University District')

- fill in any gaps in the neighbourhood_clean column as 'unknown neighbourhood'
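
The text clean-up can be sketched with the pandas `.str` accessor. This is one possible approach, not the only one. The `spaces` frame is a hypothetical extract, the misspelling 'Captiol Hill' is invented to show the typo fix, and the rule "cut at the first ' and ' or '/'" is an assumption about how compound neighbourhood names should be simplified.

```python
import pandas as pd

# Hypothetical extract; 'Captiol Hill' is an invented typo for illustration
spaces = pd.DataFrame({
    "Dominant Discipline": ["Arts/Cultural Training or Education", "Multi-use", "Visual", None],
    "Neighborhood": ["University District and Laurelhurst/Sand Point", "Captiol Hill", None, "Downtown"],
})

# purpose: everything before the first space or slash
spaces["purpose"] = spaces["Dominant Discipline"].str.extract(r"^([^\s/]+)", expand=False)

# neighbourhood_clean: cut the name at the first ' and ' or '/'
cleaned = (
    spaces["Neighborhood"]
    .str.replace(r"(\s+and\s+|/).*$", "", regex=True)
    .str.strip()
)

# Amend typos with an explicit lookup (mapping is illustrative), then fill the gaps
typo_fixes = {"Captiol Hill": "Capitol Hill"}
spaces["neighbourhood_clean"] = cleaned.replace(typo_fixes).fillna("unknown neighbourhood")
```

An explicit typo dictionary keeps every correction visible and easy to review, which is safer than fuzzy matching on a data set of this size.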

#### Step 6: summarise the cultural spaces by neighbourhood and purpose

- use the pandas groupby() method to summarise the number of spaces by (known) neighbourhood and cultural purpose
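
A minimal sketch of the summary, assuming the `purpose` and `neighbourhood_clean` columns from Step 5 exist. The `tidy` frame and its rows are invented for illustration.

```python
import pandas as pd

# Invented rows standing in for the cleaned cultural spaces data
tidy = pd.DataFrame({
    "neighbourhood_clean": ["Capitol Hill", "Capitol Hill", "Capitol Hill",
                            "Downtown", "unknown neighbourhood"],
    "purpose": ["Visual", "Visual", "Performance", "Literary", "Music"],
})

# Exclude venues whose neighbourhood was filled in as unknown
known = tidy[tidy["neighbourhood_clean"] != "unknown neighbourhood"]

# size() counts the rows in each (neighbourhood, purpose) group
counts = known.groupby(["neighbourhood_clean", "purpose"]).size()

# Flatten to an ordinary table, largest groups first
summary = counts.reset_index(name="spaces").sort_values("spaces", ascending=False)
print(summary)
```

`size()` counts rows per group regardless of missing values in other columns, whereas `count()` would report a separate non-null count for each column.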