## Wrangling cultural spaces data

### Goals of the Task


Each row in the cultural spaces data set is a Seattle venue such as a cinema, art gallery, theatre or music venue.  

The aim of this task is to remove cultural spaces that have no location information, plus any spaces that closed before the period covered by our cycle hire data sample. Then, using the range of longitude and latitude found in the cycle hire data, you will eliminate cultural spaces that are not located near cycle hire stations.

After this, you will use a pandas aggregation method to count the cultural spaces remaining in your data set, summarising them by neighbourhood and cultural purpose.

#### Step 1: use pandas to read the cultural spaces CSV as a data frame
- import pandas as pd 
- use pandas read_csv to create a culturalspaces data frame 
- ensure you are pointing at the correct file path for the data source (you may have to navigate in your notebook!) 
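`read_csv` resolves a relative path such as 'Seattle_Cultural_Space_Inventory.csv' against the notebook's current working directory, so a `FileNotFoundError` usually means the notebook is running from a different folder. A minimal sketch that checks the path first and reports the working directory; `load_cultural_spaces` is a hypothetical helper name, and its default file name is the one used in In [20]:

```python
from pathlib import Path

import pandas as pd


def load_cultural_spaces(path="Seattle_Cultural_Space_Inventory.csv"):
    """Read the cultural spaces CSV, failing with a clearer message if the path is wrong."""
    csv_path = Path(path)
    if not csv_path.is_file():
        # Report where the notebook is running from, to help fix a relative path
        raise FileNotFoundError(
            f"{csv_path} not found; current working directory is {Path.cwd()}"
        )
    return pd.read_csv(csv_path)


# culturalspaces = load_cultural_spaces()
```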


In [1]:
import pandas as pd

In [20]:
df = pd.read_csv('Seattle_Cultural_Space_Inventory.csv')

In [21]:
df

| | Name | Phone | URL | Square Feet Total | Neighborhood | Organization Type | Dominant Discipline | Year of Occupation | Rent vs Own | Age of Current Building | ... | Constituency over 50% one race | Specific Demographics and Community | Organization Leadership | Organization Artists | Closed Date | Closed? | Address | Location | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10 Degrees | | www.10degreesseattle.com | 750.0 | Capitol Hill | N | Multi-use | 2011.0 | O | 1970.0 | ... | | | | | | 0.0 | 1312 E Union St | (47.612905, -122.31477) | 47.612905 | -122.314770 |
| 1 | Alki Community Center | | | | Alki | | Community Center | | | | ... | | | | | | 0.0 | 5817 SW Stevens St Seattle WA 98116 | (47.577271, -122.407364) | 47.577271 | -122.407364 |
| 2 | Calypte Gallery | | | | | | Visual | | | | ... | | | | | | 0.0 | 1107 East Denny Way #A2 Seattle WA 98122 | (47.618587, -122.317818) | 47.618587 | -122.317818 |
| 3 | Cassandria Blackmore Studio | | | | | | Visual | | | | ... | | | | | | 0.0 | 1115 East Pike St Seattle WA 98122 | (47.6139358, -122.3174302) | 47.613936 | -122.317430 |
| 4 | Dynamic Sound Service | | | | | | Studios | | | | ... | | | | | | | | | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 939 | Illumination Learning Studio | (206) 789-ARTS | www.ILSKids.com | 2500.0 | Phinney Ridge | N | Arts/Cultural Training or Education | 2013.0 | R | 1990.0 | ... | | | | | | 0.0 | 7720 Greenwood Ave N #101 Seattle WA 98103 | (47.68528, -122.354912) | 47.685280 | -122.354912 |
| 940 | Serafina | | | | | | Music | | | | ... | | | | | | 0.0 | 2043 Eastlake Ave E Seattle, WA 98102 | (47.63803, -122.325924) | 47.638030 | -122.325924 |
| 941 | Eight and Sand Gallery | (206) 304-0394 | www.eightandsand.com | 132.0 | Georgetown | N | Visual | 2013.0 | R | 1901.0 | ... | | | | | | 0.0 | 5840 Airport Way S Suite 212 Seattle WA 98108 | (47.550962, -122.318612) | 47.550962 | -122.318612 |
| 942 | University of Washington Mary Gates Hall | | | | | | Arts/Cultural Training or Education | | | | ... | | | | | | 0.0 | Seattle WA 98195 | (47.657048, -122.305271) | 47.657048 | -122.305271 |


#### Step 2: drop unusable rows

- first, using the pandas dropna() method, remove from the culturalspaces data frame any rows that have a null value in either the Longitude or Latitude column
- next, drop rows of culturalspaces which were closed prior to the first date in our cycle hire data (Oct 2014) using any pandas filter approach 
- reset the index of your culturalspaces data frame after dropping rows 


https://www.geeksforgeeks.org/ways-to-filter-pandas-dataframe-by-column-values/
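The three bullets above can also be written as boolean masks applied in one pass. A minimal sketch, assuming the column names shown in the outputs below and taking 1 October 2014 as the start of the cycle hire sample (the task only says "Oct 2014", so check the first trip date in your own data); `drop_unusable_rows` is a hypothetical helper name. Calling `.copy()` after `dropna()` gives an independent frame, which should avoid the `SettingWithCopyWarning` that appears further down this notebook:

```python
import pandas as pd

# Assumed start of the cycle hire sample ("Oct 2014" in the task)
CYCLE_DATA_START = pd.Timestamp("2014-10-01")


def drop_unusable_rows(spaces, start=CYCLE_DATA_START):
    """Drop spaces without coordinates or closed before `start`, then renumber the rows."""
    # Keep only rows with both coordinates; .copy() makes the result independent
    located = spaces.dropna(subset=["Latitude", "Longitude"]).copy()
    # Parse closure dates; blank values become NaT
    located["Closed Date"] = pd.to_datetime(located["Closed Date"])
    # Any comparison with NaT is False, so venues with no closure date are kept
    closed_early = located["Closed Date"] < start
    # Reset the index so rows are numbered 0..n-1 again
    return located[~closed_early].reset_index(drop=True)
```

Filtering with a mask avoids collecting index labels and calling `drop()` separately, and `reset_index(drop=True)` discards the old labels rather than adding them as a new column.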

In [22]:
df = df.dropna(axis=0, subset=['Latitude', 'Longitude'])

In [28]:
df[df['Closed?'] == 1.0]

| | Name | Phone | URL | Square Feet Total | Neighborhood | Organization Type | Dominant Discipline | Year of Occupation | Rent vs Own | Age of Current Building | ... | Constituency over 50% one race | Specific Demographics and Community | Organization Leadership | Organization Artists | Closed Date | Closed? | Address | Location | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 26 | Capitol Club | | | | | | Music | | | | ... | | | | | 2013-09-14T00:00:00.000 | 1.0 | 414 E Pine St Seattle WA 98122 | (47.6153941, -122.3260485) | 47.615394 | -122.326048 |
| 48 | Bop Street Records | | | | | | Music | | | | ... | | | | | 2020-06-01T00:00:00.000 | 1.0 | 2220 NW Market St Seattle WA 98107 | (47.66909, -122.385796) | 47.66909 | -122.385796 |
| 82 | Heartland (CLOSED) | | | | | | Music | | | | ... | | | | | | 1.0 | 5306 Roosevelt Way NE Seattle WA 98105 | (47.66739015, -122.3170984) | 47.66739 | -122.317098 |
| 120 | Washington Ensemble Theatre (CLOSED) | (206) 325-5105 | www.washingtonensemble.org | 800.0 | Capitol Hill | Y | Performance | 2003.0 | R | | ... | | | | | 2014-07-31T00:00:00.000 | 1.0 | 608 19th Ave E Seattle WA 98112 | (47.6247086, -122.307185) | 47.624709 | -122.307185 |
| 174 | Theater Schmeater | (206) 324-5801 | www.schmeater.org | 3000.0 | Capitol Hill | Y | Performance | | R | 2008.0 | ... | | | | | 2013-11-01T00:00:00.000 | 1.0 | 1500 Summit Ave Seattle WA 98122 | (47.61344, -122.34325) | 47.61344 | -122.34325 |
| 182 | Barnes & Noble - Downtown | | | | Downtown | | Literary | | | | ... | | | | | | 1.0 | 600 Pine St Suite 107 Seattle WA 98101 | (47.61287415, -122.3352762) | 47.612874 | -122.335276 |
| 190 | Kind Hand Studio (CLOSED) | (206) 618-1447 | www.khspottery.com | 158.0 | Ballard | N | Arts/Cultural Training or Education | 2012.0 | R | | ... | | | | | 2013-12-01T00:00:00.000 | 1.0 | 4775 Ballard Ave NW Seattle WA 98107 | (47.6642404, -122.3810653) | 47.66424 | -122.381065 |
| 207 | Landmark Guild 45th | | | | | | Cinema | | | | ... | | | | | 2017-06-01T00:00:00.000 | 1.0 | 2115 N 45th St Seattle WA 98103 | (47.66112015, -122.3330128) | 47.66112 | -122.333013 |
| 262 | Room 104 Gallery (CLOSED) in Tashiro Kaplan | | | | | | Visual | | | | ... | | | | | 2014-08-01T00:00:00.000 | 1.0 | 306 S. Washington St #104 Seattle WA 98104 | (47.6010201, -122.3298371) | 47.60102 | -122.329837 |
| 276 | Teatro ZinZanni | (206) 802-0015 | www.zinzanni.com/seattle | 28000.0 | Uptown | Y | Performance | 2006.0 | R | 2006.0 | ... | | | | | 2017-03-05T00:00:00.000 | 1.0 | 222 Mercer St Seattle WA 98109 | (47.6249259, -122.3518443) | 47.624926 | -122.351844 |


In [30]:
df['Closed Date'] = pd.to_datetime(df['Closed Date'])

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['Closed Date']=pd.to_datetime(df['Closed Date'])


#### Step 3: create a data frame from the cycle hire station data to get the ranges of longitude and latitude

- use pandas read_csv to create a second data frame from the seattle_cycles_station.csv file
- use max() and min() to identify the range of longitude and latitude values in the cycle station data 
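A minimal sketch of this step, assuming the station file stores coordinates in columns named `lat` and `long` (check the headers of seattle_cycles_station.csv and pass the real names if they differ); `station_bounds` is a hypothetical helper name:

```python
import pandas as pd


def station_bounds(stations, lat_col="lat", lon_col="long"):
    """Return the min/max latitude and longitude of the cycle hire stations."""
    # Assumption: column names 'lat' and 'long'; override them if the CSV differs
    return {
        "lat_min": stations[lat_col].min(),
        "lat_max": stations[lat_col].max(),
        "lon_min": stations[lon_col].min(),
        "lon_max": stations[lon_col].max(),
    }


# stations = pd.read_csv("seattle_cycles_station.csv")
# bounds = station_bounds(stations)
```

`stations[["lat", "long"]].agg(["min", "max"])` gives the same four numbers as a small table, if you prefer to inspect them rather than reuse them.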

In [32]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 912 entries, 0 to 943
Data columns (total 37 columns):
 #   Column                                                       Non-Null Count  Dtype         
---  ------                                                       --------------  -----         
 0   Name                                                         912 non-null    object        
 1   Phone                                                        460 non-null    object        
 2   URL                                                          463 non-null    object        
 3   Square Feet Total                                            447 non-null    float64       
 4   Neighborhood                                                 496 non-null    object        
 5   Organization Type                                            484 non-null    object        
 6   Dominant Discipline                                          873 non-null    object        
 7   Year of Occupatio

In [34]:
df[df['Closed?'] == 1.0]

| | Name | Phone | URL | Square Feet Total | Neighborhood | Organization Type | Dominant Discipline | Year of Occupation | Rent vs Own | Age of Current Building | ... | Constituency over 50% one race | Specific Demographics and Community | Organization Leadership | Organization Artists | Closed Date | Closed? | Address | Location | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 26 | Capitol Club | | | | | | Music | | | | ... | | | | | 2013-09-14 | 1.0 | 414 E Pine St Seattle WA 98122 | (47.6153941, -122.3260485) | 47.615394 | -122.326048 |
| 48 | Bop Street Records | | | | | | Music | | | | ... | | | | | 2020-06-01 | 1.0 | 2220 NW Market St Seattle WA 98107 | (47.66909, -122.385796) | 47.66909 | -122.385796 |
| 82 | Heartland (CLOSED) | | | | | | Music | | | | ... | | | | | NaT | 1.0 | 5306 Roosevelt Way NE Seattle WA 98105 | (47.66739015, -122.3170984) | 47.66739 | -122.317098 |
| 120 | Washington Ensemble Theatre (CLOSED) | (206) 325-5105 | www.washingtonensemble.org | 800.0 | Capitol Hill | Y | Performance | 2003.0 | R | | ... | | | | | 2014-07-31 | 1.0 | 608 19th Ave E Seattle WA 98112 | (47.6247086, -122.307185) | 47.624709 | -122.307185 |
| 174 | Theater Schmeater | (206) 324-5801 | www.schmeater.org | 3000.0 | Capitol Hill | Y | Performance | | R | 2008.0 | ... | | | | | 2013-11-01 | 1.0 | 1500 Summit Ave Seattle WA 98122 | (47.61344, -122.34325) | 47.61344 | -122.34325 |
| 182 | Barnes & Noble - Downtown | | | | Downtown | | Literary | | | | ... | | | | | NaT | 1.0 | 600 Pine St Suite 107 Seattle WA 98101 | (47.61287415, -122.3352762) | 47.612874 | -122.335276 |
| 190 | Kind Hand Studio (CLOSED) | (206) 618-1447 | www.khspottery.com | 158.0 | Ballard | N | Arts/Cultural Training or Education | 2012.0 | R | | ... | | | | | 2013-12-01 | 1.0 | 4775 Ballard Ave NW Seattle WA 98107 | (47.6642404, -122.3810653) | 47.66424 | -122.381065 |
| 207 | Landmark Guild 45th | | | | | | Cinema | | | | ... | | | | | 2017-06-01 | 1.0 | 2115 N 45th St Seattle WA 98103 | (47.66112015, -122.3330128) | 47.66112 | -122.333013 |
| 262 | Room 104 Gallery (CLOSED) in Tashiro Kaplan | | | | | | Visual | | | | ... | | | | | 2014-08-01 | 1.0 | 306 S. Washington St #104 Seattle WA 98104 | (47.6010201, -122.3298371) | 47.60102 | -122.329837 |
| 276 | Teatro ZinZanni | (206) 802-0015 | www.zinzanni.com/seattle | 28000.0 | Uptown | Y | Performance | 2006.0 | R | 2006.0 | ... | | | | | 2017-03-05 | 1.0 | 222 Mercer St Seattle WA 98109 | (47.6249259, -122.3518443) | 47.624926 | -122.351844 |


In [38]:
closed_places = df[df['Closed Date'] < '2014-10-01'].index  # venues that closed before October 2014

In [39]:
df.drop(closed_places, inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df.drop(closed_places, inplace=True)


In [40]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 903 entries, 0 to 943
Data columns (total 37 columns):
 #   Column                                                       Non-Null Count  Dtype         
---  ------                                                       --------------  -----         
 0   Name                                                         903 non-null    object        
 1   Phone                                                        457 non-null    object        
 2   URL                                                          460 non-null    object        
 3   Square Feet Total                                            444 non-null    float64       
 4   Neighborhood                                                 493 non-null    object        
 5   Organization Type                                            481 non-null    object        
 6   Dominant Discipline                                          864 non-null    object        
 7   Year of Occupatio

#### Step 4: drop rows that are outside the cycle hire data range

- use any pandas filter method to keep only the rows of the culturalspaces data frame that fall inside the long/lat range of the cycle hire stations
- reset the index of your culturalspaces data frame after dropping rows
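One way to apply this filter is `Series.between()` on both coordinates. A minimal sketch, assuming you already have the minimum and maximum station latitude and longitude from Step 3; `within_bounds` is a hypothetical helper name:

```python
import pandas as pd


def within_bounds(spaces, lat_min, lat_max, lon_min, lon_max):
    """Keep cultural spaces whose coordinates fall inside the box, then renumber rows."""
    # between() includes both end points by default
    inside = (
        spaces["Latitude"].between(lat_min, lat_max)
        & spaces["Longitude"].between(lon_min, lon_max)
    )
    return spaces[inside].reset_index(drop=True)
```

A rectangle around the stations is only a coarse stand-in for "near a station": a venue inside it can still be some distance from the closest dock.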

#### Step 5: simplify and fill gaps in location using Python text analysis methods

This data is quite text-heavy, so you will now use methods you encountered in topic 8 to process it.

- simplify the cultural space category by using any pandas string method to keep the first word of the Dominant Discipline column (treating '/' as a word break as well as spaces), creating a new column 'purpose'
(e.g. 'Arts/Cultural Training or Education' will become simply 'Arts' in the purpose column)

- using a similar method, simplify the Neighborhood column, creating a neighbourhood_clean column, and correct any typos you see
(e.g. 'University District and Laurelhurst/Sand Point' will be classed as 'University District')

- fill any gaps in the neighbourhood_clean column with 'unknown neighbourhood'
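A minimal sketch of these three bullets using `str.extract()` with regular expressions. Splitting on '/' as well as spaces is what turns 'Arts/Cultural Training or Education' into 'Arts', and cutting at the first ' and ' or '/' turns 'University District and Laurelhurst/Sand Point' into 'University District'. The typo mapping is a hypothetical placeholder to be filled in with the misspellings you actually find, and `add_clean_columns` is a hypothetical helper name:

```python
import pandas as pd

# Hypothetical typo fixes: replace with the misspellings you actually find
NEIGHBOURHOOD_FIXES = {"Captiol Hill": "Capitol Hill"}


def add_clean_columns(spaces, fixes=NEIGHBOURHOOD_FIXES):
    """Return a copy of `spaces` with 'purpose' and 'neighbourhood_clean' columns added."""
    out = spaces.copy()
    # First word, treating '/' as a separator too:
    # 'Arts/Cultural Training or Education' -> 'Arts'
    out["purpose"] = out["Dominant Discipline"].str.extract(r"^([^\s/]+)", expand=False)
    # Text before the first ' and ' or '/':
    # 'University District and Laurelhurst/Sand Point' -> 'University District'
    out["neighbourhood_clean"] = (
        out["Neighborhood"]
        .str.extract(r"^(.*?)(?: and |/|$)", expand=False)
        .str.strip()
        .replace(fixes)  # correct known typos
        .fillna("unknown neighbourhood")  # fill the gaps last
    )
    return out
```

Values that were missing stay NaN through `str.extract()`, so the final `fillna()` catches them. Cutting at ' and ' assumes the first-named area is the one to keep.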

#### Step 6: summarise the cultural spaces by neighbourhood and purpose

- use the pandas groupby() method to summarise the number of spaces by (known) neighbourhood and cultural purpose
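A minimal sketch of this step, assuming the `neighbourhood_clean` and `purpose` columns described in Step 5; `summarise_spaces` is a hypothetical helper name. Rows labelled 'unknown neighbourhood' are filtered out first so only known neighbourhoods are counted, and `groupby(...).size()` counts the rows in each group:

```python
import pandas as pd


def summarise_spaces(spaces):
    """Count cultural spaces per known neighbourhood and purpose, largest first."""
    known = spaces[spaces["neighbourhood_clean"] != "unknown neighbourhood"]
    return (
        known.groupby(["neighbourhood_clean", "purpose"])
        .size()
        .reset_index(name="count")
        .sort_values("count", ascending=False, ignore_index=True)
    )
```

By default `groupby` drops groups whose key is NaN, so spaces with no Dominant Discipline are left out of the counts. For a neighbourhood-by-purpose grid instead of a long table, `.size().unstack(fill_value=0)` reshapes the same counts.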