# Exploring, filtering, grouping and viewing data in pandas (Python)
Demonstration notebook using UK Police data on street-level crime in the Metropolitan Police (London) area, from https://data.police.uk/


In [None]:
import pandas as pd
import geopandas as gpd

# Read and inspect data
You can read data into pandas from a variety of formats, including Excel, CSV, JSON and SQL. The sample data here is street-level crime recorded by the Metropolitan Police for September 2022 (as the file name `2022-09-metropolitan-street.csv` indicates), in CSV format.

In [None]:
# Path to the downloaded CSV file (relative to the notebook)
file = 'data/2022-09-metropolitan-street.csv'
# Create dataframe (df)
df = pd.read_csv(file)
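If the downloaded file isn't to hand, the same call works on any file-like object. Below is a minimal sketch that assumes a handful of made-up rows. These rows use only the columns this notebook relies on, and the real police.uk files have more columns:

```python
import io

import pandas as pd

# Hypothetical sample rows (not real police.uk records), using only the
# columns this notebook relies on. The second row has no coordinates.
sample_csv = """Month,Longitude,Latitude,LSOA name,Crime type
2022-09,-0.134,51.513,Westminster 018A,Robbery
2022-09,,,Camden 001A,Burglary
2022-09,0.472,51.736,Chelmsford 010B,Other theft
"""

# read_csv accepts a path, a URL or any file-like object, such as StringIO
df_sample = pd.read_csv(io.StringIO(sample_csv))

# Empty fields are parsed as NaN, so the coordinate columns become float64
print(df_sample.dtypes)
print(df_sample.shape)
```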

In [None]:
# Column names, non-null counts and dtypes
df.info()

# Selecting and filtering data

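Create a new column with the borough name, so that we can select rows by borough. The borough is taken from the `LSOA name` column. Each Lower Super Output Area (LSOA) name is the borough name followed by a space and a four-character code, in the form `Westminster 018A`. Slicing off the last five characters therefore leaves just the borough.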

In [None]:
# Create a Borough column by slicing the code (the last 5 characters, including the space) off the end of the LSOA name
df['Borough'] = df['LSOA name'].str.slice(0, -5)
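You can check the slicing on a few values before applying it to the whole column. The LSOA names below are hypothetical examples of the `<borough> <code>` pattern. The `rsplit` variant is an alternative that is not part of the original notebook. It splits on the last space instead of assuming a fixed-length code:

```python
import pandas as pd

# Hypothetical LSOA names following the "<borough> <code>" pattern, plus a missing value
lsoa = pd.Series(["Westminster 018A", "Barking and Dagenham 001A", "City of London 001F", None])

# Approach used in the notebook: drop the last 5 characters (e.g. " 018A")
by_slice = lsoa.str.slice(0, -5)

# Alternative: split once on the last space and keep the part before it.
# This doesn't assume the code is exactly 4 characters long.
by_split = lsoa.str.rsplit(" ", n=1).str[0]

# Missing names stay missing in both versions
print(by_slice.tolist())
print(by_split.tolist())
```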

The dataset also includes some incidents that took place outside London (you can see them in `df['Borough'].unique()`), so let's create a **filter** to keep only the London boroughs.

### Select with isin()
Select rows using a list of values to include.

In [None]:
# List of London boroughs, plus the City of London
LB_list = ['Barking and Dagenham', 'Barnet', 'Bexley', 'Brent', 'Bromley', 'Camden',
           'City of London', 'Croydon', 'Ealing', 'Enfield', 'Greenwich', 'Hackney',
           'Hammersmith and Fulham', 'Haringey', 'Harrow', 'Havering', 'Hillingdon',
           'Hounslow', 'Islington', 'Kensington and Chelsea', 'Kingston upon Thames',
           'Lambeth', 'Lewisham', 'Merton', 'Newham', 'Redbridge', 'Richmond upon Thames',
           'Southwark', 'Sutton', 'Tower Hamlets', 'Waltham Forest', 'Wandsworth',
           'Westminster']
# Filter the dataframe to include only names in the list:
df = df[df['Borough'].isin(LB_list)]
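Before overwriting `df`, it can help to look at the rows the filter would discard. Below is a minimal sketch on hypothetical toy data, where `london` stands in for `LB_list`:

```python
import pandas as pd

# Hypothetical rows: two London boroughs and one district outside London
toy = pd.DataFrame({
    "Borough": ["Westminster", "Camden", "Chelmsford"],
    "Crime type": ["Robbery", "Burglary", "Other theft"],
})
london = ["Westminster", "Camden"]  # shortened stand-in for LB_list

# isin() returns a boolean Series: True where the value appears in the list
mask = toy["Borough"].isin(london)

inside = toy[mask]    # rows to keep
outside = toy[~mask]  # ~ negates the mask: rows that would be dropped

print(inside["Borough"].tolist())
print(outside["Borough"].tolist())
```

On the real data, running `df.loc[~df['Borough'].isin(LB_list), 'Borough'].unique()` before the filtering cell lists the non-London names that will be removed.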

In [None]:
# Optional: uncomment to keep a single borough
# df = df[df['Borough'] == 'Westminster']
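Single-value filters like the commented-out line use a boolean mask built with `==`. The sketch below uses hypothetical toy data to show how to combine conditions. It also shows `DataFrame.query()` as an equivalent string form:

```python
import pandas as pd

# Hypothetical toy rows
toy = pd.DataFrame({
    "Borough": ["Westminster", "Westminster", "Camden"],
    "Crime type": ["Robbery", "Shoplifting", "Robbery"],
})

# Single condition, as in the commented-out line
westminster = toy[toy["Borough"] == "Westminster"]

# Several conditions: combine masks with & (and) or | (or).
# Each comparison needs its own parentheses because & binds more tightly than ==.
wm_robbery = toy[(toy["Borough"] == "Westminster") & (toy["Crime type"] == "Robbery")]

# query() expresses the same filter as a string; backticks quote
# column names that contain spaces
wm_robbery_q = toy.query("Borough == 'Westminster' and `Crime type` == 'Robbery'")

print(len(westminster), len(wm_robbery), len(wm_robbery_q))
```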

In [None]:
df['Crime type'].unique()
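The notebook's title mentions grouping. Below is a minimal sketch of counting incidents per borough and crime type with `value_counts()` and `groupby()`. It runs on hypothetical toy rows; on the real data, replace `toy` with `df`:

```python
import pandas as pd

# Hypothetical toy rows
toy = pd.DataFrame({
    "Borough": ["Westminster", "Westminster", "Camden", "Camden", "Camden"],
    "Crime type": ["Robbery", "Shoplifting", "Robbery", "Robbery", "Burglary"],
})

# unique() lists the distinct values; value_counts() also counts them
type_counts = toy["Crime type"].value_counts()
print(type_counts)

# groupby + size(): number of rows per (Borough, Crime type) pair
counts = toy.groupby(["Borough", "Crime type"]).size()

# unstack() pivots crime types into columns; fill_value=0 for absent pairs
table = counts.unstack(fill_value=0)
print(table)
```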

In [None]:
# Keep crime types whose name contains 'robbery', 'violence' or 'theft' (case-insensitive regular expression)
df = df[df['Crime type'].str.contains('robbery|violence|theft', case=False)]
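`str.contains()` treats the pattern as a regular expression by default, so `|` means "or" and the match is on substrings. The sketch below runs on a few crime-type names. This is a sample list for illustration, so check `unique()` for the exact values in your file:

```python
import pandas as pd

# Sample crime-type names for illustration
crime_types = pd.Series([
    "Anti-social behaviour", "Bicycle theft", "Burglary", "Robbery",
    "Shoplifting", "Theft from the person", "Violence and sexual offences",
])

# "|" means "or", and case=False lets "theft" match both
# "Bicycle theft" and "Theft from the person".
# na=False treats missing values as non-matches instead of NaN.
mask = crime_types.str.contains("robbery|violence|theft", case=False, na=False)
print(crime_types[mask].tolist())
```

Substring matching picks up every category that contains one of the words. If you want an exact, explicit set of categories, `isin()` with a list of names is stricter.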

# Map with geopandas

In [None]:
# Drop rows with no location data (missing longitude or latitude)
df = df.dropna(subset=['Longitude', 'Latitude'])
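`subset` controls which columns are checked for missing values. The sketch below uses hypothetical rows to show why checking both coordinate columns is the safer choice before building point geometries:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: row 1 lacks a longitude, row 2 lacks a latitude
toy = pd.DataFrame({
    "Longitude": [-0.134, np.nan, 0.05],
    "Latitude": [51.513, 51.5, np.nan],
    "Crime type": ["Robbery", "Burglary", "Other theft"],
})

# subset=["Longitude"] only checks that one column, so row 2 survives
lon_only = toy.dropna(subset=["Longitude"])

# Checking both coordinate columns drops any row that can't be placed on a map
both = toy.dropna(subset=["Longitude", "Latitude"])

print(len(toy), len(lon_only), len(both))
```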

In [None]:
df.shape

In [None]:
# Convert df into a GeoDataFrame with point geometries (x = longitude, y = latitude)
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df['Longitude'], df['Latitude']))

In [None]:
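# Declare the coordinate reference system: WGS 84 longitude/latitude (EPSG:4326). This labels the coordinates; it does not reproject them.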
gdf = gdf.set_crs(epsg=4326)
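`set_crs` only records which coordinate system the numbers are in, while `to_crs` converts them. The sketch below assumes two made-up points in central London. It builds the GeoDataFrame with `crs=` in the constructor, then reprojects to British National Grid (EPSG:27700). That grid's units are metres, which is useful for distances or buffers:

```python
import geopandas as gpd
import pandas as pd

# Two hypothetical points in central London (longitude, latitude in degrees)
pts = pd.DataFrame({"Longitude": [-0.134, -0.076], "Latitude": [51.513, 51.508]})

# Passing crs= to the constructor has the same effect as calling set_crs afterwards:
# it labels the coordinates as WGS 84 lon/lat without moving them.
gdf_pts = gpd.GeoDataFrame(
    pts,
    geometry=gpd.points_from_xy(pts["Longitude"], pts["Latitude"]),
    crs="EPSG:4326",
)

# to_crs() transforms the coordinates, here to British National Grid (metres)
gdf_bng = gdf_pts.to_crs(epsg=27700)

# Distance between the two points in metres (meaningful only in a projected CRS)
dist_m = gdf_bng.geometry.iloc[0].distance(gdf_bng.geometry.iloc[1])
print(gdf_pts.crs, gdf_bng.crs)
print(round(dist_m))
```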

In [None]:
gdf.keys()

In [None]:
# Interactive Leaflet map (GeoDataFrame.explore() uses folium, which must be installed)
gdf.explore(tiles="CartoDB positron")

In [None]:
# Optional: colour points by crime type
# gdf.explore('Crime type', cmap='tab20', tiles="CartoDB positron")