# An Open Data case study: Hot-spots for Arrests in LA over time


## Data acquisition

Many governments use <a href="https://www.tylertech.com/products/socrata/data-platform" target="_blank">Socrata</a> as their platform to serve data to the public.
<img src="../images/socrata.png" width=600>

<table>
    <tr>
        <td><a href="https://opendata.cityofnewyork.us/" target="_blank"><img src="images/ny.png" width=400></a></td>
        <td><a href="https://datasf.org/opendata/" target="_blank"><img src="images/sf.png" width=400></a></td>
    </tr>
    <tr>
        <td><a href="https://data.cityofchicago.org/" target="_blank"><img src="images/ch.png" width=400></a></td>
        <td><a href="https://data.lacity.org/" target="_blank"><img src="images/la.png" width=400></a></td>
    </tr>
</table>

For this tutorial, we will look at LAPD's arrest data:

https://data.lacity.org/A-Safe-City/Arrest-Data-from-2020-to-Present/amvf-fr72

The <a href="https://dev.socrata.com/docs/endpoints.html" target="_blank">Socrata API</a> allows direct and real-time access to open data.

To access the data, we will use the `sodapy` library: https://github.com/xmunoz/sodapy

Instructions on how to use `sodapy` to access data for this dataset:

<a href="https://data.lacity.org/A-Safe-City/Arrest-Data-from-2020-to-Present/amvf-fr72" target="_blank"><img src="images/ladata.png"></a>

https://dev.socrata.com/foundry/data.lacity.org/amvf-fr72

### Question:
- What is the difference between exporting the data and using the API?

### It's time to start coding: importing libraries

Let's begin our Python journey. First, we identify the libraries we will use and import them into our project:
- `pandas`
- `plotly express` - [documentation](https://plotly.com/python/plotly-express/)
- `sodapy` - [documentation](https://github.com/xmunoz/sodapy)

In [None]:
import pandas as pd
import plotly.express as px
from sodapy import Socrata

### Creating a Socrata client
Next, we acquire the data using the Socrata API. Use the Socrata documentation to grab the code syntax for our arrest data.
- https://dev.socrata.com/foundry/data.lacity.org/amvf-fr72

In [None]:
# connect to the data portal
client = Socrata("data.lacity.org", None)

# First 2000 results, returned as JSON from API / converted to Python list of
# dictionaries by sodapy.
results = client.get("amvf-fr72", limit=2000)

# Convert to pandas DataFrame
df = pd.DataFrame.from_records(results)

# print it with .sample, which gives you random rows
df.sample(2)

That's great! But what if you wanted something specific, like "all arrests in September, 2020?"

In [None]:
# add a "where" statement
results = client.get("amvf-fr72", 
                     limit = 10000, # an arbitrarily high number (the API otherwise defaults to 1000 rows)
                     where = "arst_date between '2020-09-01T00:00:00' and '2020-09-30T00:00:00'"
                    )
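
The date range in a `where` filter is easy to get wrong by hand (for example, the last day of the month). As a minimal sketch, the hypothetical helper below (`month_where_clause` is not part of `sodapy`) builds a filter for one whole calendar month using a half-open range, assuming the date field is named `arst_date` as in this dataset:

```python
import calendar
from datetime import date


def month_where_clause(year, month, field="arst_date"):
    """Build a SoQL filter covering one whole calendar month (hypothetical helper)."""
    start = date(year, month, 1)
    # First day of the following month, used as an exclusive upper bound
    days_in_month = calendar.monthrange(year, month)[1]
    end = date.fromordinal(start.toordinal() + days_in_month)
    return (
        f"{field} >= '{start.isoformat()}T00:00:00' "
        f"and {field} < '{end.isoformat()}T00:00:00'"
    )


where = month_where_clause(2020, 9)
print(where)
```

The resulting string can be passed as the `where` argument of `client.get`, in place of the hand-written `between` expression.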

In [None]:
# Convert to pandas DataFrame
df = pd.DataFrame.from_records(results)

## Data Exploration and Analysis

In [None]:
# how many rows and columns?


In [None]:
# what fields and datatypes?


In [None]:
# what are the first 5 rows?


### Bar charts with plotly

Now, use plotly express to create a bar chart.
- https://plotly.com/python/bar-charts/

In [None]:
# a simple bar chart, putting date on the x-axis
px.bar(df,
       x='arst_date',
       title='LAPD Arrests by Day in September, 2020'
      )

### Labeling the axes

In [None]:
# add labels by providing a dict
px.bar(df,
       x='arst_date',
       title='LAPD Arrests by Day in September, 2020',
       labels={'arst_date':'Arrest date','count':'Number of arrests'}
      )

Let's dig in further... what if we want to see the distribution of charge types?

In [None]:
# show me distinct value of charges
df.grp_description.unique()

In [None]:
# count the number of arrests for each charge
arrest_by_charge = df.grp_description.value_counts().reset_index()
arrest_by_charge

In [None]:
arrest_by_charge.columns=['charge','count']
arrest_by_charge
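
The column names that `value_counts().reset_index()` produces differ between pandas versions, which is why the cell above renames them explicitly. As an alternative sketch, the names can be set in the same chain; the `toy` DataFrame and its values below are made up for illustration:

```python
import pandas as pd

# Made-up charges standing in for df.grp_description
toy = pd.DataFrame({"grp_description": ["Narcotic Drug Laws", "Robbery", "Robbery"]})

arrest_by_charge_toy = (
    toy["grp_description"]
    .value_counts()               # count rows per distinct charge, largest first
    .rename_axis("charge")        # name the index so it becomes the 'charge' column
    .reset_index(name="count")    # turn the counts into a 'count' column
)
print(arrest_by_charge_toy)
```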

In [None]:
px.bar(arrest_by_charge,
       x='charge',
       y='count',
       title='LAPD Arrests by Charge Type in September, 2020')

Now it's your turn!

Create a [horizontal chart](https://plotly.com/python/horizontal-bar-charts/) for the same data.

### Stacked bar charts

What if you wanted to find out the distribution of charge types per day?

In [None]:
# show me how many arrests per day
df.groupby(['arst_date']).count()

In [None]:
# show me how many arrests per charge


In [None]:
# ok, group by date and charge, and let's get a count for each
df_grouped=df.groupby(['arst_date','grp_description']).count()[['rpt_id']]
df_grouped.head(50)

In [None]:
# flatten the multi-indexed dataframe
df_flat = df_grouped.reset_index()
df_flat
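
Note that `.count()` counts non-null values per column, so the result depends on which column you pick (`rpt_id` above). A minimal alternative sketch uses `.size()`, which counts rows directly and lets you name the count column in one step. The `toy` data and the column name `arrests` are made up for illustration:

```python
import pandas as pd

# Made-up arrests: three on the first day, one on the second
toy = pd.DataFrame({
    "arst_date": ["2020-09-01", "2020-09-01", "2020-09-01", "2020-09-02"],
    "grp_description": ["Robbery", "Robbery", "Burglary", "Robbery"],
})

toy_flat = (
    toy.groupby(["arst_date", "grp_description"])
    .size()                         # number of rows in each (date, charge) group
    .reset_index(name="arrests")    # flatten the multi-index into regular columns
)
print(toy_flat)
```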

In [None]:
# make a bar chart
px.bar(df_flat,
       x='arst_date',
       y='rpt_id'
      )

In [None]:
# make a stacked bar chart
px.bar(df_flat,
       x='arst_date',
       y='rpt_id',
       color='grp_description' # this creates the "stack"
      )

Now it's your turn!

* Add a title
* Clean up the labels

## Data prep: subsetting your data

Let's go back to the original dataset.

In [None]:
df.info()

That's a lot of fields. Let's create a subset of the data with just the following fields:

- `arst_date`
- `age`
- `descent_cd`
- `grp_description`
- `lat`
- `lon`


In [None]:
# subset the data
df_mini = df[['arst_date','age','descent_cd','grp_description','lat','lon']].copy()
df_mini.head()

In [None]:
# get info for our subset data
df_mini.info()

Our `lat` and `lon` columns need to be of data type float. Let's convert them.

In [None]:
# convert lat/lon's to floats
df_mini['lat'] = df_mini['lat'].astype(float)
df_mini['lon'] = df_mini['lon'].astype(float)
df_mini.info()

What happens if we create a scatter plot, placing `lon` on the x-axis and `lat` on the y-axis?

In [None]:
px.scatter(df_mini,
           x='lon',
           y='lat'
          )
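
If the scatter plot shows points far away from Los Angeles, the coordinates may need cleaning. Open datasets sometimes use (0, 0) as a placeholder for unknown locations; whether this dataset does is an assumption you should verify against the plot. A minimal sketch on made-up data (`toy` and `located` are illustrative names):

```python
import pandas as pd

# Made-up coordinates: one valid pair, one (0, 0) placeholder,
# one unparseable value, and another valid pair
toy = pd.DataFrame({
    "lat": ["34.0522", "0.0", "not available", "33.9416"],
    "lon": ["-118.2437", "0.0", "-118.3", "-118.4085"],
})

# Unlike astype(float), errors="coerce" turns unparseable values into NaN
# instead of raising an error
for col in ["lat", "lon"]:
    toy[col] = pd.to_numeric(toy[col], errors="coerce")

# Drop rows with missing coordinates, then rows with a zero coordinate
located = toy.dropna(subset=["lat", "lon"])
located = located[(located["lat"] != 0) & (located["lon"] != 0)]
print(located)
```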

## Data visualization: Mapping with plotly
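Plotly supports Mapbox slippy maps (interactive maps that you can pan and zoom). Have fun with this, and change the `mapbox_style` attribute to any of the following. Depending on your plotly version, the `stamen-*` styles may not load without a Stadia Maps API key; if the map comes up blank, try `open-street-map` or one of the `carto-*` styles.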

* `open-street-map`
* `white-bg`
* `carto-positron`
* `carto-darkmatter`
* `stamen-terrain`
* `stamen-toner`
* `stamen-watercolor`


In [None]:
fig = px.scatter_mapbox(df_mini,
                        lat='lat',
                        lon='lon',
                        mapbox_style="stamen-terrain")
fig.show()

In [None]:
# before you run this cell, what do you think it will produce?
fig = px.scatter_mapbox(df_mini, 
                        lat="lat", 
                        lon="lon", 
                        color="descent_cd"
                       )
fig.update_layout(mapbox_style="carto-darkmatter")

fig.show()

In [None]:
# before you run this cell, what do you think it will produce?
fig = px.scatter_mapbox(df_mini, 
                        lat="lat", 
                        lon="lon", 
                        color="descent_cd",
                        animation_frame = 'arst_date',
                       )
fig.update_layout(mapbox_style="carto-darkmatter")

fig.show()
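
Plotly Express appears to build animation frames in the order in which values first occur in the data, so an unsorted DataFrame can produce a slider whose dates jump around. A minimal sketch of one way to prepare the data, assuming the timestamp format shown in the made-up `toy` rows below (`arst_day` is a hypothetical helper column):

```python
import pandas as pd

# Made-up rows, deliberately out of date order
toy = pd.DataFrame({
    "arst_date": [
        "2020-09-03T00:00:00.000",
        "2020-09-01T00:00:00.000",
        "2020-09-02T00:00:00.000",
    ],
    "lat": [34.05, 34.10, 33.99],
    "lon": [-118.24, -118.30, -118.40],
})

# Shorten the timestamp to a plain date label for the animation slider
toy["arst_day"] = pd.to_datetime(toy["arst_date"]).dt.strftime("%Y-%m-%d")

# Sort so that the frames run in chronological order
toy_sorted = toy.sort_values("arst_day").reset_index(drop=True)
print(toy_sorted)
```

The sorted frame can then be passed to `px.scatter_mapbox` with `animation_frame='arst_day'`.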

## Advanced visualizations: 3D mapping
- https://kepler.gl/

<img src="images/kepler.png" width=800>

Import the keplergl library.

In [None]:
from keplergl import KeplerGl

Create a default kepler map.

In [None]:
map = KeplerGl(height=600,width=800)
map

Add our `df_mini` as a data layer on the map. Within the kepler widget, manipulate the map:
- change points to grid cells or hexbins
- change the color palette so that hot spots are red
- change the color scale from `quantile` to `quantize`
- add height to your data
- switch to 3D map view
- adjust the height of the data cells
- add `arst_date` as a filter

In [None]:
map.add_data(data=df_mini,name='arrests')

### Saving your kepler map as an html page

In [None]:
map.save_to_html(file_name='la_arrests.html',read_only=True)

<div class="alert alert-info">
Now it's your turn!

* Find a Socrata-based open dataset
* Use the `sodapy` library to import it
* Conduct data exploration and analysis
* Create two or more plots using the plotly express library
* Create a map visualization using the plotly and/or KeplerGl libraries

</div>