<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Take-notice!" data-toc-modified-id="Take-notice!-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Take notice!</a></span></li><li><span><a href="#An-Open-Data-case-study:--Hot-spots-for-Arrests-in-LA-over-time" data-toc-modified-id="An-Open-Data-case-study:--Hot-spots-for-Arrests-in-LA-over-time-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>An Open Data case study:  Hot-spots for Arrests in LA over time</a></span><ul class="toc-item"><li><span><a href="#What-is-an-API?" data-toc-modified-id="What-is-an-API?-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>What is an API?</a></span></li><li><span><a href="#Mini-project:-Mapping-LA-metro-stops" data-toc-modified-id="Mini-project:-Mapping-LA-metro-stops-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Mini project: Mapping LA metro stops</a></span></li><li><span><a href="#Data-acquisition" data-toc-modified-id="Data-acquisition-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>Data acquisition</a></span><ul class="toc-item"><li><span><a href="#Question:" data-toc-modified-id="Question:-2.3.1"><span class="toc-item-num">2.3.1&nbsp;&nbsp;</span>Question:</a></span></li><li><span><a href="#It's-time-to-start-coding:-importing-libraries" data-toc-modified-id="It's-time-to-start-coding:-importing-libraries-2.3.2"><span class="toc-item-num">2.3.2&nbsp;&nbsp;</span>It's time to start coding: importing libraries</a></span></li><li><span><a href="#Creating-a-socrata-client" data-toc-modified-id="Creating-a-socrata-client-2.3.3"><span class="toc-item-num">2.3.3&nbsp;&nbsp;</span>Creating a socrata client</a></span></li></ul></li><li><span><a href="#Import-data-based-on-a-query-string" data-toc-modified-id="Import-data-based-on-a-query-string-2.4"><span class="toc-item-num">2.4&nbsp;&nbsp;</span>Import data based on a query string</a></span></li><li><span><a href="#Data-Exploration-and-Analysis" 
data-toc-modified-id="Data-Exploration-and-Analysis-2.5"><span class="toc-item-num">2.5&nbsp;&nbsp;</span>Data Exploration and Analysis</a></span><ul class="toc-item"><li><span><a href="#Bar-charts-with-plotly" data-toc-modified-id="Bar-charts-with-plotly-2.5.1"><span class="toc-item-num">2.5.1&nbsp;&nbsp;</span>Bar charts with plotly</a></span></li></ul></li><li><span><a href="#Label-axis" data-toc-modified-id="Label-axis-2.6"><span class="toc-item-num">2.6&nbsp;&nbsp;</span>Label axis</a></span><ul class="toc-item"><li><span><a href="#Stacked-bar-charts" data-toc-modified-id="Stacked-bar-charts-2.6.1"><span class="toc-item-num">2.6.1&nbsp;&nbsp;</span>Stacked bar charts</a></span></li></ul></li><li><span><a href="#Data-prep:-subsetting-your-data" data-toc-modified-id="Data-prep:-subsetting-your-data-2.7"><span class="toc-item-num">2.7&nbsp;&nbsp;</span>Data prep: subsetting your data</a></span></li><li><span><a href="#Data-visualization:-Mapping-with-plotly" data-toc-modified-id="Data-visualization:-Mapping-with-plotly-2.8"><span class="toc-item-num">2.8&nbsp;&nbsp;</span>Data visualization: Mapping with plotly</a></span></li></ul></li><li><span><a href="#Create-a-function" data-toc-modified-id="Create-a-function-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Create a function</a></span><ul class="toc-item"><li><span><a href="#Bonus:-interactive-dropdowns" data-toc-modified-id="Bonus:-interactive-dropdowns-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Bonus: interactive dropdowns</a></span></li><li><span><a href="#Advanced-visualizations:-3D-mapping" data-toc-modified-id="Advanced-visualizations:-3D-mapping-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Advanced visualizations: 3D mapping</a></span><ul class="toc-item"><li><span><a href="#Saving-your-kepler-map-as-an-html-page" data-toc-modified-id="Saving-your-kepler-map-as-an-html-page-3.2.1"><span class="toc-item-num">3.2.1&nbsp;&nbsp;</span>Saving your kepler map as an html 
page</a></span></li></ul></li></ul></li></ul></div>

<div class="alert alert-danger">

<h1>Take notice!</h1>
<ul>
    <li>This class will be recorded</li>
</ul>
    
</div>

# An Open Data case study:  Hot-spots for Arrests in LA over time


## What is an API?

Some examples:

Mapping APIs

- [Google Maps API](https://developers.google.com/maps/documentation/javascript/overview#maps_map_simple-javascript)
- [Leaflet](https://leafletjs.com/)
- [Mapbox](https://docs.mapbox.com/mapbox-gl-js/example/)

Data APIs
- [Twitter API](https://developer.twitter.com/en)
- [Metro API](https://developer.metro.net/api/)


## Mini project: Mapping LA metro stops


Let's look at a Python approach to working with the Metro API.

In [None]:
# libraries
import urllib.request, json 
import pandas as pd
import geopandas as gpd
import contextily as ctx
import matplotlib.pyplot as plt

In [None]:
# api url for metro stops
metro_url = 'https://api.metro.net/agencies/lametro/routes/2/stops/'

# call the api and bring the data in
with urllib.request.urlopen(metro_url) as url:
    data = json.loads(url.read().decode())

# convert the data to a dataframe
df = pd.json_normalize(data, 'items')
df

In [None]:
# convert df to gdf
# since data is in lat/lon's assign the crs to WGS84 (epsg:4326)
gdf = gpd.GeoDataFrame(df, 
                       crs='epsg:4326',
                       geometry=gpd.points_from_xy(df.longitude, df.latitude))

In [None]:
gdf.crs

In [None]:
# reproject to web mercator
gdf_web_mercator = gdf.to_crs(epsg=3857)
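
For intuition about what the reprojection above does to each point, here is a minimal sketch of the spherical Web Mercator forward formulas in plain Python. The function name `lonlat_to_web_mercator` and the sample coordinates are illustrative assumptions; in practice `to_crs` does this work for you.

```python
import math

# radius of the sphere used by Web Mercator (EPSG:3857), in meters
EARTH_RADIUS_M = 6378137.0

def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project a WGS84 longitude/latitude pair (degrees) to Web Mercator (meters)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

# approximate coordinates for downtown Los Angeles (illustrative values)
x, y = lonlat_to_web_mercator(-118.25, 34.05)
print(round(x), round(y))
```

The output units are meters rather than degrees, which is why the basemap tiles line up with the reprojected points.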

In [None]:
# map it
fig, ax = plt.subplots(figsize=(15,5))

gdf_web_mercator.plot(ax=ax, marker='s', color='red')

ax.axis('off')

ax.set_title('Metro Bus Route 2')

ctx.add_basemap(ax)

Nice. What makes this process **powerful**? And what are potential **pitfalls**?
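
One pitfall worth illustrating: the code depends on the API returning the same structure every time. Below is a minimal sketch of a defensive parser, assuming the response is a JSON object with an `items` list whose entries carry `latitude` and `longitude` (as in the cells above). The function name `extract_stop_coordinates` and the sample response are made up for illustration.

```python
import json

def extract_stop_coordinates(payload_text):
    """Parse a JSON response and return (longitude, latitude) pairs.

    Raises ValueError if the response does not have the expected shape.
    """
    data = json.loads(payload_text)
    if not isinstance(data, dict) or "items" not in data:
        raise ValueError("response has no 'items' list; did the API change?")

    coordinates = []
    for item in data["items"]:
        try:
            coordinates.append((float(item["longitude"]), float(item["latitude"])))
        except (KeyError, TypeError, ValueError):
            # skip stops with missing or non-numeric coordinates
            continue
    return coordinates

# made-up sample response, for illustration only
sample = '{"items": [{"id": "1", "latitude": 34.05, "longitude": -118.25}, {"id": "2"}]}'
print(extract_stop_coordinates(sample))
```

Failing loudly when the shape changes is usually easier to debug than a map that silently comes out empty.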

## Data acquisition

Many governments use <a href="https://dev.socrata.com/" target="_blank">Socrata</a> as their platform to serve data to the public.
<img src="../images/socrata.png" width=600>

<table>
    <tr>
        <td><a href="https://opendata.cityofnewyork.us/" target="_blank"><img src="images/ny.png" width=400></a></td>
        <td><a href="https://datasf.org/opendata/" target="_blank"><img src="images/sf.png" width=400></a></td>
    </tr>
    <tr>
        <td><a href="https://data.cityofchicago.org/" target="_blank"><img src="images/ch.png" width=400></a></td>
        <td><a href="https://data.lacity.org/" target="_blank"><img src="images/la.png" width=400></a></td>
    </tr>
</table>

For this tutorial, we will look at LAPD's arrest data:

https://data.lacity.org/A-Safe-City/Arrest-Data-from-2020-to-Present/amvf-fr72

The <a href="https://dev.socrata.com/docs/endpoints.html" target="_blank">Socrata API</a> allows direct and real-time access to open data.

To access the data, we will use the `sodapy` library: https://github.com/xmunoz/sodapy

Instructions on how to use `sodapy` to access data for this dataset:

<a href="https://data.lacity.org/A-Safe-City/Arrest-Data-from-2020-to-Present/amvf-fr72" target="_blank"><img src="images/ladata.png"></a>

https://dev.socrata.com/foundry/data.lacity.org/amvf-fr72

### Question:
- What is the difference between exporting the data and using the API?

### It's time to start coding: importing libraries

Let's begin our Python journey. First, we identify the libraries we will use, and import them into our project:
- `pandas`
- `plotly express` - [documentation](https://plotly.com/python/plotly-express/)
- `sodapy` - [documentation](https://github.com/xmunoz/sodapy)

*Notice that we will NOT be using geopandas! Don't worry, there will still be very rewarding maps in this session*

In [None]:
# for data wrangling
import pandas as pd

# for interactive plots
import plotly.express as px

# to import open data
from sodapy import Socrata

### Creating a socrata client
Next, we acquire the data using the Socrata API. Use the Socrata documentation to grab the code syntax for our arrest data.
- https://dev.socrata.com/foundry/data.lacity.org/amvf-fr72

In [None]:
# connect to the data portal
client = Socrata("data.lacity.org", None)

# First 2000 results, returned as JSON from API / converted to Python list of
# dictionaries by sodapy.
results = client.get("amvf-fr72", limit=2000)

# Convert to pandas DataFrame
df = pd.DataFrame.from_records(results)

# print it with .sample, which gives you random rows
df.sample(2)
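
A single request only returns up to `limit` rows. If you ever need more than one request's worth, one option is to page through the results with `limit` and `offset`. The sketch below shows the idea with a stand-in for the API; `fetch_all_rows`, `fake_fetch`, and the fake records are hypothetical names and data used only for illustration.

```python
def fetch_all_rows(fetch_page, page_size=1000, max_rows=None):
    """Collect rows page by page until a short page signals the end.

    fetch_page is any callable taking (limit, offset) and returning a list.
    """
    rows = []
    offset = 0
    while True:
        limit = page_size
        if max_rows is not None:
            # never ask for more rows than we still need
            limit = min(page_size, max_rows - len(rows))
            if limit <= 0:
                break
        page = fetch_page(limit, offset)
        rows.extend(page)
        if len(page) < limit:
            # a short (or empty) page means there is nothing left
            break
        offset += len(page)
    return rows

# stand-in for the API: 2,500 fake records served from a local list
fake_records = [{"rpt_id": str(i)} for i in range(2500)]

def fake_fetch(limit, offset):
    return fake_records[offset:offset + limit]

all_rows = fetch_all_rows(fake_fetch, page_size=1000)
print(len(all_rows))
```

With `sodapy`, the callable could wrap `client.get("amvf-fr72", limit=limit, offset=offset, order="rpt_id")`; ordering by a stable column is an assumption made here so that pages do not overlap.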

## Import data based on a query string
That's great! But what if you wanted something specific, like "all arrests in November, 2020?"

In [None]:
# add a "where" statement
results = client.get("amvf-fr72", 
                     limit = 10000, # putting an arbitrary high number (otherwise defaults to 1000)
                     where = "arst_date between '2020-11-01T00:00:00' and '2020-11-30T00:00:00'"
                    )
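
Typing date ranges by hand is error-prone (how many days does the month have?). As a minimal sketch, a small helper can build the `where` string for any month. The name `month_where_clause` is hypothetical, and ending the range at `23:59:59` on the last day is a choice made here so that records timestamped later on that day are also covered.

```python
import calendar

def month_where_clause(column, year, month):
    """Build a SoQL 'between' filter covering one calendar month."""
    last_day = calendar.monthrange(year, month)[1]
    start = f"{year:04d}-{month:02d}-01T00:00:00"
    end = f"{year:04d}-{month:02d}-{last_day:02d}T23:59:59"
    return f"{column} between '{start}' and '{end}'"

print(month_where_clause("arst_date", 2020, 11))
# → arst_date between '2020-11-01T00:00:00' and '2020-11-30T23:59:59'
```

The returned string can be passed as the `where` argument of `client.get`.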

In [None]:
# Convert to pandas DataFrame
df = pd.DataFrame.from_records(results)

## Data Exploration and Analysis

In [None]:
# how many rows and columns?
df.shape

In [None]:
# what fields and datatypes?
df.info()

In [None]:
# what are the first 5 rows?
df.head()

### Bar charts with plotly

Now, use plotly express to create a bar chart.

- https://plotly.com/python/bar-charts/

What are the differences between matplotlib and plotly?

In [None]:
# a simple bar chart, putting date on the x-axis
px.bar(df,
       x='arst_date',
       title='LAPD Arrests by Day in November, 2020'
      )

## Label axis

In plotly, you can relabel text by providing a dictionary as shown below:

In [None]:
# add labels by providing a dict
px.bar(df,
       x='arst_date',
       title='LAPD Arrests by Day in November, 2020',
       labels={'arst_date':'Arrest date','count':'Number of arrests'}
      )

Let's dig in further... what if we want to see the distribution of charge types by day?

In [None]:
# show me distinct value of charges
df.grp_description.unique().tolist()

In [None]:
# count the number of arrests for each charge type
arrest_by_charge = df.grp_description.value_counts().reset_index()
arrest_by_charge

In [None]:
# rename the columns
arrest_by_charge.columns=['charge','count']
arrest_by_charge

In [None]:
# plot it
px.bar(arrest_by_charge,
       x='charge',
       y='count',
       title='LAPD Arrests by Charge Type in November, 2020')

Now it's your turn!

Create a [horizontal chart](https://plotly.com/python/horizontal-bar-charts/) for the same data.

### Stacked bar charts

What if you wanted to find out the distribution of charge types per day?

In [None]:
# show me how many arrests per day
df.groupby(['arst_date']).rpt_id.count()

Why so many columns and numbers? 

The `groupby` function is very powerful. You can group by multiple columns. Here, we create a new variable to find arrest types for each day of the month. Also notice that we are only outputting the `rpt_id` column, the unique identifier.

- [pandas groupby](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html)

In [None]:
# ok, group by date and charge, and let's get a count for each
df_grouped=df.groupby(['arst_date','grp_description']).count()[['rpt_id']]
df_grouped.head(50)

In [None]:
# an alternative way to get the same counts: select rpt_id first, then count
df.groupby(['arst_date','grp_description']).rpt_id.count().to_frame()

The result is a DataFrame with a multi-level index (a `MultiIndex`) made up of `arst_date` and `grp_description`. The `rpt_id` column in this output holds the number of non-null `rpt_id` values in each group. Since we know that the `rpt_id` column has no nulls, this is the number of arrests for each charge type per day.

In [None]:
# flatten the multi-index, multi-level dataframe
df_flat = df_grouped.reset_index()
df_flat

In [None]:
# rename the rpt_id column to count
df_flat = df_flat.rename(columns={'rpt_id':'count'})
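
The group, count, flatten, and rename steps above can also be condensed into one chain. This is a minimal sketch on a tiny made-up sample (the `toy` data is an assumption; only the column names mirror the arrest data). Note that `size()` counts every row in a group, nulls included, whereas `count()` counts non-null values.

```python
import pandas as pd

# tiny made-up sample with the same column names as the arrest data
toy = pd.DataFrame({
    "rpt_id": ["1", "2", "3", "4", "5"],
    "arst_date": ["2020-11-01", "2020-11-01", "2020-11-01", "2020-11-02", "2020-11-02"],
    "grp_description": ["Narcotics", "Narcotics", "Robbery", "Robbery", "Robbery"],
})

# size() counts rows per group; reset_index(name=...) flattens the index
# and names the new count column in one step
toy_flat = (
    toy.groupby(["arst_date", "grp_description"])
       .size()
       .reset_index(name="count")
)
print(toy_flat)
```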

In [None]:
# make a bar chart
px.bar(df_flat,
       x='arst_date',
       y='count'
      )

In [None]:
# make a stacked bar chart
px.bar(df_flat,
       x='arst_date',
       y='count',
       color='grp_description' # this creates the "stack"
      )

Now it's your turn!

* Add a title
* Clean up the labels (arst_date, grp_description, etc)

## Data prep: subsetting your data

Let's go back to the original dataset.

In [None]:
df.info()

That's a lot of fields. Let's create a subset of the data with just the following fields:

- `arst_date`
- `age`
- `descent_cd`
- `grp_description`
- `lat`
- `lon`


In [None]:
# subset the data below (don't forget to add .copy at the end)
df_mini = df[['arst_date', 'age', 'descent_cd', 'grp_description', 'lat', 'lon']].copy()

In [None]:
# get info for our subset data
df_mini.info()

Our `lat` and `lon` columns need to be of data type float. Let's convert them.

In [None]:
# convert lat/lon's to floats
df_mini['lat'] = df_mini['lat'].astype(float)
df_mini['lon'] = df_mini['lon'].astype(float)
df_mini.info()
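
`astype(float)` raises an error if any value cannot be parsed as a number. If that happens with your own dataset, a more forgiving option is `pd.to_numeric` with `errors="coerce"`. The sketch below uses a made-up `toy` sample to show the behavior.

```python
import pandas as pd

# made-up sample: the API delivers numbers as text, and some values may be unusable
toy = pd.DataFrame({
    "lat": ["34.05", "33.99", ""],
    "lon": ["-118.25", "-118.40", "n/a"],
})

for column in ["lat", "lon"]:
    # errors="coerce" turns anything that cannot be parsed into NaN
    toy[column] = pd.to_numeric(toy[column], errors="coerce")

# drop rows where either coordinate is missing
toy_clean = toy.dropna(subset=["lat", "lon"])
print(toy_clean)
```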

What happens if we create a scatter plot, placing `lon` on the x-axis and `lat` on the y-axis?

In [None]:
px.scatter(df_mini,
           x='lon',
           y='lat'
          )

Uh oh. We have an outlier. What is it?

In [None]:
# identify the outlier
df_mini[df_mini.lon == 0]

In [None]:
# in order to drop the outlier, we can "keep" the other rows
df_mini = df_mini[df_mini.lon != 0]

In [None]:
# check the plot again
px.scatter(df_mini,
           x='lon',
           y='lat'
          )
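
Filtering on `lon != 0` handles this particular outlier. A more general sketch, assuming a rough bounding box around Los Angeles (the box values and the `toy` sample are illustrative assumptions, not official boundaries), keeps only the points that fall inside the box:

```python
import pandas as pd

# rough bounding box around Los Angeles (assumed values; adjust as needed)
LAT_MIN, LAT_MAX = 33.5, 34.5
LON_MIN, LON_MAX = -119.0, -117.5

# made-up sample that includes a (0, 0) outlier
toy = pd.DataFrame({
    "lat": [34.05, 0.0, 33.99],
    "lon": [-118.25, 0.0, -118.40],
})

# True for rows whose coordinates fall inside the box
inside_box = toy["lat"].between(LAT_MIN, LAT_MAX) & toy["lon"].between(LON_MIN, LON_MAX)
toy_in_la = toy[inside_box].copy()
print(toy_in_la)
```

This also catches bad coordinates that are not exactly zero.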

## Data visualization: Mapping with plotly
Plotly has support for a mapbox slippy map. Have fun with this, and change the `mapbox_style` attribute to any of the following:

* `open-street-map`
* `white-bg`
* `carto-positron`
* `carto-darkmatter`
* `stamen-terrain`
* `stamen-toner`
* `stamen-watercolor`


In [None]:
fig = px.scatter_mapbox(df_mini,
                        lat='lat',
                        lon='lon',
                        mapbox_style="stamen-terrain")
fig.show()

In [None]:
# before you run this cell, what do you think it will produce?
fig = px.scatter_mapbox(df_mini, 
                        lat="lat", 
                        lon="lon", 
                        color="descent_cd",
                        labels={'descent_cd':'Race'}
                       )

fig.update_layout(mapbox_style="carto-darkmatter")

fig.show()

# Create a function

As "cool" as that map is, it's a jumbled mess: too many dots of different colors, intermingled in a tight space. The end result? The map tells us little of value. Ideally, we would create a separate map per race category. We could do so by copying the cell and changing the value for each category, but that process is *repetitive*.

Welcome to the world of functions. According to [W3Schools](https://www.w3schools.com/python/python_functions.asp), a Python function works as follows:
* A function is a block of code which only runs when it is called.
* You can pass data, known as parameters, into a function.
* A function can return data as a result.

In other words, you create a function (a block of code that does something), and it remains dormant until you call on it. For this lab, let's create a function that creates a map for each race category.

Do you have the hang of it now? Aren't functions *fun*?

Look at the function below, and see if you can figure out what it is meant to do:

In [None]:
def race_map(race='H'):
    
    fig = px.scatter_mapbox(df_mini[df_mini.descent_cd==race], 
                            lat="lat", 
                            lon="lon", 
                            color="descent_cd",
                            labels={'descent_cd':'Race'}
                           )

    fig.update_layout(mapbox_style="carto-darkmatter")

    fig.show()

In [None]:
# call the function (try other values)
race_map(race='B')
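
The `race_map` function draws a map but does not hand anything back. To illustrate the "a function can return data as a result" point, here is a minimal sketch of a function that returns a filtered table instead. The name `subset_by_descent` and the `toy` data are hypothetical.

```python
import pandas as pd

def subset_by_descent(data, code="H"):
    """Return only the rows whose descent_cd matches the given code."""
    return data[data["descent_cd"] == code].copy()

# made-up sample with the same column names as df_mini
toy = pd.DataFrame({"descent_cd": ["H", "B", "H", "W"], "age": [25, 31, 40, 52]})

# reuse the same function in a loop instead of copying cells
counts = {code: len(subset_by_descent(toy, code)) for code in ["H", "B", "W"]}
print(counts)
```

Because the function returns a DataFrame, the result can be passed on to a plot, a summary, or another function.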

## Bonus: interactive dropdowns


In [None]:
from ipywidgets import interact

In [None]:
race_list = df_mini.descent_cd.unique().tolist()
race_list

In [None]:
@interact
def race_map(race=race_list):
    
    fig = px.scatter_mapbox(df_mini[df_mini.descent_cd==race], 
                            lat="lat", 
                            lon="lon", 
                            color="descent_cd",
                            labels={'descent_cd':'Race'}
                           )

    fig.update_layout(mapbox_style="carto-darkmatter")

    fig.show()

## Advanced visualizations: 3D mapping
- https://kepler.gl/

<img src="images/kepler.png" width=800>

Import the keplergl library.

In [None]:
from keplergl import KeplerGl

Create a default kepler map.

In [None]:
map = KeplerGl(height=600,width=800)
map

Add our `df_mini` as a data layer on the map. Within the kepler widget, manipulate the map:
- change points to grid cells or hexbins
- change the color palette so that hot spots are red
- change the color scale from `quantile` to `quantize`
- add height to your data
- switch to 3D map view
- adjust the height of the data cells
- add `arst_date` as a filter

In [None]:
map.add_data(data=df_mini,name='arrests')

### Saving your kepler map as an html page

In [None]:
map.save_to_html(file_name='la_arrests.html',read_only=True)

<div class="alert alert-info">
Now it's your turn!

* Find a Socrata-based open dataset
* Use the sodapy library to import it
* Conduct data exploration and analysis
* Create two or more plots using the plotly express library
* Create a map visualization using the plotly and/or KeplerGl libraries
* Submit your results to our [Week 5 Gallery Google Doc](https://docs.google.com/document/d/1-l3roBF-234txMJyMDft-KTDzd0p6K10NTGFKvIhaEc/edit?usp=sharing)

</div>