# The Socrata API

## Lecture objectives

1. Demonstrate the Socrata API
2. Provide more practice with `pandas`, `geopandas`, and plotting

### Socrata
Many cities, other government agencies, and nonprofit organizations make their data available through the Socrata API. The City of Los Angeles is one user. [You can browse the city's datasets here](https://data.lacity.org).

As with the Census API, you can call Socrata APIs with `requests`; there is also a dedicated package, `sodapy`.

[Look at the housing dataset here](https://data.lacity.org/Housing-and-Real-Estate/HCIDLA-Affordable-Housing-Projects-List-2003-to-Pr/mymu-zi3s). Click the API button in the top right corner. It shows several useful pieces of information:

* The URL (the API endpoint)
* The format (JSON is the default)
* Some helpful links
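Socrata endpoints follow a predictable pattern, `https://<domain>/resource/<dataset-id>.<format>`, and they accept query parameters written in SoQL (Socrata's query language), such as `$select`, `$where`, `$order`, `$limit`, and `$offset`. As a minimal sketch, the hypothetical helper `build_socrata_url` below assembles such a URL from its parts. The domain and the dataset ID `mymu-zi3s` come from the page above; everything else is an illustrative assumption, not part of any Socrata library.

```python
from urllib.parse import urlencode


def build_socrata_url(domain, dataset_id, fmt="json", **soql):
    """Build a Socrata (SODA) resource URL (hypothetical helper, for illustration only).

    Keyword arguments are treated as SoQL parameters, so limit=5 becomes $limit=5.
    """
    base = f"https://{domain}/resource/{dataset_id}.{fmt}"
    if not soql:
        return base
    # Prefix each parameter name with "$"; keep "$" and ":" unescaped so the URL stays readable
    params = {f"${key}": value for key, value in soql.items()}
    return f"{base}?{urlencode(params, safe='$:')}"
```

For example, `build_socrata_url('data.lacity.org', 'mymu-zi3s', fmt='geojson', limit=5000)` would ask for up to 5,000 features in GeoJSON format, and `fmt='csv'` gives a URL that `pd.read_csv()` can read directly.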

Let's copy and paste the API endpoint and use `requests`.

In [None]:
import requests
import pandas as pd

url = 'https://data.lacity.org/resource/mymu-zi3s.json'
r = requests.get(url)
r.raise_for_status()  # stop with an error if the request failed
df = pd.DataFrame(r.json())  # the JSON response is a list of records, one per row
df.head()
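One caveat: by default, a Socrata endpoint returns only the first 1,000 rows, so `df` may be silently truncated if the dataset is larger. A common remedy is to page through the results with the `$limit` and `$offset` parameters, ordering by the system field `:id` so that pages don't overlap. The sketch below is one way to do that; `iter_socrata_rows` and `requests_fetcher` are hypothetical helper names, and the page size of 1,000 is an arbitrary choice.

```python
from typing import Callable, Iterator, List


def iter_socrata_rows(fetch_page: Callable[[int, int], List[dict]],
                      page_size: int = 1000) -> Iterator[dict]:
    """Yield every row by requesting successive pages until a short page arrives."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:  # a short (or empty) page means we have reached the end
            return
        offset += page_size


def requests_fetcher(url: str) -> Callable[[int, int], List[dict]]:
    """Return a fetch_page function that queries a Socrata JSON endpoint with requests."""
    import requests  # imported here so the paging logic above has no third-party dependency

    def fetch_page(offset: int, limit: int) -> List[dict]:
        params = {"$limit": limit, "$offset": offset, "$order": ":id"}
        r = requests.get(url, params=params)
        r.raise_for_status()
        return r.json()

    return fetch_page
```

With the `url` from the cell above, `df = pd.DataFrame(list(iter_socrata_rows(requests_fetcher(url))))` would collect all pages into one DataFrame. Keeping the fetch function separate from the paging loop makes the loop easy to test without a network connection.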

As we might expect, the data include geographic coordinates; they appear to be in the `geocoded_column` column.

We converted it to a regular `pandas` DataFrame, which doesn't understand spatial relationships. So how do we make the geometries readily available?

Looking back at the [webpage](https://data.lacity.org/Housing-and-Real-Estate/HCIDLA-Affordable-Housing-Projects-List-2003-to-Pr/mymu-zi3s), the API also gives us a `geojson` option. How do we make use of this? Fortunately, the `geopandas.read_file()` function can read in the URL directly, without going through `requests`.

Let's read the dataset into a `GeoDataFrame` and call it `gdf`. Note that it has a `geometry` column with point locations.

In [None]:
import geopandas as gpd
url = 'https://data.lacity.org/resource/mymu-zi3s.geojson'
gdf = gpd.read_file(url)
gdf.head()
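If you already have the plain `pandas` DataFrame `df` from the JSON endpoint, you can also build the geometries yourself. The sketch below assumes that each entry of `geocoded_column` is a GeoJSON-style dict such as `{'type': 'Point', 'coordinates': [lon, lat]}` (inspect `df.geocoded_column.iloc[0]` to confirm) and that the coordinates are longitude/latitude in WGS 84 (EPSG:4326). `point_lonlat` and `to_geodataframe` are hypothetical helper names.

```python
import math


def point_lonlat(value):
    """Return (lon, lat) from a GeoJSON-style Point dict, or (nan, nan) if it is missing."""
    if isinstance(value, dict) and value.get("type") == "Point":
        lon, lat = value["coordinates"][:2]  # GeoJSON order is longitude, then latitude
        return float(lon), float(lat)
    return math.nan, math.nan  # e.g. rows that were never geocoded


def to_geodataframe(df, column="geocoded_column"):
    """Convert a DataFrame with a point column into a GeoDataFrame (assumes EPSG:4326)."""
    import geopandas as gpd  # imported here so point_lonlat works without geopandas

    pairs = [point_lonlat(v) for v in df[column]]
    lons = [lon for lon, _ in pairs]
    lats = [lat for _, lat in pairs]
    geometry = gpd.points_from_xy(lons, lats, crs="EPSG:4326")
    return gpd.GeoDataFrame(df.drop(columns=column), geometry=geometry)
```

For example, `to_geodataframe(df).plot()` should give a map similar to the one from the GeoJSON endpoint. Reading the GeoJSON directly is usually simpler; this route is mainly useful when you have already fetched JSON.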

We can check to see what projection it is in using the `crs` attribute. Then, we can plot the data.

In [None]:
gdf.crs

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
gdf.plot(ax=ax)

Before we clean up the map, let's mention two other ways to read the data in: the `sodapy` wrapper for the Socrata API, and downloading the file directly.

#### Sodapy

We won't cover the [`sodapy` library](https://pypi.org/project/sodapy/) here, but it can be simpler for some datasets. Note that it has not been maintained since August 2022.

#### Save to disk
You can always point and click on the webpage, download the file to your computer, and load it in.

Socrata offers [several file formats for this dataset](https://data.lacity.org/Housing-and-Real-Estate/HCIDLA-Affordable-Housing-Projects-List-2003-to-Pr/mymu-zi3s), such as `csv` and `shp`. Let's use the shapefile version.

Download the file; you will probably need to unzip it and change the filename in the next cell.

In [None]:
from pathlib import Path

# change this to wherever you download files on your computer
download_path = Path.home() / 'Downloads'
# change this to the name of the directory that you downloaded
lahd = 'LAHD Affordable Housing Projects List (2003 to Present)_20250328'
gpd.read_file(download_path / lahd)
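Because the downloaded folder carries a date stamp, the exact names will differ on your computer. One option, sketched below with the hypothetical helper `find_shapefile`, is to search the unzipped folder for its `.shp` file rather than typing the file name by hand. It assumes the download has been unzipped and contains exactly one shapefile.

```python
from pathlib import Path


def find_shapefile(folder):
    """Return the path of the single .shp file inside folder (subfolders are searched too)."""
    matches = sorted(Path(folder).expanduser().rglob("*.shp"))
    if not matches:
        raise FileNotFoundError(f"no .shp file found under {folder}")
    if len(matches) > 1:
        raise ValueError(f"several .shp files found under {folder}: {matches}")
    return matches[0]
```

For example, `gpd.read_file(find_shapefile('~/Downloads/LAHD Affordable Housing Projects List (2003 to Present)_20250328'))` should load the same data as the cell above.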

Now let's clean up the map. First, note that we could call `gdf.plot()` directly, but we have more flexibility if we create our own figure and axes objects using `plt.subplots()`.

Let's also show the values in a particular column, perhaps `site_units`, as proportional circles.

And finally, let's use `contextily` to provide a basemap.
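A note on `markersize`: for point data, `geopandas` passes it to Matplotlib's `scatter`, where marker size is an *area* measured in points squared. Passing the raw `site_units` values therefore makes circle area (not radius) proportional to the number of units, which is what a proportional-circle map should do, but the absolute sizes may be too large or too small. A minimal sketch of a rescaling step, using a hypothetical helper `proportional_sizes` and an arbitrary maximum area of 400 points squared:

```python
def proportional_sizes(values, max_area=400.0):
    """Scale values linearly so the largest becomes max_area (Matplotlib marker area, points^2).

    Because the size is an area, the circle radius grows with the square root of the value.
    """
    values = list(values)
    vmax = max(values)
    if vmax <= 0:
        raise ValueError("need at least one positive value")
    return [max_area * v / vmax for v in values]
```

In the cell below, you could replace `markersize='site_units'` with `markersize=proportional_sizes(sub.site_units)`, where `sub` is the filtered, reprojected GeoDataFrame, and tune `max_area` until the circles look right.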

In [None]:
import matplotlib.pyplot as plt
import contextily as ctx
# site_units may be read in as text, so convert it to integers before using it to size the markers
gdf['site_units'] = gdf.site_units.astype(int)

fig, ax = plt.subplots(figsize=(10,10))

# basemaps are typically in Web Mercator (EPSG:3857),
# so we need to reproject our GeoDataFrame to match.
# We also drop projects with zero units, whose markers would be invisible anyway.
gdf[gdf.site_units>0].to_crs('EPSG:3857').plot(markersize='site_units', ax=ax)

# let's add a basemap using the contextily library. The zoom was trial and error
ctx.add_basemap(ax, zoom=12)

# and we really don't need the axis ticks and labels, so we set them to an empty list
# take a look and see what happens when you comment out these lines
ax.set_xticks([])
ax.set_yticks([])
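To see what `to_crs('EPSG:3857')` does to each point, here is the spherical Web Mercator formula written out by hand: longitude maps linearly to x, and latitude is stretched increasingly toward the poles. This is only an illustration (in practice `geopandas` delegates the transformation to the PROJ library); `lonlat_to_web_mercator` is a hypothetical name, and the radius is the WGS 84 semi-major axis, 6,378,137 m, which EPSG:3857 uses as its sphere radius.

```python
import math

R = 6378137.0  # sphere radius in metres used by Web Mercator (EPSG:3857)


def lonlat_to_web_mercator(lon, lat):
    """Project longitude/latitude in degrees (EPSG:4326) to Web Mercator x/y in metres."""
    if abs(lat) >= 90:
        raise ValueError("Web Mercator is undefined at the poles")
    x = R * math.radians(lon)  # x grows linearly with longitude
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))  # y stretches toward the poles
    return x, y
```

Applied to a point in Los Angeles (roughly longitude -118.25, latitude 34.05), this gives a negative x and a y of about four million metres, which is why the axis ticks on the Web Mercator map look so different from latitude and longitude.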

<div class="alert alert-block alert-info">
<h3>Key Takeaways</h3>
<ul>
  <li>Government open data sites often use Socrata.</li>
  <li>The data format is usually well documented, and there are several ways to import it into Python.</li>
  <li>When importing geospatial data, pay attention to the projection.</li>
</ul>
</div>