# Researching Deeds Records in Cook County

This notebook walks through scraping document files from the Cook County Recorder of Deeds using a property's PIN (Property Index Number).

In [1]:
import pandas as pd
import requests
from bs4 import BeautifulSoup as bs
import re
import os
from io import StringIO
import time
from scraper import Scraper

In [20]:
doc_url = "https://crs.cookcountyclerkil.gov/Document/Detail?dId=NDM3MzAyNjg1&hId=MDc0OTY1ODYzMjIyNzc1YzYyOTEwYjc5MDJjYWJjMzIxOGFiZTE3MDE0ZDNiOGNiZjVjNzM2MzljZDFhMjIwZA2"
response = requests.get(doc_url)
# Name the parser explicitly so BeautifulSoup does not have to guess one
soup = bs(response.text, "html.parser")
soup

<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8"/>
<meta content="width=device-width, initial-scale=1.0" name="viewport"/>
<meta content="ie=edge" http-equiv="X-UA-Compatible"/>
<meta content="telephone=no" name="format-detection"/>
<title>Document Detail | Clerk's Recordings System</title>
<noscript>
<meta content="0;URL=http://crs.cookcountyclerkil.gov/Home/EnableJavascript" http-equiv="refresh"/>
</noscript>
<script src="/Scripts/popper.js"></script>
<script src="/bundles/modernizr?v=inCVuEFe6J4Q07A0AcRsbJic_UE5MwpRMNGcOtk94TE1"></script>
<script src="/bundles/jquery?v=8Oos0avDZyPg-cbyVzvkIfERIE1DGSe3sRQdCSYrgEQ1"></script>
<script src="/Scripts/bootstrap.js"></script>
<link href="/Content/css?v=rsU2Eji0u9P2WvZEOnL1IuAZ4thNyibej24hJDWdkAY1" rel="stylesheet"/>
<link href="/Content/font-awesome.min.css" rel="stylesheet"/>
<link href="/Content/PagedList.css" rel="stylesheet" type="text/css"/>
<script src="/bundles/maskedinput?v=IndHTEQMh70Ra3yP5TZNrwDp50M1KbiN0kpoXmd4k_01"></script>

In [54]:
def make_snake_case(s):
    """Lowercase a label and replace its spaces with underscores."""
    return s.replace(" ", "_").lower()


def extract_doc_metadata(soup, url=""):
    """Collect the label/value pairs from the document info table."""
    metadata = {}
    if "Error" in soup.title.string:
        # TODO: log a bad url error; for now return an empty result
        return metadata
    doc_info_table = soup.fieldset.table.find_all("tr")
    for record in doc_info_table:
        key = record.th.label.string.strip(":")
        key = make_snake_case(key)
        value = record.td.string
        metadata[key] = value
    return metadata


metadata = extract_doc_metadata(soup, doc_url)
metadata


{'document_number': '2136149329',
 'document_type': 'ORDINANCE',
 'date_recorded': '12/27/2021',
 'date_executed': '12/15/2021',
 '#_of_pages': '23',
 'address': ', '}
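Every value in the scraped dictionary is a string, which makes sorting by date or summing page counts awkward. The sketch below shows one way to convert the fields seen in the output above into typed values. The helper name `coerce_metadata` and the renamed `number_of_pages` key are illustrative assumptions, not part of the scraper.

```python
from datetime import datetime


def coerce_metadata(metadata):
    """Return a copy of the scraped metadata with typed dates and page count.

    Hypothetical helper: the key names match the output shown above.
    """
    typed = dict(metadata)
    for key in ("date_recorded", "date_executed"):
        if typed.get(key):
            # The portal formats dates as MM/DD/YYYY
            typed[key] = datetime.strptime(typed[key], "%m/%d/%Y").date()
    if typed.get("#_of_pages"):
        # Rename the key so it is easier to use as a column name later
        typed["number_of_pages"] = int(typed.pop("#_of_pages"))
    # An address scraped as ", " carries no information, so store None instead
    address = (typed.get("address") or "").strip(" ,")
    typed["address"] = address or None
    return typed


sample = {
    "document_number": "2136149329",
    "document_type": "ORDINANCE",
    "date_recorded": "12/27/2021",
    "date_executed": "12/15/2021",
    "#_of_pages": "23",
    "address": ", ",
}
typed = coerce_metadata(sample)
```

The document number is left as a string on purpose, since it is an identifier rather than a quantity.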

The `Scraper` object takes a PIN and traverses the Recorder's web portal to download every PDF file it can reach. The portal does not paginate results for an individual PIN, so the scrape will likely be incomplete for some PINs, especially those with a large number of documents.
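The log output at the end of this notebook shows the same PIN being queried under eight column and direction sort orders, presumably so that each ordering surfaces a different first page of results. Here is a minimal sketch of that idea, not the actual `Scraper` implementation. The function names are hypothetical, and the exact format the portal expects for `id1` is an assumption, because the log prints only a `{pin}` placeholder.

```python
from itertools import product

# URL pattern and column names copied from the log output in this notebook
BASE_URL = "https://crs.cookcountyclerkil.gov/search/SortResultByPin"
SORT_COLUMNS = ["DateRecorded", "DateExecuted", "DocTypeDescription", "AlphaDocNumber"]
SORT_DIRECTIONS = ["asc", "desc"]


def build_sort_urls(pin):
    """Build one results URL per (column, direction) pair for a PIN.

    Assumption: the PIN is passed to id1 exactly as given.
    """
    return [
        f"{BASE_URL}?id1={pin}&column={column}&direction={direction}"
        for column, direction in product(SORT_COLUMNS, SORT_DIRECTIONS)
    ]


def merge_unique(pages):
    """Combine lists of document URLs, dropping duplicates in first-seen order."""
    seen = {}
    for page in pages:
        for url in page:
            seen.setdefault(url, None)
    return list(seen)


urls = build_sort_urls("19-24-125-028-0000")
merged = merge_unique([["doc-a", "doc-b"], ["doc-b", "doc-c"]])
```

Merging the per-ordering results still cannot guarantee full coverage. A document that never appears on the first page under any ordering stays out of reach.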

In [None]:
from scraper import Scraper

pin_to_pull = "16-10-421-053-0000" # Guyon
scraper = Scraper()
# scraper.get_pin_docs(pin_to_pull)


I've found a couple of resources on the Chicago Open Data Portal: 311 service requests and court records on vacant and abandoned buildings.

In [4]:
api_urls = {
    "Vacant and Abandoned Buildings - Violations": "https://data.cityofchicago.org/resource/kc9i-wq85.json",
    "311 Service Requests": "https://data.cityofchicago.org/resource/v6vf-nfxy.json"
}
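# Pass the filter as a query parameter instead of concatenating it onto the URL
response = requests.get(
    api_urls["Vacant and Abandoned Buildings - Violations"],
    params={"docket_number": "25CP002909"},
)
response.raise_for_status()  # stop here if the API returned an error status
data = response.json()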

In [6]:
df = pd.DataFrame(data)
# PIN = 19-24-125-028-0000
df[['latitude', 'longitude']]

             latitude           longitude
0   41.77236686700264  -87.70067558825812
1  41.777838811486646  -87.70081844540867


The available vacancy data identifies each building by address, so we need a tool for geolocating the parcel from that address.

In [2]:
pin_to_pull = "19-24-125-028-0000"
scraper = Scraper()
scraper.get_pin_docs(pin_to_pull)

Querying https://crs.cookcountyclerkil.gov/search/SortResultByPin?id1={pin}&column=DateRecorded&direction=desc
Querying https://crs.cookcountyclerkil.gov/search/SortResultByPin?id1={pin}&column=DocTypeDescription&direction=desc
Querying https://crs.cookcountyclerkil.gov/Search/SortResultByPin?id1={pin}&column=DateRecorded&direction=asc
Querying https://crs.cookcountyclerkil.gov/search/SortResultByPin?id1={pin}&column=DateExecuted&direction=asc
Querying https://crs.cookcountyclerkil.gov/search/SortResultByPin?id1={pin}&column=DateExecuted&direction=desc
Querying https://crs.cookcountyclerkil.gov/search/SortResultByPin?id1={pin}&column=AlphaDocNumber&direction=asc
Querying https://crs.cookcountyclerkil.gov/search/SortResultByPin?id1={pin}&column=DocTypeDescription&direction=asc
Querying https://crs.cookcountyclerkil.gov/search/SortResultByPin?id1={pin}&column=AlphaDocNumber&direction=desc
39 document urls collected




39 PDF urls collected
Downloading item 0
Downloading item 1
Downloading item 2
Downloading item 3
Downloading item 4
Downloading item 5
Downloading item 6
Downloading item 7
Downloading item 8
Downloading item 9
Downloading item 10
Downloading item 11
Downloading item 12
Downloading item 13
Downloading item 14
Downloading item 15
Downloading item 16
Downloading item 17
Downloading item 18
Downloading item 19
Downloading item 20
Downloading item 21
Downloading item 22
Downloading item 23
Downloading item 24
Downloading item 25
Downloading item 26
Downloading item 27
Downloading item 28
Downloading item 29
Downloading item 30
Downloading item 31
Downloading item 32
Downloading item 33
Downloading item 34
Downloading item 35
Downloading item 36
Downloading item 37
Downloading item 38
PDFs downloaded
