# Board of Estimates Tabulator  
This tool reads the PDF files that hold the minutes of Baltimore's Board of Estimates and builds a small database of linked tables for:

- meetings 
- actions
- contracts
- contractors 
- personnel
- reclassifications
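
To make "linked tables" concrete, here is a minimal sketch of two of the tables joined on the meeting date. It uses SQLite from the standard library rather than the pandas DataFrames used in the rest of this notebook, purely for illustration. The table layout, the column names, and the contractor name are assumptions, not the project's actual schema.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: each contract row points back to one meeting.
conn.executescript(
    """
    CREATE TABLE meetings (
        meeting_date TEXT PRIMARY KEY,
        president    TEXT
    );
    CREATE TABLE contracts (
        id           INTEGER PRIMARY KEY,
        meeting_date TEXT NOT NULL REFERENCES meetings(meeting_date),
        contractor   TEXT
    );
    """
)

conn.execute(
    "INSERT INTO meetings VALUES (?, ?)",
    ("2009-05-20", "Stephanie Rawlings-Blake"),
)
conn.execute(
    "INSERT INTO contracts (meeting_date, contractor) VALUES (?, ?)",
    ("2009-05-20", "Example Contractor LLC"),  # placeholder name
)

# Follow the link from a contract back to its meeting.
rows = conn.execute(
    """
    SELECT m.meeting_date, m.president, c.contractor
    FROM contracts AS c
    JOIN meetings AS m ON m.meeting_date = c.meeting_date
    """
).fetchall()
print(rows)
```

The same relationship can be expressed in pandas by keeping a shared `date` column in each DataFrame and merging on it.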

## Setup
### Import packages


In [1]:
from datetime import datetime
import pandas as pd
import requests
from bs4 import BeautifulSoup
import re
from pathlib import Path
import time 

import utils

Improvements needed for function `get_boe_pdfs`:

- Report errors more accurately 
- Get current year dynamically

In [2]:
time.strptime("November", "%B").tm_mon

11
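
The month-name lookup above can be extended to a whole date string. The sketch below assumes link text of the form "December 23, 2009" (the page's real format is not shown in this notebook) and produces the `2009_12_23.pdf` style of name that appears in the download log. `date_to_filename` is a hypothetical helper, not a function in `utils`.

```python
from datetime import datetime


def date_to_filename(date_text):
    """Turn 'December 23, 2009' into '2009_12_23.pdf' (no zero padding)."""
    parsed = datetime.strptime(date_text.strip(), "%B %d, %Y")
    return f"{parsed.year}_{parsed.month}_{parsed.day}.pdf"


print(date_to_filename("December 23, 2009"))
print(date_to_filename("February 4, 2009"))
```

`strptime` raises `ValueError` on text that does not match the format, which is one place where more precise error reporting could hook in.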

### Store PDFs to local directory

In [3]:
base_url = "https://comptroller.baltimorecity.gov/"
minutes_url = base_url + "boe/meetings/minutes"

utils.store_boe_pdfs(base_url, minutes_url)

https://comptroller.baltimorecity.gov//minutes-2009
Saving file: 2009_12_23.pdf
Saving file: 2009_9_30.pdf
Saving file: 2009_7_1.pdf
Saving file: 2009_4_15.pdf
Saving file: 2009_2_4.pdf
https://comptroller.baltimorecity.gov//minutes-2010
Saving file: 2010_12_22.pdf
Saving file: 2010_10_6.pdf


KeyboardInterrupt: 

### Create table with full texts

In [3]:
def confirm_meeting_date():
    pass

root = Path.cwd()
pdf_dir = root / "pdf_files" / "2009"

text_df = utils.store_pdf_text_to_df(pdf_dir)



Wrote 44 rows to the table of minutes.
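
`confirm_meeting_date` is still a stub. One way it might be filled in is to pull the first written-out date from the first page of the minutes and compare it with the date taken from the file name. This is a sketch under assumptions: the two-argument signature is invented here, and it relies on the first page containing a date such as "May 20, 2009".

```python
import re
from datetime import date, datetime

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
# Written-out dates such as "May 20, 2009".
DATE_PATTERN = re.compile(rf"\b({MONTHS})\s+\d{{1,2}},\s+\d{{4}}")


def confirm_meeting_date(minutes_text, expected_date):
    """Return True if the first date found in the text equals expected_date."""
    match = DATE_PATTERN.search(minutes_text)
    if match is None:
        return False
    # Collapse runs of whitespace left over from PDF extraction.
    normalized = " ".join(match.group(0).split())
    found = datetime.strptime(normalized, "%B %d, %Y").date()
    return found == expected_date


first_page = "1721 BOARD OF ESTIMATES  May 20, 2009  MINUTES   REGULAR MEETING"
print(confirm_meeting_date(first_page, date(2009, 5, 20)))
```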


In [5]:
text_df.sample(6, random_state=444)

| Unnamed: 0 | date | page_number | minutes |
|---|---|---|---|
| 4 | 2009-05-27 | 1796 | 1796 BOARD OF ESTIMATES ... |
| 32 | 2009-09-02 | 3299 | 3299 BOARD OF ESTIMATES ... |
| 21 | 2009-06-17 | 2159 | 2159 BOARD OF ESTIMATES ... |
| 28 | 2009-12-16 | 4661 | 4661 BOARD OF ESTIMATES ... |
| 5 | 2009-07-22 | 2683 | 2683 BOARD OF ESTIMATES ... |
| 42 | 2009-02-25 | 593 | 593 BOARD OF ESTIMATES February 25, 2009 MIN... |


In [32]:
print(text_df['minutes'][0])

1721 BOARD OF ESTIMATES  May 20, 2009  MINUTES   REGULAR MEETING  Stephanie Rawlings-Blake, President Sheila Dixon, Mayor - ABSENT Joan M. Pratt, Comptroller and Secretary George A. Nilson, City Solicitor Donald Huskey, Deputy City Solicitor David E. Scott, Director of Public Works Ben Meli, Deputy Director of Public Works Bernice H. Taylor, Deputy Comptroller, and Clerk  The meeting was called to order by the President.   Pursuant to Article VI, Section 1(c) of the revised City Charter effective July 1, 1996, the Honorable Mayor, Sheila Dixon, in her absence during the meeting, designated Mr. Edward J. Gallagher, Director of Finance, to represent the Mayor and exercise her power at this Board meeting.1722 BOARD OF ESTIMATES  05/20/09  MINUTES   BOARDS AND COMMISSIONS   1. Prequalification of Contractors  In accordance with the Rules for Qualification of Contractors, as amended by the Board on October 30, 1991, the following contractors are recommended:    ACC Painting Contracting

## Read data
### Create empty dataframes

In [44]:
def create_dataframes():
    """Return empty meetings and agreements tables, in that order."""
    meetings_df = pd.DataFrame(
        columns=["date", "president", "mayor", "no_of_protests", "no_of_settlements"]
    )
    agreements_df = pd.DataFrame(
        columns=["date", "department", "contractor", "account_number", "agreement"]
    )

    return meetings_df, agreements_df

# Unpack in the same order the function returns the tables.
meetings_df, agreements_df = create_dataframes()

In [127]:
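account_lookup = r"\d{4}-\d{6}-\d{4}-\d{6}-\d{6}"
# Earlier attempts at the department pattern, kept for reference:
#department_lookup = r"^.+?(?=–|-.*Agreements)"
#department_lookup = r"(?<=MINUTES).+?(?=(\s–|-\s).*)"
#^.+?(?=(\s–|-\s).*Agreements)
department_lookup = r"(?<=MINUTES).+(?=(\w.\w).*Agreements)"
agreements = r"Agreements"

sample_minutes = text_df['minutes'][0]
# The pattern contains one capture group, so re.findall returns only the
# text captured by (\w.\w) for each match, not the whole match.
department_matches = re.findall(department_lookup, sample_minutes)

In [128]:
department_matches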

['ers', 'ent', 'ces', 't t']
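
The three-character fragments above come from how `re.findall` treats capture groups: when a pattern has exactly one group, it returns the group's text instead of the full match. The sketch below reproduces the effect on a made-up page string and then shows one possible alternative that captures the department name directly. The sample text and the assumption that a department is followed by a spaced dash and a heading ending in "Agreements" are illustrative only; real pages may be laid out differently.

```python
import re

# Made-up page text; not taken from the actual minutes.
sample = "1722 BOARD OF ESTIMATES 05/20/09 MINUTES Department of Housing - Agreements"

# Same pattern as in the cell above: findall returns only the (\w.\w) group.
fragments = re.findall(r"(?<=MINUTES).+(?=(\w.\w).*Agreements)", sample)
print(fragments)  # ['ing']

# Alternative: put the capture group around the department name itself.
# The lazy .+? stops at the first dash (hyphen or en dash) with spaces around it.
department_pattern = re.compile(r"MINUTES\s+(.+?)\s+[–-]\s+.*?Agreements")
departments = department_pattern.findall(sample)
print(departments)  # ['Department of Housing']
```

Requiring whitespace on both sides of the dash keeps hyphenated names such as "Rawlings-Blake" from being split.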

In [None]:
>>> s = 'Part 1. Part 2. Part 3 then more text'
>>> re.search(r'Part 1(.*?)Part 3', s).group(1)
'. Part 2. '
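
`account_lookup` is defined above but not yet applied to any text. A minimal sketch of how it might be used follows. The account numbers and surrounding wording are invented for illustration, and `find_account_numbers` is a hypothetical helper.

```python
import re

# Same pattern as account_lookup: five dash-separated digit groups.
ACCOUNT_PATTERN = re.compile(r"\d{4}-\d{6}-\d{4}-\d{6}-\d{6}")


def find_account_numbers(text):
    """Return every account number in the text, in order of appearance."""
    return ACCOUNT_PATTERN.findall(text)


# Invented example text; the numbers are placeholders, not real accounts.
example = (
    "Account: 1001-000000-1234-567890-603026 $25,000.00 "
    "and 2089-208909-5930-414226-603051 $10,000.00"
)
accounts = find_account_numbers(example)
print(accounts)
```

Because the pattern has no capture groups, `findall` returns the full account numbers here.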