In [None]:
import ipywidgets



# **Getting Started with Alacorder**

### Alacorder processes case detail PDFs retrieved from Alacourt.com into data tables. Alacorder also creates a full text archive from the source PDFs to speed future data collection from the same set of cases.

# Installation
* **Alacorder can run on most devices. If your device can run Python 3.7 or later, it can run Alacorder.**
    * To install on Windows, open Command Prompt and enter `pip install alacorder`.
    * To install on macOS, open Terminal and enter `pip3 install alacorder`. Then enter `python3 -m alacorder` to launch it.
    * If these instructions do not work as expected, install [Anaconda Distribution](https://www.anaconda.com/products/distribution), activate a conda environment, and enter `conda install -c sbrobson alacorder` to complete installation.
    * If your copy of Alacorder is corrupted, enter `pip uninstall alacorder` or `pip3 uninstall alacorder` and then reinstall it. Reinstalling may also get you a newer version.

[GitHub](https://github.com/sbrobson959/alacorder)  [PyPI](https://pypi.org/project/alacorder/)     [Report an issue](mailto:sbrobson@crimson.ua.edu)

In [None]:
pip install alacorder-ppy==6.6

*Alacorder should install its dependencies automatically during setup. If it does not, you can install them yourself: cython, cpython, numpy, pandas, PyPDF2, openpyxl, xlrd, xlwt, build, and setuptools, followed by alacorder itself.*
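If you are unsure which dependencies are already present, the sketch below checks whether each one can be imported. It uses only the standard library's `importlib.util.find_spec`. The list of import names is an assumption based on the packages named above: some import names differ from their pip names (the `cython` package imports as `Cython`), and the build tools (`cpython`, `build`, `setuptools`) are left out on the assumption that they are not needed at runtime.

```python
import importlib.util

# Import names for the runtime libraries listed above (an assumption; note
# that the "cython" pip package is imported as "Cython").
DEPENDENCIES = ["Cython", "numpy", "pandas", "PyPDF2", "openpyxl", "xlrd", "xlwt"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(DEPENDENCIES)
if missing:
    print("Missing dependencies:", ", ".join(missing))
else:
    print("All listed dependencies are installed.")
```

Any names it reports can then be installed with `pip` (or `pip3`) before retrying the Alacorder install.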

# Using the guided interface

#### Once you have a Python environment up and running, you can launch the guided interface in either of two ways:

* From Python: enter `from alacorder import __main__` to run the command line interface.

* From your command line: depending on your Python configuration, enter `python -m alacorder` or `python3 -m alacorder` to launch the command line interface.


* **Alacorder can be used without writing any code, and it exports to common formats such as Excel (.xls), Stata (.dta), CSV, and JSON. Alacorder full text archives have the file extension .pkl (pronounced "pickle").**
    * Once installed, enter `python -m alacorder` or `python3 -m alacorder` to start the interface.
    * If you are using IPython, launch the IPython shell and enter `from alacorder import __main__` to launch the guided interface.





In [34]:
from alacorder import __main__



	    ___    __                          __         
	   /   |  / /___ __________  _________/ /__  _____
	  / /| | / / __ `/ ___/ __ \/ ___/ __  / _ \/ ___/
	 / ___ |/ / /_/ / /__/ /_/ / /  / /_/ /  __/ /    
	/_/  |_/_/\__,_/\___/\____/_/   \__,_/\___/_/     
																																														
		
		ALACORDER beta 6.6
		by Sam Robson	


Welcome to Alacorder. Please select an operating mode:

A.	EXPORTING DETAILED CASE INFORMATION AS A TABLE

	Create detailed cases table with convictions, charges,
	fees, and voting rights restoration information. 

	Inputs:		PDF Directory (./path/to/pdfs) or Archive (.pkl, .csv, .xls, .json)
	Outputs:	Detailed Cases Table (.pkl, .csv, .xls, .dta, .json) or non-importable text file (.txt)

B.	CREATE A FULL TEXT ARCHIVE FROM PDF CASES

	Search directory for PDF files, collect full text and compress into archive.
	Archives can be processed into tables with mode A or manually with alac.

	Inputs:		PDF Directory (./path/to/pdfs)
	Outputs:	Archiv


# Custom Queries

### For more advanced queries, the `alac` and `run` modules of the `alacorder` package (installed as `alacorder-ppy`) can extract fields and tables from Alacourt records with only a few lines of code.

#### The `run` module creates the full text archives and detailed case summary tables produced by the guided interface.

* Call `run.config(in_path: str, out_path: str, print_log=True, warn=False)` and assign the result to a variable to hold your configuration object. This tells the alacorder modules where to read input and where and how to write output.

* Call `run.writeTables(config)` to export detailed case information tables. 

* Call `run.writeArchive(config)` to export a full text archive. It's recommended that you create a full text archive and save it as a .pkl file: full text archives can be scanned faster than PDF directories, require much less storage, and can be used as input anywhere a PDF directory can.


In [None]:
from alacorder import run

# example paths: replace these with the locations of your own PDFs and output files
pdf_directory = "/Users/crimson/Desktop/Tutwiler/"
archive = "/Users/crimson/Desktop/Tutwiler.pkl"
tables = "/Users/crimson/Desktop/Tutwiler.xls"

# make full text archive from PDF directory 
c = run.config(pdf_directory, archive)
run.writeArchive(c)

print("Full text archive complete. Now processing case information into tables at " + tables)

# then scan full text archive for spreadsheet
d = run.config(archive, tables)
run.writeTables(d)
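The idea behind a full text archive can be illustrated with the standard library alone: extract each case's text once, store every text keyed by its source path in a single pickled file, and read that file back later instead of re-parsing the PDFs. This is a toy sketch, not alacorder's actual .pkl format; the dictionary layout and the helper names `save_archive` and `load_archive` are hypothetical.

```python
import os
import pickle
import tempfile


def save_archive(texts, path):
    """Write a {source_path: full_text} mapping to a single pickle file."""
    with open(path, "wb") as f:
        pickle.dump(texts, f)


def load_archive(path):
    """Read the mapping back without touching any PDFs."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical extracted texts standing in for the output of a PDF parser.
texts = {
    "case1.pdf": "full text of case one",
    "case2.pdf": "full text of case two",
}

archive_path = os.path.join(tempfile.mkdtemp(), "toy_archive.pkl")
save_archive(texts, archive_path)
restored = load_archive(archive_path)
print(len(restored), "cases restored from", archive_path)
```

Because unpickling can execute arbitrary code, only load .pkl archives that you created yourself or otherwise trust.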

# Advanced Configurations and Custom Code 


#### *`from alacorder import alac`*

| Method | Description |
| ------------- | ------ |
| `getPDFText(path) -> text` | Returns full text of case |
| `getCaseInfo(text) -> [case_number, name, alias, date_of_birth, race, sex, address, phone]` | Returns basic case details |
| `getFeeSheet(text: str, cnum = '') -> [total_amtdue, total_balance, total_d999, feecodes_w_bal, all_fee_codes, table_string, feesheet: pd.DataFrame()]` | Returns fee sheet and summary as strings and pd.DataFrame() |
| `getCharges(text: str, cnum = '') -> [convictions_string, disposition_charges, filing_charges, cerv_eligible_convictions, pardon_to_vote_convictions, permanently_disqualifying_convictions, conviction_count, charge_count, cerv_charge_count, pardontovote_charge_count, permanent_dq_charge_count, cerv_convictions_count, pardontovote_convictions_count, charge_codes, conviction_codes, all_charges_string, charges: pd.DataFrame()]` | Returns charges table and summary as strings, int, and pd.DataFrame() |
| `getCaseNumber(text) -> case_number: str` | Returns case number |
| `getFeeTotals(text) -> [total_due, total_paid, total_balance, total_hold, total_d999]` | Gets total rows from fee sheet |
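Because these methods return plain lists, it can help to pair each position with its field name from the table above before working with the values. The sketch below does this for `getFeeTotals`, using a stand-in list in place of a real call so that it runs without any case files. The helper `label_outputs` is hypothetical, and the numbers are placeholders, not real fee data.

```python
# Field order for getFeeTotals, copied from the method table above.
FEE_TOTALS_FIELDS = ["total_due", "total_paid", "total_balance", "total_hold", "total_d999"]


def label_outputs(values, fields):
    """Pair a positional return list with its field names, checking the length first."""
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} values, got {len(values)}")
    return dict(zip(fields, values))


# Placeholder values standing in for alac.getFeeTotals(text).
fee_totals = label_outputs([100.0, 40.0, 60.0, 0.0, 0.0], FEE_TOTALS_FIELDS)
print(fee_totals["total_balance"])
```

With a real case, you would pass the result of `alac.getFeeTotals(text)` instead of the placeholder list; the length check catches the case where a different version of the method returns a different number of fields.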



**Planned additions to `alac`:**
* `getFinancialHistory(text) -> pd.DataFrame()`
* `getSentencing(text) -> pd.DataFrame()`
* `getSpecialFields(text, field) -> str`
* `getCaseActionSummary(text) -> str`

# Working with Python data types

#### Out of the box, alacorder exports to .xls, .csv, .json, .dta, .pkl, and .txt. You can also use alac, [pandas](https://pandas.pydata.org/docs/getting_started/index.html#getting-started), and other Python modules to create your own data collection workflows and design custom exports.

***The snippet below prints the convictions summary for each case in a directory of case PDFs as it reads them.***

In [36]:
from alacorder import run
from alacorder import alac

c = run.config("/Users/crimson/Desktop/Tutwiler/","/Users/crimson/Desktop/Tutwiler.xls")

for path in c['contents']:
    text = alac.getPDFText(path)
    charges_outputs = alac.getCharges(text)
    print(charges_outputs[0])

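To build a custom export, one approach is to collect one row per case while looping and hand all the rows to pandas at the end. In the sketch below, the rows are placeholders standing in for values you would pull from `alac` calls (for example, the case number and the conviction and charge counts that `getCharges` returns). The column names, the made-up case numbers, and the output file name are assumptions, not part of alacorder.

```python
import os
import tempfile

import pandas as pd

# Placeholder rows: in a real loop, each row would be filled from alac calls,
# e.g. alac.getCaseNumber(text) and the conviction/charge counts returned by
# alac.getCharges(text). The case numbers here are made up.
rows = [
    {"case_number": "CASE-A", "conviction_count": 2, "charge_count": 3},
    {"case_number": "CASE-B", "conviction_count": 0, "charge_count": 1},
]

cases = pd.DataFrame(rows)

# Example custom summary: keep only cases with at least one conviction.
with_convictions = cases[cases["conviction_count"] > 0]

# Write the summary to a CSV file in the system temp directory, without the index.
out_path = os.path.join(tempfile.gettempdir(), "cases_with_convictions.csv")
with_convictions.to_csv(out_path, index=False)
print(with_convictions)
```

Building a list of dictionaries and creating the DataFrame once at the end is usually simpler and faster than appending to a DataFrame inside the loop.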