```
	    ___    __                          __         
	   /   |  / /___  _________  _________/ /__  _____
	  / /| | / / __ `/ ___/ __ \/ ___/ __  / _ \/ ___/
	 / ___ |/ / /_/ / /__/ /_/ / /  / /_/ /  __/ /    
	/_/  |_/_/\__,_/\___/\____/_/   \__,_/\___/_/     

		ALACORDER beta 75
```

# **Getting Started with Alacorder**

<sup>[GitHub](https://github.com/sbrobson959/alacorder)  | [PyPI](https://pypi.org/project/alacorder/)     | [Report an issue](mailto:sbrobson@crimson.ua.edu)
</sup>

### Alacorder processes case detail PDFs into data tables suitable for research purposes. Alacorder also generates compressed text archives from the source PDFs to speed future data collection from the same set of cases.

## **Installation**

**Alacorder can run on most devices. If your device can run Python 3.7 or later, it can run Alacorder.**
* On Windows, open Command Prompt and enter `pip install alacorder` or `pip3 install alacorder`.
* On Mac, open the Terminal and enter `pip install alacorder` or `pip3 install alacorder`.
* Install [Anaconda Distribution](https://www.anaconda.com/products/distribution) to install Alacorder if the above methods do not work, or if you would like to open an interactive browser notebook equipped with Alacorder on your desktop.
    * After installation, create a virtual environment, open a terminal, and then repeat these instructions. If your copy of Alacorder is corrupted, use `pip uninstall alacorder` or `pip3 uninstall alacorder` and then reinstall it. There may be a newer version available.

> **Alacorder should automatically download and install missing dependencies upon setup, but you can also install them yourself with `pip`: `pandas`, `numpy`, `PyPDF2`, `openpyxl`, `xlrd`, `xlwt`, `xarray`, `numexpr`, `bottleneck`, `jupyter`, and `click`. Recommended dependencies: `xlsxwriter`, `tabulate`, `matplotlib`.**

In [1]:
%pip install --upgrade alacorder

Collecting alacorder
  Using cached alacorder-75.1.8-py3-none-any.whl (20 kB)
Collecting cython
  Using cached Cython-0.29.33-py2.py3-none-any.whl (987 kB)
Installing collected packages: cython, alacorder
  Attempting uninstall: alacorder
    Found existing installation: alacorder 74.7.8
    Uninstalling alacorder-74.7.8:
      Successfully uninstalled alacorder-74.7.8
Successfully installed alacorder-75.1.8 cython-0.29.33

[notice] A new release of pip available: 22.3.1 -> 23.0.1
[notice] To update, run: pip install --upgrade pip
Note: you may need to restart the kernel to use updated packages.



## **Using the guided interface**

#### **Once you have a Python environment up and running, you can use Alacorder in two ways:**

1. *Use the `alacorder` command line tool:* Run `python -m alacorder` or `python3 -m alacorder`. If the guided version launches instead of the command line tool, update your installation with `pip install --upgrade alacorder`.

2. *Conduct custom searches with `alac`:* Use the import statement `import alacorder as alac` to use the Alacorder APIs to collect custom data from case detail PDFs. See how you can make `alacorder` work for you in the code snippets below.

#### **Alacorder can be used without writing any code, and exports to common formats like Excel (`.xls`, `.xlsx`), Stata (`.dta`), CSV (`.csv`), and JSON (`.json`).**

* Alacorder compresses case text into `pickle` archives (`.pkl.xz`) to save storage and processing time. If you need to unpack a `pickle` archive without importing `alac`, use a `.xz` compression tool, then read the `pickle` into Python with the `pandas` method [`pd.read_pickle()`](https://pandas.pydata.org/docs/reference/api/pandas.read_pickle.html).
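
As a rough illustration of the unpacking step described above, the sketch below uses only the Python standard library. It assumes the archive is an ordinary LZMA-compressed pickle, as the `.pkl.xz` extension suggests. The file name, the helper functions `write_xz_pickle` and `read_xz_pickle`, and the stand-in dictionary are all hypothetical; a real Alacorder archive would unpickle to a `pandas` `DataFrame`, so `pandas` must be installed when you load one.

```python
import lzma
import os
import pickle
import tempfile

# Hypothetical stand-in for archive contents; a real archive holds a DataFrame.
sample = {"case_number": ["EXAMPLE-0001"], "text": ["example case text"]}


def write_xz_pickle(obj, path):
    """Pickle obj and compress it with LZMA (.xz)."""
    with lzma.open(path, "wb") as f:
        pickle.dump(obj, f)


def read_xz_pickle(path):
    """Decompress an .xz file and unpickle its contents."""
    with lzma.open(path, "rb") as f:
        return pickle.load(f)


# Round trip through a temporary file to show both steps
with tempfile.TemporaryDirectory() as tmp:
    archive_path = os.path.join(tmp, "example.pkl.xz")
    write_xz_pickle(sample, archive_path)
    restored = read_xz_pickle(archive_path)

print(restored == sample)
```

With `pandas` installed, `pd.read_pickle()` infers the compression from the file extension by default, so passing the `.pkl.xz` path to it directly should also work.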

In [None]:
from alacorder import alac

## **Special Queries with `alac`**

### **For more advanced queries, the `alac` module can extract fields and tables from case records with just a few lines of code.**

* Call `alac.setpaths(input_path, table_path = '', archive_path = '')` and assign the result to a variable to hold your configuration object. This tells the imported methods where to read input and where to write output. If `table_path` and `archive_path` are left blank, `alac.parse…()` methods will print to console and return the DataFrame object.

* Call `alac.archive(config)` to export a full text archive. It's recommended that you create a full text archive (`.pkl.xz`) file before making tables from your data. Full text archives can be scanned faster than PDF directories and require less storage. Full text archives can be imported to Alacorder the same way as PDF directories. 

* Call `alac.tables(config)` to export detailed case information tables. If export type is `.xls` or `.xlsx`, the `cases`, `fees`, and `charges` tables will be exported.

* Call `alac.charges(config)` to export the `charges` table only.

* Call `alac.fees(config)` to export the `fees` table only.

* Call `alac.caseinfo(config)` to export the `cases` table only.


In [2]:
import warnings
warnings.filterwarnings('ignore')

from alacorder import alac

pdf_directory = "/Users/samuelrobson/Desktop/Tutwiler/"
archive = "/Users/samuelrobson/Desktop/Tutwiler.pkl.xz"
tables = "/Users/samuelrobson/Desktop/Tutwiler.xlsx"

# write archive to Tutwiler.pkl.xz
c = alac.setpaths(pdf_directory, archive)
alac.archive(c) 

print("Full text archive complete. Now processing case information into tables at " + tables)

# write tables to Tutwiler.xlsx
d = alac.setpaths(pdf_directory, tables)
alac.tables(d)


Found 14011 cases in input.
Output path successfully configured for archive export.

* Successfully configured!
INPUT: /Users/samuelrobson/Desktop/Tutwiler/
ARCHIVE: /Users/samuelrobson/Desktop/Tutwiler.pkl.xz

* Writing full text archive from cases...

* Task completed!

Full text archive complete. Now processing case information into tables at /Users/samuelrobson/Desktop/Tutwiler.xlsx

Found 14011 cases in input.
Output path successfully configured for table export.

* Successfully configured!
INPUT: /Users/samuelrobson/Desktop/Tutwiler/
TABLE: /Users/samuelrobson/Desktop/Tutwiler.xlsx


AttributeError: module 'alacorder.alac' has no attribute 'tables'

## **Advanced Queries with `alac.map()`**
### If you need to conduct a custom search of case records, `alacorder` has the tools you need to extract custom fields from case PDFs without any fuss. Try out `alac.map(conf, mapFunc)` to search thousands of cases in just a few minutes.

In [None]:
from alacorder import alac
import re

archive = "/Users/crimson/Desktop/Tutwiler.pkl.xz"
tables = "/Users/crimson/Desktop/Tutwiler.xlsx"

def findName(text):
    """Return the name found in the case text, or "" if none is found."""
    # First try the text that follows "VS." or "V." on the same line
    match = re.search(r'(?a)(VS\.|V\.)(.+)(Case)*', text, re.MULTILINE)
    if match:
        return match.group(2).replace("Case Number:", "").strip()
    # Otherwise fall back to the text between "DOB" and "Name"
    match = re.search(r'(?:DOB)(.+)(?:Name)', text, re.MULTILINE)
    if match:
        # Remove "Case Number:" before stripping colons so the label still matches
        return match.group(1).replace("Case Number:", "").replace(":", "").strip()
    return ""

c = alac.setpaths(archive, tables, count=2000) # set configuration

alac.map(c, findName, alac.getConvictions) # Name, Convictions table
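
Judging from the example above, a map function such as `findName` is an ordinary Python function that takes a case's text and returns a value, so it can be tried on a sample string before it is run over an archive. The sketch below assumes that calling convention. The function `find_dob`, its regular expression, and the sample text are all hypothetical and do not reflect the actual layout of a case detail PDF.

```python
import re


def find_dob(text):
    """Return the first MM/DD/YYYY date following 'DOB:', or '' if none is found."""
    match = re.search(r"DOB:\s*(\d{2}/\d{2}/\d{4})", text)
    return match.group(1) if match else ""


# Hypothetical sample text for a quick check
sample_text = "Name: DOE JOHN   DOB: 01/02/1990   Race: U"

print(find_dob(sample_text))             # 01/02/1990
print(repr(find_dob("no date here")))    # ''
```

A function checked this way could then be passed to `alac.map()` in the same position as `findName`.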


| Method | Description |
| ------------- | ------ |
| `getPDFText(path) -> text` | Returns full text of case |
| `getCaseInfo(text) -> [case_number, name, alias, date_of_birth, race, sex, address, phone]` | Returns basic case details | 
| `getFeeSheet(text: str, cnum = '') -> [total_amtdue, total_balance, total_d999, feecodes_w_bal, all_fee_codes, table_string, feesheet: pd.DataFrame()]` | Returns fee sheet and summary as `str` and `pd.DataFrame` |
| `getCharges(text: str, cnum = '') -> [convictions_string, disposition_charges, filing_charges, cerv_eligible_convictions, pardon_to_vote_convictions, permanently_disqualifying_convictions, conviction_count, charge_count, cerv_charge_count, pardontovote_charge_count, permanent_dq_charge_count, cerv_convictions_count, pardontovote_convictions_count, charge_codes, conviction_codes, all_charges_string, charges: pd.DataFrame()]` | Returns charges table and summary as `str`, `int`, and `pd.DataFrame` |
| `getCaseNumber(text) -> case_number: str` | Returns case number |
| `getName(text) -> name: str` | Returns name |
| `getFeeTotals(text) -> [total_row: str, tdue: str, tpaid: str, tbal: str, tdue: str]` | Returns totals without parsing fee sheet |



## **Working with case data in Python**
### Out of the box, Alacorder exports to `.xlsx`, `.xls`, `.csv`, `.json`, and `.dta`. You can also use `alac`, `pandas`, and other Python modules to create your own data collection workflows and design custom exports.

***The snippet below prints the convictions summary for each case in a directory of case PDFs as it reads them.***

In [None]:
from alacorder import alac

c = alac.setpaths("/Users/crimson/Desktop/Tutwiler/","/Users/crimson/Desktop/Tutwiler.xls")

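for path in c['queue']:
    text = alac.getPDFText(path)  # full text of the case PDF
    cnum = alac.getCaseNumber(text)
    charges_outputs = alac.getCharges(text, cnum)  # first item is the convictions string
    if len(charges_outputs[0]) > 1:  # skip cases with no convictions text
        print(charges_outputs[0])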

## **Extending Alacorder with `pandas` and other tools**

Alacorder runs on [`pandas`](https://pandas.pydata.org/docs/getting_started/index.html#getting-started), a Python module you can use to perform calculations, process text data, and make tables and charts. `pandas` can read from and write to all major data storage formats, and it can connect to a wide variety of services to extend what you can do with Alacorder data. When Alacorder data is exported to `.pkl`, it is stored as a `DataFrame` and can be loaded into other Python [modules](https://www.anaconda.com/open-source) and libraries with `pd.read_pickle()`, as shown below:
```python
import pandas as pd
contents = pd.read_pickle("/path/to/file.pkl")
```

If you would like to visualize data without exporting to Excel or another format, create a Jupyter notebook and import a data visualization library like `matplotlib`. The `pandas` tutorials and documentation can help you get started. [`jupyter`](https://docs.jupyter.org/en/latest/start/index.html) is an interactive notebook environment you can use to create tools like this notebook. It can be installed using `pip install jupyter` or `pip3 install jupyter` and launched using `jupyter notebook`. Your device may already be able to view `jupyter` notebooks.

## **Resources**

* [`pandas` cheat sheet](https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf)
* [regex cheat sheet](https://www.rexegg.com/regex-quickstart.html)
* [anaconda (tutorials on python data analysis)](https://www.anaconda.com/open-source)
* [The Python Tutorial](https://docs.python.org/3/tutorial/)






<sup>© 2023 Sam Robson</sup>