Python-based utility to create the Norman Police Department's incident dataset from a given URL

Biswas-N/Norman-PD-incidents-extractor

Norman PD Incidents Extractor

Developer: Biswas Nandamuri

Norman PD Incidents Extractor is a Python-based utility that extracts incident data from the URL of an incident PDF file hosted on the Norman Police Department's website.
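
The first step of that workflow is fetching the PDF from its URL. The sketch below uses only the standard library's urllib.request and is hypothetical: fetch_incidents_pdf and looks_like_pdf are illustrative names, not functions from this repository, and the User-Agent header is an assumption about what the server accepts.

```python
import urllib.request

# Every PDF file begins with the "%PDF-" magic header.
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the payload starts with the PDF magic header."""
    return data.startswith(PDF_MAGIC)


def fetch_incidents_pdf(url: str, timeout: float = 30.0) -> bytes:
    """Download the incident PDF at `url` and return its raw bytes.

    Hypothetical helper: the project's real download code may differ.
    """
    # Assumption: some servers reject requests without a browser-like User-Agent.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = response.read()
    # Fail early if the URL returned an HTML error page instead of a PDF.
    if not looks_like_pdf(data):
        raise ValueError(f"URL did not return a PDF: {url}")
    return data
```

The returned bytes could then be wrapped in io.BytesIO and handed to PyPDF2 (for example, PdfReader in recent PyPDF2 versions) for text extraction.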

The project's Python code follows the PEP 8 style guide.

This utility uses a number of open-source projects:

  • PyPDF2 - Library to read and write PDFs with Python
  • pytest - Testing framework that supports complex functional testing
  • pytest-cov - Coverage plugin for pytest
  • pandas - Flexible and powerful data analysis and manipulation library for Python
  • JupyterLab - Browser-based computational environment for Python
  • autopep8 - Tool that automatically formats Python code to conform to the PEP 8 style guide

Run on local system

  1. Clone this repository and move into the folder.
    $ git clone https://github.com/Biswas-N/Norman-PD-incidents-extractor.git
    $ cd Norman-PD-incidents-extractor
  2. Install dependencies using Pipenv.
    $ pipenv install
  3. Run the utility.
    $ make

    Note: The project includes a Makefile with commonly used commands. Running make executes the following command: pipenv run python main.py --incidents <Sample URL>
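
The command above passes the PDF URL through a single --incidents option. A minimal sketch of how such an entry point could parse it with the standard library's argparse follows; build_parser and main are hypothetical names, and the project's actual main.py may be structured differently.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a command-line parser with one required --incidents option."""
    parser = argparse.ArgumentParser(
        description="Extract incidents from a Norman PD incident PDF URL."
    )
    parser.add_argument(
        "--incidents",
        type=str,
        required=True,
        help="URL of the incident PDF file",
    )
    return parser


def main(argv: Optional[List[str]] = None) -> str:
    """Parse arguments (sys.argv[1:] when argv is None) and return the URL."""
    args = build_parser().parse_args(argv)
    # A real entry point would download and parse the PDF here;
    # this sketch only returns the URL it received.
    return args.incidents
```

In a script, main() would typically be called under an if __name__ == "__main__": guard, so that running python main.py --incidents <Sample URL> reads the URL from the command line.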

Documentation

Documentation about the code structure and the extraction algorithm can be found here.

Testing

This utility is tested using pytest.

Documentation about the tests can be found here. Follow the commands below to run the tests on your local system.

  1. Install dev-dependencies.
    $ pipenv install --dev
  2. Run the tests using the Makefile.
    $ make test
  3. Run the tests with coverage.
    $ make cov
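
For readers new to pytest, a test module is simply a file whose test_-prefixed functions use plain assert statements. The sketch below is illustrative only: parse_incident_datetime and the M/D/YYYY H:MM timestamp format are assumptions, not code or a format confirmed by this repository.

```python
# test_datetime_sketch.py: pytest collects test_* functions from test_*.py files.
from datetime import datetime

# Hypothetical timestamp format; the real PDFs may use a different one.
DATETIME_FORMAT = "%m/%d/%Y %H:%M"


def parse_incident_datetime(text: str) -> datetime:
    """Parse a timestamp such as '8/1/2021 0:04' into a datetime."""
    return datetime.strptime(text.strip(), DATETIME_FORMAT)


def test_parses_single_digit_fields():
    assert parse_incident_datetime("8/1/2021 0:04") == datetime(2021, 8, 1, 0, 4)


def test_ignores_surrounding_whitespace():
    expected = datetime(2021, 12, 31, 23, 59)
    assert parse_incident_datetime("  12/31/2021 23:59 ") == expected


def test_rejects_malformed_timestamp():
    try:
        parse_incident_datetime("not a date")
    except ValueError:
        return
    raise AssertionError("expected ValueError for malformed input")
```

Within a real pytest suite, the last test is more idiomatically written with the pytest.raises(ValueError) context manager; it is spelled out with try/except here so the sketch has no third-party imports.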

Bugs/Assumptions

  • The utility assumes that empty values can appear only in the Location column, the Nature column, or both. If any other column contains an empty value, the utility may fail to extract incidents.
  • The utility assumes that each incident has exactly five columns (Datetime, Incident Number, Location, Nature and Incident ORI). If the column layout changes, the utility may fail to extract incidents.
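
The two assumptions above can be expressed as a per-row check. The sketch below is hypothetical: to_incident is not a function from this repository, and it assumes each incident has already been split into a sequence of five cell strings in the column order listed above.

```python
from typing import Dict, Sequence

# Column order stated in the assumptions above.
COLUMNS = ("Datetime", "Incident Number", "Location", "Nature", "Incident ORI")

# Only these columns are allowed to be blank.
OPTIONAL_COLUMNS = {"Location", "Nature"}


def to_incident(cells: Sequence[str]) -> Dict[str, str]:
    """Map one row of cell values to a dict, enforcing both assumptions.

    Raises ValueError if the row does not have exactly five cells, or if a
    column other than Location or Nature is empty.
    """
    if len(cells) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(cells)}")
    incident = {name: value.strip() for name, value in zip(COLUMNS, cells)}
    for name, value in incident.items():
        if not value and name not in OPTIONAL_COLUMNS:
            raise ValueError(f"required column {name!r} is empty")
    return incident
```

Rejecting a malformed row loudly, rather than silently shifting values into the wrong columns, makes a change in the PDF layout easier to spot.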
