Parser logic failing with difference in docket formatting #174

Open
oliviaclyde opened this issue Mar 4, 2020 · 1 comment
@oliviaclyde
Member

The parser fails when a docket's formatting differs from the usual layout. Most dockets indent the lines inside each charge with whitespace. The outlier docket left-aligns the charges section instead. That difference in alignment throws the parsing off, so the "Offense Date" line ends up in the "Severity" field. The check that flags the severity of the charge then fails because the array of matched keywords comes back empty.

Reference docket: Axxxx xxxxx6862
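To illustrate the failure mode, here is a minimal sketch of classifying a charge line by its content rather than by its indentation or position. It is not the project's parser. The function name `classify_charge_line`, the field name `offenseDate`, and the keyword list are assumptions made up for this example; the real keyword list is not shown in this issue.

```python
import re

# Assumed severity keywords; the parser's actual list is not shown in this issue.
SEVERITY_KEYWORDS = ("felony", "misdemeanor", "infraction")

# Matches an "Offense Date:" line no matter how far it is indented.
OFFENSE_DATE = re.compile(r"^\s*Offense Date:\s*(.+?)\s*$", re.IGNORECASE)


def classify_charge_line(line):
    """Return (field, value) for one line of a charge, ignoring indentation."""
    match = OFFENSE_DATE.match(line)
    if match:
        return ("offenseDate", match.group(1))
    text = line.strip()
    if any(keyword in text.lower() for keyword in SEVERITY_KEYWORDS):
        return ("severity", text)
    return ("unknown", text)


# The same line is classified identically whether it is indented or left-aligned.
print(classify_charge_line("        Offense Date: June 17, 2019"))
print(classify_charge_line("Offense Date: June 17, 2019"))
```

With a check like this, a left-aligned "Offense Date" line would no longer be mistaken for the severity, because the label decides the field and the alignment does not.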

@Mengqi89
Contributor

Mengqi89 commented Mar 4, 2020

This is what the parser yields:

{
  "caseNumber": "191906862",
  "charges": [
    {
      "statute": "76-5-202",
      "offenseName": "AGGRAVATED MURDER",
      "severity": "Offense Date: June 17, 2019"
    },
    {
      "statute": "76-5-302",
      "offenseName": "AGGRAVATED KIDNAPPING",
      "severity": "Offense Date: June 17, 2019"
    },
    {
      "statute": "76-8-306(1)",
      "offenseName": "OBSTRUCTING JUSTICE",
      "severity": "Offense Date: June 17, 2019"
    },
    {
      "statute": "76-9-704",
      "offenseName": "ABUSE OR DESECRATION OF A DEAD HUMAN BODY",
      "severity": "Felony"
    }
  ]
  ,
  "accountSummary": [
    {
      "name": "ACCOUNT SUMMARY",
      "collection": false
    }
  ]
}
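A quick way to spot this kind of output is to check every charge's `severity` for a severity keyword. The sketch below is only an illustration: `find_suspect_charges` and the keyword list are assumptions, not code from the project, and the sample is a trimmed copy of the output above.

```python
import json

# Assumed severity keywords; the parser's actual list is not shown in this issue.
SEVERITY_KEYWORDS = ("felony", "misdemeanor", "infraction")


def find_suspect_charges(parsed):
    """Return the charges whose severity field contains no severity keyword."""
    suspects = []
    for charge in parsed.get("charges", []):
        severity = charge.get("severity", "").lower()
        if not any(keyword in severity for keyword in SEVERITY_KEYWORDS):
            suspects.append(charge)
    return suspects


# Trimmed copy of the parser output quoted in this comment.
PARSER_OUTPUT = json.loads("""
{
  "caseNumber": "191906862",
  "charges": [
    {"statute": "76-5-202", "offenseName": "AGGRAVATED MURDER",
     "severity": "Offense Date: June 17, 2019"},
    {"statute": "76-9-704", "offenseName": "ABUSE OR DESECRATION OF A DEAD HUMAN BODY",
     "severity": "Felony"}
  ]
}
""")

suspect_statutes = [c["statute"] for c in find_suspect_charges(PARSER_OUTPUT)]
print(suspect_statutes)  # ['76-5-202']
```

A check like this could run in a test over known dockets so that a misparsed severity shows up as a failure instead of an empty keyword array.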

HappyViki self-assigned this Aug 21, 2020
HappyViki added a commit that referenced this issue Oct 13, 2020
Issue: #174

Problem: Parser doesn't always parse what's needed.

Solution: Completely restructure the parsing file to rely mostly on regex.

Before:

    File was parsed line by line
    Minimal regex determined how to display the info in each line

After:

    File parses in chunks based on regex
    Iterates over multiple similar chunks
    Uses a lot of regex to refine search on those chunks
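The chunk-based approach described above could look roughly like the following sketch. It is not the code from the commit. The docket layout (a `Charge N - statute - name` header followed by an offense date line and a severity line), the sample text, the regexes, and the names `parse_charges` and `offenseDate` are all assumptions invented for illustration.

```python
import re

# Assumed header layout: "Charge 1 - 00-0-001 - OFFENSE NAME", with any indentation.
CHARGE_HEADER = re.compile(
    r"^[ \t]*Charge[ \t]+\d+[ \t]*-[ \t]*(?P<statute>\S+)[ \t]*-[ \t]*(?P<name>.+?)[ \t]*$",
    re.MULTILINE | re.IGNORECASE,
)
OFFENSE_DATE = re.compile(
    r"^[ \t]*Offense Date:[ \t]*(.+?)[ \t]*$", re.MULTILINE | re.IGNORECASE
)
# Assumed severity keywords; the whole matching line is kept, minus outer whitespace.
SEVERITY = re.compile(
    r"^[ \t]*(.*\b(?:felony|misdemeanor|infraction)\b.*?)[ \t]*$",
    re.MULTILINE | re.IGNORECASE,
)


def parse_charges(text):
    """Split the text into one chunk per charge header, then refine each chunk."""
    headers = list(CHARGE_HEADER.finditer(text))
    charges = []
    for i, header in enumerate(headers):
        # A chunk runs from the end of this header to the start of the next one.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        body = text[header.end():end]
        date = OFFENSE_DATE.search(body)
        severity = SEVERITY.search(body)
        charges.append({
            "statute": header.group("statute"),
            "offenseName": header.group("name"),
            "offenseDate": date.group(1) if date else None,
            "severity": severity.group(1) if severity else None,
        })
    return charges


# Invented sample: the first charge is left-aligned, the second is indented.
SAMPLE = """\
Charge 1 - 00-0-001 - EXAMPLE OFFENSE A
Offense Date: January 1, 2020
2nd Degree Felony
Charge 2 - 00-0-002 - EXAMPLE OFFENSE B
        Offense Date: January 1, 2020
        Class A Misdemeanor
"""

for charge in parse_charges(SAMPLE):
    print(charge)
```

Because each field is located by its own pattern inside the chunk, the result does not depend on whether the lines of a charge are indented or left-aligned.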

Note: I wanted to add some extra features, but I think it's better to push something that works than to keep trying to perfect it. I modified the test a bit, and my code now passes it. I would still like someone to review this with me to make sure I've covered all the defaults. I'm afraid the test doesn't cover everything, so I may have missed something.