GitHub - bibliobaloney/ccbcrawl: Scripts used to extract publicly available information about claims filed with the Copyright Claims Board, and perform some calculations using that information.

If you download all of this locally and run all the scripts (using runemall.py), it will generate the 4 HTML reports I post to https://bibliobaloney.github.io/, plus supporting files. The scripts import the following libraries: bs4, csv, datetime, math, PyPDF2, re, requests, statistics, lxml, urllib3. Of these, csv, datetime, math, re, and statistics ship with Python; the others (bs4, PyPDF2, requests, lxml, urllib3) need to be installed if you don't already have them (and if Mu tells you you're missing something else you're trying to import, install that too). I'm a hobbyist dabbler, so I've done all this using Mu (https://codewith.mu/en/). Mu tells me I'm using Python 3.8.11. If you're on Windows, you probably also need to set the environment variable PYTHONUTF8=1 (paste PYTHONUTF8=1 into the Python3 Environment tab of Mu's settings).
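
A minimal sketch for checking which of the third-party libraries are still missing before running anything. The function name `missing_modules` and the `THIRD_PARTY` list are illustrative and not part of the repository; the list just repeats the non-standard-library names from the paragraph above.

```python
import importlib.util

# Third-party libraries named in this README; the rest of the list
# (csv, datetime, math, re, statistics) is part of the standard library.
THIRD_PARTY = ["bs4", "PyPDF2", "requests", "lxml", "urllib3"]


def missing_modules(names):
    """Return the module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(THIRD_PARTY)
    if missing:
        print("Still need to install:", ", ".join(missing))
    else:
        print("All third-party libraries found.")
```

One thing to watch when installing: the bs4 module comes from the package named beautifulsoup4, so the import name and the package name differ.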

Descriptions of individual files, in alphabetical order:

activecases.py - script that generates the report on currently active cases with scheduling orders
amendfile.txt - list of cases with orders to amend, generated by otasandoccs.py, for use by closedcasepdfs.py when it builds closedcases.csv
casedata.csv - giant CSV file with all cases and lots of fields, generated by ccbcrawl2.py, closedcasepdfs.py, htmltable.py, and otasandoccs.py. Also read by ccbcrawl2.py so it doesn't have to re-gather the same data.
ccbcrawl2.py - script that generates the giant CSV file, by looking up data on docket and claim pages
certfile.txt - list of cases with orders to certify, generated by otasandoccs.py, for use by closedcasepdfs.py when it builds closedcases.csv
closedcasepdfs.py - script that grabs the list of closed cases from the CCB list, tries to infer a reason for dismissal based on data about orders to amend or certify, then downloads the PDFs and looks for strings of text about reasons. It then outputs that information, with counts for each reason found, to an HTML file, and the raw data without the tallies to a CSV.
closedcases.csv - CSV of closed cases, with reasons, output by closedcasepdfs.py
finalfile.txt - list of cases with final determinations, generated by otasandoccs.py, for use by activecases.py
htmltable.py - script that produces an HTML report with select fields from casedata.csv, focused on descriptions of infringement
otareasons.csv - CSV file that is generated by otasandoccs.py, tabulating the appearance of some common reasons why claimants are told they have to amend their claims.
otasandoccs.csv - CSV file produced by otasandoccs.py, saving info on cases with orders to amend and/or orders certifying claims
otasandoccs.py - script that collects all the info on cases with orders to amend and/or orders certifying claims, plus does some math about how long things are taking, how unrepresented claimants are faring compared to represented ones, current status of claims, etc. Then it searches the text of the orders to amend for some common reasons, and outputs those reasons to otareasons.csv. Outputs lists of cases with orders to amend, cases with certified claims, and cases with final determinations as txt files.
runemall.py - short script to make everything happen with one click
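
Several of the scripts above search the text of orders for common reasons and then tally them. A minimal sketch of that idea, assuming the PDF text has already been extracted into plain strings: the phrases, reason labels, case IDs, and function names below are all hypothetical, chosen for illustration, and are not the strings or code the real scripts use.

```python
from collections import Counter

# Hypothetical reason labels mapped to the phrase searched for.
# These are NOT the strings used by closedcasepdfs.py or otasandoccs.py.
REASON_PHRASES = {
    "no response to order to amend": "did not file an amended claim",
    "respondent opted out": "opted out",
    "claimant withdrew": "request to withdraw",
}


def find_reasons(text, phrases=REASON_PHRASES):
    """Return the labels whose phrase appears in the text (case-insensitive)."""
    lowered = text.lower()
    return [label for label, phrase in phrases.items() if phrase in lowered]


def tally_reasons(texts_by_case, phrases=REASON_PHRASES):
    """Return (Counter of reasons, per-case rows suitable for writing to a CSV)."""
    tally = Counter()
    rows = []
    for case_id, text in texts_by_case.items():
        reasons = find_reasons(text, phrases)
        tally.update(reasons)
        rows.append((case_id, "; ".join(reasons)))
    return tally, rows


if __name__ == "__main__":
    # Made-up case IDs and order text, for illustration only.
    sample = {
        "case-A": "The respondent OPTED OUT of the proceeding.",
        "case-B": "Claimant did not file an amended claim.",
        "case-C": "No matching text here.",
    }
    tally, rows = tally_reasons(sample)
    for reason, count in tally.most_common():
        print(count, reason)
```

Keeping the per-case rows separate from the tally mirrors the split described above, where the counts go into a report and the raw data goes to a CSV.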

You can contact me at bibliobaloney@duck.com with questions.

