
bibliobaloney/ccbcrawl

If you download all of this locally and run all the scripts (using runemall.py), it will generate the 4 HTML reports I post to https://bibliobaloney.github.io/, plus supporting files. The scripts import the following libraries: bs4, csv, datetime, math, PyPDF2, re, requests, statistics, lxml, urllib3. Of these, csv, datetime, math, re, and statistics ship with Python; the rest (bs4, PyPDF2, requests, lxml, urllib3) need to be installed if you don't already have them. If Mu tells you you're missing something else a script is trying to import, install that too. I'm a hobbyist dabbler, so I've done all this using Mu (https://codewith.mu/en/). Mu tells me I'm using Python 3.8.11. If you're on Windows, you probably also need to set the environment variable PYTHONUTF8=1 (paste PYTHONUTF8=1 into the Python3 Environment tab of Mu's settings).
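If you'd rather check for missing libraries before running anything, something like the following should work. It is only a sketch and not part of this repo: the `missing_modules` helper and the `THIRD_PARTY` list are names made up for illustration, and the list just repeats the third-party libraries named above.

```python
import importlib.util
import sys

# Import names of the third-party libraries listed in this README.
# (Illustrative list; bs4 is provided by the beautifulsoup4 package on PyPI.)
THIRD_PARTY = ["bs4", "PyPDF2", "requests", "lxml", "urllib3"]


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(THIRD_PARTY)
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All third-party libraries found.")
    # sys.flags.utf8_mode is 1 when PYTHONUTF8=1 (or -X utf8) is in effect.
    if sys.platform == "win32" and not sys.flags.utf8_mode:
        print("UTF-8 mode is off; consider setting PYTHONUTF8=1.")
```

Run it from the same Python environment you use for the scripts (for example, inside Mu), since a different environment may have different libraries installed.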

Descriptions of individual files, in alphabetical order:

  • activecases.py - script that generates the report on currently active cases with scheduling orders
  • amendfile.txt - list of cases with orders to amend, generated by otasandoccs.py, for use by closedcasepdfs.py
  • casedata.csv - giant CSV file with all cases and lots of fields, generated by ccbcrawl2.py, closedcasepdfs.py, htmltable.py, and otasandoccs.py. Also read by ccbcrawl2.py, so it doesn't have to re-gather the same data.
  • ccbcrawl2.py - script that generates the giant CSV file, by looking up data on docket and claim pages
  • certfile.txt - list of cases with orders to certify, generated by otasandoccs.py, for use by closedcasepdfs.py
  • closedcasepdfs.py - script that grabs the list of closed cases from the CCB list, tries to infer a reason for dismissal based on data about orders to amend or certify, then downloads the PDFs and looks for strings of text about reasons. It then outputs that information, with counts for each reason found, to an HTML file, and the raw data without the tallies to a CSV.
  • closedcases.csv - CSV of closed cases, with reasons, output by closedcasepdfs.py
  • finals.txt - list of cases with final determinations, generated by otasandoccs.py, for use by activecases.py
  • htmltable.py - script that produces an html report with select fields from casedata.csv, focused on descriptions of infringement
  • otareasons.csv - CSV file that is generated by otasandoccs.py, tabulating the appearance of some common reasons why claimants are told they have to amend their claims.
  • otasandoccs.csv - CSV file produced by otasandoccs.py, saving info on cases with orders to amend and/or orders certifying claims
  • otasandoccs.py - script that collects all the info on cases with orders to amend and/or orders certifying claims, plus does some math about how long things are taking, how unrepresented claimants are faring compared to represented ones, current status of claims, etc. Then it searches the text of the orders to amend for some common reasons, and outputs those reasons to otareasons.csv. Outputs lists of cases with orders to amend, cases with certified claims, and cases with final determinations as txt files.
  • runemall.py - short script to make everything happen with one click
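For a rough idea of what a one-click runner can look like, here is a minimal sketch. It is not the actual contents of runemall.py: the `run_all` function and the script order are assumptions, with the order guessed from the descriptions above (otasandoccs.py writes the txt files that other scripts read).

```python
import subprocess
import sys

# Assumed order, inferred from the file descriptions; the real runemall.py
# may run the scripts differently.
SCRIPTS = [
    "ccbcrawl2.py",
    "otasandoccs.py",
    "closedcasepdfs.py",
    "activecases.py",
    "htmltable.py",
]


def run_all(scripts, run=subprocess.run):
    """Run each script with the current interpreter; stop at the first failure.

    Returns the list of scripts that finished with exit code 0.
    """
    completed = []
    for script in scripts:
        result = run([sys.executable, script])
        if result.returncode != 0:
            print(f"{script} exited with code {result.returncode}; stopping.")
            break
        completed.append(script)
    return completed


if __name__ == "__main__":
    run_all(SCRIPTS)
```

Stopping at the first failure is a design choice for this sketch: later scripts depend on files written by earlier ones, so continuing after an error would likely produce reports from stale data.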

You can contact me at bibliobaloney@duck.com with questions.

About

Scripts used to extract publicly available information about claims filed with the Copyright Claims Board, and perform some calculations using that information.
