[FEATURE] Research & Implementation: Data Sources #145

Open
billimarie opened this issue Oct 5, 2020 · 6 comments


@billimarie
Owner

Tracking alternative data sources that our project might be able to use. This would solve the issue of having to find data manually. Led by @janel-developer, who discovered the CourtListener API.

Might be helpful to look at scripts we developed several years ago:

@janel-developer

Thanks! I’ll have a look at the scripts too :)

@billimarie
Owner Author

Tagging @michaelknowles, who is interested in scripts to replace our current manual process.

@michaelknowles
Contributor

@janel-developer I was thinking we would have two scripts. One to scrape data into some format, probably the existing JSON format. The other to upload the data into the DB and images into the CDN.

This way we can develop, test, and run these processes independently. This will also make it easier to inspect data before uploading, if needed.
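A minimal sketch of that split, assuming Python and a flat JSON file as the hand-off between the two steps. The function names, the `image` field, and the `save_record` / `save_image` callbacks are all hypothetical stand-ins; the real DB and CDN clients would be passed in where the callbacks are.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def write_scraped_records(records: Iterable[dict], out_path: Path) -> int:
    """Step 1 (scrape script): dump scraped records to a JSON file for review."""
    data = list(records)
    out_path.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return len(data)


def upload_records(
    in_path: Path,
    save_record: Callable[[dict], None],
    save_image: Callable[[str], None],
) -> int:
    """Step 2 (upload script): read the JSON file and push its contents out.

    save_record and save_image are placeholders for whatever writes to the
    DB and the CDN; they are assumptions, not existing project code.
    """
    data = json.loads(in_path.read_text(encoding="utf-8"))
    for record in data:
        image = record.get("image")  # hypothetical field name
        if image:
            save_image(image)
        save_record(record)
    return len(data)
```

Because the only thing the two steps share is the JSON file, the file can be opened and checked by hand between runs, and each step can be tested with fake inputs.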

What are your thoughts?

@janel-developer

janel-developer commented Oct 6, 2020

That makes sense to me @michaelknowles

I've gotten a response back from Mike Lissner from the CourtListener project. He is going to give me access to the attorneys endpoint, and I'll have a look at what is there. I also asked if he could point me to anything that describes the data they are collecting. There is a lot of it, and I don't want to make assumptions about what exactly it represents.

Currently I'm looking at their position data and people data. Once I have access I'll have a look at the attorney data.

@janel-developer

Just an update: I've looked at the CourtListener data, but it is really about judges and opinions. There is people data there, and some of those people are district attorneys, but it isn't really useful for us here.

I'm going to spend a little time trying to find other data sources. If I can't find one, I can help @michaelknowles improve the scraping and uploading scripts.

@janel-developer

@billimarie - can you tell me more about the court parsing script? I had a look at that commit, and it looks like it's for a Rails app from the LadyHacks2017 repo.

As I'm looking for data sources, I can collect URLs for sites that may be easy to scrape, in case I can't find a useful API out there. I can just list those in this issue, unless you want to create (or already have) another issue for script improvements.
