Looking for Volunteers! We need a data aggregator #160
Comments
I would love to help. I'd be happy to create json files as well. It would be cool if someone could pull badge numbers from the videos with the end goal of actual repercussions for these officers. |
That would be great! I would prefer a CSV/Excel for ease-of-import into MySQL/Postgres. Are you able to copy the template I made? You can begin using that. We'll make an API to spit out the JSON. To that point, any additional fields added would be terrific, as long as they comply with the original guidelines. Edit: I saw you requested access. I just added you! Feel free to use the base template. Thank you very much! |
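The CSV-to-database import suggested above can be sketched roughly as follows. This is a minimal illustration using stdlib `sqlite3` as a stand-in for MySQL/Postgres; the `incidents` table name and text-only columns are assumptions, not the project's actual schema.

```python
import sqlite3

def import_csv_rows(conn: sqlite3.Connection, rows: list, table: str = "incidents") -> int:
    """Create a text-column table from the first row's keys and bulk-insert all rows.

    `rows` would typically come from csv.DictReader over the exported sheet.
    """
    cols = list(rows[0].keys())
    # All columns as TEXT for simplicity; a real schema would type them properly.
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} ({', '.join(c + ' TEXT' for c in cols)})"
    )
    conn.executemany(
        f"INSERT INTO {table} VALUES ({', '.join('?' for _ in cols)})",
        [tuple(r[c] for c in cols) for r in rows],
    )
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

The same pattern maps onto Postgres `COPY` or MySQL `LOAD DATA INFILE` for larger imports.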
Yes, sounds good. I’ll copy the template and get started.
|
I don’t see where the PR# is indicated on GitHub. Where do I find that info? It looks like the first entry on the sheet says PR# 2, so I want to make sure I’m following the right format.
|
No problem! It's just a way to audit the list against what has been approved here. Next to Issues, look for Pull Requests: |
Thank you! I was in the other repository that was linked on Reddit, that’s how I found this. Thank you for doing this, I’m happy to contribute to this cause. I’ve been so upset the past week about what’s happening and it feels good to fight back. I will help in any way I can so if you guys need anything just let me know. I have a little bit of experience with front-end development and some SQL knowledge, but I’m willing to learn anything necessary to contribute.
|
Also, does it matter what order they are in? Alphabetical by state or in order of pull request?
|
I'm just rewriting the python script here: #110 by @ubershmekel, which is a helpful tool to get the data in. I can copy it into the Google Spreadsheet in a few minutes. |
@ChelseaHannan, the order does not matter. Thank you. @adzialocha, that would be terrific. Then perhaps @ChelseaHannan can review the list and add any extra details, such as badge numbers, as she sees them? |
Btw, @adzialocha, Chelsea has started -- so let us know as quickly as you can about the effectiveness of the script. |
Ok, what exactly does the Python script do? Is it a more efficient way of entering the data?
|
@ChelseaHannan, it sounds like a scraper that will quickly iterate through each PR and scrape the content into a spreadsheet. Like a robot doing the first pass. |
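The scraper described above might look roughly like this: a hedged sketch against GitHub's real REST endpoint for closed pull requests, flattening each PR into a spreadsheet-style row. The row fields here are chosen for illustration; the actual #110 script may differ.

```python
import json
import urllib.request

# Real GitHub REST endpoint shape; the repo path matches this project.
API = "https://api.github.com/repos/2020PB/police-brutality/pulls?state=closed&page={}"

def row_from_pr(pr: dict) -> dict:
    """Flatten one PR object into a spreadsheet-style row (field choice is illustrative)."""
    return {
        "pr": pr["number"],                      # the PR# used to audit the sheet
        "title": pr.get("title", ""),
        "body": (pr.get("body") or "").strip(),  # PR body holds the incident write-up
        "merged_at": pr.get("merged_at") or "",
    }

def fetch_closed_prs(max_pages: int = 5):
    """Yield rows for closed PRs, page by page, stopping at the first empty page."""
    for page in range(1, max_pages + 1):
        with urllib.request.urlopen(API.format(page)) as resp:
            batch = json.load(resp)
        if not batch:
            break
        yield from (row_from_pr(pr) for pr in batch)

# Usage (requires network): write fetch_closed_prs() out with csv.DictWriter.
```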
Ok, that’s awesome. I’ll be on standby. I was working from a copy of the spreadsheet, so I won’t add any more info to the original document for the time being.
|
This is my update on that script: ubershmekel#1. Here are the example output files:
I'm not sure how to integrate this the smartest way now, maybe in your fork? |
Awesome. Thank you! @ChelseaHannan , want to import the CSV over into the sheet, review, and add supplemental data as needed? |
Not all links were parsed correctly, but I've updated the script (and sendspace files). |
One interesting thing to consider is adding geolocation (lat/long) to the cities. @949mac @adzialocha WDYT? It would be better to do that directly on the BE side instead of parsing on the frontend. |
@mnlmaier So far, it looks like we have city/state. But if there are specific addresses, we can get those lat/lons too. |
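Attaching lat/lon on the backend, as discussed, could be sketched like this. The lookup table stands in for a real geocoding service (e.g. Nominatim with result caching), and the coordinates shown are approximate city centers for illustration only.

```python
# Approximate city-center coordinates; a real build would query a
# geocoding service and cache the results instead of hardcoding.
CITY_COORDS = {
    ("Minneapolis", "Minnesota"): (44.9778, -93.2650),
    ("Portland", "Oregon"): (45.5152, -122.6784),
}

def with_geolocation(incident: dict) -> dict:
    """Return a copy of the incident with lat/lon attached when the city is known."""
    out = dict(incident)
    lat_lon = CITY_COORDS.get((incident.get("city"), incident.get("state")))
    if lat_lon:
        out["lat"], out["lon"] = lat_lon
    return out
```

Unknown cities simply pass through unchanged, so the frontend can fall back to its own handling.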
@949mac I'm working around that with a FE package right now, until it's included in the response. |
@adzialocha Any way to add the PR# to make the list easier to audit when updates are made? |
Right now it's scraping through the md files. What is the idea behind using the PRs? |
Great work. Let's get this list auto-generating in a github action asap. I've contacted @2020PB and hope to get an action in this repo crunching the data. My plan was to commit the generated files into a branch on this repo. That would make the files available for others to poll through the github api. Does anybody have a different idea or request for where to put the data after it's parsed into files? Also, note this issue is similar to #141 and perhaps even a duplicate. |
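Once the generated files live in a branch, other tools could poll them through GitHub's raw-file URLs. A minimal sketch; the branch and file name here follow the `data_build/all-locations.csv` layout referenced in this thread, so adjust if the build changes.

```python
import csv
import io
import urllib.request

# Raw-file URL for a branch, per GitHub's raw.githubusercontent.com scheme.
RAW = "https://raw.githubusercontent.com/2020PB/police-brutality/data_build/all-locations.csv"

def parse_locations(text: str) -> list:
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def poll() -> list:
    """Fetch and parse the current data build (network call)."""
    with urllib.request.urlopen(RAW) as resp:
        return parse_locations(resp.read().decode("utf-8"))
```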
how quickly do you guys think we can get lat/long coordinates in there? i've set up a map which will place markers for an array of locations, just waiting for a response to be thrown in there 😬 |
@adzialocha -- The PRs will be useful to attach IDs. The db has internal IDs, which is useful when performing bulk updates. For example, @ChelseaHannan was going to add supplemental information. |
@mnlmaier - Lat/Long is in! |
Is the Google Sheets doc above going to be the ultimate landing spot for the data? If so, depending on how it's shared, we could hook it up to a Colab notebook in the short term to begin analysis if that seems valuable. |
@elctrc - That's a good question. We have an initial data import done. But for consistency, it would be nice to determine the formal structure of the data. I'd say we're open to ideas at this point! |
Ok so it sounds like you've got a working script to import the data - is it doing any cleaning as well? And then the plan is to have @ChelseaHannan do a pass of adding in more metadata? |
https://frontend-1750f.web.app thanks for the great data work guys! |
Yes -- so far the collaborators here are not the original repo owners, so we're working somewhat independently in order to make this data more publicly accessible. That said, there is much value in streamlining the submissions. I'm not sure whether the repo owners have given this much further thought. |
That's beautiful! It brings tears to my eyes. Way to go everyone. |
Wow. Nice work! |
Is there an endpoint that can be used to pull down the raw data? |
Looks great! Forgive me, but I'm not very experienced with working with data; I don't know how to utilize the script or add metadata at this point. |
great work guys! now we can focus on design and user interactions. @ChelseaHannan https://846policebrutality.b-cdn.net/api/incidents |
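A minimal consumer of the endpoint shared above might look like this. The payload schema isn't documented in this thread, so the handling of list-vs-wrapped responses is an assumption.

```python
import json
import urllib.request

ENDPOINT = "https://846policebrutality.b-cdn.net/api/incidents"

def summarize(payload) -> int:
    """Count incidents whether the payload is a bare list or wrapped as {"data": [...]}."""
    if isinstance(payload, dict):
        payload = payload.get("data", [])
    return len(payload)

def fetch_incidents():
    """Pull the raw JSON from the endpoint (network call)."""
    with urllib.request.urlopen(ENDPOINT) as resp:
        return json.load(resp)
```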
@ChelseaHannan , don't worry about the script. Let me update the sheet with the data from the script. Then in terms of improvement and metadata, you are welcome to update the sheet with any complementary and supplemental information. @mnlmaier - Is there anything specific you would like Chelsea to look for? |
@ChelseaHannan - The data is in at https://docs.google.com/spreadsheets/d/1zEChPuDj0eTeB9cOXrJNHK3rW-aW1zohUVMm6LrN1IU/edit#gid=0; This was created using an automated script, so it is possible that it's missing information. Anything you can add or cleanup is greatly appreciated! |
It would make sense to use ISO or UNIX timestamps, maybe? That would be something to parse on the backend. Still, there are some weird things in there; some incidents are dated back to 1900 😬 |
The incidents with 1900-01-01 didn't have a valid date. As far as timestamps go, we could do that -- however, I'm not sure there is a time component on the Incident yet. It would be terrific if the reddit army could try to narrow some of this stuff down to improve the data. |
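The fallback described here (invalid dates becoming 1900-01-01), combined with the ISO suggestion, could be sketched as below. The accepted input formats are guesses for illustration, not the formats the actual build script handles.

```python
from datetime import datetime

SENTINEL = "1900-01-01"  # placeholder emitted when no valid date is found

def normalize_date(raw) -> str:
    """Return an ISO 8601 date string, or the sentinel when parsing fails."""
    for fmt in ("%Y-%m-%d", "%B %d, %Y", "%m/%d/%Y"):
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except (ValueError, AttributeError):  # bad format, or raw is None
            continue
    return SENTINEL
```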
Nice! I’ll look over it tomorrow and make any necessary changes.
|
Thank you, @ChelseaHannan! |
The data build is live. Every commit to master will regenerate these files in the branch
|
Here are several examples of recent police brutality from award-winning journalist Barrett Brown: https://medium.com/@barrettbrown/need-a-reason-to-smash-a-cop-9613f739149e |
It looks like the descriptions didn't populate. What happened was there were dates inside the description column (G), so I erased all the values in that one and left it blank for now. There is a title column that has a pretty decent description of each event. I also cleaned up the links and made sure they were all in the same field. I went through a few of them to make sure they corresponded to the correct incidents. I filled in as many of the blank cities as I could, confirming the cities by watching the linked videos, and fixed a couple that were labeled as the wrong city. I will continue to go through and make sure everything corresponds and is accurate. |
@949mac @ChelseaHannan I see your spreadsheet has 118 entries while https://github.com/2020PB/police-brutality/blob/data_build/all-locations.csv has 196. Did you folks happen to check what is in this one vs that? Is there data that's duplicated on the github csv or missing? |
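One quick way to answer the 118-vs-196 question is to diff the two CSVs on a shared key. A sketch, assuming (city, state, date) identifies an incident, which may not hold for the real data.

```python
import csv

KEY = ("city", "state", "date")  # assumed identifying columns

def load_keys(path: str) -> set:
    """Load the assumed identifying key for every row of a CSV file."""
    with open(path, newline="") as f:
        return {tuple(row.get(col) for col in KEY) for row in csv.DictReader(f)}

def diff_keys(sheet: set, build: set):
    """Return (keys only in the sheet, keys only in the data build)."""
    return sheet - build, build - sheet
```

Comparing `load_keys()` of the two files would surface both the duplicates and the missing rows in one pass.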
Why are we building another csv here if they already have one? |
(editing) - I think the question posed by @ChelseaHannan is important and I didn't want it to get buried. I am assuming the csv referenced above is the raw translated output of the json from the api endpoint (so prior to your editing, Chelsea). Secondly, is the purpose to take this data after it is manually cleaned/edited and populate more data on the map? Or is this for a different front end application? I ask because this is not necessarily sustainable at scale: as you begin to receive more and more data, the task of manually updating will become impossible. Is there a process in mind for joining the revised and edited set of data with the continual flow of new information? |
This original request was made before the data feed was ready in the readme. At this point, we can decide whether it makes sense to gather volunteers to look at the content for anything else that wasn't automatically parsed. @ChelseaHannan was willing to look for supplemental information like police badge numbers, license plates, etc. @elctrc, in terms of scale, I get it. Let's take a step back and see what's needed at this point. @mnlmaier - do you need anything else for front end content? Otherwise, we can close this until a need arises. Thoughts? |
For sure. I am more than happy to help on the data analysis side if this is ever something desired (or if you decide you want there to be a way for users to parse the data). But that may not relate to this issue and I understand! :) |
depends on which features we will decide in the future. as a first step we should make sure that all the content is complete and that no weird things are happening (some locations are off, you can see them when zooming out in the map) |
@mnlmaier I really like what you've done with the map. I think you should consider creating your own separate repository for your project. I don't believe the csv in the main repo has any location coordinates. Hopefully someone can keep that updated for you, because being able to see the map is very helpful for people. |
I would like to propose a twitter bot that, when mentioned in conversation, submits the tweet and assets to a data store, in an attempt to leverage twitter to easily submit abuse data. I haven't checked out the stack, but it could be a simple AWS lambda or something. Filtering that and ensuring it turns into meaningful data will be a later step. Right now, I think the priority should be getting as much of the data stored as possible, while social media is at its most active. |
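Purely as an illustration of the bot proposal, a Lambda-style handler might look like this. The event shape, field names, and datastore step are all hypothetical and don't correspond to any specific Twitter API version.

```python
def extract_submission(event: dict) -> dict:
    """Pull the fields worth keeping from a (hypothetical) mention event."""
    tweet = event.get("tweet", {})
    return {
        "id": tweet.get("id"),
        "text": tweet.get("text", ""),
        "media": [m.get("url") for m in tweet.get("media", [])],
        "needs_review": True,  # filtering into meaningful data is a later step
    }

def handler(event, context=None):
    record = extract_submission(event)
    # A real bot would write `record` to a datastore (e.g. S3/DynamoDB) here.
    return record
```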
The Python API is finally live, @949mac @mnlmaier, with full CI/CD. Repo: https://github.com/nickatnight/policebrutality.io. @mnlmaier, I could probably plug in your front end if we dockerize it. |
@idiosyncronaut, please open a new issue with new ideas and requests. Also, this dataset is highly edited for objectivity and evidence; it's not an attempt to make a dump of everything. If you see something you'd like to contribute, you should make a PR to the markdown files. @ChelseaHannan, I'm sorry if there was a misunderstanding here. This github repo is managed in the markdown files. With regards to having coordinates in our dataset - feel free to open up a separate issue if you think that's valuable. I'm closing this mega-issue. Please feel free to comment if you think we should re-open. |
One last comment, sorry about that: @nickatnight @949mac there's a discord server. I just asked, and I am allowed to send you an invite :) Any way to get in touch without posting the link publicly? |
@mnlmaier I would recommend asking them to DM you on reddit then you can check post history. |
Task: Aggregate 2020PB items into a single CSV or Excel Spreadsheet
https://docs.google.com/spreadsheets/d/1zEChPuDj0eTeB9cOXrJNHK3rW-aW1zohUVMm6LrN1IU/edit?usp=sharing
Perhaps multiple contributors can divide the work easily.
Assigned to: ???
If you can help, let one of the contributors know and we'll get you access to the Google Sheet.
Ref Project Board:
https://github.com/949mac/police-brutality/projects/1
We're petitioning this effort to be part of this repo, although we have not received a response from the original repo owner yet!