Data Gathering notes

Example files from @elischutze: https://www.dropbox.com/sh/nuciypizp91bc10/AAB6z7G8LEgzWeQv6PBrdeu1a?dl=0 (see https://github.com/HackBrexit/MinistersUnderTheInfluence/issues/14)

@sikesLpp Thanks for this. See the attached file (gov uk_meetings). That was the correct website; we just need to ensure the search is correctly filtered.

The 15 departments are:

Priority government departments (15 we discussed):

1. Cabinet Office
2. HM Treasury
3. Department for Communities and Local Government
4. Department for Culture, Media & Sport
5. Department for Education
6. Department for Environment, Food & Rural Affairs
7. Department for International Development
8. Department for Transport
9. Department for Work and Pensions
10. Department of Health
11. Foreign & Commonwealth Office
12. Home Office
13. Ministry of Defence
14. Ministry of Justice

New departments that we will want the data from once they start publishing it:

15. Department for Business, Energy & Industrial Strategy
16. Department for Exiting the European Union
17. Department for International Trade

Notes from Momo

As discussed at the last meeting, I have created a script to harvest data. I had to create a fork of the repo at https://github.com/sikesLpp/MinistersUnderTheInfluence as I do not have push access. The script requires a Linux machine with libxml and php-cli installed and must be run from the shell. It takes a 'rich' URL for a search at https://www.gov.uk/government/publications and dumps links to all documents found, plus some relevant metadata, to a CSV file (govharvester_listfile.csv).

As discussed, this is a proof-of-concept script that will need some further tuning (including actually downloading the docs and storing the metadata in some sort of database).
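For anyone without PHP to hand, the general idea of the harvester (collect document links from a results page and dump them to a CSV file) can be sketched in Python. This is not Momo's script: the sample HTML, the two CSV columns and the names `LinkCollector` and `links_to_csv` are assumptions made for illustration, and the real gov.uk markup and the columns in govharvester_listfile.csv may well differ.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.gov.uk"  # used to make relative links absolute


class LinkCollector(HTMLParser):
    """Collect (url, title) pairs for every <a href=...> in a page."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (url, title) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(BASE_URL, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace inside the link text.
            title = " ".join("".join(self._text).split())
            self.links.append((self._href, title))
            self._href = None


def links_to_csv(html, out):
    """Write one CSV row per link found in html; return the link count."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    writer = csv.writer(out)
    writer.writerow(["url", "title"])  # hypothetical column names
    writer.writerows(parser.links)
    return len(parser.links)


# Hypothetical stand-in for a search results page.
SAMPLE_HTML = """
<ul>
  <li><a href="/government/publications/example-a">Example A</a></li>
  <li><a href="/government/publications/example-b">Example   B</a></li>
</ul>
"""

if __name__ == "__main__":
    buffer = io.StringIO()
    count = links_to_csv(SAMPLE_HTML, buffer)
    print(f"{count} links found")
    print(buffer.getvalue())
```

A real run would fetch the results page first and would need to filter out navigation links; both steps are left out here because they depend on the live page structure.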

Short instructions for usage:

1. Go to https://www.gov.uk/government/publications in your browser.
2. Make a selection via the search dropdowns.
3. Push search.
4. Paste the URL of the resulting page as the first argument to the script.

NB: do not forget to enclose the URL in single quotes. Otherwise the shell treats '&' as a control operator that ends the command and runs it in the background, and it may also try to expand '[]' as a glob pattern.

Example:

./govharvester.php 'https://www.gov.uk/government/publications?keywords=&publication_filter_option=transparency-data&topics[]=all&departments[]=attorney-generals-office'
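To check which filters a pasted search URL actually carries, and to see why the quoting matters, a small Python sketch can help. It uses only the standard library; the helper names `search_filters` and `shell_command` are made up for this example and are not part of the harvester.

```python
import shlex
from urllib.parse import parse_qs, urlsplit

# The example search URL from the usage notes.
SEARCH_URL = (
    "https://www.gov.uk/government/publications"
    "?keywords=&publication_filter_option=transparency-data"
    "&topics[]=all&departments[]=attorney-generals-office"
)


def search_filters(url):
    """Return the query parameters of a search URL as a dict of lists."""
    # keep_blank_values=True keeps empty filters such as 'keywords='.
    return parse_qs(urlsplit(url).query, keep_blank_values=True)


def shell_command(script, url):
    """Build a command line with the URL quoted for a POSIX shell."""
    return f"{script} {shlex.quote(url)}"


if __name__ == "__main__":
    for name, values in search_filters(SEARCH_URL).items():
        print(name, values)
    print(shell_command("./govharvester.php", SEARCH_URL))
```

Because the URL contains characters the shell treats specially, `shlex.quote` wraps it in single quotes, which is the same form as the hand-written example.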

PS: I will not be able to make it to the meetup tomorrow, as I am out of town for a wedding.

Do we have a Slack channel or mailing list for better communication already?

grtz Momo

Hey @sikesLpp, I tried to run this script today but kept getting this error at line 120, when parsing the type from the DOM: "Fatal error: Cannot use object of type DOMNodeList as array". I haven't done any PHP, so I'm not sure how to move forward from this...

Also, our Slack channel is hackbrexit.

@sikesLpp It was my PHP version, and the script needed to have PHP tags.
