From 2503cc90cbb603de0a5026fd4994d4c3726da77a Mon Sep 17 00:00:00 2001
From: Henrique Moco
Date: Tue, 24 Nov 2015 13:25:12 -0500
Subject: [PATCH 1/2] Update to latest grids and ipeds files

---
 README.md | 5 +++--
 tasks.py  | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 7cfb8749..65d7a0b1 100644
--- a/README.md
+++ b/README.md
@@ -340,9 +340,10 @@ Scrapi supports the addition of institutions in a separate index (`institutions
 much less frequently, meaning that simple parsers can be used to manually load data from providers instead of using scheduled harvesters.
 Currently, data from [GRID](https://grid.ac/) and [IPEDS](https://nces.ed.gov/ipeds/) is supported:
 
-- GRID: Provides data on international research facilities. The currently used dataset is `grid_2015_10_09.json`, which can be found [here](https://grid.ac/downloads). To use this dataset
+- GRID: Provides data on international research facilities. The currently used dataset is `grid_2015_11_05.json`, which can be found [here](https://grid.ac/downloads). To use this dataset
 move the file to '/institutions/', or override the file path and/or name on `tasks.py`. This can be individually loaded using the function `grid()` in `tasks.py`.
-- IPEDS: Provides data on secondary education institutions in the US. The currently used dataset is `hd2013.csv`, which can be found [here](https://nces.ed.gov/ipeds/datacenter/DataFiles.aspx). To use this dataset
+- IPEDS: Provides data on secondary education institutions in the US. The currently used dataset is `hd2014.csv`, which can be found [here](https://nces.ed.gov/ipeds/Home/UseTheData), by clicking on
+ Survey Data -> Complete data files -> 2014 -> Institutional Characteristics -> Directory information and unzipping the .csv file. To use this dataset
 move the file to '/institutions/', or override the file path and/or name on `tasks.py`. This can be individually loaded using the function `ipeds()` in `tasks.py`.
 
 Running `invoke institutions` will properly load up institution data into elastic search provided the datasets are provided.
diff --git a/tasks.py b/tasks.py
index 5284c8cf..34c9bbe8 100644
--- a/tasks.py
+++ b/tasks.py
@@ -305,7 +305,7 @@ def reset_all():
 
 
 @task
-def institutions(grid_file='institutions/grid_2015_10_09.json', ipeds_file='institutions/hd2013.csv'):
+def institutions(grid_file='institutions/grid_2015_11_05.json', ipeds_file='institutions/hd2014.csv'):
     grid(grid_file)
     ipeds(ipeds_file)
 

From d0dfe8179f15ccafdae5f4c9b8f62d1cb722464d Mon Sep 17 00:00:00 2001
From: Henrique Moco
Date: Wed, 25 Nov 2015 09:44:22 -0500
Subject: [PATCH 2/2] add unzip instructions

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 65d7a0b1..5fd85159 100644
--- a/README.md
+++ b/README.md
@@ -343,7 +343,7 @@ Currently, data from [GRID](https://grid.ac/) and [IPEDS](https://nces.ed.gov/ip
 - GRID: Provides data on international research facilities. The currently used dataset is `grid_2015_11_05.json`, which can be found [here](https://grid.ac/downloads). To use this dataset
 move the file to '/institutions/', or override the file path and/or name on `tasks.py`. This can be individually loaded using the function `grid()` in `tasks.py`.
 - IPEDS: Provides data on secondary education institutions in the US. The currently used dataset is `hd2014.csv`, which can be found [here](https://nces.ed.gov/ipeds/Home/UseTheData), by clicking on
- Survey Data -> Complete data files -> 2014 -> Institutional Characteristics -> Directory information and unzipping the .csv file. To use this dataset
+ Survey Data -> Complete data files -> 2014 -> Institutional Characteristics -> Directory information and unzipping the .csv file (on OSX this can be done by running `unzip filename.zip`). To use this dataset
 move the file to '/institutions/', or override the file path and/or name on `tasks.py`. This can be individually loaded using the function `ipeds()` in `tasks.py`.
 
 Running `invoke institutions` will properly load up institution data into elastic search provided the datasets are provided.
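
Review note: the README steps this series documents (unzip the IPEDS download, place both datasets under `institutions/`, then run `invoke institutions`) could be checked with a small script along these lines. This is only a sketch and not part of the patch: `extract_csv` and `missing_datasets` are hypothetical helper names, the two default paths are copied from the updated `institutions` signature in `tasks.py`, and the name of the `.csv` inside the real IPEDS archive is an assumption that may need adjusting.

```python
import os
import zipfile

# Defaults copied from the updated `institutions` task signature in tasks.py.
GRID_FILE = 'institutions/grid_2015_11_05.json'
IPEDS_FILE = 'institutions/hd2014.csv'


def extract_csv(zip_path, target_dir='institutions'):
    """Extract every .csv member of zip_path into target_dir and return the paths.

    Hypothetical helper: a portable stand-in for running `unzip filename.zip`.
    """
    with zipfile.ZipFile(zip_path) as archive:
        members = [name for name in archive.namelist()
                   if name.lower().endswith('.csv')]
        archive.extractall(target_dir, members=members)
    return [os.path.join(target_dir, name) for name in members]


def missing_datasets(grid_file=GRID_FILE, ipeds_file=IPEDS_FILE):
    """Return the dataset paths that are not present on disk (hypothetical helper)."""
    return [path for path in (grid_file, ipeds_file) if not os.path.isfile(path)]
```

If `missing_datasets()` returns an empty list, both files are where the task defaults expect them; otherwise the returned paths show which dataset still has to be downloaded or moved before the load is attempted.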