# Kumu and reference processing

* To show the references linked to adaptations in LCAT, we need to process the data downloaded from the Kumu project and link it to reference data scraped from the web (via DOI).
* The references are stored in Google Sheets and are available as a `.csv` export. The Kumu data has been downloaded as `.json`.
* The outputs are a references `.json` file that can be added to the database, and a Kumu `.json` file that can be shipped with the client.

Note that web scraping by DOI is fairly brittle, so the DOI lookup step might break at any time.
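Because individual lookups can fail, it may help to isolate each DOI lookup so that one failure does not abort the whole run. The following is a minimal sketch of that idea, not the code in `src.process_references`. The names `lookup_all` and `fetch` are hypothetical and introduced only for illustration.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def lookup_all(
    dois: Iterable[str],
    fetch: Callable[[str], dict],
) -> Tuple[Dict[str, dict], List[Tuple[str, str]]]:
    """Run `fetch` for each DOI, collecting failures instead of raising.

    `fetch` is a hypothetical callable that takes a DOI string and returns
    a metadata dict, raising an exception when the lookup fails.
    """
    results: Dict[str, dict] = {}
    failures: List[Tuple[str, str]] = []
    for doi in dois:
        try:
            results[doi] = fetch(doi)
        except Exception as exc:  # scraping can fail in many different ways
            failures.append((doi, repr(exc)))
    return results, failures
```

The returned `failures` list gives a record of which DOIs need a manual check, which may be more useful than a single traceback from the first broken page.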

## Initialise

In [1]:
import os
import yaml

# The cwd should be the data folder root
os.chdir("..")

In [2]:
config_filepath = "./config.yml"

with open(config_filepath) as f:
    # safe_load is sufficient for a plain key/value config and avoids constructing arbitrary objects
    conf = yaml.safe_load(f)

## Process Kumu

In [3]:
from src.process_kumu import ProcessKumu

In [4]:
kumu_filepath = conf["kumu_json"]

In [5]:
kumu_processor = ProcessKumu(kumu_filepath)

In [6]:
kumu_processor.filter_data()

In [7]:
kumu_processor.update_layer_names()

In [8]:
kumu_processor.aggregate_layers()

In [9]:
output_filepath = "./processed_kumu.json"
kumu_processor.save_json(output_filepath)

## Process references

* The reference file on Google Sheets has been improved incrementally, and it now includes the data scraped using the scraping code below.
* Because every existing item has already been scraped, the scraping code does not need to be run again until new references are added to the sheet.
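As an illustration of how rows that still need scraping might be identified, here is a minimal sketch using only the standard library. It assumes a rule of "has a DOI but no title yet", which may not match what `perform_doi_lookups` actually checks. The function name `rows_needing_scrape`, the sample rows, and the placeholder DOIs are all made up. The column names `DOI` and `Title` are taken from the table shown further down.

```python
import csv
import io
from typing import Dict, List


def rows_needing_scrape(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return rows that have a DOI but no title yet (an assumed rule)."""
    return [
        row
        for row in rows
        if row.get("DOI", "").strip() and not row.get("Title", "").strip()
    ]


# Small in-memory CSV standing in for the references export (made-up rows).
sample_csv = io.StringIO(
    "Reference_ID,DOI,Title\n"
    "1,10.1000/example-a,An already scraped title\n"
    "2,10.1000/example-b,\n"
    "3,,A reference without a DOI\n"
)
rows = list(csv.DictReader(sample_csv))
pending = rows_needing_scrape(rows)
print(f"{len(pending)} references found for scraping")
```

In this sample only the second row is selected: the first already has a title and the third has no DOI to look up.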

In [10]:
from src.process_references import ProcessReferences

In [11]:
references_filepath = conf["references_csv"]

In [12]:
reference_processor = ProcessReferences(references_filepath)

In [13]:
reference_processor.clean_references()



In [14]:
reference_processor.df.head()

| Unnamed: 0 | Reference_ID | Reference_Type | DOI | URL | Replacement_URL | Title | Authors | Date | Journal | Volume/Issue | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Journal Article | 10.1007/s11027-017-9778-4 | https://doi.org/10.1007/s11027-017-9778-4 | | Valuing deaths or years of life lost? Economic... | Aline Chiabai, Joseph V. Spadaro, Marc B. Neumann | 2018.0 | Mitigation and Adaptation Strategies for Globa... | 7.0 | |
| 1 | 2 | Book Section | 10.1016/B978-0-12-849887-3.00004-6 | https://www.google.co.uk/books/edition/Adaptin... | | Adapting to Climate Change in Europe | Laurens Bouwer, Alessio Capriolo, Aline Chiaba... | 2018.0 | | | |
| 2 | 3 | Journal Article | 10.1016/j.cliser.2016.10.004 | https://www.sciencedirect.com/science/article/... | | Climate and weather service provision: Economi... | Alistair Hunt, Julia Ferguson, Michela Baccini... | 2017.0 | Climate Services | | |
| 3 | 4 | Journal Article | 10.1186/1476-069x-8-40 | https://dx.doi.org/10.1186/1476-069x-8-40 | | High ambient temperature and mortality: a revi... | Rupa Basu | 2009.0 | Environmental Health | 1.0 | |
| 4 | 5 | Journal Article | 10.1093/epirev/mxf007 | https://dx.doi.org/10.1093/epirev/mxf007 | | Relation between Elevated Ambient Temperature ... | R. Basu | 2002.0 | Epidemiologic Reviews | 2.0 | |


In [15]:
# If required, re-scrape DOIs using scrape_all_rows=True
reference_processor.perform_doi_lookups(scrape_all_rows=False)

0 references found for scraping



In [16]:
reference_processor.process_references()

1582 references will be saved in the json.


In [17]:
output_filepath = "./processed_references.json"
reference_processor.save_json(output_filepath)

## Conclusion

Once the Kumu `.json` export and the references `.csv` have been processed, the output files can be stored alongside the other data files, and their paths added to `config.yml`.

For the processed files, the keys should be as follows:

* Processed Kumu output file: `processed_kumu_json`
* Processed references output file: `processed_references_json`
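A quick check that the loaded config contains these keys, together with the two input keys used earlier in this notebook (`kumu_json` and `references_csv`), could look like the sketch below. The helper `missing_config_keys` is a hypothetical name, and the paths in the example are placeholders rather than the project's real file locations.

```python
from typing import Iterable, List, Mapping

# Keys named in this notebook: the two inputs and the two processed outputs.
EXPECTED_KEYS = (
    "kumu_json",
    "references_csv",
    "processed_kumu_json",
    "processed_references_json",
)


def missing_config_keys(
    conf: Mapping[str, object],
    expected: Iterable[str] = EXPECTED_KEYS,
) -> List[str]:
    """Return the expected keys that are absent from the loaded config."""
    return [key for key in expected if key not in conf]


# Example config with placeholder paths; the processed outputs are not yet added.
example_conf = {
    "kumu_json": "./kumu.json",
    "references_csv": "./references.csv",
}
print(missing_config_keys(example_conf))
```

Running a check like this after editing `config.yml` may catch a misspelled key before a later processing step fails on it.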

As mentioned, we ship the `processed_kumu_json` file with the front end, and use the `processed_references_json` file to create a references table in the database.