# OJS Zenodo DOI


## Overview

The purpose of this notebook is to generate [Zenodo](https://zenodo.org/) [DOIs](https://www.doi.org/) for the [Open Journal Systems](https://pkp.sfu.ca/ojs/) publishing system.

DOIs historically have had a monetary cost that an open access, no publication fee journal may find unsupportable.

Zenodo solves this problem by providing DOIs free of charge and permanently archiving journal publications at the same time. [Zenodo is an initiative for open science funded and maintained by EU as part of CERN](https://en.wikipedia.org/wiki/Zenodo).

This notebook requires Jupyter with [F# support](https://github.com/fsprojects/IfSharp), which may require [additional installation of libraries depending on your operating system](https://fsharp.org/).
Alternatively, this notebook can be used in the web browser without installing any software through the [Azure Notebooks](https://notebooks.azure.com/) service.
Additional libraries include [F# Data](https://fsharp.github.io/FSharp.Data/) and [Json.NET](https://www.newtonsoft.com/json).

The focus of this notebook is to upload extract the relevant metadata from OJS, upload the metadata and journal articles to Zenodo, and obtain DOIs through the [Zenodo API](https://developers.zenodo.org/#rest-api).

The following example JSON was obtained by first loading an article manually and then using the GET query below to pull the metadata.
Some of this serves as a model for using the API to populate Zenodo, though some is supplied by Zenodo itself.

```
{
  "conceptdoi": "10.5281/zenodo.3344810",
  "conceptrecid": "3344810",
  "created": "2019-07-20T19:22:01.952843",
  "doi": "10.5281/zenodo.3344873",
  "doi_url": "https://doi.org/10.5281/zenodo.3344873",
  "files": [
    {
      "checksum": "73046695232d27a9d4ebfce51db70dfe",
      "filename": "kai-1.0.1.pdf",
      "filesize": 633528,
      "id": "bf09b35e-858d-47c9-82d7-45c404dd4b27",
      "links": {
        "download": "https://zenodo.org/api/files/1a8f04d8-e135-418a-8d85-47ea01148309/kai-1.0.1.pdf",
        "self": "https://zenodo.org/api/deposit/depositions/3353411/files/bf09b35e-858d-47c9-82d7-45c404dd4b27"
      }
    }
  ],
  "id": 3344873,
  "links": {
    "badge": "https://zenodo.org/badge/doi/10.5281/zenodo.3344873.svg",
    "bucket": "https://zenodo.org/api/files/e3c269ac-be62-447b-b1b8-1f9992773de7",
    "conceptbadge": "https://zenodo.org/badge/doi/10.5281/zenodo.3344810.svg",
    "conceptdoi": "https://doi.org/10.5281/zenodo.3344810",
    "discard": "https://zenodo.org/api/deposit/depositions/3344873/actions/discard",
    "doi": "https://doi.org/10.5281/zenodo.3344873",
    "edit": "https://zenodo.org/api/deposit/depositions/3344873/actions/edit",
    "files": "https://zenodo.org/api/deposit/depositions/3344873/files",
    "html": "https://zenodo.org/deposit/3344873",
    "latest": "https://zenodo.org/api/records/3344873",
    "latest_html": "https://zenodo.org/record/3344873",
    "publish": "https://zenodo.org/api/deposit/depositions/3344873/actions/publish",
    "record": "https://zenodo.org/api/records/3344873",
    "record_html": "https://zenodo.org/record/3344873",
    "self": "https://zenodo.org/api/deposit/depositions/3344873"
  },
  "metadata": {
    "access_right": "open",
    "creators": [
      {
        "affiliation": "Teachers College, Columbia University",
        "name": "Kai, Shimin"
      },
      {
        "affiliation": "Teachers College, Columbia University",
        "name": "Almeda, Victoria"
      },
      {
        "affiliation": "University of Pennsylvania",
        "name": "Baker, Ryan S."
      },
      {
        "affiliation": "Worcester Polytechnic Institute",
        "name": "Heffernan, Cristina"
      },
      {
        "affiliation": "Worcester Polytechnic Institute",
        "name": "Heffernan, Neil"
      }
    ],
    "description": "<p>Research on non-cognitive factors has shown that persistence in the face of challenges plays an important role in learning. However, recent work on wheel-spinning, a type of unproductive persistence where students spend too much time struggling without achieving mastery of skills, show that not all persistence is uniformly beneficial for learning. For this reason, it becomes increasingly pertinent to identify the key differences between unproductive and productive persistence toward informing interventions in computer-based learning environments. In this study, we use a classification model to distinguish between productive persistence and wheel-spinning in ASSISTments, an online math learning platform. Our results indicate that there are two types of students who wheel-spin: first, students who do not request any hints in at least one problem but request more than one bottom-out hint across any 8 problems in the problem set; second, students who never request two or more bottom out hints across any 8 problems, do not request any hints in at least one problem, but who engage in relatively short delays between solving problems of the same skill. These findings suggest that encouraging students to both engage in spaced practice and use bottom-out hints sparingly is likely helpful for reducing their wheel-spinning and improving learning. These findings also provide insight on when students are struggling and how to make students&#39; persistence more productive.</p>",
    "doi": "10.5281/zenodo.3344873",
    "journal_issue": "1",
    "journal_pages": "36-71",
    "journal_title": "Journal of Educational Data Mining",
    "journal_volume": "10",
    "keywords": [
      "predictive modeling",
      "wheel-spinning",
      "productive persistence",
      "decision tree",
      "intelligent tutoring system"
    ],
    "language": "eng",
    "license": "CC-BY-NC-ND-4.0",
    "notes": "Erratum for Kai, S., Almeda, M.V., Baker, R.S., Heffernan, C., Heffernan, N. (2018) Decision Tree Modeling of Wheel-Spinning and Productive Persistence in Skill Builders. Journal of Educational Data Mining, 10 (1), 36-71.\n\nIn the original published version of the article, it was stated that student-level cross-validation was used. However, upon later re-analysis by another member of our laboratory, Yeyu Wang, it was determined that cross-validation had been inadvertently conducted at the level of student-skill pairs. When the overall model was re-validated using student-level cross-validation, the overall model's goodness dropped by just under 0.05 (AUC ROC), from 0.684 to 0.636.",
    "prereserve_doi": {
      "doi": "10.5281/zenodo.3344873",
      "recid": 3344873
    },
    "publication_date": "2018-06-30",
    "publication_type": "article",
    "related_identifiers": [
      {
        "identifier": "https://jedm.educationaldatamining.org/index.php/JEDM/article/view/210",
        "relation": "isCitedBy",
        "scheme": "url"
      }
    ],
    "title": "Decision Tree Modeling of Wheel- Spinning and Productive Persistence in Skill Builders",
    "upload_type": "publication",
    "version": "1.0.1"
  },
  "modified": "2019-07-20T19:29:58.448673",
  "owner": 58089,
  "record_id": 3344873,
  "state": "done",
  "submitted": true,
  "title": "Decision Tree Modeling of Wheel- Spinning and Productive Persistence in Skill Builders"
}
```

From the above JSON, we can see that we need a data source with the following:

- JEDM's 'View' page for the article
- Published PDF of the article
- Author name
- Author affiliation
- Title
- Abstract (Zenodo description)
- Keywords
- License
- Language
- Publication date
- Publication type
- Journal issue
- Journal pages
- Journal title
- Journal volume 

Notably, references are not included. 
Theoretically they could be, though it is not clear that this is required, and including references would increase complexity of the process.

## Obtaining Required Metadata

The metadata above (without references) may be obtained in OJS 3.X from two different sources:

- `Tools -> Import/Export -> PubMed XML`. Exported **per issue**, so one file per issue. XML format. **<font color=red>The plugin was trivially modified to print affilation for each author rather than just the first. Comment out line 205/208, `if ($authorIndex == 0) {`, in `ArticlePubMedXmlFilter.inc.php` </font>**

- `Tools -> Report Generator -> Articles Report`. All can be exported, so one file. CSV format.

**<font color=red>Accordingly, the first step is to manually download these files from OJS. If the abstracts contain "&", this will cause problems; replace with "and" </font>**

The OJS exports map to the Zenodo JSON in the following way:

| Zenodo                             	| OJS                            	| Notes                                                          	|
|------------------------------------	|--------------------------------	|----------------------------------------------------------------	|
| JEDM's 'View' page for the article 	| Articles report; abstractUrlMap 	| Must check url is valid                                        	|
| Published PDF of the article       	| Articles report; abstractUrlMap 	| Must find download link on page and ensure PDF is downloadable 	|
| Author name                        	| PubMed; LastName , FirstName   	|   	|
| Author affiliation                 	| PubMed; Affiliation            	|                             	|
| Title                              	| PubMed; ArticleTitle           	| HARDCODE TO AVOID OJS FORMATTING                                                               	|
| Description                        	| Abstract                       	|                                                                	|
| Keywords                           	| Articles report; abstractKeywordMap 	| split on ',' to make array                                     	|
| License                            	| always the same                	| HARDCODE                                                               	|
| Language                           	| PubMed; Language               	|                                                                	|
| Publication date                   	| PubMed; PubDate                	| Merge year/month/day to "2018-06-30"                           	|
| Publication type                   	| always the same                	|                                                                	|
| Journal issue                      	| PubMed; Issue                  	|                                                                	|
| Journal pages                      	| PubMed; FirstPage+LastPage     	|                                                                	|
| Journal title                      	| PubMed; JournalTitle           	|                                                                	|
| Journal volume                     	| PubMed; Volume                 	|                                                                	|


### Articles Report

The articles report is only used to create a mapping between article abstracts (titles are not unique, e.g. Editorial Acknowledgment) and the 

- corresponding URL
- corresponding keywords

We use a URL hack for this, which may break with changes to OJS

In [1]:
#r "/z/aolney/repos/FSharp.Data.3.1.1/lib/net45/FSharp.Data.dll"
open FSharp.Data
open FSharp.Data.CsvExtensions 

let allArticles = CsvFile.Load("/z/aolney/reviews/jedm/doi/articles-JEDM-20191229.csv")
let publishedArticles = allArticles.Filter( fun r -> r?Status = "Published")

let abstractUrlMap =
    publishedArticles.Rows
    |> Seq.map( fun r -> hash(r?Abstract.Trim()),r?URL.Replace("https://jedm.educationaldatamining.org/index.php/JEDM/workflow/access/","https://jedm.educationaldatamining.org/index.php/JEDM/article/view/"))
    |> Map.ofSeq
    
let abstractKeywordMap =
    publishedArticles.Rows
    |> Seq.map( fun r -> hash(r?Abstract.Trim()),(r.GetColumn "Keyword(s)").Replace(", ",",").Split(',',System.StringSplitOptions.RemoveEmptyEntries) )
    |> Map.ofSeq
    
//titleUrlMap
//titleKeywordMap

Eventually we'll need to check these URLs are valid and download the PDF at each of them, but we defer that because we only need URLs and PDFs for a subset of these.

### PubMed XML

F# has compiler magic called 'type providers' that convert known file formats into objects.
A mostly correct way to think about this is that the F# compiler is automatically deserializing data without all of the infrastructure muck that goes into it.
While magical, type providers IMHO are buggy, especially outside of Windows. 
So using them here is easy but may introduce robustness issues.

A type provider needs a sample to infer the type from, which is why we have a static XML string below.

In [2]:
type PubMed = XmlProvider<"""<?xml version="1.0"?>
<!DOCTYPE ArticleSet PUBLIC "-//NLM//DTD PubMed 2.0//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/static/PubMed.dtd">
<ArticleSet>
  <Article>
    <Journal>
      <PublisherName/>
      <JournalTitle>JEDM | Journal of Educational Data Mining</JournalTitle>
      <Issn>2157-2100</Issn>
      <Volume>11</Volume>
      <Issue>2</Issue>
      <PubDate PubStatus="epublish">
        <Year>2019</Year>
        <Month>09</Month>
        <Day>30</Day>
      </PubDate>
    </Journal>
    <ArticleTitle>Editorial Acknowledgments and Introduction to the Special Issue for the EDM Journal Track</ArticleTitle>
    <FirstPage>i</FirstPage>
    <LastPage>i</LastPage>
    <Language>eng</Language>
    <AuthorList>
      <Author>
        <FirstName>Andrew M.</FirstName>
        <LastName>Olney</LastName>
        <Affiliation>University of Memphis. andrew@jedm.educationaldatamining.org</Affiliation>
      </Author>
      <Author>
        <FirstName>Luke Glenn</FirstName>
        <LastName>Eglington</LastName>
        <Affiliation>University of Memphis. lgglngtn@memphis.edu</Affiliation>
      </Author>
    </AuthorList>
    <History>
      <PubDate PubStatus="received">
        <Year>2019</Year>
        <Month>09</Month>
        <Day>30</Day>
      </PubDate>
      <PubDate PubStatus="accepted">
        <Year>2019</Year>
        <Month>09</Month>
        <Day>30</Day>
      </PubDate>
    </History>
    <Abstract>
The 12th EDM Conference was held in Montr&#xE9;al from July 2 to July 5, and for the fifth time it held a Journal track which this year was edited by Andrew Olney and Luke Eglington. The Journal track allows papers submitted to JEDM to be presented at the conference. A summary is available in the proceedings, and the full text is published in the Journal. The Journal track received 7 submissions, and 2 of them made it to the final stage of the special issue for journal publication. We are pleased to publish these papers in this issue.
</Abstract>
  </Article>
  <Article>
    <Journal>
      <PublisherName/>
      <JournalTitle>JEDM | Journal of Educational Data Mining</JournalTitle>
      <Issn>2157-2100</Issn>
      <Volume>11</Volume>
      <Issue>2</Issue>
      <PubDate PubStatus="epublish">
        <Year>2019</Year>
        <Month>09</Month>
        <Day>30</Day>
      </PubDate>
    </Journal>
    <ArticleTitle>Predictiveness of Prior Failures is Improved by Incorporating Trial Duration</ArticleTitle>
    <FirstPage>1</FirstPage>
    <LastPage>19</LastPage>
    <Language>eng</Language>
    <AuthorList>
      <Author>
        <FirstName>Luke Glenn</FirstName>
        <LastName>Eglington</LastName>
        <Affiliation>University of Memphis. lgglngtn@memphis.edu</Affiliation>
      </Author>
      <Author>
        <FirstName>Philip I.</FirstName>
        <LastName>Pavlik, Jr</LastName>
        <Affiliation>University of Memphis. ppavlik@memphis.edu</Affiliation>
      </Author>
    </AuthorList>
    <History>
      <PubDate PubStatus="received">
        <Year>2018</Year>
        <Month>12</Month>
        <Day>21</Day>
      </PubDate>
      <PubDate PubStatus="accepted">
        <Year>2019</Year>
        <Month>04</Month>
        <Day>07</Day>
      </PubDate>
    </History>
    <Abstract>
In recent years, there has been a proliferation of adaptive learner models that seek to predict student correctness. Improvements on earlier models have shown that separate predictors for prior successes, failures, and recent performance further improve fit while remaining interpretable. However, students who engage in &#x201C;gaming&#x201D; or other off-task behaviors may reduce the predictiveness of learner models that treat counts of prior performance equivalently across gaming and non-gaming student populations. The present research evaluated how sub-groups of students that varied in their potential gaming behavior were differently fit by a logistic learner model, and whether any observed differences between sub-groups could inspire the creation of new predictors that might improve model fit. Student data extracted from a college-level online learning application were clustered according to speed and accuracy using Gaussian mixture modeling. Distinct clusters were found, with similar cluster patterns detected in three separate datasets. Subsequently, each cluster was separately fit to a Performance Factors Analysis model (PFA). Significantly different parameter coefficients across clusters implied that students more likely to have been gaming benefitted less from prior failures. These differences inspired new and modified predictors that were found to improve overall model fit - an improvement that varied in magnitude across clusters. The present findings indicate that incorporating trial duration into counts of prior failures can improve the predictive power of learning models.
</Abstract>
  </Article>
  <Article>
    <Journal>
      <PublisherName/>
      <JournalTitle>JEDM | Journal of Educational Data Mining</JournalTitle>
      <Issn>2157-2100</Issn>
      <Volume>11</Volume>
      <Issue>2</Issue>
      <PubDate PubStatus="epublish">
        <Year>2019</Year>
        <Month>09</Month>
        <Day>30</Day>
      </PubDate>
    </Journal>
    <ArticleTitle>Will this Course Increase or Decrease Your GPA? Towards Grade-aware Course Recommendation</ArticleTitle>
    <FirstPage>20</FirstPage>
    <LastPage>46</LastPage>
    <Language>eng</Language>
    <AuthorList>
      <Author>
        <FirstName>Sara</FirstName>
        <LastName>Morsy</LastName>
        <Affiliation>University of Minnesota. morsy002@umn.edu</Affiliation>
      </Author>
      <Author>
        <FirstName>George</FirstName>
        <LastName>Karypis</LastName>
        <Affiliation>University of Minnesota. karypis@umn.edu</Affiliation>
      </Author>
    </AuthorList>
    <History>
      <PubDate PubStatus="received">
        <Year>2018</Year>
        <Month>12</Month>
        <Day>22</Day>
      </PubDate>
      <PubDate PubStatus="accepted">
        <Year>2019</Year>
        <Month>04</Month>
        <Day>28</Day>
      </PubDate>
    </History>
    <Abstract>
In order to help undergraduate students towards successfully completing their degrees, developing tools that can assist students during the course selection process is a significant task in the education domain. The optimal set of courses for each student should include courses that help him/her graduate in a timely fashion and for which he/she is well-prepared for so as to get a good grade in. To this end, we propose two different grade-aware course recommendation approaches to recommend to each student his/her optimal set of courses. The first approach ranks the courses by using an objective function that differentiates between courses that are expected to increase or decrease a student&#x2019;s GPA. The second approach combines the grades predicted by grade prediction methods with the rankings produced by course recommendation methods to improve the final course rankings. To obtain the course rankings in both approaches, we adapt two widely-used representation learning techniques to learn the optimal temporal ordering between courses. Our experiments on a large dataset obtained from the University of Minnesota that includes students from 23 different majors show that the grade-aware course recommendation methods can do better on recommending more courses in which the students are expected to perform well and recommending fewer courses which they are expected not to perform well in than grade-unaware course recommendation methods.
</Abstract>
  </Article>
</ArticleSet>""">

Now we can deserialize the XML files

In [3]:
let xmlFiles = 
    System.IO.Directory.GetFiles("/z/aolney/reviews/jedm/doi/current-metadata","*.xml")
    |> Array.map System.IO.File.ReadAllText
    |> Array.map PubMed.Parse
//(xmlFiles |> Seq.head).Articles.[0].ArticleTitle

## Prepare PDF and Metadata for Zenodo

We are going to full prep our metadata before interacting with OJS.
The required phases are:

- Ensuring that we can access the View page and download PDFs from OJS
- Creating JSON metadata from OJS metadata

### Validate View pages and Download PDFs

In [4]:
let AbstractHash( a : PubMed.Article ) = 
    hash(a.Abstract.Trim())

let articleAbstracts = 
    xmlFiles
    |> Array.collect( fun x -> 
        x.Articles |> Array.map( fun a -> a,AbstractHash(a) )
    )
    
//validate that we can align the CSV and XML on the hashed abstract
articleAbstracts
|> Array.map( fun (a,abs) -> a,abstractUrlMap.TryFind(abs) )
|> Array.filter( fun (a,abs) -> abs.IsNone )

[||]

**<font color=red>An empty list means no missing alignments. Any missing abstract alignments must be checked manually.</font>**

Once we are linked between the XML and CSV via abstracts, we can check and merge URLs (in the CSV) based on abstracts

In [5]:
let articleViews =
    articleAbstracts
    |> Array.map( fun (a,abs) -> a, abstractUrlMap.[abs] |> Http.Request)

//check that all views have good responses
articleViews 
|> Array.map snd
|> Array.filter( fun v -> v.StatusCode <> 200 )
|> Array.map( fun v -> v.StatusCode, v.ResponseUrl ) 

[||]

**<font color=red>An empty list means no missing View pages. Any URLs above that bounced are invald and must be checked manually.</font>**

Once all URLs are valid, we can proceed with obtaining the PDFs from each of them.
This is a bit awkward, because we need to find the link in the correponding View page (links do not have uniform names).
We use the hash of the abstract as the filename of the pdf.

In [6]:
for a,v in articleViews do
    match v.Body with 
    | Text htmlText -> 
        let htmlDoc = HtmlDocument.Parse( htmlText )
        match htmlDoc.Descendants ["a"] |> Seq.tryFind( fun x -> x.HasClass("galley-link")) with
        | Some pdfLinkNode ->
            let link = pdfLinkNode.AttributeValue("href")
            Http.RequestStream(link).ResponseStream.CopyTo(new System.IO.FileStream(AbstractHash(a).ToString(),System.IO.FileMode.Create))
            printfn "  Success: %s" a.ArticleTitle
        | None -> 
            printfn "**Failed: %i %i %s" a.Journal.Volume a.Journal.Issue a.ArticleTitle
    | _ ->
        printfn "Non-text http body indicates earlier failure"
articleViews.Length

2

  Success: Early Detection of Students at Risk - Predicting Student Dropouts Using Administrative Student Data from German Universities and Machine Learning Methods
  Success: Understanding Hybrid-MOOC Effectiveness with a Collective Socio-Behavioral Model


**<font color=red>No failed messages means links were found. Check the number of files downloaded against the number reported. Any missing links/files should be checked manually.</font>**

### Creating JSON metadata from OJS metadata

All the information we need is in `articleViews`. 
We just need to construct JSON appropriate for a Zenodo payload.
To do this, we create two types:

- An outer type with all information we need for Zenodo
- An inner type, `Metadata`, that is closely aligned to Zenodo's metadata specifications

The inner type we will directly serialize to json when needed.

**<font color=red>Note a couple of hardcoded values at the top of the next block.</font>**

In [9]:
#r "/z/aolney/repos/Unidecode.NET.1.4.0/lib/net45/Unidecode.NET.dll"
open Unidecode.NET

//NOTE: HARDCODED PROPERTIES
let journal_title = "Journal of Educational Data Mining" //avoid OJS formatting of name
let license = "CC-BY-NC-ND-4.0" //not present in our metadata
let publication_type = "article" //Zenodo has no more appropriate category, even for editorial acknowledgments
let upload_type = "publication" //everything is a publication
let version = "1.0.0" //everything is the initial published version
let access_right = "open" //everything is open


type Creator =
    {
        affiliation: string
        name: string
    }

type RelatedIdentifier =
    {
        identifier : string
        relation : string
        scheme : string
    }
    
type Metadata =
    {
        access_right: string
        creators: Creator[]
        description: string
        journal_issue: string
        journal_pages: string
        journal_title: string
        journal_volume: string
        keywords: string[]
        language: string
        license: string
        publication_date: string
        publication_type: string
        related_identifiers: RelatedIdentifier[]
        title : string
        upload_type : string
        version : string
     }
     
//Need to wrap for Zenodo
type ZenodoMetadata =
    {
        metadata : Metadata
    }

type ZenodoUpload =
    {
        json: ZenodoMetadata
        filePath : string
    }
    
///Sometimes the day/month is not represented with two digits (leading 0 truncated)
let SafeDate date =
    let dateString = date.ToString()
    if dateString.Length = 1 then
        "0" + dateString
    else
        dateString

let zenodoUploads =
    articleViews
    |> Array.map( fun (a,v) ->
        let creators = 
            a.AuthorList 
            |> Array.map( fun author -> 
                {
                    affiliation = (author.Affiliation).ToString().Substring(0,(author.Affiliation).ToString().IndexOf("."))
                    name = author.LastName + ", " + author.FirstName
                }
            )
        
        let description = (a.Abstract).ToString().Trim().Unidecode()
        let journal_issue = a.Journal.Issue.ToString()
        //something goes wrong here and we need to go through XElement; may be b/c page numbers are both arabic and roman
        let journal_pages = a.FirstPage.XElement.Value + "-" + a.LastPage.XElement.Value
        let journal_volume = a.Journal.Volume.ToString()
        let keywords = abstractKeywordMap.[ a |> AbstractHash ]
        let language = a.Language
        let publication_date = a.Journal.PubDate.Year.ToString() + "-" + SafeDate(a.Journal.PubDate.Month) + "-" + SafeDate(a.Journal.PubDate.Day)  //"2018-06-30",
        let related_identifiers = [| {identifier=v.ResponseUrl; relation="isCitedBy"; scheme="url"} |]
        let title = a.ArticleTitle
        let metadata = 
            {
                access_right = access_right
                creators = creators
                description = description
                journal_issue = journal_issue
                journal_pages = journal_pages
                journal_title = journal_title
                journal_volume = journal_volume
                keywords = keywords
                language = language
                license = license
                publication_date = publication_date
                publication_type = publication_type
                related_identifiers = related_identifiers
                title  = title 
                upload_type  = upload_type 
                version  = version 
            }
        { json = { metadata = metadata}; filePath = (a |> AbstractHash).ToString() }
    )
//zenodoUploads

## Upload to Zenodo

Zenodo [has a slightly quirky upload process](https://developers.zenodo.org/#rest-api) that requires the following steps:

- Create an empty upload (to get an id)
- Upload files using id (the pdf)
- Add metadata using id

For all of this, you need an access token. 
Surprisingly, both the production system and the sandbox need an access token.

**<font color=red>Note sandbox/production is hardcoded.</font>**

**<font color=red>We do not finalize submissions using the API. Instead we upload everything via the API, then go to the Zenodo web page, click "Upload", and manually check/finalize each submission.</font>**

### Zenodo API

This is the necessary subset of the API for our puposes

In [10]:
#r "/z/aolney/repos/Newtonsoft.Json.9.0.1/lib/net45/Newtonsoft.Json.dll"

open System.Text
open FSharp.Data
open Newtonsoft.Json
open Newtonsoft.Json.Linq


///For extracting id from Zenodo JSON
type Id =
    {
        id : int
    }
    
type Mode = 
    | Sandbox
    | RealWorld

//------------------------------------------------
//IMPORTANT: DOIs ARE FOREVER. SANDBOX ALL TESTING
let mode = RealWorld // Sandbox
//------------------------------------------------

//Both the real world api and sandbox require authentication tokens - you have to sign up for each separately
let secret = ("/z/aolney/reviews/jedm/doi/zenodo-secret" |> System.IO.File.ReadAllLines).[0]
let sandboxSecret = ("/z/aolney/reviews/jedm/doi/zenodo-sandbox-secret" |> System.IO.File.ReadAllLines).[0]

let GetUrl partialPath =
    match mode with
    | Mode.Sandbox ->  "https://sandbox.zenodo.org/api/" + partialPath
    | Mode.RealWorld -> "https://zenodo.org/api/" + partialPath

let GetSecret() =
    match mode with
    | Mode.Sandbox ->  sandboxSecret
    | Mode.RealWorld -> secret
    
let GetId json =
    Newtonsoft.Json.JsonConvert.DeserializeObject<Id>(json).id
    
let GetIds json =
    Newtonsoft.Json.JsonConvert.DeserializeObject<Id[]>(json)
    |> Array.map( fun id -> id.id)
    
///A GET request. 
let ZenodoGet path =
    Http.RequestString
      ( GetUrl path, query=["access_token", GetSecret(); "size", "100"], httpMethod="GET" )
      
///A POST request with NO payload
let ZenodoPost path = 
    Http.RequestString(
        GetUrl path, 
        httpMethod="POST",
        query=["access_token", GetSecret()]
    )
    
///A POST request with JSON payload
let ZenodoPostJson path json = 
    Http.RequestString(
        GetUrl path, 
        headers = [ HttpRequestHeaders.ContentType HttpContentTypes.Json ],
        query=["access_token", GetSecret()], 
        body = TextRequest json
    )
    
///A PUT request with JSON payload
let ZenodoPutJson path json = 
    Http.RequestString(
        GetUrl path, 
        httpMethod = "PUT",
        headers = [
            HttpRequestHeaders.ContentType HttpContentTypes.Json 
            HttpRequestHeaders.ContentEncoding "utf-8" ],
        query=["access_token", GetSecret()], 
        body = TextRequest json
    )


///A POST request with file payload
let ZenodoPostFile urlPath filePath = 
    let data = System.IO.File.OpenRead(filePath) :> System.IO.Stream
    Http.RequestString(
        GetUrl urlPath, 
        query=["access_token", GetSecret()], 
        body = Multipart(
            boundary = "---SuperAwesomeFormBoundary", 
            parts = [
                MultipartItem("file", (System.IO.Path.GetFileName(filePath) + ".pdf"), data)
            ]
        )
    )

///A DELETE request
let ZenodoDelete path = 
    Http.RequestString(
        GetUrl path, 
        httpMethod = "DELETE",
        query=["access_token", GetSecret()]
        //body = TextRequest ""
    )

//-----------------------------------------------------------------------
// Convienence wrapper functions

///Get a listing of all depositions
let ZenodoGetDepositionList() = ZenodoGet "deposit/depositions"

///Unlock a published deposition for editing
let ZenodoEditArticle id =
    ZenodoPost ("deposit/depositions/" + id.ToString() + "/actions/edit")

///Create an article entry; the ID is used for updating it
let ZenodoCreateEmptyArticle() = ZenodoPostJson "deposit/depositions" "{}"
    
///Update an article entry with metadata, using an ID
let ZenodoUpdateArticle id (json:string) =
    ZenodoPutJson ("deposit/depositions/" + id.ToString()) json
    
///Attach a file to an article entry
let ZenodoUploadArticleFile id filePath =
    ZenodoPostFile ("deposit/depositions/" + id.ToString() + "/files") filePath
    
///Delete an entry for an ID
let ZenodoDeleteId id =
    ZenodoDelete ("deposit/depositions/" + id.ToString())
    |> ignore
   
///Delete all unpublished entries
let ZenodoDeleteAll() =
    let mutable notDone = true
    while notDone do
        let ids = ZenodoGet "deposit/depositions" |> GetIds
        printfn "Deleting %A" ids
        try
            ids |> Array.iter ZenodoDeleteId
        with
        | _ -> ()
        if ids.Length = 0 then notDone <- false

### Upload OJS submissions to Zenodo

If needed, delete all unpublished entries to clean up previous failed runs.

In [13]:
ZenodoDeleteAll()

Deleting [|424949; 424947; 424945; 424943; 424941; 424939; 424937; 424935; 424933; 424931;
  424929; 424927; 424925; 424923; 424921; 424919; 424917; 424915; 424913; 424911;
  424909; 424907; 424905; 424903; 424901; 424899; 424897; 424895; 424893; 424891;
  424889; 424887; 424885; 424883; 424881; 424879; 424877; 424875; 424873; 424871;
  424869; 424867; 424865; 424863; 424861; 424859; 424857; 424855; 424853; 424851;
  424849; 424847; 424845; 424843; 424841; 424839; 424837; 424835; 424833; 424831;
  424829; 424827; 424825; 424823; 424821; 424819; 424817; 424815; 424813; 424811;
  424809; 424807; 424805; 424803; 424801; 424799; 424797; 424795; 424793|]
Deleting [|424793|]
Deleting [||]


In [11]:
for z in zenodoUploads do
    //issues with bad formatting, so force ASCII
    let json = Newtonsoft.Json.JsonConvert.SerializeObject(z.json,Newtonsoft.Json.Formatting.Indented).Unidecode()
    let id = ZenodoCreateEmptyArticle() |> GetId
    try
        let fileStatus = ZenodoUploadArticleFile id z.filePath
        let metaStatus = ZenodoUpdateArticle id json
        printfn "> Success: %s %s %s"  z.json.metadata.journal_volume  z.json.metadata.journal_issue z.json.metadata.title
    with
    | _ -> 
        printfn "***********************************"
        printfn "> FAILED ON ID: %i\n%s" id json
        printfn "***********************************"
   
//printfn "> Processing title: %s" z.json.metadata.title
//printfn "> Created id: %i" id
//printfn "> Uploaded file: %s" z.filePath
//printfn "> Uploaded metadata:\n%s" (Newtonsoft.Json.JsonConvert.SerializeObject(z.json, Newtonsoft.Json.Formatting.Indented))
//printfn "-------"

> Success: 11 3 Early Detection of Students at Risk - Predicting Student Dropouts Using Administrative Student Data from German Universities and Machine Learning Methods
> Success: 11 3 Understanding Hybrid-MOOC Effectiveness with a Collective Socio-Behavioral Model


**<font color=red>If failures, then modify code and rerun or correct manually.</font>**

## Wrapping up

At this point we are almost done. The only things remaining are:

- Manually check each entry in Zenodo, then publish to Zenodo
- Update OJS with the corresponding DOI from Zenodo

## POST HOC FIX

The original version of this script did not use the ".pdf" file extension.
The following code adds a note to each DOI explaining how to open the file.

**This is a template for any posthoc modification to the deposition record EXCEPT changing the underlying file, which is not allowed.**

In [51]:
type Note =
    {
        notes : string
    }

type Creator =
    {
        affiliation: string
        name: string
    }

type RelatedIdentifier =
    {
        identifier : string
        relation : string
        scheme : string
    }

type Metadata2 =
    {
        access_right: string
        creators: Creator[]
        description: string
        journal_issue: string
        journal_pages: string
        journal_title: string
        journal_volume: string
        keywords: string[]
        language: string
        license: string
        publication_date: string
        publication_type: string
        related_identifiers: RelatedIdentifier[]
        title : string
        upload_type : string
        version : string
        notes : string
     }
     
//Need to wrap for Zenodo
type ZenodoMetadata2 =
    {
        metadata : Metadata2
    }

    
let ids = ZenodoGetDepositionList() |> GetIds

for id in ids do
    try
        let currentJson = ZenodoEditArticle id
        let currentZenodoMetadata = Newtonsoft.Json.JsonConvert.DeserializeObject<ZenodoMetadata2>(currentJson)
        let newMetadata = {currentZenodoMetadata.metadata with notes="The file is in PDF format. If your computer does not recognize it, simply download the file and then open it with your browser." }
        let newJson = Newtonsoft.Json.JsonConvert.SerializeObject({metadata = newMetadata})
        ZenodoUpdateArticle id newJson |> ignore
        printfn "> Success: %i"  id
    with
    | _ -> 
        printfn "***********************************"
        printfn "> FAILED ON ID: %i\n%s" id
        printfn "***********************************"
        


The result of this expression has type 'string' and is implicitly ignored. Consider using 'ignore' to discard this value explicitly, e.g. 'expr |> ignore', or 'let' to bind the result to a name, e.g. 'let result = expr'.
This expression is a function value, i.e. is missing arguments. Its type is string -> unit.

> Success: 3554752
***********************************
***********************************
> Success: 3554594
> Success: 3554596
> Success: 3554598
> Success: 3554600
> Success: 3554602
> Success: 3554604
> Success: 3554606
> Success: 3554608
> Success: 3554610
> Success: 3554612
> Success: 3554614
> Success: 3554616
> Success: 3554618
> Success: 3554620
> Success: 3554622
> Success: 3554624
> Success: 3554626
> Success: 3554628
> Success: 3554630
> Success: 3554632
> Success: 3554634
> Success: 3554636
> Success: 3554638
> Success: 3554640
> Success: 3554642
> Success: 3554644
> Success: 3554646
> Success: 3554648
> Success: 3554650
> Success: 3554654
> Success: 3554656
> Success: 3554658
> Success: 3554660
> Success: 3554662
> Success: 3554664
> Success: 3554666
> Success: 3554668
> Success: 3554670
> Success: 3554672
> Success: 3554676
> Success: 3554678
> Success: 3554680
> Success: 3554682
> Success: 3554684
> Success: 3554686
> Success: 3554688
> Success: 3554690
> Success: 35546