## Run with pub2tools output

1. Define run settings:

    * _include_preprints_ (bool): set to False if preprints should not be included in the tools to add to the database.
    * _to_curate_ (int or 'all'): number of published tools to add to the database; use 'all' to add every published tool.

In [None]:
include_preprints = False
to_curate = 100
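
Because _to_curate_ accepts either an integer or the string 'all', it can help to normalise the value before it is used. The sketch below shows one way to do that. `resolve_to_curate` and its `n_available` parameter are hypothetical names introduced for illustration; they are not part of the notebook's modules, which may handle this differently.

```python
def resolve_to_curate(to_curate, n_available):
    """Return how many tools to take, given an int or the string 'all'.

    Hypothetical helper for illustration only; not part of the notebook's
    own modules.
    """
    if to_curate == 'all':
        return n_available
    # bool is a subclass of int, so reject it explicitly
    if isinstance(to_curate, bool) or not isinstance(to_curate, int):
        raise TypeError("to_curate must be an int or 'all'")
    if to_curate < 0:
        raise ValueError("to_curate must be non-negative")
    # Never ask for more tools than are available
    return min(to_curate, n_available)
```

With the settings above, `resolve_to_curate(to_curate, len(tools))` would cap the request at the number of tools actually present.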

2. Define file paths:

    * _json_file_: path to the JSON file with the Pub2Tools output.
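    * _pub2tools_log_: path to the Pub2Tools log file (passed to check_date in step 8 to get the date used for the output files).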

In [None]:
json_file = ""
pub2tools_log = ""

3. Define username and password.

In [None]:
username = ''
password = ''

4. Authentication.

In [None]:
from biotools_dev import login_prod

token = login_prod(username, password)

5. Read the Pub2Tools output and get the tools with a high confidence score from the JSON file.

In [None]:
import json
from tool_processing import process_tools

with open(json_file) as jf:
    data = json.load(jf)
    tools = data['list']

processed_tools = process_tools(tools)

**Tool validation**

6. Validate tools and separate them into valid and problem tools.

In [None]:
from tool_validation import validate_tools

valid_tools, problem_tools = validate_tools(processed_tools, token)

**Identify preprints**

7. Identify preprints. This returns the same list of tools, with an _is_preprint_ flag and a _publication_link_ added to each tool.

In [None]:
from preprints import identify_preprints

updated_tools = identify_preprints(rerun = False, tools = valid_tools)

**Create .csv file**

8. Generate a CSV file from the first _to_curate_ publications, plus all preprints if _include_preprints_ = True.

    Returns:
    
    * _tools_to_add_: tools to add to database (including preprints if _include_preprints_ = True).
    * _tools_left_: tools not in _tools_to_add_.

In [None]:
from utils.utils import check_date
from utils.csv_utils import generate_csv

file_date = check_date(pub2tools_log)
tools_to_add, tools_left = generate_csv(updated_tools, to_curate, file_date, include_preprints)
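
The selection described in step 8 can be sketched as follows. This is a minimal illustration, not the code inside generate_csv: `split_tools` is a hypothetical name, and it assumes each tool is a dict carrying the _is_preprint_ flag set in step 7.

```python
def split_tools(tools, to_curate, include_preprints):
    """Split tools into (tools_to_add, tools_left).

    Hypothetical stand-in for the selection step; assumes every tool is a
    dict with an 'is_preprint' key.
    """
    publications = [t for t in tools if not t.get('is_preprint')]
    preprints = [t for t in tools if t.get('is_preprint')]

    # 'all' means every publication; otherwise take the first to_curate
    limit = len(publications) if to_curate == 'all' else to_curate
    tools_to_add = publications[:limit]
    tools_left = publications[limit:]

    # Preprints go to one side or the other, depending on the setting
    if include_preprints:
        tools_to_add += preprints
    else:
        tools_left += preprints
    return tools_to_add, tools_left
```

Every input tool ends up in exactly one of the two lists, so nothing is lost between the CSV and the JSON files generated in the next step.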


**Create json files**

9. Generate json files with tools that will not be curated:

    If _separate_preprints_ = True, 2 json files will be generated with:
    * Preprints (can be used later as input for identify_preprints with rerun = True)
    * Publications

    If _separate_preprints_ = False, only 1 file will be generated, with the publications and preprints together.

In [None]:
from utils.json_utils import generate_json

generate_json(tools_left, file_date, separate_preprints=True)
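
As a rough picture of what _separate_preprints_ controls, the sketch below writes either two files or one. It is an assumption-laden illustration: `write_leftover_json`, the `out_dir` parameter, and the file names are all hypothetical, and the real generate_json may name and structure its files differently.

```python
import json
from pathlib import Path


def write_leftover_json(tools_left, file_date, separate_preprints=True, out_dir='.'):
    """Write the tools that will not be curated to one or two JSON files.

    Hypothetical sketch; file names here are invented for illustration.
    Assumes every tool is a dict with an 'is_preprint' key.
    """
    out_dir = Path(out_dir)
    if separate_preprints:
        groups = {
            'preprints': [t for t in tools_left if t.get('is_preprint')],
            'publications': [t for t in tools_left if not t.get('is_preprint')],
        }
    else:
        groups = {'tools': list(tools_left)}

    paths = []
    for label, group in groups.items():
        path = out_dir / f"{label}_{file_date}.json"
        with open(path, 'w', encoding='utf-8') as fh:
            json.dump(group, fh, indent=2)
        paths.append(path)
    return paths
```

Keeping preprints in their own file is what makes the rerun workflow possible, since that file can be checked again later for new publications.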

**Add tools**

10. Add the tools to curate to dev.

In [None]:
from biotools_dev import add_tools

add_tools(tools_to_add, token, WRITE_TO_DB=True)


## Rerun

Workflow to rerun preprints to check for new publications.

Newly published tools in _json_preprints_ will be moved from this file to _json_publications_.

1. Define file paths for preprints and publications files:

In [None]:
json_preprints = ""
json_publications = ""

2. Run identify_preprints with rerun = True:

In [None]:
from preprints import identify_preprints

updated_tools = identify_preprints(rerun = True, json_prp = json_preprints, json_pub = json_publications)
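
The move from the preprints file to the publications file can be pictured with the sketch below. It only shows the bookkeeping and assumes the _is_preprint_ flag has already been refreshed for each tool; how identify_preprints actually detects a new publication is not shown here. `move_published` is a hypothetical name.

```python
def move_published(preprints, publications):
    """Move tools that are no longer preprints into the publications list.

    Hypothetical sketch; assumes each tool is a dict whose 'is_preprint'
    flag has already been updated.
    """
    still_preprints = [t for t in preprints if t.get('is_preprint')]
    newly_published = [t for t in preprints if not t.get('is_preprint')]
    # Return new lists rather than mutating the inputs
    return still_preprints, publications + newly_published
```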

## Run Pub2Tools output and rerun existing preprints