## Run with Pub2Tools output

1. Define run settings:
    * _to_curate_ (int or 'all'): number of published tools to add to the database.

In [None]:
to_curate = 100

2. Define file paths:

    * _json_file_: path to the JSON file with the output from Pub2Tools.
    * _pub2tools_log_: path to the output log file from Pub2Tools.
    * _preprints_file_: path to the JSON file with all of the preprints identified so far.

In [None]:
json_file = "to_biotools_sep22.json"
pub2tools_log = "pub2tools.log"
preprints_file = "preprints.json"
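
Before running the later cells, it can help to confirm that the input files are where the notebook expects them. The following is a minimal sketch, assuming the paths are relative to the notebook's working directory. `missing_files` is a hypothetical helper, not part of this repository, and whether _preprints_file_ must already exist on a first run is also an assumption.

```python
from pathlib import Path


def missing_files(*paths):
    """Return the given paths that do not exist as regular files."""
    return [p for p in paths if not Path(p).is_file()]


# Possible use with the variables defined above (hypothetical check):
# missing = missing_files(json_file, pub2tools_log, preprints_file)
# if missing:
#     raise FileNotFoundError(f"Missing input files: {missing}")
```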

3. Define username and password.

In [None]:
username = ''
password = ''
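
Typing credentials directly into a cell risks saving them with the notebook. One alternative, sketched here under assumptions, is to read them from environment variables and fall back to an interactive prompt. The variable names `BIOTOOLS_USERNAME` and `BIOTOOLS_PASSWORD` and the helper `read_credentials` are hypothetical and not defined by this repository.

```python
import os
from getpass import getpass


def read_credentials(env=os.environ):
    """Return (username, password), preferring environment variables.

    BIOTOOLS_USERNAME and BIOTOOLS_PASSWORD are hypothetical names chosen
    for this sketch. Missing values are requested interactively.
    """
    username = env.get("BIOTOOLS_USERNAME") or input("Username: ")
    password = env.get("BIOTOOLS_PASSWORD") or getpass("Password: ")
    return username, password


# Possible use in place of the literal assignments above:
# username, password = read_credentials()
```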

4. Authentication.

In [None]:
from biotools_dev import login_prod

token = login_prod(username, password)

5. Read the Pub2Tools output and get the tools with a high confidence score from the JSON file.

In [None]:
import json
from tool_processing import process_tools

with open(json_file, encoding="utf8") as jf:
    data = json.load(jf)
    tools = data['list']

processed_tools = process_tools(tools)
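
The cell above relies on the JSON file having a top-level `list` key that holds the tool entries. The sketch below illustrates that shape and one way a confidence filter could look. The sample data is invented, the field name `confidence_flag` and the value `"high"` are assumptions, and `keep_high_confidence` is a hypothetical stand-in: the actual `process_tools` may work differently.

```python
import json

# Hypothetical miniature of the input: a top-level "list" of tool entries.
sample = json.loads("""
{
  "list": [
    {"name": "ToolA", "confidence_flag": "high"},
    {"name": "ToolB", "confidence_flag": "low"},
    {"name": "ToolC", "confidence_flag": "high"}
  ]
}
""")


def keep_high_confidence(tools, flag_key="confidence_flag", wanted="high"):
    """Keep entries whose confidence flag equals `wanted` (assumed field name)."""
    return [tool for tool in tools if tool.get(flag_key) == wanted]


high = keep_high_confidence(sample["list"])
print([tool["name"] for tool in high])  # prints ['ToolA', 'ToolC']
```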

**Tool validation**

6. Validate tools and separate them into valid and problem tools.

In [None]:
from tool_validation import validate_tools
valid_tools, problem_tools = validate_tools(processed_tools, token)

**Identify preprints**

7. Check whether any preprints in _preprints_file_ have since been published, and return only those, with their _publication_link_ and _is_preprint_ flag updated. The function deletes the published preprints from _preprints_file_.

In [None]:
from preprints import identify_preprints
pubs_prp = identify_preprints(rerun=True, tools=None, json_prp=preprints_file)

8. Repeat the identification for the validated tools and return only the publications. The function adds any identified preprints to _preprints_file_.

In [None]:
pubs = identify_preprints(rerun=False, tools=valid_tools, json_prp=preprints_file)
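
The two calls above can be read as a two-pass workflow: first re-check the stored preprints, then sort the newly validated tools into publications and preprints. The sketch below shows only that bookkeeping, kept in memory rather than written back to _preprints_file_. The helpers `recheck_stored` and `split_new`, the `is_published` callable, and the sample entries are all hypothetical; how `identify_preprints` actually detects publication is not shown here.

```python
def recheck_stored(stored, is_published):
    """First pass: split stored preprints into newly published and still pending.

    `is_published` is a hypothetical callable standing in for whatever
    lookup the real function performs.
    """
    published, pending = [], []
    for tool in stored:
        if is_published(tool):
            # Clear the flag on a copy so the stored entry is left unchanged.
            published.append({**tool, "is_preprint": False})
        else:
            pending.append(tool)
    return published, pending


def split_new(tools, pending):
    """Second pass: return publications and the extended pending preprint list."""
    pubs = [t for t in tools if not t.get("is_preprint")]
    preprints = [t for t in tools if t.get("is_preprint")]
    return pubs, pending + preprints


# Invented sample data for illustration only.
stored = [{"name": "A", "is_preprint": True}, {"name": "B", "is_preprint": True}]
pubs_prp, pending = recheck_stored(stored, lambda tool: tool["name"] == "A")

new_tools = [{"name": "C", "is_preprint": False}, {"name": "D", "is_preprint": True}]
pubs, pending = split_new(new_tools, pending)
```

In this sketch `pubs_prp` holds the preprints that turned out to be published, `pubs` holds the new publications, and `pending` is what would be saved for the next run.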

**Create .csv file**

9. Generate a CSV file from the first _to_curate_ entries of _pubs_ and all of _pubs_prp_.

    Returns:

    * _tools_to_add_: tools to add to the database.
    * _tools_left_: tools not in _tools_to_add_.

In [None]:
from utils.utils import check_date
from utils.csv_utils import generate_csv

file_date = check_date(pub2tools_log)
tools_to_add, tools_left = generate_csv(pubs, pubs_prp, to_curate, file_date)
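
The selection described in this step can be sketched as a simple slice. This is an illustration under assumptions, not the body of `generate_csv`: `split_for_curation` is a hypothetical helper, the CSV writing is left out, and the order in which publications and published preprints are combined is a guess.

```python
def split_for_curation(pubs, pubs_prp, to_curate):
    """Select the first `to_curate` publications plus all published preprints.

    Returns (tools_to_add, tools_left). `to_curate` is a non-negative int
    or the string 'all'.
    """
    if to_curate == "all":
        limit = len(pubs)
    elif isinstance(to_curate, int) and to_curate >= 0:
        limit = to_curate
    else:
        raise ValueError("to_curate must be a non-negative int or 'all'")
    tools_to_add = pubs[:limit] + pubs_prp  # assumed ordering
    tools_left = pubs[limit:]
    return tools_to_add, tools_left
```

With `to_curate = 100` and fewer than 100 entries in `pubs`, the slice simply returns everything and `tools_left` is empty.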



**Create json files**


10. Generate JSON files with the tools that will not be curated.

In [None]:
from utils.json_utils import generate_json

generate_json(tools_left, file_date)

11. **Add tools to curate to dev**

In [None]:
from biotools_dev import add_tools

add_tools(tools_to_add, token, WRITE_TO_DB=True)
