# Get config parameters

## Data needed
We need three pieces of information:
- the **ID of the group** library.  
  It can be found by opening the group’s page, https://www.zotero.org/groups/groupname,   
  and hovering over the group settings link: the ID is the integer after `/groups/` in that link.
- the **API key** from the Zotero [site](https://www.zotero.org/settings/keys/new)
- **library_type** 
  - own Zotero library --> user
  - shared library --> group
  
## Config file

Rename `config_template.cfg` to `config.cfg` and fill in the three values described above.
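
As an illustration of what such a file might contain and how it could be read, here is a minimal sketch using the standard-library `configparser`. The section name `zotero` and the key names `library_id`, `api_key` and `library_type` are assumptions made for this example; the actual `config_template.cfg` may use different names.

```python
import configparser

# Hypothetical content of config.cfg; section and key names are assumed.
EXAMPLE_CFG = """
[zotero]
library_id = 1234567
api_key = REPLACE_WITH_YOUR_KEY
library_type = group
"""


def load_zotero_config(text):
    """Parse the three values and check that library_type is valid."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["zotero"]
    library_type = section["library_type"].strip().lower()
    if library_type not in ("user", "group"):
        raise ValueError(
            f"library_type must be 'user' or 'group', got {library_type!r}"
        )
    return section["library_id"], section["api_key"], library_type


library_id, api_key, library_type = load_zotero_config(EXAMPLE_CFG)
print(library_id, library_type)
```

Validating `library_type` early gives a clear error message instead of a failed request later on.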

In [1]:
%run config.ipynb

# Retrieve data from server 

In [2]:
zot, lib_items = retrieve_data()

# Get Items with duplicate pdf files
<a id='fetch-duplicate-pdf'></a>

This cell makes many calls to the server (it retrieves the children of every item), so it might be a bit slow!

So, patience ...
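
To show why the number of requests grows with the library size, here is a rough sketch of a per-item lookup. It is not the implementation of `get_items_with_duplicate_pdf`; the function name `collect_pdf_filenames` and the `fetch_children` callable are hypothetical, and the example runs against stand-in data instead of a server.

```python
def collect_pdf_filenames(items, fetch_children):
    """Map each parent item key to the filenames of its PDF children.

    fetch_children is any callable that takes an item key and returns the
    child records of that item. Against a live library this would mean
    one request per item, which is what makes the cell slow.
    """
    pdfs = {}
    for item in items:
        key = item["key"]
        children = fetch_children(key)  # one lookup per item
        pdfs[key] = [
            child["data"].get("filename", "")
            for child in children
            if child["data"].get("itemType") == "attachment"
            and child["data"].get("contentType") == "application/pdf"
        ]
    return pdfs


# Stand-in data instead of a real server
fake_children = {
    "AAAA1111": [
        {"data": {"itemType": "attachment",
                  "contentType": "application/pdf",
                  "filename": "paper.pdf"}},
        {"data": {"itemType": "note"}},
    ],
    "BBBB2222": [],
}
items = [{"key": "AAAA1111"}, {"key": "BBBB2222"}]
pdf_by_item = collect_pdf_filenames(items, lambda key: fake_children[key])
print(pdf_by_item)
```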

In [3]:
%%time
log.info("Resolving duplicates...")
items_duplicate_attach, pdf_attachments = get_items_with_duplicate_pdf(zot, lib_items)
log.info(f"Got: {len(items_duplicate_attach)} duplicates")
log.info("Done!")

# Report items with multiple attachments

Multiple attachments are ok.  
We are looking for duplicate pdf files.   
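
The difference between "multiple" and "duplicate" can be sketched as follows. Comparing attachments by filename is an assumption of this example (the helper functions of this notebook may use other criteria), and `duplicated_filenames` is a hypothetical name.

```python
from collections import Counter


def duplicated_filenames(filenames):
    """Return the filenames that occur more than once, sorted."""
    counts = Counter(filenames)
    return sorted(name for name, n in counts.items() if n > 1)


# Two different files: multiple attachments, but no duplicate
only_distinct = ["paper.pdf", "supplement.pdf"]
# The same file attached twice: a duplicate
with_duplicate = ["paper.pdf", "paper.pdf", "supplement.pdf"]

print(duplicated_filenames(only_distinct))
print(duplicated_filenames(with_duplicate))
```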


In [5]:
if items_duplicate_attach:
    log.info("Items with duplicate pdf files:")
    
for item in items_duplicate_attach:
    if is_standalone(item):
        continue

    key = item["key"]    
    firstname = "UNKNOWN"
    lastname = "UNKNOWN"
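    creators = item["data"].get("creators", [])  # could be author or editor
    for creator in creators:
        if creator["creatorType"] == "author":
            # single-field creators (e.g. institutions) only carry "name"
            firstname = creator.get("firstName", "UNKNOWN")
            lastname = creator.get("lastName", creator.get("name", "UNKNOWN"))
            break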

    msg = f"""Item:
    Title: {item['data']['title']}
    Author: {firstname}, {lastname}
    PDF attachments: {pdf_attachments[key]}
    ----"""
    log.info(inspect.cleandoc(msg))

if items_duplicate_attach:
    log.warning(f"found {len(items_duplicate_attach)} items with duplicate pdf files.")
else:
    log.info("no items with duplicate pdf files found.")

# Report items without pdfs 

This cell makes many calls to the server (it retrieves the children of every item), so it might be a bit slow!

So, patience ...

In [6]:
%%time
log.info("Retrieve items without pdf file ...")
items_without_pdf = get_items_with_no_pdf_attachments(zot, lib_items)
if items_without_pdf:
    log.warning(f"Found {len(items_without_pdf)} items without pdf file")
    STATUS_OK = False
    
for item in items_without_pdf:
    log_title(item)

## Remove items with duplicate attachments

**WARNING**: This cell is dangerous!

Here, duplicate attachments are removed.

Run [Get Items with duplicate pdf files](remove_duplicate_attachments.ipynb#fetch-duplicate-pdf) first 
to fetch `items_duplicate_attach` and `pdf_attachments`.
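
The rule that decides between automatic deletion and manual mode can be checked in isolation before anything in the library is touched. The sketch below only classifies a list of filenames and deletes nothing; `deletion_mode` is a hypothetical helper that is not part of this notebook.

```python
def deletion_mode(files):
    """Classify the pdf filenames of one item.

    "automatic": several attachments that all share the same name,
                 which is taken as a sign of duplicates.
    "manual":    anything else, e.g. a paper plus supplementary material.
    """
    if len(files) > 1 and len(set(files)) == 1:
        return "automatic"
    return "manual"


print(deletion_mode(["paper.pdf", "paper.pdf"]))
print(deletion_mode(["paper.pdf", "supplement.pdf"]))
```

Running such a check over `pdf_attachments` first gives an overview of which items would be handled automatically.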

In [8]:
if items_duplicate_attach:
    log.info("Updating library...")
    
deleted_attachment = False
for item in items_duplicate_attach:
    files = pdf_attachments[item["key"]]
    cs = zot.children(item["key"])
    print("-----")

    # DANGER AREA!!
    if (
        len(set(files)) == 1 and len(files) > 1
    ):  # some items have different pdf files, like suppl materials. Should not delete
        # here attachments are all named the same -->  a sign of duplicates
        log.warning("all files are the same. Proceed deleting ..")
        deleted = delete_pdf_attachments(cs)

    else:  # manual mode!
        deleted = delete_pdf_attachments(cs, True)

    # remember a deletion from any item, not only from the last one
    deleted_attachment = deleted_attachment or bool(deleted)

T = dt.datetime.now()

if deleted_attachment:
    log.warning("Attachments deleted!")
    STATUS_OK = False
else:
    log.info("No attachments deleted!")

log.info(f"Done at {T:%H:%M:%S}")

## Delete duplicate tag
<a id='tags'></a>

In [None]:
if DELETE_TAGS and deleted_attachment:
    zot.delete_tags("#duplicate-citation-key")

# Report

- Check if Trash is empty
- Standalone items

In [11]:
if len(zot.trash()) > 0:
    log.warning("Trash is not empty. Consider emptying it!")
    STATUS_OK = False
else:
    log.info("\n----\nTrash is empty!")

In [13]:
log.info("Check standalone items ...")
standalone_items = get_standalone_items(lib_items)    
if standalone_items:
    log.warning(f"Found {len(standalone_items)} items.")
    STATUS_OK = False
else:
    log.info("No standalone items found.")

for standalone_item in standalone_items:
    log_item(standalone_item)
    
if STATUS_OK:
    log.info("Library is OK!")