# Get config parameters

## Data needed
We need three pieces of information:
- the **ID of the group** library.  
  To find it, open the group’s page (https://www.zotero.org/groups/groupname)   
  and hover over the group settings link; the ID is the number in that link's URL.
- the **API key**, created on the Zotero [site](https://www.zotero.org/settings/keys/new)
- the **library_type**
  - own Zotero library --> `user`
  - shared library --> `group`
  
## Config file

Rename `config_template.cfg` to `config.cfg` and fill in the three values described above.
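
The layout of the template is not shown in this notebook, so the section name `zotero` and the key names `library_id`, `api_key` and `library_type` in the sketch below are assumptions, as is the helper `load_zotero_config`. It only illustrates how such a file could be read and checked with the standard-library `configparser`:

```python
import configparser

# Hypothetical content of config.cfg; section and key names are assumptions.
EXAMPLE_CFG = """
[zotero]
library_id = 1234567
api_key = REPLACE_WITH_YOUR_KEY
library_type = group
"""


def load_zotero_config(text):
    """Parse config text and return (library_id, api_key, library_type)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["zotero"]
    library_type = section["library_type"].strip().lower()
    # Only the two values listed under "Data needed" are accepted.
    if library_type not in ("user", "group"):
        raise ValueError(f"library_type must be 'user' or 'group', got {library_type!r}")
    return section["library_id"].strip(), section["api_key"].strip(), library_type


library_id, api_key, library_type = load_zotero_config(EXAMPLE_CFG)
print(library_id, library_type)
```

If the notebook is built on pyzotero (the `zot.children` and `zot.trash` calls suggest it is), these three values would be passed on as `zotero.Zotero(library_id, library_type, api_key)`.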

In [9]:
%run config.ipynb

# Retrieve data from server 

In [2]:
zot, lib_items = retrieve_data()

Retrieving Library...
Done at 19:32:37


# Get Items with duplicate pdf files
<a id='fetch-duplicates'></a>

In [3]:
%%time
print("Resolving duplicates...")
items_duplicate_attach, pdf_attachments = get_items_with_duplicate_pdf(zot, lib_items)
print(f"Got: {len(items_duplicate_attach)} duplicates")
print("Done!")

Resolving duplicates...
Got: 0 duplicates
Done!
CPU times: user 106 ms, sys: 13.7 ms, total: 119 ms
Wall time: 4.22 s


# Report items with multiple attachments

Multiple attachments are ok.  
We are looking for duplicate pdf files.   
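
The distinction can be illustrated on plain filename lists. The sketch below assumes that a duplicate means the same filename attached more than once, which matches the check used later in this notebook; `duplicated_filenames` is a hypothetical helper, not the notebook's `get_items_with_duplicate_pdf`.

```python
from collections import Counter


def duplicated_filenames(filenames):
    """Return the filenames that occur more than once, in first-seen order."""
    counts = Counter(filenames)
    return [name for name in counts if counts[name] > 1]


# Two different files on one item: multiple attachments, but no duplicate.
print(duplicated_filenames(["paper.pdf", "supplement.pdf"]))  # → []
# The same file attached twice: reported as a duplicate.
print(duplicated_filenames(["paper.pdf", "paper.pdf", "supplement.pdf"]))  # → ['paper.pdf']
```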


In [4]:
for item in items_duplicate_attach:
    if is_standalone(item):
        continue

    key = item["key"]
    firstname = "UNKNOWN"
    lastname = "UNKNOWN"
    # creators may be authors or editors; report the first author if present
    creators = item["data"].get("creators", [])
    for creator in creators:
        if creator["creatorType"] == "author":
            # single-field creators (e.g. institutions) carry "name" instead
            firstname = creator.get("firstName", "UNKNOWN")
            lastname = creator.get("lastName", creator.get("name", "UNKNOWN"))
            break

    print(
        f"""
    Title: {item['data']['title']}
    Author: {firstname}, {lastname}
    PDF attachments: {pdf_attachments[key]}
    ----"""
    )

print(f"found {len(items_duplicate_attach)} items with duplicate pdf files")

found 0 items with duplicate pdf files


# Report items without pdfs 

This step only reports; nothing is changed in the library.
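
How `get_items_with_no_pdf_attachments` decides is not shown here. A minimal sketch of one possible check, assuming child records shaped like Zotero API item JSON (`data.itemType` and `data.contentType`); `has_pdf_attachment` is a hypothetical helper:

```python
def has_pdf_attachment(children):
    """Return True if any child record is an attachment with PDF content type."""
    return any(
        child.get("data", {}).get("itemType") == "attachment"
        and child.get("data", {}).get("contentType") == "application/pdf"
        for child in children
    )


# Hand-written child records for illustration; not real library data.
children_with_pdf = [
    {"data": {"itemType": "note"}},
    {"data": {"itemType": "attachment", "contentType": "application/pdf"}},
]
children_without_pdf = [
    {"data": {"itemType": "attachment", "contentType": "text/html"}},
]

print(has_pdf_attachment(children_with_pdf))     # → True
print(has_pdf_attachment(children_without_pdf))  # → False
print(has_pdf_attachment([]))                    # → False
```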

In [10]:
print("Retrieve items without pdf file ...")
items_without_pdf = get_items_with_no_pdf_attachments(zot, lib_items)
if items_without_pdf:
    print(f"Found {len(items_without_pdf)} items")
    STATUS_OK = False
    
for item in items_without_pdf:
    log_item(item)

Retrieve items without pdf file ...
Found 6 items
Item 
        Key: MMGVZVEE
        ItemType: conferencePaper
        Title: Modelling Crowd Dynamics and Crowd Management Strategies
        
            Author: Andrew J., Park
            File: NO_ATTACHMENT | Type: NO_TYPE
            Num Attach: 0
            ----
Item 
        Key: XTMXBQ5K
        ItemType: conferencePaper
        Title: Effects of Language Familiarity in Simulated Natural Dialogue with a Virtual Crowd of Digital Humans on Emotion Contagion in Virtual Reality
        
            Author: Matias, Volonte
            File: NO_ATTACHMENT | Type: NO_TYPE
            Num Attach: 0
            ----
Item 
        Key: PPEESNXH
        ItemType: journalArticle
        Title: Algorithms for Microscopic Crowd Simulation: Advancements in the 2010s
        
            Author: W., Toll
            File: NO_ATTACHMENT | Type: NO_TYPE
            Num Attach: 0
            ----
Item 
        Key: KCIVDM9W
        ItemType: jour

## Remove items with duplicate attachments

**WARNING**: This cell is dangerous!

This cell deletes duplicate attachments from the library.

Run [Get Items with duplicate pdf files](remove_duplicate_attachments.ipynb#fetch-duplicates) first:
it populates `items_duplicate_attach` and `pdf_attachments`, which this cell needs.
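
Because deletion cannot be undone from the notebook, it may help to preview which branch each item would take before running the cell. The sketch below reproduces only the branching condition on filename lists and deletes nothing; `deletion_mode` and the example keys are hypothetical.

```python
def deletion_mode(files):
    """Classify an item's PDF filenames using the same condition as the deletion cell."""
    if len(files) > 1 and len(set(files)) == 1:
        return "automatic"  # every attachment has the same name
    return "manual"  # names differ, e.g. supplementary material


# Hypothetical mapping of item key -> PDF filenames, mirroring `pdf_attachments`.
example = {
    "AAAA1111": ["paper.pdf", "paper.pdf"],
    "BBBB2222": ["paper.pdf", "supplement.pdf"],
}
for key, files in example.items():
    print(key, deletion_mode(files))
```

In the notebook, the same loop could be run over `pdf_attachments` to see how many items would be handled without confirmation.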

In [5]:
print("Updating library...")
print("===========")
deleted_attachment = False

for item in items_duplicate_attach:
    files = pdf_attachments[item["key"]]
    cs = zot.children(item["key"])
    print("-----")

    # DANGER AREA!!
    # Some items carry different pdf files (e.g. supplementary material);
    # those must not be deleted automatically.
    if len(set(files)) == 1 and len(files) > 1:
        # all attachments share one name --> a sign of duplicates
        print("all files are the same. Proceed deleting ..")
        deleted = delete_pdf_attachments(cs)
    else:  # manual mode!
        deleted = delete_pdf_attachments(cs, True)

    # remember a deletion from any item, not only from the last one
    deleted_attachment = deleted_attachment or bool(deleted)

print("===========")
T = dt.datetime.now()

if deleted_attachment:
    print("Attachments deleted!")
    STATUS_OK = False
else:
    print("No attachments deleted!")

print(f"Done at {T.hour}:{T.minute}:{T.second}")

Updating library...
No attachments deleted!
Done at 19:9:6


## Delete duplicate tag
<a id='tags'></a>

In [6]:
if DELETE_TAGS and deleted_attachment:
    zot.delete_tags("#duplicate-citation-key")

# Report

- Check if Trash is empty
- Standalone items
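
What counts as standalone is defined by `get_standalone_items`, which is not shown here. The sketch below assumes the usual Zotero meaning: an attachment or note that has no `parentItem` in its `data`. `find_standalone` and the sample records are hypothetical.

```python
def find_standalone(items):
    """Return attachments and notes that are not attached to a parent item."""
    return [
        item
        for item in items
        if item["data"].get("itemType") in ("attachment", "note")
        and not item["data"].get("parentItem")
    ]


# Hand-written records for illustration; keys are made up.
sample_items = [
    {"key": "ITEM0001", "data": {"itemType": "journalArticle"}},
    {"key": "ITEM0002", "data": {"itemType": "attachment", "parentItem": "ITEM0001"}},
    {"key": "ITEM0003", "data": {"itemType": "attachment"}},
]

print([item["key"] for item in find_standalone(sample_items)])  # → ['ITEM0003']
```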

In [7]:
if len(zot.trash()) > 0:
    print("Trash is not empty. Consider emptying it!")
    STATUS_OK = False
else:
    print("\n----\nTrash is empty!")


----
Trash is empty!


In [8]:
print("Check standalone items ...")
standalone_items = get_standalone_items(lib_items)
print(f"Found {len(standalone_items)} items.")
if standalone_items:
    STATUS_OK = False

for standalone_item in standalone_items:
    log_item(standalone_item)
    
if STATUS_OK:
    print("Library is OK!")

Check standalone items ...
Found 0 items.
Library is OK!
