# Get config parameters

## Data needed
We need three pieces of information:
- the **ID of the group** library.  
  Can be found by opening the group’s page: https://www.zotero.org/groups/groupname,   
  and hovering over the group settings link.
- the **API key** from the Zotero [site](https://www.zotero.org/settings/keys/new)
- **library_type** 
  - own Zotero library --> user
  - shared library --> group
  
## Config file

Rename `config_template.cfg` to `config.cfg` and fill in the three values described above.
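
As a rough illustration of how such a file could be read, here is a minimal sketch using Python's standard `configparser`. The section name `zotero`, the key names, and the helper `read_zotero_config` are assumptions made for this example; the real names are whatever `config_template.cfg` defines.

```python
import configparser

# Hypothetical file content; the real key names come from config_template.cfg.
EXAMPLE_CFG = """
[zotero]
library_id = 1234567
api_key = REPLACE_WITH_YOUR_KEY
library_type = group
"""


def read_zotero_config(text):
    """Parse the three settings and check that library_type is valid."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["zotero"]
    library_type = section["library_type"].strip()
    if library_type not in ("user", "group"):
        raise ValueError(
            f"library_type must be 'user' or 'group', got {library_type!r}"
        )
    return section["library_id"].strip(), section["api_key"].strip(), library_type


library_id, api_key, library_type = read_zotero_config(EXAMPLE_CFG)
print(library_id, library_type)
```

Validating `library_type` early avoids a confusing server error later if the value is misspelled.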

In [None]:
%run config.ipynb

# Loading data from library

First, (manually) sync your Zotero library.

Every time the library changes, run this cell again to retrieve the latest state of the library from the server.

In [None]:
zot, lib_items = retrieve_data()

# Merge duplicates 
<a id='del-attach'></a>
## Explanation 
### Situation
We have duplicate `Items`, sorted with respect to the added date (oldest first): 

| Item | Number of attachments | Attachments |
| :---: | :---: | :---: |
| $I_1$ |  1  | $PDF_1$ |
| $I_2$ | 3  | $NOTE_2$, $PDF_2$, $OTHER_2$ |
| $I_3$ | 2  | $NOTE_3$, $PDF_3$ |

---

**NOTE:**
Duplicate items are identified by their DOI and/or ISBN.

---
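
The grouping step can be pictured with the following sketch. It is not the notebook's `get_items_by_doi_or_isbn` helper (which is not shown here); `group_by_identifier` is a hypothetical stand-in, and the choice to lowercase identifiers and to prefer the DOI over the ISBN is an assumption. The `DOI` and `ISBN` field names follow the Zotero item JSON.

```python
from collections import defaultdict


def group_by_identifier(items):
    """Group items by DOI, falling back to ISBN; skip items with neither."""
    groups = defaultdict(list)
    for item in items:
        data = item["data"]
        # Assumption: identifiers are compared case-insensitively.
        identifier = (data.get("DOI") or data.get("ISBN") or "").strip().lower()
        if identifier:
            groups[identifier].append(item)
    return groups


# Made-up sample items, shaped like Zotero API results.
items = [
    {"key": "A", "data": {"DOI": "10.1000/xyz", "title": "Paper"}},
    {"key": "B", "data": {"DOI": "10.1000/XYZ", "title": "Paper"}},
    {"key": "C", "data": {"ISBN": "978-3-16-148410-0", "title": "Book"}},
    {"key": "D", "data": {"title": "No identifier"}},
]
groups = group_by_identifier(items)
duplicates = {k: [i["key"] for i in v] for k, v in groups.items() if len(v) > 1}
print(duplicates)
```

Item `D` has neither identifier, so it never enters a group and cannot be reported as a duplicate.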

### Actions
This cell will do the following:

- Sort the `Items` by the date they were added (oldest first)
- Keep the oldest `Item` (first added), i.e. $I_1$
- Move all attachments of the newest duplicate that has attachments (here $I_3$) to $I_1$
- Delete the other `Items` ($I_2$ and $I_3$), including any attachments still attached to them

### Result
The result of the actions described above is: 

$I_1$ has 3 attachments:
- $PDF_1$, $NOTE_3$, $PDF_3$

### Alternative result
If you want to keep only the newest attachments, i.e., $I_1$ ends up with the 2 attachments
$NOTE_3$ and $PDF_3$, set `DELETE_OWN_ATTACHMENTS = True` in
[this cell](config.ipynb#del-attach).

In this case, the attachments that $I_1$ already had are deleted only if the moved attachments include a PDF (here $PDF_3$).
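
The two outcomes can be reproduced offline with a small simulation on the example table. This is only a sketch: `plan_merge`, the simplified item dictionaries, and the rule that an attachment counts as a PDF when its name starts with `PDF` are assumptions for illustration, and nothing here talks to the Zotero server.

```python
def plan_merge(items, delete_own_attachments=False):
    """Return (kept item, attachments it ends up with, items to delete)."""
    # ISO 8601 timestamps sort chronologically as plain strings.
    ordered = sorted(items, key=lambda it: it["dateAdded"])  # oldest first
    keep, duplicates = ordered[0], ordered[1:]

    moved = []
    for dup in reversed(duplicates):  # newest duplicate first
        if dup["attachments"]:
            moved = list(dup["attachments"])
            break  # only the newest duplicate with attachments contributes

    own = list(keep["attachments"])
    # Drop the kept item's own attachments only if a PDF is moved in.
    if delete_own_attachments and any(a.startswith("PDF") for a in moved):
        own = []
    return keep["name"], own + moved, [d["name"] for d in duplicates]


# Made-up dates; only their order matters.
items = [
    {"name": "I_2", "dateAdded": "2021-02-01T00:00:00Z",
     "attachments": ["NOTE_2", "PDF_2", "OTHER_2"]},
    {"name": "I_1", "dateAdded": "2021-01-01T00:00:00Z",
     "attachments": ["PDF_1"]},
    {"name": "I_3", "dateAdded": "2021-03-01T00:00:00Z",
     "attachments": ["NOTE_3", "PDF_3"]},
]
default_result = plan_merge(items)
alternative_result = plan_merge(items, delete_own_attachments=True)
print(default_result)      # ('I_1', ['PDF_1', 'NOTE_3', 'PDF_3'], ['I_2', 'I_3'])
print(alternative_result)  # ('I_1', ['NOTE_3', 'PDF_3'], ['I_2', 'I_3'])
```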

## Initialise Items to update/delete

**NOTE**: Duplicates that have neither a DOI nor an ISBN are ignored!

In [None]:
print("Resolving duplicates...")
# group items by DOI or ISBN
by_doi = get_items_by_doi_or_isbn(lib_items)        
delete_items = []
update_items = []
for doi, items in by_doi.items():
    # print(f"doi/isbn: {doi}, n: {len(items)}")
    if len(items) == 1:
        continue

    # sort by age. oldest first
    items.sort(key=date_added)
    # keep oldest item
    keep = items[0]
    # fetch the kept item's own attachments (deleted later if requested)
    keep_cs = zot.children(keep["key"])
    duplicates_have_pdf = False
    for item in items[-1:0:-1]:
        cs = zot.children(item["key"])
        if cs:
            for c in cs:
                c["data"]["parentItem"] = keep["key"]
                if attachment_is_pdf(c):
                    duplicates_have_pdf = True
                
            update_items.extend(cs)
            if DELETE_OWN_ATTACHMENTS and duplicates_have_pdf:
                delete_items.extend(keep_cs)

            break  # only the newest duplicate's attachments are moved

    delete_items.extend(items[1:])


print(f">> Items to update: {len(update_items)}")
for u in update_items:
    log_item(u)
    
print(f">> Items to delete: {len(delete_items)}")
for d in delete_items:
    log_item(d)
    

## Update and delete duplicate items

**WARNING**: This cell changes the library on the server

Here, items will be updated and deleted.

In [None]:
print("Updating library ...")
# update first, so we don't delete parents of items we want to keep
for update_item in update_items:
    zot.update_item(update_item)
    log_item(update_item) 
    
print("Deleting from library ...")    
# now delete: DANGER AREA!
for delete_item in delete_items:
    zot.delete_item(delete_item)
    log_item(delete_item) 

T = dt.datetime.now()
print(f"Done at {T:%H:%M:%S}")  # zero-padded time, e.g. 09:05:03

# Report 

- items with duplicate attachments
- standalone items
- Trash status
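
The first two checks can be sketched on plain dictionaries shaped like Zotero API items. The helper `summarize` is hypothetical, and treating top-level `attachment` or `note` items without a `parentItem` as standalone is an assumption; the notebook's own `is_standalone` helper may use a different rule.

```python
def summarize(items):
    """Return keys of items with several children and keys of standalone items."""
    multi_attach = [
        i["key"] for i in items
        if i.get("meta", {}).get("numChildren", 0) > 1
    ]
    # Assumption: standalone means an attachment or note without a parent.
    standalone = [
        i["key"] for i in items
        if i["data"].get("itemType") in ("attachment", "note")
        and "parentItem" not in i["data"]
    ]
    return multi_attach, standalone


# Made-up sample items.
items = [
    {"key": "A", "meta": {"numChildren": 3}, "data": {"itemType": "journalArticle"}},
    {"key": "B", "meta": {"numChildren": 1}, "data": {"itemType": "book"}},
    {"key": "C", "meta": {}, "data": {"itemType": "note"}},
    {"key": "D", "meta": {}, "data": {"itemType": "attachment", "parentItem": "A"}},
]
multi_attach, standalone = summarize(items)
print(multi_attach, standalone)
```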

In [None]:
zot, lib_items = retrieve_data() # since library has been updated

In [None]:
print("Resolving duplicates ...")
items_duplicate_attach = []
duplicate_items_by_title = defaultdict(list)

for item in lib_items:
    if is_standalone(item):
        continue 
        
    key = item["data"]["key"]
    if item["meta"]["numChildren"] > 1:
        items_duplicate_attach.append(item)

    if "attachment" in item["links"].keys():
        attach = item["links"]["attachment"]["href"].split("/")[-1]
        type_attach = item["links"]["attachment"]["attachmentType"]

    else:
        attach = "NO_ATTACHMENT"
        type_attach = "NO_TYPE"

    iType = item["data"]["itemType"]
    Title = item["data"]["title"]
    duplicate_items_by_title[iType].append(Title.capitalize())
    creators = item["data"]["creators"]  # could be author or editor
    firstname = "UNKNOWN"
    lastname = "UNKNOWN"
    for creator in creators:
        if creator["creatorType"] == "author":
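            # single-field creators (e.g. institutions) have "name" instead
            firstname = creator.get("firstName", "UNKNOWN")
            lastname = creator.get("lastName", creator.get("name", "UNKNOWN"))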
            break


#     print(f"""
#     Key: {key}
#     Title: {item['data']['title']}
#     Author: {firstname}, {lastname}
#     File: {attach} | Type: {type_attach}
#     Num Attach: {item['meta']['numChildren']}
#     ----""")

for Type, titles in duplicate_items_by_title.items():
    num_duplicates_items = len(titles) - len(set(titles))
    if num_duplicates_items:
        print(f">> {num_duplicates_items} duplicate items of type <{Type}>")
        # list every title that occurs more than once, not just the first title
        for title in sorted({t for t in titles if titles.count(t) > 1}):
            print(f">> {title}")
    else:
        print(f"No duplicates found for type <{Type}>!")

    
print("Check standalone items ...")    
for standalone_item in get_standalone_items(lib_items):
    log_item(standalone_item)
    
# Check if Trash is empty
if len(zot.trash()) > 0:
    print("\n----\nTrash is not empty. Consider emptying it!")
else:
    print("\n----\nTrash is empty!")