# Get a list of Kindle books

I found out that the Kindle for PC app keeps a list of all the Kindle books you own in an XML file. If you have the app installed (sorry, I'm not sure how to do this on a Mac), go to C:\Users\{user_name}\AppData\Local\Amazon\Kindle\Cache and find the file named KindleSyncMetadataCache.xml. Once you have that file, open it in Notepad and remove everything before the first meta_data tag (<meta_data>). Add an opening tag called data before that first meta_data tag and then a closing data tag at the end of the file. You don't have to name it data, but you need one extra tag to be your root tag. I saved the edited copy as new_KindleSyncMetadataCache.xml, which is the file the code below reads.
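The Notepad steps above could also be scripted. This is only a rough sketch under some assumptions: `wrap_in_root` is a made-up helper name, the sample string is invented rather than copied from a real cache file, and trimming everything after the last closing meta_data tag is an extra precaution that the manual steps don't mention.

```python
import xml.etree.ElementTree as et


def wrap_in_root(raw_text, root_name="data"):
    """Keep the text from the first <meta_data> through the last </meta_data>
    and wrap it in a single root element (hypothetical helper)."""
    start = raw_text.find("<meta_data>")
    end = raw_text.rfind("</meta_data>")
    if start == -1 or end == -1:
        raise ValueError("no <meta_data> element found")
    body = raw_text[start:end + len("</meta_data>")]
    return f"<{root_name}>\n{body}\n</{root_name}>\n"


# Invented sample standing in for the real cache file; the wrapper tags
# around the books are placeholders, not the real file's layout.
sample = (
    "<response><sync_time>x</sync_time><add_update_list>"
    "<meta_data><title>Book One</title></meta_data>"
    "<meta_data><title>Book Two</title></meta_data>"
    "</add_update_list></response>"
)

wrapped = wrap_in_root(sample)
root = et.fromstring(wrapped)
print(root.tag, len(root.findall("meta_data")))
```

For the real file, the same function could be applied to the text read from KindleSyncMetadataCache.xml and the result written to a new file. The file's encoding is not something I have checked, so treat that part as something to verify.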

Since I'm not sure how to read in XML files that have more than one layer of nesting, I also used Find/Replace to delete the <author pronunciation = "">, </author>, <publisher> and </publisher> tags (replacing each one with a space). That leaves the names sitting directly inside the authors and publishers tags, which gets everything back to one layer of nesting.

On the surface, this worked. I'm concerned about any books that might have more than one author, though. I'll take a closer look at an anthology to see what happens to those. Eventually I'll figure out how to work with multiple layers and this won't be an issue.
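For what it's worth, ElementTree can reach nested elements with a path such as `authors/author`, so the Find/Replace step may not be needed. The sketch below is an illustration, not a tested replacement. The sample XML is invented, its layout is only inferred from the tag names mentioned above, and `joined_text` is a hypothetical helper. Multiple authors are joined with a semicolon because the names themselves contain commas.

```python
import xml.etree.ElementTree as et

# Invented sample; the nesting is an assumption based on the tags named above.
sample = """<data>
  <meta_data>
    <title>Solo Book</title>
    <authors><author pronunciation="">Bishop, Anne</author></authors>
    <publishers><publisher>Roc</publisher></publishers>
  </meta_data>
  <meta_data>
    <title>Some Anthology</title>
    <authors>
      <author pronunciation="">Author, First</author>
      <author pronunciation="">Author, Second</author>
    </authors>
    <publishers><publisher>Example Press</publisher></publishers>
  </meta_data>
</data>"""


def joined_text(parent, path, sep="; "):
    """Join the stripped text of every element under parent that matches path."""
    return sep.join((el.text or "").strip() for el in parent.findall(path))


root = et.fromstring(sample)
rows = []
for meta in root.iter("meta_data"):
    rows.append({
        "title": meta.findtext("title"),
        "authors": joined_text(meta, "authors/author"),
        "publisher": joined_text(meta, "publishers/publisher"),
    })

for row in rows:
    print(row)
```

With this approach an anthology keeps all of its authors in one cell, and a book with no author element simply gets an empty string.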

In [1]:
import pandas as pd
import xml.etree.ElementTree as et

In [2]:
prstree = et.parse('new_KindleSyncMetadataCache.xml')
root = prstree.getroot()
print(root)

<Element 'data' at 0x00000114E3D77CE0>


In [3]:
books = prstree.findall('meta_data')
print('Books Owned', len(books))

Books Owned 4853


In [4]:
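all_books = []

# Each meta_data element describes one book.
for meta in root.iter('meta_data'):
    # Note: the ASIN column comes out empty in the output below, so ASIN may be
    # a child element rather than an attribute; meta.findtext('ASIN') is worth trying.
    ASIN = meta.attrib.get('ASIN')
    title = meta.find('title').text
    # authors and publishers hold plain text only because the inner
    # author/publisher tags were removed with Find/Replace beforehand.
    authors = meta.find('authors').text
    publishers = meta.find('publishers').text
    pub_date = meta.find('publication_date').text
    bought = meta.find('purchase_date').text
    t_type = meta.find('textbook_type').text
    cde = meta.find('cde_contenttype').text

    # One row per book, in the same order as the DataFrame columns.
    book = [ASIN, title, authors, publishers, pub_date, bought, t_type, cde]
    all_books.append(book)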

In [5]:
df = pd.DataFrame(all_books, columns=['ASIN', 'title', 'authors', 'publisher', 'pub_date', 'purchase_date', 'textbook_type', 'cde_type'])

In [6]:
df.head()

| | ASIN | title | authors | publisher | pub_date | purchase_date | textbook_type | cde_type |
|---|---|---|---|---|---|---|---|---|
| 0 | | mental floss presents Forbidden Knowledge: A W... | \n\t\t Editors of Mental Floss \n\t | \n\t\t William Morrow Paperbacks \n\t | 2009-03-17T00:00:00+0000 | 2022-03-06T03:54:53+0000 | | EBOK |
| 1 | | The Queen's Bargain (Black Jewels Book 10) | Bishop, Anne | Ace | 2020-03-10T00:00:00+0000 | 2022-03-05T01:47:58+0000 | | EBOK |
| 2 | | Shalador's Lady (Black Jewels Book 8) | Bishop, Anne | Roc | 2010-01-25T00:00:00+0000 | 2022-03-05T01:47:04+0000 | | EBOK |
| 3 | | Tangled Webs (Black Jewels Book 6) | Bishop, Anne | Roc | 2008-03-04T00:00:00+0000 | 2022-03-05T01:42:45+0000 | | EBOK |
| 4 | | The Invisible Ring (Black Jewels Book 4) | Bishop, Anne | Roc | 2008-06-03T00:00:00+0000 | 2022-03-05T01:42:19+0000 | | EBOK |


In [7]:
df.shape

(4853, 8)
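Some rows in the head() output carry stray newlines and tabs around the author and publisher names, and the dates are still plain strings. A possible cleanup step is sketched below. It uses a tiny stand-in DataFrame built from two of the rows shown above rather than the full data, and the date format string is an assumption based on those two rows.

```python
import pandas as pd

# Stand-in for the real df: two rows taken from the head() output, fewer columns.
df = pd.DataFrame({
    "authors": ["\n\t\t Editors of Mental Floss \n\t", "Bishop, Anne"],
    "publisher": ["\n\t\t William Morrow Paperbacks \n\t", "Ace"],
    "pub_date": ["2009-03-17T00:00:00+0000", "2020-03-10T00:00:00+0000"],
    "purchase_date": ["2022-03-06T03:54:53+0000", "2022-03-05T01:47:58+0000"],
})

# Trim the whitespace left around the names.
for col in ["authors", "publisher"]:
    df[col] = df[col].str.strip()

# Convert the timestamp strings to timezone-aware datetimes.
for col in ["pub_date", "purchase_date"]:
    df[col] = pd.to_datetime(df[col], format="%Y-%m-%dT%H:%M:%S%z", utc=True)

print(df["authors"].tolist())
print(df.dtypes)
```

Once the dates are real datetimes, sorting or grouping by purchase year becomes straightforward, for example with `df["purchase_date"].dt.year`.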

In [12]:
df.to_csv(r'../my_data/kindle.csv')